[ https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044931#comment-17044931 ]

NITISH SHARMA edited comment on SPARK-18879 at 2/25/20 9:59 PM:
----------------------------------------------------------------

I have a similar requirement: I want to update LAST_ACCESS_TIME in the TBLS
table of the Hive metastore whenever a table is accessed through Spark. I set
the property below in hive-site.xml, and Hive honors it, updating
LAST_ACCESS_TIME every time a table is accessed.

<property>
    <name>hive.exec.pre.hooks</name>
    <value>org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec</value>
</property>

However, I want to achieve the same thing from pyspark/spark-shell, and Spark
does not honor this Hive hook property. Is there an alternate approach for
achieving this, i.e. updating LAST_ACCESS_TIME in the Hive metastore on access
through Spark?
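One workaround I am considering (an untested sketch, not something confirmed
anywhere in this ticket) is to skip the hook mechanism entirely and bump
LAST_ACCESS_TIME from the client side through the metastore Thrift API. The
snippet below assumes the third-party hmsclient Python package and a metastore
at metastore-host:9083; both names are placeholders.

import time
from hmsclient import hmsclient  # third-party package: pip install hmsclient

METASTORE_HOST = "metastore-host"  # placeholder: your metastore address
METASTORE_PORT = 9083              # default metastore Thrift port

def touch_last_access_time(db, table):
    # Untested sketch: set the table's lastAccessTime (TBLS.LAST_ACCESS_TIME)
    # to 'now', bypassing Hive hooks entirely.
    with hmsclient.HMSClient(host=METASTORE_HOST, port=METASTORE_PORT) as client:
        tbl = client.get_table(db, table)       # Thrift Table struct
        tbl.lastAccessTime = int(time.time())   # epoch seconds
        client.alter_table(db, table, tbl)

# Call it right after reading the table in Spark, e.g.:
#   df = spark.table("db.table")
#   touch_last_access_time("db", "table")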

For reference, here is how I passed the property:

spark-sql -e 'set spark.hadoop.hive.exec.post.hooks=org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec; select * from db.table;'

and I also put the same property in /etc/spark/conf/hive-site.xml.
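One thing worth double-checking: the command above sets hive.exec.post.hooks,
while the hive-site.xml snippet uses hive.exec.pre.hooks (and the hook class
is a $PreExec). A runtime SET may also come too late for a pre-hook, so
setting the property when the session is built could behave differently. A
minimal, untested PySpark sketch of that variant (assuming the hook class is
on the driver classpath):

from pyspark.sql import SparkSession

# Untested sketch: pass the Hive hook property at session construction
# instead of via a runtime SET. Whether Spark's embedded Hive client ever
# invokes the hook is exactly what this ticket questions, so this is an
# experiment rather than a confirmed fix.
spark = (
    SparkSession.builder
    .appName("hive-hook-experiment")  # placeholder app name
    .config("spark.hadoop.hive.exec.pre.hooks",
            "org.apache.hadoop.hive.ql.hooks.UpdateInputAccessTimeHook$PreExec")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SELECT * FROM db.table").show()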


> Spark SQL support for Hive hooks regressed
> ------------------------------------------
>
>                 Key: SPARK-18879
>                 URL: https://issues.apache.org/jira/browse/SPARK-18879
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.2
>            Reporter: Atul Payapilly
>            Priority: Major
>
> As per the stack trace from this post (run on Spark 1.3.1):
> http://ihorbobak.com/index.php/2015/05/08/113/
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:270)
>     at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
>     at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
>     at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
>     at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
>     at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
>     at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
>     at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
>     at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
>     at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
>     at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>     at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
>     at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
>     at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
>     at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
>     at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
>     at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
>     at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>     at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>     at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and
> supported Hive hooks. The current code path no longer goes through the Hive
> Driver, so support for Hive hooks regressed. This is problematic: for
> example, there is no way to tell which partitions were updated as part of a
> query.


