[jira] [Updated] (HIVE-26739) When kerberos is enabled, hiveserver2 error connecting metastore: No valid credentials provided

2022-11-15 Thread weiliang hao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

weiliang hao updated HIVE-26739:

Description: 
If the environment variable HADOOP_USER_NAME is set, HiveServer2 fails to 
connect to the metastore with "No valid credentials provided".

There is a problem with how the getUGI method of the 
org.apache.hadoop.hive.shims.Utils class obtains the UGI. It should first 
check {{UserGroupInformation.isSecurityEnabled()}}: if it is true, return 
{{UserGroupInformation.getCurrentUser()}}; if it is false, obtain the user 
name from the environment variable HADOOP_USER_NAME and create a UGI from it.
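A minimal sketch of the proposed check (illustrative only; it follows the 
wording above, and the fallback call createRemoteUser is an assumption, not 
the actual patch):

{code:java}
import java.io.IOException;
import javax.security.auth.login.LoginException;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch of the proposed Utils.getUGI() behavior.
public static UserGroupInformation getUGI() throws LoginException, IOException {
  if (UserGroupInformation.isSecurityEnabled()) {
    // Kerberos enabled: use the already logged-in (kinit/keytab) user so the
    // metastore connection presents valid GSSAPI credentials.
    return UserGroupInformation.getCurrentUser();
  }
  // Security disabled: fall back to HADOOP_USER_NAME when present.
  String doAs = System.getenv("HADOOP_USER_NAME");
  if (doAs != null && !doAs.isEmpty()) {
    return UserGroupInformation.createRemoteUser(doAs);
  }
  return UserGroupInformation.getCurrentUser();
}
{code}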

 
{code:java}
2022-11-15T15:41:06,971 ERROR [HiveServer2-Background-Pool: Thread-36] 
transport.TSaslTransport: SASL negotiation failure
javax.security.sasl.SaslException: GSS initiate failed
        at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
 ~[?:1.8.0_144]
        at 
org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) 
~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:51)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:48)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at java.security.AccessController.doPrivileged(Native Method) 
~[?:1.8.0_144]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_144]
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
 ~[hadoop-common-3.2.1.jar:?]
        at 
org.apache.hadoop.hive.metastore.security.TUGIAssumingTransport.open(TUGIAssumingTransport.java:48)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:516)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:224)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:94)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[?:1.8.0_144]
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 ~[?:1.8.0_144]
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[?:1.8.0_144]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[?:1.8.0_144]
        at 
org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84) 
~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:95)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119)
 ~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:4306) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4374) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:4354) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1662) 
~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1651) 
~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.ql.exec.DDLTask.showTablesOrViews(DDLTask.java:2824) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:509) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
~[hive-exec-3.1.3.jar:3.1.3]
        at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709) 
~[hive-exec-3.1.3.jar:3.1.3]
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703) 
~[hive-exec-3.1.3.jar:3.1.3]
{code}

[jira] [Commented] (HIVE-26631) Remove unused variable requestTimeout and beBackoffSlotLength in the initServer method of the ThriftBinaryCLIService class

2022-11-14 Thread weiliang hao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17633751#comment-17633751
 ] 

weiliang hao commented on HIVE-26631:
-

[~zabetak] In that issue (https://issues.apache.org/jira/browse/THRIFT-5297), 
the parameters requestTimeout and beBackoffSlotLength were removed because 
they added a lot of complexity to the code and didn't make a whole lot of 
sense.

> Remove unused variable requestTimeout and beBackoffSlotLength in the 
> initServer method of the ThriftBinaryCLIService class
> --
>
> Key: HIVE-26631
> URL: https://issues.apache.org/jira/browse/HIVE-26631
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1
>Reporter: weiliang hao
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When Thrift was upgraded from 0.13.0 to 0.14.1 in 
> HIVE-25098 (https://issues.apache.org/jira/browse/HIVE-25098), the settings 
> of requestTimeout and beBackoffSlotLength were removed from the 
> TThreadPoolServer.Args objects. However, the variables requestTimeout and 
> beBackoffSlotLength remain, unused; it would be better to clean them up 
> together with the corresponding HiveConf properties:
>  * hive.server2.thrift.exponential.backoff.slot.length
>  * hive.server2.thrift.login.timeout
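For context, a sketch of the dead code in ThriftBinaryCLIService.initServer() 
(recalled from memory; the HiveConf constant names and exact lines may differ):

{code:java}
// Both values are still read from HiveConf but, after the Thrift 0.14
// upgrade, never passed to TThreadPoolServer.Args, so they are dead code.
long beBackoffSlotLength = hiveConf.getTimeVar(
    HiveConf.ConfVars.HIVE_SERVER2_THRIFT_LOGIN_BEBACKOFF_SLOT_LENGTH,
    TimeUnit.MILLISECONDS);
long requestTimeout = hiveConf.getTimeVar(
    HiveConf.ConfVars.HIVE_SERVER2_THRIFT_LOGIN_TIMEOUT, TimeUnit.SECONDS);
{code}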



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26620) Remove unused imports for ThriftBinaryCLIService class

2022-10-11 Thread weiliang hao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17615647#comment-17615647
 ] 

weiliang hao commented on HIVE-26620:
-

I will solve it

> Remove unused imports for ThriftBinaryCLIService class
> --
>
> Key: HIVE-26620
> URL: https://issues.apache.org/jira/browse/HIVE-26620
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0-alpha-1
>Reporter: weiliang hao
>Priority: Major
>
> Some imports are not used in the ThriftBinaryCLIService class; better to 
> clean them up:
>  * import com.google.common.base.Splitter;
>  * import com.google.common.collect.Sets;



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-25795) [CVE-2021-44228] Update log4j2 version to 2.15.0

2021-12-15 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460399#comment-17460399
 ] 

hao commented on HIVE-25795:


The Hive 1.x versions are not affected.

> [CVE-2021-44228] Update log4j2 version to 2.15.0
> 
>
> Key: HIVE-25795
> URL: https://issues.apache.org/jira/browse/HIVE-25795
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Nikhil Gupta
>Assignee: Nikhil Gupta
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> [Worst Apache Log4j RCE Zero day Dropped on Internet - Cyber 
> Kendra|https://www.cyberkendra.com/2021/12/worst-log4j-rce-zeroday-dropped-on.html]
> Vulnerability:
> https://github.com/apache/logging-log4j2/commit/7fe72d6



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-10569) Hive CLI gets stuck when hive.exec.parallel=true; and some exception happens during SessionState.start

2021-08-03 Thread hao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao updated HIVE-10569:
---
Description: 
The CLI gets stuck in the loop in [DriverContext.pollFinished | 
https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java#L108]
 when some {{TaskRunner}} which has completed has not been marked as 
non-running.
This can happen when there is exception in [SessionState.start | 
https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java#L74]
 which is called from {{TaskRunner.run}}.

This happened with us when we were running with {{hive.exec.parallel=true}}, 
{{hive.execution.engine=tez}} and Tez wasn't correctly setup.
In this case the CLI printed the exception and then got hung (No prompt.)

A simple fix is to call {{result.setRunning(false);}} in the {{finally}} block 
of {{TaskRunner.run}}
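A minimal sketch of that fix (the surrounding method body is paraphrased; the 
real change is in the attached HIVE-10569.patch):

{code:java}
// TaskRunner.run() with the suggested finally block.
@Override
public void run() {
  runner = Thread.currentThread();
  try {
    SessionState.start(ss);
    runSequential();
  } finally {
    // Always mark this TaskRunner as not running, even when
    // SessionState.start() throws, so DriverContext.pollFinished()
    // cannot spin forever on a task that never reports completion.
    result.setRunning(false);
    runner = null;
  }
}
{code}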

  was:
The CLI gets stuck in the loop in [DriverContext.pollFinished | 
https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java#L108]
 when some {{TaskRunner}} which has completed has not been marked as 
non-running.
This can happen when there is exception in [SessionState.start | 
https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java#L74]
 which is called from {{TaskRunner.run}}.

This happened with us when we were running with {{hive.exec.parallel=true}}, 
{{hive.execution.engine=tez}} and Tez wasn't correctly setup.
In this case the CLI printed the exception and then got hung (No prompt.)

A simple fix is to call {{result.setRunning(false);}} in the {{finally}} block 
of {{TaskRunner.run}}


> Hive CLI gets stuck when hive.exec.parallel=true; and some exception happens 
> during SessionState.start
> --
>
> Key: HIVE-10569
> URL: https://issues.apache.org/jira/browse/HIVE-10569
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.1, 1.1.0, 1.2.0
>Reporter: Rohit Agarwal
>Assignee: Rohit Agarwal
>Priority: Critical
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-10569.patch
>
>
> The CLI gets stuck in the loop in [DriverContext.pollFinished | 
> https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java#L108]
>  when some {{TaskRunner}} which has completed has not been marked as 
> non-running.
> This can happen when there is exception in [SessionState.start | 
> https://github.com/apache/hive/blob/release-1.1.0/ql/src/java/org/apache/hadoop/hive/ql/exec/TaskRunner.java#L74]
>  which is called from {{TaskRunner.run}}.
> This happened with us when we were running with {{hive.exec.parallel=true}}, 
> {{hive.execution.engine=tez}} and Tez wasn't correctly setup.
> In this case the CLI printed the exception and then got hung (No prompt.)
> A simple fix is to call {{result.setRunning(false);}} in the {{finally}} 
> block of {{TaskRunner.run}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25418) The Hive server hangs when the job on YARN ends, and there is no error log

2021-08-03 Thread hao (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17392190#comment-17392190
 ] 

hao commented on HIVE-25418:


 !screenshot-1.png! 
When the task reaches the final data-move step, it hangs without producing any 
log output.

> The Hive server hangs when the job on YARN ends, and there is no error log
> ---
>
> Key: HIVE-25418
> URL: https://issues.apache.org/jira/browse/HIVE-25418
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The Hive server hangs when the job on YARN ends, and there is no error log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25418) The Hive server hangs when the job on YARN ends, and there is no error log

2021-08-03 Thread hao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao updated HIVE-25418:
---
Attachment: screenshot-1.png

> The Hive server hangs when the job on YARN ends, and there is no error log
> ---
>
> Key: HIVE-25418
> URL: https://issues.apache.org/jira/browse/HIVE-25418
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
> Attachments: screenshot-1.png
>
>
> The Hive server hangs when the job on YARN ends, and there is no error log



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25023) Optimize the operation of reading jar stream to avoid stream closed exception

2021-04-16 Thread hao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao updated HIVE-25023:
---
Issue Type: Improvement  (was: Bug)

> Optimize the operation of reading jar stream to avoid stream closed exception
> -
>
> Key: HIVE-25023
> URL: https://issues.apache.org/jira/browse/HIVE-25023
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 3.1.2
>Reporter: hao
>Priority: Major
>
> Optimize the operation of reading jar stream to avoid stream closed exception



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24115) Kryo's instantiation strategy should use the DefaultInstantiatorStrategy instead of the dangerous StdInstantiatorStrategy

2020-09-02 Thread hao (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hao updated HIVE-24115:
---
Description: 
DefaultInstantiatorStrategy is the recommended way of creating objects with 
Kryo. It runs constructors just like would be done with Java code. Alternatively, 
extralinguistic mechanisms can also be used to create objects. The 
[Objenesis|http://objenesis.org/] StdInstantiatorStrategy uses JVM specific 
APIs to create an instance of a class without calling any constructor at all. 
Using this is dangerous because most classes expect their constructors to be 
called. Creating the object by bypassing its constructors may leave the object 
in an uninitialized or invalid state. Classes must be designed to be created in 
this way.

Kryo can be configured to try DefaultInstantiatorStrategy first, then fall 
back to StdInstantiatorStrategy if necessary, for example:

kryo.setInstantiatorStrategy(new DefaultInstantiatorStrategy(new 
StdInstantiatorStrategy()));
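A self-contained sketch of this configuration (assuming Kryo 4.x, where 
DefaultInstantiatorStrategy is an inner class of Kryo, and Objenesis on the 
classpath; Hive itself uses a shaded 
org.apache.hive.com.esotericsoftware.kryo package):

{code:java}
import com.esotericsoftware.kryo.Kryo;
import org.objenesis.strategy.StdInstantiatorStrategy;

public final class KryoFactory {
  public static Kryo newKryo() {
    Kryo kryo = new Kryo();
    // Prefer real constructors; use Objenesis (constructor bypass) only for
    // classes that cannot be instantiated any other way.
    kryo.setInstantiatorStrategy(
        new Kryo.DefaultInstantiatorStrategy(new StdInstantiatorStrategy()));
    return kryo;
  }
}
{code}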

  was:DefaultInstantiatorStrategy is the recommended way of creating objects 
with Kryo. It runs constructors just like would be done with Java code. 
Alternative, extralinguistic mechanisms can also be used to create objects. The 
[Objenesis|http://objenesis.org/] StdInstantiatorStrategy uses JVM specific 
APIs to create an instance of a class without calling any constructor at all. 
Using this is dangerous because most classes expect their constructors to be 
called. Creating the object by bypassing its constructors may leave the object 
in an uninitialized or invalid state. Classes must be designed to be created in 
this way.


> Kryo's instantiation strategy should use the DefaultInstantiatorStrategy  
> instead of the dangerous StdInstantiatorStrategy
> --
>
> Key: HIVE-24115
> URL: https://issues.apache.org/jira/browse/HIVE-24115
> Project: Hive
>  Issue Type: Wish
>Reporter: hao
>Priority: Minor
>
> DefaultInstantiatorStrategy is the recommended way of creating objects with 
> Kryo. It runs constructors just like would be done with Java code. 
> Alternatively, extralinguistic mechanisms can also be used to create objects. 
> The [Objenesis|http://objenesis.org/] StdInstantiatorStrategy uses JVM 
> specific APIs to create an instance of a class without calling any 
> constructor at all. Using this is dangerous because most classes expect their 
> constructors to be called. Creating the object by bypassing its constructors 
> may leave the object in an uninitialized or invalid state. Classes must be 
> designed to be created in this way.
> Kryo can be configured to try DefaultInstantiatorStrategy first, then 
> fall back to StdInstantiatorStrategy if necessary, for example:
> kryo.setInstantiatorStrategy(new DefaultInstantiatorStrategy(new 
> StdInstantiatorStrategy()));



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-21640) Do not create (HD)FS directory when creating a table stored and managed by other systems

2019-04-21 Thread Hao Hao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-21640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hao Hao updated HIVE-21640:
---
Description: When creating a table in HMS, a (HD)FS directory will be 
[created|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1983]
 even for tables stored and managed by other systems via a Hive Storage 
Handler. It would be ideal to skip directory creation in such cases (and 
likewise when dropping a table).  (was: When creating a table in HMS, a 
(HD)FS directory will be created even for the table even for table stored and 
managed by other systems specified by Hive Storage Handler. It is ideal to skip 
directory creation in such case (same for dropping a table).)

> Do not create (HD)FS directory when creating a table stored and managed by 
> other systems
> 
>
> Key: HIVE-21640
> URL: https://issues.apache.org/jira/browse/HIVE-21640
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 3.1.1, 2.3.4
>Reporter: Hao Hao
>Priority: Major
>
> When creating a table in HMS, a (HD)FS directory will be 
> [created|https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L1983]
>  even for tables stored and managed by other systems via a Hive Storage 
> Handler. It would be ideal to skip directory creation in such cases (and 
> likewise when dropping a table).
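A minimal sketch of the idea (a hypothetical helper, not the actual 
HiveMetaStore change; hive_metastoreConstants.META_TABLE_STORAGE is the 
"storage_handler" table property):

{code:java}
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.metastore.api.hive_metastoreConstants;

public final class StorageHandlerTables {
  // Tables backed by a storage handler (HBase, Kudu, ...) manage their own
  // storage, so the metastore could skip creating an (HD)FS directory for them.
  static boolean needsFsDirectory(Table tbl) {
    String handler = tbl.getParameters() == null ? null
        : tbl.getParameters().get(hive_metastoreConstants.META_TABLE_STORAGE);
    return handler == null;
  }
}
{code}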



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-17565) NullPointerException occurs when hive.optimize.skewjoin and hive.auto.convert.join are switched on at the same time

2017-09-21 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16175715#comment-16175715
 ] 

Xin Hao commented on HIVE-17565:


Hive on MR. Thanks.

> NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time
> ---
>
> Key: HIVE-17565
> URL: https://issues.apache.org/jira/browse/HIVE-17565
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Xin Hao
>Assignee: liyunzhang_intel
>
> (A)NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time.
> Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.
> (B)Hive Version:
> Found on Apache Hive 1.2.1
> (C)Workload:
> (1)TPCx-BB Q19
> (2) A small case as below,which was actually simplified from Q19:
> SELECT *
> FROM store_returns sr,
> (
>   SELECT d1.d_date_sk
>   FROM date_dim d1, date_dim d2
>   WHERE d1.d_week_seq = d2.d_week_seq
> ) sr_dateFilter
> WHERE sr.sr_returned_date_sk = d_date_sk;
> (D)Exception Error Message:
> Error: java.lang.RuntimeException: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
> at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17565) NullPointerException occurs when hive.optimize.skewjoin and hive.auto.convert.join are switched on at the same time

2017-09-20 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-17565:
---
Description: 
(A)NullPointerException occurs when hive.optimize.skewjoin and 
hive.auto.convert.join are switched on at the same time.
Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.

(B)Hive Version:
Found on Apache Hive 1.2.1

(C)Workload:
(1)TPCx-BB Q19
(2) A small case as below,which was actually simplified from Q19:

SELECT *
FROM store_returns sr,
(
  SELECT d1.d_date_sk
  FROM date_dim d1, date_dim d2
  WHERE d1.d_week_seq = d2.d_week_seq
) sr_dateFilter
WHERE sr.sr_returned_date_sk = d_date_sk;


(D)Exception Error Message:
Error: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more






  was:
NullPointerException occurs when hive.optimize.skewjoin and 
hive.auto.convert.join are switched on at the same time.
Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.

Hive Version:
Found on Apache Hive 1.2.1

Workload:
(1)TPCx-BB Q19
(2) A small case as below,which was actually simplified from Q19:

SELECT *
FROM store_returns sr,
(
  SELECT d1.d_date_sk
  FROM date_dim d1, date_dim d2
  WHERE d1.d_week_seq = d2.d_week_seq
) sr_dateFilter
WHERE sr.sr_returned_date_sk = d_date_sk;


Exception Error Message:
Error: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more







> NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time
> ---
>
> Key: HIVE-17565
> URL: https://issues.apache.org/jira/browse/HIVE-17565
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Xin Hao
>
> (A)NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time.
> Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.
> (B)Hive Version:
> Found on Apache Hive 1.2.1

[jira] [Updated] (HIVE-17565) NullPointerException occurs when hive.optimize.skewjoin and hive.auto.convert.join are switched on at the same time

2017-09-20 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-17565:
---
Affects Version/s: 1.2.1
  Description: 
NullPointerException occurs when hive.optimize.skewjoin and 
hive.auto.convert.join are switched on at the same time.
Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.

Hive Version:
Found on Apache Hive 1.2.1

Workload:
(1)TPCx-BB Q19
(2) A small case as below,which was actually simplified from Q19:

SELECT *
FROM store_returns sr,
(
  SELECT d1.d_date_sk
  FROM date_dim d1, date_dim d2
  WHERE d1.d_week_seq = d2.d_week_seq
) sr_dateFilter
WHERE sr.sr_returned_date_sk = d_date_sk;


Exception Error Message:
Error: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more






  was:
NullPointerException occurs when hive.optimize.skewjoin and 
hive.auto.convert.join are switched on at the same time.
Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.

Workload:
(1)TPCx-BB Q19
(2) A small case as below,which was actually simplified from Q19:

SELECT *
FROM store_returns sr,
(
  SELECT d1.d_date_sk
  FROM date_dim d1, date_dim d2
  WHERE d1.d_week_seq = d2.d_week_seq
) sr_dateFilter
WHERE sr.sr_returned_date_sk = d_date_sk;


Exception Error Message:
Error: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:194)
at 
org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:223)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:490)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 8 more







> NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time
> ---
>
> Key: HIVE-17565
> URL: https://issues.apache.org/jira/browse/HIVE-17565
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Xin Hao
>
> NullPointerException occurs when hive.optimize.skewjoin and 
> hive.auto.convert.join are switched on at the same time.
> Could pass when hive.optimize.skewjoin=true and hive.auto.convert.join=false.
> Hive Version:
> Found on Apache Hive 1.2.1
> Workload:
> (1)TPCx-BB 

[jira] (HIVE-15761) ObjectStore.getNextNotification could return an empty NotificationEventResponse causing TProtocolException

2017-01-30 Thread Hao Hao (JIRA)
 
Hao Hao updated HIVE-15761:
---
Summary: ObjectStore.getNextNotification could return an empty 
NotificationEventResponse causing TProtocolException

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Updated] (HIVE-13634) Hive-on-Spark performed worse than Hive-on-MR, for queries with external scripts

2016-04-27 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-13634:
---
Description: 
Hive-on-Spark performed worse than Hive-on-MR, for queries with external 
scripts.

TPCx-BB Q2/Q3/Q4 are Python Streaming related cases that call external scripts 
to handle the reduce tasks. We found that for these 3 queries Hive-on-Spark 
shows lower performance than Hive-on-MR when processing reduce tasks with 
external (Python) scripts. So ‘Improve HoS performance for queries with 
external scripts’ seems to be a performance optimization opportunity.

The following shows the Q2/Q3/Q4 test result on 8-worker-node cluster with 
TPCx-BB 3TB data size.

TPCx-BB Query 2
(1)Hive-on-MR 
Total Query Execution Time (sec): 2172.180
Execution Time of External Scripts (sec): 736
(2)Hive-on-Spark
Total Query Execution Time (sec): 2283.604
Execution Time of External Scripts (sec): 1197

TPCx-BB Query 3
(1)Hive-on-MR 
Total Query Execution Time (sec): 1070.632
Execution Time of External Scripts (sec): 513
(2)Hive-on-Spark
Total Query Execution Time (sec): 1287.679
Execution Time of External Scripts (sec): 919

TPCx-BB Query 4
(1)Hive-on-MR 
Total Query Execution Time (sec): 1781.864
Execution Time of External Scripts (sec): 1518
(2)Hive-on-Spark
Total Query Execution Time (sec): 2028.023
Execution Time of External Scripts (sec): 1599

  was:
Hive-on-Spark performed worse than Hive-on-MR, for queries with external 
scripts.

For TPCx-BB Q2/Q3/Q4, they are Python Streaming related cases and will call 
external scripts to handle reduce tasks. We found that for these 3 queries 
Hive-on-Spark shows lower performance than Hive-on-MR when processing reduce 
tasks with external (Python) scripts. So ‘Improve HoS performance for queries 
with external scripts’ seems a performance optimization opportunity.


> Hive-on-Spark performed worse than Hive-on-MR, for queries with external 
> scripts
> 
>
> Key: HIVE-13634
> URL: https://issues.apache.org/jira/browse/HIVE-13634
> Project: Hive
>  Issue Type: Bug
>Reporter: Xin Hao
>
> Hive-on-Spark performed worse than Hive-on-MR, for queries with external 
> scripts.
> TPCx-BB Q2/Q3/Q4 are Python Streaming related cases that call external 
> scripts to handle the reduce tasks. We found that for these 3 queries 
> Hive-on-Spark shows lower performance than Hive-on-MR when processing reduce 
> tasks with external (Python) scripts. So ‘Improve HoS performance for queries 
> with external scripts’ seems to be a performance optimization opportunity.
> The following shows the Q2/Q3/Q4 test result on 8-worker-node cluster with 
> TPCx-BB 3TB data size.
> TPCx-BB Query 2
> (1)Hive-on-MR 
> Total Query Execution Time (sec): 2172.180
> Execution Time of External Scripts (sec): 736
> (2)Hive-on-Spark
> Total Query Execution Time (sec): 2283.604
> Execution Time of External Scripts (sec): 1197
> TPCx-BB Query 3
> (1)Hive-on-MR 
> Total Query Execution Time (sec): 1070.632
> Execution Time of External Scripts (sec): 513
> (2)Hive-on-Spark
> Total Query Execution Time (sec): 1287.679
> Execution Time of External Scripts (sec): 919
> TPCx-BB Query 4
> (1)Hive-on-MR 
> Total Query Execution Time (sec): 1781.864
> Execution Time of External Scripts (sec): 1518
> (2)Hive-on-Spark
> Total Query Execution Time (sec): 2028.023
> Execution Time of External Scripts (sec): 1599



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve

2016-03-21 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205599#comment-15205599
 ] 

Xin Hao commented on HIVE-13277:


Hi, Kapil & Rui,
TPCx-BB query2 is only an example here; many TPCx-BB queries failed for 
similar reasons.

> Exception "Unable to create serializer 
> 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " 
> occurred during query execution on spark engine when vectorized execution is 
> switched on
> -
>
> Key: HIVE-13277
> URL: https://issues.apache.org/jira/browse/HIVE-13277
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Hive Version: Apache Hive 2.0.0
> Spark Version: Apache Spark 1.6.0
>Reporter: Xin Hao
>
> Found during TPCx-BB query2 execution on the Spark engine when vectorized 
> execution is switched on:
> (1) set hive.vectorized.execution.enabled=true; 
> (2) set hive.vectorized.execution.reduce.enabled=true; (default value for 
> Apache Hive 2.0.0)
> It's OK for spark engine when hive.vectorized.execution.enabled is switched 
> off:
> (1) set hive.vectorized.execution.enabled=false;
> (2) set hive.vectorized.execution.reduce.enabled=true;
> For MR engine, the query could pass and no exception occurred when vectorized 
> execution is either switched on or switched off.
> Detail Error Message is below:
> {noformat}
> 2016-03-14T10:09:33,692 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO 
> spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 
> bytes
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN 
> scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): 
> java.lang.RuntimeException: Failed to load plan: 
> hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml:
>  org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - Serialization trace:
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - childOperators 
> (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) - reducer 
> (org.apache.hadoop.hive.ql.plan.ReduceWork)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451)
> 2016-03-14T10:09:33,818 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192)
> 2016-03-14T10:09:33,819 INFO  [stderr-redir-1]: client.SparkClientImpl 
> (SparkClientImpl.java:run(593)) -at 
> org.apache.spark.api.java.JavaRDDLike$$anonfu

[jira] [Updated] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-03-14 Thread Xin Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Hao updated HIVE-13278:
---
Description: 
Many redundant 'File not found' messages appear in the container log during 
query execution with Hive on Spark.
Certainly, this doesn't prevent the query from running successfully, so it is 
marked as Minor for now.

Error message example:
16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
/tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

  was:
Many redundant 'File not found' messages appeared in container log during query 
execution with Hive on Spark

Error message example:
16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
/tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)


> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL:

[jira] [Commented] (HIVE-13278) Many redundant 'File not found' messages appeared in container log during query execution with Hive on Spark

2016-03-14 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194503#comment-15194503
 ] 

Xin Hao commented on HIVE-13278:


Yes, this problem doesn't prevent the query from running successfully.

> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> 
>
> Key: HIVE-13278
> URL: https://issues.apache.org/jira/browse/HIVE-13278
> Project: Hive
>  Issue Type: Bug
> Environment: Hive on Spark engine
> Found based on :
> Apache Hive 2.0.0
> Apache Spark 1.6.0
>Reporter: Xin Hao
>Priority: Minor
>
> Many redundant 'File not found' messages appeared in container log during 
> query execution with Hive on Spark
> Error message example:
> 16/03/14 01:45:06 INFO exec.Utilities: File not found: File does not exist: 
> /tmp/hive/hadoop/2d378538-f5d3-493c-9276-c62dd6634fb4/hive_2016-03-14_01-44-16_835_623058724409492515-6/-mr-10010/0a6d0cae-1eb3-448c-883b-590b3b198a73/reduce.xml
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1932)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1873)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1853)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1825)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:565)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:87)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:363)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12650) Increase default value of hive.spark.client.server.connect.timeout to exceeds spark.yarn.am.waitTime

2016-03-14 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192912#comment-15192912
 ] 

Xin Hao commented on HIVE-12650:


Encountered the issue based on Apache Hive 2.0.0.

BTW, I suggest changing the issue title to avoid confusion, e.g. 
"Spark-submit is killed when Hive times out. Killing spark-submit doesn't 
cancel the AM request. When the AM is finally launched, it tries to connect 
back to Hive and gets refused." Thanks.

> Increase default value of hive.spark.client.server.connect.timeout to exceeds 
> spark.yarn.am.waitTime
> 
>
> Key: HIVE-12650
> URL: https://issues.apache.org/jira/browse/HIVE-12650
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.2.1
>Reporter: JoneZhang
>Assignee: Xuefu Zhang
>
> I think hive.spark.client.server.connect.timeout should be set greater than 
> spark.yarn.am.waitTime. The default value for 
> spark.yarn.am.waitTime is 100s, and the default value for 
> hive.spark.client.server.connect.timeout is 90s, which is not good. We can 
> increase it to a larger value such as 120s.
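A possible override along those lines (a sketch; the property is a time value, 
shown here in milliseconds):

{noformat}
set hive.spark.client.server.connect.timeout=120000ms;
{noformat}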



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12091) HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch]

2015-10-12 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954498#comment-14954498
 ] 

Xin Hao commented on HIVE-12091:


Hi, Rui Li, I've tried the patch and it works for my workload now. Thanks.

> HiveException (Failed to close AbstractFileMergeOperator) occurs during 
> loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark 
> Branch]
> ---
>
> Key: HIVE-12091
> URL: https://issues.apache.org/jira/browse/HIVE-12091
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-12091.1-spark.patch
>
>
> This issue occurs when hive.merge.sparkfiles is set to true. It can be 
> worked around by setting hive.merge.sparkfiles to false.
> BTW, we did a local experiment to run the case with MR engine (set 
> hive.merge.mapfiles=true; set hive.merge.mapredfiles=true;), it can pass.
> (1)Component Version:
> -- Hive Spark Branch 70eeadd2f019dcb2e301690290c8807731eab7a1  +  Hive-11473 
> patch (HIVE-11473.3-spark.patch)  ---> This is to support Spark 1.5 for Hive 
> on Spark
> -- Spark 1.5.1
> (2)Case used:
> -- Big-Bench  Data Load (load data from HDFS to Hive warehouse, stored as ORC 
> format). The related HiveQL:
> {noformat}
> DROP TABLE IF EXISTS customer_temporary;
> CREATE EXTERNAL TABLE customer_temporary
>   ( c_customer_sk bigint  --not null
>   , c_customer_id string  --not null
>   , c_current_cdemo_skbigint
>   , c_current_hdemo_skbigint
>   , c_current_addr_sk bigint
>   , c_first_shipto_date_skbigint
>   , c_first_sales_date_sk bigint
>   , c_salutation  string
>   , c_first_name  string
>   , c_last_name   string
>   , c_preferred_cust_flag string
>   , c_birth_day   int
>   , c_birth_month int
>   , c_birth_year  int
>   , c_birth_country   string
>   , c_login   string
>   , c_email_address   string
>   , c_last_review_datestring
>   )
>   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
>   STORED AS TEXTFILE LOCATION 
> '/user/root/benchmarks/bigbench_n1t/data/customer'
> ;
> DROP TABLE IF EXISTS customer;
> CREATE TABLE customer
> STORED AS ORC
> AS
> SELECT * FROM customer_temporary
> ;
> {noformat}
> (3)Error/Exception Message:
> {noformat}
> 15/10/12 14:28:38 INFO exec.Utilities: PLAN PATH = 
> hdfs://bhx2:8020/tmp/hive/root/4e145415-d4ea-4751-9e16-ff31edb0c258/hive_2015-10-12_14-28-12_485_2093357701513622173-1/-mr-10005/d891fdec-eacc-4f66-8827-e2b650c24810/map.xml
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: ORC merge file input path: 
> hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
> 15/10/12 14:28:38 INFO OrcFileMergeOperator: Merged stripe from file 
> hdfs://bhx2:8020/user/hive/warehouse/bigbench_n100g.db/.hive-staging_hive_2015-10-12_14-28-12_485_2093357701513622173-1/-ext-10003/01_0
>  [ offset : 3 length: 10525754 row: 247500 ]
> 15/10/12 14:28:38 INFO spark.SparkMergeFileRecordHandler: Closing Merge 
> Operator OFM
> 15/10/12 14:28:38 ERROR executor.Executor: Exception in task 1.0 in stage 1.0 
> (TID 4)
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed to close AbstractFileMergeOperator
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMergeFileRecordHandler.close(SparkMergeFileRecordHandler.java:115)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:58)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:106)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>   at 
> org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:118)
>   at 
> org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>   at 
> org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:1984)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:88)
>   at org.apache.spark.executor.Executor$TaskRunner.run(E

[jira] [Resolved] (HIVE-10532) SpecificMutableRow doesn't handle Date Type correctly

2015-04-29 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao resolved HIVE-10532.
--
Resolution: Invalid

Oops.

> SpecificMutableRow doesn't handle Date Type correctly
> -
>
> Key: HIVE-10532
> URL: https://issues.apache.org/jira/browse/HIVE-10532
> Project: Hive
>  Issue Type: Bug
>Reporter: Cheng Hao
>
> {code}
>   test("test DATE types in cache") {
> val rows = TestSQLContext.jdbc(urlWithUserAndPass, 
> "TEST.TIMETYPES").collect()
> TestSQLContext.jdbc(urlWithUserAndPass, 
> "TEST.TIMETYPES").cache().registerTempTable("mycached_date")
> val cachedRows = sql("select * from mycached_date").collect()
> assert(rows(0).getAs[java.sql.Date](1) === 
> java.sql.Date.valueOf("1996-01-01"))
> assert(cachedRows(0).getAs[java.sql.Date](1) === 
> java.sql.Date.valueOf("1996-01-01"))
>   }
> {code}
> {panel}
> java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to 
> org.apache.spark.sql.catalyst.expressions.MutableInt
>   at 
> org.apache.spark.sql.catalyst.expressions.SpecificMutableRow.getInt(SpecificMutableRow.scala:252)
>   at 
> org.apache.spark.sql.columnar.IntColumnStats.gatherStats(ColumnStats.scala:208)
>   at 
> org.apache.spark.sql.columnar.NullableColumnBuilder$class.appendFrom(NullableColumnBuilder.scala:56)
>   at 
> org.apache.spark.sql.columnar.NativeColumnBuilder.org$apache$spark$sql$columnar$compression$CompressibleColumnBuilder$$super$appendFrom(ColumnBuilder.scala:87)
>   at 
> org.apache.spark.sql.columnar.compression.CompressibleColumnBuilder$class.appendFrom(CompressibleColumnBuilder.scala:78)
>   at 
> org.apache.spark.sql.columnar.NativeColumnBuilder.appendFrom(ColumnBuilder.scala:87)
>   at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:148)
>   at 
> org.apache.spark.sql.columnar.InMemoryRelation$$anonfun$3$$anon$1.next(InMemoryColumnarTableScan.scala:124)
>   at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:277)
>   at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>   at org.apache.spark.scheduler.Task.run(Task.scala:64)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:722)
> {panel}
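The cast failure above matches how Spark SQL represents DATE values
internally: a DateType column is stored as an int counting days since the
Unix epoch, so the mutable slot backing such a column needs to hold an int
(MutableInt) rather than a boxed Any. A minimal, self-contained Java sketch
of that representation (illustrative only, not Spark's actual code):

{code:java}
// Illustrative sketch: a DATE round-trips through an int day count,
// which is why the row slot for a DateType column should be MutableInt.
import java.time.LocalDate;

public class DateAsIntSketch {
  public static void main(String[] args) {
    int daysSinceEpoch = (int) LocalDate.of(1996, 1, 1).toEpochDay();
    java.sql.Date roundTripped =
        java.sql.Date.valueOf(LocalDate.ofEpochDay(daysSinceEpoch));
    System.out.println(daysSinceEpoch);  // 9496
    System.out.println(roundTripped);    // 1996-01-01
  }
}
{code}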



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9697) Hive on Spark is not as aggressive as MR on map join [Spark Branch]

2015-03-17 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364718#comment-14364718
 ] 

Xin Hao commented on HIVE-9697:
---

Could we consider using rawDataSize by default (it should be safer for most 
scenarios), and adding a true/false Hive parameter flag so that users could 
choose to use totalSize on demand?
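
A rough sketch of what that proposal could look like (the flag and method 
names below are hypothetical, not existing Hive configuration or APIs):

{code:java}
// Hypothetical sketch of the proposal: default to rawDataSize, and let
// users opt in to totalSize via a boolean flag.
public class ProposedSizeFlagSketch {
  static long chooseInputSize(boolean useTotalSizeFlag,
                              long totalSize, long rawDataSize) {
    return useTotalSizeFlag ? totalSize : rawDataSize;
  }

  public static void main(String[] args) {
    // Numbers taken from the ORC table meta info quoted below.
    System.out.println(chooseInputSize(false, 1748955L, 123050375L)); // 123050375
    System.out.println(chooseInputSize(true,  1748955L, 123050375L)); // 1748955
  }
}
{code}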

> Hive on Spark is not as aggressive as MR on map join [Spark Branch]
> ---
>
> Key: HIVE-9697
> URL: https://issues.apache.org/jira/browse/HIVE-9697
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We made a finding while running some Big-Bench cases:
> when the same small-table size threshold is used, the Map Join operator is 
> not generated in the stage plans for Hive on Spark, while it is generated 
> for Hive on MR.
> For example, when we run BigBench Q25, the meta info of one input ORC table 
> is as below:
> totalSize=1748955 (about 1.7M)
> rawDataSize=123050375 (about 120M)
> If we use the following parameter settings,
> set hive.auto.convert.join=true;
> set hive.mapjoin.smalltable.filesize=2500;
> set hive.auto.convert.join.noconditionaltask=true;
> set hive.auto.convert.join.noconditionaltask.size=100000000; (100M)
> Map Join will be enabled in Hive on MR mode but will not be enabled in Hive 
> on Spark.
> We found that for Hive on MR, the HDFS file size of the table 
> (ContentSummary.getLength(), which should approximate the value of 
> 'totalSize') is compared with the 100M threshold (and is smaller), while 
> for Hive on Spark 'rawDataSize' is compared with the 100M threshold (and is 
> larger). That is why MapJoin is not enabled for Hive on Spark in this case, 
> and as a result Hive on Spark gets much lower performance than Hive on MR 
> here.
> When we set hive.auto.convert.join.noconditionaltask.size=150000000 (150M), 
> MapJoin will be enabled for Hive on Spark mode as well, and Hive on Spark 
> then performs similarly to Hive on MR.
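
To make the size comparison above concrete, here is a small self-contained 
Java sketch using the numbers from this case (illustrative pseudologic, not 
Hive's actual planner code):

{code:java}
// Why Hive on MR converts the join while Hive on Spark does not.
public class MapJoinThresholdSketch {
  public static void main(String[] args) {
    long threshold   = 100000000L;  // noconditionaltask.size (100M)
    long totalSize   = 1748955L;    // what Hive on MR compares (~1.7M)
    long rawDataSize = 123050375L;  // what Hive on Spark compares (~120M)

    System.out.println("MR converts to map join:    " + (totalSize < threshold));    // true
    System.out.println("Spark converts to map join: " + (rawDataSize < threshold));  // false
    // Raising the threshold to 150M makes Hive on Spark convert as well:
    System.out.println("Spark at 150M threshold:    " + (rawDataSize < 150000000L)); // true
  }
}
{code}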



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-03-04 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348108#comment-14348108
 ] 

Xin Hao commented on HIVE-9659:
---

Hi Rui, I verified that Q12 passes when hive.optimize.skewjoin is set to 
'true', based on your patch HIVE-9659.2-spark.patch. Thanks.
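
For context on the failure being verified here, the 'not a directory' error 
quoted below is raised while loading the small-table hash table; a hedged, 
self-contained sketch of the kind of check involved (illustrative, not Hive's 
actual code):

{code:java}
// Illustrative check mirroring the quoted failure: the small-table dump
// location is expected to be a directory of hashtable files, and a plain
// file path triggers "Error, not a directory".
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HashTableDirCheck {
  public static void main(String[] args) throws Exception {
    Path p = new Path(args[0]);  // e.g. the HashTable-Stage-6 path from the log
    FileSystem fs = p.getFileSystem(new Configuration());
    if (!fs.getFileStatus(p).isDirectory()) {
      throw new IllegalStateException("Error, not a directory: " + p);
    }
    System.out.println("OK, directory: " + p);
  }
}
{code}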

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-9659.1-spark.patch, HIVE-9659.2-spark.patch
>
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the 
> log and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 4093

[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-03-02 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344456#comment-14344456
 ] 

Xin Hao commented on HIVE-9659:
---

Hi Rui, I tried to verify this issue based on HIVE-9659.1-spark.patch, and it 
seems that the issue still exists. Could you update Big-Bench to the latest 
version and double-check (Q12 was updated recently)? Thanks.

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-9659.1-spark.patch
>
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the 
> log and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29

[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-03-02 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344318#comment-14344318
 ] 

Xin Hao commented on HIVE-9659:
---

Sure, I am working on verifying it and will provide feedback soon. Thanks.

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Rui Li
> Attachments: HIVE-9659.1-spark.patch
>
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the 
> log and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
> 15/02/12 01:29:49 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.spark.Sp

[jira] [Commented] (HIVE-9794) java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence

2015-02-26 Thread Xin Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339722#comment-14339722
 ] 

Xin Hao commented on HIVE-9794:
---

Hi Xuefu, we double-checked the issue in another test environment and it 
cannot be reproduced, so it must be an issue in our local test env. Sorry for 
the trouble; please close it. Thanks.

> java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD 
> FILE XXXX.jar' sentence
> -
>
> Key: HIVE-9794
> URL: https://issues.apache.org/jira/browse/HIVE-9794
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We updated our code to the latest revision on the Spark branch (i.e. 
> fd0f638a8d481a9a98b34d3dd08236d6d591812f), rebuilt and deployed Hive in our 
> cluster, and ran the BigBench cases again. Many cases (e.g. Q1, Q2, Q3, Q4, 
> Q8) failed due to a common 'NoSuchMethodError'. The root-cause statement in 
> these queries appears to be 'ADD FILE XXXX.jar'.
> Detailed error message:
> Exception in thread "main" java.lang.NoSuchMethodError: 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(Lorg/apache/hadoop/hive/ql/session/SessionState$ResourceType;Ljava/util/List;)Ljava/util/List;
> at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:262)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:159)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:370)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:305)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:403)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:419)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
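
Since the resolution points at a stale local environment, one hedged way to 
diagnose this kind of classpath/version skew is to print the add_resources 
overloads that the deployed hive-exec jar actually exposes (a diagnostic 
sketch, assuming hive-exec is on the classpath):

{code:java}
// Diagnostic sketch: list the add_resources overloads visible at runtime.
// An old hive-exec jar that predates the new signature would explain the
// NoSuchMethodError thrown at the AddResourceProcessor call site.
import java.lang.reflect.Method;

import org.apache.hadoop.hive.ql.session.SessionState;

public class CheckAddResources {
  public static void main(String[] args) {
    for (Method m : SessionState.class.getMethods()) {
      if (m.getName().equals("add_resources")) {
        System.out.println(m);
      }
    }
  }
}
{code}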



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)