[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108843#comment-14108843 ] Chengxiang Li commented on HIVE-7799: - Depends on the implementation of {{ResultIterator.hasNext()}}, it is designed to be a lazy iterator as it only try to call {{processNextRecord()}} while RowContainer is empty, but RowContainer does not support add more rows after already read as mentioned in previous comments. Here is what happens while different queries is executed: # For Map only job, it write map output into file directly, no need Collector in this case. # For Map Reduce job with GroupByOperator, {{HiveBaseFunctionResultList.collect()}} is triggered by {{closeRecordProcessor()}}, which is beyond the lazy-computing logic, so the ResultIterator does not do lazy computing in this case. # For Map Reduce job without GroupByOperator(like cluster by queries), ResultIterator do lazy computing, and it clear RowContainer each time befor call {{processNextRecord()}}. While read/write HiveBaseFunctionResultList in the same thread, access progress of RowContainer is like .clear()-addRow()-first()-clear()-addRow()-first().. so it won't violate RowContainer's access rule. But with mutli threads to read/write HiveBaseFunctionResultList, as the ScriptOperator does which venki mentioned above, it would definitely hit this JIRA issue. In my opinion, there are 2 solutions: # remove ResultIterator lazy computing feature as patch1 does. # implement a RowConatiner-like class, which support current RowContainer features. it also need to be thread-safe, and support add row after {{first()}} is already called. The second solution is quite complex, it may introduce performance degrade after support thread-safe access and write-after-read, compare with the performance upgrade of lazy-computing support, it's hardly to say whether it's worthy or not now. So I suggest we take the first solution to fix this issue, and left the possible optimization to milestone 4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109246#comment-14109246 ] Venki Korukanti commented on HIVE-7799: --- [~chengxiang li] Your plan sounds good. Lets log a JIRA to enable lazy computing and we will revisit in milestone 4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109398#comment-14109398 ] Hive QA commented on HIVE-7799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12664112/HIVE-7799.3-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6253 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-92/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12664112 TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109679#comment-14109679 ] Brock Noland commented on HIVE-7799: Hey guys, I created HIVE-7873 to track the improvement in M4. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, HIVE-7799.3-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108598#comment-14108598 ] Brock Noland commented on HIVE-7799: When resolved me might consider adding * orc_merge1.q * orc_merge2.q * orc_merge3.q * orc_merge4.q from HIVE-7792. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106708#comment-14106708 ] Hive QA commented on HIVE-7799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12663637/HIVE-7799.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5980 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-79/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12663637 TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106868#comment-14106868 ] Venki Korukanti commented on HIVE-7799: --- I think with the v2 patch we need unbounded memory as we store the results in Queue and sometime a single input record could generate more than one record (UDTF) or some operators (such as group by) flush after processing many input records. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106874#comment-14106874 ] Venki Korukanti commented on HIVE-7799: --- Let me look at the RowContainer and see if we can modify/extend it to support read/write like a queue with a persistent support. As far as I see we need persistent support. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107106#comment-14107106 ] Chengxiang Li commented on HIVE-7799: - Thanks,[~venki387], it seems i miss something here, group by does need a persistent storage support here. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107139#comment-14107139 ] Venki Korukanti commented on HIVE-7799: --- Whenever {{ResultIterator.hasNext()}} or {{ResultIterator.next()}} is called we first serve records from RowContainer until all records in RowContainer are consumed. If there are no more records in RowContainer then we clear RowContainer and call {{processNextRecord}} or {{closeRecordProcessor}} in {{ResultIterator.hasNext()}} to get the next output record(s). So we start adding records only when RowContainer is empty (or cleared). I am trying to understand how we got into the situation where we are trying to write after reading has started. One scenario I can think of is if Spark has two threads like in producer-consumer. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107462#comment-14107462 ] Venki Korukanti commented on HIVE-7799: --- ScriptOperator is spawning a separate thread which is adding the records to collector. Iterator thread is trying to read from RowContainer while ScriptOperator spawned thread is adding the records. There may be other operators that may spawn threads for processing. Looks like we need a synchronized queue with persistence support. TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107466#comment-14107466 ] Brock Noland commented on HIVE-7799: MapDB might offer something we use or wrap: https://github.com/jankotek/mapdb TRANSFORM failed in transform_ppr1.q[Spark Branch] -- Key: HIVE-7799 URL: https://issues.apache.org/jira/browse/HIVE-7799 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Labels: Spark-M1 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch Here is the exception: {noformat} 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) - Exception in task 0.0 in stage 1.0 (TID 0) java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82) at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) {noformat} Basically, the cause is that RowContainer is misused(it's not allowed to write once someone read row from it), i'm trying to figure out whether it's a hive issue or just in hive on spark mode. -- This message was sent by Atlassian JIRA (v6.2#6252)