[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108843#comment-14108843
 ] 

Chengxiang Li commented on HIVE-7799:
-

That depends on the implementation of {{ResultIterator.hasNext()}}. It is designed as a 
lazy iterator: it only calls {{processNextRecord()}} while the RowContainer is empty, 
but, as mentioned in previous comments, RowContainer does not support adding more rows 
once it has been read. Here is what happens when different kinds of queries are executed:
# For a map-only job, the map output is written directly to file, so no Collector is 
needed in this case.
# For a MapReduce job with a GroupByOperator, 
{{HiveBaseFunctionResultList.collect()}} is triggered by 
{{closeRecordProcessor()}}, which is outside the lazy-computing logic, so the 
ResultIterator does not do lazy computing in this case.
# For a MapReduce job without a GroupByOperator (such as cluster-by queries), the 
ResultIterator does lazy computing, and it clears the RowContainer each time before 
calling {{processNextRecord()}}. When HiveBaseFunctionResultList is read and written in 
the same thread, the RowContainer access sequence looks like 
clear() - addRow() - first() - clear() - addRow() - first() ..., so it does not 
violate RowContainer's access rule. But with multiple threads reading and writing 
HiveBaseFunctionResultList, as the ScriptOperator does as Venki mentioned 
above, it will definitely hit this JIRA issue (see the sketch right after this list).
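To make the access rule concrete, below is a minimal, self-contained sketch. It uses a hypothetical {{SimpleRowContainer}} stand-in (not Hive's actual RowContainer class) that enforces the same write-then-read contract, and shows that the single-threaded clear()/addRow()/first() cycle stays within the rule while a second writer thread, like the one ScriptOperator spawns, does not:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class RowContainerRuleDemo {

  /** Hypothetical stand-in that mimics RowContainer's write-then-read contract. */
  static class SimpleRowContainer<T> {
    private final List<T> rows = new ArrayList<>();
    private boolean reading = false;

    void addRow(T row) {
      if (reading) {
        throw new IllegalStateException("addRow() after first() violates the contract");
      }
      rows.add(row);
    }

    T first() {
      reading = true;
      return rows.isEmpty() ? null : rows.get(0);
    }

    void clear() {
      rows.clear();
      reading = false;   // clear() resets the container back to the writable state
    }
  }

  public static void main(String[] args) throws Exception {
    SimpleRowContainer<String> container = new SimpleRowContainer<>();

    // Case 3, single thread: clear() - addRow() - first() - clear() - addRow() - first() ...
    for (int i = 0; i < 3; i++) {
      container.clear();
      container.addRow("record-" + i);          // output of a lazy processNextRecord() call
      System.out.println("read: " + container.first());
    }

    // Multi-threaded read/write: the writer adds after reading has started,
    // which is exactly the access-rule violation behind this JIRA.
    container.clear();
    container.addRow("record-x");
    System.out.println("read: " + container.first());
    Thread writer = new Thread(() -> {
      try {
        container.addRow("late-record");        // rejected: first() was already called
      } catch (IllegalStateException e) {
        System.out.println("writer hit: " + e.getMessage());
      }
    });
    writer.start();
    writer.join();
  }
}
{code}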

In my opinion, there are two solutions:
# Remove the ResultIterator lazy-computing feature, as patch 1 does.
# Implement a RowContainer-like class that supports the current RowContainer 
features, is thread-safe, and supports adding rows after {{first()}} 
has already been called.

The second solution is quite complex, and supporting thread-safe access and 
write-after-read may introduce a performance penalty; compared with the performance 
gain from lazy computing, it is hard to say right now whether it is worth it. So I 
suggest we take the first solution to fix this issue and leave the possible 
optimization to milestone 4.

 TRANSFORM failed in transform_ppr1.q[Spark Branch]
 --

 Key: HIVE-7799
 URL: https://issues.apache.org/jira/browse/HIVE-7799
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Chengxiang Li
Assignee: Chengxiang Li
  Labels: Spark-M1
 Attachments: HIVE-7799.1-spark.patch, HIVE-7799.2-spark.patch, 
 HIVE-7799.3-spark.patch


 Here is the exception:
 {noformat}
 2014-08-20 01:14:36,594 ERROR executor.Executor (Logging.scala:logError(96)) 
 - Exception in task 0.0 in stage 1.0 (TID 0)
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveKVResultCache.next(HiveKVResultCache.java:113)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:124)
 at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.next(HiveBaseFunctionResultList.java:82)
 at 
 scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:42)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at 
 org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:54)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:722)
 {noformat}
 Basically, the cause is that RowContainer is misused (it is not allowed to write to 
 it once someone has read a row from it); I'm trying to figure out whether this is a 
 Hive issue or specific to Hive on Spark mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109246#comment-14109246
 ] 

Venki Korukanti commented on HIVE-7799:
---

[~chengxiang li] Your plan sounds good. Let's log a JIRA to enable lazy 
computing, and we will revisit it in milestone 4.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109398#comment-14109398
 ] 

Hive QA commented on HIVE-7799:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12664112/HIVE-7799.3-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6253 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/92/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-92/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12664112



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14109679#comment-14109679
 ] 

Brock Noland commented on HIVE-7799:


Hey guys, I created HIVE-7873 to track the improvement in M4.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-24 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14108598#comment-14108598
 ] 

Brock Noland commented on HIVE-7799:


When this is resolved, we might consider adding

 * orc_merge1.q
 * orc_merge2.q
 * orc_merge3.q
 * orc_merge4.q

from HIVE-7792.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106708#comment-14106708
 ] 

Hive QA commented on HIVE-7799:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12663637/HIVE-7799.2-spark.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5980 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_union_null
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/79/console
Test logs: 
http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-79/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12663637



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106868#comment-14106868
 ] 

Venki Korukanti commented on HIVE-7799:
---

I think with the v2 patch we need unbounded memory, as we store the results in a 
Queue, and sometimes a single input record can generate more than one output record 
(UDTF), or some operators (such as group by) flush only after processing many input 
records.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14106874#comment-14106874
 ] 

Venki Korukanti commented on HIVE-7799:
---

Let me look at the RowContainer and see if we can modify/extend it to support 
read/write like a queue with persistence support. As far as I can see, we need 
persistence support.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107106#comment-14107106
 ] 

Chengxiang Li commented on HIVE-7799:
-

Thanks, [~venki387], it seems I missed something here; group by does need 
persistent storage support.



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107139#comment-14107139
 ] 

Venki Korukanti commented on HIVE-7799:
---

Whenever {{ResultIterator.hasNext()}} or {{ResultIterator.next()}} is called, we 
first serve records from the RowContainer until all records in it are 
consumed. If there are no more records in the RowContainer, we clear it 
and call {{processNextRecord}} or {{closeRecordProcessor}} in 
{{ResultIterator.hasNext()}} to get the next output record(s). So we start 
adding records only when the RowContainer is empty (or cleared). I am trying to 
understand how we got into a situation where we are writing after 
reading has started. One scenario I can think of is if Spark has two threads, 
like in a producer-consumer setup (see the sketch below).
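To illustrate the control flow described above, here is a rough, simplified sketch of a lazy result iterator. It is not the actual HiveBaseFunctionResultList code; the {{LazyResultIterator}} class and its in-memory buffer are illustrative stand-ins for the real ResultIterator and RowContainer:

{code:java}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

abstract class LazyResultIterator<T> implements Iterator<T> {
  private final Deque<T> buffer = new ArrayDeque<>();   // stand-in for the RowContainer
  private final Iterator<T> input;                      // upstream input records
  private boolean processorClosed = false;

  LazyResultIterator(Iterator<T> input) { this.input = input; }

  /** Process one input record; implementations call collect() zero or more times. */
  protected abstract void processNextRecord(T record);

  /** Flush remaining output (e.g. a group-by flush); implementations call collect(). */
  protected abstract void closeRecordProcessor();

  /** Operators hand output rows back through this collector. */
  protected void collect(T row) { buffer.addLast(row); }

  @Override
  public boolean hasNext() {
    while (buffer.isEmpty()) {                  // serve buffered rows first
      if (input.hasNext()) {
        processNextRecord(input.next());        // buffer is empty here, so adding is safe
      } else if (!processorClosed) {
        processorClosed = true;
        closeRecordProcessor();
      } else {
        return false;
      }
    }
    return true;
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return buffer.removeFirst();
  }

  // Tiny usage example: each input record expands to two output rows.
  public static void main(String[] args) {
    Iterator<String> input = Arrays.asList("a", "b").iterator();
    LazyResultIterator<String> it = new LazyResultIterator<String>(input) {
      @Override protected void processNextRecord(String record) {
        collect(record + "-1");
        collect(record + "-2");
      }
      @Override protected void closeRecordProcessor() { collect("flushed-at-close"); }
    };
    it.forEachRemaining(System.out::println);
  }
}
{code}

As long as everything runs on one thread, collect() is only ever invoked while the buffer is empty, which is what keeps the pattern within RowContainer's contract.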



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107462#comment-14107462
 ] 

Venki Korukanti commented on HIVE-7799:
---

ScriptOperator spawns a separate thread which adds the records to the 
collector. The iterator thread is trying to read from the RowContainer while the 
ScriptOperator-spawned thread is adding records. There may be other 
operators that spawn threads for processing. It looks like we need a 
synchronized queue with persistence support (a rough sketch of the shape follows).
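As a purely illustrative sketch of the producer-consumer shape being described, the following uses a bounded {{BlockingQueue}} as an in-memory placeholder for the thread-safe structure; the persistence (spill-to-disk) part is omitted entirely, and all names here are hypothetical:

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SynchronizedCollectorSketch {
  private static final String END_MARKER = "__END__";   // signals no more rows

  public static void main(String[] args) throws InterruptedException {
    // Bounded capacity gives back-pressure instead of unbounded memory growth;
    // a real replacement would additionally spill to disk instead of blocking forever.
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

    // Plays the role of the ScriptOperator-spawned thread feeding the collector.
    Thread writer = new Thread(() -> {
      try {
        for (int i = 0; i < 10; i++) {
          queue.put("row-" + i);                 // blocks when the queue is full
        }
        queue.put(END_MARKER);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    writer.start();

    // Plays the role of the Spark-side iterator draining the results.
    while (true) {
      String row = queue.take();                 // blocks until a row is available
      if (END_MARKER.equals(row)) break;
      System.out.println("consumed " + row);
    }
    writer.join();
  }
}
{code}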



[jira] [Commented] (HIVE-7799) TRANSFORM failed in transform_ppr1.q[Spark Branch]

2014-08-22 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14107466#comment-14107466
 ] 

Brock Noland commented on HIVE-7799:


MapDB might offer something we can use or wrap: https://github.com/jankotek/mapdb
