[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383406#comment-14383406 ]

Chengxiang Li commented on HIVE-10073:

Committed to spark branch, thanks Jimmy for this contribution.

Runtime exception when querying HBase with Spark [Spark Branch]

Key: HIVE-10073
URL: https://issues.apache.org/jira/browse/HIVE-10073
Project: Hive
Issue Type: Bug
Components: Spark
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Fix For: spark-branch
Attachments: HIVE-10073.1-spark.patch, HIVE-10073.2-spark.patch, HIVE-10073.3-spark.patch

When querying HBase with Spark, we got
{noformat}
Caused by: java.lang.IllegalArgumentException: Must specify table name
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331)
{noformat}
But it works fine for MapReduce.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383219#comment-14383219 ]

Xuefu Zhang commented on HIVE-10073:

Okay. Makes sense.
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383238#comment-14383238 ]

Chengxiang Li commented on HIVE-10073:

+1
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382162#comment-14382162 ]

Xuefu Zhang commented on HIVE-10073:

Hi [~jxiang] and [~chengxiang li], before we patch this on the Hive side, I think it's better to find the root cause. If the problem is due to Spark, we can raise it with that community. So far, I'm not convinced that the problem is on the Hive side.
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382253#comment-14382253 ]

Jimmy Xiang commented on HIVE-10073:

[~xuefuz], I think it's an issue on the Hive side. In SparkRecordHandler, we use the job conf passed in from Hive, so it should be Hive's responsibility to make sure it has all the needed information.

[~chengxiang li], though I called checkOutputSpecs for both MapWork and ReduceWork, I agree with you that it is better to call it in SparkPlanGenerator::generate(BaseWork work). Let me upload a new patch.
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382712#comment-14382712 ]

Hive QA commented on HIVE-10073:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12707558/HIVE-10073.3-spark.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7644 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_percentile_approx_23
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/807/console
Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-807/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.
ATTACHMENT ID: 12707558 - PreCommit-HIVE-SPARK-Build
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14383152#comment-14383152 ]

Chengxiang Li commented on HIVE-10073:

[~xuefuz], the root cause should be just as Jimmy described: some HBase table properties are set on the JobConf during checkOutputSpecs, and this method is not invoked in HoS. Spark itself checks output specs when the user builds the RDD graph with certain actions, such as PairRDDFunctions::saveAsHadoopDataset and PairRDDFunctions::saveAsNewAPIHadoopDataset. In HoS we use foreach as the action and write data to Hadoop storage inside Hive, so it should be Hive's responsibility to check output specs.
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381424#comment-14381424 ]

Chengxiang Li commented on HIVE-10073:

Hi [~jxiang], I saw that you only call checkOutputSpecs for ReduceWork, but there may be a FileSinkOperator in a map-only job as well, so we may also need to call checkOutputSpecs for MapWork.

Besides, checkOutputSpecs is invoked in SparkRecordHandler::init, which is executed for each task. SparkPlanGenerator::generate(BaseWork work) may be a better place to do this: we can call checkOutputSpecs between cloning the JobConf and serializing it, so the check runs only once, on the RSC side.
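To make the placement argument in this comment concrete, here is a rough, self-contained sketch that counts how often a check runs in each arrangement. It is not the patch and does not use any Hive or Spark classes: PlanTimeCheckSketch, checkPerTask, checkAtPlanTime, and the table name demo_table are hypothetical, a plain Map stands in for JobConf, and copying the map stands in for serializing the conf and shipping it to a task.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; this is not SparkPlanGenerator or SparkRecordHandler code.
class PlanTimeCheckSketch {
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";
    static int checkCalls = 0;

    // Models a check that also fills in output settings on the conf it is given.
    static void checkOutputSpecs(Map<String, String> conf) {
        checkCalls++;
        conf.put(OUTPUT_TABLE, "demo_table"); // hypothetical table name
    }

    // Check inside per-task initialization: one call per task.
    static int checkPerTask(Map<String, String> jobConf, int tasks) {
        checkCalls = 0;
        for (int i = 0; i < tasks; i++) {
            Map<String, String> taskConf = new HashMap<>(jobConf);
            checkOutputSpecs(taskConf);
        }
        return checkCalls;
    }

    // Check once, between cloning the conf and handing copies to tasks.
    static int checkAtPlanTime(Map<String, String> jobConf, int tasks) {
        checkCalls = 0;
        Map<String, String> cloned = new HashMap<>(jobConf); // clone
        checkOutputSpecs(cloned);                            // check once
        for (int i = 0; i < tasks; i++) {
            // Copying stands in for serializing the conf and shipping it to a task.
            Map<String, String> taskConf = new HashMap<>(cloned);
            if (!taskConf.containsKey(OUTPUT_TABLE)) {
                throw new IllegalStateException("task conf is missing " + OUTPUT_TABLE);
            }
        }
        return checkCalls;
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        System.out.println("per task:  " + checkPerTask(jobConf, 4));
        System.out.println("plan time: " + checkAtPlanTime(jobConf, 4));
    }
}
```

With four simulated tasks, the per-task arrangement runs the check four times and the plan-time arrangement runs it once, while every task copy still carries the property.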
[jira] [Commented] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-10073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14378743#comment-14378743 ]

Jimmy Xiang commented on HIVE-10073:

It looks like the property hbase.mapred.outputtable is not set for HoS. It is in the table properties, which are set properly. For MR, it works because JobSubmitter (mapred code) calls output.checkOutputSpecs. Here the output class is HiveOutputFormatImpl. In the checkOutputSpecs function, the HBase-related settings are copied to the JobConf. However, for Spark, I don't see where output.checkOutputSpecs is called, based on the stack trace:
{noformat}
    at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:431)
    at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
    at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:103)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:58)
    at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:32)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
    at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
{noformat}
[~chengxiang li], [~ruili], do you know why checkOutputSpecs isn't called for HoS in this case?
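The failure mode described in this comment can be reproduced in isolation with a small sketch. It is only an illustration under stated assumptions, not Hive, Hadoop, or HBase code: OutputSpecSketch, copyTableProperties, configureOutput, and the table name demo_table are hypothetical, and plain Maps stand in for the table properties and the JobConf. Only the property key and the exception message are taken from the report.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins; none of these are Hive, Hadoop, or HBase classes.
class OutputSpecSketch {
    // Property key named in the comment above.
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";

    // Models the side effect attributed to checkOutputSpecs:
    // copy the table properties into the job configuration.
    static void copyTableProperties(Map<String, String> tableProps,
                                    Map<String, String> jobConf) {
        jobConf.putAll(tableProps);
    }

    // Models an output format that refuses to configure itself
    // when the table name is absent from the job configuration.
    static String configureOutput(Map<String, String> jobConf) {
        String table = jobConf.get(OUTPUT_TABLE);
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("Must specify table name");
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put(OUTPUT_TABLE, "demo_table"); // hypothetical table name

        // Path with the check: the property reaches the job conf first.
        Map<String, String> withCheck = new HashMap<>();
        copyTableProperties(tableProps, withCheck);
        System.out.println("with check: " + configureOutput(withCheck));

        // Path without the check: the job conf never receives the property.
        Map<String, String> withoutCheck = new HashMap<>();
        try {
            configureOutput(withoutCheck);
        } catch (IllegalArgumentException e) {
            System.out.println("without check: " + e.getMessage());
        }
    }
}
```

The second path fails even though the table properties themselves are correct, which matches the observation that the property is set on the table but missing from the conf the task sees.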