Re: Documenting APIs for the 1.0.0 release
javadocs + java annotations. But I think we'll need a page on the wiki covering this and describing our policies, including the discussion of the Thrift layer, which we won't be able to document via annotations or javadocs as it's generated code.

On Tue, Jan 13, 2015 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote:
IMO, this would be javadocs. That should be the primary source of truth for this information. We can additionally capture this in wikidocs as well.

On Tue, Jan 13, 2015 at 3:08 PM, Lefty Leverenz leftylever...@gmail.com wrote:
Thanks Brock. Is this for javadocs or wikidocs, or both? -- Lefty

On Tue, Jan 13, 2015 at 12:47 PM, Brock Noland br...@cloudera.com wrote:
Hi, As discussed at our last meetup, we should really document our public APIs. Many had requested this be completed before Hive 1.0.0. As such I have created an uber JIRA to track this, https://issues.apache.org/jira/browse/HIVE-9362, and created two sample sub-tasks for how I imagine this will play out. The most important task is documenting what we consider to be the public API. Additionally, even if an API is not public, our user community, e.g. SparkSQL, Pig, etc., should document the APIs they use. This way we'll at least know who we are breaking when we make a change. Cheers, Brock
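One way the "javadocs + java annotations" approach could look is a minimal runtime-retained marker annotation that javadoc and audit tooling can discover via reflection. This is a hedged sketch: the names Public and ExampleClientFacade are hypothetical placeholders, not necessarily what HIVE-9362 adopted.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ApiAnnotations {
  // A marker for classes that are part of the public API surface.
  // @Documented makes it show up in generated javadoc;
  // RUNTIME retention lets audit tools find it via reflection.
  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Public {}

  // Hypothetical example of a class flagged as public API.
  @Public
  static class ExampleClientFacade {}

  public static void main(String[] args) {
    // Tooling can enumerate annotated classes and cross-check them
    // against the documented API list.
    System.out.println(
        ExampleClientFacade.class.isAnnotationPresent(Public.class)); // true
  }
}
```

Note the caveat from the thread still holds: the Thrift layer is generated code, so it would need a wiki page rather than source annotations.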
[jira] [Updated] (HIVE-9235) Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9235: --- Attachment: HIVE-9235.01.patch Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Support for doing vector column assign is missing for some data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9235: --- Summary: Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR (was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR - Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column assign is missing for some data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9235) Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9235: --- Description: Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column assign is missing for some data types. was:Support for doing vector column assign is missing for some data types. Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column assign is missing for some data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9235: --- Status: Patch Available (was: Open) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR - Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column assign is missing for some data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276294#comment-14276294 ] Hive QA commented on HIVE-9178: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12692014/HIVE-9178.1-spark.patch {color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 6783 tests executed *Failed tests:* {noformat} TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file TestMultiSessionsHS2WithLocalClusterSpark - did not produce a TEST-*.xml file TestSparkCliDriver-auto_join30.q-join9.q-input17.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-auto_join_reordering_values.q-ptf_seqfile.q-auto_join18.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-avro_decimal_native.q-ptf_rcfile.q-auto_join4.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-avro_joins.q-join36.q-join1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-bucket3.q-bucketmapjoin1.q-groupby7_map_multi_single_reducer.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-bucketsortoptimize_insert_7.q-skewjoin_noskew.q-sample2.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-skewjoinopt8.q-union_remove_1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby4.q-tez_joins_explain.q-load_dyn_part3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_complex_types.q-auto_join9.q-groupby_map_ppr.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-vectorization_16.q-multi_insert_mixed.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join11.q-join18.q-groupby2.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join13.q-join_reorder3.q-union14.q-and-12-more - did not 
produce a TEST-*.xml file TestSparkCliDriver-join2.q-script_pipe.q-auto_join24.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_casesensitive.q-decimal_join.q-mapjoin_addjar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_cond_pushdown_3.q-groupby7.q-union_remove_9.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_cond_pushdown_unqual4.q-load_dyn_part5.q-bucketmapjoin12.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-order.q-auto_join18_multi_distinct.q-groupby7_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-parallel_join1.q-escape_distributeby1.q-timestamp_null.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-skewjoinopt3.q-auto_join1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_transform.q-auto_sortmerge_join_7.q-bucketmapjoin_negative3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ptf_general_queries.q-bucketmapjoin3.q-enforce_order.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-skewjoin_union_remove_2.q-join4.q-groupby_cube1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-skewjoinopt15.q-join39.q-bucketmapjoin10.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-smb_mapjoin_15.q-mapreduce2.q-mapreduce1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-smb_mapjoin_4.q-groupby8_map.q-union_remove_11.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-stats12.q-groupby10.q-bucketmapjoin7.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-stats13.q-stats2.q-ppd_gby_join.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-table_access_keys_stats.q-bucketsortoptimize_insert_4.q-join_rc.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-union29.q-join23.q-and-12-more - did not produce a TEST-*.xml 
file TestSparkCliDriver-transform_ppr2.q-join20.q-multi_insert_gby2.q-and-3-more - did not produce a TEST-*.xml file TestSparkCliDriver-union2.q-join_vc.q-input1_limit.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-union_remove_7.q-avro_joins_native.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-vector_distinct_2.q-join15.q-union19.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-vectorization_10.q-list_bucket_dml_2.q-scriptfile1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-vectorization_13.q-auto_sortmerge_join_13.q-auto_join10.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/639/testReport Console output:
[jira] [Updated] (HIVE-9356) Fail to handle the case that a qfile contains a semicolon in the annotation
[ https://issues.apache.org/jira/browse/HIVE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9356: --- Summary: Fail to handle the case that a qfile contains a semicolon in the annotation (was: Fail to handle the case that a qfile contains a semicolon) Fail to handle the case that a qfile contains a semicolon in the annotation --- Key: HIVE-9356 URL: https://issues.apache.org/jira/browse/HIVE-9356 Project: Hive Issue Type: Sub-task Affects Versions: encryption-branch Reporter: Ferdinand Xu Assignee: Dong Chen Fix For: encryption-branch Attachments: HIVE-9356-encryption.patch, HIVE-9356.patch Currently, we split the qfile on semicolons. The splitter should also handle comment statements in the qfile that contain a semicolon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
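The failure mode above can be sketched as follows: a naive split on ';' also splits inside '--' comment lines. One possible fix is to drop comment lines before splitting. This is an illustrative sketch only; the actual HIVE-9356 patch to the test driver may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class QFileSplit {
  // Drop '--' comment lines before splitting on ';' so a semicolon
  // inside a comment does not produce a bogus statement.
  static List<String> splitStatements(String qfile) {
    StringBuilder kept = new StringBuilder();
    for (String line : qfile.split("\n")) {
      if (!line.trim().startsWith("--")) {
        kept.append(line).append('\n');
      }
    }
    List<String> stmts = new ArrayList<>();
    for (String s : kept.toString().split(";")) {
      if (!s.trim().isEmpty()) {
        stmts.add(s.trim());
      }
    }
    return stmts;
  }

  public static void main(String[] args) {
    String q = "-- a comment; with a semicolon\nselect 1;\nselect 2;";
    System.out.println(splitStatements(q)); // [select 1, select 2]
  }
}
```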
[jira] [Created] (HIVE-9369) fix arguments length checking in Upper and Lower UDF
Alexander Pivovarov created HIVE-9369: - Summary: fix arguments length checking in Upper and Lower UDF Key: HIVE-9369 URL: https://issues.apache.org/jira/browse/HIVE-9369 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.0, 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Currently the initialize method only checks that arguments.length > 0; it should check that arguments.length != 1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
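The fix described in the issue can be sketched as a standalone check. The method shape below is illustrative: Hive's real GenericUDF.initialize takes an ObjectInspector[] and throws a UDF argument exception, and the message text here is an assumption, not the actual patch.

```java
public class ArgLengthCheck {
  // Validate an exact argument count instead of only rejecting the
  // empty case: a length-greater-than-zero check would silently
  // accept a call like upper('a', 'b').
  static void checkArgs(Object[] arguments) {
    if (arguments.length != 1) {
      throw new IllegalArgumentException(
          "exactly one argument expected, got " + arguments.length);
    }
  }

  public static void main(String[] args) {
    checkArgs(new Object[] { "hello" });        // accepted
    try {
      checkArgs(new Object[] { "a", "b" });     // now rejected
    } catch (IllegalArgumentException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
```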
[jira] [Commented] (HIVE-9278) Cached expression feature broken in one case
[ https://issues.apache.org/jira/browse/HIVE-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276335#comment-14276335 ] Ashutosh Chauhan commented on HIVE-9278: Committed to 0.14 as well. Cached expression feature broken in one case Key: HIVE-9278 URL: https://issues.apache.org/jira/browse/HIVE-9278 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Navis Priority: Blocker Fix For: 0.15.0, 0.14.1 Attachments: HIVE-9278.1.patch.txt Different query result depending on whether hive.cache.expr.evaluation is true or false. When true, no query results are produced (this is wrong). The q file: {noformat} set hive.cache.expr.evaluation=true; CREATE TABLE cache_expr_repro (date_str STRING); LOAD DATA LOCAL INPATH '../../data/files/cache_expr_repro.txt' INTO TABLE cache_expr_repro; SELECT MONTH(date_str) AS `mon`, CAST((MONTH(date_str) - 1) / 3 + 1 AS int) AS `quarter`, YEAR(date_str) AS `year` FROM cache_expr_repro WHERE ((CAST((MONTH(date_str) - 1) / 3 + 1 AS int) = 1) AND (YEAR(date_str) = 2015)) GROUP BY MONTH(date_str), CAST((MONTH(date_str) - 1) / 3 + 1 AS int), YEAR(date_str) ; {noformat} cache_expr_repro.txt {noformat} 2015-01-01 00:00:00 2015-02-01 00:00:00 2015-01-01 00:00:00 2015-02-01 00:00:00 2015-01-01 00:00:00 2015-01-01 00:00:00 2015-02-01 00:00:00 2015-02-01 00:00:00 2015-01-01 00:00:00 2015-01-01 00:00:00 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
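As a sanity check on why every row in the repro file satisfies the WHERE clause, the quarter arithmetic from the query can be evaluated directly. The helper below is hypothetical, not Hive code; integer truncation here matches the query's CAST(... AS int) for positive month values.

```java
public class QuarterDemo {
  // quarter = (month - 1) / 3 + 1, truncated to int, mirroring
  // CAST((MONTH(date_str) - 1) / 3 + 1 AS int) from the repro query.
  static int quarter(int month) {
    return (month - 1) / 3 + 1;
  }

  public static void main(String[] args) {
    System.out.println(quarter(1));  // 1
    System.out.println(quarter(2));  // 1  (so all Jan/Feb repro rows are Q1)
    System.out.println(quarter(12)); // 4
  }
}
```

Since every repro row is in January or February 2015, the predicate `quarter = 1 AND year = 2015` holds for all of them, so returning no rows with hive.cache.expr.evaluation=true is clearly wrong.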
[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf
[ https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9366: -- Attachment: HIVE-9366.3.patch HIVE-9366.3.patch - fixed extra arguments in UDFDayOfMonth and UDFYear wrong date in description annotation in date_add() and date_sub() udf - Key: HIVE-9366 URL: https://issues.apache.org/jira/browse/HIVE-9366 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out The last line shows '2009-31-07' but it should be '2009-07-31' instead. The @Description annotation needs to be fixed for both date_add() and date_sub(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf
[ https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9366: -- Status: In Progress (was: Patch Available) wrong date in description annotation in date_add() and date_sub() udf - Key: HIVE-9366 URL: https://issues.apache.org/jira/browse/HIVE-9366 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out The last line shows '2009-31-07' but it should be '2009-07-31' instead. The @Description annotation needs to be fixed for both date_add() and date_sub(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
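To make the corrected example date concrete, the expected value in ISO year-month-day order can be computed directly. This is only an illustration of the fixed date string ('2009-31-07' transposes month and day), not the actual annotation patch.

```java
import java.time.LocalDate;

public class DateAddDemo {
  public static void main(String[] args) {
    // date_add('2009-07-30', 1) should display as year-month-day:
    System.out.println(LocalDate.parse("2009-07-30").plusDays(1)); // 2009-07-31
  }
}
```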
[jira] [Work started] (HIVE-9358) Create LAST_DAY UDF
[ https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9358 started by Alexander Pivovarov. - Create LAST_DAY UDF --- Key: HIVE-9358 URL: https://issues.apache.org/jira/browse/HIVE-9358 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov LAST_DAY returns the date of the last day of the month that contains the given date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
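The described semantics can be sketched with java.time (available from Java 8); the actual HIVE-9358 UDF implementation may differ.

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class LastDayDemo {
  // Return the last day of the month containing the given ISO date.
  static String lastDay(String date) {
    return LocalDate.parse(date)
        .with(TemporalAdjusters.lastDayOfMonth())
        .toString();
  }

  public static void main(String[] args) {
    System.out.println(lastDay("2015-01-14")); // 2015-01-31
    System.out.println(lastDay("2016-02-01")); // 2016-02-29 (leap year)
  }
}
```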
[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9370: Summary: Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch] (was: Enable Hive on Spark for BigBench and run Query 8, the test failed ) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch] - Key: HIVE-9370 URL: https://issues.apache.org/jira/browse/HIVE-9370 Project: Hive Issue Type: Bug Components: Spark Reporter: yuyun.chen enable hive on spark and run BigBench Query 8 then got the following exception: 2015-01-14 11:43:46,057 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor (SessionState.java:printError(839)) - Status: Failed 2015-01-14 11:43:46,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob start=1421206996052 end=1421207026062 duration=30010 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - java.lang.InterruptedException 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Object.java:503) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1282) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1300) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1314) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.RDD.collect(RDD.scala:780) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner.init(Partitioner.scala:124) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864) 2015-01-14 
11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at
[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-9370: Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch] - Key: HIVE-9370 URL: https://issues.apache.org/jira/browse/HIVE-9370 Project: Hive Issue Type: Sub-task Components: Spark Reporter: yuyun.chen enable hive on spark and run BigBench Query 8 then got the following exception: 2015-01-14 11:43:46,057 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor (SessionState.java:printError(839)) - Status: Failed 2015-01-14 11:43:46,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob start=1421206996052 end=1421207026062 duration=30010 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - java.lang.InterruptedException 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Object.java:503) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) 2015-01-14 
11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1282) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1300) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1314) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.RDD.collect(RDD.scala:780) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner.init(Partitioner.scala:124) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl
[jira] [Comment Edited] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485 ] Xuefu Zhang edited comment on HIVE-9178 at 1/14/15 5:10 AM: The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It seems likely that the patch has some defects. Chengxiang's question might be a hint. was (Author: xuefuz): The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It seems likely that the patch has some defects. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485 ] Xuefu Zhang commented on HIVE-9178: --- The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It seems likely that the patch has some defects. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9336) Fix Hive throws ParseException while handling Grouping-Sets clauses
[ https://issues.apache.org/jira/browse/HIVE-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276491#comment-14276491 ] Hive QA commented on HIVE-9336: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12691869/HIVE-9336.1.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7311 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testRemoteClient org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2356/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2356/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2356/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12691869 - PreCommit-HIVE-TRUNK-Build Fix Hive throws ParseException while handling Grouping-Sets clauses --- Key: HIVE-9336 URL: https://issues.apache.org/jira/browse/HIVE-9336 Project: Hive Issue Type: Bug Components: Parser Affects Versions: 0.13.1 Reporter: zhaohm3 Fix For: 0.14.0 Attachments: Fix-Hive-ParseException-of-Grouping-Sets.htm, HIVE-9336.1.patch Currently, when Hive parses GROUPING SETS clauses, and if there are some expressions that were composed of two or more common subexpressions, then the first element of those expressions can only be a simple Identifier without any qualifications, otherwise Hive will throw ParseException during its parser stage. Therefore, Hive will throw ParseException while parsing the following HQLs: drop table test; create table test(tc1 int, tc2 int, tc3 int); explain select test.tc1, test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, (test.tc1, test.tc2)); explain select tc1+tc2, tc2 from test group by tc1+tc2, tc2 grouping sets(tc2, (tc1 + tc2, tc2)); drop table test; The following contents show some ParseExctption stacktrace: 2015-01-07 09:53:34,718 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver 2015-01-07 09:53:34,719 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver 2015-01-07 09:53:34,721 INFO [main]: ql.Driver (Driver.java:checkConcurrency(158)) - Concurrency mode is disabled, not creating a lock manager 2015-01-07 09:53:34,721 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver 2015-01-07 09:53:34,724 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver 2015-01-07 09:53:34,724 INFO [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: explain select 
test.tc1, test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, (test.tc1, test.tc2)) 2015-01-07 09:53:34,734 ERROR [main]: ql.Driver (SessionState.java:printError(545)) - FAILED: ParseException line 1:105 missing ) at ',' near 'EOF' line 1:116 extraneous input ')' expecting EOF near 'EOF' org.apache.hadoop.hive.ql.parse.ParseException: line 1:105 missing ) at ',' near 'EOF' line 1:116 extraneous input ')' expecting EOF near 'EOF' at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:210) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975) at
[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF
[ https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9358: -- Attachment: HIVE-9358.1.patch Create LAST_DAY UDF --- Key: HIVE-9358 URL: https://issues.apache.org/jira/browse/HIVE-9358 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9358.1.patch LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF
[ https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9358: -- Status: Patch Available (was: In Progress) Create LAST_DAY UDF --- Key: HIVE-9358 URL: https://issues.apache.org/jira/browse/HIVE-9358 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9358.1.patch LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm
[jira] [Commented] (HIVE-9358) Create LAST_DAY UDF
[ https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276498#comment-14276498 ] Alexander Pivovarov commented on HIVE-9358: --- Review Board request: https://reviews.apache.org/r/29877/ Create LAST_DAY UDF --- Key: HIVE-9358 URL: https://issues.apache.org/jira/browse/HIVE-9358 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9358.1.patch LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm
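The LAST_DAY semantics described in the issue can be checked against java.time. The following is only an illustrative sketch of the documented behavior (class and method names are my own), not the HIVE-9358 patch itself:

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class LastDaySketch {
    // Returns the last day of the month containing the given ISO date,
    // matching the examples in the issue description.
    static String lastDay(String date) {
        return LocalDate.parse(date)
                .with(TemporalAdjusters.lastDayOfMonth())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(lastDay("2015-01-14")); // 2015-01-31
        System.out.println(lastDay("2016-02-01")); // 2016-02-29 (leap year)
    }
}
```

Note that the leap-year case falls out of `lastDayOfMonth()` for free, which is one reason the java.time adjuster is a natural fit for this UDF's contract.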
[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive
[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276414#comment-14276414 ] Rui Li commented on HIVE-9367: -- Hi [~jxiang], could you elaborate a little on how this will avoid the expensive calls? It seems we still have to iterate over all the file statuses to check whether each one is a directory? CombineFileInputFormatShim#getDirIndices is expensive - Key: HIVE-9367 URL: https://issues.apache.org/jira/browse/HIVE-9367 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-9367.1.patch [~lirui] found out that we spent quite some time on CombineFileInputFormatShim#getDirIndices. Looked into it, and it seems to me we should be able to get rid of this method completely if we can enhance CombineFileInputFormatShim a little.
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276425#comment-14276425 ] Chengxiang Li commented on HIVE-9178: - [~vanzin], how do we send the SyncJobRequest result back to SparkClient? I don't see any related code in RemoteDriver. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away.
[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276449#comment-14276449 ] Alexander Pivovarov commented on HIVE-9357: --- The add_months function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions004.htm I put several ADD_MONTHS examples below:
{code}
select to_date('14-JAN-2014') from_date,   1 months, add_months('14-JAN-2014',   1) res from dual union all
select to_date('31-JAN-2014') from_date,   1 months, add_months('31-JAN-2014',   1) res from dual union all
select to_date('28-FEB-2014') from_date,  -1 months, add_months('28-FEB-2014',  -1) res from dual union all
select to_date('28-FEB-2014') from_date,   2 months, add_months('28-FEB-2014',   2) res from dual union all
select to_date('30-APR-2014') from_date,  -2 months, add_months('30-APR-2014',  -2) res from dual union all
select to_date('28-FEB-2015') from_date,  12 months, add_months('28-FEB-2015',  12) res from dual union all
select to_date('29-FEB-2016') from_date, -12 months, add_months('29-FEB-2016', -12) res from dual union all
select to_date('29-JAN-2016') from_date,   1 months, add_months('29-JAN-2016',   1) res from dual union all
select to_date('29-FEB-2016') from_date,  -1 months, add_months('29-FEB-2016',  -1) res from dual;

from_date   months  res
2014-01-14    1     2014-02-14
2014-01-31    1     2014-02-28
2014-02-28   -1     2014-01-31
2014-02-28    2     2014-04-30
2014-04-30   -2     2014-02-28
2015-02-28   12     2016-02-29
2016-02-29  -12     2015-02-28
2016-01-29    1     2016-02-29
2016-02-29   -1     2016-01-31
{code}
The add_months function is used in many BI projects, especially in financial applications (e.g. to determine the end date of a 36-month loan). In my experience, most BI projects were implemented in Oracle. Lots of BI projects are migrating to Hive now, so lots of projects depend on the business logic of Oracle's add_months function. 
As a separate activity, the existing Hive UDF date_add can be improved to be similar to the MySQL implementation (if somebody needs it). I believe the add_months UDF brings a lot of business value to Hive, because many companies want easier migration from Oracle to Hive. I think Oracle is used in most enterprise big data companies. Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29'
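The examples above show Oracle's end-of-month rule: when the input date is the last day of its month, the result snaps to the last day of the resulting month (e.g. 2015-02-28 + 2 = 2015-04-30, not 2015-04-28). A minimal java.time sketch of those documented semantics follows; class and method names are mine, and this is not the HIVE-9357 patch:

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class AddMonthsSketch {
    // Oracle-style ADD_MONTHS: if the input date is the last day of its
    // month, the result is the last day of the resulting month; otherwise
    // the day-of-month is kept (clamped to the target month's length,
    // which plusMonths already does).
    static String addMonths(String date, int months) {
        LocalDate d = LocalDate.parse(date);
        LocalDate shifted = d.plusMonths(months);
        if (d.getDayOfMonth() == d.lengthOfMonth()) {
            shifted = shifted.with(TemporalAdjusters.lastDayOfMonth());
        }
        return shifted.toString();
    }

    public static void main(String[] args) {
        System.out.println(addMonths("2015-01-31", 1));  // 2015-02-28
        System.out.println(addMonths("2015-02-28", 2));  // 2015-04-30 (end-of-month snap)
        System.out.println(addMonths("2015-02-28", 12)); // 2016-02-29
        System.out.println(addMonths("2016-02-29", -1)); // 2016-01-31
    }
}
```

The end-of-month check is the part that differs from a plain `plusMonths` call, and it is exactly what makes the 2015-02-28 + 2 = 2015-04-30 row in the table above come out right.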
[jira] [Created] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed
yuyun.chen created HIVE-9370: Summary: Enable Hive on Spark for BigBench and run Query 8, the test failed Key: HIVE-9370 URL: https://issues.apache.org/jira/browse/HIVE-9370 Project: Hive Issue Type: Bug Components: Spark Reporter: yuyun.chen 2015-01-14 11:43:46,057 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor (SessionState.java:printError(839)) - Status: Failed 2015-01-14 11:43:46,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob start=1421206996052 end=1421207026062 duration=30010 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - java.lang.InterruptedException 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Object.java:503) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1282) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1300) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1314) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.RDD.collect(RDD.scala:780) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner.init(Partitioner.scala:124) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45) 2015-01-14 11:43:46,073 
INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:223) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:298) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl
[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed
[ https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuyun.chen updated HIVE-9370: - Description: enable hive on spark and run BigBench Query 8 then got the following exception: 2015-01-14 11:43:46,057 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 INFO [main]: impl.RemoteSparkJobStatus (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted after 30s. Aborting it. 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor (SessionState.java:printError(839)) - Status: Failed 2015-01-14 11:43:46,062 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob start=1421206996052 end=1421207026062 duration=30010 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) - java.lang.InterruptedException 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Object.java:503) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514) 2015-01-14 11:43:46,071 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1282) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1300) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1314) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.RDD.collect(RDD.scala:780) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.RangePartitioner.init(Partitioner.scala:124) 2015-01-14 11:43:46,072 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45) 2015-01-14 11:43:46,073 
INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:223) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:298) 2015-01-14 11:43:46,073 INFO [stderr-redir-1]: client.SparkClientImpl (SparkClientImpl.java:run(436)) -at
[jira] [Commented] (HIVE-9196) MetaStoreDirectSql.getTableStats may need to call doDbSpecificInitializationsBeforeQuery
[ https://issues.apache.org/jira/browse/HIVE-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276419#comment-14276419 ] Binglin Chang commented on HIVE-9196: - Hi [~sershe], does my last comment make sense? If not, I can update the patch to remove the 2nd call. MetaStoreDirectSql.getTableStats may need to call doDbSpecificInitializationsBeforeQuery - Key: HIVE-9196 URL: https://issues.apache.org/jira/browse/HIVE-9196 Project: Hive Issue Type: Bug Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HIVE-9196.001.patch Our Hive metastore server sometimes prints logs like this: {noformat} 2014-12-17 07:03:22,415 ERROR [pool-3-thread-154]: metastore.ObjectStore (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling back to ORM javax.jdo.JDODataStoreException: Error executing SQL query select COLUMN_NAME, COLUMN_TYPE, LONG_LOW_VALUE, LONG_HIGH_VALUE, DOUBLE_LOW_VALUE, DOUBLE_HIGH_VALUE, BIG_DECIMAL_LOW_VALUE, BIG_DECIMAL_HIGH_VALUE, NUM_NULLS, NUM_DISTINCTS, AVG_COL_LEN, MAX_COL_LEN, NUM_TRUES, NUM_FALSES, LAST_ANALYZED from TAB_COL_STATS where DB_NAME = ? and TABLE_NAME = ? and COLUMN_NAME in (?). 
at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451) at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getTableStats(MetaStoreDirectSql.java:879) at org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:5754) at org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:5751) at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213) at org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:5751) at org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:5745) at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) at $Proxy8.getTableColumnStatistics(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_column_statistics(HiveMetaStore.java:3552) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_column_statistics.getResult(ThriftHiveMetastore.java:9468) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_column_statistics.getResult(ThriftHiveMetastore.java:9452) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:666) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:662) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1589) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:662) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) NestedThrowablesStackTrace: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'TAB_COL_STATS where DB_NAME = 'user_profile' and TABLE_NAME = 'md5_device' at line 1 at sun.reflect.GeneratedConstructorAccessor103.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at com.mysql.jdbc.Util.handleNewInstance(Util.java:377) at com.mysql.jdbc.Util.getInstance(Util.java:360) at
[jira] [Commented] (HIVE-5412) HivePreparedStatement.setDate not implemented
[ https://issues.apache.org/jira/browse/HIVE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276420#comment-14276420 ] Matt Burgess commented on HIVE-5412: What about:
{code}
public void setDate(int parameterIndex, Date x, Calendar cal) throws SQLException {
  // TODO Auto-generated method stub
  throw new SQLException("Method not supported");
}
{code}
HivePreparedStatement.setDate not implemented - Key: HIVE-5412 URL: https://issues.apache.org/jira/browse/HIVE-5412 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.12.0 Reporter: Alan Gates Fix For: 0.13.0 The DATE type was added in Hive 0.12, but the HivePreparedStatement.setDate method was not implemented.
[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive
[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276435#comment-14276435 ] Jimmy Xiang commented on HIVE-9367: --- With the FileStatus, we don't need to go to the NN to get the FileStatus again, since the FileStatus already has info about whether the path is a file or a directory. Originally, in getDirIndices, we got the FileStatus again, which is an extra call for each file. So this patch saves us a call to get the FileStatus for each file. CombineFileInputFormatShim#getDirIndices is expensive - Key: HIVE-9367 URL: https://issues.apache.org/jira/browse/HIVE-9367 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-9367.1.patch [~lirui] found out that we spent quite some time on CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me we should be able to get rid of this method completely if we can enhance CombineFileInputFormatShim a little.
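The point Jimmy makes — iteration over the statuses is cheap, the per-file NameNode RPC is what's expensive — can be sketched with a stand-in status class (hypothetical; the real code uses org.apache.hadoop.fs.FileStatus via the Hadoop shim layer, and this is not the HIVE-9367 patch):

```java
import java.util.ArrayList;
import java.util.List;

public class DirIndicesSketch {
    // Stand-in for org.apache.hadoop.fs.FileStatus: a status object
    // returned by a directory listing already knows whether its path
    // is a directory, so no further lookup is needed.
    static class FileStatus {
        final String path;
        final boolean dir;
        FileStatus(String path, boolean dir) { this.path = path; this.dir = dir; }
        boolean isDirectory() { return dir; }
    }

    // The old code re-fetched a FileStatus per path (one extra NameNode
    // RPC per file) just to answer "is this a directory?". Reusing the
    // FileStatus objects from the listing answers that locally: we still
    // iterate, but the loop body is a cheap in-memory check.
    static List<Integer> getDirIndices(List<FileStatus> statuses) {
        List<Integer> dirIndices = new ArrayList<>();
        for (int i = 0; i < statuses.size(); i++) {
            if (statuses.get(i).isDirectory()) {
                dirIndices.add(i);
            }
        }
        return dirIndices;
    }

    public static void main(String[] args) {
        List<FileStatus> statuses = List.of(
            new FileStatus("/warehouse/t1", true),
            new FileStatus("/warehouse/t1/part-00000", false),
            new FileStatus("/warehouse/t2", true));
        System.out.println(getDirIndices(statuses)); // [0, 2]
    }
}
```

This matches Rui Li's follow-up measurement: the iteration itself was never the bottleneck; eliminating one RPC per file is what cut the getSplits time.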
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276443#comment-14276443 ] Hive QA commented on HIVE-9178: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12692117/HIVE-9178.2-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7307 tests executed *Failed tests:* {noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/640/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/640/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-640/ Messages: {noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat} This message is automatically generated. ATTACHMENT ID: 12692117 - PreCommit-HIVE-SPARK-Build Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. 
These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away.
[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive
[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276444#comment-14276444 ] Rui Li commented on HIVE-9367: -- I see. Thanks [~jxiang] for the explanation! CombineFileInputFormatShim#getDirIndices is expensive - Key: HIVE-9367 URL: https://issues.apache.org/jira/browse/HIVE-9367 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-9367.1.patch [~lirui] found out that we spent quite some time on CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me we should be able to get rid of this method completely if we can enhance CombineFileInputFormatShim a little.
[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF
[ https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9358: -- Description: LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm was: LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' Create LAST_DAY UDF --- Key: HIVE-9358 URL: https://issues.apache.org/jira/browse/HIVE-9358 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov LAST_DAY returns the date of the last day of the month that contains date: last_day('2015-01-14') = '2015-01-31' last_day('2016-02-01') = '2016-02-29' The last_day function comes from Oracle: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276514#comment-14276514 ] Marcelo Vanzin commented on HIVE-9178: -- [~chengxiang li] ah, good catch. This method:
{code}
private void handle(ChannelHandlerContext ctx, SyncJobRequest msg) throws Exception {
{code}
should actually return the result of the RPC instead of void. I'll update the patch tomorrow and add a unit test (d'oh). Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, HIVE-9178.2-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away.
[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive
[ https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276527#comment-14276527 ] Rui Li commented on HIVE-9367: -- I just verified that the patch here can reduce the getSplits time from 1s to less than 200ms. The test table consists of one 100GB sequence file. CombineFileInputFormatShim#getDirIndices is expensive - Key: HIVE-9367 URL: https://issues.apache.org/jira/browse/HIVE-9367 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Attachments: HIVE-9367.1.patch [~lirui] found out that we spent quite some time on CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me we should be able to get rid of this method completely if we can enhance CombineFileInputFormatShim a little.
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276532#comment-14276532 ] Szehon Ho commented on HIVE-9360: - Unless anyone objects, I'm going to commit this in a bit without waiting for HiveQA, as a bunch of other HiveQA runs today already have this error, and this is a test-only fix that shouldn't affect anything else. TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests.
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276537#comment-14276537 ] Brock Noland commented on HIVE-9360: Makes sense, just double-check the patch locally. TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho resolved HIVE-9360. - Resolution: Fixed Fix Version/s: 0.15.0 Assignee: Szehon Ho Double-checked by running the test again (had run before). Committed to trunk, thanks for review! TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Fix For: 0.15.0 Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode
[ https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9248: --- Attachment: HIVE-9248.04.patch Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode --- Key: HIVE-9248 URL: https://issues.apache.org/jira/browse/HIVE-9248 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, HIVE-9248.03.patch, HIVE-9248.04.patch Under Tez and Vectorization, ReduceWork is not getting vectorized unless its GROUP BY operator is MergePartial. Add valid cases where GROUP BY is Hash (and presumably there are downstream reducers that will do MergePartial). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode
[ https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9248: --- Status: In Progress (was: Patch Available) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode --- Key: HIVE-9248 URL: https://issues.apache.org/jira/browse/HIVE-9248 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, HIVE-9248.03.patch, HIVE-9248.04.patch Under Tez and Vectorization, ReduceWork is not getting vectorized unless its GROUP BY operator is MergePartial. Add valid cases where GROUP BY is Hash (and presumably there are downstream reducers that will do MergePartial). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode
[ https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9248: --- Status: Patch Available (was: In Progress) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode --- Key: HIVE-9248 URL: https://issues.apache.org/jira/browse/HIVE-9248 Project: Hive Issue Type: Bug Components: Tez, Vectorization Affects Versions: 0.14.0 Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, HIVE-9248.03.patch, HIVE-9248.04.patch Under Tez and Vectorization, ReduceWork is not getting vectorized unless its GROUP BY operator is MergePartial. Add valid cases where GROUP BY is Hash (and presumably there are downstream reducers that will do MergePartial). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9309) schematool fails on Postgres 8.1
[ https://issues.apache.org/jira/browse/HIVE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275566#comment-14275566 ] Brock Noland commented on HIVE-9309: Yes I think so... [~mohitsabharwal]? schematool fails on Postgres 8.1 Key: HIVE-9309 URL: https://issues.apache.org/jira/browse/HIVE-9309 Project: Hive Issue Type: Bug Components: Database/Schema Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Fix For: 0.15.0 Attachments: HIVE-9309.patch Postgres upgrade scripts set {{standard_conforming_strings}} which is not allowed in 8.1: {code} ERROR: parameter standard_conforming_strings cannot be changed (state=55P02,code=0) {code} Postgres [8.1 Release notes|http://www.postgresql.org/docs/8.2/static/release-8-1.html] say that standard_conforming_strings value is read-only Postgres [8.2 notes|http://www.postgresql.org/docs/8.2/static/release-8-2.html] say that it can be set at runtime. It'd be nice to address this for those still using Postgres 8.1 This patch provides a schemaTool db option postgres.filter.81 which, if set, filters out the standard_conforming_strings statement from upgrade scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9320) Add UnionEliminatorRule on cbo path
[ https://issues.apache.org/jira/browse/HIVE-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275601#comment-14275601 ] Ashutosh Chauhan commented on HIVE-9320: [~jpullokkaran] Can you take a look at this small patch? Add UnionEliminatorRule on cbo path --- Key: HIVE-9320 URL: https://issues.apache.org/jira/browse/HIVE-9320 Project: Hive Issue Type: Improvement Components: CBO, Logical Optimizer Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9320.patch Shorten the pipeline, where possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Attachment: HIVE-9194.04.patch Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Open (was: Patch Available) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.
On Jan. 13, 2015, 6:47 a.m., chengxiang li wrote: spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java, line 55 https://reviews.apache.org/r/29832/diff/1/?file=818434#file818434line55 At the API level, it's still an asynchronous RPC API, given the use case of this API described in the javadoc. Do you think it would be cleaner to supply a synchronous API like T run(Job<T> job)? No. With a client-side synchronous API, it's awkward to specify things like timeouts - you either need explicit parameters which are not really part of the RPC, or extra configuration. Here, you just say `client.run().get(someTimeout)` if you want the call to be synchronous on the client side. - Marcelo --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/#review67813 --- On Jan. 13, 2015, 12:31 a.m., Marcelo Vanzin wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29832/ --- (Updated Jan. 13, 2015, 12:31 a.m.) Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang. Bugs: HIVE-9178 https://issues.apache.org/jira/browse/HIVE-9178 Repository: hive-git Description --- HIVE-9178. Add a synchronous RPC API to the remote Spark context. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java 5c3ca018bb177ef9fd9fb24b054a9db29274b31e spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java f9c10b196ab47b5b4f4c0126ad455869ab68f0ca spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 5e767ef5eb47e493a332607204f4c522028d7d0e spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java f8b2202a465bb8abe3d2c34e49ade6387482738c Diff: https://reviews.apache.org/r/29832/diff/ Testing --- Thanks, Marcelo Vanzin
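Marcelo's point above - that a Future-based asynchronous API subsumes the synchronous case because the caller picks the timeout at the call site - can be sketched with plain java.util.concurrent types. This is an illustrative stand-in using hypothetical names, not the actual spark-client SparkClient/JobHandle API:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncSubsumesSync {
    // A minimal stand-in for the async submit API under discussion:
    // the RPC layer only hands back a Future and knows nothing about timeouts.
    static <T> Future<T> submit(Callable<T> job, ExecutorService pool) {
        return pool.submit(job);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Integer> handle = submit(() -> 42, pool);
        // Synchronous use is just a blocking get() with a caller-chosen timeout,
        // so no timeout parameter needs to live in the RPC API itself.
        int result = handle.get(5, TimeUnit.SECONDS);
        System.out.println(result);
        pool.shutdown();
    }
}
```

The design point is that the timeout is a client-side policy decision; baking it into a synchronous `run()` signature would force it into the RPC contract.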
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Attachment: (was: HIVE-9194.04.patch) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Attachment: HIVE-9194.04.patch Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Patch Available (was: Open) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Open (was: Patch Available) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3405) UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase
[ https://issues.apache.org/jira/browse/HIVE-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-3405: -- Release Note: (was: Initcap method tested.please verify) UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase - Key: HIVE-3405 URL: https://issues.apache.org/jira/browse/HIVE-3405 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.11.0, 0.13.0, 0.14.0, 0.15.0, 0.14.1 Reporter: Archana Nair Assignee: Alexander Pivovarov Labels: TODOC15, patch Fix For: 0.15.0 Attachments: HIVE-3405.1.patch.txt, HIVE-3405.2.patch, HIVE-3405.3.patch, HIVE-3405.4.patch, HIVE-3405.5.patch, HIVE-3405.5.patch Current Hive releases lack an INITCAP function. INITCAP returns a String with the first letter of each word in uppercase and all other letters in lowercase. Words are delimited by white space. This will be useful for report generation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29671: Support select distinct *
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29671/ --- (Updated Jan. 13, 2015, 6:04 p.m.) Review request for hive and John Pullokkaran. Changes --- remove spaces Repository: hive-git Description --- Support select distinct * in operator generation phase. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 917b3a4 ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3534551 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 9ad6714 ql/src/test/queries/clientnegative/selectDistinctStarNeg_1.q PRE-CREATION ql/src/test/queries/clientnegative/selectDistinctStarNeg_2.q PRE-CREATION ql/src/test/queries/clientpositive/selectDistinctStar.q PRE-CREATION ql/src/test/results/clientnegative/selectDistinctStarNeg_1.q.out PRE-CREATION ql/src/test/results/clientnegative/selectDistinctStarNeg_2.q.out PRE-CREATION ql/src/test/results/clientpositive/selectDistinctStar.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/selectDistinctStar.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29671/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Patch Available (was: Open) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Open (was: Patch Available) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Attachment: HIVE-9194.04.patch remove spaces Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9255: -- Status: Patch Available (was: Open) Fastpath for limited fetches from unpartitioned tables -- Key: HIVE-9255 URL: https://issues.apache.org/jira/browse/HIVE-9255 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0, 0.15.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch Currently, for flat tables, the threshold check is applicable for a query like {{select * from lineitem limit 1;}}. This is not necessary as without a filter clause, this can be executed entirely via FetchTask. Running a cluster task is redundant for this case. This fastpath is applicable for partitioned tables already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9255: -- Attachment: HIVE-9255.2.patch Regenerate q.out for the select cast(...) ... limit 10; case without the filter. Fastpath for limited fetches from unpartitioned tables -- Key: HIVE-9255 URL: https://issues.apache.org/jira/browse/HIVE-9255 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0, 0.15.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch Currently, for flat tables, the threshold check is applicable for a query like {{select * from lineitem limit 1;}}. This is not necessary as without a filter clause, this can be executed entirely via FetchTask. Running a cluster task is redundant for this case. This fastpath is applicable for partitioned tables already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-9255: -- Status: Open (was: Patch Available) Fastpath for limited fetches from unpartitioned tables -- Key: HIVE-9255 URL: https://issues.apache.org/jira/browse/HIVE-9255 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0, 0.15.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch Currently, for flat tables, the threshold check is applicable for a query like {{select * from lineitem limit 1;}}. This is not necessary as without a filter clause, this can be executed entirely via FetchTask. Running a cluster task is redundant for this case. This fastpath is applicable for partitioned tables already. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-7550) Extend cached evaluation to multiple expressions
[ https://issues.apache.org/jira/browse/HIVE-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7550: --- Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Extend cached evaluation to multiple expressions Key: HIVE-7550 URL: https://issues.apache.org/jira/browse/HIVE-7550 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 0.15.0 Attachments: HIVE-7550.1.patch.txt, HIVE-7550.2.patch.txt Currently, hive.cache.expr.evaluation caches per expression. But cache context might be shared for multiple expressions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29671: Support select distinct *
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29671/ --- (Updated Jan. 13, 2015, 6:07 p.m.) Review request for hive and John Pullokkaran. Repository: hive-git Description --- Support select distinct * in operator generation phase. Diffs (updated) - itests/src/test/resources/testconfiguration.properties 917b3a4 ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3534551 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 9ad6714 ql/src/test/queries/clientnegative/selectDistinctStarNeg_1.q PRE-CREATION ql/src/test/queries/clientnegative/selectDistinctStarNeg_2.q PRE-CREATION ql/src/test/queries/clientpositive/selectDistinctStar.q PRE-CREATION ql/src/test/results/clientnegative/selectDistinctStarNeg_1.q.out PRE-CREATION ql/src/test/results/clientnegative/selectDistinctStarNeg_2.q.out PRE-CREATION ql/src/test/results/clientpositive/selectDistinctStar.q.out PRE-CREATION ql/src/test/results/clientpositive/tez/selectDistinctStar.q.out PRE-CREATION Diff: https://reviews.apache.org/r/29671/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Attachment: (was: HIVE-9194.04.patch) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9344) Fix flaky test optimize_nullscan
[ https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275677#comment-14275677 ] Ashutosh Chauhan commented on HIVE-9344: +1 Fix flaky test optimize_nullscan Key: HIVE-9344 URL: https://issues.apache.org/jira/browse/HIVE-9344 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Navis Attachments: HIVE-9344.1.patch.txt The optimize_nullscan test is extremely flaky. We need to find a way to fix this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276570#comment-14276570 ] Alexander Pivovarov commented on HIVE-9357: --- According to Quora, ADD_MONTHS is the #1 feature gap in Hive http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL When people say ADD_MONTHS they mean the Oracle ADD_MONTHS implementation logic. I searched for the ADD_MONTHS function on Google - the first page (first 10 results) mentions Oracle add_months https://www.google.com/#q=add_months+function Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276571#comment-14276571 ] Alexander Pivovarov commented on HIVE-9357: --- From the Oracle reference on add_months If date is the last day of the month or if the resulting month has fewer days than the day component of date, then the result is the last day of the resulting month. Otherwise, the result has the same day component as date. Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
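The Oracle rule quoted above (if the start date is the last day of its month, or the result month is shorter, clamp to the last day of the result month; otherwise keep the same day-of-month) can be sketched with java.time. This is an illustration of the semantics only, not Hive's actual UDF code; note that LocalDate#plusMonths already clamps when the target month is shorter.

```java
import java.time.LocalDate;

public class AddMonthsDemo {
    // Oracle-style ADD_MONTHS semantics, per the rule quoted in the comment above.
    static LocalDate addMonths(LocalDate start, int numMonths) {
        // plusMonths already clamps e.g. Jan 31 + 1 month to Feb 28.
        LocalDate shifted = start.plusMonths(numMonths);
        if (start.getDayOfMonth() == start.lengthOfMonth()) {
            // Start was the last day of its month: snap to the last day of the result month.
            return shifted.withDayOfMonth(shifted.lengthOfMonth());
        }
        return shifted;
    }

    public static void main(String[] args) {
        // These reproduce the examples from the JIRA description.
        System.out.println(addMonths(LocalDate.parse("2015-01-14"), 1));  // 2015-02-14
        System.out.println(addMonths(LocalDate.parse("2015-01-31"), 1));  // 2015-02-28
        System.out.println(addMonths(LocalDate.parse("2015-02-28"), 2));  // 2015-04-30
        System.out.println(addMonths(LocalDate.parse("2015-02-28"), 12)); // 2016-02-29
    }
}
```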
[jira] [Commented] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276579#comment-14276579 ] Brock Noland commented on HIVE-9235: Can you describe the issue you see? Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR - Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column assign is missing for some data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9356) Fail to handle the case that a qfile contains a semicolon in the annotation
[ https://issues.apache.org/jira/browse/HIVE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Chen updated HIVE-9356: Attachment: HIVE-9356.1-encryption.patch Thanks for your review! Sergio, Brock. I found we have to handle the 2 different .q cases below in the current command parser implementation, so I updated the patch to V1. {quote} --comment; sql statement {quote} and {quote} --comment; HiveCommand {quote} Fail to handle the case that a qfile contains a semicolon in the annotation --- Key: HIVE-9356 URL: https://issues.apache.org/jira/browse/HIVE-9356 Project: Hive Issue Type: Sub-task Affects Versions: encryption-branch Reporter: Ferdinand Xu Assignee: Dong Chen Fix For: encryption-branch Attachments: HIVE-9356-encryption.patch, HIVE-9356.1-encryption.patch, HIVE-9356.patch Currently, we split the qfile by the semicolon. The parser should also handle a comment statement in the qfile that contains a semicolon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
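The failure mode described above - splitting a .q file into statements on ';' breaks when a "--" comment itself contains a semicolon - can be sketched by stripping comment lines before splitting. This is an illustration of the parsing problem, not Hive's actual QTestUtil/command parser, and it ignores semicolons inside SQL string literals:

```java
import java.util.ArrayList;
import java.util.List;

public class QFileSplitDemo {
    // Naive statement splitter: drop "--" comment lines first so a semicolon
    // inside a comment cannot produce a bogus empty/partial statement.
    static List<String> splitStatements(String qfile) {
        StringBuilder sb = new StringBuilder();
        for (String line : qfile.split("\n")) {
            if (line.trim().startsWith("--")) {
                continue; // comment line, semicolons and all
            }
            sb.append(line).append('\n');
        }
        List<String> stmts = new ArrayList<>();
        for (String s : sb.toString().split(";")) {
            if (!s.trim().isEmpty()) {
                stmts.add(s.trim());
            }
        }
        return stmts;
    }

    public static void main(String[] args) {
        String q = "--comment; with a semicolon\nselect 1;\nselect 2;";
        System.out.println(splitStatements(q)); // [select 1, select 2]
    }
}
```

Splitting the raw text on ';' without the comment filter would yield a spurious first "statement" consisting of the comment fragment, which is exactly the case the patch has to handle for both SQL statements and Hive commands.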
[jira] [Created] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
Matt McCline created HIVE-9371: -- Summary: Execution error for Parquet table and GROUP BY involving CHAR data type Key: HIVE-9371 URL: https://issues.apache.org/jira/browse/HIVE-9371 Project: Hive Issue Type: Bug Components: File Formats, Query Processor Reporter: Matt McCline Priority: Critical Query fails involving PARQUET table format, CHAR data type, and GROUP BY. Probably also fails for VARCHAR, too. {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 
10 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809) ... 
16 more {noformat} Here is a q file: {noformat} SET hive.vectorized.execution.enabled=false; drop table char_2; create table char_2 ( key char(10), value char(20) ) stored as parquet; insert overwrite table char_2 select * from src; select value, sum(cast(key as int)), count(*) numrows from src group by value order by value asc limit 5; explain select value, sum(cast(key as int)), count(*) numrows from char_2 group by value order by value asc limit 5; -- should match the query from src select value, sum(cast(key as int)), count(*) numrows from char_2 group by value order by value asc limit 5; select value, sum(cast(key as int)), count(*) numrows from src group by value order by value desc limit 5; explain select value, sum(cast(key as int)), count(*) numrows from char_2 group by value order by value desc limit 5; -- should match the query from src select value, sum(cast(key as int)), count(*) numrows from char_2 group by value order by value desc limit 5; drop table char_2; {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9344) Fix flaky test optimize_nullscan
[ https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276617#comment-14276617 ] Hive QA commented on HIVE-9344: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12691882/HIVE-9344.1.patch.txt {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 7311 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2357/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2357/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2357/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12691882 - PreCommit-HIVE-TRUNK-Build Fix flaky test optimize_nullscan Key: HIVE-9344 URL: https://issues.apache.org/jira/browse/HIVE-9344 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Navis Attachments: HIVE-9344.1.patch.txt The optimize_nullscan test is extremely flaky. We need to find a way to fix this test. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276620#comment-14276620 ] Matt McCline commented on HIVE-9235: First issue (vectorization of Parquet): Missing cases in VectorColumnAssignFactory.java's public static VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch, Writable[] writables) for HiveCharWritable, HiveVarcharWritable, DateWritable, HiveDecimalWritable. Example of the exception caused: {noformat} Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528) at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127) ... 23 more {noformat} Added code to fix that. Then, I copied a half dozen vectorized q tests using ORC tables and tried converting them to use PARQUET, but encountered another issue in *non-vectorized* mode. I was trying to establish base query outputs that I could use to verify the vectorized query output. This indicated that a basic non-vectorized case of CHAR data type usage wasn't working for PARQUET. 
{noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 10 more Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104) at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142) at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809) ... 
16 more {noformat} I filed this problem under HIVE-9371: Execution error for Parquet table and GROUP BY involving CHAR data type. At that point we concluded we should temporarily disable vectorization of PARQUET, since there is only one test and it doesn't provide complete coverage of data types. FYI: [~hagleitn] Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR - Key: HIVE-9235 URL: https://issues.apache.org/jira/browse/HIVE-9235 Project: Hive Issue Type: Bug Components: Vectorization Reporter: Matt McCline Assignee: Matt McCline Priority: Critical Attachments: HIVE-9235.01.patch Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR Support for doing vector column
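As a standalone sketch of the gap Matt describes (illustrative classes only, not Hive's actual VectorColumnAssignFactory): assigners are looked up by writable class, and any type that was never registered surfaces as the "Unimplemented vector assigner" error quoted in the stack traces; the fix is registering the missing types (HiveCharWritable, HiveVarcharWritable, DateWritable, HiveDecimalWritable).

```java
import java.util.HashMap;
import java.util.Map;

public class AssignerFactoryDemo {
    // Assigns one value into one row of a (hypothetical) column vector.
    interface Assigner { void assign(Object value, int row); }

    private final Map<Class<?>, Assigner> assigners = new HashMap<>();

    // Register an assigner for a writable type; missing registrations are the bug.
    public void register(Class<?> writableType, Assigner a) {
        assigners.put(writableType, a);
    }

    // Mirrors the failure mode above: an unregistered type throws at lookup time.
    public Assigner lookup(Class<?> writableType) {
        Assigner a = assigners.get(writableType);
        if (a == null) {
            throw new RuntimeException(
                "Unimplemented vector assigner for writable type " + writableType);
        }
        return a;
    }
}
```

The point of the sketch is that the factory fails only when a query actually feeds it an uncovered type, which is why the gap went unnoticed until DECIMAL/CHAR Parquet tests were attempted.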
[jira] [Updated] (HIVE-9351) Running Hive Jobs with Tez cause templeton to never report percent complete
[ https://issues.apache.org/jira/browse/HIVE-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-9351: - Resolution: Fixed Fix Version/s: 0.15.0 Status: Resolved (was: Patch Available) Running Hive Jobs with Tez cause templeton to never report percent complete --- Key: HIVE-9351 URL: https://issues.apache.org/jira/browse/HIVE-9351 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.15.0 Attachments: HIVE-9351.patch Currently, when submitting Hive jobs through WebHCat and Hive is configured to use Tez, the percentComplete field returned by WebHCat is empty. LaunchMapper in WebHCat parses stderr of the process that it launches to extract map = 100%, reduce = 100% for Map/Reduce case. With Tez the content of stderr looks like {noformat} Map 1: -/- Reducer 2: 0/1 Map 1: -/- Reducer 2: 0(+1)/1 Map 1: -/- Reducer 2: 1/1 {noformat} WebHCat should handle that as well. WebHCat will follow HIVE-8495 and report (completed tasks)/(total tasks) as a percentage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9351) Running Hive Jobs with Tez cause templeton to never report percent complete
[ https://issues.apache.org/jira/browse/HIVE-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275734#comment-14275734 ] Eugene Koifman commented on HIVE-9351: -- Thanks [~thejas] for the review Running Hive Jobs with Tez cause templeton to never report percent complete --- Key: HIVE-9351 URL: https://issues.apache.org/jira/browse/HIVE-9351 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.15.0 Attachments: HIVE-9351.patch Currently, when submitting Hive jobs through WebHCat and Hive is configured to use Tez, the percentComplete field returned by WebHCat is empty. LaunchMapper in WebHCat parses stderr of the process that it launches to extract map = 100%, reduce = 100% for Map/Reduce case. With Tez the content of stderr looks like {noformat} Map 1: -/- Reducer 2: 0/1 Map 1: -/- Reducer 2: 0(+1)/1 Map 1: -/- Reducer 2: 1/1 {noformat} WebHCat should handle that as well. WebHCat will follow HIVE-8495 and report (completed tasks)/(total tasks) as a percentage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
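The Tez stderr lines quoted above can be reduced to the (completed tasks)/(total tasks) percentage that HIVE-9351 describes. A standalone sketch (not WebHCat's actual LaunchMapper code; class and method names here are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TezProgressParser {
    // Matches vertex fragments like "Map 1: -/-" or "Reducer 2: 0(+1)/1".
    // Group 2 = completed tasks, group 3 = total tasks; "-" means unknown.
    private static final Pattern VERTEX = Pattern.compile(
        "(Map|Reducer) \\d+: (?:(\\d+)(?:\\(\\+\\d+\\))?|-)/(?:(\\d+)|-)");

    // Returns completed/total as a percentage, or -1 while totals are unknown.
    public static int percentComplete(String line) {
        Matcher m = VERTEX.matcher(line);
        int completed = 0, total = 0;
        while (m.find()) {
            if (m.group(3) == null) continue;   // total still "-": skip vertex
            total += Integer.parseInt(m.group(3));
            if (m.group(2) != null) {
                completed += Integer.parseInt(m.group(2));
            }
        }
        return total == 0 ? -1 : (100 * completed) / total;
    }
}
```

For the sample lines in the description, "Map 1: -/- Reducer 2: 0(+1)/1" yields 0 and "Map 1: -/- Reducer 2: 1/1" yields 100; the "(+N)" in-flight count is matched but not counted as completed.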
[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9178: -- Attachment: HIVE-9178.1-spark.patch Reattaching the same patch for another test run. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275783#comment-14275783 ] Marcelo Vanzin commented on HIVE-9360: -- Yeah, I dislike timeouts in tests but in this case it's kinda hard to avoid them. Feel free to increase them if that makes things better. TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275791#comment-14275791 ] Szehon Ho commented on HIVE-9360: - Great, thanks for confirming. FYI [~brocknoland] TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9331) get rid of pre-optimized-hashtable memory optimizations
[ https://issues.apache.org/jira/browse/HIVE-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-9331: --- Attachment: HIVE-9331.02.patch Updating the spark output too... other test failures look unrelated get rid of pre-optimized-hashtable memory optimizations --- Key: HIVE-9331 URL: https://issues.apache.org/jira/browse/HIVE-9331 Project: Hive Issue Type: Bug Affects Versions: 0.15.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.15.0 Attachments: HIVE-9331.01.patch, HIVE-9331.01.patch, HIVE-9331.02.patch, HIVE-9331.patch, HIVE-9331.patch These were added in 13 because optimized hashtable couldn't make it in; they reduced memory usage by some amount (10-25%), and informed the design of the optimized hashtable, but now extra settings and code branches are just confusing and may have their own bugs. Might as well remove them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275808#comment-14275808 ] Brock Noland commented on HIVE-9360: +1 TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
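The failure pattern behind these reports can be illustrated with plain java.util.concurrent (hypothetical names, not the JobHandleImpl internals): a future awaited with a fixed timeout turns a slow host into a TimeoutException, so widening the timeout, as suggested above, trades slower failure detection for fewer false alarms.

```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {
    // Await a result with a bounded wait; a TimeoutException here means the
    // work may still be running, just slower than the deadline allows.
    public static String getWithTimeout(Future<String> f, long timeoutMs)
            throws Exception {
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";
        }
    }
}
```

On a fast host the same job and the same deadline succeed, which is exactly why such tests pass locally and flake on loaded CI machines.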
Re: mistake in udf_date_add.q.out
Yeah, I think you're right; can you file a Jira? Looks like the @Description annotation needs to be fixed for both date_add() and date_sub(). On Jan 13, 2015, at 10:42 AM, Alexander Pivovarov apivova...@gmail.com wrote: files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out last line shows '2009-31-07' should it be '2009-07-31' instead?
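For illustration of how such a transposition can happen (hypothetical, not the actual UDF code): a date pattern with the day and month fields swapped, "yyyy-dd-MM" instead of "yyyy-MM-dd", renders 2009-07-31 as exactly the '2009-31-07' seen in the .q.out files.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateOutDemo {
    // Format an ISO date string with an arbitrary output pattern, to show
    // how a transposed pattern produces the swapped-looking output.
    public static String format(String isoDate, String pattern) {
        return LocalDate.parse(isoDate).format(DateTimeFormatter.ofPattern(pattern));
    }
}
```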
[jira] [Commented] (HIVE-9341) Apply ColumnPrunning for noop PTFs
[ https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275691#comment-14275691 ] Ashutosh Chauhan commented on HIVE-9341: +1 Apply ColumnPrunning for noop PTFs -- Key: HIVE-9341 URL: https://issues.apache.org/jira/browse/HIVE-9341 Project: Hive Issue Type: Improvement Components: PTF-Windowing Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9341.1.patch.txt, HIVE-9341.2.patch.txt Currently, PTF disables CP optimization, which can impose a huge burden. For example, {noformat} select p_mfgr, p_name, p_size, rank() over (partition by p_mfgr order by p_name) as r, dense_rank() over (partition by p_mfgr order by p_name) as dr, sum(p_retailprice) over (partition by p_mfgr order by p_name rows between unbounded preceding and current row) as s1 from noop(on part partition by p_mfgr order by p_name ); STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: part Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: p_mfgr (type: string), p_name (type: string) sort order: ++ Map-reduce partition columns: p_mfgr (type: string) Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE Column stats: NONE value expressions: p_partkey (type: int), p_name (type: string), p_mfgr (type: string), p_brand (type: string), p_type (type: string), p_size (type: int), p_container (type: string), p_retailprice (type: double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: struct<transactionid:bigint,bucketid:int,rowid:bigint>) ... {noformat} There should be a generic way to discern referenced columns, but before that, we know CP can be safely applied to noop functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 29761: HIVE-9315
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29761/ --- (Updated Jan. 13, 2015, 7:01 p.m.) Review request for hive and John Pullokkaran. Changes --- Rebased patch. Bugs: HIVE-9315 https://issues.apache.org/jira/browse/HIVE-9315 Repository: hive-git Description --- CBO (Calcite Return Path): Inline FileSinkOperator, Properties Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java 2f1497ab3d876b8d4152b076d95971e65887a2e5 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java de20302583e3b6260c545c17a9251775e5bff7c5 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26f48f1611d3c64f6df5bcfac02069e3a67 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 8b25c2b6b9f6cfb087ba3c1beaf0c2164ab70de0 Diff: https://reviews.apache.org/r/29761/diff/ Testing --- Existing tests. Thanks, Jesús Camacho Rodríguez
Re: Review Request 29625: HIVE-9200
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29625/ --- (Updated Jan. 13, 2015, 7:07 p.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-9200 https://issues.apache.org/jira/browse/HIVE-9200 Repository: hive-git Description --- CBO (Calcite Return Path): Inline Join, Properties Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2e771ec60851113ef9a717c87e142ca70bc53c07 ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java 03742d436930526ff2db15d6ed159f4f0d7136f0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java eba35f583fd077f492811b6231dfd59e8b05ea58 ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapjoinProc.java 264d3f0b0ad80163831179b57aefdd4a4c5cc647 ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 7ab35eec5987c78dee0349431e06ee65a20ee2cd ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java ae0addcee51abf08904872ddf8dfb2c12e71a9e0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/JoinReorder.java 9238e0e541b748f5e45fe572e6b4575cc3299b7f ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 828f87c1f043324b0432bcc7c1f461267e19d0a6 ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 5291851b105730490033ff91e583ee44022ed24f ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java ea06503b0377ffb98f2583869e2c51ac1ea4e398 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapjoinProc.java 11ce47eb4ff4b8ae1162eb5f3842b8e32d3a21e1 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeJoinProc.java 8a0c47477718141cab85a4d6f71070117372df91 ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java bed95faa9bf072563262292931cc4b7d7cb034b3 ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java c52f7530b10c81a662118d2cb43599c82f7dbb4f 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/AbstractJoinTaskDispatcher.java 33ef581a97768d6391c67558e768d10e46a366f2 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 9c26907544ad8ced31d5cf47ed27c8a240f93925 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java 6f92b13ff7c1cdd4c651f5e1bff42626dee52750 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 037983434d2ab5ce6c8f523b89370ca68cd98e27 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java f62ad6cd109755f60e0e673a679c5107f91c43c0 ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java ffe11a0f0d2ee8f63b124b13275aca8de4704d8b ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java d00c48d8df3958a0a274aa30f2b999a98a6256c8 ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26f48f1611d3c64f6df5bcfac02069e3a67 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 ql/src/java/org/apache/hadoop/hive/ql/parse/TableAccessAnalyzer.java da14ab4e96bcc9089e10eb3a9d4e5d575b51d5ab ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java c144d8c05c73025ba33b300229125e74930e ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 9f8c0918179d9226e36cecc3bd955946d6b5fe98 Diff: https://reviews.apache.org/r/29625/diff/ Testing --- Existing tests. Thanks, Jesús Camacho Rodríguez
[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275735#comment-14275735 ] Marcelo Vanzin commented on HIVE-9178: -- Should I worry about those test failures? I ran a subset of qtests locally and they passed. Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] --- Key: HIVE-9178 URL: https://issues.apache.org/jira/browse/HIVE-9178 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Marcelo Vanzin Attachments: HIVE-9178.1-spark.patch Based on discussions in HIVE-8972, it seems to make sense to create a separate API for RPCs, such as addJar and getExecutorCounter. These jobs are different from a query submission in that they don't need to be queued in the backend and can be executed right away. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9315) CBO (Calcite Return Path): Inline FileSinkOperator, Properties
[ https://issues.apache.org/jira/browse/HIVE-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275745#comment-14275745 ] Laljo John Pullokkaran commented on HIVE-9315: -- +1, conditional on all unit tests passing CBO (Calcite Return Path): Inline FileSinkOperator, Properties -- Key: HIVE-9315 URL: https://issues.apache.org/jira/browse/HIVE-9315 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-9315.01.patch, HIVE-9315.02.patch, HIVE-9315.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9194) Support select distinct *
[ https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9194: -- Status: Patch Available (was: Open) Support select distinct * - Key: HIVE-9194 URL: https://issues.apache.org/jira/browse/HIVE-9194 Project: Hive Issue Type: Sub-task Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch As per [~jpullokkaran]'s review comments, implement select distinct * -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9200) CBO (Calcite Return Path): Inline Join, Properties
[ https://issues.apache.org/jira/browse/HIVE-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-9200: -- Attachment: HIVE-9200.08.patch [~jpullokkaran], I'm rebasing the patch and uploading it again to run QA and be sure. CBO (Calcite Return Path): Inline Join, Properties -- Key: HIVE-9200 URL: https://issues.apache.org/jira/browse/HIVE-9200 Project: Hive Issue Type: Sub-task Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 0.15.0 Attachments: HIVE-9200.01.patch, HIVE-9200.02.patch, HIVE-9200.03.patch, HIVE-9200.04.patch, HIVE-9200.05.patch, HIVE-9200.06.patch, HIVE-9200.07.patch, HIVE-9200.08.patch, HIVE-9200.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9360) TestSparkClient throws Timeoutexception
Szehon Ho created HIVE-9360: --- Summary: TestSparkClient throws Timeoutexception Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho TestSparkClient has been throwing TimeoutException in some test runs. {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9360) TestSparkClient throws Timeoutexception
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-9360: Description: TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. was: TestSparkClient has been throwing TimeoutException in some test runs. {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} for each of the tests. TestSparkClient throws Timeoutexception --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho TestSparkClient has been throwing TimeoutException in some test runs. 
The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but for each of the tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9357: -- Attachment: HIVE-9357.1.patch HIVE-9357.1.patch Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9357: -- Status: Patch Available (was: In Progress) Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
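The examples in the ADD_MONTHS description imply Oracle-style semantics: when the start date is the last day of its month, the result is clamped to the last day of the target month (otherwise 2015-02-28 + 2 months would be 2015-04-28, not 2015-04-30). A minimal sketch of that behavior (an illustration of the described semantics, not the attached patch):

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class AddMonths {
    // add_months semantics as described: plain month arithmetic, except a
    // start date on the last day of its month maps to the last day of the
    // result month (so leap days and short months round up, not down).
    public static String addMonths(String startDate, int numMonths) {
        LocalDate d = LocalDate.parse(startDate);
        boolean lastDay = d.getDayOfMonth() == d.lengthOfMonth();
        LocalDate r = d.plusMonths(numMonths);
        if (lastDay) {
            r = r.with(TemporalAdjusters.lastDayOfMonth());
        }
        return r.toString();
    }
}
```

All four examples from the issue description hold under this rule, including add_months('2015-02-28', 12) = '2016-02-29' across the leap year.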
mistake in udf_date_add.q.out
files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out last line shows '2009-31-07' should it be '2009-07-31' instead?
Re: Review Request 29763: HIVE-9292
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29763/#review67910 --- ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java https://reviews.apache.org/r/29763/#comment111998 It seems like groupOpToInputTables is used only by RewriteQueryUsingAggregateIndexCtx. Couldn't we get rid of groupToInputTables from parse context by changing RewriteQueryUsingAggregateIndexCtx? - John Pullokkaran On Jan. 13, 2015, 9:17 a.m., Jesús Camacho Rodríguez wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/29763/ --- (Updated Jan. 13, 2015, 9:17 a.m.) Review request for hive and John Pullokkaran. Bugs: HIVE-9292 https://issues.apache.org/jira/browse/HIVE-9292 Repository: hive-git Description --- CBO (Calcite Return Path): Inline GroupBy, Properties Diffs - ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java fe686d96b642572059ef13129951d01fce4fedce ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 8215c26f48f1611d3c64f6df5bcfac02069e3a67 ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 7a0b0da5de5c1c7dc8dd099167e9d06e6b27eea2 Diff: https://reviews.apache.org/r/29763/diff/ Testing --- Existing tests. Thanks, Jesús Camacho Rodríguez
[jira] [Updated] (HIVE-9352) Merge from spark to trunk (follow-up of HIVE-9257)
[ https://issues.apache.org/jira/browse/HIVE-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-9352: Attachment: HIVE-9352.patch I think these are flaky; TestSparkClient has been failing in some other runs as well with a timeout exception (we'll have to increase it, I think). I also couldn't reproduce the other failures (Hadoop20SAuthBridge, louter_join_ppr). Other ones I've seen before as flaky tests. I think this run had a bad (slow) set of hosts. Uploading the same patch again. Merge from spark to trunk (follow-up of HIVE-9257) -- Key: HIVE-9352 URL: https://issues.apache.org/jira/browse/HIVE-9352 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: 0.15.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-9352.patch, HIVE-9352.patch Will include the following JIRAs (non-exhaustive list): HIVE-7674 (remove spark-snapshot dependency) HIVE-9335 (cleanup) HIVE-9340 (cleanup 2, including removing spark snapshot repo) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8485) HMS on Oracle incompatibility
[ https://issues.apache.org/jira/browse/HIVE-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275746#comment-14275746 ] Sushanth Sowmyan commented on HIVE-8485: @[~vikram.dixit] : Since this is an important robustness fix, I'd like to see this included in 0.14.1 as well. @[~ctang.ma] : Are you okay with this approach of fixing the robustness on our end? @[~sershe] : Could you please review this patch? HMS on Oracle incompatibility - Key: HIVE-8485 URL: https://issues.apache.org/jira/browse/HIVE-8485 Project: Hive Issue Type: Bug Components: Metastore Environment: Oracle as metastore DB Reporter: Ryan Pridgeon Assignee: Chaoyu Tang Attachments: HIVE-8485.2.patch, HIVE-8485.patch Oracle does not distinguish between empty strings and NULL, which proves problematic for DataNucleus. In the event a user creates a table with some property stored as an empty string, the table will no longer be accessible. i.e. TBLPROPERTIES ('serialization.null.format'='') If they try to select, describe, drop, etc., the client prints the following exception. ERROR ql.Driver: FAILED: SemanticException [Error 10001]: Table not found table name The workaround for this was to go into the Hive metastore on the Oracle database and replace NULL with some other string. Users could then drop the tables or alter their data to use the new null format they just set.
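The failure mode above can be sketched in a few lines. This is an illustrative sketch, not the actual HIVE-8485 patch: because Oracle persists the empty string as NULL, a property value written as '' comes back as NULL, and any code comparing the round-tripped value must normalize it defensively (class and method names here are hypothetical).

```java
// Sketch of the Oracle ''/NULL round-trip problem behind HIVE-8485.
// Names are illustrative; this is not SessionHiveMetaStoreClient code.
public class OracleEmptyStringSketch {

    // Simulates Oracle's storage behavior: '' is persisted as NULL.
    static String storeInOracle(String value) {
        return (value != null && value.isEmpty()) ? null : value;
    }

    // Defensive read: map NULL back to the empty string the user wrote.
    static String readProperty(String stored) {
        return stored == null ? "" : stored;
    }

    public static void main(String[] args) {
        String written = "";                    // e.g. serialization.null.format=''
        String stored = storeInOracle(written); // Oracle hands back NULL
        // Without normalization, stored.equals(written) would NPE.
        System.out.println(readProperty(stored).equals(written)); // true
    }
}
```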
[jira] [Work started] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-9357 started by Alexander Pivovarov. - Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
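The edge cases listed above (month-end snapping and leap years) can be reproduced outside Hive with java.time. This is an illustrative sketch of the Oracle-style ADD_MONTHS semantics the examples imply, not the actual GenericUDF implementation from the patch: when startdate is the last day of its month, the result snaps to the last day of the target month.

```java
import java.time.LocalDate;

// Sketch of ADD_MONTHS month-end semantics; class/method names are
// illustrative, not from the Hive patch.
public class AddMonthsSketch {

    public static String addMonths(String startDate, int numMonths) {
        LocalDate d = LocalDate.parse(startDate);
        if (d.getDayOfMonth() == d.lengthOfMonth()) {
            // Last day of month: snap to the last day of the target month.
            LocalDate first = d.withDayOfMonth(1).plusMonths(numMonths);
            return first.withDayOfMonth(first.lengthOfMonth()).toString();
        }
        // Otherwise java.time clamps day-of-month when the target is shorter.
        return d.plusMonths(numMonths).toString();
    }

    public static void main(String[] args) {
        System.out.println(addMonths("2015-01-14", 1));  // 2015-02-14
        System.out.println(addMonths("2015-01-31", 1));  // 2015-02-28
        System.out.println(addMonths("2015-02-28", 2));  // 2015-04-30
        System.out.println(addMonths("2015-02-28", 12)); // 2016-02-29
    }
}
```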
[jira] [Updated] (HIVE-9360) TestSparkClient throws TimeoutException
[ https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-9360: Attachment: HIVE-9360.patch I am not sure; I think the test seems OK, but on the PTest build infra it is running concurrently with a bunch of other tests, so Future.get might take a while to return. So I'm giving it a try and increasing the timeout. [~chengxiang li], [~vanzin] any thoughts? TestSparkClient throws TimeoutException --- Key: HIVE-9360 URL: https://issues.apache.org/jira/browse/HIVE-9360 Project: Hive Issue Type: Test Components: Tests Affects Versions: 0.15.0 Reporter: Szehon Ho Attachments: HIVE-9360.patch TestSparkClient has been throwing TimeoutException in some test runs. The exception looks like: {noformat} java.util.concurrent.TimeoutException: null at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74) at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35) at org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130) at org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224) at org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126) {noformat} but occurs for each of the tests.
[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF
[ https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275817#comment-14275817 ] Alexander Pivovarov commented on HIVE-9357: --- Review Board Request https://reviews.apache.org/r/29861/ Create ADD_MONTHS UDF - Key: HIVE-9357 URL: https://issues.apache.org/jira/browse/HIVE-9357 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9357.1.patch ADD_MONTHS adds a number of months to startdate: add_months('2015-01-14', 1) = '2015-02-14' add_months('2015-01-31', 1) = '2015-02-28' add_months('2015-02-28', 2) = '2015-04-30' add_months('2015-02-28', 12) = '2016-02-29' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9361) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable
Eugene Koifman created HIVE-9361: Summary: Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable Key: HIVE-9361 URL: https://issues.apache.org/jira/browse/HIVE-9361 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0 Reporter: Eugene Koifman Assignee: Eugene Koifman It's happening at {noformat} MetaStoreUtils.updateUnpartitionedTableStatsFast(newtCopy, wh.getFileStatusesForSD(newtCopy.getSd()), false, true); {noformat} Other methods in this class call getWh() to get the Warehouse, which likely explains why it's intermittent.
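The suspected bug pattern above can be sketched generically. This is a hedged sketch with illustrative names, not the actual SessionHiveMetaStoreClient code: a field is initialized lazily by an accessor, but one call site reads the field directly, so it throws NPE only when it runs before any accessor-based method has initialized the field.

```java
// Sketch of a lazy-initialization NPE of the kind described in HIVE-9361.
// Field and method names are illustrative, not from the Hive source.
public class LazyInitSketch {
    private StringBuilder wh; // stand-in for the lazily created Warehouse

    private StringBuilder getWh() {
        if (wh == null) {
            wh = new StringBuilder("warehouse"); // stand-in for new Warehouse(conf)
        }
        return wh;
    }

    // Buggy path: direct field access; NPEs if nothing initialized wh first.
    public int alterTempTableBuggy() {
        return wh.length();
    }

    // Fixed path: always go through the lazy accessor.
    public int alterTempTableFixed() {
        return getWh().length();
    }

    public static void main(String[] args) {
        try {
            new LazyInitSketch().alterTempTableBuggy();
            System.out.println("no NPE");
        } catch (NullPointerException e) {
            System.out.println("NPE on direct field access");
        }
        System.out.println(new LazyInitSketch().alterTempTableFixed()); // 9
    }
}
```

Whether this particular call site races or merely depends on call order, routing it through the accessor removes the intermittency.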
[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf
[ https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated HIVE-9366: -- Status: Patch Available (was: In Progress) wrong date in description annotation in date_add() and date_sub() udf - Key: HIVE-9366 URL: https://issues.apache.org/jira/browse/HIVE-9366 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out last line shows '2009-31-07' but it should be '2009-07-31' instead the @Description annotation needs to be fixed for both date_add() and date_sub() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
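The fix amounts to correcting the date literal inside the UDF's @Description annotation. A sketch of what the corrected annotation might look like (the exact wording and example in the patch may differ):

```java
// Sketch only: the extended example's result must read '2009-07-31',
// not the transposed '2009-31-07'.
@Description(name = "date_add",
    value = "_FUNC_(start_date, num_days) - Returns the date that is"
          + " num_days after start_date.",
    extended = "Example:\n"
          + "  > SELECT _FUNC_('2009-07-30', 1) FROM src LIMIT 1;\n"
          + "  '2009-07-31'")
```

The same transposition fix applies to date_sub(), and the .q.out files are regenerated to match.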
[jira] [Commented] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf
[ https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276357#comment-14276357 ] Alexander Pivovarov commented on HIVE-9366: --- Review Board Request https://reviews.apache.org/r/29871/ wrong date in description annotation in date_add() and date_sub() udf - Key: HIVE-9366 URL: https://issues.apache.org/jira/browse/HIVE-9366 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out last line shows '2009-31-07' but it should be '2009-07-31' instead the @Description annotation needs to be fixed for both date_add() and date_sub() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf
[ https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276362#comment-14276362 ] Jason Dere commented on HIVE-9366: -- +1 if tests pass wrong date in description annotation in date_add() and date_sub() udf - Key: HIVE-9366 URL: https://issues.apache.org/jira/browse/HIVE-9366 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.14.1 Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Trivial Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch files: ql/src/test/results/clientpositive/udf_date_add.q.out ql/src/test/results/beelinepositive/udf_date_add.q.out last line shows '2009-31-07' but it should be '2009-07-31' instead the @Description annotation needs to be fixed for both date_add() and date_sub() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9038) Join tests fail on Tez
[ https://issues.apache.org/jira/browse/HIVE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276373#comment-14276373 ] Vikram Dixit K commented on HIVE-9038: -- This issue, as Navis mentioned, stems from the fact that Tez does not generate the filterTag. In the MR case, the filterTag is generated by the HashTableSinkOperator, which is not used with Tez. The right solution would be a select operator that adds the filterTag to the value field, which would work without paying for the extra stages that come from converting to a shuffle join. However, since this only happens where there are multiple joins on the same key with an outer join doing the filtering, I have this patch that changes the join to a shuffle join in that case for the time being, so as to get away from the asserts. I will raise a different JIRA for the proper fix. Join tests fail on Tez -- Key: HIVE-9038 URL: https://issues.apache.org/jira/browse/HIVE-9038 Project: Hive Issue Type: Bug Components: Tests, Tez Reporter: Ashutosh Chauhan Assignee: Vikram Dixit K Tez doesn't run all tests. But if you run them, the following tests fail with runtime exceptions pointing to bugs. {{auto_join21.q,auto_join29.q,auto_join30.q ,auto_join_filters.q,auto_join_nulls.q}}
[jira] [Commented] (HIVE-9038) Join tests fail on Tez
[ https://issues.apache.org/jira/browse/HIVE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276402#comment-14276402 ] Sergey Shelukhin commented on HIVE-9038: I think the default switch case should throw/assert... otherwise looks good. I didn't check the out files; I assume the query results are no different from non-Tez. Join tests fail on Tez -- Key: HIVE-9038 URL: https://issues.apache.org/jira/browse/HIVE-9038 Project: Hive Issue Type: Bug Components: Tests, Tez Reporter: Ashutosh Chauhan Assignee: Vikram Dixit K Attachments: HIVE-9038.1.patch Tez doesn't run all tests. But if you run them, the following tests fail with runtime exceptions pointing to bugs. {{auto_join21.q,auto_join29.q,auto_join30.q ,auto_join_filters.q,auto_join_nulls.q}}
[jira] [Commented] (HIVE-9341) Apply ColumnPrunning for noop PTFs
[ https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276405#comment-14276405 ] Hive QA commented on HIVE-9341: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12691867/HIVE-9341.2.patch.txt {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7312 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1 org.apache.hive.jdbc.TestSSL.testSSLFetchHttp org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles org.apache.hive.spark.client.TestSparkClient.testCounters org.apache.hive.spark.client.TestSparkClient.testErrorJob org.apache.hive.spark.client.TestSparkClient.testJobSubmission org.apache.hive.spark.client.TestSparkClient.testMetricsCollection org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2355/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2355/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2355/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12691867 - PreCommit-HIVE-TRUNK-Build Apply ColumnPrunning for noop PTFs -- Key: HIVE-9341 URL: https://issues.apache.org/jira/browse/HIVE-9341 Project: Hive Issue Type: Improvement Components: PTF-Windowing Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-9341.1.patch.txt, HIVE-9341.2.patch.txt Currently, PTF disables CP optimization, which can make a huge burden. For example, {noformat} select p_mfgr, p_name, p_size, rank() over (partition by p_mfgr order by p_name) as r, dense_rank() over (partition by p_mfgr order by p_name) as dr, sum(p_retailprice) over (partition by p_mfgr order by p_name rows between unbounded preceding and current row) as s1 from noop(on part partition by p_mfgr order by p_name ); STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: part Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: p_mfgr (type: string), p_name (type: string) sort order: ++ Map-reduce partition columns: p_mfgr (type: string) Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE Column stats: NONE value expressions: p_partkey (type: int), p_name (type: string), p_mfgr (type: string), p_brand (type: string), p_type (type: string), p_size (type: int), p_container (type: string), p_retailprice (type: double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: structtransactionid:bigint,bucketid:int,rowid:bigint) ... {noformat} There should be a generic way to discern referenced columns but before that, we know CP can be safely applied to noop functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)