Re: Documenting APIs for the 1.0.0 release

2015-01-13 Thread Brock Noland
javadocs + java annotations

But I think we'll need a page on the wiki covering this and describing our
policies, including a discussion of the Thrift layer, which we won't be
able to document via annotations or javadocs since it's generated code.

On Tue, Jan 13, 2015 at 3:20 PM, Thejas Nair the...@hortonworks.com wrote:

 IMO, this would be javadocs. That should be the primary source of
 truth for this information. We can additionally capture this in
 wikidocs as well.



 On Tue, Jan 13, 2015 at 3:08 PM, Lefty Leverenz leftylever...@gmail.com
 wrote:
  Thanks Brock.  Is this for javadocs or wikidocs, or both?
 
  -- Lefty
 
  On Tue, Jan 13, 2015 at 12:47 PM, Brock Noland br...@cloudera.com
 wrote:
 
  Hi,
 
  As discussed at our last meetup, we should really document our public
 APIs.
  Many had requested this be completed before Hive 1.0.0.
 
  As such I have created an uber jira to track this
  https://issues.apache.org/jira/browse/HIVE-9362 and created two sample
  sub-tasks for how I imagine this will play out.
 
  The most important task is documenting what we consider to be the public
  API. Additionally, even if an API is not public, our user community (e.g.
  SparkSQL, Pig, etc.) should document the APIs they use. This way we'll at
  least know who we are breaking when we make a change.
 
  Cheers,
  Brock
 




[jira] [Updated] (HIVE-9235) Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9235:
---
Attachment: HIVE-9235.01.patch

 Make Parquet Vectorization of these data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Support for doing vector column assign is missing for some data types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9235:
---
Summary: Turn off Parquet Vectorization until all data types work: DECIMAL, 
DATE, TIMESTAMP, CHAR, and VARCHAR  (was: Make Parquet Vectorization of these 
data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR)

 Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 -

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Title was: Make Parquet Vectorization of these data types work: DECIMAL, 
 DATE, TIMESTAMP, CHAR, and VARCHAR
 Support for doing vector column assign is missing for some data types.





[jira] [Updated] (HIVE-9235) Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9235:
---
Description: 
Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, 
TIMESTAMP, CHAR, and VARCHAR

Support for doing vector column assign is missing for some data types.

  was:Support for doing vector column assign is missing for some data types.


 Make Parquet Vectorization of these data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Title was: Make Parquet Vectorization of these data types work: DECIMAL, 
 DATE, TIMESTAMP, CHAR, and VARCHAR
 Support for doing vector column assign is missing for some data types.





[jira] [Updated] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9235:
---
Status: Patch Available  (was: Open)

 Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 -

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Title was: Make Parquet Vectorization of these data types work: DECIMAL, 
 DATE, TIMESTAMP, CHAR, and VARCHAR
 Support for doing vector column assign is missing for some data types.





[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276294#comment-14276294
 ] 

Hive QA commented on HIVE-9178:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692014/HIVE-9178.1-spark.patch

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 6783 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithLocalClusterSpark - did not produce a TEST-*.xml file
TestMultiSessionsHS2WithLocalClusterSpark - did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-join9.q-input17.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-auto_join_reordering_values.q-ptf_seqfile.q-auto_join18.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-avro_decimal_native.q-ptf_rcfile.q-auto_join4.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-avro_joins.q-join36.q-join1.q-and-12-more - did not produce 
a TEST-*.xml file
TestSparkCliDriver-bucket3.q-bucketmapjoin1.q-groupby7_map_multi_single_reducer.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-bucketsortoptimize_insert_7.q-skewjoin_noskew.q-sample2.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby3_map.q-skewjoinopt8.q-union_remove_1.q-and-12-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-groupby4.q-tez_joins_explain.q-load_dyn_part3.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_complex_types.q-auto_join9.q-groupby_map_ppr.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-vectorization_16.q-multi_insert_mixed.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join11.q-join18.q-groupby2.q-and-12-more - did not produce a 
TEST-*.xml file
TestSparkCliDriver-join13.q-join_reorder3.q-union14.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-join2.q-script_pipe.q-auto_join24.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-join_casesensitive.q-decimal_join.q-mapjoin_addjar.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_3.q-groupby7.q-union_remove_9.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_unqual4.q-load_dyn_part5.q-bucketmapjoin12.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-order.q-auto_join18_multi_distinct.q-groupby7_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-parallel_join1.q-escape_distributeby1.q-timestamp_null.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-ppd_join4.q-skewjoinopt3.q-auto_join1.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-ppd_transform.q-auto_sortmerge_join_7.q-bucketmapjoin_negative3.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-ptf_general_queries.q-bucketmapjoin3.q-enforce_order.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-skewjoin_union_remove_2.q-join4.q-groupby_cube1.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-skewjoinopt15.q-join39.q-bucketmapjoin10.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-smb_mapjoin_15.q-mapreduce2.q-mapreduce1.q-and-12-more - did 
not produce a TEST-*.xml file
TestSparkCliDriver-smb_mapjoin_4.q-groupby8_map.q-union_remove_11.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-stats12.q-groupby10.q-bucketmapjoin7.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-stats13.q-stats2.q-ppd_gby_join.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-table_access_keys_stats.q-bucketsortoptimize_insert_4.q-join_rc.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-union29.q-join23.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-transform_ppr2.q-join20.q-multi_insert_gby2.q-and-3-more - 
did not produce a TEST-*.xml file
TestSparkCliDriver-union2.q-join_vc.q-input1_limit.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-union_remove_7.q-avro_joins_native.q-date_udf.q-and-12-more 
- did not produce a TEST-*.xml file
TestSparkCliDriver-vector_distinct_2.q-join15.q-union19.q-and-12-more - did not 
produce a TEST-*.xml file
TestSparkCliDriver-vectorization_10.q-list_bucket_dml_2.q-scriptfile1.q-and-12-more
 - did not produce a TEST-*.xml file
TestSparkCliDriver-vectorization_13.q-auto_sortmerge_join_13.q-auto_join10.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/639/testReport
Console output: 

[jira] [Updated] (HIVE-9356) Fail to handle the case that a qfile contains a semicolon in the annotation

2015-01-13 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-9356:
---
Summary: Fail to handle the case that a qfile contains a semicolon in the 
annotation  (was: Fail to handle the case that a qfile contains a semicolon)

 Fail to handle the case that a qfile contains a semicolon in the annotation
 ---

 Key: HIVE-9356
 URL: https://issues.apache.org/jira/browse/HIVE-9356
 Project: Hive
  Issue Type: Sub-task
Affects Versions: encryption-branch
Reporter: Ferdinand Xu
Assignee: Dong Chen
 Fix For: encryption-branch

 Attachments: HIVE-9356-encryption.patch, HIVE-9356.patch


 Currently we split the qfile on semicolons. The splitting should also be able
 to handle comment statements in the qfile that contain a semicolon.
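As an illustration of the splitting problem (a standalone sketch, not the actual HIVE-9356 patch; the class and method names here are hypothetical), dropping "--" comment lines before splitting on semicolons avoids the spurious break:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch: naively splitting a qfile on ';' breaks when a "--"
// comment line itself contains a semicolon. One fix is to drop comment
// lines before splitting.
public class QFileSplitSketch {
    public static List<String> splitCommands(String qfile) {
        StringBuilder noComments = new StringBuilder();
        for (String line : qfile.split("\n")) {
            if (!line.trim().startsWith("--")) {  // skip annotation/comment lines
                noComments.append(line).append('\n');
            }
        }
        List<String> commands = new ArrayList<>();
        for (String cmd : noComments.toString().split(";")) {
            if (!cmd.trim().isEmpty()) {
                commands.add(cmd.trim());
            }
        }
        return commands;
    }
}
```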





[jira] [Created] (HIVE-9369) fix arguments length checking in Upper and Lower UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created HIVE-9369:
-

 Summary: fix arguments length checking in Upper and Lower UDF
 Key: HIVE-9369
 URL: https://issues.apache.org/jira/browse/HIVE-9369
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.0, 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial


Currently the initialize method checks that arguments.length > 0.

It should instead check that arguments.length != 1.
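A minimal standalone sketch of the two checks (the class and exception names below are stand-ins for Hive's actual GenericUDF/UDFArgumentLengthException types):

```java
// Simplified sketch of the argument-count checks described above; this is
// not Hive's real code, just the same logic in a self-contained class.
public class ArgCheckSketch {
    static class UDFArgumentLengthException extends Exception {
        UDFArgumentLengthException(String msg) { super(msg); }
    }

    // Current check: only rejects zero arguments, so upper('a','b') slips through.
    static void oldCheck(Object[] arguments) throws UDFArgumentLengthException {
        if (arguments.length < 1) {
            throw new UDFArgumentLengthException("upper requires at least 1 argument");
        }
    }

    // Proposed check: exactly one argument.
    static void newCheck(Object[] arguments) throws UDFArgumentLengthException {
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException(
                "upper requires 1 argument, got " + arguments.length);
        }
    }
}
```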





[jira] [Commented] (HIVE-9278) Cached expression feature broken in one case

2015-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276335#comment-14276335
 ] 

Ashutosh Chauhan commented on HIVE-9278:


Committed to 0.14 as well.

 Cached expression feature broken in one case
 

 Key: HIVE-9278
 URL: https://issues.apache.org/jira/browse/HIVE-9278
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Navis
Priority: Blocker
 Fix For: 0.15.0, 0.14.1

 Attachments: HIVE-9278.1.patch.txt


 Different query result depending on whether hive.cache.expr.evaluation is 
 true or false.  When true, no query results are produced (this is wrong).
 The q file:
 {noformat}
 set hive.cache.expr.evaluation=true;
 CREATE TABLE cache_expr_repro (date_str STRING);
 LOAD DATA LOCAL INPATH '../../data/files/cache_expr_repro.txt' INTO TABLE 
 cache_expr_repro;
 SELECT MONTH(date_str) AS `mon`, CAST((MONTH(date_str) - 1) / 3 + 1 AS int) 
 AS `quarter`,   YEAR(date_str) AS `year` FROM cache_expr_repro WHERE 
 ((CAST((MONTH(date_str) - 1) / 3 + 1 AS int) = 1) AND (YEAR(date_str) = 
 2015)) GROUP BY MONTH(date_str), CAST((MONTH(date_str) - 1) / 3 + 1 AS int),  
  YEAR(date_str) ;
 {noformat}
 cache_expr_repro.txt
 {noformat}
 2015-01-01 00:00:00
 2015-02-01 00:00:00
 2015-01-01 00:00:00
 2015-02-01 00:00:00
 2015-01-01 00:00:00
 2015-01-01 00:00:00
 2015-02-01 00:00:00
 2015-02-01 00:00:00
 2015-01-01 00:00:00
 2015-01-01 00:00:00
 {noformat}
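For reference, the quarter expression in the repro query can be checked in isolation. Hive's '/' is floating-point division and the CAST truncates, which plain integer division reproduces here (a sketch for sanity-checking the expected results, not part of the fix):

```java
// Quarter arithmetic from the query:
// CAST((MONTH(date_str) - 1) / 3 + 1 AS int)
// Every row in cache_expr_repro.txt is in Jan or Feb 2015, so all rows
// satisfy quarter = 1 and a correct run should produce output.
public class QuarterSketch {
    static int quarter(int month) {  // month is 1-based
        return (month - 1) / 3 + 1;
    }
}
```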





[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9366:
--
Attachment: HIVE-9366.3.patch

HIVE-9366.3.patch - fixed extra arguments in UDFDayOfMonth and UDFYear

 wrong date in description annotation in date_add() and date_sub() udf
 -

 Key: HIVE-9366
 URL: https://issues.apache.org/jira/browse/HIVE-9366
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch


 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 the last line shows '2009-31-07' but it should be '2009-07-31' instead;
 the @Description annotation needs to be fixed for both date_add() and
 date_sub()
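A compilable illustration of the corrected example text. Hive's real annotation is org.apache.hadoop.hive.ql.exec.Description; a minimal stand-in is declared here so the snippet is self-contained, and the value/extended strings are illustrative:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// The fix described above: the 'extended' example must read '2009-07-31',
// not '2009-31-07'.
public class DescriptionSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Description { String name(); String value(); String extended(); }

    @Description(name = "date_add",
        value = "_FUNC_(start_date, num_days) - Returns the date that is num_days after start_date",
        extended = "Example:\n  > SELECT _FUNC_('2009-07-30', 1) FROM src LIMIT 1;\n  '2009-07-31'")
    static class UDFDateAddSketch { }
}
```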





[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9366:
--
Status: In Progress  (was: Patch Available)

 wrong date in description annotation in date_add() and date_sub() udf
 -

 Key: HIVE-9366
 URL: https://issues.apache.org/jira/browse/HIVE-9366
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch


 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 the last line shows '2009-31-07' but it should be '2009-07-31' instead;
 the @Description annotation needs to be fixed for both date_add() and
 date_sub()





[jira] [Work started] (HIVE-9358) Create LAST_DAY UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-9358 started by Alexander Pivovarov.
-
 Create LAST_DAY UDF
 ---

 Key: HIVE-9358
 URL: https://issues.apache.org/jira/browse/HIVE-9358
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 LAST_DAY returns the date of the last day of the month that contains date:
 last_day('2015-01-14') = '2015-01-31'
 last_day('2016-02-01') = '2016-02-29'
 The last_day function comes from Oracle:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm
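The described semantics can be sketched with java.util.Calendar (an illustration only, not the actual HIVE-9358 implementation):

```java
import java.util.Calendar;

// LAST_DAY semantics as described above: the day number of the last day of
// the month containing the given date.
public class LastDaySketch {
    static int lastDay(int year, int month) {  // month is 1-based
        Calendar cal = Calendar.getInstance();
        cal.set(year, month - 1, 1);           // Calendar months are 0-based
        return cal.getActualMaximum(Calendar.DAY_OF_MONTH);
    }
}
```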





[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch]

2015-01-13 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9370:

Summary: Enable Hive on Spark for BigBench and run Query 8, the test failed 
[Spark Branch]  (was: Enable Hive on Spark for BigBench and run Query 8, the 
test failed )

 Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark 
 Branch]
 -

 Key: HIVE-9370
 URL: https://issues.apache.org/jira/browse/HIVE-9370
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: yuyun.chen

 Enabled Hive on Spark and ran BigBench Query 8, then got the following
 exception:
 2015-01-14 11:43:46,057 INFO  [main]: impl.RemoteSparkJobStatus 
 (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
 after 30s. Aborting it.
 2015-01-14 11:43:46,061 INFO  [main]: impl.RemoteSparkJobStatus 
 (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
 after 30s. Aborting it.
 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor 
 (SessionState.java:printError(839)) - Status: Failed
 2015-01-14 11:43:46,062 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob 
 start=1421206996052 end=1421207026062 duration=30010 
 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed 
 to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) - java.lang.InterruptedException
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native 
 Method)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 java.lang.Object.wait(Object.java:503)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1282)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1300)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1314)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.rdd.RDD.collect(RDD.scala:780)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.RangePartitioner.init(Partitioner.scala:124)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 

[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark Branch]

2015-01-13 Thread Chengxiang Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengxiang Li updated HIVE-9370:

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-7292

 Enable Hive on Spark for BigBench and run Query 8, the test failed [Spark 
 Branch]
 -

 Key: HIVE-9370
 URL: https://issues.apache.org/jira/browse/HIVE-9370
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: yuyun.chen

 Enabled Hive on Spark and ran BigBench Query 8, then got the following
 exception:
 2015-01-14 11:43:46,057 INFO  [main]: impl.RemoteSparkJobStatus 
 (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
 after 30s. Aborting it.
 2015-01-14 11:43:46,061 INFO  [main]: impl.RemoteSparkJobStatus 
 (RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
 after 30s. Aborting it.
 2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor 
 (SessionState.java:printError(839)) - Status: Failed
 2015-01-14 11:43:46,062 INFO  [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob 
 start=1421206996052 end=1421207026062 duration=30010 
 from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed 
 to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) - java.lang.InterruptedException
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native 
 Method)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 java.lang.Object.wait(Object.java:503)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514)
 2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1282)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1300)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1314)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.rdd.RDD.collect(RDD.scala:780)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.RangePartitioner.init(Partitioner.scala:124)
 2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 (SparkClientImpl.java:run(436)) -at 
 org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69)
 2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
 

[jira] [Comment Edited] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485
 ] 

Xuefu Zhang edited comment on HIVE-9178 at 1/14/15 5:10 AM:


The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects. Chengxiang's question might be a 
hint.


was (Author: xuefuz):
The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate
 API for RPCs, such as addJar and getExecutorCounter. These jobs differ from a
 query submission in that they don't need to be queued in the backend and can
 be executed right away.
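A hypothetical sketch of that split, with queued job submission on one path and immediate control RPCs on the other (all names here are illustrative, not the actual SparkClient API):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative shape of the proposed split: submitJob() goes through the
// backend queue, while control calls like addJar()/getExecutorCounter()
// take effect immediately.
public class RpcSplitSketch {
    private final Queue<Runnable> jobQueue = new ArrayDeque<>();
    private int jarCount = 0;

    public void submitJob(Runnable job) { jobQueue.add(job); }  // queued
    public void addJar(String path) { jarCount++; }             // immediate
    public int getExecutorCounter() { return 2; }               // immediate (dummy value)
    public int pendingJobs() { return jobQueue.size(); }
    public int jarCount() { return jarCount; }
}
```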





[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276485#comment-14276485
 ] 

Xuefu Zhang commented on HIVE-9178:
---

The dummy patch produces 3 failures and the run takes 1 hour 48 minutes. It 
seems likely that the patch has some defects.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate
 API for RPCs, such as addJar and getExecutorCounter. These jobs differ from a
 query submission in that they don't need to be queued in the backend and can
 be executed right away.





[jira] [Commented] (HIVE-9336) Fix Hive throws ParseException while handling Grouping-Sets clauses

2015-01-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276491#comment-14276491
 ] 

Hive QA commented on HIVE-9336:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691869/HIVE-9336.1.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7311 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_index_bitmap_auto
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2356/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2356/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2356/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691869 - PreCommit-HIVE-TRUNK-Build

 Fix Hive throws ParseException while handling Grouping-Sets clauses
 ---

 Key: HIVE-9336
 URL: https://issues.apache.org/jira/browse/HIVE-9336
 Project: Hive
  Issue Type: Bug
  Components: Parser
Affects Versions: 0.13.1
Reporter: zhaohm3
 Fix For: 0.14.0

 Attachments: Fix-Hive-ParseException-of-Grouping-Sets.htm, 
 HIVE-9336.1.patch


 Currently, when Hive parses GROUPING SETS clauses, if an expression is 
 composed of two or more common subexpressions, its first element can only be 
 a simple identifier without any qualification; otherwise Hive throws a 
 ParseException during the parse stage. For example, Hive throws a 
 ParseException while parsing the following HQL:
 drop table test;
 create table test(tc1 int, tc2 int, tc3 int);
 
 explain select test.tc1, test.tc2 from test group by test.tc1, test.tc2 
 grouping sets(test.tc1, (test.tc1, test.tc2));
 explain select tc1+tc2, tc2 from test group by tc1+tc2, tc2 grouping 
 sets(tc2, (tc1 + tc2, tc2));
 
 drop table test;
 The following shows part of the ParseException stacktrace:
 2015-01-07 09:53:34,718 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=Driver.run 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,719 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=TimeToSubmit 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,721 INFO [main]: ql.Driver 
 (Driver.java:checkConcurrency(158)) - Concurrency mode is disabled, not 
 creating a lock manager
 2015-01-07 09:53:34,721 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=compile 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: log.PerfLogger 
 (PerfLogger.java:PerfLogBegin(108)) - PERFLOG method=parse 
 from=org.apache.hadoop.hive.ql.Driver
 2015-01-07 09:53:34,724 INFO [main]: parse.ParseDriver 
 (ParseDriver.java:parse(185)) - Parsing command: explain select test.tc1, 
 test.tc2 from test group by test.tc1, test.tc2 grouping sets(test.tc1, 
 (test.tc1, test.tc2))
 2015-01-07 09:53:34,734 ERROR [main]: ql.Driver 
 (SessionState.java:printError(545)) - FAILED: ParseException line 1:105 
 missing ) at ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 org.apache.hadoop.hive.ql.parse.ParseException: line 1:105 missing ) at 
 ',' near 'EOF'
 line 1:116 extraneous input ')' expecting EOF near 'EOF'
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:210)
 at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
 at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
 at 

[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9358:
--
Attachment: HIVE-9358.1.patch

 Create LAST_DAY UDF
 ---

 Key: HIVE-9358
 URL: https://issues.apache.org/jira/browse/HIVE-9358
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9358.1.patch


 LAST_DAY returns the date of the last day of the month that contains date:
 last_day('2015-01-14') = '2015-01-31'
 last_day('2016-02-01') = '2016-02-29'
 The last_day function comes from Oracle:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm
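The semantics described above can be sketched with java.time (an illustrative model only, not the actual Hive GenericUDF code from the patch):

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class LastDaySketch {
    // Last day of the month containing d, mirroring Oracle's LAST_DAY:
    // last_day('2015-01-14') = '2015-01-31', last_day('2016-02-01') = '2016-02-29'.
    public static LocalDate lastDay(LocalDate d) {
        return d.with(TemporalAdjusters.lastDayOfMonth());
    }
}
```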



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9358:
--
Status: Patch Available  (was: In Progress)

 Create LAST_DAY UDF
 ---

 Key: HIVE-9358
 URL: https://issues.apache.org/jira/browse/HIVE-9358
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9358.1.patch


 LAST_DAY returns the date of the last day of the month that contains date:
 last_day('2015-01-14') = '2015-01-31'
 last_day('2016-02-01') = '2016-02-29'
 The last_day function comes from Oracle:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm





[jira] [Commented] (HIVE-9358) Create LAST_DAY UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276498#comment-14276498
 ] 

Alexander Pivovarov commented on HIVE-9358:
---

Review Board Request https://reviews.apache.org/r/29877/

 Create LAST_DAY UDF
 ---

 Key: HIVE-9358
 URL: https://issues.apache.org/jira/browse/HIVE-9358
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9358.1.patch


 LAST_DAY returns the date of the last day of the month that contains date:
 last_day('2015-01-14') = '2015-01-31'
 last_day('2016-02-01') = '2016-02-29'
 The last_day function comes from Oracle:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm





[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive

2015-01-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276414#comment-14276414
 ] 

Rui Li commented on HIVE-9367:
--

Hi [~jxiang], could you elaborate a little on how this avoids the expensive 
calls? It seems we still have to iterate over all the file statuses to check 
whether each one is a directory.

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.





[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Chengxiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276425#comment-14276425
 ] 

Chengxiang Li commented on HIVE-9178:
-

[~vanzin], how do we send the SyncJobRequest result back to the SparkClient? I 
don't see any related code in RemoteDriver.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.





[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276449#comment-14276449
 ] 

Alexander Pivovarov commented on HIVE-9357:
---

The add_months function comes from Oracle:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions004.htm
Several ADD_MONTHS examples are shown below:
{code}
select to_date('14-JAN-2014') from_date, 1 months, add_months('14-JAN-2014', 1) 
res from dual union all
select to_date('31-JAN-2014') from_date, 1 months, add_months('31-JAN-2014', 1) 
res from dual union all
select to_date('28-FEB-2014') from_date, -1 months, add_months('28-FEB-2014', 
-1) res from dual union all
select to_date('28-FEB-2014') from_date, 2 months, add_months('28-FEB-2014', 2) 
res from dual union all
select to_date('30-APR-2014') from_date, -2 months, add_months('30-APR-2014', 
-2) res from dual union all
select to_date('28-FEB-2015') from_date, 12 months, add_months('28-FEB-2015', 
12) res from dual union all
select to_date('29-FEB-2016') from_date, -12 months, add_months('29-FEB-2016', 
-12) res from dual union all
select to_date('29-JAN-2016') from_date, 1 months, add_months('29-JAN-2016', 1) 
res from dual union all
select to_date('29-FEB-2016') from_date, -1 months, add_months('29-FEB-2016', 
-1) res from dual;


from_date   months  res
2014-01-14       1  2014-02-14
2014-01-31       1  2014-02-28
2014-02-28      -1  2014-01-31
2014-02-28       2  2014-04-30
2014-04-30      -2  2014-02-28
2015-02-28      12  2016-02-29
2016-02-29     -12  2015-02-28
2016-01-29       1  2016-02-29
2016-02-29      -1  2016-01-31
{code}
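The end-of-month behavior in the examples above can be sketched in Java (an illustrative model of Oracle's documented semantics, not the patch's actual UDF code): when the input date is the last day of its month, the result snaps to the last day of the target month; otherwise the day-of-month is kept, clamped to the target month's length.

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class AddMonthsSketch {
    public static LocalDate addMonths(LocalDate d, int n) {
        // plusMonths already clamps the day-of-month (2014-01-31 + 1 -> 2014-02-28)
        LocalDate shifted = d.plusMonths(n);
        // Oracle additionally snaps to end-of-month when the input was the
        // last day of its month (2014-02-28 + 2 -> 2014-04-30, not 2014-04-28)
        if (d.getDayOfMonth() == d.lengthOfMonth()) {
            return shifted.with(TemporalAdjusters.lastDayOfMonth());
        }
        return shifted;
    }
}
```

This reproduces every row of the table above, including the leap-year cases.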

The add_months function is used in many BI projects, especially in financial 
applications (e.g. to determine the end date of a 36-month loan). In my 
experience, most BI projects were implemented on Oracle, and lots of them are 
now migrating to Hive, so many projects depend on the business logic of 
Oracle's add_months function.

As a separate activity, the existing Hive UDF date_add could be improved to 
match the MySQL implementation (if somebody needs it).

I believe an add_months UDF brings a lot of business value to Hive, because 
many companies want an easier migration from Oracle to Hive, and Oracle is 
used in most enterprise big-data companies.

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





[jira] [Created] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed

2015-01-13 Thread yuyun.chen (JIRA)
yuyun.chen created HIVE-9370:


 Summary: Enable Hive on Spark for BigBench and run Query 8, the 
test failed 
 Key: HIVE-9370
 URL: https://issues.apache.org/jira/browse/HIVE-9370
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: yuyun.chen


2015-01-14 11:43:46,057 INFO  [main]: impl.RemoteSparkJobStatus 
(RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
after 30s. Aborting it.
2015-01-14 11:43:46,061 INFO  [main]: impl.RemoteSparkJobStatus 
(RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
after 30s. Aborting it.
2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor 
(SessionState.java:printError(839)) - Status: Failed
2015-01-14 11:43:46,062 INFO  [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob 
start=1421206996052 end=1421207026062 duration=30010 
from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed 
to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) - java.lang.InterruptedException
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
java.lang.Object.wait(Object.java:503)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1282)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1300)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1314)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.rdd.RDD.collect(RDD.scala:780)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.RangePartitioner.<init>(Partitioner.scala:124)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:223)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:298)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 

[jira] [Updated] (HIVE-9370) Enable Hive on Spark for BigBench and run Query 8, the test failed

2015-01-13 Thread yuyun.chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuyun.chen updated HIVE-9370:
-
Description: 
Enabled Hive on Spark and ran BigBench Query 8, then got the following exception:

2015-01-14 11:43:46,057 INFO  [main]: impl.RemoteSparkJobStatus 
(RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
after 30s. Aborting it.
2015-01-14 11:43:46,061 INFO  [main]: impl.RemoteSparkJobStatus 
(RemoteSparkJobStatus.java:getSparkJobInfo(143)) - Job hasn't been submitted 
after 30s. Aborting it.
2015-01-14 11:43:46,061 ERROR [main]: status.SparkJobMonitor 
(SessionState.java:printError(839)) - Status: Failed
2015-01-14 11:43:46,062 INFO  [main]: log.PerfLogger 
(PerfLogger.java:PerfLogEnd(148)) - /PERFLOG method=SparkRunJob 
start=1421206996052 end=1421207026062 duration=30010 
from=org.apache.hadoop.hive.ql.exec.spark.status.SparkJobMonitor
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) - 15/01/14 11:43:46 INFO RemoteDriver: Failed 
to run job 0a9a7782-0e0b-4561-8468-959a6d8df0a3
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) - java.lang.InterruptedException
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at java.lang.Object.wait(Native Method)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
java.lang.Object.wait(Object.java:503)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:514)
2015-01-14 11:43:46,071 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1282)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1300)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1314)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.rdd.RDD.collect(RDD.scala:780)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:262)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.RangePartitioner.<init>(Partitioner.scala:124)
2015-01-14 11:43:46,072 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:63)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:894)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.spark.api.java.JavaPairRDD.sortByKey(JavaPairRDD.scala:864)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.SortByShuffler.shuffle(SortByShuffler.java:48)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.ShuffleTran.transform(ShuffleTran.java:45)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.SparkPlan.generateGraph(SparkPlan.java:69)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:223)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 
org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:298)
2015-01-14 11:43:46,073 INFO  [stderr-redir-1]: client.SparkClientImpl 
(SparkClientImpl.java:run(436)) -at 

[jira] [Commented] (HIVE-9196) MetaStoreDirectSql.getTableStats may need to call doDbSpecificInitializationsBeforeQuery

2015-01-13 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276419#comment-14276419
 ] 

Binglin Chang commented on HIVE-9196:
-

Hi [~sershe], do my last comments make sense? If not, I can update the patch 
to remove the 2nd call. 


 MetaStoreDirectSql.getTableStats may need to call 
 doDbSpecificInitializationsBeforeQuery 
 -

 Key: HIVE-9196
 URL: https://issues.apache.org/jira/browse/HIVE-9196
 Project: Hive
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
Priority: Minor
 Attachments: HIVE-9196.001.patch


 Our Hive metastore server sometimes prints logs like this:
 {noformat}
 2014-12-17 07:03:22,415 ERROR [pool-3-thread-154]: metastore.ObjectStore 
 (ObjectStore.java:handleDirectSqlError(2248)) - Direct SQL failed, falling 
 back to ORM
 javax.jdo.JDODataStoreException: Error executing SQL query select 
 "COLUMN_NAME", "COLUMN_TYPE", "LONG_LOW_VALUE", "LONG_HIGH_VALUE", 
 "DOUBLE_LOW_VALUE", "DOUBLE_HIGH_VALUE", "BIG_DECIMAL_LOW_VALUE", 
 "BIG_DECIMAL_HIGH_VALUE", "NUM_NULLS", "NUM_DISTINCTS", "AVG_COL_LEN", 
 "MAX_COL_LEN", "NUM_TRUES", "NUM_FALSES", "LAST_ANALYZED" from 
 "TAB_COL_STATS" where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME" 
 in (?).
 at 
 org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
 at 
 org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321)
 at 
 org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getTableStats(MetaStoreDirectSql.java:879)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:5754)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$5.getSqlResult(ObjectStore.java:5751)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2213)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:5751)
 at 
 org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:5745)
 at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
 at $Proxy8.getTableColumnStatistics(Unknown Source)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_column_statistics(HiveMetaStore.java:3552)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_column_statistics.getResult(ThriftHiveMetastore.java:9468)
 at 
 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table_column_statistics.getResult(ThriftHiveMetastore.java:9452)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:666)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge20S.java:662)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1589)
 at 
 org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge20S$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge20S.java:662)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 NestedThrowablesStackTrace:
 com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error 
 in your SQL syntax; check the manual that corresponds to your MySQL server 
 version for the right syntax to use near '"TAB_COL_STATS" where "DB_NAME" = 
 'user_profile' and "TABLE_NAME" = 'md5_device' at line 1
 at sun.reflect.GeneratedConstructorAccessor103.newInstance(Unknown 
 Source)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at com.mysql.jdbc.Util.handleNewInstance(Util.java:377)
 at com.mysql.jdbc.Util.getInstance(Util.java:360)
 at 

[jira] [Commented] (HIVE-5412) HivePreparedStatement.setDate not implemented

2015-01-13 Thread Matt Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276420#comment-14276420
 ] 

Matt Burgess commented on HIVE-5412:


What about:

{code}
public void setDate(int parameterIndex, Date x, Calendar cal) throws SQLException {
    // TODO Auto-generated method stub
    throw new SQLException("Method not supported");
}
{code}
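For contrast, one way setDate could be implemented instead of throwing is sketched below. This is a hypothetical simplification: the parameter-map field and splicing approach are assumptions for illustration, not the actual HivePreparedStatement code.

```java
import java.sql.Date;
import java.util.HashMap;

public class SetDateSketch {
    // Hypothetical parameter map, keyed by the JDBC parameter position.
    private final HashMap<Integer, String> parameters = new HashMap<>();

    // Store the date as a quoted SQL literal, to be spliced into the query
    // text the same way the other typed setters store their values.
    public void setDate(int parameterIndex, Date x) {
        parameters.put(parameterIndex, "'" + x.toString() + "'");
    }

    public String getParameter(int parameterIndex) {
        return parameters.get(parameterIndex);
    }
}
```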

 HivePreparedStatement.setDate not implemented
 -

 Key: HIVE-5412
 URL: https://issues.apache.org/jira/browse/HIVE-5412
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.12.0
Reporter: Alan Gates
 Fix For: 0.13.0


 The DATE type was added in Hive 0.12, but the HivePreparedStatement.setDate 
 method was not implemented.





[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive

2015-01-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276435#comment-14276435
 ] 

Jimmy Xiang commented on HIVE-9367:
---

With the FileStatus in hand, we don't need to go to the NameNode to fetch it 
again, since the FileStatus already tells us whether the path is a file or a 
directory. Originally, getDirIndices fetched the FileStatus again, an extra 
call for each file, so this patch saves one FileStatus lookup per file.
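The optimization can be sketched generically (the Status class below is a simplified stand-in for Hadoop's FileStatus, not real Hive code): reuse the directory flag already cached in each status object instead of issuing another NameNode lookup per path.

```java
import java.util.ArrayList;
import java.util.List;

public class DirIndexSketch {
    // Simplified stand-in for Hadoop's FileStatus: the directory flag is
    // cached in the object, so checking it costs no NameNode round trip.
    static class Status {
        final String path;
        final boolean dir;
        Status(String path, boolean dir) { this.path = path; this.dir = dir; }
        boolean isDirectory() { return dir; }
    }

    // Collect indices of directories from statuses we already hold,
    // rather than re-fetching each path's status from the NameNode.
    static List<Integer> dirIndices(Status[] statuses) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < statuses.length; i++) {
            if (statuses[i].isDirectory()) idx.add(i);
        }
        return idx;
    }
}
```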

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.





[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276443#comment-14276443
 ] 

Hive QA commented on HIVE-9178:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12692117/HIVE-9178.2-spark.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 7307 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/640/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/640/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12692117 - PreCommit-HIVE-SPARK-Build

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.





[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive

2015-01-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276444#comment-14276444
 ] 

Rui Li commented on HIVE-9367:
--

I see. Thanks [~jxiang] for the explanation!

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.





[jira] [Updated] (HIVE-9358) Create LAST_DAY UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9358:
--
Description: 
LAST_DAY returns the date of the last day of the month that contains date:
last_day('2015-01-14') = '2015-01-31'
last_day('2016-02-01') = '2016-02-29'

The last_day function comes from Oracle:
http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm

  was:
LAST_DAY returns the date of the last day of the month that contains date:
last_day('2015-01-14') = '2015-01-31'
last_day('2016-02-01') = '2016-02-29'


 Create LAST_DAY UDF
 ---

 Key: HIVE-9358
 URL: https://issues.apache.org/jira/browse/HIVE-9358
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 LAST_DAY returns the date of the last day of the month that contains date:
 last_day('2015-01-14') = '2015-01-31'
 last_day('2016-02-01') = '2016-02-29'
 The last_day function comes from Oracle:
 http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions072.htm





[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276514#comment-14276514
 ] 

Marcelo Vanzin commented on HIVE-9178:
--

[~chengxiang li] ah, good catch. This method:

{code}
private void handle(ChannelHandlerContext ctx, SyncJobRequest msg) throws 
Exception {
{code}

Should actually be returning the result of the RPC instead of void. I'll update 
the patch tomorrow and add a unit test (d'oh).
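The change being described can be sketched with hypothetical names (the real code is Netty-based and not shown here): a handler that returns its result gives the dispatcher something to write back over the channel, whereas a void handler silently drops the reply.

```java
import java.util.ArrayList;
import java.util.List;

public class SyncRpcSketch {
    // Hypothetical stand-in for the RPC channel: replies queued to the client.
    final List<String> replies = new ArrayList<>();

    // Returning the result (instead of void) lets the dispatcher send it back.
    String handleSyncJobRequest(String request) {
        String result = "result-of-" + request;
        replies.add(result);  // written back to the SparkClient
        return result;
    }
}
```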

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch, 
 HIVE-9178.2-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate 
 API for RPCs, such as addJar and getExecutorCounter. These jobs are different 
 from a query submission in that they don't need to be queued in the backend 
 and can be executed right away.





[jira] [Commented] (HIVE-9367) CombineFileInputFormatShim#getDirIndices is expensive

2015-01-13 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276527#comment-14276527
 ] 

Rui Li commented on HIVE-9367:
--

I just verified the patch here can reduce the getSplits time from 1s to less 
than 200ms. The test table consists of one 100GB sequence file.

 CombineFileInputFormatShim#getDirIndices is expensive
 -

 Key: HIVE-9367
 URL: https://issues.apache.org/jira/browse/HIVE-9367
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: HIVE-9367.1.patch


 [~lirui] found out that we spent quite some time on 
 CombineFileInputFormatShim#getDirIndices. Looked into it and it seems to me 
 we should be able to get rid of this method completely if we can enhance 
 CombineFileInputFormatShim a little.





[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276532#comment-14276532
 ] 

Szehon Ho commented on HIVE-9360:
-

Unless anyone objects, I'm going to commit this in a bit without waiting for 
HiveQA, as a bunch of other HiveQA runs today already have this error, and this 
is a test-only fix that shouldn't affect anything else.

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 and it occurs for each of the tests.





[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276537#comment-14276537
 ] 

Brock Noland commented on HIVE-9360:


Makes sense; just double-check the patch locally.

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 and it occurs for each of the tests.





[jira] [Resolved] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho resolved HIVE-9360.
-
   Resolution: Fixed
Fix Version/s: 0.15.0
 Assignee: Szehon Ho

Double-checked by running the test again (had run before). 
Committed to trunk, thanks for review!

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Fix For: 0.15.0

 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 and it occurs for each of the tests.





[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9248:
---
Attachment: HIVE-9248.04.patch

 Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is 
 Hash mode
 ---

 Key: HIVE-9248
 URL: https://issues.apache.org/jira/browse/HIVE-9248
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, 
 HIVE-9248.03.patch, HIVE-9248.04.patch


 Under Tez with vectorization, ReduceWork is not getting vectorized unless its 
 GROUP BY operator is in MergePartial mode. Add the valid cases where GROUP BY 
 is in Hash mode (and presumably there are downstream reducers that will do the 
 MergePartial).





[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9248:
---
Status: In Progress  (was: Patch Available)

 Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is 
 Hash mode
 ---

 Key: HIVE-9248
 URL: https://issues.apache.org/jira/browse/HIVE-9248
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, 
 HIVE-9248.03.patch, HIVE-9248.04.patch


 Under Tez with vectorization, ReduceWork is not getting vectorized unless its 
 GROUP BY operator is in MergePartial mode. Add the valid cases where GROUP BY 
 is in Hash mode (and presumably there are downstream reducers that will do the 
 MergePartial).





[jira] [Updated] (HIVE-9248) Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is Hash mode

2015-01-13 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9248:
---
Status: Patch Available  (was: In Progress)

 Vectorization : Tez Reduce vertex not getting vectorized when GROUP BY is 
 Hash mode
 ---

 Key: HIVE-9248
 URL: https://issues.apache.org/jira/browse/HIVE-9248
 Project: Hive
  Issue Type: Bug
  Components: Tez, Vectorization
Affects Versions: 0.14.0
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9248.01.patch, HIVE-9248.02.patch, 
 HIVE-9248.03.patch, HIVE-9248.04.patch


 Under Tez with vectorization, ReduceWork is not getting vectorized unless its 
 GROUP BY operator is in MergePartial mode. Add the valid cases where GROUP BY 
 is in Hash mode (and presumably there are downstream reducers that will do the 
 MergePartial).





[jira] [Commented] (HIVE-9309) schematool fails on Postgres 8.1

2015-01-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275566#comment-14275566
 ] 

Brock Noland commented on HIVE-9309:


Yes I think so... [~mohitsabharwal]?

 schematool fails on Postgres 8.1
 

 Key: HIVE-9309
 URL: https://issues.apache.org/jira/browse/HIVE-9309
 Project: Hive
  Issue Type: Bug
  Components: Database/Schema
Reporter: Mohit Sabharwal
Assignee: Mohit Sabharwal
 Fix For: 0.15.0

 Attachments: HIVE-9309.patch


 Postgres upgrade scripts set {{standard_conforming_strings}}, which is not 
 allowed in 8.1:
 {code}
 ERROR: parameter standard_conforming_strings cannot be changed 
 (state=55P02,code=0)
 {code}
 The Postgres [8.1 release 
 notes|http://www.postgresql.org/docs/8.2/static/release-8-1.html] say that the 
 standard_conforming_strings value is read-only; the [8.2 
 notes|http://www.postgresql.org/docs/8.2/static/release-8-2.html] say that it 
 can be set at runtime.
 It'd be nice to address this for those still using Postgres 8.1.
 This patch provides a schemaTool db option postgres.filter.81 which, if 
 set, filters out the standard_conforming_strings statement from upgrade 
 scripts.
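 The filtering idea can be sketched like this (the method name is hypothetical; 
 the real option wiring lives in schemaTool itself):

```java
public class Postgres81FilterDemo {
    // Drop the standard_conforming_strings statement, which Postgres 8.1
    // rejects because the parameter is read-only in that release.
    static String filterForPostgres81(String script) {
        StringBuilder out = new StringBuilder();
        for (String line : script.split("\n")) {
            if (!line.toLowerCase().contains("standard_conforming_strings")) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String script = "SET standard_conforming_strings = on;\n"
                      + "CREATE TABLE t (a int);";
        System.out.print(filterForPostgres81(script)); // only the CREATE TABLE survives
    }
}
```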





[jira] [Commented] (HIVE-9320) Add UnionEliminatorRule on cbo path

2015-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275601#comment-14275601
 ] 

Ashutosh Chauhan commented on HIVE-9320:


[~jpullokkaran] Can you take a look at this small patch?

 Add UnionEliminatorRule on cbo path
 ---

 Key: HIVE-9320
 URL: https://issues.apache.org/jira/browse/HIVE-9320
 Project: Hive
  Issue Type: Improvement
  Components: CBO, Logical Optimizer
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-9320.patch


 Shorten the pipeline, where possible.





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Attachment: HIVE-9194.04.patch

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Open  (was: Patch Available)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 29832: HIVE-9178. Add a synchronous RPC API to the remote Spark context.

2015-01-13 Thread Marcelo Vanzin


 On Jan. 13, 2015, 6:47 a.m., chengxiang li wrote:
  spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java, 
  line 55
  https://reviews.apache.org/r/29832/diff/1/?file=818434#file818434line55
 
  At the API level, it's still an asynchronous RPC API. Given the use case of 
  this API described in the javadoc, do you think it would be cleaner to 
  supply a synchronous API like T run(Job<T> job)?

No. With a client-side synchronous API, it's awkward to specify things like 
timeouts - you either need explicit parameters which are not really part of the 
RPC, or extra configuration. Here, you just say `client.run().get(someTimeout)` 
if you want the call to be synchronous on the client side.
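To illustrate the point with plain JDK types (an ExecutorService standing in for the remote client, a Future standing in for the JobHandle; not the actual spark-client API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AsyncHandleDemo {
    // Submit asynchronously, then block with a caller-chosen timeout.
    // The RPC layer needs no timeout parameters or extra configuration.
    static int runSync(ExecutorService client, int timeoutSeconds) throws Exception {
        Future<Integer> handle = client.submit(() -> 21 * 2); // async submit
        return handle.get(timeoutSeconds, TimeUnit.SECONDS);  // sync on the client side
    }

    public static void main(String[] args) throws Exception {
        ExecutorService client = Executors.newSingleThreadExecutor();
        System.out.println(runSync(client, 30)); // 42
        client.shutdown();
    }
}
```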


- Marcelo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29832/#review67813
---


On Jan. 13, 2015, 12:31 a.m., Marcelo Vanzin wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29832/
 ---
 
 (Updated Jan. 13, 2015, 12:31 a.m.)
 
 
 Review request for hive, Brock Noland, chengxiang li, and Xuefu Zhang.
 
 
 Bugs: HIVE-9178
 https://issues.apache.org/jira/browse/HIVE-9178
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 HIVE-9178. Add a synchronous RPC API to the remote Spark context.
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/RemoteSparkJobStatus.java
  5c3ca018bb177ef9fd9fb24b054a9db29274b31e 
   spark-client/src/main/java/org/apache/hive/spark/client/BaseProtocol.java 
 f9c10b196ab47b5b4f4c0126ad455869ab68f0ca 
   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java 
 0d49ed3d9e33ca08d6a7526c1c434a0dd0a06a67 
   spark-client/src/main/java/org/apache/hive/spark/client/SparkClient.java 
 5e767ef5eb47e493a332607204f4c522028d7d0e 
   
 spark-client/src/main/java/org/apache/hive/spark/client/SparkClientImpl.java 
 f8b2202a465bb8abe3d2c34e49ade6387482738c 
 
 Diff: https://reviews.apache.org/r/29832/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Marcelo Vanzin
 




[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Attachment: (was: HIVE-9194.04.patch)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Attachment: HIVE-9194.04.patch

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Patch Available  (was: Open)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Open  (was: Patch Available)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-3405) UDF initcap to obtain a string with the first letter of each word in uppercase other letters in lowercase

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-3405:
--
Release Note:   (was: Initcap method tested.please verify)

 UDF initcap to obtain a string with the first letter of each word in 
 uppercase other letters in lowercase
 -

 Key: HIVE-3405
 URL: https://issues.apache.org/jira/browse/HIVE-3405
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.11.0, 0.13.0, 0.14.0, 
 0.15.0, 0.14.1
Reporter: Archana Nair
Assignee: Alexander Pivovarov
  Labels: TODOC15, patch
 Fix For: 0.15.0

 Attachments: HIVE-3405.1.patch.txt, HIVE-3405.2.patch, 
 HIVE-3405.3.patch, HIVE-3405.4.patch, HIVE-3405.5.patch, HIVE-3405.5.patch


 Current Hive releases lack an INITCAP function. INITCAP returns a String with 
 the first letter of each word in uppercase and all other letters in lowercase; 
 words are delimited by whitespace. This will be useful for report generation.
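 A minimal sketch of those semantics (illustrative names only, not the actual 
 GenericUDF implementation from the patch):

```java
public class InitcapDemo {
    // Uppercase the first letter of each whitespace-delimited word,
    // lowercase everything else.
    static String initcap(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean startOfWord = true;
        for (char c : s.toCharArray()) {
            if (Character.isWhitespace(c)) {
                startOfWord = true;
                sb.append(c);
            } else {
                sb.append(startOfWord ? Character.toUpperCase(c)
                                      : Character.toLowerCase(c));
                startOfWord = false;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(initcap("hIVE rocKS")); // Hive Rocks
    }
}
```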





Re: Review Request 29671: Support select distinct *

2015-01-13 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29671/
---

(Updated Jan. 13, 2015, 6:04 p.m.)


Review request for hive and John Pullokkaran.


Changes
---

remove spaces


Repository: hive-git


Description
---

Support select distinct * in the operator generation phase.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 917b3a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3534551 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 9ad6714 
  ql/src/test/queries/clientnegative/selectDistinctStarNeg_1.q PRE-CREATION 
  ql/src/test/queries/clientnegative/selectDistinctStarNeg_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/selectDistinctStar.q PRE-CREATION 
  ql/src/test/results/clientnegative/selectDistinctStarNeg_1.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/selectDistinctStarNeg_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/selectDistinctStar.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/tez/selectDistinctStar.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/29671/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Patch Available  (was: Open)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Open  (was: Patch Available)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Attachment: HIVE-9194.04.patch

remove spaces

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables

2015-01-13 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9255:
--
Status: Patch Available  (was: Open)

 Fastpath for limited fetches from unpartitioned tables
 --

 Key: HIVE-9255
 URL: https://issues.apache.org/jira/browse/HIVE-9255
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0, 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch


 Currently, for flat tables, the threshold check applies to a query 
 like {{select * from lineitem limit 1;}}.
 This is not necessary: without a filter clause, the query can be executed 
 entirely via FetchTask, so running a cluster task is redundant.
 This fastpath already applies to partitioned tables.





[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables

2015-01-13 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9255:
--
Attachment: HIVE-9255.2.patch

Regenerate q.out for the select cast(...) ... limit 10; case without the 
filter.

 Fastpath for limited fetches from unpartitioned tables
 --

 Key: HIVE-9255
 URL: https://issues.apache.org/jira/browse/HIVE-9255
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0, 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch


 Currently, for flat tables, the threshold check applies to a query 
 like {{select * from lineitem limit 1;}}.
 This is not necessary: without a filter clause, the query can be executed 
 entirely via FetchTask, so running a cluster task is redundant.
 This fastpath already applies to partitioned tables.





[jira] [Updated] (HIVE-9255) Fastpath for limited fetches from unpartitioned tables

2015-01-13 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-9255:
--
Status: Open  (was: Patch Available)

 Fastpath for limited fetches from unpartitioned tables
 --

 Key: HIVE-9255
 URL: https://issues.apache.org/jira/browse/HIVE-9255
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0, 0.15.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-9255.1.patch, HIVE-9255.2.patch


 Currently, for flat tables, the threshold check applies to a query 
 like {{select * from lineitem limit 1;}}.
 This is not necessary: without a filter clause, the query can be executed 
 entirely via FetchTask, so running a cluster task is redundant.
 This fastpath already applies to partitioned tables.





[jira] [Updated] (HIVE-7550) Extend cached evaluation to multiple expressions

2015-01-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7550:
---
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

 Extend cached evaluation to multiple expressions
 

 Key: HIVE-7550
 URL: https://issues.apache.org/jira/browse/HIVE-7550
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.15.0

 Attachments: HIVE-7550.1.patch.txt, HIVE-7550.2.patch.txt


 Currently, hive.cache.expr.evaluation caches per expression. But cache 
 context might be shared for multiple expressions. 





Re: Review Request 29671: Support select distinct *

2015-01-13 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29671/
---

(Updated Jan. 13, 2015, 6:07 p.m.)


Review request for hive and John Pullokkaran.


Repository: hive-git


Description
---

Support select distinct * in the operator generation phase.


Diffs (updated)
-

  itests/src/test/resources/testconfiguration.properties 917b3a4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 3534551 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java cea86df 
  ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 9ad6714 
  ql/src/test/queries/clientnegative/selectDistinctStarNeg_1.q PRE-CREATION 
  ql/src/test/queries/clientnegative/selectDistinctStarNeg_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/selectDistinctStar.q PRE-CREATION 
  ql/src/test/results/clientnegative/selectDistinctStarNeg_1.q.out PRE-CREATION 
  ql/src/test/results/clientnegative/selectDistinctStarNeg_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/selectDistinctStar.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/tez/selectDistinctStar.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/29671/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Attachment: (was: HIVE-9194.04.patch)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Commented] (HIVE-9344) Fix flaky test optimize_nullscan

2015-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275677#comment-14275677
 ] 

Ashutosh Chauhan commented on HIVE-9344:


+1

 Fix flaky test optimize_nullscan
 

 Key: HIVE-9344
 URL: https://issues.apache.org/jira/browse/HIVE-9344
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Navis
 Attachments: HIVE-9344.1.patch.txt


 The optimize_nullscan test is extremely flaky. We need to find a way to fix 
 this test.





[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276570#comment-14276570
 ] 

Alexander Pivovarov commented on HIVE-9357:
---

According to Quora, ADD_MONTHS is the #1 feature gap in Hive:
http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL

When people say ADD_MONTHS, they mean the Oracle ADD_MONTHS implementation logic.
I searched for the ADD_MONTHS function on Google - the first page (first 10 
results) mentions Oracle's add_months:
https://www.google.com/#q=add_months+function

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276571#comment-14276571
 ] 

Alexander Pivovarov commented on HIVE-9357:
---

From the Oracle reference on add_months:
{quote}
If date is the last day of the month or if the resulting month has fewer days 
than the day component of date, then the result is the last day of the 
resulting month. Otherwise, the result has the same day component as date.
{quote}
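Those semantics can be sketched with java.time (illustrative names, not the actual UDF from the patch); note that plusMonths() alone only clamps, so the last-day snap has to be added explicitly:

```java
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

public class AddMonthsDemo {
    // Oracle-style ADD_MONTHS: if the input is the last day of its month,
    // the result is the last day of the resulting month.
    static String addMonths(String date, int n) {
        LocalDate d = LocalDate.parse(date);
        boolean inputIsLastDay = d.getDayOfMonth() == d.lengthOfMonth();
        LocalDate r = d.plusMonths(n); // clamps when the target month is shorter
        if (inputIsLastDay) {
            r = r.with(TemporalAdjusters.lastDayOfMonth()); // the extra Oracle rule
        }
        return r.toString();
    }

    public static void main(String[] args) {
        System.out.println(addMonths("2015-01-14", 1));  // 2015-02-14
        System.out.println(addMonths("2015-01-31", 1));  // 2015-02-28
        System.out.println(addMonths("2015-02-28", 2));  // 2015-04-30
        System.out.println(addMonths("2015-02-28", 12)); // 2016-02-29
    }
}
```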

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





[jira] [Commented] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276579#comment-14276579
 ] 

Brock Noland commented on HIVE-9235:


Can you describe the issue you see?

 Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 -

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Title was: Make Parquet Vectorization of these data types work: DECIMAL, 
 DATE, TIMESTAMP, CHAR, and VARCHAR
 Support for doing vector column assign is missing for some data types.





[jira] [Updated] (HIVE-9356) Fail to handle the case that a qfile contains a semicolon in the annotation

2015-01-13 Thread Dong Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Chen updated HIVE-9356:

Attachment: HIVE-9356.1-encryption.patch

Thanks for your review, Sergio and Brock!

I found we have to handle the two different .q cases below in the current 
command parser implementation, so I updated the patch to V1.
{quote}
--comment;
sql statement
{quote}
and 
{quote}
--comment;
HiveCommand
{quote}
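For illustration, here is a minimal sketch of a splitter that treats a ';' inside a '--' line comment as text rather than a statement terminator. This is a hypothetical sketch, not the actual test-driver parser, and it deliberately ignores the harder case of '--' appearing inside string literals:

```python
def split_qfile(text: str) -> list:
    """Split .q file content on ';', ignoring ';' inside '--' line comments.

    Naive sketch: does not handle '--' occurring inside string literals.
    """
    statements, current = [], ''
    for line in text.splitlines():
        idx = line.find('--')
        code = line[:idx] if idx != -1 else line  # drop the comment tail
        for ch in code:
            if ch == ';':
                if current.strip():
                    statements.append(current.strip())
                current = ''
            else:
                current += ch
        current += '\n'
    if current.strip():
        statements.append(current.strip())
    return statements
```

With this approach, both cases above parse the same way: the ';' terminating the comment is discarded with the comment, and only the following SQL statement or Hive command is emitted.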


 Fail to handle the case that a qfile contains a semicolon in the annotation
 ---

 Key: HIVE-9356
 URL: https://issues.apache.org/jira/browse/HIVE-9356
 Project: Hive
  Issue Type: Sub-task
Affects Versions: encryption-branch
Reporter: Ferdinand Xu
Assignee: Dong Chen
 Fix For: encryption-branch

 Attachments: HIVE-9356-encryption.patch, 
 HIVE-9356.1-encryption.patch, HIVE-9356.patch


 Currently, we split the qfile by the semicolon. It should be able to handle 
 the comment statement in the qfile with a semicolon.





[jira] [Created] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type

2015-01-13 Thread Matt McCline (JIRA)
Matt McCline created HIVE-9371:
--

 Summary: Execution error for Parquet table and GROUP BY involving 
CHAR data type
 Key: HIVE-9371
 URL: https://issues.apache.org/jira/browse/HIVE-9371
 Project: Hive
  Issue Type: Bug
  Components: File Formats, Query Processor
Reporter: Matt McCline
Priority: Critical


Query fails involving PARQUET table format, CHAR data type, and GROUP BY.

It probably fails for VARCHAR, too.

{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
org.apache.hadoop.hive.serde2.io.HiveCharWritable
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 10 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
... 16 more
{noformat}

Here is a q file:
{noformat}
SET hive.vectorized.execution.enabled=false;
drop table char_2;

create table char_2 (
  key char(10),
  value char(20)
) stored as parquet;

insert overwrite table char_2 select * from src;

select value, sum(cast(key as int)), count(*) numrows
from src
group by value
order by value asc
limit 5;

explain select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value asc
limit 5;

-- should match the query from src
select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value asc
limit 5;

select value, sum(cast(key as int)), count(*) numrows
from src
group by value
order by value desc
limit 5;

explain select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value desc
limit 5;

-- should match the query from src
select value, sum(cast(key as int)), count(*) numrows
from char_2
group by value
order by value desc
limit 5;

drop table char_2;
{noformat}





[jira] [Commented] (HIVE-9344) Fix flaky test optimize_nullscan

2015-01-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276617#comment-14276617
 ] 

Hive QA commented on HIVE-9344:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691882/HIVE-9344.1.patch.txt

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 7311 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_minimr_broken_pipe
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2357/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2357/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691882 - PreCommit-HIVE-TRUNK-Build

 Fix flaky test optimize_nullscan
 

 Key: HIVE-9344
 URL: https://issues.apache.org/jira/browse/HIVE-9344
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Navis
 Attachments: HIVE-9344.1.patch.txt


 The optimize_nullscan test is extremely flaky. We need to find a way to fix 
 this test.





[jira] [Commented] (HIVE-9235) Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR

2015-01-13 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276620#comment-14276620
 ] 

Matt McCline commented on HIVE-9235:


First issue (vectorization of Parquet):
VectorColumnAssignFactory.java's public static 
VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch, 
Writable[] writables) is missing cases for HiveCharWritable, 
HiveVarcharWritable, DateWritable, and HiveDecimalWritable.

Example of exception caused:
{noformat}
Caused by: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner 
for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented 
vector assigner for writable type class 
org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
at 
org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528)
at 
org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127)
... 23 more
{noformat}

Added code to fix that.
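The failure mode here is a per-writable-type dispatch with missing entries; schematically (a hypothetical Python sketch of the pattern, not Hive's Java code, with made-up assigner stand-ins):

```python
# Hypothetical sketch of the buildAssigners failure mode: a dispatch table
# keyed by writable type, with no entries for char/varchar/date/decimal.
ASSIGNERS = {
    'LongWritable': lambda batch, col: ('long', col),
    'Text': lambda batch, col: ('bytes', col),
    # Missing before the fix: 'HiveCharWritable', 'HiveVarcharWritable',
    # 'DateWritable', 'HiveDecimalWritable'
}

def build_assigner(writable_type: str):
    try:
        return ASSIGNERS[writable_type]
    except KeyError:
        raise RuntimeError(
            f'Unimplemented vector assigner for writable type {writable_type}')
```

Any row containing one of the unmapped types then fails at read time with the "Unimplemented vector assigner" error shown above.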

Then, I copied a half dozen vectorized q tests that use ORC tables and tried 
converting them to use PARQUET, but encountered another issue in 
*non-vectorized* mode.  I was trying to establish base query outputs that I 
could use to verify the vectorized query output.  This indicated that a basic 
non-vectorized use of the CHAR data type wasn't working for PARQUET.
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to 
org.apache.hadoop.hive.serde2.io.HiveCharWritable
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 10 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be 
cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
at 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
at 
org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
... 16 more
{noformat}

I filed this problem under HIVE-9371: Execution error for Parquet table and 
GROUP BY involving CHAR data type

At that point we concluded we should temporarily disable vectorization of 
PARQUET, since the only existing test does not provide complete coverage of 
the data types.

FYI: [~hagleitn]

 Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, 
 TIMESTAMP, CHAR, and VARCHAR
 -

 Key: HIVE-9235
 URL: https://issues.apache.org/jira/browse/HIVE-9235
 Project: Hive
  Issue Type: Bug
  Components: Vectorization
Reporter: Matt McCline
Assignee: Matt McCline
Priority: Critical
 Attachments: HIVE-9235.01.patch


 Title was: Make Parquet Vectorization of these data types work: DECIMAL, 
 DATE, TIMESTAMP, CHAR, and VARCHAR
 Support for doing vector column 

[jira] [Updated] (HIVE-9351) Running Hive Jobs with Tez cause templeton to never report percent complete

2015-01-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-9351:
-
   Resolution: Fixed
Fix Version/s: 0.15.0
   Status: Resolved  (was: Patch Available)

 Running Hive Jobs with Tez cause templeton to never report percent complete
 ---

 Key: HIVE-9351
 URL: https://issues.apache.org/jira/browse/HIVE-9351
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.15.0

 Attachments: HIVE-9351.patch


 Currently, when submitting Hive jobs through WebHCat with Hive configured 
 to use Tez, the percentComplete field returned by WebHCat is empty.
 LaunchMapper in WebHCat parses the stderr of the process it launches to 
 extract map = 100%, reduce = 100% for the Map/Reduce case.  With Tez the 
 content of stderr looks like 
 {noformat}
   Map 1: -/-  Reducer 2: 0/1  
   Map 1: -/-  Reducer 2: 0(+1)/1  
   Map 1: -/-  Reducer 2: 1/1
 {noformat}
 WebHCat should handle that as well.
 WebHCat will follow HIVE-8495 and report (completed tasks)/(total tasks) as a 
 percentage.
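 A sketch of turning such a progress line into a percentage, as described above. The regex is hypothetical, assuming the '(+running)' suffix and '-' for not-yet-known totals seen in the sample output:

 ```python
import re

# Matches vertex reports like 'Map 1: -/-' or 'Reducer 2: 0(+1)/1'
VERTEX = re.compile(r'(\w+ \d+): (-|\d+)(?:\(\+\d+\))?/(-|\d+)')

def tez_percent_complete(line: str):
    """Return (completed tasks)/(total tasks) as a percentage, or None if unknown."""
    completed = total = 0
    for _, done, tot in VERTEX.findall(line):
        if tot == '-':
            return None  # vertex parallelism not yet determined
        completed += 0 if done == '-' else int(done)
        total += int(tot)
    return 100.0 * completed / total if total else None
 ```

 Lines where any vertex still reports '-/-' yield no percentage, matching the observed empty percentComplete.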





[jira] [Commented] (HIVE-9351) Running Hive Jobs with Tez cause templeton to never report percent complete

2015-01-13 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275734#comment-14275734
 ] 

Eugene Koifman commented on HIVE-9351:
--

Thanks [~thejas] for the review

 Running Hive Jobs with Tez cause templeton to never report percent complete
 ---

 Key: HIVE-9351
 URL: https://issues.apache.org/jira/browse/HIVE-9351
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 0.15.0

 Attachments: HIVE-9351.patch


 Currently, when submitting Hive jobs through WebHCat with Hive configured 
 to use Tez, the percentComplete field returned by WebHCat is empty.
 LaunchMapper in WebHCat parses the stderr of the process it launches to 
 extract map = 100%, reduce = 100% for the Map/Reduce case.  With Tez the 
 content of stderr looks like 
 {noformat}
   Map 1: -/-  Reducer 2: 0/1  
   Map 1: -/-  Reducer 2: 0(+1)/1  
   Map 1: -/-  Reducer 2: 1/1
 {noformat}
 WebHCat should handle that as well.
 WebHCat will follow HIVE-8495 and report (completed tasks)/(total tasks) as a 
 percentage.





[jira] [Updated] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9178:
--
Attachment: HIVE-9178.1-spark.patch

Reattaching the same patch to get another test run.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch, HIVE-9178.1-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate 
 API for RPCs such as addJar and getExecutorCounter. These jobs differ from a 
 query submission in that they don't need to be queued in the backend 
 and can be executed right away.





[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275783#comment-14275783
 ] 

Marcelo Vanzin commented on HIVE-9360:
--

Yeah, I dislike timeouts in tests but in this case it's kinda hard to avoid 
them. Feel free to increase them if that makes things better.

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 A similar exception appears in each of the tests.





[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275791#comment-14275791
 ] 

Szehon Ho commented on HIVE-9360:
-

Great, thanks for confirming.  FYI [~brocknoland]

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 A similar exception appears in each of the tests.





[jira] [Updated] (HIVE-9331) get rid of pre-optimized-hashtable memory optimizations

2015-01-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-9331:
---
Attachment: HIVE-9331.02.patch

Updating the spark output too... other test failures look unrelated

 get rid of pre-optimized-hashtable memory optimizations
 ---

 Key: HIVE-9331
 URL: https://issues.apache.org/jira/browse/HIVE-9331
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.15.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.15.0

 Attachments: HIVE-9331.01.patch, HIVE-9331.01.patch, 
 HIVE-9331.02.patch, HIVE-9331.patch, HIVE-9331.patch


 These were added in 13 because optimized hashtable couldn't make it in; they 
 reduced memory usage by some amount (10-25%), and informed the design of the 
 optimized hashtable, but now extra settings and code branches are just 
 confusing and may have their own bugs. Might as well remove them.





[jira] [Commented] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275808#comment-14275808
 ] 

Brock Noland commented on HIVE-9360:


+1

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 A similar exception appears in each of the tests.





Re: mistake in udf_date_add.q.out

2015-01-13 Thread Jason Dere
Yeah I think you're right, can you file a Jira? Looks like the @Description 
annotation needs to be fixed for both date_add() and date_sub()

On Jan 13, 2015, at 10:42 AM, Alexander Pivovarov apivova...@gmail.com wrote:

 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 
 last line shows   '2009-31-07'
 
 should it be   '2009-07-31'   instead?




[jira] [Commented] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275691#comment-14275691
 ] 

Ashutosh Chauhan commented on HIVE-9341:


+1

 Apply ColumnPrunning for noop PTFs
 --

 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9341.1.patch.txt, HIVE-9341.2.patch.txt


 Currently, PTF disables CP optimization, which can impose a huge burden. For 
 example,
 {noformat}
 select p_mfgr, p_name, p_size,
 rank() over (partition by p_mfgr order by p_name) as r,
 dense_rank() over (partition by p_mfgr order by p_name) as dr,
 sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
 unbounded preceding and current row) as s1
 from noop(on part 
   partition by p_mfgr
   order by p_name
   );
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: part
 Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   key expressions: p_mfgr (type: string), p_name (type: string)
   sort order: ++
   Map-reduce partition columns: p_mfgr (type: string)
   Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: p_partkey (type: int), p_name (type: 
 string), p_mfgr (type: string), p_brand (type: string), p_type (type: 
 string), p_size (type: int), p_container (type: string), p_retailprice (type: 
 double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: 
 bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: 
 structtransactionid:bigint,bucketid:int,rowid:bigint)
 ...
 {noformat}
 There should be a generic way to discern referenced columns, but until then 
 we know CP can be safely applied to noop functions.





Re: Review Request 29761: HIVE-9315

2015-01-13 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29761/
---

(Updated Jan. 13, 2015, 7:01 p.m.)


Review request for hive and John Pullokkaran.


Changes
---

Rebased patch.


Bugs: HIVE-9315
https://issues.apache.org/jira/browse/HIVE-9315


Repository: hive-git


Description
---

CBO (Calcite Return Path): Inline FileSinkOperator, Properties


Diffs (updated)
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketingSortingReduceSinkOptimizer.java
 2f1497ab3d876b8d4152b076d95971e65887a2e5 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java
 de20302583e3b6260c545c17a9251775e5bff7c5 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 
8215c26f48f1611d3c64f6df5bcfac02069e3a67 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 
23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/FileSinkDesc.java 
8b25c2b6b9f6cfb087ba3c1beaf0c2164ab70de0 

Diff: https://reviews.apache.org/r/29761/diff/


Testing
---

Existing tests.


Thanks,

Jesús Camacho Rodríguez



Re: Review Request 29625: HIVE-9200

2015-01-13 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29625/
---

(Updated Jan. 13, 2015, 7:07 p.m.)


Review request for hive and John Pullokkaran.


Bugs: HIVE-9200
https://issues.apache.org/jira/browse/HIVE-9200


Repository: hive-git


Description
---

CBO (Calcite Return Path): Inline Join, Properties


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 
2e771ec60851113ef9a717c87e142ca70bc53c07 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractBucketJoinProc.java 
03742d436930526ff2db15d6ed159f4f0d7136f0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/AbstractSMBJoinProc.java 
eba35f583fd077f492811b6231dfd59e8b05ea58 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/BucketMapjoinProc.java 
264d3f0b0ad80163831179b57aefdd4a4c5cc647 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java 
7ab35eec5987c78dee0349431e06ee65a20ee2cd 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 
ae0addcee51abf08904872ddf8dfb2c12e71a9e0 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/JoinReorder.java 
9238e0e541b748f5e45fe572e6b4575cc3299b7f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java 
828f87c1f043324b0432bcc7c1f461267e19d0a6 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java 
5291851b105730490033ff91e583ee44022ed24f 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SkewJoinOptimizer.java 
ea06503b0377ffb98f2583869e2c51ac1ea4e398 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeBucketMapjoinProc.java
 11ce47eb4ff4b8ae1162eb5f3842b8e32d3a21e1 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedMergeJoinProc.java 
8a0c47477718141cab85a4d6f71070117372df91 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/SparkMapJoinProcessor.java 
bed95faa9bf072563262292931cc4b7d7cb034b3 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/CorrelationOptimizer.java
 c52f7530b10c81a662118d2cb43599c82f7dbb4f 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/AbstractJoinTaskDispatcher.java
 33ef581a97768d6391c67558e768d10e46a366f2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java
 9c26907544ad8ced31d5cf47ed27c8a240f93925 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SortMergeJoinTaskDispatcher.java
 6f92b13ff7c1cdd4c651f5e1bff42626dee52750 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java
 037983434d2ab5ce6c8f523b89370ca68cd98e27 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSMBJoinHintOptimizer.java
 f62ad6cd109755f60e0e673a679c5107f91c43c0 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java
 ffe11a0f0d2ee8f63b124b13275aca8de4704d8b 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSortMergeJoinOptimizer.java
 d00c48d8df3958a0a274aa30f2b999a98a6256c8 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 
8215c26f48f1611d3c64f6df5bcfac02069e3a67 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TableAccessAnalyzer.java 
da14ab4e96bcc9089e10eb3a9d4e5d575b51d5ab 
  ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 
23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java 
c144d8c05c73025ba33b300229125e74930e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java 
9f8c0918179d9226e36cecc3bd955946d6b5fe98 

Diff: https://reviews.apache.org/r/29625/diff/


Testing
---

Existing tests.


Thanks,

Jesús Camacho Rodríguez



[jira] [Commented] (HIVE-9178) Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]

2015-01-13 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275735#comment-14275735
 ] 

Marcelo Vanzin commented on HIVE-9178:
--

Should I worry about those test failures? I ran a subset of qtests locally and 
they passed.

 Create a separate API for remote Spark Context RPC other than job submission 
 [Spark Branch]
 ---

 Key: HIVE-9178
 URL: https://issues.apache.org/jira/browse/HIVE-9178
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Marcelo Vanzin
 Attachments: HIVE-9178.1-spark.patch


 Based on discussions in HIVE-8972, it seems to make sense to create a separate 
 API for RPCs such as addJar and getExecutorCounter. These jobs differ from a 
 query submission in that they don't need to be queued in the backend 
 and can be executed right away.





[jira] [Commented] (HIVE-9315) CBO (Calcite Return Path): Inline FileSinkOperator, Properties

2015-01-13 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275745#comment-14275745
 ] 

Laljo John Pullokkaran commented on HIVE-9315:
--

+1 conditional on all unit test pass

 CBO (Calcite Return Path): Inline FileSinkOperator, Properties
 --

 Key: HIVE-9315
 URL: https://issues.apache.org/jira/browse/HIVE-9315
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.15.0

 Attachments: HIVE-9315.01.patch, HIVE-9315.02.patch, HIVE-9315.patch








[jira] [Updated] (HIVE-9194) Support select distinct *

2015-01-13 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9194:
--
Status: Patch Available  (was: Open)

 Support select distinct *
 -

 Key: HIVE-9194
 URL: https://issues.apache.org/jira/browse/HIVE-9194
 Project: Hive
  Issue Type: Sub-task
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong
 Attachments: HIVE-9194.00.patch, HIVE-9194.01.patch, 
 HIVE-9194.02.patch, HIVE-9194.03.patch, HIVE-9194.04.patch


 As per [~jpullokkaran]'s review comments, implement select distinct *





[jira] [Updated] (HIVE-9200) CBO (Calcite Return Path): Inline Join, Properties

2015-01-13 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-9200:
--
Attachment: HIVE-9200.08.patch

[~jpullokkaran], I'm rebasing the patch and uploading it again to run QA and be 
sure.

 CBO (Calcite Return Path): Inline Join, Properties
 --

 Key: HIVE-9200
 URL: https://issues.apache.org/jira/browse/HIVE-9200
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
 Fix For: 0.15.0

 Attachments: HIVE-9200.01.patch, HIVE-9200.02.patch, 
 HIVE-9200.03.patch, HIVE-9200.04.patch, HIVE-9200.05.patch, 
 HIVE-9200.06.patch, HIVE-9200.07.patch, HIVE-9200.08.patch, HIVE-9200.patch








[jira] [Created] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-9360:
---

 Summary: TestSparkClient throws Timeoutexception
 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho


TestSparkClient has been throwing TimeoutException in some test runs.

{noformat}
java.util.concurrent.TimeoutException: null
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
at 
org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
at 
org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
at 
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
{noformat}

for each of the tests.





[jira] [Updated] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9360:

Description: 
TestSparkClient has been throwing TimeoutException in some test runs.

The exception looks like:
{noformat}
java.util.concurrent.TimeoutException: null
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
at 
org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
at 
org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
at 
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
{noformat}
but for each of the tests.

  was:
TestSparkClient has been throwing TimeoutException in some test runs.

{noformat}
java.util.concurrent.TimeoutException: null
at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
at 
org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
at 
org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
at 
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
{noformat}

for each of the tests.


 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho

 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 but for each of the tests.





[jira] [Updated] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9357:
--
Attachment: HIVE-9357.1.patch

HIVE-9357.1.patch

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'
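The four examples above imply two rules: the day of month is clamped to the last valid day of the target month, and a startdate that is the last day of its month maps to the last day of the target month. A minimal sketch of those semantics (an illustration consistent with the examples, not the patch's actual Java implementation):

```python
import calendar
import datetime

def add_months(start: str, num_months: int) -> str:
    """Add months to a yyyy-MM-dd date, mimicking the semantics above."""
    d = datetime.date.fromisoformat(start)
    total = d.month - 1 + num_months
    year, month = d.year + total // 12, total % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    if d.day == calendar.monthrange(d.year, d.month)[1]:
        # Last day of start month -> last day of target month.
        day = last_day
    else:
        # Otherwise keep the day, clamped to the target month's length.
        day = min(d.day, last_day)
    return datetime.date(year, month, day).isoformat()
```

Running it against the four examples in the description reproduces each expected result, including the leap-year case.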





[jira] [Updated] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9357:
--
Status: Patch Available  (was: In Progress)

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





mistake in udf_date_add.q.out

2015-01-13 Thread Alexander Pivovarov
files:
ql/src/test/results/clientpositive/udf_date_add.q.out

ql/src/test/results/beelinepositive/udf_date_add.q.out

last line shows   '2009-31-07'

should it be   '2009-07-31'   instead?


Re: Review Request 29763: HIVE-9292

2015-01-13 Thread John Pullokkaran

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29763/#review67910
---



ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java
https://reviews.apache.org/r/29763/#comment111998

It seems like groupOpToInputTables is used only by 
RewriteQueryUsingAggregateIndexCtx. Couldn't we get rid of groupToInputTables 
from parse context by changing RewriteQueryUsingAggregateIndexCtx?


- John Pullokkaran


On Jan. 13, 2015, 9:17 a.m., Jesús Camacho Rodríguez wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29763/
 ---
 
 (Updated Jan. 13, 2015, 9:17 a.m.)
 
 
 Review request for hive and John Pullokkaran.
 
 
 Bugs: HIVE-9292
 https://issues.apache.org/jira/browse/HIVE-9292
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 CBO (Calcite Return Path): Inline GroupBy, Properties
 
 
 Diffs
 -
 
   
 ql/src/java/org/apache/hadoop/hive/ql/optimizer/index/RewriteQueryUsingAggregateIndexCtx.java
  fe686d96b642572059ef13129951d01fce4fedce 
   ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 
 8215c26f48f1611d3c64f6df5bcfac02069e3a67 
   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
 cea86dfbf67b85cba24fb0e7ebf270abbe9c31f9 
   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 
 23fbbe11198ac5893a84bdf94f9c843c4ee2ccb4 
   ql/src/java/org/apache/hadoop/hive/ql/plan/GroupByDesc.java 
 7a0b0da5de5c1c7dc8dd099167e9d06e6b27eea2 
 
 Diff: https://reviews.apache.org/r/29763/diff/
 
 
 Testing
 ---
 
 Existing tests.
 
 
 Thanks,
 
 Jesús Camacho Rodríguez
 




[jira] [Updated] (HIVE-9352) Merge from spark to trunk (follow-up of HIVE-9257)

2015-01-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9352:

Attachment: HIVE-9352.patch

I think these are flaky; TestSparkClient has been failing in some other runs as 
well with a timeout exception (we'll have to increase the timeout, I think).  I 
also couldn't reproduce the other failures (Hadoop20SAuthBridge, 
louter_join_ppr).  The other ones I've seen before as flaky tests.  I think this 
run had a bad (slow) set of hosts.

Uploading same patch again.

 Merge from spark to trunk (follow-up of HIVE-9257)
 --

 Key: HIVE-9352
 URL: https://issues.apache.org/jira/browse/HIVE-9352
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Affects Versions: 0.15.0
Reporter: Szehon Ho
Assignee: Szehon Ho
 Attachments: HIVE-9352.patch, HIVE-9352.patch


 Will include following JIRA's (not-inclusive list)
 HIVE-7674 (remove spark-snapshot dependency)
 HIVE-9335 (cleanup)
 HIVE-9340 (cleanup 2, including removing spark snapshot repo)





[jira] [Commented] (HIVE-8485) HMS on Oracle incompatibility

2015-01-13 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275746#comment-14275746
 ] 

Sushanth Sowmyan commented on HIVE-8485:


@[~vikram.dixit] : Since this is an important robustness fix, I'd like to see 
this included in 0.14.1 as well.

@[~ctang.ma] : Are you okay with this approach of fixing the robustness on our 
end?

@[~sershe] : Could you please review this patch?

 HMS on Oracle incompatibility
 -

 Key: HIVE-8485
 URL: https://issues.apache.org/jira/browse/HIVE-8485
 Project: Hive
  Issue Type: Bug
  Components: Metastore
 Environment: Oracle as metastore DB
Reporter: Ryan Pridgeon
Assignee: Chaoyu Tang
 Attachments: HIVE-8485.2.patch, HIVE-8485.patch


 Oracle does not distinguish between empty strings and NULL, which proves 
 problematic for DataNucleus.
 In the event a user creates a table with some property stored as an empty 
 string, the table will no longer be accessible.
 i.e. TBLPROPERTIES ('serialization.null.format'='')
 If they try to select, describe, drop, etc. the table, the client prints the 
 following exception:
 ERROR ql.Driver: FAILED: SemanticException [Error 10001]: Table not found 
 table name
 The workaround for this was to go into the Hive metastore on the Oracle 
 database and replace NULL with some other string. Users could then drop the 
 tables or alter their data to use the new null format they just set.
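The failure mode and the "fix the robustness on our end" approach discussed above can be sketched like this. This is a hypothetical illustration of the write/read asymmetry (a plain dict stands in for the metastore DB), not Hive's or DataNucleus's actual code:

```python
# Hypothetical sketch: Oracle coerces '' to NULL on write, so a robust
# client must map NULL back to '' on read instead of failing.

def store(db, key, value):
    # Simulate Oracle: an empty string is stored as NULL (None).
    db[key] = None if value == "" else value

def load(db, key):
    # Defensive read: normalize NULL back to '' so lookups don't break.
    value = db.get(key)
    return "" if value is None else value

db = {}
store(db, "serialization.null.format", "")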





[jira] [Work started] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-9357 started by Alexander Pivovarov.
-
 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov

 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





[jira] [Updated] (HIVE-9360) TestSparkClient throws Timeoutexception

2015-01-13 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-9360:

Attachment: HIVE-9360.patch

I am not sure; the test itself seems OK, but on the PTest build infra it runs 
concurrently with a bunch of other tests, so Future.get might take a while to 
return.  So I'm giving it a try by increasing the timeout.

[~chengxiang li], [~vanzin] any thoughts?
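The effect of the tight Future.get timeout can be reproduced in miniature. A hedged sketch using Python's concurrent.futures in place of the Netty future (the sleep stands in for a job slowed down by a loaded build host; the timeouts are illustrative, not the values in the patch):

```python
import concurrent.futures
import time

def slow_job():
    time.sleep(0.2)  # stands in for a job delayed by a busy PTest host
    return "done"

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_job)
    timed_out = False
    try:
        # A timeout tuned for an idle machine fires under load.
        result = future.result(timeout=0.05)
    except concurrent.futures.TimeoutError:
        timed_out = True
        # A roomier timeout lets the same job complete normally.
        result = future.result(timeout=5.0)
```

The job was never broken; only the deadline was too aggressive for a contended host.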

 TestSparkClient throws Timeoutexception
 ---

 Key: HIVE-9360
 URL: https://issues.apache.org/jira/browse/HIVE-9360
 Project: Hive
  Issue Type: Test
  Components: Tests
Affects Versions: 0.15.0
Reporter: Szehon Ho
 Attachments: HIVE-9360.patch


 TestSparkClient has been throwing TimeoutException in some test runs.
 The exception looks like:
 {noformat}
 java.util.concurrent.TimeoutException: null
   at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:74)
   at org.apache.hive.spark.client.JobHandleImpl.get(JobHandleImpl.java:35)
   at 
 org.apache.hive.spark.client.TestSparkClient$5.call(TestSparkClient.java:130)
   at 
 org.apache.hive.spark.client.TestSparkClient.runTest(TestSparkClient.java:224)
   at 
 org.apache.hive.spark.client.TestSparkClient.testMetricsCollection(TestSparkClient.java:126)
 {noformat}
 but for each of the tests.





[jira] [Commented] (HIVE-9357) Create ADD_MONTHS UDF

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275817#comment-14275817
 ] 

Alexander Pivovarov commented on HIVE-9357:
---

Review Board Request https://reviews.apache.org/r/29861/

 Create ADD_MONTHS UDF
 -

 Key: HIVE-9357
 URL: https://issues.apache.org/jira/browse/HIVE-9357
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9357.1.patch


 ADD_MONTHS adds a number of months to startdate: 
 add_months('2015-01-14', 1) = '2015-02-14'
 add_months('2015-01-31', 1) = '2015-02-28'
 add_months('2015-02-28', 2) = '2015-04-30'
 add_months('2015-02-28', 12) = '2016-02-29'





[jira] [Created] (HIVE-9361) Intermittent NPE in SessionHiveMetaStoreClient.alterTempTable

2015-01-13 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-9361:


 Summary: Intermittent NPE in 
SessionHiveMetaStoreClient.alterTempTable
 Key: HIVE-9361
 URL: https://issues.apache.org/jira/browse/HIVE-9361
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


it's happening at 
{noformat}
MetaStoreUtils.updateUnpartitionedTableStatsFast(newtCopy,
wh.getFileStatusesForSD(newtCopy.getSd()), false, true);
{noformat}

Other methods in this class call getWh() to get the Warehouse, which likely 
explains why it's intermittent.
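The race hinted at above, reading a lazily-initialized field directly instead of through its getter, can be sketched as follows. This is a hypothetical Python illustration of the pattern (names like `_wh`/`get_wh` mirror the comment, not the actual SessionHiveMetaStoreClient source):

```python
class MetaStoreClient:
    """Sketch: direct access to a lazily-initialized field can observe None,
    while the getter initializes it on demand and never returns None."""
    def __init__(self):
        self._wh = None  # warehouse handle, created lazily

    def get_wh(self):
        if self._wh is None:
            self._wh = object()  # stands in for constructing the Warehouse
        return self._wh

client = MetaStoreClient()
direct = client._wh           # direct field read: None here -> NPE in Java
via_getter = client.get_wh()  # getter initializes first, so never None
```

Whether the field has already been initialized by some earlier call is what makes the NPE intermittent.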





[jira] [Updated] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf

2015-01-13 Thread Alexander Pivovarov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Pivovarov updated HIVE-9366:
--
Status: Patch Available  (was: In Progress)

 wrong date in description annotation in date_add() and date_sub() udf
 -

 Key: HIVE-9366
 URL: https://issues.apache.org/jira/browse/HIVE-9366
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch


 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 last line shows   '2009-31-07' but it should be   '2009-07-31'   instead
 the @Description annotation needs to be fixed for both date_add() and 
 date_sub()
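The correct behavior being documented, yyyy-MM-dd output such as '2009-07-31' rather than the transposed '2009-31-07', is easy to check with a small sketch (an illustration of the expected semantics, not Hive's GenericUDFDateAdd code):

```python
import datetime

def date_add(start: str, days: int) -> str:
    """Add days to a yyyy-MM-dd date; a negative count behaves like date_sub."""
    d = datetime.date.fromisoformat(start)
    return (d + datetime.timedelta(days=days)).isoformat()
```

For example, date_add('2009-07-30', 1) yields '2009-07-31', the value the @Description annotation should show.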





[jira] [Commented] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf

2015-01-13 Thread Alexander Pivovarov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276357#comment-14276357
 ] 

Alexander Pivovarov commented on HIVE-9366:
---

Review Board Request https://reviews.apache.org/r/29871/

 wrong date in description annotation in date_add() and date_sub() udf
 -

 Key: HIVE-9366
 URL: https://issues.apache.org/jira/browse/HIVE-9366
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch


 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 last line shows   '2009-31-07' but it should be   '2009-07-31'   instead
 the @Description annotation needs to be fixed for both date_add() and 
 date_sub()





[jira] [Commented] (HIVE-9366) wrong date in description annotation in date_add() and date_sub() udf

2015-01-13 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276362#comment-14276362
 ] 

Jason Dere commented on HIVE-9366:
--

+1 if tests pass

 wrong date in description annotation in date_add() and date_sub() udf
 -

 Key: HIVE-9366
 URL: https://issues.apache.org/jira/browse/HIVE-9366
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.14.1
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Trivial
 Attachments: HIVE-9366.1.patch, HIVE-9366.2.patch, HIVE-9366.3.patch


 files:
 ql/src/test/results/clientpositive/udf_date_add.q.out
 ql/src/test/results/beelinepositive/udf_date_add.q.out
 last line shows   '2009-31-07' but it should be   '2009-07-31'   instead
 the @Description annotation needs to be fixed for both date_add() and 
 date_sub()





[jira] [Commented] (HIVE-9038) Join tests fail on Tez

2015-01-13 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276373#comment-14276373
 ] 

Vikram Dixit K commented on HIVE-9038:
--

This issue, as Navis mentioned, stems from the fact that Tez does not generate 
the filterTag. The filterTag is generated in the MR case by the 
HashTableSinkOperator, which is not used on Tez. The right solution would be a 
select operator that adds the filterTag to the value field, so that this works 
without sacrificing the extra stages incurred by moving to a shuffle join 
instead of a map join. However, since this only happens when there are multiple 
joins on the same key with an outer join doing the filtering, for the time 
being I have a patch that changes the join to a shuffle join in that case to 
get away from the asserts. I will raise a different jira for the proper fix.

 Join tests fail on Tez
 --

 Key: HIVE-9038
 URL: https://issues.apache.org/jira/browse/HIVE-9038
 Project: Hive
  Issue Type: Bug
  Components: Tests, Tez
Reporter: Ashutosh Chauhan
Assignee: Vikram Dixit K

 Tez doesn't run all tests. But if you run them, the following tests fail with 
 a run-time exception pointing to bugs. 
 {{auto_join21.q,auto_join29.q,auto_join30.q
 ,auto_join_filters.q,auto_join_nulls.q}} 





[jira] [Commented] (HIVE-9038) Join tests fail on Tez

2015-01-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276402#comment-14276402
 ] 

Sergey Shelukhin commented on HIVE-9038:


I think the default switch case should throw/assert... otherwise looks good. I 
didn't check the out files; I assume the query results are no different from 
non-tez.

 Join tests fail on Tez
 --

 Key: HIVE-9038
 URL: https://issues.apache.org/jira/browse/HIVE-9038
 Project: Hive
  Issue Type: Bug
  Components: Tests, Tez
Reporter: Ashutosh Chauhan
Assignee: Vikram Dixit K
 Attachments: HIVE-9038.1.patch


 Tez doesn't run all tests. But if you run them, the following tests fail with 
 a run-time exception pointing to bugs. 
 {{auto_join21.q,auto_join29.q,auto_join30.q
 ,auto_join_filters.q,auto_join_nulls.q}} 





[jira] [Commented] (HIVE-9341) Apply ColumnPrunning for noop PTFs

2015-01-13 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276405#comment-14276405
 ] 

Hive QA commented on HIVE-9341:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691867/HIVE-9341.2.patch.txt

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 7312 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hive.jdbc.TestSSL.testSSLFetchHttp
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2355/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2355/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2355/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691867 - PreCommit-HIVE-TRUNK-Build

 Apply ColumnPrunning for noop PTFs
 --

 Key: HIVE-9341
 URL: https://issues.apache.org/jira/browse/HIVE-9341
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-9341.1.patch.txt, HIVE-9341.2.patch.txt


 Currently, PTF disables CP optimization, which can impose a huge burden. For 
 example,
 {noformat}
 select p_mfgr, p_name, p_size,
 rank() over (partition by p_mfgr order by p_name) as r,
 dense_rank() over (partition by p_mfgr order by p_name) as dr,
 sum(p_retailprice) over (partition by p_mfgr order by p_name rows between 
 unbounded preceding and current row) as s1
 from noop(on part 
   partition by p_mfgr
   order by p_name
   );
 STAGE PLANS:
   Stage: Stage-1
 Map Reduce
   Map Operator Tree:
   TableScan
 alias: part
 Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
 Reduce Output Operator
   key expressions: p_mfgr (type: string), p_name (type: string)
   sort order: ++
   Map-reduce partition columns: p_mfgr (type: string)
   Statistics: Num rows: 26 Data size: 3147 Basic stats: COMPLETE 
 Column stats: NONE
   value expressions: p_partkey (type: int), p_name (type: 
 string), p_mfgr (type: string), p_brand (type: string), p_type (type: 
 string), p_size (type: int), p_container (type: string), p_retailprice (type: 
 double), p_comment (type: string), BLOCK__OFFSET__INSIDE__FILE (type: 
 bigint), INPUT__FILE__NAME (type: string), ROW__ID (type: 
 structtransactionid:bigint,bucketid:int,rowid:bigint)
 ...
 {noformat}
 There should be a generic way to discern referenced columns, but until then 
 we know CP can be safely applied to noop functions.




