[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614833#comment-14614833 ] Vaibhav Gumashta commented on HIVE-10895: - +1 on the patch. Tests will take a few hrs to report results. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10895) ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources
[ https://issues.apache.org/jira/browse/HIVE-10895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614773#comment-14614773 ] Vaibhav Gumashta commented on HIVE-10895: - [~aihuaxu] I'm back to work. Will review and try your patch today. ObjectStore does not close Query objects in some calls, causing a potential leak in some metastore db resources --- Key: HIVE-10895 URL: https://issues.apache.org/jira/browse/HIVE-10895 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 Reporter: Takahiko Saito Assignee: Aihua Xu Attachments: HIVE-10895.1.patch, HIVE-10895.2.patch, HIVE-10895.3.patch During testing, we've noticed Oracle db running out of cursors. Might be related to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
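For context, a minimal sketch of the close-in-finally pattern this fix revolves around, written against the javax.jdo API that ObjectStore uses; the JDOQL string, model class, and method names are illustrative, not the attached patch:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

public class QueryCleanupSketch {
  /** Run a JDOQL query and always release the underlying db cursor. */
  public static List<String> listTableNames(PersistenceManager pm, String dbName) {
    Query query = null;
    try {
      query = pm.newQuery(
          "select tableName from org.apache.hadoop.hive.metastore.model.MTable"
              + " where database.name == dbName");
      query.declareParameters("java.lang.String dbName");
      @SuppressWarnings("unchecked")
      Collection<String> names = (Collection<String>) query.execute(dbName);
      // Copy the results out of the JDO-managed collection before closing,
      // so callers never touch a lazily loaded result after the cursor is gone.
      return new ArrayList<String>(names);
    } finally {
      if (query != null) {
        query.closeAll(); // releases the JDBC statement/cursor held by the query
      }
    }
  }
}
{code}
Leaving out the finally block is what lets each metastore call pin an open cursor on the backing database, which is consistent with the Oracle symptom described above.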
[jira] [Commented] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614686#comment-14614686 ] Hive QA commented on HIVE-11053: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12743675/HIVE-11053.2-spark.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 7993 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.initializationError org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_rdd_cache org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/921/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/921/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-921/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12743675 - PreCommit-HIVE-SPARK-Build Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch Add some test cases for self union, self-join, CWE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614960#comment-14614960 ] Jesus Camacho Rodriguez commented on HIVE-10281: +1, thanks [~Ferd]! Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Affects Version/s: (was: 1.0.0) Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
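Until a release with the fix is available, a small sketch of the workaround mentioned in the description (disabling hive.optimize.remove.identity.project for the session) over JDBC; the connection URL is a placeholder and the query reuses the tables from the repro above:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class IdentityProjectWorkaround {
  public static void main(String[] args) throws Exception {
    // Requires the hive-jdbc driver on the classpath; adjust host/port/database as needed.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Turn off the identity-project removal optimization for this session only.
      stmt.execute("set hive.optimize.remove.identity.project=false");
      try (ResultSet rs = stmt.executeQuery("select count(distinct s) from purchase_history")) {
        while (rs.next()) {
          System.out.println(rs.getLong(1));
        }
      }
    }
  }
}
{code}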
[jira] [Commented] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615066#comment-14615066 ] Aihua Xu commented on HIVE-11129: - That should be the case. But it seems like the warning may be too restrictive, since converting between UTF-8, UTF-16 and UTF-32 should cause no loss. Let me handle that case. Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
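To illustrate the distinction Aihua draws, a short sketch using the JDK's CharsetEncoder: it can tell whether a value survives the target encoding, which is why UTF-8 to UTF-16/UTF-32 copies are lossless while UTF-8 to ISO 8859-1 can mangle the sample rows above (plain JDK code, not the attached patch):
{code}
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingLossCheck {
  public static void main(String[] args) {
    CharsetEncoder latin1 = Charset.forName("ISO-8859-1").newEncoder();
    CharsetEncoder utf16 = Charset.forName("UTF-16").newEncoder();

    // Latin-1 can represent the accented European names...
    System.out.println(latin1.canEncode("Müller"));   // true
    // ...but not CJK or Hebrew, so those rows would be mangled silently.
    System.out.println(latin1.canEncode("中村"));      // false
    System.out.println(latin1.canEncode("אביה"));     // false

    // Any Unicode transformation format represents every value losslessly,
    // so a UTF-8 -> UTF-16/UTF-32 copy does not need the warning.
    System.out.println(utf16.canEncode("中村"));       // true
  }
}
{code}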
[jira] [Updated] (HIVE-11164) WebHCat should log contents of HiveConf on startup
[ https://issues.apache.org/jira/browse/HIVE-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11164: -- Attachment: HIVE-11164.patch WebHCat should log contents of HiveConf on startup -- Key: HIVE-11164 URL: https://issues.apache.org/jira/browse/HIVE-11164 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11164.patch There are a few places in WebHCat that do new HiveConf() but HiveConf is not added to AppConfig. Need to log HiveConf contents on startup to help diagnosing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
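A rough sketch of the kind of startup dump the issue asks for; HiveConf extends Hadoop's Configuration, which is iterable, so logging the whole thing takes a few lines (the logger name and password masking rule are illustrative choices, not the attached patch):
{code}
import java.util.Map;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hive.conf.HiveConf;

public class HiveConfStartupDump {
  private static final Log LOG = LogFactory.getLog(HiveConfStartupDump.class);

  /** Log every key/value pair in the HiveConf, masking anything that looks secret. */
  public static void logHiveConf(HiveConf conf) {
    LOG.info("Dumping HiveConf at WebHCat startup:");
    for (Map.Entry<String, String> entry : conf) {
      String key = entry.getKey();
      // Don't leak credentials into the log.
      String value = key.toLowerCase().contains("password") ? "***" : entry.getValue();
      LOG.info("  " + key + "=" + value);
    }
  }
}
{code}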
[jira] [Updated] (HIVE-11013) LLAP: MiniTez tez_join_hash test on the branch fails with NPE (initializeOp not called?)
[ https://issues.apache.org/jira/browse/HIVE-11013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11013: Attachment: HIVE-11013.01.patch master patch. Hopefully HiveQA will also run LLAP: MiniTez tez_join_hash test on the branch fails with NPE (initializeOp not called?) Key: HIVE-11013 URL: https://issues.apache.org/jira/browse/HIVE-11013 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11013.01.patch, HIVE-11013.patch Line numbers are shifted due to logging; the NPE is at {noformat} hashMapRowGetters = new ReusableGetAdaptor[mapJoinTables.length]; {noformat} So looks like mapJoinTables is null. I added logging to see if they could be set to null from cache, but that doesn't seem to be the case. Looks like initializeOp is not called. {noformat} Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception from MapJoinOperator : null at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:428) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:87) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:872) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:643) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:656) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:659) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:755) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:315) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:278) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:271) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:257) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361) ... 17 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.MapJoinOperator.process(MapJoinOperator.java:339) ... 29 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10940: Assignee: (was: Sergey Shelukhin) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
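Not the attached patch, just a sketch of the obvious mitigation: memoize the serialized form keyed by the filter's expression string, so the expensive Kryo call runs once per distinct filter rather than once per getRecordReader call (the cache type and size are arbitrary choices here):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class FilterSerializationCache {
  private static final int MAX_ENTRIES = 100;

  // Small LRU keyed by the filter's expression string; the Kryo-backed
  // serialization then runs once per distinct filter, not once per split.
  private final Map<String, String> cache =
      new LinkedHashMap<String, String>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
          return size() > MAX_ENTRIES;
        }
      };

  public synchronized String serializeOnce(String filterText, Serializer serializer) {
    String serialized = cache.get(filterText);
    if (serialized == null) {
      serialized = serializer.serialize(filterText); // the expensive Kryo call
      cache.put(filterText, serialized);
    }
    return serialized;
  }

  /** Stand-in for the Utilities.serializeExpression call in the snippet above. */
  public interface Serializer {
    String serialize(String filterText);
  }
}
{code}
Keying on the expression string assumes identical filters always render to the identical string returned by getExprString(), which is the same text pushFilters already computes.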
[jira] [Commented] (HIVE-11130) Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object
[ https://issues.apache.org/jira/browse/HIVE-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615574#comment-14615574 ] Alan Gates commented on HIVE-11130: --- The only comment I have is that in the HiveTxnManagerImpl implementations of lockTable, etc. I think it would be good to call HiveTxnManager.supportsExplicitLock and throw if that returns true. This avoids an erroneous code path ending up there from DbTxnManager, which should never call these methods. Other than that, +1. Refactoring the code so that HiveTxnManager interface will support lock/unlock table/database object Key: HIVE-11130 URL: https://issues.apache.org/jira/browse/HIVE-11130 Project: Hive Issue Type: Sub-task Components: Locking Affects Versions: 2.0.0 Reporter: Aihua Xu Assignee: Aihua Xu Attachments: HIVE-11130.patch This is just a refactoring step which keeps the current logic, but it exposes the explicit lock/unlock table and database in HiveTxnManager which should be implemented differently by the subclasses (currently it's not; e.g., for the ZooKeeper implementation, we should lock table and database when we try to lock the table). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615615#comment-14615615 ] Gopal V commented on HIVE-10940: [~hagleitn]: this fixes the leak, but reintroduces the performance issue. Added log lines and it showed for query27 {code} 2015-07-06 13:08:31,521 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hasObj = false, hasExpr=true 2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.ids=0,6 2015-07-06 13:08:31,522 INFO [InputInitializer [Map 5] #0] io.HiveInputFormat: hive.io.file.readcolumn.names=d_date_sk,d_year {code} so it hits the serialize codepath still {code} if (!hasObj) { serializedFilterObj = Utilities.serializeObject(filterObject); } {code} HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: udf_cosine_similarity-v01.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Nishant Kelkar Labels: CosineSimilarity, SimilarityMetric, UDF Attachments: HIVE-9557.1.patch, HIVE-9557.2.patch, HIVE-9557.3.patch algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
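For reference, a minimal sketch of token-frequency cosine similarity that reproduces the 0.5 from the example above ('Test String1' and 'Test String2' share one of two tokens); this is illustrative only and not the attached UDF:
{code}
import java.util.HashMap;
import java.util.Map;

public class CosineSimilaritySketch {
  /** Cosine similarity over whitespace-delimited token frequency vectors. */
  public static double cosine(String a, String b) {
    Map<String, Integer> va = termFreq(a);
    Map<String, Integer> vb = termFreq(b);
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (Map.Entry<String, Integer> e : va.entrySet()) {
      Integer other = vb.get(e.getKey());
      if (other != null) {
        dot += e.getValue() * other;
      }
      normA += e.getValue() * e.getValue();
    }
    for (int v : vb.values()) {
      normB += v * v;
    }
    return (normA == 0 || normB == 0) ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  private static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> freq = new HashMap<String, Integer>();
    for (String token : s.split("\\s+")) {
      Integer c = freq.get(token);
      freq.put(token, c == null ? 1 : c + 1);
    }
    return freq;
  }

  public static void main(String[] args) {
    // {Test, String1} vs {Test, String2}: dot = 1, both norms = sqrt(2) -> 0.5
    System.out.println(cosine("Test String1", "Test String2"));
  }
}
{code}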
[jira] [Commented] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615650#comment-14615650 ] Vikram Dixit K commented on HIVE-11011: --- +1 LGTM LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11011.patch Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9272) Tests for utf-8 support
[ https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615684#comment-14615684 ] Aswathy Chellammal Sreekumar commented on HIVE-9272: [~ekoifman] Could you please review the attached patch and see if it solves the issue Tests for utf-8 support --- Key: HIVE-9272 URL: https://issues.apache.org/jira/browse/HIVE-9272 Project: Hive Issue Type: Test Components: Tests, WebHCat Affects Versions: 0.14.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, HIVE-9272.8.patch, HIVE-9272.9.patch, HIVE-9272.patch Including some test cases for utf8 support in webhcat. The first four tests invoke hive, pig, mapred and streaming apis for testing the utf8 support for data processed, file names and job name. The last test case tests the filtering of job name with utf8 character -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615273#comment-14615273 ] Jesus Camacho Rodriguez commented on HIVE-10996: Pushed to 1.1 branch. Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 1.1.1, 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615282#comment-14615282 ] Alain Blankenburg-Schröder commented on HIVE-5317: -- Thanks for your email. Unfortunately, you will no longer be able to reach me under this mailaccount. Please note that your email will not be forwarded. For urgent inquiries, please contact my colleague Philipp Kölmel via email p.koel...@bigpoint.netmailto:p.koel...@bigpoint.net. Best regards, Alain Blankenburg-Schröder Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-11184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11184: --- Fix Version/s: 2.0.0 Lineage - ExprProcFactory#getExprString may throw NullPointerException -- Key: HIVE-11184 URL: https://issues.apache.org/jira/browse/HIVE-11184 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 2.0.0 Attachments: HIVE-11184.1.patch ColumnInfo may have null alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
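The failure mode is a direct dereference of a null alias while building the expression string; a tiny sketch of the usual guard, assuming Hive's ColumnInfo accessors and falling back to the internal name (the fallback choice is an assumption, not necessarily what the attached patch does):
{code}
import org.apache.hadoop.hive.ql.exec.ColumnInfo;

public class NullAliasGuard {
  /** Prefer the alias, but fall back to the internal name when the alias is null. */
  public static String displayName(ColumnInfo colInfo) {
    String alias = colInfo.getAlias();
    return (alias != null) ? alias : colInfo.getInternalName();
  }
}
{code}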
[jira] [Assigned] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-11011: --- Assignee: Sergey Shelukhin LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11170) port parts of HIVE-11015 to master for ease of future merging
[ https://issues.apache.org/jira/browse/HIVE-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11170: Attachment: HIVE-11170.01.patch Same patch for HiveQA port parts of HIVE-11015 to master for ease of future merging - Key: HIVE-11170 URL: https://issues.apache.org/jira/browse/HIVE-11170 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 2.0.0 Attachments: HIVE-11170.01.patch, HIVE-11170.patch That patch changes how IOContext is created (file structure) and adds tests; I will merge non-LLAP parts of it now, so it's easier to merge later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615692#comment-14615692 ] Thejas M Nair commented on HIVE-4239: - +1 Sorry about the delay! Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-4239) Remove lock on compilation stage
[ https://issues.apache.org/jira/browse/HIVE-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615552#comment-14615552 ] Sergey Shelukhin commented on HIVE-4239: [~thejas] I just realized this actually still needs review :) Remove lock on compilation stage Key: HIVE-4239 URL: https://issues.apache.org/jira/browse/HIVE-4239 Project: Hive Issue Type: Bug Components: HiveServer2, Query Processor Reporter: Carl Steinbach Assignee: Sergey Shelukhin Attachments: HIVE-4239.01.patch, HIVE-4239.02.patch, HIVE-4239.03.patch, HIVE-4239.04.patch, HIVE-4239.05.patch, HIVE-4239.06.patch, HIVE-4239.07.patch, HIVE-4239.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10940) HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call
[ https://issues.apache.org/jira/browse/HIVE-10940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-10940: --- Assignee: Gunther Hagleitner HiveInputFormat::pushFilters serializes PPD objects for each getRecordReader call - Key: HIVE-10940 URL: https://issues.apache.org/jira/browse/HIVE-10940 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 1.2.0 Reporter: Gopal V Assignee: Gunther Hagleitner Fix For: 2.0.0 Attachments: HIVE-10940.01.patch, HIVE-10940.02.patch, HIVE-10940.03.patch, HIVE-10940.patch {code} String filterText = filterExpr.getExprString(); String filterExprSerialized = Utilities.serializeExpression(filterExpr); {code} the serializeExpression initializes Kryo and produces a new packed object for every split. HiveInputFormat::getRecordReader - pushProjectionAndFilters - pushFilters. And Kryo is very slow to do this for a large filter clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11164) WebHCat should log contents of HiveConf on startup
[ https://issues.apache.org/jira/browse/HIVE-11164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615569#comment-14615569 ] Thejas M Nair commented on HIVE-11164: -- +1 WebHCat should log contents of HiveConf on startup -- Key: HIVE-11164 URL: https://issues.apache.org/jira/browse/HIVE-11164 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11164.patch There are a few places in WebHCat that do new HiveConf() but HiveConf is not added to AppConfig. Need to log HiveConf contents on startup to help diagnosing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11011) LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11011: Attachment: HIVE-11011.patch This appears to be branch-specific issue, the line that sets dummy ops for map-side record processor is missing. git blame does not give me conclusive results for when it was removed... re-adding it [~vikram.dixit] can you take a look? LLAP: test auto_sortmerge_join_5 on MiniTez fails with NPE -- Key: HIVE-11011 URL: https://issues.apache.org/jira/browse/HIVE-11011 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11011.patch Original issue here was fixed by TEZ-2568. The new issue is: {noformat} 2015-07-01 15:53:44,374 ERROR [main]: SessionState (SessionState.java:printError(987)) - Vertex failed, vertexName=Map 2, vertexId=vertex_1435791127343_0002_2_00, diagnostics=[Task failed, taskId=task_1435791127343_0002_2_00_00, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1435791127343_0002_2_00_00_0:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:255) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:157) ... 14 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.initializeLocalWork(CommonMergeJoinOperator.java:631) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.Operator.initializeLocalWork(Operator.java:439) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:221) ... 15 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9272) Tests for utf-8 support
[ https://issues.apache.org/jira/browse/HIVE-9272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aswathy Chellammal Sreekumar updated HIVE-9272: --- Attachment: HIVE-9272.9.patch Tests for utf-8 support --- Key: HIVE-9272 URL: https://issues.apache.org/jira/browse/HIVE-9272 Project: Hive Issue Type: Test Components: Tests, WebHCat Affects Versions: 0.14.0 Reporter: Aswathy Chellammal Sreekumar Assignee: Aswathy Chellammal Sreekumar Priority: Minor Attachments: HIVE-9272.1.patch, HIVE-9272.2.patch, HIVE-9272.3.patch, HIVE-9272.4.patch, HIVE-9272.5.patch, HIVE-9272.6.patch, HIVE-9272.7.patch, HIVE-9272.8.patch, HIVE-9272.9.patch, HIVE-9272.patch Including some test cases for utf8 support in webhcat. The first four tests invoke hive, pig, mapred and streaming apis for testing the utf8 support for data processed, file names and job name. The last test case tests the filtering of job name with utf8 character -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11129: Attachment: (was: HIVE-11129.patch) Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11129) Issue a warning when copied from UTF-8 to ISO 8859-1
[ https://issues.apache.org/jira/browse/HIVE-11129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11129: Attachment: HIVE-11129.patch Issue a warning when copied from UTF-8 to ISO 8859-1 Key: HIVE-11129 URL: https://issues.apache.org/jira/browse/HIVE-11129 Project: Hive Issue Type: Bug Components: File Formats Reporter: Aihua Xu Assignee: Aihua Xu Fix For: 2.0.0 Attachments: HIVE-11129.patch Copying data from a table using UTF-8 encoding to one using ISO 8859-1 encoding causes data corruption without warning. {noformat} CREATE TABLE person_utf8 (name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='UTF8'); {noformat} Put the following data in the table: Müller,Thomas Jørgensen,Jørgen Vega,Andrés 中村,浩人 אביה,נועם {noformat} CREATE TABLE person_2 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('serialization.encoding'='ISO8859_1') AS select * from person_utf8; {noformat} expected to get mangled data but we should give a warning. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-11184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-11184: --- Attachment: HIVE-11184.1.patch Lineage - ExprProcFactory#getExprString may throw NullPointerException -- Key: HIVE-11184 URL: https://issues.apache.org/jira/browse/HIVE-11184 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Attachments: HIVE-11184.1.patch ColumnInfo may have null alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615279#comment-14615279 ] Paul Fosse commented on HIVE-5317: -- Merge command seems to be needed to do the first use case of the ACID feature. Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. Typically we will load the updates to a hive table and just want to merge that table to the existing dimension. We are either using the old way of doing this (ingest, reconcile, compact purge) or we are writing a Python script to process the updates. But we can't do 500K update statements an hour, so it doesn't seem the ACID does us any good for this use case until we have merge Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10996) Aggregation / Projection over Multi-Join Inner Query producing incorrect results
[ https://issues.apache.org/jira/browse/HIVE-10996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-10996: --- Fix Version/s: 1.1.1 Aggregation / Projection over Multi-Join Inner Query producing incorrect results Key: HIVE-10996 URL: https://issues.apache.org/jira/browse/HIVE-10996 Project: Hive Issue Type: Bug Components: Query Planning Affects Versions: 1.2.0, 1.1.0, 1.3.0, 2.0.0 Reporter: Gautam Kowshik Assignee: Jesus Camacho Rodriguez Priority: Critical Fix For: 1.1.1, 2.0.0, 1.2.2 Attachments: HIVE-10996.01.patch, HIVE-10996.02.patch, HIVE-10996.03.patch, HIVE-10996.04.patch, HIVE-10996.05.patch, HIVE-10996.06.patch, HIVE-10996.07.patch, HIVE-10996.08.patch, HIVE-10996.09.patch, HIVE-10996.patch, explain_q1.txt, explain_q2.txt We see the following problem on 1.1.0 and 1.2.0 but not 0.13 which seems like a regression. The following query (Q1) produces no results: {code} select s from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; {code} While this one (Q2) does produce results : {code} select * from ( select last.*, action.st2, action.n from ( select purchase.s, purchase.timestamp, max (mevt.timestamp) as last_stage_timestamp from (select * from purchase_history) purchase join (select * from cart_history) mevt on purchase.s = mevt.s where purchase.timestamp mevt.timestamp group by purchase.s, purchase.timestamp ) last join (select * from events) action on last.s = action.s and last.last_stage_timestamp = action.timestamp ) list; 1 21 20 Bob 1234 1 31 30 Bob 1234 3 51 50 Jeff1234 {code} The setup to test this is: {code} create table purchase_history (s string, product string, price double, timestamp int); insert into purchase_history values ('1', 'Belt', 20.00, 21); insert into purchase_history values ('1', 'Socks', 3.50, 31); insert into purchase_history values ('3', 'Belt', 20.00, 51); insert into purchase_history values ('4', 'Shirt', 15.50, 59); create table cart_history (s string, cart_id int, timestamp int); insert into cart_history values ('1', 1, 10); insert into cart_history values ('1', 2, 20); insert into cart_history values ('1', 3, 30); insert into cart_history values ('1', 4, 40); insert into cart_history values ('3', 5, 50); insert into cart_history values ('4', 6, 60); create table events (s string, st2 string, n int, timestamp int); insert into events values ('1', 'Bob', 1234, 20); insert into events values ('1', 'Bob', 1234, 30); insert into events values ('1', 'Bob', 1234, 25); insert into events values ('2', 'Sam', 1234, 30); insert into events values ('3', 'Jeff', 1234, 50); insert into events values ('4', 'Ted', 1234, 60); {code} I realize select * and select s are not all that interesting in this context but what lead us to this issue was select count(distinct s) was not returning results. The above queries are the simplified queries that produce the issue. I will note that if I convert the inner join to a table and select from that the issue does not appear. Update: Found that turning off hive.optimize.remove.identity.project fixes this issue. 
This optimization was introduced in https://issues.apache.org/jira/browse/HIVE-8435 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11171) Join reordering algorithm might introduce projects between joins
[ https://issues.apache.org/jira/browse/HIVE-11171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-11171: -- Attachment: HIVE-11171.02.patch Join reordering algorithm might introduce projects between joins Key: HIVE-11171 URL: https://issues.apache.org/jira/browse/HIVE-11171 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Attachments: HIVE-11171.01.patch, HIVE-11171.02.patch, HIVE-11171.patch, HIVE-11171.patch Join reordering algorithm might introduce projects between joins which causes multijoin optimization in SemanticAnalyzer to not kick in. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10673) Dynamically partitioned hash join for Tez
[ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-10673: -- Attachment: HIVE-10673.9.patch Precommit tests never ran - re-uploading patch Dynamically partitioned hash join for Tez - Key: HIVE-10673 URL: https://issues.apache.org/jira/browse/HIVE-10673 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch, HIVE-10673.5.patch, HIVE-10673.6.patch, HIVE-10673.7.patch, HIVE-10673.8.patch, HIVE-10673.9.patch Some analysis of shuffle join queries by [~mmokhtar]/[~gopalv] found about 2/3 of the CPU was spent during sorting/merging. While this does not work for MR, for other execution engines (such as Tez), it is possible to create a reduce-side join that uses unsorted inputs in order to eliminate the sorting, which may be faster than a shuffle join. To join on unsorted inputs, we can use the hash join algorithm to perform the join in the reducer. This will require the small tables in the join to fit in the reducer/hash table for this to work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
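Very roughly, the reducer-side idea is: build an in-memory hash table from the small input and stream the big input through it, so neither side needs to be sorted or merged; it only works while the small side fits in memory. A simplified sketch with placeholder row shapes (string arrays keyed on column 0), not the actual operator code:
{code}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideHashJoinSketch {
  /** Inner join on column 0: build a hash table from the small input, probe with the big input. */
  public static List<String[]> hashJoin(Iterable<String[]> smallSide, Iterable<String[]> bigSide) {
    // Build phase: the whole small side must fit in memory.
    Map<String, List<String[]>> built = new HashMap<String, List<String[]>>();
    for (String[] row : smallSide) {
      List<String[]> bucket = built.get(row[0]);
      if (bucket == null) {
        bucket = new ArrayList<String[]>();
        built.put(row[0], bucket);
      }
      bucket.add(row);
    }
    // Probe phase: stream the big side; no sorting or merging is needed.
    List<String[]> out = new ArrayList<String[]>();
    for (String[] row : bigSide) {
      List<String[]> matches = built.get(row[0]);
      if (matches == null) {
        continue; // inner join: drop unmatched rows
      }
      for (String[] match : matches) {
        out.add(new String[] { row[0], row[1], match[1] });
      }
    }
    return out;
  }
}
{code}
The sketch ignores memory limits and spilling; the point is only that the probe side arrives unsorted, which is what removes the sort/merge cost measured above.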
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615278#comment-14615278 ] Paul Fosse commented on HIVE-5317: -- It was moved into issue 10924. I don't know why. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615283#comment-14615283 ] Paul Fosse commented on HIVE-5317: -- By it, I mean Merge. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-5456) Queries fail on avro backed table with empty partition
[ https://issues.apache.org/jira/browse/HIVE-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-5456: - Labels: Avro AvroSerde (was: ) Queries fail on avro backed table with empty partition --- Key: HIVE-5456 URL: https://issues.apache.org/jira/browse/HIVE-5456 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.13.1 Reporter: Prasad Mujumdar Assignee: Chaoyu Tang Labels: Avro, AvroSerde Fix For: 0.14.0 Attachments: HIVE-5456.patch, HIVE-5456.patch The following query fails {noformat} DROP TABLE IF EXISTS episodes_partitioned; CREATE TABLE episodes_partitioned PARTITIONED BY (doctor_pt INT) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ('avro.schema.literal'='{ namespace: testing.hive.avro.serde, name: episodes, type: record, fields: [ { name:title, type:string, doc:episode title }, { name:air_date, type:string, doc:initial date }, { name:doctor, type:int, doc:main actor playing the Doctor in episode } ] }'); ALTER TABLE episodes_partitioned ADD PARTITION (doctor_pt=4); ALTER TABLE episodes_partitioned ADD PARTITION (doctor_pt=5); SELECT COUNT(*) FROM episodes_partitioned; {noformat} with following exception {noformat} java.io.IOException: org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema at org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat.getHiveRecordWriter(AvroContainerOutputFormat.java:61) at org.apache.hadoop.hive.ql.exec.Utilities.createEmptyFile(Utilities.java:2869) at org.apache.hadoop.hive.ql.exec.Utilities.createDummyFileForEmptyPartition(Utilities.java:2901) at org.apache.hadoop.hive.ql.exec.Utilities.getInputPaths(Utilities.java:2825) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:381) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1409) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1187) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1015) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:883) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348) at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:446) at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:456) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:737) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
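The stack trace suggests the dummy file for the empty partition is written from properties that carry no Avro schema; a heavily simplified sketch of the kind of fallback one would want, using plain java.util.Properties rather than the real Hive/Avro serde APIs (not the attached patch):
{code}
import java.util.Properties;

public class AvroSchemaFallbackSketch {
  /**
   * Resolve the Avro schema literal for a partition: use the partition's own
   * avro.schema.literal if present, otherwise fall back to the table-level one.
   */
  public static String resolveSchemaLiteral(Properties partitionProps, Properties tableProps) {
    String literal = partitionProps.getProperty("avro.schema.literal");
    if (literal == null || literal.isEmpty()) {
      literal = tableProps.getProperty("avro.schema.literal");
    }
    if (literal == null || literal.isEmpty()) {
      throw new IllegalStateException(
          "Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema");
    }
    return literal;
  }
}
{code}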
[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615433#comment-14615433 ] Alan Gates commented on HIVE-5317: -- Yes, agreed that the merge command is needed, and hence is being worked on in HIVE-10924. Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10924) add support for MERGE statement
[ https://issues.apache.org/jira/browse/HIVE-10924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-10924: -- Issue Type: New Feature (was: Bug) add support for MERGE statement --- Key: HIVE-10924 URL: https://issues.apache.org/jira/browse/HIVE-10924 Project: Hive Issue Type: New Feature Components: Query Planning, Query Processor Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman add support for MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10986) Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash()
[ https://issues.apache.org/jira/browse/HIVE-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615438#comment-14615438 ] Eugene Koifman commented on HIVE-10986: --- try getting the FileSystem based on Path not Configuration. Check of fs.trash.interval in HiveMetaStore should be consistent with Trash.moveToAppropriateTrash() Key: HIVE-10986 URL: https://issues.apache.org/jira/browse/HIVE-10986 Project: Hive Issue Type: Sub-task Components: Hive Affects Versions: 1.2.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-10986.2.patch, HIVE-10986.patch This is a followup to HIVE-10629. Trash.moveToAppropriateTrash() takes core-site.xml but HiveMetaStore checks hiveConf which is a problem when they disagree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
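A rough sketch of the suggestion above, resolving the FileSystem from the Path rather than from the default Configuration; this is only an illustration against the public Hadoop API, not the HIVE-10986 patch, and the class and method names are invented.
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: resolve the FileSystem that actually owns the path, so the
// fs.trash.interval check sees that filesystem's settings, instead of always using
// the default filesystem.
public class TrashFsSketch {
  static FileSystem fsForPath(Path p, Configuration conf) throws IOException {
    return p.getFileSystem(conf);   // honours the scheme/authority of 'p'
  }

  static FileSystem defaultFs(Configuration conf) throws IOException {
    return FileSystem.get(conf);    // default filesystem; may disagree with the path's
  }
}
{code}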
[jira] [Issue Comment Deleted] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support
[ https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5317: - Comment: was deleted (was: Thanks for your email. Unfortunately, you will no longer be able to reach me under this mailaccount. Please note that your email will not be forwarded. For urgent inquiries, please contact my colleague Philipp Kölmel via email p.koel...@bigpoint.netmailto:p.koel...@bigpoint.net. Best regards, Alain Blankenburg-Schröder ) Implement insert, update, and delete in Hive with full ACID support --- Key: HIVE-5317 URL: https://issues.apache.org/jira/browse/HIVE-5317 Project: Hive Issue Type: New Feature Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.14.0 Attachments: InsertUpdatesinHive.pdf Many customers want to be able to insert, update and delete rows from Hive tables with full ACID support. The use cases are varied, but the form of the queries that should be supported are: * INSERT INTO tbl SELECT … * INSERT INTO tbl VALUES ... * UPDATE tbl SET … WHERE … * DELETE FROM tbl WHERE … * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ... * SET TRANSACTION LEVEL … * BEGIN/END TRANSACTION Use Cases * Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (eg. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys. * Once a day a small set (up to 100k rows) of records need to be deleted for regulatory compliance. * Once an hour a log of transactions is exported from a RDBS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11016) MiniTez mergejoin test fails with Tez input error (issue in merge join under certain conditions)
[ https://issues.apache.org/jira/browse/HIVE-11016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615446#comment-14615446 ] Sergey Shelukhin commented on HIVE-11016: - la la la MiniTez mergejoin test fails with Tez input error (issue in merge join under certain conditions) Key: HIVE-11016 URL: https://issues.apache.org/jira/browse/HIVE-11016 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-11016.01.patch, HIVE-11016.patch Didn't spend a lot of time investigating, but from the code it looks like we shouldn't be calling it after false at least on this path (after false from next, pushRecord returns false, which causes fetchDone to be set for the tag; and fetchOneRow is not called if that is set; should be ok unless tags are messed up?) {noformat} 2015-06-15 17:28:17,272 ERROR [main]: SessionState (SessionState.java:printError(984)) - Vertex failed, vertexName=Reducer 2, vertexId=vertex_1434414363282_0002_17_03, diagnostics=[Task failed, taskId=task_1434414363282_0002_17_03_02, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1434414363282_0002_17_03_02_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:181) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:146) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:338) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:172) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. 
at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:412) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchNextGroup(CommonMergeJoinOperator.java:380) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinFinalLeftData(CommonMergeJoinOperator.java:449) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.closeOp(CommonMergeJoinOperator.java:389) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:651) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:314) ... 15 more Caused by: java.lang.RuntimeException: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:302) at org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.fetchOneRow(CommonMergeJoinOperator.java:404) ... 20 more Caused by: java.io.IOException: Please check if you are invoking moveToNext() even after it returned false. at org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:223) at
[jira] [Updated] (HIVE-11160) Collect column stats when set hive.stats.autogather=true
[ https://issues.apache.org/jira/browse/HIVE-11160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11160: --- Attachment: Design doc for auto column stats gathering.docx Collect column stats when set hive.stats.autogather=true Key: HIVE-11160 URL: https://issues.apache.org/jira/browse/HIVE-11160 Project: Hive Issue Type: New Feature Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong Attachments: Design doc for auto column stats gathering.docx, HIVE-11160.01.patch Hive collects table stats during the INSERT OVERWRITE command when hive.stats.autogather=true is set, but users then need to collect column stats themselves with the ANALYZE command. With this patch, the column stats will also be collected automatically. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11186) Remove unused LlapUtils class from ql.io.orc
[ https://issues.apache.org/jira/browse/HIVE-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-11186: - Attachment: HIVE-11186.patch Remove unused LlapUtils class from ql.io.orc Key: HIVE-11186 URL: https://issues.apache.org/jira/browse/HIVE-11186 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Attachments: HIVE-11186.patch LlapUtils class is unused. Remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11030) Enhance storage layer to create one delta file per write
[ https://issues.apache.org/jira/browse/HIVE-11030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11030: -- Attachment: HIVE-11030.6.patch Enhance storage layer to create one delta file per write Key: HIVE-11030 URL: https://issues.apache.org/jira/browse/HIVE-11030 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 1.2.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-11030.2.patch, HIVE-11030.3.patch, HIVE-11030.4.patch, HIVE-11030.5.patch, HIVE-11030.6.patch Currently each txn using ACID insert/update/delete will generate a delta directory like delta_100_101. In order to support multi-statement transactions we must generate one delta per operation within the transaction so the deltas would be named like delta_100_101_0001, etc. Support for MERGE (HIVE-10924) would need the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
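As a rough illustration of the naming scheme described above, a per-statement delta directory name can be derived from the transaction range plus a statement id. The sketch below is illustrative only; the method names and the zero padding are assumptions, not the exact layout produced by Hive.
{code}
// Hypothetical sketch of per-statement delta naming (delta_<minTxn>_<maxTxn>_<stmtId>).
public class DeltaNameSketch {
  static String deltaSubdir(long minTxn, long maxTxn) {
    return String.format("delta_%d_%d", minTxn, maxTxn);              // one delta per txn
  }

  static String deltaSubdir(long minTxn, long maxTxn, int stmtId) {
    return String.format("delta_%d_%d_%04d", minTxn, maxTxn, stmtId); // one delta per statement
  }

  public static void main(String[] args) {
    System.out.println(deltaSubdir(100, 101));       // delta_100_101
    System.out.println(deltaSubdir(100, 101, 1));    // delta_100_101_0001
  }
}
{code}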
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Assignee: Owen O'Malley (was: Nishant Kelkar) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-11137.1.patch Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-4734: - Labels: Avro AvroSerde Performance (was: ) Use custom ObjectInspectors for AvroSerde - Key: HIVE-4734 URL: https://issues.apache.org/jira/browse/HIVE-4734 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mark Wagner Labels: Avro, AvroSerde, Performance Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch, HIVE-4734.4.patch, HIVE-4734.5.patch Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors to the Avro record itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11151) Calcite transitive predicate inference rule should not transitively add not null filter on non-nullable input
[ https://issues.apache.org/jira/browse/HIVE-11151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11151: Fix Version/s: 1.2.2 Calcite transitive predicate inference rule should not transitively add not null filter on non-nullable input - Key: HIVE-11151 URL: https://issues.apache.org/jira/browse/HIVE-11151 Project: Hive Issue Type: Bug Components: CBO, Logical Optimizer Affects Versions: 1.2.0, 1.2.1 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11151.2.patch, HIVE-11151.3.patch, HIVE-11151.4.patch, HIVE-11151.patch Calcite rule will add predicates even if types don't match -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Description: Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 
11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:90) ... 17 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL Total MapReduce CPU Time Spent: 0 msec hive {code} Please do the needful. was: Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT
[jira] [Assigned] (HIVE-10535) LLAP: Cleanup map join cache when a query completes
[ https://issues.apache.org/jira/browse/HIVE-10535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-10535: --- Assignee: Sergey Shelukhin LLAP: Cleanup map join cache when a query completes --- Key: HIVE-10535 URL: https://issues.apache.org/jira/browse/HIVE-10535 Project: Hive Issue Type: Sub-task Reporter: Siddharth Seth Assignee: Sergey Shelukhin Fix For: llap -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Component/s: (was: Database/Schema) Vectorization Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Labels: Vectorization (was: ) Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at
[jira] [Updated] (HIVE-11054) Read error : Partition Varchar column cannot be cast to string
[ https://issues.apache.org/jira/browse/HIVE-11054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-11054: --- Affects Version/s: 1.2.0 Read error : Partition Varchar column cannot be cast to string -- Key: HIVE-11054 URL: https://issues.apache.org/jira/browse/HIVE-11054 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.14.0, 1.2.0 Reporter: Devansh Srivastava Assignee: Gopal V Labels: Vectorization Fix For: 1.3.0, 2.0.0 Attachments: HIVE-11054.1.patch Hi, I have one table with VARCHAR and CHAR datatypes.My target table has structure like this :-- {code} CREATE EXTERNAL TABLE test_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '') PARTITIONED BY ( src_sys_cd varchar(10) COMMENT '',batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_table'; My source table has structure like below :-- CREATE EXTERNAL TABLE test_staging_table( dob string COMMENT '', version_nbr int COMMENT '', record_status string COMMENT '', creation_timestamp timestamp COMMENT '' src_sys_cd varchar(10) COMMENT '', batch_id string COMMENT '') ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/test/test_staging_table'; {code} We were loading data using pig script. Its a direct load, no transformation needed. But when i was checking test_table's data in hive. It is giving belowmentioned error: {code} Diagnostic Messages for this Task: Error: java.io.IOException: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:273) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:183) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: java.io.IOException: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:352) at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101) at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:115) at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:271) ... 11 more Caused by: java.lang.RuntimeException: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:95) at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:49) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347) ... 15 more Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.HiveVarchar cannot be cast to java.lang.String at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.addPartitionColsToBatch(VectorizedRowBatchCtx.java:566) at
[jira] [Updated] (HIVE-10937) LLAP: make ObjectCache for plans work properly in the daemon
[ https://issues.apache.org/jira/browse/HIVE-10937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-10937: Attachment: HIVE-10937.02.patch rebased patch LLAP: make ObjectCache for plans work properly in the daemon Key: HIVE-10937 URL: https://issues.apache.org/jira/browse/HIVE-10937 Project: Hive Issue Type: Sub-task Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: llap Attachments: HIVE-10937.01.patch, HIVE-10937.02.patch, HIVE-10937.patch There's perf hit otherwise, esp. when stupid planner creates 1009 reducers of 4Mb each. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11186) Remove unused LlapUtils class from ql.io.orc
[ https://issues.apache.org/jira/browse/HIVE-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran resolved HIVE-11186. -- Resolution: Fixed Fix Version/s: llap Committed patch to llap branch. Remove unused LlapUtils class from ql.io.orc Key: HIVE-11186 URL: https://issues.apache.org/jira/browse/HIVE-11186 Project: Hive Issue Type: Sub-task Affects Versions: llap Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Fix For: llap Attachments: HIVE-11186.patch LlapUtils class is unused. Remove it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11137) In DateWritable remove the use of LazyBinaryUtils
[ https://issues.apache.org/jira/browse/HIVE-11137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-11137: -- Attachment: (was: HIVE-11137.1.patch) In DateWritable remove the use of LazyBinaryUtils - Key: HIVE-11137 URL: https://issues.apache.org/jira/browse/HIVE-11137 Project: Hive Issue Type: Sub-task Reporter: Owen O'Malley Assignee: Owen O'Malley Currently the DateWritable class uses LazyBinaryUtils, which has a lot of dependencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11110) Enable HiveJoinAddNotNullRule in CBO
[ https://issues.apache.org/jira/browse/HIVE-0?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14615740#comment-14615740 ] Mostafa Mokhtar commented on HIVE-0: [~jpullokkaran] this is the full query {code} select i_item_id ,i_item_desc ,s_state ,count(ss_quantity) as store_sales_quantitycount ,avg(ss_quantity) as store_sales_quantityave ,stddev_samp(ss_quantity) as store_sales_quantitystdev ,stddev_samp(ss_quantity)/avg(ss_quantity) as store_sales_quantitycov ,count(sr_return_quantity) as_store_returns_quantitycount ,avg(sr_return_quantity) as_store_returns_quantityave ,stddev_samp(sr_return_quantity) as_store_returns_quantitystdev ,stddev_samp(sr_return_quantity)/avg(sr_return_quantity) as store_returns_quantitycov ,count(cs_quantity) as catalog_sales_quantitycount ,avg(cs_quantity) as catalog_sales_quantityave ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitystdev ,stddev_samp(cs_quantity)/avg(cs_quantity) as catalog_sales_quantitycov from store_sales ,store_returns ,catalog_sales ,date_dim d1 ,date_dim d2 ,date_dim d3 ,store ,item where d1.d_quarter_name = '2000Q1' and d1.d_date_sk = store_sales.ss_sold_date_sk and item.i_item_sk = store_sales.ss_item_sk and store.s_store_sk = store_sales.ss_store_sk and store_sales.ss_customer_sk = store_returns.sr_customer_sk and store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number and store_returns.sr_returned_date_sk = d2.d_date_sk and d2.d_quarter_name in ('2000Q1','2000Q2','2000Q3') and store_returns.sr_customer_sk = catalog_sales.cs_bill_customer_sk and store_returns.sr_item_sk = catalog_sales.cs_item_sk and catalog_sales.cs_sold_date_sk = d3.d_date_sk and d3.d_quarter_name in ('2000Q1','2000Q2','2000Q3') group by i_item_id ,i_item_desc ,s_state order by i_item_id ,i_item_desc ,s_state limit 100; {code} Expected plan {code} STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Tez Edges: Map 10 - Map 11 (BROADCAST_EDGE) Map 3 - Map 7 (BROADCAST_EDGE) Map 8 - Map 10 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE) Reducer 4 - Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE), Map 3 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE) Reducer 5 - Reducer 4 (SIMPLE_EDGE) Reducer 6 - Reducer 5 (SIMPLE_EDGE) DagName: jenkins_20150706174402_eceec100-6023-4058-85de-5cc96c9a150e:2 Vertices: Map 1 Map Operator Tree: TableScan alias: item filterExpr: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE Filter Operator predicate: i_item_sk is not null (type: boolean) Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: i_item_sk (type: int), i_item_id (type: string), i_item_desc (type: string) outputColumnNames: _col0, _col1, _col2 Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 48000 Data size: 13824000 Basic stats: COMPLETE Column stats: COMPLETE value expressions: _col1 (type: string), _col2 (type: string) Execution mode: vectorized Map 10 Map Operator Tree: TableScan alias: store_returns filterExpr: ((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column 
stats: COMPLETE Filter Operator predicate: ((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) (type: boolean) Statistics: Num rows: 54568434 Data size: 1083441396 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: sr_item_sk (type: int), sr_customer_sk (type: int), sr_ticket_number (type: int), sr_return_quantity (type: int), sr_returned_date_sk (type: int) outputColumnNames: _col0, _col1, _col2, _col3, _col4
[jira] [Updated] (HIVE-11188) Make ORCFile's String Dictionary more efficient
[ https://issues.apache.org/jira/browse/HIVE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11188: -- Description: Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21 org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:111) 19 org.apache.hadoop.hive.serde2.lazy.LazyPrimitive.hashCode(LazyPrimitive.java:57) 18 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getRight(RedBlackTree.java:99) 16 org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1932) 15 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) 15 org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:929) 12 org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1607) 12 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 11 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getLeft(RedBlackTree.java:92) 11 org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.add(DynamicIntArray.java:105) 10 org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ... Make ORCFile's String Dictionary more efficient --- Key: HIVE-11188 URL: https://issues.apache.org/jira/browse/HIVE-11188 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 1.2.0, 1.1.0 Reporter: Zheng Shao Priority: Minor Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. 
When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21
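The alternative described in the report, constant-time hash lookups while rows are written with a single sort deferred to flush time, can be sketched as below. This is an illustration of the idea only; it is not the ORC WriterImpl or StringRedBlackTree code, and the class and method names are invented.
{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration: HashMap-based string dictionary with a sort deferred to write-out time.
public class HashDictionarySketch {
  private final Map<String, Integer> ids = new HashMap<>();
  private final List<String> keys = new ArrayList<>();

  /** Returns the id for the value, adding it if it has not been seen before. */
  public int add(String value) {
    Integer id = ids.get(value);
    if (id != null) {
      return id;                 // O(1) on repeated values, instead of a red-black tree probe
    }
    int newId = keys.size();
    ids.put(value, newId);
    keys.add(value);
    return newId;
  }

  /** Sorted dictionary, built once when the stripe is flushed: O(U log U) for U unique keys. */
  public List<String> sortedKeys() {
    List<String> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    return sorted;
  }
}
{code}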
[jira] [Updated] (HIVE-11152) Swapping join inputs in ASTConverter
[ https://issues.apache.org/jira/browse/HIVE-11152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11152: Fix Version/s: 1.2.2 Swapping join inputs in ASTConverter Key: HIVE-11152 URL: https://issues.apache.org/jira/browse/HIVE-11152 Project: Hive Issue Type: Bug Components: CBO Reporter: Jesus Camacho Rodriguez Assignee: Jesus Camacho Rodriguez Fix For: 2.0.0, 1.2.2 Attachments: HIVE-11152.02.patch, HIVE-11152.patch We want that multijoin optimization in SemanticAnalyzer always kicks in when CBO is enabled (if possible). For that, we may need to swap the join inputs when we return from CBO through the Hive AST. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.1.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
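For reference, a self-contained sketch of token-based cosine similarity that reproduces the 0.5 result from the example in the description ('Test String1' and 'Test String2' share one of two tokens). It is only an illustration of the metric, not the code in the attached patches.
{code}
import java.util.HashMap;
import java.util.Map;

// Illustration of token-based cosine similarity over whitespace-separated tokens.
public class CosineSimilaritySketch {
  static Map<String, Integer> termFreq(String s) {
    Map<String, Integer> tf = new HashMap<>();
    for (String tok : s.split("\\s+")) {
      tf.merge(tok, 1, Integer::sum);
    }
    return tf;
  }

  public static double cosine(String a, String b) {
    Map<String, Integer> ta = termFreq(a), tb = termFreq(b);
    long dot = 0, na = 0, nb = 0;
    for (Map.Entry<String, Integer> e : ta.entrySet()) {
      Integer other = tb.get(e.getKey());
      if (other != null) dot += (long) e.getValue() * other;   // shared tokens
      na += (long) e.getValue() * e.getValue();
    }
    for (int v : tb.values()) nb += (long) v * v;
    return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
  }

  public static void main(String[] args) {
    // One shared token out of two per string -> 0.5
    System.out.println(cosine("Test String1", "Test String2"));
  }
}
{code}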
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.3.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11188) Make ORCFile's String Dictionary more efficient
[ https://issues.apache.org/jira/browse/HIVE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-11188: -- Priority: Major (was: Minor) Make ORCFile's String Dictionary more efficient --- Key: HIVE-11188 URL: https://issues.apache.org/jira/browse/HIVE-11188 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 1.2.0, 1.1.0 Reporter: Zheng Shao Currently, ORCFile's String Dictionary uses StringRedBlackTree for adding/finding/sorting duplicate strings. When there are a large number of unique strings (let's say over 16K) and a large number of rows (let's say 1M), the binary search will take O(1M * log(16K)) time which can be very long. Alternatively, ORCFile's String Dictionary can use HashMap for adding/finding duplicate strings, and use quicksort at the end to produce a sorted order. In the same case above, the total time spent will be O(1M + 16K * log(16K)) which is much faster. When the number of unique string is close to the number of rows (let's say, both around 1M), ORC will automatically disable the dictionary encoding. In the old approach will take O(1M * log(1M)), and our new approach will take O(1M) since we can skip the final quicksort if the dictionary encoding is disabled. So in either case, the new approach should be a win. Here is an PMP output based on ~600 traces (so 126 means 126/600 ~= 21% of total time). It's a query like INSERT OVERWRITE TABLE SELECT * FROM src using hive-1.1.0-cdh-5.4.1. 126 org.apache.hadoop.hive.ql.io.orc.StringRedBlackTree.compareValue(StringRedBlackTree.java:67) 35 java.util.zip.Deflater.deflateBytes(Native Method) 26 org.apache.hadoop.hive.ql.io.orc.SerializationUtils.findClosestNumBits(SerializationUtils.java:218) 24 org.apache.hadoop.hive.serde2.lazy.LazyNonPrimitive.isNull(LazyNonPrimitive.java:63) 22 org.apache.hadoop.hive.serde2.lazy.LazyMap.parse(LazyMap.java:204) 22 org.apache.hadoop.hive.serde2.lazy.LazyLong.parseLong(LazyLong.java:116) 21 org.apache.hadoop.hive.serde2.columnar.ColumnarStructBase$FieldInfo.uncheckedGetField(ColumnarStructBase.java:111) 19 org.apache.hadoop.hive.serde2.lazy.LazyPrimitive.hashCode(LazyPrimitive.java:57) 18 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getRight(RedBlackTree.java:99) 16 org.apache.hadoop.hive.ql.io.RCFile$Reader.getCurrentRow(RCFile.java:1932) 15 org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method) 15 org.apache.hadoop.hive.ql.io.orc.WriterImpl$IntegerTreeWriter.write(WriterImpl.java:929) 12 org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.write(WriterImpl.java:1607) 12 org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) 11 org.apache.hadoop.hive.ql.io.orc.RedBlackTree.getLeft(RedBlackTree.java:92) 11 org.apache.hadoop.hive.ql.io.orc.DynamicIntArray.add(DynamicIntArray.java:105) 10 org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52) ... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9557) create UDF to measure strings similarity using Cosine Similarity algo
[ https://issues.apache.org/jira/browse/HIVE-9557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Kelkar updated HIVE-9557: - Attachment: (was: HIVE-9557.2.patch) create UDF to measure strings similarity using Cosine Similarity algo - Key: HIVE-9557 URL: https://issues.apache.org/jira/browse/HIVE-9557 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Labels: CosineSimilarity, SimilarityMetric, UDF algo description http://en.wikipedia.org/wiki/Cosine_similarity {code} --one word different, total 2 words str_sim_cosine('Test String1', 'Test String2') = (2 - 1) / 2 = 0.5f {code} reference implementation: https://github.com/Simmetrics/simmetrics/blob/master/src/uk/ac/shef/wit/simmetrics/similaritymetrics/CosineSimilarity.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10927) Add number of HMS/HS2 connection metrics
[ https://issues.apache.org/jira/browse/HIVE-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10927: - Summary: Add number of HMS/HS2 connection metrics (was: Add number of HMS connection metrics) Add number of HMS/HS2 connection metrics Key: HIVE-10927 URL: https://issues.apache.org/jira/browse/HIVE-10927 Project: Hive Issue Type: Sub-task Components: Diagnosability Reporter: Szehon Ho Fix For: 1.3.0 Attachments: HIVE-10927.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10927) Add number of HMS connection metrics
[ https://issues.apache.org/jira/browse/HIVE-10927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-10927: - Attachment: HIVE-10927.patch Add number of HMS connection metrics Key: HIVE-10927 URL: https://issues.apache.org/jira/browse/HIVE-10927 Project: Hive Issue Type: Sub-task Components: Diagnosability Reporter: Szehon Ho Fix For: 1.3.0 Attachments: HIVE-10927.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11189) Add 'IGNORE NULLS' to FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/HIVE-11189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616107#comment-14616107 ] Prateek Rungta commented on HIVE-11189: --- Looks like the functions already support it: [1]. So I am able to do what I need by passing an extra parameter to the functions, i.e. the 'true' in the query below specifies whether to skip nulls or not. {code} SELECT id, LAST_VALUE(col, true) over (PARTITION BY id ORDER BY date) {code} Which means the easy fix is to update the specification for the functions: [2], along with the docs. I still think adding syntactic support for IGNORE NULLS is a good idea; it'll help people already familiar with other systems avoid this issue. [1]: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java#L74 [2]: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFLastValue.java#L40-L41 Add 'IGNORE NULLS' to FIRST_VALUE/LAST_VALUE Key: HIVE-11189 URL: https://issues.apache.org/jira/browse/HIVE-11189 Project: Hive Issue Type: Improvement Components: PTF-Windowing Reporter: Prateek Rungta Other RDBMS support the specification of 'IGNORE NULLS' over a partition to skip NULL values for Analytic Functions. Example - Oracle's docs: http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions057.htm Please consider adding this to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers
[ https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616113#comment-14616113 ] ASF GitHub Bot commented on HIVE-11179: --- GitHub user sundapeng opened a pull request: https://github.com/apache/hive/pull/44 HIVE-11179: HIVE should allow custom converting from HivePrivilegeObj… …ectDesc to privilegeObject for different authorizers You can merge this pull request into a Git repository by running: $ git pull https://github.com/sundapeng/hive master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/44.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #44 commit f82dc66be7cc876323567670b7000756394baf91 Author: Sun Dapeng s...@apache.org Date: 2015-07-07T02:02:48Z HIVE-11179: HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers - Key: HIVE-11179 URL: https://issues.apache.org/jira/browse/HIVE-11179 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Assignee: Dapeng Sun Labels: Authorization HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers: There is a case in Apache Sentry: Sentry support uri and server level privilege, but in hive side, it uses {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} to do the converting, and the code in {{getHivePrivilegeObject()}} only handle the scenes for table and database {noformat} privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW : HivePrivilegeObjectType.DATABASE; {noformat} A solution is move this method to {{HiveAuthorizer}}, so that a custom Authorizer could enhance it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11179) HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers
[ https://issues.apache.org/jira/browse/HIVE-11179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616116#comment-14616116 ] Ferdinand Xu commented on HIVE-11179: - LGTM. +1 pending the tests. HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers - Key: HIVE-11179 URL: https://issues.apache.org/jira/browse/HIVE-11179 Project: Hive Issue Type: Improvement Reporter: Dapeng Sun Assignee: Dapeng Sun Labels: Authorization Attachments: HIVE-11179.001.patch HIVE should allow custom converting from HivePrivilegeObjectDesc to privilegeObject for different authorizers: There is a case in Apache Sentry: Sentry supports URI- and server-level privileges, but on the Hive side {{AuthorizationUtils.getHivePrivilegeObject(privSubjectDesc)}} does the conversion, and the code in {{getHivePrivilegeObject()}} only handles the cases for table and database {noformat} privSubjectDesc.getTable() ? HivePrivilegeObjectType.TABLE_OR_VIEW : HivePrivilegeObjectType.DATABASE; {noformat} A solution is to move this method to {{HiveAuthorizer}}, so that a custom Authorizer could enhance it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
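For illustration only, here is a self-contained sketch of the shape of the proposed change: the table-or-database ternary quoted in the description becomes an overridable method on the authorizer, so an authorizer such as Sentry can map additional scopes (URI, server). All type and member names below are simplified stand-ins, not the actual Hive API; the real proposal is to move {{AuthorizationUtils.getHivePrivilegeObject()}} onto {{HiveAuthorizer}}.
{noformat}
// Simplified stand-ins for HivePrivilegeObjectType / HivePrivilegeObjectDesc; not the real Hive classes.
enum PrivilegeObjectType { TABLE_OR_VIEW, DATABASE, URI, SERVER }

class PrivilegeObjectDesc {
  private final boolean table;
  private final String object;
  PrivilegeObjectDesc(boolean table, String object) { this.table = table; this.object = object; }
  boolean isTable() { return table; }
  String getObject() { return object; }
}

// The conversion lives on the authorizer interface so implementations can override it.
interface Authorizer {
  default PrivilegeObjectType convert(PrivilegeObjectDesc desc) {
    // Default behaviour mirrors the ternary above: only table/view and database are recognised.
    return desc.isTable() ? PrivilegeObjectType.TABLE_OR_VIEW : PrivilegeObjectType.DATABASE;
  }
}

// A custom authorizer could widen the mapping, e.g. to URI-level privileges.
class UriAwareAuthorizer implements Authorizer {
  @Override
  public PrivilegeObjectType convert(PrivilegeObjectDesc desc) {
    String obj = desc.getObject();
    if (obj != null && (obj.startsWith("hdfs://") || obj.startsWith("file://"))) {
      return PrivilegeObjectType.URI;
    }
    return Authorizer.super.convert(desc); // fall back to the default mapping
  }
}

class ConvertDemo {
  public static void main(String[] args) {
    Authorizer authorizer = new UriAwareAuthorizer();
    System.out.println(authorizer.convert(new PrivilegeObjectDesc(true, "db.tbl")));          // TABLE_OR_VIEW
    System.out.println(authorizer.convert(new PrivilegeObjectDesc(false, "hdfs:///data/x"))); // URI
  }
}
{noformat}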
[jira] [Updated] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11182: -- Attachment: HIVE-11182.1-spark.patch The optimized table is not a {{MapJoinPersistableTableContainer}}, so in patch v1 we still dump the table as a HashMapWrapper, but we can optionally load it back as an optimized table. Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li Attachments: HIVE-11182.1-spark.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
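As a rough, self-contained illustration of that dump/load split (not the actual Spark-branch code): the container classes below are stand-ins for HashMapWrapper and the optimized table, and the boolean flag stands in for whatever drives the choice at load time, presumably hive.mapjoin.optimized.hashtable.
{noformat}
import java.util.HashMap;
import java.util.Map;

// Stand-in for the persistable (HashMapWrapper-style) container the small table is dumped as.
class PersistableContainer {
  final Map<String, String> rows = new HashMap<>();
}

// Stand-in for the optimized container; the real one keeps rows in flat byte arrays,
// a plain map is used here only to keep the sketch runnable.
class OptimizedContainer {
  final Map<String, String> rows = new HashMap<>();
}

class HashTableLoader {
  // Dumping always produces the persistable form; loading may rebuild it as optimized.
  static Object load(PersistableContainer dumped, boolean useOptimized) {
    if (!useOptimized) {
      return dumped;                      // keep the persistable form as-is
    }
    OptimizedContainer optimized = new OptimizedContainer();
    optimized.rows.putAll(dumped.rows);   // re-insert rows into the optimized layout
    return optimized;
  }

  public static void main(String[] args) {
    PersistableContainer dumped = new PersistableContainer();
    dumped.rows.put("k1", "v1");
    System.out.println(load(dumped, true).getClass().getSimpleName());  // OptimizedContainer
    System.out.println(load(dumped, false).getClass().getSimpleName()); // PersistableContainer
  }
}
{noformat}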
[jira] [Updated] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dapeng Sun updated HIVE-11190: -- Attachment: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616228#comment-14616228 ] ASF GitHub Bot commented on HIVE-11190: --- GitHub user sundapeng opened a pull request: https://github.com/apache/hive/pull/45 HIVE-11190: ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default You can merge this pull request into a Git repository by running: $ git pull https://github.com/sundapeng/hive HIVE-11190 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/45.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #45 commit db87e59f6e1b213bfea9b6e84c056716c20210d5 Author: Sun Dapeng s...@apache.org Date: 2015-07-07T05:49:26Z HIVE-11190: ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11190) ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default
[ https://issues.apache.org/jira/browse/HIVE-11190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616229#comment-14616229 ] Ferdinand Xu commented on HIVE-11190: - [~dapengsun], thanks for your patch. LGTM. [~thejas], do you have any further comments on this patch? ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard code when the value is not default Key: HIVE-11190 URL: https://issues.apache.org/jira/browse/HIVE-11190 Project: Hive Issue Type: Bug Reporter: Dapeng Sun Assignee: Dapeng Sun Attachments: HIVE-11190.001.patch ConfVars.METASTORE_FILTER_HOOK in authorization V2 should not be hard-coded when the value is not the default; otherwise users cannot customize the METASTORE_FILTER_HOOK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
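A minimal sketch of the guard such a fix implies, assuming the goal is to install the authorization V2 filter hook only when the user has left the setting at its default. The property handling below is a simplified stand-in for HiveConf/ConfVars, and the concrete hook class names are illustrative examples, not quoted from the patch.
{noformat}
import java.util.Properties;

// Only force the authorization filter hook when the configured value is still the default,
// so a user-supplied METASTORE_FILTER_HOOK survives.
class FilterHookConfig {
  static final String KEY = "hive.metastore.filter.hook";
  // Example values; the exact class names are illustrative.
  static final String DEFAULT_HOOK =
      "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl";
  static final String AUTHZ_V2_HOOK =
      "org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook";

  static void applyAuthorizationV2(Properties conf) {
    String current = conf.getProperty(KEY, DEFAULT_HOOK);
    if (DEFAULT_HOOK.equals(current)) {
      conf.setProperty(KEY, AUTHZ_V2_HOOK);  // still the default: safe to install the V2 hook
    }
    // Otherwise leave the user's custom hook untouched instead of hard-coding over it.
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(KEY, "com.example.MyFilterHook");  // hypothetical custom hook
    applyAuthorizationV2(conf);
    System.out.println(conf.getProperty(KEY));          // prints the custom hook, not AUTHZ_V2_HOOK
  }
}
{noformat}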
[jira] [Updated] (HIVE-11053) Add more tests for HIVE-10844[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GaoLun updated HIVE-11053: -- Attachment: HIVE-11053.2-spark.patch Format corrected. Add more tests for HIVE-10844[Spark Branch] --- Key: HIVE-11053 URL: https://issues.apache.org/jira/browse/HIVE-11053 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Chengxiang Li Assignee: GaoLun Priority: Minor Attachments: HIVE-11053.1-spark.patch, HIVE-11053.2-spark.patch Add some test cases for self union, self-join, CTE, and repeated sub-queries to verify the job of combining equivalent works in HIVE-10844. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-10281: --- Assignee: Ferdinand Xu Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10281: Description: NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-10281: Attachment: HIVE-10281.patch [~chinnalalam] [~jcamachorodriguez] Please help review it. Thank you! Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10281) Update people page for the new committers
[ https://issues.apache.org/jira/browse/HIVE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14614628#comment-14614628 ] Chinna Rao Lalam commented on HIVE-10281: - Thanks [~Ferd] for the patch. My details are correct. Update people page for the new committers - Key: HIVE-10281 URL: https://issues.apache.org/jira/browse/HIVE-10281 Project: Hive Issue Type: Task Components: Website Reporter: Chao Sun Assignee: Ferdinand Xu Attachments: HIVE-10281.patch NO PRECOMMIT TESTS Add Jesus and Chinna as committers on the people page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-11183) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li resolved HIVE-11183. --- Resolution: Duplicate Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11183 URL: https://issues.apache.org/jira/browse/HIVE-11183 Project: Hive Issue Type: Improvement Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11182) Enable optimized hash tables for spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-11182: -- Component/s: Spark Enable optimized hash tables for spark [Spark Branch] - Key: HIVE-11182 URL: https://issues.apache.org/jira/browse/HIVE-11182 Project: Hive Issue Type: Improvement Components: Spark Reporter: Rui Li Assignee: Rui Li -- This message was sent by Atlassian JIRA (v6.3.4#6332)