[jira] [Commented] (HIVE-4662) first_value can't have more than one order by column
[ https://issues.apache.org/jira/browse/HIVE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194853#comment-15194853 ] Hive QA commented on HIVE-4662: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793291/HIVE-4662.01.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9811 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7270/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7270/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7270/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793291 - PreCommit-HIVE-TRUNK-Build > first_value can't have more than one order by column > > > Key: HIVE-4662 > URL: https://issues.apache.org/jira/browse/HIVE-4662 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 0.11.0 >Reporter: Frans Drijver >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-4662.01.patch, HIVE-4662.01.patch, > HIVE-4662.01.patch, HIVE-4662.patch > > > In the current implementation of the first_value function, it's not allowed > to have more than one (1) order by column, as so: > {quote} > select distinct > first_value(kastr.DEWNKNR) over ( partition by kastr.DEKTRNR order by > kastr.DETRADT, kastr.DEVPDNR ) > from RTAVP_DRKASTR kastr > ; > {quote} > Error given: > {quote} > FAILED: SemanticException Range based Window Frame can have only 1 Sort Key > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
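For reference, the semantics the reporter wants — FIRST_VALUE over a partition ordered by two columns — can be sketched in plain Python. This is a conceptual illustration only, not Hive code; the sample rows are hypothetical, with column names borrowed from the query in the report:

```python
# Conceptual sketch of FIRST_VALUE per partition with a multi-column sort key.
# Hypothetical rows mimicking RTAVP_DRKASTR (kastr); not real data.
rows = [
    {"DEKTRNR": 1, "DETRADT": "2013-06-02", "DEVPDNR": "B", "DEWNKNR": 20},
    {"DEKTRNR": 1, "DETRADT": "2013-06-01", "DEVPDNR": "A", "DEWNKNR": 10},
    {"DEKTRNR": 2, "DETRADT": "2013-06-01", "DEVPDNR": "C", "DEWNKNR": 30},
]

def first_value(rows, partition_key, order_keys, value_col):
    """FIRST_VALUE per partition: sort rows by the (multi-column) order key,
    then take the value of the first row seen in each partition."""
    result = {}
    for row in sorted(rows, key=lambda r: [r[k] for k in order_keys]):
        result.setdefault(row[partition_key], row[value_col])
    return result

print(first_value(rows, "DEKTRNR", ["DETRADT", "DEVPDNR"], "DEWNKNR"))
# {1: 10, 2: 30}
```

The point of the sketch: the computation is well defined for any number of sort keys; the SemanticException comes from Hive's RANGE-based window frame, which is restricted to a single sort key.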
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Attachment: (was: HIVE-13084.03.patch) > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Attachment: HIVE-13084.03.patch > Vectorization add support for PROJECTION Multi-AND/OR > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, HIVE-13084.03.patch, vector_between_date.q
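The idea behind a projection-mode multi-AND can be illustrated with a rough Python sketch. This is not Hive's vectorization code; it only shows the shape of the technique: evaluate each comparison over a whole column vector, then AND the boolean vectors elementwise, instead of interpreting the expression row by row:

```python
# Sketch: vectorized evaluation of
#   l_commitdate BETWEEN '2015-06-30' AND '2015-07-06'
# as two elementwise comparisons combined by an elementwise AND.
l_commitdate = ["2015-06-29", "2015-06-30", "2015-07-04", "2015-07-07"]

ge = [d >= "2015-06-30" for d in l_commitdate]   # lower bound, whole column at once
le = [d <= "2015-07-06" for d in l_commitdate]   # upper bound, whole column at once
between = [a and b for a, b in zip(ge, le)]      # projected boolean output column

print(between)  # [False, True, True, False]
```

Supporting this as a projected output column (rather than only as a filter) is what lets the CASE-in-GROUP-BY query above vectorize.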
[jira] [Commented] (HIVE-12481) Occasionally "Request is a replay" will be thrown from HS2
[ https://issues.apache.org/jira/browse/HIVE-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194927#comment-15194927 ] Lefty Leverenz commented on HIVE-12481: Doc note: The final patch did not add *hive_server2_thrift_auth_max_retries* or any other configuration parameters. No documentation is needed. > Occasionally "Request is a replay" will be thrown from HS2 > > Key: HIVE-12481 > URL: https://issues.apache.org/jira/browse/HIVE-12481 > Project: Hive > Issue Type: Improvement > Components: Authentication > Affects Versions: 2.0.0 > Reporter: Aihua Xu > Assignee: Aihua Xu > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-12481.2.patch, HIVE-12481.3.patch, HIVE-12481.patch > > > We have seen the following exception thrown from HS2 in a secured cluster when > many queries run simultaneously on a single HS2 instance. > My guess at the cause is that two queries happen to be submitted at the > same time and get the same timestamp. For such a case, we can add a retry > for the query. 
> > {noformat} > 2015-11-18 16:12:33,117 ERROR org.apache.thrift.transport.TSaslTransport: > SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: Failure unspecified at GSS-API level (Mechanism level: Request > is a replay (34))] > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:177) > at > org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:539) > at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) > at > org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) > at > org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:356) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism > level: Request is a replay (34)) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) > at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) > at 
sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) > at > com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:155) > ... 14 more > Caused by: KrbException: Request is a replay (34) > at sun.security.krb5.KrbApReq.authenticate(KrbApReq.java:308) > at sun.security.krb5.KrbApReq.(KrbApReq.java:144) > at > sun.security.jgss.krb5.InitSecContextToken.(InitSecContextToken.java:108) > at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771) > ... 17 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
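The retry the reporter proposes can be sketched as follows. This is plain Python with stand-in names (`open_transport`, `RuntimeError`, the error text), not the real Thrift/SASL API; it only shows the control flow of retrying a timing-dependent replay failure:

```python
# Sketch: retry a SASL-level "Request is a replay" failure a bounded number
# of times before giving up, since the collision is timing-dependent.
def open_with_retry(open_transport, max_retries=3):
    last_err = None
    for _ in range(max_retries):
        try:
            return open_transport()
        except RuntimeError as err:            # stand-in for SaslException
            if "Request is a replay" not in str(err):
                raise                          # unrelated failures propagate
            last_err = err                     # replay collision: try again
    raise last_err

# Hypothetical transport that fails once with a replay error, then succeeds.
attempts = {"n": 0}
def flaky_open():
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("GSS initiate failed: Request is a replay (34)")
    return "connected"

print(open_with_retry(flaky_open))  # connected
```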
[jira] [Updated] (HIVE-12481) Occasionally "Request is a replay" will be thrown from HS2
[ https://issues.apache.org/jira/browse/HIVE-12481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-12481: Labels: (was: TODOC2.1) > Occasionally "Request is a replay" will be thrown from HS2 > > Key: HIVE-12481 > URL: https://issues.apache.org/jira/browse/HIVE-12481
[jira] [Updated] (HIVE-13183) More logs in operation logs
[ https://issues.apache.org/jira/browse/HIVE-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajat Khandelwal updated HIVE-13183: Status: Patch Available (was: Open) > More logs in operation logs > --- > > Key: HIVE-13183 > URL: https://issues.apache.org/jira/browse/HIVE-13183 > Project: Hive > Issue Type: Improvement >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-13183.02.patch, HIVE-13183.03.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: --- Attachment: HIVE-11424.04.patch > Rule to transform OR clauses into IN clauses in CBO > --- > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424 > Project: Hive > Issue Type: Bug >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11424.01.patch, HIVE-11424.01.patch, > HIVE-11424.03.patch, HIVE-11424.03.patch, HIVE-11424.04.patch, > HIVE-11424.2.patch, HIVE-11424.patch > > > We create a rule that will transform OR clauses into IN clauses (when > possible). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-11424 started by Jesus Camacho Rodriguez. > Rule to transform OR clauses into IN clauses in CBO > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424
[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: Status: Open (was: Patch Available) > Rule to transform OR clauses into IN clauses in CBO > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424
[jira] [Updated] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-11424: Status: Patch Available (was: In Progress) > Rule to transform OR clauses into IN clauses in CBO > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424
[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194982#comment-15194982 ] Jesus Camacho Rodriguez commented on HIVE-11424: [~damien.carol], there are multiple reasons. From the optimizer perspective, it is a way of normalizing expressions in Filter operators so that we can find potential duplicate expressions and remove them. This also has an impact on operator statistics estimation. Further, it makes a difference in execution performance for a large number of comparisons, e.g. via vectorization. I do not remember the exact details, but I believe [~gopalv] can expand on it. > Rule to transform OR clauses into IN clauses in CBO > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424
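The normalization being discussed can be sketched in a few lines of Python. This is a toy rewrite over tuple-encoded expressions, not the actual Calcite/CBO rule in the patch:

```python
# Toy OR -> IN rewrite: an OR of equality comparisons against the same
# column becomes a single IN, which is easier to deduplicate and estimate.
def or_to_in(disjuncts):
    """disjuncts: list of ("=", column, constant) tuples.
    Returns ("in", column, constants) when every disjunct is an equality
    on the same column; otherwise returns the original OR unchanged."""
    cols = {col for op, col, _ in disjuncts if op == "="}
    if len(cols) == 1 and all(op == "=" for op, _, _ in disjuncts):
        (col,) = cols
        return ("in", col, [v for _, _, v in disjuncts])
    return ("or", disjuncts)

expr = [("=", "x", 1), ("=", "x", 2), ("=", "x", 3)]
print(or_to_in(expr))  # ('in', 'x', [1, 2, 3])
```

Once the disjuncts are folded into one IN, duplicate elimination and the statistics estimation mentioned above operate on a single predicate instead of a tree of ORs.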
[jira] [Commented] (HIVE-11424) Rule to transform OR clauses into IN clauses in CBO
[ https://issues.apache.org/jira/browse/HIVE-11424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194986#comment-15194986 ] Jesus Camacho Rodriguez commented on HIVE-11424: [~ashutoshc], could you take a look? Thanks. > Rule to transform OR clauses into IN clauses in CBO > > Key: HIVE-11424 > URL: https://issues.apache.org/jira/browse/HIVE-11424
[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13269: --- Attachment: HIVE-13269.01.patch > Simplify comparison expressions using column stats > -- > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13269.01.patch, HIVE-13269.patch, HIVE-13269.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-13269 started by Jesus Camacho Rodriguez. > Simplify comparison expressions using column stats > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269
[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13269: Status: Open (was: Patch Available) > Simplify comparison expressions using column stats > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269
[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13269: Status: Patch Available (was: In Progress) > Simplify comparison expressions using column stats > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269
[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13269: Attachment: HIVE-13269.01.patch > Simplify comparison expressions using column stats > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269
[jira] [Updated] (HIVE-13269) Simplify comparison expressions using column stats
[ https://issues.apache.org/jira/browse/HIVE-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13269: Attachment: (was: HIVE-13269.01.patch) > Simplify comparison expressions using column stats > > Key: HIVE-13269 > URL: https://issues.apache.org/jira/browse/HIVE-13269
[jira] [Updated] (HIVE-13233) Use min and max values to estimate better stats for comparison operators
[ https://issues.apache.org/jira/browse/HIVE-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13233: --- Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) Pushed to master, thanks [~ashutoshc]! > Use min and max values to estimate better stats for comparison operators > > > Key: HIVE-13233 > URL: https://issues.apache.org/jira/browse/HIVE-13233 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 2.1.0 > > Attachments: HIVE-13233.01.patch, HIVE-13233.patch > > > We should benefit from the min/max values for each column to calculate more > precisely the number of rows produced by expressions with comparison operators -- This message was sent by Atlassian JIRA (v6.3.4#6332)
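A hedged sketch of the idea behind HIVE-13233 (this is the general uniform-distribution interpolation, not Hive's actual StatsRulesProcFactory code): with a column's min and max known, the selectivity of a comparison can be interpolated rather than defaulted.

```python
# Sketch: estimate selectivity of  col < c  from column min/max stats,
# assuming values are uniformly distributed over [min, max].
def less_than_selectivity(c, col_min, col_max):
    if c <= col_min:
        return 0.0                          # predicate can never be true
    if c > col_max:
        return 1.0                          # predicate is always true
    return (c - col_min) / (col_max - col_min)

# A filter  col < 25  on a column with min=0, max=100 keeps ~25% of rows,
# instead of a blind default fraction.
print(less_than_selectivity(25, 0, 100))  # 0.25
```

The boundary cases matter most in practice: when the constant falls outside the observed [min, max] range, the estimate collapses to 0 or 1, which is exactly the precision the default heuristic cannot provide.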
[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator
[ https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13287: --- Affects Version/s: 2.1.0 > Add logic to estimate stats for IN operator > --- > > Key: HIVE-13287 > URL: https://issues.apache.org/jira/browse/HIVE-13287 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Currently, IN operator is considered in the default case: reduces the input > rows to the half. This may lead to wrong estimates for the number of rows > produced by Filter operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
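One plausible shape for such an estimate — an assumption based on the issue description, not necessarily the formula the patch will use — is selectivity ≈ |IN list| / NDV, capped at 1, instead of the default factor of one half:

```python
# Sketch: estimate rows surviving  col IN (v1, ..., vk)  from the column's
# number of distinct values (NDV), rather than halving the input.
def in_selectivity(num_in_values, ndv):
    if ndv <= 0:
        return 1.0  # no stats available: conservatively keep everything
    return min(1.0, num_in_values / ndv)

rows = 1_000_000
# col IN (3 values) with NDV = 50  ->  ~6% of rows survive, not 50%.
print(round(rows * in_selectivity(3, 50)))  # 60000
```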
[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail
[ https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195071#comment-15195071 ] Hive QA commented on HIVE-12619: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12792594/HIVE-12619.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9778 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-cte_4.q-orc_merge5.q-vectorization_limit.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-dynpart_sort_optimization2.q-cte_mat_1.q-tez_bmj_schema_evolution.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_coalesce.q-auto_sortmerge_join_7.q-dynamic_partition_pruning.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ivyDownload org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7271/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7271/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7271/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing 
org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12792594 - PreCommit-HIVE-TRUNK-Build > Switching the field order within an array of structs causes the query to fail > - > > Key: HIVE-12619 > URL: https://issues.apache.org/jira/browse/HIVE-12619 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ang Zhang >Assignee: Mohammad Kamrul Islam >Priority: Minor > Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch > > > Switching the field order within an array of structs causes the query to fail > or return the wrong data for the fields, but switching the field order within > just a struct works. > How to reproduce: > Case1 if the two fields have the same type, query will return wrong data for > the fields > drop table if exists schema_test; > create table schema_test (msg array>) stored > as parquet; > insert into table schema_test select stack(2, array(named_struct('f1', 'abc', > 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one > limit 2; > select * from schema_test; > --returns > --[{"f1":"efg","f2":"efg2"}] > --[{"f1":"abc","f2":"abc2"}] > alter table schema_test change msg msg array>; > select * from schema_test; > --returns > --[{"f2":"efg","f1":"efg2"}] > --[{"f2":"abc","f1":"abc2"}] > Case2: if the two fields have different type, the query will fail > drop table if exists schema_test; > create table schema_test (msg array>) stored as > parquet; > insert into table schema_test select stack(2, array(named_struct('f1', 'abc', > 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2; > select * from schema_test; > --returns > --[{"f1":"efg","f2":2}] > --[{"f1":"abc","f2":1}] > alter table schema_test change msg msg array>; > select * from schema_test; > Failed with exception > 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to > org.apache.hadoop.io.IntWritable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
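The behavior in the repro can be modelled in a few lines of Python. This is a simplified model of the suspected mechanism (field resolution by position versus by name), not an account of the Parquet SerDe internals:

```python
# Sketch: why swapping f1/f2 in the declared schema garbles data. The file
# stores struct values in the order they were written; resolving fields by
# POSITION maps the new names onto the old slots, while resolving by NAME
# returns the values the user expects.
stored = [("abc", "abc2"), ("efg", "efg2")]   # rows written as (f1, f2)
written_names = ["f1", "f2"]
altered_schema = ["f2", "f1"]                 # after ALTER TABLE ... CHANGE

by_position = [dict(zip(altered_schema, row)) for row in stored]
by_name = [
    {name: row[written_names.index(name)] for name in altered_schema}
    for row in stored
]

print(by_position[0]["f1"])  # abc2  <- swapped value, as in Case1 above
print(by_name[0]["f1"])      # abc   <- correct value
```

Case2's ClassCastException is the same mismatch surfacing through types: positional resolution hands a Text slot to a reader expecting an IntWritable.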
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195278#comment-15195278 ] Aihua Xu commented on HIVE-13286: [~vikram.dixit] This checks whether the client provided a queryId. If the client provides a queryId, we use it internally, since the client may need a meaningful queryId; otherwise, we generate a new one. There does seem to be a bug here, though: we should generate the new queryId inside the if statement and set it outside the if statement. > Query ID is being reused across queries > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser > Affects Versions: 2.0.0 > Reporter: Vikram Dixit K > Assignee: Aihua Xu > Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that the query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is, but it breaks the > assumption that a query id is unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
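The control flow described in the comment can be sketched as follows. This is hypothetical Python mirroring the described fix, not Hive's Java code; the function and ID format are illustrative:

```python
import uuid

# Sketch: honor a client-supplied query ID, but mint a FRESH one per query
# when the client supplies none -- never reuse a previously generated ID.
def resolve_query_id(client_query_id=None):
    if client_query_id:                      # client chose a meaningful ID
        return client_query_id
    return "hive_" + uuid.uuid4().hex        # new, unique ID per query

a = resolve_query_id()
b = resolve_query_id()
print(a != b)                    # True: each query gets a distinct ID
print(resolve_query_id("q-42"))  # q-42
```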
[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path
[ https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195283#comment-15195283 ] Alina Abramova commented on HIVE-12244: I see that most tests that pass locally fail on Jenkins. For example, I ran tests with -Dqfile_regex=smb_mapjoin.* and got: Tests run: 33, Failures: 3, Errors: 0, Skipped: 0. Failed tests: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin15 From the Jenkins message, the failed tests are: org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_mapjoin9 I do not understand why these test results differ. Could somebody run this subset of tests ( -Dqfile_regex=smb_mapjoin.* ) locally and show me the results, please? Maybe I am doing something wrong. 
> Refactoring code for avoiding of comparison of Strings and do comparison on > Path > > > Key: HIVE-12244 > URL: https://issues.apache.org/jira/browse/HIVE-12244 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1 >Reporter: Alina Abramova >Assignee: Alina Abramova >Priority: Minor > Labels: patch > Fix For: 1.2.1 > > Attachments: HIVE-12244.1.patch, HIVE-12244.2.patch, > HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, > HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, > HIVE-12244.8.patch, HIVE-12244.9.patch > > > In Hive often String is used for representation path and it causes new issues. > We need to compare it with equals() but comparing Strings often is not right > in terms comparing paths . > I think if we use Path from org.apache.hadoop.fs we will avoid new problems > in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
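The motivation of HIVE-12244 above — equivalent paths can compare unequal as raw Strings — can be demonstrated with a self-contained analogy. Hive itself would use org.apache.hadoop.fs.Path; the java.nio classes below merely stand in to show the normalization that String.equals() misses.

```java
import java.nio.file.Paths;

// String comparison of paths vs. Path comparison: the two literals below
// name the same location but differ as text (redundant and trailing
// separators), so only the Path comparison treats them as equal.
public class PathCompareDemo {
    public static void main(String[] args) {
        String a = "/user/hive/warehouse/tt";
        String b = "/user/hive/warehouse//tt/"; // same location, different text

        // raw String comparison says the paths differ
        System.out.println(a.equals(b)); // false
        // Path comparison normalizes the redundant separators (on POSIX)
        System.out.println(Paths.get(a).equals(Paths.get(b)));
    }
}
```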
[jira] [Commented] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams
[ https://issues.apache.org/jira/browse/HIVE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195297#comment-15195297 ] Hive QA commented on HIVE-13232: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793328/HIVE-13232.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9826 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7272/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7272/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7272/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793328 - PreCommit-HIVE-TRUNK-Build > Aggressively drop compression buffers in ORC OutStreams > --- > > Key: HIVE-13232 > URL: https://issues.apache.org/jira/browse/HIVE-13232 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 0.14.1, 1.3.0, 2.1.0 > > Attachments: HIVE-13232.patch, HIVE-13232.patch, HIVE-13232.patch > > > In Hive 0.11, when ORC's OutStream's were flushed they dropped all of the > their buffers. In the patch for HIVE-4324, we inadvertently changed that > behavior so that one of the buffers is held on to. For queries with a lot of > writers and thus under significant memory pressure this can have a > significant impact on the memory usage. > Note that "hive.optimize.sort.dynamic.partition" avoids this problem by > sorting on the dynamic partition key and thus only a single ORC writer is > open at once. This will use memory more effectively and avoid creating ORC > files with very small stripes, which will produce better downstream > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
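The buffer-retention problem described in HIVE-13232 above can be reduced to a tiny sketch. The class below is hypothetical and unrelated to ORC's real OutStream; it only shows why dropping the buffer reference after a flush matters when many writers are open under memory pressure.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a stream that compresses into a fixed-size
// buffer. Holding the reference after flush() pins ~256KB per stream;
// nulling it out lets the GC reclaim the memory, at the cost of a fresh
// allocation on the next write.
public class OutStreamSketch {
    private ByteBuffer compressed;

    void write(byte[] data) {
        if (compressed == null) {
            compressed = ByteBuffer.allocate(256 * 1024); // one compression chunk
        }
        compressed.put(data);
    }

    void flush(StringBuilder sink) {
        if (compressed != null) {
            compressed.flip();
            sink.append("flushed ").append(compressed.remaining()).append(" bytes");
            compressed = null; // aggressively drop the buffer after flushing
        }
    }

    boolean holdsBuffer() {
        return compressed != null;
    }

    public static void main(String[] args) {
        OutStreamSketch s = new OutStreamSketch();
        s.write(new byte[]{1, 2, 3});
        StringBuilder out = new StringBuilder();
        s.flush(out);
        System.out.println(out + ", retained=" + s.holdsBuffer());
    }
}
```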
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195298#comment-15195298 ] Aihua Xu commented on HIVE-13286: - OK. We had a followup to fix HIVE-12456 to avoid storing queryId in SessionState since multiple queries can run in the same session at the same time. Later we will combine session conf and confOverlay conf to the query conf so the query should have the new queryId. [~vikram.dixit] Did you have the patch-12456 applied? > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12244) Refactoring code for avoiding of comparison of Strings and do comparison on Path
[ https://issues.apache.org/jira/browse/HIVE-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195422#comment-15195422 ] Alina Abramova commented on HIVE-12244: --- Could somebody apply my last patch locally and run this part of tests? > Refactoring code for avoiding of comparison of Strings and do comparison on > Path > > > Key: HIVE-12244 > URL: https://issues.apache.org/jira/browse/HIVE-12244 > Project: Hive > Issue Type: Improvement > Components: Hive >Affects Versions: 0.13.0, 0.14.0, 1.0.0, 1.2.1 >Reporter: Alina Abramova >Assignee: Alina Abramova >Priority: Minor > Labels: patch > Fix For: 1.2.1 > > Attachments: HIVE-12244.1.patch, HIVE-12244.2.patch, > HIVE-12244.3.patch, HIVE-12244.4.patch, HIVE-12244.5.patch, > HIVE-12244.6.patch, HIVE-12244.7.patch, HIVE-12244.8.patch, > HIVE-12244.8.patch, HIVE-12244.9.patch > > > In Hive often String is used for representation path and it causes new issues. > We need to compare it with equals() but comparing Strings often is not right > in terms comparing paths . > I think if we use Path from org.apache.hadoop.fs we will avoid new problems > in future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Attachment: HIVE-13084.03.patch > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13084: Attachment: (was: HIVE-13084.03.patch) > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12540) Create function failed, but show functions display it
[ https://issues.apache.org/jira/browse/HIVE-12540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195436#comment-15195436 ] Reuben Kuhnert commented on HIVE-12540: --- I just tested this, but it worked for me. Is this still an issue? Declaration: {code} public class FunctionTask extends Task { public static class MyUDF extends UDF { } } {code} Test: {code} create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF'; INFO : Compiling command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045): create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF' INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) INFO : Completed compiling command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045); Time taken: 0.108 seconds INFO : Concurrency mode is disabled, not creating a lock manager INFO : Executing command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045): create function udfTest as 'org.apache.hadoop.hive.ql.exec.FunctionTask$MyUDF' INFO : Starting task [Stage-0:FUNC] in serial mode INFO : Completed executing command(queryId=sircodesalot_20160315095656_38c72e48-856e-4ece-94e8-eecc145cc045); Time taken: 75.289 seconds INFO : OK No rows affected (75.5 seconds) {code} {code} 0: jdbc:hive2://localhost:1> show functions; show functions; +-+--+ |tab_name | +-+--+ | ... | | default.udftest | | ... 
| +-+--+ {code} > Create function failed, but show functions display it > - > > Key: HIVE-12540 > URL: https://issues.apache.org/jira/browse/HIVE-12540 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 1.2.0, 1.2.1 >Reporter: Weizhong >Priority: Minor > > {noformat} > 0: jdbc:hive2://vm119:1> create function udfTest as > 'hive.udf.UDFArrayNotE'; > ERROR : Failed to register default.udftest using class hive.udf.UDFArrayNotE > Error: Error while processing statement: FAILED: Execution Error, return code > 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1) > 0: jdbc:hive2://vm119:1> show functions; > +-+--+ > |tab_name | > +-+--+ > | ... | > | default.udftest | > | ... | > +-+--+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-13287) Add logic to estimate stats for IN operator
[ https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-13287 started by Jesus Camacho Rodriguez. -- > Add logic to estimate stats for IN operator > --- > > Key: HIVE-13287 > URL: https://issues.apache.org/jira/browse/HIVE-13287 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13287.patch > > > Currently, IN operator is considered in the default case: reduces the input > rows to the half. This may lead to wrong estimates for the number of rows > produced by Filter operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator
[ https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13287: --- Status: Patch Available (was: In Progress) > Add logic to estimate stats for IN operator > --- > > Key: HIVE-13287 > URL: https://issues.apache.org/jira/browse/HIVE-13287 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13287.patch > > > Currently, IN operator is considered in the default case: reduces the input > rows to the half. This may lead to wrong estimates for the number of rows > produced by Filter operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13287) Add logic to estimate stats for IN operator
[ https://issues.apache.org/jira/browse/HIVE-13287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13287: --- Attachment: HIVE-13287.patch > Add logic to estimate stats for IN operator > --- > > Key: HIVE-13287 > URL: https://issues.apache.org/jira/browse/HIVE-13287 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13287.patch > > > Currently, IN operator is considered in the default case: reduces the input > rows to the half. This may lead to wrong estimates for the number of rows > produced by Filter operators. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11837) comments do not support unicode characters well.
[ https://issues.apache.org/jira/browse/HIVE-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195520#comment-15195520 ] Yongzhi Chen commented on HIVE-11837: - Need more research on how to make desc formatted work. > comments do not support unicode characters well. > > > Key: HIVE-11837 > URL: https://issues.apache.org/jira/browse/HIVE-11837 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 0.13.1, 1.1.0 > Environment: Hadoop 2.7 > Hive 0.13.1 / Hive 1.1.0 > RHEL 6.4 / SLES 11.3 >Reporter: Rudd Chen >Assignee: Rudd Chen >Priority: Minor > Attachments: HIVE-11837.1.patch, HIVE-11837.patch > > > the terminal encoding is set to UTF-8, It can display Chinese characters. > then I create a table with a comment in Chinese, both "show create table" and > "desc formatted table" can not display the Chinese characters in the table > comments, meanwhile it can display Chinese characters in column comment.. See > below: > 0: jdbc:hive2://ha-cluster/default> create table tt(id int comment '列ä¸æ–‡æµ‹è¯•') > comment '表ä¸æ–‡æµ‹è¯•'; > No rows affected (0.152 seconds) > 0: jdbc:hive2://ha-cluster/default> > 0: jdbc:hive2://ha-cluster/default> > 0: jdbc:hive2://ha-cluster/default> desc formatted tt; > +---+---+-+ > | col_name| data_type > | comment | > +---+---+-+ > | # col_name| data_type > | comment | > | | NULL > | NULL| > | id| int > | 列ä¸æ–‡æµ‹è¯• | > | | NULL > | NULL| > | # Detailed Table Information | NULL > | NULL| > | Database: | default > | NULL| > | Owner:| admin > | NULL| > | CreateTime: | Wed Sep 16 11:13:34 CST 2015 > | NULL| > | LastAccessTime: | UNKNOWN > | NULL| > | Protect Mode: | None > | NULL| > | Retention:| 0 > | NULL| > | Location: | hdfs://hacluster/user/hive/warehouse/tt > | NULL| > | Table Type: | MANAGED_TABLE > | NULL| > | Table Parameters: | NULL > | NULL| > | | comment > | \u8868\u4E2D\u6587\u6D4B\u8BD5 | > | | transient_lastDdlTime > | 1442373214 | > | | NULL > | NULL| > | # Storage 
Information | NULL > | NULL| > | SerDe Library:| > org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe | NULL > | > | InputFormat: | > org.apache.hadoop.hive.ql.io.RCFileInputFormat| NULL > | > | OutputFormat: | > org.apache.hadoop.hive.ql.io.RCFileOutputFormat | NULL > | > | Compressed: | No > | NULL| > | Num Buckets: | -1 > | NULL| > | Bucket Columns: | [] > | NULL| > | Sort Columns: |
[jira] [Updated] (HIVE-13243) Hive drop table on encyption zone fails for external tables
[ https://issues.apache.org/jira/browse/HIVE-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-13243: --- Resolution: Fixed Fix Version/s: 2.0.1 2.1.0 Status: Resolved (was: Patch Available) Committed to 2.1.0 and 2.0.1. Thanks [~spena] for review! > Hive drop table on encyption zone fails for external tables > --- > > Key: HIVE-13243 > URL: https://issues.apache.org/jira/browse/HIVE-13243 > Project: Hive > Issue Type: Bug > Components: Encryption, Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.1.0, 2.0.1 > > Attachments: HIVE-13243.1.patch, HIVE-13243.2.patch, HIVE-13243.patch > > > When dropping an external table with its data located in an encryption zone, > hive should not throw out MetaException(message:Unable to drop table because > it is in an encryption zone and trash is enabled. Use PURGE option to skip > trash.) in checkTrashPurgeCombination since the data should not get deleted > (or trashed) anyway regardless HDFS Trash is enabled or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
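The rationale of the HIVE-13243 fix above — the trash/purge check only makes sense when dropping the table actually deletes data — can be sketched as a predicate. Method and parameter names are made up for illustration; this is not the actual checkTrashPurgeCombination code.

```java
// Illustrative predicate: only managed tables delete (or trash) their data
// on drop, so only they need the encryption-zone + trash combination check.
public class DropTableCheckDemo {
    static boolean needsTrashPurgeCheck(boolean isExternalTable,
                                        boolean inEncryptionZone,
                                        boolean trashEnabled) {
        if (isExternalTable) {
            return false; // external data is never deleted on drop
        }
        return inEncryptionZone && trashEnabled;
    }

    public static void main(String[] args) {
        System.out.println(needsTrashPurgeCheck(true, true, true));  // external: skip check
        System.out.println(needsTrashPurgeCheck(false, true, true)); // managed: must check
    }
}
```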
[jira] [Commented] (HIVE-13249) Hard upper bound on number of open transactions
[ https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195597#comment-15195597 ] Alan Gates commented on HIVE-13249: --- Definitely we want it done more frequently for countOpenTxns. I wasn't suggesting they should be in the same thread. I see the disconnect now. I was thinking of AcidHouseKeeperService as a threadpool, but it isn't, it's one thread. So I think in general we should consolidate these threads into one pool. We should also think about whether the initiator, worker, and cleaner threads should be handled in such a pool as well. But all that's out of the scope of this JIRA. For this particular thread I still think we should run it server side and not client side. Unfortunately for now that means a second HousekeeperService implementation. But I think it's better to do that and clean it up later than it is to put the thread pool in the wrong place. > Hard upper bound on number of open transactions > --- > > Key: HIVE-13249 > URL: https://issues.apache.org/jira/browse/HIVE-13249 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13249.1.patch > > > We need to have a safeguard by adding an upper bound for open transactions to > avoid huge number of open-transaction requests, usually due to improper > configuration of clients such as Storm. > Once that limit is reached, clients will start failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
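The consolidation suggested in the comment above — periodic housekeeping tasks sharing one scheduled pool rather than one thread per service — might look roughly like the sketch below. The task body and intervals are illustrative, not Hive's actual housekeeping configuration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: several periodic checks (here just a countOpenTxns-style counter)
// can share one ScheduledExecutorService instead of owning a thread each.
public class HousekeepingPoolDemo {
    // Run the periodic check from a shared pool for the given window and
    // report how many times it fired.
    static int runFor(long millis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        AtomicInteger checks = new AtomicInteger();
        pool.scheduleAtFixedRate(checks::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdownNow();
        return checks.get();
    }

    public static void main(String[] args) {
        System.out.println(runFor(200) > 0); // the check ran at least once
    }
}
```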
[jira] [Commented] (HIVE-13149) Remove some unnecessary HMS connections from HS2
[ https://issues.apache.org/jira/browse/HIVE-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195622#comment-15195622 ] Hive QA commented on HIVE-13149: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793335/HIVE-13149.4.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 9798 tests executed *Failed tests:* {noformat} TestJdbcWithMiniHS2 - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_binary org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_custom_key3 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk 
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_snapshot org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_null_first_col org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_join org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_timestamp_format org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges org.apache.hadoop.hive.cli.TestHBaseMinimrCliDriver.testCliDriver_hbase_bulk org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_generatehfiles_require_family_path org.apache.hadoop.hive.metastore.hbase.TestHBaseMetastoreMetrics.testMetaDataCounts {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7273/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7273/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7273/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 32 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793335 - PreCommit-HIVE-TRUNK-Build > Remove some unnecessary HMS connections from HS2 > - > > Key: HIVE-13149 > URL: https://issues.apache.org/jira/browse/HIVE-13149 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-13149.1.patch, HIVE-13149.2.patch, > HIVE-13149.3.patch, HIVE-13149.4.patch > > > In SessionState class, currently we will always try to get a HMS connection > in {{start(SessionState startSs, boolean isAsync, LogHelper console)}} > regardless of if the connection will be used later or not. > When SessionState is accessed by the tasks in TaskRunner.java, although most > of the tasks other than some like StatsTask, don't need to access HMS. > Currently a new HMS
[jira] [Commented] (HIVE-13260) ReduceSinkDeDuplication throws exception when pRS key is empty
[ https://issues.apache.org/jira/browse/HIVE-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195623#comment-15195623 ] Prasanth Jayachandran commented on HIVE-13260: -- +1 > ReduceSinkDeDuplication throws exception when pRS key is empty > -- > > Key: HIVE-13260 > URL: https://issues.apache.org/jira/browse/HIVE-13260 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-13260.01.patch, HIVE-13260.02.patch > > > Steps to reproduce: > {code} > set hive.mapred.mode=nonstrict; > set hive.cbo.enable=false; > set hive.map.aggr=false; > set hive.groupby.skewindata=false; > set mapred.reduce.tasks=31; > select > compute_stats(a,16),compute_stats(b,16),compute_stats(c,16),compute_stats(d,16) > from > ( > select > avg(DISTINCT substr(src.value,5)) as a, > max(substr(src.value,5)) as b, > variance(substr(src.value,5)) as c, > var_samp(substr(src.value,5)) as d > from src)subq; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-12049: -- Attachment: HIVE-12049.13.patch > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.2.patch, > HIVE-12049.3.patch, HIVE-12049.4.patch, HIVE-12049.5.patch, > HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
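The core idea of HIVE-12049 above — serialize a row batch once into the wire format so the server can stream the stored bytes without re-encoding — can be sketched as below. Real Hive would write Thrift-encoded rows into a SequenceFile; plain UTF-8 text stands in here to keep the sketch self-contained, and the method names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch: encode once at write time, stream the stored blob verbatim at
// fetch time; the client decodes, so the server never deserializes rows.
public class SerializedBatchDemo {
    // Done once by the final task: encode the batch in the wire format.
    static byte[] writeBatch(List<String> rows) {
        return String.join("\n", rows).getBytes(StandardCharsets.UTF_8);
    }

    // The server streams the stored blob as-is: no per-fetch deserialization.
    static byte[] fetchBatch(byte[] stored) {
        return stored;
    }

    public static void main(String[] args) {
        byte[] blob = writeBatch(Arrays.asList("row1", "row2"));
        System.out.println(new String(fetchBatch(blob), StandardCharsets.UTF_8));
    }
}
```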
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195761#comment-15195761 ] Siddharth Seth commented on HIVE-13286: --- [~aihuaxu] - I'm curious as to why we allow the queryId to be overwritten by users. Isn't that meant to be unique within HiveServer. If some historic query information were to be retained by HiveServer - that would break. The query name can already be overwritten. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-10176) skip.header.line.count causes values to be skipped when performing insert values
[ https://issues.apache.org/jira/browse/HIVE-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladyslav Pavlenko updated HIVE-10176: -- Attachment: HIVE-10176.4.patch > skip.header.line.count causes values to be skipped when performing insert > values > > > Key: HIVE-10176 > URL: https://issues.apache.org/jira/browse/HIVE-10176 > Project: Hive > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Wenbo Wang >Assignee: Vladyslav Pavlenko > Attachments: HIVE-10176.1.patch, HIVE-10176.2.patch, > HIVE-10176.3.patch, HIVE-10176.4.patch, data > > > When inserting values in to tables with TBLPROPERTIES > ("skip.header.line.count"="1") the first value listed is also skipped. > create table test (row int, name string) TBLPROPERTIES > ("skip.header.line.count"="1"); > load data local inpath '/root/data' into table test; > insert into table test values (1, 'a'), (2, 'b'), (3, 'c'); > (1, 'a') isn't inserted into the table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195794#comment-15195794 ] Aihua Xu commented on HIVE-13286: - QueryId should be unique. Overriding the queryId lets users provide a meaningful queryId if they want to, and they need to make sure it's unique. If the user doesn't override the queryId, then Hive will generate one as before. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195815#comment-15195815 ] Aihua Xu commented on HIVE-13286: - OK. I think there is an issue there. confOverlay is passed from the client. Seems we need to make a copy of that otherwise if the client reuses the same confOverlay, then queryId is reused. Is that the issue? I will correct that. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
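[Editorial note] The copy-the-overlay point above can be sketched as follows (illustrative Python; the function and key names are hypothetical, not HiveServer2's actual API): if the server mutates the client's `confOverlay` in place, a queryId generated for the first query leaks into every later query that reuses the same overlay object.

```python
import copy
import uuid

def run_query(conf_overlay):
    # Buggy version: keeps a reference to the client's dict, so a queryId
    # set on the first call is still present on later calls.
    conf_overlay.setdefault("hive.query.id", "query_" + uuid.uuid4().hex)
    return conf_overlay["hive.query.id"]

def run_query_fixed(conf_overlay):
    # Fixed version: copy the overlay and always generate a fresh id.
    conf = copy.copy(conf_overlay)
    conf["hive.query.id"] = "query_" + uuid.uuid4().hex
    return conf["hive.query.id"]

overlay = {}
first, second = run_query(overlay), run_query(overlay)
assert first == second   # bug: the id leaked through the reused overlay

first, second = run_query_fixed(overlay), run_query_fixed(overlay)
assert first != second   # each query gets its own id
```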
[jira] [Updated] (HIVE-13285) Orc concatenation may drop old files from moving to final path
[ https://issues.apache.org/jira/browse/HIVE-13285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13285: - Attachment: HIVE-13285.3.patch Addressed [~gopalv]'s comment. closeOp uses abort state instead of exception state. > Orc concatenation may drop old files from moving to final path > -- > > Key: HIVE-13285 > URL: https://issues.apache.org/jira/browse/HIVE-13285 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13285.1.patch, HIVE-13285.2.patch, > HIVE-13285.3.patch > > > ORC concatenation uses combine hive input format for merging files. Under > specific case where all files within a combine split are incompatible for > merge (old files without stripe statistics) then these files are added to > incompatible file set. But this file set is not processed as closeOp() will > not be called (no output file writer will exist which will skip > super.closeOp()). As a result, these incompatible files are not moved to > final path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195836#comment-15195836 ] Vikram Dixit K commented on HIVE-13286: --- The issue here is that if we make a change in the incoming configuration, it remains set for the duration of the session. We need to make sure that the user does not set the query id configuration because there is a chance of them breaking what a query id is - a unique id for each query. I think what you really want is something like the HIVE_LOG_TRACE_ID which could be renamed to something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which can stay constant until the user decides to change it. You could create a separate configuration too for the use case you have. I think messing around with the query id looks like a recipe for bugs. I think we should move the query id to a config that the user cannot change and just put it in the utilities class for e.g. like INPUT_NAME that mapreduce used. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13226) Improve tez print summary to print query execution breakdown
[ https://issues.apache.org/jira/browse/HIVE-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195839#comment-15195839 ] Gopal V commented on HIVE-13226: LGTM - +1. > Improve tez print summary to print query execution breakdown > > > Key: HIVE-13226 > URL: https://issues.apache.org/jira/browse/HIVE-13226 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13226.1.patch, HIVE-13226.2.patch, > HIVE-13226.3.patch, sampleoutput.png > > > When tez print summary is enabled, methods summary is printed which are > difficult to correlate with the actual execution time. We can improve that to > print the execution times in the sequence of operations that happens behind > the scenes. > Instead of printing the methods name it will be useful to print something > like below > 1) Query Compilation time > 2) Query Submit to DAG Submit time > 3) DAG Submit to DAG Accept time > 4) DAG Accept to DAG Start time > 5) DAG Start to DAG End time > With this it will be easier to find out where the actual time is spent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
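[Editorial note] The five-phase breakdown proposed above amounts to printing deltas between consecutive lifecycle timestamps. A minimal sketch (illustrative Python with made-up event names and timings, not Hive's implementation):

```python
# Hypothetical event timestamps in seconds; names are illustrative.
events = {
    "compile_start": 0.00, "compile_end": 0.45,
    "dag_submit": 0.60, "dag_accept": 0.75,
    "dag_start": 0.90, "dag_end": 4.20,
}

breakdown = [
    ("Query Compilation",        events["compile_end"] - events["compile_start"]),
    ("Submit to DAG Submit",     events["dag_submit"]  - events["compile_end"]),
    ("DAG Submit to DAG Accept", events["dag_accept"]  - events["dag_submit"]),
    ("DAG Accept to DAG Start",  events["dag_start"]   - events["dag_accept"]),
    ("DAG Start to DAG End",     events["dag_end"]     - events["dag_start"]),
]

# Print phases in execution order so slow steps are easy to spot.
for phase, secs in breakdown:
    print(f"{phase:<28}{secs:6.2f}s")
```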
[jira] [Comment Edited] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195836#comment-15195836 ] Vikram Dixit K edited comment on HIVE-13286 at 3/15/16 6:14 PM: The issue here is that if we make a change in the incoming configuration, it remains set for the duration of the session. We need to make sure that the user does not set the query id configuration because there is a chance of them breaking what a query id is - a unique id for each query. I think what you really want is something like the HIVE_LOG_TRACE_ID which could be renamed to something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which can stay constant until the user decides to change it. You could create a separate configuration too for the use case you have. I think allowing the user to mess around with the query id looks like a recipe for bugs. I think we should move the query id to a config that the user cannot change and just put it in the utilities class for e.g. like INPUT_NAME that mapreduce used. was (Author: vikram.dixit): The issue here is that if we make a change in the incoming configuration, it remains set for the duration of the session. We need to make sure that the user does not set the query id configuration because there is a chance of them breaking what a query id is - a unique id for each query. I think what you really want is something like the HIVE_LOG_TRACE_ID which could be renamed to something like a HIVE_USER_TRACE_ID - an id that a user can set and trace which can stay constant until the user decides to change it. You could create a separate configuration too for the use case you have. I think messing around with the query id looks like a recipe for bugs. I think we should move the query id to a config that the user cannot change and just put it in the utilities class for e.g. like INPUT_NAME that mapreduce used. 
> Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13226) Improve tez print summary to print query execution breakdown
[ https://issues.apache.org/jira/browse/HIVE-13226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13226: - Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) Committed to master. Thanks [~gopalv] for the review! > Improve tez print summary to print query execution breakdown > > > Key: HIVE-13226 > URL: https://issues.apache.org/jira/browse/HIVE-13226 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 2.1.0 > > Attachments: HIVE-13226.1.patch, HIVE-13226.2.patch, > HIVE-13226.3.patch, sampleoutput.png > > > When tez print summary is enabled, methods summary is printed which are > difficult to correlate with the actual execution time. We can improve that to > print the execution times in the sequence of operations that happens behind > the scenes. > Instead of printing the methods name it will be useful to print something > like below > 1) Query Compilation time > 2) Query Submit to DAG Submit time > 3) DAG Submit to DAG Accept time > 4) DAG Accept to DAG Start time > 5) DAG Start to DAG End time > With this it will be easier to find out where the actual time is spent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13285) Orc concatenation may drop old files from moving to final path
[ https://issues.apache.org/jira/browse/HIVE-13285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195866#comment-15195866 ] Gopal V commented on HIVE-13285: LGTM - +1, tests pending. > Orc concatenation may drop old files from moving to final path > -- > > Key: HIVE-13285 > URL: https://issues.apache.org/jira/browse/HIVE-13285 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.3.0, 2.0.0, 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran >Priority: Critical > Attachments: HIVE-13285.1.patch, HIVE-13285.2.patch, > HIVE-13285.3.patch > > > ORC concatenation uses combine hive input format for merging files. Under > specific case where all files within a combine split are incompatible for > merge (old files without stripe statistics) then these files are added to > incompatible file set. But this file set is not processed as closeOp() will > not be called (no output file writer will exist which will skip > super.closeOp()). As a result, these incompatible files are not moved to > final path. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195987#comment-15195987 ] Aihua Xu commented on HIVE-13286: - Actually what I need is the unique queryId. Think of the scenario that hive is just one of the components in the pipeline. The client could have a queryId (e.g., to trace the generation of the query) and then call hive. Then such queryId can link them together and is better for diagnosis. If we create other ids, that seems to defeat the purpose. If the user doesn't provide queryId, then Hive will take care of that. Is the following the actual issue you see? {noformat} I think there is an issue there. confOverlay is passed from the client. Seems we need to make a copy of that otherwise if the client reuses the same confOverlay, then queryId is reused. Is that the issue? I will correct that. {noformat} > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196044#comment-15196044 ] Vikram Dixit K commented on HIVE-13286: --- [~aihuaxu] Consider the following scenario: In Tez/Spark, if we end up caching the small table based on the hive query id. If say the user set the hive query id for 1 query and does not reset it for the subsequent query, we will end up picking the previously cached hash table for the join resulting in incorrect results right? Creating a new conf object would only work if we reset the query id after the query completes. If we allow it to exist in the configuration object after a query has completed running, it will result in incorrect results or some weird behavior. Consider hs2 or cli session, if a user in a session assigns a query id and doesn't reset it, it can result in incorrect results. You are expecting a user to set a query id each time after setting it once? I don't think that is great behavior. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
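[Editorial note] The stale-cache scenario described above can be modeled in a few lines (illustrative Python; the caching scheme is a simplified stand-in for Tez/Spark small-table caching, not actual Hive code): with the cache keyed by query id, a second query that inherits the previous id silently gets the wrong hash table.

```python
# Hypothetical model of a small-table (hash join) cache keyed by query id.
cache = {}

def get_hash_table(query_id, build_small_table):
    # If two different queries run under the same id, the second
    # silently reuses the first query's hash table.
    if query_id not in cache:
        cache[query_id] = build_small_table()
    return cache[query_id]

t1 = get_hash_table("user_set_id", lambda: {"k1": "v1"})
# A second, different query whose user never reset the id:
t2 = get_hash_table("user_set_id", lambda: {"k2": "v2"})
assert t2 == {"k1": "v1"}  # wrong table returned -> incorrect results
```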
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196054#comment-15196054 ] Aihua Xu commented on HIVE-13286: - I moved to initialize the queryId earlier so that starting from the beginning of the execution, the workflow will have unique queryId. Actually I think your statement makes sense. What I need is really a traceId. Does the same queryId cause the issues? If it does, I can change to disallow the change from the client. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12995) LLAP: Synthetic file ids need collision checks
[ https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196077#comment-15196077 ] Hive QA commented on HIVE-12995: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793362/HIVE-12995.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9825 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7274/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7274/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7274/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793362 - PreCommit-HIVE-TRUNK-Build > LLAP: Synthetic file ids need collision checks > -- > > Key: HIVE-12995 > URL: https://issues.apache.org/jira/browse/HIVE-12995 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, > HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch > > > LLAP synthetic file ids do not have any way of checking whether a collision > occurs other than a data-error. > Synthetic file-ids have only been used with unit tests so far - but they will > be needed to add cache mechanisms to non-HDFS filesystems. > In case of Synthetic file-ids, it is recommended that we track the full-tuple > (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id > can be compared against the parameters & only accepted if those match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
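[Editorial note] The recommendation in the issue — track the full (path, mtime, len) tuple alongside the synthetic id and accept a cache hit only when all three still match — can be sketched like this (illustrative Python with a deliberately weak id function; names are hypothetical, not LLAP's actual cache API):

```python
cache = {}

def synthetic_id(path, mtime, length):
    # Deliberately weak 16-bit hash to make collisions plausible.
    return (hash(path) ^ mtime ^ length) & 0xFFFF

def cache_put(path, mtime, length, data):
    cache[synthetic_id(path, mtime, length)] = ((path, mtime, length), data)

def cache_get(path, mtime, length):
    hit = cache.get(synthetic_id(path, mtime, length))
    if hit is None:
        return None
    key, data = hit
    # Collision check: the synthetic id matched, but is it the same file?
    return data if key == (path, mtime, length) else None

cache_put("/warehouse/t/part-0", 1000, 4096, b"stripe-data")
assert cache_get("/warehouse/t/part-0", 1000, 4096) == b"stripe-data"
# A changed file (different mtime) must not be served from cache:
assert cache_get("/warehouse/t/part-0", 2000, 4096) is None
```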
[jira] [Updated] (HIVE-13262) LLAP: Remove log levels from DebugUtils
[ https://issues.apache.org/jira/browse/HIVE-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13262: - Attachment: HIVE-13262.2.patch Addressed [~sershe]'s review comments. > LLAP: Remove log levels from DebugUtils > --- > > Key: HIVE-13262 > URL: https://issues.apache.org/jira/browse/HIVE-13262 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13262.1.patch, HIVE-13262.2.patch > > > DebugUtils has many hardcoded log levels. To enable logging we need to > recompile code with desired value. Instead configure add loggers for these > classes with log levels via log4j properties. Also use parametrized logging > in IO elevator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196097#comment-15196097 ] Aihua Xu commented on HIVE-13286: - I see. I will disallow the input of queryId and generate a new one every time then. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196100#comment-15196100 ] Vikram Dixit K commented on HIVE-13286: --- Yeah. The same queryId causes issues. We should disallow a change from the client. The HIVE_LOG_TRACE_ID is already present in the hive configuration. You could use that. > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196105#comment-15196105 ] Vikram Dixit K commented on HIVE-13286: --- Great! Thanks! > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12619) Switching the field order within an array of structs causes the query to fail
[ https://issues.apache.org/jira/browse/HIVE-12619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196198#comment-15196198 ] Xuefu Zhang commented on HIVE-12619: Patch #3 seems simpler and fixing the field ordering issue. Looking good on my side. +1. [~spena], it would be good if you can take a look too. > Switching the field order within an array of structs causes the query to fail > - > > Key: HIVE-12619 > URL: https://issues.apache.org/jira/browse/HIVE-12619 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Ang Zhang >Assignee: Mohammad Kamrul Islam >Priority: Minor > Attachments: HIVE-12619.2.patch, HIVE-12619.3.patch > > > Switching the field order within an array of structs causes the query to fail > or return the wrong data for the fields, but switching the field order within > just a struct works. > How to reproduce: > Case1 if the two fields have the same type, query will return wrong data for > the fields > drop table if exists schema_test; > create table schema_test (msg array>) stored > as parquet; > insert into table schema_test select stack(2, array(named_struct('f1', 'abc', > 'f2', 'abc2')), array(named_struct('f1', 'efg', 'f2', 'efg2'))) from one > limit 2; > select * from schema_test; > --returns > --[{"f1":"efg","f2":"efg2"}] > --[{"f1":"abc","f2":"abc2"}] > alter table schema_test change msg msg array>; > select * from schema_test; > --returns > --[{"f2":"efg","f1":"efg2"}] > --[{"f2":"abc","f1":"abc2"}] > Case2: if the two fields have different type, the query will fail > drop table if exists schema_test; > create table schema_test (msg array>) stored as > parquet; > insert into table schema_test select stack(2, array(named_struct('f1', 'abc', > 'f2', 1)), array(named_struct('f1', 'efg', 'f2', 2))) from one limit 2; > select * from schema_test; > --returns > --[{"f1":"efg","f2":2}] > --[{"f1":"abc","f2":1}] > alter table schema_test change msg msg array>; > select * from schema_test; > 
Failed with exception > java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to > org.apache.hadoop.io.IntWritable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
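[Editorial note] Both symptoms above are consistent with the reader pairing table columns with file columns by position rather than by name. A minimal sketch of the difference (illustrative Python, not the Parquet SerDe's actual resolution logic):

```python
# The file stores struct fields in their original order, while the table
# schema was altered to list them in a new order.
file_fields = ["f1", "f2"]       # order as written to the Parquet file
file_values = ["abc", 1]         # "f1"='abc' (string), "f2"=1 (int)
table_fields = ["f2", "f1"]      # order after ALTER TABLE ... CHANGE

def read_by_position():
    # Broken: pairs table field names with file values positionally,
    # so values land under the wrong names (and wrong types).
    return dict(zip(table_fields, file_values))

def read_by_name():
    # Correct: looks each table field up by name in the file schema.
    by_name = dict(zip(file_fields, file_values))
    return {f: by_name[f] for f in table_fields}

assert read_by_position() == {"f2": "abc", "f1": 1}  # swapped / mistyped
assert read_by_name() == {"f2": 1, "f1": "abc"}      # correct
```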
[jira] [Updated] (HIVE-13232) Aggressively drop compression buffers in ORC OutStreams
[ https://issues.apache.org/jira/browse/HIVE-13232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-13232: - Resolution: Fixed Fix Version/s: (was: 1.3.0) (was: 0.14.1) Status: Resolved (was: Patch Available) > Aggressively drop compression buffers in ORC OutStreams > --- > > Key: HIVE-13232 > URL: https://issues.apache.org/jira/browse/HIVE-13232 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.1.0 > > Attachments: HIVE-13232.patch, HIVE-13232.patch, HIVE-13232.patch > > > In Hive 0.11, when ORC's OutStream's were flushed they dropped all of the > their buffers. In the patch for HIVE-4324, we inadvertently changed that > behavior so that one of the buffers is held on to. For queries with a lot of > writers and thus under significant memory pressure this can have a > significant impact on the memory usage. > Note that "hive.optimize.sort.dynamic.partition" avoids this problem by > sorting on the dynamic partition key and thus only a single ORC writer is > open at once. This will use memory more effectively and avoid creating ORC > files with very small stripes, which will produce better downstream > performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
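[Editorial note] The memory effect described above is easy to quantify with a toy model (illustrative Python; buffer size and writer count are made-up, and this is not ORC's actual writer code): retaining even one compression buffer per stream adds up quickly when many writers are open at once.

```python
class OutStream:
    BUFFER_SIZE = 256 * 1024  # hypothetical compression buffer size

    def __init__(self, keep_one_buffer):
        self.buffers = [bytearray(self.BUFFER_SIZE)]
        self.keep_one_buffer = keep_one_buffer

    def flush(self):
        # ... write buffers out, then either drop them all (aggressive)
        # or keep one alive (the regression being fixed here).
        self.buffers = self.buffers[:1] if self.keep_one_buffer else []

    def retained_bytes(self):
        return sum(len(b) for b in self.buffers)

writers = 1000  # e.g. dynamic partitioning with many open ORC writers
leaky = [OutStream(keep_one_buffer=True) for _ in range(writers)]
tight = [OutStream(keep_one_buffer=False) for _ in range(writers)]
for s in leaky + tight:
    s.flush()

assert sum(s.retained_bytes() for s in tight) == 0
# ~250 MiB held across 1000 streams just from one retained buffer each:
assert sum(s.retained_bytes() for s in leaky) == writers * OutStream.BUFFER_SIZE
```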
[jira] [Updated] (HIVE-12995) LLAP: Synthetic file ids need collision checks
[ https://issues.apache.org/jira/browse/HIVE-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12995: Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) Committed to master. > LLAP: Synthetic file ids need collision checks > -- > > Key: HIVE-12995 > URL: https://issues.apache.org/jira/browse/HIVE-12995 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Fix For: 2.1.0 > > Attachments: HIVE-12995.01.patch, HIVE-12995.02.patch, > HIVE-12995.03.patch, HIVE-12995.04.patch, HIVE-12995.patch > > > LLAP synthetic file ids do not have any way of checking whether a collision > occurs other than a data-error. > Synthetic file-ids have only been used with unit tests so far - but they will > be needed to add cache mechanisms to non-HDFS filesystems. > In case of Synthetic file-ids, it is recommended that we track the full-tuple > (path, mtime, len) in the cache so that a cache-hit for the synthetic file-id > can be compared against the parameters & only accepted if those match. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-13286: Attachment: HIVE-13286.1.patch > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-13286: Status: Patch Available (was: Open) > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13286) Query ID is being reused across queries
[ https://issues.apache.org/jira/browse/HIVE-13286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196271#comment-15196271 ] Aihua Xu commented on HIVE-13286: - Attached the patch-1: disallow the input of the queryId. queryId will be regenerated and put in confOverlay for each query. [~vikram.dixit] Can you take a look? > Query ID is being reused across queries > --- > > Key: HIVE-13286 > URL: https://issues.apache.org/jira/browse/HIVE-13286 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0 >Reporter: Vikram Dixit K >Assignee: Aihua Xu >Priority: Critical > Attachments: HIVE-13286.1.patch > > > [~aihuaxu] I see this commit made via HIVE-11488. I see that query id is > being reused across queries. This defeats the purpose of a query id. I am not > sure what the purpose of the change in that jira is but it breaks the > assumption about a query id being unique for each query. Please take a look > into this at the earliest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12374) Improve setting of JVM configs for HS2 and Metastore shell scripts
[ https://issues.apache.org/jira/browse/HIVE-12374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196344#comment-15196344 ] Ashutosh Chauhan commented on HIVE-12374: - This has come up multiple times. Shall we get this in [~thejas] ? > Improve setting of JVM configs for HS2 and Metastore shell scripts > --- > > Key: HIVE-12374 > URL: https://issues.apache.org/jira/browse/HIVE-12374 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-12374.1.patch > > > Adding {{HIVESERVER2_JVM_OPTS}} and {{METASTORE_JVM_OPTS}} env variables, > which will eventually set {{HADOOP_CLIENT_OPTS}} (since we start the > processes using hadoop jar ...). Also setting these defaults:{{-Xms128m > -Xmx2048m -XX:MaxPermSize=128m}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13290) Support primary keys/foreign keys constraint as part of create table command in Hive
[ https://issues.apache.org/jira/browse/HIVE-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13290: - Attachment: HIVE-13290.1.patch > Support primary keys/foreign keys constraint as part of create table command > in Hive > > > Key: HIVE-13290 > URL: https://issues.apache.org/jira/browse/HIVE-13290 > Project: Hive > Issue Type: Sub-task > Components: CBO, Logical Optimizer >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13290.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13290) Support primary keys/foreign keys constraint as part of create table command in Hive
[ https://issues.apache.org/jira/browse/HIVE-13290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196362#comment-15196362 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13290: -- Draft #1 with the basic changes to accept primary keys/foreign keys in the create statement, plus the APIs to retrieve them, which are exposed via desc extended tablename; cc [~ashutoshc]. Will improve on this and add test cases to cover any existing issues. > Support primary keys/foreign keys constraint as part of create table command > in Hive > > > Key: HIVE-13290 > URL: https://issues.apache.org/jira/browse/HIVE-13290 > Project: Hive > Issue Type: Sub-task > Components: CBO, Logical Optimizer >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13290.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196415#comment-15196415 ] Prasanth Jayachandran edited comment on HIVE-13178 at 3/15/16 10:51 PM: I haven't gone through the core changes yet. Left some initial comments. My main concern is that we are bringing ObjectInspector back into the tree readers, which will make it difficult to separate ORC out of Hive. If this feature is targeted to be supported inside of ORC, then these object inspectors should be replaced by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle type conversions. was (Author: prasanth_j): I haven't gone through core changes. Left some initial comments. My main concern is that we are bringing ObjectInspector back into the tree readers, which will make it difficult to separate ORC out of Hive. If this feature is targeted to be supported inside of ORC, then these object inspectors should be replaced by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle type conversions. > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle ORC data type conversions permitted by Implicit conversion allowed by > TypeIntoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196415#comment-15196415 ] Prasanth Jayachandran commented on HIVE-13178: -- I haven't gone through core changes. Left some initial comments. My main concern is that we are bringing ObjectInspector back into the tree readers, which will make it difficult to separate ORC out of Hive. If this feature is targeted to be supported inside of ORC, then these object inspectors should be replaced by TypeDescriptors. Also, it would be good to subclass TreeReaderFactory to handle type conversions. > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle ORC data type conversions permitted by Implicit conversion allowed by > TypeIntoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
[ https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196423#comment-15196423 ] Hive QA commented on HIVE-11675: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793363/HIVE-11675.14.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9827 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7275/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7275/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7275/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12793363 - PreCommit-HIVE-TRUNK-Build > make use of file footer PPD API in ETL strategy or separate strategy > > > Key: HIVE-11675 > URL: https://issues.apache.org/jira/browse/HIVE-11675 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, > HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, > HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, > HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, > HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, > HIVE-11675.patch, HIVE-11675.premature.opti.patch > > > Need to take a look at the best flow. It won't be much different if we do > filtering metastore call for each partition. So perhaps we'd need the custom > sync point/batching after all. > Or we can make it opportunistic and not fetch any footers unless it can be > pushed down to metastore or fetched from local cache, that way the only slow > threaded op is directory listings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11388) there should only be 1 Initiator for compactions per Hive installation
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196428#comment-15196428 ] Alan Gates commented on HIVE-11388: --- TxnHandler.isDuplicateKeyError - There are only cases in here for Derby and MySQL. The other options will need to be added before this is committed. Other than that looks good. > there should only be 1 Initiator for compactions per Hive installation > -- > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.patch > > > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session
[ https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] NITHIN MAHESH updated HIVE-13264: - Attachment: HIVE-13264.patch Fixes HIVE-13264 by refactoring the code to retry in the openSession layer instead of openTransport. > JDBC driver makes 2 Open Session Calls for every open session > - > > Key: HIVE-13264 > URL: https://issues.apache.org/jira/browse/HIVE-13264 > Project: Hive > Issue Type: Bug > Components: JDBC >Reporter: NITHIN MAHESH >Assignee: NITHIN MAHESH > Attachments: HIVE-13264.patch > > > When HTTP is used as the transport mode by the Hive JDBC driver, we noticed > that there is an additional open/close session just to validate the > connection. > > TCLIService.Iface client = new TCLIService.Client(new > TBinaryProtocol(transport)); > TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq()); > if (openResp != null) { > client.CloseSession(new > TCloseSessionReq(openResp.getSessionHandle())); > } > > The open session call is a costly one and should not be used to test the > transport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13178: Status: In Progress (was: Patch Available) > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle ORC data type conversions permitted by Implicit conversion allowed by > TypeIntoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13178: Status: Patch Available (was: In Progress) > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch, HIVE-13178.04.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle ORC data type conversions permitted by Implicit conversion allowed by > TypeIntoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13178) Enhance ORC Schema Evolution to handle more standard data type conversions
[ https://issues.apache.org/jira/browse/HIVE-13178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13178: Attachment: HIVE-13178.04.patch Rebase with recent commits. > Enhance ORC Schema Evolution to handle more standard data type conversions > -- > > Key: HIVE-13178 > URL: https://issues.apache.org/jira/browse/HIVE-13178 > Project: Hive > Issue Type: Bug > Components: Hive, ORC >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-13178.01.patch, HIVE-13178.02.patch, > HIVE-13178.03.patch, HIVE-13178.04.patch > > > Currently, SHORT -> INT -> BIGINT is supported. > Handle ORC data type conversions permitted by Implicit conversion allowed by > TypeIntoUtils.implicitConvertible method. >* STRING_GROUP -> DOUBLE >* STRING_GROUP -> DECIMAL >* DATE_GROUP -> STRING >* NUMERIC_GROUP -> STRING >* STRING_GROUP -> STRING_GROUP >* >* // Upward from "lower" type to "higher" numeric type: >* BYTE -> SHORT -> INT -> BIGINT -> FLOAT -> DOUBLE -> DECIMAL -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13264) JDBC driver makes 2 Open Session Calls for every open session
[ https://issues.apache.org/jira/browse/HIVE-13264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] NITHIN MAHESH updated HIVE-13264: - Labels: jdbc (was: ) Affects Version/s: 1.2.1 Target Version/s: 1.2.1, 1.2.0 Tags: jdbc Status: Patch Available (was: Open) Refactored the code in HiveConnection to do the connection retry at the openSession level instead of in OpenConnection. > JDBC driver makes 2 Open Session Calls for every open session > - > > Key: HIVE-13264 > URL: https://issues.apache.org/jira/browse/HIVE-13264 > Project: Hive > Issue Type: Bug > Components: JDBC >Affects Versions: 1.2.1 >Reporter: NITHIN MAHESH >Assignee: NITHIN MAHESH > Labels: jdbc > Attachments: HIVE-13264.patch > > > When HTTP is used as the transport mode by the Hive JDBC driver, we noticed > that there is an additional open/close session just to validate the > connection. > > TCLIService.Iface client = new TCLIService.Client(new > TBinaryProtocol(transport)); > TOpenSessionResp openResp = client.OpenSession(new TOpenSessionReq()); > if (openResp != null) { > client.CloseSession(new > TCloseSessionReq(openResp.getSessionHandle())); > } > > The open session call is a costly one and should not be used to test the > transport. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
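The refactoring described in this patch (retry at the openSession level rather than probing the transport with a throwaway session) can be sketched generically. This is a hedged illustration only: the `Callable`-based wrapper and retry policy below are stand-ins, not Hive's actual HiveConnection code; the point is that retrying the OpenSession call itself both establishes and validates the connection, so no extra open/close probe is needed.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: retry the whole open-session exchange instead of
// probing the transport with a separate open/close session pair.
public class OpenSessionRetry {
    // Retries 'openSession' up to maxAttempts times; each attempt is one
    // OpenSession RPC, which validates the transport as a side effect.
    public static <T> T withRetry(Callable<T> openSession, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return openSession.call();
            } catch (Exception e) {
                last = e;  // transport-level failure; try again
            }
        }
        throw last;  // all attempts failed; surface the last error
    }
}
```

With this shape, the costly OpenSession call happens exactly once on the happy path, instead of the open/close-then-open sequence quoted in the description.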
[jira] [Updated] (HIVE-13288) Confusing exception message in DagUtils.localizeResource
[ https://issues.apache.org/jira/browse/HIVE-13288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated HIVE-13288: -- Affects Version/s: 1.2.1 Component/s: Clients > Confusing exception message in DagUtils.localizeResource > > > Key: HIVE-13288 > URL: https://issues.apache.org/jira/browse/HIVE-13288 > Project: Hive > Issue Type: Improvement > Components: Clients >Affects Versions: 1.2.1 >Reporter: Jeff Zhang > > I got the following exception when querying through HiveServer2. Checking the > source code, it is due to an error when copying data from local to HDFS. > But the IOException is ignored on the assumption that another thread is > also writing the same file. I don't think it makes sense to assume that; at the least we should log > the IOException. > {code} > LOG.info("Localizing resource because it does not exist: " + src + " to dest: > " + dest); > try { > destFS.copyFromLocalFile(false, false, src, dest); > } catch (IOException e) { > LOG.info("Looks like another thread is writing the same file will > wait."); > int waitAttempts = > > conf.getInt(HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.varname, > > HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_NUM_WAIT_ATTEMPTS.defaultIntVal); > long sleepInterval = HiveConf.getTimeVar( > conf, HiveConf.ConfVars.HIVE_LOCALIZE_RESOURCE_WAIT_INTERVAL, > TimeUnit.MILLISECONDS); > LOG.info("Number of wait attempts: " + waitAttempts + ". Wait > interval: " > + sleepInterval); > boolean found = false; > {code} > {noformat} > 2016-03-15 11:25:39,921 INFO [HiveServer2-Background-Pool: Thread-249]: > tez.DagUtils (DagUtils.java:getHiveJarDirectory(876)) - Jar dir is > null/directory doesn't exist. 
Choosing HIVE_INSTALL_DIR - /user/jeff/.hiveJars > 2016-03-15 11:25:40,058 INFO [HiveServer2-Background-Pool: Thread-249]: > tez.DagUtils (DagUtils.java:localizeResource(952)) - Localizing resource > because it does not exist: > file:/usr/hdp/2.3.2.0-2950/hive/lib/hive-exec-1.2.1.2.3.2.0-2950.jar to dest: > hdfs://sandbox.hortonworks.com:8020/user/jeff/.hiveJars/hive-exec-1.2.1.2.3.2.0-2950-a97c953db414a4f792d868e2b0417578a61ccfa368048016926117b641b07f34.jar > 2016-03-15 11:25:40,063 INFO [HiveServer2-Background-Pool: Thread-249]: > tez.DagUtils (DagUtils.java:localizeResource(956)) - Looks like another > thread is writing the same file will wait. > 2016-03-15 11:25:40,064 INFO [HiveServer2-Background-Pool: Thread-249]: > tez.DagUtils (DagUtils.java:localizeResource(963)) - Number of wait attempts: > 5. Wait interval: 5000 > 2016-03-15 11:25:53,548 INFO [HiveServer2-Handler-Pool: Thread-48]: > thrift.ThriftCLIService (ThriftCLIService.java:OpenSession(294)) - Client > protocol version: HIVE_CLI_SERVICE_PROTOCOL_V8 > 2016-03-15 11:25:53,548 INFO [HiveServer2-Handler-Pool: Thread-48]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Shutting down > the object store... > 2016-03-15 11:25:53,549 INFO [HiveServer2-Handler-Pool: Thread-48]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - > ugi=hive/sandbox.hortonworks@example.com ip=unknown-ip-addr > cmd=Shutting down the object store... > 2016-03-15 11:25:53,549 INFO [HiveServer2-Handler-Pool: Thread-48]: > metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 1: Metastore > shutdown complete. > 2016-03-15 11:25:53,549 INFO [HiveServer2-Handler-Pool: Thread-48]: > HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - > ugi=hive/sandbox.hortonworks@example.com ip=unknown-ip-addr > cmd=Metastore shutdown complete. 
> 2016-03-15 11:25:53,573 INFO [HiveServer2-Handler-Pool: Thread-48]: > session.SessionState (SessionState.java:createPath(641)) - Created local > directory: /tmp/e43fbaab-a659-4331-90cb-0ea0b2098e25_resources > 2016-03-15 11:25:53,577 INFO [HiveServer2-Handler-Pool: Thread-48]: > session.SessionState (SessionState.java:createPath(641)) - Created HDFS > directory: /tmp/hive/ambari-qa/e43fbaab-a659-4331-90cb-0ea0b2098e25 > 2016-03-15 11:25:53,582 INFO [HiveServer2-Handler-Pool: Thread-48]: > session.SessionState (SessionState.java:createPath(641)) - Created local > directory: /tmp/hive/e43fbaab-a659-4331-90cb-0ea0b2098e25 > 2016-03-15 11:25:53,587 INFO [HiveServer2-Handler-Pool: Thread-48]: > session.SessionState (SessionState.java:createPath(641)) - Created HDFS > directory: > /tmp/hive/ambari-qa/e43fbaab-a659-4331-90cb-0ea0b2098e25/_tmp_space.db > 2016-03-15 11:25:53,592 INFO [HiveServer2-Handler-Pool: Thread-48]: > session.HiveSessionImpl (HiveSessionImpl.java:setOperationLogSessionDir(236)) > - Operation log session directory is created: > /home/hive/${sy
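The fix the reporter asks for — keep the wait-for-concurrent-writer fallback but stop discarding the cause — could look roughly like this. This is an illustrative sketch, not the actual DagUtils code: the `CopyAction` interface and `java.util.logging` logger are stand-ins for Hive's FileSystem call and SLF4J logger.

```java
import java.io.IOException;
import java.util.logging.Logger;

// Illustrative sketch (not DagUtils itself): keep the "another thread may be
// writing" fallback, but log the original IOException instead of dropping it.
public class LocalizeSketch {
    private static final Logger LOG = Logger.getLogger("LocalizeSketch");

    interface CopyAction { void copy() throws IOException; }

    // Returns the message that was logged (null on success), so callers can
    // see that the failure cause is preserved rather than swallowed.
    public static String tryCopy(CopyAction action) {
        try {
            action.copy();
            return null;  // copy succeeded, nothing to log
        } catch (IOException e) {
            // Record the real failure before assuming a concurrent writer.
            String msg = "Copy failed (" + e.getMessage()
                + "); assuming another thread is writing the same file and will wait.";
            LOG.info(msg);
            return msg;
        }
    }
}
```

Even when the concurrent-writer assumption is usually right, carrying the exception text into the log line turns a misleading "will wait" message into one that is diagnosable when the real cause is, say, a permissions or quota error.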
[jira] [Commented] (HIVE-13027) Async loggers for LLAP
[ https://issues.apache.org/jira/browse/HIVE-13027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196507#comment-15196507 ] Prasanth Jayachandran commented on HIVE-13027: -- I tried another query at TPCDS 1TB scale. The runtimes for queries at WARN and INFO levels are 18.1s vs 18.5s respectively. One thing I am noticing is that the mere presence of the disruptor jar in the classpath triggers async logging even without the -DLog4jContextSelector system property. I don't know why yet. > Async loggers for LLAP > -- > > Key: HIVE-13027 > URL: https://issues.apache.org/jira/browse/HIVE-13027 > Project: Hive > Issue Type: Improvement > Components: Logging >Affects Versions: 2.1.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13027.1.patch > > > LOG4j2's async logger claims to have 6-68 times better performance than the > synchronous logger. https://logging.apache.org/log4j/2.x/manual/async.html > We should use that for LLAP. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196513#comment-15196513 ] Eugene Koifman commented on HIVE-12439: --- what about TestTxnCommands2? This is certainly a relevant test > CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements > -- > > Key: HIVE-12439 > URL: https://issues.apache.org/jira/browse/HIVE-12439 > Project: Hive > Issue Type: Improvement > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-12439.1.patch > > > # add a safeguard to make sure IN clause is not too large; break up by txn id > to delete from TXN_COMPONENTS where tc_txnid in ... > # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, > rather than 1 DB roundtrip per row -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11388: -- Summary: Allow ACID Compactor components to run in multiple metastores (was: there should only be 1 Initiator for compactions per Hive installation) > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.patch > > > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). > This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11388) Allow ACID Compactor components to run in multiple metastores
[ https://issues.apache.org/jira/browse/HIVE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-11388: -- Description: (this description is no longer accurate; see further comments) org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs inside the metastore service to manage compactions of ACID tables. There should be exactly 1 instance of this thread (even with multiple Thrift services). This is documented in https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration but not enforced. Should add enforcement, since more than 1 Initiator could cause concurrent attempts to compact the same table/partition - which will not work. was: org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs inside the metastore service to manage compactions of ACID tables. There should be exactly 1 instance of this thread (even with multiple Thrift services). This is documented in https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration but not enforced. Should add enforcement, since more than 1 Initiator could cause concurrent attempts to compact the same table/partition - which will not work. > Allow ACID Compactor components to run in multiple metastores > - > > Key: HIVE-11388 > URL: https://issues.apache.org/jira/browse/HIVE-11388 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-11388.patch > > > (this description is no longer accurate; see further comments) > org.apache.hadoop.hive.ql.txn.compactor.Initiator is a thread that runs > inside the metastore service to manage compactions of ACID tables. There > should be exactly 1 instance of this thread (even with multiple Thrift > services). 
> This is documented in > https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-Configuration > but not enforced. > Should add enforcement, since more than 1 Initiator could cause concurrent > attempts to compact the same table/partition - which will not work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196525#comment-15196525 ] Wei Zheng commented on HIVE-12439: -- Oh that's not a real failure. It's complaining about no TEST-*.xml file. > CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements > -- > > Key: HIVE-12439 > URL: https://issues.apache.org/jira/browse/HIVE-12439 > Project: Hive > Issue Type: Improvement > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-12439.1.patch > > > # add a safeguard to make sure IN clause is not too large; break up by txn id > to delete from TXN_COMPONENTS where tc_txnid in ... > # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, > rather than 1 DB roundtrip per row -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196536#comment-15196536 ] Eugene Koifman commented on HIVE-12439: --- BTW, this patch no longer applies to current master > CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements > -- > > Key: HIVE-12439 > URL: https://issues.apache.org/jira/browse/HIVE-12439 > Project: Hive > Issue Type: Improvement > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-12439.1.patch > > > # add a safeguard to make sure IN clause is not too large; break up by txn id > to delete from TXN_COMPONENTS where tc_txnid in ... > # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, > rather than 1 DB roundtrip per row -- This message was sent by Atlassian JIRA (v6.3.4#6332)
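The two improvements listed in the HIVE-12439 description — bounding the IN clause by breaking up the txn-id set, and issuing one multi-row insert instead of one round trip per row — can be sketched as follows. This is an illustration only: the batch size and exact SQL text are assumptions, not the patch's actual constants or statements.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the two improvements in the description; batch
// size and exact SQL shape are assumptions, not the patch's real code.
public class TxnSqlBatcher {
    // 1. Safeguard: break a large txn-id set into bounded IN-clause deletes
    //    so no single statement exceeds the database's practical IN limit.
    public static List<String> deleteInBatches(List<Long> txnIds, int maxPerIn) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < txnIds.size(); i += maxPerIn) {
            List<Long> batch = txnIds.subList(i, Math.min(i + maxPerIn, txnIds.size()));
            StringBuilder sb = new StringBuilder("delete from TXN_COMPONENTS where tc_txnid in (");
            for (int j = 0; j < batch.size(); j++) {
                if (j > 0) sb.append(",");
                sb.append(batch.get(j));
            }
            stmts.add(sb.append(")").toString());
        }
        return stmts;
    }

    // 2. openTxns(): one insert with many rows in the values() clause,
    //    rather than one DB round trip per row.
    public static String multiRowInsert(long firstTxnId, int numTxns, String state) {
        StringBuilder sb = new StringBuilder("insert into TXNS (txn_id, txn_state) values ");
        for (int i = 0; i < numTxns; i++) {
            if (i > 0) sb.append(", ");
            sb.append("(").append(firstTxnId + i).append(", '").append(state).append("')");
        }
        return sb.toString();
    }
}
```

Opening N transactions then costs one statement instead of N, and the delete path never builds an unbounded IN list regardless of how many components a cleaned compaction touched.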
[jira] [Updated] (HIVE-12049) Provide an option to write serialized thrift objects in final tasks
[ https://issues.apache.org/jira/browse/HIVE-12049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohit Dholakia updated HIVE-12049: -- Attachment: HIVE-12049.14.patch > Provide an option to write serialized thrift objects in final tasks > --- > > Key: HIVE-12049 > URL: https://issues.apache.org/jira/browse/HIVE-12049 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2 >Reporter: Rohit Dholakia >Assignee: Rohit Dholakia > Attachments: HIVE-12049.1.patch, HIVE-12049.11.patch, > HIVE-12049.12.patch, HIVE-12049.13.patch, HIVE-12049.14.patch, > HIVE-12049.2.patch, HIVE-12049.3.patch, HIVE-12049.4.patch, > HIVE-12049.5.patch, HIVE-12049.6.patch, HIVE-12049.7.patch, HIVE-12049.9.patch > > > For each fetch request to HiveServer2, we pay the penalty of deserializing > the row objects and translating them into a different representation suitable > for the RPC transfer. In a moderate to high concurrency scenarios, this can > result in significant CPU and memory wastage. By having each task write the > appropriate thrift objects to the output files, HiveServer2 can simply stream > a batch of rows on the wire without incurring any of the additional cost of > deserialization and translation. > This can be implemented by writing a new SerDe, which the FileSinkOperator > can use to write thrift formatted row batches to the output file. Using the > pluggable property of the {{hive.query.result.fileformat}}, we can set it to > use SequenceFile and write a batch of thrift formatted rows as a value blob. > The FetchTask can now simply read the blob and send it over the wire. On the > client side, the *DBC driver can read the blob and since it is already > formatted in the way it expects, it can continue building the ResultSet the > way it does in the current implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11675) make use of file footer PPD API in ETL strategy or separate strategy
[ https://issues.apache.org/jira/browse/HIVE-11675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11675: Resolution: Fixed Fix Version/s: 2.1.0 Status: Resolved (was: Patch Available) Committed to master after resolving conflicts. > make use of file footer PPD API in ETL strategy or separate strategy > > > Key: HIVE-11675 > URL: https://issues.apache.org/jira/browse/HIVE-11675 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.1.0 > > Attachments: HIVE-11675.01.patch, HIVE-11675.02.patch, > HIVE-11675.03.patch, HIVE-11675.04.patch, HIVE-11675.05.patch, > HIVE-11675.06.patch, HIVE-11675.07.patch, HIVE-11675.08.patch, > HIVE-11675.09.patch, HIVE-11675.10.patch, HIVE-11675.11.patch, > HIVE-11675.12.patch, HIVE-11675.13.patch, HIVE-11675.14.patch, > HIVE-11675.patch, HIVE-11675.premature.opti.patch > > > Need to take a look at the best flow. It won't be much different if we do > filtering metastore call for each partition. So perhaps we'd need the custom > sync point/batching after all. > Or we can make it opportunistic and not fetch any footers unless it can be > pushed down to metastore or fetched from local cache, that way the only slow > threaded op is directory listings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13249) Hard upper bound on number of open transactions
[ https://issues.apache.org/jira/browse/HIVE-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196611#comment-15196611 ] Eugene Koifman commented on HIVE-13249: --- AcidHouseKeeper uses a ScheduledExecutorService. It can run multiple tasks on separate schedules. I don't think there was ever a proposal to run any threads on the client side. The idea Wei and I discussed was to run a single thread per metastore JVM to count the number of txns periodically and check the computed value in TxnHandler.openTxns() each time it's called. I think this is conceptually the same as your idea. It's easy enough to have HouseKeeper run multiple tasks, but it complicates testing since it makes it harder to just run one iteration of a particular task. We'd need to do some refactoring in the HouseKeepers to make sure this is possible - then they can be combined into a single HouseKeeper that runs multiple periodic tasks. Wei, I said earlier that putting this computation in AcidHouseKeeper was a bad idea, but I was wrong. Since there is a single AcidHouseKeeper per JVM, the task that it runs can easily just set a static variable on TxnHandler with the results of the computation, which openTxns() can read. As far as testing, look at TestTxnHandler for example; there are multiple examples of openTxns() calls. In fact each call can open many txns at once. TestTxnCommands.testTimeOutReaper() has an example of how to run the HouseKeeper in a UT, but like I said, you'd need to refactor it a bit if you want to run multiple tasks in it. 
> Hard upper bound on number of open transactions > --- > > Key: HIVE-13249 > URL: https://issues.apache.org/jira/browse/HIVE-13249 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13249.1.patch > > > We need to have a safeguard by adding an upper bound for open transactions to > avoid huge number of open-transaction requests, usually due to improper > configuration of clients such as Storm. > Once that limit is reached, clients will start failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
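The mechanism described in the comment above — a periodic housekeeper task publishing an open-transaction count into a static on TxnHandler, which openTxns() checks before granting new txns — can be sketched as follows. This is a minimal illustration, not Hive's actual code: `TxnCounter`, `MAX_OPEN_TXNS`, and the method names are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the "single counting thread per metastore JVM" idea.
// TxnCounter is an illustrative name, not part of Hive's API.
class TxnCounter {
    static final long MAX_OPEN_TXNS = 100_000;
    private static final AtomicLong openTxnCount = new AtomicLong(0);

    // The periodic housekeeper task would call this with the count it computed
    // from the TXNS table.
    static void publish(long counted) {
        openTxnCount.set(counted);
    }

    // openTxns() would call this before opening new transactions; the check is
    // cheap because it only reads the value the housekeeper last published.
    static void checkLimit() {
        if (openTxnCount.get() >= MAX_OPEN_TXNS) {
            throw new IllegalStateException("maximum number of open transactions reached");
        }
    }
}
```

Because the count is only refreshed periodically, the bound is soft: the number of open txns can briefly overshoot the limit between housekeeper iterations, which is acceptable for a safeguard of this kind.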
[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196615#comment-15196615 ] Hive QA commented on HIVE-13084: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793547/HIVE-13084.03.patch {color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 9800 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vector_decimal_10_0.q-vector_acid3.q-vector_decimal_trailing.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_distinct_2.q-vector_interval_2.q-load_dyn_part2.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_multi_and_projection org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_multi_or_projection org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_multi_and_projection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7277/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7277/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7277/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing 
org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12793547 - PreCommit-HIVE-TRUNK-Build > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. > e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Attachment: HIVE-13291.1.patch > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Status: Patch Available (was: Open) > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196695#comment-15196695 ] Prasanth Jayachandran commented on HIVE-13291: -- [~gopalv] Could you please review this patch? > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196711#comment-15196711 ] Prasanth Jayachandran commented on HIVE-13291: -- [~gopalv] Addressed your review comments about using locations with offsets. > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Attachment: HIVE-13291.2.patch > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196717#comment-15196717 ] Gopal V commented on HIVE-13291: Left some minor comments about that loop. Approach LGTM - +1, tests pending. > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
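The fix discussed in this thread — emitting one split per block-sized range rather than one split per file — amounts to the following loop. This is an illustrative sketch only; the class and method names are hypothetical and this is not the actual OrcInputFormat patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of block-boundary split generation for the BI strategy.
class BiSplitSketch {
    // Returns {offset, length} pairs covering [0, fileLen) in chunks of at
    // most blockSize bytes, so each split aligns with a block boundary.
    static List<long[]> blockSplits(long fileLen, long blockSize) {
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLen; offset += blockSize) {
            splits.add(new long[]{offset, Math.min(blockSize, fileLen - offset)});
        }
        return splits;
    }
}
```

For a 250-byte file with a 100-byte block size this yields three splits at offsets 0, 100, and 200, instead of the single whole-file split the BI strategy produced before.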
[jira] [Commented] (HIVE-12439) CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196731#comment-15196731 ] Eugene Koifman commented on HIVE-12439: --- 1. CompactionTxnHandler.cleanEmptyAborted() - why rewrite "String s = "select txn_id from TXNS where " + "txn_id not in (select tc_txnid from TXN_COMPONENTS) and " + "txn_state = '" + TXN_ABORTED + "'";"? The IN clause here doesn't list values - it's not (and in fact cannot be) subject to the 1000-element or any other limit. Also, part of your rewrite lost "LOG.info("Removed " + rc + " empty Aborted transactions: " + txnIdBatch + " from TXNS");" This is a critical debug/support log statement - it logs the actual txn IDs that were cleared. 2. TxnHandler.openTxns() - " if (i > first) { valuesClause.append(", "); } " will generate a query with "values,(..." if the previous "if" with METASTORE_DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE executes. This is a nit, but this class has quoteString() and quoteChar() to generate SQL with string values. 3. TxnHandler.timeOutLocks() - why does this need a suffix at all? The extra parentheses seem redundant. 4. TxnHandler.abortTxns() - there seems to be a redundant set of parentheses wrapping the IN clause. Why is this necessary? 5. TestTxnUtils - I think this test is very limited. It would be better (in addition) to add some tests that will actually cause the new queries to execute in a DB (Derby in practice), in particular cases where the limits set by the 2 new properties are exceeded. I think that would provide better test coverage. 
> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements > -- > > Key: HIVE-12439 > URL: https://issues.apache.org/jira/browse/HIVE-12439 > Project: Hive > Issue Type: Improvement > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-12439.1.patch > > > # add a safeguard to make sure IN clause is not too large; break up by txn id > to delete from TXN_COMPONENTS where tc_txnid in ... > # TxnHandler. openTxns() - use 1 insert with many rows in values() clause, > rather than 1 DB roundtrip per row -- This message was sent by Atlassian JIRA (v6.3.4#6332)
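The bug flagged in point 2 of the review above — a separator comparison against the start of the whole list instead of the start of the current batch, producing "values,(..." when a new statement begins — can be made concrete with a small sketch. Names here (ValuesBatcher, the TXNS column list) are illustrative, not the actual patch code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of batched multi-row INSERT generation, one statement
// per batch of at most maxPerStmt rows (the HIVE-12439 openTxns() approach).
class ValuesBatcher {
    static List<String> buildInserts(List<Long> txnIds, int maxPerStmt) {
        List<String> stmts = new ArrayList<>();
        for (int first = 0; first < txnIds.size(); first += maxPerStmt) {
            StringBuilder sb = new StringBuilder("insert into TXNS (txn_id) values ");
            int end = Math.min(first + maxPerStmt, txnIds.size());
            for (int i = first; i < end; i++) {
                // Compare against the start of *this* batch, not index 0 of the
                // whole list; otherwise every statement after the first would
                // start with "values, (..." -- the bug flagged in the review.
                if (i > first) sb.append(", ");
                sb.append("(").append(txnIds.get(i)).append(")");
            }
            stmts.add(sb.toString());
        }
        return stmts;
    }
}
```

With three txn IDs and a batch limit of 2, this emits two well-formed statements; the second correctly starts a fresh VALUES list rather than continuing the previous one.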
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Attachment: HIVE-13291.3.patch Addressed [~gopalv]'s RB comments. > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Attachment: (was: HIVE-13291.3.patch) > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13291) ORC BI Split strategy should consider block size instead of file size
[ https://issues.apache.org/jira/browse/HIVE-13291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13291: - Attachment: HIVE-13291.3.patch > ORC BI Split strategy should consider block size instead of file size > - > > Key: HIVE-13291 > URL: https://issues.apache.org/jira/browse/HIVE-13291 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Prasanth Jayachandran > Attachments: HIVE-13291.1.patch, HIVE-13291.2.patch, > HIVE-13291.3.patch > > > When we force split strategy to use "BI" (using > hive.exec.orc.split.strategy), entire file is considered as single split. > This might be inefficient when the files are large. Instead, BI should > consider splitting at block boundary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12977) Pass credentials in the current UGI while creating Tez session
[ https://issues.apache.org/jira/browse/HIVE-12977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196769#comment-15196769 ] Hive QA commented on HIVE-12977: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12793407/HIVE-12977.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 9829 tests executed *Failed tests:* {noformat} TestSparkCliDriver-groupby3_map.q-sample2.q-auto_join14.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_map_ppr_multi_distinct.q-table_access_keys_stats.q-groupby4_noskew.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_rc.q-insert1.q-vectorized_rcfile_columnar.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-ppd_join4.q-join9.q-ppd_join3.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_limit org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_hybridgrace_hashjoin_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_llap_nullscan org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_result_complex org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_decimal org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union_multiinsert org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_date_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_aggregate org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_expressions 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_mapjoin org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_partition_diff_num_cols org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_reduce2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorization_1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7279/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/7279/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-7279/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 20 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12793407 - PreCommit-HIVE-TRUNK-Build > Pass credentials in the current UGI while creating Tez session > -- > > Key: HIVE-12977 > URL: https://issues.apache.org/jira/browse/HIVE-12977 > Project: Hive > Issue Type: Bug > Components: Tez >Reporter: Vinoth Sathappan >Assignee: Vinoth Sathappan > Attachments: HIVE-12977.1.patch, HIVE-12977.1.patch > > > The credentials present in the current UGI i.e. > UserGroupInformation.getCurrentUser().getCredentials() isn't passed to the > Tez session. It is instantiated with null credentials > session = TezClient.create("HIVE-" + sessionId, tezConfig, true, > commonLocalResources, null); > In this case, Tez fails to access resources even if the tokens are available > in memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13277) Exception "Unable to create serializer 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " occurred during query execution on spark engine when ve
[ https://issues.apache.org/jira/browse/HIVE-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196806#comment-15196806 ] Rui Li commented on HIVE-13277: --- Pinging [~xuefuz] > Exception "Unable to create serializer > 'org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer' " > occurred during query execution on spark engine when vectorized execution is > switched on > - > > Key: HIVE-13277 > URL: https://issues.apache.org/jira/browse/HIVE-13277 > Project: Hive > Issue Type: Bug > Environment: Hive on Spark engine > Hive Version: Apache Hive 2.0.0 > Spark Version: Apache Spark 1.6.0 >Reporter: Xin Hao > > Found during TPCx-BB query2 execution on the Spark engine when vectorized > execution is switched on: > (1) set hive.vectorized.execution.enabled=true; > (2) set hive.vectorized.execution.reduce.enabled=true; (default value for > Apache Hive 2.0.0) > It's OK for the Spark engine when hive.vectorized.execution.enabled is switched > off: > (1) set hive.vectorized.execution.enabled=false; > (2) set hive.vectorized.execution.reduce.enabled=true; > For the MR engine, the query passes and no exception occurs whether vectorized > execution is switched on or off. 
> Detail Error Message is below: > {noformat} > 2016-03-14T10:09:33,692 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 INFO > spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 154 > bytes > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - 16/03/14 10:09:33 WARN > scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 25, bhx3): > java.lang.RuntimeException: Failed to load plan: > hdfs://bhx3:8020/tmp/hive/root/40b90ebd-32d4-47bc-a5ab-12ff1c05d0d2/hive_2016-03-14_10-08-56_307_7692316402338632647-1/-mr-10002/ab0c0021-0c1a-496e-9703-87d5879353c8/reduce.xml: > org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.lang.IllegalArgumentException: Unable to create serializer > "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for > class: org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - Serialization trace: > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - childOperators > (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - childOperators > (org.apache.hadoop.hive.ql.exec.vector.VectorLimitOperator) > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - childOperators > (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator) > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) - reducer > (org.apache.hadoop.hive.ql.plan.ReduceWork) > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > 
org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:451) > 2016-03-14T10:09:33,818 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.hadoop.hive.ql.exec.Utilities.getReduceWork(Utilities.java:306) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.hadoop.hive.ql.exec.spark.SparkReduceRecordHandler.init(SparkReduceRecordHandler.java:117) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:46) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.hadoop.hive.ql.exec.spark.HiveReduceFunction.call(HiveReduceFunction.java:28) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.SparkClientImpl > (SparkClientImpl.java:run(593)) -at > org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:192) > 2016-03-14T10:09:33,819 INFO [stderr-redir-1]: client.Sp
[jira] [Commented] (HIVE-12612) beeline always exits with 0 status when reading query from standard input
[ https://issues.apache.org/jira/browse/HIVE-12612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196814#comment-15196814 ] stephen sprague commented on HIVE-12612: [~psequeirag] - hey, thanks for the workaround suggestion. we too use tons of here documents and this is a show stopper. /dev/stdin should be treated same as -f flag (a file) when /dev/stdin is not a tty. :) > beeline always exits with 0 status when reading query from standard input > - > > Key: HIVE-12612 > URL: https://issues.apache.org/jira/browse/HIVE-12612 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 1.1.0 > Environment: CDH5.5.0 >Reporter: Paulo Sequeira >Priority: Minor > > Similar to what was reported on HIVE-6978, but now it only happens when the > query is read from the standard input. For example, the following fails as > expected: > {code} > bash$ if beeline -u "jdbc:hive2://..." -e "boo;" ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Error: Error while compiling statement: FAILED: ParseException line 1:0 > cannot recognize input near 'boo' '' '' (state=42000,code=4) > Closing: 0: jdbc:hive2://... > Failed! > {code} > But the following does not: > {code} > bash$ if echo "boo;"|beeline -u "jdbc:hive2://..." ; then echo "Ok?!" ; else > echo "Failed!" ; fi > Connecting to jdbc:hive2://... > Connected to: Apache Hive (version 1.1.0-cdh5.5.0) > Driver: Hive JDBC (version 1.1.0-cdh5.5.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 1.1.0-cdh5.5.0 by Apache Hive > 0: jdbc:hive2://...:8> Error: Error while compiling statement: FAILED: > ParseException line 1:0 cannot recognize input near 'boo' '' '' > (state=42000,code=4) > 0: jdbc:hive2://...:8> Closing: 0: jdbc:hive2://... > Ok?! 
> {code} > This was misleading our batch scripts to always believe that the execution of > the queries succeeded, when sometimes that was not the case. > h2. Workaround > We found we can work around the issue by always using the -e or the -f > parameters, and even reading the standard input through the /dev/stdin device > (this was useful because a lot of the scripts fed the queries from here > documents), like this: > {code:title=some-script.sh} > #!/bin/sh > set -o nounset -o errexit -o pipefail > # As beeline is failing to report an error status if reading the query > # to be executed from STDIN, check whether no -f or -e option is used > # and, in that case, pretend it has to read the query from a regular > # file using -f to read from /dev/stdin > function beeline_workaround_exit_status () { > for arg in "$@" > do if [ "$arg" = "-f" -o "$arg" = "-e" ] > then beeline -u "..." "$@" > return > fi > done > beeline -u "..." "$@" -f /dev/stdin > } > beeline_workaround_exit_status <<EOF > boo; > EOF > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13084) Vectorization add support for PROJECTION Multi-AND/OR
[ https://issues.apache.org/jira/browse/HIVE-13084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196817#comment-15196817 ] Matt McCline commented on HIVE-13084: - Seems like there may be a bug in the Comparison Vector Expressions: {code} SELECT t, si, i, (t < 0) as child1, (si > 0) as child2, (i < 0) as child3, (t < 0 OR si > 0 OR i < 0) as multi_or_col from vectortab2k_orc where pmod(i,4) = 2 order by t, si, i; Non-Vectorized: t si i child1 child2 child3 multi_or_col -124 NULL 206942178 true NULL false true Vectorized: t si i child1 child2 child3 multi_or_col -124 NULL 206942178 true true false true {code} Child 2 is different! LongColGreaterLongScalar ??? > Vectorization add support for PROJECTION Multi-AND/OR > - > > Key: HIVE-13084 > URL: https://issues.apache.org/jira/browse/HIVE-13084 > Project: Hive > Issue Type: Bug > Components: Vectorization >Reporter: Rajesh Balamohan >Assignee: Matt McCline > Attachments: HIVE-13084.01.patch, HIVE-13084.02.patch, > HIVE-13084.03.patch, vector_between_date.q > > > When there is case statement in group by, hive throws unable to vectorize > exception. 
> e.g query just to demonstrate the problem > {noformat} > explain select l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END as wk from lineitem_test_l_shipdate_ts > group by l_partkey, case when l_commitdate between '2015-06-30' AND > '2015-07-06' THEN '2015-06-30' END; > org.apache.hadoop.hive.ql.metadata.HiveException: Could not vectorize > expression: org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 2 > File Output Operator [FS_7] > Group By Operator [GBY_5] (rows=888777234 width=108) > Output:["_col0","_col1"],keys:KEY._col0, KEY._col1 > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_4] > PartitionCols:_col0, _col1 > Group By Operator [GBY_3] (rows=1777554469 width=108) > Output:["_col0","_col1"],keys:_col0, _col1 > Select Operator [SEL_1] (rows=1777554469 width=108) > Output:["_col0","_col1"] > TableScan [TS_0] (rows=1777554469 width=108) > > rajesh@lineitem_test_l_shipdate_ts,lineitem_test_l_shipdate_ts,Tbl:COMPLETE,Col:NONE,Output:["l_partkey","l_commitdate"] > {noformat} > \cc [~mmccline], [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
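The discrepancy Matt flags above (child2 = true instead of NULL for si = NULL) is a violation of SQL three-valued logic: a comparison with a NULL operand must yield NULL, never true or false. A minimal sketch of the expected semantics, with hypothetical names (this is not Hive's LongColGreaterLongScalar code):

```java
// Sketch of SQL three-valued comparison semantics: NULL propagates through
// "col > scalar" rather than collapsing to true/false. Illustrative only.
class NullComparison {
    // Boxed Boolean so Java null can stand in for SQL NULL.
    static Boolean greaterThanScalar(Long col, long scalar) {
        if (col == null) return null; // NULL operand => NULL result
        return col > scalar;
    }
}
```

By these rules the row in the comment (t=-124, si=NULL, i=206942178) must produce child2 = NULL, matching the non-vectorized output; the multi-OR can still be true because OR with a true operand is true regardless of NULLs.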
[jira] [Commented] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine
[ https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196819#comment-15196819 ] Sergey Shelukhin commented on HIVE-13292: - With double type, it's usually by design > Different DOUBLE type precision issue between Spark and MR engine > - > > Key: HIVE-13292 > URL: https://issues.apache.org/jira/browse/HIVE-13292 > Project: Hive > Issue Type: Bug > Environment: Apache Hive 2.0.0 > Apache Spark 1.6.0 >Reporter: Xin Hao > > Different DOUBLE type precision issue between Spark and MR engine. > Found when executing the TPC-H query5 with scale factor 2 (2GB data size). > More details are as below. > (1)The MR engine output: > MOZAMBIQUE,1.0646195910990009E8 > ETHIOPIA,1.0108856206629996E8 > ALGERIA,9.987582690420012E7 > MOROCCO,9.785484184850013E7 > KENYA,9.412388077690017E7 > (2)The Spark engine output: > MOZAMBIQUE,1.064619591099E8 > ETHIOPIA,1.0108856206630005E8 > ALGERIA,9.987582690419997E7 > MOROCCO,9.785484184850003E7 > KENYA,9.412388077690002E7 > (3)Detail SQL used: > drop table if exists ${env:RESULT_TABLE}; > create table ${env:RESULT_TABLE} ( > pid1 STRING, > pid2 DOUBLE > ) > row format delimited fields terminated by ',' lines terminated by '\n' > stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location > '${env:RESULT_DIR}'; > insert into table ${env:RESULT_TABLE} > select > n_name, > sum(l_extendedprice * (1 - l_discount)) as revenue > from > customer, > orders, > lineitem, > supplier, > nation, > region > where > c_custkey = o_custkey > and l_orderkey = o_orderkey > and l_suppkey = s_suppkey > and c_nationkey = s_nationkey > and s_nationkey = n_nationkey > and n_regionkey = r_regionkey > and r_name = 'AFRICA' > and o_orderdate >= '1993-01-01' > and o_orderdate < '1994-01-01' > group by > n_name > order by > revenue desc; > (4)Similar issue also exists even after we simplified original query to a > simpler one as below: > drop table if exists ${env:RESULT_TABLE}; > 
create table ${env:RESULT_TABLE} ( > pid2 DOUBLE > ) > row format delimited fields terminated by ',' lines terminated by '\n' > stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location > '${env:RESULT_DIR}'; > insert into table ${env:RESULT_TABLE} > select > sum(l_extendedprice * (1 - l_discount)) as revenue > from > lineitem > group by > l_orderkey > order by > revenue; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
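Sergey's "usually by design" comment above has a concrete basis: IEEE 754 double addition is not associative, so two engines that aggregate the same values in different orders (as MR and Spark do after their respective shuffles) can legitimately produce sums that differ in the last few digits, exactly as in the outputs quoted in the description. A minimal demonstration:

```java
// Floating-point addition is not associative: grouping the same three terms
// differently changes the rounding, hence the engine-dependent SUM() results.
class FpAssoc {
    static double leftToRight() { return (0.1 + 0.2) + 0.3; }
    static double rightToLeft() { return 0.1 + (0.2 + 0.3); }
}
```

Both groupings are "correct" double arithmetic, yet they differ by about one ulp; a SUM over millions of lineitem rows accumulates many such rounding differences, which matches the magnitude of the discrepancies shown for MOZAMBIQUE, ETHIOPIA, etc.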