[jira] [Updated] (HIVE-18905) HS2: SASL auth loads HiveConf for every JDBC call
[ https://issues.apache.org/jira/browse/HIVE-18905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-18905:
---
    Status: Open (was: Patch Available)

> HS2: SASL auth loads HiveConf for every JDBC call
> -
>
> Key: HIVE-18905
> URL: https://issues.apache.org/jira/browse/HIVE-18905
> Project: Hive
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Igor Kryvenko
> Priority: Minor
> Attachments: HIVE-18905.01.patch, HIVE-18905.03.patch, HIVE-18905.04.patch, HIVE-18905.patch
>
>
> SASL authentication filter does a new HiveConf() for no good reason.
> {code}
> public static PasswdAuthenticationProvider getAuthenticationProvider(AuthMethods authMethod)
>     throws AuthenticationException {
>   return getAuthenticationProvider(authMethod, new HiveConf());
> }
> {code}
> The session HiveConf is not needed to do this operation & it can't be changed after the HS2 starts up (today).
> {code}
> org.apache.hadoop.hive.conf.HiveConf.<init>() HiveConf.java:4404
> org.apache.hive.service.auth.AuthenticationProviderFactory.getAuthenticationProvider(AuthenticationProviderFactory$AuthMethods) AuthenticationProviderFactory.java:61
> org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(Callback[]) PlainSaslHelper.java:106
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(byte[]) PlainSaslServer.java:103
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(byte[]) TSaslTransport.java:539
> org.apache.thrift.transport.TSaslTransport.open() TSaslTransport.java:283
> org.apache.thrift.transport.TSaslServerTransport.open() TSaslServerTransport.java:41
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TTransport) TSaslServerTransport.java:216
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run() TThreadPoolServer.java:269
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) ThreadPoolExecutor.java:1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run() ThreadPoolExecutor.java:617
> java.lang.Thread.run() Thread.java:745
> {code}

--
This message was sent by Atlassian Jira (v8.3.2#803003)
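The cost described in HIVE-18905 comes from constructing a configuration object on every authentication call. The following is only a minimal sketch of the general remedy (build once, reuse), not the attached patch. `ServerConf` is a hypothetical stand-in for an expensive-to-construct object such as HiveConf, and `CachedConfSketch`, `perCall` and `shared` are illustrative names that do not exist in Hive.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CachedConfSketch {
    /** Hypothetical stand-in for a configuration object that is costly to build. */
    public static final class ServerConf {
        /** Counts constructions so the difference between the two paths is visible. */
        public static final AtomicInteger LOADS = new AtomicInteger();

        public ServerConf() {
            LOADS.incrementAndGet();
        }
    }

    /** Initialization-on-demand holder: the JVM creates INSTANCE once, thread-safely. */
    private static final class Holder {
        static final ServerConf INSTANCE = new ServerConf();
    }

    /** Mirrors the reported pattern: a fresh object for every call. */
    public static ServerConf perCall() {
        return new ServerConf();
    }

    /** Assumed alternative: every caller shares one instance built at first use. */
    public static ServerConf shared() {
        return Holder.INSTANCE;
    }
}
```

Sharing one instance is only reasonable under the assumption stated in the issue, namely that the relevant settings cannot change after HS2 has started.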
[jira] [Commented] (HIVE-22165) Synchronisation introduced by HIVE-14296 on SessionManager.closeSession causes high latency in a busy hive server
[ https://issues.apache.org/jira/browse/HIVE-22165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921966#comment-16921966 ]

Gopal V commented on HIVE-22165:

The patch doesn't build, because the variable "session" got pulled out of the scope.

> Synchronisation introduced by HIVE-14296 on SessionManager.closeSession causes high latency in a busy hive server
> -
>
> Key: HIVE-22165
> URL: https://issues.apache.org/jira/browse/HIVE-22165
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2
> Affects Versions: 2.1.0, 2.3.2
> Reporter: Amruth S
> Assignee: Amruth S
> Priority: Major
> Attachments: HIVE-22165.patch
>
>
> HIVE-14296 introduces this [commit|https://github.com/apache/hive/commit/477a47d3b4b9e3da3c22465217c2024588f7f000] which adds synchronization to SessionManager.closeSession.
> It looks like the synchronization is used only for logging purposes.
> In a busy Hive server where 5-10 sessions are created and closed every second, an increase in the latency of any downstream service (ZK, HDFS) causes a queueing effect (many threads blocked on SessionManager.closeSession), creating an induced latency of 3-5 minutes at times just for closing a session.
> Since the gauge (MetricsConstant.HS2_OPEN_SESSIONS) is already tracking the open session counts, the synchronization (along with the additional logging) can be removed without any loss of functionality.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
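The queueing effect described in HIVE-22165 arises when slow downstream work runs while a shared lock is held. As a rough sketch only (this is not the HIVE-22165 patch, and `SessionRegistrySketch`, `Session`, `open`, `close` and `openCount` are hypothetical names), session bookkeeping can rely on a concurrent map and an atomic counter so that the slow close runs outside any shared lock:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SessionRegistrySketch {
    /** Hypothetical session abstraction; close() may block on slow downstream services. */
    public interface Session {
        void close();
    }

    private final ConcurrentMap<String, Session> sessions = new ConcurrentHashMap<>();
    private final AtomicInteger openSessions = new AtomicInteger();

    /** Registers a session; the counter only moves if the handle was not present. */
    public void open(String handle, Session session) {
        if (sessions.putIfAbsent(handle, session) == null) {
            openSessions.incrementAndGet();
        }
    }

    /** Closes a session without holding a lock shared with other sessions. */
    public boolean close(String handle) {
        // remove() is atomic, so exactly one caller obtains the session for a given handle.
        Session session = sessions.remove(handle);
        if (session == null) {
            return false;
        }
        openSessions.decrementAndGet();
        // The potentially slow work happens here, after the bookkeeping is done.
        session.close();
        return true;
    }

    public int openCount() {
        return openSessions.get();
    }
}
```

With this shape, a slow `close()` on one session delays only the thread that called it; other threads closing other sessions do not wait behind it.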
[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
[ https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-22161:
---
    Labels: concurrency performance (was: performance)

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.2.2, 4.0.0, 3.1.2
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Labels: concurrency, performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
> // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
> public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
>   synchronized (annotationClass) {
>     return clazz.getAnnotation(annotationClass);
>   }
> }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them during compilation) & also being locked across threads for each instance of GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+

--
This message was sent by Atlassian Jira (v8.3.2#803003)
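The issue states that the JDK bug motivating the lock is fixed in JDK 8 and later. Under that assumption, a lookup helper no longer needs to synchronize on the annotation class. The sketch below is illustrative only and is not taken from the attached patch; `Deterministic` and `RandomLikeUdf` are hypothetical stand-ins for Hive's UDFType annotation and a UDF class.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AnnotationLookupSketch {
    /** Hypothetical annotation standing in for a UDF property marker. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface Deterministic {
        boolean value() default true;
    }

    /** Hypothetical UDF-like class marked as non-deterministic. */
    @Deterministic(false)
    public static class RandomLikeUdf {
    }

    /** Same signature as the quoted helper, minus the synchronized block (assumes JDK 8+). */
    public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
        return clazz.getAnnotation(annotationClass);
    }

    /** Runs the lookup from several threads and counts how many saw value() == false. */
    public static int countNonDeterministic(int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Callable<Boolean> task =
                () -> getAnnotation(RandomLikeUdf.class, Deterministic.class).value();
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            int count = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    count++;
                }
            }
            return count;
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the shared monitor, threads looking up the same annotation type no longer serialize behind one another.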
[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
[ https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-22161:
---
    Labels: performance (was: )

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.2.2, 4.0.0, 3.1.2
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Labels: performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
> // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
> public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
>   synchronized (annotationClass) {
>     return clazz.getAnnotation(annotationClass);
>   }
> }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them during compilation) & also being locked across threads for each instance of GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
[ https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-22161:
---
    Affects Version/s: 4.0.0
                       1.2.2
                       3.1.2

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.2.2, 4.0.0, 3.1.2
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
> // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
> public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
>   synchronized (annotationClass) {
>     return clazz.getAnnotation(annotationClass);
>   }
> }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them during compilation) & also being locked across threads for each instance of GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
[ https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-22161:
---
    Resolution: Fixed
        Status: Resolved (was: Patch Available)

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Affects Versions: 1.2.2, 4.0.0, 3.1.2
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Labels: concurrency, performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
> // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
> public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
>   synchronized (annotationClass) {
>     return clazz.getAnnotation(annotationClass);
>   }
> }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them during compilation) & also being locked across threads for each instance of GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
[ https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-22161:
---
    Fix Version/s: 4.0.0

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
> // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
> public static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
>   synchronized (annotationClass) {
>     return clazz.getAnnotation(annotationClass);
>   }
> }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them during compilation) & also being locked across threads for each instance of GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22107) Correlated subquery producing wrong schema
[ https://issues.apache.org/jira/browse/HIVE-22107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-22107:
--
    Labels: pull-request-available (was: )

> Correlated subquery producing wrong schema
> --
>
> Key: HIVE-22107
> URL: https://issues.apache.org/jira/browse/HIVE-22107
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 4.0.0
> Reporter: Vineet Garg
> Assignee: Vineet Garg
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22107.1.patch, HIVE-22107.2.patch, HIVE-22107.3.patch, HIVE-22107.4.patch, HIVE-22107.5.patch
>
>
> *Repro*
> {code:sql}
> create table test(id int, name string,dept string);
> insert into test values(1,'a','it'),(2,'b','eee'),(NULL, 'c', 'cse');
> select distinct 'empno' as eid, a.id from test a where NOT EXISTS (select c.id from test c where a.id=c.id);
> {code}
> {code}
> +-------+--------+
> |  eid  |  a.id  |
> +-------+--------+
> | NULL  | empno  |
> +-------+--------+
> {code}

--
This message was sent by Atlassian Jira (v8.3.2#803003)
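For comparison with the swapped columns in the HIVE-22107 repro, the row that standard SQL semantics would be expected to produce can be worked out by hand. The sketch below is a plain-Java simulation of the repro query, not Hive code. It assumes the usual rule that `a.id = c.id` is never true when either side is NULL, and `NotExistsSketch` and `query` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class NotExistsSketch {
    /**
     * Simulates:
     *   select distinct 'empno' as eid, a.id from test a
     *   where NOT EXISTS (select c.id from test c where a.id = c.id)
     * Each result row is rendered as "eid|id".
     */
    public static List<String> query(List<Integer> ids) {
        Set<String> distinctRows = new LinkedHashSet<>();
        for (Integer outerId : ids) {
            boolean exists = false;
            for (Integer innerId : ids) {
                // SQL equality is unknown (treated as not true) when either side is NULL.
                if (outerId != null && innerId != null && outerId.equals(innerId)) {
                    exists = true;
                    break;
                }
            }
            if (!exists) {
                distinctRows.add("empno|" + outerId);
            }
        }
        return new ArrayList<>(distinctRows);
    }

    public static void main(String[] args) {
        // ids from the repro insert: 1, 2, NULL
        System.out.println(query(Arrays.asList(1, 2, null)));
    }
}
```

Under these assumptions only the NULL row survives the NOT EXISTS filter, with `eid` holding 'empno' and `a.id` holding NULL, which is the reverse of the column assignment shown in the reported output.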
[jira] [Work logged] (HIVE-22107) Correlated subquery producing wrong schema
[ https://issues.apache.org/jira/browse/HIVE-22107?focusedWorklogId=305873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-305873 ] ASF GitHub Bot logged work on HIVE-22107: - Author: ASF GitHub Bot Created on: 03/Sep/19 20:46 Start Date: 03/Sep/19 20:46 Worklog Time Spent: 10m Work Description: jcamachor commented on pull request #755: HIVE-22107 Correlated subquery producing wrong schema URL: https://github.com/apache/hive/pull/755#discussion_r320470567 ## File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java ## @@ -199,7 +199,7 @@ private RexNode rewriteScalar(RelMetadataQuery mq, RexSubQuery e, Set Correlated subquery producing wrong schema > -- > > Key: HIVE-22107 > URL: https://issues.apache.org/jira/browse/HIVE-22107 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 4.0.0 >Reporter: Vineet Garg >Assignee: Vineet Garg >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22107.1.patch, HIVE-22107.2.patch, > HIVE-22107.3.patch, HIVE-22107.4.patch, HIVE-22107.5.patch > > Time Spent: 10m > Remaining Estimate: 0h > > *Repro* > {code:sql} > create table test(id int, name string,dept string); > insert into test values(1,'a','it'),(2,'b','eee'),(NULL, 'c', 'cse'); > select distinct 'empno' as eid, a.id from test a where NOT EXISTS (select > c.id from test c where a.id=c.id); > {code} > {code} > +---++ > | eid | a.id | > +---++ > | NULL | empno | > +---++ > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22162) MVs are not using ACID tables by default
[ https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22162: --- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to master, thanks [~kkasa]! > MVs are not using ACID tables by default > > > Key: HIVE-22162 > URL: https://issues.apache.org/jira/browse/HIVE-22162 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 3.1.2 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, > HIVE-22162.3.patch, HIVE-22162.4.patch > > > {code} > SET hive.support.concurrency=true; > SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > SET metastore.strict.managed.tables=true; > SET hive.default.fileformat=textfile; > SET hive.default.fileformat.managed=orc; > SET metastore.create.as.acid=true; > CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2)); > INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, > 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8); > CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite > AS SELECT a, b, c FROM cmv_basetable_n4; > DESCRIBE FORMATTED cmv_mat_view_n4; > {code} > {code} > POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4 > ... > Table Type: MATERIALIZED_VIEW > Table Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}} > bucketing_version 2 > numFiles1 > numRows 5 > rawDataSize 1025 > totalSize 509 > {code} > Missing table parameter > {code} > transaction = true > {code} > cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HIVE-22162) MVs are not using ACID tables by default
[ https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921573#comment-16921573 ] Jesus Camacho Rodriguez commented on HIVE-22162: +1 > MVs are not using ACID tables by default > > > Key: HIVE-22162 > URL: https://issues.apache.org/jira/browse/HIVE-22162 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 3.1.2 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, > HIVE-22162.3.patch, HIVE-22162.4.patch > > > {code} > SET hive.support.concurrency=true; > SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > SET metastore.strict.managed.tables=true; > SET hive.default.fileformat=textfile; > SET hive.default.fileformat.managed=orc; > SET metastore.create.as.acid=true; > CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2)); > INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, > 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8); > CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite > AS SELECT a, b, c FROM cmv_basetable_n4; > DESCRIBE FORMATTED cmv_mat_view_n4; > {code} > {code} > POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4 > ... > Table Type: MATERIALIZED_VIEW > Table Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}} > bucketing_version 2 > numFiles1 > numRows 5 > rawDataSize 1025 > totalSize 509 > {code} > Missing table parameter > {code} > transaction = true > {code} > cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22162) MVs are not using ACID tables by default
[ https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-22162: --- Summary: MVs are not using ACID tables by default (was: MVs are not using ACID tables.) > MVs are not using ACID tables by default > > > Key: HIVE-22162 > URL: https://issues.apache.org/jira/browse/HIVE-22162 > Project: Hive > Issue Type: Bug > Components: Materialized views >Affects Versions: 3.1.2 >Reporter: Krisztian Kasa >Assignee: Krisztian Kasa >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, > HIVE-22162.3.patch, HIVE-22162.4.patch > > > {code} > SET hive.support.concurrency=true; > SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > SET metastore.strict.managed.tables=true; > SET hive.default.fileformat=textfile; > SET hive.default.fileformat.managed=orc; > SET metastore.create.as.acid=true; > CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2)); > INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, > 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8); > CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite > AS SELECT a, b, c FROM cmv_basetable_n4; > DESCRIBE FORMATTED cmv_mat_view_n4; > {code} > {code} > POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4 > ... > Table Type: MATERIALIZED_VIEW > Table Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}} > bucketing_version 2 > numFiles1 > numRows 5 > rawDataSize 1025 > totalSize 509 > {code} > Missing table parameter > {code} > transaction = true > {code} > cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22164) Vectorized Limit operator returns wrong number of results with offset
[ https://issues.apache.org/jira/browse/HIVE-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22164: Attachment: HIVE-22164.4.patch Status: Patch Available (was: Open) > Vectorized Limit operator returns wrong number of results with offset > - > > Key: HIVE-22164 > URL: https://issues.apache.org/jira/browse/HIVE-22164 > Project: Hive > Issue Type: Bug > Components: Hive, llap, Vectorization >Affects Versions: 4.0.0 >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22164.1.patch, HIVE-22164.2.patch, > HIVE-22164.3.patch, HIVE-22164.4.patch > > > Vectorized Limit operator returns wrong number of results with offset -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (HIVE-22164) Vectorized Limit operator returns wrong number of results with offset
[ https://issues.apache.org/jira/browse/HIVE-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramesh Kumar Thangarajan updated HIVE-22164: Status: Open (was: Patch Available) > Vectorized Limit operator returns wrong number of results with offset > - > > Key: HIVE-22164 > URL: https://issues.apache.org/jira/browse/HIVE-22164 > Project: Hive > Issue Type: Bug > Components: Hive, llap, Vectorization >Affects Versions: 4.0.0 >Reporter: Ramesh Kumar Thangarajan >Assignee: Ramesh Kumar Thangarajan >Priority: Major > Attachments: HIVE-22164.1.patch, HIVE-22164.2.patch, > HIVE-22164.3.patch > > > Vectorized Limit operator returns wrong number of results with offset -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HIVE-22099) Several date related UDFs can't handle Julian dates properly since HIVE-20007
[ https://issues.apache.org/jira/browse/HIVE-22099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921542#comment-16921542 ] Jesus Camacho Rodriguez commented on HIVE-22099: [~szita], a few minor comments on the patch. Consider using {{DateTimeMath.retrieveProlepticGregorianCalendarUTC}} instead of importing the static method. Also {{retrieveProlepticGregorianCalendarUTC}} -> {{getProlepticGregorianCalendarUTC}}. > Several date related UDFs can't handle Julian dates properly since HIVE-20007 > - > > Key: HIVE-22099 > URL: https://issues.apache.org/jira/browse/HIVE-22099 > Project: Hive > Issue Type: Bug >Reporter: Adam Szita >Assignee: Adam Szita >Priority: Major > Attachments: HIVE-22099.0.patch, HIVE-22099.1.patch, > HIVE-22099.2.patch, HIVE-22099.3.patch, HIVE-22099.4.patch, HIVE-22099.5.patch > > > Currently dates that belong to Julian calendar (before Oct 15, 1582) are > handled improperly by date/timestamp UDFs. > E.g. DateFormat UDF: > Although the dates are in Julian calendar, the formatter insists to print > these according to Gregorian calendar causing multiple days of difference in > some cases: > > {code:java} > beeline> select date_format('1001-01-05','dd---MM--'); > ++ > | _c0 | > ++ > | 30---12--1000 | > ++{code} > I've observed similar problems in the following UDFs: > * add_months > * date_format > * day > * month > * months_between > * weekofyear > * year > -- This message was sent by Atlassian Jira (v8.3.2#803003)
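The multi-day shift reported in HIVE-22099 can be reproduced outside Hive with the JDK alone. This is a sketch under assumptions, not the Hive implementation: it assumes the date is held as a proleptic Gregorian value and is then formatted through `java.util.GregorianCalendar`, whose default behaviour switches to the Julian calendar before October 1582. The pattern `dd---MM--yyyy` is a guess made for this illustration (the pattern in the quoted query appears truncated), and `JulianDemo` and `format` are hypothetical names.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

public class JulianDemo {
    /**
     * Formats a proleptic Gregorian date through a legacy calendar.
     * With proleptic == false the calendar keeps its default Julian/Gregorian cutover;
     * with proleptic == true the cutover is pushed back so Gregorian rules always apply.
     */
    public static String format(LocalDate date, boolean proleptic) {
        long millis = date.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        if (proleptic) {
            calendar.setGregorianChange(new Date(Long.MIN_VALUE));
        }
        SimpleDateFormat formatter = new SimpleDateFormat("dd---MM--yyyy", Locale.ROOT);
        formatter.setCalendar(calendar);
        return formatter.format(new Date(millis));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(1001, 1, 5);
        System.out.println(format(date, false)); // 30---12--1000
        System.out.println(format(date, true));  // 05---01--1001
    }
}
```

In January 1001 the two calendars are six days apart, which accounts for the gap between the input date and the formatted value in the report.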
[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.9.1
[ https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921487#comment-16921487 ] Nandor Kollar commented on HIVE-21737: -- [~Fokko] changes on {{RelTreeSignature.java}} look unrelated, would you mind reverting those? In addition, I'm afraid that [this|https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L238] part of TypeInfoToSchema no longer sets default to null as it used to do: this call landed [here|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L394] before, but now [this|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L557] constructor is getting called, which - if I'm not mistaken - will end with a {{org.apache.avro.AvroRuntimeException: Unknown datum class: class com.fasterxml.jackson.databind.node.NullNode}}. Is my assumption correct? Unfortunately I'm not too familiar with Hive, so I don't know which test case would fail. I think we should simply get rid of Jackson classes here, and just pass null in the Schema.Field constructor. > Upgrade Avro to version 1.9.1 > - > > Key: HIVE-21737 > URL: https://issues.apache.org/jira/browse/HIVE-21737 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Ismaël Mejía >Assignee: Fokko Driesprong >Priority: Minor > Labels: pull-request-available > Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.1.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Avro 1.9.0 was released recently. It brings a lot of fixes including a leaner > version of Avro without Jackson in the public API. Worth the update. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HIVE-22150) HS2 allows setting system properties
[ https://issues.apache.org/jira/browse/HIVE-22150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921402#comment-16921402 ] Hui An commented on HIVE-22150: --- [~alangates] [~pxiong] Could you please review this patch? > HS2 allows setting system properties > > > Key: HIVE-22150 > URL: https://issues.apache.org/jira/browse/HIVE-22150 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.1 >Reporter: Craig Condit >Assignee: Hui An >Priority: Major > Attachments: HIVE-22150.patch.1, HIVE-22150.patch.2 > > > HiveServer2 currently allows setting system properties, which is a problem > when used in a multi-user environment. > Connecting via beeline and executing the following demonstrates the issue: > {noformat} > 0: jdbc:hive2://serv1000.example.com:2181,serv> SET system:java.io.tmpdir; > +-+ > | set | > +-+ > | system:java.io.tmpdir=/tmp | > +-+ > 1 row selected (0.018 seconds) > 0: jdbc:hive2://serv1000.example.com:2181,serv> SET > system:java.io.tmpdir=/tmp/attacker-dir; > No rows affected (0.013 seconds) > 0: jdbc:hive2://serv1000.example.com:2181,serv> SET system:java.io.tmpdir; > +--+ > | set| > +--+ > | system:java.io.tmpdir=/tmp/attacker-dir | > +--+ > 1 row selected (0.019 seconds) > {noformat} > Any changes persist until HS2 is restarted, and affect all connected users. > At the very least, this is a denial-of-service vector (verified by setting > line.separator to a random string). -- This message was sent by Atlassian Jira (v8.3.2#803003)
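One way to picture the restriction HIVE-22150 asks for is a guard that rejects assignments to keys carrying the `system:` prefix while leaving session-level settings alone. The sketch below is an illustration only and does not reflect how the attached patches work; `SetCommandGuardSketch`, `isWriteAllowed` and `apply` are hypothetical names, and the session configuration is modelled as a plain map.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SetCommandGuardSketch {
    /** Prefix used in the quoted beeline session to address JVM system properties. */
    public static final String SYSTEM_PREFIX = "system:";

    /** Returns false for any key that targets a JVM-wide system property. */
    public static boolean isWriteAllowed(String key) {
        return !key.trim().toLowerCase(Locale.ROOT).startsWith(SYSTEM_PREFIX);
    }

    /** Applies an assignment to a per-session map, refusing JVM-wide keys. */
    public static void apply(Map<String, String> sessionConf, String key, String value) {
        if (!isWriteAllowed(key)) {
            throw new SecurityException("Setting system properties is not permitted: " + key);
        }
        sessionConf.put(key.trim(), value);
    }

    public static void main(String[] args) {
        Map<String, String> sessionConf = new HashMap<>();
        apply(sessionConf, "hive.default.fileformat", "orc");
        try {
            apply(sessionConf, "system:java.io.tmpdir", "/tmp/attacker-dir");
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping accepted values in a per-session map rather than in JVM state also means one user's settings cannot leak to other connected users.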
[jira] [Comment Edited] (HIVE-22030) Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
[ https://issues.apache.org/jira/browse/HIVE-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921364#comment-16921364 ]

Robert Schaft edited comment on HIVE-22030 at 9/3/19 11:51 AM:
---

Even jackson 2.9.9.1 has vulnerabilities: [CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and [CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.

was (Author: robert.schaft):
Even jackson 2.9.9.1 has vulnerabilities: [CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and [CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.

> Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
> ---
>
> Key: HIVE-22030
> URL: https://issues.apache.org/jira/browse/HIVE-22030
> Project: Hive
> Issue Type: Task
> Reporter: Dombi Akos
> Assignee: Dombi Akos
> Priority: Major
> Fix For: 4.0.0
>
>
> Bump the following jackson versions:
> - jackson version to 2.9.9
> - jackson-databind version to 2.9.9.1

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HIVE-22030) Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
[ https://issues.apache.org/jira/browse/HIVE-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921364#comment-16921364 ]

Robert Schaft commented on HIVE-22030:
--

Even jackson 2.9.9.1 has vulnerabilities: [CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and [CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.

> Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
> ---
>
> Key: HIVE-22030
> URL: https://issues.apache.org/jira/browse/HIVE-22030
> Project: Hive
> Issue Type: Task
> Reporter: Dombi Akos
> Assignee: Dombi Akos
> Priority: Major
> Fix For: 4.0.0
>
>
> Bump the following jackson versions:
> - jackson version to 2.9.9
> - jackson-databind version to 2.9.9.1

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (HIVE-21002) TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly
[ https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919389#comment-16919389 ]

Piotr Findeisen edited comment on HIVE-21002 at 9/3/19 10:59 AM:
-

[~klcopp] [~zi] this issue explicitly talks about Avro and Parquet, whereas the same problem applies also to "RCBinary" ({{ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE;}}).

-Has this been addressed too, or should I create a new issue?-
created HIVE-22167

was (Author: findepi):
[~klcopp] [~zi] this issue explicitly talks about Avro and Parquet, whereas the same problem applies also to "RCBinary" ({{ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE;}}).

Has this been addressed too, or should I create a new issue?

> TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly
> --
>
> Key: HIVE-21002
> URL: https://issues.apache.org/jira/browse/HIVE-21002
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0, 3.1.1
> Reporter: Zoltan Ivanfi
> Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly. As an example session to demonstrate this problem, create a dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values ('2018-01-01 00:00:00.000');
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different storage formats gives the following results:
> ||‹format›||Writer time zone (in Hive 2.x)||Reader time zone||Result in Hive 2.x reader||Result in Hive 3.1 reader||
> |Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
> |Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> |Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored in Avro and Parquet formats.* Apache ORC behaviour has not changed because it was modified to adjust timestamps to retain backwards compatibility. Textfile behaviour has not changed, because its processing involves parsing and formatting instead of proper serializing and deserializing, so they inherently had LocalDateTime semantics even in Hive 2.x.

--
This message was sent by Atlassian Jira (v8.3.2#803003)
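The Avro and Parquet rows of the HIVE-21002 table can be reproduced with `java.time` arithmetic. The sketch below is not Hive code. It assumes that the Hive 2.x writer stored the wall-clock value as an instant (normalized from the writer's zone), that the Hive 2.x reader converted that instant back into the reader's zone, and that the Hive 3.1 reader renders the stored instant as UTC without adjustment. Those assumptions are inferred from the table rather than stated in the issue, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimestampSemanticsSketch {
    /** Assumed writer behaviour: wall-clock time in the writer's zone becomes an instant. */
    public static Instant written(LocalDateTime wallClock, ZoneId writerZone) {
        return wallClock.atZone(writerZone).toInstant();
    }

    /** Assumed Hive 2.x style read: the instant is rendered in the reader's zone. */
    public static LocalDateTime readWithInstantSemantics(Instant stored, ZoneId readerZone) {
        return LocalDateTime.ofInstant(stored, readerZone);
    }

    /** Assumed Hive 3.1 style read: the instant is rendered as UTC, whatever the reader's zone. */
    public static LocalDateTime readAsUtc(Instant stored) {
        return LocalDateTime.ofInstant(stored, ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Instant stored = written(LocalDateTime.of(2018, 1, 1, 0, 0), ZoneId.of("America/Los_Angeles"));
        System.out.println(readWithInstantSemantics(stored, ZoneId.of("America/Los_Angeles"))); // 2018-01-01T00:00
        System.out.println(readWithInstantSemantics(stored, ZoneId.of("Europe/Paris")));        // 2018-01-01T09:00
        System.out.println(readAsUtc(stored));                                                  // 2018-01-01T08:00
    }
}
```

The three printed values line up with the 00, 09 and 08 hour entries in the table: Los Angeles is eight hours behind UTC in January and Paris is one hour ahead.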
[jira] [Comment Edited] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921275#comment-16921275 ]

Hui An edited comment on HIVE-22077 at 9/3/19 9:21 AM:
---

[~kgyrtkirk] Could you please review this patch?

was (Author: bone an):
[~kgyrtkirk]Could you please review this patch?

> Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 4.0.0, 2.3.4
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Attachments: HIVE-22077.patch.1
>
>
> INSERT OVERWRITE on a static partition may not clean the related HDFS location if the partition's info is not stored in metadata.
> Steps to reproduce this issue:
>
> 1. Create a managed table:
>
> {code:sql}
> CREATE TABLE `test`(
>   `id` string)
> PARTITIONED BY (
>   `dayno` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1564731656')
> {code}
>
> 2. Create the partition's directory and put some data in it:
>
> {code:java}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
>
> 3. Insert overwrite partition dayno=20190802:
>
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
>
> 4. The test.data file under the partition directory is not deleted.
>

--
This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata
[ https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921275#comment-16921275 ]

Hui An commented on HIVE-22077:
---
[~kgyrtkirk]Could you please review this patch?

> Inserting overwrite partitions clause does not clean directories while
> partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 1.1.1, 4.0.0, 2.3.4
> Reporter: Hui An
> Assignee: Hui An
> Priority: Major
> Attachments: HIVE-22077.patch.1
>
>
> Inserting overwrite static partitions may not clean related HDFS location if
> partitions' info is not stored in metadata.
> Steps to reproduce this issue :
>
> 1. Create a managed table :
>
> {code:sql}
> CREATE TABLE `test`(
>   `id` string)
> PARTITIONED BY (
>   `dayno` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1564731656')
> {code}
>
> 2. Create partition's directory and put some data in it
>
> {code:java}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
>
> 3. Insert overwrite partition dayno=20190802
>
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
>
> 4. We could see the test.data under partition directory is not deleted.
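The state that the reproduction steps for HIVE-22077 create can be modelled in a few lines. This is only an illustrative sketch, not Hive's code path: the function name `unregistered_partition_dirs` and the two in-memory lists are hypothetical stand-ins for an HDFS directory listing and a metastore partition listing.

```python
from typing import Iterable, Set


def unregistered_partition_dirs(
    fs_dirs: Iterable[str], metastore_partitions: Iterable[str]
) -> Set[str]:
    """Return partition directories that exist on the filesystem
    but have no matching partition entry in the metastore."""
    return set(fs_dirs) - set(metastore_partitions)


# Hypothetical state after step 2 of the reproduction: the directory was
# created by hand on HDFS, so the metastore has no partition for it.
fs_dirs = ["dayno=20190802"]
metastore_partitions = []

orphans = unregistered_partition_dirs(fs_dirs, metastore_partitions)
print(sorted(orphans))
```

Directories in this set are the ones the report says are left uncleaned by INSERT OVERWRITE on a static partition.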
[jira] [Updated] (HIVE-22166) Configure Kerberos for Hive Ranger Client via HS2 configuration
[ https://issues.apache.org/jira/browse/HIVE-22166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denys Kuzmenko updated HIVE-22166:
--
Attachment: HIVE-22166.1.patch

> Configure Kerberos for Hive Ranger Client via HS2 configuration
> ---
>
> Key: HIVE-22166
> URL: https://issues.apache.org/jira/browse/HIVE-22166
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
> Attachments: HIVE-22166.1.patch
>
>
> In Hive we would like to have the possibility to enable Kerberos partially
> (i.e. only for Ranger, Atlas and HMS).
> However, since hadoop security is a global flag, there are many places that
> need to be commented out to avoid the UGI cluster-wide configuration.
[jira] [Assigned] (HIVE-22166) Configure Kerberos for Hive Ranger Client via HS2 configuration
[ https://issues.apache.org/jira/browse/HIVE-22166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Denys Kuzmenko reassigned HIVE-22166:
-
Assignee: Denys Kuzmenko

> Configure Kerberos for Hive Ranger Client via HS2 configuration
> ---
>
> Key: HIVE-22166
> URL: https://issues.apache.org/jira/browse/HIVE-22166
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
>
> In Hive we would like to have the possibility to enable Kerberos partially
> (i.e. only for Ranger, Atlas and HMS).
> However, since hadoop security is a global flag, there are many places that
> need to be commented out to avoid the UGI cluster-wide configuration.
[jira] [Commented] (HIVE-22149) Metastore: Unify codahale metrics.log json structure between hiveserver2 and metastore services
[ https://issues.apache.org/jira/browse/HIVE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921225#comment-16921225 ]

Zoltan Haindrich commented on HIVE-22149:
-
+1 pending tests

I don't know about anything which might build upon these values; [~abstractdog] do you know any?

> Metastore: Unify codahale metrics.log json structure between hiveserver2 and
> metastore services
> ---
>
> Key: HIVE-22149
> URL: https://issues.apache.org/jira/browse/HIVE-22149
> Project: Hive
> Issue Type: Bug
> Reporter: Laszlo Bodor
> Assignee: Laszlo Bodor
> Priority: Major
> Attachments: HIVE-22149.01.patch, metrics_hiveserver2.log,
> metrics_metastore.log
>
>
> While fixing HIVE-22140 I found some really annoying differences between the
> codahale metric file structures between hiveserver2 and metastore, e.g.
> open_connections: can be found in "counters" for hs2, but in "gauges" for ms
> threads count: it's a proper "threads.count" for hs2, but a really ambiguous
> "count" for ms
> so I realized that "memory." and "threads." prefix is completely absent in ms
> metrics file, which is misleading
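A small script can surface this kind of structural difference between two metrics files. The sketch below assumes each file is a JSON object whose top-level keys ("counters", "gauges", ...) map to objects keyed by metric name; the sample documents are hypothetical and only mimic the differences described in the issue, they are not the attached logs.

```python
import json
from typing import Dict, Tuple


def metric_sections(metrics_json: str) -> Dict[str, str]:
    """Map each metric name to the top-level section it appears in."""
    doc = json.loads(metrics_json)
    return {
        name: section
        for section, metrics in doc.items()
        if isinstance(metrics, dict)  # skip scalar fields such as a version string
        for name in metrics
    }


def section_mismatches(a_json: str, b_json: str) -> Dict[str, Tuple[str, str]]:
    """Return metrics present in both files but filed under different sections."""
    a, b = metric_sections(a_json), metric_sections(b_json)
    return {name: (a[name], b[name]) for name in a.keys() & b.keys() if a[name] != b[name]}


# Hypothetical samples modelled on the description above.
hs2 = json.dumps({
    "counters": {"open_connections": {"count": 1}},
    "gauges": {"threads.count": {"value": 10}},
})
ms = json.dumps({
    "gauges": {"open_connections": {"value": 1}, "count": {"value": 10}},
})

mismatches = section_mismatches(hs2, ms)
print(mismatches)
```

Renamed metrics such as "threads.count" versus "count" do not show up as mismatches here because the names differ; they would need a separate name-mapping step.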
[jira] [Commented] (HIVE-22162) MVs are not using ACID tables.
[ https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921221#comment-16921221 ]

Hive QA commented on HIVE-22162:
---

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12979186/HIVE-22162.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16746 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/18436/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18436/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18436/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12979186 - PreCommit-HIVE-Build

> MVs are not using ACID tables.
> --
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 3.1.2
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch,
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type: MATERIALIZED_VIEW
> Table Parameters:
> COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
> bucketing_version 2
> numFiles 1
> numRows 5
> rawDataSize 1025
> totalSize 509
> {code}
> Missing table parameter
> {code}
> transaction = true
> {code}
> cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez]
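The check described in HIVE-22162 amounts to comparing the reported table parameters with the ones an ACID table should carry. The sketch below is hypothetical: `missing_parameters` is an illustrative helper, and it assumes that the issue's "transaction = true" refers to Hive's `transactional` table property, which the message does not state explicitly.

```python
from typing import Dict


def missing_parameters(actual: Dict[str, str], required: Dict[str, str]) -> Dict[str, str]:
    """Return the required key/value pairs that are absent or different in actual."""
    return {k: v for k, v in required.items() if actual.get(k) != v}


# Table parameters as reported by DESCRIBE FORMATTED in the issue
# (COLUMN_STATS_ACCURATE omitted for brevity).
reported = {
    "bucketing_version": "2",
    "numFiles": "1",
    "numRows": "5",
    "rawDataSize": "1025",
    "totalSize": "509",
}

# Assumption: the expected ACID marker is the `transactional` property.
required = {"transactional": "true"}

missing = missing_parameters(reported, required)
print(missing)
```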
[jira] [Updated] (HIVE-22163) CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled
[ https://issues.apache.org/jira/browse/HIVE-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-22163:
--
Status: Patch Available (was: In Progress)

> CBO: Enabling CBO turns on stats estimation, even when the estimation is
> disabled
> -
>
> Key: HIVE-22163
> URL: https://issues.apache.org/jira/browse/HIVE-22163
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Gopal V
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-22163.1.patch
>
>
> {code}
> create table claims(claim_rec_id bigint, claim_invoice_num string, typ_c int);
> alter table claims update statistics set ('numRows'='1154941534','rawDataSize'='1135307527922');
> set hive.stats.estimate=false;
> explain extended select count(1) from claims where typ_c=3;
> set hive.stats.ndv.estimate.percent=5e-7;
> explain extended select count(1) from claims where typ_c=3;
> {code}
> Expecting the standard /2 for the single filter, but we instead get 5 rows.
> {code}
> 'Map Operator Tree:'
> 'TableScan'
> ' alias: claims'
> ' filterExpr: (typ_c = 3) (type: boolean)'
> ' Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> ' GatherStats: false'
> ' Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 5 Data size: 19 Basic stats: COMPLETE Column stats: NONE'
> {code}
> The estimation is in effect, as changing the estimate.percent changes this.
> {code}
> ' filterExpr: (typ_c = 3) (type: boolean)'
> ' Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> ' GatherStats: false'
> ' Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 230988307 Data size: 877755567 Basic stats: COMPLETE Column stats: NONE'
> {code}
[jira] [Updated] (HIVE-22163) CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled
[ https://issues.apache.org/jira/browse/HIVE-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa updated HIVE-22163:
--
Attachment: HIVE-22163.1.patch

> CBO: Enabling CBO turns on stats estimation, even when the estimation is
> disabled
> -
>
> Key: HIVE-22163
> URL: https://issues.apache.org/jira/browse/HIVE-22163
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Gopal V
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-22163.1.patch
>
>
> {code}
> create table claims(claim_rec_id bigint, claim_invoice_num string, typ_c int);
> alter table claims update statistics set ('numRows'='1154941534','rawDataSize'='1135307527922');
> set hive.stats.estimate=false;
> explain extended select count(1) from claims where typ_c=3;
> set hive.stats.ndv.estimate.percent=5e-7;
> explain extended select count(1) from claims where typ_c=3;
> {code}
> Expecting the standard /2 for the single filter, but we instead get 5 rows.
> {code}
> 'Map Operator Tree:'
> 'TableScan'
> ' alias: claims'
> ' filterExpr: (typ_c = 3) (type: boolean)'
> ' Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> ' GatherStats: false'
> ' Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 5 Data size: 19 Basic stats: COMPLETE Column stats: NONE'
> {code}
> The estimation is in effect, as changing the estimate.percent changes this.
> {code}
> ' filterExpr: (typ_c = 3) (type: boolean)'
> ' Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> ' GatherStats: false'
> ' Filter Operator'
> 'isSamplingPred: false'
> 'predicate: (typ_c = 3) (type: boolean)'
> 'Statistics: Num rows: 230988307 Data size: 877755567 Basic stats: COMPLETE Column stats: NONE'
> {code}
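The row counts quoted in HIVE-22163 can be checked with plain arithmetic. The sketch below only relates the numbers that appear in the two EXPLAIN outputs; it does not reproduce Hive's estimator. The 20 percent figure is an assumption about the default of hive.stats.ndv.estimate.percent and is not stated in the issue.

```python
# Numbers taken from the issue text and the two EXPLAIN outputs.
num_rows = 1154941534          # 'numRows' set via ALTER TABLE
observed_first = 5             # filter rows before changing estimate.percent
observed_second = 230988307    # filter rows after setting it to 5e-7

# What the reporter expected: the "standard /2" for a single filter.
expected_half = num_rows // 2

# The second observation is numRows divided by roughly 5 ...
ratio_second = num_rows / observed_second

# ... and, assuming a default of 20 percent, 20% of numRows equals the same
# value, which would make the first observation numRows divided by that count.
twenty_percent = round(num_rows * 20 / 100)
ratio_first = round(num_rows / twenty_percent)

print(expected_half, round(ratio_second), twenty_percent, ratio_first)
```

Neither observed value is close to the expected half of numRows, which is consistent with the report that an NDV-based estimate is still being applied even though hive.stats.estimate is false.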