[jira] [Created] (HIVE-22445) LazySimpleSerDe toString is not Correct
David Mollitor created HIVE-22445: - Summary: LazySimpleSerDe toString is not Correct Key: HIVE-22445 URL: https://issues.apache.org/jira/browse/HIVE-22445 Project: Hive Issue Type: Improvement Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-22445.1.patch {code:none} 2019-11-01T10:03:49,228 INFO [pool-23-thread-1] exec.FileSinkOperator: Using serializer : class org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe[[[B@983dd25]:[_col0, _col1]:[struct
[jira] [Created] (HIVE-22444) Clean up Project POM Files
David Mollitor created HIVE-22444: - Summary: Clean up Project POM Files Key: HIVE-22444 URL: https://issues.apache.org/jira/browse/HIVE-22444 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor # Address warnings in the build process # Use DependencyManagement in Root POM for ITest (see HIVE-22426) # General POM cleanup -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22443) HBase Maven site configuration causes Hive project to get a directory named ${project.basedir}
David Mollitor created HIVE-22443: - Summary: HBase Maven site configuration causes Hive project to get a directory named ${project.basedir} Key: HIVE-22443 URL: https://issues.apache.org/jira/browse/HIVE-22443 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Upgrade HBase versions -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22441) Metrics Subsystem Improvements
David Mollitor created HIVE-22441: - Summary: Metrics Subsystem Improvements Key: HIVE-22441 URL: https://issues.apache.org/jira/browse/HIVE-22441 Project: Hive Issue Type: Improvement Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor
# CodahaleMetrics uses a Guava LoadingCache, which is already thread-safe, and then puts an explicit lock around the structure. Use the Java 8 Map API with ConcurrentHashMap instead.
# Introduce Java 8 APIs
# Simplifications
# Update unit tests so they no longer include a 'sleep'
https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java#L91-L94
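As a rough illustration of the first item, the sketch below shows how {{ConcurrentHashMap.computeIfAbsent}} can create a counter on first use without an explicit lock. {{CounterRegistrySketch}} and its method names are hypothetical and are not taken from CodahaleMetrics.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for a metrics registry; not the actual CodahaleMetrics class. */
final class CounterRegistrySketch {
  private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  /** Creates the counter on first use and increments it, with no explicit lock. */
  void increment(String name) {
    counters.computeIfAbsent(name, key -> new LongAdder()).increment();
  }

  /** Returns the current count, or 0 if the counter was never incremented. */
  long value(String name) {
    LongAdder counter = counters.get(name);
    return counter == null ? 0L : counter.sum();
  }

  public static void main(String[] args) throws InterruptedException {
    CounterRegistrySketch registry = new CounterRegistrySketch();
    Thread[] workers = new Thread[4];
    for (int i = 0; i < workers.length; i++) {
      workers[i] = new Thread(() -> {
        for (int j = 0; j < 1000; j++) {
          registry.increment("open_connections");
        }
      });
      workers[i].start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    // All four threads have finished, so every increment is visible here.
    System.out.println(registry.value("open_connections"));
  }
}
```

The map handles the create-if-missing race itself, so the surrounding lock has nothing left to protect in this simplified model.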
[jira] [Created] (HIVE-22428) Superfluous "Failed to get database" WARN Logging in ObjectStore
David Mollitor created HIVE-22428: - Summary: Superfluous "Failed to get database" WARN Logging in ObjectStore Key: HIVE-22428 URL: https://issues.apache.org/jira/browse/HIVE-22428 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-22428.1.patch
In my testing, I get lots of logs like this:
{code:none}
Line 26319: 2019-10-28T21:09:52,134 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.compdb, returning NoSuchObjectException
Line 26327: 2019-10-28T21:09:52,135 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.compdb, returning NoSuchObjectException
Line 26504: 2019-10-28T21:09:52,600 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.tstatsfast, returning NoSuchObjectException
Line 26519: 2019-10-28T21:09:52,606 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.tstatsfast, returning NoSuchObjectException
Line 26695: 2019-10-28T21:09:52,922 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.createDb, returning NoSuchObjectException
Line 26703: 2019-10-28T21:09:52,923 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.createDb, returning NoSuchObjectException
Line 26763: 2019-10-28T21:09:52,936 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.compdb, returning NoSuchObjectException
Line 26778: 2019-10-28T21:09:52,939 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.compdb, returning NoSuchObjectException
Line 26963: 2019-10-28T21:09:53,273 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.db1, returning NoSuchObjectException
Line 26978: 2019-10-28T21:09:53,276 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.db2, returning NoSuchObjectException
Line 26986: 2019-10-28T21:09:53,277 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.db1, returning NoSuchObjectException
Line 27018: 2019-10-28T21:09:53,300 WARN [pool-6-thread-5] metastore.ObjectStore: Failed to get database hive.db2, returning NoSuchObjectException
{code}
This is a superfluous log message. It might be pretty common for a database not to exist if, for example, a user fat-fingers the name of the database.
The code also has the bad habit of log-and-throw. Just log or throw, not both.
Since I'm looking at this class, touch up some of the other logging as well.
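One way to read "just log or throw, not both" is sketched below: the lookup throws an exception with a descriptive message and leaves the decision to log to the caller. The class, method, and exception names are hypothetical stand-ins and are not the ObjectStore API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical store used only for illustration; not the Hive ObjectStore API. */
final class DatabaseLookupSketch {

  /** Hypothetical checked exception standing in for NoSuchObjectException. */
  static final class NoSuchDatabaseException extends Exception {
    NoSuchDatabaseException(String message) {
      super(message);
    }
  }

  private final Map<String, String> locationsByName = new ConcurrentHashMap<>();

  void create(String name, String location) {
    locationsByName.put(name, location);
  }

  /** Throws without logging: the message carries the context for whoever handles it. */
  String getLocation(String name) throws NoSuchDatabaseException {
    String location = locationsByName.get(name);
    if (location == null) {
      throw new NoSuchDatabaseException("Database " + name + " does not exist");
    }
    return location;
  }

  public static void main(String[] args) {
    DatabaseLookupSketch store = new DatabaseLookupSketch();
    store.create("db1", "/warehouse/db1.db");
    try {
      store.getLocation("compdb");
    } catch (NoSuchDatabaseException e) {
      // The caller decides how loud to be; a mistyped name is routine, so no WARN here.
      System.out.println("Lookup failed: " + e.getMessage());
    }
  }
}
```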
[jira] [Created] (HIVE-22427) PersistenceManagerProvider Logs a Warning About datanucleus.autoStartMechanismMode
David Mollitor created HIVE-22427: - Summary: PersistenceManagerProvider Logs a Warning About datanucleus.autoStartMechanismMode Key: HIVE-22427 URL: https://issues.apache.org/jira/browse/HIVE-22427 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor
{code:none}
WARN [pool-6-thread-2] metastore.PersistenceManagerProvider: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
{code}
This scenario does not need WARN-level logging. Perhaps emit a warning only if the user configures the property to some non-null value; otherwise, simply emit an INFO-level message stating that the configuration is not set and that a reasonable default value will be used.
[jira] [Created] (HIVE-22426) Use DependencyManagement in Root POM for itests
David Mollitor created HIVE-22426: - Summary: Use DependencyManagement in Root POM for itests Key: HIVE-22426 URL: https://issues.apache.org/jira/browse/HIVE-22426 Project: Hive Issue Type: Improvement Components: Test, Tests Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22425) ReplChangeManager Not Logging Database Name
David Mollitor created HIVE-22425: - Summary: ReplChangeManager Not Logging Database Name Key: HIVE-22425 URL: https://issues.apache.org/jira/browse/HIVE-22425 Project: Hive Issue Type: Improvement Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-22425.1.patch {code:java|title=ReplChangeManager.java} LOG.debug("Repl policy is not set for database ", db.getName()); {code} The log statement is missing the placeholder '{}' so the DB name is not getting logged. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22424) Use PerfLogger in MetastoreDirectSqlUtils.java
David Mollitor created HIVE-22424: - Summary: Use PerfLogger in MetastoreDirectSqlUtils.java Key: HIVE-22424 URL: https://issues.apache.org/jira/browse/HIVE-22424 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 3.2.0 Reporter: David Mollitor Fix For: 4.0.0
[jira] [Created] (HIVE-22423) Improve Logging In HadoopThriftAuthBridge
David Mollitor created HIVE-22423: - Summary: Improve Logging In HadoopThriftAuthBridge Key: HIVE-22423 URL: https://issues.apache.org/jira/browse/HIVE-22423 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor # Remove superfluous debug log guards # Improve messages # Improve message format -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22421) Improve Logging If Configuration File Not Found
David Mollitor created HIVE-22421: - Summary: Improve Logging If Configuration File Not Found Key: HIVE-22421 URL: https://issues.apache.org/jira/browse/HIVE-22421 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor {code:none} 2019-10-28T21:07:27,599 INFO [main] conf.MetastoreConf: Unable to find config file metastore-site.xml 2019-10-28T21:07:27,599 INFO [main] conf.MetastoreConf: Found configuration file null {code} Prints 'unable to find' followed by 'null'. Just print one or the other. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22419) Improve Messages Emitted From HiveMetaStoreClient
David Mollitor created HIVE-22419: - Summary: Improve Messages Emitted From HiveMetaStoreClient Key: HIVE-22419 URL: https://issues.apache.org/jira/browse/HIVE-22419 Project: Hive Issue Type: Improvement Components: Standalone Metastore Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor After reviewing some logs and errors emitted during a QTest run, I would like to propose some improvements to logging in {{HiveMetaStoreClient}}. * Remove duplicate logging * Remove superfluous class {{StackTraceLogger}} * Do not use contractions in public-facing error messages and logs * Make all logging side-effect free (see {{connCount}}) * Code simplification -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22417) Remove stringifyException from MetaStore
David Mollitor created HIVE-22417: - Summary: Remove stringifyException from MetaStore Key: HIVE-22417 URL: https://issues.apache.org/jira/browse/HIVE-22417 Project: Hive Issue Type: Sub-task Components: Metastore, Standalone Metastore Affects Versions: 3.2.0 Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22415) Upgrade to Java 11
David Mollitor created HIVE-22415: - Summary: Upgrade to Java 11 Key: HIVE-22415 URL: https://issues.apache.org/jira/browse/HIVE-22415 Project: Hive Issue Type: Improvement Reporter: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22404) Upgrade to Java 9
David Mollitor created HIVE-22404: - Summary: Upgrade to Java 9 Key: HIVE-22404 URL: https://issues.apache.org/jira/browse/HIVE-22404 Project: Hive Issue Type: Improvement Reporter: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22403) Beeline Should Print Location of Configuration Directory at Startup
David Mollitor created HIVE-22403: - Summary: Beeline Should Print Location of Configuration Directory at Startup Key: HIVE-22403 URL: https://issues.apache.org/jira/browse/HIVE-22403 Project: Hive Issue Type: Improvement Components: Beeline Affects Versions: 2.4.0, 3.2.0 Reporter: David Mollitor Beeline should print the CONF directory it is utilizing when it starts up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22402) Deprecate Hive PerfLogger
David Mollitor created HIVE-22402: - Summary: Deprecate Hive PerfLogger Key: HIVE-22402 URL: https://issues.apache.org/jira/browse/HIVE-22402 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: David Mollitor Assignee: David Mollitor -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22390) Remove Dependency on JODA Time Library
David Mollitor created HIVE-22390: - Summary: Remove Dependency on JODA Time Library Key: HIVE-22390 URL: https://issues.apache.org/jira/browse/HIVE-22390 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor Hive uses Joda time library. {quote} Joda-Time is the de facto standard date and time library for Java prior to Java SE 8. Users are now asked to migrate to java.time (JSR-310). https://www.joda.org/joda-time/ {quote} Remove this dependency from classes, POM files, and licence files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22370) Remove Deprecated Fields from HiveConf
David Mollitor created HIVE-22370: - Summary: Remove Deprecated Fields from HiveConf Key: HIVE-22370 URL: https://issues.apache.org/jira/browse/HIVE-22370 Project: Hive Issue Type: Improvement Affects Versions: 3.0.0 Reporter: David Mollitor Assignee: David Mollitor Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22337) Improve and Expand Text-Based SerDes
David Mollitor created HIVE-22337: - Summary: Improve and Expand Text-Based SerDes Key: HIVE-22337 URL: https://issues.apache.org/jira/browse/HIVE-22337 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0 Reporter: David Mollitor Assignee: David Mollitor Fix For: 4.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22217) Better Logging for Hive JAR Reload
David Mollitor created HIVE-22217: - Summary: Better Logging for Hive JAR Reload Key: HIVE-22217 URL: https://issues.apache.org/jira/browse/HIVE-22217 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 2.3.6, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Troubleshooting Hive Reloadable Auxiliary JARs has always been difficult. Add logging to at least confirm which JAR files are being loaded. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-22078) Upgrade arrow version to 0.14.1
David Mollitor created HIVE-22078: - Summary: Upgrade arrow version to 0.14.1 Key: HIVE-22078 URL: https://issues.apache.org/jira/browse/HIVE-22078 Project: Hive Issue Type: Task Affects Versions: 4.0.0 Reporter: David Mollitor -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-22032) Allow Hive JSON SerDe To Be Case Insensitive for Field Names
David Mollitor created HIVE-22032: - Summary: Allow Hive JSON SerDe To Be Case Insensitive for Field Names Key: HIVE-22032 URL: https://issues.apache.org/jira/browse/HIVE-22032 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor https://fasterxml.github.io/jackson-databind/javadoc/2.9/com/fasterxml/jackson/databind/MapperFeature.html#ACCEPT_CASE_INSENSITIVE_PROPERTIES -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (HIVE-21792) Hive Indexes... Again
David Mollitor created HIVE-21792: - Summary: Hive Indexes... Again Key: HIVE-21792 URL: https://issues.apache.org/jira/browse/HIVE-21792 Project: Hive Issue Type: New Feature Components: Indexing Reporter: David Mollitor
Hive had an implementation of indexing that was made somewhat obsolete by the introduction of columnar file formats with their own internal indexing. I propose that Hive introduce indexing again.
# Column Index: Stored in HBase
# Full-Text Index: Stored in Solr
The basic idea is that the key in HBase is the record and the value is the relative file path of the data in the Hive table. Performing an INSERT statement creates the index for each record.
https://dev.mysql.com/doc/refman/8.0/en/create-index.html
When generating the explain plan, only the files involved in the query are considered. This would prevent having to scan large amounts of data for the typical BI tools when the set of data is known to be very small.
{code:sql}
-- Quick retrieval of small sets of records
select * from user where userid=27;
-- Full scans
select count(1) from user;
{code}
[jira] [Created] (HIVE-21748) HBase Operations Can Fail When Using MAPREDLOCAL
David Mollitor created HIVE-21748: - Summary: HBase Operations Can Fail When Using MAPREDLOCAL Key: HIVE-21748 URL: https://issues.apache.org/jira/browse/HIVE-21748 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor
https://github.com/apache/hive/blob/5634140b2beacdac20ceec8c73ff36bce5675ef8/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java#L258-L262
{code:java|title=HBaseStorageHandler.java}
if (this.configureInputJobProps) {
  LOG.info("Configuring input job properties");
  ...
  try {
    addHBaseDelegationToken(jobConf);
  } catch (IOException | MetaException e) {
    throw new IllegalStateException("Error while configuring input job properties", e);
  }
} else {
  LOG.info("Configuring output job properties");
  ...
}
{code}
What we can see here is that the HBase delegation token is only created when there is an input job (reading from an HBase source). If a particular stage of a query has no HBase input, only HBase output, then the delegation token is not created and the stage will fail.
{code:none|title=Error Message in HS2 Log}
2019-05-17 10:24:55,036 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-388]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
  at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
  at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
  at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
  at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
{code}
You can tell it will fail because an HDFS token is created but no HBase token is reported in the HS2 logs. The following is an example of a proper setup. If the HBASE_AUTH_TOKEN is missing, the job will try to initiate a Kerberos handshake and fail.
{code:none|title=Logging of a Proper Run}
2019-05-17 10:36:15,593 INFO org.apache.hadoop.mapreduce.JobSubmitter: [HiveServer2-Background-Pool: Thread-455]: Submitting tokens for job: job_1557858663665_0048
2019-05-17 10:36:15,593 INFO org.apache.hadoop.mapreduce.JobSubmitter: [HiveServer2-Background-Pool: Thread-455]: Kind: HDFS_DELEGATION_TOKEN, Service: 10.17.101.237:8020, Ident: (token for hive: HDFS_DELEGATION_TOKEN owner=hive/host-10-17-102-135.coe.cloudera@example.com, renewer=yarn, realUser=, issueDate=1558114574357, maxDate=1558719374357, sequenceNumber=75, masterKeyId=4)
2019-05-17 10:36:15,593 INFO org.apache.hadoop.mapreduce.JobSubmitter: [HiveServer2-Background-Pool: Thread-455]: Kind: HBASE_AUTH_TOKEN, Service: 9b282733-7927-4785-92ea-dad419f6f055, Ident: (org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@b1)
2019-05-17 10:36:15,859 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: [HiveServer2-Background-Pool: Thread-455]: Submitted application application_1557858663665_0048
{code}
Error message in the Local MapReduce log.
{code:none|title=Error message}
2019-05-10 07:43:24,875 WARN [htable-pool2-t1]: security.UserGroupInformation (UserGroupInformation.java:doAs(1927)) - PriviledgedActionException as:hive (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2019-05-10 07:43:24,876 WARN [htable-pool2-t1]: ipc.RpcClientImpl (RpcClientImpl.java:run(675)) - Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2019-05-10 07:43:24,876 ERROR [htable-pool2-t1]: ipc.RpcClientImpl (RpcClientImpl.java:run(685)) - SASL authentication failed. The most likely cause is missing
[jira] [Created] (HIVE-21747) Remove Dependency on org.cliffc.high_scale_lib.Counter
David Mollitor created HIVE-21747: - Summary: Remove Dependency on org.cliffc.high_scale_lib.Counter Key: HIVE-21747 URL: https://issues.apache.org/jira/browse/HIVE-21747 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor
[https://github.com/apache/hive/blob/5634140b2beacdac20ceec8c73ff36bce5675ef8/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java#L327]
{code:java}
static {
  try {
    counterClass = Class.forName("org.cliffc.high_scale_lib.Counter");
  } catch (ClassNotFoundException cnfe) {
    // this dependency is removed for HBase 1.0
  }
}
{code}
I think this _counterClass_ stuff can be removed now that Hive is firmly on HBase 1.0+
[jira] [Created] (HIVE-21727) Allow For Ordinal Substitution
David Mollitor created HIVE-21727: - Summary: Allow For Ordinal Substitution Key: HIVE-21727 URL: https://issues.apache.org/jira/browse/HIVE-21727 Project: Hive Issue Type: New Feature Components: Query Processor Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Impala allows for ordinal substitution. Add a compatible feature to Hive to allow Hive to be more compatible with Impala. Allows for more of a drop-in replacement. [IMPALA-8548] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21655) Add Re-Try to LdapSearchFactory
David Mollitor created HIVE-21655: - Summary: Add Re-Try to LdapSearchFactory Key: HIVE-21655 URL: https://issues.apache.org/jira/browse/HIVE-21655 Project: Hive Issue Type: Improvement Components: Authentication Affects Versions: 4.0.0, 3.2.0 Environment: It may be the case that LDAP service is temporarily unreachable. Please implement a re-try facility here: https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java#L41 Reporter: David Mollitor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
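A minimal sketch of what such a re-try facility could look like is given below, assuming a fixed delay between attempts. {{RetrySketch}}, {{callWithRetry}}, and the attempt and delay parameters are hypothetical names chosen for illustration; nothing here is taken from {{LdapSearchFactory}}.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical retry helper for illustration; not part of LdapSearchFactory. */
final class RetrySketch {

  /**
   * Runs the action up to maxAttempts times, sleeping delayMillis between
   * failed attempts. Throws IllegalStateException if no attempt succeeds.
   */
  static <T> T callWithRetry(Callable<T> action, int maxAttempts, long delayMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (InterruptedException ie) {
        // Do not retry through an interrupt; restore the flag and stop.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while calling the action", ie);
      } catch (Exception e) {
        last = e;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting to retry", ie);
        }
      }
    }
    throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    // Simulated service that is unreachable for the first two attempts.
    String result = callWithRetry(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new java.io.IOException("service temporarily unreachable");
      }
      return "connected";
    }, 5, 10L);
    System.out.println(result + " after " + calls.get() + " attempts");
  }
}
```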
[jira] [Created] (HIVE-21581) Remove Lock in GetInputSummary
David Mollitor created HIVE-21581: - Summary: Remove Lock in GetInputSummary Key: HIVE-21581 URL: https://issues.apache.org/jira/browse/HIVE-21581 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Fix For: 4.0.0 Now that Hive compile lock has been relaxed in [HIVE-20535], remove the {{getInputSummary}} lock: [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2459] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21524) Impala Engine
David Mollitor created HIVE-21524: - Summary: Impala Engine Key: HIVE-21524 URL: https://issues.apache.org/jira/browse/HIVE-21524 Project: Hive Issue Type: New Feature Affects Versions: 4.0.0 Reporter: David Mollitor
Now that Impala has "dedicated coordinator" capability, it could be interesting to pair HiveServer2 instances with Impala dedicated coordinators on the same host. A client could request an 'impala' execution engine and subsequent queries would be routed to the local coordinator.
{code:sql}
set hive.execution.engine=impala;
{code}
This would allow clients seamless access to both capabilities without needing different connections or drivers. Hive would also be a central location for auditing and authorization.
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_dedicated_coordinator.html
[jira] [Created] (HIVE-21515) Improvement to MoveTrash Facilities
David Mollitor created HIVE-21515: - Summary: Improvement to MoveTrash Facilities Key: HIVE-21515 URL: https://issues.apache.org/jira/browse/HIVE-21515 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-21515.1.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21469) Review of ZooKeeperHiveLockManager
David Mollitor created HIVE-21469: - Summary: Review of ZooKeeperHiveLockManager Key: HIVE-21469 URL: https://issues.apache.org/jira/browse/HIVE-21469 Project: Hive Issue Type: Improvement Components: Locking Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-21469.1.patch
A lot of sins in this class to resolve:
{code:java}
@Override
public void setContext(HiveLockManagerCtx ctx) throws LockException {
  try {
    curatorFramework = CuratorFrameworkSingleton.getInstance(conf);
    parent = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_NAMESPACE);
    try{
      curatorFramework.create().withMode(CreateMode.PERSISTENT).forPath("/" + parent, new byte[0]);
    } catch (Exception e) {
      // ignore if the parent already exists
      if (!(e instanceof KeeperException) || ((KeeperException)e).code() != KeeperException.Code.NODEEXISTS) {
        LOG.warn("Unexpected ZK exception when creating parent node /" + parent, e);
      }
    }
{code}
Every time a new session is created and this {{setContext}} method is called, it attempts to create the root node. I have seen that, even though the root node exists, a create-node action is written to the ZK logs. Check first if the node exists before trying to create it.
{code:java}
try {
  curatorFramework.delete().forPath(zLock.getPath());
} catch (InterruptedException ie) {
  curatorFramework.delete().forPath(zLock.getPath());
}
{code}
There have historically been quite a few bugs regarding leaked locks. The Driver will signal the session {{Thread}} by performing an interrupt. That interrupt can happen at any time and it can kill a create/delete action within the ZK framework. We can see one example of a workaround for this here: if the ZK action is interrupted, simply do it again. Well, what if it's interrupted yet again? The lock will be leaked anyway. Also, when the {{InterruptedException}} is caught, the thread's interrupted flag is cleared. The flag is not reset in this code and therefore we lose the fact that this thread has been interrupted.
{code:java}
    if (tryNum > 1) {
      Thread.sleep(sleepTime);
    }
    unlockPrimitive(hiveLock, parent, curatorFramework);
    break;
  } catch (Exception e) {
    if (tryNum >= numRetriesForUnLock) {
      String name = ((ZooKeeperHiveLock)hiveLock).getPath();
      throw new LockException("Node " + name + " can not be deleted after " + numRetriesForUnLock + " attempts.", e);
    }
  }
{code}
Related: the sleep here may be interrupted, but we still need to delete the lock (again, for fear of leaking it). This sleep should be uninterruptible. If we need to get the lock deleted, and there's a problem, interrupting the sleep will cause the code to eventually exit and locks will be leaked.
It also requires a bunch more TLC.
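The two interrupt-related points (an uninterruptible sleep, and not losing the interrupted flag) can be sketched with the standard library alone. {{UninterruptibleSleepSketch}} is a hypothetical helper written for illustration, not code from {{ZooKeeperHiveLockManager}}; Guava's {{Uninterruptibles.sleepUninterruptibly}} offers similar behavior.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of ZooKeeperHiveLockManager. */
final class UninterruptibleSleepSketch {

  /**
   * Sleeps for the full duration even if the thread is interrupted, then
   * restores the interrupt flag so callers can still observe the interrupt.
   */
  static void sleepUninterruptibly(long duration, TimeUnit unit) {
    boolean interrupted = false;
    try {
      long remainingNanos = unit.toNanos(duration);
      final long deadline = System.nanoTime() + remainingNanos;
      while (remainingNanos > 0) {
        try {
          TimeUnit.NANOSECONDS.sleep(remainingNanos);
          break;
        } catch (InterruptedException e) {
          // Remember the interrupt, then keep sleeping for the time that is left.
          interrupted = true;
          remainingNanos = deadline - System.nanoTime();
        }
      }
    } finally {
      if (interrupted) {
        // Re-assert the flag that catching InterruptedException cleared.
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    // Simulate the Driver interrupting the session thread before the retry sleep.
    Thread.currentThread().interrupt();
    long start = System.nanoTime();
    sleepUninterruptibly(50, TimeUnit.MILLISECONDS);
    long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("elapsed ms: " + elapsedMillis);
    System.out.println("interrupt flag preserved: " + Thread.currentThread().isInterrupted());
  }
}
```

With a helper like this, the retry loop can still finish deleting the lock, and code further up the stack can still see that the thread was interrupted.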
[jira] [Created] (HIVE-21466) Increase Default Size of SPLIT_MAXSIZE
David Mollitor created HIVE-21466: - Summary: Increase Default Size of SPLIT_MAXSIZE Key: HIVE-21466 URL: https://issues.apache.org/jira/browse/HIVE-21466 Project: Hive Issue Type: Improvement Components: Configuration Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Assignee: David Mollitor Attachments: HIVE-21466.1.patch
{code:java}
MAPREDMAXSPLITSIZE(FileInputFormat.SPLIT_MAXSIZE, 256000000L, "", true),
{code}
[https://github.com/apache/hive/blob/8d4300a02691777fc96f33861ed27e64fed72f2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L682]
This field specifies a maximum size for each MR (maybe other?) split. This number should be a multiple of the HDFS block size. The maximum is implemented as follows: each block is added to the split, and if the split grows to be larger than the maximum allowed, the split is submitted to the cluster and a new split is opened. So, imagine the following scenario:
* HDFS block size of 16 bytes
* Maximum size of 40 bytes
This will produce a split with 3 blocks. After two blocks the split holds (2x16) = 32 bytes, which is still under the maximum, so another block is inserted, giving (3x16) = 48 bytes in the split. So, while many operators would assume a split of 2 blocks, the actual split is 3 blocks. Setting the maximum split size to a multiple of the HDFS block size will make this behavior less confusing.
The current setting is ~256MB and when this was introduced, the default HDFS block size was 64MB. That is a factor of 4x. However, HDFS block sizes are now 128MB by default, so I propose setting this to 4x128MB. The larger splits (fewer tasks) should give a nice performance boost for modern hardware.
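The arithmetic in the scenario above can be checked with a small model. This is only a sketch: it assumes a split is closed once its size reaches or exceeds the maximum, and {{SplitSizeSketch}} and {{blocksInFirstSplit}} are hypothetical names, not Hive or Hadoop code.

```java
/** Simplified model of the split-building rule described above; not Hive or Hadoop code. */
final class SplitSizeSketch {

  /**
   * Adds whole blocks to one split until the split reaches or exceeds
   * maxSplitSize (an assumption of this sketch) or the blocks run out.
   * Returns the number of blocks that end up in that split.
   */
  static int blocksInFirstSplit(long blockSize, long maxSplitSize, int availableBlocks) {
    long splitBytes = 0;
    int blocks = 0;
    while (blocks < availableBlocks && splitBytes < maxSplitSize) {
      splitBytes += blockSize;
      blocks++;
    }
    return blocks;
  }

  public static void main(String[] args) {
    // 16-byte blocks, 40-byte maximum: 32 < 40, so a third block is added (48 bytes).
    System.out.println(blocksInFirstSplit(16, 40, 10)); // 3
    // Maximum is a multiple of the block size: the split closes at exactly 2 blocks.
    System.out.println(blocksInFirstSplit(16, 32, 10)); // 2
    // Proposed setting: 4 x 128MB maximum with 128MB blocks.
    long mb = 1024L * 1024L;
    System.out.println(blocksInFirstSplit(128 * mb, 4 * 128 * mb, 10)); // 4
  }
}
```

When the maximum is a multiple of the block size, the block count per split matches what an operator would compute by simple division.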
[jira] [Created] (HIVE-21433) Doc: Remove Reference to hive.stats.avg.row.size
David Mollitor created HIVE-21433: - Summary: Doc: Remove Reference to hive.stats.avg.row.size Key: HIVE-21433 URL: https://issues.apache.org/jira/browse/HIVE-21433 Project: Hive Issue Type: Improvement Components: Documentation Affects Versions: 4.0.0 Reporter: David Mollitor [https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties] Remove reference to {{hive.stats.avg.row.size}}. I think it's been replaced by {{hive.stats.max.variable.length}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-21426) Remove Utilities Global Random
David Mollitor created HIVE-21426: - Summary: Remove Utilities Global Random Key: HIVE-21426 URL: https://issues.apache.org/jira/browse/HIVE-21426 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L253 Remove global {{Random}} object in favor of {{ThreadLocalRandom}}. {quote} ThreadLocalRandom is initialized with an internally generated seed that may not otherwise be modified. When applicable, use of ThreadLocalRandom rather than shared Random objects in concurrent programs will typically encounter much less overhead and contention. {quote} https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadLocalRandom.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
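As a small illustration of the proposed replacement, the sketch below draws random numbers through {{ThreadLocalRandom.current()}} instead of a shared static {{Random}}. The class and method names are hypothetical and are not taken from {{Utilities.java}}.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical example; the names are not taken from Hive's Utilities class. */
final class ThreadLocalRandomSketch {

  /** Returns a value in [0, bound) from the calling thread's own generator. */
  static int nextInt(int bound) {
    // current() must be called on the thread that uses the generator;
    // the returned instance should not be cached in a shared static field.
    return ThreadLocalRandom.current().nextInt(bound);
  }

  public static void main(String[] args) throws InterruptedException {
    Runnable task = () ->
        System.out.println(Thread.currentThread().getName() + " drew " + nextInt(100));
    Thread first = new Thread(task, "worker-1");
    Thread second = new Thread(task, "worker-2");
    first.start();
    second.start();
    first.join();
    second.join();
  }
}
```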
[jira] [Created] (HIVE-21425) Use newDirectExecutorService for getInputSummary
David Mollitor created HIVE-21425: - Summary: Use newDirectExecutorService for getInputSummary Key: HIVE-21425 URL: https://issues.apache.org/jira/browse/HIVE-21425 Project: Hive Issue Type: Improvement Components: HiveServer2 Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor
{code:java|title=Utilities.java}
int numExecutors = getMaxExecutorsForInputListing(ctx.getConf(), pathNeedProcess.size());
if (numExecutors > 1) {
  LOG.info("Using {} threads for getContentSummary", numExecutors);
  executor = Executors.newFixedThreadPool(numExecutors,
      new ThreadFactoryBuilder().setDaemon(true)
          .setNameFormat("Get-Input-Summary-%d").build());
} else {
  executor = null;
}
{code}
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2482-L2490
Instead of using a 'null' {{ExecutorService}}, use Guava's {{DirectExecutorService}} and remove special casing for a 'null' value.
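The idea can be sketched without Guava on the classpath. Below, {{SameThreadExecutorService}} is a hypothetical, minimal stand-in for what Guava's {{MoreExecutors.newDirectExecutorService()}} provides, and {{totalLength}} is placeholder work rather than the real {{getInputSummary}} logic. The point is only that callers submit tasks the same way in both cases and never check for 'null'.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class DirectExecutorSketch {

  /** Minimal same-thread executor; a stand-in for Guava's direct executor service. */
  static final class SameThreadExecutorService extends AbstractExecutorService {
    private volatile boolean shutdown;

    @Override public void execute(Runnable command) { command.run(); }
    @Override public void shutdown() { shutdown = true; }
    @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
    @Override public boolean isShutdown() { return shutdown; }
    @Override public boolean isTerminated() { return shutdown; }
    @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return shutdown; }
  }

  /** Always returns a usable executor, so callers need no null check. */
  static ExecutorService newExecutor(int numExecutors) {
    return numExecutors > 1
        ? Executors.newFixedThreadPool(numExecutors)
        : new SameThreadExecutorService();
  }

  /** Sums path lengths through whichever executor is chosen (placeholder for real work). */
  static long totalLength(List<String> paths, int numExecutors) {
    ExecutorService executor = newExecutor(numExecutors);
    try {
      List<Future<Integer>> results = new ArrayList<>();
      for (String path : paths) {
        Callable<Integer> task = path::length;
        results.add(executor.submit(task));
      }
      long total = 0;
      for (Future<Integer> result : results) {
        try {
          total += result.get();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for a task", e);
        } catch (ExecutionException e) {
          throw new IllegalStateException("Task failed", e.getCause());
        }
      }
      return total;
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    List<String> paths = java.util.Arrays.asList("/warehouse/a", "/warehouse/bb");
    System.out.println(totalLength(paths, 1)); // runs on the calling thread
    System.out.println(totalLength(paths, 4)); // runs on a fixed thread pool
  }
}
```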
[jira] [Created] (HIVE-21414) Hive JSON SerDe Does Not Properly Handle Field Comments
David Mollitor created HIVE-21414: - Summary: Hive JSON SerDe Does Not Properly Handle Field Comments Key: HIVE-21414 URL: https://issues.apache.org/jira/browse/HIVE-21414 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 4.0.0, 3.2.0 Reporter: David Mollitor Field comments are handed to the JSON SerDe from HMS and then are ignored. The result is that all field comments are 'from deserializer' and cannot be changed. For example, Avro SerDe handles comments: https://github.com/apache/hive/blob/release-1.1.0/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L133 -- This message was sent by Atlassian JIRA (v7.6.3#76005)