[jira] [Created] (HIVE-22445) LazySimpleSerDe toString is not Correct

2019-11-01 Thread David Mollitor (Jira)
David Mollitor created HIVE-22445:
-

 Summary: LazySimpleSerDe toString is not Correct
 Key: HIVE-22445
 URL: https://issues.apache.org/jira/browse/HIVE-22445
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-22445.1.patch

{code:none}
2019-11-01T10:03:49,228  INFO [pool-23-thread-1] exec.FileSinkOperator: Using 
serializer : class 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe[[[B@983dd25]:[_col0, 
_col1]:[struct

[jira] [Created] (HIVE-22444) Clean up Project POM Files

2019-11-01 Thread David Mollitor (Jira)
David Mollitor created HIVE-22444:
-

 Summary: Clean up Project POM Files
 Key: HIVE-22444
 URL: https://issues.apache.org/jira/browse/HIVE-22444
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


# Address warnings in the build process
# Use DependencyManagement in Root POM for ITest (see HIVE-22426)
# General POM cleanup



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (HIVE-22443) HBase Maven site configuration causes Hive project to get a directory named ${project.basedir}

2019-11-01 Thread David Mollitor (Jira)
David Mollitor created HIVE-22443:
-

 Summary: HBase Maven site configuration causes Hive project to get 
a directory named ${project.basedir}
 Key: HIVE-22443
 URL: https://issues.apache.org/jira/browse/HIVE-22443
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


Upgrade HBase versions





[jira] [Created] (HIVE-22441) Metrics Subsystem Improvements

2019-10-31 Thread David Mollitor (Jira)
David Mollitor created HIVE-22441:
-

 Summary: Metrics Subsystem Improvements
 Key: HIVE-22441
 URL: https://issues.apache.org/jira/browse/HIVE-22441
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


# CodahaleMetrics uses Guava LoadingCache, which is already thread-safe, and 
then puts an explicit lock around the structure.  Use Java 8 new Map API with 
ConcurrentHashMap.
# Introduce Java 8 APIs
# Simplifications
# Updated unit tests to no longer include a 'sleep'

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/metrics/metrics2/CodahaleMetrics.java#L91-L94
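A minimal sketch of the first item, assuming a registry keyed by metric name. {{MetricRegistrySketch}} and its methods are hypothetical names, not the actual {{CodahaleMetrics}} code. {{ConcurrentHashMap.computeIfAbsent}} creates each entry atomically, so no explicit lock is needed around the map:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics registry; not the actual CodahaleMetrics class.
final class MetricRegistrySketch {
  private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

  // computeIfAbsent is atomic per key, so callers need no external lock.
  long increment(String name) {
    LongAdder adder = counters.computeIfAbsent(name, k -> new LongAdder());
    adder.increment();
    return adder.sum();
  }

  // Returns 0 for a counter that was never incremented.
  long get(String name) {
    LongAdder adder = counters.get(name);
    return adder == null ? 0L : adder.sum();
  }
}
```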






[jira] [Created] (HIVE-22428) Superfluous "Failed to get database" WARN Logging in ObjectStore

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22428:
-

 Summary: Superfluous "Failed to get database" WARN Logging in 
ObjectStore
 Key: HIVE-22428
 URL: https://issues.apache.org/jira/browse/HIVE-22428
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-22428.1.patch

In my testing, I get lots of logs like this:

{code:none}
Line 26319: 2019-10-28T21:09:52,134  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.compdb, returning 
NoSuchObjectException
Line 26327: 2019-10-28T21:09:52,135  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.compdb, returning 
NoSuchObjectException
Line 26504: 2019-10-28T21:09:52,600  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
NoSuchObjectException
Line 26519: 2019-10-28T21:09:52,606  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.tstatsfast, returning 
NoSuchObjectException
Line 26695: 2019-10-28T21:09:52,922  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.createDb, returning 
NoSuchObjectException
Line 26703: 2019-10-28T21:09:52,923  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.createDb, returning 
NoSuchObjectException
Line 26763: 2019-10-28T21:09:52,936  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.compdb, returning 
NoSuchObjectException
Line 26778: 2019-10-28T21:09:52,939  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.compdb, returning 
NoSuchObjectException
Line 26963: 2019-10-28T21:09:53,273  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.db1, returning 
NoSuchObjectException
Line 26978: 2019-10-28T21:09:53,276  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.db2, returning 
NoSuchObjectException
Line 26986: 2019-10-28T21:09:53,277  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.db1, returning 
NoSuchObjectException
Line 27018: 2019-10-28T21:09:53,300  WARN [pool-6-thread-5] 
metastore.ObjectStore: Failed to get database hive.db2, returning 
NoSuchObjectException
{code}

This is a superfluous log message.  It might be pretty common for a database to 
not exist if, for example, a user fat-fingers the name of the database.  The 
code also has the bad habit of log-and-throw.  Just log or throw, not both.

Since I'm looking at this class, touch up some of the other logging as well.
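A minimal sketch of "throw, do not also log", under the assumption that the caller decides whether a missing database is worth reporting. {{DatabaseLookupSketch}} and {{NoSuchDatabaseException}} are hypothetical names, not the {{ObjectStore}} API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical lookup class illustrating "throw, do not also log".
final class DatabaseLookupSketch {
  static final class NoSuchDatabaseException extends Exception {
    NoSuchDatabaseException(String message) {
      super(message);
    }
  }

  private final Map<String, String> databases = new ConcurrentHashMap<>();

  void create(String name, String location) {
    databases.put(name, location);
  }

  // A missing database is an expected outcome: the detail goes into the
  // exception, and no WARN line is written here.
  String getLocation(String name) throws NoSuchDatabaseException {
    String location = databases.get(name);
    if (location == null) {
      throw new NoSuchDatabaseException("Database " + name + " does not exist");
    }
    return location;
  }
}
```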





[jira] [Created] (HIVE-22427) PersistenceManagerProvider Logs a Warning About datanucleus.autoStartMechanismMode

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22427:
-

 Summary: PersistenceManagerProvider Logs a Warning About 
datanucleus.autoStartMechanismMode
 Key: HIVE-22427
 URL: https://issues.apache.org/jira/browse/HIVE-22427
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


{code:none}
WARN [pool-6-thread-2] metastore.PersistenceManagerProvider: 
datanucleus.autoStartMechanismMode is set to unsupported value null . Setting 
it to value: ignored
{code}

This scenario does not warrant WARN-level logging.  Perhaps if the user 
configures some non-null value, emit a warning; otherwise, simply emit an 
INFO-level message stating that the configuration is not set and that a 
reasonable default value will be used.
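A minimal sketch of that rule, assuming {{ignored}} is the default (as the log line above suggests). {{AutoStartModeSketch}} and its methods are hypothetical, not {{PersistenceManagerProvider}} code, and {{java.util.logging.Level}} is used only to keep the example self-contained:

```java
import java.util.logging.Level;

// Hypothetical helper: level and message depend on whether the user set a value.
final class AutoStartModeSketch {
  static final String DEFAULT_MODE = "ignored";

  // Unset (or already the default) is expected, so INFO; anything else is a WARNING.
  static Level levelFor(String configured) {
    if (configured == null || DEFAULT_MODE.equalsIgnoreCase(configured)) {
      return Level.INFO;
    }
    return Level.WARNING;
  }

  static String messageFor(String configured) {
    if (configured == null) {
      return "datanucleus.autoStartMechanismMode is not set; using default: " + DEFAULT_MODE;
    }
    if (DEFAULT_MODE.equalsIgnoreCase(configured)) {
      return "datanucleus.autoStartMechanismMode is set to " + DEFAULT_MODE;
    }
    return "datanucleus.autoStartMechanismMode is set to unsupported value "
        + configured + "; using: " + DEFAULT_MODE;
  }
}
```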





[jira] [Created] (HIVE-22426) Use DependencyManagement in Root POM for itests

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22426:
-

 Summary: Use DependencyManagement in Root POM for itests
 Key: HIVE-22426
 URL: https://issues.apache.org/jira/browse/HIVE-22426
 Project: Hive
  Issue Type: Improvement
  Components: Test, Tests
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-22425) ReplChangeManager Not Logging Database Name

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22425:
-

 Summary: ReplChangeManager Not Logging Database Name
 Key: HIVE-22425
 URL: https://issues.apache.org/jira/browse/HIVE-22425
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-22425.1.patch

{code:java|title=ReplChangeManager.java}
LOG.debug("Repl policy is not set for database ", db.getName());
{code}

The log statement is missing the placeholder '{}' so the DB name is not getting 
logged.
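To illustrate the effect, here is a simplified stand-in for '{}' placeholder substitution. {{PlaceholderSketch.format}} is a hypothetical helper written for this example and is not the logging framework's implementation, which also handles escaping, arrays, and throwables:

```java
// Simplified stand-in for '{}' placeholder substitution, for illustration only.
final class PlaceholderSketch {
  static String format(String pattern, Object... args) {
    StringBuilder out = new StringBuilder();
    int argIndex = 0;
    int from = 0;
    int at;
    // Replace each "{}" with the next argument; surplus arguments are ignored.
    while (argIndex < args.length && (at = pattern.indexOf("{}", from)) >= 0) {
      out.append(pattern, from, at).append(args[argIndex++]);
      from = at + 2;
    }
    return out.append(pattern.substring(from)).toString();
  }

  public static void main(String[] args) {
    // Without a placeholder, the argument is silently dropped.
    System.out.println(format("Repl policy is not set for database ", "db1"));
    // With a placeholder, the database name appears in the message.
    System.out.println(format("Repl policy is not set for database {}", "db1"));
  }
}
```

The fix in the ticket amounts to the second form: add '{}' to the end of the message.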





[jira] [Created] (HIVE-22424) Use PerfLogger in MetastoreDirectSqlUtils.java

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22424:
-

 Summary: Use PerfLogger in MetastoreDirectSqlUtils.java
 Key: HIVE-22424
 URL: https://issues.apache.org/jira/browse/HIVE-22424
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
 Fix For: 4.0.0








[jira] [Created] (HIVE-22423) Improve Logging In HadoopThriftAuthBridge

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22423:
-

 Summary: Improve Logging In HadoopThriftAuthBridge
 Key: HIVE-22423
 URL: https://issues.apache.org/jira/browse/HIVE-22423
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


# Remove superfluous debug log guards
# Improve messages
# Improve message format





[jira] [Created] (HIVE-22421) Improve Logging If Configuration File Not Found

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22421:
-

 Summary: Improve Logging If Configuration File Not Found
 Key: HIVE-22421
 URL: https://issues.apache.org/jira/browse/HIVE-22421
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


{code:none}
2019-10-28T21:07:27,599  INFO [main] conf.MetastoreConf: Unable to find config 
file metastore-site.xml
2019-10-28T21:07:27,599  INFO [main] conf.MetastoreConf: Found configuration 
file null
{code}

Prints 'unable to find' followed by 'null'.  Just print one or the other.
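A minimal sketch of emitting exactly one message per lookup. {{ConfigFileMessageSketch}} is a hypothetical helper, not {{MetastoreConf}} code, and it assumes the lookup result is null when the file is not found:

```java
import java.net.URI;

// Hypothetical helper: exactly one message, chosen by the lookup result.
final class ConfigFileMessageSketch {
  static String describe(String fileName, URI location) {
    return location == null
        ? "Unable to find config file: " + fileName
        : "Found configuration file: " + location;
  }
}
```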





[jira] [Created] (HIVE-22419) Improve Messages Emitted From HiveMetaStoreClient

2019-10-29 Thread David Mollitor (Jira)
David Mollitor created HIVE-22419:
-

 Summary: Improve Messages Emitted From HiveMetaStoreClient
 Key: HIVE-22419
 URL: https://issues.apache.org/jira/browse/HIVE-22419
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


After reviewing some logs and errors emitted during a QTest run, I would like 
to propose some improvements to logging in {{HiveMetaStoreClient}}. 

* Remove duplicate logging
* Remove superfluous class {{StackTraceLogger}}
* Do not use contractions in public-facing error messages and logs
* Make all logging side-effect free (see {{connCount}})
* Code simplification
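A minimal sketch of why logging should be side-effect free, assuming a counter that is incremented inside a log argument. {{ConnectionCounterSketch}} is hypothetical, and {{java.util.logging}} with a lazy {{Supplier}} is used only to keep the example self-contained: when the level is disabled, the message is never built and the increment never happens.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical connection tracker; connCount mirrors the counter named in the list above.
final class ConnectionCounterSketch {
  private static final Logger LOG = Logger.getLogger("ConnectionCounterSketch");
  private final AtomicInteger connCount = new AtomicInteger();

  // Risky: the increment lives inside the log argument, so it is skipped
  // whenever the message is never built (here, when FINE is disabled).
  int openWithSideEffectInLog() {
    LOG.fine(() -> "Opened a connection, current connections: " + connCount.incrementAndGet());
    return connCount.get();
  }

  // Safer: update state first, then log the value that was already computed.
  int open() {
    int current = connCount.incrementAndGet();
    LOG.fine(() -> "Opened a connection, current connections: " + current);
    return current;
  }

  static void setLevel(Level level) {
    LOG.setLevel(level);
  }
}
```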





[jira] [Created] (HIVE-22417) Remove stringifyException from MetaStore

2019-10-28 Thread David Mollitor (Jira)
David Mollitor created HIVE-22417:
-

 Summary: Remove stringifyException from MetaStore
 Key: HIVE-22417
 URL: https://issues.apache.org/jira/browse/HIVE-22417
 Project: Hive
  Issue Type: Sub-task
  Components: Metastore, Standalone Metastore
Affects Versions: 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-22415) Upgrade to Java 11

2019-10-28 Thread David Mollitor (Jira)
David Mollitor created HIVE-22415:
-

 Summary: Upgrade to Java 11
 Key: HIVE-22415
 URL: https://issues.apache.org/jira/browse/HIVE-22415
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor








[jira] [Created] (HIVE-22404) Upgrade to Java 9

2019-10-25 Thread David Mollitor (Jira)
David Mollitor created HIVE-22404:
-

 Summary: Upgrade to Java 9
 Key: HIVE-22404
 URL: https://issues.apache.org/jira/browse/HIVE-22404
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor








[jira] [Created] (HIVE-22403) Beeline Should Print Location of Configuration Directory at Startup

2019-10-25 Thread David Mollitor (Jira)
David Mollitor created HIVE-22403:
-

 Summary: Beeline Should Print Location of Configuration Directory 
at Startup
 Key: HIVE-22403
 URL: https://issues.apache.org/jira/browse/HIVE-22403
 Project: Hive
  Issue Type: Improvement
  Components: Beeline
Affects Versions: 2.4.0, 3.2.0
Reporter: David Mollitor


Beeline should print the CONF directory it is utilizing when it starts up.





[jira] [Created] (HIVE-22402) Deprecate Hive PerfLogger

2019-10-25 Thread David Mollitor (Jira)
David Mollitor created HIVE-22402:
-

 Summary: Deprecate Hive PerfLogger
 Key: HIVE-22402
 URL: https://issues.apache.org/jira/browse/HIVE-22402
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: David Mollitor
Assignee: David Mollitor








[jira] [Created] (HIVE-22390) Remove Dependency on JODA Time Library

2019-10-22 Thread David Mollitor (Jira)
David Mollitor created HIVE-22390:
-

 Summary: Remove Dependency on JODA Time Library
 Key: HIVE-22390
 URL: https://issues.apache.org/jira/browse/HIVE-22390
 Project: Hive
  Issue Type: Improvement
Reporter: David Mollitor
Assignee: David Mollitor


Hive uses Joda time library.

{quote}
Joda-Time is the de facto standard date and time library for Java prior to Java 
SE 8. Users are now asked to migrate to java.time (JSR-310).

https://www.joda.org/joda-time/
{quote}

Remove this dependency from classes, POM files, and licence files.
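As a rough guide to what the migration looks like, here is a sketch of {{java.time}} counterparts for a few common Joda-Time usages. {{JavaTimeSketch}} and its methods are illustrative and are not taken from Hive; the Joda calls named in the comments are the usual equivalents, not a list of what Hive actually uses:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// java.time counterparts of common Joda-Time usage; the class name is illustrative.
final class JavaTimeSketch {
  // Joda: DateTimeFormat.forPattern(...)  ->  java.time: DateTimeFormatter.ofPattern(...)
  private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

  static LocalDate parseDate(String text) {
    return LocalDate.parse(text, DATE);
  }

  // Joda: new DateTime(millis, DateTimeZone.UTC)  ->  java.time: Instant + ZoneOffset.UTC
  static String formatUtcDate(long epochMillis) {
    return DATE.format(Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC));
  }

  // Joda: new Duration(start, end)  ->  java.time: Duration.between(start, end)
  static long secondsBetween(Instant start, Instant end) {
    return Duration.between(start, end).getSeconds();
  }
}
```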





[jira] [Created] (HIVE-22370) Remove Deprecated Fields from HiveConf

2019-10-18 Thread David Mollitor (Jira)
David Mollitor created HIVE-22370:
-

 Summary: Remove Deprecated Fields from HiveConf
 Key: HIVE-22370
 URL: https://issues.apache.org/jira/browse/HIVE-22370
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: David Mollitor
Assignee: David Mollitor
 Fix For: 4.0.0








[jira] [Created] (HIVE-22337) Improve and Expand Text-Based SerDes

2019-10-14 Thread David Mollitor (Jira)
David Mollitor created HIVE-22337:
-

 Summary: Improve and Expand Text-Based SerDes
 Key: HIVE-22337
 URL: https://issues.apache.org/jira/browse/HIVE-22337
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0
Reporter: David Mollitor
Assignee: David Mollitor
 Fix For: 4.0.0








[jira] [Created] (HIVE-22217) Better Logging for Hive JAR Reload

2019-09-18 Thread David Mollitor (Jira)
David Mollitor created HIVE-22217:
-

 Summary: Better Logging for Hive JAR Reload
 Key: HIVE-22217
 URL: https://issues.apache.org/jira/browse/HIVE-22217
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 2.3.6, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor


Troubleshooting Hive Reloadable Auxiliary JARs has always been difficult.

Add logging to at least confirm which JAR files are being loaded.





[jira] [Created] (HIVE-22078) Upgrade arrow version to 0.14.1

2019-08-02 Thread David Mollitor (JIRA)
David Mollitor created HIVE-22078:
-

 Summary: Upgrade arrow version to 0.14.1
 Key: HIVE-22078
 URL: https://issues.apache.org/jira/browse/HIVE-22078
 Project: Hive
  Issue Type: Task
Affects Versions: 4.0.0
Reporter: David Mollitor






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (HIVE-22032) Allow Hive JSON SerDe To Be Case Insensitive for Field Names

2019-07-22 Thread David Mollitor (JIRA)
David Mollitor created HIVE-22032:
-

 Summary: Allow Hive JSON SerDe To Be Case Insensitive for Field 
Names
 Key: HIVE-22032
 URL: https://issues.apache.org/jira/browse/HIVE-22032
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


https://fasterxml.github.io/jackson-databind/javadoc/2.9/com/fasterxml/jackson/databind/MapperFeature.html#ACCEPT_CASE_INSENSITIVE_PROPERTIES





[jira] [Created] (HIVE-21792) Hive Indexes... Again

2019-05-24 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21792:
-

 Summary: Hive Indexes... Again
 Key: HIVE-21792
 URL: https://issues.apache.org/jira/browse/HIVE-21792
 Project: Hive
  Issue Type: New Feature
  Components: Indexing
Reporter: David Mollitor


Hive had an implementation of indexing that was made somewhat obsolete given 
the introduction of columnar file formats with their own internal indexing.

I propose that Hive introduce Indexing again.

# Column Index: Stored in HBase
# Full-Text Index: Stored in Solr

The basic idea is that the key in HBase is the record and the value is the 
relative file path of the data in the Hive table.

Performing an INSERT statement creates the index for each record.

https://dev.mysql.com/doc/refman/8.0/en/create-index.html

When generating the explain plan, only the files involved in the query are 
considered.

This would prevent having to scan large amounts of data for typical BI 
tools when the set of data is known to be very small.

{code:sql}
-- Quick retrieval of small sets of records
select * from user where userid=27;

-- Full scans
select count(1) from user;
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21748) HBase Operations Can Fail When Using MAPREDLOCAL

2019-05-17 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21748:
-

 Summary: HBase Operations Can Fail When Using MAPREDLOCAL
 Key: HIVE-21748
 URL: https://issues.apache.org/jira/browse/HIVE-21748
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


https://github.com/apache/hive/blob/5634140b2beacdac20ceec8c73ff36bce5675ef8/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java#L258-L262

{code:java|title=HBaseStorageHandler.java}
if (this.configureInputJobProps) {
  LOG.info("Configuring input job properties");
  ...
  try {
    addHBaseDelegationToken(jobConf);
  } catch (IOException | MetaException e) {
    throw new IllegalStateException("Error while configuring input job properties", e);
  }
} else {
  LOG.info("Configuring output job properties");
  ...
}
{code}

What we can see here is that the HBase Delegation Token is only created when 
there is an input job (reading from an HBase source).  For a particular stage 
of a query, if there is no HBASE input, only HBASE output, then the delegation 
token is not created and will cause a failure.

{code:none|title=Error Message in HS2 Log}
2019-05-17 10:24:55,036 ERROR org.apache.hive.service.cli.operation.Operation: 
[HiveServer2-Background-Pool: Thread-388]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: 
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
at 
org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at 
org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at 
org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at 
org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)
at 
org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}


You can tell it will fail because an HDFS token is created but no HBase token 
is reported in the HS2 logs.  The following is an example of a proper setup.  
If the HBASE_AUTH_TOKEN is missing, the job fails because it tries to initiate 
a Kerberos handshake, which cannot succeed.

{code:none|title=Logging of a Proper Run}
2019-05-17 10:36:15,593 INFO  org.apache.hadoop.mapreduce.JobSubmitter: 
[HiveServer2-Background-Pool: Thread-455]: Submitting tokens for job: 
job_1557858663665_0048
2019-05-17 10:36:15,593 INFO  org.apache.hadoop.mapreduce.JobSubmitter: 
[HiveServer2-Background-Pool: Thread-455]: Kind: HDFS_DELEGATION_TOKEN, 
Service: 10.17.101.237:8020, Ident: (token for hive: HDFS_DELEGATION_TOKEN 
owner=hive/host-10-17-102-135.coe.cloudera@example.com, renewer=yarn, 
realUser=, issueDate=1558114574357, maxDate=1558719374357, sequenceNumber=75, 
masterKeyId=4)
2019-05-17 10:36:15,593 INFO  org.apache.hadoop.mapreduce.JobSubmitter: 
[HiveServer2-Background-Pool: Thread-455]: Kind: HBASE_AUTH_TOKEN, Service: 
9b282733-7927-4785-92ea-dad419f6f055, Ident: 
(org.apache.hadoop.hbase.security.token.AuthenticationTokenIdentifier@b1)
2019-05-17 10:36:15,859 INFO  
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: 
[HiveServer2-Background-Pool: Thread-455]: Submitted application 
application_1557858663665_0048
{code}

Error message in the Local MapReduce log.

{code:none|title=Error message}
2019-05-10 07:43:24,875 WARN  [htable-pool2-t1]: security.UserGroupInformation 
(UserGroupInformation.java:doAs(1927)) - PriviledgedActionException as:hive 
(auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed 
[Caused by GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)]
2019-05-10 07:43:24,876 WARN  [htable-pool2-t1]: ipc.RpcClientImpl 
(RpcClientImpl.java:run(675)) - Exception encountered while connecting to the 
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
2019-05-10 07:43:24,876 ERROR [htable-pool2-t1]: ipc.RpcClientImpl 
(RpcClientImpl.java:run(685)) - SASL authentication failed. The most likely 
cause is missing 

[jira] [Created] (HIVE-21747) Remove Dependency on org.cliffc.high_scale_lib.Counter

2019-05-17 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21747:
-

 Summary: Remove Dependency on org.cliffc.high_scale_lib.Counter
 Key: HIVE-21747
 URL: https://issues.apache.org/jira/browse/HIVE-21747
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


[https://github.com/apache/hive/blob/5634140b2beacdac20ceec8c73ff36bce5675ef8/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java#L327]

 

{code:java}
static {
  try {
    counterClass = Class.forName("org.cliffc.high_scale_lib.Counter");
  } catch (ClassNotFoundException cnfe) {
    // this dependency is removed for HBase 1.0
  }
}
{code}

I think this _counterClass_ stuff can be removed now that Hive is firmly on 
HBase 1.0+





[jira] [Created] (HIVE-21727) Allow For Ordinal Substitution

2019-05-14 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21727:
-

 Summary: Allow For Ordinal Substitution 
 Key: HIVE-21727
 URL: https://issues.apache.org/jira/browse/HIVE-21727
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


Impala allows for ordinal substitution.  Add a compatible feature to Hive to 
make Hive more compatible with Impala and more of a drop-in replacement.

[IMPALA-8548]





[jira] [Created] (HIVE-21655) Add Re-Try to LdapSearchFactory

2019-04-26 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21655:
-

 Summary: Add Re-Try to LdapSearchFactory
 Key: HIVE-21655
 URL: https://issues.apache.org/jira/browse/HIVE-21655
 Project: Hive
  Issue Type: Improvement
  Components: Authentication
Affects Versions: 4.0.0, 3.2.0
 Environment: It may be the case that LDAP service is temporarily 
unreachable.  Please implement a re-try facility here:

https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/auth/ldap/LdapSearchFactory.java#L41
Reporter: David Mollitor








[jira] [Created] (HIVE-21581) Remove Lock in GetInputSummary

2019-04-04 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21581:
-

 Summary: Remove Lock in GetInputSummary
 Key: HIVE-21581
 URL: https://issues.apache.org/jira/browse/HIVE-21581
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Fix For: 4.0.0


Now that Hive compile lock has been relaxed in [HIVE-20535], remove the 
{{getInputSummary}} lock:

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2459]

 

 





[jira] [Created] (HIVE-21524) Impala Engine

2019-03-27 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21524:
-

 Summary: Impala Engine
 Key: HIVE-21524
 URL: https://issues.apache.org/jira/browse/HIVE-21524
 Project: Hive
  Issue Type: New Feature
Affects Versions: 4.0.0
Reporter: David Mollitor


Now that Impala has "dedicated coordinator" capability, it could be interesting 
to pair HiveServer2 instances with Impala dedicated coordinators on the same 
localhost.  A client could request an 'impala' execution engine and subsequent 
queries would be routed to the local coordinator.

{code:sql}
set hive.execution.engine=impala;
{code}

This would allow clients seamless access to both capabilities without needing 
different connections or drivers.  Hive would also be a central location for 
auditing and authorization.

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_dedicated_coordinator.html





[jira] [Created] (HIVE-21515) Improvement to MoveTrash Facilities

2019-03-26 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21515:
-

 Summary: Improvement to MoveTrash Facilities
 Key: HIVE-21515
 URL: https://issues.apache.org/jira/browse/HIVE-21515
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-21515.1.patch







[jira] [Created] (HIVE-21469) Review of ZooKeeperHiveLockManager

2019-03-18 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21469:
-

 Summary: Review of ZooKeeperHiveLockManager
 Key: HIVE-21469
 URL: https://issues.apache.org/jira/browse/HIVE-21469
 Project: Hive
  Issue Type: Improvement
  Components: Locking
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-21469.1.patch

A lot of sins in this class to resolve:

{code:java}
@Override
public void setContext(HiveLockManagerCtx ctx) throws LockException {
  try {
    curatorFramework = CuratorFrameworkSingleton.getInstance(conf);
    parent = conf.getVar(HiveConf.ConfVars.HIVE_ZOOKEEPER_NAMESPACE);
    try {
      curatorFramework.create().withMode(CreateMode.PERSISTENT).forPath("/" + parent, new byte[0]);
    } catch (Exception e) {
      // ignore if the parent already exists
      if (!(e instanceof KeeperException)
          || ((KeeperException) e).code() != KeeperException.Code.NODEEXISTS) {
        LOG.warn("Unexpected ZK exception when creating parent node /" + parent, e);
      }
    }
{code}

Every time a new session is created and this {{setContext}} method is called, 
it attempts to create the root node.  I have seen that, even though the root 
node exists, a create-node action is written to the ZK logs.  Check first if 
the node exists before trying to create it.

{code:java}
try {
  curatorFramework.delete().forPath(zLock.getPath());
} catch (InterruptedException ie) {
  curatorFramework.delete().forPath(zLock.getPath());
}
{code}

There have historically been quite a few bugs regarding leaked locks.  The 
Driver signals the session {{Thread}} by performing an interrupt.  That 
interrupt can happen at any time, and it can kill a create/delete action within 
the ZK framework.  We can see one example of a workaround for this: if the ZK 
action is interrupted, simply do it again.  Well, what if it's interrupted yet 
again?  The lock will be leaked anyway.  Also, by the time the 
{{InterruptedException}} is caught, the thread's interrupted flag has been 
cleared.  The flag is not reset in this code, and therefore we lose the fact 
that this thread has been interrupted.
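One way to address both points is sketched below, assuming the ZK delete can be wrapped in a {{Callable}}. {{InterruptAwareRetrySketch}} is a hypothetical helper, not part of {{ZooKeeperHiveLockManager}}: it retries when interrupted and restores the interrupt flag before returning, so the caller can still see that the thread was interrupted.

```java
import java.util.concurrent.Callable;

// Hypothetical helper; the Callable stands in for a ZooKeeper delete call.
final class InterruptAwareRetrySketch {
  // Retries the action if it is interrupted, then restores the interrupt flag
  // so callers can still observe that the thread was interrupted.
  static <T> T callIgnoringInterrupts(Callable<T> action, int maxAttempts) throws Exception {
    boolean interrupted = false;
    try {
      InterruptedException last = null;
      for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
          return action.call();
        } catch (InterruptedException ie) {
          // The flag was cleared when the exception was thrown; remember it.
          interrupted = true;
          last = ie;
        }
      }
      throw last != null ? last : new IllegalArgumentException("maxAttempts must be >= 1");
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

Other exceptions propagate unchanged, so a caller can still apply its own retry policy to genuine ZK failures.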

{code:java}
    if (tryNum > 1) {
      Thread.sleep(sleepTime);
    }
    unlockPrimitive(hiveLock, parent, curatorFramework);
    break;
  } catch (Exception e) {
    if (tryNum >= numRetriesForUnLock) {
      String name = ((ZooKeeperHiveLock) hiveLock).getPath();
      throw new LockException("Node " + name + " can not be deleted after "
          + numRetriesForUnLock + " attempts.", e);
    }
  }
{code}

... related... the sleep here may be interrupted, but we still need to delete 
the lock (again, for fear of leaking it).  This sleep should be 
uninterruptible.  If we need to get the lock deleted, and there's a problem, 
interrupting the sleep will cause the code to eventually exit and locks will be 
leaked.
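A minimal sketch of an uninterruptible sleep using only the JDK. {{UninterruptibleSleepSketch}} is a hypothetical name; Guava's {{Uninterruptibles.sleepUninterruptibly}} offers the same behavior if that dependency is preferred. The sleep runs to completion even if the thread is interrupted, and the interrupt flag is restored afterwards:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: sleep for the full duration, then restore the interrupt flag.
final class UninterruptibleSleepSketch {
  static void sleepUninterruptibly(long millis) {
    boolean interrupted = false;
    long remainingNanos = TimeUnit.MILLISECONDS.toNanos(millis);
    final long deadline = System.nanoTime() + remainingNanos;
    try {
      while (remainingNanos > 0) {
        try {
          TimeUnit.NANOSECONDS.sleep(remainingNanos);
          return;
        } catch (InterruptedException ie) {
          // Remember the interrupt and keep sleeping until the deadline.
          interrupted = true;
          remainingNanos = deadline - System.nanoTime();
        }
      }
    } finally {
      if (interrupted) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```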

It also requires a bunch more TLC.





[jira] [Created] (HIVE-21466) Increase Default Size of SPLIT_MAXSIZE

2019-03-18 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21466:
-

 Summary: Increase Default Size of SPLIT_MAXSIZE
 Key: HIVE-21466
 URL: https://issues.apache.org/jira/browse/HIVE-21466
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor
Assignee: David Mollitor
 Attachments: HIVE-21466.1.patch

{code:java}
 MAPREDMAXSPLITSIZE(FileInputFormat.SPLIT_MAXSIZE, 256000000L, "", true),
{code}
[https://github.com/apache/hive/blob/8d4300a02691777fc96f33861ed27e64fed72f2c/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L682]

This field specifies the maximum size of each MR (maybe other?) split.

This number should be a multiple of the HDFS block size. The maximum is 
implemented as follows: each block is added to the split, and if the split 
grows to be larger than the maximum allowed, the split is submitted to the 
cluster and a new split is opened.

So, imagine the following scenario:
 * HDFS block size of 16 bytes
 * Maximum size of 40 bytes

This will produce a split with 3 blocks: two blocks give (2x16) = 32 bytes, 
which is still under the maximum, so another block is added, giving (3x16) = 48 
bytes in the split. So, while many operators would assume a split of 2 blocks, 
the actual split is 3 blocks. Setting the maximum split size to a multiple of 
the HDFS block size will make this behavior less confusing.

The current setting is ~256MB and when this was introduced, the default HDFS 
block size was 64MB. That is a factor of 4x. However, now HDFS block sizes are 
128MB by default, so I propose setting this to 4x128MB.  The larger splits 
(fewer tasks) should give a nice performance boost for modern hardware.
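The arithmetic can be checked with a small simulation of the packing rule described above. {{SplitSizeSketch}} is a hypothetical model, not Hadoop or Hive code, and it assumes a split is closed as soon as its size reaches or exceeds the maximum:

```java
// Hypothetical model of the packing rule: blocks are added one at a time and
// the split is closed once its size reaches or exceeds the maximum (assumption).
final class SplitSizeSketch {
  static int blocksPerSplit(long blockSize, long maxSplitSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    long splitSize = 0;
    int blocks = 0;
    do {
      splitSize += blockSize;
      blocks++;
    } while (splitSize < maxSplitSize);
    return blocks;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    System.out.println(blocksPerSplit(16, 40));                 // prints 3 (48 bytes)
    System.out.println(blocksPerSplit(16, 32));                 // prints 2 (32 bytes)
    System.out.println(blocksPerSplit(128 * mb, 4 * 128 * mb)); // prints 4
  }
}
```

With a maximum that is an exact multiple of the block size, the number of blocks per split matches what an operator would expect from dividing the two values.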





[jira] [Created] (HIVE-21433) Doc: Remove Reference to hive.stats.avg.row.size

2019-03-12 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21433:
-

 Summary: Doc: Remove Reference to hive.stats.avg.row.size
 Key: HIVE-21433
 URL: https://issues.apache.org/jira/browse/HIVE-21433
 Project: Hive
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 4.0.0
Reporter: David Mollitor


[https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties]

 

Remove reference to {{hive.stats.avg.row.size}}.  I think it's been replaced by 
{{hive.stats.max.variable.length}}





[jira] [Created] (HIVE-21426) Remove Utilities Global Random

2019-03-11 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21426:
-

 Summary: Remove Utilities Global Random
 Key: HIVE-21426
 URL: https://issues.apache.org/jira/browse/HIVE-21426
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L253

Remove global {{Random}} object in favor of {{ThreadLocalRandom}}.

{quote}
ThreadLocalRandom is initialized with an internally generated seed that may not 
otherwise be modified. When applicable, use of ThreadLocalRandom rather than 
shared Random objects in concurrent programs will typically encounter much less 
overhead and contention.
{quote}

https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadLocalRandom.html
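A minimal sketch of the replacement. {{RandomSketch}} and {{randomSuffix}} are hypothetical names, not the {{Utilities}} code; the point is only that no shared {{Random}} field is needed:

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical example of replacing a shared Random with ThreadLocalRandom.
final class RandomSketch {
  // Before (shared, contended): private static final Random RANDOM = new Random();
  // After: each thread uses its own generator, obtained at the call site.
  static int randomSuffix(int bound) {
    return ThreadLocalRandom.current().nextInt(bound);
  }
}
```

{{ThreadLocalRandom.current()}} should be called at the point of use and not cached in a static field, since the returned generator belongs to the calling thread.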





[jira] [Created] (HIVE-21425) Use newDirectExecutorService for getInputSummary

2019-03-11 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21425:
-

 Summary: Use newDirectExecutorService for getInputSummary
 Key: HIVE-21425
 URL: https://issues.apache.org/jira/browse/HIVE-21425
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


{code:java|title=Utilities.java}
int numExecutors = getMaxExecutorsForInputListing(ctx.getConf(), pathNeedProcess.size());
if (numExecutors > 1) {
  LOG.info("Using {} threads for getContentSummary", numExecutors);
  executor = Executors.newFixedThreadPool(numExecutors,
      new ThreadFactoryBuilder().setDaemon(true)
          .setNameFormat("Get-Input-Summary-%d").build());
} else {
  executor = null;
}
{code}

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2482-L2490

Instead of using a 'null' {{ExecutorService}}, use Guava's 
{{DirectExecutorService}} and remove special casing for a 'null' value.
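To show the idea without the Guava dependency, here is a JDK-only stand-in for {{MoreExecutors.newDirectExecutorService()}}. {{DirectExecutorSketch}} and {{forThreads}} are hypothetical names, and unlike Guava's version this simplified executor does not reject tasks after shutdown. Tasks run on the calling thread, so downstream code can treat both branches the same way and never needs a null check:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// JDK-only stand-in for a direct executor service: tasks run on the calling thread.
final class DirectExecutorSketch extends AbstractExecutorService {
  private volatile boolean shutdown;

  @Override
  public void execute(Runnable command) {
    command.run();
  }

  @Override
  public void shutdown() {
    shutdown = true;
  }

  @Override
  public List<Runnable> shutdownNow() {
    shutdown = true;
    return Collections.emptyList();
  }

  @Override
  public boolean isShutdown() {
    return shutdown;
  }

  @Override
  public boolean isTerminated() {
    return shutdown;
  }

  @Override
  public boolean awaitTermination(long timeout, TimeUnit unit) {
    return shutdown;
  }

  // Hypothetical factory mirroring the numExecutors branch in the snippet above.
  static ExecutorService forThreads(int numExecutors) {
    return numExecutors > 1
        ? Executors.newFixedThreadPool(numExecutors)
        : new DirectExecutorSketch();
  }
}
```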





[jira] [Created] (HIVE-21414) Hive JSON SerDe Does Not Properly Handle Field Comments

2019-03-08 Thread David Mollitor (JIRA)
David Mollitor created HIVE-21414:
-

 Summary: Hive JSON SerDe Does Not Properly Handle Field Comments
 Key: HIVE-21414
 URL: https://issues.apache.org/jira/browse/HIVE-21414
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 4.0.0, 3.2.0
Reporter: David Mollitor


Field comments are handed to the JSON SerDe from HMS and then are ignored.  The 
result is that all field comments are 'from deserializer' and cannot be changed.

For example, Avro SerDe handles comments:

https://github.com/apache/hive/blob/release-1.1.0/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerDe.java#L133




