[jira] [Created] (HIVE-20479) Update content/people.mdtext in cms

2018-08-28 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-20479:
-

 Summary: Update content/people.mdtext in cms 
 Key: HIVE-20479
 URL: https://issues.apache.org/jira/browse/HIVE-20479
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman


I added myself to the committers list. 

 
{code:java}
asherman 
Andrew Sherman 
http://cloudera.com/;>Cloudera 
 

{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-20030) Fix Java compile errors that show up in IntelliJ from ConvertJoinMapJoin.java and AnnotateRunTimeStatsOptimizer.java

2018-06-28 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-20030:
-

 Summary: Fix Java compile errors that show up in IntelliJ from 
ConvertJoinMapJoin.java and AnnotateRunTimeStatsOptimizer.java
 Key: HIVE-20030
 URL: https://issues.apache.org/jira/browse/HIVE-20030
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman


For some reason the Java compiler in IntelliJ is stricter than the Oracle JDK 
compiler. Maybe this is something that can be configured away, but as the fix is 
simple I propose to make the code more type correct. 

{code}
/Users/asherman/git/asf/hive2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConvertJoinMapJoin.java
Error:(613, 24) java: no suitable method found for 
findOperatorsUpstream(java.util.List>,java.lang.Class)
method 
org.apache.hadoop.hive.ql.exec.OperatorUtils.findOperatorsUpstream(org.apache.hadoop.hive.ql.exec.Operator,java.lang.Class)
 is not applicable
  (cannot infer type-variable(s) T
(argument mismatch; 
java.util.List> cannot be converted to 
org.apache.hadoop.hive.ql.exec.Operator))
method 
org.apache.hadoop.hive.ql.exec.OperatorUtils.findOperatorsUpstream(java.util.Collection>,java.lang.Class)
 is not applicable
  (cannot infer type-variable(s) T
(argument mismatch; 
java.util.List> cannot be converted to 
java.util.Collection>))
method 
org.apache.hadoop.hive.ql.exec.OperatorUtils.findOperatorsUpstream(org.apache.hadoop.hive.ql.exec.Operator,java.lang.Class,java.util.Set)
 is not applicable
  (cannot infer type-variable(s) T
(actual and formal argument lists differ in length))
{code}

and

{code}
/Users/asherman/git/asf/hive2/ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/AnnotateRunTimeStatsOptimizer.java
Error:(76, 12) java: no suitable method found for 
addAll(java.util.List>)
method java.util.Collection.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.List> cannot be converted 
to java.util.Collection>)
method java.util.Set.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.List> cannot be converted 
to java.util.Collection>)
Error:(80, 14) java: no suitable method found for 
addAll(java.util.Set>)
method java.util.Collection.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.Set> cannot be converted 
to java.util.Collection>)
method java.util.Set.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.Set> cannot be converted 
to java.util.Collection>)
Error:(85, 14) java: no suitable method found for 
addAll(java.util.Set>)
method java.util.Collection.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.Set> cannot be converted 
to java.util.Collection>)
method java.util.Set.addAll(java.util.Collection>) is not applicable
  (argument mismatch; 
java.util.Set> cannot be converted 
to java.util.Collection>)
/Users/asherman/git/asf/hive2/ql/target/generated-sources/java/org/apache/hadoop/hive/ql/exec/vector/expressions/gen/IntervalYearMonthScalarAddTimestampColumn.java
{code}





[jira] [Created] (HIVE-19987) Add logging of runtime statistics indicating when Hdfs Erasure Coding is used by Spark

2018-06-25 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-19987:
-

 Summary: Add logging of runtime statistics indicating when Hdfs 
Erasure Coding is used by Spark
 Key: HIVE-19987
 URL: https://issues.apache.org/jira/browse/HIVE-19987
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman








[jira] [Created] (HIVE-19986) Add logging of runtime statistics indicating when Hdfs Erasure Coding is used by MR

2018-06-25 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-19986:
-

 Summary: Add logging of runtime statistics indicating when Hdfs 
Erasure Coding is used by MR
 Key: HIVE-19986
 URL: https://issues.apache.org/jira/browse/HIVE-19986
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman








[jira] [Created] (HIVE-19971) TestRuntimeStats.testCleanup() is flaky

2018-06-22 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-19971:
-

 Summary: TestRuntimeStats.testCleanup() is flaky
 Key: HIVE-19971
 URL: https://issues.apache.org/jira/browse/HIVE-19971
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman


This test is timing dependent and sometimes fails. [You can see that it 
sometimes fails in otherwise clean 
runs|https://issues.apache.org/jira/issues/?jql=text%20~%20%22TestRuntimeStats%22].
  The test inserts a stat, sleeps for 2 seconds, inserts another stat, then 
deletes stats that are older than 1 second. The test asserts that exactly one 
stat is deleted. If the deletion is slow for some reason (perhaps a GC?) then 2 
stats will be deleted and the test will fail. The trouble is that the 1 second 
window is too small to work consistently.
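The race can be sketched with a toy model of the cleanup (the class and method names below are hypothetical stand-ins, not the real metastore code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model of the cleanup: stats older than a retention window are purged.
// A test that sleeps close to the window boundary is inherently racy.
public class StatsCleanup {
    static int deleteOlderThan(List<Long> statTimestamps, long nowMs, long maxAgeMs) {
        int deleted = 0;
        Iterator<Long> it = statTimestamps.iterator();
        while (it.hasNext()) {
            if (nowMs - it.next() > maxAgeMs) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<Long> stats = new ArrayList<>();
        stats.add(0L);    // "old" stat, inserted 2000 ms before the cleanup runs
        stats.add(1900L); // "new" stat, inserted just before the cleanup
        // With a 1000 ms retention window only the old stat is purged:
        System.out.println(deleteOlderThan(stats, 2000L, 1000L)); // prints 1
        // But if the cleanup is delayed past t=2900 ms, both stats qualify,
        // which is exactly the flakiness described above.
    }
}
```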





[jira] [Created] (HIVE-19758) Set hadoop.version=3.1.0 in standalone-metastore

2018-05-31 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-19758:
-

 Summary: Set hadoop.version=3.1.0 in standalone-metastore
 Key: HIVE-19758
 URL: https://issues.apache.org/jira/browse/HIVE-19758
 Project: Hive
  Issue Type: Task
Reporter: Andrew Sherman
Assignee: Andrew Sherman


When HIVE-19243 set hadoop.version=3.1.0, it did not change the value used in 
standalone-metastore, which still uses 3.0.0-beta1.
 At the moment standalone-metastore is still a module of Hive, so this can 
pull in the wrong code.





[jira] [Created] (HIVE-19062) Update constraint_partition_columns.q.out

2018-03-27 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-19062:
-

 Summary: Update constraint_partition_columns.q.out
 Key: HIVE-19062
 URL: https://issues.apache.org/jira/browse/HIVE-19062
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


TestNegativeCliDriver is error-prone at present, but if you run 
constraint_partition_columns.q on its own you get a diff. I think this is a 
simple regression caused by [HIVE-18726].





[jira] [Created] (HIVE-18877) HiveSchemaTool.validateSchemaTables() should wrap a SQLException when rethrowing

2018-03-06 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18877:
-

 Summary: HiveSchemaTool.validateSchemaTables() should wrap a 
SQLException when rethrowing
 Key: HIVE-18877
 URL: https://issues.apache.org/jira/browse/HIVE-18877
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


If schematool is run with the -verbose flag then it will print a stack trace 
for an exception that occurs. If a SQLException is caught during 
HiveSchemaTool.validateSchemaTables() then a HiveMetaException is rethrown 
containing the text of the SQLException. If we instead throw a 
HiveMetaException that wraps the SQLException, then the stack trace will help 
with diagnosis of issues where the SQLException contains a generic error text. 
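A minimal sketch of the proposed change, using a stand-in exception type rather than Hive's real HiveMetaException, and a simulated SQL failure:

```java
import java.sql.SQLException;

// Sketch of the fix: rethrow with the SQLException as the *cause* (so -verbose
// prints the full chained stack trace) instead of flattening it into a message
// string. The exception type here is a stand-in, not Hive's actual class.
public class WrapExample {
    static class HiveMetaException extends Exception {
        HiveMetaException(String msg) { super(msg); }
        HiveMetaException(String msg, Throwable cause) { super(msg, cause); }
    }

    static void validate() throws HiveMetaException {
        try {
            throw new SQLException("Communications link failure"); // simulated failure
        } catch (SQLException e) {
            // Before: new HiveMetaException("Failed: " + e.getMessage()) -- cause lost.
            // After: wrap the SQLException so the original stack trace survives.
            throw new HiveMetaException("Failed in schema validation", e);
        }
    }

    public static void main(String[] args) {
        try {
            validate();
        } catch (HiveMetaException e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // prints SQLException
        }
    }
}
```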





[jira] [Created] (HIVE-18791) Fix TestJdbcWithMiniHS2#testHttpHeaderSize

2018-02-23 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18791:
-

 Summary: Fix TestJdbcWithMiniHS2#testHttpHeaderSize
 Key: HIVE-18791
 URL: https://issues.apache.org/jira/browse/HIVE-18791
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


TestJdbcWithMiniHS2#testHttpHeaderSize tests whether config of http header 
sizes works by using a long username. The local scratch directory for the 
session uses the username as part of its path. When this name is more than 255 
chars (the limit on most modern file systems) the directory creation will fail. 
HIVE-18625 made this failure throw an exception, which has caused a regression 
in testHttpHeaderSize.





[jira] [Created] (HIVE-18456) Add some tests for HIVE-18367 to check that the table information contains the query correctly

2018-01-16 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18456:
-

 Summary: Add some tests for HIVE-18367 to check that the table 
information contains the query correctly
 Key: HIVE-18456
 URL: https://issues.apache.org/jira/browse/HIVE-18456
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


This cannot be tested with a CliDriver test, so add a Java test to check the 
output of 'describe extended', which is changed by HIVE-18367.





[jira] [Created] (HIVE-18367) Describe Extended output is truncated on a table with an explicit row format containing tabs or newlines.

2018-01-03 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18367:
-

 Summary: Describe Extended output is truncated on a table with an 
explicit row format containing tabs or newlines.
 Key: HIVE-18367
 URL: https://issues.apache.org/jira/browse/HIVE-18367
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


'Describe Extended' dumps information about a table. The protocol for sending 
this data relies on tabs and newlines to separate pieces of data. If a table 
has 'FIELDS terminated by XXX' or 'LINES terminated by XXX' where XXX is a tab 
or newline, then the output seen by the user is prematurely truncated. Fix this 
by replacing tabs and newlines in the table description with the escape 
sequences “\t” and “\n”.
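A sketch of the escaping idea, assuming a simple replace-based helper (the method name is illustrative, not Hive's actual code):

```java
// Replace real tab and newline characters in a table description with the
// two-character sequences "\t" and "\n", so the tab/newline-delimited output
// protocol of 'describe extended' is not broken by the table's own row format.
public class EscapeDescription {
    static String escapeDelimiters(String s) {
        return s.replace("\t", "\\t").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String rowFormat = "FIELDS TERMINATED BY '\t'"; // contains a real tab
        System.out.println(escapeDelimiters(rowFormat)); // prints FIELDS TERMINATED BY '\t'
    }
}
```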




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-18310) Test 'vector_reduce_groupby_duplicate_cols.q' is misspelled in testconfiguration.properties

2017-12-19 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18310:
-

 Summary: Test 'vector_reduce_groupby_duplicate_cols.q' is 
misspelled in testconfiguration.properties
 Key: HIVE-18310
 URL: https://issues.apache.org/jira/browse/HIVE-18310
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman
Priority: Minor


The new test vector_reduce_groupby_duplicate_cols.q was introduced in 
[HIVE-18258] but is misspelled in testconfiguration.properties:
{noformat} 
-  vector_reduce_grpupby_duplicate_cols.q,\
+  vector_reduce_groupby_duplicate_cols.q,\
{noformat} 
I noticed this because TestDanglingQOuts.checkDanglingQOut failed.





[jira] [Created] (HIVE-18228) Azure credential properties should be added to the HiveConf hidden list

2017-12-05 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18228:
-

 Summary: Azure credential properties should be added to the 
HiveConf hidden list
 Key: HIVE-18228
 URL: https://issues.apache.org/jira/browse/HIVE-18228
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


The HIVE_CONF_HIDDEN_LIST ("hive.conf.hidden.list") already contains keys 
containing AWS credentials. The Azure properties to be added are:
* dfs.adls.oauth2.credential
* fs.adl.oauth2.credential





[jira] [Created] (HIVE-18136) WorkloadManagerMxBean is missing the Apache license header

2017-11-22 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18136:
-

 Summary: WorkloadManagerMxBean is missing the Apache license header
 Key: HIVE-18136
 URL: https://issues.apache.org/jira/browse/HIVE-18136
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


This causes warnings in the yetus check:
{quote}Lines that start with ? in the ASF License  report indicate files 
that do not have an Apache license header:
 !? 
/data/hiveptest/working/yetus/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManagerMxBean.java{quote}





[jira] [Created] (HIVE-18127) Do not strip '--' comments from shell commands issued from CliDriver

2017-11-21 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18127:
-

 Summary: Do not strip '--' comments from shell commands issued 
from CliDriver
 Key: HIVE-18127
 URL: https://issues.apache.org/jira/browse/HIVE-18127
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


CliDriver has the ability to run shell commands by prefixing them with '!'.
This behavior is not widely used (there are only 3 examples in .q files).
Since HIVE-16935 started stripping comments starting with '--', a shell 
command containing '--' will not work correctly.
Fix this by using the unstripped command for shell commands.
Note that it would be a security hole for HS2 to allow execution of arbitrary 
shell commands from a client command.
Add tests to nail down correct behavior with '--' comments:
* CliDriver should not strip strings starting with '--' in a shell command 
(FIXED in this change).
* HiveCli should strip '--' comments.
* A Jdbc program should allow commands starting with "!" but these will fail in 
the sql parser.





[jira] [Created] (HIVE-18054) Make Lineage work with concurrent queries on a Session

2017-11-13 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-18054:
-

 Summary:  Make Lineage work with concurrent queries on a Session
 Key: HIVE-18054
 URL: https://issues.apache.org/jira/browse/HIVE-18054
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


A Hive Session can contain multiple concurrent sql Operations.
Lineage is currently tracked in SessionState and is cleared when a query 
completes. This results in Lineage for other running queries being lost.
To fix this, move LineageState from SessionState to QueryState.
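The difference between session-scoped and query-scoped state can be sketched with stand-ins (no real Hive classes; all names below are illustrative):

```java
// Toy model: when lineage is kept in one shared session buffer, one query
// completing and clearing it destroys another concurrent query's lineage;
// per-query state is isolated.
public class LineageScope {
    // Session-scoped: one shared buffer for all queries in the session.
    static String runTwoQueriesSharedState() {
        StringBuilder shared = new StringBuilder();
        shared.append("q1-lineage;"); // query 1 records lineage
        shared.append("q2-lineage;"); // query 2 records lineage concurrently
        shared.setLength(0);          // query 2 completes and clears the session state
        return shared.toString();     // query 1's lineage is gone
    }

    // Query-scoped: each query owns its own state.
    static String runTwoQueriesPerQueryState() {
        StringBuilder q1 = new StringBuilder("q1-lineage;");
        StringBuilder q2 = new StringBuilder("q2-lineage;");
        q2.setLength(0);              // query 2 clears only its own state
        return q1.toString();         // query 1's lineage survives
    }
}
```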





[jira] [Created] (HIVE-17935) Turn on hive.optimize.sort.dynamic.partition by default

2017-10-30 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17935:
-

 Summary: Turn on hive.optimize.sort.dynamic.partition by default
 Key: HIVE-17935
 URL: https://issues.apache.org/jira/browse/HIVE-17935
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


The config option hive.optimize.sort.dynamic.partition is an optimization for 
Hive’s dynamic partitioning feature. It was originally implemented in 
[HIVE-6455|https://issues.apache.org/jira/browse/HIVE-6455]. With this 
optimization, the dynamic partition columns and bucketing columns (in case of 
bucketed tables) are sorted before being fed to the reducers. Since the 
partitioning and bucketing columns are sorted, each reducer can keep only one 
record writer open at any time thereby reducing the memory pressure on the 
reducers. There were some early problems with this optimization and it was 
disabled by default in HiveConf in 
[HIVE-8151|https://issues.apache.org/jira/browse/HIVE-8151]. Since then setting 
hive.optimize.sort.dynamic.partition=true has been used to solve problems where 
dynamic partitioning produces (1) too many small files on HDFS, which is 
bad for the cluster and can increase overhead for future Hive queries over 
those partitions, and (2) OOM issues in the map tasks because each task tries 
to write to 100 different files simultaneously. 

It now seems that the feature is probably mature enough that it can be enabled 
by default.
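The memory effect described above can be illustrated with a toy model of reducer-side record writers (illustrative code only, not Hive's implementation):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// When input arrives grouped by the dynamic-partition key, a reducer can close
// each record writer before opening the next, so at most one is open at a time;
// unsorted input can force one open writer per partition.
public class OpenWriters {
    static int maxOpenWriters(List<String> partitionKeys, boolean sortedInput) {
        Set<String> open = new HashSet<>();
        int max = 0;
        String prev = null;
        for (String key : partitionKeys) {
            // With sorted input, a key change means the previous partition is
            // finished and its writer can be closed.
            if (sortedInput && prev != null && !prev.equals(key)) open.remove(prev);
            open.add(key);
            max = Math.max(max, open.size());
            prev = key;
        }
        return max;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("p1", "p1", "p2", "p3");
        List<String> unsorted = Arrays.asList("p1", "p2", "p1", "p3");
        System.out.println(maxOpenWriters(sorted, true));    // prints 1
        System.out.println(maxOpenWriters(unsorted, false)); // prints 3
    }
}
```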





[jira] [Created] (HIVE-17868) Make queries in spark_local_queries.q have deterministic output

2017-10-20 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17868:
-

 Summary: Make queries in spark_local_queries.q have deterministic 
output
 Key: HIVE-17868
 URL: https://issues.apache.org/jira/browse/HIVE-17868
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


Add 'order by' to queries so that output is always the same





[jira] [Created] (HIVE-17826) Error writing to RandomAccessFile after operation log is closed

2017-10-17 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17826:
-

 Summary: Error writing to RandomAccessFile after operation log is 
closed
 Key: HIVE-17826
 URL: https://issues.apache.org/jira/browse/HIVE-17826
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


We are seeing this error in the HS2 process stdout.

{noformat}
2017-09-07 10:17:23,933 AsyncLogger-1 ERROR Attempted to append to non-started 
appender query-file-appender
2017-09-07 10:17:23,934 AsyncLogger-1 ERROR Attempted to append to non-started 
appender query-file-appender
2017-09-07 10:17:23,935 AsyncLogger-1 ERROR Unable to write to stream 
/var/log/hive/operation_logs/dd38df5b-3c09-48c9-ad64-a2eee093bea6/hive_20170907101723_1a6ad4b9-f662-4e7a-a495-06e3341308f9
 for appender query-file-appender
2017-09-07 10:17:23,935 AsyncLogger-1 ERROR An exception occurred processing 
Appender query-file-appender 
org.apache.logging.log4j.core.appender.AppenderLoggingException: Error writing 
to RandomAccessFile 
/var/log/hive/operation_logs/dd38df5b-3c09-48c9-ad64-a2eee093bea6/hive_20170907101723_1a6ad4b9-f662-4e7a-a495-06e3341308f9
at 
org.apache.logging.log4j.core.appender.RandomAccessFileManager.flush(RandomAccessFileManager.java:114)
at 
org.apache.logging.log4j.core.appender.RandomAccessFileManager.write(RandomAccessFileManager.java:103)
at 
org.apache.logging.log4j.core.appender.OutputStreamManager.write(OutputStreamManager.java:136)
at 
org.apache.logging.log4j.core.appender.AbstractOutputStreamAppender.append(AbstractOutputStreamAppender.java:105)
at 
org.apache.logging.log4j.core.appender.RandomAccessFileAppender.append(RandomAccessFileAppender.java:89)
at 
org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:152)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:125)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:116)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at 
org.apache.logging.log4j.core.appender.routing.RoutingAppender.append(RoutingAppender.java:112)
at 
org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:152)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:125)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:116)
at 
org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at 
org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:390)
at 
org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:378)
at 
org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:362)
at 
org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:79)
at 
org.apache.logging.log4j.core.async.AsyncLogger.actualAsyncLog(AsyncLogger.java:385)
at 
org.apache.logging.log4j.core.async.RingBufferLogEvent.execute(RingBufferLogEvent.java:103)
at 
org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:43)
at 
org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:28)
at 
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:129)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Stream Closed
at java.io.RandomAccessFile.writeBytes(Native Method)
at java.io.RandomAccessFile.write(RandomAccessFile.java:525)
at 
org.apache.logging.log4j.core.appender.RandomAccessFileManager.flush(RandomAccessFileManager.java:111)
... 25 more
{noformat}







[jira] [Created] (HIVE-17789) Flaky test: TestSessionManagerMetrics.testAbandonedSessionMetrics has timing related problems

2017-10-12 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17789:
-

 Summary: Flaky test: 
TestSessionManagerMetrics.testAbandonedSessionMetrics has timing related 
problems
 Key: HIVE-17789
 URL: https://issues.apache.org/jira/browse/HIVE-17789
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


The test is waiting for a worker thread to be timed out. The timeout should 
happen after 3000 ms. The test waits for 3200 ms, and sometimes 
this is not enough.





[jira] [Created] (HIVE-17760) Create a unit test which validates HIVE-9423 does not regress

2017-10-10 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17760:
-

 Summary: Create a unit test which validates HIVE-9423 does not 
regress 
 Key: HIVE-17760
 URL: https://issues.apache.org/jira/browse/HIVE-17760
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


During [HIVE-9423] we verified that when the Thrift server pool is exhausted, 
the Beeline connection times out and provides a meaningful error message.
Create a unit test which verifies this and helps to keep this feature working.





[jira] [Created] (HIVE-17677) Investigate using hive statistics information to optimize HoS parallel order by

2017-10-02 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17677:
-

 Summary: Investigate using hive statistics information to optimize 
HoS parallel order by
 Key: HIVE-17677
 URL: https://issues.apache.org/jira/browse/HIVE-17677
 Project: Hive
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Andrew Sherman
Assignee: Andrew Sherman


I think Spark's native parallel order by works in a similar way to what we do 
for Hive-on-MR. That is, it scans the RDD once and samples the data to 
determine what ranges the data should be partitioned into, and then scans the 
RDD again to do the actual order by (with multiple reducers). 

One optimization suggested by [~stakiar] is that if we have column stats about 
the col we are ordering by, then the first scan on the RDD is not necessary. If 
we have histogram data about the RDD, we already know what the ranges of the 
order by should be. This should work when running parallel order by on simple 
tables, will be harder when we run it on derived datasets (although not 
impossible). 

To do this we would have to understand more about the internals of JavaPairRDD. 

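The suggested shortcut can be sketched as deriving reducer range boundaries directly from already-available, sorted histogram values (a hypothetical helper, not Spark or Hive code):

```java
import java.util.Arrays;

// If column stats already give (approximate) sorted quantile values, the k-1
// cut points for k range-partitioned reducers can be computed directly,
// avoiding the initial sampling scan of the RDD.
public class RangeBoundaries {
    static long[] boundaries(long[] sortedValues, int numReducers) {
        long[] cuts = new long[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            // Pick evenly spaced quantiles as the partition boundaries.
            cuts[i - 1] = sortedValues[i * sortedValues.length / numReducers];
        }
        return cuts;
    }

    public static void main(String[] args) {
        long[] hist = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(Arrays.toString(boundaries(hist, 2))); // prints [6]
    }
}
```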





[jira] [Created] (HIVE-17635) Add unit tests to CompactionTxnHandler and use PreparedStatements for queries

2017-09-28 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17635:
-

 Summary: Add unit tests to CompactionTxnHandler and use 
PreparedStatements for queries
 Key: HIVE-17635
 URL: https://issues.apache.org/jira/browse/HIVE-17635
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Andrew Sherman
Assignee: Andrew Sherman


It is better for jdbc code that runs against the HMS database to use 
PreparedStatements. Convert CompactionTxnHandler queries to use 
PreparedStatement and add tests to TestCompactionTxnHandler to test these 
queries, and improve code coverage.





[jira] [Created] (HIVE-17128) Operation Logging leaks file descriptors as the log4j Appender is never closed

2017-07-19 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-17128:
-

 Summary: Operation Logging leaks file descriptors as the log4j 
Appender is never closed
 Key: HIVE-17128
 URL: https://issues.apache.org/jira/browse/HIVE-17128
 Project: Hive
  Issue Type: Bug
  Components: Logging
Reporter: Andrew Sherman
Assignee: Andrew Sherman


[HIVE-16061] and [HIVE-16400] changed operation logging to use the Log4j2 
RoutingAppender to automatically output the log for each query into each 
individual operation log file. As log4j does not know when a query is finished, 
it keeps the OutputStream in the Appender open even after the query completes. 
The stream holds a file descriptor and so we leak file descriptors. Note that 
we are already careful to close any streams reading from the operation log file.

h2. Fix

To fix this we use a technique described in [LOG4J2-510] which uses reflection 
to close the appender. The test in TestOperationLoggingLayout will be extended 
to check that the Appender is closed.
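The reflection technique can be demonstrated on a stand-in class (the real fix targets log4j2's RoutingAppender per LOG4J2-510; the class and method names below are illustrative only):

```java
import java.lang.reflect.Method;

// Demonstrates the pattern: a private cleanup method is invoked via reflection
// because no public API exposes it. StandInAppender is NOT a log4j class.
public class ReflectiveClose {
    static class StandInAppender {
        private boolean closed = false;
        private void closeStream() { closed = true; } // private: not directly callable
        boolean isClosed() { return closed; }
    }

    static void forceClose(Object appender) throws Exception {
        Method m = appender.getClass().getDeclaredMethod("closeStream");
        m.setAccessible(true); // bypass the private access modifier
        m.invoke(appender);
    }

    public static void main(String[] args) throws Exception {
        StandInAppender a = new StandInAppender();
        forceClose(a);
        System.out.println(a.isClosed()); // prints true
    }
}
```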





[jira] [Created] (HIVE-16991) HiveMetaStoreClient needs a 2-arg constructor for backwards compatibility

2017-06-29 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-16991:
-

 Summary: HiveMetaStoreClient needs a 2-arg constructor for 
backwards compatibility
 Key: HIVE-16991
 URL: https://issues.apache.org/jira/browse/HIVE-16991
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


Some client code that is not easy to change uses a 2-arg constructor on 
HiveMetaStoreClient.
It is trivial and safe to add this constructor:

{code:java}
public HiveMetaStoreClient(HiveConf conf, HiveMetaHookLoader hookLoader)
    throws MetaException {
  this(conf, hookLoader, true);
}
{code}





[jira] [Created] (HIVE-16935) Hive should strip comments from input before choosing which CommandProcessor to run.

2017-06-21 Thread Andrew Sherman (JIRA)
Andrew Sherman created HIVE-16935:
-

 Summary: Hive should strip comments from input before choosing 
which CommandProcessor to run.
 Key: HIVE-16935
 URL: https://issues.apache.org/jira/browse/HIVE-16935
 Project: Hive
  Issue Type: Bug
Reporter: Andrew Sherman
Assignee: Andrew Sherman


While using Beeswax, Hue fails to execute a statement with the following error:

Error while compiling statement: FAILED: ParseException line 3:4 missing 
KW_ROLE at 'a' near 'a' line 3:5 missing EOF at '=' near 'a'

{quote}
-- comment
SET a=1;
SELECT 1;
{quote}

The same code works in Beeline and in Impala.
The same code fails in CliDriver.
 
h2. Background

Hive deals with sql comments (“-- to end of line”) in different places.
Some clients attempt to strip comments. For example BeeLine was recently 
enhanced in https://issues.apache.org/jira/browse/HIVE-13864 to strip comments 
from multi-line commands before they are executed.
Other clients such as Hue or Jdbc do not strip comments before sending text.
Some tests such as TestCliDriver strip comments before running tests.
When Hive gets a command the CommandProcessorFactory looks at the text to 
determine which CommandProcessor should handle the command. In the bug case the 
correct CommandProcessor is SetProcessor, but the comments confuse the 
CommandProcessorFactory and so the command is treated as sql. Hive’s sql parser 
understands and ignores comments, but it does not understand the set commands 
usually handled by SetProcessor and so we get the ParseException shown above.
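A minimal sketch of quote-aware '--' comment stripping (not Hive's actual implementation) shows the idea of removing comments before the dispatch decision is made:

```java
// Strip a "-- to end of line" comment from a single command line, but leave
// "--" alone when it appears inside a quoted string literal.
public class CommentStripper {
    static String stripLineComments(String line) {
        boolean inSingle = false, inDouble = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\'' && !inDouble) inSingle = !inSingle;
            else if (c == '"' && !inSingle) inDouble = !inDouble;
            else if (c == '-' && !inSingle && !inDouble
                     && i + 1 < line.length() && line.charAt(i + 1) == '-') {
                return line.substring(0, i); // drop the comment tail
            }
        }
        return line;
    }

    public static void main(String[] args) {
        // The trailing comment is removed, so SetProcessor sees a clean command.
        System.out.println(stripLineComments("SET a=1; -- comment"));
        // "--" inside quotes is preserved.
        System.out.println(stripLineComments("SELECT '--not a comment'"));
    }
}
```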
 



