[jira] [Created] (HIVE-16348) HoS query is canceled but error message shows RPC is closed

2017-03-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-16348:
--

 Summary: HoS query is canceled but error message shows RPC is 
closed
 Key: HIVE-16348
 URL: https://issues.apache.org/jira/browse/HIVE-16348
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


When a HoS query is canceled while getting the application id, it keeps trying to get 
the status until it times out, and then returns a misleading "RPC is closed" error 
message.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16292) SparkUtilities upload to HDFS doesn't work with viewfs

2017-03-24 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-16292:
--

 Summary: SparkUtilities upload to HDFS doesn't work with viewfs
 Key: HIVE-16292
 URL: https://issues.apache.org/jira/browse/HIVE-16292
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


If scratchdir is set to viewfs, HoS fails with an exception like

{noformat}
java.lang.IllegalArgumentException: Wrong FS: 
viewfs://ns-default/tmp/hive_scratch/hive/hive/_spark_session_dir/f4031fca-2885-4e7a-9b05-764d25d0e488/hive-exec-1.1.0-cdh5.7.2.jar,
 expected: hdfs://nameservice1
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412)
at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:499)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:351)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
at 
org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1949)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.uploadToHDFS(SparkUtilities.java:86)
{noformat}
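The mismatch above suggests the upload code is using a FileSystem handle bound to hdfs:// for a viewfs:// path; in the Hadoop API the usual fix is to resolve the filesystem from the path itself (e.g. path.getFileSystem(conf)) rather than reusing a cached default FileSystem. A minimal, hypothetical JDK-only sketch of the scheme check that fails (class and method names here are illustrative, not Hive's actual code):

```java
import java.net.URI;

// Mirrors the spirit of FileSystem.checkPath: a path can only be used with a
// filesystem whose URI scheme matches the path's scheme.
class SchemeCheckDemo {
    static boolean schemesMatch(String fsUri, String pathUri) {
        return URI.create(fsUri).getScheme().equals(URI.create(pathUri).getScheme());
    }
}
```

Here a viewfs:// path handed to an hdfs:// filesystem fails the check, which is exactly the "Wrong FS" error in the stack trace above.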





[jira] [Created] (HIVE-16286) Log canceled query id

2017-03-23 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-16286:
--

 Summary: Log canceled query id
 Key: HIVE-16286
 URL: https://issues.apache.org/jira/browse/HIVE-16286
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial


Currently, just a generic message is logged when a query is canceled. It is 
better to log the query id as well.





[jira] [Created] (HIVE-15915) Emit progress percentage in getting operation status

2017-02-14 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-15915:
--

 Summary: Emit progress percentage in getting operation status
 Key: HIVE-15915
 URL: https://issues.apache.org/jira/browse/HIVE-15915
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


When running a query asynchronously, the client may want to check the progress 
periodically. HIVE-15473 adds a progress bar in Beeline for Tez. For this issue, 
we just want the progress percentage. 





[jira] [Created] (HIVE-15208) Query string should be HTML encoded for Web UI

2016-11-15 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-15208:
--

 Summary: Query string should be HTML encoded for Web UI
 Key: HIVE-15208
 URL: https://issues.apache.org/jira/browse/HIVE-15208
 Project: Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14928) Analyze table no scan mess up schema

2016-10-11 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-14928:
--

 Summary: Analyze table no scan mess up schema
 Key: HIVE-14928
 URL: https://issues.apache.org/jira/browse/HIVE-14928
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang


StatsNoJobTask uses the static variables partUpdates and table to track stats 
changes. If multiple analyze-no-scan tasks run at the same time, the 
table/partition schema could get mixed up.
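The race can be illustrated with a minimal, hypothetical class (not StatsNoJobTask itself): a static field is shared by all concurrent tasks and is clobbered by the last writer, while an instance field stays isolated per task:

```java
// Hypothetical demo of shared static state vs per-instance state.
class StaticStateDemo {
    static String sharedTable;   // shared by every concurrent task (the bug)
    String instanceTable;        // owned by a single task (the fix)

    void run(String table) {
        sharedTable = table;     // last writer wins across ALL tasks
        instanceTable = table;   // isolated to this task
    }
}
```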





[jira] [Created] (HIVE-13782) Compile async query asynchronously

2016-05-18 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13782:
--

 Summary: Compile async query asynchronously
 Key: HIVE-13782
 URL: https://issues.apache.org/jira/browse/HIVE-13782
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Currently, when an async query is submitted to HS2, HS2 does the preparation 
synchronously. One of the preparation steps is to compile the query, which may 
take some time. It would be helpful to provide an option to do the compilation 
asynchronously.
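A minimal sketch of the idea, with illustrative names rather than HS2's actual classes: submission returns immediately with a Future while a background pool performs the compilation:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class AsyncCompileDemo {
    // Daemon threads so the demo JVM can exit without an explicit shutdown.
    private static final ExecutorService pool = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Stand-in for the (possibly slow) compile step.
    static String compile(String query) {
        return "PLAN(" + query + ")";
    }

    // Returns immediately; the caller checks the Future later.
    static Future<String> submit(String query) {
        return pool.submit(() -> compile(query));
    }

    // Convenience for the demo: block until the plan is ready.
    static String submitAndWait(String query) {
        try {
            return submit(query).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```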





[jira] [Created] (HIVE-13679) Pass diagnostic message to failure hooks

2016-05-03 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13679:
--

 Summary: Pass diagnostic message to failure hooks
 Key: HIVE-13679
 URL: https://issues.apache.org/jira/browse/HIVE-13679
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Pass diagnostic message to failure hooks. This is useful for debugging remote 
job failures.





[jira] [Created] (HIVE-13559) Pass exception to failure hooks

2016-04-20 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13559:
--

 Summary: Pass exception to failure hooks
 Key: HIVE-13559
 URL: https://issues.apache.org/jira/browse/HIVE-13559
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Pass exception to failure hooks so that they know more about the failure.





[jira] [Created] (HIVE-13501) Invoke failure hooks if query fails on exception

2016-04-13 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13501:
--

 Summary: Invoke failure hooks if query fails on exception
 Key: HIVE-13501
 URL: https://issues.apache.org/jira/browse/HIVE-13501
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


When a query fails on an exception, failure hooks are currently not called. 
It's better to invoke such hooks so that we know the query has failed.





[jira] [Created] (HIVE-13430) Pass error message to failure hook

2016-04-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13430:
--

 Summary: Pass error message to failure hook
 Key: HIVE-13430
 URL: https://issues.apache.org/jira/browse/HIVE-13430
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, the failure hook only knows that the query failed, but it has no clue 
what the error was. It is better to pass the error message to the hook.
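A hedged sketch of the idea (the interface and field names below are assumptions, not Hive's actual hook API): carry the error message in the context handed to the failure hook so the hook can report what went wrong, not only that something did:

```java
class FailureHookDemo {
    // Context handed to a hook; errorMessage is the newly carried field.
    static class HookContext {
        final String queryId;
        final String errorMessage;
        HookContext(String queryId, String errorMessage) {
            this.queryId = queryId;
            this.errorMessage = errorMessage;
        }
    }

    interface FailureHook {
        String onFailure(HookContext ctx);
    }

    // A hook that uses the error message to produce a useful diagnostic.
    static String describeFailure(String queryId, String error) {
        FailureHook hook = ctx -> "query " + ctx.queryId + " failed: " + ctx.errorMessage;
        return hook.onFailure(new HookContext(queryId, error));
    }
}
```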





[jira] [Created] (HIVE-13237) Select parquet struct field with upper case throws NPE

2016-03-08 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13237:
--

 Summary: Select parquet struct field with upper case throws NPE
 Key: HIVE-13237
 URL: https://issues.apache.org/jira/browse/HIVE-13237
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


The query "select msg.fieldone from test" throws an NPE if msg's field is actually 
named fieldOne:

{noformat}
2016-03-08 17:30:57,772 ERROR [main]: exec.FetchTask 
(FetchTask.java:initialize(86)) - java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
{noformat}






[jira] [Created] (HIVE-13043) Reload function has no impact to function registry

2016-02-11 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13043:
--

 Summary: Reload function has no impact to function registry
 Key: HIVE-13043
 URL: https://issues.apache.org/jira/browse/HIVE-13043
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


With HIVE-2573, users should run "reload function" to refresh cached function 
registry. However, "reload function" has no impact at all. We need to fix this. 
Otherwise, HS2 needs to be restarted to see new global functions.





[jira] [Created] (HIVE-13026) Pending/running operation metrics are wrong

2016-02-08 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-13026:
--

 Summary: Pending/running operation metrics are wrong
 Key: HIVE-13026
 URL: https://issues.apache.org/jira/browse/HIVE-13026
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


A query finishes, but the pending/running operation counts don't decrease. 

For example, in TestHs2Metrics::testMetrics(), we have
{noformat}
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, 
"api_hs2_operation_PENDING", 1);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, 
"api_hs2_operation_RUNNING", 1);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.COUNTER, 
"hs2_completed_operation_FINISHED", 1);
{noformat}

Should it instead be the following?

{noformat}
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, 
"api_hs2_operation_PENDING", 0);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, 
"api_hs2_operation_RUNNING", 0);
MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.COUNTER, 
"hs2_completed_operation_FINISHED", 1);
{noformat}





[jira] [Created] (HIVE-12987) Add metrics for HS2 active users and SQL operations

2016-02-02 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12987:
--

 Summary: Add metrics for HS2 active users and SQL operations
 Key: HIVE-12987
 URL: https://issues.apache.org/jira/browse/HIVE-12987
 Project: Hive
  Issue Type: Task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


HIVE-12271 added metrics for all HS2 operations. Sometimes, users are also 
interested in metrics just for SQL operations.

It is useful to track active user count as well.





[jira] [Created] (HIVE-12511) IN clause performs differently then = clause

2015-11-24 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12511:
--

 Summary: IN clause performs differently then = clause
 Key: HIVE-12511
 URL: https://issues.apache.org/jira/browse/HIVE-12511
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Similar to HIVE-11973, the IN clause behaves differently than the = clause for "int" 
columns compared against string values.
For example,
{noformat}
SELECT * FROM inttest WHERE iValue IN ('01');
{noformat}
will not return any rows whose int column iValue equals 1.





[jira] [Created] (HIVE-12485) Secure HS2 web UI with kerberos

2015-11-20 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12485:
--

 Summary: Secure HS2 web UI with kerberos
 Key: HIVE-12485
 URL: https://issues.apache.org/jira/browse/HIVE-12485
 Project: Hive
  Issue Type: Sub-task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang








[jira] [Created] (HIVE-12484) Show meta operations on HS2 web UI

2015-11-20 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12484:
--

 Summary: Show meta operations on HS2 web UI
 Key: HIVE-12484
 URL: https://issues.apache.org/jira/browse/HIVE-12484
 Project: Hive
  Issue Type: Sub-task
Reporter: Jimmy Xiang


As Mohit pointed out in the review of HIVE-12338, it would be nice to show meta 
operations on the HS2 web UI too, so that we have an end-to-end picture of the 
operations that access HMS.





[jira] [Created] (HIVE-12471) Secure HS2 web UI with SSL and kerberos

2015-11-19 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12471:
--

 Summary: Secure HS2 web UI with SSL and kerberos
 Key: HIVE-12471
 URL: https://issues.apache.org/jira/browse/HIVE-12471
 Project: Hive
  Issue Type: Sub-task
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang








[jira] [Created] (HIVE-12338) Add webui to HiveServer2

2015-11-04 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12338:
--

 Summary: Add webui to HiveServer2
 Key: HIVE-12338
 URL: https://issues.apache.org/jira/browse/HIVE-12338
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


A web ui for HiveServer2 can show some useful information such as:
 
1. Sessions,
2. Queries that are executing on the HS2, their states, starting time, etc.






[jira] [Created] (HIVE-12318) qtest failing due to NPE in logStats

2015-11-02 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12318:
--

 Summary: qtest failing due to NPE in logStats
 Key: HIVE-12318
 URL: https://issues.apache.org/jira/browse/HIVE-12318
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang


{noformat}
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Operator.logStats(Operator.java:899) ~
{noformat}





[jira] [Created] (HIVE-12317) Emit current database in lineage info

2015-11-02 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12317:
--

 Summary: Emit current database in lineage info
 Key: HIVE-12317
 URL: https://issues.apache.org/jira/browse/HIVE-12317
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


It is easier to emit the current database explicitly than to derive it from 
normalized column names.





[jira] [Created] (HIVE-12287) Lineage for lateral view shows wrong dependencies

2015-10-28 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12287:
--

 Summary: Lineage for lateral view shows wrong dependencies
 Key: HIVE-12287
 URL: https://issues.apache.org/jira/browse/HIVE-12287
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


The lineage dependency graph for a select from a lateral view is wrong.





[jira] [Created] (HIVE-12278) Skip logging lineage for explain queries

2015-10-27 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12278:
--

 Summary: Skip logging lineage for explain queries
 Key: HIVE-12278
 URL: https://issues.apache.org/jira/browse/HIVE-12278
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


For explain queries, we don't generate lineage info, so we should not try 
to log it at all.





[jira] [Created] (HIVE-12268) Context leaks deleteOnExit paths

2015-10-26 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12268:
--

 Summary: Context leaks deleteOnExit paths
 Key: HIVE-12268
 URL: https://issues.apache.org/jira/browse/HIVE-12268
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


A long-running HS2 accumulates lots of paths in the FileSystem's deleteOnExit map. We 
should drop entries for paths that have already been deleted.
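The idea can be sketched with plain JDK collections (hypothetical names, not Hive's actual Context code): an entry is dropped as soon as the corresponding path is actually deleted, so the set only ever tracks paths still pending cleanup:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo of a deleteOnExit registry that does not leak.
class DeleteOnExitDemo {
    private final Set<String> deleteOnExit = new HashSet<>();

    void register(String path) {
        deleteOnExit.add(path);
    }

    // Called when a path is actually removed: forget it, so long-running
    // processes don't accumulate stale entries.
    void pathDeleted(String path) {
        deleteOnExit.remove(path);
    }

    int pendingCount() {
        return deleteOnExit.size();
    }
}
```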





[jira] [Created] (HIVE-12265) Generate lineage info only if requested

2015-10-26 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12265:
--

 Summary: Generate lineage info only if requested
 Key: HIVE-12265
 URL: https://issues.apache.org/jira/browse/HIVE-12265
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


If no lineage-related hook is configured, we should not generate lineage info.





[jira] [Created] (HIVE-12225) LineageCtx should release all resources at clear

2015-10-21 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12225:
--

 Summary: LineageCtx should release all resources at clear
 Key: HIVE-12225
 URL: https://issues.apache.org/jira/browse/HIVE-12225
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Some maps are not released in the clear() method.





[jira] [Created] (HIVE-12200) INSERT INTO table using a select statement w/o a FROM clause fails

2015-10-15 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12200:
--

 Summary: INSERT INTO table using a select statement w/o a FROM 
clause fails
 Key: HIVE-12200
 URL: https://issues.apache.org/jira/browse/HIVE-12200
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Here is the stack trace:

{noformat}
FailedPredicateException(regularBody,{$s.tree.getChild(1) !=null}?)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41047)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40222)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40092)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1656)
at 
org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202)
at 
org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:312)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1162)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1215)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1091)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1081)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:225)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:177)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:388)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:323)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:731)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:704)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:29 Failed to recognize predicate ''. Failed 
rule: 'regularBody' in statement
{noformat}





[jira] [Created] (HIVE-12187) Release plan once a query is executed

2015-10-15 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12187:
--

 Summary: Release plan once a query is executed 
 Key: HIVE-12187
 URL: https://issues.apache.org/jira/browse/HIVE-12187
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Some clients leave query operations open for a while so that they can retrieve 
the query results later, which means the allocated memory is kept around too. We 
should release the resources no longer needed for query execution once the query 
has executed.





[jira] [Created] (HIVE-12046) Re-create spark client if connection is dropped

2015-10-06 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-12046:
--

 Summary: Re-create spark client if connection is dropped
 Key: HIVE-12046
 URL: https://issues.apache.org/jira/browse/HIVE-12046
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, if the connection to the Spark cluster is dropped, the Spark client 
stays in a bad state, and a new Hive session is needed to re-establish the 
connection. It is better to reconnect automatically in this case.





[jira] [Created] (HIVE-11984) Add HS2 open operation metrics

2015-09-28 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11984:
--

 Summary: Add HS2 open operation metrics
 Key: HIVE-11984
 URL: https://issues.apache.org/jira/browse/HIVE-11984
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Metrics for open operations would be helpful for tracking operations that are 
never closed or cancelled.





[jira] [Created] (HIVE-11946) TestNotificationListener is flaky

2015-09-24 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11946:
--

 Summary: TestNotificationListener is flaky
 Key: HIVE-11946
 URL: https://issues.apache.org/jira/browse/HIVE-11946
 Project: Hive
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


{noformat}
expected:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, 
DROP_PARTITION, ALTER_TABLE, DROP_TABLE, DROP_DATABASE]> but 
was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, 
DROP_PARTITION, ALTER_TABLE, DROP_TABLE]>

Stacktrace

java.lang.AssertionError: expected:<[CREATE_DATABASE, CREATE_TABLE, 
ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE, 
DROP_DATABASE]> but was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, 
ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE]>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hive.hcatalog.listener.TestNotificationListener.tearDown(TestNotificationListener.java:114)
{noformat}





[jira] [Created] (HIVE-11939) TxnDbUtil should turn off jdbc auto commit

2015-09-23 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11939:
--

 Summary: TxnDbUtil should turn off jdbc auto commit
 Key: HIVE-11939
 URL: https://issues.apache.org/jira/browse/HIVE-11939
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


TxnDbUtil uses JDBC transactions but doesn't turn off auto-commit, so some 
TestStreaming tests are flaky. For example,
{noformat}
testTransactionBatchAbortAndCommit(org.apache.hive.hcatalog.streaming.TestStreaming)
  Time elapsed: 0.011 sec  <<< ERROR!
java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'.
at org.apache.derby.iapi.error.StandardException.newException(Unknown 
Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.duplicateDescriptorException(Unknown
 Source)
at 
org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown 
Source)
at 
org.apache.derby.impl.sql.execute.CreateTableConstantAction.executeConstantAction(Unknown
 Source)
at org.apache.derby.impl.sql.execute.MiscResultSet.open(Unknown Source)
at 
org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source)
at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown 
Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at 
org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:72)
at 
org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:131)
at 
org.apache.hive.hcatalog.streaming.TestStreaming.(TestStreaming.java:160)

{noformat}





[jira] [Created] (HIVE-11834) Lineage doesn't work with dynamic partitioning query

2015-09-15 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11834:
--

 Summary: Lineage doesn't work with dynamic partitioning query
 Key: HIVE-11834
 URL: https://issues.apache.org/jira/browse/HIVE-11834
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


As Mark found out,
https://issues.apache.org/jira/browse/HIVE-11139?focusedCommentId=14745937&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14745937

This is indeed a code bug.





[jira] [Created] (HIVE-11817) Window function max NullPointerException

2015-09-14 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11817:
--

 Summary: Window function max NullPointerException
 Key: HIVE-11817
 URL: https://issues.apache.org/jira/browse/HIVE-11817
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


This query
{noformat}
select key, max(value) over (order by key rows between 10 preceding and 20 
following) from src1 where length(key) > 10;
{noformat}
fails with NPE:
{noformat}
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:290)
 
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:477)
 
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
 
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
{noformat}





[jira] [Created] (HIVE-11814) Emit query time in lineage info

2015-09-14 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11814:
--

 Summary: Emit query time in lineage info
 Key: HIVE-11814
 URL: https://issues.apache.org/jira/browse/HIVE-11814
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, we emit the query start time but not the query duration. It would be 
nice to have the duration too.





[jira] [Created] (HIVE-11771) Parquet timestamp conversion errors

2015-09-09 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11771:
--

 Summary: Parquet timestamp conversion errors
 Key: HIVE-11771
 URL: https://issues.apache.org/jira/browse/HIVE-11771
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


We have problems reading timestamps written to Parquet files by other tools. 
The value is wrong after conversion (not the same as it is meant to be).





[jira] [Created] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys

2015-09-04 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11737:
--

 Summary: IndexOutOfBounds compiling query with duplicated groupby 
keys
 Key: HIVE-11737
 URL: https://issues.apache.org/jira/browse/HIVE-11737
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


{noformat}
SELECT
tinyint_col_7,
MIN(timestamp_col_1) AS timestamp_col,
MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 
476) AS int))) AS int_col,
tinyint_col_7 AS int_col_1,
LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 476) 
AS int)) AS int_col_2
FROM table_3
GROUP BY
tinyint_col_7,
tinyint_col_7,
LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 476) 
AS int))
{noformat}

Query compilation fails:
{noformat}
Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
{noformat}






[jira] [Created] (HIVE-11712) Duplicate groupby keys cause ClassCastException

2015-09-01 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11712:
--

 Summary: Duplicate groupby keys cause ClassCastException
 Key: HIVE-11712
 URL: https://issues.apache.org/jira/browse/HIVE-11712
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


With duplicate groupby keys, we could use the wrong object inspectors for some 
groupby expressions, leading to a ClassCastException. For example, 
{noformat}
explain
SELECT distinct s1.customer_name as x, s1.customer_name as y
FROM default.testv1_staples s1 join default.src s2 on s1.customer_name = s2.key
HAVING (
(SUM(s1.customer_balance) <= 4074689.00041)
AND (AVG(s1.discount) <= 822)
AND (COUNT(s2.value) > 4)
)
{noformat}

will lead to

{noformat}
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableShortObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFAverage$AbstractGenericUDAFAverageEvaluator.init(GenericUDAFAverage.java:374)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:3887)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4354)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5644)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8977)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9849)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9742)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10178)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10189)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10106)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
{noformat}
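As an illustration only (not Hive's actual patch), deduplicating the key expressions by their string form before object inspectors are assigned would give each distinct key a single slot, so the aggregation expressions that follow are not shifted onto the wrong inspectors. The class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch: collapse duplicate group-by key expressions
// (x and y above both resolve to s1.customer_name) into one slot.
// LinkedHashSet keeps the first occurrence and preserves order.
public class GroupByKeyDedup {
    public static List<String> dedup(List<String> keyExprs) {
        return new ArrayList<>(new LinkedHashSet<>(keyExprs));
    }
}
```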



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11620) Fix several qtest output order

2015-08-21 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11620:
--

 Summary: Fix several qtest output order
 Key: HIVE-11620
 URL: https://issues.apache.org/jira/browse/HIVE-11620
 Project: Hive
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


selectDistinctStar.q
unionall_unbalancedppd.q
vector_cast_constant.q



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11586) ObjectInspectorFactory.getReflectionObjectInspector is not thread-safe

2015-08-17 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11586:
--

 Summary: ObjectInspectorFactory.getReflectionObjectInspector is 
not thread-safe
 Key: HIVE-11586
 URL: https://issues.apache.org/jira/browse/HIVE-11586
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


ObjectInspectorFactory#getReflectionObjectInspectorNoCache adds the newly 
created object inspector to the cache before calling its init() method, to 
allow reusing the cache when dealing with recursive types. So a second thread 
can then call getReflectionObjectInspector and fetch an uninitialized instance 
of ReflectionStructObjectInspector.

Another issue arises if two threads call 
ObjectInspectorFactory.getReflectionObjectInspector at the same time: one 
thread could get an object inspector that is not in the cache, i.e. both could 
call getReflectionObjectInspectorNoCache() but only one will put its new 
object inspector into the cache successfully.
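A minimal sketch of the safe-publication pattern involved, not Hive's actual fix: fully initialize the inspector before it becomes visible in the cache, and let putIfAbsent pick a single winner when two threads race. Note this simple form gives up the cache-before-init trick the real code uses for recursive types; it only illustrates the race. All names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: anything published in the cache is already
// initialized, and a racing thread discards its loser instance.
public class InspectorCache {
    public static class Inspector {
        public volatile boolean initialized;
        void init() { initialized = true; }
    }

    private static final Map<String, Inspector> CACHE = new ConcurrentHashMap<>();

    public static Inspector get(String typeName) {
        Inspector cached = CACHE.get(typeName);
        if (cached != null) {
            return cached;          // published instances are initialized
        }
        Inspector fresh = new Inspector();
        fresh.init();               // init BEFORE publishing to the cache
        Inspector prev = CACHE.putIfAbsent(typeName, fresh);
        return prev != null ? prev : fresh;
    }
}
```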



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE

2015-08-17 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11580:
--

 Summary: ThriftUnionObjectInspector#toString throws NPE
 Key: HIVE-11580
 URL: https://issues.apache.org/jira/browse/HIVE-11580
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


ThriftUnionObjectInspector inherits toString from StructObjectInspector, which 
accesses the uninitialized member variable fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11473) Failed to create spark client with Spark 1.5

2015-08-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11473:
--

 Summary: Failed to create spark client with Spark 1.5
 Key: HIVE-11473
 URL: https://issues.apache.org/jira/browse/HIVE-11473
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


In Spark 1.5, the SparkListener interface changed, so HoS may fail to create 
the spark client if an unimplemented event callback method is invoked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11464) lineage info missing if there are multiple outputs

2015-08-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11464:
--

 Summary: lineage info missing if there are multiple outputs
 Key: HIVE-11464
 URL: https://issues.apache.org/jira/browse/HIVE-11464
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


If there are multiple outputs, for example,
{noformat}
from (select ...) t
insert into table t1 select * from t
insert into table t2 select * from t;
{noformat}
the lineage info for table t2 is not emitted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11426) lineage3.q fails with -Phadoop-1

2015-07-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11426:
--

 Summary: lineage3.q fails with -Phadoop-1
 Key: HIVE-11426
 URL: https://issues.apache.org/jira/browse/HIVE-11426
 Project: Hive
  Issue Type: Bug
  Components: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Some queries in lineage3.q emit different results with -Phadoop-1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException

2015-07-06 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11184:
--

 Summary: Lineage - ExprProcFactory#getExprString may throw 
NullPointerException
 Key: HIVE-11184
 URL: https://issues.apache.org/jira/browse/HIVE-11184
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


ColumnInfo may have a null alias.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11139) Emit more lineage information

2015-06-29 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-11139:
--

 Summary: Emit more lineage information
 Key: HIVE-11139
 URL: https://issues.apache.org/jira/browse/HIVE-11139
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


HIVE-1131 emits some column lineage info, but it doesn't support INSERT 
statements or CTAS statements, and it doesn't emit the predicate information 
either.

We can enhance and reuse the dependency information created in HIVE-1131 to 
generate more complete lineage info.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10956) HS2 leaks HMS connections

2015-06-05 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10956:
--

 Summary: HS2 leaks HMS connections
 Key: HIVE-10956
 URL: https://issues.apache.org/jira/browse/HIVE-10956
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


HS2 uses a ThreadLocal to cache the HMS client in class Hive. When the thread 
dies, the HMS client is not closed, so the connection to the HMS is leaked.
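One way to sketch a remedy, under the assumption that clients can be tracked globally (all names here are illustrative, not Hive's API): keep the ThreadLocal cache for fast per-thread access, but also record every created client in a registry so connections can be closed even after the owning thread dies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a side registry of every client created through
// the ThreadLocal, so a reaper or shutdown hook can close them all.
public class ClientRegistry {
    public static class Client {
        public volatile boolean closed;
        void close() { closed = true; }
    }

    private static final Map<Long, Client> OPEN = new ConcurrentHashMap<>();
    private static final ThreadLocal<Client> LOCAL =
        ThreadLocal.withInitial(() -> {
            Client c = new Client();
            OPEN.put(Thread.currentThread().getId(), c);
            return c;
        });

    public static Client get() { return LOCAL.get(); }

    // Called at shutdown or periodically; a per-thread liveness check
    // would let this close only clients of dead threads (omitted here).
    public static void closeAll() {
        OPEN.values().forEach(Client::close);
        OPEN.clear();
    }
}
```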



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10757) Explain query plan should have operation name EXPLAIN

2015-05-19 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10757:
--

 Summary: Explain query plan should have operation name EXPLAIN
 Key: HIVE-10757
 URL: https://issues.apache.org/jira/browse/HIVE-10757
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial


In the plan of an Explain query, the operation name is not set to EXPLAIN. 
Instead, it is set to the operation name of the query itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10740) RpcServer should be restarted if related configuration is changed [Spark Branch]

2015-05-18 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10740:
--

 Summary: RpcServer should be restarted if related configuration is 
changed [Spark Branch]
 Key: HIVE-10740
 URL: https://issues.apache.org/jira/browse/HIVE-10740
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Jimmy Xiang


In reviewing the patch for HIVE-10721, Chengxiang pointed out an existing 
issue with HoS: the RpcServer is never restarted even when related 
configurations are changed, as is done for SparkSession. We should monitor the 
related configurations and restart the RpcServer if any of them changes. It 
should be restarted while there is no active SparkSession.
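A minimal sketch of the monitoring logic described above, with hypothetical names (RpcServerGuard, ensureServer): remember the RPC-related settings the server was started with, and restart only when one of them changes and no session is active.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: restart the RPC server only on a real config
// change, and only while no SparkSession is active.
public class RpcServerGuard {
    private Map<String, String> startedWith;
    private int restarts;

    public void ensureServer(Map<String, String> rpcConf, int activeSessions) {
        boolean changed = startedWith == null
            || !Objects.equals(startedWith, rpcConf);
        if (changed && activeSessions == 0) {
            startedWith = new HashMap<>(rpcConf);
            restarts++;        // stands in for stop() + start() of RpcServer
        }
    }

    public int restartCount() { return restarts; }
}
```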



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10721) SparkSessionManagerImpl leaks SparkSessions [Spark Branch]

2015-05-15 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10721:
--

 Summary: SparkSessionManagerImpl leaks SparkSessions [Spark Branch]
 Key: HIVE-10721
 URL: https://issues.apache.org/jira/browse/HIVE-10721
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


In #getSession(), we create a SparkSession and save it in a set. If the 
session fails to open, it stays in the set until shutdown.
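A minimal sketch of the fix shape, assuming hypothetical names (SessionManager, getSession): only track the session once open() succeeds, so a failed open does not leak into the set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a session enters the tracking set only after a
// successful open(), instead of being added before open() is attempted.
public class SessionManager {
    public static class SparkSession {
        private final boolean failOpen;
        SparkSession(boolean failOpen) { this.failOpen = failOpen; }
        void open() {
            if (failOpen) throw new RuntimeException("open failed");
        }
    }

    private final Set<SparkSession> created = ConcurrentHashMap.newKeySet();

    public SparkSession getSession(boolean failOpen) {
        SparkSession s = new SparkSession(failOpen);
        s.open();            // open first; only track on success
        created.add(s);
        return s;
    }

    public int trackedSessions() { return created.size(); }
}
```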



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10499) Ensure Session/ZooKeeperClient instances are closed

2015-04-27 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10499:
--

 Summary: Ensure Session/ZooKeeperClient instances are closed
 Key: HIVE-10499
 URL: https://issues.apache.org/jira/browse/HIVE-10499
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Some Session/ZooKeeperClient instances are not closed in some scenarios. We 
need to make sure they are always closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10473) Spark client is recreated even spark configuration is not changed

2015-04-23 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10473:
--

 Summary: Spark client is recreated even spark configuration is not 
changed
 Key: HIVE-10473
 URL: https://issues.apache.org/jira/browse/HIVE-10473
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, we consider a Spark setting changed as long as the set method is 
called, even if it is set to the same value as before. We should also check 
whether the value actually changed, since it takes time to start a new Spark 
client.
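The check amounts to comparing the old and new values before marking the configuration dirty. A minimal sketch with hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch: only flag the conf as changed when the new value
// differs from the current one, so set("k", sameValue) does not force
// an expensive Spark client restart.
public class SparkConfTracker {
    private final Map<String, String> conf = new HashMap<>();
    private boolean changed;

    public void set(String key, String value) {
        String old = conf.put(key, value);
        if (!Objects.equals(old, value)) {
            changed = true;
        }
    }

    public boolean needsNewClient() {
        boolean result = changed;
        changed = false;       // consumed when the client is (re)created
        return result;
    }
}
```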



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10365) First job fails with StackOverflowError

2015-04-16 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10365:
--

 Summary: First job fails with StackOverflowError
 Key: HIVE-10365
 URL: https://issues.apache.org/jira/browse/HIVE-10365
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


When running some queries on Yarn with standalone Hadoop, the first query fails 
with StackOverflowError:

{noformat}
java.lang.StackOverflowError
at 
java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333)
at 
java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1145)
at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:464)
at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10167) HS2 logs the server started only before the server is shut down

2015-03-31 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10167:
--

 Summary: HS2 logs the server started only before the server is 
shut down
 Key: HIVE-10167
 URL: https://issues.apache.org/jira/browse/HIVE-10167
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial


TThreadPoolServer#serve() blocks until the server is down. We should log 
before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10146) Not count session as idle if query is running

2015-03-30 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10146:
--

 Summary: Not count session as idle if query is running
 Key: HIVE-10146
 URL: https://issues.apache.org/jira/browse/HIVE-10146
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


Currently, as long as there is no activity, we consider the HS2 session idle. 
This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT: if it is not 
set long enough, an unattended query could be killed.

We should provide an option not to count the session as idle while some query 
is still running.
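The proposed check can be sketched as a pure function (names and parameters are illustrative, not HS2's actual API): a session is idle only when it has been inactive past the timeout and, if the option is on, no operation is still running.

```java
// Hypothetical sketch of the idle check this issue proposes.
public class IdleCheck {
    public static boolean isIdle(long lastAccessMillis, long nowMillis,
            long timeoutMillis, int runningOps, boolean checkOperation) {
        if (checkOperation && runningOps > 0) {
            return false;      // a query is still running: never idle
        }
        return nowMillis - lastAccessMillis > timeoutMillis;
    }
}
```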



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10109) Creating index in a different db should not be allowed

2015-03-26 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10109:
--

 Summary: Creating index in a different db should not be allowed
 Key: HIVE-10109
 URL: https://issues.apache.org/jira/browse/HIVE-10109
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang


With "in table" in creating an index, you can specify an index table name like 
db.index_table, with a db that is different from the db the base table belongs 
to. However, you can't drop this index any more, since Hive assumes the index 
is always in the same db as the base table.

We should not allow a qualified index table name in the "in table" clause when 
creating an index.
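The validation this issue proposes can be sketched as follows; the class and method names are hypothetical:

```java
// Hypothetical sketch: reject a db-qualified name in the "IN TABLE"
// clause at index-creation time, before it reaches the metastore.
public class IndexValidation {
    public static void checkIndexTableName(String name) {
        if (name.indexOf('.') >= 0) {
            throw new IllegalArgumentException(
                "Index table name must not be db-qualified: " + name);
        }
    }
}
```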



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name

2015-03-26 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10108:
--

 Summary: Index#getIndexTableName() returns db.index_table_name
 Key: HIVE-10108
 URL: https://issues.apache.org/jira/browse/HIVE-10108
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang


Index#getIndexTableName() used to return just the index table name. Now it 
returns a qualified table name. This change was introduced in HIVE-3781.

As a result:

IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName())

throws ObjectNotFoundException.
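Until the behavior is settled, a caller-side workaround could strip the db prefix before the lookup. This is a sketch only, not the committed fix, and the helper name is hypothetical:

```java
// Hypothetical workaround: normalize a possibly-qualified index table
// name ("db.idx_t") to the bare table name before calling getTable().
public class IndexNames {
    public static String unqualified(String indexTableName) {
        int dot = indexTableName.indexOf('.');
        return dot < 0 ? indexTableName : indexTableName.substring(dot + 1);
    }
}
```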



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]

2015-03-24 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10073:
--

 Summary: Runtime exception when querying HBase with Spark [Spark 
Branch]
 Key: HIVE-10073
 URL: https://issues.apache.org/jira/browse/HIVE-10073
 Project: Hive
  Issue Type: Bug
  Components: Spark
Affects Versions: spark-branch
Reporter: Jimmy Xiang


When querying HBase with Spark, we got 
{noformat}
 Caused by: java.lang.IllegalArgumentException: Must specify table name
at 
org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276)
at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331)
{noformat}

But it works fine for MapReduce.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10065) jp

2015-03-23 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10065:
--

 Summary: jp
 Key: HIVE-10065
 URL: https://issues.apache.org/jira/browse/HIVE-10065
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10023) Fix more cache related concurrency issue

2015-03-19 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10023:
--

 Summary: Fix more cache related concurrency issue
 Key: HIVE-10023
 URL: https://issues.apache.org/jira/browse/HIVE-10023
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Searched the code and found a couple more cache-related concurrency issues, 
for example in LazyBinaryObjectInspectorFactory, 
PrimitiveObjectInspectorFactory, and TypeInfoFactory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]

2015-03-18 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-10009:
--

 Summary: LazyObjectInspectorFactory is not thread safe [Spark 
Branch]
 Key: HIVE-10009
 URL: https://issues.apache.org/jira/browse/HIVE-10009
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


LazyObjectInspectorFactory is not thread safe, which causes random failures in 
multi-threaded environments such as Hive on Spark. We got exceptions like the 
one below:

{noformat}
java.lang.RuntimeException: Map operator initialization failed: 
java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55)
at 
org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
at 
org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector
 cannot be cast to 
org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355)
at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92)
... 16 more
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9993) Retrying task could use cached bad operators [Spark Branch]

2015-03-17 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9993:
-

 Summary: Retrying task could use cached bad operators [Spark 
Branch]
 Key: HIVE-9993
 URL: https://issues.apache.org/jira/browse/HIVE-9993
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


A Spark task could be retried on the same executor after certain failures. On 
retry, the cached task could be reused. Since the operators in the task are 
already initialized, they won't be initialized again, and the partial data 
left in these operators could lead to wrong final results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9935) Fix tests for java 1.8 [Spark Branch]

2015-03-11 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9935:
-

 Summary: Fix tests for java 1.8 [Spark Branch]
 Key: HIVE-9935
 URL: https://issues.apache.org/jira/browse/HIVE-9935
 Project: Hive
  Issue Type: Bug
  Components: spark-branch
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


In the spark branch, these tests don't have Java 1.8 golden files:

join0.q
list_bucket_dml_2
subquery_multiinsert.q




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9929) StatsUtil#getAvailableMemory could return negative value

2015-03-11 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9929:
-

 Summary: StatsUtil#getAvailableMemory could return negative value
 Key: HIVE-9929
 URL: https://issues.apache.org/jira/browse/HIVE-9929
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


In MAPREDUCE-5785, the default value of mapreduce.map.memory.mb was set to -1. 
We need to fix StatsUtil#getAvailableMemory so it does not return a negative 
value.
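The fix amounts to clamping the configured value to a positive fallback. A minimal sketch, assuming a hypothetical fallback constant (the real default Hive uses may differ):

```java
// Hypothetical sketch: treat -1 (the MAPREDUCE-5785 default for
// mapreduce.map.memory.mb) and any non-positive value as "unset" and
// fall back to an assumed default.
public class StatsMemory {
    static final long DEFAULT_MAP_MEMORY_MB = 1024; // assumed fallback

    public static long availableMemoryMb(long configuredMb) {
        return configuredMb > 0 ? configuredMb : DEFAULT_MAP_MEMORY_MB;
    }
}
```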



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9902) Map join small table files need more replications [Spark Branch]

2015-03-09 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9902:
-

 Summary: Map join small table files need more replications [Spark 
Branch]
 Key: HIVE-9902
 URL: https://issues.apache.org/jira/browse/HIVE-9902
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


We have
{noformat}
replication = (short) Math.min(MIN_REPLICATION, numOfPartitions);
{noformat}
It should be
{noformat}
replication = (short) Math.max(MIN_REPLICATION, numOfPartitions);
{noformat}
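To make the min/max difference concrete, here is a self-contained sketch with MIN_REPLICATION assumed to be 10 for illustration: min() caps the replication at the constant (so one partition yields replication 1), while max() enforces it as a floor.

```java
// Side-by-side sketch of the buggy and fixed expressions from the
// description above; MIN_REPLICATION's value is assumed.
public class Replication {
    static final short MIN_REPLICATION = 10; // assumed value

    public static short buggy(int numOfPartitions) {
        return (short) Math.min(MIN_REPLICATION, numOfPartitions);
    }

    public static short fixed(int numOfPartitions) {
        return (short) Math.max(MIN_REPLICATION, numOfPartitions);
    }
}
```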




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9861) Add spark-assembly on Hive's classpath

2015-03-04 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9861:
-

 Summary: Add spark-assembly on Hive's classpath
 Key: HIVE-9861
 URL: https://issues.apache.org/jira/browse/HIVE-9861
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Fix For: spark-branch


If SPARK_HOME is set, or we can determine it, we should add the Spark assembly 
to the classpath.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9847) Hive should not allow additional attempts when RSC fails [Spark Branch]

2015-03-03 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9847:
-

 Summary: Hive should not allow additional attempts when RSC fails 
[Spark Branch]
 Key: HIVE-9847
 URL: https://issues.apache.org/jira/browse/HIVE-9847
 Project: Hive
  Issue Type: Bug
  Components: Spark
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: spark-branch


In yarn-cluster mode, if the RSC fails the first time, YARN will restart it. 
HoS should set "yarn.resourcemanager.am.max-attempts" to 1 to disallow such 
restarts when submitting Spark jobs to YARN in cluster mode.
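A minimal sketch of wiring that property into the submit-time configuration, using a plain Properties map to stand in for whatever conf object the submitter actually uses (that part is an assumption):

```java
import java.util.Properties;

// Hypothetical sketch: pin the AM attempt count to 1 before submitting
// in yarn-cluster mode, so YARN does not silently restart a failed RSC.
public class RscSubmitConf {
    public static Properties withSingleAttempt(Properties submitConf) {
        submitConf.setProperty("yarn.resourcemanager.am.max-attempts", "1");
        return submitConf;
    }
}
```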



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9830) Test auto_sortmerge_join_8 is flaky

2015-03-02 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9830:
-

 Summary: Test auto_sortmerge_join_8 is flaky
 Key: HIVE-9830
 URL: https://issues.apache.org/jira/browse/HIVE-9830
 Project: Hive
  Issue Type: Bug
  Components: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: spark-branch


We found auto_sortmerge_join_8 is flaky for Spark. Sometimes, the output could 
be wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9804) Turn on some kryo settings by default for Spark

2015-02-26 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9804:
-

 Summary: Turn on some kryo settings by default for Spark
 Key: HIVE-9804
 URL: https://issues.apache.org/jira/browse/HIVE-9804
 Project: Hive
  Issue Type: Improvement
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Disabling reference checking and setting classesToRegister for Spark can boost 
performance. We should do so by default.
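The two settings correspond to the Spark conf keys spark.kryo.referenceTracking and spark.kryo.classesToRegister. A sketch of applying them as defaults only when the user has not set them; the class list shown is an assumption, not the list the actual patch registers:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: apply the kryo defaults without overriding
// values the user set explicitly.
public class KryoDefaults {
    public static Map<String, String> apply(Map<String, String> conf) {
        conf.putIfAbsent("spark.kryo.referenceTracking", "false");
        conf.putIfAbsent("spark.kryo.classesToRegister",
            "org.apache.hadoop.hive.ql.io.HiveKey,"
            + "org.apache.hadoop.io.BytesWritable");  // assumed class list
        return conf;
    }
}
```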



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9772) Hive parquet timestamp conversion doesn't work with new Parquet

2015-02-24 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9772:
-

 Summary: Hive parquet timestamp conversion doesn't work with new 
Parquet
 Key: HIVE-9772
 URL: https://issues.apache.org/jira/browse/HIVE-9772
 Project: Hive
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Currently Hive handles the Parquet timestamp compatibility issue with 
ParquetInputSplit#readSupportMetadata. Parquet has deprecated read support 
metadata, so we can't pass metadata around with readSupportMetadata any more. 
We need a new scheme to pass around the timestamp conversion info.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-18 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-9659:
-

Assignee: Jimmy Xiang

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>Assignee: Jimmy Xiang
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the 
> log and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
> 15/02/12 01:29:49 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> {noformat}
> (b) Detail error message for NullPointerException:
> {noformat}
> 5/02/12 01:29:50 ERROR MapJoinOperat

[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-16 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323537#comment-14323537
 ] 

Jimmy Xiang commented on HIVE-9659:
---

I just ran the same query with a small data set with skew join enabled.


[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321013#comment-14321013
 ] 

Jimmy Xiang commented on HIVE-9659:
---

I can reproduce this issue with a tiny data set.

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the log 
> and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
> 15/02/12 01:29:49 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> {noformat}
> (b) Detail error message for NullPointerException:
> {noformat}

[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-13 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320906#comment-14320906
 ] 

Jimmy Xiang commented on HIVE-9659:
---

How big is the data set?  Does it work with a small data set?

> 'Error while trying to create table container' occurs during hive query case 
> execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
> ---
>
> Key: HIVE-9659
> URL: https://issues.apache.org/jira/browse/HIVE-9659
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xin Hao
>
> We found that 'Error while trying to create table container' occurs during 
> Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
> If hive.optimize.skewjoin is set to 'false', the case passes.
> How to reproduce:
> 1. set hive.optimize.skewjoin=true;
> 2. Run BigBench case Q12 and it will fail. 
> Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
> will find the error 'Error while trying to create table container' in the log 
> and also a NullPointerException near the end of the log.
> (a) Detail error message for 'Error while trying to create table container':
> {noformat}
> 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
> create table container
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
>   at 
> org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
>   at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
>   at 
> scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>   at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
>   at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
> trying to create table container
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
>   ... 21 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
> directory: 
> hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
>   at 
> org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
>   ... 22 more
> 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
> 15/02/12 01:29:49 INFO PerfLogger:  from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler>
> {noformat}
> (b) Detail error message for NullPointerException:

[jira] [Assigned] (HIVE-9649) Skip orderBy or sortBy for intermediate result [Spark Branch]

2015-02-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-9649:
-

Assignee: Jimmy Xiang

> Skip orderBy or sortBy for intermediate result [Spark Branch]
> -
>
> Key: HIVE-9649
> URL: https://issues.apache.org/jira/browse/HIVE-9649
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
>
> orderBy and sortBy seem relevant only to the final result. Thus, for most of 
> cases, if not all, orderBy/sortBy can be ignored for efficiency. This JIRA is 
> to identify the opportunity and optimize it accordingly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9075) Allow RPC Configuration [Spark Branch]

2015-02-11 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HIVE-9075.
---
Resolution: Fixed

This issue appears to be fixed already with HIVE-9337.

> Allow RPC Configuration [Spark Branch]
> --
>
> Key: HIVE-9075
> URL: https://issues.apache.org/jira/browse/HIVE-9075
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: spark-branch
>Reporter: Brock Noland
>
> [~vanzin] has a bunch of nice config properties in RpcConfiguration:
> https://github.com/apache/hive/blob/spark/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java#L68
> However, we only load config properties which start with spark:
> https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L102
> thus it's not possible to set this on the server.
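The prefix-based filtering described above can be sketched roughly as follows. This is an illustrative sketch only: the class and method names are hypothetical, not Hive's actual `HiveSparkClientFactory` code, which also handles RSC- and YARN-specific keys.

```java
import java.util.Properties;

public class SparkPrefixFilter {
    // Keep only properties whose key starts with the given prefix,
    // mirroring how only "spark."-prefixed keys are forwarded to the client.
    public static Properties filterByPrefix(Properties in, String prefix) {
        Properties out = new Properties();
        for (String key : in.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                out.setProperty(key, in.getProperty(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("spark.executor.memory", "4g");
        conf.setProperty("hive.some.setting", "true"); // dropped: wrong prefix
        Properties filtered = filterByPrefix(conf, "spark.");
        System.out.println(filtered.stringPropertyNames());
    }
}
```

A filter like this silently drops any RPC property that does not carry the `spark.` prefix, which is exactly why the RpcConfiguration keys could not be set on the server.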



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9646:
--
Status: Patch Available  (was: Open)

> Beeline doesn't show Spark job progress info [Spark Branch]
> ---
>
> Key: HIVE-9646
> URL: https://issues.apache.org/jira/browse/HIVE-9646
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: HIVE-9646.1-spark.patch
>
>
> Beeline can show MR job progress info, but can't show that of Spark job. CLI 
> doesn't have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9646:
--
Attachment: HIVE-9646.1-spark.patch

> Beeline doesn't show Spark job progress info [Spark Branch]
> ---
>
> Key: HIVE-9646
> URL: https://issues.apache.org/jira/browse/HIVE-9646
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Attachments: HIVE-9646.1-spark.patch
>
>
> Beeline can show MR job progress info, but can't show that of Spark job. CLI 
> doesn't have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315273#comment-14315273
 ] 

Jimmy Xiang commented on HIVE-9646:
---

It turns out that HIVE-9121 was lost somehow, probably during merging.

> Beeline doesn't show Spark job progress info [Spark Branch]
> ---
>
> Key: HIVE-9646
> URL: https://issues.apache.org/jira/browse/HIVE-9646
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> Beeline can show MR job progress info, but can't show that of Spark job. CLI 
> doesn't have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315234#comment-14315234
 ] 

Jimmy Xiang commented on HIVE-9646:
---

Such logs seem to be filtered out by default. Let me update the filter a little.

> Beeline doesn't show Spark job progress info [Spark Branch]
> ---
>
> Key: HIVE-9646
> URL: https://issues.apache.org/jira/browse/HIVE-9646
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>
> Beeline can show MR job progress info, but can't show that of Spark job. CLI 
> doesn't have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9646:
-

 Summary: Beeline doesn't show Spark job progress info [Spark 
Branch]
 Key: HIVE-9646
 URL: https://issues.apache.org/jira/browse/HIVE-9646
 Project: Hive
  Issue Type: Bug
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Beeline can show MR job progress info, but can't show that of Spark job. CLI 
doesn't have this problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314722#comment-14314722
 ] 

Jimmy Xiang commented on HIVE-9574:
---

Test index_auto_mult_tables passes for me on my box.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch, HIVE-9574.5-spark.patch, 
> HIVE-9574.6-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.
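The container-switching concern above can be illustrated with a minimal two-buffer cache. This is a hedged analogy, not Hive's actual {{HiveKVResultCache}} (which spills to disk via {{RowContainer}}): the point is that the buffers swap only when the active one is fully drained, avoiding per-record switches that would re-trigger expensive container re-initialization.

```java
import java.util.AbstractMap;
import java.util.ArrayDeque;
import java.util.Map;

public class TwoBufferKVCache<K, V> {
    private final int capacity;
    // Writes fill `container` up to capacity, then overflow into `backupContainer`.
    private ArrayDeque<Map.Entry<K, V>> container = new ArrayDeque<>();
    private ArrayDeque<Map.Entry<K, V>> backupContainer = new ArrayDeque<>();

    public TwoBufferKVCache(int capacity) { this.capacity = capacity; }

    public void add(K key, V value) {
        ArrayDeque<Map.Entry<K, V>> target =
            container.size() < capacity ? container : backupContainer;
        target.add(new AbstractMap.SimpleEntry<>(key, value));
    }

    public Map.Entry<K, V> next() {
        if (container.isEmpty()) {
            // Swap only when the active buffer is fully drained -- the cheap
            // analogue of not flipping container/backupContainer per record.
            ArrayDeque<Map.Entry<K, V>> tmp = container;
            container = backupContainer;
            backupContainer = tmp;
        }
        return container.poll(); // null when the cache is empty
    }
}
```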



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314491#comment-14314491
 ] 

Jimmy Xiang commented on HIVE-9574:
---

Cool, thanks. Attached v6 that addressed more minor review comments.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch, HIVE-9574.5-spark.patch, 
> HIVE-9574.6-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-10 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.6-spark.patch

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch, HIVE-9574.5-spark.patch, 
> HIVE-9574.6-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]

2015-02-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9627:
--
Attachment: HIVE-9627.1-spark.patch

> Add cbo_gby_empty.q.out for Spark [Spark Branch]
> 
>
> Key: HIVE-9627
> URL: https://issues.apache.org/jira/browse/HIVE-9627
> Project: Hive
>  Issue Type: Test
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Trivial
> Attachments: HIVE-9627.1-spark.patch
>
>
> The golden file cbo_gby_empty.q.out for Spark is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]

2015-02-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9627:
--
Status: Patch Available  (was: Open)

> Add cbo_gby_empty.q.out for Spark [Spark Branch]
> 
>
> Key: HIVE-9627
> URL: https://issues.apache.org/jira/browse/HIVE-9627
> Project: Hive
>  Issue Type: Test
>Affects Versions: spark-branch
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Trivial
> Attachments: HIVE-9627.1-spark.patch
>
>
> The golden file cbo_gby_empty.q.out for Spark is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]

2015-02-09 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HIVE-9627:
-

 Summary: Add cbo_gby_empty.q.out for Spark [Spark Branch]
 Key: HIVE-9627
 URL: https://issues.apache.org/jira/browse/HIVE-9627
 Project: Hive
  Issue Type: Test
Affects Versions: spark-branch
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial


The golden file cbo_gby_empty.q.out for Spark is missing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-09 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.5-spark.patch

Attached v5 that addressed review comments from Xuefu and Rui. Made the cache 
thread safe.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch, HIVE-9574.5-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.4-spark.patch

Another minor optimization.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.3-spark.patch

Attached v3 with a minor fix.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, 
> HIVE-9574.3-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.2-spark.patch

Attached v2, which removes aboutToSpill(). Having discussed it with Szehon and Chao, 
it seems there is no need to wait until the cache/buffer is full.

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309648#comment-14309648
 ] 

Jimmy Xiang commented on HIVE-9574:
---

Patch 1 is on RB: https://reviews.apache.org/r/30739/

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Fix Version/s: spark-branch
   Status: Patch Available  (was: Open)

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9574.1-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-06 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9574:
--
Attachment: HIVE-9574.1-spark.patch

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
> Attachments: HIVE-9574.1-spark.patch
>
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]

2015-02-04 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-9574:
-

Assignee: Jimmy Xiang

> Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark 
> Branch]
> 
>
> Key: HIVE-9574
> URL: https://issues.apache.org/jira/browse/HIVE-9574
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Rui Li
>Assignee: Jimmy Xiang
>
> {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is 
> expensive. If we switch {{container}} and {{backupContainer}} frequently in 
> {{HiveKVResultCache}}, it will degrade performance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]

2015-02-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HIVE-9339.
---
Resolution: Won't Fix

We will depend on CombineFileInputFormat to do the split grouping. Closing the 
issue for now.

> Optimize split grouping for CombineHiveInputFormat [Spark Branch]
> -
>
> Key: HIVE-9339
> URL: https://issues.apache.org/jira/browse/HIVE-9339
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
>
> It seems that split generation, especially in terms of grouping inputs, needs 
> to be improved. For this, we may need cluster information. Because of this, 
> we will first try to solve the problem for Spark.
> As to cluster information, Spark doesn't provide an API (SPARK-5080). 
> However, Spark does have a listener API, with which the Spark driver can get 
> notifications about executors going up/down, tasks starting/finishing, etc. 
> With this information, the Spark client should be able to have a view of the 
> current cluster image.
> Spark developers mentioned that the listener can only be created after 
> SparkContext is started, at which time some executions may have already 
> started, so the listener will miss some information. This can be fixed; 
> file a JIRA with the Spark project if necessary.
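The cluster-view idea described above boils down to a listener that maintains a map of live executors as up/down notifications arrive. The sketch below is a generic illustration with hypothetical callback names; Spark's real hooks live on org.apache.spark.scheduler.SparkListener and differ in signature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ClusterView {
    // executorId -> number of cores, updated from listener callbacks.
    private final Map<String, Integer> executorCores = new ConcurrentHashMap<>();

    public void onExecutorAdded(String executorId, int cores) {
        executorCores.put(executorId, cores);
    }

    public void onExecutorRemoved(String executorId) {
        executorCores.remove(executorId);
    }

    // Total cores currently available -- the kind of figure split
    // grouping would need to size its groups.
    public int totalCores() {
        return executorCores.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```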





[jira] [Resolved] (HIVE-8853) Make vectorization work with Spark [Spark Branch]

2015-02-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HIVE-8853.
---
   Resolution: Fixed
Fix Version/s: spark-branch

Vectorization works with Spark now. Closing this JIRA.

> Make vectorization work with Spark [Spark Branch]
> -
>
> Key: HIVE-8853
> URL: https://issues.apache.org/jira/browse/HIVE-8853
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
>
> In Hive, to make vectorization work, the reader also needs to be vectorized, 
> which means that the reader can read a chunk of rows (or a list of column 
> chunks) instead of one row at a time. However, we use Spark RDDs for reading, 
> which again utilize the underlying InputFormat to read. Subsequent 
> processing also needs to happen in batches. We need to make sure that 
> vectorization is working as expected.
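The batch-vs-row distinction described above can be shown with a minimal sketch (Python, purely conceptual; Hive's real vectorized readers are Java and hand back column batches): a batched reader makes far fewer per-record calls than a row-at-a-time reader over the same data.

```python
# Row-at-a-time vs. batched ("vectorized") reading, sketched.
def read_rows(data):
    # One row handed back per call.
    for row in data:
        yield row

def read_batches(data, batch_size):
    # A chunk of rows handed back per call.
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = list(range(10))
row_calls = sum(1 for _ in read_rows(data))          # 10 hand-offs
batch_calls = sum(1 for _ in read_batches(data, 4))  # 3 hand-offs
print(row_calls, batch_calls)  # 10 3
```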





[jira] [Updated] (HIVE-9492) Enable caching in MapInput for Spark

2015-02-03 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HIVE-9492:
--
Status: Open  (was: Patch Available)

> Enable caching in MapInput for Spark
> 
>
> Key: HIVE-9492
> URL: https://issues.apache.org/jira/browse/HIVE-9492
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9492.1-spark.patch, HIVE-9492.2-spark.patch, 
> prototype.patch
>
>
> Because of the IOContext problem (HIVE-8920, HIVE-9084), RDD caching is 
> currently disabled in MapInput. Prototyping shows that the problem can be 
> solved. Thus, we should formalize the prototype and enable the caching. A 
> good query to test this is:
> {code}
> from (select * from dec union all select * from dec2) s
> insert overwrite table dec3 select s.name, sum(s.value) group by s.name
> insert overwrite table dec4 select s.name, s.value order by s.value;
> {code}
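The query above is a good test because its union'd scan feeds two insert branches; without caching the shared input is computed once per branch. A minimal sketch of the effect (Python model, not the actual RDD API):

```python
# Why caching the shared MapInput matters for a multi-insert query.
scan_count = {"n": 0}

def scan():
    # Stand-in for the expensive table scan behind the union.
    scan_count["n"] += 1
    return [("a", 1), ("b", 2), ("a", 3)]

# Uncached: each downstream branch re-runs the scan.
total = sum(v for _, v in scan())                 # group-by branch
ordered = sorted(scan(), key=lambda kv: kv[1])    # order-by branch
uncached_scans = scan_count["n"]                  # 2 scans

# Cached: materialize once, reuse for both branches.
scan_count["n"] = 0
cached = scan()
total = sum(v for _, v in cached)
ordered = sorted(cached, key=lambda kv: kv[1])
cached_scans = scan_count["n"]                    # 1 scan
print(uncached_scans, cached_scans)               # 2 1
```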





[jira] [Commented] (HIVE-9492) Enable caching in MapInput for Spark

2015-02-02 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301472#comment-14301472
 ] 

Jimmy Xiang commented on HIVE-9492:
---

V2 is uploaded to RB: https://reviews.apache.org/r/30502/

> Enable caching in MapInput for Spark
> 
>
> Key: HIVE-9492
> URL: https://issues.apache.org/jira/browse/HIVE-9492
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Jimmy Xiang
> Fix For: spark-branch
>
> Attachments: HIVE-9492.1-spark.patch, HIVE-9492.2-spark.patch, 
> prototype.patch
>
>
> Because of the IOContext problem (HIVE-8920, HIVE-9084), RDD caching is 
> currently disabled in MapInput. Prototyping shows that the problem can be 
> solved. Thus, we should formalize the prototype and enable the caching. A 
> good query to test this is:
> {code}
> from (select * from dec union all select * from dec2) s
> insert overwrite table dec3 select s.name, sum(s.value) group by s.name
> insert overwrite table dec4 select s.name, s.value order by s.value;
> {code}




