[jira] [Created] (HIVE-16348) HoS query is canceled but error message shows RPC is closed
Jimmy Xiang created HIVE-16348: -- Summary: HoS query is canceled but error message shows RPC is closed Key: HIVE-16348 URL: https://issues.apache.org/jira/browse/HIVE-16348 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor When an HoS query is interrupted while getting the app id, it keeps trying to get the status until it times out, and then returns an "RPC is closed" error message, which is misleading. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-16292) SparkUtilities upload to HDFS doesn't work with viewfs
Jimmy Xiang created HIVE-16292: -- Summary: SparkUtilities upload to HDFS doesn't work with viewfs Key: HIVE-16292 URL: https://issues.apache.org/jira/browse/HIVE-16292 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor If scratchdir is set to viewfs, HoS fails with an exception like {noformat} java.lang.IllegalArgumentException: Wrong FS: viewfs://ns-default/tmp/hive_scratch/hive/hive/_spark_session_dir/f4031fca-2885-4e7a-9b05-764d25d0e488/hive-exec-1.1.0-cdh5.7.2.jar, expected: hdfs://nameservice1 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:657) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:194) at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106) at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215) at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1412) at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:499) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:351) at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341) at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1949) at org.apache.hadoop.hive.ql.exec.spark.SparkUtilities.uploadToHDFS(SparkUtilities.java:86) {noformat}
[jira] [Created] (HIVE-16286) Log canceled query id
Jimmy Xiang created HIVE-16286: -- Summary: Log canceled query id Key: HIVE-16286 URL: https://issues.apache.org/jira/browse/HIVE-16286 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Currently, only a generic message is logged when a query is canceled. It is better to log the query id as well.
[jira] [Created] (HIVE-15915) Emit progress percentage in getting operation status
Jimmy Xiang created HIVE-15915: -- Summary: Emit progress percentage in getting operation status Key: HIVE-15915 URL: https://issues.apache.org/jira/browse/HIVE-15915 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor When running a query asynchronously, the client may want to check the progress periodically. HIVE-15473 adds a progress bar on Beeline for Tez. For this issue, we just want the progress percentage.
[jira] [Created] (HIVE-15208) Query string should be HTML encoded for Web UI
Jimmy Xiang created HIVE-15208: -- Summary: Query string should be HTML encoded for Web UI Key: HIVE-15208 URL: https://issues.apache.org/jira/browse/HIVE-15208 Project: Hive Issue Type: Bug Components: Web UI Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14928) Analyze table no scan mess up schema
Jimmy Xiang created HIVE-14928: -- Summary: Analyze table no scan mess up schema Key: HIVE-14928 URL: https://issues.apache.org/jira/browse/HIVE-14928 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang StatsNoJobTask uses the static variables partUpdates and table to track stats changes. If multiple analyze no scan tasks run at the same time, the table/partition schema could get messed up.
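The race described above can be sketched with a toy class (all names hypothetical; this is not Hive's actual StatsNoJobTask code): a static map is shared by every task instance, so concurrent tasks overwrite each other's entries, while per-instance maps stay isolated.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a static map shared across task instances loses
// updates when two tasks run concurrently; a per-instance map does not.
public class StatsTaskSketch {
    static final Map<String, String> SHARED = new HashMap<>(); // buggy: shared by all tasks
    final Map<String, String> perTask = new HashMap<>();       // fix: one map per task

    void record(String partition, String stats) {
        SHARED.put(partition, stats);
        perTask.put(partition, stats);
    }

    public static void main(String[] args) {
        StatsTaskSketch t1 = new StatsTaskSketch();
        StatsTaskSketch t2 = new StatsTaskSketch();
        t1.record("p=1", "rows=10");
        t2.record("p=1", "rows=99");                // second task clobbers the shared entry
        System.out.println(SHARED.get("p=1"));      // rows=99 -- t1's stats are lost
        System.out.println(t1.perTask.get("p=1"));  // rows=10 -- isolated per task
    }
}
```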
[jira] [Created] (HIVE-13782) Compile async query asynchronously
Jimmy Xiang created HIVE-13782: -- Summary: Compile async query asynchronously Key: HIVE-13782 URL: https://issues.apache.org/jira/browse/HIVE-13782 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Currently, when an async query is submitted to HS2, HS2 does the preparation synchronously. One of the preparation steps is compiling the query, which may take some time. It would be helpful to provide an option to do the compilation asynchronously.
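The idea can be sketched with plain java.util.concurrent (names hypothetical; this is not the HS2 API): compilation runs on a background executor, so the submit call returns a handle immediately instead of blocking until the plan is ready.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: hand "compilation" to a background executor so the
// submit call returns an operation handle (Future) right away.
public class AsyncCompileSketch {
    static final ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // let the JVM exit without an explicit shutdown
        return t;
    });

    static Future<String> submitQuery(String sql) {
        // compile() runs in the background; the caller is not blocked
        return pool.submit(() -> compile(sql));
    }

    static String compile(String sql) {
        return "PLAN(" + sql + ")"; // stand-in for the real, slow compilation
    }

    public static void main(String[] args) throws Exception {
        Future<String> op = submitQuery("select 1"); // returns immediately
        System.out.println(op.get());                // PLAN(select 1)
    }
}
```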
[jira] [Created] (HIVE-13679) Pass diagnostic message to failure hooks
Jimmy Xiang created HIVE-13679: -- Summary: Pass diagnostic message to failure hooks Key: HIVE-13679 URL: https://issues.apache.org/jira/browse/HIVE-13679 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Pass diagnostic message to failure hooks. This is useful for debugging remote job failures.
[jira] [Created] (HIVE-13559) Pass exception to failure hooks
Jimmy Xiang created HIVE-13559: -- Summary: Pass exception to failure hooks Key: HIVE-13559 URL: https://issues.apache.org/jira/browse/HIVE-13559 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Pass exception to failure hooks so that they know more about the failure.
[jira] [Created] (HIVE-13501) Invoke failure hooks if query fails on exception
Jimmy Xiang created HIVE-13501: -- Summary: Invoke failure hooks if query fails on exception Key: HIVE-13501 URL: https://issues.apache.org/jira/browse/HIVE-13501 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor When a query fails on some exception, failure hooks are currently not called. It's better to invoke such hooks so that we know the query has failed.
[jira] [Created] (HIVE-13430) Pass error message to failure hook
Jimmy Xiang created HIVE-13430: -- Summary: Pass error message to failure hook Key: HIVE-13430 URL: https://issues.apache.org/jira/browse/HIVE-13430 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, the failure hook just knows the query failed. But it has no clue what the error is. It is better to pass the error message to the hook.
[jira] [Created] (HIVE-13237) Select parquet struct field with upper case throws NPE
Jimmy Xiang created HIVE-13237: -- Summary: Select parquet struct field with upper case throws NPE Key: HIVE-13237 URL: https://issues.apache.org/jira/browse/HIVE-13237 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Query "select msg.fieldone from test" throws NPE if msg's fieldone is actually fieldOne: {noformat} 2016-03-08 17:30:57,772 ERROR [main]: exec.FetchTask (FetchTask.java:initialize(86)) - java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator.initialize(ExprNodeFieldEvaluator.java:61) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:980) at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:63) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385) {noformat}
[jira] [Created] (HIVE-13043) Reload function has no impact to function registry
Jimmy Xiang created HIVE-13043: -- Summary: Reload function has no impact to function registry Key: HIVE-13043 URL: https://issues.apache.org/jira/browse/HIVE-13043 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang With HIVE-2573, users should run "reload function" to refresh the cached function registry. However, "reload function" has no impact at all. We need to fix this. Otherwise, HS2 needs to be restarted to see new global functions.
[jira] [Created] (HIVE-13026) Pending/running operation metrics are wrong
Jimmy Xiang created HIVE-13026: -- Summary: Pending/running operation metrics are wrong Key: HIVE-13026 URL: https://issues.apache.org/jira/browse/HIVE-13026 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang A query finishes, but the pending/running operation counts don't decrease. For example, in TestHs2Metrics::testMetrics(), we have {noformat} MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, "api_hs2_operation_PENDING", 1); MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, "api_hs2_operation_RUNNING", 1); MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.COUNTER, "hs2_completed_operation_FINISHED", 1); {noformat} Should it be the below instead? {noformat} MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, "api_hs2_operation_PENDING", 0); MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.TIMER, "api_hs2_operation_RUNNING", 0); MetricsTestUtils.verifyMetricsJson(json, MetricsTestUtils.COUNTER, "hs2_completed_operation_FINISHED", 1); {noformat}
[jira] [Created] (HIVE-12987) Add metrics for HS2 active users and SQL operations
Jimmy Xiang created HIVE-12987: -- Summary: Add metrics for HS2 active users and SQL operations Key: HIVE-12987 URL: https://issues.apache.org/jira/browse/HIVE-12987 Project: Hive Issue Type: Task Reporter: Jimmy Xiang Assignee: Jimmy Xiang HIVE-12271 added metrics for all HS2 operations. Sometimes, users are also interested in metrics just for SQL operations. It is useful to track active user count as well.
[jira] [Created] (HIVE-12511) IN clause performs differently than = clause
Jimmy Xiang created HIVE-12511: -- Summary: IN clause performs differently than = clause Key: HIVE-12511 URL: https://issues.apache.org/jira/browse/HIVE-12511 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Similar to HIVE-11973, the IN clause performs differently than the = clause for "int" type with string values. For example, {noformat} SELECT * FROM inttest WHERE iValue IN ('01'); {noformat} will not return any rows with int iValue = 1.
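The expected behavior can be illustrated with a toy comparison (hypothetical helper names, not Hive's type-checking code): coercing the string literal to int matches the row, while comparing the int value as a string against '01' does not.

```java
// Hypothetical sketch: why `iValue IN ('01')` misses rows that `iValue = '01'`
// finds -- string comparison vs. coercing the literal to the column's int type.
public class InClauseCoercionSketch {
    // String comparison: "1" does not equal "01", so the row is missed.
    static boolean stringMatch(int columnValue, String literal) {
        return String.valueOf(columnValue).equals(literal);
    }

    // Numeric coercion: parse the literal to int first, as `=` effectively does.
    static boolean numericMatch(int columnValue, String literal) {
        return columnValue == Integer.parseInt(literal);
    }

    public static void main(String[] args) {
        System.out.println(stringMatch(1, "01"));  // false -- the buggy IN behavior
        System.out.println(numericMatch(1, "01")); // true  -- the expected behavior
    }
}
```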
[jira] [Created] (HIVE-12485) Secure HS2 web UI with kerberos
Jimmy Xiang created HIVE-12485: -- Summary: Secure HS2 web UI with kerberos Key: HIVE-12485 URL: https://issues.apache.org/jira/browse/HIVE-12485 Project: Hive Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang
[jira] [Created] (HIVE-12484) Show meta operations on HS2 web UI
Jimmy Xiang created HIVE-12484: -- Summary: Show meta operations on HS2 web UI Key: HIVE-12484 URL: https://issues.apache.org/jira/browse/HIVE-12484 Project: Hive Issue Type: Sub-task Reporter: Jimmy Xiang As Mohit pointed out in the review of HIVE-12338, it would be nice to show meta operations on the HS2 web UI too, so that we have an end-to-end picture of the operations accessing HMS.
[jira] [Created] (HIVE-12471) Secure HS2 web UI with SSL and kerberos
Jimmy Xiang created HIVE-12471: -- Summary: Secure HS2 web UI with SSL and kerberos Key: HIVE-12471 URL: https://issues.apache.org/jira/browse/HIVE-12471 Project: Hive Issue Type: Sub-task Reporter: Jimmy Xiang Assignee: Jimmy Xiang
[jira] [Created] (HIVE-12338) Add webui to HiveServer2
Jimmy Xiang created HIVE-12338: -- Summary: Add webui to HiveServer2 Key: HIVE-12338 URL: https://issues.apache.org/jira/browse/HIVE-12338 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Jimmy Xiang Assignee: Jimmy Xiang A web UI for HiveServer2 can show some useful information, such as: 1. Sessions, 2. Queries that are executing on the HS2, their states, start time, etc.
[jira] [Created] (HIVE-12318) qtest failing due to NPE in logStats
Jimmy Xiang created HIVE-12318: -- Summary: qtest failing due to NPE in logStats Key: HIVE-12318 URL: https://issues.apache.org/jira/browse/HIVE-12318 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.Operator.logStats(Operator.java:899) ~ {noformat}
[jira] [Created] (HIVE-12317) Emit current database in lineage info
Jimmy Xiang created HIVE-12317: -- Summary: Emit current database in lineage info Key: HIVE-12317 URL: https://issues.apache.org/jira/browse/HIVE-12317 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor It will be easier to emit the current database info explicitly instead of deriving it from normalized column names.
[jira] [Created] (HIVE-12287) Lineage for lateral view shows wrong dependencies
Jimmy Xiang created HIVE-12287: -- Summary: Lineage for lateral view shows wrong dependencies Key: HIVE-12287 URL: https://issues.apache.org/jira/browse/HIVE-12287 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang The lineage dependency graph for a select from a lateral view is wrong.
[jira] [Created] (HIVE-12278) Skip logging lineage for explain queries
Jimmy Xiang created HIVE-12278: -- Summary: Skip logging lineage for explain queries Key: HIVE-12278 URL: https://issues.apache.org/jira/browse/HIVE-12278 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor For explain queries, we don't generate the lineage info. So we should not try to log it at all.
[jira] [Created] (HIVE-12268) Context leaks deleteOnExit paths
Jimmy Xiang created HIVE-12268: -- Summary: Context leaks deleteOnExit paths Key: HIVE-12268 URL: https://issues.apache.org/jira/browse/HIVE-12268 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor A long-running HS2 accumulates lots of paths in the FileSystem's deleteOnExit map. We should remove entries for paths that have already been deleted.
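A minimal sketch of the leak and the fix (toy classes, not Hadoop's FileSystem): a deleteOnExit-style registry only grows over the life of a long-running process unless each entry is pruned when its path is actually deleted.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a deleteOnExit-style registry. Deleting a path without
// pruning the registry leaks an entry per temporary path forever.
public class DeleteOnExitSketch {
    final Set<String> deleteOnExit = new HashSet<>();
    final Set<String> existing = new HashSet<>(); // stand-in for the filesystem

    void create(String path) { existing.add(path); deleteOnExit.add(path); }

    void delete(String path) { existing.remove(path); } // leak: registry keeps the path

    void deleteAndPrune(String path) {                  // fix: prune the registry too
        existing.remove(path);
        deleteOnExit.remove(path);
    }

    public static void main(String[] args) {
        DeleteOnExitSketch leaky = new DeleteOnExitSketch();
        for (int i = 0; i < 1000; i++) { leaky.create("/tmp/q" + i); leaky.delete("/tmp/q" + i); }
        System.out.println(leaky.deleteOnExit.size()); // 1000 -- one leaked entry per query

        DeleteOnExitSketch pruned = new DeleteOnExitSketch();
        for (int i = 0; i < 1000; i++) { pruned.create("/tmp/q" + i); pruned.deleteAndPrune("/tmp/q" + i); }
        System.out.println(pruned.deleteOnExit.size()); // 0 -- nothing leaked
    }
}
```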
[jira] [Created] (HIVE-12265) Generate lineage info only if requested
Jimmy Xiang created HIVE-12265: -- Summary: Generate lineage info only if requested Key: HIVE-12265 URL: https://issues.apache.org/jira/browse/HIVE-12265 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor If no lineage-related hook is configured, we should not generate lineage info.
[jira] [Created] (HIVE-12225) LineageCtx should release all resources at clear
Jimmy Xiang created HIVE-12225: -- Summary: LineageCtx should release all resources at clear Key: HIVE-12225 URL: https://issues.apache.org/jira/browse/HIVE-12225 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Some maps are not released in the clear() method.
[jira] [Created] (HIVE-12200) INSERT INTO table using a select statement w/o a FROM clause fails
Jimmy Xiang created HIVE-12200: -- Summary: INSERT INTO table using a select statement w/o a FROM clause fails Key: HIVE-12200 URL: https://issues.apache.org/jira/browse/HIVE-12200 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Here is the stack trace: {noformat} FailedPredicateException(regularBody,{$s.tree.getChild(1) !=null}?) at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41047) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40222) at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40092) at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1656) at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1140) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:202) at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:407) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:312) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1162) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1215) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1091) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1081) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:225) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:177) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:388) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:323) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:731) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:704) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:633) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) FAILED: ParseException line 1:29 Failed to recognize predicate ''. Failed rule: 'regularBody' in statement {noformat}
[jira] [Created] (HIVE-12187) Release plan once a query is executed
Jimmy Xiang created HIVE-12187: -- Summary: Release plan once a query is executed Key: HIVE-12187 URL: https://issues.apache.org/jira/browse/HIVE-12187 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Some clients leave query operations open for a while so that they can retrieve the query results later, which means the allocated memory is kept around too. Once a query has been executed, we should release the resources that are no longer needed.
[jira] [Created] (HIVE-12046) Re-create spark client if connection is dropped
Jimmy Xiang created HIVE-12046: -- Summary: Re-create spark client if connection is dropped Key: HIVE-12046 URL: https://issues.apache.org/jira/browse/HIVE-12046 Project: Hive Issue Type: Bug Components: Spark Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, if the connection to the Spark cluster is dropped, the spark client stays in a bad state, and a new Hive session is needed to re-establish the connection. It would be better to auto-reconnect in this case.
[jira] [Created] (HIVE-11984) Add HS2 open operation metrics
Jimmy Xiang created HIVE-11984: -- Summary: Add HS2 open operation metrics Key: HIVE-11984 URL: https://issues.apache.org/jira/browse/HIVE-11984 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Some metrics for open operations would be helpful to track operations that are not closed/cancelled.
[jira] [Created] (HIVE-11946) TestNotificationListener is flaky
Jimmy Xiang created HIVE-11946: -- Summary: TestNotificationListener is flaky Key: HIVE-11946 URL: https://issues.apache.org/jira/browse/HIVE-11946 Project: Hive Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor {noformat} expected:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE, DROP_DATABASE]> but was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE]> Stacktrace java.lang.AssertionError: expected:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE, DROP_DATABASE]> but was:<[CREATE_DATABASE, CREATE_TABLE, ADD_PARTITION, ALTER_PARTITION, DROP_PARTITION, ALTER_TABLE, DROP_TABLE]> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hive.hcatalog.listener.TestNotificationListener.tearDown(TestNotificationListener.java:114) {noformat}
[jira] [Created] (HIVE-11939) TxnDbUtil should turn off jdbc auto commit
Jimmy Xiang created HIVE-11939: -- Summary: TxnDbUtil should turn off jdbc auto commit Key: HIVE-11939 URL: https://issues.apache.org/jira/browse/HIVE-11939 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor TxnDbUtil uses JDBC transactions but doesn't turn off auto-commit, so some TestStreaming tests are flaky. For example, {noformat} testTransactionBatchAbortAndCommit(org.apache.hive.hcatalog.streaming.TestStreaming) Time elapsed: 0.011 sec <<< ERROR! java.sql.SQLException: Table/View 'TXNS' already exists in Schema 'APP'. at org.apache.derby.iapi.error.StandardException.newException(Unknown Source) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.duplicateDescriptorException(Unknown Source) at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown Source) at org.apache.derby.impl.sql.execute.CreateTableConstantAction.executeConstantAction(Unknown Source) at org.apache.derby.impl.sql.execute.MiscResultSet.open(Unknown Source) at org.apache.derby.impl.sql.GenericPreparedStatement.executeStmt(Unknown Source) at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source) at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:72) at org.apache.hadoop.hive.metastore.txn.TxnDbUtil.prepDb(TxnDbUtil.java:131) at org.apache.hive.hcatalog.streaming.TestStreaming.(TestStreaming.java:160) {noformat}
[jira] [Created] (HIVE-11834) Lineage doesn't work with dynamic partitioning query
Jimmy Xiang created HIVE-11834: -- Summary: Lineage doesn't work with dynamic partitioning query Key: HIVE-11834 URL: https://issues.apache.org/jira/browse/HIVE-11834 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang As Mark found out, https://issues.apache.org/jira/browse/HIVE-11139?focusedCommentId=14745937&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14745937 This is indeed a code bug.
[jira] [Created] (HIVE-11817) Window function max NullPointerException
Jimmy Xiang created HIVE-11817: -- Summary: Window function max NullPointerException Key: HIVE-11817 URL: https://issues.apache.org/jira/browse/HIVE-11817 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor This query {noformat} select key, max(value) over (order by key rows between 10 preceding and 20 following) from src1 where length(key) > 10; {noformat} fails with NPE: {noformat} Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:290) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:477) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337) at org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617) at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278) {noformat}
[jira] [Created] (HIVE-11814) Emit query time in lineage info
Jimmy Xiang created HIVE-11814: -- Summary: Emit query time in lineage info Key: HIVE-11814 URL: https://issues.apache.org/jira/browse/HIVE-11814 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, we emit the query start time but not the query duration. It would be nice to have the duration too.
[jira] [Created] (HIVE-11771) Parquet timestamp conversion errors
Jimmy Xiang created HIVE-11771: -- Summary: Parquet timestamp conversion errors Key: HIVE-11771 URL: https://issues.apache.org/jira/browse/HIVE-11771 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang There is a problem reading timestamps written to Parquet files by other tools: the value is wrong after the conversion (not the same as it is meant to be).
[jira] [Created] (HIVE-11737) IndexOutOfBounds compiling query with duplicated groupby keys
Jimmy Xiang created HIVE-11737: -- Summary: IndexOutOfBounds compiling query with duplicated groupby keys Key: HIVE-11737 URL: https://issues.apache.org/jira/browse/HIVE-11737 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang {noformat} SELECT tinyint_col_7, MIN(timestamp_col_1) AS timestamp_col, MAX(LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 476) AS int))) AS int_col, tinyint_col_7 AS int_col_1, LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 476) AS int)) AS int_col_2 FROM table_3 GROUP BY tinyint_col_7, tinyint_col_7, LEAST(CAST(COALESCE(int_col_5, -279) AS int), CAST(COALESCE(tinyint_col_7, 476) AS int)) {noformat} Query compilation fails: {noformat} Caused by: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3 at java.util.ArrayList.rangeCheck(ArrayList.java:635) at java.util.ArrayList.get(ArrayList.java:411) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanReduceSinkOperator(SemanticAnalyzer.java:4633) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5630) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8987) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9864) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9757) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10193) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10204) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10121) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:305) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110) at 
org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104) {noformat}
[jira] [Created] (HIVE-11712) Duplicate groupby keys cause ClassCastException
Jimmy Xiang created HIVE-11712: -- Summary: Duplicate groupby keys cause ClassCastException Key: HIVE-11712 URL: https://issues.apache.org/jira/browse/HIVE-11712 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang With duplicate groupby keys, we could use wrong object inspectors for some groupby expressions, and lead to ClassCastException, for example, {noformat} explain SELECT distinct s1.customer_name as x, s1.customer_name as y FROM default.testv1_staples s1 join default.src s2 on s1.customer_name = s2.key HAVING ( (SUM(s1.customer_balance) <= 4074689.00041) AND (AVG(s1.discount) <= 822) AND (COUNT(s2.value) > 4) {noformat} will lead to {noformat} Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableShortObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFAverage$AbstractGenericUDAFAverageEvaluator.init(GenericUDAFAverage.java:374) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getGenericUDAFInfo(SemanticAnalyzer.java:3887) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanGroupByOperator1(SemanticAnalyzer.java:4354) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggrNoSkew(SemanticAnalyzer.java:5644) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8977) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9849) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9742) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10178) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10189) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10106) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) {noformat} 
[jira] [Created] (HIVE-11620) Fix several qtest output order
Jimmy Xiang created HIVE-11620: -- Summary: Fix several qtest output order Key: HIVE-11620 URL: https://issues.apache.org/jira/browse/HIVE-11620 Project: Hive Issue Type: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor selectDistinctStar.q unionall_unbalancedppd.q vector_cast_constant.q
[jira] [Created] (HIVE-11586) ObjectInspectorFactory.getReflectionObjectInspector is not thread-safe
Jimmy Xiang created HIVE-11586: -- Summary: ObjectInspectorFactory.getReflectionObjectInspector is not thread-safe Key: HIVE-11586 URL: https://issues.apache.org/jira/browse/HIVE-11586 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang ObjectInspectorFactory#getReflectionObjectInspectorNoCache adds a newly created object inspector to the cache before calling its init() method, to allow reusing the cache when dealing with recursive types. As a result, a second thread calling getReflectionObjectInspector can fetch an uninitialized instance of ReflectionStructObjectInspector. Another issue is that if two threads call ObjectInspectorFactory.getReflectionObjectInspector at the same time, one of them could get an object inspector that is not in the cache: they could both call getReflectionObjectInspectorNoCache(), but only one will put its new object inspector into the cache successfully.
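A common pattern for the second race can be sketched with ConcurrentHashMap.putIfAbsent (toy names, not Hive's actual code; note the real factory deliberately caches before init() to handle recursive types, which this simple sketch does not cover): initialize the instance fully before publishing it, and let racing threads converge on whichever instance wins the putIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: initialize fully before publishing to the cache, and
// use putIfAbsent so two racing threads agree on a single cached instance.
public class InspectorCacheSketch {
    static class Inspector {
        volatile boolean initialized = false;
        void init() { initialized = true; } // stand-in for the real init()
    }

    static final ConcurrentMap<String, Inspector> CACHE = new ConcurrentHashMap<>();

    static Inspector get(String type) {
        Inspector cached = CACHE.get(type);
        if (cached != null) return cached;
        Inspector fresh = new Inspector();
        fresh.init();                                     // init BEFORE publishing
        Inspector winner = CACHE.putIfAbsent(type, fresh);
        return winner != null ? winner : fresh;           // everyone sees one instance
    }

    public static void main(String[] args) {
        Inspector a = get("struct<x:int>");
        Inspector b = get("struct<x:int>");
        System.out.println(a.initialized); // true -- never published uninitialized
        System.out.println(a == b);        // true -- single cached instance
    }
}
```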
[jira] [Created] (HIVE-11580) ThriftUnionObjectInspector#toString throws NPE
Jimmy Xiang created HIVE-11580: -- Summary: ThriftUnionObjectInspector#toString throws NPE Key: HIVE-11580 URL: https://issues.apache.org/jira/browse/HIVE-11580 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor ThriftUnionObjectInspector uses toString from StructObjectInspector, which accesses uninitialized member variable fields.
[jira] [Created] (HIVE-11473) Failed to create spark client with Spark 1.5
Jimmy Xiang created HIVE-11473: -- Summary: Failed to create spark client with Spark 1.5 Key: HIVE-11473 URL: https://issues.apache.org/jira/browse/HIVE-11473 Project: Hive Issue Type: Bug Components: Spark Reporter: Jimmy Xiang Assignee: Jimmy Xiang In Spark 1.5, the SparkListener interface has changed, so HoS may fail to create the spark client if an unimplemented event callback method is invoked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11464) lineage info missing if there are multiple outputs
Jimmy Xiang created HIVE-11464: -- Summary: lineage info missing if there are multiple outputs Key: HIVE-11464 URL: https://issues.apache.org/jira/browse/HIVE-11464 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang If there are multiple outputs, for example: {noformat} from (select ...) t insert into table t1 select * from t insert into table t2 select * from t; {noformat} the lineage info for table t2 is not emitted. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11426) lineage3.q fails with -Phadoop-1
Jimmy Xiang created HIVE-11426: -- Summary: lineage3.q fails with -Phadoop-1 Key: HIVE-11426 URL: https://issues.apache.org/jira/browse/HIVE-11426 Project: Hive Issue Type: Bug Components: Test Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Some queries in lineage3.q emit different results with -Phadoop-1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11184) Lineage - ExprProcFactory#getExprString may throw NullPointerException
Jimmy Xiang created HIVE-11184: -- Summary: Lineage - ExprProcFactory#getExprString may throw NullPointerException Key: HIVE-11184 URL: https://issues.apache.org/jira/browse/HIVE-11184 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor ColumnInfo may have null alias. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11139) Emit more lineage information
Jimmy Xiang created HIVE-11139: -- Summary: Emit more lineage information Key: HIVE-11139 URL: https://issues.apache.org/jira/browse/HIVE-11139 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang HIVE-1131 emits some column lineage info, but it doesn't support INSERT or CTAS statements, and it doesn't emit predicate information either. We can enhance and use the dependency information created in HIVE-1131 to generate more complete lineage info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10956) HS2 leaks HMS connections
Jimmy Xiang created HIVE-10956: -- Summary: HS2 leaks HMS connections Key: HIVE-10956 URL: https://issues.apache.org/jira/browse/HIVE-10956 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang HS2 uses a thread-local to cache the HMS client in class Hive. When the thread dies, the HMS client is not closed, so the connection to the HMS is leaked. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
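A minimal sketch of the leak pattern in HIVE-10956 and one mitigation. The class and field names are illustrative, not the actual Hive code:

```java
// Illustrative sketch: a per-thread client held in a ThreadLocal is
// never closed when its owning thread dies, leaking the connection.
// One mitigation is to have long-lived worker threads explicitly
// release the client instead of relying on thread death.
class ClientCache {
    static class Client {
        boolean closed = false;
        void close() { closed = true; }   // stands in for closing the HMS connection
    }

    private static final ThreadLocal<Client> LOCAL =
        ThreadLocal.withInitial(Client::new);

    static Client get() { return LOCAL.get(); }

    // Explicit cleanup: close the cached client and drop the thread-local
    // entry so a later get() on this thread starts fresh.
    static void closeCurrent() {
        Client c = LOCAL.get();
        c.close();
        LOCAL.remove();
    }
}
```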
[jira] [Created] (HIVE-10757) Explain query plan should have operation name EXPLAIN
Jimmy Xiang created HIVE-10757: -- Summary: Explain query plan should have operation name EXPLAIN Key: HIVE-10757 URL: https://issues.apache.org/jira/browse/HIVE-10757 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial In the plan of an Explain query, the operation name is not set to EXPLAIN. Instead, it is set to the operation name of the query itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10740) RpcServer should be restarted if related configuration is changed [Spark Branch]
Jimmy Xiang created HIVE-10740: -- Summary: RpcServer should be restarted if related configuration is changed [Spark Branch] Key: HIVE-10740 URL: https://issues.apache.org/jira/browse/HIVE-10740 Project: Hive Issue Type: Bug Components: Spark Reporter: Jimmy Xiang In reviewing the patch for HIVE-10721, Chengxiang pointed out an existing issue with HoS: the RpcServer is never restarted even if related configurations are changed, as is done for SparkSession. We should monitor the related configurations and restart the RpcServer if any of them changes. It should be restarted while there is no active SparkSession. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
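A sketch of the monitoring proposed in HIVE-10740: snapshot the RPC-related settings, and restart only when that subset changes and no SparkSession is active. The class name and the configuration key used in the test are illustrative assumptions:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: decide whether the RpcServer needs a restart.
class RpcServerMonitor {
    private Map<String, String> lastRpcConf = new HashMap<>();

    boolean shouldRestart(Map<String, String> currentRpcConf, int activeSessions) {
        if (lastRpcConf.equals(currentRpcConf)) {
            return false;                         // nothing relevant changed
        }
        if (activeSessions > 0) {
            return false;                         // defer until no SparkSession is active
        }
        lastRpcConf = new HashMap<>(currentRpcConf);  // snapshot and restart now
        return true;
    }
}
```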
[jira] [Created] (HIVE-10721) SparkSessionManagerImpl leaks SparkSessions [Spark Branch]
Jimmy Xiang created HIVE-10721: -- Summary: SparkSessionManagerImpl leaks SparkSessions [Spark Branch] Key: HIVE-10721 URL: https://issues.apache.org/jira/browse/HIVE-10721 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In #getSession(), we create a SparkSession and save it in a set. If the session fails to open, it stays in the set until shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
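A minimal sketch of the HIVE-10721 leak and its fix. Names are illustrative, not the actual SparkSessionManagerImpl code:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: the session is tracked before open() succeeds,
// so a failed open must remove it again, or it lingers until shutdown.
class SessionManager {
    static class Session {
        private final boolean failOnOpen;
        Session(boolean failOnOpen) { this.failOnOpen = failOnOpen; }
        void open() throws Exception {
            if (failOnOpen) throw new Exception("cannot open session");
        }
    }

    private final Set<Session> sessions =
        Collections.synchronizedSet(new HashSet<>());

    Session getSession(boolean failOnOpen) throws Exception {
        Session s = new Session(failOnOpen);
        sessions.add(s);
        try {
            s.open();
        } catch (Exception e) {
            sessions.remove(s);   // the fix: don't keep sessions that failed to open
            throw e;
        }
        return s;
    }

    int trackedSessions() { return sessions.size(); }
}
```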
[jira] [Created] (HIVE-10499) Ensure Session/ZooKeeperClient instances are closed
Jimmy Xiang created HIVE-10499: -- Summary: Ensure Session/ZooKeeperClient instances are closed Key: HIVE-10499 URL: https://issues.apache.org/jira/browse/HIVE-10499 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Some Session/ZooKeeperClient instances are not closed in some scenarios. We need to make sure they are always closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10473) Spark client is recreated even spark configuration is not changed
Jimmy Xiang created HIVE-10473: -- Summary: Spark client is recreated even spark configuration is not changed Key: HIVE-10473 URL: https://issues.apache.org/jira/browse/HIVE-10473 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, we consider a spark setting changed as long as the set method is called, even if we set it to the same value as before. We should also check whether the value actually changed, since it takes time to start a new spark client. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
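A sketch of the check proposed in HIVE-10473: only mark the configuration dirty, and thus force an expensive client restart, when the value actually differs. Names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative sketch: value-comparing setter for Spark settings.
class SparkConfTracker {
    private final Map<String, String> conf = new HashMap<>();
    private boolean dirty = false;

    void set(String key, String value) {
        String old = conf.put(key, value);
        if (!Objects.equals(old, value)) {
            dirty = true;                 // a real change: client must be recreated
        }
    }

    // Called after a client has been (re)created with the current conf.
    void clientCreated() { dirty = false; }

    boolean needsNewClient() { return dirty; }
}
```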
[jira] [Created] (HIVE-10365) First job fails with StackOverflowError
Jimmy Xiang created HIVE-10365: -- Summary: First job fails with StackOverflowError Key: HIVE-10365 URL: https://issues.apache.org/jira/browse/HIVE-10365 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang When running some queries on Yarn with standalone Hadoop, the first query fails with StackOverflowError: {noformat} java.lang.StackOverflowError at java.util.concurrent.ConcurrentHashMap.hash(ConcurrentHashMap.java:333) at java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1145) at java.lang.ClassLoader.getClassLoadingLock(ClassLoader.java:464) at java.lang.ClassLoader.loadClass(ClassLoader.java:405) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) at java.lang.ClassLoader.loadClass(ClassLoader.java:412) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10167) HS2 logs the server started only before the server is shut down
Jimmy Xiang created HIVE-10167: -- Summary: HS2 logs the server started only before the server is shut down Key: HIVE-10167 URL: https://issues.apache.org/jira/browse/HIVE-10167 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial TThreadPoolServer#serve() blocks till the server is down. We should log before that. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10146) Not count session as idle if query is running
Jimmy Xiang created HIVE-10146: -- Summary: Not count session as idle if query is running Key: HIVE-10146 URL: https://issues.apache.org/jira/browse/HIVE-10146 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Currently, as long as there is no activity, we consider the HS2 session idle. This makes it very hard to set HIVE_SERVER2_IDLE_SESSION_TIMEOUT: if we don't set it long enough, an unattended query could be killed. We should provide an option not to count the session as idle if some query is still running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
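A sketch of the option proposed in HIVE-10146: with the operation check enabled, a session with running operations is never considered idle, so the idle timeout only reaps truly inactive sessions. Method and parameter names are illustrative:

```java
// Illustrative sketch: idle-timeout decision for an HS2 session.
class IdleCheck {
    static boolean shouldTimeOut(long lastAccessMillis, long nowMillis,
                                 long idleTimeoutMillis, int runningOperations,
                                 boolean checkOperation) {
        if (idleTimeoutMillis <= 0) {
            return false;                          // timeout disabled
        }
        if (checkOperation && runningOperations > 0) {
            return false;                          // busy sessions are never idle
        }
        return nowMillis - lastAccessMillis > idleTimeoutMillis;
    }
}
```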
[jira] [Created] (HIVE-10109) Creating index in a different db should not be allowed
Jimmy Xiang created HIVE-10109: -- Summary: Creating index in a different db should not be allowed Key: HIVE-10109 URL: https://issues.apache.org/jira/browse/HIVE-10109 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang When creating an index with an "in table" clause, you can specify an index table name like db.index_table, with a db different from the one the base table belongs to. However, you can't drop this index any more, since Hive assumes the index is always in the same db as the base table. We should not allow a qualified index table name in the "in table" clause when creating an index. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10108) Index#getIndexTableName() returns db.index_table_name
Jimmy Xiang created HIVE-10108: -- Summary: Index#getIndexTableName() returns db.index_table_name Key: HIVE-10108 URL: https://issues.apache.org/jira/browse/HIVE-10108 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Index#getIndexTableName() used to return just the index table name. Now it returns a qualified table name. This change was introduced in HIVE-3781. As a result, IMetaStoreClient#getTable(index.getDbName(), index.getIndexTableName()) throws ObjectNotFoundException. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
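A sketch of why the lookup in HIVE-10108 fails: callers pass the now-qualified "db.table" string where a bare table name is expected. Splitting the name first is one illustrative workaround; the helper below is hypothetical, not a Hive API:

```java
// Illustrative helper: split a possibly qualified "db.table" name into
// {db, table}, falling back to a default db for unqualified names.
class IndexNameUtil {
    static String[] splitQualified(String name, String defaultDb) {
        int dot = name.indexOf('.');
        if (dot < 0) {
            return new String[] { defaultDb, name };
        }
        return new String[] { name.substring(0, dot), name.substring(dot + 1) };
    }
}
```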
[jira] [Created] (HIVE-10073) Runtime exception when querying HBase with Spark [Spark Branch]
Jimmy Xiang created HIVE-10073: -- Summary: Runtime exception when querying HBase with Spark [Spark Branch] Key: HIVE-10073 URL: https://issues.apache.org/jira/browse/HIVE-10073 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Jimmy Xiang When querying HBase with Spark, we got {noformat} Caused by: java.lang.IllegalArgumentException: Must specify table name at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.setConf(TableOutputFormat.java:188) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:276) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveOutputFormat(HiveFileFormatUtils.java:266) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:331) {noformat} But it works fine for MapReduce. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10065) jp
Jimmy Xiang created HIVE-10065: -- Summary: jp Key: HIVE-10065 URL: https://issues.apache.org/jira/browse/HIVE-10065 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10023) Fix more cache related concurrency issue
Jimmy Xiang created HIVE-10023: -- Summary: Fix more cache related concurrency issue Key: HIVE-10023 URL: https://issues.apache.org/jira/browse/HIVE-10023 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Searched the code and found a couple more cache-related issues, such as in LazyBinaryObjectInspectorFactory, PrimitiveObjectInspectorFactory, and TypeInfoFactory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10009) LazyObjectInspectorFactory is not thread safe [Spark Branch]
Jimmy Xiang created HIVE-10009: -- Summary: LazyObjectInspectorFactory is not thread safe [Spark Branch] Key: HIVE-10009 URL: https://issues.apache.org/jira/browse/HIVE-10009 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang LazyObjectInspectorFactory is not thread safe, which causes random failures in multi-threaded environments such as Hive on Spark. We got exceptions like the one below: {noformat} java.lang.RuntimeException: Map operator initialization failed: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:55) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:170) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277) at org.apache.spark.rdd.RDD.iterator(RDD.scala:244) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassCastException: 
org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:154) at org.apache.hadoop.hive.ql.exec.MapOperator.initObjectInspector(MapOperator.java:199) at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:355) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:92) ... 16 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9993) Retrying task could use cached bad operators [Spark Branch]
Jimmy Xiang created HIVE-9993: - Summary: Retrying task could use cached bad operators [Spark Branch] Key: HIVE-9993 URL: https://issues.apache.org/jira/browse/HIVE-9993 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch A Spark task could be retried on the same executor in case of failures. In retrying, the cached task could be reused. Since the operators in the task are already initialized, they won't be initialized again, and the partial data in these operators could lead to wrong final results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
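A minimal sketch of the HIVE-9993 hazard: a retried task reuses a cached operator whose buffers still hold partial data from the failed attempt. Guarding initialization on the attempt id, as below, forces a clean re-init; the class and fields are illustrative, not the actual Hive operator code:

```java
// Illustrative sketch: an operator cached across task attempts.
class CachedOperator {
    private int initializedForAttempt = -1;
    private int bufferedRows = 0;

    void initialize(int attemptId) {
        if (initializedForAttempt == attemptId) {
            return;                      // same attempt: already initialized
        }
        bufferedRows = 0;                // new attempt: discard partial state
        initializedForAttempt = attemptId;
    }

    void process() { bufferedRows++; }   // stands in for buffering a row

    int bufferedRows() { return bufferedRows; }
}
```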
[jira] [Created] (HIVE-9935) Fix tests for java 1.8 [Spark Branch]
Jimmy Xiang created HIVE-9935: - Summary: Fix tests for java 1.8 [Spark Branch] Key: HIVE-9935 URL: https://issues.apache.org/jira/browse/HIVE-9935 Project: Hive Issue Type: Bug Components: spark-branch Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch In the spark branch, these tests don't have java 1.8 golden files: join0.q, list_bucket_dml_2, subquery_multiinsert.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9929) StatsUtil#getAvailableMemory could return negative value
Jimmy Xiang created HIVE-9929: - Summary: StatsUtil#getAvailableMemory could return negative value Key: HIVE-9929 URL: https://issues.apache.org/jira/browse/HIVE-9929 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang In MAPREDUCE-5785, the default value of mapreduce.map.memory.mb is set to -1. We need to fix StatsUtil#getAvailableMemory so it doesn't return a negative value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
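The fix shape for HIVE-9929 in one line: treat a non-positive configured value as "unset" and fall back to a default. This is an illustrative sketch with a hypothetical fallback value, not the actual StatsUtil code:

```java
// Illustrative sketch: never surface a negative memory value to callers.
class MemoryUtil {
    static final long DEFAULT_MEMORY_MB = 1024;  // hypothetical fallback

    static long getAvailableMemoryMb(long configuredMb) {
        // mapreduce.map.memory.mb may be -1 after MAPREDUCE-5785.
        return configuredMb > 0 ? configuredMb : DEFAULT_MEMORY_MB;
    }
}
```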
[jira] [Created] (HIVE-9902) Map join small table files need more replications [Spark Branch]
Jimmy Xiang created HIVE-9902: - Summary: Map join small table files need more replications [Spark Branch] Key: HIVE-9902 URL: https://issues.apache.org/jira/browse/HIVE-9902 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch We have {noformat} replication = (short) Math.min(MIN_REPLICATION, numOfPartitions); {noformat} It should be {noformat} replication = (short) Math.max(MIN_REPLICATION, numOfPartitions); {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
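The one-character bug from HIVE-9902, side by side. MIN_REPLICATION is meant to be a floor on the replication count, so the correct combinator is max, not min; the MIN_REPLICATION value below is illustrative:

```java
// Illustrative sketch of the buggy and fixed expressions.
class ReplicationCalc {
    static final short MIN_REPLICATION = 10;   // hypothetical floor value

    static short buggy(int numOfPartitions) {
        // min() caps replication at the floor instead of enforcing it
        return (short) Math.min(MIN_REPLICATION, numOfPartitions);
    }

    static short fixed(int numOfPartitions) {
        // max() enforces at least MIN_REPLICATION replicas
        return (short) Math.max(MIN_REPLICATION, numOfPartitions);
    }
}
```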
[jira] [Created] (HIVE-9861) Add spark-assembly on Hive's classpath
Jimmy Xiang created HIVE-9861: - Summary: Add spark-assembly on Hive's classpath Key: HIVE-9861 URL: https://issues.apache.org/jira/browse/HIVE-9861 Project: Hive Issue Type: Task Components: Spark Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: spark-branch If SPARK_HOME is set, or we can find out the SPARK_HOME, we should add Spark assembly to the classpath. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9847) Hive should not allow additional attempts when RSC fails [Spark Branch]
Jimmy Xiang created HIVE-9847: - Summary: Hive should not allow additional attempts when RSC fails [Spark Branch] Key: HIVE-9847 URL: https://issues.apache.org/jira/browse/HIVE-9847 Project: Hive Issue Type: Bug Components: Spark Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial Fix For: spark-branch In yarn-cluster mode, if the RSC fails the first time, yarn will restart it. HoS should set "yarn.resourcemanager.am.max-attempts" to 1 to disallow such restarting when submitting Spark jobs to Yarn in cluster mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9830) Test auto_sortmerge_join_8 is flaky
Jimmy Xiang created HIVE-9830: - Summary: Test auto_sortmerge_join_8 is flaky Key: HIVE-9830 URL: https://issues.apache.org/jira/browse/HIVE-9830 Project: Hive Issue Type: Bug Components: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Fix For: spark-branch We found that auto_sortmerge_join_8 is flaky for Spark. Sometimes the output could be wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9804) Turn on some kryo settings by default for Spark
Jimmy Xiang created HIVE-9804: - Summary: Turn on some kryo settings by default for Spark Key: HIVE-9804 URL: https://issues.apache.org/jira/browse/HIVE-9804 Project: Hive Issue Type: Improvement Reporter: Jimmy Xiang Assignee: Jimmy Xiang Disabling reference checking and setting classesToRegister for Spark can boost performance. We should do so by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
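The settings referred to in HIVE-9804 are Spark's kryo serializer options. The property keys below are real Spark configuration keys; the class and the particular classes registered are illustrative assumptions, not necessarily what Hive ships:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposed defaults for HoS.
class KryoDefaults {
    static Map<String, String> defaults() {
        Map<String, String> conf = new HashMap<>();
        // Skip reference tracking: faster when object graphs have no cycles.
        conf.put("spark.kryo.referenceTracking", "false");
        // Pre-register hot classes so kryo writes small ids, not class names
        // (the classes listed here are assumed examples).
        conf.put("spark.kryo.classesToRegister",
                 "org.apache.hadoop.hive.ql.io.HiveKey,"
                 + "org.apache.hadoop.io.BytesWritable");
        return conf;
    }
}
```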
[jira] [Created] (HIVE-9772) Hive parquet timestamp conversion doesn't work with new Parquet
Jimmy Xiang created HIVE-9772: - Summary: Hive parquet timestamp conversion doesn't work with new Parquet Key: HIVE-9772 URL: https://issues.apache.org/jira/browse/HIVE-9772 Project: Hive Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang Currently Hive handles the parquet timestamp compatibility issue with ParquetInputSplit#readSupportMetadata. Parquet has deprecated read support metadata, so we can't pass metadata around with readSupportMetadata any more. We need a new scheme to pass around the timestamp conversion info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9659: - Assignee: Jimmy Xiang > 'Error while trying to create table container' occurs during hive query case > execution when hive.optimize.skewjoin set to 'true' [Spark Branch] > --- > > Key: HIVE-9659 > URL: https://issues.apache.org/jira/browse/HIVE-9659 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xin Hao >Assignee: Jimmy Xiang > > We found that 'Error while trying to create table container' occurs during > Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. > If hive.optimize.skewjoin is set to 'false', the case passes. > How to reproduce: > 1. set hive.optimize.skewjoin=true; > 2. Run BigBench case Q12 and it will fail. > Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you > will find the error 'Error while trying to create table container' in the log > and also a NullPointerException near the end of the log. 
> (a) Detail error message for 'Error while trying to create table container': > {noformat} > 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to > create table container > org.apache.hadoop.hive.ql.metadata.HiveException: > org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to > create table container > at > org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) > at > org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) > at > org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) > at > org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) > at > org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) > at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while > trying to create table container > at > org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) > at > org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) > ... 21 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a > directory: > hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable > at > org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) > ... 22 more > 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 > 15/02/12 01:29:49 INFO PerfLogger: from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler> > {noformat} > (b) Detail error message for NullPointerException: > {noformat} > 5/02/12 01:29:50 ERROR MapJoinOperat
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14323537#comment-14323537 ] Jimmy Xiang commented on HIVE-9659: --- I just ran the same query with just a small data set with skew join enabled. > 'Error while trying to create table container' occurs during hive query case > execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321013#comment-14321013 ] Jimmy Xiang commented on HIVE-9659: --- I can reproduce this issue with a tiny data set. > 'Error while trying to create table container' occurs during hive query case > execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[jira] [Commented] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320906#comment-14320906 ] Jimmy Xiang commented on HIVE-9659: --- How big is the data set? Does it work with a small data set?
[jira] [Assigned] (HIVE-9649) Skip orderBy or sortBy for intermediate result [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9649: - Assignee: Jimmy Xiang > Skip orderBy or sortBy for intermediate result [Spark Branch] > - > > Key: HIVE-9649 > URL: https://issues.apache.org/jira/browse/HIVE-9649 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Jimmy Xiang > > orderBy and sortBy seem relevant only to the final result. Thus, in most > cases, if not all, orderBy/sortBy can be ignored for efficiency. This JIRA is > to identify the opportunity and optimize accordingly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
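[Editor's note] The intuition behind HIVE-9649 can be made concrete: a hash-based aggregation produces the same groups whether or not its input was sorted, so an orderBy/sortBy feeding an intermediate stage is wasted work. A minimal sketch in plain Java; the class and method names are illustrative, not Hive's actual planner code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SkipIntermediateSort {
    // Hash-based "group by key, sum(value)": the result is independent of input order,
    // so sorting the intermediate rows first buys nothing.
    static Map<String, Integer> sumByKey(List<String[]> rows) {
        Map<String, Integer> sums = new HashMap<>();
        for (String[] row : rows) {
            sums.merge(row[0], Integer.parseInt(row[1]), Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String[]> unsorted = Arrays.asList(
            new String[]{"b", "2"}, new String[]{"a", "1"}, new String[]{"a", "3"});
        List<String[]> sorted = Arrays.asList(
            new String[]{"a", "1"}, new String[]{"a", "3"}, new String[]{"b", "2"});
        // Same aggregation result either way, so the intermediate sort can be dropped.
        System.out.println(sumByKey(unsorted).equals(sumByKey(sorted))); // true
    }
}
```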
[jira] [Resolved] (HIVE-9075) Allow RPC Configuration [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HIVE-9075. --- Resolution: Fixed This issue appears to be fixed already with HIVE-9337. > Allow RPC Configuration [Spark Branch] > -- > > Key: HIVE-9075 > URL: https://issues.apache.org/jira/browse/HIVE-9075 > Project: Hive > Issue Type: Sub-task >Affects Versions: spark-branch >Reporter: Brock Noland > > [~vanzin] has a bunch of nice config properties in RpcConfiguration: > https://github.com/apache/hive/blob/spark/spark-client/src/main/java/org/apache/hive/spark/client/rpc/RpcConfiguration.java#L68 > However, we only load config properties which start with spark: > https://github.com/apache/hive/blob/spark/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveSparkClientFactory.java#L102 > thus it's not possible to set these on the server.
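[Editor's note] The loading behavior HIVE-9075 complains about amounts to a simple prefix filter: only keys starting with "spark." survive, so any RPC property under a different prefix never reaches the client. A hypothetical reconstruction (not the actual HiveSparkClientFactory code; the sample property names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class SparkConfFilter {
    // Only properties whose key starts with "spark." are forwarded;
    // anything else (e.g. RPC settings under another prefix) is silently dropped.
    static Map<String, String> loadSparkConf(Properties hiveConf) {
        Map<String, String> sparkConf = new HashMap<>();
        for (String key : hiveConf.stringPropertyNames()) {
            if (key.startsWith("spark.")) {
                sparkConf.put(key, hiveConf.getProperty(key));
            }
        }
        return sparkConf;
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("spark.master", "local");            // kept
        p.setProperty("hive.rpc.example.setting", "100");  // dropped: wrong prefix
        Map<String, String> conf = loadSparkConf(p);
        System.out.println(conf.containsKey("spark.master"));           // true
        System.out.println(conf.containsKey("hive.rpc.example.setting")); // false
    }
}
```

This is why the RpcConfiguration properties could not be set on the server side until the fix referenced above.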
[jira] [Updated] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9646: -- Status: Patch Available (was: Open) > Beeline doesn't show Spark job progress info [Spark Branch] > --- > > Key: HIVE-9646 > URL: https://issues.apache.org/jira/browse/HIVE-9646 > Project: Hive > Issue Type: Bug >Affects Versions: spark-branch >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang > Attachments: HIVE-9646.1-spark.patch > > > Beeline can show MR job progress info, but can't show that of Spark job. CLI > doesn't have this problem.
[jira] [Updated] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9646: -- Attachment: HIVE-9646.1-spark.patch
[jira] [Commented] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315273#comment-14315273 ] Jimmy Xiang commented on HIVE-9646: --- It turns out that the HIVE-9121 change was lost somehow, probably during merging.
[jira] [Commented] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14315234#comment-14315234 ] Jimmy Xiang commented on HIVE-9646: --- Such logs seem to be filtered out by default. Let me update the filter a little.
[jira] [Created] (HIVE-9646) Beeline doesn't show Spark job progress info [Spark Branch]
Jimmy Xiang created HIVE-9646: - Summary: Beeline doesn't show Spark job progress info [Spark Branch] Key: HIVE-9646 URL: https://issues.apache.org/jira/browse/HIVE-9646 Project: Hive Issue Type: Bug Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Beeline can show MR job progress info, but can't show that of Spark job. CLI doesn't have this problem.
[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314722#comment-14314722 ] Jimmy Xiang commented on HIVE-9574: --- Test index_auto_mult_tables is ok for me on my box. > Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark > Branch] > > > Key: HIVE-9574 > URL: https://issues.apache.org/jira/browse/HIVE-9574 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Rui Li >Assignee: Jimmy Xiang > Fix For: spark-branch > > Attachments: HIVE-9574.1-spark.patch, HIVE-9574.2-spark.patch, > HIVE-9574.3-spark.patch, HIVE-9574.4-spark.patch, HIVE-9574.5-spark.patch, > HIVE-9574.6-spark.patch > > > {{RowContainer.first}} may call {{InputFormat.getSplits}}, which is > expensive. If we switch {{container}} and {{backupContainer}} frequently in > {{HiveKVResultCache}}, it will degrade performance.
[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314491#comment-14314491 ] Jimmy Xiang commented on HIVE-9574: --- Cool, thanks. Attached v6 that addressed more minor review comments.
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.6-spark.patch
[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9627: -- Attachment: HIVE-9627.1-spark.patch > Add cbo_gby_empty.q.out for Spark [Spark Branch] > > > Key: HIVE-9627 > URL: https://issues.apache.org/jira/browse/HIVE-9627 > Project: Hive > Issue Type: Test >Affects Versions: spark-branch >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Trivial > Attachments: HIVE-9627.1-spark.patch > > > The golden file cbo_gby_empty.q.out for Spark is missing.
[jira] [Updated] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9627: -- Status: Patch Available (was: Open)
[jira] [Created] (HIVE-9627) Add cbo_gby_empty.q.out for Spark [Spark Branch]
Jimmy Xiang created HIVE-9627: - Summary: Add cbo_gby_empty.q.out for Spark [Spark Branch] Key: HIVE-9627 URL: https://issues.apache.org/jira/browse/HIVE-9627 Project: Hive Issue Type: Test Affects Versions: spark-branch Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Trivial The golden file cbo_gby_empty.q.out for Spark is missing.
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.5-spark.patch Attached v5 that addressed review comments from Xuefu and Rui. Made the cache thread-safe.
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.4-spark.patch Another minor optimization.
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.3-spark.patch Attached v3 with a minor fix.
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.2-spark.patch Attached v2 that removed aboutToSpill(). After discussing it with Szehon and Chao, it seems there is no need to wait till the cache/buffer is full.
[jira] [Commented] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309648#comment-14309648 ] Jimmy Xiang commented on HIVE-9574: --- Patch 1 is on RB: https://reviews.apache.org/r/30739/
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Fix Version/s: spark-branch Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9574: -- Attachment: HIVE-9574.1-spark.patch
[jira] [Assigned] (HIVE-9574) Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9574: - Assignee: Jimmy Xiang
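[Editor's note] The HIVE-9574 concern (expensive RowContainer.first calls when the cache flips between its in-memory container and its spilled backup on every access) can be illustrated with a toy two-buffer cache. This is a stripped-down sketch, not Hive's actual HiveKVResultCache: the point is that draining the cheap buffer completely before touching the backup keeps the number of expensive backup accesses low.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class TwoBufferCache<T> {
    private final int capacity;
    private final Queue<T> memory = new ArrayDeque<>();
    private final Queue<T> backup = new ArrayDeque<>(); // stands in for a spilled RowContainer
    int backupReads = 0; // counts the "expensive" accesses to the backup

    TwoBufferCache(int capacity) {
        this.capacity = capacity;
    }

    void add(T value) {
        // Spill to the backup only once the in-memory buffer is full.
        if (memory.size() < capacity) {
            memory.add(value);
        } else {
            backup.add(value);
        }
    }

    T next() {
        // Drain the cheap in-memory buffer completely before touching the
        // backup, instead of alternating between the two on every call.
        if (!memory.isEmpty()) {
            return memory.poll();
        }
        backupReads++;
        return backup.poll();
    }
}
```

With capacity 2 and three cached items, only the third read pays a backup access; a policy that switched containers on each read would pay that cost far more often, which is the performance regression the patches above address.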
[jira] [Resolved] (HIVE-9339) Optimize split grouping for CombineHiveInputFormat [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HIVE-9339. --- Resolution: Won't Fix We will depend on CombineFileInputFormat to do the split grouping. Closing the issue for now. > Optimize split grouping for CombineHiveInputFormat [Spark Branch] > - > > Key: HIVE-9339 > URL: https://issues.apache.org/jira/browse/HIVE-9339 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Jimmy Xiang > > It seems that split generation, especially in terms of grouping inputs, needs > to be improved. For this, we may need cluster information. Because of this, > we will first try to solve the problem for Spark. > As to cluster information, Spark doesn't provide an API for it (SPARK-5080). > However, Spark does have a listener API, with which the Spark driver can get > notifications about executors going up/down, tasks starting/finishing, etc. > With this information, the Spark client should be able to have a view of the > current cluster image. > Spark developers mentioned that the listener can only be created after > SparkContext is started, at which time some executors may have already > started and so the listener will miss some information. This can be fixed. > File a JIRA with the Spark project if necessary.
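[Editor's note] The bookkeeping described in HIVE-9339 (building a cluster view from executor up/down notifications) can be sketched without any Spark dependency. The real implementation would register a Spark listener; the class and method names below mirror that idea but are illustrative only:

```java
import java.util.HashSet;
import java.util.Set;

public class ClusterView {
    private final Set<String> liveExecutors = new HashSet<>();

    // In Spark these would be listener callbacks; here they are plain
    // methods so the bookkeeping itself is easy to see and test.
    void onExecutorAdded(String executorId) {
        liveExecutors.add(executorId);
    }

    void onExecutorRemoved(String executorId) {
        liveExecutors.remove(executorId);
    }

    int liveExecutorCount() {
        return liveExecutors.size();
    }
}
```

Note the gap the comment mentions: since a listener can only be registered after SparkContext starts, any executor that came up before registration would be missing from this view until some later event mentions it.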
[jira] [Resolved] (HIVE-8853) Make vectorization work with Spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HIVE-8853. --- Resolution: Fixed Fix Version/s: spark-branch Vectorization works with Spark now. Closing this JIRA. > Make vectorization work with Spark [Spark Branch] > - > > Key: HIVE-8853 > URL: https://issues.apache.org/jira/browse/HIVE-8853 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Jimmy Xiang > Fix For: spark-branch > > > In Hive, to make vectorization work, the reader also needs to be vectorized, > which means that the reader can read a chunk of rows (or a list of column > chunks) instead of one row at a time. However, we use Spark RDD for reading, > which in turn utilizes the underlying InputFormat to read. Subsequent > processing also needs to happen in batches. We need to make sure that > vectorization is working as expected.
[jira] [Updated] (HIVE-9492) Enable caching in MapInput for Spark
[ https://issues.apache.org/jira/browse/HIVE-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HIVE-9492: -- Status: Open (was: Patch Available) > Enable caching in MapInput for Spark > > > Key: HIVE-9492 > URL: https://issues.apache.org/jira/browse/HIVE-9492 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Xuefu Zhang >Assignee: Jimmy Xiang > Fix For: spark-branch > > Attachments: HIVE-9492.1-spark.patch, HIVE-9492.2-spark.patch, > prototype.patch > > > Because of the IOContext problem (HIVE-8920, HIVE-9084), RDD caching is > currently disabled in MapInput. Prototyping shows that the problem can be > solved. Thus, we should formalize the prototype and enable the caching. A > good query to test this is: > {code} > from (select * from dec union all select * from dec2) s > insert overwrite table dec3 select s.name, sum(s.value) group by s.name > insert overwrite table dec4 select s.name, s.value order by s.value; > {code}
[jira] [Commented] (HIVE-9492) Enable caching in MapInput for Spark
[ https://issues.apache.org/jira/browse/HIVE-9492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301472#comment-14301472 ] Jimmy Xiang commented on HIVE-9492: --- V2 is uploaded to RB: https://reviews.apache.org/r/30502/
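[Editor's note] The HIVE-9492 test query reads one union but feeds two insert branches; without RDD caching, the shared input is recomputed once per sink. The benefit of caching can be sketched as simple memoization of the shared input (plain Java; the real fix lives in Hive's MapInput/RDD layer, and this class is purely illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class CachedInput<T> {
    private final Supplier<List<T>> compute; // stands in for reading/uniting the source tables
    private List<T> cached;
    int computations = 0; // counts how many times the source is actually read

    CachedInput(Supplier<List<T>> compute) {
        this.compute = compute;
    }

    List<T> get() {
        if (cached == null) { // compute once, reuse for every downstream sink
            computations++;
            cached = compute.get();
        }
        return cached;
    }

    public static void main(String[] args) {
        CachedInput<Integer> src = new CachedInput<>(() -> Arrays.asList(1, 2, 3));
        List<Integer> forGroupBy = src.get(); // first insert branch
        List<Integer> forOrderBy = src.get(); // second insert branch reuses the cache
        System.out.println(src.computations); // 1
    }
}
```

Without the cache, each of the two insert branches in the query above would trigger its own scan of the union, doubling the read work.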