[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027483#comment-14027483 ] Hive QA commented on HIVE-6394: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649609/HIVE-6394.6.patch {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 5612 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_dml org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/431/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-431/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649609 Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
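For readers following this sub-task: Parquet's convention for timestamps at the time was the INT96 layout, 8 bytes of nanoseconds within the day plus a 4-byte Julian day. A minimal conversion sketch in that spirit (illustrative only; the class and method names here are invented, and the patch's actual code and timezone semantics may differ):
{code}
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampToInt96Sketch {
  // Julian Day Number of the Unix epoch (1970-01-01).
  private static final long JDN_OF_EPOCH = 2440588L;

  /** Returns {julianDay, nanosOfDay}, the two INT96 components. */
  public static long[] toJulianDayAndNanos(Timestamp ts) {
    // Interpreting the instant in UTC here is an assumption for the sketch.
    LocalDateTime utc = LocalDateTime.ofInstant(ts.toInstant(), ZoneOffset.UTC);
    long julianDay = utc.toLocalDate().toEpochDay() + JDN_OF_EPOCH;
    long nanosOfDay = utc.toLocalTime().toNanoOfDay();
    return new long[] { julianDay, nanosOfDay };
  }

  public static void main(String[] args) {
    long[] parts = toJulianDayAndNanos(Timestamp.valueOf("2014-06-11 00:00:01"));
    System.out.println("julianDay=" + parts[0] + " nanosOfDay=" + parts[1]);
  }
}
{code}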
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027501#comment-14027501 ] SUYEON LEE commented on HIVE-7183: -- [~swarnim] What is the meaning of non-binding? Do you know how to change this issue's status to 'solved' or 'patch-available'? Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
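With the stripped operators restored, the one-line fix the description calls for is clear. A sketch of the corrected region (hedged: the surrounding ObjectStore#removeRole() context is assumed, and this is the reporter's proposed change, not a committed patch):
{code}
List<MPartitionColumnPrivilege> partColumnGrants =
    listPrincipalAllPartitionColumnGrants(mRol.getRoleName(), PrincipalType.ROLE);
// Guard on partColumnGrants -- the list actually passed to
// deletePersistentAll() -- rather than the unrelated tblColumnGrants.
if (partColumnGrants.size() > 0) {
  pm.deletePersistentAll(partColumnGrants);
}
{code}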
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-5771: - Attachment: HIVE-5771.11.patch Fixed the major bugs in the last patch. Thanks Ashutosh for verifying this patch. Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants during the UDF initialization phase; however, that is still a runtime evaluation, and it doesn't propagate constants from a subquery outward. Introducing such an optimizer may reduce I/O and accelerate processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Xu updated HIVE-5771: - Status: Patch Available (was: Open) Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants during the UDF initialization phase; however, that is still a runtime evaluation, and it doesn't propagate constants from a subquery outward. Introducing such an optimizer may reduce I/O and accelerate processing. -- This message was sent by Atlassian JIRA (v6.2#6252)
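For readers new to the idea: constant folding evaluates constant subexpressions once at compile time, and propagation then carries the results across operator boundaries. A self-contained toy folder over a binary expression tree (purely illustrative; Hive's optimizer works over its own expression and operator classes, not these types):
{code}
// Toy constant folder: turns (1 + 2) + x into 3 + x before execution.
abstract class Expr { abstract Expr fold(); }

class Const extends Expr {
  final long value;
  Const(long value) { this.value = value; }
  Expr fold() { return this; }
  public String toString() { return Long.toString(value); }
}

class Add extends Expr {
  final Expr left, right;
  Add(Expr left, Expr right) { this.left = left; this.right = right; }
  Expr fold() {
    Expr l = left.fold(), r = right.fold();
    if (l instanceof Const && r instanceof Const) {
      // Both children folded to constants: evaluate once at compile time
      // instead of once per row at runtime.
      return new Const(((Const) l).value + ((Const) r).value);
    }
    return new Add(l, r);
  }
  public String toString() { return "(" + left + " + " + right + ")"; }
}
{code}
Propagation extends folding across query boundaries: a constant proven in a subquery's select list can be substituted into outer predicates before folding, which is the piece the description notes HIVE-2470 left out.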
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Patch Available (was: Open) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator expressions:
is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode: mergepartial
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: HIVE-7188.1.patch The current implementation of ColAndCol is buggy. I am modifying the evaluate() of ColAndCol. I will add test cases in the next patch and upload it for review. Thanks Hari sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce
Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint)
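For intuition about where such a vectorized boolean expression can go wrong: a ColAndCol-style operator combines two boolean columns (stored as 0/1 longs) and must honor the batch's selection vector as well as null and repeating-value flags; missing one of those branches produces silently wrong aggregates like the 0 above. A simplified sketch under those assumptions (names invented; this is not Hive's actual ColAndCol):
{code}
public class VectorAndSketch {
  // Simplified column-AND over boolean columns stored as 0/1 longs.
  // Hive's real ColAndCol must additionally handle isRepeating vectors and
  // SQL three-valued logic (e.g. FALSE AND NULL is FALSE, not NULL).
  static void colAndCol(long[] a, long[] b, long[] out,
                        int n, int[] selected, boolean selectedInUse) {
    if (selectedInUse) {
      for (int j = 0; j < n; j++) {
        int i = selected[j];   // only rows picked by the selection vector
        out[i] = a[i] & b[i];  // bitwise AND of 0/1 values
      }
    } else {
      for (int i = 0; i < n; i++) {
        out[i] = a[i] & b[i];
      }
    }
  }
}
{code}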
[jira] [Commented] (HIVE-7204) Use NULL vertex location hint for Prewarm DAG vertices
[ https://issues.apache.org/jira/browse/HIVE-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027609#comment-14027609 ] Hive QA commented on HIVE-7204: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649522/HIVE-7204.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5534 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/432/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/432/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-432/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649522 Use NULL vertex location hint for Prewarm DAG vertices -- Key: HIVE-7204 URL: https://issues.apache.org/jira/browse/HIVE-7204 Project: Hive Issue Type: Sub-task Components: Tez Affects Versions: 0.14.0 Reporter: Gopal V Assignee: Gopal V Priority: Minor Attachments: HIVE-7204.1.patch The current 0.5.x branch of Tez added extra preconditions which check that the vertex location hints match the parallelism (i.e. the number of containers) set for the vertex. {code} Caused by: org.apache.hadoop.ipc.RemoteException(java.lang.IllegalArgumentException): Locations array length must match the parallelism set for the vertex at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88) at org.apache.tez.dag.api.Vertex.setTaskLocationsHint(Vertex.java:105) at org.apache.tez.dag.app.DAGAppMaster.startPreWarmContainers(DAGAppMaster.java:1004) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
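The stack trace shows a Guava precondition comparing the hint length against the vertex parallelism; a NULL hint, as the title proposes, sidesteps that comparison entirely. A hedged sketch of the shape of such a check (illustrative; not Tez's actual source):
{code}
import com.google.common.base.Preconditions;
import java.util.List;

public class LocationHintCheckSketch {
  static void setTaskLocationsHint(List<String> locations, int parallelism) {
    if (locations == null) {
      return; // NULL hint: no placement constraint, nothing to validate
    }
    Preconditions.checkArgument(locations.size() == parallelism,
        "Locations array length must match the parallelism set for the vertex");
  }
}
{code}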
FW: HiveServer2 VS HiveServer1 Logging
Any chance somebody has a clue about this? From: Dima Machlin [mailto:dima.mach...@pursway.com] Sent: Sunday, May 25, 2014 1:54 PM To: u...@hive.apache.org Subject: RE: HiveServer2 VS HiveServer1 Logging I’ve made some progress in investigating this. It seems that this behavior happens under certain conditions. As long as I’m running any query that isn’t a “set” or “add” command, the logging is fine. For example, “show tables”: 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: openSession() 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: getInfo() 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.run 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=TimeToSubmit 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=compile 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=parse 14/05/25 13:47:18 INFO parse.ParseDriver: Parsing command: show tables 14/05/25 13:47:18 INFO parse.ParseDriver: Parse Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=parse start=1401014838047 end=1401014838376 duration=329 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=semanticAnalyze 14/05/25 13:47:18 INFO ql.Driver: Semantic Analysis Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=semanticAnalyze start=1401014838376 end=1401014838453 duration=77 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initializing Self 0 OP 14/05/25 13:47:18 INFO exec.ListSinkOperator: Operator 0 OP initialized 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initialization Done 0 OP 14/05/25 13:47:18 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=compile start=1401014838011 end=1401014838521 duration=510 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.execute 14/05/25 13:47:18 INFO ql.Driver: Starting command: show tables 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=TimeToSubmit start=1401014838011 end=1401014838531 duration=520 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=runTasks 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=task.DDL.Stage-0 14/05/25 13:47:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083 14/05/25 13:47:18 INFO hive.metastore: Waiting 1 seconds before next connection attempt. 14/05/25 13:47:19 INFO hive.metastore: Connected to metastore.
14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=task.DDL.Stage-0 start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=runTasks start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.execute start=1401014838521 end=1401014839627 duration=1106 OK 14/05/25 13:47:19 INFO ql.Driver: OK 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839627 end=1401014839627 duration=0 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.run start=1401014838011 end=1401014839627 duration=1616 14/05/25 13:47:19 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: getResultSetMetadata() 14/05/25 13:47:19 WARN snappy.LoadSnappy: Snappy native library not loaded 14/05/25 13:47:19 INFO mapred.FileInputFormat: Total input paths to process : 1 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 finished. closing... 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 forwarded 0 rows 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839857 end=1401014839857 duration=0 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: closeOperation Now running: “set hive.enforce.bucketing = true;” 14/05/25 13:48:07 INFO operation.Operation: Putting temp output to file /tmp/hadoop/2db07453-2235-4f22-ab72-4a27c1b1457d2566159976359370628.pipeout 14/05/25 13:48:07 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:48:07 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=7b13a3e2-e0ea-4dae-b693-0d456519fc66]: getOperationStatus() The first thing that happens is “Putting temp output to file”, and from then on nothing is shown in the console, even when running “show tables” again.
[jira] [Commented] (HIVE-7175) Provide password file option to beeline
[ https://issues.apache.org/jira/browse/HIVE-7175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027698#comment-14027698 ] Larry McCay commented on HIVE-7175: --- I just realized that this is the user's LDAP password. It would be unfortunate to have to leave this lying around in various places unless absolutely necessary. Does the beeline CLI currently allow for using the Java Console to collect the password from the user? I understand that for scripting-type purposes we may need another collection mechanism, but for use cases with a user and console available, the user's password should not be persisted outside of the directory itself when it can be avoided. For cases where it cannot be avoided, the side-file approach is certainly better than the command line itself in terms of visibility. Provide password file option to beeline --- Key: HIVE-7175 URL: https://issues.apache.org/jira/browse/HIVE-7175 Project: Hive Issue Type: Improvement Components: CLI, Clients Affects Versions: 0.13.0 Reporter: Robert Justice Assignee: Dr. Wendell Urth Labels: features, security Attachments: HIVE-7175.patch For people connecting to Hive Server 2 with LDAP authentication enabled, we currently have to provide the password openly on the command line in order to batch-run commands. They could use some expect scripting, but I think a valid improvement would be to provide a password-file option similar to other Hadoop CLI commands (e.g. Sqoop) to be more secure. -- This message was sent by Atlassian JIRA (v6.2#6252)
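On the question above: the Java Console mechanism does exist and is the standard interactive fallback; it returns null when stdin/stdout are redirected, which is exactly why batch use still needs a password file. A minimal sketch:
{code}
import java.io.Console;
import java.util.Arrays;

public class PasswordPromptSketch {
  public static void main(String[] args) {
    Console console = System.console();
    if (console == null) {
      // Redirected I/O (scripts, pipes): no console, so fall back to a
      // password file or another non-interactive mechanism.
      System.err.println("No console available; use a password file.");
      return;
    }
    char[] password = console.readPassword("Enter password: ");
    if (password == null) {
      return; // EOF before any input
    }
    try {
      // ... authenticate with the password here ...
    } finally {
      Arrays.fill(password, '\0'); // wipe the secret from memory when done
    }
  }
}
{code}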
[jira] [Created] (HIVE-7213) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE
Moustafa Aboul Atta created HIVE-7213: - Summary: COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Priority: Minor Running a query to count the number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns only the number of rows most recently added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results, i.e. both the old and the newly added rows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027729#comment-14027729 ] Hive QA commented on HIVE-7022: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649654/HIVE-7022.patch {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/435/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/435/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-435/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649654 Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
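For readers unfamiliar with the pitfall alluded to: BytesWritable's backing array may be larger than the valid data, so the length must always be consulted, and any charset decoding should be explicit rather than hidden behind a getString()-style convenience. A short illustration using the standard Hadoop API (not the patch itself):
{code}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.hadoop.io.BytesWritable;

public class BytesWritableSketch {
  public static void main(String[] args) {
    BytesWritable w = new BytesWritable("hello".getBytes(StandardCharsets.UTF_8));
    // getBytes() may return an over-allocated buffer; always honor getLength().
    byte[] valid = Arrays.copyOf(w.getBytes(), w.getLength());
    // Decode with an explicit charset, unlike an implicit getString().
    System.out.println(new String(valid, StandardCharsets.UTF_8));
  }
}
{code}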
[jira] [Commented] (HIVE-7172) Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion()
[ https://issues.apache.org/jira/browse/HIVE-7172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027737#comment-14027737 ] Hive QA commented on HIVE-7172: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649718/HIVE-7172.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/437/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/437/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-437/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]] + export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera + export PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-437/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRecordReaderImpl.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java' Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/Reader.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcInputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target metastore/target common/target common/src/gen serde/target serde/src/java/org/apache/hadoop/hive/serde2/SearchArgument.java serde/src/java/org/apache/hadoop/hive/serde2/PredicateLeaf.java ql/target ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentFactory.java + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1601886. At revision 1601886. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12649718 Potential resource leak in HiveSchemaTool#getMetaStoreSchemaVersion() - Key: HIVE-7172 URL: https://issues.apache.org/jira/browse/HIVE-7172 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7172.patch {code} ResultSet res = stmt.executeQuery(versionQuery); if (!res.next()) { throw new HiveMetaException("Didn't find version data in metastore"); } String currentSchemaVersion =
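The leak is the usual JDBC one: the Statement and ResultSet opened for the version query are never closed on the early-throw path. A hedged sketch of the standard try-with-resources fix, with a generic exception standing in for HiveMetaException so the snippet compiles on its own (not necessarily the patch's exact shape):
{code}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class SchemaVersionSketch {
  // Hypothetical helper mirroring getMetaStoreSchemaVersion(); the real
  // method's signature and exception type are assumed from the snippet above.
  static String getSchemaVersion(Connection conn, String versionQuery) throws Exception {
    try (Statement stmt = conn.createStatement();
         ResultSet res = stmt.executeQuery(versionQuery)) {
      if (!res.next()) {
        // stmt and res are closed automatically, even on this throw path.
        throw new Exception("Didn't find version data in metastore");
      }
      return res.getString(1);
    }
  }
}
{code}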
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027733#comment-14027733 ] Hive QA commented on HIVE-7208: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649696/HIVE-7208.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/436/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/436/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-436/ Messages: {noformat} This message was trimmed, see log for full details As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE TinyintLiteral" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE KW_STRUCT" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:68:4: Decision can match input such as "LPAREN KW_CASE SmallintLiteral" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:115:5: Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:127:5: Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:138:5: Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:149:5: Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:166:7: Decision can match input such as "STAR" using multiple alternatives: 1, 2 As a result, alternative(s) 2 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:179:5: Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6 As a result, alternative(s) 6 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_DATE StringLiteral" using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8 As a result, alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:261:5: Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8 As a result, 
alternative(s) 8 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_OVERWRITE" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE KW_BY" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT KW_INTO" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL KW_VIEW" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that input warning(200): IdentifiersParser.g:393:5: Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP KW_BY" using multiple alternatives: 2, 9 As a result, alternative(s) 9 were disabled for that
[jira] [Commented] (HIVE-7213) COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE
[ https://issues.apache.org/jira/browse/HIVE-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027748#comment-14027748 ] David Zanter commented on HIVE-7213: Any known work-around for this issue? COUNT(*) returns the count of the last inserted rows through INSERT INTO TABLE -- Key: HIVE-7213 URL: https://issues.apache.org/jira/browse/HIVE-7213 Project: Hive Issue Type: Bug Components: Query Processor, Statistics Affects Versions: 0.13.0 Environment: HDP 2.1 Windows Server 2012 64-bit Reporter: Moustafa Aboul Atta Priority: Minor Running a query to count the number of rows in a table through {{SELECT COUNT( * ) FROM t}} always returns only the number of rows most recently added through the following statement: {{INSERT INTO TABLE t SELECT r FROM t2}} However, running {{SELECT * FROM t}} returns the expected results, i.e. both the old and the newly added rows. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027773#comment-14027773 ] Swarnim Kulkarni commented on HIVE-7183: [~suyeon1222] I am only a contributor on the project, not a committer, so my vote counts as non-binding. A committer's vote is considered a binding vote, which is what you would need to get this patch accepted. For further information, refer to [1]. [1] https://cwiki.apache.org/confluence/display/Hive/Proposed+Changes+to+Hive+Bylaws+for+Submodule+Committers#ProposedChangestoHiveBylawsforSubmoduleCommitters-DecisionMaking Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-7183: --- Status: Patch Available (was: Open) Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7214) Support predicate pushdown for complex data types in ORCFile
Rohini Palaniswamy created HIVE-7214: Summary: Support predicate pushdown for complex data types in ORCFile Key: HIVE-7214 URL: https://issues.apache.org/jira/browse/HIVE-7214 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Currently ORCFile does not support predicate pushdown for complex datatypes like map, array and struct, while Parquet does. Came across this during the discussion of PIG-3760. Our users have a lot of map and struct (tuple in Pig) columns, and most of the filter conditions are on them. It would be great to have support added for them in ORC. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7183) Size of partColumnGrants should be checked in ObjectStore#removeRole()
[ https://issues.apache.org/jira/browse/HIVE-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027776#comment-14027776 ] Swarnim Kulkarni commented on HIVE-7183: {quote} Do you know how to change this issue's status to 'solved' or 'patch-available'? {quote} You just need to click on the Submit Patch button to change the status to Patch Available. One of the committers will probably need to add you to the contributors list so that you can assign JIRAs to yourself. Size of partColumnGrants should be checked in ObjectStore#removeRole() -- Key: HIVE-7183 URL: https://issues.apache.org/jira/browse/HIVE-7183 Project: Hive Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: HIVE-7183.patch Here is the related code: {code} List<MPartitionColumnPrivilege> partColumnGrants = listPrincipalAllPartitionColumnGrants( mRol.getRoleName(), PrincipalType.ROLE); if (tblColumnGrants.size() > 0) { pm.deletePersistentAll(partColumnGrants); {code} The size of tblColumnGrants is currently checked; the size of partColumnGrants should be checked instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7215) Support predicate pushdown for null checks in ORCFile
Rohini Palaniswamy created HIVE-7215: Summary: Support predicate pushdown for null checks in ORCFile Key: HIVE-7215 URL: https://issues.apache.org/jira/browse/HIVE-7215 Project: Hive Issue Type: Improvement Reporter: Rohini Palaniswamy Came across this missing feature during the discussion of PIG-3760. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7216) Hive Query Failure on Hive 0.10.0
Suddhasatwa Bhaumik created HIVE-7216: - Summary: Hive Query Failure on Hive 0.10.0 Key: HIVE-7216 URL: https://issues.apache.org/jira/browse/HIVE-7216 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS Reporter: Suddhasatwa Bhaumik Hello, I have created a table and a view in Hive as below: ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw ( transactionid STRING, externaltraceid STRING, externalreferenceid STRING, usecaseid STRING, timestampin STRING, timestampout STRING, component STRING, destination STRING, callerid STRING, service STRING, logpoint STRING, requestin STRING, status STRING, errorcode STRING, error STRING, servername STRING, inboundrequestip STRING, inboundrequestport STRING, outboundurl STRING, messagesize STRING, jmsdestination STRING, msisdn STRING, countrycode STRING, acr STRING, imei STRING, imsi STRING, iccid STRING, email STRING, payload STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", "mapping.timestampin" = "timestamp-in" ) LOCATION '/home/bhaumik/input'; ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; create view IF NOT EXISTS parse_soap_payload as select transactionid, component, logpoint, g.service as service, case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') end as opcoNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()') end as opcoId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') end as partnerParentNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'retrieveHierarchyNode' then 
xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()') end as partnerId from ulf_raw g; When I run the Hive query select * from parse_soap_payload; it fails with the attached error. I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in the Hadoop lib and Hive lib folders. Please advise if there are other JAR files required to be added here; if yes, please advise where I can download them from. Thanks, Suddhasatwa -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7216) Hive Query Failure on Hive 0.10.0
[ https://issues.apache.org/jira/browse/HIVE-7216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suddhasatwa Bhaumik updated HIVE-7216: -- Attachment: HadoopTaskDetails.html Error details are in the attached HTML file. Hive Query Failure on Hive 0.10.0 - Key: HIVE-7216 URL: https://issues.apache.org/jira/browse/HIVE-7216 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Environment: hadoop 0.20.0, hive 0.10.0, Ubuntu 13.04 LTS Reporter: Suddhasatwa Bhaumik Attachments: HadoopTaskDetails.html Hello, I have created a table and a view in Hive as below: ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; CREATE EXTERNAL TABLE IF NOT EXISTS ulf_raw ( transactionid STRING, externaltraceid STRING, externalreferenceid STRING, usecaseid STRING, timestampin STRING, timestampout STRING, component STRING, destination STRING, callerid STRING, service STRING, logpoint STRING, requestin STRING, status STRING, errorcode STRING, error STRING, servername STRING, inboundrequestip STRING, inboundrequestport STRING, outboundurl STRING, messagesize STRING, jmsdestination STRING, msisdn STRING, countrycode STRING, acr STRING, imei STRING, imsi STRING, iccid STRING, email STRING, payload STRING ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.transactionid" = "transaction-id", "mapping.timestampin" = "timestamp-in" ) LOCATION '/home/bhaumik/input'; ADD JAR json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar; create view IF NOT EXISTS parse_soap_payload as select transactionid, component, logpoint, g.service as service, case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoNodeId\']/text()') end as opcoNodeId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'opcoId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'opcoId\']/text()') end as opcoId , case g.service when 'createHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerParentNodeId\']/text()') end as partnerParentNodeId , case g.service when 'createHierarchyNode' then 
xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'createHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'retrieveHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'retrieveHierarchyNode\']/*[local-name()=\'partnerId\']/text()') when 'updateHierarchyNode' then xpath_string(payload,'/*[local-name()=\'Envelope\']/*[local-name()=\'Body\']/*[local-name()=\'updateHierarchyNode\']/*[local-name()=\'partnerId\']/text()') end as partnerId from ulf_raw g; When I run the Hive query select * from parse_soap_payload; it fails with the attached error. I only have the json-serde-1.1.6-SNAPSHOT-jar-with-dependencies.jar file in the Hadoop lib and Hive lib folders. Please advise if there are other JAR files required to be added here; if yes, please advise where I can download them from. Thanks, Suddhasatwa -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027840#comment-14027840 ] Xuefu Zhang commented on HIVE-7022: --- None of the test failures seem related. The patch is ready to be reviewed. [~brocknoland] Do you mind taking a look when you get a chance? Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7022) Replace BinaryWritable with BytesWritable in Parquet serde
[ https://issues.apache.org/jira/browse/HIVE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027845#comment-14027845 ] Brock Noland commented on HIVE-7022: Awesome +1 Replace BinaryWritable with BytesWritable in Parquet serde -- Key: HIVE-7022 URL: https://issues.apache.org/jira/browse/HIVE-7022 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.13.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-7022.patch Currently ParquetHiveSerde uses BinaryWritable to enclose bytes read from Parquet data. However, the existing Hadoop class BytesWritable already does that, and BinaryWritable offers no advantage. On the other hand, BinaryWritable has a confusing getString() method, which, if misused, can cause unexpected results. The proposal here is to replace it with Hadoop's BytesWritable. The issue was identified in HIVE-6367, and this serves as a follow-up JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027850#comment-14027850 ] Brock Noland commented on HIVE-6394: Tests appear to be unrelated. LGTM +1 Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.3.patch, HIVE-6394.4.patch, HIVE-6394.5.patch, HIVE-6394.6.patch, HIVE-6394.6.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Documentation Policy
Feel free to label such jiras with this keyword and ask the contributors for more information if you need any. Cool. I'll start chugging through the queue today adding labels as apt. On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair the...@hortonworks.com wrote: Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13? Sounds good to me. -- Swarnim
[jira] [Commented] (HIVE-7211) Throws exception if the name of conf var starts with hive. does not exists in HiveConf
[ https://issues.apache.org/jira/browse/HIVE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027954#comment-14027954 ] Hive QA commented on HIVE-7211: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649716/HIVE-7211.1.patch.txt {color:red}ERROR:{color} -1 due to 75 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_compact3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dbtxnmgr_showlocks org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_hook_context_cs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join36 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join37 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nulls org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_nullsafe org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_export_drop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_overridden_confs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedid_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rcfile_toleratecorruptions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin_union_remove_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt5 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoinopt9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_aggregator_error_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_publisher_error_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_truncate_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_bucketmapjoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_nested_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_virtual_column org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_handler_bulk org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2 org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
[jira] [Commented] (HIVE-2372) java.io.IOException: error=7, Argument list too long
[ https://issues.apache.org/jira/browse/HIVE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027993#comment-14027993 ] Sergey Tryuber commented on HIVE-2372: -- Hi Ryan, Yes, your issue is closely related. Hive passes properties to the TRANSFORM script via environment variables. In the scope of this ticket I shortened only the environment variable that stores information about partitions. For user-defined variables (set via the SET statement), I'm not even sure that shortening them is the correct approach. Maybe it would be better to fail with an error before map-reduce job execution and ask the user to unset the variable (as you did). But it is quite hard to judge what the length limit should be, because it depends on the OS (even in my patch, as I remember, I hardcoded the length, and in hindsight that was not the best choice). As an alternative, Hive could print a warning but continue the execution. Anyway, this issue was closed long enough ago (and the applied patch really does solve the problem in the issue description) that I think it would be better to create a new one and continue the discussion there. Would you mind doing that? java.io.IOException: error=7, Argument list too long Key: HIVE-2372 URL: https://issues.apache.org/jira/browse/HIVE-2372 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sergey Tryuber Priority: Critical Fix For: 0.10.0 Attachments: HIVE-2372.1.patch.txt, HIVE-2372.2.patch.txt I execute a huge query on a table with a lot of 2-level partitions. There is a perl reducer in my query. Maps worked ok, but every reducer fails with the following exception: 2011-08-11 04:58:29,865 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/perl, reducer.pl, my_argument] 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null 2011-08-11 04:58:29,935 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:129390185139228,reducesinkkey1:8AF163CA6F},value:{_col0:8AF163CA6F,_col1:2011-07-27 22:48:52,_col2:129390185139228,_col3:2006,_col4:4100,_col5:10017388=6,_col6:1063,_col7:NULL,_col8:address.com,_col9:NULL,_col10:NULL},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.io.IOException: Cannot run program /usr/bin/perl: java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.init(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) ... 16 more It seems to me I found the cause: ScriptOperator.java puts a lot of configs as environment variables into the child reduce process. One of the variables is mapred.input.dir, which in my case is more than 150KB. There are a huge number of input directories in this variable.
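Sergey's diagnosis suggests an obvious guard. The sketch below is illustrative only, assuming a hypothetical EnvGuard helper and a hardcoded per-entry limit; it is not Hive's actual ScriptOperator code, which is what HIVE-2372's patch modified.
{code}
import java.io.IOException;
import java.util.Map;

// Illustrative sketch only -- not Hive's actual ScriptOperator code.
// It shows the kind of guard discussed above: before launching the child
// script, skip (or truncate) any environment entry whose size exceeds a
// platform-dependent limit, instead of failing later with error=7.
public class EnvGuard {

  // Hypothetical limit; the real ceiling depends on the OS/kernel version.
  private static final int MAX_ENV_ENTRY_CHARS = 128 * 1024;

  public static Process launch(String[] cmd, Map<String, String> props)
      throws IOException {
    ProcessBuilder pb = new ProcessBuilder(cmd);
    Map<String, String> env = pb.environment();
    for (Map.Entry<String, String> e : props.entrySet()) {
      // Approximate per-entry size as "KEY=VALUE".
      int size = e.getKey().length() + e.getValue().length() + 1;
      if (size > MAX_ENV_ENTRY_CHARS) {
        // Alternative policies: truncate the value, or fail fast with a
        // clear message before the map-reduce job is even submitted.
        System.err.println("Skipping oversized env var: " + e.getKey());
        continue;
      }
      env.put(e.getKey(), e.getValue());
    }
    return pb.start();
  }
}
{code}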
[jira] [Commented] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027998#comment-14027998 ] Ashutosh Chauhan commented on HIVE-7203: Yup, this is only for the outermost limit. The opportunity to optimize away an inner subquery with limit 0 via a null scan still exists, although that's not a common case. Yeah, the schema will be retained since, as you said, the fetch task still exists and will have the right schema. Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7203) Optimize limit 0
[ https://issues.apache.org/jira/browse/HIVE-7203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7203: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Optimize limit 0 Key: HIVE-7203 URL: https://issues.apache.org/jira/browse/HIVE-7203 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: 0.14.0 Attachments: HIVE-7203.1.patch, HIVE-7203.patch Some tools generate queries with limit 0. Let's optimize that. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028046#comment-14028046 ] Mithun Radhakrishnan commented on HIVE-7195: I've been trying to solve the problem from the other end in HCatalog, i.e. registering partitions in the metastore for data that was written to HDFS outside of Hive/HCatalog (e.g. through an ingestion service like Apache Falcon, etc.). There were several points at which I wished we had an abstraction for a partition-spec at the metastore level (if not at the ObjectStore level). It would be cool to have parallel functions like the following in the HiveMetaStore(Client) interface:
{code}
public PartitionSpec listPartitions(db_name, tbl_name, max_parts) throws ... ;
public int add_partitions( PartitionSpec new_parts ) throws ... ;
{code}
where the PartitionSpec looks like:
{code}
public interface PartitionSpec {
  public List<Partition> getPartitions();
  public List<String> getPartNames();
  public Iterator<Partition> getPartitionIter();
  public Iterator<String> getPartNameIter();
}
{code}
The DefaultPartitionSpec composes a List<Partition>. An HDFSDirBasedPartitionSpec could be implemented to store a root-level partition-dir, and return Partition objects via globStatus() on HDFS. I would use this as an argument to addPartitions(PartitionSpec), to avoid having to specify all partitions explicitly. This avoids a bunch of thrift-serialization and traffic over the wire. A future PartitionSpec could choose to compose other PartitionSpecs. HiveMetaStoreClient.listPartitions() could choose to return a PartitionSpec that composes several Partition objects that use the same StorageDescriptor instance, so that partitions with nearly the same SD don't repeat the redundant bits. I haven't worked out the nuts-and-bolts completely. I'll put a more complete proposal out on a separate JIRA. I think this will have value for both listPartitions() (i.e. read) and addPartitions() (i.e. write). I'd value your opinion on the approach. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow, since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
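To make the HDFSDirBasedPartitionSpec idea above concrete, here is a rough sketch. Only the class name comes from the comment; the constructor shape and how the Partition objects are populated are assumptions, and a real implementation would derive partition values and storage descriptors from the table's metadata.
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.metastore.api.Partition;

// Rough sketch of the HDFSDirBasedPartitionSpec idea. Partitions are
// materialized from a root directory via globStatus(), so a full
// List<Partition> never has to be shipped over thrift by the caller.
public class HdfsDirBasedPartitionSpec {

  private final FileSystem fs;
  private final Path rootDir;

  public HdfsDirBasedPartitionSpec(FileSystem fs, Path rootDir) {
    this.fs = fs;
    this.rootDir = rootDir;
  }

  public List<Partition> getPartitions() throws IOException {
    List<Partition> parts = new ArrayList<Partition>();
    // One glob level per partition key, e.g. .../table/dt=2014-06-11/
    FileStatus[] stats = fs.globStatus(new Path(rootDir, "*"));
    if (stats == null) {
      return parts;
    }
    for (FileStatus stat : stats) {
      String name = stat.getPath().getName();
      int eq = name.indexOf('=');
      if (!stat.isDirectory() || eq < 0) {
        continue;
      }
      Partition p = new Partition();
      // A real implementation would also derive the table name, SD, and
      // remaining metadata from the directory and the table definition.
      p.addToValues(name.substring(eq + 1));
      parts.add(p);
    }
    return parts;
  }
}
{code}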
HIVE 13 : Simple Join throwing java.io.IOException
Hi, I am trying to run a simple join query on Hive 13. Both tables are in text format. Both tables are read in mappers, and the error is thrown in the reducer. I don't understand why a reducer is reading a table the mappers have already read, nor why it assumes the video table's file is in SequenceFile format. Below you can find the query, the query plan, and the error. Any help will be greatly appreciated. Thanks, Sid *Hadoop Version:* 2.0.0-mr1 Query: SELECT computerguid FROM revenue_start_adeffx_v2 JOIN video ON revenue_start_adeffx_v2.video_id = video.video_id WHERE hourid = '389567'; Query Plan: STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: revenue_start_adeffx_v2 Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 3175840 Data size: 330287403 Basic stats: COMPLETE Column stats: NONE value expressions: computerguid (type: string) TableScan alias: video Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator key expressions: video_id (type: int) sort order: + Map-reduce partition columns: video_id (type: int) Statistics: Num rows: 146679792 Data size: 586719168 Basic stats: COMPLETE Column stats: NONE Reduce Operator Tree: Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {VALUE._col0} 1 outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE File Output Operator compressed: false Statistics: Num rows: 161347776 Data size: 645391104 Basic stats: COMPLETE Column stats: NONE table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0 Fetch Operator limit: -1 Error: 2014-06-11 10:18:34,818 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://NNPath/video/video_20140611051139 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: java.io.IOException: hdfs:/NNPath/hive/warehouse/video/video_20140611051139 not a SequenceFile at 
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more 2014-06-11 10:18:34,822 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1 2014-06-11 10:18:34,824 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException:
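One quick way to see why the reducer fails with this message: per the trace, RowContainer reads spilled rows back through SequenceFileInputFormat, and a SequenceFile begins with the 3-byte magic header "SEQ", which a plain-text table file lacks, so SequenceFile.Reader rejects it. A small diagnostic sketch follows; SeqFileCheck is a hypothetical helper, not part of Hive.
{code}
import java.io.DataInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical diagnostic helper: reproduces the header check that
// SequenceFile.Reader performs. A plain-text table file fails it, which
// produces the "not a SequenceFile" IOException seen above.
public class SeqFileCheck {
  public static boolean isSequenceFile(Configuration conf, Path p)
      throws IOException {
    FileSystem fs = p.getFileSystem(conf);
    try (DataInputStream in = fs.open(p)) {
      byte[] magic = new byte[3];
      in.readFully(magic);
      return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
    }
  }
}
{code}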
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Status: Open (was: Patch Available) Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Attachment: HIVE-7206.1.patch Fix another reference to said property. Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7206) Duplicate declaration of build-helper-maven-plugin in root pom
[ https://issues.apache.org/jira/browse/HIVE-7206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7206: --- Status: Patch Available (was: Open) Duplicate declaration of build-helper-maven-plugin in root pom -- Key: HIVE-7206 URL: https://issues.apache.org/jira/browse/HIVE-7206 Project: Hive Issue Type: Task Components: Build Infrastructure Affects Versions: 0.14.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7206.1.patch, HIVE-7206.patch Results in following warnings while building: [WARNING] Some problems were encountered while building the effective model for org.apache.hive:hive-it-custom-serde:jar:0.14.0-SNAPSHOT [WARNING] 'build.pluginManagement.plugins.plugin.(groupId:artifactId)' must be unique but found duplicate declaration of plugin org.codehaus.mojo:build-helper-maven-plugin @ org.apache.hive:hive:0.14.0-SNAPSHOT, pom.xml, line 638, column 17 [WARNING] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028176#comment-14028176 ] Szehon Ho commented on HIVE-7065: - [~thejas] [~ekoifman] Hi, are we filing a JIRA to fix the broken test TestTempletonUtils? It is still failing on trunk. Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7200: -- Description: A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not.
{code}
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
+------+
6 rows selected (3.998 seconds)
0: jdbc:hive2://localhost:1> !set headerInterval 2
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
+------+
| val  |
+------+
| f    |
| T    |
+------+
| val  |
+------+
| F    |
| 0    |
+------+
| val  |
+------+
| 1    |
+------+
6 rows selected (0.691 seconds)
0: jdbc:hive2://localhost:1> !set showHeader false
0: jdbc:hive2://localhost:1> select * from stringvals;
+------+
| val  |
+------+
| t    |
| f    |
| T    |
| F    |
| 0    |
| 1    |
6 rows selected (1.728 seconds)
{code}
was: A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code}
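Issue (2) above is a classic off-by-one. The sketch below is not BeeLine's actual code, just a minimal model of the bug: if the initial header increments the same counter as data rows, the second header lands after headerInterval - 1 rows, matching the output shown.
{code}
// Minimal model of the off-by-one in (2); not BeeLine's actual code.
public class HeaderIntervalDemo {
  public static void print(String[] rows, boolean showHeader, int headerInterval) {
    int line = 0;
    if (showHeader) {
      System.out.println("| val  |");
      line++;                                   // BUG: header counted as a row
    }
    for (String row : rows) {
      if (showHeader && headerInterval > 0 && line % headerInterval == 0) {
        System.out.println("| val  |");         // repeated interval header
      }
      System.out.println("| " + row + "    |");
      line++;
    }
  }

  public static void main(String[] args) {
    // Prints groups of 1, 2, 2, 1 rows -- matching the buggy output above.
    print(new String[] { "t", "f", "T", "F", "0", "1" }, true, 2);
  }
}
{code}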
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028200#comment-14028200 ] Sergey Shelukhin commented on HIVE-7195: There's a jira somewhere to add iterators/limits to all partition methods. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow, since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028208#comment-14028208 ] Hive QA commented on HIVE-7212: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649738/HIVE-7212.1.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5609 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hadoop.hive.ql.exec.tez.TestTezTask.testSubmit org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/439/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/439/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-439/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649738 Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer
[ https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028220#comment-14028220 ] David Chen commented on HIVE-7094: -- Has anyone had a chance to take a look at this? Separate out static/dynamic partitioning code in FileRecordWriterContainer -- Key: HIVE-7094 URL: https://issues.apache.org/jira/browse/HIVE-7094 Project: Hive Issue Type: Sub-task Components: HCatalog Reporter: David Chen Assignee: David Chen Attachments: HIVE-7094.1.patch There are two major places in FileRecordWriterContainer that have the {{if (dynamicPartitioning)}} condition: the constructor and write(). This is the approach that I am taking: # Move the DP and SP code into two subclasses: DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer. # Make FileRecordWriterContainer an abstract class that contains the common code for both implementations. For write(), FileRecordWriterContainer will call an abstract method that will provide the local RecordWriter, ObjectInspector, SerDe, and OutputJobInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
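A minimal skeleton of the refactoring the description proposes; the subclass names come from the description, while the method shapes and the LocalWriterContext bundle are assumptions standing in for the RecordWriter, ObjectInspector, SerDe, and OutputJobInfo the abstract hook is meant to provide.
{code}
// Sketch of the proposed split: DynamicFileRecordWriterContainer and
// StaticFileRecordWriterContainer would subclass this, each supplying its
// own lookup. Shapes beyond the names in the description are assumptions.
public abstract class FileRecordWriterContainerSketch<K, V> {

  /** Common write path shared by both partitioning modes. */
  public final void write(K key, V value) {
    getLocalWriterContext(value).write(key, value);
  }

  /** Dynamic partitioning resolves this per record, keyed by the record's
      partition values; static partitioning returns one fixed context. */
  protected abstract LocalWriterContext getLocalWriterContext(V value);

  protected interface LocalWriterContext {
    void write(Object key, Object value);
  }
}
{code}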
[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez
[ https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028224#comment-14028224 ] Sergey Shelukhin commented on HIVE-7212: This seems to be a duplicate of HIVE-6824 Use resource re-localization instead of restarting sessions in Tez -- Key: HIVE-7212 URL: https://issues.apache.org/jira/browse/HIVE-7212 Project: Hive Issue Type: Bug Components: Tez Affects Versions: 0.14.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-7212.1.patch scriptfile1.q is failing on Tez because of a recent breakage in localization. On top of that we're currently restarting sessions if the resources have changed. (add file/add jar/etc). Instead of doing this we should just have tez relocalize these new resources. This way no session/AM restart is required. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7200) Beeline output displays column heading even if --showHeader=false is set
[ https://issues.apache.org/jira/browse/HIVE-7200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028232#comment-14028232 ] Xuefu Zhang commented on HIVE-7200: --- [~ngangam] Could you repost the new formatting with your patch? The result above seems to have empty lines, which isn't good. Also, add the necessary tags so that JIRA will show it exactly as you see it in the console. Beeline output displays column heading even if --showHeader=false is set Key: HIVE-7200 URL: https://issues.apache.org/jira/browse/HIVE-7200 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Naveen Gangam Assignee: Naveen Gangam Priority: Minor Fix For: 0.14.0 Attachments: HIVE-7200.1.patch A few minor/cosmetic issues with the beeline CLI. 1) The tool prints the column headers despite --showHeader being set to false. This property only seems to affect the subsequent header information that gets printed based on the value of the headerInterval property (default value is 100). 2) When showHeader is true and headerInterval > 0, the header after the first interval gets printed after headerInterval - 1 rows. The code seems to count the initial header as a row, if you will. 3) The table footer (the line that closes the table) does not get printed if showHeader is false. I think the table should get closed irrespective of whether it prints the header or not. {code} 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| +--+ 6 rows selected (3.998 seconds) 0: jdbc:hive2://localhost:1> !set headerInterval 2 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| +--+ | val | +--+ | f| | T| +--+ | val | +--+ | F| | 0| +--+ | val | +--+ | 1| +--+ 6 rows selected (0.691 seconds) 0: jdbc:hive2://localhost:1> !set showHeader false 0: jdbc:hive2://localhost:1> select * from stringvals; +--+ | val | +--+ | t| | f| | T| | F| | 0| | 1| 6 rows selected (1.728 seconds) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2564) Set dbname at JDBC URL or properties
[ https://issues.apache.org/jira/browse/HIVE-2564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-2564: - Resolution: Duplicate Status: Resolved (was: Patch Available) Closing this as a duplicate. HiveServer1 is no longer supported and is scheduled to be removed from the code base (see HIVE-6977). Set dbname at JDBC URL or properties Key: HIVE-2564 URL: https://issues.apache.org/jira/browse/HIVE-2564 Project: Hive Issue Type: Improvement Components: JDBC Affects Versions: 0.7.1, 0.12.0 Reporter: Shinsuke Sugaya Priority: Critical Labels: patch Attachments: HIVE-2564.1.patch, HIVE-2564.2.patch, HIVE-2564.3.patch, hive-2564.patch The current Hive implementation ignores a database name given in the JDBC URL, though we can set it by executing a use DBNAME statement. I think it is better to also allow specifying a database name in the JDBC URL or database properties. Therefore, I'll attach a patch. -- This message was sent by Atlassian JIRA (v6.2#6252)
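For reference, HiveServer2 (which superseded HiveServer1) already supports what this issue asked for: the database can be given directly in the JDBC URL. A minimal usage sketch, with host, port, database name, and credentials as placeholders:
{code}
import java.sql.Connection;
import java.sql.DriverManager;

// Usage sketch: with HiveServer2 the database goes directly in the JDBC
// URL. All connection details below are placeholders.
public class Hs2Connect {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/mydb", "user", "");
    conn.createStatement().execute("show tables"); // runs against mydb
    conn.close();
  }
}
{code}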
[jira] [Updated] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6928: -- Description: By default, beeline truncates long results based on the console width, like:
{code}
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
| col_name                    |                                                                                                                      |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
| pat_id                      | string                                                                                                               |
| score                       | float                                                                                                                |
| acutes                      | float                                                                                                                |
|                             |                                                                                                                      |
| Detailed Table Information  | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------+
5 rows selected (0.4 seconds)
{code}
This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. was: By default, beeline truncates long results based on the console width like: +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | |
[jira] [Updated] (HIVE-3121) JDBC driver's getCatalogs() method returns schema/db information
[ https://issues.apache.org/jira/browse/HIVE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-3121: - Status: Open (was: Patch Available) JDBC driver's getCatalogs() method returns schema/db information Key: HIVE-3121 URL: https://issues.apache.org/jira/browse/HIVE-3121 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Carl Steinbach Assignee: Richard Ding Attachments: hive-3121.patch, hive-3121_1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3121) JDBC driver's getCatalogs() method returns schema/db information
[ https://issues.apache.org/jira/browse/HIVE-3121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028243#comment-14028243 ] Alan Gates commented on HIVE-3121: -- Looking at the current code (trunk post 0.13), it looks like it has already been changed along the lines of what this patch suggests, though not exactly. I'll move the JIRA from Patch Available to Open. [~cwsteinbach], [~rding], do you want to close this as a duplicate? JDBC driver's getCatalogs() method returns schema/db information Key: HIVE-3121 URL: https://issues.apache.org/jira/browse/HIVE-3121 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Carl Steinbach Assignee: Richard Ding Attachments: hive-3121.patch, hive-3121_1.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6928) Beeline should not chop off describe extended results by default
[ https://issues.apache.org/jira/browse/HIVE-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028250#comment-14028250 ] Xuefu Zhang commented on HIVE-6928: --- Could we have a review board entry to make the review easier? Beeline should not chop off describe extended results by default -- Key: HIVE-6928 URL: https://issues.apache.org/jira/browse/HIVE-6928 Project: Hive Issue Type: Bug Components: CLI Reporter: Szehon Ho Assignee: Chinna Rao Lalam Attachments: HIVE-6928.1.patch, HIVE-6928.patch By default, beeline truncates long results based on the console width like: {code} +-+--+ | col_name | | +-+--+ | pat_id | string | | score | float | | acutes | float | | | | | Detailed Table Information | Table(tableName:refills, dbName:default, owner:hdadmin, createTime:1393882396, lastAccessTime:0, retention:0, sd:Sto | +-+--+ 5 rows selected (0.4 seconds) {code} This can be changed by !outputformat, but the default should behave better to give a better experience to the first-time beeline user. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7217) Inner join query fails in the reducer
Muthu created HIVE-7217: --- Summary: Inner join query fails in the reducer Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.1, 0.13.0 Reporter: Muthu SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Attachment: reducer.log Inner join query fails in the reducer - Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.13.1 Reporter: Muthu Attachments: reducer.log SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7218) java.io.IOException: error=7, Argument list too long
Ryan Harris created HIVE-7218: - Summary: java.io.IOException: error=7, Argument list too long Key: HIVE-7218 URL: https://issues.apache.org/jira/browse/HIVE-7218 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0, 0.7.1, 0.7.0 Reporter: Ryan Harris HIVE-2372 was originally created in response to this error message; however, that patch was merely a work-around to handle the condition where mapred.input.dir is too long. Any other environment variable that is too long for the host OS will still cause a job failure. In my case: while creating a table with a large number of columns, a large hive variable is temporarily created using SET; the variable contains the columns and column descriptions. A CREATE TABLE statement then successfully uses that large variable. After successfully creating the table, the hive script attempts to load data into the table using a TRANSFORM script, triggering the error: java.io.IOException: error=7, Argument list too long. Since the variable is no longer used after the table is created, the hive script was updated to SET the large variable to empty. After setting the variable to empty, the second statement in the hive script ran fine. Hive should notify the user more gracefully as to the cause of the problem and offer a configurable approach for automatically handling the condition. In this case, identifying the cause of the issue was somewhat confusing, since the portion of the hive script that referenced the long variable ran successfully, and the portion of the script that failed didn't even use/reference the variable that was causing it to fail. Since HIVE-2372 has already been marked Fixed, this JIRA re-opens the issue, because the original problem was worked around, not resolved... -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads
[ https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7210: - Assignee: Gunther Hagleitner (was: Jason Dere) NPE with No plan file found when running Driver instances on multiple threads --- Key: HIVE-7210 URL: https://issues.apache.org/jira/browse/HIVE-7210 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Gunther Hagleitner Informatica has a multithreaded application running multiple instances of CLIDriver. When running concurrent queries they sometimes hit the following error: {noformat} 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 'java.lang.NullPointerException(null)' java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1089) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902) at com.informatica.platform.dtm.executor.hive.impl.AbstractHiveDriverBaseImpl.run(AbstractHiveDriverBaseImpl.java:86) at com.informatica.platform.dtm.executor.hive.MHiveDriver.executeQuery(MHiveDriver.java:126) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeQuery(HiveTaskHandlerImpl.java:358) at 
com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeScript(HiveTaskHandlerImpl.java:247) at com.informatica.platform.dtm.executor.hive.task.impl.HiveTaskHandlerImpl.executeMainScript(HiveTaskHandlerImpl.java:194) at com.informatica.platform.ldtm.executor.common.workflow.taskhandler.impl.BaseTaskHandlerImpl.run(BaseTaskHandlerImpl.java:126) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
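A minimal sketch of the concurrent usage pattern described above: one Driver per thread, each thread starting its own SessionState. This is a reproduction sketch under those assumptions, not a fix; whether per-thread sessions fully isolate the generated plan files (map.xml) is exactly what this JIRA is about.
{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

// Reproduction sketch, not a fix: one Driver per thread, each thread
// starting its own SessionState. The queries are placeholders.
public class ConcurrentDrivers {
  public static void main(String[] args) {
    String[] queries = { "select count(*) from t1", "select count(*) from t2" };
    for (final String q : queries) {
      new Thread(new Runnable() {
        public void run() {
          HiveConf conf = new HiveConf();
          SessionState.start(conf);       // per-thread session state
          Driver driver = new Driver(conf);
          try {
            driver.run(q);                // may race on the scratch-dir plan file
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      }).start();
    }
  }
}
{code}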
[jira] [Commented] (HIVE-2372) java.io.IOException: error=7, Argument list too long
[ https://issues.apache.org/jira/browse/HIVE-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028312#comment-14028312 ] Ryan Harris commented on HIVE-2372: --- Thanks Sergey, HIVE-7218 created for continued tracking java.io.IOException: error=7, Argument list too long Key: HIVE-2372 URL: https://issues.apache.org/jira/browse/HIVE-2372 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sergey Tryuber Priority: Critical Fix For: 0.10.0 Attachments: HIVE-2372.1.patch.txt, HIVE-2372.2.patch.txt I execute a huge query on a table with a lot of 2-level partitions. There is a perl reducer in my query. Maps worked ok, but every reducer fails with the following exception: 2011-08-11 04:58:29,865 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: Executing [/usr/bin/perl, reducer.pl, my_argument] 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: tablename=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: partname=null 2011-08-11 04:58:29,866 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: alias=null 2011-08-11 04:58:29,935 FATAL ExecReducer: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:129390185139228,reducesinkkey1:8AF163CA6F},value:{_col0:8AF163CA6F,_col1:2011-07-27 22:48:52,_col2:129390185139228,_col3:2006,_col4:4100,_col5:10017388=6,_col6:1063,_col7:NULL,_col8:address.com,_col9:NULL,_col10:NULL},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot initialize ScriptOperator at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:320) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.io.IOException: Cannot run program /usr/bin/perl: java.io.IOException: error=7, Argument list too long at java.lang.ProcessBuilder.start(ProcessBuilder.java:460) at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:279) ... 15 more Caused by: java.io.IOException: java.io.IOException: error=7, Argument list too long at java.lang.UNIXProcess.init(UNIXProcess.java:148) at java.lang.ProcessImpl.start(ProcessImpl.java:65) at java.lang.ProcessBuilder.start(ProcessBuilder.java:453) ... 16 more It seems to me, I found the cause. ScriptOperator.java puts a lot of configs as environment variables to the child reduce process. 
One of the variables is mapred.input.dir, which in my case is more than 150KB. There are a huge number of input directories in this variable. In short, the problem is that Linux (up to kernel version 2.6.23) limits the total size of environment variables for child processes to 132KB. This problem could be solved by upgrading the kernel, but the limit is still 132KB per string in an environment variable, so such a huge variable doesn't work even on my home computer (2.6.32). You can read more at http://www.kernel.org/doc/man-pages/online/pages/man2/execve.2.html. For now all our work has stopped because of this problem and I can't find a solution. The only solution that seems reasonable to me is to get rid of this variable in reducers. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5595) Implement vectorized SMB JOIN
[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5595: --- Labels: TODOC13 (was: ) Implement vectorized SMB JOIN - Key: HIVE-5595 URL: https://issues.apache.org/jira/browse/HIVE-5595 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch, HIVE-5595.3.patch Original Estimate: 168h Remaining Estimate: 168h Vectorized implementation of SMB Map Join. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028317#comment-14028317 ] Eugene Koifman commented on HIVE-7065: -- I'm looking at it now. Will make changes in this ticket Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to Hive client on node executing a job submitted through WebHCat (hive query, for example). this should include hive.execution.engine -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: FW: HiveServer2 VS HiveServer1 Logging
I think that's expected. SQL operations like show tables will reach the Driver, which has perf and detailed logs about execution. Other operations like set or add are not SQL operations, so in HS2 they don't hit the Driver and don't generate those logs. They are pretty simple ops that just set some state. Did those show in HS1? If so, maybe the implementation changed. Thanks Szehon On Wed, Jun 11, 2014 at 4:40 AM, Dima Machlin dima.mach...@pursway.com wrote: Any chance somebody has a clue about this? From: Dima Machlin [mailto:dima.mach...@pursway.com] Sent: Sunday, May 25, 2014 1:54 PM To: u...@hive.apache.org Subject: RE: HiveServer2 VS HiveServer1 Logging I've made some progress in investigating this. It seems that this behavior happens under certain conditions. As long as I'm running any query that isn't a "set" or "add" command, the logging is fine. For example, "show tables": 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: openSession() 14/05/25 13:47:17 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: getInfo() 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.run 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=TimeToSubmit 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=compile 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=parse 14/05/25 13:47:18 INFO parse.ParseDriver: Parsing command: show tables 14/05/25 13:47:18 INFO parse.ParseDriver: Parse Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=parse start=1401014838047 end=1401014838376 duration=329 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=semanticAnalyze 14/05/25 13:47:18 INFO ql.Driver: Semantic Analysis Completed 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=semanticAnalyze start=1401014838376 end=1401014838453 duration=77 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initializing Self 0 OP 14/05/25 13:47:18 INFO exec.ListSinkOperator: Operator 0 OP initialized 14/05/25 13:47:18 INFO exec.ListSinkOperator: Initialization Done 0 OP 14/05/25 13:47:18 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=compile start=1401014838011 end=1401014838521 duration=510 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=Driver.execute 14/05/25 13:47:18 INFO ql.Driver: Starting command: show tables 14/05/25 13:47:18 INFO ql.Driver: /PERFLOG method=TimeToSubmit start=1401014838011 end=1401014838531 duration=520 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=runTasks 14/05/25 13:47:18 INFO ql.Driver: PERFLOG method=task.DDL.Stage-0 14/05/25 13:47:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083 14/05/25 13:47:18 INFO hive.metastore: Waiting 1 seconds before next connection attempt. 14/05/25 13:47:19 INFO hive.metastore: Connected to metastore. 
14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=task.DDL.Stage-0 start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=runTasks start=1401014838531 end=1401014839627 duration=1096 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.execute start=1401014838521 end=1401014839627 duration=1106 OK 14/05/25 13:47:19 INFO ql.Driver: OK 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839627 end=1401014839627 duration=0 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=Driver.run start=1401014838011 end=1401014839627 duration=1616 14/05/25 13:47:19 INFO cli.CLIService: SessionHandle [2db07453-2235-4f22-ab72-4a27c1b1457d]: executeStatement() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: getResultSetMetadata() 14/05/25 13:47:19 WARN snappy.LoadSnappy: Snappy native library not loaded 14/05/25 13:47:19 INFO mapred.FileInputFormat: Total input paths to process : 1 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: fetchResults() 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 finished. closing... 14/05/25 13:47:19 INFO exec.ListSinkOperator: 0 forwarded 0 rows 14/05/25 13:47:19 INFO ql.Driver: PERFLOG method=releaseLocks 14/05/25 13:47:19 INFO ql.Driver: /PERFLOG method=releaseLocks start=1401014839857 end=1401014839857 duration=0 14/05/25 13:47:19 INFO cli.CLIService: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=0628b8f8-01de-4397-8279-a314cf553d7f]: closeOperation Now running : “set hive.enforce.bucketing = true;” 14/05/25 13:48:07 INFO operation.Operation:
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Summary: Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer (was: Inner join query fails in the reducer) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer -- Key: HIVE-7217 URL: https://issues.apache.org/jira/browse/HIVE-7217 Project: Hive Issue Type: Bug Affects Versions: 0.13.0, 0.13.1 Reporter: Muthu Attachments: reducer.log SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more -- This message was sent by Atlassian JIRA (v6.2#6252)
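The stack trace in HIVE-7217 above shows RowContainer.first() reading spilled rows back through SequenceFileInputFormat, yet the path it tries to open is the table's original text file rather than the local .tmp spill file, which is why the reader rejects it as "not a SequenceFile". A rough sketch of the read-back pattern the trace implies, with hypothetical variable names and without the bookkeeping of the real class:
{code}
// Hypothetical sketch, not the actual RowContainer source: spilled row groups
// are written to a local SequenceFile and read back via the mapred API. The
// reported failure is consistent with the input path in the conf still
// pointing at the table's text file instead of the spill file below.
JobConf localConf = new JobConf(jobConf);
FileInputFormat.setInputPaths(localConf, tmpSpillPath); // must be the .tmp file
SequenceFileInputFormat<WritableComparable, Writable> inputFormat =
    new SequenceFileInputFormat<WritableComparable, Writable>();
InputSplit[] splits = inputFormat.getSplits(localConf, 1);
RecordReader<WritableComparable, Writable> reader =
    inputFormat.getRecordReader(splits[0], localConf, Reporter.NULL);
{code}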
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Muthu updated HIVE-7217: Description: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184] 2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp 2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more was: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following exception: 2014-06-11 12:32:39,299
Re: FW: HiveServer2 VS HiveServer1 Logging
Sorry I missed the last part mentioning that it messes up the logs of show tables after set. That's strange; I tried on latest trunk and I don't see that happening, show tables still shows the perf logs.
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028377#comment-14028377 ] Ashutosh Chauhan commented on HIVE-5771: [~tedxu] Can you create a Review Board request for your latest patch? I took a cursory look and have the following observations:
* In a few tests an extra MR stage (or in some cases two) got added to the plan. These tests were testing specific optimizations, so it seems those optimizations are now disabled. Tests: groupby_sort_1.q, groupby_sort_skew_1.q
* Tests subquery_multiinsert.q and subquery_notin.q are generating wrong results.
* For test annotate_stats_filter.q the plan changed from MR to fetch-only, which seems like an improvement, but I'm not sure how the plan got changed.
* Some join tests now print a warning about getting converted into a cross join, which will be a performance degradation: cluster.q, join38.q, join_literals.q, join_nullsafe.q, ppd2.q, ppd_clusterby.q, ppd_join4.q, ppd_outer_join5.q
* Test smb_mapjoin_25.q is failing with the following stack trace:
{code}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.getValueObjectInspectors(MapJoinOperator.java:135)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.getJoinOutputObjectInspector(CommonJoinOperator.java:167)
 at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.initializeOp(CommonJoinOperator.java:310)
 at org.apache.hadoop.hive.ql.exec.AbstractMapJoinOperator.initializeOp(AbstractMapJoinOperator.java:72)
 at org.apache.hadoop.hive.ql.exec.MapJoinOperator.initializeOp(MapJoinOperator.java:95)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:464)
 at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:420)
 at org.apache.hadoop.hive.ql.exec.HashTableDummyOperator.initializeOp(HashTableDummyOperator.java:40)
 at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:380)
 at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:145)
{code}
Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, HIVE-5771.11.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, HIVE-5771.patch.javaonly Currently there is no constant folding/propagation optimizer, all expressions are evaluated at runtime. HIVE-2470 did a great job on evaluating constants on UDF initializing phase, however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to outside. It may reduce I/O and accelerate processing if we introduce such an optimizer. -- This message was sent by Atlassian JIRA (v6.2#6252)
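To make concrete what the HIVE-5771 optimizer is meant to do: a constant defined in a subquery should fold into the outer query's predicate at compile time. An illustrative example (mine, not from the patch):
{code}
-- Without propagation, t.k = 100 is evaluated against every row at runtime.
-- With it, k is known at compile time to be the constant 100, so the
-- predicate folds to true and disappears from the plan.
SELECT t.v
FROM (SELECT 100 AS k, value AS v FROM src) t
WHERE t.k = 100;
{code}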
[jira] [Updated] (HIVE-7217) Inner join query fails in the reducer when join key file is spilled to tmp by RowContainer
[ https://issues.apache.org/jira/browse/HIVE-7217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-7217: -- Description: {code} SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) {code} The reducer fails with the following exception: {code} 2014-06-11 12:32:39,051 INFO org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 16000 rows for join key [663184] 2014-06-11 12:32:39,061 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /mnt/volume2/mapred/local/taskTracker/muthu.nivas/jobcache/job_201405301214_170634/attempt_201405301214_170634_r_00_0/work/tmp/hive-rowcontainer413460656723947992/RowContainer1053550561043043830.tmp 2014-06-11 12:32:39,237 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 2 2014-06-11 12:32:39,299 WARN org.apache.hadoop.mapred.Child: Error running child java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:644) at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758) at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:216) ... 
7 more Caused by: java.io.IOException: hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video/video_20140611071209 not a SequenceFile at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1805) at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1765) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1714) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1728) at org.apache.hadoop.mapred.SequenceFileRecordReader.init(SequenceFileRecordReader.java:43) at org.apache.hadoop.mapred.SequenceFileInputFormat.getRecordReader(SequenceFileInputFormat.java:59) at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:226) ... 12 more {code} was: SELECT T1.userid, T2.video_title FROM videoview T1 JOIN video T2 ON T1.video_id = T2.video_id WHERE T1.hourid=389567 hive show create table video; OK CREATE TABLE `video`( `video_id` int, `video_title` string, ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://elsharpynn001.prod.hulu.com:8020/hive/warehouse/video' TBLPROPERTIES ( 'numPartitions'='0', 'numFiles'='1', 'last_modified_by'='hadoop', 'last_modified_time'='1336446601', 'COLUMN_STATS_ACCURATE'='true', 'transient_lastDdlTime'='1402514051', 'numRows'='0', 'totalSize'='586773666', 'rawDataSize'='0') Time taken: 0.249 seconds, Fetched: 98 row(s) The reducer fails with the following
[jira] [Created] (HIVE-7219) Improve performance of serialization utils in ORC
Prasanth J created HIVE-7219: Summary: Improve performance of serialization utils in ORC Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
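To illustrate the kind of unrolling HIVE-7219 proposes, here is a hedged sketch; the method name and the fixed 8-bit width are illustrative, not the actual SerializationUtils code:
{code}
// Sketch: unpack len 8-bit values into out[], four per iteration instead of
// one, trading a small scalar tail loop for fewer loop-bound checks.
static void readInts8(long[] out, int offset, int len, byte[] buf, int pos) {
  int i = offset;
  final int unrolledEnd = offset + (len & ~3); // largest multiple of 4 <= len
  while (i < unrolledEnd) {
    out[i]     = buf[pos]     & 0xffL;
    out[i + 1] = buf[pos + 1] & 0xffL;
    out[i + 2] = buf[pos + 2] & 0xffL;
    out[i + 3] = buf[pos + 3] & 0xffL;
    i += 4;
    pos += 4;
  }
  for (; i < offset + len; i++, pos++) { // scalar tail for the remainder
    out[i] = buf[pos] & 0xffL;
  }
}
{code}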
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Attachment: HIVE-7188.2.patch sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode: mergepartial
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Open (was: Patch Available) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode:
[jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7219: - Attachment: HIVE-7219.1.patch Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7188: Status: Patch Available (was: Open) sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Select Operator 
expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator aggregations: sum(VALUE._col0) mode:
Review Request 22478: HIVE-7188 sum(if()) returns wrong results with vectorization
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/
---

Review request for hive, Gopal V and Jitendra Pandey.

Bugs: HIVE-7188
    https://issues.apache.org/jira/browse/HIVE-7188

Repository: hive-git

Description
---
ColAndCol.evaluate() is incorrectly implemented. Needed to rewrite the evaluate(). Also added JUnit tests.

Diffs
-
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java cb2a952
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorLogicalExpressions.java 3df7c14

Diff: https://reviews.apache.org/r/22478/diff/

Testing
---

Thanks,
Hari Sankar Sivarama Subramaniyan
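For readers following the review, the core computation a vectorized column-AND-column expression must get right looks roughly like the sketch below; it deliberately ignores the null and isRepeating paths that the real ColAndCol also has to handle, and the variable names are illustrative:
{code}
// Booleans are stored as 0/1 in long vectors, so logical AND is a bitwise &.
long[] left = leftColVector.vector;
long[] right = rightColVector.vector;
long[] out = outputColVector.vector;
if (batch.selectedInUse) {
  for (int j = 0; j < batch.size; j++) {
    int i = batch.selected[j]; // only rows that survived earlier filters
    out[i] = left[i] & right[i];
  }
} else {
  for (int i = 0; i < batch.size; i++) {
    out[i] = left[i] & right[i];
  }
}
{code}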
[jira] [Commented] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028416#comment-14028416 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-7188: - https://reviews.apache.org/r/22478 sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; returns 0 rows with vectorization turned on whereas it return 131 rows with vectorization turned off. hive source insert.sql ; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive set hive.vectorized.execution.enabled=true; hive select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapred.reduce.tasks=number Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-1 Map Reduce Map Operator Tree: TableScan alias: ttr_day0 Statistics: Num rows: 447 Data size: 3581 Basic 
stats: COMPLETE Column stats: NONE Select Operator expressions: is_returning (type: boolean), is_free (type: boolean) outputColumnNames: is_returning, is_free Statistics: Num rows: 447 Data size: 3581 Basic stats: COMPLETE Column stats: NONE Group By Operator aggregations: sum(if(((is_returning = true) and (is_free = false)), 1, 0)) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: bigint) Execution mode: vectorized Reduce Operator Tree: Group By Operator
[jira] [Updated] (HIVE-7166) Vectorization with UDFs returns incorrect results
[ https://issues.apache.org/jira/browse/HIVE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-7166: Status: Patch Available (was: Open) Vectorization with UDFs returns incorrect results - Key: HIVE-7166 URL: https://issues.apache.org/jira/browse/HIVE-7166 Project: Hive Issue Type: Bug Components: Vectorization Affects Versions: 0.13.0 Environment: Hive 0.13 with Hadoop 2.4 on a 3 node cluster Reporter: Benjamin Bowman Assignee: Hari Sankar Sivarama Subramaniyan Priority: Minor Attachments: HIVE-7166.1.patch, HIVE-7166.2.patch Using BETWEEN, a custom UDF, and vectorized query execution yields incorrect query results. Example Query: SELECT column_1 FROM table_1 WHERE column_1 BETWEEN (UDF_1 - X) and UDF_1 The following test scenario will reproduce the problem: TEST UDF (SIMPLE FUNCTION THAT TAKES NO ARGUMENTS AND RETURNS 1): package com.test; import org.apache.hadoop.hive.ql.exec.Description; import org.apache.hadoop.hive.ql.exec.UDF; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import java.lang.String; import java.lang.*; public class tenThousand extends UDF { private final LongWritable result = new LongWritable(); public LongWritable evaluate() { result.set(1); return result; } } TEST DATA (test.input): 1|CBCABC|12 2|DBCABC|13 3|EBCABC|14 4|ABCABC|15 5|BBCABC|16 6|CBCABC|17 CREATING ORC TABLE: 0: jdbc:hive2://server:10002/db create table testTabOrc (first bigint, second varchar(20), third int) partitioned by (range int) clustered by (first) sorted by (first) into 8 buckets stored as orc tblproperties (orc.compress = SNAPPY, orc.index = true); CREATE LOADING TABLE: 0: jdbc:hive2://server:10002/db create table loadingDir (first bigint, second varchar(20), third int) partitioned by (range int) row format delimited fields terminated by '|' stored as textfile; COPY IN DATA: [root@server]# hadoop fs -copyFromLocal /tmp/test.input /db/loading/. ORC DATA: [root@server]# beeline -u jdbc:hive2://server:10002/db -n root --hiveconf hive.exec.dynamic.partition.mode=nonstrict --hiveconf hive.enforce.sorting=true -e insert into table testTabOrc partition(range) select * from loadingDir; LOAD TEST FUNCTION: 0: jdbc:hive2://server:10002/db add jar /opt/hadoop/lib/testFunction.jar 0: jdbc:hive2://server:10002/db create temporary function ten_thousand as 'com.test.tenThousand'; TURN OFF VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=false; QUERY (RESULTS AS EXPECTED): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ | 1 | | 2 | | 3 | ++ 3 rows selected (15.286 seconds) TURN ON VECTORIZATION: 0: jdbc:hive2://server:10002/db set hive.vectorized.execution.enabled=true; QUERY AGAIN (WRONG RESULTS): 0: jdbc:hive2://server:10002/db select first from testTabOrc where first between ten_thousand()-1 and ten_thousand()-9995; ++ | first | ++ ++ No rows selected (17.763 seconds) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-5019: - Status: Open (was: Patch Available) Sorry, patch is out of date and no longer applies. I think this is good work though. If you want to update it against the current trunk I can take a look at it quickly so it doesn't go stale again. Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
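For anyone refreshing the patch, the transformation is the standard one; a small before/after sketch (my illustration, not code from the patch):
{code}
// Before: each += copies the whole accumulated string, so building an
// n-part string inside a loop costs O(n^2) character copies.
String cols = "";
for (String name : columnNames) {
  cols += name + ",";
}

// After: StringBuilder appends in amortized O(1), O(n) overall.
StringBuilder buf = new StringBuilder();
for (String name : columnNames) {
  buf.append(name).append(',');
}
String cols2 = buf.toString();
{code}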
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028429#comment-14028429 ] Owen O'Malley commented on HIVE-7208: - I think we need a broader refactoring. I think this change is a minor band-aid that will get in the way of the right fix. Even worse, it creates an incompatible change in the API. I think for better or worse, we need to leave the package name alone. move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5556) Pushdown join conditions
[ https://issues.apache.org/jira/browse/HIVE-5556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5556: --- Labels: TODOC13 (was: ) Pushdown join conditions Key: HIVE-5556 URL: https://issues.apache.org/jira/browse/HIVE-5556 Project: Hive Issue Type: Sub-task Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5556.1.patch, HIVE-5556.2.patch See details in HIVE- -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5459) Add --version option to hive script
[ https://issues.apache.org/jira/browse/HIVE-5459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5459: --- Labels: TODOC13 (was: ) Add --version option to hive script --- Key: HIVE-5459 URL: https://issues.apache.org/jira/browse/HIVE-5459 Project: Hive Issue Type: Bug Components: Diagnosability Affects Versions: 0.11.0, 0.12.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5459.1.patch, HIVE-5459.1.patch Hive jars already contain all the build information, similar to Hadoop. This was added as part of the HiveServer2 feature. We are still missing the command-line wrapper to extract that information. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7219) Improve performance of serialization utils in ORC
[ https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7219: - Attachment: orc-read-perf-jmh-benchmark.png Ran some benchmarks to see the reader improvements. Used JMH to run benchmarks with 10 warmup iterations and 10 benchmark iterations. Only the datasets that made use of bit packing were chosen for this benchmark. Row counts for the datasets:
inventory_col2 and inventory_col4: 11745000
twitter_census_api_id: 24556361
twitter_search_id: 9396618
github_payload_size: 3216293
aol_querylog_epoch: 3558411
random.nextLong(): 1000
Improve performance of serialization utils in ORC - Key: HIVE-7219 URL: https://issues.apache.org/jira/browse/HIVE-7219 Project: Hive Issue Type: Improvement Components: File Formats Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7219.1.patch, orc-read-perf-jmh-benchmark.png ORC uses serialization utils heavily for reading and writing data. The bitpacking and unpacking code in writeInts() and readInts() can be unrolled for better performance. Also double reader/writer performance can be improved by bulk reading/writing from/to byte array. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5294) Create collect UDF and make evaluator reusable
[ https://issues.apache.org/jira/browse/HIVE-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-5294: --- Labels: TODOC13 (was: ) Create collect UDF and make evaluator reusable -- Key: HIVE-5294 URL: https://issues.apache.org/jira/browse/HIVE-5294 Project: Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-5294.1.patch.txt, HIVE-5294.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification
[ https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-1466: --- Labels: TODOC13 (was: ) Add NULL DEFINED AS to ROW FORMAT specification --- Key: HIVE-1466 URL: https://issues.apache.org/jira/browse/HIVE-1466 Project: Hive Issue Type: New Feature Components: SQL Reporter: Adam Kramer Assignee: Prasad Mujumdar Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch NULL values are passed to transformers as a literal backslash and a literal N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. This is inconsistent. The ROW FORMAT specification of tables should be able to specify the manner in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or '\003' or whatever should apply to all instances of table export and saving. -- This message was sent by Atlassian JIRA (v6.2#6252)
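A sketch of the DDL this proposal enables; the table and column names are mine:
{code}
-- With NULL DEFINED AS, transform scripts and INSERT OVERWRITE LOCAL
-- DIRECTORY exports would agree on a single NULL marker.
CREATE TABLE events (id BIGINT, payload STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  NULL DEFINED AS '\N'
STORED AS TEXTFILE;
{code}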
[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-3976: --- Labels: TODOC13 (was: ) Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: New Feature Components: Query Processor, Types Affects Versions: 0.11.0 Reporter: Mark Grover Assignee: Xuefu Zhang Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6385) UDF degrees() doesn't take decimal as input
[ https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-6385: --- Labels: TODOC13 (was: ) UDF degrees() doesn't take decimal as input --- Key: HIVE-6385 URL: https://issues.apache.org/jira/browse/HIVE-6385 Project: Hive Issue Type: Improvement Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-6385.patch HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical UDFs, including radians(). However, such support is still missing for UDF degrees(). This fills the gap. -- This message was sent by Atlassian JIRA (v6.2#6252)
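A quick way to see the gap HIVE-6385 closes (an illustrative query assuming the patch is applied; src is the usual test table):
{code}
-- Before the fix this failed for a DECIMAL argument even though radians()
-- accepted one; with the patch both behave the same.
SELECT degrees(CAST(3.141592653589793 AS DECIMAL(20,18))) FROM src LIMIT 1; -- ~180.0
{code}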
[jira] [Updated] (HIVE-4764) Support Kerberos HTTP authentication for HiveServer2 running in http mode
[ https://issues.apache.org/jira/browse/HIVE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-4764: --- Labels: TODOC13 (was: ) Support Kerberos HTTP authentication for HiveServer2 running in http mode - Key: HIVE-4764 URL: https://issues.apache.org/jira/browse/HIVE-4764 Project: Hive Issue Type: Sub-task Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Vaibhav Gumashta Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-4764.1.patch, HIVE-4764.2.patch, HIVE-4764.3.patch, HIVE-4764.4.patch, HIVE-4764.5.patch, HIVE-4764.6.patch Support Kerberos authentication for HiveServer2 running in http mode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-2599: --- Labels: TODOC13 (was: ) Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Labels: TODOC13 Fix For: 0.13.0 Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.3.patch.txt, HIVE-2599.4.patch.txt It would be really nice for Hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Patch Available (was: Reopened) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
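To make the description concrete: templeton.hive.properties is a comma-separated list of key=value pairs in webhcat-site.xml. A hedged sketch of the change the issue argues for; the property name comes from the issue text, while the metastore URI is a placeholder and appending hive.execution.engine=tez is an assumption about the intended fix:
{code}
<property>
  <name>templeton.hive.properties</name>
  <!-- passed to the Hive client on the node that runs the submitted job;
       hive.execution.engine=tez is the addition this issue calls for -->
  <value>hive.metastore.local=false,hive.metastore.uris=thrift://metastore-host:9083,hive.execution.engine=tez</value>
</property>
{code}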
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: HIVE-7065.2.patch HIVE-7065.2.patch is an ADDITIONAL patch to fix the regression. Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028493#comment-14028493 ] Benjamin Jakobus commented on HIVE-5019: Thanks - yes, sure. I will update it over the next few days (tomorrow or over the weekend). Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Open (was: Patch Available) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.3.patch Ping. Rebased onto trunk. Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
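For context, a sketch of the underlying HBase snapshot-scan API that such an input format would presumably wrap. This is not the patch itself; the snapshot name, restore directory, and mapper are illustrative, and the call assumes the {{TableMapReduceUtil.initTableSnapshotMapperJob}} entry point from HBASE-8369:
{code}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotScanJob {

  // Trivial mapper: counts rows read from the snapshot files on HDFS.
  static class RowCounter extends TableMapper<NullWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context ctx)
        throws IOException, InterruptedException {
      ctx.getCounter("snapshot", "rows").increment(1);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(HBaseConfiguration.create(), "snapshot-scan");
    job.setJarByClass(SnapshotScanJob.class);
    // Reads the snapshot directly off HDFS, bypassing region servers.
    TableMapReduceUtil.initTableSnapshotMapperJob(
        "my_snapshot",                       // snapshot name (illustrative)
        new Scan(),                          // full scan
        RowCounter.class,
        NullWritable.class, NullWritable.class,
        job, true,
        new Path("/tmp/snapshot-restore"));  // scratch dir for restored refs
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}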
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Fix Version/s: 0.14.0 Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Fix For: 0.14.0 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Status: Patch Available (was: Open) Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7208) move SearchArgument interface into serde package
[ https://issues.apache.org/jira/browse/HIVE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028516#comment-14028516 ] Sergey Shelukhin commented on HIVE-7208: Can you elaborate on the broader refactoring? I can keep the package name; I guess that will not break the API. move SearchArgument interface into serde package Key: HIVE-7208 URL: https://issues.apache.org/jira/browse/HIVE-7208 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Attachments: HIVE-7208.patch For usage in alternative input formats/serdes, it might be useful to move the SearchArgument class to a place that is not in ql (because it's hard to depend on ql). -- This message was sent by Atlassian JIRA (v6.2#6252)
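To illustrate why non-ql consumers would want this class, a sketch of building a predicate with the 0.13-era builder (the column names are illustrative, and the exact builder signatures may differ between releases):
{code}
import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
import org.apache.hadoop.hive.ql.io.sarg.SearchArgumentFactory;

public class SargExample {
  public static void main(String[] args) {
    // Roughly: WHERE ds = '2014-06-11' AND id < 100
    SearchArgument sarg = SearchArgumentFactory.newBuilder()
        .startAnd()
          .equals("ds", "2014-06-11")
          .lessThan("id", 100)
        .end()
        .build();
    System.out.println(sarg);  // a reader can push this down to skip data
  }
}
{code}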
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: (was: HIVE-7065.2.patch) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Status: Patch Available (was: Open) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7065) Hive jobs in webhcat run in default mr mode even in Hive on Tez setup
[ https://issues.apache.org/jira/browse/HIVE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7065: - Attachment: HIVE-7065.2.patch Hive jobs in webhcat run in default mr mode even in Hive on Tez setup - Key: HIVE-7065 URL: https://issues.apache.org/jira/browse/HIVE-7065 Project: Hive Issue Type: Bug Components: Tez, WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7065.1.patch, HIVE-7065.2.patch, HIVE-7065.patch WebHCat config has templeton.hive.properties to specify Hive config properties that need to be passed to the Hive client on the node executing a job submitted through WebHCat (a Hive query, for example). This should include hive.execution.engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7188) sum(if()) returns wrong results with vectorization
[ https://issues.apache.org/jira/browse/HIVE-7188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028525#comment-14028525 ] Hive QA commented on HIVE-7188: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12649762/HIVE-7188.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5535 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas org.apache.hive.hcatalog.templeton.tool.TestTempletonUtils.testPropertiesParsing {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/440/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/440/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-440/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12649762 sum(if()) returns wrong results with vectorization -- Key: HIVE-7188 URL: https://issues.apache.org/jira/browse/HIVE-7188 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Attachments: HIVE-7188.1.patch, HIVE-7188.2.patch, hike-vector-sum-bug.tgz 1. The tgz file containing the setup is attached. 2. Run the following query: select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; It returns 0 with vectorization turned on, whereas it returns 131 with vectorization turned off. 
hive> source insert.sql; OK Time taken: 0.359 seconds OK Time taken: 0.015 seconds OK Time taken: 0.069 seconds OK Time taken: 0.176 seconds Loading data to table hike_error.ttr_day0 Table hike_error.ttr_day0 stats: [numFiles=1, numRows=0, totalSize=3581, rawDataSize=0] OK Time taken: 0.33 seconds hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134646_04790d3d-ca9a-427a-8cf9-3174536114ed.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:02,043 null map = 0%, reduce = 100% Ended Job = job_local773704964_0001 Execution completed successfully MapredLocal task succeeded OK 131 Time taken: 5.325 seconds, Fetched: 1 row(s) hive> set hive.vectorized.execution.enabled=true; hive> select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; Query ID = hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Execution log at: /var/folders/r0/9x0wltgx2nv4m4b18m71z1y4gr/T//hsubramaniyan/hsubramaniyan_20140606134747_1182c765-90ac-4a33-a8b1-760adca6bf38.log Job running in-process (local Hadoop) Hadoop job information for null: number of mappers: 0; number of reducers: 0 2014-06-06 13:47:18,604 null map = 0%, reduce = 100% Ended Job = job_local701415676_0001 Execution completed successfully MapredLocal task succeeded OK 0 Time taken: 5.52 seconds, Fetched: 1 row(s) hive> explain select sum(if(is_returning=true and is_free=false,1,0)) as unpaid_returning from hike_error.ttr_day0; OK STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 depends on stages: Stage-1
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028545#comment-14028545 ] Mithun Radhakrishnan commented on HIVE-7195: [~sershe]: listPartitions(), etc. do have a max_parts parameter. I'm exploring the possibility of reducing the thrift traffic for partition operations, for a given number of partitions. That would free us up to transfer metadata for more partitions, without fear of the metastore keeling over from heap fragmentation, etc. One way of doing that is to reduce redundancy when specifying multiple partitions. Abstracting how partitions are specified makes it possible to vary and extend this. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7220) Empty dir in external table causes issue (root_dir_external_table.q failure)
Szehon Ho created HIVE-7220: --- Summary: Empty dir in external table causes issue (root_dir_external_table.q failure) Key: HIVE-7220 URL: https://issues.apache.org/jira/browse/HIVE-7220 Project: Hive Issue Type: Bug Reporter: Szehon Ho While looking at the root_dir_external_table.q failure, which is doing a query on an external table located at root ('/'), I noticed that the latest Hadoop2 CombineFileInputFormat returns splits representing empty directories (like '/Users'), which leads to a failure in Hive's CombineFileRecordReader as it tries to open the directory for processing. I tried with an external table in a normal HDFS directory, and it also returns the same error. Looks like a real bug. -- This message was sent by Atlassian JIRA (v6.2#6252)
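A minimal sketch of the kind of guard the report implies: only plain files should ever reach the record reader. This is not the eventual fix; it assumes Hadoop 2's FileStatus.isDirectory(), and the class and method names are illustrative:
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitPathFilter {
  // Recursively collect plain files; directories (including empty ones)
  // must not be handed to CombineFileRecordReader, which would fail
  // trying to open them as data files.
  static List<Path> filesOnly(FileSystem fs, Path root) throws IOException {
    List<Path> result = new ArrayList<Path>();
    for (FileStatus status : fs.listStatus(root)) {
      if (status.isDirectory()) {
        result.addAll(filesOnly(fs, status.getPath()));
      } else {
        result.add(status.getPath());
      }
    }
    return result;
  }
}
{code}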
[jira] [Commented] (HIVE-5019) Use StringBuffer instead of += (issue 1)
[ https://issues.apache.org/jira/browse/HIVE-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028555#comment-14028555 ] Thejas M Nair commented on HIVE-5019: - In the following change, there is a bug: tmp needs to get 'reset' after the toString(). Not sure what the most efficient way to do that is (delete vs. new StringBuilder). {code} +StringBuilder tmp = new StringBuilder(); for (String key : properties.keySet()) { if (properties.get(key) != null && !duplicateProps.contains(key)) { -realProps.add("  '" + key + "'='" + - escapeHiveCommand(StringEscapeUtils.escapeJava(properties.get(key))) + "'"); +tmp.append("  '"); +tmp.append(key); +tmp.append("'='"); + tmp.append(escapeHiveCommand(StringEscapeUtils.escapeJava(properties.get(key)))); +tmp.append("'"); +realProps.add(tmp.toString()); } {code} This does make the code more verbose and less readable. I am not very convinced that in cases like the one above, the use of StringBuilder would make a difference. The compiler would usually replace + with use of StringBuilder in simple cases like this. bq. Yes, they do mostly replace + with StringBuilder.append(). However, this is not always the case, it seems. I ran some tests and they showed that using the StringBuilder when appending strings is 57% faster than using the + operator (using the StringBuilder took 122 milliseconds whilst the + operator took 284 milliseconds). Can you please upload the test code you used? Can you try running it longer (say, more than 5-10 seconds), so any noise is filtered out? Use StringBuffer instead of += (issue 1) Key: HIVE-5019 URL: https://issues.apache.org/jira/browse/HIVE-5019 Project: Hive Issue Type: Sub-task Reporter: Benjamin Jakobus Assignee: Benjamin Jakobus Attachments: HIVE-5019.2.patch.txt, HIVE-5019.3.patch.txt Issue 1 - use of StringBuilder over += inside loops. java/org/apache/hadoop/hive/ql/optimizer/physical/GenMRSkewJoinProcessor.java java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java java/org/apache/hadoop/hive/ql/plan/ConditionalResolverMergeFiles.java java/org/apache/hadoop/hive/ql/plan/PlanUtils.java java/org/apache/hadoop/hive/ql/security/authorization/BitSetCheckedAuthorizationProvider.java java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java java/org/apache/hadoop/hive/ql/udf/UDFLike.java java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFSentences.java java/org/apache/hadoop/hive/ql/udf/generic/NumDistinctValueEstimator.java java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java -- This message was sent by Atlassian JIRA (v6.2#6252)
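On the reset question above, a small self-contained sketch of the two usual options: {{setLength(0)}} keeps the existing buffer, while allocating a new StringBuilder is simpler but creates garbage on every iteration (the loop and values are illustrative):
{code}
import java.util.Arrays;
import java.util.List;

public class BuilderReuse {
  public static void main(String[] args) {
    List<String> keys = Arrays.asList("a", "b", "c");
    StringBuilder tmp = new StringBuilder();
    for (String key : keys) {
      tmp.setLength(0);  // reset without reallocating the internal buffer
      tmp.append("  '").append(key).append("'='").append("value").append("'");
      System.out.println(tmp.toString());
      // Alternative: tmp = new StringBuilder(); -- equivalent output,
      // but allocates a fresh buffer each time around the loop.
    }
  }
}
{code}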
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028562#comment-14028562 ] Sergey Shelukhin commented on HIVE-7195: Yeah, we were discussing this at Hadoop Summit with Chris and Selena (I hope I remembered the names right), and Alan. We can get rid of individual thrift partition objects and store them more efficiently. Another thing we can do, together with that approach, is make sure the APIs only populate things that are necessary; most places don't need the full partition object in all its glory. The problem with that is that all parts of partition objects are necessary somewhere, so the API will need to be augmented to explicitly say what is needed/not needed. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Attachment: HIVE-5857.2.patch Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.0, 0.13.1 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner]
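A hedged sketch of the "additional if-branch" idea from the description: if the local plan path is missing, fall back to the distributed path instead of returning null. This is not the actual patch; detecting uberization via mapreduce.job.ubertask.enable is an assumption for illustration, and openPlan is a hypothetical helper:
{code}
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PlanPathFallback {
  // Prefer the local plan path; if the job may be uberized (the
  // LocalContainerLauncher flipped the config to local mode), the
  // plan file actually still lives at the original HDFS location.
  static InputStream openPlan(Configuration conf, Path path, String name)
      throws Exception {
    boolean maybeUber =
        conf.getBoolean("mapreduce.job.ubertask.enable", false); // assumption
    try {
      return new FileInputStream(path.toUri().getPath());
    } catch (FileNotFoundException fnf) {
      if (maybeUber) {
        // Fall back to the non-local location instead of returning null.
        Path remote = new Path(name);
        return remote.getFileSystem(conf).open(remote);
      }
      throw fnf;
    }
  }
}
{code}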
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Status: In Progress (was: Patch Available) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO
[jira] [Commented] (HIVE-7195) Improve Metastore performance
[ https://issues.apache.org/jira/browse/HIVE-7195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14028564#comment-14028564 ] Sergey Shelukhin commented on HIVE-7195: And yeah, the third thing is iterators. We don't really need to keep things on the server for that; the client can send all the necessary stuff to restore the iterator. We can make it fully stateless by e.g. issuing the same queries with some added limit to get the next page, or by caching records in the metastore (which might cause memory problems). Also, presumably the iterator will have to operate within an externally called openTransaction; otherwise the set may not be consistent. Improve Metastore performance - Key: HIVE-7195 URL: https://issues.apache.org/jira/browse/HIVE-7195 Project: Hive Issue Type: Improvement Reporter: Brock Noland Priority: Critical Even with direct SQL, which significantly improves MS performance, some operations take a considerable amount of time when there are many partitions on a table. Specifically, I believe the issues are: * When a client gets all partitions, we do not send them an iterator; we create a collection of all the data and then pass the object over the network in total * Operations which require looking up data on the NN can still be slow since there is no cache of information and it's done in a serial fashion * Perhaps a tangent, but our client timeout is quite dumb. The client will time out and the server has no idea the client is gone. We should use deadlines, i.e. pass the timeout to the server so it can calculate that the client has expired. -- This message was sent by Atlassian JIRA (v6.2#6252)
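A sketch of the stateless paging idea from the comment above: the client re-issues a bounded listing and tracks its own position, so the server holds no iterator state. The PageFetcher interface is hypothetical; the real metastore API would need a way to pass an offset, and consistency across pages would still need a surrounding transaction as noted:
{code}
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedIterator<T> implements Iterator<T> {

  /** Hypothetical fetcher: returns up to 'limit' items starting at 'offset'. */
  public interface PageFetcher<T> {
    List<T> fetchPage(int offset, int limit);
  }

  private final PageFetcher<T> fetcher;
  private final int pageSize;
  private int offset = 0;
  private List<T> page;
  private int pos = 0;

  public PagedIterator(PageFetcher<T> fetcher, int pageSize) {
    this.fetcher = fetcher;
    this.pageSize = pageSize;
  }

  @Override
  public boolean hasNext() {
    if (page == null || pos == page.size()) {
      // Each page is an independent request; the server keeps no cursor.
      page = fetcher.fetchPage(offset, pageSize);
      offset += page.size();
      pos = 0;
    }
    return pos < page.size();
  }

  @Override
  public T next() {
    if (!hasNext()) throw new NoSuchElementException();
    return page.get(pos++);
  }

  @Override
  public void remove() { throw new UnsupportedOperationException(); }
}
{code}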
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20
[jira] [Updated] (HIVE-5857) Reduce tasks do not work in uber mode in YARN
[ https://issues.apache.org/jira/browse/HIVE-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Kawa updated HIVE-5857: Status: Patch Available (was: In Progress) Reduce tasks do not work in uber mode in YARN - Key: HIVE-5857 URL: https://issues.apache.org/jira/browse/HIVE-5857 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.1, 0.13.0, 0.12.0 Reporter: Adam Kawa Assignee: Adam Kawa Priority: Critical Labels: plan, uber-jar, uberization, yarn Fix For: 0.13.0 Attachments: HIVE-5857.1.patch.txt, HIVE-5857.2.patch A Hive query fails when it tries to run a reduce task in uber mode in YARN. A NullPointerException is thrown in the ExecReducer.configure method, because the plan file (reduce.xml) for a reduce task is not found. The Utilities.getBaseWork method is expected to return a BaseWork object, but it returns NULL due to a FileNotFoundException. {code} // org.apache.hadoop.hive.ql.exec.Utilities public static BaseWork getBaseWork(Configuration conf, String name) { ... try { ... if (gWork == null) { Path localPath; if (ShimLoader.getHadoopShims().isLocalMode(conf)) { localPath = path; } else { localPath = new Path(name); } InputStream in = new FileInputStream(localPath.toUri().getPath()); BaseWork ret = deserializePlan(in); } return gWork; } catch (FileNotFoundException fnf) { // happens. e.g.: no reduce work. LOG.debug("No plan file found: " + path); return null; } ... } {code} It happens because the ShimLoader.getHadoopShims().isLocalMode(conf) method returns true: immediately before running a reduce task, org.apache.hadoop.mapred.LocalContainerLauncher changes its configuration to local mode (mapreduce.framework.name is changed from "yarn" to "local"). On the other hand, map tasks run successfully, because their configuration is not changed and still remains "yarn". {code} // org.apache.hadoop.mapred.LocalContainerLauncher private void runSubtask(..) { ... conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME); conf.set(MRConfig.MASTER_ADDRESS, "local"); // bypass shuffle ReduceTask reduce = (ReduceTask)task; reduce.setConf(conf); reduce.run(conf, umbilical); } {code} A super quick fix could be just an additional if-branch, where we check if we are running a reduce task in uber mode, and then look for the plan file in a different location. 
*Java stacktrace* {code} 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: hdfs://namenode.c.lon.spotify.net:54310/var/tmp/kawaa/hive_2013-11-20_00-50-43_888_3938384086824086680-2/-mr-10003/e3caacf6-15d6-4987-b186-d2906791b5b0/reduce.xml 2013-11-20 00:50:56,862 WARN [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Exception running local (uberized) 'child' : java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:427) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:225) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:116) ... 12 more 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1384392632998_34791_r_00_0 2013-11-20 00:50:56,862 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1384392632998_34791_r_00_0 is : 0.0 2013-11-20 00:50:56,862 INFO
Re: Review Request 22478: HIVE-7188 sum(if()) returns wrong results with vectorization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/#review45438 --- ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java https://reviews.apache.org/r/22478/#comment80300 Please confirm hive semantics that NULL AND FALSE is FALSE and not NULL. - Jitendra Pandey On June 11, 2014, 9:23 p.m., Hari Sankar Sivarama Subramaniyan wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22478/ --- (Updated June 11, 2014, 9:23 p.m.) Review request for hive, Gopal V and Jitendra Pandey. Bugs: HIVE-7188 https://issues.apache.org/jira/browse/HIVE-7188 Repository: hive-git Description --- ColAndCol.evaluate() is incorrectly implemented. Needed to rewrite the evaluate(). Also added junit tests. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/ColAndCol.java cb2a952 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/expressions/TestVectorLogicalExpressions.java 3df7c14 Diff: https://reviews.apache.org/r/22478/diff/ Testing --- Thanks, Hari Sankar Sivarama Subramaniyan
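On the reviewer's question: under SQL three-valued logic, AND is FALSE if either side is FALSE (so NULL AND FALSE is FALSE), NULL if either side is NULL and neither is FALSE, and TRUE otherwise. A minimal reference implementation on boxed Booleans, independent of the vectorized code under review (here null models SQL NULL):
{code}
public class ThreeValuedLogic {
  // SQL AND over nullable booleans.
  static Boolean and(Boolean a, Boolean b) {
    if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
      return Boolean.FALSE;  // FALSE dominates, even against NULL
    }
    if (a == null || b == null) {
      return null;           // NULL AND TRUE => NULL
    }
    return Boolean.TRUE;     // TRUE AND TRUE
  }

  public static void main(String[] args) {
    System.out.println(and(null, Boolean.FALSE));         // false
    System.out.println(and(null, Boolean.TRUE));          // null
    System.out.println(and(Boolean.TRUE, Boolean.TRUE));  // true
  }
}
{code}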