[jira] [Commented] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995478#comment-13995478 ] Prasanth J commented on HIVE-7042: -- Thanks Ashutosh! Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7042.1.patch, HIVE-7042.1.patch.txt stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7035) Templeton returns 500 for user errors - when job cannot be found
[ https://issues.apache.org/jira/browse/HIVE-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7035: Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution Eugene! Templeton returns 500 for user errors - when job cannot be found Key: HIVE-7035 URL: https://issues.apache.org/jira/browse/HIVE-7035 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7035.patch curl -i 'http://localhost:50111/templeton/v1/jobs/job_139949638_00011?user.name=ekoifman' should return HTTP Status code 4xx when no such job exists; it currently returns 500. {noformat} {error:org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_201304291205_0015' doesn't exist in RM.\r\n\tat org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:247)\r\n\tat org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)\r\n\tat org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:241)\r\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)\r\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)\r\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2053)\r\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)\r\n\tat java.security.AccessController.doPrivileged(Native Method)\r\n\tat javax.security.auth.Subject.doAs(Subject.java:415)\r\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)\r\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2047)\r\n} {noformat} NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated HIVE-5733: - Description: Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. was: Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shadowed version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shadowing any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. 
Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shaded version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shading any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-4719) EmbeddedLockManager should be shared to all clients
[ https://issues.apache.org/jira/browse/HIVE-4719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-4719: Attachment: HIVE-4719.6.patch.txt EmbeddedLockManager should be shared to all clients --- Key: HIVE-4719 URL: https://issues.apache.org/jira/browse/HIVE-4719 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-4719.5.patch.txt, HIVE-4719.6.patch.txt, HIVE-4719.D11229.1.patch, HIVE-4719.D11229.2.patch, HIVE-4719.D11229.3.patch, HIVE-4719.D11229.4.patch Currently, EmbeddedLockManager is created per Driver instance, so locking has no meaning. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated May 9, 2014, 12:23 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Changes --- Rebased with the latest commit. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive cannot create an Avro-based table from the HQL create table command alone; it currently requires the user to specify an Avro schema literal or a schema file name. In many cases, this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema. Diffs (updated) - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_nested_complex.q PRE-CREATION ql/src/test/queries/clientpositive/avro_nullable_fields.q f90ceb9 ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_nested_complex.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_nullable_fields.q.out 77a6a2e ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 4564e75 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 67d5570 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new Java test class for the new Java class. Added a new test case to an existing Java test class. In addition, there are 4 .q files for testing multiple use cases. Thanks, Mohammad Islam
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Open (was: Patch Available) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6374) Hive job submitted with non-default name node (fs.default.name) doesn't process locations properly
[ https://issues.apache.org/jira/browse/HIVE-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995914#comment-13995914 ] Ashutosh Chauhan commented on HIVE-6374: +1 Hive job submitted with non-default name node (fs.default.name) doesn't process locations properly --- Key: HIVE-6374 URL: https://issues.apache.org/jira/browse/HIVE-6374 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.11.0, 0.12.0, 0.13.0 Environment: Any Reporter: Benjamin Zhitomirsky Assignee: Benjamin Zhitomirsky Attachments: Design of the fix HIVE-6374.docx, hive-6374.1.patch, hive-6374.3.patch, hive-6374.patch Original Estimate: 168h Remaining Estimate: 168h Create table/index/database and add partition DDL don't work properly if all of the following conditions are true: - Metastore service is used - fs.default.name is specified and it differs from the default one - Location is not specified or is specified as a URI that is not fully qualified The root cause of this behavior is that the Hive client doesn't pass its configuration context to the metastore service, which tries to resolve the paths. The fix is to resolve the path in the Hive client if fs.default.name is specified and it differs from the default one (this is much easier than starting to pass the context, which would be a major change). The CR will be submitted shortly after tests are done. -- This message was sent by Atlassian JIRA (v6.2#6252)
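The client-side resolution described in HIVE-6374 above (qualifying a location on the Hive client before it reaches the metastore) might look roughly like the sketch below. This is not the code from the attached patches: the class LocationQualifierSketch, the method qualify and the host names are hypothetical, only java.net.URI is used, and the location is assumed to be an absolute path.

```java
import java.net.URI;

final class LocationQualifierSketch {
    // Hypothetical helper: qualify a location against the client's default file system URI.
    // Assumes 'location' is either fully qualified or an absolute path such as /warehouse/t1.
    static URI qualify(URI defaultFs, String location) {
        URI loc = URI.create(location);
        if (loc.getScheme() != null && loc.getAuthority() != null) {
            return loc; // already fully qualified, leave untouched
        }
        // Borrow scheme and authority from the client-side default file system.
        return URI.create(defaultFs.getScheme() + "://" + defaultFs.getAuthority() + loc.getPath());
    }

    public static void main(String[] args) {
        URI clientFs = URI.create("hdfs://nn2.example.com:8020"); // made-up non-default name node
        System.out.println(qualify(clientFs, "/warehouse/t1"));
        System.out.println(qualify(clientFs, "hdfs://nn1.example.com:8020/warehouse/t1"));
    }
}
```

With a location qualified this way, the metastore no longer needs the client's configuration to work out which name node was meant.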
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Open (was: Patch Available) When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated HIVE-5733: - Description: Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shadowed version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shadowing any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. was: Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shadowing all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shadowed version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shadowing any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. 
Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] is shading all the dependencies (= the jar contains all Hive's dependencies). As other projects that are depending on Hive might be use slightly different version of the dependencies, it can easily happens that Hive's shadowed version will be used instead which leads to very time consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible publish {{hive-exec}} jar that will be build without shadowing any dependency? For example [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] is having classifier nodeps that represents artifact without any dependencies. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 18492: HIVE-6473: Allow writing HFiles via HBaseStorageHandler table
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18492/ --- (Updated May 13, 2014, 4:07 a.m.) Review request for hive. Changes --- Updating diff with HIVE-6473.1.patch.txt from JIRA. Bugs: HIVE-6473 https://issues.apache.org/jira/browse/HIVE-6473 Repository: hive-git Description --- From the JIRA: Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. Diffs (updated) - hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java be1210e hbase-handler/src/test/queries/negative/generatehfiles_require_family_path.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_bulk.m f8bb47d hbase-handler/src/test/queries/positive/hbase_bulk.q PRE-CREATION hbase-handler/src/test/queries/positive/hbase_handler_bulk.q PRE-CREATION hbase-handler/src/test/results/negative/generatehfiles_require_family_path.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_bulk.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_handler_bulk.q.out PRE-CREATION Diff: https://reviews.apache.org/r/18492/diff/ Testing --- Thanks, nick dimiduk
[jira] [Commented] (HIVE-5342) Remove pre hadoop-0.20.0 related codes
[ https://issues.apache.org/jira/browse/HIVE-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996028#comment-13996028 ] Hive QA commented on HIVE-5342: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12644454/HIVE-5342.2.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/184/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/184/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12644454 Remove pre hadoop-0.20.0 related codes -- Key: HIVE-5342 URL: https://issues.apache.org/jira/browse/HIVE-5342 Project: Hive Issue Type: Task Reporter: Navis Assignee: Jason Dere Priority: Trivial Attachments: D13047.1.patch, HIVE-5342.1.patch, HIVE-5342.2.patch Recently, we discussed not supporting hadoop-0.20.0. Whether or not that happens, the 0.17-related code should be removed first. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6601) alter database commands should support schema synonym keyword
[ https://issues.apache.org/jira/browse/HIVE-6601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995393#comment-13995393 ] Thejas M Nair commented on HIVE-6601: - This is the case with the other alter database command as well - ALTER DATABASE database_name SET DBPROPERTIES alter database commands should support schema synonym keyword - Key: HIVE-6601 URL: https://issues.apache.org/jira/browse/HIVE-6601 Project: Hive Issue Type: Bug Reporter: Thejas M Nair It should be possible to use alter schema as an alternative to alter database. But the syntax is not currently supported. {code} alter schema db1 set owner user x; NoViableAltException(215@[]) FAILED: ParseException line 1:6 cannot recognize input near 'schema' 'db1' 'set' in alter statement {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7051) Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION
Prasanth J created HIVE-7051: Summary: Display partition level column stats in DESCRIBE EXTENDED/FORMATTED PARTITION Key: HIVE-7051 URL: https://issues.apache.org/jira/browse/HIVE-7051 Project: Hive Issue Type: Bug Reporter: Prasanth J Same as HIVE-7050 but for partitions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6957) SQL authorization does not work with HS2 binary mode and Kerberos auth
[ https://issues.apache.org/jira/browse/HIVE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6957: --- Fix Version/s: 0.13.1 SQL authorization does not work with HS2 binary mode and Kerberos auth -- Key: HIVE-6957 URL: https://issues.apache.org/jira/browse/HIVE-6957 Project: Hive Issue Type: Bug Components: Authorization, HiveServer2 Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0, 0.13.1 Attachments: HIVE-6957.04-branch.0.13.patch, HIVE-6957.1.patch, HIVE-6957.2.patch, HIVE-6957.3.patch, HIVE-6957.4.patch In HiveServer2, when Kerberos auth and binary transport modes are used, the user name that gets passed on to authorization is the long Kerberos username. The usernames used in grant/revoke statements tend to be the short usernames. Authorization also fails for statements that involve a URI, as the authorization mode checks the file system permissions for the given user. It does not recognize that the given long username actually owns the file or belongs to the group that owns the file. -- This message was sent by Atlassian JIRA (v6.2#6252)
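To illustrate the long versus short username mismatch described in HIVE-6957 above, here is a minimal sketch that strips the instance and realm parts from a Kerberos principal. The class ShortNameSketch and its method are hypothetical and are not taken from the attached patches; a real Hadoop deployment maps principals to short names through its configured hadoop.security.auth_to_local rules, which this simplification ignores.

```java
final class ShortNameSketch {
    // Simplified mapping: keep only the primary component of "primary/instance@REALM".
    static String shortName(String principal) {
        int cut = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            cut = Math.min(cut, slash);
        }
        if (at >= 0) {
            cut = Math.min(cut, at);
        }
        return principal.substring(0, cut);
    }

    public static void main(String[] args) {
        // Made-up principals, for illustration only.
        System.out.println(shortName("alice@EXAMPLE.COM"));                  // alice
        System.out.println(shortName("hive/host1.example.com@EXAMPLE.COM")); // hive
    }
}
```

Comparing the short form against the names used in grant/revoke statements and against file ownership is what makes the two sides line up.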
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Status: Patch Available (was: Open) Re-uploading as Hive QA failed to run tests. When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch, HIVE-7043.4.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6394) Implement Timestmap in ParquetSerde
[ https://issues.apache.org/jira/browse/HIVE-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-6394: Attachment: HIVE-6394.2.patch Adding unit tests. Implement Timestmap in ParquetSerde --- Key: HIVE-6394 URL: https://issues.apache.org/jira/browse/HIVE-6394 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Jarek Jarcec Cecho Assignee: Szehon Ho Labels: Parquet Attachments: HIVE-6394.2.patch, HIVE-6394.patch This JIRA is to implement timestamp support in Parquet SerDe. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995897#comment-13995897 ] Mohammad Kamrul Islam commented on HIVE-3159: - The recently committed HIVE-5823 introduced a bug. I created a separate JIRA (HIVE-7049) to address this and uploaded a patch for it. Update AvroSerde to determine schema of new tables -- Key: HIVE-3159 URL: https://issues.apache.org/jira/browse/HIVE-3159 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.12.0 Reporter: Jakob Homan Assignee: Mohammad Kamrul Islam Attachments: HIVE-3159.10.patch, HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159.7.patch, HIVE-3159.9.patch, HIVE-3159v1.patch Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6772) Virtual columns when used with Lateral View Explode results in SemanticException [Error 10004]
[ https://issues.apache.org/jira/browse/HIVE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992558#comment-13992558 ] Navis commented on HIVE-6772: - I think this is fixed by HIVE-3226 and others. Virtual columns when used with Lateral View Explode results in SemanticException [Error 10004] -- Key: HIVE-6772 URL: https://issues.apache.org/jira/browse/HIVE-6772 Project: Hive Issue Type: Bug Affects Versions: 0.9.0 Environment: Red Hat Enterprise Linux Server release 6.3 (Santiago) Hadoop 2.0.0-cdh4.1.2 Hive 0.9.0 Reporter: Steve Ogden Priority: Minor When using the virtual columns with 'lateral view explode', I get the following error: FAILED: SemanticException [Error 10004]: Line 3:22 Invalid table alias or column reference 'INPUT__FILE__NAME': (possible column names are: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15, _col16, _col17, _col18, _col19, _col20, _col21, _col22) Here is the query: select newMd5(concat(INPUT__FILE__NAME,BLOCK__OFFSET__INSIDE__FILE)) ukey, flat_ric_cd as ric_cd from edwpoc.ts_rtd_gs_stg lateral view explode(split(ric_cd,',')) subView as flat_ric_cd -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Attachment: HIVE-7033.3.patch Thanks for pointing that out Ashutosh! HIVE-7033.3.patch - changes to avoid TOCTOU issue. grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
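The TOCTOU (time-of-check to time-of-use) concern mentioned for HIVE-7033.3.patch above is that a role could be dropped between the "does the role exist" check and the grant itself. The sketch below shows one generic way to close that window by doing the check and the update in a single atomic step. It is only an in-memory illustration with hypothetical names (GrantSketch, grantsByRole); it is not how the patch or the metastore implements it.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class GrantSketch {
    // Hypothetical in-memory store: role name -> privileges granted to that role.
    private final ConcurrentHashMap<String, Set<String>> grantsByRole = new ConcurrentHashMap<>();

    void createRole(String role) {
        grantsByRole.putIfAbsent(role, ConcurrentHashMap.<String>newKeySet());
    }

    void dropRole(String role) {
        grantsByRole.remove(role);
    }

    // The existence check and the grant happen in one atomic computeIfPresent call,
    // so a concurrent dropRole cannot slip in between them.
    void grant(String role, String privilege) {
        Set<String> updated = grantsByRole.computeIfPresent(role, (r, privileges) -> {
            privileges.add(privilege);
            return privileges;
        });
        if (updated == null) {
            throw new IllegalArgumentException("Role does not exist: " + role);
        }
    }

    boolean hasGrant(String role, String privilege) {
        Set<String> privileges = grantsByRole.get(role);
        return privileges != null && privileges.contains(privilege);
    }

    public static void main(String[] args) {
        GrantSketch store = new GrantSketch();
        store.createRole("analyst");
        store.grant("analyst", "ALL on t1");
        try {
            store.grant("nosuchrole", "ALL on t1");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In a database-backed store the same effect would normally come from running the check and the insert inside one transaction.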
[jira] [Commented] (HIVE-7016) Hive returns wrong results when execute UDF on top of DISTINCT column
[ https://issues.apache.org/jira/browse/HIVE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993829#comment-13993829 ] Ashutosh Chauhan commented on HIVE-7016: +1 Hive returns wrong results when execute UDF on top of DISTINCT column - Key: HIVE-7016 URL: https://issues.apache.org/jira/browse/HIVE-7016 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.13.1 Reporter: Selina Zhang Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7016.1.patch.txt The following query returns wrong result: select hash(distinct value) from table; This kind of query should be identified as syntax error. However, Hive ignores DISTINCT and returns the result. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7013) Partition of type int has ambiguity for path like field=01
[ https://issues.apache.org/jira/browse/HIVE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995373#comment-13995373 ] Ashutosh Chauhan commented on HIVE-7013: [~reno] There are a number of improvements in this area in later versions of Hive. 0.9 is too old. Can you try this with 0.13? Partition of type int has ambiguity for path like field=01 -- Key: HIVE-7013 URL: https://issues.apache.org/jira/browse/HIVE-7013 Project: Hive Issue Type: Bug Reporter: Peng Zhang 1. store data in path like /hive/table/year=2014/month=01/day=01 2. create table partitioned by (year int, month int, day int) 3. add partition(year=2014, month=1, day=1) add partition(year=2014, month=01, day=01) This will create two partitions, with locations /year=2014/month=1/day=1 and year=2014/month=01/day=01 respectively. 4. select where month=1 = no data select where month=01 = no data select where month=01 = OK I tested this scenario in 0.9, and add partition(year=2014, month=1) with select where month=1 works. -- This message was sent by Atlassian JIRA (v6.2#6252)
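The ambiguity reported in HIVE-7013 above comes from the fact that "01" and "1" are different strings (and therefore different directory names) but the same int value. The sketch below shows the idea of normalizing the textual value of an int-typed partition column before building a partition path. The class PartitionValueSketch and its methods are hypothetical; this is not Hive's implementation, and it does not deal with directories that already exist under the non-normalized name.

```java
final class PartitionValueSketch {
    // Normalize the textual value of an int-typed partition column, e.g. "01" becomes "1".
    static String canonicalInt(String raw) {
        return Integer.toString(Integer.parseInt(raw.trim()));
    }

    // Build a partition directory name such as year=2014/month=1/day=1 from normalized values.
    static String partitionPath(String[] keys, String[] rawValues) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < keys.length; i++) {
            if (i > 0) {
                sb.append('/');
            }
            sb.append(keys[i]).append('=').append(canonicalInt(rawValues[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] keys = {"year", "month", "day"};
        // Both spellings from the report map to the same normalized path.
        System.out.println(partitionPath(keys, new String[] {"2014", "01", "01"}));
        System.out.println(partitionPath(keys, new String[] {"2014", "1", "1"}));
    }
}
```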
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Attachment: HIVE-6473.1.patch.txt Rebased to trunk, addressing RB comments, fixed broken tests. Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch.txt Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6473) Allow writing HFiles via HBaseStorageHandler table
[ https://issues.apache.org/jira/browse/HIVE-6473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6473: --- Status: Patch Available (was: Open) Allow writing HFiles via HBaseStorageHandler table -- Key: HIVE-6473 URL: https://issues.apache.org/jira/browse/HIVE-6473 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Assignee: Nick Dimiduk Attachments: HIVE-6473.0.patch.txt, HIVE-6473.1.patch.txt Generating HFiles for bulkload into HBase could be more convenient. Right now we require the user to register a new table with the appropriate output format. This patch allows the exact same functionality, but through an existing table managed by the HBaseStorageHandler. -- This message was sent by Atlassian JIRA (v6.2#6252)
How to remote debug WebHCat?
Hi Folks, Is there a way to remote debug WebHCat? If so, how do I enable remote debugging? Thanks, Na
[jira] [Updated] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-7049: Attachment: HIVE-7049.1.patch patch uploaded Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) the file schema and record schema are not the same and 2) the record schema is nullable but the file schema is not. The likely code location is in class AvroDeserializer {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not be the same because of my debug version of the code). {noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
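The failure mode in HIVE-7049 above (asking a non-union file schema for its union branches) can be illustrated without the Avro library. The sketch below uses a toy schema type, ToySchema, as a stand-in for an Avro schema and checks the file schema before unwrapping it. All names here are hypothetical; this is neither the Avro API nor the change in HIVE-7049.1.patch, only an illustration of guarding both sides.

```java
import java.util.Arrays;
import java.util.List;

final class NullableGuardSketch {
    // Toy stand-in for an Avro schema: either a primitive (branches == null) or a union.
    static final class ToySchema {
        final String name;
        final List<ToySchema> branches;

        private ToySchema(String name, List<ToySchema> branches) {
            this.name = name;
            this.branches = branches;
        }

        static ToySchema primitive(String name) {
            return new ToySchema(name, null);
        }

        static ToySchema union(ToySchema... branches) {
            return new ToySchema("union", Arrays.asList(branches));
        }

        boolean isUnion() {
            return branches != null;
        }
    }

    // Here "nullable" means a two-branch union in which one branch is "null".
    static boolean isNullable(ToySchema schema) {
        if (!schema.isUnion() || schema.branches.size() != 2) {
            return false;
        }
        return schema.branches.get(0).name.equals("null") || schema.branches.get(1).name.equals("null");
    }

    static ToySchema nonNullBranch(ToySchema nullableUnion) {
        ToySchema first = nullableUnion.branches.get(0);
        return first.name.equals("null") ? nullableUnion.branches.get(1) : first;
    }

    // Pick the file-side schema for the value without assuming the file schema is a union.
    static ToySchema fileSchemaForValue(ToySchema fileSchema, ToySchema recordSchema) {
        if (isNullable(recordSchema) && isNullable(fileSchema)) {
            return nonNullBranch(fileSchema);
        }
        return fileSchema; // plain file schema: use it as-is instead of unwrapping it
    }

    public static void main(String[] args) {
        ToySchema record = ToySchema.union(ToySchema.primitive("null"), ToySchema.primitive("string"));
        ToySchema file = ToySchema.primitive("string");
        System.out.println(fileSchemaForValue(file, record).name);
    }
}
```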
[jira] [Commented] (HIVE-6910) Invalid column access info for partitioned table
[ https://issues.apache.org/jira/browse/HIVE-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995557#comment-13995557 ] Ashutosh Chauhan commented on HIVE-6910: Patch looks good. But looks like there are few changes which may not be essential for the patch. Left comments on RB. Invalid column access info for partitioned table Key: HIVE-6910 URL: https://issues.apache.org/jira/browse/HIVE-6910 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.11.0, 0.12.0, 0.13.0 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6910.1.patch.txt, HIVE-6910.2.patch.txt, HIVE-6910.3.patch.txt, HIVE-6910.4.patch.txt From http://www.mail-archive.com/user@hive.apache.org/msg11324.html neededColumnIDs in TS is only for non-partition columns. But ColumnAccessAnalyzer is calculating it on all columns. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7033: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Thejas! grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.14.0 Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch, HIVE-7033.3.patch, HIVE-7033.4.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole; -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7031) Utiltites.createEmptyFile uses File.Separator instead of Path.Separator to create an empty file in HDFS
[ https://issues.apache.org/jira/browse/HIVE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993499#comment-13993499 ] Hive QA commented on HIVE-7031: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12643845/HIVE-7031.1.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5433 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_partscan_1_23 org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/150/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/150/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12643845 Utiltites.createEmptyFile uses File.Separator instead of Path.Separator to create an empty file in HDFS --- Key: HIVE-7031 URL: https://issues.apache.org/jira/browse/HIVE-7031 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Fix For: 0.14.0 Attachments: HIVE-7031.1.patch This leads to inconsistent HDFS naming for empty partitions/tables, where a file might be named hdfs://headnode0:9000/hive/scratch/hive_2014-04-07_22-39-52_649_4046112898053848089-1/-mr-10010\0 on the Windows operating system. -- This message was sent by Atlassian JIRA (v6.2#6252)
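The separator mix-up described above can be illustrated with a minimal sketch. It is not taken from the HIVE-7031 patch: the class HdfsPathJoin and the method child are hypothetical names. The only fact relied on is that HDFS paths always use "/" (the value of Hadoop's Path.SEPARATOR), whereas java.io.File.separator is a backslash on Windows.

```java
/** Hypothetical helper (not from the HIVE-7031 patch): joins HDFS path parts with "/". */
final class HdfsPathJoin {
    // HDFS always uses "/", which is also the value of Hadoop's Path.SEPARATOR.
    // java.io.File.separator must not be used here: it is a backslash on Windows.
    static final String HDFS_SEPARATOR = "/";

    /** Appends a child name to a parent path without doubling the separator. */
    static String child(String parent, String name) {
        if (parent.endsWith(HDFS_SEPARATOR)) {
            return parent + name;
        }
        return parent + HDFS_SEPARATOR + name;
    }

    public static void main(String[] args) {
        // The scratch directory below is shortened from the example in the issue.
        System.out.println(child("hdfs://headnode0:9000/hive/scratch/-mr-10010", "0"));
    }
}
```

With this kind of helper the empty file would end up under .../-mr-10010/0 on every client platform, rather than .../-mr-10010\0 when the client runs on Windows.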
[jira] [Updated] (HIVE-6908) TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures
[ https://issues.apache.org/jira/browse/HIVE-6908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6908: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Szehon! TestThriftBinaryCLIService.testExecuteStatementAsync has intermittent failures -- Key: HIVE-6908 URL: https://issues.apache.org/jira/browse/HIVE-6908 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Fix For: 0.14.0 Attachments: HIVE-6908.patch This has failed sometimes in the pre-commit tests. ThriftCLIServiceTest.testExecuteStatementAsync runs two statements. They are given a 100-second timeout in total; not sure if that is intentional. As the first is a select query, it takes the majority of the time. The second statement (create table) should be quicker, but it sometimes fails because the timeout is already mostly used up. The timeout should probably be reset after the first statement. If the operation finishes before the timeout, the reset won't have any effect, as the wait loop will break out early. -- This message was sent by Atlassian JIRA (v6.2#6252)
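The idea of resetting the timeout per statement can be sketched as follows. This is only an illustration, not the code from HIVE-6908.patch: PollWithDeadline and waitUntil are hypothetical names, and the completion check is passed in as a plain BooleanSupplier instead of a real Thrift operation-status call. Because the deadline is computed inside each call, every statement gets its own full timeout.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper: each call starts a fresh deadline. */
final class PollWithDeadline {
    /**
     * Polls until done reports true or timeoutMillis has elapsed.
     * Returns true if the operation finished in time, false on timeout or interrupt.
     */
    static boolean waitUntil(BooleanSupplier done, long timeoutMillis, long pollMillis) {
        // The deadline is local to this call, so a slow first statement
        // cannot eat into the time budget of the second one.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (done.getAsBoolean()) {
                return true; // finished early: break out without waiting further
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // budget for this statement is used up
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Two "statements", each with its own 100 ms budget.
        System.out.println(waitUntil(() -> true, 100L, 10L));
        System.out.println(waitUntil(() -> false, 100L, 10L));
    }
}
```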
[jira] [Commented] (HIVE-7049) Unable to deserialize AVRO data when file schema and record schema are different and nullable
[ https://issues.apache.org/jira/browse/HIVE-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996660#comment-13996660 ] Xuefu Zhang commented on HIVE-7049: --- Thanks for bringing this up. I'm wondering if the situation you described is an issue of schema incompatibility rather than a bug. The record schema says that a field is a union (nullable), while the file schema says that the field is not a union, which seems to suggest that the data is not compatible with the schema. While we may need to provide a better error message for this, ignoring the file schema (by passing NULL down) will very likely break decimal support, which needs the file schema to read data correctly. Unable to deserialize AVRO data when file schema and record schema are different and nullable - Key: HIVE-7049 URL: https://issues.apache.org/jira/browse/HIVE-7049 Project: Hive Issue Type: Bug Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-7049.1.patch It mainly happens when 1) the file schema and the record schema are not the same and 2) the record schema is nullable but the file schema is not. The likely code location is in class AvroDeserializer: {noformat} if(AvroSerdeUtils.isNullableType(recordSchema)) { return deserializeNullableUnion(datum, fileSchema, recordSchema, columnType); } {noformat} In the above code snippet, recordSchema is checked for nullability, but the file schema is not. I tested with these values: {noformat} recordSchema= [null,string] fileSchema= string {noformat} And I got the following exception (line numbers might not match because of my debugging changes):
{noformat} org.apache.avro.AvroRuntimeException: Not a union: string at org.apache.avro.Schema.getTypes(Schema.java:272) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeNullableUnion(AvroDeserializer.java:275) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.worker(AvroDeserializer.java:205) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.workerBase(AvroDeserializer.java:188) at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserialize(AvroDeserializer.java:174) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.verifyNullableType(TestAvroDeserializer.java:487) at org.apache.hadoop.hive.serde2.avro.TestAvroDeserializer.canDeserializeNullableTypes(TestAvroDeserializer.java:407) {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
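One way to picture the missing check is the toy model below. It is a sketch under stated assumptions, not the HIVE-7049 patch and not the Avro API: ToySchema is a made-up stand-in for an Avro schema, and NullableSchemaGuard.effectiveFileSchema is a hypothetical helper. The point it tries to show is that the file schema is only unwrapped when it is itself a nullable union, and is otherwise kept as-is rather than dropped, which is in line with the concern above that the file schema is still needed (for example for decimals).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for an Avro schema: a primitive type name or a union of branches. */
final class ToySchema {
    final String type;               // e.g. "string", "null", or "union"
    final List<ToySchema> branches;  // non-empty only for unions

    private ToySchema(String type, List<ToySchema> branches) {
        this.type = type;
        this.branches = branches;
    }

    static ToySchema primitive(String type) {
        return new ToySchema(type, Collections.<ToySchema>emptyList());
    }

    static ToySchema union(ToySchema... branches) {
        return new ToySchema("union", Arrays.asList(branches));
    }

    /** True for a two-branch union in which exactly one branch is "null". */
    boolean isNullable() {
        if (!"union".equals(type) || branches.size() != 2) {
            return false;
        }
        return "null".equals(branches.get(0).type) ^ "null".equals(branches.get(1).type);
    }

    /** For a nullable union, returns the non-null branch; otherwise returns this schema. */
    ToySchema nonNullBranch() {
        if (!isNullable()) {
            return this;
        }
        return "null".equals(branches.get(0).type) ? branches.get(1) : branches.get(0);
    }
}

/** Hypothetical guard: never asks a non-union file schema for its union branches. */
final class NullableSchemaGuard {
    static ToySchema effectiveFileSchema(ToySchema fileSchema, ToySchema recordSchema) {
        if (!recordSchema.isNullable()) {
            return fileSchema; // nothing to unwrap on either side
        }
        // Unwraps [null, T] to T, and leaves a plain T untouched.
        return fileSchema.nonNullBranch();
    }

    public static void main(String[] args) {
        ToySchema record = ToySchema.union(ToySchema.primitive("null"), ToySchema.primitive("string"));
        ToySchema file = ToySchema.primitive("string");
        System.out.println(effectiveFileSchema(file, record).type);
    }
}
```

Whether such a mismatch should be tolerated at all, or reported with a clearer error, is exactly the open question in the comment above; the sketch only shows the tolerant variant.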
[jira] [Updated] (HIVE-7012) Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer
[ https://issues.apache.org/jira/browse/HIVE-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7012: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Navis! Wrong RS de-duplication in the ReduceSinkDeDuplication Optimizer Key: HIVE-7012 URL: https://issues.apache.org/jira/browse/HIVE-7012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Sun Rui Assignee: Navis Fix For: 0.14.0 Attachments: HIVE-7012.1.patch.txt, HIVE-7012.2.patch.txt With HIVE 0.13.0, run the following test case: {code:sql} create table src(key bigint, value string); select count(distinct key) as col0 from src order by col0; {code} The following exception will be thrown: {noformat} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121) at org.apache.hadoop.mapred.Child.main(Child.java:249) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
9 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:173) ... 14 more Caused by: java.lang.RuntimeException: cannot find field _col0 from [0:reducesinkkey0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:415) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:150) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:79) at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:288) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:166) ... 14 more {noformat} This issue is related to HIVE-6455. When hive.optimize.reducededuplication is set to false, then this issue will be gone. 
Logical plan when hive.optimize.reducededuplication=false; {noformat} src TableScan (TS_0) alias: src Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Select Operator (SEL_1) expressions: key (type: bigint) outputColumnNames: key Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_2) aggregations: count(DISTINCT key) keys: key (type: bigint) mode: hash outputColumnNames: _col0, _col1 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Reduce Output Operator (RS_3) DistinctColumnIndices: key expressions: _col0 (type: bigint) DistributionKeys: 0 sort order: + OutputKeyColumnNames: _col0 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE Group By Operator (GBY_4) aggregations: count(DISTINCT KEY._col0:0._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Select Operator (SEL_5) expressions: _col0 (type: bigint) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 16 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator (RS_6) key expressions: _col0 (type: bigint)
Re: Review Request 21138: Support more generic way of using composite key for HBaseHandler
On May 12, 2014, 4:56 a.m., Swarnim Kulkarni wrote: hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java, line 132 https://reviews.apache.org/r/21138/diff/1/?file=575776#file575776line132 That said, I am also not a 100% positive on why Navis chose a FamilyFilter here. In my latest patch, I updated the setupFilter method to be protected so that it can be easily overridden. I'll ask Navis for his choice of FamilyFilter here. If we don't get a response, my vote will be to proceed with the protected scope of this method and log a follow up JIRA to clean this up. Okay. Makes sense. Could you log the followup JIRA and link it with this issue. - Xuefu --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/#review42667 --- On May 8, 2014, 3:42 p.m., Swarnim Kulkarni wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/21138/ --- (Updated May 8, 2014, 3:42 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-2599 introduced using custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again extension of LazyStruct. If user provides proper Object and OI, we can replace internal key and keyOI with those. Initial implementation is based on factory interface. 
{code} public interface HBaseKeyFactory { void init(SerDeParameters parameters, Properties properties) throws SerDeException; ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException; LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException; } {code} Diffs - hbase-handler/pom.xml 132af43 hbase-handler/src/java/org/apache/hadoop/hive/hbase/AbstractHBaseKeyFactory.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/ColumnMappings.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/CompositeHBaseKeyFactory.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/DefaultHBaseKeyFactory.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseCompositeKey.java 5008f15 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseKeyFactory.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseLazyObjectFactory.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseRowSerializer.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseScanRange.java PRE-CREATION hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDe.java 5fe35a5 hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseSerDeParameters.java b64590d hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java 4fe1b1b hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java 142bfd8 hbase-handler/src/java/org/apache/hadoop/hive/hbase/LazyHBaseRow.java fc40195 hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestCompositeKey.java 13c344b hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory.java PRE-CREATION hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseKeyFactory2.java PRE-CREATION hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestLazyHBaseObject.java 7c4fc9f hbase-handler/src/test/queries/positive/hbase_custom_key.q PRE-CREATION 
hbase-handler/src/test/queries/positive/hbase_custom_key2.q PRE-CREATION hbase-handler/src/test/results/positive/hbase_custom_key.q.out PRE-CREATION hbase-handler/src/test/results/positive/hbase_custom_key2.q.out PRE-CREATION itests/util/pom.xml e9720df ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 113227d ql/src/java/org/apache/hadoop/hive/ql/index/IndexPredicateAnalyzer.java d39ee2e ql/src/java/org/apache/hadoop/hive/ql/index/IndexSearchCondition.java 5f1329c ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 4921966 ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcStruct.java 293b74e ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java 2a7fdf9 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveStoragePredicateHandler.java 9f35575 ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java e50026b ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java ecb82d7 ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java c0a8269 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 5f32f2d
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: HIVE-7043.4.patch When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch, HIVE-7043.4.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6846) allow safe set commands with sql standard authorization
[ https://issues.apache.org/jira/browse/HIVE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995446#comment-13995446 ] Thejas M Nair commented on HIVE-6846: - This is the default list of safe set command that this patch allows : hive.exec.reducers.bytes.per.reducer hive.exec.reducers.max hive.map.aggr hive.map.aggr.hash.percentmemory hive.map.aggr.hash.force.flush.memory.threshold hive.map.aggr.hash.min.reduction hive.groupby.skewindata hive.optimize.multigroupby.common.distincts hive.optimize.index.groupby hive.optimize.ppd hive.optimize.ppd.storage hive.optimize.ppd.storage hive.ppd.recognizetransivity hive.optimize.groupby hive.optimize.sort.dynamic.partition hive.optimize.skewjoin.compiletime hive.optimize.union.remove hive.multigroupby.singlereducer hive.map.groupby.sorted hive.map.groupby.sorted.testmode hive.optimize.skewjoin hive.optimize.skewjoin.compiletime hive.mapred.mode hive.enforce.bucketmapjoin hive.exec.compress.output hive.exec.compress.intermediate hive.exec.parallel hive.exec.parallel.thread.number hive.exec.parallel.thread.number hive.exec.rowoffset hive.merge.mapfiles hive.merge.mapredfiles hive.merge.tezfiles hive.ignore.mapjoin.hint hive.auto.convert.join hive.auto.convert.join.noconditionaltask hive.auto.convert.join.noconditionaltask.size hive.auto.convert.join.use.nonstaged hive.auto.convert.join.noconditionaltask hive.auto.convert.join.noconditionaltask.size hive.auto.convert.join.use.nonstaged hive.enforce.bucketing hive.enforce.sorting hive.enforce.sortmergebucketmapjoin hive.auto.convert.sortmerge.join hive.execution.engine hive.vectorized.execution.enabled hive.mapjoin.optimized.keys hive.mapjoin.lazy.hashtable hive.exec.check.crossproducts hive.compat hive.exec.dynamic.partition.mode mapred.reduce.tasks mapred.output.compression.codec mapred.map.output.compression.codec mapreduce.job.reduce.slowstart.completedmaps mapreduce.job.queuename allow safe set commands with sql standard 
authorization --- Key: HIVE-6846 URL: https://issues.apache.org/jira/browse/HIVE-6846 Project: Hive Issue Type: Bug Components: Authorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-6846.1.patch, HIVE-6846.2.patch, HIVE-6846.3.patch HIVE-6827 disables all set commands when SQL standard authorization is turned on, but not all set commands are unsafe. We should allow safe set commands. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996633#comment-13996633 ] Sergey Shelukhin commented on HIVE-6430: will commit today evening MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have java hash table there. We can either use primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to single row storage structure without an object per row (similar to vectorization). -- This message was sent by Atlassian JIRA (v6.2#6252)
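To make the memory argument concrete, here is a minimal sketch of a primitive-friendly hash table. It is an assumption-laden illustration, not the HIVE-6430 patch and not HPPC's API: LongToIntMap is a hypothetical class that maps a long key to an int, where the int could be an offset into some flat row storage. It uses open addressing with linear probing over plain arrays, so there is no boxed key and no entry object per row.

```java
/** Hypothetical long-to-int open-addressing map (linear probing, load factor 0.5). */
final class LongToIntMap {
    private long[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    /** Capacity is a power of two and at least twice the expected entry count. */
    LongToIntMap(int expected) {
        int cap = 16;
        while (cap < expected * 2) {
            cap <<= 1;
        }
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    /** Mixes the key bits and maps them to a slot index in [0, mask]. */
    private static int slot(long key, int mask) {
        long h = key * 0x9E3779B97F4A7C15L;
        return ((int) (h >>> 32)) & mask;
    }

    /** Inserts the key or overwrites the value of an existing key. */
    void put(long key, int value) {
        if ((size + 1) * 2 > keys.length) {
            grow();
        }
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or missing if the key is absent. */
    int get(long key, int missing) {
        int mask = keys.length - 1;
        int i = slot(key, mask);
        while (used[i]) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return missing;
    }

    int size() {
        return size;
    }

    /** Doubles the capacity and reinserts all existing entries. */
    private void grow() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        int cap = oldKeys.length * 2;
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

Per slot this layout costs 8 bytes for the key, 4 for the value and 1 for the flag, and with the 0.5 load factor that is at most roughly 26 bytes per stored entry at steady state, compared with the several hundred bytes mentioned above. Deletion and multi-column keys are left out of the sketch.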
[jira] [Updated] (HIVE-6411) Support more generic way of using composite key for HBaseHandler
[ https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6411: -- Resolution: Fixed Fix Version/s: 0.14.0 Release Note: The new feature needs to be documented at Hive-HBase integration page. Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Navis and Swarnim for working on the patch. Support more generic way of using composite key for HBaseHandler Key: HIVE-6411 URL: https://issues.apache.org/jira/browse/HIVE-6411 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Fix For: 0.14.0 Attachments: HIVE-6411.1.patch.txt, HIVE-6411.10.patch.txt, HIVE-6411.11.patch.txt, HIVE-6411.2.patch.txt, HIVE-6411.3.patch.txt, HIVE-6411.4.patch.txt, HIVE-6411.5.patch.txt, HIVE-6411.6.patch.txt, HIVE-6411.7.patch.txt, HIVE-6411.8.patch.txt, HIVE-6411.9.patch.txt HIVE-2599 introduced using custom object for the row key. But it forces key objects to extend HBaseCompositeKey, which is again extension of LazyStruct. If user provides proper Object and OI, we can replace internal key and keyOI with those. Initial implementation is based on factory interface. {code} public interface HBaseKeyFactory { void init(SerDeParameters parameters, Properties properties) throws SerDeException; ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException; LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException; } {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996038#comment-13996038 ] Hive QA commented on HIVE-7043: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12644516/HIVE-7043.3.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/186/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/186/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12644516 When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch, HIVE-7043.3.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: HIVE-7043.2.patch When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.2.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6965) Transaction manager should use RDBMS time instead of machine time
[ https://issues.apache.org/jira/browse/HIVE-6965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-6965: - Status: Patch Available (was: Open) This patch changes the code to ask the database for the time rather than calling currentTimeMillis(). Transaction manager should use RDBMS time instead of machine time - Key: HIVE-6965 URL: https://issues.apache.org/jira/browse/HIVE-6965 Project: Hive Issue Type: Bug Components: Locking Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-6965.patch Current TxnHandler and CompactionTxnHandler use System.currentTimeMillis() when they need to determine the time (such as heartbeating transactions). In situations where there are multiple Thrift metastore services or users are using an embedded metastore this will lead to issues. We should instead be using time from the RDBMS, which is guaranteed to be the same for all users. -- This message was sent by Atlassian JIRA (v6.2#6252)
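A rough sketch of asking the database for the time is shown below. It is not the code from HIVE-6965.patch: DbClock, dbNowMillis and heartbeatExpired are hypothetical names, and the SQL text for "current time" is deliberately passed in by the caller because it differs between database products. Only standard JDBC calls are used.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical helper: takes "now" from the shared RDBMS instead of the local clock. */
final class DbClock {
    /**
     * Runs a caller-supplied, database-specific query that returns the current
     * timestamp in its first column, and converts it to epoch milliseconds.
     */
    static long dbNowMillis(Connection conn, String currentTimeQuery) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(currentTimeQuery)) {
            if (!rs.next()) {
                throw new SQLException("current-time query returned no rows");
            }
            return rs.getTimestamp(1).getTime();
        }
    }

    /** Both arguments should come from dbNowMillis, so that all clients agree. */
    static boolean heartbeatExpired(long lastHeartbeatMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }
}
```

The design point is that every metastore instance compares heartbeats against the same clock, so clock skew between the machines running the metastore no longer affects timeout decisions. The extra round trip per time lookup is the price of that.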
Re: showing column stats
Create the JIRAs https://issues.apache.org/jira/browse/HIVE-7050 https://issues.apache.org/jira/browse/HIVE-7051 Thanks Prasanth Jayachandran On May 12, 2014, at 6:52 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: I have a basic patch which prints table level column stats.. I can put up the patch for it today/tomorrow.. but for displaying partition level column stats we need to extend the “describe” statement to support column names.. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition. If you see the DDL describe partition does not accept column names. I can create JIRAs for the following tasks 1) Showing column stats in describe table 2) Showing column stats in describe partition If you would like to take up 2) please feel free to do so. Thanks Prasanth Jayachandran On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote: Hi all, I'm wondering if there is a simpler way to show column stats than writing a thrift client calling the thrift API, such as commands in Hive CLI. I have tried desc extended as well as explain select, but none of them shows column stats. Thanks, Xuefu -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
[jira] [Updated] (HIVE-6187) Add test to verify that DESCRIBE TABLE works with quoted table names
[ https://issues.apache.org/jira/browse/HIVE-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-6187: - Status: Patch Available (was: Open) Add test to verify that DESCRIBE TABLE works with quoted table names Key: HIVE-6187 URL: https://issues.apache.org/jira/browse/HIVE-6187 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Andy Mok Attachments: HIVE-6187.1.patch Backticks around tables named after special keywords, such as items, allow us to create, drop, and alter the table. For example {code:sql} CREATE TABLE foo.`items` (bar INT); DROP TABLE foo.`items`; ALTER TABLE `items` RENAME TO `items_`; {code} However, we cannot call {code:sql} DESCRIBE foo.`items`; DESCRIBE `items`; {code} The DESCRIBE query does not permit backticks to surround table names. The error returned is {code:sql} FAILED: SemanticException [Error 10001]: Table not found `items` {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7042) Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2
[ https://issues.apache.org/jira/browse/HIVE-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7042: - Attachment: HIVE-7042.1.patch.txt Not sure why this patch was not picked up by Hive QA for days. Re-uploading the patch. Fix stats_partscan_1_23.q and orc_createas1.q for hadoop-2 -- Key: HIVE-7042 URL: https://issues.apache.org/jira/browse/HIVE-7042 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7042.1.patch, HIVE-7042.1.patch.txt stats_partscan_1_23.q and orc_createas1.q should use HiveInputFormat as opposed to CombineHiveInputFormat. RCFile uses DefaultCodec for compression (uses DEFLATE) which is not splittable. Hence using CombineHiveIF will yield different results for these tests. ORC should use HiveIF to generate ORC splits. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: showing column stats
Thanks, Prasanth. I tried your patch in HIVE-7050, and it helped me demonstrate another problem related to stats, HIVE-7053. I can review your patches. Thanks again! --Xuefu On Mon, May 12, 2014 at 6:57 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: Create the JIRAs https://issues.apache.org/jira/browse/HIVE-7050 https://issues.apache.org/jira/browse/HIVE-7051 Thanks Prasanth Jayachandran On May 12, 2014, at 6:52 PM, Prasanth Jayachandran pjayachand...@hortonworks.com wrote: I have a basic patch which prints table level column stats.. I can put up the patch for it today/tomorrow.. but for displaying partition level column stats we need to extend the “describe” statement to support column names.. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DescribePartition. If you see the DDL describe partition does not accept column names. I can create JIRAs for the following tasks 1) Showing column stats in describe table 2) Showing column stats in describe partition If you would like to take up 2) please feel free to do so. Thanks Prasanth Jayachandran On May 12, 2014, at 5:45 PM, Xuefu Zhang xzh...@cloudera.com wrote: Hi all, I'm wondering if there is a simpler way to show column stats than writing a thrift client calling the thrift API, such as commands in Hive CLI. I have tried desc extended as well as explain select, but none of them shows column stats. Thanks, Xuefu
[jira] [Updated] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-7043: - Attachment: HIVE-7043.1.patch When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.1.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case.
[jira] [Updated] (HIVE-7033) grant statements should check if the role exists
[ https://issues.apache.org/jira/browse/HIVE-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-7033: Attachment: HIVE-7033.2.patch HIVE-7033.2.patch - updating comment in .q file grant statements should check if the role exists Key: HIVE-7033 URL: https://issues.apache.org/jira/browse/HIVE-7033 Project: Hive Issue Type: Bug Components: Authorization, SQLStandardAuthorization Affects Versions: 0.13.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-7033.1.patch, HIVE-7033.2.patch The following grant statement that grants to a role that does not exist succeeds, but it should result in an error. grant all on t1 to role nosuchrole;
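The check that HIVE-7033 asks for can be pictured with a small stand-alone sketch. RoleStore, createRole, grantAll and hasGrant are hypothetical names made up for illustration; they are not Hive's authorization classes, and the attached patch may be structured quite differently. The only point shown is that a grant to an unknown role should be rejected instead of silently succeeding.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for a role catalog; not Hive's real API.
class RoleStore {
    private final Set<String> roles = new HashSet<>();
    // role name -> tables on which ALL has been granted
    private final Map<String, Set<String>> grants = new HashMap<>();

    void createRole(String role) {
        roles.add(role.toLowerCase());
    }

    // Grants ALL on a table to a role, rejecting roles that were never created.
    void grantAll(String table, String role) {
        String key = role.toLowerCase();
        if (!roles.contains(key)) {
            throw new IllegalArgumentException("Role does not exist: " + role);
        }
        grants.computeIfAbsent(key, k -> new HashSet<>()).add(table);
    }

    // True if ALL on the table has been granted to the role.
    boolean hasGrant(String table, String role) {
        Set<String> tables = grants.get(role.toLowerCase());
        return tables != null && tables.contains(table);
    }
}
```

With this shape, the equivalent of "grant all on t1 to role nosuchrole" throws, while a grant to a previously created role is recorded.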
[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator
[ https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7055: --- Attachment: HIVE-7055.patch config not propagating for PTFOperator - Key: HIVE-7055 URL: https://issues.apache.org/jira/browse/HIVE-7055 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.12.0, 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7055.patch For example, setting hive.join.cache.size has no effect, and task nodes always get the default value of 25000.
[jira] [Updated] (HIVE-7041) DoubleWritable/ByteWritable should extend their hadoop counterparts
[ https://issues.apache.org/jira/browse/HIVE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-7041: - Attachment: HIVE-7041.1.patch Tests didn't run for some reason; re-uploading the patch. DoubleWritable/ByteWritable should extend their hadoop counterparts --- Key: HIVE-7041 URL: https://issues.apache.org/jira/browse/HIVE-7041 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7041.1.patch, HIVE-7041.1.patch Hive has its own implementations of ByteWritable/DoubleWritable/ShortWritable. We cannot replace usage of these classes, since doing so would break third-party UDFs/SerDes; however, we can at least extend from the Hadoop version of these classes when possible to avoid duplicate code. When Hive finally moves to version 1.0, we might want to consider removing use of these Hive-specific writables and switching over to the Hadoop version of these classes. ShortWritable didn't exist in Hadoop until 2.x, so it looks like we can't do it with this class until 0.20/1.x support is dropped from Hive.
[jira] [Assigned] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier
[ https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-7056: Assignee: Eugene Koifman TestPig_11 fails with Pig 12.1 and earlier -- Key: HIVE-7056 URL: https://issues.apache.org/jira/browse/HIVE-7056 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman on trunk, pig script (http://svn.apache.org/repos/asf/pig/trunk/bin/pig) is looking for *hcatalog-core-*.jar etc. In Pig 12.1 it's looking for hcatalog-core-*.jar, which doesn't work with Hive 0.13. The TestPig_11 job fails with {noformat} 2014-05-13 17:47:10,760 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Failed to parse: Pig script failed to parse: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678) at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411) at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344) at org.apache.pig.PigServer.executeBatch(PigServer.java:369) at org.apache.pig.PigServer.executeBatch(PigServer.java:355) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:478) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299) at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284) at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158) at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:7756) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1669) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) ... 16 more Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653) at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296) ... 24 more {noformat} the key to this is {noformat} ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/lib/slf4j-api-*.jar: No such file or directory ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-core-*.jar: No such file or directory ls:
[jira] [Updated] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier
[ https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7056: - Description: on trunk, pig script (http://svn.apache.org/repos/asf/pig/trunk/bin/pig) is looking for \*hcatalog-core-\*.jar etc. In Pig 12.1 it's looking for hcatalog-core-\*.jar, which doesn't work with Hive 0.13. The TestPig_11 job fails with {noformat} 2014-05-13 17:47:10,760 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Failed to parse: Pig script failed to parse: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678) at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411) at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344) at org.apache.pig.PigServer.executeBatch(PigServer.java:369) at org.apache.pig.PigServer.executeBatch(PigServer.java:355) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:478) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299) at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284) at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158) at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:7756) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1669) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) ... 16 more Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653) at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296) ... 
24 more {noformat} the key to this is {noformat} ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/lib/slf4j-api-*.jar: No such file or directory ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-core-*.jar: No such file or directory ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-*.jar: No such file or directory ls:
[jira] [Updated] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
[ https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HIVE-6584: --- Attachment: HIVE-6584.0.patch Attaching a preliminary patch, based on the patch attached to HBASE-11137. In order to test this properly, I need an HBase table snapshot created. Short of exposing this through Hive SQL, how can I write a .q file test for this? Add HiveHBaseTableSnapshotInputFormat - Key: HIVE-6584 URL: https://issues.apache.org/jira/browse/HIVE-6584 Project: Hive Issue Type: Improvement Components: HBase Handler Reporter: Nick Dimiduk Attachments: HIVE-6584.0.patch HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows an MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing the online region server API provides a nice performance boost for the full scan. HBASE-10642 is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's available, we should add an input format. A follow-on patch could work out how to integrate this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat into existing table definitions.
Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1
I downloaded src tar, built it and ran webhcat e2e tests. I see 2 failures (which I don't see on trunk) TestHive_7 fails with got percentComplete map 100% reduce 0%, expected map 100% reduce 100% TestHeartbeat_1 fails to even launch the job. This looks like the root cause ERROR | 13 May 2014 18:24:00,394 | org.apache.hive.hcatalog.templeton.CatchallExceptionMapper | java.lang.NullPointerException at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:312) at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:479) at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:170) at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:153) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:107) at org.apache.hive.hcatalog.templeton.LauncherDelegator$1.run(LauncherDelegator.java:103) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557) at org.apache.hive.hcatalog.templeton.LauncherDelegator.queueAsUser(LauncherDelegator.java:103) at org.apache.hive.hcatalog.templeton.LauncherDelegator.enqueueController(LauncherDelegator.java:81) at org.apache.hive.hcatalog.templeton.JarDelegator.run(JarDelegator.java:55) at org.apache.hive.hcatalog.templeton.Server.mapReduceJar(Server.java:711) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at 
com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1480) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1411) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1360) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1350) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1360) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:392) at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:87) at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1331) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111) at org.eclipse.jetty.server.Server.handle(Server.java:349) at
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996954#comment-13996954 ] Gopal V commented on HIVE-6430: --- Seems to be only breaking on JDK7 javac. And only on rebuilds with modifications - never on mvn clean package builds. MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).
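To make the "primitive-friendly hashtable" idea concrete, here is a minimal sketch of an open-addressing map that keeps keys and values in flat primitive arrays, so no object is allocated per entry. LongToIntMap and its methods are hypothetical names for illustration only; this is neither HPPC's API nor the data structure used in the HIVE-6430 patches, and it assumes keys fit in a long and values are int row offsets.

```java
// Illustrative open-addressing map from long keys to int row offsets.
// Keys and values live in flat primitive arrays: no boxed objects per entry.
class LongToIntMap {
    private static final int MISSING = -1; // returned when a key is absent
    private long[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    LongToIntMap(int expected) {
        int cap = 16;
        while (cap < expected * 2) cap <<= 1; // power of two, load factor <= 0.5
        keys = new long[cap];
        values = new int[cap];
        used = new boolean[cap];
    }

    // Finds the slot holding the key, or the first free slot on its probe path.
    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // cheap multiplicative mix
        int i = (int) (h >>> 32) & (keys.length - 1);
        while (used[i] && keys[i] != key) {
            i = (i + 1) & (keys.length - 1); // linear probing
        }
        return i;
    }

    void put(long key, int value) {
        if ((size + 1) * 2 > keys.length) grow();
        int i = slot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(long key) {
        int i = slot(key);
        return used[i] ? values[i] : MISSING;
    }

    int size() {
        return size;
    }

    // Doubles the table and reinserts every live entry.
    private void grow() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new long[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) put(oldKeys[j], oldValues[j]);
        }
    }
}
```

Per entry this costs 8 bytes of key, 4 bytes of value and one flag byte (times the inverse load factor), compared with the several hundred bytes per entry reported above for the object-based table.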
[jira] [Updated] (HIVE-7055) config not propagating for PTFOperator
[ https://issues.apache.org/jira/browse/HIVE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-7055: --- Status: Patch Available (was: Open) config not propagating for PTFOperator - Key: HIVE-7055 URL: https://issues.apache.org/jira/browse/HIVE-7055 Project: Hive Issue Type: Bug Components: PTF-Windowing Affects Versions: 0.13.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-7055.patch For example, setting hive.join.cache.size has no effect, and task nodes always get the default value of 25000.
[jira] [Updated] (HIVE-6290) Add support for hbase filters for composite keys
[ https://issues.apache.org/jira/browse/HIVE-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6290: -- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Resolved via HIVE-6411. Add support for hbase filters for composite keys Key: HIVE-6290 URL: https://issues.apache.org/jira/browse/HIVE-6290 Project: Hive Issue Type: Sub-task Components: HBase Handler Affects Versions: 0.12.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Fix For: 0.14.0 Attachments: HIVE-6290.1.patch.txt, HIVE-6290.2.patch.txt, HIVE-6290.3.patch.txt Add support for filters to be provided via the composite key class
[jira] [Commented] (HIVE-7043) When using the tez session pool via hive, once sessions time out, all queries go to the default queue
[ https://issues.apache.org/jira/browse/HIVE-7043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13995747#comment-13995747 ] Gunther Hagleitner commented on HIVE-7043: -- +1 When using the tez session pool via hive, once sessions time out, all queries go to the default queue - Key: HIVE-7043 URL: https://issues.apache.org/jira/browse/HIVE-7043 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.13.0 Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: 0.14.0 Attachments: HIVE-7043.1.patch When using a tez session pool to run multiple queries, once the sessions time out, we always end up using the default queue to launch queries. The load balancing doesn't work in this case.
[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead
[ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997172#comment-13997172 ] Sergey Shelukhin commented on HIVE-6430: Hmm... I cannot repro this... I tried JDK 6 and 7, clean builds and rebuilds, with and without modifications. Can you make an addendum patch that fixes it, so I could apply it on top? MapJoin hash table has large memory overhead Key: HIVE-6430 URL: https://issues.apache.org/jira/browse/HIVE-6430 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.08.patch, HIVE-6430.09.patch, HIVE-6430.10.patch, HIVE-6430.11.patch, HIVE-6430.12.patch, HIVE-6430.12.patch, HIVE-6430.13.patch, HIVE-6430.patch Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row) can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and MJRowContainer in other jiras, but in general we don't need to have a Java hash table there. We can either use a primitive-friendly hashtable like the one from HPPC (Apache-licenced), or some variation, to map primitive keys to a single row storage structure without an object per row (similar to vectorization).
[jira] [Updated] (HIVE-7054) Support ELT UDF in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-7054: - Attachment: HIVE-7054.patch Here is the review board entry: https://reviews.apache.org/r/21416/ Please review. Support ELT UDF in vectorized mode -- Key: HIVE-7054 URL: https://issues.apache.org/jira/browse/HIVE-7054 Project: Hive Issue Type: New Feature Components: Vectorization Affects Versions: 0.14.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.14.0 Attachments: HIVE-7054.patch Implement support for ELT udf in vectorized execution mode.
[jira] [Created] (HIVE-7046) Propagate addition of new columns to partition schema
Mariano Dominguez created HIVE-7046: --- Summary: Propagate addition of new columns to partition schema Key: HIVE-7046 URL: https://issues.apache.org/jira/browse/HIVE-7046 Project: Hive Issue Type: Improvement Components: Database/Schema Affects Versions: 0.12.0 Reporter: Mariano Dominguez Hive reads data according to the partition schema, not the table schema (because of HIVE-3833). ALTER TABLE only updates the table schema, and the changes are not propagated to partitions. Thus, the schema of a partition will differ from that of the table after altering the table schema; this is done to preserve the ability to read existing data, particularly when using binary formats such as RCFile. Binary formats do not allow changing the type of a field because of the way serialization works; a field serialized as a string will be displayed incorrectly if read as an integer. Unfortunately, as a side effect, this behavior limits the ability to add new columns to already existing partitions using ALTER TABLE ADD COLUMNS. A possible workaround is to recreate the partitions, but this process could be unnecessarily cumbersome if the number of partitions is high. New columns should be propagated to existing partitions automatically instead.
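The behaviour requested in HIVE-7046 can be sketched as a merge of two column lists. The class and method below are hypothetical and model a schema simply as an ordered map from column name to type name; they do not use Hive's metastore API. The sketch assumes that columns already recorded on the partition must keep their recorded type (so existing binary data stays readable) and that only columns missing from the partition are appended.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; schemas are modelled as ordered maps of column name -> type name.
class SchemaPropagation {
    // Appends to a partition schema any column that the table schema has but the
    // partition lacks. Existing partition columns keep their recorded types.
    static Map<String, String> propagateNewColumns(Map<String, String> tableCols,
                                                   Map<String, String> partitionCols) {
        Map<String, String> merged = new LinkedHashMap<>(partitionCols);
        for (Map.Entry<String, String> col : tableCols.entrySet()) {
            merged.putIfAbsent(col.getKey(), col.getValue());
        }
        return merged;
    }
}
```

Applied to every existing partition after ALTER TABLE ADD COLUMNS, a merge like this would avoid having to recreate the partitions by hand.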
[jira] [Updated] (HIVE-7057) webhcat e2e deployment scripts don't have x bit set
[ https://issues.apache.org/jira/browse/HIVE-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7057: - Attachment: HIVE-7057.patch @Thejas could you review this? When checking in please chmod u+x on all .sh files. The patch files can't capture this. webhcat e2e deployment scripts don't have x bit set --- Key: HIVE-7057 URL: https://issues.apache.org/jira/browse/HIVE-7057 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7057.patch also, update env.sh to use latest Pig release NO PRECOMMIT TESTS
[jira] [Updated] (HIVE-7052) Optimize split calculation time
[ https://issues.apache.org/jira/browse/HIVE-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated HIVE-7052: --- Attachment: HIVE-7052-profiler-2.png HIVE-7052-profiler-1.png Optimize split calculation time --- Key: HIVE-7052 URL: https://issues.apache.org/jira/browse/HIVE-7052 Project: Hive Issue Type: Bug Environment: hive + tez Reporter: Rajesh Balamohan Assignee: Rajesh Balamohan Labels: performance Attachments: HIVE-7052-profiler-1.png, HIVE-7052-profiler-2.png When running a TPC-DS query (query_27), a significant amount of time was spent in split computation on a dataset of size 200 GB (ORC format). Profiling revealed that: 1. A lot of time was spent in the configuration's variable substitution (regex) in the HiveInputFormat.getSplits() method. 2. FileSystem was created repeatedly in OrcInputFormat.generateSplitsInfo(). I will attach the profiler snapshots soon.
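Both findings in HIVE-7052 are cases of recomputing something expensive for every split. One generic remedy is to compute the value once per key and reuse it, as in the sketch below. PerKeyCache is a hypothetical helper written for illustration; it is not part of Hive or Hadoop, it is not thread-safe, and it is not necessarily how the eventual patch addresses the problem.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical memoizer: computes an expensive value once per key and reuses it.
// Not thread-safe; intended only to illustrate the idea.
class PerKeyCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Function<K, V> loader;
    private int loads; // how many times the loader actually ran

    PerKeyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = loader.apply(key); // expensive step, e.g. a lookup or a regex expansion
            loads++;
            cache.put(key, value);
        }
        return value;
    }

    int loadCount() {
        return loads;
    }
}
```

In the split-generation setting, the key might be a file system URI or a raw configuration value, and the loader the expensive creation or substitution step; those choices are assumptions, not details taken from the report.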
[jira] [Updated] (HIVE-5268) HiveServer2 accumulates orphaned OperationHandle objects when a client fails while executing query
[ https://issues.apache.org/jira/browse/HIVE-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5268: Fix Version/s: (was: 0.13.0) 0.14.0 HiveServer2 accumulates orphaned OperationHandle objects when a client fails while executing query -- Key: HIVE-5268 URL: https://issues.apache.org/jira/browse/HIVE-5268 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta Assignee: Thiruvel Thirumoolan Fix For: 0.14.0 Attachments: HIVE-5268_prototype.patch When queries are executed against HiveServer2, an OperationHandle object is stored in the OperationManager.handleToOperation HashMap. Currently it is the duty of the JDBC client to explicitly close the statement in order to clean up the entry in the map. If the client fails to close the statement, the OperationHandle object is never cleaned up and accumulates in the server. This can potentially cause OOM on the server over time. It can also be used as a loophole by a malicious client to bring down the Hive server.
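One possible mitigation for the leak described in HIVE-5268 is to record when each handle was last used and periodically sweep out handles that have been idle too long. The sketch below illustrates only that idea. OperationRegistry and its methods are hypothetical names; this is not HiveServer2's OperationManager and not necessarily the approach taken in HIVE-5268_prototype.patch.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical registry that records when each operation handle was last used,
// so a background sweep can drop handles abandoned by failed clients.
class OperationRegistry {
    private final Map<String, Long> lastAccessMillis = new HashMap<>();

    // Registers a handle or refreshes its last-use time.
    synchronized void touch(String handleId, long nowMillis) {
        lastAccessMillis.put(handleId, nowMillis);
    }

    // Normal path: the client closes its statement explicitly.
    synchronized void close(String handleId) {
        lastAccessMillis.remove(handleId);
    }

    // Removes every handle idle for longer than timeoutMillis; returns how many were removed.
    synchronized int sweep(long nowMillis, long timeoutMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccessMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > timeoutMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int size() {
        return lastAccessMillis.size();
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a server would more likely call sweep from a scheduled task with the current time and a configured idle timeout.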
[jira] [Commented] (HIVE-5733) Publish hive-exec artifact without all the dependencies
[ https://issues.apache.org/jira/browse/HIVE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997276#comment-13997276 ] Amareshwari Sriramadasu commented on HIVE-5733: --- +1 This is much needed. I agree it has become difficult to depend on the hive-exec jar, because the ql module shades all the dependencies. I will try to put up a patch. Publish hive-exec artifact without all the dependencies --- Key: HIVE-5733 URL: https://issues.apache.org/jira/browse/HIVE-5733 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Jarek Jarcec Cecho Currently the artifact {{hive-exec}} that is available in [maven|http://search.maven.org/remotecontent?filepath=org/apache/hive/hive-exec/0.12.0/hive-exec-0.12.0.jar] shades all the dependencies (that is, the jar contains all of Hive's dependencies). As other projects that depend on Hive might use slightly different versions of those dependencies, it can easily happen that Hive's shaded version is used instead, which leads to very time-consuming debugging of what is happening (for example SQOOP-1198). Would it be feasible to publish a {{hive-exec}} jar that is built without shading any dependency? For example, [avro-tools|http://search.maven.org/#artifactdetails%7Corg.apache.avro%7Cavro-tools%7C1.7.5%7Cjar] has a classifier nodeps that represents the artifact without any dependencies.
[jira] [Commented] (HIVE-3276) optimize union sub-queries
[ https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996244#comment-13996244 ] Lefty Leverenz commented on HIVE-3276: -- The configuration parameters are now documented in the wiki: * [hive.optimize.union.remove |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.union.remove] * [hive.mapred.supports.subdirectories |https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.mapred.supports.subdirectories] optimize union sub-queries -- Key: HIVE-3276 URL: https://issues.apache.org/jira/browse/HIVE-3276 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Namit Jain Assignee: Namit Jain Fix For: 0.10.0 Attachments: HIVE-3276.1.patch, hive.3276.10.patch, hive.3276.11.patch, hive.3276.12.patch, hive.3276.13.patch, hive.3276.14.patch, hive.3276.2.patch, hive.3276.3.patch, hive.3276.4.patch, hive.3276.5.patch, hive.3276.6.patch, hive.3276.7.patch, hive.3276.8.patch, hive.3276.9.patch It might be a good idea to optimize simple union queries containing map-reduce jobs in at least one of the sub-qeuries. For eg: a query like: insert overwrite table T1 partition P1 select * from ( subq1 union all subq2 ) u; today creates 3 map-reduce jobs, one for subq1, another for subq2 and the final one for the union. It might be a good idea to optimize this. Instead of creating the union task, it might be simpler to create a move task (or something like a move task), where the outputs of the two sub-queries will be moved to the final directory. This can easily extend to more than 2 sub-queries in the union. This is very useful if there is a select * followed by filesink after the union. This can be independently useful, and also be used to optimize the skewed joins -- https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization. 
If there is a select or a filter between the union and the filesink, the select and the filter can be moved before the union, and the follow-up job can still be removed.
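The "move task" idea in the description above can be sketched roughly as follows. This is only an illustration on a local filesystem with java.nio, not Hive's actual implementation; the class name UnionMoveSketch, the method moveOutputs, and the "subqN_" file prefix are all hypothetical names introduced for the example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replace the final union job with a plain move of
// each sub-query's output files into the final directory.
final class UnionMoveSketch {

    /** Moves every file from each sub-query output directory into finalDir; returns the number of files moved. */
    static int moveOutputs(List<Path> subQueryDirs, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        int moved = 0;
        for (int i = 0; i < subQueryDirs.size(); i++) {
            // List the directory first so that files are not moved while it is being iterated.
            List<Path> files = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(subQueryDirs.get(i))) {
                for (Path file : stream) {
                    files.add(file);
                }
            }
            for (Path file : files) {
                // Assumed naming scheme: prefix with the sub-query index so that
                // identical file names from different sub-queries do not collide.
                Path target = finalDir.resolve("subq" + (i + 1) + "_" + file.getFileName());
                Files.move(file, target);
                moved++;
            }
        }
        return moved;
    }
}
```

Because the loop runs over a list of directories, the same code covers a union of more than two sub-queries, as the description suggests.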
[jira] [Updated] (HIVE-7056) TestPig_11 fails with Pig 12.1 and earlier
[ https://issues.apache.org/jira/browse/HIVE-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-7056: - Assignee: (was: Eugene Koifman) TestPig_11 fails with Pig 12.1 and earlier -- Key: HIVE-7056 URL: https://issues.apache.org/jira/browse/HIVE-7056 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.13.0 Reporter: Eugene Koifman On trunk, the pig script (http://svn.apache.org/repos/asf/pig/trunk/bin/pig) looks for \*hcatalog-core-\*.jar etc. In Pig 12.1 it looks for hcatalog-core-\*.jar, which doesn't work with Hive 0.13. The TestPig_11 job fails with {noformat} 2014-05-13 17:47:10,760 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] Failed to parse: Pig script failed to parse: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678) at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411) at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344) at org.apache.pig.PigServer.executeBatch(PigServer.java:369) at org.apache.pig.PigServer.executeBatch(PigServer.java:355) at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:478) at org.apache.pig.Main.main(Main.java:156) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Caused by: file hcatloadstore.pig, line 19, column 34 pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1299) at org.apache.pig.parser.LogicalPlanBuilder.buildFuncSpec(LogicalPlanBuilder.java:1284) at org.apache.pig.parser.LogicalPlanGenerator.func_clause(LogicalPlanGenerator.java:5158) at org.apache.pig.parser.LogicalPlanGenerator.store_clause(LogicalPlanGenerator.java:7756) at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1669) at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102) at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560) at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188) ... 16 more Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:653) at org.apache.pig.parser.LogicalPlanBuilder.validateFuncSpec(LogicalPlanBuilder.java:1296) ... 24 more {noformat} the key to this is {noformat} ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/lib/slf4j-api-*.jar: No such file or directory ls: /private/tmp/hadoop-ekoifman/nm-local-dir/usercache/ekoifman/appcache/application_1400018007772_0045/container_1400018007772_0045_01_02/apache-hive-0.14.0-SNAPSHOT-bin.tar.gz/apache-hive-0.14.0-SNAPSHOT-bin/hcatalog/share/hcatalog/hcatalog-core-*.jar: No such file or directory ls:
[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994152#comment-13994152 ] Hive QA commented on HIVE-2137: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12595024/HIVE-2137.patch Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/160/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/160/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-160/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1593663. At revision 1593663. 
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12595024 JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Labels: patch Fix For: 0.14.0 Attachments: HIVE-2137.patch, HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings using the client-side default encoding, which depends on the operating system unless another encoding is specified. It ignores the server-side encoding. For example, when the server runs on Linux (UTF-8) and the client runs on Windows (Shift-JIS, a Japanese charset), characters are corrupted on the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8.
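To make the encoding problem in the description above concrete, here is a minimal sketch. It assumes the server sends UTF-8 bytes and compares decoding them as UTF-8 with decoding them in a different client default. This is not code from the Hive JDBC driver or from the attached patch; the class Utf8DecodeSketch and its methods are hypothetical, and Shift_JIS is only used if the running JRE provides it (ISO-8859-1 is the stand-in otherwise).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why the client should decode with an
// explicit charset rather than the platform default.
final class Utf8DecodeSketch {

    /** Decodes raw bytes with an explicitly chosen charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Stand-in for a non-UTF-8 client default such as the one on a Japanese Windows machine. */
    static Charset simulatedClientDefault() {
        return Charset.isSupported("Shift_JIS")
                ? Charset.forName("Shift_JIS")
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c\u8a9e"; // the word "Japanese" written in kanji
        byte[] fromServer = original.getBytes(StandardCharsets.UTF_8); // assumed server-side encoding

        // Decoding with the same charset the server used preserves the text.
        System.out.println("UTF-8 decode matches: "
                + decode(fromServer, StandardCharsets.UTF_8).equals(original));

        // Decoding with a different client default corrupts it.
        System.out.println("client-default decode matches: "
                + decode(fromServer, simulatedClientDefault()).equals(original));
    }
}
```

The first comparison holds and the second does not, which is the character corruption the report describes.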
Re: How to remote debug WebHCat?
if you take webhcat_server.sh as currently in trunk, it supports startDebug option that will let you attach a debugger to the process On Mon, May 12, 2014 at 11:13 PM, Na Yang ny...@maprtech.com wrote: Hi Folks, Is there a way to remote debug webhcat? If so, how to enable the remote debug? Thanks, Na -- Thanks, Eugene
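The reply above does not say how startDebug enables debugging. Assuming it passes one of the standard JVM debug-agent options (-agentlib:jdwp or the older -Xrunjdwp), a small check like the following could confirm from inside a Java process that the agent is active before trying to attach. The class DebugAgentCheck and its methods are hypothetical helpers, not part of WebHCat.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether a JVM was started with a JDWP debug agent.
final class DebugAgentCheck {

    /** True if the argument is one of the standard JDWP agent options. */
    static boolean isJdwpArgument(String jvmArg) {
        return jvmArg.startsWith("-agentlib:jdwp") || jvmArg.startsWith("-Xrunjdwp");
    }

    /** True if any of the given JVM arguments enables the JDWP agent. */
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (isJdwpArgument(arg)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Input arguments of the current JVM, as reported by the runtime MXBean.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
    }
}
```

If the agent is enabled, the address given in that option is the host and port a remote debugger would connect to.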