[jira] [Commented] (HIVE-6144) Implement non-staged MapJoin
[ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876235#comment-13876235 ] Hive QA commented on HIVE-6144: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12623892/HIVE-6144.4.patch.txt {color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 4944 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mapjoin_mapjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_multiMapJoin2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nullformatCTAS org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_push_or org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_alter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_ppr2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_ppr org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_unset_table_view_property org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_left_outer_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_context org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_mapjoin org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_deletejar {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/961/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/961/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 33 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12623892 Implement non-staged MapJoin Key: HIVE-6144 URL: https://issues.apache.org/jira/browse/HIVE-6144 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt For map join, all data in small aliases is hashed and stored into a temporary file in MapRedLocalTask. But for some aliases without a filter or projection, that staging step seems unnecessary. For example, {noformat} select a.* from src a join src b on a.key=b.key; {noformat} produces a plan like this. {noformat} STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias - Map Local Tables: a Fetch Operator limit: -1 Alias - Map Local Operator Tree: a TableScan alias: a HashTable Sink Operator condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 Stage: Stage-3 Map Reduce Alias - Map Operator Tree: b TableScan alias: b Map Join Operator condition map:
[jira] [Updated] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)
[ https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6229: Status: Patch Available (was: Open) Stats are missing sometimes (regression from HIVE-5936) --- Key: HIVE-6229 URL: https://issues.apache.org/jira/browse/HIVE-6229 Project: Hive Issue Type: Bug Components: Statistics Reporter: Navis Assignee: Navis Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt If the prefix length is smaller than hive.stats.key.prefix.max.length but the length of prefix + postfix is bigger than that, the stats are missed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
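For concreteness, here is a minimal sketch of the failure mode described above (method and variable names are hypothetical, not the actual Hive stats code): if the publishing side truncates the combined key while the reading side measures only the prefix, the two sides build different keys exactly when prefix <= max < prefix + postfix, so the lookup finds nothing.
{noformat}
// Hypothetical illustration only -- not Hive's StatsPublisher code.
static String publishedKey(String prefix, String postfix, int maxLen) {
    String key = prefix + postfix;
    // the full key is measured, so it gets truncated here
    return key.length() > maxLen ? key.substring(0, maxLen) : key;
}

static String lookupKey(String prefix, String postfix, int maxLen) {
    // only the prefix is measured, so no truncation happens when the
    // prefix alone fits -- even though the published key was cut to maxLen
    String p = prefix.length() > maxLen ? prefix.substring(0, maxLen) : prefix;
    return p + postfix;
}
// For prefix length 10, postfix length 8, maxLen 12: publishedKey(...) has
// length 12 but lookupKey(...) has length 18, so the stats row is never found.
{noformat}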
[jira] [Updated] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)
[ https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6229: Attachment: HIVE-6229.2.patch.txt Stats are missing sometimes (regression from HIVE-5936) --- Key: HIVE-6229 URL: https://issues.apache.org/jira/browse/HIVE-6229 Project: Hive Issue Type: Bug Components: Statistics Reporter: Navis Assignee: Navis Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt If the prefix length is smaller than hive.stats.key.prefix.max.length but the length of prefix + postfix is bigger than that, the stats are missed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Review Request 17118: Stats are missing sometimes (regression from HIVE-5936)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17118/ --- Review request for hive. Bugs: HIVE-6229 https://issues.apache.org/jira/browse/HIVE-6229 Repository: hive-git Description --- If the prefix length is smaller than hive.stats.key.prefix.max.length but the length of prefix + postfix is bigger than that, the stats are missed. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a78b72f conf/hive-default.xml.template 7cd8a1f data/conf/hive-site.xml eac1a3f ql/src/java/org/apache/hadoop/hive/ql/Driver.java bd95161 ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java 1c84523 ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java b6c09eb ql/src/java/org/apache/hadoop/hive/ql/exec/NodeUtils.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java a22a4c2 ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 0e3cfe7 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java b9b5b4a ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/stats/PartialScanMapper.java e319fe4 ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 0f0e825 ql/src/java/org/apache/hadoop/hive/ql/stats/StatsFactory.java 2fb880d ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java 41e237f Diff: https://reviews.apache.org/r/17118/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-6229) Stats are missing sometimes (regression from HIVE-5936)
[ https://issues.apache.org/jira/browse/HIVE-6229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876450#comment-13876450 ] Hive QA commented on HIVE-6229: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12623937/HIVE-6229.2.patch.txt {color:green}SUCCESS:{color} +1 4943 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/962/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/962/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12623937 Stats are missing sometimes (regression from HIVE-5936) --- Key: HIVE-6229 URL: https://issues.apache.org/jira/browse/HIVE-6229 Project: Hive Issue Type: Bug Components: Statistics Reporter: Navis Assignee: Navis Attachments: HIVE-6229.1.patch.txt, HIVE-6229.2.patch.txt If the prefix length is smaller than hive.stats.key.prefix.max.length but the length of prefix + postfix is bigger than that, the stats are missed. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876529#comment-13876529 ] Ted Xu commented on HIVE-5771: -- Hi [~ashutoshgupt...@gmail.com], Your points are valid, thanks! Here is my thinking on those issues: * smb_mapjoin_18.q smb_mapjoin_25.q: those problems are introduced by the constant propagation optimizer (CPO) conflicting with the *Bucketing Sorting ReduceSink Optimizer (BSRO)*. I tried applying BSRO before CPO and the issue seems fixed. * groupby_sort_1.q groupby_sort_skew_1.q: those are caused by CPO conflicting with the *Groupby Optimizer (GO)*; applying it before CPO also fixes the issue. In fact I'm wondering if it is safe to reorder those optimizers, making it GO-BSRO-CPO. * decimal.q pcr.q: I disabled these two cases because of an issue I still have not figured out. My local test run wants a piece of expected output patched from '0.0040' to '0,004', but it is still '0.0040' on the Hudson server. I guess it is an environment issue (the comma suggests a locale difference). I will update the patch as soon as I have validated the above fixes. Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.patch Currently there is no constant folding/propagation optimizer; all expressions are evaluated at runtime. HIVE-2470 did a great job of evaluating constants in the UDF initialization phase; however, that is still a runtime evaluation, and it doesn't propagate constants from a subquery to the outside. Introducing such an optimizer may reduce I/O and accelerate processing. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
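For readers unfamiliar with the optimization itself: constant folding evaluates deterministic expressions over literals once at planning time, so a predicate such as WHERE key = 1 + 2 reaches execution as WHERE key = 3. A minimal, self-contained sketch of the idea (a tiny invented expression tree, not Hive's ExprNodeDesc API):
{noformat}
// Illustrative only -- hypothetical Expr classes, not Hive internals.
interface Expr {}
record Const(int value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}

static Expr fold(Expr e) {
    // fold children first, then collapse an operation over two constants
    if (e instanceof Add a
            && fold(a.left()) instanceof Const l
            && fold(a.right()) instanceof Const r) {
        return new Const(l.value() + r.value()); // 1 + 2 becomes 3 at plan time
    }
    return e;
}
{noformat}
Propagation extends the same idea across operators: a constant proven in a subquery's select list can replace references to that column in the outer query.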
[jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876591#comment-13876591 ] Sergey Shelukhin commented on HIVE-6157: Sorry, was not aware of that JIRA. Among other things, this patch adds bulk APIs. They do not support multiple tables as of now, though. Stats are currently fetched on the level of one column (stat optimizer) or one table (table scan stuff), so making use of multi-table API would require more extensive changes on the client (optimizer) side. Fetching column stats slower than the 101 during rush hour -- Key: HIVE-6157 URL: https://issues.apache.org/jira/browse/HIVE-6157 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Gunther Hagleitner Assignee: Sergey Shelukhin Attachments: HIVE-6157.prelim.patch hive.stats.fetch.column.stats controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table 4000 partitions, 24 columns) the time spent in semantic analyze goes from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent fetching column stats... The reason is probably that the APIs force you to make separate metastore calls for each column in each partition. That's probably the first thing that has to change. The question is if in addition to that we need to cache this in the client or store the stats as a single blob in the database to further cut down on the time. However, the way it stands right now column stats seem unusable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
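To make the cost concrete, here is the call pattern being replaced, sketched in Java (method names and return shapes are approximations of the metastore client API, not exact signatures):
{noformat}
// Before: one metastore round trip per (partition, column). With ~4000
// partitions and 24 columns that is ~96,000 RPCs -- hence the 65 seconds.
List<ColumnStatisticsObj> stats = new ArrayList<>();
for (String partName : partNames) {
    for (String col : colNames) {
        stats.add(client.getPartitionColumnStatistics(db, tbl, partName, col));
    }
}

// After (bulk API, shape approximate): one call per table fetches stats for
// all requested partitions and columns at once.
Map<String, List<ColumnStatisticsObj>> all =
    client.getPartitionColumnStatistics(db, tbl, partNames, colNames);
{noformat}
As the comment above notes, the bulk call is still per-table; spanning multiple tables in one call would require further changes on the optimizer side.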
[jira] [Updated] (HIVE-5687) Streaming support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated HIVE-5687: -- Attachment: 5687-draft-api-spec.pdf Attaching draft api spec for comments Streaming support in Hive - Key: HIVE-5687 URL: https://issues.apache.org/jira/browse/HIVE-5687 Project: Hive Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik Attachments: 5687-draft-api-spec.pdf Implement support for Streaming data into HIVE. - Provide a client streaming API - Transaction support: Clients should be able to periodically commit a batch of records atomically - Immediate visibility: Records should be immediately visible to queries on commit - Should not overload HDFS with too many small files Use Cases: - Streaming logs into HIVE via Flume - Streaming results of computations from Storm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
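Since the proposal itself is in the attached PDF, the following is only a hypothetical shape for such a client API (all names invented for illustration and may not match the draft spec), showing how the transaction and visibility requirements could surface to callers:
{noformat}
// Hypothetical API sketch -- names may not match the attached draft spec.
StreamingConnection conn = endpoint.newConnection(conf);
TransactionBatch batch = conn.fetchTransactionBatch(100, writer);
while (batch.remainingTransactions() > 0) {
    batch.beginNextTransaction();
    for (byte[] record : nextChunkFromFlumeOrStorm()) {
        batch.write(record);   // buffered into shared files, not one file per record
    }
    batch.commit();            // the whole batch becomes visible atomically here
}
batch.close();
conn.close();
{noformat}
Batching many records per transaction, and many transactions per file, is what would keep HDFS from filling up with small files while still giving immediate visibility on commit.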
[jira] [Updated] (HIVE-6139) Implement vectorized decimal division and modulo
[ https://issues.apache.org/jira/browse/HIVE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6139: -- Attachment: HIVE-6139.07.patch Uploading patch again to try to kick off automated tests, which didn't run last time. Implement vectorized decimal division and modulo Key: HIVE-6139 URL: https://issues.apache.org/jira/browse/HIVE-6139 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6139.01.patch, HIVE-6139.02.patch, HIVE-6139.07.patch, HIVE-6139.07.patch Support column-scalar, scalar-column, and column-column versions for division and modulo. Include unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
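For background on what such vectorized expression classes do: below is a simplified sketch of a column-column decimal divide kernel (illustrative only; the real generated classes also handle repeating inputs, the selected[] row-index array, and noNulls fast paths, and the Decimal128 method names here are approximate):
{noformat}
// Simplified kernel -- real generated code covers more cases.
void evaluate(DecimalColumnVector left, DecimalColumnVector right,
              DecimalColumnVector out, int n) {
    for (int i = 0; i < n; i++) {
        if (right.vector[i].getSignum() == 0) {
            out.isNull[i] = true;      // x / 0 and x % 0 produce NULL
            out.noNulls = false;
        } else {
            out.vector[i].update(left.vector[i]);
            out.vector[i].divideDestructive(right.vector[i], out.scale);
        }
    }
}
{noformat}
The column-scalar and scalar-column variants are the same loop with one operand hoisted out of it, which is why each operator needs three versions.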
Re: Timeline for the Hive 0.13 release?
Hi, I agree that picking a date to branch and then restricting commits to that branch would be a less time intensive plan for the RM. Brock On Sat, Jan 18, 2014 at 4:21 PM, Harish Butani hbut...@hortonworks.com wrote: Yes, agree it is time to start planning for the next release. I would like to volunteer to do the release management duties for this release (it will be a great experience for me). I will be happy to do it, if the community is fine with this. regards, Harish. On Jan 17, 2014, at 7:05 PM, Thejas Nair the...@hortonworks.com wrote: Yes, I think it is time to start planning for the next release. For the 0.12 release I created a branch and then accepted patches that people asked to be included for some time, before moving to a phase of accepting only critical bug fixes. This turned out to be laborious. I think we should instead give everyone a few weeks to get any patches they are working on ready, cut the branch, and take in only critical bug fixes to the branch after that. How about cutting the branch around mid-February and targeting a release a week or two after that? Thanks, Thejas On Fri, Jan 17, 2014 at 4:39 PM, Carl Steinbach c...@apache.org wrote: I was wondering what people think about setting a tentative date for the Hive 0.13 release? At an old Hive Contrib meeting we agreed that Hive should follow a time-based release model with new releases every four months. If we follow that schedule we're due for the next release in mid-February. Thoughts? Thanks. Carl
[jira] [Commented] (HIVE-5635) WebHCatJTShim23 ignores security/user context
[ https://issues.apache.org/jira/browse/HIVE-5635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876625#comment-13876625 ] Eugene Koifman commented on HIVE-5635: -- [~shanyu] you're right, it does seem odd. I think 1 should be enough. WebHCatJTShim23 ignores security/user context - Key: HIVE-5635 URL: https://issues.apache.org/jira/browse/HIVE-5635 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.13.0 Attachments: HIVE-5635.2.patch, HIVE-5635.3.patch, HIVE-5635.patch WebHCatJTShim23 takes a UserGroupInformation object as an argument (which represents the user making the call to WebHCat, or the doAs user) but ignores it. WebHCatJTShim20S uses the UserGroupInformation. This is inconsistent and may be a security hole, because with Hadoop 2 the methods on WebHCatJTShim are likely running with 'hcat' as the user context. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
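The usual fix direction for this class of bug, sketched below (illustrative, not the actual shim code), is to wrap the cluster-facing calls in the caller's UserGroupInformation so they execute in the intended security context:
{noformat}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Run job-tracker/RM operations as the calling (or doAs) user rather than
// as the 'hcat' service user that owns the WebHCat process.
static <T> T asUser(UserGroupInformation ugi, PrivilegedExceptionAction<T> action)
        throws Exception {
    return ugi.doAs(action);
}
{noformat}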
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Status: Patch Available (was: Open) Marking it as Patch Available. Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch, HIVE-6002.2.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of the original data written using ORC write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 from the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: (was: parquet-hive.patch) Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: (was: hive-0.11-parquet.patch) Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: HIVE-5783.patch without license or author tags. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6227) WebHCat E2E test JOBS_7 fails
[ https://issues.apache.org/jira/browse/HIVE-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876648#comment-13876648 ] Deepesh Khandelwal commented on HIVE-6227: -- Thanks [~daijy] for review and commit. WebHCat E2E test JOBS_7 fails - Key: HIVE-6227 URL: https://issues.apache.org/jira/browse/HIVE-6227 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Fix For: 0.13.0 Attachments: HIVE-6227.patch WebHCat E2E test JOBS_7 fails while verifying the job status of a TempletonControllerJob and its child pig job. The filter currently is such that only pig jobs are looked at, it should also include TempletonControllerJob. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: (was: HIVE-5783.patch) Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6162) multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0
[ https://issues.apache.org/jira/browse/HIVE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6162: Fix Version/s: 0.13.0 multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0 -- Key: HIVE-6162 URL: https://issues.apache.org/jira/browse/HIVE-6162 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-6162.patch On Hadoop 2.0, when running hive command line, we saw warnings like this: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/myhdp/hadoop-2.1.2.2.0.6.0-/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/myhdp/hive-0.12.0.2.0.6.0-/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6162) multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0
[ https://issues.apache.org/jira/browse/HIVE-6162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6162: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution Shanyu! multiple SLF4J bindings warning messages when running hive CLI on Hadoop 2.0 -- Key: HIVE-6162 URL: https://issues.apache.org/jira/browse/HIVE-6162 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.12.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: HIVE-6162.patch On Hadoop 2.0, when running hive command line, we saw warnings like this: SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/C:/myhdp/hadoop-2.1.2.2.0.6.0-/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/C:/myhdp/hive-0.12.0.2.0.6.0-/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6164) Hive build on Windows failed with datanucleus enhancer error command line is too long
[ https://issues.apache.org/jira/browse/HIVE-6164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6164: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution Shanyu! Hive build on Windows failed with datanucleus enhancer error command line is too long --- Key: HIVE-6164 URL: https://issues.apache.org/jira/browse/HIVE-6164 Project: Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Fix For: 0.13.0 Attachments: HIVE-6164.patch Building hive 0.13 against hadoop 2.0 on Windows always fails: mvn install -Phadoop-2 ... [ERROR] [ERROR] Standard error from the DataNucleus tool + org.datanucleus.enhancer.DataNucleusEnhancer : [ERROR] [ERROR] The command line is too long. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-6215) Prepared Statements created and executed remotely will return no metadata and empty result set
[ https://issues.apache.org/jira/browse/HIVE-6215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samer El Helou resolved HIVE-6215. -- Resolution: Invalid Prepared Statements created and executed remotely will return no metadata and empty result set -- Key: HIVE-6215 URL: https://issues.apache.org/jira/browse/HIVE-6215 Project: Hive Issue Type: Bug Components: Clients Affects Versions: 0.11.0 Environment: I have a Red Hat server 6.4 installed on a VM. Installed IBM Java 1.6 Installed Hadoop 0.20.2 Installed Hive2 0.11 Installed Derby DB 10.4.2.0 Reporter: Samer El Helou Priority: Blocker Labels: Prepared, Remote, Statement Created a simple test to exercise prepared statements locally; I receive the correct results. When I run the same test from another remote machine, the metadata and result set are empty. Statements created through createStatement work perfectly fine both locally and remotely. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Justin Coffey updated HIVE-5783: Attachment: HIVE-5783.patch This is the good one; I had a final dependency to clean up. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17005: Vectorized reader for DECIMAL datatype for ORC format.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17005/#review32299 --- common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java https://reviews.apache.org/r/17005/#comment61044 I think it's worth having a case for signum == 0 to update the value to 0, to make correctness obvious, and for speed too, since 0 is a very common value. You can use update(0) and not have to use the updateBigInteger function. common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java https://reviews.apache.org/r/17005/#comment61048 You should put a comment that behavior is undefined if the BigInteger argument is negative, and that you should only pass in positive values. common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java https://reviews.apache.org/r/17005/#comment61045 The convention in this code is to overload update() based on argument type, so I think it's best to call the method update instead of updateBigInteger. Also, add a comment that argument must not be negative. If it is, I think sign extension from shiftRight might cause an error. ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java https://reviews.apache.org/r/17005/#comment61047 can you comment why you are sharing the result null vector into the scratch one? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java https://reviews.apache.org/r/17005/#comment61049 It seems odd that we're reading from a scaleStream because the scale should be the same for every value in the column. Is this necessary? ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java https://reviews.apache.org/r/17005/#comment61050 If any scale values are different inside a single DecimalColumnVector, I think that could cause unpredictable or wrong results. Later operations on DecimalColumnVector take the scale from the columnvector sometimes, not each individual object. ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java https://reviews.apache.org/r/17005/#comment61051 do you want to include the printing in the final test? - Eric Hanson On Jan. 17, 2014, 12:58 a.m., Jitendra Pandey wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17005/ --- (Updated Jan. 17, 2014, 12:58 a.m.) Review request for hive and Eric Hanson. Bugs: HIVE-6178 https://issues.apache.org/jira/browse/HIVE-6178 Repository: hive-git Description --- vectorized reader for DECIMAL datatype for ORC format. Diffs - common/src/java/org/apache/hadoop/hive/common/type/Decimal128.java 3939511 common/src/java/org/apache/hadoop/hive/common/type/UnsignedInt128.java d71ebb3 common/src/test/org/apache/hadoop/hive/common/type/TestUnsignedInt128.java fbb2aa0 ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DecimalColumnVector.java 23564bb ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java 0876bf7 ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestVectorizedORCReader.java 0d5b7ff Diff: https://reviews.apache.org/r/17005/diff/ Testing --- Thanks, Jitendra Pandey
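For the first comment above, the suggested fast path might look like the following (a sketch only; the method shape approximates the Decimal128 code under review, and updateMagnitude/setSignum are invented helpers for illustration):
{noformat}
void update(BigInteger bigInt, short scale) {
    if (bigInt.signum() == 0) {
        update(0, scale);                     // cheap path; zero is very common
        return;
    }
    // general path: strip the sign so only a non-negative magnitude reaches
    // the UnsignedInt128 update, per the second and third comments above
    boolean negative = bigInt.signum() < 0;
    updateMagnitude(bigInt.abs(), scale);     // hypothetical helper
    setSignum(negative ? (byte) -1 : (byte) 1);
}
{noformat}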
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876712#comment-13876712 ] Justin Coffey commented on HIVE-5783: - Sorry for the spam in posts. The latest patch is good: - no author tags - no Criteo copyright - builds against the latest version of Parquet (1.3.2) I attempted to create a review.apache.org review, but am unable to publish it because I can't assign any reviewers. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6178) Implement vectorized reader for DECIMAL datatype for ORC format.
[ https://issues.apache.org/jira/browse/HIVE-6178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876714#comment-13876714 ] Eric Hanson commented on HIVE-6178: --- Please see my comments on Review Board Implement vectorized reader for DECIMAL datatype for ORC format. Key: HIVE-6178 URL: https://issues.apache.org/jira/browse/HIVE-6178 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6178.1.patch Implement vectorized reader for DECIMAL datatype for ORC format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
Gunther Hagleitner created HIVE-6231: Summary: NPE when switching to Tez execution mode after session has been initialized Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
[ https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876766#comment-13876766 ] Gunther Hagleitner commented on HIVE-6231: -- We're dynamically creating a session in TezTask if there is none yet. There's a bug in that, though, which causes an NPE when opening the newly created session. NPE when switching to Tez execution mode after session has been initialized --- Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6231.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
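A sketch of the lazy-initialization pattern in question (illustrative shape only, not the actual TezTask code; class and method names are approximate): the session created on demand must also be opened before first use, and skipping or misordering that step leaves fields null.
{noformat}
TezSessionState session = SessionState.get().getTezSession();
if (session == null) {
    session = new TezSessionState();       // created on demand...
    SessionState.get().setTezSession(session);
}
if (!session.isOpen()) {
    session.open(conf);                    // ...but must be opened before use,
}                                          // otherwise later calls hit an NPE
{noformat}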
[jira] [Updated] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
[ https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6231: - Attachment: HIVE-6231.1.patch NPE when switching to Tez execution mode after session has been initialized --- Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6231.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
[ https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6231: - Status: Patch Available (was: Open) NPE when switching to Tez execution mode after session has been initialized --- Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6231.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
[ https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876770#comment-13876770 ] Vikram Dixit K commented on HIVE-6231: -- LGTM +1 pending test run. NPE when switching to Tez execution mode after session has been initialized --- Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6231.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6139) Implement vectorized decimal division and modulo
[ https://issues.apache.org/jira/browse/HIVE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876772#comment-13876772 ] Hive QA commented on HIVE-6139: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12623966/HIVE-6139.07.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4948 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers org.apache.hcatalog.api.TestHCatClient.testBasicDDLCommands org.apache.hcatalog.listener.TestNotificationListener.testAMQListener {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/963/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/963/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12623966 Implement vectorized decimal division and modulo Key: HIVE-6139 URL: https://issues.apache.org/jira/browse/HIVE-6139 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6139.01.patch, HIVE-6139.02.patch, HIVE-6139.07.patch, HIVE-6139.07.patch Support column-scalar, scalar-column, and column-column versions for division and modulo. Include unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5002: - Attachment: HIVE-5002.2.patch Re-uploading the unchanged patch in the hope of triggering a precommit run. Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, h-5002.patch, h-5002.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5002: - Status: Open (was: Patch Available) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, h-5002.patch, h-5002.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5002: - Status: Patch Available (was: Open) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, h-5002.patch, h-5002.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5814: - Attachment: (was: HIVE-5814.patch) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814PrimitiveTypeHivePigMapping.pdf Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well.Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6232) allow user to control out-of-range values in HCatStorer
Eugene Koifman created HIVE-6232: Summary: allow user to control out-of-range values in HCatStorer Key: HIVE-6232 URL: https://issues.apache.org/jira/browse/HIVE-6232 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive DECIMAL. When storing Pig data into a Hive table, if the value is out of range there are 2 options: 1. throw an exception. 2. write NULL instead of the value. The 1st has the drawback that it may kill a process that loads 100M rows after 90M rows have been loaded. But the 2nd may not be appropriate for all use cases. Should add support for additional parameters in HCatStorer where the user can specify an option to control this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6232) allow user to control out-of-range values in HCatStorer
[ https://issues.apache.org/jira/browse/HIVE-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6232: - Description: Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive DECIMAL. When storing Pig data into a Hive table, if the value is out of range there are 2 options: 1. throw an exception. 2. write NULL instead of the value. The 1st has the drawback that it may kill a process that loads 100M rows after 90M rows have been loaded. But the 2nd may not be appropriate for all use cases. Should add support for additional parameters in HCatStorer where the user can specify an option to control this. see org.apache.pig.backend.hadoop.hbase.HBaseStorage for examples was: Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive DECIMAL. When storing Pig data into a Hive table, if the value is out of range there are 2 options: 1. throw an exception. 2. write NULL instead of the value. The 1st has the drawback that it may kill a process that loads 100M rows after 90M rows have been loaded. But the 2nd may not be appropriate for all use cases. Should add support for additional parameters in HCatStorer where the user can specify an option to control this. allow user to control out-of-range values in HCatStorer --- Key: HIVE-6232 URL: https://issues.apache.org/jira/browse/HIVE-6232 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Pig values support a wider range than Hive's, e.g. Pig BIGDECIMAL vs Hive DECIMAL. When storing Pig data into a Hive table, if the value is out of range there are 2 options: 1. throw an exception. 2. write NULL instead of the value. The 1st has the drawback that it may kill a process that loads 100M rows after 90M rows have been loaded. But the 2nd may not be appropriate for all use cases. Should add support for additional parameters in HCatStorer where the user can specify an option to control this. see org.apache.pig.backend.hadoop.hbase.HBaseStorage for examples -- This message was sent by Atlassian JIRA (v6.1.5#6160)
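Following the HBaseStorage precedent referenced above, the knob would likely be a constructor option string; here is a hypothetical sketch of the policy handling (the option name, values, and helpers are all invented for illustration, not a committed API):
{noformat}
// Hypothetical usage from Pig (option name invented):
//   STORE a INTO 'db.tbl' USING HCatStorer('', '', '-onOutOfRangeValue Null');
enum OnOutOfRange { THROW, NULL }

static Object convertOrHandle(Object pigValue, OnOutOfRange policy) {
    if (fitsHiveType(pigValue)) {
        return pigValue;
    }
    if (policy == OnOutOfRange.NULL) {
        return null;                // drop the value, keep the 100M-row load alive
    }
    throw new IllegalArgumentException("value out of range: " + pigValue);
}

static boolean fitsHiveType(Object v) {
    // placeholder range check for the sketch: Hive decimal caps precision at 38
    return !(v instanceof java.math.BigDecimal b) || b.precision() <= 38;
}
{noformat}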
[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5814: - Attachment: (was: HIVE-5814PrimitiveTypeHivePigMapping.pdf) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814 HCat-Pig type mapping.pdf Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well.Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5814: - Attachment: HIVE-5814 HCat-Pig type mapping.pdf Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814 HCat-Pig type mapping.pdf Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well.Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876865#comment-13876865 ] Hive QA commented on HIVE-6002: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619929/HIVE-6002.2.patch {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 4943 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/964/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/964/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619929 Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch, HIVE-6002.2.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876870#comment-13876870 ] Venki Korukanti commented on HIVE-1634: --- From the description it looks like binary storage support is only for a few primitive types. Quoting from the description: "This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types." Is there any JIRA or requirement to support the rest of the primitive types (like binary, timestamp, decimal) in binary storage format? Allow access to Primitive types stored in binary format in HBase Key: HIVE-1634 URL: https://issues.apache.org/jira/browse/HIVE-1634 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.7.0, 0.8.0, 0.9.0 Reporter: Basab Maulik Assignee: Ashutosh Chauhan Fix For: 0.9.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, hive-1634_3.patch This addresses HIVE-1245 in part, for atomic or primitive types. The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples. There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is binary. The table level default is overridden by a column level specification. This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key. Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below. 
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1 NULL NULL NULL NULL NULL Test-String NULL NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable", "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1 true -128 -32768 -2147483648 -9223372036854775808 Test-String -2.1793132E-11 2.01345E291
Time taken: 0.151 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.154 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.347 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1 true -128 -32768 -2147483648 -9223372036854775808
[jira] [Created] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
Deepesh Khandelwal created HIVE-6233: Summary: JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode Key: HIVE-6233 URL: https://issues.apache.org/jira/browse/HIVE-6233 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876889#comment-13876889 ] Ashutosh Chauhan commented on HIVE-1634: I don't think there is any JIRA for new types or complex types. At the time this work was done, only those primitive types were supported in Hive. However, any new work in this direction should take into account the type support work being added in HBase. cc: [~ndimiduk] who is leading the effort in HBase land. Allow access to Primitive types stored in binary format in HBase Key: HIVE-1634 URL: https://issues.apache.org/jira/browse/HIVE-1634 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.7.0, 0.8.0, 0.9.0 Reporter: Basab Maulik Assignee: Ashutosh Chauhan Fix For: 0.9.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, hive-1634_3.patch This addresses HIVE-1245 in part, for atomic or primitive types. The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.utils.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples. There is also a table property "hbase.table.default.storage.type" = "string" to specify a table level default storage type. The other valid specification is binary. The table level default is overridden by a column level specification. This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key. Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below. 
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable", "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
Time taken: 0.151 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.154 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.347 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
Time
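The NULL row in the first query above is exactly the storage-format mismatch the description talks about: the populating program wrote binary-encoded values, while the table (by default) declared string storage. A minimal standalone sketch of the two encodings using HBase's Bytes utility (the class name and values here are ours, not from the attached TestHiveHBaseExternalTable.java):
{code}
import org.apache.hadoop.hbase.util.Bytes;

public class StorageTypeDemo {
  public static void main(String[] args) {
    int v = -2147483648;
    // 'b' storage: four raw bytes from Bytes.toBytes(int)
    byte[] binary = Bytes.toBytes(v);
    // 's' storage: the UTF-8 digits of the decimal rendering
    byte[] text = Bytes.toBytes(Integer.toString(v));
    System.out.println(Bytes.toInt(binary));   // -2147483648
    System.out.println(Bytes.toString(text));  // -2147483648
  }
}
{code}
A serde expecting string storage cannot parse the four raw bytes as decimal digits, so the column surfaces as NULL until hbase.columns.storage.types declares it binary.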
[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
[ https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-6233: - Status: Patch Available (was: Open) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode -- Key: HIVE-6233 URL: https://issues.apache.org/jira/browse/HIVE-6233 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-6233.patch JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6234) Implement fast vectorized InputFormat extension for text files
Eric Hanson created HIVE-6234: - Summary: Implement fast vectorized InputFormat extension for text files Key: HIVE-6234 URL: https://issues.apache.org/jira/browse/HIVE-6234 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Implement support for vectorized scan input of text files (plain text with configurable record and fields separators). This should work for CSV files, tab delimited files, etc. The goal is to provide high-performance reading of these files using vectorized scans, and also to do it as an extension of existing Hive. Then, if vectorized query is enabled, existing tables based on text files will be able to benefit immediately without the need to use a different input format. Another goal is to go beyond a simple layering of vectorized row batch iterator over the top of the existing row iterator. It should be possible to, say, read a chunk of data into a byte buffer (several thousand or even million rows), and then read data from it into vectorized row batches directly. Object creations should be minimized to save allocation time and GC overhead. If it is possible to save CPU for values like dates and numbers by caching the translation from string to the final data type, that should ideally be implemented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
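As a rough sketch of the chunk-to-batch idea in the description above, the following fills a single-column vectorized batch directly from a byte buffer, referencing rather than copying each record. VectorizedRowBatch and BytesColumnVector are real Hive classes; the method, the rowOffsets layout, and the class name are our illustrative assumptions, not the eventual implementation:
{code}
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

public class TextChunkBatcher {
  // chunk holds many raw text records; rowOffsets has rowCount + 1 entries
  // marking where each record starts, plus one past the last record.
  static VectorizedRowBatch nextBatch(byte[] chunk, int[] rowOffsets, int rowCount) {
    VectorizedRowBatch batch = new VectorizedRowBatch(1, VectorizedRowBatch.DEFAULT_SIZE);
    BytesColumnVector col = new BytesColumnVector(VectorizedRowBatch.DEFAULT_SIZE);
    batch.cols[0] = col;
    int n = Math.min(rowCount, VectorizedRowBatch.DEFAULT_SIZE);
    for (int r = 0; r < n; r++) {
      int start = rowOffsets[r];
      int len = rowOffsets[r + 1] - start - 1;  // drop the record separator
      col.setRef(r, chunk, start, len);         // reference into chunk: no copy, no new objects
    }
    batch.size = n;
    return batch;
  }
}
{code}
A real reader would additionally split fields within each record and convert typed columns, ideally caching string-to-date/number translations as the description suggests.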
[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
[ https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-6233: - Attachment: HIVE-6233.patch Attaching a patch for review with the following changes:
- kinit with the relevant user between individual tests
- rolled hcat-authorization and jobstatus tests into the test-multi-users target in build.xml
JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode -- Key: HIVE-6233 URL: https://issues.apache.org/jira/browse/HIVE-6233 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-6233.patch JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5814: - Status: Patch Available (was: Open) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well. Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5814: - Attachment: HIVE-5814.patch Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well. Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6233) JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode
[ https://issues.apache.org/jira/browse/HIVE-6233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepesh Khandelwal updated HIVE-6233: - Description: JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. NO PRECOMMIT TESTS was:JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. JOBS testsuite in WebHCat E2E tests does not work correctly in secure mode -- Key: HIVE-6233 URL: https://issues.apache.org/jira/browse/HIVE-6233 Project: Hive Issue Type: Bug Components: Tests Affects Versions: 0.13.0 Reporter: Deepesh Khandelwal Assignee: Deepesh Khandelwal Attachments: HIVE-6233.patch JOBS testsuite performs operations with two users test.user.name and test.other.user.name. In Kerberos secure mode it should kinit as the respective user. NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6234) Implement fast vectorized InputFormat extension for text files
[ https://issues.apache.org/jira/browse/HIVE-6234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6234: -- Description: Implement support for vectorized scan input of text files (plain text with configurable record and field separators). This should work for CSV files, tab delimited files, etc. The goal is to provide high-performance reading of these files using vectorized scans, and also to do it as an extension of existing Hive. Then, if vectorized query is enabled, existing tables based on text files will be able to benefit immediately without the need to use a different input format. After upgrading to new Hive bits that support this, faster, vectorized processing over existing text tables should just work, when vectorization is enabled. Another goal is to go beyond a simple layering of vectorized row batch iterator over the top of the existing row iterator. It should be possible to, say, read a chunk of data into a byte buffer (several thousand or even million rows), and then read data from it into vectorized row batches directly. Object creations should be minimized to save allocation time and GC overhead. If it is possible to save CPU for values like dates and numbers by caching the translation from string to the final data type, that should ideally be implemented. was: Implement support for vectorized scan input of text files (plain text with configurable record and fields separators). This should work for CSV files, tab delimited files, etc. The goal is to provide high-performance reading of these files using vectorized scans, and also to do it as an extension of existing Hive. Then, if vectorized query is enabled, existing tables based on text files will be able to benefit immediately without the need to use a different input format. Another goal is to go beyond a simple layering of vectorized row batch iterator over the top of the existing row iterator. It should be possible to, say, read a chunk of data into a byte buffer (several thousand or even million rows), and then read data from it into vectorized row batches directly. Object creations should be minimized to save allocation time and GC overhead. If it is possible to save CPU for values like dates and numbers by caching the translation from string to the final data type, that should ideally be implemented. Implement fast vectorized InputFormat extension for text files -- Key: HIVE-6234 URL: https://issues.apache.org/jira/browse/HIVE-6234 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Implement support for vectorized scan input of text files (plain text with configurable record and field separators). This should work for CSV files, tab delimited files, etc. The goal is to provide high-performance reading of these files using vectorized scans, and also to do it as an extension of existing Hive. Then, if vectorized query is enabled, existing tables based on text files will be able to benefit immediately without the need to use a different input format. After upgrading to new Hive bits that support this, faster, vectorized processing over existing text tables should just work, when vectorization is enabled. Another goal is to go beyond a simple layering of vectorized row batch iterator over the top of the existing row iterator. It should be possible to, say, read a chunk of data into a byte buffer (several thousand or even million rows), and then read data from it into vectorized row batches directly. 
Object creations should be minimized to save allocation time and GC overhead. If it is possible to save CPU for values like dates and numbers by caching the translation from string to the final data type, that should ideally be implemented. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5783: --- Attachment: HIVE-5783.patch Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876922#comment-13876922 ] Brock Noland commented on HIVE-5783: Thank you very much Justin!! I have rebased the patch for trunk. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-5783: --- Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Marking Patch Available for precommit testing. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6235) webhcat e2e test framework needs changes corresponding to JSON module behavior change
Hari Sankar Sivarama Subramaniyan created HIVE-6235: --- Summary: webhcat e2e test framework needs changes corresponding to JSON module behavior change Key: HIVE-6235 URL: https://issues.apache.org/jira/browse/HIVE-6235 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Changes required in hcatalog/src/test/e2e/templeton/drivers/TestDriverCurl.pm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 17061: HIVE-5783 - Native Parquet Support in Hive
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/17061/ --- (Updated Jan. 20, 2014, 10:25 p.m.) Review request for hive. Changes --- Copyrights have been removed. Bugs: HIVE-5783 https://issues.apache.org/jira/browse/HIVE-5783 Repository: hive-git Description --- Adds native parquet support to hive Diffs (updated) -
data/files/parquet_create.txt PRE-CREATION
pom.xml 41f5337
ql/pom.xml 7087a4c
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetOutputFormat.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetInputSplitWrapper.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ProjectionPusher.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ArrayWritableGroupConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableGroupConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/DataWritableRecordConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/ETypeConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveGroupConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/convert/HiveSchemaConverter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/DataWritableReadSupport.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/read/ParquetRecordReaderWrapper.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/AbstractParquetMapInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ArrayWritableObjectInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/DeepParquetHiveMapInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveArrayInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/ParquetHiveSerDe.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/StandardParquetHiveMapInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetByteInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetPrimitiveInspectorFactory.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetShortInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/serde/primitive/ParquetStringInspector.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BigDecimalWritable.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/writable/BinaryWritable.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriteSupport.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/ParquetRecordWriterWrapper.java PRE-CREATION
ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 13d0a56
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g f83c15d
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 1ce6bf3
ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 4147503
ql/src/java/parquet/hive/DeprecatedParquetInputFormat.java PRE-CREATION
ql/src/java/parquet/hive/DeprecatedParquetOutputFormat.java PRE-CREATION
ql/src/java/parquet/hive/MapredParquetInputFormat.java PRE-CREATION
ql/src/java/parquet/hive/MapredParquetOutputFormat.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestHiveSchemaConverter.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetInputFormat.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestMapredParquetOutputFormat.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestParquetSerDe.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/UtilitiesTestMethods.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestAbstractParquetMapInspector.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestDeepParquetHiveMapInspector.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestParquetHiveArrayInspector.java PRE-CREATION
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/serde/TestStandardParquetHiveMapInspector.java PRE-CREATION
ql/src/test/queries/clientpositive/parquet_create.q PRE-CREATION
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876924#comment-13876924 ] Brock Noland commented on HIVE-5783: RB item has been updated: https://reviews.apache.org/r/17061/ Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6236) webhcat e2e tests require renumbering
Hari Sankar Sivarama Subramaniyan created HIVE-6236: --- Summary: webhcat e2e tests require renumbering Key: HIVE-6236 URL: https://issues.apache.org/jira/browse/HIVE-6236 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan The tests need to be renumbered so that they are continuous. ddl.conf - _10 needs to be renumbered to 8 hcatperms.conf - DB_OPS_9 needs to be renumbered. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6237) Webhcat e2e test JOBS_2 fails due to permissions when the hdfs umask setting is 022
Hari Sankar Sivarama Subramaniyan created HIVE-6237: --- Summary: Webhcat e2e test JOBS_2 fails due to permissions when the hdfs umask setting is 022 Key: HIVE-6237 URL: https://issues.apache.org/jira/browse/HIVE-6237 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Webhcat e2e test JOBS_2 fails due to permissions when the hdfs umask setting is 022. We need to make sure that the test is deterministic. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6238) HadoopShims.getLongComparator needs to be public
Thejas M Nair created HIVE-6238: --- Summary: HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5814) Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat
[ https://issues.apache.org/jira/browse/HIVE-5814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876939#comment-13876939 ] Eugene Koifman commented on HIVE-5814: -- Review Board: https://reviews.apache.org/r/17135 Add DATE, TIMESTAMP, DECIMAL, CHAR, VARCHAR types support in HCat - Key: HIVE-5814 URL: https://issues.apache.org/jira/browse/HIVE-5814 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5814 HCat-Pig type mapping.pdf, HIVE-5814.patch Hive 0.12 added support for new data types. Pig 0.12 added some as well. HCat should handle these as well. Also note that CHAR was added recently. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6238) HadoopShims.getLongComparator needs to be public
[ https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-6238: Attachment: HIVE-6238.1.patch HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6238.1.patch HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Review Request 16938: HIVE-6209 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16938/#review32325 --- Looks fine to me. As you mentioned on the ticket, the filesystem equality check fails in most conditions and we don't hit this problem. It would be helpful to add a test case to verify the behavior. - Prasad Mujumdar On Jan. 16, 2014, 1:45 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16938/ --- (Updated Jan. 16, 2014, 1:45 a.m.) Review request for hive. Bugs: HIVE-6209 https://issues.apache.org/jira/browse/HIVE-6209 Repository: hive-git Description --- There was a wrong condition introduced in HIVE-3756, that prevented load data overwrite from working properly. In these situations, destf == oldPath == /user/warehouse/hive/tableName, so -rmr was skipped on old data. Note that if file name was same, ie load data inpath 'path' with same path repeatedly, it would work as the rename would overwrite the old data file. But in this case, the filename is different. Other minor changes are trying to improve logging in this area to better diagnose the issues (for example file permission, etc). Diffs - ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 2fe86e1 Diff: https://reviews.apache.org/r/16938/diff/ Testing --- The primary concern was whether removing the directory in these scenarios would make the rename fail. It should not due to fs.mkdirs call before, but I still verified the following scenarios: load/insert overwrite into table with partitions load/insert overwrite into table with buckets Thanks, Szehon Ho
[jira] [Commented] (HIVE-6209) 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data
[ https://issues.apache.org/jira/browse/HIVE-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876944#comment-13876944 ] Prasad Mujumdar commented on HIVE-6209: --- Looks fine to me. Some minor suggestions on the reviewboard. 'LOAD DATA INPATH ... OVERWRITE ..' doesn't overwrite current data -- Key: HIVE-6209 URL: https://issues.apache.org/jira/browse/HIVE-6209 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6209.patch In the case where a user loads data into a table using overwrite, with a different file, the existing data is not overwritten. {code}
$ hdfs dfs -cat /tmp/data
aaa
bbb
ccc
$ hdfs dfs -cat /tmp/data2
ddd
eee
fff
$ hive
hive> create table test (id string);
hive> load data inpath '/tmp/data' overwrite into table test;
hive> select * from test;
aaa
bbb
ccc
hive> load data inpath '/tmp/data2' overwrite into table test;
hive> select * from test;
aaa
bbb
ccc
ddd
eee
fff
{code} It seems it was broken by HIVE-3756, which added another condition controlling whether rmr should be run on the old directory, and skips it in this case. There is a workaround of set fs.hdfs.impl.disable.cache=true; which sabotages this condition, but the condition should be removed in the long term. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
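A simplified paraphrase of the skipped-delete condition the ticket describes (variable and method names are ours; this is not the literal Hive.java code from HIVE-3756):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverwriteGuardSketch {
  // With the FileSystem cache enabled, both paths resolve to the same cached
  // FileSystem instance, the equality test succeeds, and the old directory is
  // never deleted -- which matches the reported symptom.
  static void clearOldPath(Path oldPath, Path destf, Configuration conf) throws Exception {
    FileSystem oldFs = oldPath.getFileSystem(conf);
    FileSystem destFs = destf.getFileSystem(conf);
    if (!oldFs.equals(destFs)) {    // skipped whenever both live on the same fs
      oldFs.delete(oldPath, true);  // so stale files survive the "overwrite"
    }
  }
}
{code}
This reading also explains the fs.hdfs.impl.disable.cache=true workaround: with the cache disabled, the two getFileSystem() calls return distinct instances, the equality test fails, and the old directory is deleted as intended.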
[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public
[ https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876946#comment-13876946 ] Thejas M Nair commented on HIVE-6238: - I am not sure why this didn't result in an error when I ran 'mvn clean install ..' or 'mvn package -Pdist ..', and only showed up when I ran bin/hive. {code}
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.hadoop.hive.cli.CliDriver.getCommandCompletor(CliDriver.java:541)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:758)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerUDAF(FunctionRegistry.java:1022)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.registerUDAF(FunctionRegistry.java:1015)
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.<clinit>(FunctionRegistry.java:372)
	... 9 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
	... 12 more
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
	... 17 more
{code} [~brocknoland], would you know why compilation errors such as this one and HIVE-6196 don't result in the mvn commands failing? HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6238.1.patch HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6239) HCatRecordSerDe should be removed
[ https://issues.apache.org/jira/browse/HIVE-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-6239: - Description: It doesn't seem to have any real purpose any more - only seems to be used in tests (was: It doesn't seem to have any real purpose any more) HCatRecordSerDe should be removed - Key: HIVE-6239 URL: https://issues.apache.org/jira/browse/HIVE-6239 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Eugene Koifman Priority: Minor It doesn't seem to have any real purpose any more - only seems to be used in tests -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6239) HCatRecordSerDe should be removed
Eugene Koifman created HIVE-6239: Summary: HCatRecordSerDe should be removed Key: HIVE-6239 URL: https://issues.apache.org/jira/browse/HIVE-6239 Project: Hive Issue Type: Sub-task Components: HCatalog Affects Versions: 0.13.0 Reporter: Eugene Koifman Priority: Minor It doesn't seem to have any real purpose any more -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6231) NPE when switching to Tez execution mode after session has been initialized
[ https://issues.apache.org/jira/browse/HIVE-6231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876957#comment-13876957 ] Hive QA commented on HIVE-6231: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12623989/HIVE-6231.1.patch {color:green}SUCCESS:{color} +1 4943 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/965/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/965/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12623989 NPE when switching to Tez execution mode after session has been initialized --- Key: HIVE-6231 URL: https://issues.apache.org/jira/browse/HIVE-6231 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6231.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Hive CBO - Branch Request
I was on vacation, sorry for the late response. I was looking for a branch-committer provision. Thanks John On Thu, Dec 19, 2013 at 5:41 PM, Brock Noland br...@cloudera.com wrote: Hi, Do you have an Apache ID? (I don't see you here http://people.apache.org/committer-index.html). Without an Apache ID I am not sure how we'd give you access to commit to the branch. More importantly, I don't think we have any provision for branch committers in the Hive ByLaws ( https://cwiki.apache.org/confluence/display/Hive/Bylaws) or really any provisions for branches at all. We have talked about adding a branch-merge provision but that has not occurred at present. As a side note, Hadoop did recently change their bylaws to include the concept of a branch committer. http://s.apache.org/hadoop-branch-committers Brock On Thu, Dec 19, 2013 at 6:19 PM, John Pullokkaran jpullokka...@hortonworks.com wrote: Hi, I am working on CBO for Hive (HIVE-5775 https://issues.apache.org/jira/browse/HIVE-5775). In order to make code integration easier I would like to do this work on a separate branch which can be brought into trunk once the code is stable and reviewed. It would also be easier if I could commit to this branch without having to wait for a committer. Thanks John -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
[jira] [Created] (HIVE-6240) Update jetty to the latest stable (9.x)
Vaibhav Gumashta created HIVE-6240: -- Summary: Update jetty to the latest stable (9.x) Key: HIVE-6240 URL: https://issues.apache.org/jira/browse/HIVE-6240 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta We're using a very old version of jetty which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public
[ https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876991#comment-13876991 ] Brock Noland commented on HIVE-6238: Weird. Can you verify whether the HadoopShims interface it's loading is the latest? HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6238.1.patch HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public
[ https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876994#comment-13876994 ] Brock Noland commented on HIVE-6238: Also, with regard to HIVE-6196, I believe that javac will compile classes in the wrong directory, but java just won't run them. HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6238.1.patch HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6238) HadoopShims.getLongComparator needs to be public
[ https://issues.apache.org/jira/browse/HIVE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877022#comment-13877022 ] Navis commented on HIVE-6238: - I don't think that's the cause of the problem. HadoopShims is a public interface, and all methods in it are public (aren't they?) however they are declared. HadoopShims.getLongComparator needs to be public Key: HIVE-6238 URL: https://issues.apache.org/jira/browse/HIVE-6238 Project: Hive Issue Type: Bug Components: Shims Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6238.1.patch HadoopShims.getLongComparator is package private; it should be public, as it is used from other classes. {code}
Caused by: java.lang.Error: Unresolved compilation problem:
	The method getLongComparator() is undefined for the type HadoopShims
	at org.apache.hadoop.hive.ql.udf.UDAFPercentile.<init>(UDAFPercentile.java:51)
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
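Navis's point is easy to check in isolation: a Java interface member is implicitly public whether or not the keyword appears. A minimal standalone example (ours, not Hive code):
{code}
import java.util.Comparator;

// Interface members are implicitly public abstract, so even without the
// "public" keyword this method is visible to every caller -- which is why a
// missing modifier in the HadoopShims source alone should not cause the error.
interface ShimsLike {
  Comparator<Long> getLongComparator();
}

public class ImplicitPublicDemo implements ShimsLike {
  @Override
  public Comparator<Long> getLongComparator() {
    return new Comparator<Long>() {
      @Override
      public int compare(Long a, Long b) { return a.compareTo(b); }
    };
  }

  public static void main(String[] args) {
    // Callable from anywhere, despite the bare declaration in ShimsLike.
    System.out.println(new ImplicitPublicDemo().getLongComparator().compare(1L, 2L)); // -1
  }
}
{code}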
[jira] [Commented] (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
[ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877027#comment-13877027 ] Nick Dimiduk commented on HIVE-1634: Hi [~vkorukanti]. The parent ticket for HBase types is HBASE-8089. The groundwork has been laid on the HBase side by way of a {{DataType}} API and an order-preserving serialization format. The next step, as I see it, would be to implement HBASE-10091; that way there's a common description language that can be used to declare HBase types. I'd love your thoughts on that topic if you have some moments to spare. Allow access to Primitive types stored in binary format in HBase Key: HIVE-1634 URL: https://issues.apache.org/jira/browse/HIVE-1634 Project: Hive Issue Type: New Feature Components: HBase Handler Affects Versions: 0.7.0, 0.8.0, 0.9.0 Reporter: Basab Maulik Assignee: Ashutosh Chauhan Fix For: 0.9.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-1634.D1581.3.patch, HIVE-1634.0.patch, HIVE-1634.1.patch, HIVE-1634.branch08.patch, TestHiveHBaseExternalTable.java, hive-1634_3.patch This addresses HIVE-1245 in part, for atomic or primitive types. The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification of the storage option for the corresponding column in the serde property "hbase.columns.mapping". Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families use a colon-separated pair such as 's:b' for the key and value part specifiers respectively. See the test cases and queries for the HBase handler for additional examples. There is also a table property "hbase.table.default.storage.type" = "string" to specify a table-level default storage type. The other valid specification is binary. The table-level default is overridden by a column-level specification. This control is available for the boolean, tinyint, smallint, int, bigint, float, and double primitive types. The attached patch also relaxes the mapping of map types to HBase column families to allow any primitive type to be the map key. Attached is a program for creating a table and populating it in HBase. The external table in Hive can access the data as shown in the example below.
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.691 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  NULL  NULL  NULL  NULL  NULL  Test-String  NULL  NULL
Time taken: 0.346 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.139 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable", "hbase.table.default.storage.type" = "string");
OK
Time taken: 0.139 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  true  -128  -32768  -2147483648  -9223372036854775808  Test-String  -2.1793132E-11  2.01345E291
Time taken: 0.151 seconds
hive> drop table TestHiveHBaseExternalTable;
OK
Time taken: 0.154 seconds
hive> create external table TestHiveHBaseExternalTable (key string, c_bool boolean, c_byte tinyint, c_short smallint, c_int int, c_long bigint, c_string string, c_float float, c_double double) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double", "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b") tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
OK
Time taken: 0.347 seconds
hive> select * from TestHiveHBaseExternalTable;
OK
key-1  true  -128  -32768
[jira] [Commented] (HIVE-3617) Predicates pushed down to hbase are not handled properly when the constant part is shown first
[ https://issues.apache.org/jira/browse/HIVE-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877028#comment-13877028 ] Navis commented on HIVE-3617: - I was also confused by that. negate() had the semantic as you described (> for <=, etc), which is now removed. In a word, 3<a is a>3, not a>=3. Predicates pushed down to hbase are not handled properly when the constant part is shown first - Key: HIVE-3617 URL: https://issues.apache.org/jira/browse/HIVE-3617 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3617.3.patch.txt The test result could not show the difference because predicates pushed down are not removed currently (HIVE-2897). So I added a log message (scan.toMap()) and checked the output. With query select * from hbase_ppd_keyrange where key > 8 and key < 21; timeRange=[0, 9223372036854775807], batch=-1, startRow=\x00\x00\x00\x08\x00, stopRow=\x00\x00\x00\x15, ... but with query select * from hbase_ppd_keyrange where 8 < key and key < 21; timeRange=[0, 9223372036854775807], batch=-1, startRow=, stopRow=\x00\x00\x00\x15, ... -- This message was sent by Atlassian JIRA (v6.1.5#6160)
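A small illustration (ours, not the patch) of the distinction Navis draws: moving the constant to the other side of a comparison mirrors the operator, whereas negation flips its truth value, which is the wrong tool when normalizing the pushed-down predicate:
{code}
public class OperatorSwap {
  // Mirror a comparison operator when the operands are swapped:
  // "8 < key" becomes "key > 8", not the negation "key >= 8".
  static String swap(String op) {
    switch (op) {
      case "<":  return ">";
      case "<=": return ">=";
      case ">":  return "<";
      case ">=": return "<=";
      default:   return op;  // "=" is symmetric under operand swap
    }
  }

  public static void main(String[] args) {
    // 8 < key  ==>  key > 8 : startRow becomes the exclusive lower bound
    System.out.println("8 < key  ==  key " + swap("<") + " 8");
  }
}
{code}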
[jira] [Created] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil
Navis created HIVE-6241: --- Summary: Remove direct reference of Hadoop23Shims in QTestUtil Key: HIVE-6241 URL: https://issues.apache.org/jira/browse/HIVE-6241 Project: Hive Issue Type: Wish Components: Tests Reporter: Navis Assignee: Navis Priority: Trivial {code}
if (clusterType == MiniClusterType.tez) {
  if (!(shims instanceof Hadoop23Shims)) {
    throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
  }
  mr = ((Hadoop23Shims) shims).getMiniTezCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
} else {
  mr = shims.getMiniMrCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
}
{code} Not important, but a little annoying when the shims class is not in the classpath. And I think Hadoop24Shims or later might support tez. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
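One hedged sketch of the direction the wish suggests (the names and signatures below are illustrative only, not the actual HIVE-6241 patch): declare the capability on the shims interface itself, so callers like QTestUtil never name Hadoop23Shims.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

// Placeholder for whatever handle the shim returns for a mini cluster.
interface MiniClusterHandle {}

interface MiniClusterShims {
  MiniClusterHandle getMiniMrCluster(Configuration conf, int taskTrackers,
      String nameNode, int numDir) throws IOException;

  MiniClusterHandle getMiniTezCluster(Configuration conf, int taskTrackers,
      String nameNode, int numDir) throws IOException;
}

// Shims for Hadoop versions without tez refuse the request themselves,
// removing the instanceof check (and the hard class reference) at the call site.
abstract class PreTezShims implements MiniClusterShims {
  @Override
  public MiniClusterHandle getMiniTezCluster(Configuration conf, int taskTrackers,
      String nameNode, int numDir) throws IOException {
    throw new IOException("Cannot run tez on this Hadoop version");
  }
}
{code}
With such a shape, QTestUtil could call shims.getMiniTezCluster(...) unconditionally and let pre-tez shims raise the error, which also keeps working if a later Hadoop24Shims gains tez support.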
[jira] [Commented] (HIVE-5002) Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private
[ https://issues.apache.org/jira/browse/HIVE-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877040#comment-13877040 ] Hive QA commented on HIVE-5002: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12623995/HIVE-5002.2.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4931 tests executed *Failed tests:* {noformat} org.apache.hive.beeline.TestBeeLineWithArgs.org.apache.hive.beeline.TestBeeLineWithArgs {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/966/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/966/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12623995 Loosen readRowIndex visibility in ORC's RecordReaderImpl to package private --- Key: HIVE-5002 URL: https://issues.apache.org/jira/browse/HIVE-5002 Project: Hive Issue Type: Bug Components: File Formats Affects Versions: 0.12.0 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HIVE-5002.2.patch, HIVE-5002.D12015.1.patch, h-5002.patch, h-5002.patch Some users want to be able to access the rowIndexes directly from ORC reader extensions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil
[ https://issues.apache.org/jira/browse/HIVE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6241: Attachment: HIVE-6241.1.patch.txt Remove direct reference of Hadoop23Shims in QTestUtil Key: HIVE-6241 URL: https://issues.apache.org/jira/browse/HIVE-6241 Project: Hive Issue Type: Wish Components: Tests Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6241.1.patch.txt {code}
if (clusterType == MiniClusterType.tez) {
  if (!(shims instanceof Hadoop23Shims)) {
    throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
  }
  mr = ((Hadoop23Shims) shims).getMiniTezCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
} else {
  mr = shims.getMiniMrCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
}
{code} Not important, but a little annoying when the shims class is not in the classpath. And I think Hadoop24Shims or later might support tez. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6241) Remove direct reference of Hadoop23Shims in QTestUtil
[ https://issues.apache.org/jira/browse/HIVE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6241: Status: Patch Available (was: Open) Remove direct reference of Hadoop23Shims in QTestUtil Key: HIVE-6241 URL: https://issues.apache.org/jira/browse/HIVE-6241 Project: Hive Issue Type: Wish Components: Tests Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6241.1.patch.txt {code}
if (clusterType == MiniClusterType.tez) {
  if (!(shims instanceof Hadoop23Shims)) {
    throw new Exception("Cannot run tez on hadoop-1, Version: " + this.hadoopVer);
  }
  mr = ((Hadoop23Shims) shims).getMiniTezCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
} else {
  mr = shims.getMiniMrCluster(conf, 4, getHdfsUriString(fs.getUri().toString()), 1);
}
{code} Not important, but a little annoying when the shims class is not in the classpath. And I think Hadoop24Shims or later might support tez. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6242) hive should print the current log file name
Thejas M Nair created HIVE-6242: --- Summary: hive should print the current log file name Key: HIVE-6242 URL: https://issues.apache.org/jira/browse/HIVE-6242 Project: Hive Issue Type: Bug Reporter: Thejas M Nair Hive cli and services should print the log dir that it is currently using. This should be logged at INFO level. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
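One conceivable implementation for the CLI side, assuming the log4j 1.x API Hive shipped with at the time (this sketch is ours, not a proposed patch): walk the root logger's appenders and report any file targets at INFO level.
{code}
import java.util.Enumeration;
import org.apache.log4j.Appender;
import org.apache.log4j.FileAppender;
import org.apache.log4j.Logger;

public class LogFileReporter {
  // Report where log output is currently going, so users can find hive.log
  // without digging through the log4j configuration.
  public static void logCurrentLogFiles() {
    Logger root = Logger.getRootLogger();
    for (Enumeration<?> e = root.getAllAppenders(); e.hasMoreElements(); ) {
      Appender a = (Appender) e.nextElement();
      if (a instanceof FileAppender) {
        root.info("Logging to file: " + ((FileAppender) a).getFile());
      }
    }
  }
}
{code}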
[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module
[ https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6240: --- Description: We're using a very old version of jetty (6.x.x) which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. (was: We're using a very old version of jetty which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.) Update jetty to the latest stable (9.x) in the service module - Key: HIVE-6240 URL: https://issues.apache.org/jira/browse/HIVE-6240 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta We're using a very old version of jetty (6.x.x) which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module
[ https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6240: --- Description: We're using a very old version of jetty (6.x) which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. (was: We're using a very old version of jetty (6.x.x) which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html.) Update jetty to the latest stable (9.x) in the service module - Key: HIVE-6240 URL: https://issues.apache.org/jira/browse/HIVE-6240 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta We're using a very old version of jetty (6.x) which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6240) Update jetty to the latest stable (9.x) in the service module
[ https://issues.apache.org/jira/browse/HIVE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-6240: --- Summary: Update jetty to the latest stable (9.x) in the service module (was: Update jetty to the latest stable (9.x)) Update jetty to the latest stable (9.x) in the service module - Key: HIVE-6240 URL: https://issues.apache.org/jira/browse/HIVE-6240 Project: Hive Issue Type: Bug Components: HiveServer2 Reporter: Vaibhav Gumashta We're using a very old version of jetty which has moved a lot: http://www.eclipse.org/jetty/documentation/current/what-jetty-version.html. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-3617) Predicates pushed down to hbase are not handled properly when the constant part is shown first
[ https://issues.apache.org/jira/browse/HIVE-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877070#comment-13877070 ] Ashutosh Chauhan commented on HIVE-3617: Ya, you are correct. Also, some of the constant-folding code in here won't be needed after HIVE-5771; perhaps we can simplify that whenever that gets checked in. +1, let's get this one in. Predicates pushed down to hbase are not handled properly when the constant part is shown first - Key: HIVE-3617 URL: https://issues.apache.org/jira/browse/HIVE-3617 Project: Hive Issue Type: Bug Components: HBase Handler Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-3617.3.patch.txt The test result could not show the difference because predicates pushed down are not removed currently (HIVE-2897). So I added a log message (scan.toMap()) and checked the output. With query select * from hbase_ppd_keyrange where key > 8 and key < 21; timeRange=[0, 9223372036854775807], batch=-1, startRow=\x00\x00\x00\x08\x00, stopRow=\x00\x00\x00\x15, ... but with query select * from hbase_ppd_keyrange where 8 < key and key < 21; timeRange=[0, 9223372036854775807], batch=-1, startRow=, stopRow=\x00\x00\x00\x15, ... -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6144) Implement non-staged MapJoin
[ https://issues.apache.org/jira/browse/HIVE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6144: Attachment: HIVE-6144.5.patch.txt Some tests seem to have failed due to HIVE-6229. Implement non-staged MapJoin Key: HIVE-6144 URL: https://issues.apache.org/jira/browse/HIVE-6144 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6144.1.patch.txt, HIVE-6144.2.patch.txt, HIVE-6144.3.patch.txt, HIVE-6144.4.patch.txt, HIVE-6144.5.patch.txt For map join, all data in small aliases are hashed and stored into a temporary file in MapRedLocalTask. But for some aliases without a filter or projection, it seems unnecessary to do that. For example, {noformat} select a.* from src a join src b on a.key=b.key; {noformat} makes a plan like this. {noformat} STAGE PLANS: Stage: Stage-4 Map Reduce Local Work Alias -> Map Local Tables: a Fetch Operator limit: -1 Alias -> Map Local Operator Tree: a TableScan alias: a HashTable Sink Operator condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] Position of Big Table: 1 Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: b TableScan alias: b Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0, _col1 Position of Big Table: 1 Select Operator File Output Operator Local Work: Map Reduce Local Work Stage: Stage-0 Fetch Operator {noformat} table src(a) is fetched and stored as-is in MRLocalTask. With this patch, the plan can be like below. {noformat} Stage: Stage-3 Map Reduce Alias -> Map Operator Tree: b TableScan alias: b Map Join Operator condition map: Inner Join 0 to 1 condition expressions: 0 {key} {value} 1 handleSkewJoin: false keys: 0 [Column[key]] 1 [Column[key]] outputColumnNames: _col0, _col1 Position of Big Table: 1 Select Operator File Output Operator Local Work: Map Reduce Local Work Alias -> Map Local Tables: a Fetch Operator limit: -1 Alias -> Map Local Operator Tree: a TableScan alias: a Has Any Stage Alias: false Stage: Stage-0 Fetch Operator {noformat} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6243) error in high-precision division for Decimal128
Eric Hanson created HIVE-6243: - Summary: error in high-precision division for Decimal128 Key: HIVE-6243 URL: https://issues.apache.org/jira/browse/HIVE-6243 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / a == actual: 251599050984618 expected: 213474114411690 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6243) error in high-precision division for Decimal128
[ https://issues.apache.org/jira/browse/HIVE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6243: -- Description: a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / b == actual: 251599050984618 expected: 213474114411690 was: a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / a == actual: 251599050984618 expected: 213474114411690 error in high-precision division for Decimal128 --- Key: HIVE-6243 URL: https://issues.apache.org/jira/browse/HIVE-6243 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / b == actual: 251599050984618 expected: 213474114411690 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6243) error in high-precision division for Decimal128
[ https://issues.apache.org/jira/browse/HIVE-6243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6243: -- Attachment: divide-error.01.patch Run TestDecimal128.testKnownPriorErrors() to exhibit the bug. Stepping through the code shows that a * b gives the correct value, but then dividing that product by b does not give the expected result. So the bug is in the division method, divideDestructive(). error in high-precision division for Decimal128 --- Key: HIVE-6243 URL: https://issues.apache.org/jira/browse/HIVE-6243 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Attachments: divide-error.01.patch a = 213474114411690 b = 5062120663 a * b = 1080631725579042037750470 (a * b) / b == actual: 251599050984618 expected: 213474114411690 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
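For reference, the reported values can be checked with exact integer arithmetic in plain Java; this only verifies the expected quotient, it does not exercise the Decimal128 code path (the class name below is illustrative).
{code}
// Reference check with java.math.BigInteger for the values in this report.
import java.math.BigInteger;

public class Decimal128DivideCheck {
  public static void main(String[] args) {
    BigInteger a = new BigInteger("213474114411690");
    BigInteger b = new BigInteger("5062120663");
    BigInteger product = a.multiply(b);
    System.out.println(product);            // 1080631725579042037750470
    System.out.println(product.divide(b));  // 213474114411690, the expected quotient
    // Decimal128.divideDestructive() instead returns 251599050984618 for this input.
  }
}
{code}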
[jira] [Created] (HIVE-6244) hive UT fails on top of Hadoop 2.2.0
Gordon Wang created HIVE-6244: - Summary: hive UT fails on top of Hadoop 2.2.0 Key: HIVE-6244 URL: https://issues.apache.org/jira/browse/HIVE-6244 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.12.0 Reporter: Gordon Wang When building Hive 0.12.0 on top of Hadoop 2.2.0, many UTs fail. The error messages look like this. {code} Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong FS: pfile:/home/pivotal/jenkins/workspace/Hive0.12UT_withJDK7/build/ql/test/data/warehouse/src, expected: file:///)' junit.framework.AssertionFailedError: Client Execution failed with error code = 1 See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get more logs. at junit.framework.Assert.fail(Assert.java:50) at org.apache.hadoop.hive.cli.TestCliDriver.runTest(TestCliDriver.java:6697) at org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_empty(TestCliDriver.java:3807) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:520) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1060) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:911) {code} listLocatedStatus is not implemented in the Hive shims; I think this is the root cause. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
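A hedged sketch of the suspected fix direction follows; everything except the Hadoop FileSystem API (FilterFileSystem, listLocatedStatus, RemoteIterator, LocatedFileStatus) is invented for illustration. A path-proxying file system has to override listLocatedStatus and translate its own scheme (e.g. pfile:) into the wrapped scheme, otherwise the "Wrong FS" check fails.
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ProxyingFileSystemSketch extends FilterFileSystem {

  // Rewrite a pfile:-style path into one the wrapped FileSystem accepts.
  private Path swizzleToUnderlying(Path p) {
    return new Path(p.toUri().getPath()); // illustrative; real shims keep more URI state
  }

  @Override
  public RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f) throws IOException {
    // Without this override, the inherited implementation hands the pfile: path to
    // the wrapped FS directly and it throws IllegalArgumentException("Wrong FS: ...").
    return fs.listLocatedStatus(swizzleToUnderlying(f));
  }
}
{code}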
[jira] [Commented] (HIVE-6083) User provided table properties are not assigned to the TableDesc of the FileSinkDesc in a CTAS query
[ https://issues.apache.org/jira/browse/HIVE-6083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877094#comment-13877094 ] Navis commented on HIVE-6083: - +1 User provided table properties are not assigned to the TableDesc of the FileSinkDesc in a CTAS query Key: HIVE-6083 URL: https://issues.apache.org/jira/browse/HIVE-6083 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-6083.1.patch.txt, HIVE-6083.2.patch.txt I was trying to use a CTAS query to create a table stored as ORC with orc.compress set to SNAPPY. However, the table was still compressed with ZLIB (although the result of DESCRIBE still shows that this table is compressed with SNAPPY). For a CTAS query, SemanticAnalyzer.genFileSinkPlan uses CreateTableDesc to generate the TableDesc for the FileSinkDesc by calling PlanUtils.getTableDesc. However, in PlanUtils.getTableDesc, I do not see user-provided table properties being assigned to the returned TableDesc (CreateTableDesc.getTblProps was not called in this method). btw, I only checked the code of 0.12 and trunk. Two examples: * Snappy compression {code} create table web_sales_wrong_orc_snappy stored as orc tblproperties ('orc.compress'='SNAPPY') as select * from web_sales; {code} {code} describe formatted web_sales_wrong_orc_snappy; Location: hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_snappy Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE true numFiles 1 numRows 719384 orc.compress SNAPPY rawDataSize 97815412 totalSize 40625243 transient_lastDdlTime 1387566015 {code} {code} bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_snappy/00_0 Rows: 719384 Compression: ZLIB Compression size: 262144 ... {code} * No compression {code} create table web_sales_wrong_orc_none stored as orc tblproperties ('orc.compress'='NONE') as select * from web_sales; {code} {code} describe formatted web_sales_wrong_orc_none; Location: hdfs://localhost:54310/user/hive/warehouse/web_sales_wrong_orc_none Table Type: MANAGED_TABLE Table Parameters: COLUMN_STATS_ACCURATE true numFiles 1 numRows 719384 orc.compress NONE rawDataSize 97815412 totalSize 40625243 transient_lastDdlTime 1387566064 {code} {code} bin/hive --orcfiledump /user/hive/warehouse/web_sales_wrong_orc_none/00_0 Rows: 719384 Compression: ZLIB Compression size: 262144 ... {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
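A minimal sketch of the kind of fix the report implies (an illustrative helper, not the committed patch): fold the user-supplied TBLPROPERTIES into the Properties of the TableDesc built for the FileSinkDesc, so the writer that reads orc.compress actually sees them.
{code}
import java.util.Map;
import java.util.Properties;

public class CtasTablePropsSketch {
  /** Copy user-provided table properties into the descriptor's properties. */
  public static void applyTblProps(Properties tableDescProps, Map<String, String> userTblProps) {
    if (userTblProps != null) {
      for (Map.Entry<String, String> e : userTblProps.entrySet()) {
        tableDescProps.setProperty(e.getKey(), e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    applyTblProps(props, java.util.Collections.singletonMap("orc.compress", "SNAPPY"));
    System.out.println(props.getProperty("orc.compress")); // SNAPPY
  }
}
{code}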
[jira] [Created] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
Chaoyu Tang created HIVE-6245: - Summary: HS2 creates DBs/Tables with wrong ownership when HMS setugi is true Key: HIVE-6245 URL: https://issues.apache.org/jira/browse/HIVE-6245 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang The case with the following settings is valid but does not work correctly in the current HS2: == hive.server2.authentication=NONE (or LDAP) hive.server2.enable.doAs=true hive.metastore.sasl.enabled=false hive.metastore.execute.setugi=true == Ideally, HS2 should be able to impersonate the logged-in user (from Beeline or a JDBC application) and create DBs/Tables with that user's ownership. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Status: Patch Available (was: Open) Rebased to trunk session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt Need a timeout facility to prevent resource leaks caused by unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Status: Open (was: Patch Available) session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt Need a timeout facility to prevent resource leaks caused by unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5799: Attachment: HIVE-5799.4.patch.txt session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt Need a timeout facility to prevent resource leaks caused by unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877153#comment-13877153 ] Navis commented on HIVE-5799: - [~thejas] I think client-side pinging could be a follow-up issue to this. Timeout-based server-side clean-up is a much-needed feature for long-running services. session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt Need a timeout facility to prevent resource leaks caused by unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
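As an illustration of the server-side clean-up being proposed (all names here are invented; this is not HiveServer2 code), a session reaper can track last-access timestamps and periodically close sessions idle past the timeout:
{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SessionReaperSketch {
  private final Map<String, Long> lastAccess = new ConcurrentHashMap<>();
  private final long timeoutMs;
  private final ScheduledExecutorService reaper = Executors.newSingleThreadScheduledExecutor();

  public SessionReaperSketch(long timeoutMs, long checkIntervalMs) {
    this.timeoutMs = timeoutMs;
    reaper.scheduleWithFixedDelay(this::closeExpired, checkIntervalMs, checkIntervalMs,
        TimeUnit.MILLISECONDS);
  }

  // Called on every client operation to mark the session as alive.
  public void touch(String sessionId) {
    lastAccess.put(sessionId, System.currentTimeMillis());
  }

  // Closes every session idle longer than the timeout, freeing its resources.
  private void closeExpired() {
    long now = System.currentTimeMillis();
    lastAccess.entrySet().removeIf(e -> {
      boolean expired = now - e.getValue() > timeoutMs;
      if (expired) {
        System.out.println("Closing idle session " + e.getKey());
      }
      return expired;
    });
  }
}
{code}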
[jira] [Updated] (HIVE-6245) HS2 creates DBs/Tables with wrong ownership when HMS setugi is true
[ https://issues.apache.org/jira/browse/HIVE-6245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-6245: -- Attachment: HIVE-6245.patch Fixes include: 1. be able to open an impersonation session in a non-Kerberized HS2; 2. when working with a non-Kerberized HMS but with hive.metastore.execute.setugi set to true, remember to close the ThreadLocal Hive object, to avoid using a stale HMS connection in a new session. HS2 creates DBs/Tables with wrong ownership when HMS setugi is true --- Key: HIVE-6245 URL: https://issues.apache.org/jira/browse/HIVE-6245 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Reporter: Chaoyu Tang Assignee: Chaoyu Tang Attachments: HIVE-6245.patch The case with the following settings is valid but does not work correctly in the current HS2: == hive.server2.authentication=NONE (or LDAP) hive.server2.enable.doAs=true hive.metastore.sasl.enabled=false hive.metastore.execute.setugi=true == Ideally, HS2 should be able to impersonate the logged-in user (from Beeline or a JDBC application) and create DBs/Tables with that user's ownership. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
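A hedged sketch of point 2 above (Hive.closeCurrent() is the existing Hive API for dropping the thread-local Hive object; the wrapper class and method are illustrative, not from the patch):
{code}
import org.apache.hadoop.hive.ql.metadata.Hive;

public class SessionMetastoreReset {
  /** Call when attaching a thread to a (possibly different) user's session. */
  public static void resetThreadLocalHive() {
    // Drops the thread-local Hive object (and its cached metastore client) so the
    // next Hive.get() opens a fresh HMS connection carrying the current user's
    // name and groups via setugi, instead of reusing a stale connection.
    Hive.closeCurrent();
  }
}
{code}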
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877171#comment-13877171 ] Hive QA commented on HIVE-5783: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12624023/HIVE-5783.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4977 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.ql.history.TestHiveHistory.testSimpleQuery {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/969/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12624023 Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877178#comment-13877178 ] Brock Noland commented on HIVE-5783: Failure was unrelated to the current patch: {noformat} java.lang.RuntimeException: commitTransaction was called but openTransactionCalls = 0. This probably indicates that there are unbalanced calls to openTransaction/commitTransaction at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:378) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:122) at $Proxy6.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1085) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1117) {noformat} Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Fix For: 0.13.0 Attachments: HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch, HIVE-5783.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-664: --- Status: Open (was: Patch Available) optimize UDF split -- Key: HIVE-664 URL: https://issues.apache.org/jira/browse/HIVE-664 Project: Hive Issue Type: Bug Components: UDF Reporter: Namit Jain Assignee: Teddy Choi Labels: optimization Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt, HIVE-664.3.patch.txt Min Zhou added a comment - 21/Jul/09 07:34 AM It's very useful for us. Some comments: 1. Can you implement it directly with Text? Avoiding string decoding and encoding would be faster. Of course that trick may lead to another problem, as String.split uses a regular expression for splitting. 2. getDisplayString() always returns a string in lowercase. Namit Jain added a comment - 21/Jul/09 09:22 AM Committed. Thanks Emil Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM There are some easy (compromise) ways to optimize split: 1. Check if the regex argument actually contains some regex-specific characters and if it doesn't, do a straightforward split without converting to strings. 2. Assume some default value for the second argument (for example, split(str) to be equivalent to split(str, ' ')) and optimize for this value. 3. Have two separate split functions - one that does regex and one that splits around plain text. I think that 1 is a good choice and can be done rather quickly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-664) optimize UDF split
[ https://issues.apache.org/jira/browse/HIVE-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877185#comment-13877185 ] Navis commented on HIVE-664: Ran a simple micro-benchmark on the splitting alone and found it is not significantly faster (max 15%?) than the current implementation (and sometimes even slower). But reusing the previous pattern string seems a good idea. Furthermore, if the OI for the regex is a constant type, the comparison itself can be skipped. Could you do that too? optimize UDF split -- Key: HIVE-664 URL: https://issues.apache.org/jira/browse/HIVE-664 Project: Hive Issue Type: Bug Components: UDF Reporter: Namit Jain Assignee: Teddy Choi Labels: optimization Attachments: HIVE-664.1.patch.txt, HIVE-664.2.patch.txt, HIVE-664.3.patch.txt Min Zhou added a comment - 21/Jul/09 07:34 AM It's very useful for us. Some comments: 1. Can you implement it directly with Text? Avoiding string decoding and encoding would be faster. Of course that trick may lead to another problem, as String.split uses a regular expression for splitting. 2. getDisplayString() always returns a string in lowercase. Namit Jain added a comment - 21/Jul/09 09:22 AM Committed. Thanks Emil Emil Ibrishimov added a comment - 21/Jul/09 10:48 AM There are some easy (compromise) ways to optimize split: 1. Check if the regex argument actually contains some regex-specific characters and if it doesn't, do a straightforward split without converting to strings. 2. Assume some default value for the second argument (for example, split(str) to be equivalent to split(str, ' ')) and optimize for this value. 3. Have two separate split functions - one that does regex and one that splits around plain text. I think that 1 is a good choice and can be done rather quickly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
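A small standalone sketch of the pattern-reuse idea from this comment (illustrative Java, not the actual GenericUDFSplit code): cache the last regex and its compiled Pattern and recompile only when the regex argument changes, so a constant regex compiles exactly once.
{code}
import java.util.regex.Pattern;

public class CachedSplitter {
  private String lastRegex;
  private Pattern lastPattern;

  public String[] split(String input, String regex) {
    if (!regex.equals(lastRegex)) {      // this check itself can be skipped when the
      lastRegex = regex;                 // regex OI is known to be a constant type
      lastPattern = Pattern.compile(regex);
    }
    return lastPattern.split(input, -1); // limit -1 keeps trailing empty fields
  }
}
{code}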
[jira] [Commented] (HIVE-6099) Multi insert does not work properly with distinct count
[ https://issues.apache.org/jira/browse/HIVE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877276#comment-13877276 ] Navis commented on HIVE-6099: - Looks like the hive.optimize.multigroupby.common.distincts optimization is not valid. I cannot see how to collect the values of each distinct column into a single group when there are multiple distinct columns in the query. I think the optimization should be disabled. [~pavangm], setting hive.optimize.multigroupby.common.distincts=false might be helpful. Multi insert does not work properly with distinct count --- Key: HIVE-6099 URL: https://issues.apache.org/jira/browse/HIVE-6099 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.9.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Pavan Gadam Manohar Assignee: Navis Labels: count, distinct, insert, multi-insert Attachments: explain_hive_0.10.0.txt Need 2 rows to reproduce this bug. Here are the steps. Step 1) Create a table Table_A CREATE EXTERNAL TABLE Table_A ( user string , type int ) PARTITIONED BY (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFILE LOCATION '/hive/path/Table_A'; Step 2) Scenario: Let us say user tommy belongs to both user types 111 and 123. Insert 2 records into the table created above. select * from Table_A; hive> select * from table_a; OK tommy 123 2013-12-02 tommy 111 2013-12-02 Step 3) Create 2 destination tables to simulate multi-insert. CREATE EXTERNAL TABLE dest_Table_A ( p_date string , Distinct_Users int , Type111Users int , Type123Users int ) PARTITIONED BY (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFILE LOCATION '/hive/path/dest_Table_A'; CREATE EXTERNAL TABLE dest_Table_B ( p_date string , Distinct_Users int , Type111Users int , Type123Users int ) PARTITIONED BY (dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS RCFILE LOCATION '/hive/path/dest_Table_B'; Step 4) Multi insert statement from Table_A a INSERT OVERWRITE TABLE dest_Table_A PARTITION(dt='2013-12-02') select a.dt ,count(distinct a.user) as AllDist ,count(distinct case when a.type = 111 then a.user else null end) as Type111User ,count(distinct case when a.type != 111 then a.user else null end) as Type123User group by a.dt INSERT OVERWRITE TABLE dest_Table_B PARTITION(dt='2013-12-02') select a.dt ,count(distinct a.user) as AllDist ,count(distinct case when a.type = 111 then a.user else null end) as Type111User ,count(distinct case when a.type != 111 then a.user else null end) as Type123User group by a.dt ; Step 5) Verify results. hive> select * from dest_table_a; OK 2013-12-02 2 1 1 2013-12-02 Time taken: 0.116 seconds hive> select * from dest_table_b; OK 2013-12-02 2 1 1 2013-12-02 Time taken: 0.13 seconds Conclusion: Hive gives a count of 2 for distinct users although there is only one distinct user. After trying many datasets, we observed that Hive is computing Type111Users + Type123Users = DistinctUsers, which is wrong. hive> select count(distinct a.user) from table_a a; Gives: Total MapReduce CPU Time Spent: 4 seconds 350 msec OK 1 -- This message was sent by Atlassian JIRA (v6.1.5#6160)