[jira] [Created] (HIVE-17945) Support column projection for index access when using Parquet Vectorization
Ferdinand Xu created HIVE-17945: --- Summary: Support column projection for index access when using Parquet Vectorization Key: HIVE-17945 URL: https://issues.apache.org/jira/browse/HIVE-17945 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17920) Vectorized reader does not push down projection columns for index access schema
Ferdinand Xu created HIVE-17920: --- Summary: Vectorized reader does not push down projection columns for index access schema Key: HIVE-17920 URL: https://issues.apache.org/jira/browse/HIVE-17920 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-17783) Hybrid Grace Join has performance degradation for N-way join using Hive on Tez
Ferdinand Xu created HIVE-17783: --- Summary: Hybrid Grace Join has performance degradation for N-way join using Hive on Tez Key: HIVE-17783 URL: https://issues.apache.org/jira/browse/HIVE-17783 Project: Hive Issue Type: Bug Affects Versions: 2.2.0 Environment: 8*Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 1 master + 7 workers TPC-DS at 3TB data scale Hive version: 2.2.0 Reporter: Ferdinand Xu Most configurations use default values. The benchmark compares enabling against disabling hybrid grace hash join using TPC-DS queries at the 3TB data scale. Many queries involving N-way joins showed performance degradation across three test runs. Detailed results are attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HIVE-16795) Measure Performance for Parquet Vectorization Reader
Ferdinand Xu created HIVE-16795: --- Summary: Measure Performance for Parquet Vectorization Reader Key: HIVE-16795 URL: https://issues.apache.org/jira/browse/HIVE-16795 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu We need to measure the performance of the Parquet vectorized reader feature using TPCx-BB or TPC-DS to see how much performance gain we can achieve. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HIVE-15156) Support Nested Column Field Pruning for Parquet Vectorized Reader
Ferdinand Xu created HIVE-15156: --- Summary: Support Nested Column Field Pruning for Parquet Vectorized Reader Key: HIVE-15156 URL: https://issues.apache.org/jira/browse/HIVE-15156 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu As in HIVE-15055, we need to support nested column field pruning for the vectorized reader as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-15112) Implement Parquet vectorization reader for Complex types
Ferdinand Xu created HIVE-15112: --- Summary: Implement Parquet vectorization reader for Complex types Key: HIVE-15112 URL: https://issues.apache.org/jira/browse/HIVE-15112 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Like HIVE-14815, we need to support the Parquet vectorized reader for complex types such as map, struct, and union as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14919) Improve the performance of Hive on Spark 2.0.0
Ferdinand Xu created HIVE-14919: --- Summary: Improve the performance of Hive on Spark 2.0.0 Key: HIVE-14919 URL: https://issues.apache.org/jira/browse/HIVE-14919 Project: Hive Issue Type: Improvement Reporter: Ferdinand Xu Assignee: Ferdinand Xu Attachments: benchmark.xlsx In HIVE-14029, we updated the Spark dependency to 2.0.0. We used Intel BigBench [1] to run benchmarks over a 10 GB data set, comparing against Spark 1.6. We can see considerable performance degradation across all BigBench queries. For detailed information, please see the attached file. This JIRA is the umbrella ticket addressing those performance issues. [1] https://github.com/intel-hadoop/Big-Data-Benchmark-for-Big-Bench -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14916) Reduce the memory requirements for Spark tests
Ferdinand Xu created HIVE-14916: --- Summary: Reduce the memory requirements for Spark tests Key: HIVE-14916 URL: https://issues.apache.org/jira/browse/HIVE-14916 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu As in HIVE-14887, we need to reduce the memory requirements for Spark tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14836) Implement predicate pushdown in Vectorized Page reader
Ferdinand Xu created HIVE-14836: --- Summary: Implement predicate pushdown in Vectorized Page reader Key: HIVE-14836 URL: https://issues.apache.org/jira/browse/HIVE-14836 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Currently we filter row groups (blocks) using predicate pushdown. We should support it in the page reader as well to improve efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
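The mechanism this ticket asks for can be sketched with min/max statistics: a page whose value range cannot possibly satisfy the predicate is skipped without decoding its values. The following is an illustrative Python sketch under assumed names (`PageStats`, `pages_to_read`), not Hive or Parquet code:

```python
# Illustrative sketch of statistics-based page skipping (hypothetical
# names, not Hive/Parquet internals): a page whose [min, max] range
# cannot satisfy "value > threshold" is pruned without decoding.

class PageStats:
    def __init__(self, min_value, max_value):
        self.min_value = min_value
        self.max_value = max_value

def pages_to_read(pages, threshold):
    """Keep only the indices of pages that may contain values > threshold."""
    return [i for i, p in enumerate(pages) if p.max_value > threshold]

pages = [PageStats(0, 9), PageStats(10, 19), PageStats(20, 29)]
# Predicate: value > 15 -> the first page (max 9) is skipped entirely.
print(pages_to_read(pages, 15))  # [1, 2]
```

The same pruning decision is what row-group (block) filtering already does; the ticket proposes applying it one level lower, at page granularity.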
[jira] [Created] (HIVE-14827) Micro benchmark for Parquet vectorized reader
Ferdinand Xu created HIVE-14827: --- Summary: Micro benchmark for Parquet vectorized reader Key: HIVE-14827 URL: https://issues.apache.org/jira/browse/HIVE-14827 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu We need a microbenchmark to evaluate the throughput and execution time for Parquet vectorized reader. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
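Such a microbenchmark boils down to timing batch reads and reporting rows per second. A toy Python harness showing the shape of the measurement — the `read_batch` callable is a stand-in for a real vectorized reader, not Hive code:

```python
import time

def benchmark(read_batch, num_batches, batch_size):
    """Toy harness: measure wall-clock execution time and row throughput
    for a batch-oriented reader (read_batch is a hypothetical stand-in)."""
    start = time.perf_counter()
    rows = 0
    for _ in range(num_batches):
        rows += len(read_batch(batch_size))  # consume one batch of rows
    elapsed = time.perf_counter() - start
    throughput = rows / elapsed if elapsed > 0 else float("inf")
    return elapsed, throughput

# Stand-in "reader" that materializes a batch of synthetic rows.
elapsed, throughput = benchmark(lambda n: list(range(n)),
                                num_batches=100, batch_size=1024)
print(f"{elapsed:.4f}s, {throughput:.0f} rows/s")
```

A real harness for this ticket would swap the stand-in for the Parquet vectorized reader and a baseline row-mode reader over identical data.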
[jira] [Created] (HIVE-14826) Support vectorization for Parquet
Ferdinand Xu created HIVE-14826: --- Summary: Support vectorization for Parquet Key: HIVE-14826 URL: https://issues.apache.org/jira/browse/HIVE-14826 Project: Hive Issue Type: New Feature Reporter: Ferdinand Xu Assignee: Ferdinand Xu A Parquet vectorized reader can improve throughput and also leverages the existing Hive vectorized execution engine. This is an umbrella ticket to track this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14825) Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0
Ferdinand Xu created HIVE-14825: --- Summary: Figure out the minimum set of required jars for Hive on Spark after bumping up to Spark 2.0.0 Key: HIVE-14825 URL: https://issues.apache.org/jira/browse/HIVE-14825 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Considering that there is no assembly jar for Spark since 2.0.0, we should figure out the minimum set of required jars for HoS to work after bumping up to Spark 2.0.0. This way, users can decide whether they want to add just the required jars, or all the jars under Spark's dir for convenience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14815) Support vectorization for Parquet
Ferdinand Xu created HIVE-14815: --- Summary: Support vectorization for Parquet Key: HIVE-14815 URL: https://issues.apache.org/jira/browse/HIVE-14815 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14693) Some partitions will be left out when the partition number is a multiple of the option hive.msck.repair.batch.size
Ferdinand Xu created HIVE-14693: --- Summary: Some partitions will be left out when the partition number is a multiple of the option hive.msck.repair.batch.size Key: HIVE-14693 URL: https://issues.apache.org/jira/browse/HIVE-14693 Project: Hive Issue Type: Bug Components: Hive Reporter: Ferdinand Xu Assignee: Ferdinand Xu For example, with batch_size = 5 and number of partitions = 9, the last 4 partitions are skipped and never added. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
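The reported symptom (batch size 5, 9 partitions, last 4 never added) is consistent with a batching loop that only flushes full batches. A toy Python reconstruction of that pattern and its fix — this is a sketch of the bug class, not Hive's actual msck repair code:

```python
def add_in_batches_buggy(partitions, batch_size):
    """Flushes a batch only when it is full; the trailing partial
    batch is silently dropped, matching the reported symptom."""
    added, batch = [], []
    for p in partitions:
        batch.append(p)
        if len(batch) == batch_size:
            added.extend(batch)
            batch = []
    return added  # trailing partial batch is lost

def add_in_batches_fixed(partitions, batch_size):
    """Same loop, but the remainder batch is flushed at the end."""
    added, batch = [], []
    for p in partitions:
        batch.append(p)
        if len(batch) == batch_size:
            added.extend(batch)
            batch = []
    if batch:
        added.extend(batch)  # flush the remainder
    return added

parts = [f"p{i}" for i in range(9)]
print(len(add_in_batches_buggy(parts, 5)))  # 5 -> last 4 partitions lost
print(len(add_in_batches_fixed(parts, 5)))  # 9
```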
[jira] [Created] (HIVE-14677) Beeline should support executing an initial SQL script
Ferdinand Xu created HIVE-14677: --- Summary: Beeline should support executing an initial SQL script Key: HIVE-14677 URL: https://issues.apache.org/jira/browse/HIVE-14677 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14676) JDBC driver should support executing an initial SQL script
Ferdinand Xu created HIVE-14676: --- Summary: JDBC driver should support executing an initial SQL script Key: HIVE-14676 URL: https://issues.apache.org/jira/browse/HIVE-14676 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-14029) Update Spark version to 2.0.0
Ferdinand Xu created HIVE-14029: --- Summary: Update Spark version to 2.0.0 Key: HIVE-14029 URL: https://issues.apache.org/jira/browse/HIVE-14029 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu There are quite a few new optimizations in Spark 2.0.0. We need to bump Spark up to 2.0.0 to benefit from those performance improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11958) Merge master to beeline-cli branch 09/25/2015
Ferdinand Xu created HIVE-11958: --- Summary: Merge master to beeline-cli branch 09/25/2015 Key: HIVE-11958 URL: https://issues.apache.org/jira/browse/HIVE-11958 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: beeline-cli-branch Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: beeline-cli-branch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11944) Address the review items on HIVE-11778
Ferdinand Xu created HIVE-11944: --- Summary: Address the review items on HIVE-11778 Key: HIVE-11944 URL: https://issues.apache.org/jira/browse/HIVE-11944 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: beeline-cli-branch This jira will address review items from https://reviews.apache.org/r/38247/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11943) Set old CLI as the default Client when using hive script
Ferdinand Xu created HIVE-11943: --- Summary: Set old CLI as the default Client when using hive script Key: HIVE-11943 URL: https://issues.apache.org/jira/browse/HIVE-11943 Project: Hive Issue Type: Sub-task Components: CLI Affects Versions: beeline-cli-branch Reporter: Ferdinand Xu Assignee: Ferdinand Xu Since we have some concerns about deprecating the current CLI, we will set the old CLI as default. Once we resolve the problems, we will set the new CLI as default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11796) CLI option is not updated when executing the initial files[beeline-cli]
Ferdinand Xu created HIVE-11796: --- Summary: CLI option is not updated when executing the initial files[beeline-cli] Key: HIVE-11796 URL: https://issues.apache.org/jira/browse/HIVE-11796 Project: Hive Issue Type: Sub-task Affects Versions: beeline-cli-branch Reporter: Ferdinand Xu Assignee: Ferdinand Xu Fix For: beeline-cli-branch "Method not supported" is thrown when executing the initial files. This is caused by the CLI options not being updated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11778) Merge beeline-cli branch to trunk
Ferdinand Xu created HIVE-11778: --- Summary: Merge beeline-cli branch to trunk Key: HIVE-11778 URL: https://issues.apache.org/jira/browse/HIVE-11778 Project: Hive Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Ferdinand Xu Assignee: Ferdinand Xu The team working on the beeline-cli branch would like to merge their work to trunk. This jira will track that effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11770) Use the static variable from beeline instead of utils from JDBC
Ferdinand Xu created HIVE-11770: --- Summary: Use the static variable from beeline instead of utils from JDBC Key: HIVE-11770 URL: https://issues.apache.org/jira/browse/HIVE-11770 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor For beeline, we should use the constant BEELINE_DEFAULT_JDBC_URL in beeline instead of URL_PREFIX in JDBC Utils. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11769) Merge master to beeline-cli branch 09/09/2015
Ferdinand Xu created HIVE-11769: --- Summary: Merge master to beeline-cli branch 09/09/2015 Key: HIVE-11769 URL: https://issues.apache.org/jira/browse/HIVE-11769 Project: Hive Issue Type: Sub-task Affects Versions: beeline-cli-branch Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11747) Unnecessary error log is shown when executing an "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode
Ferdinand Xu created HIVE-11747: --- Summary: Unnecessary error log is shown when executing an "INSERT OVERWRITE LOCAL DIRECTORY" cmd in the embedded mode Key: HIVE-11747 URL: https://issues.apache.org/jira/browse/HIVE-11747 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu The "INSERT OVERWRITE LOCAL DIRECTORY" task runs successfully, but some error logs are emitted. {noformat} Connected to: Apache Hive (version 2.0.0-SNAPSHOT) Driver: Hive JDBC (version 2.0.0-SNAPSHOT) Transaction isolation: TRANSACTION_REPEATABLE_READ Beeline version 2.0.0-SNAPSHOT by Apache Hive hive> INSERT OVERWRITE LOCAL DIRECTORY '/nullformat' ROW FORMAT DELIMITED NULL DEFINED AS 'fooNull' SELECT a,b FROM base_tab; 18:35:51.288 [HiveServer2-Background-Pool: Thread-25] ERROR org.apache.hadoop.hive.ql.exec.mr.ExecDriver - yarn No rows affected (14.372 seconds) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11746) Connect command should not be allowed from user[beeline-cli branch]
Ferdinand Xu created HIVE-11746: --- Summary: Connect command should not be allowed from user[beeline-cli branch] Key: HIVE-11746 URL: https://issues.apache.org/jira/browse/HIVE-11746 Project: Hive Issue Type: Sub-task Components: Beeline Reporter: Ferdinand Xu Assignee: Ferdinand Xu For the new CLI, the user should not be allowed to connect to a server or database. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11717) nohup mode is not supported for beeline
Ferdinand Xu created HIVE-11717: --- Summary: nohup mode is not supported for beeline Key: HIVE-11717 URL: https://issues.apache.org/jira/browse/HIVE-11717 Project: Hive Issue Type: Sub-task Components: Beeline Reporter: Ferdinand Xu We are able to use the hive command below to run a query file in batch mode. {noformat} nohup hive -S -f /home/wj19670/pad.sql >pad.csv & {noformat} However, under beeline, we aren't able to use nohup anymore. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11640) Shell command doesn't work for new CLI[beeline-cli]
Ferdinand Xu created HIVE-11640: --- Summary: Shell command doesn't work for new CLI[beeline-cli] Key: HIVE-11640 URL: https://issues.apache.org/jira/browse/HIVE-11640 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu The shell command doesn't work for the new CLI, and "Error: Method not supported (state=,code=0)" is thrown during execution for the f and e options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11637) Support hive.cli.print.current.db in new CLI[beeline-cli branch]
Ferdinand Xu created HIVE-11637: --- Summary: Support hive.cli.print.current.db in new CLI[beeline-cli branch] Key: HIVE-11637 URL: https://issues.apache.org/jira/browse/HIVE-11637 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11579) Invoke the set command will close standard error output[beeline-cli]
Ferdinand Xu created HIVE-11579: --- Summary: Invoke the set command will close standard error output[beeline-cli] Key: HIVE-11579 URL: https://issues.apache.org/jira/browse/HIVE-11579 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu We can easily reproduce the bug with the following steps: {code} hive> set system:xx=yy; hive> lss; hive> {code} The error output disappears because the error output stream is closed when the Hive statement is closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
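The failure mode here — a shared error stream being closed as a side effect of closing another object — can be illustrated in Python. This is an analogy for the bug pattern, not Hive's code:

```python
import io

# Closing a wrapper stream also closes the underlying stream it shares,
# so anything else holding the underlying stream loses its output channel.
buf = io.BytesIO()                 # stands in for the shared error stream
wrapper = io.TextIOWrapper(buf)    # stands in for the object being closed
wrapper.close()                    # closes buf as a side effect

try:
    buf.write(b"more output")
    closed = False
except ValueError:                 # "I/O operation on closed file"
    closed = True
print(closed)  # True: the shared stream is gone
```

The usual fix for this pattern is to avoid closing streams the object does not own, or to detach the wrapper before closing it.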
[jira] [Created] (HIVE-11504) Predicate pushing down doesn't work for float type for Parquet
Ferdinand Xu created HIVE-11504: --- Summary: Predicate pushing down doesn't work for float type for Parquet Key: HIVE-11504 URL: https://issues.apache.org/jira/browse/HIVE-11504 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu The predicate builder should use the PrimitiveTypeName type on the Parquet side to construct the predicate leaf instead of the type provided by PredicateLeaf. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11352) Avoid the double connections with 'e' option[beeline-cli branch]
Ferdinand Xu created HIVE-11352: --- Summary: Avoid the double connections with 'e' option[beeline-cli branch] Key: HIVE-11352 URL: https://issues.apache.org/jira/browse/HIVE-11352 Project: Hive Issue Type: Sub-task Components: Beeline, CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11336) Support initial file option for new CLI [beeline-cli branch]
Ferdinand Xu created HIVE-11336: --- Summary: Support initial file option for new CLI [beeline-cli branch] Key: HIVE-11336 URL: https://issues.apache.org/jira/browse/HIVE-11336 Project: Hive Issue Type: Sub-task Components: Beeline Affects Versions: beeline-cli-branch Reporter: Ferdinand Xu Assignee: Ferdinand Xu Option 'i' needs to be enabled in the new CLI; it should support multiple initial files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11280) Support executing script file from hdfs in new CLI [Beeline-CLI branch]
Ferdinand Xu created HIVE-11280: --- Summary: Support executing script file from hdfs in new CLI [Beeline-CLI branch] Key: HIVE-11280 URL: https://issues.apache.org/jira/browse/HIVE-11280 Project: Hive Issue Type: Sub-task Components: Beeline, CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu In HIVE-7136, the old CLI became able to read Hive scripts from any of the supported file systems in the Hadoop ecosystem. We need to support this in the new CLI as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11277) Merge master to parquet 06/16/2015 [Parquet branch]
Ferdinand Xu created HIVE-11277: --- Summary: Merge master to parquet 06/16/2015 [Parquet branch] Key: HIVE-11277 URL: https://issues.apache.org/jira/browse/HIVE-11277 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11236) BeeLine-Cli: use the same output format as old CLI in the new CLI
Ferdinand Xu created HIVE-11236: --- Summary: BeeLine-Cli: use the same output format as old CLI in the new CLI Key: HIVE-11236 URL: https://issues.apache.org/jira/browse/HIVE-11236 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu In the old CLI, the output format is as follows: {noformat} hive> show tables; OK tbl1_name tbl2_name Time taken: 0.808 seconds, Fetched: 2 row(s) {noformat} This requires that the default output format for the new CLI be csv2 and that showHeader be disabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11226) BeeLine-Cli: support hive.cli.prompt in new CLI
Ferdinand Xu created HIVE-11226: --- Summary: BeeLine-Cli: support hive.cli.prompt in new CLI Key: HIVE-11226 URL: https://issues.apache.org/jira/browse/HIVE-11226 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu Beeline uses a different prompt format from the old CLI, and the old CLI's prompt supports configuration. We need to change the new CLI to use the old prompt style. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11203) Beeline force option doesn't force execution when errors occurred in a script.
Ferdinand Xu created HIVE-11203: --- Summary: Beeline force option doesn't force execution when errors occurred in a script. Key: HIVE-11203 URL: https://issues.apache.org/jira/browse/HIVE-11203 Project: Hive Issue Type: Bug Components: Beeline Reporter: Ferdinand Xu Assignee: Ferdinand Xu The force option doesn't function as the wiki describes: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-11191) Beeline-cli: support hive.cli.errors.ignore in new CLI
Ferdinand Xu created HIVE-11191: --- Summary: Beeline-cli: support hive.cli.errors.ignore in new CLI Key: HIVE-11191 URL: https://issues.apache.org/jira/browse/HIVE-11191 Project: Hive Issue Type: Sub-task Components: CLI Reporter: Ferdinand Xu Assignee: Ferdinand Xu The old CLI uses "hive.cli.errors.ignore" from the Hive configuration to force execution of a script when errors occur. Beeline has a similar option called force. We need to support the previous configuration using Beeline functionality. More details about the force option are available at https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10979) Fix failed tests in TestSchemaTool after the version number change in HIVE-10921
Ferdinand Xu created HIVE-10979: --- Summary: Fix failed tests in TestSchemaTool after the version number change in HIVE-10921 Key: HIVE-10979 URL: https://issues.apache.org/jira/browse/HIVE-10979 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Some version variables in the SQL scripts were not updated in HIVE-10921, which caused unit test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10975) Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT
Ferdinand Xu created HIVE-10975: --- Summary: Parquet: Bump the parquet version up to 1.8.0rc2-SNAPSHOT Key: HIVE-10975 URL: https://issues.apache.org/jira/browse/HIVE-10975 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Priority: Minor There are lots of changes since parquet's graduation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10943) Beeline-cli: Enable precommit for beeline-cli branch
Ferdinand Xu created HIVE-10943: --- Summary: Beeline-cli: Enable precommit for beeline-cli branch Key: HIVE-10943 URL: https://issues.apache.org/jira/browse/HIVE-10943 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10821) Beeline-CLI: Implement all CLI command using Beeline functionality
Ferdinand Xu created HIVE-10821: --- Summary: Beeline-CLI: Implement all CLI command using Beeline functionality Key: HIVE-10821 URL: https://issues.apache.org/jira/browse/HIVE-10821 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10749) Implement Insert statement for parquet
Ferdinand Xu created HIVE-10749: --- Summary: Implement Insert statement for parquet Key: HIVE-10749 URL: https://issues.apache.org/jira/browse/HIVE-10749 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu We need to implement the insert statement for the Parquet format, as was done for ORC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10747) Enable the cleanup of side effects for encryption-related qfile tests
Ferdinand Xu created HIVE-10747: --- Summary: Enable the cleanup of side effects for encryption-related qfile tests Key: HIVE-10747 URL: https://issues.apache.org/jira/browse/HIVE-10747 Project: Hive Issue Type: Sub-task Components: Testing Infrastructure Reporter: Ferdinand Xu Assignee: Ferdinand Xu The Hive conf is not reset in the clearTestSideEffects method introduced in HIVE-8900. This pollutes the settings of other qfiles run by TestEncryptedHDFSCliDriver. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10718) Update committer list - Add Ferdinand Xu
Ferdinand Xu created HIVE-10718: --- Summary: Update committer list - Add Ferdinand Xu Key: HIVE-10718 URL: https://issues.apache.org/jira/browse/HIVE-10718 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor NO PRECOMMIT TESTS Add myself to the committer list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10717) Fix failed qtest encryption_insert_partition_static test in Jenkins
Ferdinand Xu created HIVE-10717: --- Summary: Fix failed qtest encryption_insert_partition_static test in Jenkins Key: HIVE-10717 URL: https://issues.apache.org/jira/browse/HIVE-10717 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu It can be reproduced in Jenkins. See http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3898/testReport/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10705) Update tests for HIVE-9302 after removing binaries
Ferdinand Xu created HIVE-10705: --- Summary: Update tests for HIVE-9302 after removing binaries Key: HIVE-10705 URL: https://issues.apache.org/jira/browse/HIVE-10705 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10684) Fix the UT failures for HIVE7553 after HIVE-10674 removed the binary jar files
Ferdinand Xu created HIVE-10684: --- Summary: Fix the UT failures for HIVE7553 after HIVE-10674 removed the binary jar files Key: HIVE-10684 URL: https://issues.apache.org/jira/browse/HIVE-10684 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10624) Update the initial script to make the beeline-backed CLI the default and allow users to choose the old Hive CLI via an environment variable
Ferdinand Xu created HIVE-10624: --- Summary: Update the initial script to make the beeline-backed CLI the default and allow users to choose the old Hive CLI via an environment variable Key: HIVE-10624 URL: https://issues.apache.org/jira/browse/HIVE-10624 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu As discussed on the dev list, we should update the script to make the new beeline-backed CLI the default and allow users to switch to the old CLI via an environment variable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10623) Implement hive cli options using beeline functionality
Ferdinand Xu created HIVE-10623: --- Summary: Implement hive cli options using beeline functionality Key: HIVE-10623 URL: https://issues.apache.org/jira/browse/HIVE-10623 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu We need to support the original Hive CLI options for backward compatibility. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10461) Implement Record Updater and Raw Merger for Parquet as well
Ferdinand Xu created HIVE-10461: --- Summary: Implement Record Updater and Raw Merger for Parquet as well Key: HIVE-10461 URL: https://issues.apache.org/jira/browse/HIVE-10461 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu The record updater will create the data with ACID information, and the raw record merger can provide the user-view data. In this JIRA, we should implement these two classes and make the basic ACID write/read case work. For the upper layers such as FileSinkOperator, CompactorMR, and TxnManager, we can file new JIRAs to fix them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
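Conceptually, a raw record merger overlays delta events (inserts, updates, deletes, ordered by transaction) on the base data to produce the user view. A minimal Python sketch of that idea, with hypothetical names and simplified keys — not the actual Hive ACID implementation:

```python
def merge_user_view(base, deltas):
    """Toy sketch of a raw record merger: apply keyed delta events,
    in transaction order, over base rows to get the user-visible view."""
    rows = dict(base)                  # key -> row value
    for op, key, value in deltas:      # events in transaction order
        if op in ("insert", "update"):
            rows[key] = value
        elif op == "delete":
            rows.pop(key, None)
    return dict(sorted(rows.items()))

base = {1: "a", 2: "b", 3: "c"}
deltas = [("update", 2, "b2"), ("delete", 3, None), ("insert", 4, "d")]
print(merge_user_view(base, deltas))  # {1: 'a', 2: 'b2', 4: 'd'}
```

The real merger works on sorted ACID record identifiers rather than an in-memory dict, but the overlay semantics are the same.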
[jira] [Created] (HIVE-10460) change the key of Parquet Record to NullWritable instead of void
Ferdinand Xu created HIVE-10460: --- Summary: change the key of Parquet Record to NullWritable instead of void Key: HIVE-10460 URL: https://issues.apache.org/jira/browse/HIVE-10460 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu AcidInputFormat requires a key type that implements the Writable interface, so the void type is not valid if we want to make ACID work for Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10372) Bump parquet version to 1.6.0
Ferdinand Xu created HIVE-10372: --- Summary: Bump parquet version to 1.6.0 Key: HIVE-10372 URL: https://issues.apache.org/jira/browse/HIVE-10372 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10189) Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization
Ferdinand Xu created HIVE-10189: --- Summary: Create a micro benchmark tool for vectorization to evaluate the performance gain after SIMD optimization Key: HIVE-10189 URL: https://issues.apache.org/jira/browse/HIVE-10189 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10135) Add qtest to access struct after parquet column index access enabled
Ferdinand Xu created HIVE-10135: --- Summary: Add qtest to access struct after parquet column index access enabled Key: HIVE-10135 URL: https://issues.apache.org/jira/browse/HIVE-10135 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10079) Enable parquet column index in HIVE
Ferdinand Xu created HIVE-10079: --- Summary: Enable parquet column index in HIVE Key: HIVE-10079 URL: https://issues.apache.org/jira/browse/HIVE-10079 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10077) Use new ParquetInputSplit constructor API
Ferdinand Xu created HIVE-10077: --- Summary: Use new ParquetInputSplit constructor API Key: HIVE-10077 URL: https://issues.apache.org/jira/browse/HIVE-10077 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10076) Update parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6
Ferdinand Xu created HIVE-10076: --- Summary: Update parquet-hadoop-bundle and parquet-column to the version of 1.6.0rc6 Key: HIVE-10076 URL: https://issues.apache.org/jira/browse/HIVE-10076 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10054) Clean up ETypeConverter since Parquet supports timestamp type already
Ferdinand Xu created HIVE-10054: --- Summary: Clean up ETypeConverter since Parquet supports timestamp type already Key: HIVE-10054 URL: https://issues.apache.org/jira/browse/HIVE-10054 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10053) Override new init API from ReadSupport instead of the deprecated one
Ferdinand Xu created HIVE-10053: --- Summary: Override new init API from ReadSupport instead of the deprecated one Key: HIVE-10053 URL: https://issues.apache.org/jira/browse/HIVE-10053 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-10032) Remove broken java file from source code
Ferdinand Xu created HIVE-10032: --- Summary: Remove broken java file from source code Key: HIVE-10032 URL: https://issues.apache.org/jira/browse/HIVE-10032 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor Remove all broken HCatalog Java files from the source code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.
[ https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9252: --- Attachment: (was: HIVE-9252.patch) > Linking custom SerDe jar to table definition. > - > > Key: HIVE-9252 > URL: https://issues.apache.org/jira/browse/HIVE-9252 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Niels Basjes >Assignee: Ferdinand Xu > Attachments: HIVE-9252.1.patch > > > In HIVE-6047 the option was created that a jar file can be hooked to the > definition of a function. (See: [Language Manual DDL: Permanent > Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions] > ) > I propose to add something similar that can be used when defining an external > table that relies on a custom Serde (I expect to usually only have the > Deserializer). > Something like this: > {code} > CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name > ... > STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; > {code} > Using this you can define (and share !!!) a Hive table on top of a custom > fileformat without the need to let the IT operations people deploy a custom > SerDe jar file on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories
[ https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9661: --- Status: Patch Available (was: Open) > Refine debug log with schema information for the method of creating session > directories > --- > > Key: HIVE-9661 > URL: https://issues.apache.org/jira/browse/HIVE-9661 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu >Priority: Minor > Attachments: HIVE-9661.patch > > > For a session, the scratch directory can be either a local path or a hdfs > scratch path. The method name createRootHDFSDir is quite confusing. So add > the schema information to the debug log for the troubleshooting need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9661) Refine debug log with schema information for the method of creating session directories
[ https://issues.apache.org/jira/browse/HIVE-9661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9661: --- Attachment: HIVE-9661.patch > Refine debug log with schema information for the method of creating session > directories > --- > > Key: HIVE-9661 > URL: https://issues.apache.org/jira/browse/HIVE-9661 > Project: Hive > Issue Type: Bug >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu >Priority: Minor > Attachments: HIVE-9661.patch > > > For a session, the scratch directory can be either a local path or a hdfs > scratch path. The method name createRootHDFSDir is quite confusing. So add > the schema information to the debug log for the troubleshooting need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9661) Refine debug log with schema information for the method of creating session directories
Ferdinand Xu created HIVE-9661: -- Summary: Refine debug log with schema information for the method of creating session directories Key: HIVE-9661 URL: https://issues.apache.org/jira/browse/HIVE-9661 Project: Hive Issue Type: Bug Reporter: Ferdinand Xu Assignee: Ferdinand Xu Priority: Minor For a session, the scratch directory can be either a local path or a hdfs scratch path. The method name createRootHDFSDir is quite confusing. So add the schema information to the debug log for the troubleshooting need. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
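The "schema information" HIVE-9661 asks for is the URI scheme of the scratch path (file vs. hdfs), so the debug log can tell local and HDFS session directories apart. A minimal sketch of that refinement, assuming a hypothetical `describe_scratch_dir` helper (this is illustrative Python, not Hive's actual `SessionState` code):

```python
# Hypothetical sketch of the logging refinement proposed in HIVE-9661:
# include the path's URI scheme (file vs. hdfs) in the debug message so
# local and HDFS scratch directories can be distinguished when troubleshooting.
from urllib.parse import urlparse

def describe_scratch_dir(path: str) -> str:
    """Build a debug message that carries the path's scheme."""
    scheme = urlparse(path).scheme or "file"  # bare paths are local
    return f"Created {scheme} scratch directory: {path}"
```

With an HDFS URI the message reads "Created hdfs scratch directory: ...", while a bare local path is reported as "file", which is exactly the distinction the confusingly named createRootHDFSDir method obscures.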
[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.
[ https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9252: --- Status: Patch Available (was: Open) > Linking custom SerDe jar to table definition. > - > > Key: HIVE-9252 > URL: https://issues.apache.org/jira/browse/HIVE-9252 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Niels Basjes >Assignee: Ferdinand Xu > Attachments: HIVE-9252.1.patch, HIVE-9252.patch > > > In HIVE-6047 the option was created that a jar file can be hooked to the > definition of a function. (See: [Language Manual DDL: Permanent > Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions] > ) > I propose to add something similar that can be used when defining an external > table that relies on a custom Serde (I expect to usually only have the > Deserializer). > Something like this: > {code} > CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name > ... > STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; > {code} > Using this you can define (and share !!!) a Hive table on top of a custom > fileformat without the need to let the IT operations people deploy a custom > SerDe jar file on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.
[ https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9252: --- Attachment: HIVE-9252.1.patch rebase patch > Linking custom SerDe jar to table definition. > - > > Key: HIVE-9252 > URL: https://issues.apache.org/jira/browse/HIVE-9252 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Niels Basjes >Assignee: Ferdinand Xu > Attachments: HIVE-9252.1.patch, HIVE-9252.patch > > > In HIVE-6047 the option was created that a jar file can be hooked to the > definition of a function. (See: [Language Manual DDL: Permanent > Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions] > ) > I propose to add something similar that can be used when defining an external > table that relies on a custom Serde (I expect to usually only have the > Deserializer). > Something like this: > {code} > CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name > ... > STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; > {code} > Using this you can define (and share !!!) a Hive table on top of a custom > fileformat without the need to let the IT operations people deploy a custom > SerDe jar file on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9252) Linking custom SerDe jar to table definition.
[ https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9252: --- Attachment: HIVE-9252.patch The initial patch is attached! > Linking custom SerDe jar to table definition. > - > > Key: HIVE-9252 > URL: https://issues.apache.org/jira/browse/HIVE-9252 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Niels Basjes >Assignee: Ferdinand Xu > Attachments: HIVE-9252.patch > > > In HIVE-6047 the option was created that a jar file can be hooked to the > definition of a function. (See: [Language Manual DDL: Permanent > Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions] > ) > I propose to add something similar that can be used when defining an external > table that relies on a custom Serde (I expect to usually only have the > Deserializer). > Something like this: > {code} > CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name > ... > STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; > {code} > Using this you can define (and share !!!) a Hive table on top of a custom > fileformat without the need to let the IT operations people deploy a custom > SerDe jar file on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306534#comment-14306534 ] Ferdinand Xu commented on HIVE-9302: Thanks, Sergio, for your review. @[~brocknoland], do you have any further comments on my patch? > Beeline add commands to register local jdbc driver names and jars > - > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, > mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jdbc > driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8136) Reduce table locking
[ https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306528#comment-14306528 ] Ferdinand Xu commented on HIVE-8136: Hi [~brocknoland], I agree with you that an exclusive lock is a must for altering table structure. I think ADDCLUSTERSORTCOLUMN can use a shared lock instead. Please see my previous comments for details. > Reduce table locking > > > Key: HIVE-8136 > URL: https://issues.apache.org/jira/browse/HIVE-8136 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: HIVE-8136.patch > > > When using ZK for concurrency control, some statements require an exclusive > table lock when they are atomic. Such as setting a tables location. > This JIRA is to analyze the scope of statements like ALTER TABLE and see if > we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: HIVE-9302.3.patch Hi Sergio, I have updated my patch according to your comments. Please help me review it if you have some time. Thank you! > Beeline add commands to register local jdbc driver names and jars > - > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.2.patch, HIVE-9302.3.patch, HIVE-9302.patch, > mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jdbc > driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1GB)
[ https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9522: --- Summary: Improve the speed of select count(*) statement for a parquet table with big input(~1GB) (was: Improve the speed of select count(*) statement for a parquet table with big input(~1Gb)) > Improve the speed of select count(*) statement for a parquet table with big > input(~1GB) > --- > > Key: HIVE-9522 > URL: https://issues.apache.org/jira/browse/HIVE-9522 > Project: Hive > Issue Type: Sub-task >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298082#comment-14298082 ] Ferdinand Xu commented on HIVE-9333: Thanks Sergio for your patch. LGTM +1 > Move parquet serialize implementation to DataWritableWriter to improve write > speeds > --- > > Key: HIVE-9333 > URL: https://issues.apache.org/jira/browse/HIVE-9333 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch > > > The serialize process on ParquetHiveSerDe parses a Hive object > to a Writable object by looping through all the Hive object children, > and creating new Writables objects per child. These final writables > objects are passed in to the Parquet writing function, and parsed again > on the DataWritableWriter class by looping through the ArrayWritable > object. These two loops (ParquetHiveSerDe.serialize() and > DataWritableWriter.write() may be reduced to use just one loop into the > DataWritableWriter.write() method in order to increment the writing process > speed for Hive parquet. > In order to achieve this, we can wrap the Hive object and object inspector > on ParquetHiveSerDe.serialize() method into an object that implements the > Writable object and thus avoid the loop that serialize() does, and leave the > loop parser to the DataWritableWriter.write() method. We can see how ORC does > this with the OrcSerde.OrcSerdeRow class. > Writable objects are organized differently on any kind of storage formats, so > I don't think it is necessary to create and keep the writable objects in the > serialize() method as they won't be used until the writing process starts > (DataWritableWriter.write()). > This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
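The single-loop refactor HIVE-9333 describes, wrapping the Hive object and its inspector instead of eagerly copying every child, can be sketched in miniature. This is a hypothetical Python analogy (the `SerdeRowWrapper` name and the `serialize`/`write` helpers are invented for illustration; the real Java code lives in ParquetHiveSerDe and DataWritableWriter, mirroring OrcSerde.OrcSerdeRow):

```python
# Hypothetical sketch of the "wrap instead of copy" idea: serialize() returns
# a thin wrapper, and the single traversal of the record's children happens
# later, inside the writer, in one loop.

class SerdeRowWrapper:
    """Analogy to OrcSerde.OrcSerdeRow: holds the row and its inspector."""
    def __init__(self, row, inspector):
        self.row = row
        self.inspector = inspector

def serialize(row, inspector):
    # One-loop scheme: no per-child Writable objects are built here.
    return SerdeRowWrapper(row, inspector)

def write(wrapper, sink):
    # The only traversal of the record happens here, at write time.
    for field, value in zip(wrapper.inspector, wrapper.row):
        sink.append((field, value))

sink = []
wrapped = serialize(["alice", 30], ["name", "age"])
write(wrapped, sink)
```

The design point is the same as in the JIRA description: the per-child objects are not needed until the writing process starts, so building them in serialize() is a wasted second loop.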
[jira] [Updated] (HIVE-9522) Improve the speed of select count(*) statement for a parquet table with big input(~1Gb)
[ https://issues.apache.org/jira/browse/HIVE-9522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9522: --- Summary: Improve the speed of select count(*) statement for a parquet table with big input(~1Gb) (was: Improve select count(*) statement for a parquet table with big input(~1Gb)) > Improve the speed of select count(*) statement for a parquet table with big > input(~1Gb) > --- > > Key: HIVE-9522 > URL: https://issues.apache.org/jira/browse/HIVE-9522 > Project: Hive > Issue Type: Sub-task >Reporter: Ferdinand Xu >Assignee: Ferdinand Xu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9522) Improve select count(*) statement for a parquet table with big input(~1Gb)
Ferdinand Xu created HIVE-9522: -- Summary: Improve select count(*) statement for a parquet table with big input(~1Gb) Key: HIVE-9522 URL: https://issues.apache.org/jira/browse/HIVE-9522 Project: Hive Issue Type: Sub-task Reporter: Ferdinand Xu Assignee: Ferdinand Xu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8136) Reduce table locking
[ https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14297970#comment-14297970 ] Ferdinand Xu commented on HIVE-8136: The failed test cases look unrelated. > Reduce table locking > > > Key: HIVE-8136 > URL: https://issues.apache.org/jira/browse/HIVE-8136 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: HIVE-8136.patch > > > When using ZK for concurrency control, some statements require an exclusive > table lock when they are atomic. Such as setting a tables location. > This JIRA is to analyze the scope of statements like ALTER TABLE and see if > we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9252) Linking custom SerDe jar to table definition.
[ https://issues.apache.org/jira/browse/HIVE-9252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-9252: -- Assignee: Ferdinand Xu > Linking custom SerDe jar to table definition. > - > > Key: HIVE-9252 > URL: https://issues.apache.org/jira/browse/HIVE-9252 > Project: Hive > Issue Type: New Feature > Components: Serializers/Deserializers >Reporter: Niels Basjes >Assignee: Ferdinand Xu > > In HIVE-6047 the option was created that a jar file can be hooked to the > definition of a function. (See: [Language Manual DDL: Permanent > Functions|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-PermanentFunctions] > ) > I propose to add something similar that can be used when defining an external > table that relies on a custom Serde (I expect to usually only have the > Deserializer). > Something like this: > {code} > CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name > ... > STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] > [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; > {code} > Using this you can define (and share !!!) a Hive table on top of a custom > fileformat without the need to let the IT operations people deploy a custom > SerDe jar file on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8136) Reduce table locking
[ https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-8136: --- Status: Patch Available (was: In Progress) > Reduce table locking > > > Key: HIVE-8136 > URL: https://issues.apache.org/jira/browse/HIVE-8136 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: HIVE-8136.patch > > > When using ZK for concurrency control, some statements require an exclusive > table lock when they are atomic. Such as setting a tables location. > This JIRA is to analyze the scope of statements like ALTER TABLE and see if > we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8136) Reduce table locking
[ https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-8136: --- Attachment: HIVE-8136.patch > Reduce table locking > > > Key: HIVE-8136 > URL: https://issues.apache.org/jira/browse/HIVE-8136 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: HIVE-8136.patch > > > When using ZK for concurrency control, some statements require an exclusive > table lock when they are atomic. Such as setting a tables location. > This JIRA is to analyze the scope of statements like ALTER TABLE and see if > we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8136) Reduce table locking
[ https://issues.apache.org/jira/browse/HIVE-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296607#comment-14296607 ] Ferdinand Xu commented on HIVE-8136: Currently the following alter table write types acquire an exclusive lock (DDL_EXCLUSIVE): RENAMECOLUMN, ADDCLUSTERSORTCOLUMN, ADDFILEFORMAT, DROPPROPS, REPLACECOLS, ARCHIVE, UNARCHIVE, ALTERPROTECTMODE, ALTERPARTITIONPROTECTMODE, ALTERLOCATION, DROPPARTITION, RENAMEPARTITION, ADDSKEWEDBY, ALTERSKEWEDLOCATION, ALTERBUCKETNUM, ALTERPARTITION, ADDCOLS, RENAME, TRUNCATE, MERGEFILES. The following use a shared lock: ADDSERDE, ADDPARTITION, ADDSERDEPROPS, ADDPROPS. The following take no lock: COMPACT, TOUCH. For changing the table structure, an exclusive lock is a must, and most of these cases keep it since they change the table or partition structure. For adding cluster and sort columns, we can use a shared lock for the following reason. {quote} The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query. {quote} For property changes, I think we can use no lock if they don't change the structure of the table. We can do a follow-up jira. Any thoughts about it, [~brocknoland]? > Reduce table locking > > > Key: HIVE-8136 > URL: https://issues.apache.org/jira/browse/HIVE-8136 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Ferdinand Xu > > When using ZK for concurrency control, some statements require an exclusive > table lock when they are atomic. Such as setting a tables location. > This JIRA is to analyze the scope of statements like ALTER TABLE and see if > we can reduce the locking required. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
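The split proposed in the comment above amounts to a lookup table from alter-table write type to lock level. A minimal sketch of that mapping, assuming the operation names from the comment (this is an invented `lock_for` helper for illustration, not Hive's actual ZooKeeper lock manager):

```python
# Hypothetical sketch of the lock levels proposed in the HIVE-8136 comment.
# Operation names mirror the alter-table write types listed there.
EXCLUSIVE = "EXCLUSIVE"
SHARED = "SHARED"
NO_LOCK = "NO_LOCK"

# Structure-changing operations keep the exclusive lock.
_EXCLUSIVE_OPS = {
    "RENAMECOLUMN", "ADDFILEFORMAT", "DROPPROPS", "REPLACECOLS",
    "ARCHIVE", "UNARCHIVE", "ALTERPROTECTMODE", "ALTERPARTITIONPROTECTMODE",
    "ALTERLOCATION", "DROPPARTITION", "RENAMEPARTITION", "ADDSKEWEDBY",
    "ALTERSKEWEDLOCATION", "ALTERBUCKETNUM", "ALTERPARTITION",
    "ADDCOLS", "RENAME", "TRUNCATE", "MERGEFILES",
}
# CLUSTERED BY / SORTED BY only affect reads, so the comment argues a
# shared lock suffices for ADDCLUSTERSORTCOLUMN; serde/partition/property
# additions already use shared locks.
_SHARED_OPS = {
    "ADDCLUSTERSORTCOLUMN", "ADDSERDE", "ADDPARTITION",
    "ADDSERDEPROPS", "ADDPROPS",
}
_NO_LOCK_OPS = {"COMPACT", "TOUCH"}

def lock_for(op: str) -> str:
    """Return the lock level proposed for an ALTER TABLE write type."""
    if op in _EXCLUSIVE_OPS:
        return EXCLUSIVE
    if op in _SHARED_OPS:
        return SHARED
    if op in _NO_LOCK_OPS:
        return NO_LOCK
    raise ValueError(f"unknown alter-table write type: {op}")
```

The interesting change relative to the status quo is that ADDCLUSTERSORTCOLUMN moves from the exclusive set to the shared set.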
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296201#comment-14296201 ] Ferdinand Xu commented on HIVE-9470: Thank you for your update. +1 > Use a generic writable object to run ColumnaStorageBench write/read tests > -- > > Key: HIVE-9470 > URL: https://issues.apache.org/jira/browse/HIVE-9470 > Project: Hive > Issue Type: Improvement >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-9470.1.patch, HIVE-9470.2.patch > > > The ColumnarStorageBench benchmark class is using a Parquet writable object > to run all write/read/serialize/deserialize tests. It would be better to use > a more generic writable object (like text writables) to get better benchmark > results between format storages. > Using parquet writables may add advantage when writing parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add commands to register local jdbc driver names and jars
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296117#comment-14296117 ] Ferdinand Xu commented on HIVE-9302: Thanks [~thejas] for your update! > Beeline add commands to register local jdbc driver names and jars > - > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, > postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jdbc > driver jars and register custom jdbc driver names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: HIVE-9302.2.patch > Beeline add jar local to client > --- > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.2.patch, HIVE-9302.patch, mysql-connector-java-bin.jar, > postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jars as > well. > It might be useful to do this in the jdbc driver itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294853#comment-14294853 ] Ferdinand Xu commented on HIVE-9302: Sorry, I meant to. There are two kinds of use cases. One is to add an existing known driver like the mysql driver or the postgres driver. Currently supported drivers are postgres and mysql. {noformat} # beeline beeline> !addlocaldriverjar /path/to/mysql-connector-java-5.1.27-bin.jar beeline> !connect mysql://host:3306/testdb {noformat} And the other is to add a customized driver. {noformat} # beeline beeline> !addlocaldriverjar /path/to/DummyDriver-1.0-SNAPSHOT.jar beeline> !addlocaldrivername org.apache.dummy.DummyDrive beeline> !connect mysql://host:3306/testdb {noformat} > Beeline add jar local to client > --- > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jars as > well. > It might be useful to do this in the jdbc driver itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
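The two commands shown above boil down to putting a jar on the client classpath and mapping a driver class to a JDBC URL. A hedged sketch of that client-side lookup (the `add_local_driver_name`/`find_driver` helpers and the preference for the most recently registered custom driver are invented for illustration; real JDBC asks each driver via acceptsURL, and this is not Beeline's actual implementation):

```python
# Hypothetical sketch of client-side driver selection as described in the
# comment above: known drivers match by URL prefix, custom drivers are
# registered explicitly (mirrors !addlocaldriverjar / !addlocaldrivername).

# Built-in prefix -> driver-class mapping; mysql and postgres are the two
# drivers the comment says are supported out of the box.
_KNOWN_DRIVERS = {
    "jdbc:mysql:": "com.mysql.jdbc.Driver",
    "jdbc:postgresql:": "org.postgresql.Driver",
}
_custom_drivers = []

def add_local_driver_name(class_name: str) -> None:
    """Analogy to !addlocaldrivername: register a custom driver class."""
    _custom_drivers.append(class_name)

def find_driver(url: str) -> str:
    """Pick the driver class for a JDBC URL, preferring custom drivers."""
    if _custom_drivers:
        # Simplification: assume the last registered custom driver
        # handles the URL (real JDBC would call Driver.acceptsURL).
        return _custom_drivers[-1]
    for prefix, cls in _KNOWN_DRIVERS.items():
        if url.startswith(prefix):
            return cls
    raise LookupError(f"no registered driver for {url}")
```

This illustrates why two separate commands exist: a known driver needs only its jar (the class is inferred from the URL), while a custom driver also needs its class name registered.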
[jira] [Commented] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293700#comment-14293700 ] Ferdinand Xu commented on HIVE-9302: The failed cases are caused by the lack of the driver jar files attached to this jira. > Beeline add jar local to client > --- > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jars as > well. > It might be useful to do this in the jdbc driver itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
[ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292844#comment-14292844 ] Ferdinand Xu commented on HIVE-9333: Thanks Sergio for your patch. I have left some general questions in the review board. > Move parquet serialize implementation to DataWritableWriter to improve write > speeds > --- > > Key: HIVE-9333 > URL: https://issues.apache.org/jira/browse/HIVE-9333 > Project: Hive > Issue Type: Sub-task >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-9333.2.patch > > > The serialize process on ParquetHiveSerDe parses a Hive object > to a Writable object by looping through all the Hive object children, > and creating new Writables objects per child. These final writables > objects are passed in to the Parquet writing function, and parsed again > on the DataWritableWriter class by looping through the ArrayWritable > object. These two loops (ParquetHiveSerDe.serialize() and > DataWritableWriter.write() may be reduced to use just one loop into the > DataWritableWriter.write() method in order to increment the writing process > speed for Hive parquet. > In order to achieve this, we can wrap the Hive object and object inspector > on ParquetHiveSerDe.serialize() method into an object that implements the > Writable object and thus avoid the loop that serialize() does, and leave the > loop parser to the DataWritableWriter.write() method. We can see how ORC does > this with the OrcSerde.OrcSerdeRow class. > Writable objects are organized differently on any kind of storage formats, so > I don't think it is necessary to create and keep the writable objects in the > serialize() method as they won't be used until the writing process starts > (DataWritableWriter.write()). > This performance issue was found using microbenchmark tests from HIVE-8121. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9470) Use a generic writable object to run ColumnaStorageBench write/read tests
[ https://issues.apache.org/jira/browse/HIVE-9470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14292772#comment-14292772 ] Ferdinand Xu commented on HIVE-9470: LGTM with some minor suggestions. {noformat} 131 public ColumnarStorageBench() { {noformat} Please remove the extra space. {noformat} 233 private ObjectInspector getParquetObjectInspector(final String columnTypes) { {noformat} Can you rename it to getArrayWritableObjectInspector since it will be used by both parquet and orc? {noformat} 242 Writable parquetWritable = createRecord(TypeInfoUtils.getTypeInfosFromTypeString(columnTypes)); {noformat} Can you rename it to recordWritable for the same reason as above? > Use a generic writable object to run ColumnaStorageBench write/read tests > -- > > Key: HIVE-9470 > URL: https://issues.apache.org/jira/browse/HIVE-9470 > Project: Hive > Issue Type: Improvement >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-9470.1.patch > > > The ColumnarStorageBench benchmark class is using a Parquet writable object > to run all write/read/serialize/deserialize tests. It would be better to use > a more generic writable object (like text writables) to get better benchmark > results between format storages. > Using parquet writables may add advantage when writing parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9371: --- Resolution: Duplicate Status: Resolved (was: Patch Available) > Execution error for Parquet table and GROUP BY involving CHAR data type > --- > > Key: HIVE-9371 > URL: https://issues.apache.org/jira/browse/HIVE-9371 > Project: Hive > Issue Type: Bug > Components: File Formats, Query Processor >Reporter: Matt McCline >Assignee: Ferdinand Xu >Priority: Critical > Attachments: HIVE-9371.1.patch, HIVE-9371.patch, HIVE-9371.patch > > > Query fails involving PARQUET table format, CHAR data type, and GROUP BY. > Probably also fails for VARCHAR, too. > {noformat} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to > org.apache.hadoop.hive.serde2.io.HiveCharWritable > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) > ... 
10 more > Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be > cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable > at > org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305) > at > org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150) > at > org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142) > at > org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739) > at > org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809) > ... 
16 more > {noformat} > Here is a q file: > {noformat} > SET hive.vectorized.execution.enabled=false; > drop table char_2; > create table char_2 ( > key char(10), > value char(20) > ) stored as parquet; > insert overwrite table char_2 select * from src; > select value, sum(cast(key as int)), count(*) numrows > from src > group by value > order by value asc > limit 5; > explain select value, sum(cast(key as int)), count(*) numrows > from char_2 > group by value > order by value asc > limit 5; > -- should match the query from src > select value, sum(cast(key as int)), count(*) numrows > from char_2 > group by value > order by value asc > limit 5; > select value, sum(cast(key as int)), count(*) numrows > from src > group by value > order by value desc > limit 5; > explain select value, sum(cast(key as int)), count(*) numrows > from char_2 > group by value > order by value desc > limit 5; > -- should match the query from src > select value, sum(cast(key as int)), count(*) numrows > from char_2 > group by value > order by value desc > limit 5; > drop table char_2; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
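The trace above shows a plain Text reaching WritableHiveCharObjectInspector.copyObject, which expects a HiveCharWritable: the Parquet read path surfaces CHAR columns as Text. A minimal, hypothetical sketch of the defensive handling such a fix would need, with simple stand-in classes in place of the real Hadoop/Hive types (which are not reproduced here):

```java
// Stand-ins for org.apache.hadoop.io.Text and
// org.apache.hadoop.hive.serde2.io.HiveCharWritable; the actual fix
// belongs in WritableHiveCharObjectInspector (see the attached patches).
class Text {
    final String str;
    Text(String s) { str = s; }
    @Override public String toString() { return str; }
}

class HiveCharWritable {
    final String value; // the real class wraps HiveChar with padding semantics
    HiveCharWritable(String v) { value = v; }
}

public class CharCopySketch {
    // copyObject must tolerate Text instead of blindly casting, since the
    // Parquet reader hands CHAR column values over as Text.
    static HiveCharWritable copyAsHiveChar(Object o, int maxLength) {
        if (o instanceof Text) {
            String s = o.toString();
            // enforce the CHAR(n) length limit, as the real inspector would
            return new HiveCharWritable(
                s.length() > maxLength ? s.substring(0, maxLength) : s);
        }
        return new HiveCharWritable(((HiveCharWritable) o).value);
    }

    public static void main(String[] args) {
        System.out.println(copyAsHiveChar(new Text("abcdef"), 4).value); // prints abcd
    }
}
```

Note the stand-in only truncates over-long values; the real HiveCharWritable also pads short values to the declared length, which this sketch omits.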
[jira] [Updated] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: (was: HIVE-9302.1.patch) > Beeline add jar local to client > --- > > Key: HIVE-9302 > URL: https://issues.apache.org/jira/browse/HIVE-9302 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Ferdinand Xu > Attachments: DummyDriver-1.0-SNAPSHOT.jar, HIVE-9302.1.patch, > HIVE-9302.patch, mysql-connector-java-bin.jar, postgresql-9.3.jdbc3.jar > > > At present if a beeline user uses {{add jar}} the path they give is actually > on the HS2 server. It'd be great to allow beeline users to add local jars as > well. > It might be useful to do this in the jdbc driver itself. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: HIVE-9302.1.patch
[jira] [Updated] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: DummyDriver-1.0-SNAPSHOT.jar
[jira] [Updated] (HIVE-9302) Beeline add jar local to client
[ https://issues.apache.org/jira/browse/HIVE-9302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9302: --- Attachment: HIVE-9302.1.patch
[jira] [Commented] (HIVE-9450) [Parquet] Check all data types work for Parquet in Group By operator
[ https://issues.apache.org/jira/browse/HIVE-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291326#comment-14291326 ] Ferdinand Xu commented on HIVE-9450: Hi [~brocknoland] and [~dongc], do we really need to change WritableHiveCharObjectInspector.java? See https://issues.apache.org/jira/browse/HIVE-9371 > [Parquet] Check all data types work for Parquet in Group By operator > > > Key: HIVE-9450 > URL: https://issues.apache.org/jira/browse/HIVE-9450 > Project: Hive > Issue Type: Sub-task >Reporter: Dong Chen >Assignee: Dong Chen > Attachments: HIVE-9450.patch, HIVE-9450.patch > > > Check all data types work for Parquet in Group By operator. > 1. Add test cases for data types. > 2. Fix the ClassCastException bug for CHAR & VARCHAR used in group by for > Parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9371: --- Attachment: HIVE-9371.1.patch
[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9371: --- Attachment: HIVE-9371.patch Reuploading my patch to kick off the precommit.
[jira] [Commented] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285076#comment-14285076 ] Ferdinand Xu commented on HIVE-9371: Hi [~brocknoland] [~mohitsabharwal], can you help take a look at my fix when you have some time?
[jira] [Assigned] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-8838: -- Assignee: Ferdinand Xu > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Ferdinand Xu > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9371: --- Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-9371) Execution error for Parquet table and GROUP BY involving CHAR data type
[ https://issues.apache.org/jira/browse/HIVE-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu updated HIVE-9371: --- Attachment: HIVE-9371.patch