[jira] [Commented] (HIVE-9699) Extend PTFs to provide referenced columns for CP
[ https://issues.apache.org/jira/browse/HIVE-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325584#comment-14325584 ] Lefty Leverenz commented on HIVE-9699: -- Does this need any user documentation? Extend PTFs to provide referenced columns for CP Key: HIVE-9699 URL: https://issues.apache.org/jira/browse/HIVE-9699 Project: Hive Issue Type: Improvement Components: PTF-Windowing Reporter: Navis Assignee: Navis Priority: Trivial Fix For: 1.2.0 Attachments: HIVE-9699.1.patch.txt, HIVE-9699.2.patch.txt As described in HIVE-9341, if PTFs can provide referenced column names, the column pruner can use them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3781) Index related events should be delivered to metastore event listener
[ https://issues.apache.org/jira/browse/HIVE-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325560#comment-14325560 ] Lefty Leverenz commented on HIVE-3781: -- Doc done: The wiki has been updated so I removed the TODOC15 label. Version information was not needed, because *hive.exec.drop.ignorenonexistent* has covered DROP INDEX since 0.7.0 when the parameter was created (HIVE-1858). Index related events should be delivered to metastore event listener Key: HIVE-3781 URL: https://issues.apache.org/jira/browse/HIVE-3781 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0 Reporter: Sudhanshu Arora Assignee: Navis Fix For: 1.1.0 Attachments: HIVE-3781.5.patch.txt, HIVE-3781.6.patch.txt, HIVE-3781.7.patch.txt, HIVE-3781.D7731.1.patch, HIVE-3781.D7731.2.patch, HIVE-3781.D7731.3.patch, HIVE-3781.D7731.4.patch, hive.3781.3.patch, hive.3781.4.patch An event listener must be called for any DDL activity. For example, create_index and drop_index today do not call the metastore event listener. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3781) Index related events should be delivered to metastore event listener
[ https://issues.apache.org/jira/browse/HIVE-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-3781: - Labels: (was: TODOC15) Index related events should be delivered to metastore event listener Key: HIVE-3781 URL: https://issues.apache.org/jira/browse/HIVE-3781 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.9.0 Reporter: Sudhanshu Arora Assignee: Navis Fix For: 1.1.0 Attachments: HIVE-3781.5.patch.txt, HIVE-3781.6.patch.txt, HIVE-3781.7.patch.txt, HIVE-3781.D7731.1.patch, HIVE-3781.D7731.2.patch, HIVE-3781.D7731.3.patch, HIVE-3781.D7731.4.patch, hive.3781.3.patch, hive.3781.4.patch An event listener must be called for any DDL activity. For example, create_index and drop_index today do not call the metastore event listener. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9556) create UDF to calculate the Levenshtein distance between two strings
[ https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325588#comment-14325588 ] Hive QA commented on HIVE-9556: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12699433/HIVE-9556.3.patch {color:green}SUCCESS:{color} +1 7560 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2819/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2819/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2819/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12699433 - PreCommit-HIVE-TRUNK-Build create UDF to calculate the Levenshtein distance between two strings Key: HIVE-9556 URL: https://issues.apache.org/jira/browse/HIVE-9556 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch, HIVE-9556.3.patch Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965. Example: The Levenshtein distance between kitten and sitting is 3 1. kitten → sitten (substitution of s for k) 2. sitten → sittin (substitution of i for e) 3. sittin → sitting (insertion of g at the end). 
{code}
select levenshtein('kitten', 'sitting');
3
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
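The distance reported above can be cross-checked against the textbook dynamic-programming recurrence. A minimal Python sketch of the metric itself (an illustration, not the Hive UDF's actual Java implementation):

```python
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the edit distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein('kitten', 'sitting'))  # -> 3, matching the JIRA example
```

The three edits the recurrence finds correspond exactly to the kitten → sitten → sittin → sitting steps listed in the issue description.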
[jira] [Commented] (HIVE-2573) Create per-session function registry
[ https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325556#comment-14325556 ] Lefty Leverenz commented on HIVE-2573: -- Doc update: The description of *hive.exec.drop.ignorenonexistent* has been updated in the wiki. Does the per-session function registry need to be documented? Create per-session function registry - Key: HIVE-2573 URL: https://issues.apache.org/jira/browse/HIVE-2573 Project: Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Navis Assignee: Navis Priority: Minor Labels: TODOC1.2 Fix For: 1.2.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.14.patch.txt, HIVE-2573.15.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt Currently the function registry is a shared resource and could be overridden by other users when using HiveServer. If a per-session function registry is provided, this situation can be prevented. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9712) Row count and data size are set to LONG.MAX when source table has 0 rows
[ https://issues.apache.org/jira/browse/HIVE-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damien Carol updated HIVE-9712: --- Summary: Row count and data size are set to LONG.MAX when source table has 0 rows (was: Hive : Row count and data size are set to LONG.MAX when source table has 0 rows) Row count and data size are set to LONG.MAX when source table has 0 rows Key: HIVE-9712 URL: https://issues.apache.org/jira/browse/HIVE-9712 Project: Hive Issue Type: Bug Components: Physical Optimizer Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Prasanth Jayachandran TPC-DS Q66 generates an inefficient plan because the cardinality estimate of the dimension table gets set to 9223372036854775807. {code} Map 10 Map Operator Tree: TableScan alias: ship_mode filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 0 Data size: 47 Basic stats: PARTIAL Column stats: COMPLETE Filter Operator predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and sm_ship_mode_sk is not null) (type: boolean) Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE Select Operator expressions: sm_ship_mode_sk (type: int) outputColumnNames: _col0 Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE Reduce Output Operator key expressions: _col0 (type: int) sort order: + Map-reduce partition columns: _col0 (type: int) Statistics: Num rows: 9223372036854775807 Data size: 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE Execution mode: vectorized {code} Full plan {code} explain select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,ship_carriers ,year ,sum(jan_sales) as jan_sales ,sum(feb_sales) as feb_sales ,sum(mar_sales) as mar_sales ,sum(apr_sales) as apr_sales ,sum(may_sales) as may_sales ,sum(jun_sales) as jun_sales ,sum(jul_sales)
as jul_sales ,sum(aug_sales) as aug_sales ,sum(sep_sales) as sep_sales ,sum(oct_sales) as oct_sales ,sum(nov_sales) as nov_sales ,sum(dec_sales) as dec_sales ,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot ,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot ,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot ,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot ,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot ,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot ,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot ,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot ,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot ,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot ,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot ,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot ,sum(jan_net) as jan_net ,sum(feb_net) as feb_net ,sum(mar_net) as mar_net ,sum(apr_net) as apr_net ,sum(may_net) as may_net ,sum(jun_net) as jun_net ,sum(jul_net) as jul_net ,sum(aug_net) as aug_net ,sum(sep_net) as sep_net ,sum(oct_net) as oct_net ,sum(nov_net) as nov_net ,sum(dec_net) as dec_net from ( select w_warehouse_name ,w_warehouse_sq_ft ,w_city ,w_county ,w_state ,w_country ,concat('DIAMOND', ',', 'AIRBORNE') as ship_carriers ,d_year as year ,sum(case when d_moy = 1 then ws_sales_price* ws_quantity else 0 end) as jan_sales ,sum(case when d_moy = 2 then ws_sales_price* ws_quantity else 0 end) as feb_sales ,sum(case when d_moy = 3 then ws_sales_price* ws_quantity else 0 end) as mar_sales ,sum(case when d_moy = 4 then ws_sales_price* ws_quantity else 0 end) as apr_sales ,sum(case when d_moy = 5 then ws_sales_price* ws_quantity else 0 end) as may_sales ,sum(case when d_moy = 6 then ws_sales_price* ws_quantity else 0 end) as jun_sales ,sum(case when d_moy = 7
[jira] [Commented] (HIVE-9188) BloomFilter support in ORC
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325694#comment-14325694 ] Lefty Leverenz commented on HIVE-9188: -- Doc note: [~prasanth_j] documented this in the ORC wikidoc. * [ORC Files -- Bloom Filter Index | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-BloomFilterIndex] BloomFilter support in ORC -- Key: HIVE-9188 URL: https://issues.apache.org/jira/browse/HIVE-9188 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: orcfile Fix For: 1.2.0 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, HIVE-9188.9.patch BloomFilters are well-known probabilistic data structures for set membership checking. We can use bloom filters in the ORC index for better row group pruning. Currently, the ORC row group index uses min/max statistics to eliminate row groups (stripes as well) that do not satisfy the predicate condition specified in the query. But in some cases, the efficiency of min/max based elimination is not optimal (unsorted columns with a wide range of entries). Bloom filters can be an effective and efficient alternative for row group/split elimination for point queries or queries with an IN clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
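The row-group-pruning idea is easy to see with a toy filter: hash each stored value into a small bit array, and at query time a negative probe proves the value is absent, so the reader can skip that row group entirely. A hypothetical Python sketch of the data structure (ORC's actual on-disk encoding and hash functions differ; see the wikidoc linked above for the real configuration):

```python
import hashlib

class BloomFilter:
    """Tiny bloom filter: no false negatives, small false-positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive k positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent": the row group can be skipped.
        return all(self.bits[pos] for pos in self._positions(item))

# Index the sm_carrier values of one hypothetical row group, then probe it.
bf = BloomFilter()
for carrier in ("DIAMOND", "AIRBORNE"):
    bf.add(carrier)
print(bf.might_contain("DIAMOND"), bf.might_contain("OVERNIGHT"))
```

This is why the feature pays off for point lookups and IN-list predicates on unsorted columns, where min/max ranges are too wide to rule anything out.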
[jira] [Resolved] (HIVE-6755) Zookeeper Lock Manager leaks zookeeper connections.
[ https://issues.apache.org/jira/browse/HIVE-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrey Stepachev resolved HIVE-6755. Resolution: Won't Fix Zookeeper Lock Manager leaks zookeeper connections. --- Key: HIVE-6755 URL: https://issues.apache.org/jira/browse/HIVE-6755 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.12.0 Environment: cloudera cdh5b2 Reporter: Andrey Stepachev Priority: Critical Attachments: HIVE-6755.patch Driver holds an instance of ZkHiveLockManager, and SqlQuery in turn holds it too. So if we have many unclosed queries, we will get many ZooKeeper sessions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9715) Add support for SSL keypass
Wellington Chevreuil created HIVE-9715: -- Summary: Add support for SSL keypass Key: HIVE-9715 URL: https://issues.apache.org/jira/browse/HIVE-9715 Project: Hive Issue Type: Improvement Reporter: Wellington Chevreuil Priority: Minor Currently, Hive Server allows setting the keystore file password only; it does not support using keys that themselves have a password. This feature is supported by some other Hadoop services, such as HDFS, HBase, and MR. It would be nice to have this behaviour in Hive consistent with the other mentioned services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9715) Add support for SSL keypass
[ https://issues.apache.org/jira/browse/HIVE-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HIVE-9715: --- Component/s: HiveServer2 Add support for SSL keypass --- Key: HIVE-9715 URL: https://issues.apache.org/jira/browse/HIVE-9715 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Wellington Chevreuil Priority: Minor Currently, Hive Server allows setting the keystore file password only; it does not support using keys that themselves have a password. This feature is supported by some other Hadoop services, such as HDFS, HBase, and MR. It would be nice to have this behaviour in Hive consistent with the other mentioned services. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.
[ https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu resolved HIVE-9546. Resolution: Duplicate You are hitting HIVE-9199. If you think it's not the same issue, please reopen and provide more information. Create table taking substantially longer time when other select queries are run in parallel. Key: HIVE-9546 URL: https://issues.apache.org/jira/browse/HIVE-9546 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: RedHat Linux, Cloudera 5.3.0 Reporter: sri venu bora Assignee: Aihua Xu Attachments: Hive_create_Issue.txt Create table takes substantially longer when other select queries are run in parallel. We were able to reproduce the issue using beeline in two sessions. Beeline Shell 1: a) create table with no other queries running on hive (took approximately 0.313 seconds) b) Insert Data into the table c) Run a select count query on the above table Beeline Shell 2: a) create table while step c) is running in Beeline Shell 1. (took approximately 60.431 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: VOTE Bylaw for having branch committers in hive
+1 thanks Prasad On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Hi Folks, We seem to have quite a few projects going around and in the interest of time and the project as a whole, it seems good to have branch committers much like what is there in the Hadoop project. I am proposing an addition to the committer bylaws as follows ( taken from the hadoop project bylaws http://hadoop.apache.org/bylaws.html ) Significant, pervasive features are often developed in a speculative branch of the repository. The PMC may grant commit rights on the branch to its consistent contributors, while the initiative is active. Branch committers are responsible for shepherding their feature into an active release and do not cast binding votes or vetoes in the project. Actions: New Branch Committer Description: When a new branch committer is proposed for the project. Approval: Lazy Consensus Binding Votes: Active PMC members Minimum Length: 3 days Mailing List: priv...@hive.apache.org Actions: Removal of Branch Committer Description: When a branch committer is removed from the project. Approval: Consensus Binding Votes: Active PMC members excluding the committer in question if they are PMC members too. Minimum Length: 6 days Mailing List: priv...@hive.apache.org This vote will run for 6 days. PMC members please vote. Thanks Vikram.
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326302#comment-14326302 ] Peter Lin commented on HIVE-7292: - Would love to use this in production. Is it going to be released in hive 15? Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. A design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9717) The max/min function used by AggrStats for decimal type is not what we expected
Pengcheng Xiong created HIVE-9717: - Summary: The max/min function used by AggrStats for decimal type is not what we expected Key: HIVE-9717 URL: https://issues.apache.org/jira/browse/HIVE-9717 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong In the current version hive-schema-1.2.0, in TABLE PART_COL_STATS, we store BIG_DECIMAL_LOW_VALUE and BIG_DECIMAL_HIGH_VALUE as varchar. For example, derby BIG_DECIMAL_LOW_VALUE VARCHAR(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR(4000) mssql BIG_DECIMAL_HIGH_VALUE varchar(255) NULL, BIG_DECIMAL_LOW_VALUE varchar(255) NULL, mysql `BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin, `BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin, oracle BIG_DECIMAL_LOW_VALUE VARCHAR2(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR2(4000), postgres BIG_DECIMAL_LOW_VALUE character varying(4000) DEFAULT NULL::character varying, BIG_DECIMAL_HIGH_VALUE character varying(4000) DEFAULT NULL::character varying, And, when we do the aggrstats, we do a MAX/MIN of all the BIG_DECIMAL_HIGH_VALUE/BIG_DECIMAL_LOW_VALUE values of the partitions. We are expecting the max/min of a decimal (a number). However, it is actually the max/min of a varchar (a string). As a result, '900' is more than '1000'. This also affects the extrapolation of the statistics. The proposed solution is to use a CAST function to cast it to decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
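The comparison mismatch described above is reproducible outside Hive: lexicographic MAX/MIN over numeric strings disagrees with numeric MAX/MIN exactly as the report says ('900' sorts above '1000'). A small illustrative Python example:

```python
from decimal import Decimal

low_values = ["900", "1000", "75"]  # decimal stats stored as varchar

# Lexicographic max/min: wrong for numbers of different widths,
# because '9' > '1' as characters.
print(max(low_values), min(low_values))  # -> 900 1000

# Casting to a numeric type first gives the intended answer,
# mirroring the proposed CAST-to-decimal fix.
nums = [Decimal(v) for v in low_values]
print(max(nums), min(nums))  # -> 1000 75
```

The same effect is why the aggregated high/low column statistics, and any extrapolation built on them, come out wrong until the values are cast back to decimal.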
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 1
I guess the README.txt can list Apache Spark as query execution framework along with MapReduce and Tez. thanks Prasad On Tue, Feb 17, 2015 at 1:07 PM, Brock Noland br...@cloudera.com wrote: Thank you Alan. That is my mistake actually. We can delete this now and will do so here: https://issues.apache.org/jira/browse/HIVE-9708 On Tue, Feb 17, 2015 at 10:37 AM, Alan Gates alanfga...@gmail.com wrote: It looks like a jar file snuck into the source release: gates find . -name \*.jar ./testlibs/ant-contrib-1.0b3.jar Apache policy is that binary files cannot be in releases. Alan. Brock Noland br...@cloudera.com February 16, 2015 at 21:08 Apache Hive 1.1.0 Release Candidate 0 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc1/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1024/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc1/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Comment Edited] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326336#comment-14326336 ] Xuefu Zhang edited comment on HIVE-7292 at 2/18/15 6:37 PM: Formerly 0.15, now 1.1 is going to be released soon. Release candidate is out. was (Author: xuefuz): Formerly 0.15, now 1.1 is going to be release soon. Release candidate is out. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that they can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. A design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
Yongzhi Chen created HIVE-9716: -- Summary: Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 0.13.0, 0.12.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count (*) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
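The failure mode here is a scheme-less metastore LOCATION failing to match the scheme-qualified path the map task sees at runtime. A Python sketch of the kind of normalization needed before comparing paths (illustrative only; the actual fix lives in Hive's Path handling, and `qualify` is a hypothetical helper):

```python
from urllib.parse import urlparse

def qualify(path: str, default_scheme: str = "file") -> str:
    # Add a default scheme to scheme-less paths so comparisons are consistent.
    parsed = urlparse(path)
    return path if parsed.scheme else f"{default_scheme}:{path}"

stored = "/user/hive/warehouse/t1/data"        # metastore LOCATION, no scheme
runtime = "file:/user/hive/warehouse/t1/data"  # input path seen by the mapper

print(stored == runtime)                    # -> False: naive string match fails
print(qualify(stored) == runtime)           # -> True: qualified forms agree
print(qualify("hdfs://nn/user/t1"))         # already-qualified paths untouched
```

Without such normalization, getNominalPath cannot match the runtime input path to any known alias, which is exactly the "Invalid input path" IllegalStateException in the stack trace.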
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326352#comment-14326352 ] Aihua Xu commented on HIVE-3454: Yeah. I have tested with an MR job and it picks up the hive-site.xml without the problem with hiveserver2 or CLI. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
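The reported 1970-01-16 result is the classic signature of a seconds-versus-milliseconds mix-up: a present-day epoch value in seconds (about 1.4 billion) read as milliseconds lands roughly 16 days after the epoch. A Python illustration of the arithmetic (this framing of the cause is an inference from the reported symptom; the exact day depends on the input value):

```python
from datetime import datetime, timezone

epoch_seconds = 1424283000  # a Feb 2015 unix_timestamp()-style value

# Intended: interpret the bigint as seconds since the epoch.
as_seconds = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(as_seconds.date())  # -> 2015-02-18

# Buggy reading: treat the same number as milliseconds,
# i.e. divide by 1000 before converting -> mid-January 1970.
as_millis = datetime.fromtimestamp(epoch_seconds / 1000, tz=timezone.utc)
print(as_millis.date())  # -> 1970-01-17
```

Whichever unit the cast standardizes on, the two interpretations differ by a factor of 1000, which is why the wrong one always produces dates within weeks of the epoch.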
[jira] [Updated] (HIVE-9188) BloomFilter support in ORC
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-9188: Release Note: Support for Bloom Filters in ORC internal index. BloomFilter support in ORC -- Key: HIVE-9188 URL: https://issues.apache.org/jira/browse/HIVE-9188 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: orcfile Fix For: 1.2.0 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, HIVE-9188.9.patch BloomFilters are well-known probabilistic data structures for set membership checking. We can use bloom filters in the ORC index for better row group pruning. Currently, the ORC row group index uses min/max statistics to eliminate row groups (stripes as well) that do not satisfy the predicate condition specified in the query. But in some cases, the efficiency of min/max based elimination is not optimal (unsorted columns with a wide range of entries). Bloom filters can be an effective and efficient alternative for row group/split elimination for point queries or queries with an IN clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9717) The max/min function used by AggrStats for decimal type is not what we expected
[ https://issues.apache.org/jira/browse/HIVE-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-9717: - Assignee: Pengcheng Xiong The max/min function used by AggrStats for decimal type is not what we expected --- Key: HIVE-9717 URL: https://issues.apache.org/jira/browse/HIVE-9717 Project: Hive Issue Type: Bug Reporter: Pengcheng Xiong Assignee: Pengcheng Xiong In the current version hive-schema-1.2.0, in TABLE PART_COL_STATS, we store BIG_DECIMAL_LOW_VALUE and BIG_DECIMAL_HIGH_VALUE as varchar. For example, derby BIG_DECIMAL_LOW_VALUE VARCHAR(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR(4000) mssql BIG_DECIMAL_HIGH_VALUE varchar(255) NULL, BIG_DECIMAL_LOW_VALUE varchar(255) NULL, mysql `BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin, `BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin, oracle BIG_DECIMAL_LOW_VALUE VARCHAR2(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR2(4000), postgres BIG_DECIMAL_LOW_VALUE character varying(4000) DEFAULT NULL::character varying, BIG_DECIMAL_HIGH_VALUE character varying(4000) DEFAULT NULL::character varying, And, when we do the aggrstats, we do a MAX/MIN of all the BIG_DECIMAL_HIGH_VALUE/BIG_DECIMAL_LOW_VALUE values of the partitions. We are expecting the max/min of a decimal (a number). However, it is actually the max/min of a varchar (a string). As a result, '900' is more than '1000'. This also affects the extrapolation of the statistics. The proposed solution is to use a CAST function to cast it to decimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-9716: --- Description: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 
9 more was: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count (*) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 
9 more Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException:
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
I guess the README.txt can list Apache Spark as a query execution framework along with MapReduce and Tez. thanks Prasad On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote: +1 1. downloaded the src and bin, and verified md5. 2. built the src with -Phadoop-1 and -Phadoop-2. 3. ran a few unit tests Thanks, Xuefu On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC2 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Updated] (HIVE-9617) UDF from_utc_timestamp throws NPE if the second argument is null
[ https://issues.apache.org/jira/browse/HIVE-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-9617: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed. Thanks Alexander for the fix and for being persistent on getting your patch reviewed. UDF from_utc_timestamp throws NPE if the second argument is null Key: HIVE-9617 URL: https://issues.apache.org/jira/browse/HIVE-9617 Project: Hive Issue Type: Bug Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Priority: Minor Attachments: HIVE-9617.1.patch, HIVE-9617.2.patch UDF from_utc_timestamp throws NPE if the second argument is null {code} select from_utc_timestamp('2015-02-06 10:30:00', cast(null as string)); FAILED: NullPointerException null {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
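The fix's intent, returning NULL instead of throwing when an argument is NULL, matches standard SQL null-propagation semantics. A minimal Python sketch of that behavior (an illustrative model with a toy timezone table, not Hive's actual Java UDF):

```python
from datetime import datetime, timedelta

# Toy timezone table for the sketch; hypothetical, not Hive's lookup.
TZ_OFFSETS = {"UTC": 0, "PST": -8}

def from_utc_timestamp(ts, tz):
    # SQL null-propagation: any NULL (None) argument yields NULL,
    # instead of a NullPointerException.
    if ts is None or tz is None:
        return None
    if tz not in TZ_OFFSETS:
        return None
    return ts + timedelta(hours=TZ_OFFSETS[tz])

print(from_utc_timestamp(datetime(2015, 2, 6, 10, 30), None))  # None, not a crash
```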
[jira] [Commented] (HIVE-9188) BloomFilter support in ORC
[ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326408#comment-14326408 ] Prasanth Jayachandran commented on HIVE-9188: - [~leftylev] Thanks for the doc edits! BloomFilter support in ORC -- Key: HIVE-9188 URL: https://issues.apache.org/jira/browse/HIVE-9188 Project: Hive Issue Type: New Feature Components: File Formats Affects Versions: 0.15.0 Reporter: Prasanth Jayachandran Assignee: Prasanth Jayachandran Labels: orcfile Fix For: 1.2.0 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, HIVE-9188.9.patch BloomFilters are a well-known probabilistic data structure for set membership checking. We can use bloom filters in the ORC index for better row group pruning. Currently, the ORC row group index uses min/max statistics to eliminate row groups (stripes as well) that do not satisfy the predicate condition specified in the query. But in some cases, the efficiency of min/max-based elimination is not optimal (unsorted columns with a wide range of entries). Bloom filters can be an effective and efficient alternative for row group/split elimination for point queries or queries with an IN clause. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
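The pruning idea in the description can be sketched with a minimal Bloom filter. This is an illustrative Python model (toy bit count and hash scheme), not ORC's actual index encoding or hash functions:

```python
import hashlib

class BloomFilter:
    """Minimal illustrative Bloom filter: no false negatives, rare false positives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # big integer used as a bit set

    def _positions(self, value):
        # Derive k bit positions from k salted hashes of the value.
        for i in range(self.num_hashes):
            h = hashlib.md5(f"{i}:{value}".encode()).hexdigest()
            yield int(h, 16) % self.num_bits

    def add(self, value):
        for p in self._positions(value):
            self.bits |= (1 << p)

    def might_contain(self, value):
        return all(self.bits & (1 << p) for p in self._positions(value))

# Row-group pruning: a row group whose filter rules the key out can be skipped.
bf = BloomFilter()
for key in ["alice", "bob"]:
    bf.add(key)
assert bf.might_contain("alice")  # keys that were added are never reported absent
```

A point query like `WHERE name = 'zed'` would almost certainly be reported absent by this filter, so the whole row group could be skipped even when min/max statistics cannot exclude it.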
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326336#comment-14326336 ] Xuefu Zhang commented on HIVE-7292: --- Formerly 0.15, now 1.1 is going to be released soon. Release candidate is out. Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide users a new alternative so that those users can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtasks. Design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme
[ https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9716: --- Description: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: {noformat} 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at 
org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 9 more {noformat} was: When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354) Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.IllegalStateException: Invalid input path file:/user/hive/warehouse/t1/data at org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406) at org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170) ... 
9 more Map job fails when table's LOCATION does not have scheme Key: HIVE-9716 URL: https://issues.apache.org/jira/browse/HIVE-9716 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Yongzhi Chen Assignee: Yongzhi Chen Priority: Minor When a table's location (the value of column 'LOCATION' in SDS table in metastore) does not have a scheme, map job returns error. For example, when do select count ( * ) from t1, get following exception: {noformat} 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: job_local2120192529_0001 java.lang.Exception: java.lang.RuntimeException:
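The normalization this bug calls for can be sketched as follows. `make_qualified` is a hypothetical helper mimicking the effect of qualifying a scheme-less LOCATION against the default filesystem before comparing it with fully qualified input paths (in Hadoop's Java API, `Path.makeQualified` plays this role):

```python
from urllib.parse import urlparse

def make_qualified(path, default_scheme="file"):
    # Hypothetical helper: a LOCATION stored without a scheme
    # ("/user/hive/warehouse/t1/data") must be qualified against the
    # default filesystem before path comparison, or it will never match
    # a fully qualified input path and the map task fails with
    # "Invalid input path".
    if urlparse(path).scheme:
        return path  # already qualified, leave untouched
    return f"{default_scheme}:{path}"

location = "/user/hive/warehouse/t1/data"       # scheme-less metastore value
input_path = "file:/user/hive/warehouse/t1/data"  # qualified runtime path
assert make_qualified(location) == input_path     # now the paths match
```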
[jira] [Commented] (HIVE-9556) create UDF to calculate the Levenshtein distance between two strings
[ https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326401#comment-14326401 ] Jason Dere commented on HIVE-9556: -- +1 create UDF to calculate the Levenshtein distance between two strings Key: HIVE-9556 URL: https://issues.apache.org/jira/browse/HIVE-9556 Project: Hive Issue Type: Improvement Components: UDF Reporter: Alexander Pivovarov Assignee: Alexander Pivovarov Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch, HIVE-9556.3.patch Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who considered this distance in 1965. Example: The Levenshtein distance between kitten and sitting is 3 1. kitten → sitten (substitution of s for k) 2. sitten → sittin (substitution of i for e) 3. sittin → sitting (insertion of g at the end). {code} select levenshtein('kitten', 'sitting'); 3 {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
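The edit-distance definition in the description can be implemented directly with dynamic programming. A minimal Python sketch (illustrative only, not the UDF's actual Java code):

```python
def levenshtein(a, b):
    """Edit distance: minimum single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```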
Re: VOTE Bylaw for having branch committers in hive
Seems like there is consensus all around. Vikram, would you like to update the wiki with new bylaws? Thanks, Ashutosh On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org wrote: +1 thanks Prasad On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Hi Folks, We seem to have quite a few projects going around and in the interest of time and the project as a whole, it seems good to have branch committers much like what is there in the Hadoop project. I am proposing an addition to the committer bylaws as follows ( taken from the hadoop project bylaws http://hadoop.apache.org/bylaws.html ) Significant, pervasive features are often developed in a speculative branch of the repository. The PMC may grant commit rights on the branch to its consistent contributors, while the initiative is active. Branch committers are responsible for shepherding their feature into an active release and do not cast binding votes or vetoes in the project. Actions: New Branch Committer Description: When a new branch committer is proposed for the project. Approval: Lazy Consensus Binding Votes: Active PMC members Minimum Length: 3 days Mailing List: priv...@hive.apache.org Actions: Removal of Branch Committer Description: When a branch committer is removed from the project. Approval: Consensus Binding Votes: Active PMC members excluding the committer in question if they are PMC members too. Minimum Length: 6 days Mailing List: priv...@hive.apache.org This vote will run for 6 days. PMC members please vote. Thanks Vikram.
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Attachment: nation.tbl patch.txt Insert into dynamic partitions with same column structure in the distibute by clause barfs Key: HIVE-9718 URL: https://issues.apache.org/jira/browse/HIVE-9718 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0 Reporter: Pavan Srinivas Attachments: nation.tbl, patch.txt Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain insert overwrite table nation_new_p partition (p) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, an optimization of deduplication of columns is done. But when one of the columns is used as part of the partition/distribute by clause, it's not taken care of. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code} Table schema used: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} Sample
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain insert overwrite table nation_new_p partition (name3) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, an optimization of deduplication of columns is done. But when one of the columns is used as part of the partition/distribute by clause, it's not taken care of. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code} Table schema used: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} Sample data for the table is provided by the file attached with. 
was: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain insert overwrite table nation_new_p partition (p) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, a
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for Jira: https://issues.apache.org/jira/browse/HIVE-4867, an optimization of deduplication of columns is done. But when one of the columns is used as part of the partition/distribute by clause, it's not taken care of. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 
19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_name1` string, `n_name2` string, `n_name3` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided by the file attached with. was: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain
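The failure mechanism described above, a deduplication pass dropping a column alias that DISTRIBUTE BY still references, can be modeled in a few lines. This is a hypothetical sketch of the planner behavior, not Hive's actual code:

```python
# n_name is selected three times under different aliases; a dedup pass
# collapses the identical expressions to one internal column, and the
# ReduceSink can no longer resolve the alias DISTRIBUTE BY refers to,
# producing "cannot find field _col2 from [0:_col0]".
select_exprs = [("_col0", "n_name"), ("_col1", "n_name"), ("_col2", "n_name")]
distribute_by = "_col2"

def naive_dedup(exprs):
    # Keeps only the first alias of each distinct expression.
    seen, kept = set(), []
    for alias, expr in exprs:
        if expr not in seen:
            seen.add(expr)
            kept.append((alias, expr))
    return kept

def safe_dedup(exprs, referenced):
    # Also retains any alias still referenced downstream (e.g. by
    # DISTRIBUTE BY or a dynamic partition column).
    seen, kept = set(), []
    for alias, expr in exprs:
        if expr not in seen or alias in referenced:
            seen.add(expr)
            kept.append((alias, expr))
    return kept

assert distribute_by not in [a for a, _ in naive_dedup(select_exprs)]        # the bug
assert distribute_by in [a for a, _ in safe_dedup(select_exprs, {distribute_by})]
```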
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: (was: HIVE-3454.3.patch) Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: HIVE-3454.4.patch Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.4.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
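The 1970-01-16 symptom is consistent with a seconds value being interpreted as milliseconds: unix_timestamp() returns seconds since the epoch, while the cast reads the BIGINT on a millisecond scale. The arithmetic can be checked directly (this assumes that reading of the ticket):

```python
from datetime import datetime, timezone

epoch_seconds = 1350000000  # a plausible unix_timestamp() value (October 2012)

# Correct interpretation: the value is seconds since the epoch.
as_seconds = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(as_seconds.year)  # 2012

# Misinterpretation: the same number read as milliseconds is only
# ~15.6 days after the epoch -- the reported 1970-01-16 timestamp.
as_millis = datetime.fromtimestamp(epoch_seconds / 1000, tz=timezone.utc)
print(as_millis.date())  # 1970-01-16
```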
Re: VOTE Bylaw for having branch committers in hive
Hi Carl, Here is the list of 17 active PMC members: Brock Noland Carl Steinbach Edward Capriolo Alan Gates Gunther Hagleitner Ashutosh Chauhan Jason Dere Lefty Leverenz Navis Ryu Owen O'Malley Prasad Suresh Mujumdar Prasanth J Harish Butani Szehon Ho Thejas Madhavan Nair Vikram Dixit K Xuefu Zhang Non active members: Ashish Thusoo Kevin Wilfong He Yongqiang Namit Jain Joydeep Sensarma Ning Zhang Raghotham Murthy https://issues.apache.org/jira/issues/?jql=text%20~%20%22kevin%20wilfong%22%20OR%20text%20~%20%22ashish%20thusoo%22%20or%20text%20~%20%22heyongqiang%22%20OR%20text%20~%20%22Namit%20Jain%22%20OR%20text%20~%20%22joydeep%20sensarma%22%20OR%20text%20~%20%22ning%20zhang%22%20OR%20text%20~%20%22raghotham%20murthy%22%20AND%20project%20%3D%20Hive%20ORDER%20BY%20updated%20DESC In the results, only the first 4/5 need to be considered because of the time line of 6 months. All of them were resolved in prior years and the last comments are mostly hudson or closing comments by others. I could not see any mails from them on the mailing lists either during this period. Thus those 7 members haven't met the criterion for being active as specified in the hive bylaws. Should I change the bylaw for this type of vote happening to dev list instead of the user mailing list as it is currently stated? Thanks Vikram. On Wed, Feb 18, 2015 at 12:33 PM, Carl Steinbach cwsteinb...@gmail.com wrote: Hi Vikram, Can you please post the names of the 17 currently active PMC members so that we have it for the records? Also, according to the bylaws this vote was supposed to happen on the user@hive list. Maybe we want to change this? Thanks. - Carl On Wed, Feb 18, 2015 at 12:25 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Yes. The vote passes with 12 +1s out of 17 currently active PMC members. I will update the wiki with the new bylaws. On Wed, Feb 18, 2015 at 11:15 AM, Ashutosh Chauhan hashut...@apache.org wrote: Seems like there is consensus all around. 
Vikram, would you like to update the wiki with new bylaws? Thanks, Ashutosh On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org wrote: +1 thanks Prasad On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com wrote: Hi Folks, We seem to have quite a few projects going around and in the interest of time and the project as a whole, it seems good to have branch committers much like what is there in the Hadoop project. I am proposing an addition to the committer bylaws as follows ( taken from the hadoop project bylaws http://hadoop.apache.org/bylaws.html ) Significant, pervasive features are often developed in a speculative branch of the repository. The PMC may grant commit rights on the branch to its consistent contributors, while the initiative is active. Branch committers are responsible for shepherding their feature into an active release and do not cast binding votes or vetoes in the project. Actions: New Branch Committer Description: When a new branch committer is proposed for the project. Approval: Lazy Consensus Binding Votes: Active PMC members Minimum Length: 3 days Mailing List: priv...@hive.apache.org Actions: Removal of Branch Committer Description: When a branch committer is removed from the project. Approval: Consensus Binding Votes: Active PMC members excluding the committer in question if they are PMC members too. Minimum Length: 6 days Mailing List: priv...@hive.apache.org This vote will run for 6 days. PMC members please vote. Thanks Vikram. -- Nothing better than when appreciated for hard work. -Mark -- Nothing better than when appreciated for hard work. -Mark
Re: VOTE Bylaw for having branch committers in hive
Should I change the bylaw for this type of vote happening to dev list instead of the user mailing list as it is currently stated? Sounds good to me. On the other hand, here are some arguments in favor of keeping this type of vote on the user mailing list: (1) wider distribution increases transparency, (2) wider distribution can broaden the non-voting discussion, (3) the user list has less traffic than the dev list, although the upcoming issues list will reduce the dev clutter. Anyway, it should be decided in a new voting thread. -- Lefty
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain insert overwrite table nation_new_p partition (name3) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for HIVE-4867 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that deduplicates columns was introduced. But when one of the deduplicated columns is used in the PARTITION or DISTRIBUTE BY clause, it is not handled. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 
19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided in the attached file.
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for HIVE-4867 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that deduplicates columns was introduced. But when one of the deduplicated columns is used in the PARTITION or DISTRIBUTE BY clause, it is not handled. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 
19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_name1` string, `n_name2` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided in the attached file.
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Description: Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for HIVE-4867 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that deduplicates columns was introduced. But when one of the deduplicated columns is used in the PARTITION or DISTRIBUTE BY clause, it is not handled. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 
19 more {code} Tables used are: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} and {code} CREATE TABLE `nation_new_p`( `n_name1` string, `n_name2` string) PARTITIONED BY ( `some` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' {code} Sample data for the table is provided in the attached file.
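The failure mode can be summarized in a hedged sketch (not part of the report, untested): the three aliases of `n_name` are collapsed by the HIVE-4867 column deduplication, so the downstream ReduceSink asks for a field (_col2) that no longer exists. Referencing the source column in DISTRIBUTE BY, instead of one of the deduplicated aliases, may sidestep the pruning:

{code}
-- Triggers the bug: DISTRIBUTE BY references an alias that the
-- deduplication optimization prunes away ("cannot find field _col2").
INSERT OVERWRITE TABLE nation_new_p PARTITION (some)
SELECT n_name AS name1, n_name AS name2, n_name AS name3
FROM nation DISTRIBUTE BY name3;

-- Possible workaround (assumption, untested): distribute by the source
-- column directly so no pruned alias is referenced downstream.
INSERT OVERWRITE TABLE nation_new_p PARTITION (some)
SELECT n_name AS name1, n_name AS name2, n_name AS name3
FROM nation DISTRIBUTE BY n_name;
{code}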
Re: VOTE Bylaw for having branch committers in hive
Yes. The vote passes with 12 +1s out of 17 currently active PMC members. I will update the wiki with the new bylaws.
Re: VOTE Bylaw for having branch committers in hive
Hi Vikram, Can you please post the names of the 17 currently active PMC members so that we have it for the records? Also, according to the bylaws this vote was supposed to happen on the user@hive list. Maybe we want to change this? Thanks. - Carl
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Status: In Progress (was: Patch Available) Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.13.1, 0.13.0, 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.1, 0.8.0 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
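The reported symptom follows from how Hive of this era interprets numeric-to-TIMESTAMP casts: an integral value is read as milliseconds since the epoch, while a floating-point value is read as seconds. A commonly cited workaround (an assumption here, not taken from this issue) is to cast through DOUBLE:

{code}
-- unix_timestamp() returns seconds, but casting a BIGINT to TIMESTAMP
-- treats the value as milliseconds, producing a date in mid-January 1970.
SELECT CAST(unix_timestamp() AS TIMESTAMP);

-- Hedged workaround: route through DOUBLE, which is interpreted as seconds.
SELECT CAST(CAST(unix_timestamp() AS DOUBLE) AS TIMESTAMP);
{code}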
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Good idea... since it's not a blocker I will add that for 1.1.1 and 1.2.0. On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com wrote: I guess the README.txt can list Apache Spark as query execution framework along with MapReduce and Tez. thanks Prasad On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote: +1 1. downloaded the src and bin, and verified md5. 2. built the src with -Phadoop-1 and -Phadoop-2. 3. ran a few unit tests Thanks, Xuefu On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC2 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Commented] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326539#comment-14326539 ] Gopal V commented on HIVE-9718: --- The bug aside, the DISTRIBUTE BY will result in a sub-optimal plan. Have you tried removing the DISTRIBUTE BY and instead using the automatic reducer injection? https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.sort.dynamic.partition Insert into dynamic partitions with same column structure in the distibute by clause barfs Key: HIVE-9718 URL: https://issues.apache.org/jira/browse/HIVE-9718 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0 Reporter: Pavan Srinivas Priority: Critical Attachments: nation.tbl, patch.txt Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; insert overwrite table nation_new_p partition (some) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for HIVE-4867 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that deduplicates columns was introduced. But when one of the deduplicated columns is used in the PARTITION or DISTRIBUTE BY clause, it is not handled. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code}
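Gopal's suggestion above can be sketched as follows (untested; assumes the behavior described on the linked Configuration Properties wiki page, where hive.optimize.sort.dynamic.partition routes rows to reducers by the dynamic partition key without an explicit DISTRIBUTE BY):

{code}
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Let Hive sort/route rows by the dynamic partition key automatically.
SET hive.optimize.sort.dynamic.partition=true;

INSERT OVERWRITE TABLE nation_new_p PARTITION (some)
SELECT n_name AS name1, n_name AS name2, n_name AS name3
FROM nation;
{code}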
[jira] [Created] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
Pavan Srinivas created HIVE-9718: Summary: Insert into dynamic partitions with same column structure in the distibute by clause barfs Key: HIVE-9718 URL: https://issues.apache.org/jira/browse/HIVE-9718 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0 Reporter: Pavan Srinivas Sample reproducible query: {code} SET hive.exec.dynamic.partition.mode=nonstrict; SET hive.exec.dynamic.partition=true; explain insert overwrite table nation_new_p partition (p) select n_name as name1, n_name as name2, n_name as name3 from nation distribute by name3; {code} Note: Make sure there is data in the source table to reproduce the issue. During the optimizations done for HIVE-4867 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that deduplicates columns was introduced. But when one of the deduplicated columns is used in the PARTITION or DISTRIBUTE BY clause, it is not handled. The above query produces an exception as follows: {code} Diagnostic Messages for this Task: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181) at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. carefully final deposits detect slyly agai} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176) ... 
12 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493) ... 13 more Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0] at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410) at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147) at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55) at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954) at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325) ... 19 more {code} Table schema used: {code} CREATE EXTERNAL TABLE `nation`( `n_nationkey` int, `n_name` string, `n_regionkey` int, `n_comment` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'; {code} Sample data for the table is provided in the attached file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
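The "cannot find field _col2 from [0:_col0]" error matches a deduplication mismatch: the three identical select expressions collapse into a single output column, while the distribute-by key still refers to the third alias by its pre-deduplication position. A minimal Python sketch of this failure mode (hypothetical column-mapping logic, not Hive's actual optimizer code):

```python
# Sketch of the deduplication mismatch: three aliases of n_name collapse
# into one output column, but the distribute-by key still asks for _col2.
select_exprs = ["n_name", "n_name", "n_name"]   # name1, name2, name3

# Deduplicate identical expressions into output columns (_col0, _col1, ...)
dedup = {}
for expr in select_exprs:
    if expr not in dedup:
        dedup[expr] = "_col%d" % len(dedup)
output_columns = list(dedup.values())           # only ['_col0'] survives

# The reduce-sink key was resolved before deduplication, so it still
# references the position of name3:
key = "_col2"
if key not in output_columns:
    # Mirrors: RuntimeException: cannot find field _col2 from [0:_col0]
    print("cannot find field %s from %s" % (key, output_columns))
```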
[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distibute by clause barfs
[ https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pavan Srinivas updated HIVE-9718: - Priority: Critical (was: Major) Insert into dynamic partitions with same column structure in the distibute by clause barfs Key: HIVE-9718 URL: https://issues.apache.org/jira/browse/HIVE-9718 Project: Hive Issue Type: Bug Affects Versions: 0.14.0, 1.0.0 Reporter: Pavan Srinivas Priority: Critical Attachments: nation.tbl, patch.txt
[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery
[ https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326537#comment-14326537 ] Chao commented on HIVE-9613: OK, I was able to reproduce the issue on my cluster too. Previously I was using CLI local mode. Strangely, the plan looks different when it is running on a cluster versus running locally. I'll look more into this issue. Left join query plan outputs wrong column when using subquery -- Key: HIVE-9613 URL: https://issues.apache.org/jira/browse/HIVE-9613 Project: Hive Issue Type: Bug Components: Parser, Query Planning Affects Versions: 0.14.0, 1.0.0 Environment: apache hadoop 2.5.1 Reporter: Li Xin Attachments: test.sql I have a query that outputs wrong contents for a column when using a subquery; that column's contents are equal to another column's, not its own. I have three tables, as follows: table 1: _hivetemp.category_city_rank_: ||category||city||rank|| |jinrongfuwu|shanghai|1| |ktvjiuba|shanghai|2| table 2: _hivetemp.category_match_: ||src_category_en||src_category_cn||dst_category_en||dst_category_cn|| |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu| |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba| table 3: _hivetemp.city_match_: ||src_city_name_en||dst_city_name_en||city_name_cn|| |sh|shanghai|上海| And the query is: {code} select a.category, a.city, a.rank, b.src_category_en, c.src_city_name_en from hivetemp.category_city_rank a left outer join (select src_category_en, dst_category_en from hivetemp.category_match) b on a.category = b.dst_category_en left outer join (select src_city_name_en, dst_city_name_en from hivetemp.city_match) c on a.city = c.dst_city_name_en {code} which should output the results as follows (verified in Hive 0.13): ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh| |ktvjiuba|shanghai|2|zpwentiyingshi|sh| but in Hive 0.14, the results in the column *src_category_en* are wrong, and are just the *city* contents: ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|shanghai|sh| |ktvjiuba|shanghai|2|shanghai|sh| Using explain to examine the execution plan, I can see the first subquery outputs only one column, *dst_category_en*, and *src_category_en* is just missing. {quote} b:category_match TableScan alias: category_match Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: dst_category_en (type: string) outputColumnNames: _col1 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: (was: HIVE-3454.4.patch) Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp(). Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
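The 1970-01-16 result is consistent with the seconds value from unix_timestamp() being interpreted on a millisecond scale, i.e. effectively divided by 1000. A quick sanity check of that arithmetic (plain Python, not Hive code; the sample epoch value is illustrative):

```python
from datetime import datetime, timezone

secs = 1347517370  # an example unix_timestamp() value (September 2012)

# Interpreted as seconds (what the reporter expects):
as_seconds = datetime.fromtimestamp(secs, tz=timezone.utc)
# Interpreted as milliseconds (effectively dividing by 1000), which
# lands roughly 15.6 days after the epoch:
as_millis = datetime.fromtimestamp(secs / 1000, tz=timezone.utc)

print(as_seconds.date())  # 2012-09-13
print(as_millis.date())   # 1970-01-16
```

Any seconds-valued BIGINT from the last few decades maps to mid-January 1970 under a millisecond interpretation, which matches the reported symptom.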
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Sounds good. +1 Verified checksums of source and binary tarballs Compiled with hadoop-1 and hadoop-2 profiles with distributions Ran maven verify thanks Prasad On Wed, Feb 18, 2015 at 12:50 PM, Brock Noland br...@cloudera.com wrote: Good idea... since it's not a blocker I will add that for 1.1.1 and 1.2.0. On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com wrote: I guess the README.txt can list Apache Spark as query execution framework along with MapReduce and Tez. thanks Prasad On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote: +1 1. downloaded the src and bin, and verified md5. 2. built the src with -Phadoop-1 and -Phadoop-2. 3. ran a few unit tests Thanks, Xuefu On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Hi, From the release branch, I noticed that the hive-exec.jar now contains a copy of guava-14 without any relocations. The hive spark-client pom.xml adds guava as a lib jar instead of shading it in. https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111 That seems to be a great approach for guava compat issues across execution engines. Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??). https://issues.apache.org/jira/browse/SPARK-2848 Do any of the same compatibility issues occur when using a hive-exec.jar containing guava-14 on MRv2 (which has guava-11 in the classpath)? Cheers, Gopal On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Created] (HIVE-9720) Metastore does not properly migrate column stats when renaming a table across databases.
Alexander Behm created HIVE-9720: Summary: Metastore does not properly migrate column stats when renaming a table across databases. Key: HIVE-9720 URL: https://issues.apache.org/jira/browse/HIVE-9720 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.13.1 Reporter: Alexander Behm It appears that the Hive Metastore does not properly migrate column statistics when renaming a table across databases. While renaming across databases is not supported in HiveQL, it can be done via the Metastore Thrift API. The problem is that such a newly renamed table cannot be dropped (unless renamed back to its original database/name). Here are steps for reproducing the issue. 1. From the Hive shell/beeline: {code} create database db1; create database db2; create table db1.mv (i int); use db1; analyze table mv compute statistics for columns i; {code} 2. From a Java program: {code} public static void main(String[] args) throws Exception { HiveConf conf = new HiveConf(MetaStoreClientPool.class); HiveMetaStoreClient hiveClient = new HiveMetaStoreClient(conf); Table t = hiveClient.getTable("db1", "mv"); t.setDbName("db2"); t.setTableName("mv2"); hiveClient.alter_table("db1", "mv", t); } {code} 3. From the Hive shell/beeline: {code} drop table db2.mv2; {code} Stack trace shown when running step 3: {code} FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:javax.jdo.JDODataStoreException: Exception thrown flushing changes to datastore at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451) at org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:165) at org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:411) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108) at com.sun.proxy.$Proxy0.commitTransaction(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1389) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1525) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:106) at com.sun.proxy.$Proxy1.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:8072) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:8056) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at 
org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:724) NestedThrowablesStackTrace: java.sql.BatchUpdateException: Batch entry 0 DELETE FROM TBLS WHERE TBL_ID='1621' was aborted. Call getNextException to see the cause. at org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2598) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:407) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2737) at com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:424) at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372) at org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628) at
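The aborted "DELETE FROM TBLS WHERE TBL_ID=..." batch fits the pattern of metadata renamed in one place but not another: if the column-statistics rows still carry the old database/table name, the drop logic never deletes them, and the remaining foreign-key reference blocks deleting the TBLS row. A toy Python model of that hypothesis (made-up structures, not metastore code):

```python
# Toy model: column stats store the table id *and* the db/table names.
# A buggy rename updates the table entry but leaves the stats names stale.
tbls = {101: {"db": "db1", "name": "mv"}}
tab_col_stats = [{"tbl_id": 101, "db": "db1", "name": "mv", "col": "i"}]

def rename_table(tbl_id, new_db, new_name):
    tbls[tbl_id] = {"db": new_db, "name": new_name}  # stats left untouched

def drop_table(db, name):
    tbl_id = next(i for i, t in tbls.items()
                  if (t["db"], t["name"]) == (db, name))
    # Drop deletes stats by the *current* db/table name, so stale rows survive:
    remaining = [s for s in tab_col_stats if (s["db"], s["name"]) != (db, name)]
    if any(s["tbl_id"] == tbl_id for s in remaining):
        # Mirrors the aborted "DELETE FROM TBLS WHERE TBL_ID=..." batch
        raise RuntimeError("stats still reference TBL_ID=%d" % tbl_id)
    del tbls[tbl_id]

rename_table(101, "db2", "mv2")
try:
    drop_table("db2", "mv2")
except RuntimeError as e:
    print(e)  # the stale ("db1", "mv") stats row blocks the delete
```

Renaming back to the original database/name realigns the stats rows with the table, which is why the reporter notes the drop then succeeds.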
[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326995#comment-14326995 ] Pengcheng Xiong commented on HIVE-9647: --- [~mmokhtar], the test failure is unrelated and it passed on my laptop. Could you please try the patch? It could be applied on trunk. Thanks. Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-9647.01.patch High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables results in a discrepancy in the number of distinct values, which results in different plans between partitioned and un-partitioned schemas. The table below summarizes the NDVs in computeInnerJoinSelectivity which are used to estimate selectivity of joins. ||Column||Partitioned count distincts||Un-Partitioned count distincts|| |sr_customer_sk|71,245|1,415,625| |sr_item_sk|38,846|62,562| |sr_ticket_number|71,245|34,931,085| |ss_customer_sk|88,476|1,415,625| |ss_item_sk|38,846|62,562| |ss_ticket_number|100,756|56,256,175| The discrepancy is because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for columns like ticket number which are naturally increasing with the partitioned date column ss_sold_date_sk. 
Suggestions: Use HyperLogLog as suggested by Gopal; there is an HLL implementation for HBase coprocessors which we can use as a reference here. Alternatively, using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as in: Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS) More details: While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of the plans are different; I then dumped the CBO logical plan and found that join estimates are drastically different. Unpartitioned schema: {code} 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering: HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956 HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562, 
cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980 HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative cost = {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920 HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
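The proposed extrapolation above can be written directly as a formula. A small sketch of that estimate, using the ss_ticket_number figures from the table (the partition counts are made-up illustrative values):

```python
def extrapolate_ndv(tab_ndv, num_qualified, num_partitions, max_part_ndv):
    """NDV estimate for the qualified partitions, per the proposal:
    max(global NDV scaled by the fraction of qualified partitions,
        the maximum per-partition NDV)."""
    return max(tab_ndv * num_qualified / num_partitions, max_part_ndv)

# ss_ticket_number-like example: the global NDV (56,256,175) dwarfs the
# per-partition maximum (100,756), so scanning half the partitions should
# not collapse the estimate to the per-partition maximum.
print(extrapolate_ndv(tab_ndv=56256175, num_qualified=900,
                      num_partitions=1800, max_part_ndv=100756))
# 28128087.5 -- versus 100,756 under the current max-per-partition rule
```

For columns whose values really are contained within each partition, the max(...) with the per-partition NDV keeps the estimate from being scaled below what a single partition is known to contain.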
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 3
+1 1. downloaded the src tarball and built w/ -Phadoop-1/2 2. verified no binary (jars) in the src tarball On Wed, Feb 18, 2015 at 8:56 PM, Brock Noland br...@cloudera.com wrote: +1 verified sigs, hashes, created tables, ran MR on YARN jobs On Wed, Feb 18, 2015 at 8:54 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 3 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc3/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1026/ Source tag for RC3 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31178/ --- Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- The discrepancy is because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as "select max(NUM_DISTINCTS) from PART_COL_STATS". This is problematic for columns like ticket number which are naturally increasing with the partitioned date column ss_sold_date_sk. Diffs - data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 74f1b01 metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java 7fc04f1 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 574141c metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 475883b ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 00c9b53 ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 8ae9a90 ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q PRE-CREATION ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 0f6b15d ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 1fdeb90 ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out PRE-CREATION Diff: https://reviews.apache.org/r/31178/diff/ Testing --- Thanks, pengcheng xiong
[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327087#comment-14327087 ] Jason Dere commented on HIVE-9537: -- This was by design. The SQL spec didn't seem to have any specifics here regarding the trailing spaces behavior, and MySQL/Postgres (which I had available at the time) had similar semantics regarding how trailing spaces for char were treated during length()/concat(). upper()/lower() should not be affected by this. string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed-length column, the trailing spaces of the fixed-length character are not preserved. {code:sql} CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; {code} {{cchar}} as a {{char(32)}}. {code:sql} select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar; {code} Sample data: {code} 0|\N 1| 2| 3|BB 4|EE 5|FF {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
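The semantics Jason describes — CHAR values padded to their declared length, but trailing pad spaces dropped when the value feeds string functions such as length() or concat() — can be mimicked in a few lines (illustrative Python, not Hive's implementation):

```python
def char_store(s, n):
    # CHAR(n) pads the stored value with spaces to the declared length.
    return s.ljust(n)

def to_string(c):
    # Moving a CHAR into a string expression strips the pad spaces,
    # matching the MySQL/Postgres behavior cited in the comment.
    return c.rstrip(" ")

c = char_store("BB", 32)
print(len(c))                        # 32: the stored, padded value
print(len(to_string(c)))             # 2: what length() would see
print(to_string(c) + to_string(c))   # BBBB: concat drops the padding
```

Under these semantics upper()/lower() operate on the padded value but the padding is still lost as soon as the result enters a string context, which is why the reported output shows no trailing spaces.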
[VOTE] Apache Hive 1.1.0 Release Candidate 3
Apache Hive 1.1.0 Release Candidate 3 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc3/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1026/ Source tag for RC3 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 3
+1 verified sigs, hashes, created tables, ran MR on YARN jobs On Wed, Feb 18, 2015 at 8:54 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 3 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc3/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1026/ Source tag for RC3 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Created] (HIVE-9721) Hadoop23Shims.setFullFileStatus should check for null
Brock Noland created HIVE-9721: -- Summary: Hadoop23Shims.setFullFileStatus should check for null Key: HIVE-9721 URL: https://issues.apache.org/jira/browse/HIVE-9721 Project: Hive Issue Type: Bug Reporter: Brock Noland {noformat} 2015-02-18 22:46:10,209 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path file:/tmp/hive/f1a28dee-70e8-4bc3-bd35-9be13834d1fc/hive_2015-02-18_22-46-10_065_3348083202601156561-1 does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.UnsupportedOperationException: RawLocalFileSystem doesn't support getAclStatus java.lang.UnsupportedOperationException: RawLocalFileSystem doesn't support getAclStatus at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429) at org.apache.hadoop.fs.FilterFileSystem.getAclStatus(FilterFileSystem.java:562) at org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:645) at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:524) at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234) at org.apache.hadoop.hive.ql.Context.getExtTmpPathRelTo(Context.java:424) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6290) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9069) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8961) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9807) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9700) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10136) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:284) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147) at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:190) at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307) at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112) at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1106) at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:101) at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172) at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379) at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366) at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271) at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:415) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 2015-02-18 17:30:58,753 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: Skipping ACL inheritance: File system for path 
file:/tmp/hive/e3eb01f0-bb58-45a8-b773-8f4f3420457c/hive_2015-02-18_17-30-58_346_5020255420422913166-1/-mr-1 does not support ACLs but dfs.namenode.acls.enabled is set to true: java.lang.NullPointerException java.lang.NullPointerException at org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:668) at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:527) at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234) at org.apache.hadoop.hive.ql.Context.getExtTmpPathRelTo(Context.java:424) at
[jira] [Updated] (HIVE-9706) HBase handler support for snapshots should confirm properties before use
[ https://issues.apache.org/jira/browse/HIVE-9706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-9706: --- Resolution: Fixed Fix Version/s: (was: 1.1.0) Status: Resolved (was: Patch Available) Thank you Sean! I have committed this to trunk! HBase handler support for snapshots should confirm properties before use Key: HIVE-9706 URL: https://issues.apache.org/jira/browse/HIVE-9706 Project: Hive Issue Type: Bug Components: HBase Handler Affects Versions: 0.14.0, 1.0.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0 Attachments: HIVE-9707.1.patch The HBase Handler's support for running over snapshots attempts to copy a number of hbase internal configurations into a job configuration. Some of these configuration keys are removed in HBase 1.0.0+ and the current implementation will fail when copying the resultant null value into a new configuration. Additionally, some internal configs added in later HBase 0.98 versions are not respected. Instead, setup should check for the presence of the keys it expects and then make the new configuration consistent with them. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
+1 1. downloaded the src and bin, and verified md5. 2. built the src with -Phadoop-1 and -Phadoop-2. 3. ran a few unit tests Thanks, Xuefu On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC2 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326137#comment-14326137 ] Aihua Xu commented on HIVE-3454: The test failure is unrelated to the change. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
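The 1970-01-16 result falls out of a unit mismatch: the BIGINT is interpreted as epoch *milliseconds*, while unix_timestamp() returns epoch *seconds*. A quick illustration in Python (the sample value is an assumption, chosen as a plausible unix_timestamp() result from 2012):

```python
from datetime import datetime, timezone

unix_seconds = 1_350_000_000  # assumed sample unix_timestamp() value (~Oct 2012)

# Read as milliseconds since the epoch: ~1.35e9 "ms" is only ~15.6 days
# after 1970-01-01, which lands on the reported 1970-01-16 date.
as_millis = datetime.fromtimestamp(unix_seconds / 1000, tz=timezone.utc)

# Read as seconds since the epoch: the date the reporter expected.
as_seconds = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

print(as_millis.date())   # 1970-01-16
print(as_seconds.date())  # 2012-10-12
```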
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: HIVE-3454.3.patch Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325867#comment-14325867 ] Aihua Xu commented on HIVE-3454: Thanks [~jdere] for reviewing. Just updated the parameter name. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9703) Merge from Spark branch to trunk 02/16/2015
[ https://issues.apache.org/jira/browse/HIVE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9703: -- Resolution: Fixed Fix Version/s: 1.2.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks to Brock for the review. Merge from Spark branch to trunk 02/16/2015 --- Key: HIVE-9703 URL: https://issues.apache.org/jira/browse/HIVE-9703 Project: Hive Issue Type: Task Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 1.2.0 Attachments: HIVE-9703.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery
[ https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325850#comment-14325850 ] Li Xin commented on HIVE-9613: -- Thank you for your reply, Chao. I just set up a new Hive 1.0 instance in my cluster without changing any configuration, and the results are still the same, which is strange. I have attached the SQL I tested; could you take some time to test it and let me know whether it is OK or not? Best regards and happy Chinese New Year. Left join query plan outputs wrong column when using subquery -- Key: HIVE-9613 URL: https://issues.apache.org/jira/browse/HIVE-9613 Project: Hive Issue Type: Bug Components: Parser, Query Planning Affects Versions: 0.14.0, 1.0.0 Environment: apache hadoop 2.5.1 Reporter: Li Xin Attachments: test.sql I have a query that outputs a column with wrong contents when using a subquery; the contents of that column are equal to another column's, not its own. I have three tables, as follows: table 1: _hivetemp.category_city_rank_: ||category||city||rank|| |jinrongfuwu|shanghai|1| |ktvjiuba|shanghai|2| table 2: _hivetemp.category_match_: ||src_category_en||src_category_cn||dst_category_en||dst_category_cn|| |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu| |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba| table 3: _hivetemp.city_match_: ||src_city_name_en||dst_city_name_en||city_name_cn|| |sh|shanghai|上海| And the query is: {code} select a.category, a.city, a.rank, b.src_category_en, c.src_city_name_en from hivetemp.category_city_rank a left outer join (select src_category_en, dst_category_en from hivetemp.category_match) b on a.category = b.dst_category_en left outer join (select src_city_name_en, dst_city_name_en from hivetemp.city_match) c on a.city = c.dst_city_name_en {code} which should output the following results (I tested this in Hive 0.13): ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh| |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
but in Hive 0.14, the results in the column *src_category_en* are wrong and simply repeat the *city* contents: ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|shanghai|sh| |ktvjiuba|shanghai|2|shanghai|sh| Using explain to examine the execution plan, I can see that the first subquery outputs only one column, *dst_category_en*, and *src_category_en* is missing. {quote} b:category_match TableScan alias: category_match Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: dst_category_en (type: string) outputColumnNames: _col1 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
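For reference, the expected left-outer-join result can be sketched with plain Python lookups (table contents taken from the sample rows above; this only illustrates the expected output, not how Hive executes the plan):

```python
# hivetemp.category_city_rank
rank_rows = [("jinrongfuwu", "shanghai", 1), ("ktvjiuba", "shanghai", 2)]
# hivetemp.category_match: dst_category_en -> src_category_en
category_match = {"jinrongfuwu": "danbaobaoxiantouzi", "ktvjiuba": "zpwentiyingshi"}
# hivetemp.city_match: dst_city_name_en -> src_city_name_en
city_match = {"shanghai": "sh"}

# left outer join on the dst_* keys, keeping every rank row
expected = [
    (category, city, rank, category_match.get(category), city_match.get(city))
    for category, city, rank in rank_rows
]
for row in expected:
    print(row)
# ('jinrongfuwu', 'shanghai', 1, 'danbaobaoxiantouzi', 'sh')
# ('ktvjiuba', 'shanghai', 2, 'zpwentiyingshi', 'sh')
```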
[jira] [Updated] (HIVE-9613) Left join query plan outputs wrong column when using subquery
[ https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Xin updated HIVE-9613: - Attachment: test.sql The attached SQL script outputs the 4th column with wrong values. Left join query plan outputs wrong column when using subquery -- Key: HIVE-9613 URL: https://issues.apache.org/jira/browse/HIVE-9613 Project: Hive Issue Type: Bug Components: Parser, Query Planning Affects Versions: 0.14.0, 1.0.0 Environment: apache hadoop 2.5.1 Reporter: Li Xin Attachments: test.sql I have a query that outputs a column with wrong contents when using a subquery; the contents of that column are equal to another column's, not its own. I have three tables, as follows: table 1: _hivetemp.category_city_rank_: ||category||city||rank|| |jinrongfuwu|shanghai|1| |ktvjiuba|shanghai|2| table 2: _hivetemp.category_match_: ||src_category_en||src_category_cn||dst_category_en||dst_category_cn|| |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu| |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba| table 3: _hivetemp.city_match_: ||src_city_name_en||dst_city_name_en||city_name_cn|| |sh|shanghai|上海| And the query is: {code} select a.category, a.city, a.rank, b.src_category_en, c.src_city_name_en from hivetemp.category_city_rank a left outer join (select src_category_en, dst_category_en from hivetemp.category_match) b on a.category = b.dst_category_en left outer join (select src_city_name_en, dst_city_name_en from hivetemp.city_match) c on a.city = c.dst_city_name_en {code} which should output the following results (I tested this in Hive 0.13): ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh| |ktvjiuba|shanghai|2|zpwentiyingshi|sh| but in Hive 0.14, the results in the column *src_category_en* are wrong and simply repeat the *city* contents: ||category||city||rank||src_category_en||src_city_name_en|| |jinrongfuwu|shanghai|1|shanghai|sh| |ktvjiuba|shanghai|2|shanghai|sh| Using explain to examine the execution plan, I can
see that the first subquery outputs only one column, *dst_category_en*, and *src_category_en* is missing. {quote} b:category_match TableScan alias: category_match Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: dst_category_en (type: string) outputColumnNames: _col1 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang reassigned HIVE-9659: - Assignee: Jimmy Xiang 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] --- Key: HIVE-9659 URL: https://issues.apache.org/jira/browse/HIVE-9659 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xin Hao Assignee: Jimmy Xiang We found that 'Error while trying to create table container' occurs during Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'. If hive.optimize.skewjoin is set to 'false', the case passes. How to reproduce: 1. set hive.optimize.skewjoin=true; 2. Run BigBench case Q12 and it will fail. Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you will find the error 'Error while trying to create table container' in the log, and also a NullPointerException near the end of the log. (a) Detailed error message for 'Error while trying to create table container': {noformat} 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193) at org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27) at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98) at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41) at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:56) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to create table container at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158) at org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115) ... 
21 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a directory: hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable at org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106) ... 22 more 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler {noformat} (b) Detail error message for NullPointerException: {noformat} 5/02/12 01:29:50 ERROR MapJoinOperator: Unexpected exception: null
[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326062#comment-14326062 ] Aihua Xu commented on HIVE-9537: [~the6campbells] I don't think it's a bug. The CHAR type is fixed-length with padding spaces, but the padding is not included in the value of the field and is not considered when you call the upper/lower functions. The result is as expected. string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed length column the trailing spaces of the fixed length character are not preserved. {code:sql} CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; {code} {{cchar}} is a {{char(32)}}. {code:sql} select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar; {code} 0|\N 1| 2| 3|BB 4|EE 5|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
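Aihua's point can be modeled in a few lines. This Python sketch assumes (as described in the comment above) that CHAR(n) pads the stored value to n characters but strips the trailing pad spaces from the logical value before string functions see it:

```python
def char_value(raw: str, length: int = 32) -> str:
    """Model of CHAR(n): padded to `length` for storage and comparison, but
    trailing pad spaces are not part of the logical value (assumed semantics)."""
    return raw[:length].rstrip(" ")

cchar = char_value("BB".ljust(32))   # stored padded out to 32 characters
print(repr(cchar.upper()))           # 'BB'   -- upper() sees the trimmed value
print(repr(cchar.upper() + cchar))   # 'BBBB' -- no pad spaces survive the concat
```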
[jira] [Resolved] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu resolved HIVE-9537. Resolution: Not a Problem If you think there is an additional issue, please reopen this one or open a new one. string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed length column the trailing spaces of the fixed length character are not preserved. {code:sql} CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; {code} {{cchar}} is a {{char(32)}}. {code:sql} select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar; {code} 0|\N 1| 2| 3|BB 4|EE 5|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326065#comment-14326065 ] Hive QA commented on HIVE-3454: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12699474/HIVE-3454.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7557 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2820/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2820/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2820/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12699474 - PreCommit-HIVE-TRUNK-Build Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Release Note: The behaviors of converting from BOOLEAN/BYTE/SHORT/INT/BIGINT and from FLOAT/DOUBLE to TIMESTAMP have been inconsistent: the value of a BOOLEAN/BYTE/SHORT/INT/BIGINT is treated as the time in milliseconds, while the value of a FLOAT/DOUBLE is treated as the time in seconds. With the change of HIVE-3454, we support an additional configuration hive.int.timestamp.conversion.in.seconds to enable interpreting the BOOLEAN/BYTE/SHORT/INT/BIGINT value as seconds during the timestamp conversion without breaking existing users. By default, the existing functionality is kept. was: The behaviors of converting from BOOLEAN/BYTE/SHORT/INT/BIGINT and from FLOAT/DOUBLE to TIMESTAMP have been inconsistent: the value of a BOOLEAN/BYTE/SHORT/INT/BIGINT is treated as the time in milliseconds, while the value of a FLOAT/DOUBLE is treated as the time in seconds. With the change of HIVE-3454, we support an additional configuration int.timestamp.conversion.in.seconds to enable interpreting the BOOLEAN/BYTE/SHORT/INT/BIGINT value as seconds during the timestamp conversion without breaking existing users. By default, the existing functionality is kept. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion.
CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
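The two interpretations described in the release note can be sketched as follows (Python for illustration; the boolean parameter stands in for the hive.int.timestamp.conversion.in.seconds setting):

```python
from datetime import datetime, timezone

def bigint_to_timestamp(value: int, in_seconds: bool = False) -> datetime:
    """Sketch of the conversion above. By default the integral value is
    treated as epoch milliseconds (the legacy behavior); with in_seconds=True
    it is treated as epoch seconds, mirroring the new configuration flag."""
    seconds = value if in_seconds else value / 1000.0
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

print(bigint_to_timestamp(1_350_000_000).year)                   # 1970 (legacy default)
print(bigint_to_timestamp(1_350_000_000, in_seconds=True).year)  # 2012
```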
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325868#comment-14325868 ] Aihua Xu commented on HIVE-3454: Thanks [~jdere] for reviewing. Just updated the parameter name. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-3454: --- Attachment: (was: HIVE-3454.3.patch) Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9551) Unable to read Microsoft SQL Server timestamp column
[ https://issues.apache.org/jira/browse/HIVE-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326002#comment-14326002 ] Aihua Xu commented on HIVE-9551: [~dilipg] Can you provide more details? Can you check what value was returned from Sqoop and what you saw in Hive? Was it an exception, or an incorrectly interpreted value? Unable to read Microsoft SQL Server timestamp column Key: HIVE-9551 URL: https://issues.apache.org/jira/browse/HIVE-9551 Project: Hive Issue Type: Bug Components: CLI, SQL Reporter: Dilip Godhia When Sqoop reads a timestamp column from SQL Server, Hive is not able to process it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.
[ https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326005#comment-14326005 ] Aihua Xu commented on HIVE-9546: [~vbora] It seems you are hitting HIVE-9199. Create table taking substantially longer time when other select queries are run in parallel. Key: HIVE-9546 URL: https://issues.apache.org/jira/browse/HIVE-9546 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: RedHat Linux, Cloudera 5.3.0 Reporter: sri venu bora Attachments: Hive_create_Issue.txt Create table taking substantially longer time when other select queries are run in parallel. We were able to reproduce the issue using beeline in two sessions. Beeline Shell 1: a) create table with no other queries running on hive (took approximately 0.313 seconds) b) Insert Data into the table c) Run a select count query on the above table Beeline Shell 2: a) create table while step c) is running in the Beeline Shell 1. (took approximately 60.431 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.
[ https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326007#comment-14326007 ] Aihua Xu commented on HIVE-9546: Try setting hive.exec.parallel=false in hive-site.xml to disable parallel execution and see if it makes a difference. Create table taking substantially longer time when other select queries are run in parallel. Key: HIVE-9546 URL: https://issues.apache.org/jira/browse/HIVE-9546 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: RedHat Linux, Cloudera 5.3.0 Reporter: sri venu bora Assignee: Aihua Xu Attachments: Hive_create_Issue.txt Create table taking substantially longer time when other select queries are run in parallel. We were able to reproduce the issue using beeline in two sessions. Beeline Shell 1: a) create table with no other queries running on hive (took approximately 0.313 seconds) b) Insert Data into the table c) Run a select count query on the above table Beeline Shell 2: a) create table while step c) is running in the Beeline Shell 1. (took approximately 60.431 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
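For reference, the setting suggested in the comment above would go into hive-site.xml like this (a sketch; only the property name comes from the comment):

```xml
<property>
  <name>hive.exec.parallel</name>
  <value>false</value>
</property>
```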
[jira] [Assigned] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.
[ https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-9546: -- Assignee: Aihua Xu Create table taking substantially longer time when other select queries are run in parallel. Key: HIVE-9546 URL: https://issues.apache.org/jira/browse/HIVE-9546 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Environment: RedHat Linux, Cloudera 5.3.0 Reporter: sri venu bora Assignee: Aihua Xu Attachments: Hive_create_Issue.txt Create table taking substantially longer time when other select queries are run in parallel. We were able to reproduce the issue using beeline in two sessions. Beeline Shell 1: a) create table with no other queries running on hive ( took approximately 0.313 seconds) b) Insert Data into the table c) Run a select count query on the above table Beeline Shell 2: a) create table while step c) is running in the Beeline Shell 1. (took approximately 60.431 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326009#comment-14326009 ] Aihua Xu commented on HIVE-9537: [~the6campbells] Can you provide the Hive version in which you see this problem? string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed length column the trailing spaces of the fixed length character are not preserved. {code:sql} CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; {code} {{cchar}} is a {{char(32)}}. {code:sql} select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar; {code} 0|\N 1| 2| 3|BB 4|EE 5|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-9561: -- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) [~lirui], no worries. I just committed this to the Spark branch. Thanks, Rui. SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, HIVE-9561.6-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu reassigned HIVE-9537: -- Assignee: Aihua Xu string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed length column the trailing spaces of the fixed length character are not preserved. {code:sql} CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; {code} {{cchar}} as a {{char(32)}}. {code:sql} select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar; {code} 0|\N 1| 2| 3|BB 4|EE 5|FF -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158 ] Brock Noland edited comment on HIVE-3454 at 2/18/15 4:40 PM: - Have we tested this as part of an MR job? I don't think that the hive-site.xml is shipped as part of MR jobs. If that is true, how about we do as follows: 1) Add method {{public static void initialize(Configuration)}} to {{TimestampWritable}} 2) Call this method from {{AbstractSerDe.initialize}} which should be called, with configuration, in all the right places. 3) In {{TimestampWritable.initialize}} you can use the static {{HiveConf.getBoolVar}} a bit kludgy but it should work. This all assuming the current impl doesn't work. bq. timestamp conversion. I think we need a space after this. was (Author: brocknoland): Have we tested this as part of an MR job? I don't think that the hive-site.xml is shipped as part of MR jobs. If that is true, how about we do as follows: 1) Add method {{public static void initialize(Configuration)}} to {{TimestampWritable}} 2) Call this method from {{AbstractSerDe.initialize}} which should be called, with configuration, in all the right places. 3) In {{TimestampWritable.initialize}} you can use the static {{HiveCon.getBoolVar}} a bit kludgy but it should work. This all assuming the current impl doesn't work. bq. timestamp conversion. I think we need a space after this. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. 
CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp(). Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
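The reported 1970-01-16 result is consistent with the epoch-seconds value being interpreted as milliseconds during the cast. A short arithmetic sketch in Python illustrates this; the sample epoch value is hypothetical, chosen to roughly match the filing date of the issue.

```python
from datetime import datetime, timezone

# unix_timestamp() returns epoch *seconds*; a value from September 2012
# (around when this issue was active) is used here for illustration.
epoch_seconds = 1_347_600_000

# Correct interpretation: seconds since 1970 -> a 2012 timestamp.
correct = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
assert correct.year == 2012

# The reported behavior matches treating the value as *milliseconds*:
# 1.35e9 ms is only ~15.6 days, which lands in mid-January 1970.
wrong = datetime.fromtimestamp(epoch_seconds / 1000, tz=timezone.utc)
assert (wrong.year, wrong.month, wrong.day) == (1970, 1, 16)
```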
[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158 ] Brock Noland commented on HIVE-3454: Have we tested this as part of an MR job? I don't think that the hive-site.xml is shipped as part of MR jobs. If that is true, how about we do as follows: 1) Add method {{public static void initialize(Configuration)}} to {{TimestampWritable}} 2) Call this method from {{AbstractSerDe.initialize}} which should be called, with configuration, in all the right places. 3) In {{TimestampWritable.Configuration}} you can use the static {{HiveCon.getBoolVar}} a bit kludgy but it should work. This all assuming the current impl doesn't work. bq. timestamp conversion. I think we need a space after this. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp() Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)
[ https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158 ] Brock Noland edited comment on HIVE-3454 at 2/18/15 4:40 PM: - Have we tested this as part of an MR job? I don't think that the hive-site.xml is shipped as part of MR jobs. If that is true, how about we do as follows: 1) Add method {{public static void initialize(Configuration)}} to {{TimestampWritable}} 2) Call this method from {{AbstractSerDe.initialize}} which should be called, with configuration, in all the right places. 3) In {{TimestampWritable.initialize}} you can use the static {{HiveCon.getBoolVar}} a bit kludgy but it should work. This all assuming the current impl doesn't work. bq. timestamp conversion. I think we need a space after this. was (Author: brocknoland): Have we tested this as part of an MR job? I don't think that the hive-site.xml is shipped as part of MR jobs. If that is true, how about we do as follows: 1) Add method {{public static void initialize(Configuration)}} to {{TimestampWritable}} 2) Call this method from {{AbstractSerDe.initialize}} which should be called, with configuration, in all the right places. 3) In {{TimestampWritable.Configuration}} you can use the static {{HiveCon.getBoolVar}} a bit kludgy but it should work. This all assuming the current impl doesn't work. bq. timestamp conversion. I think we need a space after this. Problem with CAST(BIGINT as TIMESTAMP) -- Key: HIVE-3454 URL: https://issues.apache.org/jira/browse/HIVE-3454 Project: Hive Issue Type: Bug Components: Types, UDF Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 0.13.1 Reporter: Ryan Harris Assignee: Aihua Xu Labels: newbie, newdev, patch Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch Ran into an issue while working with timestamp conversion. 
CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current time from the BIGINT returned by unix_timestamp(). Instead, however, a 1970-01-16 timestamp is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326675#comment-14326675 ] Lefty Leverenz commented on HIVE-7292: -- Doc note: See comments on HIVE-9257 and HIVE-9448 for documentation issues. * [HIVE-9257 commit comment with doc notes | https://issues.apache.org/jira/browse/HIVE-9257?focusedCommentId=14273166page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273166] * HIVE-9448 doc comments ** [list of configuration parameters | https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14292487page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292487] ** [where documented | https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14298353page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14298353] Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantages of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide user a new alternative so that those user can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich de facto standard SQL tools on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtask. 
Design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-7292) Hive on Spark
[ https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326644#comment-14326644 ] Lefty Leverenz edited comment on HIVE-7292 at 2/18/15 10:49 PM: Although this issue is still marked Unresolved, the Spark branch has been merged to trunk and is Resolved for the 1.1.0 release (HIVE-9257 and HIVE-9352). (Edit: Also HIVE-9448.) was (Author: leftylev): Although this issue is still marked Unresolved, the Spark branch has been merged to trunk and is Resolved for the 1.1.0 release (HIVE-9257 and HIVE-9352). Hive on Spark - Key: HIVE-7292 URL: https://issues.apache.org/jira/browse/HIVE-7292 Project: Hive Issue Type: Improvement Components: Spark Reporter: Xuefu Zhang Assignee: Xuefu Zhang Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5 Attachments: Hive-on-Spark.pdf Spark as an open-source data analytics cluster computing framework has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantages of Hive, they still need to have either MapReduce or Tez on their cluster. This initiative will provide user a new alternative so that those user can consolidate their backend. Secondly, providing such an alternative further increases Hive's adoption as it exposes Spark users to a viable, feature-rich de facto standard SQL tools on Hadoop. Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, thus improving user experience as Tez does. This is an umbrella JIRA which will cover many coming subtask. Design doc will be attached here shortly, and will be on the wiki as well. Feedback from the community is greatly appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Checked the md5 and the signature of both. Built src and ran a few queries with CLI/Beeline. Something very strange.. with the bin I cannot create any table at all (says FAILED: SemanticException Line 1:13 Invalid table name..), I am not sure what is wrong, as it works using the one I build from src. I also can create tables fine with the previous RC binary(s). How did you create the binary this time, was there any modification from the one built by src? Hope it is not a setup error on my part. Thanks, Szehon On Wed, Feb 18, 2015 at 2:26 PM, Prasad Mujumdar pras...@cloudera.com wrote: Sounds good. +1 Verified checksums of source and binary tarballs Compiled with hadoop-1 and hadoop-2 profiles with distributions Ran maven verify thanks Prasad On Wed, Feb 18, 2015 at 12:50 PM, Brock Noland br...@cloudera.com wrote: Good idea... since it's not a blocker I will add that for 1.1.1 and 1.2.0. On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com wrote: I guess the README.txt can list Apache Spark as query execution framework along with MapReduce and Tez. thanks Prasad On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote: +1 1. downloaded the src and bin, and verified md5. 2. built the src with -Phadoop-1 and -Phadoop-2. 3. ran a few unit tests Thanks, Xuefu On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Assigned] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong reassigned HIVE-9647: - Assignee: Pengcheng Xiong (was: Gunther Hagleitner) Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables results in a discrepancy in the number of distinct values, which produces different plans between partitioned and un-partitioned schemas. The table below summarizes the NDVs in computeInnerJoinSelectivity which are used to estimate the selectivity of joins. ||Column ||Partitioned count distincts|| Un-Partitioned count distincts|| |sr_customer_sk |71,245 |1,415,625| |sr_item_sk |38,846|62,562| |sr_ticket_number |71,245 |34,931,085| |ss_customer_sk |88,476|1,415,625| |ss_item_sk |38,846|62,562| |ss_ticket_number|100,756 |56,256,175| The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as select max(NUM_DISTINCTS) from PART_COL_STATS. This is problematic for columns like ticket number, which increase naturally with the partitioned date column ss_sold_date_sk.
Suggestions: 1) Use HyperLogLog as suggested by Gopal; there is an HLL implementation for HBase coprocessors which we can use as a reference. 2) Using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as: Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS). More details: While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of the plans differ, so I dumped the CBO logical plan and found that the join estimates are drastically different. Unpartitioned schema: {code} 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering: HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956 HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562,
cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980 HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative cost = {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920 HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923 HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): rowcount = 5.5578005E7, cumulative cost = {0}, id = 2823 HiveProjectRel(d_date_sk=[$0],
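The extrapolation suggested in the issue description can be sketched as follows; `extrapolate_ndv` and the sample numbers are illustrative assumptions in the spirit of the ticket-number column, not Hive code.

```python
def extrapolate_ndv(tab_ndv, part_ndvs, qualified, total):
    """Suggested NDV estimate for a subset of partitions:
    max(global NDV scaled by the fraction of qualified partitions,
        the largest per-partition NDV)."""
    scaled = tab_ndv * qualified / total
    return max(scaled, max(part_ndvs))

# Hypothetical numbers: global NDV of 34,931,085 (sr_ticket_number) over
# 200 partitions, with 50 partitions qualified by the query.
est = extrapolate_ndv(34_931_085, [60_000, 71_245], qualified=50, total=200)
assert est == 34_931_085 * 50 / 200  # far above max(per-partition NDV) = 71,245
```

This keeps the estimate from collapsing to the per-partition maximum for naturally increasing columns, while the per-partition maximum still acts as a floor when few partitions qualify.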
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
We should be able to generate those values in webhcat-default.xml. Eugene? On Wed, Feb 18, 2015 at 4:43 PM, Lefty Leverenz leftylever...@gmail.com wrote: Four configuration values in webhcat-default.xml need to be updated (same as HIVE-8807 https://issues.apache.org/jira/browse/HIVE-8807 updated in the patch for release 1.0.0 https://issues.apache.org/jira/secure/attachment/12695112/HIVE8807.patch): - templeton.pig.path - templeton.hive.path - templeton.hive.home - templeton.hcat.home How can we make this happen in every release, without reminders? -- Lefty On Wed, Feb 18, 2015 at 4:04 PM, Brock Noland br...@cloudera.com wrote: Yeah that is really strange. I have seen that before, a long time back, and but not found the root cause. I think it's a bug in either antlr or how we use antlr. I will re-generate the binaries and start another vote. Note the source tag will be the same which is technically what we vote on.. On Wed, Feb 18, 2015 at 3:59 PM, Chao Sun c...@cloudera.com wrote: I tested apache-hive.1.1.0-bin and I also got the same error as Szehon reported. On Wed, Feb 18, 2015 at 3:48 PM, Brock Noland br...@cloudera.com wrote: Hi, On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, From the release branch, I noticed that the hive-exec.jar now contains a copy of guava-14 without any relocations. The hive spark-client pom.xml adds guava as a lib jar instead of shading it in. https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111 That seems to be a great approach for guava compat issues across execution engines. Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??). https://issues.apache.org/jira/browse/SPARK-2848 Does any of the same compatibility issues occur when using a hive-exec.jar containing guava-14 on MRv2 (which has guava-11 in the classpath)? Not that I am aware of. I've tested it on top of MRv2 a number of times and I think the unit tests also excercise these code paths. 
Cheers, Gopal On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours -- Best, Chao
[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9647: -- Status: Patch Available (was: Open) Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-9647.01.patch High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables results in a discrepancy in the number of distinct values, which produces different plans between partitioned and un-partitioned schemas. The table below summarizes the NDVs in computeInnerJoinSelectivity which are used to estimate the selectivity of joins. ||Column ||Partitioned count distincts|| Un-Partitioned count distincts|| |sr_customer_sk |71,245 |1,415,625| |sr_item_sk |38,846|62,562| |sr_ticket_number |71,245 |34,931,085| |ss_customer_sk |88,476|1,415,625| |ss_item_sk |38,846|62,562| |ss_ticket_number|100,756 |56,256,175| The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as select max(NUM_DISTINCTS) from PART_COL_STATS. This is problematic for columns like ticket number, which increase naturally with the partitioned date column ss_sold_date_sk.
Suggestions: 1) Use HyperLogLog as suggested by Gopal; there is an HLL implementation for HBase coprocessors which we can use as a reference. 2) Using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as: Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS). More details: While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of the plans differ, so I dumped the CBO logical plan and found that the join estimates are drastically different. Unpartitioned schema: {code} 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering: HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956 HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562,
cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980 HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative cost = {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920 HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923 HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): rowcount = 5.5578005E7, cumulative cost = {0}, id = 2823 HiveProjectRel(d_date_sk=[$0],
[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-9647: -- Attachment: HIVE-9647.01.patch Hive currently implements NDV estimation using the Flajolet-Martin algorithm. The algorithm assumes a hash function hash(x) that maps input x to integers in the range [0, 2^{L-1}], with outputs that are sufficiently uniformly distributed. Thus, under a uniform distribution, the NDV density should be the same across partitions. Moreover, since we already have the min/max as well as the NDV for each partition, we can calculate the density for each partition. We use the average density of all the partitions for the aggregation. This method is independent of the number of partitions, so it runs fast, and it extends easily to extrapolation cases where statistics are missing for some partitions. This patch also addresses the bug in HIVE-9717. Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-9647.01.patch High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables results in a discrepancy in the number of distinct values, which produces different plans between partitioned and un-partitioned schemas. The table below summarizes the NDVs in computeInnerJoinSelectivity which are used to estimate the selectivity of joins.
||Column ||Partitioned count distincts|| Un-Partitioned count distincts|| |sr_customer_sk |71,245 |1,415,625| |sr_item_sk |38,846|62,562| |sr_ticket_number |71,245 |34,931,085| |ss_customer_sk |88,476|1,415,625| |ss_item_sk |38,846|62,562| |ss_ticket_number|100,756 |56,256,175| The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as select max(NUM_DISTINCTS) from PART_COL_STATS. This is problematic for columns like ticket number, which increase naturally with the partitioned date column ss_sold_date_sk. Suggestions: 1) Use HyperLogLog as suggested by Gopal; there is an HLL implementation for HBase coprocessors which we can use as a reference. 2) Using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as: Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS). More details: While doing TPC-DS partitioned vs.
Un-Partitioned runs I noticed that many of the plans are different, then I dumped the CBO logical plan and I found that join estimates are drastically different Unpartitioned schema : {code} 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering: HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956 HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980 HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative
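The density-averaging aggregation described in the attachment comment can be sketched as follows; `aggregate_ndv` and the sample partitions are illustrative assumptions, not the actual patch code.

```python
def aggregate_ndv(partitions):
    """partitions: list of (min, max, ndv) per-partition column stats.
    Per the comment: compute each partition's NDV density over its value
    range, average the densities, and apply the average to the global range.
    A sketch of the idea, not Hive's implementation."""
    densities = [ndv / (hi - lo) for lo, hi, ndv in partitions if hi > lo]
    avg_density = sum(densities) / len(densities)
    global_lo = min(lo for lo, _, _ in partitions)
    global_hi = max(hi for _, hi, _ in partitions)
    return avg_density * (global_hi - global_lo)

# Two partitions of a monotonically increasing column (e.g. ticket number):
# each covers a disjoint range with NDV close to the range size, so the
# aggregate tracks the full global range rather than max(per-partition NDV).
parts = [(0, 1000, 990), (1000, 2000, 985)]
est = aggregate_ndv(parts)
assert 1900 < est < 2000  # vs. the old max-per-partition estimate of 990
```

Because only the per-partition densities and the global min/max are needed, the aggregation cost does not grow with the number of partitions, and partitions with missing statistics can simply be left out of the average.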
[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables
[ https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326844#comment-14326844 ] Mostafa Mokhtar commented on HIVE-9647: --- Awesome :) I am happy we can fix this. Discrepancy in cardinality estimates between partitioned and un-partitioned tables --- Key: HIVE-9647 URL: https://issues.apache.org/jira/browse/HIVE-9647 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Pengcheng Xiong Fix For: 1.2.0 Attachments: HIVE-9647.01.patch High-level summary: HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column number of distinct values (NDV) to estimate join selectivity. The way statistics are aggregated for partitioned tables results in a discrepancy in the number of distinct values, which produces different plans between partitioned and un-partitioned schemas. The table below summarizes the NDVs in computeInnerJoinSelectivity which are used to estimate the selectivity of joins. ||Column ||Partitioned count distincts|| Un-Partitioned count distincts|| |sr_customer_sk |71,245 |1,415,625| |sr_item_sk |38,846|62,562| |sr_ticket_number |71,245 |34,931,085| |ss_customer_sk |88,476|1,415,625| |ss_item_sk |38,846|62,562| |ss_ticket_number|100,756 |56,256,175| The discrepancy arises because the NDV calculation for a partitioned table assumes that the NDV range is contained within each partition and is calculated as select max(NUM_DISTINCTS) from PART_COL_STATS. This is problematic for columns like ticket number, which increase naturally with the partitioned date column ss_sold_date_sk.
Suggestions: 1) Use HyperLogLog as suggested by Gopal; there is an HLL implementation for HBase coprocessors which we can use as a reference. 2) Using the global stats from TAB_COL_STATS and the per-partition stats from PART_COL_STATS, extrapolate the NDV for the qualified partitions as: Max((NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS). More details: While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of the plans differ, so I dumped the CBO logical plan and found that the join estimates are drastically different. Unpartitioned schema: {code} 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering: HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2956 HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2982 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 40.05611776795562,
cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2980
HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], joinType=[inner]): rowcount = 28880.460910696, cumulative cost = {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): rowcount = 5.5578005E7, cumulative
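The second suggestion above can be sketched as follows; the function and argument names are illustrative (the actual values would come from TAB_COL_STATS and PART_COL_STATS in the metastore), not Hive's implementation:

```python
def extrapolate_ndv(global_ndv, qualified_partitions, total_partitions, max_partition_ndv):
    """Scale the table-level NDV by the fraction of qualified partitions,
    flooring at the largest per-partition NDV so the estimate never drops
    below what a single partition is known to contain."""
    scaled = global_ndv * qualified_partitions / total_partitions
    return max(scaled, max_partition_ndv)

# ss_ticket_number, assuming 50 of 200 partitions qualify (illustrative
# partition counts; the NDVs are from the table in the summary above):
estimate = extrapolate_ndv(56_256_175, 50, 200, 100_756)
```

With these inputs the scaled global NDV (about 14 million) dominates the per-partition maximum of 100,756, correcting the underestimate for naturally increasing columns like ticket number.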
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Hi, On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org wrote: Hi, From the release branch, I noticed that the hive-exec.jar now contains a copy of guava-14 without any relocations. The hive spark-client pom.xml adds guava as a lib jar instead of shading it in. https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111 That seems to be a great approach for guava compat issues across execution engines. Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??). https://issues.apache.org/jira/browse/SPARK-2848 Do any of the same compatibility issues occur when using a hive-exec.jar containing guava-14 on MRv2 (which has guava-11 in the classpath)? Not that I am aware of. I've tested it on top of MRv2 a number of times and I think the unit tests also exercise these code paths. Cheers, Gopal On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote: Apache Hive 1.1.0 Release Candidate 2 is available here: http://people.apache.org/~brock/apache-hive-1.1.0-rc2/ Maven artifacts are available here: https://repository.apache.org/content/repositories/orgapachehive-1025/ Source tag for RC1 is at: http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/ My key is located here: https://people.apache.org/keys/group/hive.asc Voting will conclude in 72 hours
[jira] [Commented] (HIVE-9703) Merge from Spark branch to trunk 02/16/2015
[ https://issues.apache.org/jira/browse/HIVE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326776#comment-14326776 ] Lefty Leverenz commented on HIVE-9703: -- Does any of this need documentation, or can we assume it's all covered by jiras that patched the Spark branch? Merge from Spark branch to trunk 02/16/2015 --- Key: HIVE-9703 URL: https://issues.apache.org/jira/browse/HIVE-9703 Project: Hive Issue Type: Task Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 1.2.0 Attachments: HIVE-9703.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8807) Obsolete default values in webhcat-default.xml
[ https://issues.apache.org/jira/browse/HIVE-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326827#comment-14326827 ] Lefty Leverenz commented on HIVE-8807: -- This also needs to be done for release 1.1.0, but I don't think we should have a new Jira for each release. Would it make sense to reopen this issue for each release? Or is there a better way to make sure webhcat-default.xml gets updated? Obsolete default values in webhcat-default.xml -- Key: HIVE-8807 URL: https://issues.apache.org/jira/browse/HIVE-8807 Project: Hive Issue Type: Bug Components: WebHCat Affects Versions: 0.12.0, 0.13.0, 0.14.0 Reporter: Lefty Leverenz Assignee: Eugene Koifman Fix For: 1.0.0 Attachments: HIVE8807.patch The defaults for templeton.pig.path and templeton.hive.path are 0.11 in webhcat-default.xml, but they ought to match current release numbers. The Pig version is 0.12.0 for Hive 0.14 RC0 (as shown in pom.xml). no precommit tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery
[ https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326838#comment-14326838 ] Chao commented on HIVE-9613: Hi [~spyfree], sorry I was wrong before - in the upstream trunk I don't get this issue anymore. It appears that this is an issue in ColumnPruner and is already fixed in HIVE-9327. Left join query plan outputs wrong column when using subquery -- Key: HIVE-9613 URL: https://issues.apache.org/jira/browse/HIVE-9613 Project: Hive Issue Type: Bug Components: Parser, Query Planning Affects Versions: 0.14.0, 1.0.0 Environment: apache hadoop 2.5.1 Reporter: Li Xin Attachments: test.sql I have a query that outputs a column with wrong contents when using a subquery; the contents of that column equal another column's, not its own. I have three tables, as follows:
table 1: _hivetemp.category_city_rank_:
||category||city||rank||
|jinrongfuwu|shanghai|1|
|ktvjiuba|shanghai|2|
table 2: _hivetemp.category_match_:
||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
|danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
|zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
table 3: _hivetemp.city_match_:
||src_city_name_en||dst_city_name_en||city_name_cn||
|sh|shanghai|上海|
And the query is:
{code}
select a.category, a.city, a.rank, b.src_category_en, c.src_city_name_en
from hivetemp.category_city_rank a
left outer join (select src_category_en, dst_category_en from hivetemp.category_match) b
on a.category = b.dst_category_en
left outer join (select src_city_name_en, dst_city_name_en from hivetemp.city_match) c
on a.city = c.dst_city_name_en
{code}
which should output the following results, as I verified in Hive 0.13:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
|ktvjiuba|shanghai|2|zpwentiyingshi|sh|
but in Hive 0.14 the results in the column *src_category_en* are wrong, containing the *city* contents instead:
||category||city||rank||src_category_en||src_city_name_en||
|jinrongfuwu|shanghai|1|shanghai|sh|
|ktvjiuba|shanghai|2|shanghai|sh|
Using explain to examine the execution plan, I can see that the first subquery outputs only one column, *dst_category_en*, and *src_category_en* is missing:
{quote}
b:category_match
TableScan
alias: category_match
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: dst_category_en (type: string)
outputColumnNames: _col1
Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE Column stats: NONE
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9719) Up calcite version on cbo branch
[ https://issues.apache.org/jira/browse/HIVE-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326850#comment-14326850 ] Julian Hyde commented on HIVE-9719: --- I just pushed the snapshot. It is based on https://github.com/apache/incubator-calcite/commit/f9db1ee9210a04f7a3ddae23e52e26be1669debb. Up calcite version on cbo branch Key: HIVE-9719 URL: https://issues.apache.org/jira/browse/HIVE-9719 Project: Hive Issue Type: Task Components: CBO Affects Versions: cbo-branch Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-9719.cbo.patch CALCITE-594 is now checked in calcite master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
I tested apache-hive.1.1.0-bin and I also got the same error as Szehon reported. -- Best, Chao
Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2
Yeah, that is really strange. I have seen that before, a long time back, but never found the root cause. I think it's a bug in either antlr or in how we use antlr. I will re-generate the binaries and start another vote. Note the source tag will be the same, which is technically what we vote on. On Wed, Feb 18, 2015 at 3:59 PM, Chao Sun c...@cloudera.com wrote: I tested apache-hive.1.1.0-bin and I also got the same error as Szehon reported.
[jira] [Reopened] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces
[ https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] N Campbell reopened HIVE-9537: -- The Hive documentation is vague at best with respect to when padding is preserved or ignored: Char types are similar to Varchar but they are fixed-length, meaning that values shorter than the specified length value are padded with spaces, but trailing spaces are not important during comparisons. The maximum length is fixed at 255. There is no discussion of non-comparison operations such as upper, lower, concat, etc. Consider the following: the driver may return CCHAR with trailing blanks, but a string operation such as concat fails to preserve them. Should an application locally perform a scalar operation on the returned value, such as LEN, LOWER, etc., then it may retain the spaces. Meanwhile, server side, an 'equivalent' expression is not blank-preserving. select rnum, cchar, concat( concat( concat( cchar,'...'), cchar),'...') from tchar. So the driver will return BB with trailing spaces, and then BB...BB... for the 2nd and 3rd projected items. Similarly, length(cchar) returns 2 and not 5, etc. Customers using technologies such as Hana, DB2, Netezza, ... will expect the blank-padded behaviour. To all intents and purposes, most SQL persons would not consider the implementation to be a fixed-length character type, i.e. length(cchar) would return 32, and cchar || '...' would return 'BB ...BB ...'. Should this be the design intent of Hive, I would ask for the documentation to be far more comprehensive in stating the semantics. string expressions on a fixed length character do not preserve trailing spaces -- Key: HIVE-9537 URL: https://issues.apache.org/jira/browse/HIVE-9537 Project: Hive Issue Type: Bug Components: SQL Reporter: N Campbell Assignee: Aihua Xu When a string expression such as upper or lower is applied to a fixed length column, the trailing spaces of the fixed length character are not preserved.
{code:sql}
CREATE TABLE if not exists TCHAR ( RNUM int, CCHAR char(32) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
{code}
{{cchar}} is declared as a {{char(32)}}.
{code:sql}
select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), concat(upper(cchar), cchar) from tchar;
{code}
Sample data:
0|\N
1|
2|
3|BB
4|EE
5|FF
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
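The expected behaviour the reporter describes can be modelled outside Hive. This sketch implements standard SQL CHAR(n) PAD SPACE semantics (padding preserved by string functions, ignored only in comparisons); it illustrates the expectation, not Hive's actual implementation:

```python
def char_pad(value, n):
    """Model a SQL CHAR(n) column: values are stored blank-padded to length n."""
    return value.ljust(n)

def char_equal(a, b):
    """CHAR comparison ignores trailing spaces (PAD SPACE semantics)."""
    return a.rstrip(" ") == b.rstrip(" ")

cchar = char_pad("BB", 32)
assert len(cchar) == 32                          # fixed length is preserved
assert char_equal(cchar, "BB")                   # comparison ignores padding
assert cchar.upper() == "BB".ljust(32)           # upper() keeps the padding
assert cchar + "..." == "BB" + " " * 30 + "..."  # concat keeps it too
```

Under these semantics, length(cchar) returns 32 and concat preserves the blanks, which matches the behaviour the reporter expects from DB2, Netezza, and similar systems.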
[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326921#comment-14326921 ] Rui Li commented on HIVE-9561: -- Thank you Xuefu! SHUFFLE_SORT should only be used for order by query [Spark Branch] -- Key: HIVE-9561 URL: https://issues.apache.org/jira/browse/HIVE-9561 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Fix For: spark-branch Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, HIVE-9561.6-spark.patch The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance and are difficult to control. So we should limit the use of {{sortByKey}} to order by query only. -- This message was sent by Atlassian JIRA (v6.3.4#6332)