[jira] [Commented] (HIVE-9699) Extend PTFs to provide referenced columns for CP

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325584#comment-14325584
 ] 

Lefty Leverenz commented on HIVE-9699:
--

Does this need any user documentation?

 Extend PTFs to provide referenced columns for CP
 

 Key: HIVE-9699
 URL: https://issues.apache.org/jira/browse/HIVE-9699
 Project: Hive
  Issue Type: Improvement
  Components: PTF-Windowing
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 1.2.0

 Attachments: HIVE-9699.1.patch.txt, HIVE-9699.2.patch.txt


 As described in HIVE-9341, if PTFs can provide referenced column names, 
 the column pruner can use them.
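 For context, a hedged sketch of the kind of query this would help (table and 
 column names hypothetical): if the PTF reports which columns it references, the 
 column pruner could skip reading the rest of a wide table.
 {code}
 -- only c1 and c2 are referenced; with referenced-column info from the PTF,
 -- the column pruner could avoid reading c3..cN of wide_table
 select c1, rank() over (partition by c1 order by c2) from wide_table;
 {code}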



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3781) Index related events should be delivered to metastore event listener

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325560#comment-14325560
 ] 

Lefty Leverenz commented on HIVE-3781:
--

Doc done:  The wiki has been updated so I removed the TODOC15 label.

Version information was not needed, because *hive.exec.drop.ignorenonexistent* 
has covered DROP INDEX since 0.7.0 when the parameter was created (HIVE-1858).

 Index related events should be delivered to metastore event listener
 

 Key: HIVE-3781
 URL: https://issues.apache.org/jira/browse/HIVE-3781
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0
Reporter: Sudhanshu Arora
Assignee: Navis
 Fix For: 1.1.0

 Attachments: HIVE-3781.5.patch.txt, HIVE-3781.6.patch.txt, 
 HIVE-3781.7.patch.txt, HIVE-3781.D7731.1.patch, HIVE-3781.D7731.2.patch, 
 HIVE-3781.D7731.3.patch, HIVE-3781.D7731.4.patch, hive.3781.3.patch, 
 hive.3781.4.patch


 An event listener must be called for any DDL activity. For example, 
 create_index and drop_index today do not call the metastore event listener.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3781) Index related events should be delivered to metastore event listener

2015-02-18 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-3781:
-
Labels:   (was: TODOC15)

 Index related events should be delivered to metastore event listener
 

 Key: HIVE-3781
 URL: https://issues.apache.org/jira/browse/HIVE-3781
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0
Reporter: Sudhanshu Arora
Assignee: Navis
 Fix For: 1.1.0

 Attachments: HIVE-3781.5.patch.txt, HIVE-3781.6.patch.txt, 
 HIVE-3781.7.patch.txt, HIVE-3781.D7731.1.patch, HIVE-3781.D7731.2.patch, 
 HIVE-3781.D7731.3.patch, HIVE-3781.D7731.4.patch, hive.3781.3.patch, 
 hive.3781.4.patch


 An event listener must be called for any DDL activity. For example, 
 create_index and drop_index today do not call the metastore event listener.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9556) create UDF to calculate the Levenshtein distance between two strings

2015-02-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325588#comment-14325588
 ] 

Hive QA commented on HIVE-9556:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12699433/HIVE-9556.3.patch

{color:green}SUCCESS:{color} +1 7560 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2819/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2819/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2819/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12699433 - PreCommit-HIVE-TRUNK-Build

 create UDF to calculate the Levenshtein distance between two strings
 

 Key: HIVE-9556
 URL: https://issues.apache.org/jira/browse/HIVE-9556
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch, HIVE-9556.3.patch


 Levenshtein distance is a string metric for measuring the difference between 
 two sequences. Informally, the Levenshtein distance between two words is the 
 minimum number of single-character edits (i.e. insertions, deletions or 
 substitutions) required to change one word into the other. It is named after 
 Vladimir Levenshtein, who considered this distance in 1965.
 Example:
 The Levenshtein distance between kitten and sitting is 3
 1. kitten → sitten (substitution of s for k)
 2. sitten → sittin (substitution of i for e)
 3. sittin → sitting (insertion of g at the end).
 {code}
 select levenshtein('kitten', 'sitting');
 3
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-2573) Create per-session function registry

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325556#comment-14325556
 ] 

Lefty Leverenz commented on HIVE-2573:
--

Doc update:  The description of *hive.exec.drop.ignorenonexistent* has been 
updated in the wiki.

Does the per-session function registry need to be documented?

 Create per-session function registry 
 -

 Key: HIVE-2573
 URL: https://issues.apache.org/jira/browse/HIVE-2573
 Project: Hive
  Issue Type: Improvement
  Components: Server Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2573.D3231.1.patch, 
 HIVE-2573.1.patch.txt, HIVE-2573.10.patch.txt, HIVE-2573.11.patch.txt, 
 HIVE-2573.12.patch.txt, HIVE-2573.13.patch.txt, HIVE-2573.14.patch.txt, 
 HIVE-2573.15.patch.txt, HIVE-2573.2.patch.txt, HIVE-2573.3.patch.txt, 
 HIVE-2573.4.patch.txt, HIVE-2573.5.patch, HIVE-2573.6.patch, 
 HIVE-2573.7.patch, HIVE-2573.8.patch.txt, HIVE-2573.9.patch.txt


 Currently the function registry is a shared resource and could be overridden by 
 other users when using HiveServer. If a per-session function registry is 
 provided, this situation could be prevented.
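 For context, a minimal sketch of the statements affected (function and class 
 names hypothetical): with a shared registry, a temporary function registered 
 through one HiveServer connection is visible to, and droppable by, the others.
 {code}
 -- session A registers a temporary UDF:
 CREATE TEMPORARY FUNCTION my_lower AS 'com.example.udf.MyLower';
 -- with a shared registry, session B can drop (or redefine) it for everyone:
 DROP TEMPORARY FUNCTION my_lower;
 {code}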



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9712) Row count and data size are set to LONG.MAX when source table has 0 rows

2015-02-18 Thread Damien Carol (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damien Carol updated HIVE-9712:
---
Summary: Row count and data size are set to LONG.MAX when source table has 
0 rows  (was: Hive : Row count and data size are set to LONG.MAX when source 
table has 0 rows)

 Row count and data size are set to LONG.MAX when source table has 0 rows
 

 Key: HIVE-9712
 URL: https://issues.apache.org/jira/browse/HIVE-9712
 Project: Hive
  Issue Type: Bug
  Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran

 TPC-DS Q66 generates an inefficient plan because the cardinality estimate of 
 the dimension table gets set to 9223372036854775807.
 {code}
 Map 10 
 Map Operator Tree:
 TableScan
   alias: ship_mode
   filterExpr: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and 
 sm_ship_mode_sk is not null) (type: boolean)
   Statistics: Num rows: 0 Data size: 47 Basic stats: PARTIAL 
 Column stats: COMPLETE
   Filter Operator
 predicate: ((sm_carrier) IN ('DIAMOND', 'AIRBORNE') and 
 sm_ship_mode_sk is not null) (type: boolean)
 Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
 Select Operator
   expressions: sm_ship_mode_sk (type: int)
   outputColumnNames: _col0
   Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
   Reduce Output Operator
 key expressions: _col0 (type: int)
 sort order: +
 Map-reduce partition columns: _col0 (type: int)
 Statistics: Num rows: 9223372036854775807 Data size: 
 9223372036854775807 Basic stats: COMPLETE Column stats: COMPLETE
 Execution mode: vectorized
 {code}
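 A hedged mitigation sketch: recomputing basic and column statistics for the 
 dimension table so the row count is no longer 0/PARTIAL (standard Hive 
 commands; whether this avoids the Long.MAX fallback here is an assumption):
 {code}
 ANALYZE TABLE ship_mode COMPUTE STATISTICS;
 ANALYZE TABLE ship_mode COMPUTE STATISTICS FOR COLUMNS;
 {code}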
 Full plan 
 {code}
 explain  
 select   
  w_warehouse_name
   ,w_warehouse_sq_ft
   ,w_city
   ,w_county
   ,w_state
   ,w_country
 ,ship_carriers
 ,year
   ,sum(jan_sales) as jan_sales
   ,sum(feb_sales) as feb_sales
   ,sum(mar_sales) as mar_sales
   ,sum(apr_sales) as apr_sales
   ,sum(may_sales) as may_sales
   ,sum(jun_sales) as jun_sales
   ,sum(jul_sales) as jul_sales
   ,sum(aug_sales) as aug_sales
   ,sum(sep_sales) as sep_sales
   ,sum(oct_sales) as oct_sales
   ,sum(nov_sales) as nov_sales
   ,sum(dec_sales) as dec_sales
   ,sum(jan_sales/w_warehouse_sq_ft) as jan_sales_per_sq_foot
   ,sum(feb_sales/w_warehouse_sq_ft) as feb_sales_per_sq_foot
   ,sum(mar_sales/w_warehouse_sq_ft) as mar_sales_per_sq_foot
   ,sum(apr_sales/w_warehouse_sq_ft) as apr_sales_per_sq_foot
   ,sum(may_sales/w_warehouse_sq_ft) as may_sales_per_sq_foot
   ,sum(jun_sales/w_warehouse_sq_ft) as jun_sales_per_sq_foot
   ,sum(jul_sales/w_warehouse_sq_ft) as jul_sales_per_sq_foot
   ,sum(aug_sales/w_warehouse_sq_ft) as aug_sales_per_sq_foot
   ,sum(sep_sales/w_warehouse_sq_ft) as sep_sales_per_sq_foot
   ,sum(oct_sales/w_warehouse_sq_ft) as oct_sales_per_sq_foot
   ,sum(nov_sales/w_warehouse_sq_ft) as nov_sales_per_sq_foot
   ,sum(dec_sales/w_warehouse_sq_ft) as dec_sales_per_sq_foot
   ,sum(jan_net) as jan_net
   ,sum(feb_net) as feb_net
   ,sum(mar_net) as mar_net
   ,sum(apr_net) as apr_net
   ,sum(may_net) as may_net
   ,sum(jun_net) as jun_net
   ,sum(jul_net) as jul_net
   ,sum(aug_net) as aug_net
   ,sum(sep_net) as sep_net
   ,sum(oct_net) as oct_net
   ,sum(nov_net) as nov_net
   ,sum(dec_net) as dec_net
  from (
 select 
   w_warehouse_name
   ,w_warehouse_sq_ft
   ,w_city
   ,w_county
   ,w_state
   ,w_country
   ,concat('DIAMOND', ',', 'AIRBORNE') as ship_carriers
 ,d_year as year
   ,sum(case when d_moy = 1 
   then ws_sales_price* ws_quantity else 0 end) as jan_sales
   ,sum(case when d_moy = 2 
   then ws_sales_price* ws_quantity else 0 end) as feb_sales
   ,sum(case when d_moy = 3 
   then ws_sales_price* ws_quantity else 0 end) as mar_sales
   ,sum(case when d_moy = 4 
   then ws_sales_price* ws_quantity else 0 end) as apr_sales
   ,sum(case when d_moy = 5 
   then ws_sales_price* ws_quantity else 0 end) as may_sales
   ,sum(case when d_moy = 6 
   then ws_sales_price* ws_quantity else 0 end) as jun_sales
   ,sum(case when d_moy = 7 
  

[jira] [Commented] (HIVE-9188) BloomFilter support in ORC

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325694#comment-14325694
 ] 

Lefty Leverenz commented on HIVE-9188:
--

Doc note:  [~prasanth_j] documented this in the ORC wikidoc.

* [ORC Files -- Bloom Filter Index | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-BloomFilterIndex]

 BloomFilter support in ORC
 --

 Key: HIVE-9188
 URL: https://issues.apache.org/jira/browse/HIVE-9188
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Fix For: 1.2.0

 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, 
 HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, 
 HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, 
 HIVE-9188.9.patch


 BloomFilters are a well-known probabilistic data structure for set membership 
 checking. We can use bloom filters in the ORC index for better row group pruning. 
 Currently, the ORC row group index uses min/max statistics to eliminate row 
 groups (and stripes) that do not satisfy the predicate condition specified in 
 the query. But in some cases, the efficiency of min/max based elimination is 
 not optimal (unsorted columns with a wide range of entries). Bloom filters can 
 be an effective and efficient alternative for row group/split elimination for 
 point queries or queries with an IN clause.
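 As a usage sketch, the feature is enabled per table through ORC table 
 properties (property names per the ORC wikidoc linked above; the table and 
 column are hypothetical):
 {code}
 CREATE TABLE orders_orc (
   order_id BIGINT,
   customer_id BIGINT)
 STORED AS ORC
 TBLPROPERTIES (
   'orc.bloom.filter.columns'='customer_id',  -- build bloom filters for this column
   'orc.bloom.filter.fpp'='0.05');            -- target false positive probability
 {code}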



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-6755) Zookeeper Lock Manager leaks zookeeper connections.

2015-02-18 Thread Andrey Stepachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Stepachev resolved HIVE-6755.

Resolution: Won't Fix

 Zookeeper Lock Manager leaks zookeeper connections.
 ---

 Key: HIVE-6755
 URL: https://issues.apache.org/jira/browse/HIVE-6755
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
 Environment: cloudera cdh5b2
Reporter: Andrey Stepachev
Priority: Critical
 Attachments: HIVE-6755.patch


 Driver holds an instance of ZkHiveLockManager, and in turn SqlQuery holds it too. 
 So if we have many unclosed queries, we will get many ZooKeeper sessions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9715) Add support for SSL keypass

2015-02-18 Thread Wellington Chevreuil (JIRA)
Wellington Chevreuil created HIVE-9715:
--

 Summary: Add support for SSL keypass
 Key: HIVE-9715
 URL: https://issues.apache.org/jira/browse/HIVE-9715
 Project: Hive
  Issue Type: Improvement
Reporter: Wellington Chevreuil
Priority: Minor


Currently, Hive Server allows setting only the keystore file password. It does 
not support using keys that have their own password. This feature is supported 
by some other Hadoop services, such as HDFS, HBase, and MapReduce. It would be 
nice to make this behaviour in Hive consistent with those services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9715) Add support for SSL keypass

2015-02-18 Thread Wellington Chevreuil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wellington Chevreuil updated HIVE-9715:
---
Component/s: HiveServer2

 Add support for SSL keypass
 ---

 Key: HIVE-9715
 URL: https://issues.apache.org/jira/browse/HIVE-9715
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Wellington Chevreuil
Priority: Minor

 Currently, Hive Server allows setting only the keystore file password. It 
 does not support using keys that have their own password. This feature is 
 supported by some other Hadoop services, such as HDFS, HBase, and MapReduce. It 
 would be nice to make this behaviour in Hive consistent with those services.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-9546.

Resolution: Duplicate

You are hitting HIVE-9199. If you think it's not the same issue, please reopen 
and provide more information.

 Create table taking substantially longer time when other select queries are 
 run in parallel.
 

 Key: HIVE-9546
 URL: https://issues.apache.org/jira/browse/HIVE-9546
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: RedHat Linux, Cloudera 5.3.0
Reporter: sri venu bora
Assignee: Aihua Xu
 Attachments: Hive_create_Issue.txt


 Create table takes substantially longer when other select queries are 
 run in parallel.
 We were able to reproduce the issue using beeline in two sessions, as in the 
 sketch below.
 Beeline Shell 1: 
  a) create a table with no other queries running on Hive (took approximately 
 0.313 seconds)
  b) insert data into the table
  c) run a select count query on the above table
 Beeline Shell 2: 
  a) create a table while step c) is running in Beeline Shell 1 (took 
 approximately 60.431 seconds)
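 A sketch of the repro as HiveQL, with hypothetical table names:
 {code}
 -- Beeline Shell 1:
 CREATE TABLE t_repro (id INT);    -- ~0.313s with an otherwise idle Hive
 -- insert data, then start a long-running query:
 SELECT COUNT(*) FROM t_repro;
 -- Beeline Shell 2, while the count is still running:
 CREATE TABLE t_repro2 (id INT);   -- observed to take ~60.431s
 {code}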



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Prasad Mujumdar
  +1

thanks
Prasad


On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com
wrote:

 Hi Folks,

 We seem to have quite a few projects going around and, in the interest of
 time and the project as a whole, it seems good to have branch committers
 much like what is there in the Hadoop project. I am proposing an addition
 to the committer bylaws as follows (taken from the Hadoop project bylaws:
 http://hadoop.apache.org/bylaws.html )

 Significant, pervasive features are often developed in a speculative
 branch of the repository. The PMC may grant commit rights on the branch to
 its consistent contributors, while the initiative is active. Branch
 committers are responsible for shepherding their feature into an active
 release and do not cast binding votes or vetoes in the project.

 Actions: New Branch Committer
 Description: When a new branch committer is proposed for the project.
 Approval: Lazy Consensus
 Binding Votes: Active PMC members
 Minimum Length: 3 days
 Mailing List: priv...@hive.apache.org

 Actions: Removal of Branch Committer
 Description: When a branch committer is removed from the project.
 Approval: Consensus
 Binding Votes: Active PMC members excluding the committer in question if
 they are PMC members too.
 Minimum Length: 6 days
 Mailing List: priv...@hive.apache.org

 This vote will run for 6 days. PMC members please vote.

 Thanks
 Vikram.



[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Peter Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326302#comment-14326302
 ] 

Peter Lin commented on HIVE-7292:
-

Would love to use this in production. Is it going to be released in hive 15?

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, as an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backend. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits. Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. The design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9717) The max/min function used by AggrStats for decimal type is not what we expected

2015-02-18 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-9717:
-

 Summary: The max/min function used by AggrStats for decimal type 
is not what we expected
 Key: HIVE-9717
 URL: https://issues.apache.org/jira/browse/HIVE-9717
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong


In the current version (hive-schema-1.2.0), the PART_COL_STATS table stores 
BIG_DECIMAL_LOW_VALUE and BIG_DECIMAL_HIGH_VALUE as varchar. For example,

derby
BIG_DECIMAL_LOW_VALUE VARCHAR(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR(4000)

mssql
BIG_DECIMAL_HIGH_VALUE varchar(255) NULL,
BIG_DECIMAL_LOW_VALUE varchar(255) NULL,

mysql
`BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin,
 `BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin,

oracle
BIG_DECIMAL_LOW_VALUE VARCHAR2(4000),
 BIG_DECIMAL_HIGH_VALUE VARCHAR2(4000),

postgres
BIG_DECIMAL_LOW_VALUE character varying(4000) DEFAULT NULL::character varying,
 BIG_DECIMAL_HIGH_VALUE character varying(4000) DEFAULT NULL::character 
varying,

And, when we do the aggrstats, we take a MAX/MIN over the 
BIG_DECIMAL_HIGH_VALUE/BIG_DECIMAL_LOW_VALUE of all partitions. We are expecting a 
max/min of a decimal (a number). However, it is actually a max/min of a varchar 
(a string). As a result, '900' is more than '1000'. This also affects the 
extrapolation of the statistics. The proposed solution is to use a CAST function to 
cast the value to decimal. 
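A minimal sketch of the pitfall and of the proposed CAST-based fix, assuming a 
hypothetical table part_col_stats_sample that holds the varchar stats values:

{code}
-- lexicographic (varchar) comparison makes '900' the maximum:
SELECT MAX(big_decimal_high_value) FROM part_col_stats_sample;
-- proposed fix: cast to decimal so the comparison is numeric:
SELECT MAX(CAST(big_decimal_high_value AS DECIMAL(38,18))) FROM part_col_stats_sample;
{code}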



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 1

2015-02-18 Thread Prasad Mujumdar
  I guess the README.txt can list Apache Spark as a query execution framework
along with MapReduce and Tez.

thanks
Prasad


On Tue, Feb 17, 2015 at 1:07 PM, Brock Noland br...@cloudera.com wrote:

 Thank you Alan. That is my mistake actually. We can delete this now and
 will do so here: https://issues.apache.org/jira/browse/HIVE-9708

 On Tue, Feb 17, 2015 at 10:37 AM, Alan Gates alanfga...@gmail.com wrote:

 It looks like a jar file snuck into the source release:
 gates> find . -name \*.jar
 ./testlibs/ant-contrib-1.0b3.jar

 Apache policy is that binary files cannot be in releases.

 Alan.

   Brock Noland br...@cloudera.com
  February 16, 2015 at 21:08
 Apache Hive 1.1.0 Release Candidate 1 is available here:
 http://people.apache.org/~brock/apache-hive-1.1.0-rc1/

 Maven artifacts are available here:
 https://repository.apache.org/content/repositories/orgapachehive-1024/

 Source tag for RC1 is at:
 http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc1/

 My key is located here: https://people.apache.org/keys/group/hive.asc

 Voting will conclude in 72 hours





[jira] [Comment Edited] (HIVE-7292) Hive on Spark

2015-02-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326336#comment-14326336
 ] 

Xuefu Zhang edited comment on HIVE-7292 at 2/18/15 6:37 PM:


Formerly 0.15, now 1.1 is going to be released soon. Release candidate is out.


was (Author: xuefuz):
Formerly 0.15, now 1.1 is going to be release soon. Release candidate is out.

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, as an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backend. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits. Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. The design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-18 Thread Yongzhi Chen (JIRA)
Yongzhi Chen created HIVE-9716:
--

 Summary: Map job fails when table's LOCATION does not have scheme
 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 0.13.0, 0.12.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor


When a table's location (the value of column 'LOCATION' in the SDS table in the 
metastore) does not have a scheme, the map job returns an error. For example, 
when we do select count (*) from t1, we get the following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more
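A possible workaround sketch, until the fix: make the table location fully 
qualified so it carries a scheme (the namenode authority below is hypothetical):

{code}
-- replace the bare path with an explicit hdfs:// location
ALTER TABLE t1 SET LOCATION 'hdfs://namenode:8020/user/hive/warehouse/t1';
{code}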



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326352#comment-14326352
 ] 

Aihua Xu commented on HIVE-3454:


Yeah. I have tested with an MR job, and it picks up the hive-site.xml without 
the problem, with both HiveServer2 and the CLI.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.
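 A short illustration, hedged: the 1970-01-16 result is consistent with the 
 BIGINT seconds value being read as milliseconds, and from_unixtime (a standard 
 Hive function) gives the intended conversion as a string:
 {code}
 select cast(unix_timestamp() as timestamp);  -- returns a 1970-01-16 timestamp (the bug)
 select from_unixtime(unix_timestamp());      -- returns the current time as a string
 {code}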



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9188) BloomFilter support in ORC

2015-02-18 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-9188:

Release Note: Support for Bloom Filters in ORC internal index.

 BloomFilter support in ORC
 --

 Key: HIVE-9188
 URL: https://issues.apache.org/jira/browse/HIVE-9188
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Fix For: 1.2.0

 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, 
 HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, 
 HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, 
 HIVE-9188.9.patch


 BloomFilters are a well-known probabilistic data structure for set membership 
 checking. We can use bloom filters in the ORC index for better row group pruning. 
 Currently, the ORC row group index uses min/max statistics to eliminate row 
 groups (and stripes) that do not satisfy the predicate condition specified in 
 the query. But in some cases, the efficiency of min/max based elimination is 
 not optimal (unsorted columns with a wide range of entries). Bloom filters can 
 be an effective and efficient alternative for row group/split elimination for 
 point queries or queries with an IN clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9717) The max/min function used by AggrStats for decimal type is not what we expected

2015-02-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-9717:
-

Assignee: Pengcheng Xiong

 The max/min function used by AggrStats for decimal type is not what we 
 expected
 ---

 Key: HIVE-9717
 URL: https://issues.apache.org/jira/browse/HIVE-9717
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong

 In the current version (hive-schema-1.2.0), the PART_COL_STATS table stores 
 BIG_DECIMAL_LOW_VALUE and BIG_DECIMAL_HIGH_VALUE as varchar. For example,
 derby
 BIG_DECIMAL_LOW_VALUE VARCHAR(4000), BIG_DECIMAL_HIGH_VALUE VARCHAR(4000)
 mssql
 BIG_DECIMAL_HIGH_VALUE varchar(255) NULL,
 BIG_DECIMAL_LOW_VALUE varchar(255) NULL,
 mysql
 `BIG_DECIMAL_LOW_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE latin1_bin,
  `BIG_DECIMAL_HIGH_VALUE` varchar(4000) CHARACTER SET latin1 COLLATE 
 latin1_bin,
 oracle
 BIG_DECIMAL_LOW_VALUE VARCHAR2(4000),
  BIG_DECIMAL_HIGH_VALUE VARCHAR2(4000),
 postgres
 BIG_DECIMAL_LOW_VALUE character varying(4000) DEFAULT NULL::character 
 varying,
  BIG_DECIMAL_HIGH_VALUE character varying(4000) DEFAULT NULL::character 
 varying,
 And, when we do the aggrstats, we take a MAX/MIN over the 
 BIG_DECIMAL_HIGH_VALUE/BIG_DECIMAL_LOW_VALUE of all partitions. We are expecting 
 a max/min of a decimal (a number). However, it is actually a max/min of a 
 varchar (a string). As a result, '900' is more than '1000'. This also affects 
 the extrapolation of the statistics. The proposed solution is to use a CAST 
 function to cast the value to decimal. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-18 Thread Yongzhi Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongzhi Chen updated HIVE-9716:
---
Description: 
When a table's location (the value of column 'LOCATION' in the SDS table in the 
metastore) does not have a scheme, the map job returns an error. For example, 
when we do select count ( * ) from t1, we get the following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more

  was:
When a table's location (the value of column 'LOCATION' in the SDS table in the 
metastore) does not have a scheme, the map job returns an error. For example, 
when we do select count (*) from t1, we get the following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more


 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor

 When a table's location (the value of column 'LOCATION' in the SDS table in the 
 metastore) does not have a scheme, the map job returns an error. For example, 
 when we do select count ( * ) from t1, we get the following exception:
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 java.lang.IllegalStateException: 

Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Prasad Mujumdar
I guess the README.txt can list Apache Spark as a query execution
framework along with MapReduce and Tez.

thanks
Prasad


On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote:

 +1

 1. downloaded the src and bin, and verified md5.
 2. built the src with -Phadoop-1 and -Phadoop-2.
 3. ran a few unit tests

 Thanks,
 Xuefu

 On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote:

  Apache Hive 1.1.0 Release Candidate 2 is available here:
  http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
 
  Maven artifacts are available here:
  https://repository.apache.org/content/repositories/orgapachehive-1025/
 
  Source tag for RC2 is at:
  http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
 
  My key is located here: https://people.apache.org/keys/group/hive.asc
 
  Voting will conclude in 72 hours
 



[jira] [Updated] (HIVE-9617) UDF from_utc_timestamp throws NPE if the second argument is null

2015-02-18 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-9617:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed.  Thanks, Alexander, for the fix and for being persistent 
about getting your patch reviewed.

 UDF from_utc_timestamp throws NPE if the second argument is null
 

 Key: HIVE-9617
 URL: https://issues.apache.org/jira/browse/HIVE-9617
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
Priority: Minor
 Attachments: HIVE-9617.1.patch, HIVE-9617.2.patch


 UDF from_utc_timestamp throws NPE if the second argument is null
 {code}
 select from_utc_timestamp('2015-02-06 10:30:00', cast(null as string));
 FAILED: NullPointerException null
 {code}
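 A hedged workaround sketch until the fix lands: guard the nullable timezone 
 argument (the 'UTC' default here is an arbitrary choice for illustration):
 {code}
 select from_utc_timestamp('2015-02-06 10:30:00',
                           coalesce(cast(null as string), 'UTC'));
 {code}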



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9188) BloomFilter support in ORC

2015-02-18 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326408#comment-14326408
 ] 

Prasanth Jayachandran commented on HIVE-9188:
-

[~leftylev] Thanks for the doc edits!

 BloomFilter support in ORC
 --

 Key: HIVE-9188
 URL: https://issues.apache.org/jira/browse/HIVE-9188
 Project: Hive
  Issue Type: New Feature
  Components: File Formats
Affects Versions: 0.15.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran
  Labels: orcfile
 Fix For: 1.2.0

 Attachments: HIVE-9188.1.patch, HIVE-9188.10.patch, 
 HIVE-9188.11.patch, HIVE-9188.2.patch, HIVE-9188.3.patch, HIVE-9188.4.patch, 
 HIVE-9188.5.patch, HIVE-9188.6.patch, HIVE-9188.7.patch, HIVE-9188.8.patch, 
 HIVE-9188.9.patch


 BloomFilters are a well-known probabilistic data structure for set membership 
 checking. We can use bloom filters in the ORC index for better row group pruning. 
 Currently, the ORC row group index uses min/max statistics to eliminate row 
 groups (and stripes) that do not satisfy the predicate condition specified in 
 the query. But in some cases, the efficiency of min/max based elimination is 
 not optimal (unsorted columns with a wide range of entries). Bloom filters can 
 be an effective and efficient alternative for row group/split elimination for 
 point queries or queries with an IN clause.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326336#comment-14326336
 ] 

Xuefu Zhang commented on HIVE-7292:
---

Formerly 0.15, now 1.1 is going to be release soon. Release candidate is out.

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, as an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backend. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits. Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. The design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326335#comment-14326335
 ] 

Xuefu Zhang commented on HIVE-7292:
---

Formerly 0.15, now 1.1 is going to be release soon. Release candidate is out.

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, as an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backend. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits. Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. The design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9716) Map job fails when table's LOCATION does not have scheme

2015-02-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9716:
---
Description: 
When a table's location (the value of column 'LOCATION' in the SDS table in the 
metastore) does not have a scheme, the map job returns an error. For example, 
when we do select count ( * ) from t1, we get the following exception:

{noformat}
15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more
{noformat}

  was:
When a table's location (the value of column 'LOCATION' in the SDS table in the 
metastore) does not have a scheme, the map job returns an error. For example, 
when we do select count ( * ) from t1, we get the following exception:

15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
job_local2120192529_0001
java.lang.Exception: java.lang.RuntimeException: 
java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.RuntimeException: java.lang.IllegalStateException: Invalid 
input path file:/user/hive/warehouse/t1/data
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Invalid input path 
file:/user/hive/warehouse/t1/data
at 
org.apache.hadoop.hive.ql.exec.MapOperator.getNominalPath(MapOperator.java:406)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp(MapOperator.java:442)
at 
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
... 9 more


 Map job fails when table's LOCATION does not have scheme
 

 Key: HIVE-9716
 URL: https://issues.apache.org/jira/browse/HIVE-9716
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Yongzhi Chen
Assignee: Yongzhi Chen
Priority: Minor

 When a table's location (the value of column 'LOCATION' in the SDS table in the 
 metastore) does not have a scheme, the map job returns an error. For example, 
 when we do select count ( * ) from t1, we get the following exception:
 {noformat}
 15/02/18 12:29:43 [Thread-22]: WARN mapred.LocalJobRunner: 
 job_local2120192529_0001
 java.lang.Exception: java.lang.RuntimeException: 
 

[jira] [Commented] (HIVE-9556) create UDF to calculate the Levenshtein distance between two strings

2015-02-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326401#comment-14326401
 ] 

Jason Dere commented on HIVE-9556:
--

+1

 create UDF to calculate the Levenshtein distance between two strings
 

 Key: HIVE-9556
 URL: https://issues.apache.org/jira/browse/HIVE-9556
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Reporter: Alexander Pivovarov
Assignee: Alexander Pivovarov
 Attachments: HIVE-9556.1.patch, HIVE-9556.2.patch, HIVE-9556.3.patch


 Levenshtein distance is a string metric for measuring the difference between 
 two sequences. Informally, the Levenshtein distance between two words is the 
 minimum number of single-character edits (i.e. insertions, deletions or 
 substitutions) required to change one word into the other. It is named after 
 Vladimir Levenshtein, who considered this distance in 1965.
 Example:
 The Levenshtein distance between kitten and sitting is 3
 1. kitten → sitten (substitution of s for k)
 2. sitten → sittin (substitution of i for e)
 3. sittin → sitting (insertion of g at the end).
 {code}
 select levenshtein('kitten', 'sitting');
 3
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Ashutosh Chauhan
Seems like there is consensus all around.
Vikram,
would you like to update the wiki with the new bylaws?

Thanks,
Ashutosh

On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org wrote:

   +1

 thanks
 Prasad


 On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com
 wrote:

  Hi Folks,
 
  We seem to have quite a few projects going around and, in the interest of
  time and the project as a whole, it seems good to have branch committers
  much like what is there in the Hadoop project. I am proposing an addition
  to the committer bylaws as follows (taken from the Hadoop project bylaws:
  http://hadoop.apache.org/bylaws.html )
 
  Significant, pervasive features are often developed in a speculative
  branch of the repository. The PMC may grant commit rights on the branch
 to
  its consistent contributors, while the initiative is active. Branch
  committers are responsible for shepherding their feature into an active
  release and do not cast binding votes or vetoes in the project.
 
  Actions: New Branch Committer
  Description: When a new branch committer is proposed for the project.
  Approval: Lazy Consensus
  Binding Votes: Active PMC members
  Minimum Length: 3 days
  Mailing List: priv...@hive.apache.org
 
  Actions: Removal of Branch Committer
  Description: When a branch committer is removed from the project.
  Approval: Consensus
  Binding Votes: Active PMC members excluding the committer in question if
  they are PMC members too.
  Minimum Length: 6 days
  Mailing List: priv...@hive.apache.org
 
  This vote will run for 6 days. PMC members please vote.
 
  Thanks
  Vikram.
 



[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Attachment: nation.tbl
patch.txt

 Insert into dynamic partitions with same column structure in the distribute 
 by clause barfs
 

 Key: HIVE-9718
 URL: https://issues.apache.org/jira/browse/HIVE-9718
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0
Reporter: Pavan Srinivas
 Attachments: nation.tbl, patch.txt


 Sample query to reproduce the issue: 
 {code}
 SET hive.exec.dynamic.partition.mode=nonstrict;
 SET hive.exec.dynamic.partition=true;
 explain insert overwrite table nation_new_p partition (p)
 select n_name as name1, n_name as name2, n_name as name3 from nation 
 distribute by name3;
 {code}
 Note: Make sure there is data in the source table to reproduce the issue. 
 During the optimizations done for HIVE-4867 
 (https://issues.apache.org/jira/browse/HIVE-4867), an optimization that 
 deduplicates columns was introduced. But when one of the columns is used as 
 part of the partition/distribute-by clause, it is not taken care of (a 
 workaround sketch follows the table schema below).  
 The above query produces an exception as follows:
 {code}
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
   ... 12 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 13 more
 Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
   ... 19 more
 {code}
 Table schema used: 
 {code}
 CREATE EXTERNAL TABLE `nation`(
   `n_nationkey` int,
   `n_name` string,
   `n_regionkey` int,
   `n_comment` string)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '|'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
 {code}
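 A hedged workaround sketch (an untested assumption): make the distribute-by 
 expression textually distinct so the HIVE-4867 deduplication cannot collapse 
 it into the other aliases:
 {code}
 -- concat(n_name, '') differs textually from n_name, so it should survive dedup
 insert overwrite table nation_new_p partition (name3)
 select n_name as name1, n_name as name2, concat(n_name, '') as name3 from nation
 distribute by name3;
 {code}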
 Sample 

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample query to reproduce the issue: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (name3)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for HIVE-4867 
(https://issues.apache.org/jira/browse/HIVE-4867), an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of the partition/distribute-by clause, it is not taken care of.  

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Table schema used: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}

Sample data for the table is provided in the attached file. 
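
For reference, this revision omits the target table's DDL. A hypothetical 
definition consistent with the query above, modeled on the `nation_new_p` DDL 
given in a later revision of this thread (which names the partition column 
`some`), would be:
{code}
-- Hypothetical target table; the partition column is renamed here to match
-- the PARTITION (name3) clause used in this revision of the query.
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string)
PARTITIONED BY (
  `name3` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}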

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (p)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira: 
https://issues.apache.org/jira/browse/HIVE-4867, a 

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of a partition or distribute-by clause, it is not taken care of.

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string,
  `n_name3` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided in the attached file. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain 

[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: (was: HIVE-3454.3.patch)

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: HIVE-3454.4.patch

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.4.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Vikram Dixit K
Hi Carl,

Here is the list of 17 active PMC members:

Brock Noland
Carl Steinbach
Edward Capriolo
Alan Gates
Gunther Hagleitner
Ashutosh Chauhan
Jason Dere
Lefty Leverenz
Navis Ryu
Owen O'Malley
Prasad Suresh Mujumdar
Prasanth J
Harish Butani
Szehon Ho
Thejas Madhavan Nair
Vikram Dixit K
Xuefu Zhang


Non active members:

Ashish Thusoo
Kevin Wilfong
He Yongqiang
Namit Jain
Joydeep Sensarma
Ning Zhang
Raghotham Murthy

https://issues.apache.org/jira/issues/?jql=text%20~%20%22kevin%20wilfong%22%20OR%20text%20~%20%22ashish%20thusoo%22%20or%20text%20~%20%22heyongqiang%22%20OR%20text%20~%20%22Namit%20Jain%22%20OR%20text%20~%20%22joydeep%20sensarma%22%20OR%20text%20~%20%22ning%20zhang%22%20OR%20text%20~%20%22raghotham%20murthy%22%20AND%20project%20%3D%20Hive%20ORDER%20BY%20updated%20DESC

In the results, only the first 4-5 need to be considered because of the
six-month timeline. All of them were resolved in prior years, and the last
comments are mostly Hudson or closing comments by others. I could not see
any mails from them on the mailing lists either during this period. Thus
those 7 members haven't met the criterion for being active as specified in
the Hive bylaws.

Should I change the bylaw so that this type of vote happens on the dev list
instead of the user mailing list, as currently stated?

Thanks
Vikram.


On Wed, Feb 18, 2015 at 12:33 PM, Carl Steinbach cwsteinb...@gmail.com
wrote:

 Hi Vikram,

 Can you please post the names of the 17 currently active PMC members so
 that we have it for the records?

 Also, according to the bylaws this vote was supposed to happen on the
 user@hive list. Maybe we want to change this?

 Thanks.

 - Carl

 On Wed, Feb 18, 2015 at 12:25 PM, Vikram Dixit K vikram.di...@gmail.com
 wrote:

  Yes. The vote passes with 12 +1s out of 17 currently active PMC members.
 I
  will update the wiki with the new bylaws.
 
  On Wed, Feb 18, 2015 at 11:15 AM, Ashutosh Chauhan hashut...@apache.org
 
  wrote:
 
   Seems like there is consensus all around.
   Vikram,
   would you like to update the wiki with new bylaws?
  
   Thanks,
   Ashutosh
  
   On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org
   wrote:
  
  +1
   
thanks
Prasad
   
   
On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K 
 vikram.di...@gmail.com
  
wrote:
   
 Hi Folks,

 We seem to have quite a few projects going around and in the
 interest
   of
 time and the project as a whole, it seems good to have branch
   committers
 much like what is there in the Hadoop project. I am proposing an
   addition
 to the committer bylaws as follows ( taken from the hadoop project
   bylaws
 http://hadoop.apache.org/bylaws.html )

 Significant, pervasive features are often developed in a
 speculative
 branch of the repository. The PMC may grant commit rights on the
  branch
to
 its consistent contributors, while the initiative is active. Branch
 committers are responsible for shepherding their feature into an
  active
 release and do not cast binding votes or vetoes in the project.

 Actions: New Branch Committer
 Description: When a new branch committer is proposed for the
 project.
 Approval: Lazy Consensus
 Binding Votes: Active PMC members
 Minimum Length: 3 days
 Mailing List: priv...@hive.apache.org

 Actions: Removal of Branch Committer
 Description: When a branch committer is removed from the project.
 Approval: Consensus
 Binding Votes: Active PMC members excluding the committer in
 question
   if
 they are PMC members too.
 Minimum Length: 6 days
 Mailing List: priv...@hive.apache.org

 This vote will run for 6 days. PMC members please vote.

 Thanks
 Vikram.

   
  
 
 
 
  --
  Nothing better than when appreciated for hard work.
  -Mark
 




-- 
Nothing better than when appreciated for hard work.
-Mark


Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Lefty Leverenz

 Should I change the bylaw so that this type of vote happens on the dev list
 instead of the user mailing list, as currently stated?


Sounds good to me.

On the other hand, here are some arguments in favor of keeping this type of
vote on the user mailing list:  (1) wider distribution increases
transparency, (2) wider distribution can broaden the non-voting discussion,
(3) the user list has less traffic than the dev list, although the upcoming
issues list will reduce the dev clutter.

Anyway, it should be decided in a new voting thread.


-- Lefty

On Wed, Feb 18, 2015 at 1:24 PM, Vikram Dixit K vikram.di...@gmail.com
wrote:

 Hi Carl,

 Here is the list of 17 active PMC members:

 Brock Noland
 Carl Steinbach
 Edward Capriolo
 Alan Gates
 Gunther Hagleitner
 Ashutosh Chauhan
 Jason Dere
 Lefty Leverenz
 Navis Ryu
 Owen O'Malley
 Prasad Suresh Mujumdar
 Prasanth J
 Harish Butani
 Szehon Ho
 Thejas Madhavan Nair
 Vikram Dixit K
 Xuefu Zhang


 Non active members:

 Ashish Thusoo
 Kevin Wilfong
 He Yongqiang
 Namit Jain
 Joydeep Sensarma
 Ning Zhang
 Raghotham Murthy


 https://issues.apache.org/jira/issues/?jql=text%20~%20%22kevin%20wilfong%22%20OR%20text%20~%20%22ashish%20thusoo%22%20or%20text%20~%20%22heyongqiang%22%20OR%20text%20~%20%22Namit%20Jain%22%20OR%20text%20~%20%22joydeep%20sensarma%22%20OR%20text%20~%20%22ning%20zhang%22%20OR%20text%20~%20%22raghotham%20murthy%22%20AND%20project%20%3D%20Hive%20ORDER%20BY%20updated%20DESC

 In the results, only the first 4-5 need to be considered because of the
 six-month timeline. All of them were resolved in prior years, and the last
 comments are mostly Hudson or closing comments by others. I could not see
 any mails from them on the mailing lists either during this period. Thus
 those 7 members haven't met the criterion for being active as specified in
 the Hive bylaws.

 Should I change the bylaw so that this type of vote happens on the dev list
 instead of the user mailing list, as currently stated?

 Thanks
 Vikram.


 On Wed, Feb 18, 2015 at 12:33 PM, Carl Steinbach cwsteinb...@gmail.com
 wrote:

  Hi Vikram,
 
  Can you please post the names of the 17 currently active PMC members so
  that we have it for the records?
 
  Also, according to the bylaws this vote was supposed to happen on the
  user@hive list. Maybe we want to change this?
 
  Thanks.
 
  - Carl
 
  On Wed, Feb 18, 2015 at 12:25 PM, Vikram Dixit K vikram.di...@gmail.com
 
  wrote:
 
   Yes. The vote passes with 12 +1s out of 17 currently active PMC
 members.
  I
   will update the wiki with the new bylaws.
  
   On Wed, Feb 18, 2015 at 11:15 AM, Ashutosh Chauhan 
 hashut...@apache.org
  
   wrote:
  
Seems like there is consensus all around.
Vikram,
would you like to update the wiki with new bylaws?
   
Thanks,
Ashutosh
   
On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org
 
wrote:
   
   +1

 thanks
 Prasad


 On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K 
  vikram.di...@gmail.com
   
 wrote:

  Hi Folks,
 
  We seem to have quite a few projects going around and in the
  interest
of
  time and the project as a whole, it seems good to have branch
committers
  much like what is there in the Hadoop project. I am proposing an
addition
  to the committer bylaws as follows ( taken from the hadoop
 project
bylaws
  http://hadoop.apache.org/bylaws.html )
 
  Significant, pervasive features are often developed in a
  speculative
  branch of the repository. The PMC may grant commit rights on the
   branch
 to
  its consistent contributors, while the initiative is active.
 Branch
  committers are responsible for shepherding their feature into an
   active
  release and do not cast binding votes or vetoes in the project.
 
  Actions: New Branch Committer
  Description: When a new branch committer is proposed for the
  project.
  Approval: Lazy Consensus
  Binding Votes: Active PMC members
  Minimum Length: 3 days
  Mailing List: priv...@hive.apache.org
 
  Actions: Removal of Branch Committer
  Description: When a branch committer is removed from the project.
  Approval: Consensus
  Binding Votes: Active PMC members excluding the committer in
  question
if
  they are PMC members too.
  Minimum Length: 6 days
  Mailing List: priv...@hive.apache.org
 
  This vote will run for 6 days. PMC members please vote.
 
  Thanks
  Vikram.
 

   
  
  
  
   --
   Nothing better than when appreciated for hard work.
   -Mark
  
 



 --
 Nothing better than when appreciated for hard work.
 -Mark



[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (name3)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of a partition or distribute-by clause, it is not taken care of.

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided in the attached file. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET 

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of a partition or distribute-by clause, it is not taken care of.

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided in the attached file. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table 

[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Description: 
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table nation_new_p partition (some)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of a partition or distribute-by clause, it is not taken care of.

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Tables used are: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}
and 
{code}
CREATE TABLE `nation_new_p`(
  `n_name1` string,
  `n_name2` string)
PARTITIONED BY (
  `some` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
{code}
Sample data for the table is provided in the attached file. 

  was:
Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

 insert overwrite table 

Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Vikram Dixit K
Yes. The vote passes with 12 +1s out of 17 currently active PMC members. I
will update the wiki with the new bylaws.

On Wed, Feb 18, 2015 at 11:15 AM, Ashutosh Chauhan hashut...@apache.org
wrote:

 Seems like there is consensus all around.
 Vikram,
 would you like to update the wiki with new bylaws?

 Thanks,
 Ashutosh

 On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org
 wrote:

+1
 
  thanks
  Prasad
 
 
  On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com
  wrote:
 
   Hi Folks,
  
   We seem to have quite a few projects going around and in the interest
 of
   time and the project as a whole, it seems good to have branch
 committers
   much like what is there in the Hadoop project. I am proposing an
 addition
   to the committer bylaws as follows ( taken from the hadoop project
 bylaws
   http://hadoop.apache.org/bylaws.html )
  
   Significant, pervasive features are often developed in a speculative
   branch of the repository. The PMC may grant commit rights on the branch
  to
   its consistent contributors, while the initiative is active. Branch
   committers are responsible for shepherding their feature into an active
   release and do not cast binding votes or vetoes in the project.
  
   Actions: New Branch Committer
   Description: When a new branch committer is proposed for the project.
   Approval: Lazy Consensus
   Binding Votes: Active PMC members
   Minimum Length: 3 days
   Mailing List: priv...@hive.apache.org
  
   Actions: Removal of Branch Committer
   Description: When a branch committer is removed from the project.
   Approval: Consensus
   Binding Votes: Active PMC members excluding the committer in question
 if
   they are PMC members too.
   Minimum Length: 6 days
   Mailing List: priv...@hive.apache.org
  
   This vote will run for 6 days. PMC members please vote.
  
   Thanks
   Vikram.
  
 




-- 
Nothing better than when appreciated for hard work.
-Mark


Re: VOTE Bylaw for having branch committers in hive

2015-02-18 Thread Carl Steinbach
Hi Vikram,

Can you please post the names of the 17 currently active PMC members so
that we have it for the records?

Also, according to the bylaws this vote was supposed to happen on the
user@hive list. Maybe we want to change this?

Thanks.

- Carl

On Wed, Feb 18, 2015 at 12:25 PM, Vikram Dixit K vikram.di...@gmail.com
wrote:

 Yes. The vote passes with 12 +1s out of 17 currently active PMC members. I
 will update the wiki with the new bylaws.

 On Wed, Feb 18, 2015 at 11:15 AM, Ashutosh Chauhan hashut...@apache.org
 wrote:

  Seems like there is consensus all around.
  Vikram,
  would you like to update the wiki with new bylaws?
 
  Thanks,
  Ashutosh
 
  On Wed, Feb 18, 2015 at 8:58 AM, Prasad Mujumdar pras...@apache.org
  wrote:
 
 +1
  
   thanks
   Prasad
  
  
   On Mon, Feb 9, 2015 at 2:43 PM, Vikram Dixit K vikram.di...@gmail.com
 
   wrote:
  
Hi Folks,
   
We seem to have quite a few projects going around and in the interest
  of
time and the project as a whole, it seems good to have branch
  committers
much like what is there in the Hadoop project. I am proposing an
  addition
to the committer bylaws as follows ( taken from the hadoop project
  bylaws
http://hadoop.apache.org/bylaws.html )
   
Significant, pervasive features are often developed in a speculative
branch of the repository. The PMC may grant commit rights on the
 branch
   to
its consistent contributors, while the initiative is active. Branch
committers are responsible for shepherding their feature into an
 active
release and do not cast binding votes or vetoes in the project.
   
Actions: New Branch Committer
Description: When a new branch committer is proposed for the project.
Approval: Lazy Consensus
Binding Votes: Active PMC members
Minimum Length: 3 days
Mailing List: priv...@hive.apache.org
   
Actions: Removal of Branch Committer
Description: When a branch committer is removed from the project.
Approval: Consensus
Binding Votes: Active PMC members excluding the committer in question
  if
they are PMC members too.
Minimum Length: 6 days
Mailing List: priv...@hive.apache.org
   
This vote will run for 6 days. PMC members please vote.
   
Thanks
Vikram.
   
  
 



 --
 Nothing better than when appreciated for hard work.
 -Mark



[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Status: In Progress  (was: Patch Available)

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.13.1, 0.13.0, 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.1, 
 0.8.0
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Brock Noland
Good idea... since it's not a blocker I will add that for 1.1.1 and 1.2.0.

On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com wrote:
 I guess the README.txt can list Apache Spark as a query execution
 framework along with MapReduce and Tez.

 thanks
 Prasad


 On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com wrote:

 +1

 1. downloaded the src and bin, and verified md5.
 2. built the src with -Phadoop-1 and -Phadoop-2.
 3. ran a few unit tests

 Thanks,
 Xuefu

 On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote:

  Apache Hive 1.1.0 Release Candidate 2 is available here:
  http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
 
  Maven artifacts are available here:
  https://repository.apache.org/content/repositories/orgapachehive-1025/
 
  Source tag for RC2 is at:
  http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
 
  My key is located here: https://people.apache.org/keys/group/hive.asc
 
  Voting will conclude in 72 hours
 



[jira] [Commented] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326539#comment-14326539
 ] 

Gopal V commented on HIVE-9718:
---

The bug aside, the DISTRIBUTE BY will result in a sub-optimal plan.

Have you tried removing the DISTRIBUTE BY and instead using the automatic 
reducer injection?

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.optimize.sort.dynamic.partition
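
As a minimal sketch of that suggestion, applied to the reproduction query from 
this issue (the config is per-session; the table and column names come from 
the report):
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
-- Let the optimizer sort rows into the dynamic partitions instead of
-- using an explicit DISTRIBUTE BY:
SET hive.optimize.sort.dynamic.partition=true;

INSERT OVERWRITE TABLE nation_new_p PARTITION (some)
SELECT n_name AS name1, n_name AS name2, n_name AS name3 FROM nation;
{code}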


 Insert into dynamic partitions with same column structure in the distribute 
 by clause barfs
 

 Key: HIVE-9718
 URL: https://issues.apache.org/jira/browse/HIVE-9718
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0
Reporter: Pavan Srinivas
Priority: Critical
 Attachments: nation.tbl, patch.txt


 Sample reproducible query: 
 {code}
 SET hive.exec.dynamic.partition.mode=nonstrict;
 SET hive.exec.dynamic.partition=true;
  insert overwrite table nation_new_p partition (some)
 select n_name as name1, n_name as name2, n_name as name3 from nation 
 distribute by name3;
 {code}
 Note: Make sure there is data in the source table to reproduce the issue. 
 During the optimizations done for Jira 
 https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
 deduplicates columns was introduced. But when one of the columns is used as 
 part of a partition or distribute-by clause, it is not taken care of.  
 The above query produces an exception as follows:
 {code}
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
   ... 12 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 13 more
 Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
   ... 19 more
 {code}
 Tables used are: 
 {code}
 CREATE EXTERNAL 

[jira] [Created] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)
Pavan Srinivas created HIVE-9718:


 Summary: Insert into dynamic partitions with same column structure 
in the distribute by clause barfs
 Key: HIVE-9718
 URL: https://issues.apache.org/jira/browse/HIVE-9718
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0
Reporter: Pavan Srinivas


Sample reproducible query: 
{code}
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;

explain insert overwrite table nation_new_p partition (p)
select n_name as name1, n_name as name2, n_name as name3 from nation distribute 
by name3;
{code}

Note: Make sure there is data in the source table to reproduce the issue. 

During the optimizations done for Jira 
https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
deduplicates columns was introduced. But when one of the columns is used as 
part of a partition or distribute-by clause, it is not taken care of.

The above query produces an exception as follows:
{code}
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row 
{n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
carefully final deposits detect slyly agai}
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
... 12 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
... 13 more
Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
at 
org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at 
org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at 
org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
... 19 more
{code}

Table schema used: 
{code}
CREATE EXTERNAL TABLE `nation`(
  `n_nationkey` int,
  `n_name` string,
  `n_regionkey` int,
  `n_comment` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
{code}

Sample data for the table is provided in the attached file. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9718) Insert into dynamic partitions with same column structure in the distribute by clause barfs

2015-02-18 Thread Pavan Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pavan Srinivas updated HIVE-9718:
-
Priority: Critical  (was: Major)

 Insert into dynamic partitions with same column structure in the distribute 
 by clause barfs
 

 Key: HIVE-9718
 URL: https://issues.apache.org/jira/browse/HIVE-9718
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.14.0, 1.0.0
Reporter: Pavan Srinivas
Priority: Critical
 Attachments: nation.tbl, patch.txt


 Sample reproducible query: 
 {code}
 SET hive.exec.dynamic.partition.mode=nonstrict;
 SET hive.exec.dynamic.partition=true;
 explain insert overwrite table nation_new_p partition (p)
 select n_name as name1, n_name as name2, n_name as name3 from nation 
 distribute by name3;
 {code}
 Note: Make sure there is data in the source table to reproduce the issue. 
 During the optimizations done for Jira 
 https://issues.apache.org/jira/browse/HIVE-4867, an optimization that 
 deduplicates columns was introduced. But when one of the columns is used as 
 part of a partition or distribute-by clause, it is not taken care of.  
 The above query produces an exception as follows:
 {code}
 Diagnostic Messages for this Task:
 java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
 Hive Runtime Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:185)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
   at 
 org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
 Error while processing row 
 {n_nationkey:0,n_name:ALGERIA,n_regionkey:0,n_comment: haggle. 
 carefully final deposits detect slyly agai}
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:503)
   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
   ... 12 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
   at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
   ... 13 more
 Caused by: java.lang.RuntimeException: cannot find field _col2 from [0:_col0]
   at 
 org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
   at 
 org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
   at 
 org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:954)
   at 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:325)
   ... 19 more
 {code}
 Table schema used: 
 {code}
 CREATE EXTERNAL TABLE `nation`(
   `n_nationkey` int,
   `n_name` string,
   `n_regionkey` int,
   `n_comment` string)
 ROW FORMAT DELIMITED
   FIELDS TERMINATED BY '|'
 STORED AS INPUTFORMAT
   'org.apache.hadoop.mapred.TextInputFormat'
 OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
 

[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery

2015-02-18 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326537#comment-14326537
 ] 

Chao commented on HIVE-9613:


OK, I was able to reproduce the issue on my cluster too. Previously I was
using CLI local mode.
Strangely, the plan looks different when running on a cluster versus running
locally.
I'll look more into this issue.
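
In the meantime, an untested workaround sketch for the query quoted below: 
joining the base tables directly avoids the subquery projection that appears 
to drop src_category_en (table and column names as given in the report):
{code}
SELECT a.category, a.city, a.rank,
       b.src_category_en, c.src_city_name_en
FROM hivetemp.category_city_rank a
LEFT OUTER JOIN hivetemp.category_match b
  ON a.category = b.dst_category_en
LEFT OUTER JOIN hivetemp.city_match c
  ON a.city = c.dst_city_name_en;
{code}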

 Left join query plan outputs wrong column when using subquery
 --

 Key: HIVE-9613
 URL: https://issues.apache.org/jira/browse/HIVE-9613
 Project: Hive
  Issue Type: Bug
  Components: Parser, Query Planning
Affects Versions: 0.14.0, 1.0.0
 Environment: apache hadoop 2.5.1 
Reporter: Li Xin
 Attachments: test.sql


 I have a query that outputs a column with the wrong contents when using a
 subquery, and the contents of that column are equal to another column's, not
 its own.
 I have three tables, as follows:
 table 1: _hivetemp.category_city_rank_:
 ||category||city||rank||
 |jinrongfuwu|shanghai|1|
 |ktvjiuba|shanghai|2|
 table 2:_hivetemp.category_match_:
 ||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
 |danbaobaoxiantouzi|投资担保|jinrongfuwu|担保/贷款|
 |zpwentiyingshi|娱乐/休闲|ktvjiuba|KTV/酒吧|
 table 3:_hivetemp.city_match_:
 ||src_city_name_en||dst_city_name_en||city_name_cn||
 |sh|shanghai|上海|
 And the query is :
 {code}
 select
 a.category,
 a.city,
 a.rank,
 b.src_category_en,
 c.src_city_name_en
 from
 hivetemp.category_city_rank a
 left outer join
 (select
 src_category_en,
 dst_category_en
 from
 hivetemp.category_match) b
 on  a.category = b.dst_category_en
 left outer join
 (select
 src_city_name_en,
 dst_city_name_en
 from
 hivetemp.city_match) c
 on  a.city = c.dst_city_name_en
 {code}
 which should output the results as follows, and I tested it in Hive 0.13:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
 |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
 but in Hive 0.14, the results in the column *src_category_en* are wrong, and
 are just the *city* contents:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|shanghai|sh|
 |ktvjiuba|shanghai|2|shanghai|sh|
 Using explain to examine the execution plan, I can see the first subquery
 just outputs one column, *dst_category_en*, and *src_category_en* is just
 missing.
 {quote}
b:category_match
   TableScan
 alias: category_match
 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: dst_category_en (type: string)
   outputColumnNames: _col1
   Statistics: Num rows: 131 Data size: 13149 Basic stats: 
 COMPLETE Column stats: NONE
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: (was: HIVE-3454.4.patch)

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Prasad Mujumdar
  Sounds good.

+1

Verified checksums of source and binary tarballs
Compiled with hadoop-1 and hadoop-2 profiles with distributions
Ran maven verify


thanks
Prasad


On Wed, Feb 18, 2015 at 12:50 PM, Brock Noland br...@cloudera.com wrote:

 Good idea... since it's not a blocker I will add that for 1.1.1 and 1.2.0.

 On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com
 wrote:
  I guess the README.txt can list Apache Spark as a query execution
  framework along with MapReduce and Tez.
 
  thanks
  Prasad
 
 
  On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com
 wrote:
 
  +1
 
  1. downloaded the src and bin, and verified md5.
  2. built the src with -Phadoop-1 and -Phadoop-2.
  3. ran a few unit tests
 
  Thanks,
  Xuefu
 
  On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com
 wrote:
 
   Apache Hive 1.1.0 Release Candidate 2 is available here:
   http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
  
   Maven artifacts are available here:
  
 https://repository.apache.org/content/repositories/orgapachehive-1025/
  
   Source tag for RC1 is at:
   http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
  
   My key is located here: https://people.apache.org/keys/group/hive.asc
  
   Voting will conclude in 72 hours
  
 



Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Gopal Vijayaraghavan
Hi,

From the release branch, I noticed that the hive-exec.jar now contains a
copy of guava-14 without any relocations.

The hive spark-client pom.xml adds guava as a lib jar instead of shading
it in. 

https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111


That seems to be a great approach for guava compat issues across execution
engines.


Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??).

https://issues.apache.org/jira/browse/SPARK-2848


Do any of the same compatibility issues occur when using a hive-exec.jar
containing guava-14 on MRv2 (which has guava-11 in the classpath)?

Cheers,
Gopal

On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote:

Apache Hive 1.1.0 Release Candidate 2 is available here:
http://people.apache.org/~brock/apache-hive-1.1.0-rc2/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1025/

Source tag for RC1 is at:
http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/

My key is located here: https://people.apache.org/keys/group/hive.asc

Voting will conclude in 72 hours




[jira] [Created] (HIVE-9720) Metastore does not properly migrate column stats when renaming a table across databases.

2015-02-18 Thread Alexander Behm (JIRA)
Alexander Behm created HIVE-9720:


 Summary: Metastore does not properly migrate column stats when 
renaming a table across databases.
 Key: HIVE-9720
 URL: https://issues.apache.org/jira/browse/HIVE-9720
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.13.1
Reporter: Alexander Behm


It appears that the Hive Metastore does not properly migrate column statistics 
when renaming a table across databases. While renaming across databases is not 
supported in HiveQL, it can be done via the Metastore Thrift API.
The problem is that such a newly renamed table cannot be dropped (unless 
renamed back to its original database/name).

Here are steps for reproducing the issue.

1. From the Hive shell/beeline:
{code}
create database db1;
create database db2;
create table db1.mv (i int);
use db1;
analyze table mv compute statistics for columns i;
{code}

2. From a Java program:
{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(MetaStoreClientPool.class);
    HiveMetaStoreClient hiveClient = new HiveMetaStoreClient(conf);
    // Rename db1.mv to db2.mv2 across databases via the Thrift API.
    Table t = hiveClient.getTable("db1", "mv");
    t.setDbName("db2");
    t.setTableName("mv2");
    hiveClient.alter_table("db1", "mv", t);
  }
{code}

3. From the Hive shell/beeline:
{code}
drop table db2.mv2;
{code}

Stack shown when running 3:

{code}
FAILED: Execution Error, return code 1 from 
org.apache.hadoop.hive.ql.exec.DDLTask. 
MetaException(message:javax.jdo.JDODataStoreException: Exception thrown 
flushing changes to datastore
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:451)
at 
org.datanucleus.api.jdo.JDOTransaction.commit(JDOTransaction.java:165)
at 
org.apache.hadoop.hive.metastore.ObjectStore.commitTransaction(ObjectStore.java:411)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy0.commitTransaction(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1389)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1525)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:106)
at com.sun.proxy.$Proxy1.drop_table_with_environment_context(Unknown 
Source)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:8072)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_table_with_environment_context.getResult(ThriftHiveMetastore.java:8056)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.metastore.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:48)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:244)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
NestedThrowablesStackTrace:
java.sql.BatchUpdateException: Batch entry 0 DELETE FROM TBLS WHERE 
TBL_ID='1621' was aborted.  Call getNextException to see the cause.
at 
org.postgresql.jdbc2.AbstractJdbc2Statement$BatchResultHandler.handleError(AbstractJdbc2Statement.java:2598)
at 
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:407)
at 
org.postgresql.jdbc2.AbstractJdbc2Statement.executeBatch(AbstractJdbc2Statement.java:2737)
at 
com.jolbox.bonecp.StatementHandle.executeBatch(StatementHandle.java:424)
at 
org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeBatch(ParamLoggingPreparedStatement.java:372)
at 
org.datanucleus.store.rdbms.SQLController.processConnectionStatement(SQLController.java:628)
at 

[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326995#comment-14326995
 ] 

Pengcheng Xiong commented on HIVE-9647:
---

[~mmokhtar], the test failure is unrelated and it passed on my laptop. Could 
you please try the patch? It could be applied on trunk. Thanks.

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on the per-column 
 number of distinct values (NDV) to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in 
 discrepancy in number of distinct values which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate selectivity of joins.
 ||Column||Partitioned count distincts||Un-partitioned count distincts||
 |sr_customer_sk   |71,245 |1,415,625|
 |sr_item_sk   |38,846|62,562|
 |sr_ticket_number |71,245 |34,931,085|
 |ss_customer_sk   |88,476|1,415,625|
 |ss_item_sk   |38,846|62,562|
 |ss_ticket_number|100,756 |56,256,175|
   
 The discrepancy is because NDV calculation for a partitioned table assumes 
 that the NDV range is contained within each partition and is calculated as 
 "select max(NUM_DISTINCTS) from PART_COL_STATS".
 This is problematic for columns like ticket number which are naturally 
 increasing with the partitioned date column ss_sold_date_sk.
 Suggestions
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase co-processors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in (see 
 the sketch below):
 max((NUM_DISTINCTS from TAB_COL_STATS) x (number of qualified partitions) / 
 (number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS)
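 For illustration, a minimal sketch of that extrapolation (plain Java, not the 
 attached patch; names are illustrative):
 {code}
 // tableNdv: NUM_DISTINCTS from TAB_COL_STATS;
 // maxPartNdv: max(NUM_DISTINCTS) over the qualified partitions in PART_COL_STATS.
 static long extrapolateNdv(long tableNdv, long qualifiedParts,
     long totalParts, long maxPartNdv) {
   long scaled = (long) Math.ceil((double) tableNdv * qualifiedParts / totalParts);
   return Math.max(scaled, maxPartNdv);
 }
 {code}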
 More details
 While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of 
 the plans are different, so I dumped the CBO logical plan and found that the 
 join estimates are drastically different.
 Unpartitioned schema:
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative cost = 
 {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 
 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): 
 rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], 
 sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
   

Re: [VOTE] Apache Hive 1.1.0 Release Candidate 3

2015-02-18 Thread Xuefu Zhang
+1

1. downloaded the src tarball and built w/ -Phadoop-1/2
2. verified no binary (jars) in the src tarball

On Wed, Feb 18, 2015 at 8:56 PM, Brock Noland br...@cloudera.com wrote:

 +1

 verified sigs, hashes, created tables, ran MR on YARN jobs

 On Wed, Feb 18, 2015 at 8:54 PM, Brock Noland br...@cloudera.com wrote:
  Apache Hive 1.1.0 Release Candidate 3 is available here:
  http://people.apache.org/~brock/apache-hive-1.1.0-rc3/
 
  Maven artifacts are available here:
  https://repository.apache.org/content/repositories/orgapachehive-1026/
 
  Source tag for RC3 is at:
  http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/
 
  My key is located here: https://people.apache.org/keys/group/hive.asc
 
  Voting will conclude in 72 hours



Review Request 31178: Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31178/
---

Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

The discrepancy is because NDV calculation for a partitioned table assumes that 
the NDV range is contained within each partition and is calculated as 
"select max(NUM_DISTINCTS) from PART_COL_STATS".
This is problematic for columns like ticket number which are naturally 
increasing with the partitioned date column ss_sold_date_sk.


Diffs
-

  data/files/extrapolate_stats_partial_ndv.txt PRE-CREATION 
  
metastore/src/java/org/apache/hadoop/hive/metastore/IExtrapolatePartStatus.java 
74f1b01 
  
metastore/src/java/org/apache/hadoop/hive/metastore/LinearExtrapolatePartStatus.java
 7fc04f1 
  metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java 
574141c 
  metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java 
475883b 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_full.q 00c9b53 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial.q 8ae9a90 
  ql/src/test/queries/clientpositive/extrapolate_part_stats_partial_ndv.q 
PRE-CREATION 
  ql/src/test/results/clientpositive/extrapolate_part_stats_full.q.out 0f6b15d 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial.q.out 
1fdeb90 
  ql/src/test/results/clientpositive/extrapolate_part_stats_partial_ndv.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/31178/diff/


Testing
---


Thanks,

pengcheng xiong



[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327087#comment-14327087
 ] 

Jason Dere commented on HIVE-9537:
--

This was by design. The SQL spec didn't seem to have any specifics here 
regarding the trailing spaces behavior, and MySQL/Postgres (which I had 
available at the time) had similar semantics regarding how trailing spaces for 
char were treated during length()/concat(). upper()/lower() should not be 
affected by this.

 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length character are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} as a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[VOTE] Apache Hive 1.1.0 Release Candidate 3

2015-02-18 Thread Brock Noland
Apache Hive 1.1.0 Release Candidate 3 is available here:
http://people.apache.org/~brock/apache-hive-1.1.0-rc3/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1026/

Source tag for RC3 is at:
http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/

My key is located here: https://people.apache.org/keys/group/hive.asc

Voting will conclude in 72 hours


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 3

2015-02-18 Thread Brock Noland
+1

verified sigs, hashes, created tables, ran MR on YARN jobs

On Wed, Feb 18, 2015 at 8:54 PM, Brock Noland br...@cloudera.com wrote:
 Apache Hive 1.1.0 Release Candidate 3 is available here:
 http://people.apache.org/~brock/apache-hive-1.1.0-rc3/

 Maven artifacts are available here:
 https://repository.apache.org/content/repositories/orgapachehive-1026/

 Source tag for RC3 is at:
 http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc3/

 My key is located here: https://people.apache.org/keys/group/hive.asc

 Voting will conclude in 72 hours


[jira] [Created] (HIVE-9721) Hadoop23Shims.setFullFileStatus should check for null

2015-02-18 Thread Brock Noland (JIRA)
Brock Noland created HIVE-9721:
--

 Summary: Hadoop23Shims.setFullFileStatus should check for null
 Key: HIVE-9721
 URL: https://issues.apache.org/jira/browse/HIVE-9721
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
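
There is no patch attached yet; a hypothetical sketch of the null guard the 
summary asks for (assuming the shim's file-status wrapper exposes the AclStatus 
it read, as the stack traces suggest):

{code}
// In Hadoop23Shims.setFullFileStatus (a sketch, not the committed fix):
// getAclStatus() is unsupported on some file systems, so the stored AclStatus
// can be null and must be checked before its entries are used.
AclStatus aclStatus = sourceStatus.getAclStatus();
if (aclStatus != null) {
  for (AclEntry entry : aclStatus.getEntries()) {
    // apply/inherit each ACL entry on the target path
  }
}
{code}

The logs below show both the unsupported-operation path and the resulting NPE 
at Hadoop23Shims.setFullFileStatus: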


{noformat}
2015-02-18 22:46:10,209 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: 
Skipping ACL inheritance: File system for path 
file:/tmp/hive/f1a28dee-70e8-4bc3-bd35-9be13834d1fc/hive_2015-02-18_22-46-10_065_3348083202601156561-1
 does not support ACLs but dfs.namenode.acls.enabled is set to true: 
java.lang.UnsupportedOperationException: RawLocalFileSystem doesn't support 
getAclStatus
java.lang.UnsupportedOperationException: RawLocalFileSystem doesn't support 
getAclStatus
at org.apache.hadoop.fs.FileSystem.getAclStatus(FileSystem.java:2429)
at 
org.apache.hadoop.fs.FilterFileSystem.getAclStatus(FilterFileSystem.java:562)
at 
org.apache.hadoop.hive.shims.Hadoop23Shims.getFullFileStatus(Hadoop23Shims.java:645)
at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:524)
at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234)
at 
org.apache.hadoop.hive.ql.Context.getExtTmpPathRelTo(Context.java:424)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:6290)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9069)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:8961)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9807)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:9700)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10136)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:284)
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10147)
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:190)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:222)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:421)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:307)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1112)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1106)
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:101)
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:415)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2015-02-18 17:30:58,753 INFO org.apache.hadoop.hive.shims.HadoopShimsSecure: 
Skipping ACL inheritance: File system for path 
file:/tmp/hive/e3eb01f0-bb58-45a8-b773-8f4f3420457c/hive_2015-02-18_17-30-58_346_5020255420422913166-1/-mr-1
 does not support ACLs but dfs.namenode.acls.enabled is set to true: 
java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.hadoop.hive.shims.Hadoop23Shims.setFullFileStatus(Hadoop23Shims.java:668)
at org.apache.hadoop.hive.common.FileUtils.mkdir(FileUtils.java:527)
at org.apache.hadoop.hive.ql.Context.getStagingDir(Context.java:234)
at 
org.apache.hadoop.hive.ql.Context.getExtTmpPathRelTo(Context.java:424)
at 

[jira] [Updated] (HIVE-9706) HBase handler support for snapshots should confirm properties before use

2015-02-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-9706:
---
   Resolution: Fixed
Fix Version/s: (was: 1.1.0)
   Status: Resolved  (was: Patch Available)

Thank you Sean! I have committed this to trunk!

 HBase handler support for snapshots should confirm properties before use
 

 Key: HIVE-9706
 URL: https://issues.apache.org/jira/browse/HIVE-9706
 Project: Hive
  Issue Type: Bug
  Components: HBase Handler
Affects Versions: 0.14.0, 1.0.0
Reporter: Sean Busbey
Assignee: Sean Busbey
 Fix For: 1.2.0

 Attachments: HIVE-9707.1.patch


 The HBase Handler's support for running over snapshots attempts to copy a 
 number of hbase internal configurations into a job configuration.
 Some of these configuration keys were removed in HBase 1.0.0+, and the current 
 implementation fails when copying the resulting null value into a new 
 configuration. Additionally, some internal configs added in later HBase 0.98 
 versions are not respected.
 Instead, setup should check for the presence of the keys it expects and then 
 make the new configuration consistent with them.
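 A minimal sketch of that "check before copying" pattern (the key name is 
 illustrative, not from the patch):
 {code}
 // Copy an HBase-internal setting into the job conf only if the running
 // HBase version actually defines it.
 String v = hbaseConf.get("hbase.snapshot.internal.key"); // hypothetical key
 if (v != null) {
   jobConf.set("hbase.snapshot.internal.key", v);
 }
 {code}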



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Xuefu Zhang
+1

1. downloaded the src and bin, and verified md5.
2. built the src with -Phadoop-1 and -Phadoop-2.
3. ran a few unit tests

Thanks,
Xuefu

On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com wrote:

 Apache Hive 1.1.0 Release Candidate 2 is available here:
 http://people.apache.org/~brock/apache-hive-1.1.0-rc2/

 Maven artifacts are available here:
 https://repository.apache.org/content/repositories/orgapachehive-1025/

 Source tag for RC1 is at:
 http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/

 My key is located here: https://people.apache.org/keys/group/hive.asc

 Voting will conclude in 72 hours



[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326137#comment-14326137
 ] 

Aihua Xu commented on HIVE-3454:


The test failure is unrelated to the change.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: HIVE-3454.3.patch

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325867#comment-14325867
 ] 

Aihua Xu commented on HIVE-3454:


Thanks [~jdere] for reviewing. Just updated the parameter name.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9703) Merge from Spark branch to trunk 02/16/2015

2015-02-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9703:
--
   Resolution: Fixed
Fix Version/s: 1.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks to Brock for the review.

 Merge from Spark branch to trunk 02/16/2015
 ---

 Key: HIVE-9703
 URL: https://issues.apache.org/jira/browse/HIVE-9703
 Project: Hive
  Issue Type: Task
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 1.2.0

 Attachments: HIVE-9703.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery

2015-02-18 Thread Li Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325850#comment-14325850
 ] 

Li Xin commented on HIVE-9613:
--

Thank you for your reply, Chao.

I just set up a new Hive 1.0 instance in my cluster and didn't change any 
configuration, and the results are still the same. Strange.

I have attached the SQL I tested; could you take some time to test it and 
let me know whether it is OK? Best regards, and happy Chinese New Year.


 Left join query plan outputs  wrong column when using subquery
 --

 Key: HIVE-9613
 URL: https://issues.apache.org/jira/browse/HIVE-9613
 Project: Hive
  Issue Type: Bug
  Components: Parser, Query Planning
Affects Versions: 0.14.0, 1.0.0
 Environment: apache hadoop 2.5.1 
Reporter: Li Xin
 Attachments: test.sql


 I have a query that outputs a column with wrong contents when using a 
 subquery, and the contents of that column are equal to another column's, not 
 its own.
 I have three tables, as follows:
 table 1: _hivetemp.category_city_rank_:
 ||category||city||rank||
 |jinrongfuwu|shanghai|1|
 |ktvjiuba|shanghai|2|
 table 2:_hivetemp.category_match_:
 ||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
 |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
 |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
 table 3:_hivetemp.city_match_:
 ||src_city_name_en||dst_city_name_en||city_name_cn||
 |sh|shanghai|上海|
 And the query is:
 {code}
 select
 a.category,
 a.city,
 a.rank,
 b.src_category_en,
 c.src_city_name_en
 from
 hivetemp.category_city_rank a
 left outer join
 (select
 src_category_en,
 dst_category_en
 from
 hivetemp.category_match) b
 on  a.category = b.dst_category_en
 left outer join
 (select
 src_city_name_en,
 dst_city_name_en
 from
 hivetemp.city_match) c
 on  a.city = c.dst_city_name_en
 {code}
 which should output the results as follows, as tested in Hive 0.13:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
 |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
 but in Hive 0.14, the results in the column *src_category_en* are wrong, and 
 are just the *city* contents:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|shanghai|sh|
 |ktvjiuba|shanghai|2|shanghai|sh|
 Using explain to examine the execution plan, I can see the first subquery just 
 outputs one column, *dst_category_en*, and *src_category_en* is just missing.
 {quote}
b:category_match
   TableScan
 alias: category_match
 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: dst_category_en (type: string)
   outputColumnNames: _col1
   Statistics: Num rows: 131 Data size: 13149 Basic stats: 
 COMPLETE Column stats: NONE
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9613) Left join query plan outputs wrong column when using subquery

2015-02-18 Thread Li Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Xin updated HIVE-9613:
-
Attachment: test.sql

This SQL script I tested outputs the 4th column with wrong values.

 Left join query plan outputs  wrong column when using subquery
 --

 Key: HIVE-9613
 URL: https://issues.apache.org/jira/browse/HIVE-9613
 Project: Hive
  Issue Type: Bug
  Components: Parser, Query Planning
Affects Versions: 0.14.0, 1.0.0
 Environment: apache hadoop 2.5.1 
Reporter: Li Xin
 Attachments: test.sql


 I have a query that outputs a column with wrong contents when using a 
 subquery, and the contents of that column are equal to another column's, not 
 its own.
 I have three tables, as follows:
 table 1: _hivetemp.category_city_rank_:
 ||category||city||rank||
 |jinrongfuwu|shanghai|1|
 |ktvjiuba|shanghai|2|
 table 2:_hivetemp.category_match_:
 ||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
 |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
 |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
 table 3:_hivetemp.city_match_:
 ||src_city_name_en||dst_city_name_en||city_name_cn||
 |sh|shanghai|上海|
 And the query is:
 {code}
 select
 a.category,
 a.city,
 a.rank,
 b.src_category_en,
 c.src_city_name_en
 from
 hivetemp.category_city_rank a
 left outer join
 (select
 src_category_en,
 dst_category_en
 from
 hivetemp.category_match) b
 on  a.category = b.dst_category_en
 left outer join
 (select
 src_city_name_en,
 dst_city_name_en
 from
 hivetemp.city_match) c
 on  a.city = c.dst_city_name_en
 {code}
 which should output the results as follows, as tested in Hive 0.13:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
 |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
 but in Hive 0.14, the results in the column *src_category_en* are wrong, and 
 are just the *city* contents:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|shanghai|sh|
 |ktvjiuba|shanghai|2|shanghai|sh|
 Using explain to examine the execution plan, I can see the first subquery just 
 outputs one column, *dst_category_en*, and *src_category_en* is just missing.
 {quote}
b:category_match
   TableScan
 alias: category_match
 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: dst_category_en (type: string)
   outputColumnNames: _col1
   Statistics: Num rows: 131 Data size: 13149 Basic stats: 
 COMPLETE Column stats: NONE
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9659) 'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]

2015-02-18 Thread Jimmy Xiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang reassigned HIVE-9659:
-

Assignee: Jimmy Xiang

 'Error while trying to create table container' occurs during hive query case 
 execution when hive.optimize.skewjoin set to 'true' [Spark Branch]
 ---

 Key: HIVE-9659
 URL: https://issues.apache.org/jira/browse/HIVE-9659
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Xin Hao
Assignee: Jimmy Xiang

 We found that 'Error while trying to create table container' occurs during 
 Big-Bench Q12 case execution when hive.optimize.skewjoin is set to 'true'.
 If hive.optimize.skewjoin is set to 'false', the case passes.
 How to reproduce:
 1. set hive.optimize.skewjoin=true;
 2. Run BigBench case Q12 and it will fail. 
 Check the executor log (e.g. /usr/lib/spark/work/app-/2/stderr) and you 
 will find the error 'Error while trying to create table container' in the log, 
 and also a NullPointerException near the end of the log.
 (a) Detail error message for 'Error while trying to create table container':
 {noformat}
 15/02/12 01:29:49 ERROR SparkMapRecordHandler: Error processing row: 
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 org.apache.hadoop.hive.ql.metadata.HiveException: Error while trying to 
 create table container
   at 
 org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:118)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:193)
   at 
 org.apache.hadoop.hive.ql.exec.MapJoinOperator.cleanUpInputFileChangedOp(MapJoinOperator.java:219)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1051)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged(Operator.java:1055)
   at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:486)
   at 
 org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:47)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:98)
   at 
 scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
   at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
   at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:65)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
   at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
   at org.apache.spark.scheduler.Task.run(Task.scala:56)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error while 
 trying to create table container
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:158)
   at 
 org.apache.hadoop.hive.ql.exec.spark.HashTableLoader.load(HashTableLoader.java:115)
   ... 21 more
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error, not a 
 directory: 
 hdfs://bhx1:8020/tmp/hive/root/d22ef465-bff5-4edb-a822-0a9f1c25b66c/hive_2015-02-12_01-28-10_008_6897031694580088767-1/-mr-10009/HashTable-Stage-6/MapJoin-mapfile01--.hashtable
   at 
 org.apache.hadoop.hive.ql.exec.persistence.MapJoinTableContainerSerDe.load(MapJoinTableContainerSerDe.java:106)
   ... 22 more
 15/02/12 01:29:49 INFO SparkRecordHandler: maximum memory = 40939028480
 15/02/12 01:29:49 INFO PerfLogger: PERFLOG method=SparkInitializeOperators 
 from=org.apache.hadoop.hive.ql.exec.spark.SparkRecordHandler
 {noformat}
 (b) Detail error message for NullPointerException:
 {noformat}
 15/02/12 01:29:50 ERROR MapJoinOperator: Unexpected exception: null
 

[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326062#comment-14326062
 ] 

Aihua Xu commented on HIVE-9537:


[~the6campbells] I don't think it's a bug. The CHAR type has a fixed length 
with padding spaces, but the padding is not included in the value of the field 
and won't be considered when you call the upper/lower functions. The result is 
as expected.
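
A plain-Java illustration of that semantics (a sketch, not Hive source; the 
padding width is arbitrary):

{code}
// CHAR(8) pads the stored form to 8 characters, but string functions such as
// upper()/lower()/concat() operate on the trailing-space-stripped value.
String stored = String.format("%-8s", "bb");   // "bb      " - padded storage form
String value = stored.replaceAll("\\s+$", ""); // "bb" - what the functions see
System.out.println(value.toUpperCase() + "|"); // prints "BB|" - no trailing spaces
{code}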



 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length character are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} as a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu resolved HIVE-9537.

Resolution: Not a Problem

If you think you have an additional issue, please reopen this or open a new one.

 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length character are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} as a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326065#comment-14326065
 ] 

Hive QA commented on HIVE-3454:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12699474/HIVE-3454.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7557 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2820/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2820/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2820/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12699474 - PreCommit-HIVE-TRUNK-Build

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Release Note: 
The behaviors of converting from BOOLEAN/BYTE/SHORT/INT/BIGINT and converting 
from FLOAT/DOUBLE to TIMESTAMP have been inconsistent. The value of a 
BOOLEAN/BYTE/SHORT/INT/BIGINT is treated as the time in milliseconds, while the 
value of a FLOAT/DOUBLE is treated as the time in seconds.

With the change of HIVE-3454, we support an additional configuration, 
hive.int.timestamp.conversion.in.seconds, to enable interpreting the 
BOOLEAN/BYTE/SHORT/INT/BIGINT value as seconds during the timestamp conversion 
without breaking existing users. By default, the existing behavior is kept.

  was:
The behaviors of converting from BOOLEAN/BYTE/SHORT/INT/BIGINT and converting 
from FLOAT/DOUBLE to TIMESTAMP have been inconsistent. The value of a 
BOOLEAN/BYTE/SHORT/INT/BIGINT is treated as the time in milliseconds while  the 
value of a FLOAT/DOUBLE is treated as the time in seconds. 

With the change of HIVE-3454, we support an additional configuration 
int.timestamp.conversion.in.seconds to enable the interpretation the 
BOOLEAN/BYTE/SHORT/INT/BIGINT value in seconds during the timestamp conversion 
without breaking the existing customers. By default, the existing functionality 
is kept.
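
For illustration, a minimal plain-Java sketch of the two interpretations this 
release note describes (not Hive source; the sample value is arbitrary):

{code}
long v = 1424304000L; // roughly 2015-02-19 00:00:00 UTC, in seconds
// Default: the integral value is read as milliseconds, landing in mid-January
// 1970 (cf. the 1970-01-16 result reported in this issue).
System.out.println(new java.sql.Timestamp(v));
// With hive.int.timestamp.conversion.in.seconds=true: the value is read as seconds.
System.out.println(new java.sql.Timestamp(v * 1000L));
{code}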


 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14325868#comment-14325868
 ] 

Aihua Xu commented on HIVE-3454:


Thanks [~jdere] for reviewing. Just updated the parameter name.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-3454:
---
Attachment: (was: HIVE-3454.3.patch)

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp().
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9551) Unable to read Microsoft SQL Server timestamp column

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326002#comment-14326002
 ] 

Aihua Xu commented on HIVE-9551:


[~dilipg] Can you provide more details? Can you check what value was returned 
from Sqoop and what you saw from Hive? Was there an exception, or was the value 
incorrectly interpreted?

 Unable to read Microsoft SQL Server timestamp column
 

 Key: HIVE-9551
 URL: https://issues.apache.org/jira/browse/HIVE-9551
 Project: Hive
  Issue Type: Bug
  Components: CLI, SQL
Reporter: Dilip Godhia

 When sqoop reads a timestamp column from SQL Server, hive is not able to 
 process it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326005#comment-14326005
 ] 

Aihua Xu commented on HIVE-9546:


[~vbora] It seems you are hitting HIVE-9199. 

 Create table taking substantially longer time when other select queries are 
 run in parallel.
 

 Key: HIVE-9546
 URL: https://issues.apache.org/jira/browse/HIVE-9546
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: RedHat Linux, Cloudera 5.3.0
Reporter: sri venu bora
 Attachments: Hive_create_Issue.txt


 Create table taking substantially longer time when other select queries are 
 run in parallel.
 We were able to reproduce the issue using beeline in two sessions.
 Beeline Shell 1: 
  a) create table with no other queries running on Hive (took approximately 
 0.313 seconds)
  b) Insert Data into the table
  c) Run a select count query on the above table
 Beeline Shell 2: 
  a) create table while step c) is running in the Beeline Shell 1. (took 
 approximately 60.431 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326007#comment-14326007
 ] 

Aihua Xu commented on HIVE-9546:


Try setting hive.exec.parallel=false in hive-site.xml to disable parallel 
execution and see if it makes a difference.

 Create table taking substantially longer time when other select queries are 
 run in parallel.
 

 Key: HIVE-9546
 URL: https://issues.apache.org/jira/browse/HIVE-9546
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: RedHat Linux, Cloudera 5.3.0
Reporter: sri venu bora
Assignee: Aihua Xu
 Attachments: Hive_create_Issue.txt


 Create table taking substantially longer time when other select queries are 
 run in parallel.
 We were able to reproduce the issue using beeline in two sessions.
 Beeline Shell 1: 
  a) create table with no other queries running on Hive (took approximately 
 0.313 seconds)
  b) Insert Data into the table
  c) Run a select count query on the above table
 Beeline Shell 2: 
  a) create table while step c) is running in the Beeline Shell 1. (took 
 approximately 60.431 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9546) Create table taking substantially longer time when other select queries are run in parallel.

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-9546:
--

Assignee: Aihua Xu

 Create table taking substantially longer time when other select queries are 
 run in parallel.
 

 Key: HIVE-9546
 URL: https://issues.apache.org/jira/browse/HIVE-9546
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.1
 Environment: RedHat Linux, Cloudera 5.3.0
Reporter: sri venu bora
Assignee: Aihua Xu
 Attachments: Hive_create_Issue.txt


 Create table taking substantially longer time when other select queries are 
 run in parallel.
 We were able to reproduce the issue using beeline in two sessions.
 Beeline Shell 1: 
  a) create table with no other queries running on Hive (took approximately 
 0.313 seconds)
  b) Insert Data into the table
  c) Run a select count query on the above table
 Beeline Shell 2: 
  a) create table while step c) is running in the Beeline Shell 1. (took 
 approximately 60.431 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326009#comment-14326009
 ] 

Aihua Xu commented on HIVE-9537:


[~the6campbells] Can you provide the Hive version where you have the problem?

 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length character are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} as a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-9561:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

[~lirui], no worries. I just committed this to the Spark branch. Thanks, Rui.

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
 HIVE-9561.6-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control. So we should limit the use of {{sortByKey}} to 
 order by queries only (see the sketch below).
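 For context, a small Spark Java API sketch of the probe job (a sketch, not 
 from the patch; assumes an existing JavaSparkContext {{sc}} and the usual 
 imports of JavaPairRDD, Tuple2, and Arrays):
 {code}
 JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(
     Arrays.asList(new Tuple2<>("b", 2), new Tuple2<>("a", 1)));
 // sortByKey builds a RangePartitioner, which first runs a sampling ("probe")
 // job over the input before the real shuffle - needed for ORDER BY's total
 // order, wasteful for other shuffles.
 pairs.sortByKey().collect();
 // A plain hash-partitioned shuffle needs no probe job.
 pairs.repartition(2).collect();
 {code}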



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu reassigned HIVE-9537:
--

Assignee: Aihua Xu

 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length character are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} as a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158
 ] 

Brock Noland edited comment on HIVE-3454 at 2/18/15 4:40 PM:
-

Have we tested this as part of an MR job? I don't think that the hive-site.xml 
is shipped as part of MR jobs. If that is true, how about we do as follows:

1) Add method {{public static void initialize(Configuration)}} to 
{{TimestampWritable}}
2) Call this method from {{AbstractSerDe.initialize}} which should be called, 
with configuration, in all the right places.
3) In {{TimestampWritable.initialize}} you can use the static 
{{HiveConf.getBoolVar}}

A bit kludgy, but it should work. This is all assuming the current impl doesn't 
work.

bq. timestamp conversion.

I think we need a space after this.
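
For concreteness, here is a minimal sketch of the wiring suggested above. The 
property name and default are assumptions for illustration, not the committed 
change:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class TimestampWritable {
  // Cached once per SerDe initialization, so MR tasks pick the value up from
  // the job Configuration even when hive-site.xml is not shipped with the job.
  private static volatile boolean intToTimestampInSeconds = false;

  public static void initialize(Configuration conf) {
    // A typed HiveConf.getBoolVar(conf, ConfVars.XXX) lookup would go here;
    // the string key below is a hypothetical stand-in.
    intToTimestampInSeconds = conf.getBoolean(
        "hive.int.timestamp.conversion.in.seconds", false);
  }
}
{code}

{{AbstractSerDe.initialize}} would then call 
{{TimestampWritable.initialize(conf)}} before any rows are deserialized.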


was (Author: brocknoland):
Have we tested this as part of an MR job? I don't think that the hive-site.xml 
is shipped as part of MR jobs. If that is true, how about we do as follows:

1) Add method {{public static void initialize(Configuration)}} to 
{{TimestampWritable}}
2) Call this method from {{AbstractSerDe.initialize}} which should be called, 
with configuration, in all the right places.
3) In {{TimestampWritable.initialize}} you can use the static 
{{HiveCon.getBoolVar}}

a bit kludgy but it should work. This all assuming the current impl doesn't 
work.

bq. timestamp conversion.

I think we need a space after this.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158
 ] 

Brock Noland commented on HIVE-3454:


Have we tested this as part of an MR job? I don't think that the hive-site.xml 
is shipped as part of MR jobs. If that is true, how about we do as follows:

1) Add method {{public static void initialize(Configuration)}} to 
{{TimestampWritable}}
2) Call this method from {{AbstractSerDe.initialize}} which should be called, 
with configuration, in all the right places.
3) In {{TimestampWritable.initialize}} you can use the static 
{{HiveConf.getBoolVar}}

A bit kludgy, but it should work. This is all assuming the current impl doesn't 
work.

bq. timestamp conversion.

I think we need a space after this.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-3454) Problem with CAST(BIGINT as TIMESTAMP)

2015-02-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326158#comment-14326158
 ] 

Brock Noland edited comment on HIVE-3454 at 2/18/15 4:40 PM:
-

Have we tested this as part of an MR job? I don't think that the hive-site.xml 
is shipped as part of MR jobs. If that is true, how about we do as follows:

1) Add method {{public static void initialize(Configuration)}} to 
{{TimestampWritable}}
2) Call this method from {{AbstractSerDe.initialize}} which should be called, 
with configuration, in all the right places.
3) In {{TimestampWritable.initialize}} you can use the static 
{{HiveConf.getBoolVar}}

A bit kludgy, but it should work. This is all assuming the current impl doesn't 
work.

bq. timestamp conversion.

I think we need a space after this.


was (Author: brocknoland):
Have we tested this as part of an MR job? I don't think that the hive-site.xml 
is shipped as part of MR jobs. If that is true, how about we do as follows:

1) Add method {{public static void initialize(Configuration)}} to 
{{TimestampWritable}}
2) Call this method from {{AbstractSerDe.initialize}} which should be called, 
with configuration, in all the right places.
3) In {{TimestampWritable.Configuration}} you can use the static 
{{HiveCon.getBoolVar}}

a bit kludgy but it should work. This all assuming the current impl doesn't 
work.

bq. timestamp conversion.

I think we need a space after this.

 Problem with CAST(BIGINT as TIMESTAMP)
 --

 Key: HIVE-3454
 URL: https://issues.apache.org/jira/browse/HIVE-3454
 Project: Hive
  Issue Type: Bug
  Components: Types, UDF
Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0, 
 0.13.1
Reporter: Ryan Harris
Assignee: Aihua Xu
  Labels: newbie, newdev, patch
 Attachments: HIVE-3454.1.patch.txt, HIVE-3454.2.patch, 
 HIVE-3454.3.patch, HIVE-3454.3.patch, HIVE-3454.patch


 Ran into an issue while working with timestamp conversion.
 CAST(unix_timestamp() as TIMESTAMP) should create a timestamp for the current 
 time from the BIGINT returned by unix_timestamp()
 Instead, however, a 1970-01-16 timestamp is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7292) Hive on Spark

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326675#comment-14326675
 ] 

Lefty Leverenz commented on HIVE-7292:
--

Doc note:  See comments on HIVE-9257 and HIVE-9448 for documentation issues.

* [HIVE-9257 commit comment with doc notes | 
https://issues.apache.org/jira/browse/HIVE-9257?focusedCommentId=14273166page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14273166]
* HIVE-9448 doc comments
** [list of configuration parameters | 
https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14292487page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14292487]
** [where documented | 
https://issues.apache.org/jira/browse/HIVE-9448?focusedCommentId=14298353page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14298353]

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backends. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits: Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving the user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. A design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-7292) Hive on Spark

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326644#comment-14326644
 ] 

Lefty Leverenz edited comment on HIVE-7292 at 2/18/15 10:49 PM:


Although this issue is still marked Unresolved, the Spark branch has been 
merged to trunk and is Resolved for the 1.1.0 release (HIVE-9257 and 
HIVE-9352).  (Edit:  Also HIVE-9448.)


was (Author: leftylev):
Although this issue is still marked Unresolved, the Spark branch has been 
merged to trunk and is Resolved for the 1.1.0 release (HIVE-9257 and HIVE-9352).

 Hive on Spark
 -

 Key: HIVE-7292
 URL: https://issues.apache.org/jira/browse/HIVE-7292
 Project: Hive
  Issue Type: Improvement
  Components: Spark
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: Spark-M1, Spark-M2, Spark-M3, Spark-M4, Spark-M5
 Attachments: Hive-on-Spark.pdf


 Spark, an open-source data analytics cluster computing framework, has gained 
 significant momentum recently. Many Hive users already have Spark installed 
 as their computing backbone. To take advantage of Hive, they still need to 
 have either MapReduce or Tez on their cluster. This initiative will provide 
 users a new alternative so that they can consolidate their backends. 
 Secondly, providing such an alternative further increases Hive's adoption, as 
 it exposes Spark users to a viable, feature-rich, de facto standard SQL tool 
 on Hadoop.
 Finally, allowing Hive to run on Spark also has performance benefits: Hive 
 queries, especially those involving multiple reducer stages, will run faster, 
 thus improving the user experience as Tez does.
 This is an umbrella JIRA which will cover many coming subtasks. A design doc 
 will be attached here shortly, and will be on the wiki as well. Feedback from 
 the community is greatly appreciated!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Szehon Ho
Checked the md5 and the signature of both.
Built src and ran a few queries with CLI/Beeline.

Something very strange.. with the bin I cannot create any table at all
(says FAILED: SemanticException Line 1:13 Invalid table name..), I am not
sure what is wrong, as it works using the one I built from src.  I also can
create tables fine with the previous RC binaries.  How did you create the
binary this time, was there any modification from the one built from src?
Hope it is not a setup error on my part.

Thanks,
Szehon

On Wed, Feb 18, 2015 at 2:26 PM, Prasad Mujumdar pras...@cloudera.com
wrote:

   Sounds good.

 +1

 Verified checksums of source and binary tarballs
 Compiled with hadoop-1 and hadoop-2 profiles with distributions
 Ran maven verify


 thanks
 Prasad


 On Wed, Feb 18, 2015 at 12:50 PM, Brock Noland br...@cloudera.com wrote:

  Good idea... since it's not a blocker I will add that for 1.1.1 and
 1.2.0.
 
  On Wed, Feb 18, 2015 at 10:37 AM, Prasad Mujumdar pras...@cloudera.com
  wrote:
   I guess the README.txt can list Apache Spark as a query execution
   framework along with MapReduce and Tez.
  
   thanks
   Prasad
  
  
   On Wed, Feb 18, 2015 at 8:26 AM, Xuefu Zhang xzh...@cloudera.com
  wrote:
  
   +1
  
   1. downloaded the src and bin, and verified md5.
   2. built the src with -Phadoop-1 and -Phadoop-2.
   3. ran a few unit tests
  
   Thanks,
   Xuefu
  
   On Tue, Feb 17, 2015 at 3:14 PM, Brock Noland br...@cloudera.com
  wrote:
  
Apache Hive 1.1.0 Release Candidate 2 is available here:
http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
   
Maven artifacts are available here:
   
  https://repository.apache.org/content/repositories/orgapachehive-1025/
   
Source tag for RC1 is at:
http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
   
My key is located here:
 https://people.apache.org/keys/group/hive.asc
   
Voting will conclude in 72 hours
   
  
 



[jira] [Assigned] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-9647:
-

Assignee: Pengcheng Xiong  (was: Gunther Hagleitner)

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on per column number 
 of distinct value to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in 
 discrepancy in number of distinct values which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate selectivity of joins.
 ||Column||Partitioned count distincts||Un-partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition and is 
 calculated as {{select max(NUM_DISTINCTS) from PART_COL_STATS}}.
 This is problematic for columns like ticket number which are naturally 
 increasing with the partitioned date column ss_sold_date_sk.
 Suggestions:
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase coprocessors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in:
 Max( (NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / 
 (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS )
 More details:
 While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of 
 the plans were different, so I dumped the CBO logical plan and found that the 
 join estimates are drastically different.
 Unpartitioned schema:
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative cost = 
 {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 
 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): 
 rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], 
 sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2823
   HiveProjectRel(d_date_sk=[$0], 

Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Brock Noland
We should be able to generate those values in webhcat-default.xml. Eugene?

On Wed, Feb 18, 2015 at 4:43 PM, Lefty Leverenz leftylever...@gmail.com wrote:
 Four configuration values in webhcat-default.xml need to be updated (same
 as HIVE-8807 https://issues.apache.org/jira/browse/HIVE-8807 updated in
 the patch for release 1.0.0
 https://issues.apache.org/jira/secure/attachment/12695112/HIVE8807.patch):

- templeton.pig.path
- templeton.hive.path
- templeton.hive.home
- templeton.hcat.home

 How can we make this happen in every release, without reminders?


 -- Lefty

 On Wed, Feb 18, 2015 at 4:04 PM, Brock Noland br...@cloudera.com wrote:

 Yeah that is really strange. I have seen that before, a long time
 back, but never found the root cause. I think it's a bug in either
 antlr or how we use antlr.

 I will re-generate the binaries and start another vote. Note the
 source tag will be the same, which is technically what we vote on.

 On Wed, Feb 18, 2015 at 3:59 PM, Chao Sun c...@cloudera.com wrote:
  I tested apache-hive.1.1.0-bin and I also got the same error as Szehon
  reported.
 
  On Wed, Feb 18, 2015 at 3:48 PM, Brock Noland br...@cloudera.com
 wrote:
 
  Hi,
 
 
 
  On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan 
 gop...@apache.org
  wrote:
   Hi,
  
   From the release branch, I noticed that the hive-exec.jar now
 contains a
   copy of guava-14 without any relocations.
  
   The hive spark-client pom.xml adds guava as a lib jar instead of
 shading
   it in.
  
  
 https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111
  
  
   That seems to be a great approach for guava compat issues across
  execution
   engines.
  
  
   Spark itself relocates guava-14 for compatibility with
 Hive-on-Spark(??).
  
   https://issues.apache.org/jira/browse/SPARK-2848
  
  
   Does any of the same compatibility issues occur when using a
  hive-exec.jar
   containing guava-14 on MRv2 (which has guava-11 in the classpath)?
 
  Not that I am aware of. I've tested it on top of MRv2 a number of
  times and I think the unit tests also exercise these code paths.
 
  
   Cheers,
   Gopal
  
   On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote:
  
  Apache Hive 1.1.0 Release Candidate 2 is available here:
  http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
  
  Maven artifacts are available here:
  
 https://repository.apache.org/content/repositories/orgapachehive-1025/
  
  Source tag for RC1 is at:
  http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
  
  My key is located here: https://people.apache.org/keys/group/hive.asc
  
  Voting will conclude in 72 hours
  
  
 
 
 
 
  --
  Best,
  Chao



[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9647:
--
Status: Patch Available  (was: Open)

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on per column number 
 of distinct value to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in 
 discrepancy in number of distinct values which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate selectivity of joins.
 ||Column||Partitioned count distincts||Un-partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition and is 
 calculated as {{select max(NUM_DISTINCTS) from PART_COL_STATS}}.
 This is problematic for columns like ticket number which are naturally 
 increasing with the partitioned date column ss_sold_date_sk.
 Suggestions:
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase coprocessors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in:
 Max( (NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / 
 (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS )
 More details:
 While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of 
 the plans were different, so I dumped the CBO logical plan and found that the 
 join estimates are drastically different.
 Unpartitioned schema:
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative cost = 
 {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 
 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): 
 rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], 
 sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative cost = {0}, id = 2823
   HiveProjectRel(d_date_sk=[$0], 

[jira] [Updated] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-9647:
--
Attachment: HIVE-9647.01.patch

Hive currently implements NDV estimation using the Flajolet-Martin algorithm. 
This algorithm assumes a hash function hash(x) which maps each input x to an 
integer in the range [0, 2^L - 1], with outputs that are sufficiently 
UNIFORMLY distributed.
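
(As a toy illustration of the Flajolet-Martin idea, not Hive's actual code: 
with a uniform hash, the maximum number of trailing zero bits observed over 
all values estimates log2 of the NDV.)

{code:java}
import java.util.List;

public class FlajoletMartinToy {
  public static long estimateNdv(List<String> values) {
    int maxTrailingZeros = 0;
    for (String v : values) {
      int h = v.hashCode();   // stand-in for a sufficiently uniform hash(x)
      if (h == 0) continue;   // an all-zero hash contributes no rank
      maxTrailingZeros =
          Math.max(maxTrailingZeros, Integer.numberOfTrailingZeros(h));
    }
    // 0.77351 is the standard Flajolet-Martin correction factor.
    return Math.round(Math.pow(2, maxTrailingZeros) / 0.77351);
  }
}
{code}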

Thus, if we assume a UNIFORM distribution, the NDV density should be roughly 
the same across partitions. Since we already have the min/max as well as the 
NDV for each partition, we can calculate the density for each partition and 
use the average density of all the partitions for the aggregation. This method 
is not only independent of the number of partitions, which makes it fast, but 
is also easy to extend to extrapolation cases where the stats for some of the 
partitions are missing.
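
A minimal sketch of this density-based aggregation, assuming per-partition 
(min, max, NDV) column stats are available (class and method names are 
illustrative only, not the patch itself):

{code:java}
import java.util.List;

public class NdvDensityAggregation {
  public static class PartStats {
    final double min, max;
    final long ndv;
    public PartStats(double min, double max, long ndv) {
      this.min = min; this.max = max; this.ndv = ndv;
    }
  }

  public static long aggregateNdv(List<PartStats> parts) {
    double densitySum = 0;
    double globalMin = Double.MAX_VALUE, globalMax = -Double.MAX_VALUE;
    for (PartStats p : parts) {
      double range = Math.max(p.max - p.min, 1.0); // guard zero-width ranges
      densitySum += p.ndv / range;                 // per-partition NDV density
      globalMin = Math.min(globalMin, p.min);
      globalMax = Math.max(globalMax, p.max);
    }
    double avgDensity = densitySum / parts.size();
    // Apply the average density to the column's global value range.
    return Math.round(avgDensity * Math.max(globalMax - globalMin, 1.0));
  }
}
{code}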

This patch also addresses the bug in HIVE-9717.

 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on per column number 
 of distinct value to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in 
 discrepancy in number of distinct values which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate selectivity of joins.
 ||Column||Partitioned count distincts||Un-partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition and is 
 calculated as {{select max(NUM_DISTINCTS) from PART_COL_STATS}}.
 This is problematic for columns like ticket number which are naturally 
 increasing with the partitioned date column ss_sold_date_sk.
 Suggestions:
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase coprocessors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in:
 Max( (NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / 
 (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS )
 More details:
 While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of 
 the plans were different, so I dumped the CBO logical plan and found that the 
 join estimates are drastically different.
 Unpartitioned schema:
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative 

[jira] [Commented] (HIVE-9647) Discrepancy in cardinality estimates between partitioned and un-partitioned tables

2015-02-18 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326844#comment-14326844
 ] 

Mostafa Mokhtar commented on HIVE-9647:
---

Awesome :) 
I am happy we can fix this.






 Discrepancy in cardinality estimates between partitioned and un-partitioned 
 tables 
 ---

 Key: HIVE-9647
 URL: https://issues.apache.org/jira/browse/HIVE-9647
 Project: Hive
  Issue Type: Bug
  Components: CBO
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Pengcheng Xiong
 Fix For: 1.2.0

 Attachments: HIVE-9647.01.patch


 High-level summary
 HiveRelMdSelectivity.computeInnerJoinSelectivity relies on per column number 
 of distinct value to estimate join selectivity.
 The way statistics are aggregated for partitioned tables results in 
 discrepancy in number of distinct values which results in different plans 
 between partitioned and un-partitioned schemas.
 The table below summarizes the NDVs in computeInnerJoinSelectivity which are 
 used to estimate selectivity of joins.
 ||Column||Partitioned count distincts||Un-partitioned count distincts||
 |sr_customer_sk|71,245|1,415,625|
 |sr_item_sk|38,846|62,562|
 |sr_ticket_number|71,245|34,931,085|
 |ss_customer_sk|88,476|1,415,625|
 |ss_item_sk|38,846|62,562|
 |ss_ticket_number|100,756|56,256,175|
 The discrepancy arises because the NDV calculation for a partitioned table 
 assumes that the NDV range is contained within each partition and is 
 calculated as {{select max(NUM_DISTINCTS) from PART_COL_STATS}}.
 This is problematic for columns like ticket number which are naturally 
 increasing with the partitioned date column ss_sold_date_sk.
 Suggestions:
 Use HyperLogLog as suggested by Gopal; there is an HLL implementation for 
 HBase coprocessors which we can use as a reference here.
 Using the global stats from TAB_COL_STATS and the per-partition stats from 
 PART_COL_STATS, extrapolate the NDV for the qualified partitions as in:
 Max( (NUM_DISTINCTS from TAB_COL_STATS) x (Number of qualified partitions) / 
 (Number of partitions), max(NUM_DISTINCTS) from PART_COL_STATS )
 More details:
 While doing TPC-DS partitioned vs. un-partitioned runs I noticed that many of 
 the plans were different, so I dumped the CBO logical plan and found that the 
 join estimates are drastically different.
 Unpartitioned schema:
 {code}
 2015-02-10 11:33:27,624 DEBUG [main]: parse.SemanticAnalyzer 
 (SemanticAnalyzer.java:apply(12624)) - Plan After Join Reordering:
 HiveProjectRel(store_sales_quantitycount=[$0], store_sales_quantityave=[$1], 
 store_sales_quantitystdev=[$2], store_sales_quantitycov=[/($2, $1)], 
 as_store_returns_quantitycount=[$3], as_store_returns_quantityave=[$4], 
 as_store_returns_quantitystdev=[$5], store_returns_quantitycov=[/($5, $4)]): 
 rowcount = 1.0, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2956
   HiveAggregateRel(group=[{}], agg#0=[count($0)], agg#1=[avg($0)], 
 agg#2=[stddev_samp($0)], agg#3=[count($1)], agg#4=[avg($1)], 
 agg#5=[stddev_samp($1)]): rowcount = 1.0, cumulative cost = 
 {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2954
 HiveProjectRel($f0=[$4], $f1=[$8]): rowcount = 40.05611776795562, 
 cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 io}, id = 2952
   HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$1], 
 ss_customer_sk=[$2], ss_ticket_number=[$3], ss_quantity=[$4], 
 sr_item_sk=[$5], sr_customer_sk=[$6], sr_ticket_number=[$7], 
 sr_return_quantity=[$8], d_date_sk=[$9], d_quarter_name=[$10]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2982
 HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount = 
 40.05611776795562, cumulative cost = {6.056835407771381E8 rows, 0.0 cpu, 0.0 
 io}, id = 2980
   HiveJoinRel(condition=[AND(AND(=($2, $6), =($1, $5)), =($3, $7))], 
 joinType=[inner]): rowcount = 28880.460910696, cumulative cost = 
 {6.05654559E8 rows, 0.0 cpu, 0.0 io}, id = 2964
 HiveProjectRel(ss_sold_date_sk=[$0], ss_item_sk=[$2], 
 ss_customer_sk=[$3], ss_ticket_number=[$9], ss_quantity=[$10]): rowcount = 
 5.50076554E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2920
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_sales]]): 
 rowcount = 5.50076554E8, cumulative cost = {0}, id = 2822
 HiveProjectRel(sr_item_sk=[$2], sr_customer_sk=[$3], 
 sr_ticket_number=[$9], sr_return_quantity=[$10]): rowcount = 5.5578005E7, 
 cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2923
   HiveTableScanRel(table=[[tpcds_bin_orc_200.store_returns]]): 
 rowcount = 5.5578005E7, cumulative 

Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Brock Noland
Hi,



On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org wrote:
 Hi,

 From the release branch, I noticed that the hive-exec.jar now contains a
 copy of guava-14 without any relocations.

 The hive spark-client pom.xml adds guava as a lib jar instead of shading
 it in.

 https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111


 That seems to be a great approach for guava compat issues across execution
 engines.


 Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??).

 https://issues.apache.org/jira/browse/SPARK-2848


 Does any of the same compatibility issues occur when using a hive-exec.jar
 containing guava-14 on MRv2 (which has guava-11 in the classpath)?

Not that I am aware of. I've tested it on top of MRv2 a number of
times and I think the unit tests also exercise these code paths.


 Cheers,
 Gopal

 On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote:

Apache Hive 1.1.0 Release Candidate 2 is available here:
http://people.apache.org/~brock/apache-hive-1.1.0-rc2/

Maven artifacts are available here:
https://repository.apache.org/content/repositories/orgapachehive-1025/

Source tag for RC1 is at:
http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/

My key is located here: https://people.apache.org/keys/group/hive.asc

Voting will conclude in 72 hours




[jira] [Commented] (HIVE-9703) Merge from Spark branch to trunk 02/16/2015

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326776#comment-14326776
 ] 

Lefty Leverenz commented on HIVE-9703:
--

Does any of this need documentation, or can we assume it's all covered by jiras 
that patched the Spark branch?

 Merge from Spark branch to trunk 02/16/2015
 ---

 Key: HIVE-9703
 URL: https://issues.apache.org/jira/browse/HIVE-9703
 Project: Hive
  Issue Type: Task
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 1.2.0

 Attachments: HIVE-9703.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8807) Obsolete default values in webhcat-default.xml

2015-02-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326827#comment-14326827
 ] 

Lefty Leverenz commented on HIVE-8807:
--

This also needs to be done for release 1.1.0, but I don't think we should have 
a new Jira for each release.  Would it make sense to reopen this issue for each 
release?  Or is there a better way to make sure webhcat-default.xml gets 
updated?

 Obsolete default values in webhcat-default.xml
 --

 Key: HIVE-8807
 URL: https://issues.apache.org/jira/browse/HIVE-8807
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Affects Versions: 0.12.0, 0.13.0, 0.14.0
Reporter: Lefty Leverenz
Assignee: Eugene Koifman
 Fix For: 1.0.0

 Attachments: HIVE8807.patch


 The defaults for templeton.pig.path and templeton.hive.path are 0.11 in 
 webhcat-default.xml, but they ought to match current release numbers.
 The Pig version is 0.12.0 for Hive 0.14 RC0 (as shown in pom.xml).
 no precommit tests



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9613) Left join query plan outputs wrong column when using subquery

2015-02-18 Thread Chao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326838#comment-14326838
 ] 

Chao commented on HIVE-9613:


Hi [~spyfree], sorry I was wrong before - in the upstream trunk I don't get 
this issue anymore.
It appears that this is an issue in ColumnPruner and is already fixed in 
HIVE-9327.

 Left join query plan outputs  wrong column when using subquery
 --

 Key: HIVE-9613
 URL: https://issues.apache.org/jira/browse/HIVE-9613
 Project: Hive
  Issue Type: Bug
  Components: Parser, Query Planning
Affects Versions: 0.14.0, 1.0.0
 Environment: apache hadoop 2.5.1 
Reporter: Li Xin
 Attachments: test.sql


 I have a query that outputs a column with wrong contents when using a 
 subquery; the contents of that column are equal to another column's, not its 
 own.
 I have three tables,as follows:
 table 1: _hivetemp.category_city_rank_:
 ||category||city||rank||
 |jinrongfuwu|shanghai|1|
 |ktvjiuba|shanghai|2|
 table 2: _hivetemp.category_match_:
 ||src_category_en||src_category_cn||dst_category_en||dst_category_cn||
 |danbaobaoxiantouzi|投资担保|担保/贷款|jinrongfuwu|
 |zpwentiyingshi|娱乐/休闲|KTV/酒吧|ktvjiuba|
 table 3: _hivetemp.city_match_:
 ||src_city_name_en||dst_city_name_en||city_name_cn||
 |sh|shanghai|上海|
 And the query is :
 {code}
 select
 a.category,
 a.city,
 a.rank,
 b.src_category_en,
 c.src_city_name_en
 from
 hivetemp.category_city_rank a
 left outer join
 (select
 src_category_en,
 dst_category_en
 from
 hivetemp.category_match) b
 on  a.category = b.dst_category_en
 left outer join
 (select
 src_city_name_en,
 dst_city_name_en
 from
 hivetemp.city_match) c
 on  a.city = c.dst_city_name_en
 {code}
 which should output the results as follows, as I tested in Hive 0.13:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|danbaobaoxiantouzi|sh|
 |ktvjiuba|shanghai|2|zpwentiyingshi|sh|
 but in Hive 0.14, the results in the column *src_category_en* are wrong, and 
 are just the *city* contents:
 ||category||city||rank||src_category_en||src_city_name_en||
 |jinrongfuwu|shanghai|1|shanghai|sh|
 |ktvjiuba|shanghai|2|shanghai|sh|
 Using explain to examine the execution plan, I can see the first subquery 
 outputs only one column, *dst_category_en*, and *src_category_en* is just missing.
 {quote}
b:category_match
   TableScan
 alias: category_match
 Statistics: Num rows: 131 Data size: 13149 Basic stats: COMPLETE 
 Column stats: NONE
 Select Operator
   expressions: dst_category_en (type: string)
   outputColumnNames: _col1
   Statistics: Num rows: 131 Data size: 13149 Basic stats: 
 COMPLETE Column stats: NONE
 {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9719) Up calcite version on cbo branch

2015-02-18 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326850#comment-14326850
 ] 

Julian Hyde commented on HIVE-9719:
---

I just pushed the snapshot. It is based on 
https://github.com/apache/incubator-calcite/commit/f9db1ee9210a04f7a3ddae23e52e26be1669debb.

 Up calcite version on cbo branch
 

 Key: HIVE-9719
 URL: https://issues.apache.org/jira/browse/HIVE-9719
 Project: Hive
  Issue Type: Task
  Components: CBO
Affects Versions: cbo-branch
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-9719.cbo.patch


 CALCITE-594 is now checked into Calcite master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Chao Sun
I tested apache-hive.1.1.0-bin and I also got the same error as Szehon
reported.

On Wed, Feb 18, 2015 at 3:48 PM, Brock Noland br...@cloudera.com wrote:

 Hi,



 On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org
 wrote:
  Hi,
 
  From the release branch, I noticed that the hive-exec.jar now contains a
  copy of guava-14 without any relocations.
 
  The hive spark-client pom.xml adds guava as a lib jar instead of shading
  it in.
 
  https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111
 
 
  That seems to be a great approach for guava compat issues across
 execution
  engines.
 
 
  Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??).
 
  https://issues.apache.org/jira/browse/SPARK-2848
 
 
  Does any of the same compatibility issues occur when using a
 hive-exec.jar
  containing guava-14 on MRv2 (which has guava-11 in the classpath)?

 Not that I am aware of. I've tested it on top of MRv2 a number of
  times and I think the unit tests also exercise these code paths.

 
  Cheers,
  Gopal
 
  On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote:
 
 Apache Hive 1.1.0 Release Candidate 2 is available here:
 http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
 
 Maven artifacts are available here:
 https://repository.apache.org/content/repositories/orgapachehive-1025/
 
 Source tag for RC1 is at:
 http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
 
 My key is located here: https://people.apache.org/keys/group/hive.asc
 
 Voting will conclude in 72 hours
 
 




-- 
Best,
Chao


Re: [VOTE] Apache Hive 1.1.0 Release Candidate 2

2015-02-18 Thread Brock Noland
Yeah that is really strange. I have seen that before, a long time
back, but never found the root cause. I think it's a bug in either
antlr or how we use antlr.

I will re-generate the binaries and start another vote. Note the
source tag will be the same, which is technically what we vote on.

On Wed, Feb 18, 2015 at 3:59 PM, Chao Sun c...@cloudera.com wrote:
 I tested apache-hive.1.1.0-bin and I also got the same error as Szehon
 reported.

 On Wed, Feb 18, 2015 at 3:48 PM, Brock Noland br...@cloudera.com wrote:

 Hi,



 On Wed, Feb 18, 2015 at 2:21 PM, Gopal Vijayaraghavan gop...@apache.org
 wrote:
  Hi,
 
  From the release branch, I noticed that the hive-exec.jar now contains a
  copy of guava-14 without any relocations.
 
  The hive spark-client pom.xml adds guava as a lib jar instead of shading
  it in.
 
  https://github.com/apache/hive/blob/branch-1.1/spark-client/pom.xml#L111
 
 
  That seems to be a great approach for guava compat issues across
 execution
  engines.
 
 
  Spark itself relocates guava-14 for compatibility with Hive-on-Spark(??).
 
  https://issues.apache.org/jira/browse/SPARK-2848
 
 
  Does any of the same compatibility issues occur when using a
 hive-exec.jar
  containing guava-14 on MRv2 (which has guava-11 in the classpath)?

 Not that I am aware of. I've tested it on top of MRv2 a number of
  times and I think the unit tests also exercise these code paths.

 
  Cheers,
  Gopal
 
  On 2/17/15, 3:14 PM, Brock Noland br...@cloudera.com wrote:
 
 Apache Hive 1.1.0 Release Candidate 2 is available here:
 http://people.apache.org/~brock/apache-hive-1.1.0-rc2/
 
 Maven artifacts are available here:
 https://repository.apache.org/content/repositories/orgapachehive-1025/
 
 Source tag for RC1 is at:
 http://svn.apache.org/repos/asf/hive/tags/release-1.1.0-rc2/
 
 My key is located here: https://people.apache.org/keys/group/hive.asc
 
 Voting will conclude in 72 hours
 
 




 --
 Best,
 Chao


[jira] [Reopened] (HIVE-9537) string expressions on a fixed length character do not preserve trailing spaces

2015-02-18 Thread N Campbell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

N Campbell reopened HIVE-9537:
--

The Hive documentation is vague at best with respect to when padding is 
preserved/ignored:

Char types are similar to Varchar but they are fixed-length meaning that 
values shorter than the specified length value are padded with spaces but 
trailing spaces are not important during comparisons. The maximum length is 
fixed at 255. 

There is no discussion of non-comparison operations such as upper, lower, 
concat, etc.

Consider the following: the driver may return CCHAR with trailing blanks, but a 
string operation such as concat fails to preserve them. Should an application 
locally perform a scalar operation on the returned value, such as LEN or LOWER, 
it may retain the spaces; meanwhile, server side, an 'equivalent' expression 
is not blank-preserving.

select rnum, cchar, concat( concat( concat( cchar,'...'), cchar),'...') from 
tchar. 

So the driver will return 'BB' followed by spaces, and then BB...BB... for the 
2nd and 3rd projected items. Similarly, length(cchar) returns 2 and not 5, etc.

Customers using technologies such as Hana, DB2, Netezza, ... will expect the 
blank-padded behaviour. To all intents and purposes, most SQL practitioners 
would not consider the implementation to be a fixed-length character type.

i.e. length(cchar) returns 32

i.e. cchar || '...' returns 'BB  ...BB  ...' (with the padding preserved)

Should this be the design intent of Hive, I would ask for the documentation to 
be far more comprehensive in stating the semantics.



 string expressions on a fixed length character do not preserve trailing spaces
 --

 Key: HIVE-9537
 URL: https://issues.apache.org/jira/browse/HIVE-9537
 Project: Hive
  Issue Type: Bug
  Components: SQL
Reporter: N Campbell
Assignee: Aihua Xu

 When a string expression such as upper or lower is applied to a fixed-length 
 column, the trailing spaces of the fixed-length value are not preserved.
 {code:sql}
 CREATE TABLE  if not exists TCHAR ( 
 RNUM int, 
 CCHAR char(32)
 )
 ROW FORMAT DELIMITED 
 FIELDS TERMINATED BY '|' 
 LINES TERMINATED BY '\n' 
 STORED AS TEXTFILE;
 {code}
 {{cchar}} is a {{char(32)}}.
 {code:sql}
 select cchar, concat(cchar, cchar), concat(lower(cchar), cchar), 
 concat(upper(cchar), cchar) 
 from tchar;
 {code}
 0|\N
 1|
 2| 
 3|BB
 4|EE
 5|FF



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9561) SHUFFLE_SORT should only be used for order by query [Spark Branch]

2015-02-18 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14326921#comment-14326921
 ] 

Rui Li commented on HIVE-9561:
--

Thank you Xuefu!

 SHUFFLE_SORT should only be used for order by query [Spark Branch]
 --

 Key: HIVE-9561
 URL: https://issues.apache.org/jira/browse/HIVE-9561
 Project: Hive
  Issue Type: Sub-task
  Components: Spark
Reporter: Rui Li
Assignee: Rui Li
 Fix For: spark-branch

 Attachments: HIVE-9561.1-spark.patch, HIVE-9561.2-spark.patch, 
 HIVE-9561.3-spark.patch, HIVE-9561.4-spark.patch, HIVE-9561.5-spark.patch, 
 HIVE-9561.6-spark.patch


 The {{sortByKey}} shuffle launches probe jobs. Such jobs can hurt performance 
 and are difficult to control, so we should limit the use of {{sortByKey}} to 
 order by queries only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

