[jira] [Commented] (HIVE-6003) bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845173#comment-13845173 ] Hive QA commented on HIVE-6003: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618173/HIVE-6003.1.patch {color:green}SUCCESS:{color} +1 4762 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/609/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/609/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618173 bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS --- Key: HIVE-6003 URL: https://issues.apache.org/jira/browse/HIVE-6003 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6003.1.patch hadoop (0.20.2, 1.x, 2.x) appends HADOOP_CLIENT_OPTS to HADOOP_OPTS, so this statement in bin/hive, under debug mode, is unnecessary:
{code}
export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
{code}
It causes HADOOP_CLIENT_OPTS to be appended twice, which produces this error in debug mode:
{code}
bin/hive --debug
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
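The double append can be illustrated outside of Hive. The following is a hedged, standalone Python sketch, not Hive's actual scripts; the jdwp option string is a typical example, not copied from any particular hadoop-env:

```python
# Hedged illustration (not Hive's actual scripts): hadoop's launcher already
# appends HADOOP_CLIENT_OPTS to HADOOP_OPTS, so a second append in bin/hive
# puts the jdwp agent option on the java command line twice.
client_opts = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
hadoop_opts = ""

# Append done by hadoop (0.20.2, 1.x, 2.x):
hadoop_opts = f"{hadoop_opts} {client_opts}".strip()
# Redundant append formerly done by bin/hive in debug mode:
hadoop_opts = f"{hadoop_opts} {client_opts}".strip()

# The JVM refuses to load the jdwp agent twice, hence the error above.
assert hadoop_opts.count("-agentlib:jdwp") == 2
```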
[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.
[ https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845189#comment-13845189 ] Lefty Leverenz commented on HIVE-5978: -- Does this need to be documented in the design doc (HIVE-4160)? Rollups not supported in vector mode. - Key: HIVE-5978 URL: https://issues.apache.org/jira/browse/HIVE-5978 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5978.1.patch Rollups are not supported in vector mode; the query should fail to vectorize. A separate jira will be filed to implement rollups in vector mode. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
Kostiantyn Kudriavtsev created HIVE-6006: Summary: Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
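A standalone sketch of the proposed computation, in Python rather than as a Hive UDF, with an assumed mean Earth radius (the proposal does not specify one); the function name follows the proposal above:

```python
from math import radians, sin, cos, asin, sqrt

# Assumed mean Earth radius in kilometers (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat1, lon1) and (lat2, lon2),
    all coordinates given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the square of half the chord length.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

A real Hive UDF would wrap this logic in a Java class; the sketch above is only meant to pin down the math.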
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845224#comment-13845224 ] Kostiantyn Kudriavtsev commented on HIVE-6006: -- Hi guys, could anyone please assign this issue to me? I'll provide a patch as soon as possible. Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.
[ https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845228#comment-13845228 ] Lefty Leverenz commented on HIVE-5978: -- Oh, just realized the wiki has a vectorization design doc now: [https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution]. It links to HIVE-4160 for the complete doc, very helpful. ROLLUP isn't in the list of supported data types and operations, so perhaps its current lack of support is implied. Should there be a list of supported aggregates? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845262#comment-13845262 ] Hive QA commented on HIVE-5945: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618185/HIVE-5945.2.patch.txt {color:green}SUCCESS:{color} +1 4761 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/611/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/611/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618185 ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. 
- Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt Here is an example
{code}
select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
group by i_item_id, s_state
order by i_item_id, s_state
limit 100;
{code}
I turned off noconditionaltask, so I expected 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce-side joins). So I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
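The miscount described above can be sketched abstractly. In this hedged Python sketch, all names, sizes, and the threshold are illustrative, not Hive's actual API or defaults:

```python
# Hypothetical sketch of the size check described in HIVE-5945; the names,
# sizes, and threshold are illustrative, not Hive's actual API or defaults.
SMALL_TABLE_THRESHOLD = 25 * 1024 * 1024  # bytes

# aliasToFileSizeMap analogue: every input of the query plus the
# intermediate result of joining store_sales and date_dim.
alias_to_size = {
    "store_sales": 45 * 1024**3,         # ~45GB; not an input of this child task
    "ss_dd_intermediate": 10 * 1024**2,  # intermediate join output
    "item": 5 * 1024**2,
}

# Aliases actually consumed by the child of this conditional task.
participating = {"ss_dd_intermediate", "item"}

# Reported behavior: summing every alias counts store_sales too,
# so the map-join candidate is rejected.
buggy_small_total = sum(alias_to_size.values())

# Intended behavior: only sum the aliases used by the child task.
fixed_small_total = sum(alias_to_size[a] for a in participating)

assert buggy_small_total > SMALL_TABLE_THRESHOLD   # map join wrongly ruled out
assert fixed_small_total <= SMALL_TABLE_THRESHOLD  # map join correctly chosen
```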
Re: doc on predicate pushdown in joins
Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API, http://hive.apache.org/docs/r0.12.0/api/):
getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil
getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo. String representing the qualified type name.
getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename.
But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-HiveImplementation) Also, I moved the link on the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) from *Proposed* to *Other*. (It's called SQL Outer Join Predicate Pushdown Rules (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. Minor point: *Rule 1:* During *QBJoinTree* construction in Plan Gen, the parse Join Condition logic applies this rule. *Rule 2:* During *JoinPPD* (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) and the Joins doc (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization). Harish (or anyone else) would you please review the changes I made to the definition for Null Supplying table
[jira] [Commented] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845307#comment-13845307 ] Hive QA commented on HIVE-5936: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618158/HIVE-5936.9.patch.txt {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 4762 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_aggregator_error_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_publisher_error_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_publisher_error_2
{noformat}
Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/612/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/612/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated.
ATTACHMENT ID: 12618158 analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN
[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845334#comment-13845334 ] Hive QA commented on HIVE-5595: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618196/HIVE-5595.2.patch {color:green}SUCCESS:{color} +1 4764 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/613/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/613/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618196 Implement vectorized SMB JOIN - Key: HIVE-5595 URL: https://issues.apache.org/jira/browse/HIVE-5595 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch Original Estimate: 168h Remaining Estimate: 168h -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6007) Make the output of the reduce side plan optimized by the correlation optimizer more reader-friendly.
Yin Huai created HIVE-6007: -- Summary: Make the output of the reduce side plan optimized by the correlation optimizer more reader-friendly. Key: HIVE-6007 URL: https://issues.apache.org/jira/browse/HIVE-6007 Project: Hive Issue Type: Sub-task Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Because a MuxOperator can have multiple parents, the output of the plan can show the sub-plan starting from this MuxOperator multiple times, which makes the reduce side plan confusing. An example is shown in https://mail-archives.apache.org/mod_mbox/hive-user/201312.mbox/%3CCAO0ZKSjniR0z%2BOt4KWouq236fKXo%3D5nE_Oih7A87e3HiuBsG9w%40mail.gmail.com%3E. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: doc on predicate pushdown in joins
getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API): getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo String representing the qualified type name. getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation Also, I moved the link on the Design Docs page from Proposed to Other. (It's called SQL Outer Join Predicate Pushdown Rules which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. 
On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. Minor point: Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition logic applies this rule. Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) and the Joins doc (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization).
Harish (or anyone else) would you please review the changes I made to the definition for Null Supplying table (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-Definitions)? -- Lefty On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair the...@hortonworks.com wrote: :) On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz leftylever...@gmail.com wrote: Easy as 3.14159 (I can take a hint.) -- Lefty On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair the...@hortonworks.com wrote: FYI, Harish has written a very nice doc describing predicate push down rules for join. I have attached it to the design doc page. It will be very useful for anyone looking at joins. https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html (any help converting it to wiki format from html is welcome!).
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Affects Version/s: 0.13.0 Tags: UDF Fix Version/s: 0.13.0 Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5897) Fix hadoop2 execution environment Milestone 2
[ https://issues.apache.org/jira/browse/HIVE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5897: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Vikram! Fix hadoop2 execution environment Milestone 2 - Key: HIVE-5897 URL: https://issues.apache.org/jira/browse/HIVE-5897 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5897.4.patch, HIVE-5897.5.patch, HIVE-5897.patch, HIVE-5897.patch, HIVE-5897.patch Follow on to HIVE-5755. List of known issues: hcatalog-pig-adapter and ql need
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-common</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
hcatalog core and hbase storage handler need
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop-23.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-hs</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-tests</artifactId>
  <version>${hadoop-23.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
{noformat}
hcatalog core needs:
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
beeline needs
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845634#comment-13845634 ] Eric Hanson commented on HIVE-5996: --- -1 We should not be changing the output data types of expression results that are arguably reasonable. It causes code churn and can break existing apps. Having sum(bigint) return bigint is long-standing behavior in Hive and is reasonable. As a side note, SQL Server returns bigint for sum(bigint). If users need more digits, they can cast the input to sum to a decimal. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+--------+
| sum(l) |
+--------+
|   1221 |
+--------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows makes this unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
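For reference, signed 64-bit wrap-around (the mechanism the report suspects) can be sketched as below; this hedged illustration does not explain why two small rows overflowed, only what wrap-around looks like:

```python
# Hedged illustration of two's-complement 64-bit wrap-around, the mechanism
# the report suspects; it does not explain why two small rows overflowed.
def to_int64(n):
    """Reduce an arbitrary Python int to a signed 64-bit (Java long) value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

assert to_int64(666 + 555) == 1221          # in range: the expected sum
assert to_int64(2**63 - 1 + 1) == -(2**63)  # one past max wraps negative
```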
Re: Review Request 16146: HIVE-5993: JDBC Driver should not hard-code the database name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/#review30215 --- The patch looks fine. It would be better to avoid duplicating the code to call GetInfo(). jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java https://reviews.apache.org/r/16146/#comment57827 I think it would be better to add a helper method to call GetInfo() with a given InfoType; we'll end up duplicating this code in multiple places. - Prasad Mujumdar On Dec. 10, 2013, 12:54 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/ --- (Updated Dec. 10, 2013, 12:54 a.m.) Review request for hive and Prasad Mujumdar. Bugs: HIVE-5993 https://issues.apache.org/jira/browse/HIVE-5993 Repository: hive-git Description --- Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string "Hive". This should instead call the existing Hive-server2 api to return the db name. Incidentally, the server returns "Apache Hive". Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 1ba8ad3 jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 5087ded Diff: https://reviews.apache.org/r/16146/diff/ Testing --- Ran TestJdbcDriver2. Thanks, Szehon Ho
[jira] [Commented] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845646#comment-13845646 ] Prasad Mujumdar commented on HIVE-5993: --- [~szehon] I added a comment on the reviewboard. JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845650#comment-13845650 ] Thejas M Nair commented on HIVE-5996: - [~xuefuz] Can you please mark any jiras that make/propose such non-backward-compatible changes with the 'Incompatible change' flag? That would ensure that the community reviews such changes more carefully. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5996: Hadoop Flags: Incompatible change -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845655#comment-13845655 ] Harish Butani commented on HIVE-6004: - +1 Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845666#comment-13845666 ] Sergey Shelukhin commented on HIVE-5679: added extra date parsing to metastore itself add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5679: --- Attachment: HIVE-5679.01.patch add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845664#comment-13845664 ] Xuefu Zhang commented on HIVE-5996: --- Okay. Will do. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l                   bigint              None
hive> select * from test2;
OK
6666666666666666666
5555555555555555555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+----------------------+
| sum(l)               |
+----------------------+
| 12222222222222222221 |
+----------------------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16171: HIVE-5679 add date support to metastore JDO/SQL
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16171/ --- (Updated Dec. 11, 2013, 7:41 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- See JIRA Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 01c2626 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java a98d9d1 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 93e9942 metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 00e90cb ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 ql/src/test/queries/clientpositive/partition_date.q 3c031db ql/src/test/results/clientpositive/partition_date.q.out 3462a1b Diff: https://reviews.apache.org/r/16171/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-6008) optionally include metastore info into explain output
[ https://issues.apache.org/jira/browse/HIVE-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6008: --- Summary: optionally include metastore info into explain output (was: optionally include metastore path into explain output) optionally include metastore info into explain output - Key: HIVE-6008 URL: https://issues.apache.org/jira/browse/HIVE-6008 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Priority: Minor To verify some metastore perf improvements are working, it would be nice to (optionally) output basic description of what metastore did (whether filter was pushed down, whether sql was used) into explain output, and enable this option for some q tests (e.g. partition_date, and a few others), or all of them. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6008) optionally include metastore path into explain output
Sergey Shelukhin created HIVE-6008: -- Summary: optionally include metastore path into explain output Key: HIVE-6008 URL: https://issues.apache.org/jira/browse/HIVE-6008 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Priority: Minor To verify some metastore perf improvements are working, it would be nice to (optionally) output basic description of what metastore did (whether filter was pushed down, whether sql was used) into explain output, and enable this option for some q tests (e.g. partition_date, and a few others), or all of them. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845677#comment-13845677 ] Eric Hanson commented on HIVE-5356: --- Vectorization has to implement a specific semantics for an operation. So if the semantics change, the vectorized implementation of the operation must be changed too, or the operation could either fail to vectorize or give wrong results. Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs and java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive decimal type participates in these operations just like int or double. Different from double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. With the default of decimal without precision/scale, then (10, 0) will be the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. 
It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as its precision/scale, which might not be desirable. This needs to be documented. This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
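The distinction the description draws — a static return type versus type parameters carried by the value — can be seen in plain Java, outside Hive. The sketch below is illustrative (the class and method names are made up, not Hive's API): reflection on a stand-in evaluate() method yields only BigDecimal, with no (precision, scale) attached, while those parameters are recoverable only from a runtime value. This is why a GenericUDF, which returns an ObjectInspector from initialize(), can report exact decimal types when reflection cannot.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;

public class DecimalTypeParams {

    // Stand-in for an old-style UDF's evaluate() method.
    public static BigDecimal evaluate() {
        return new BigDecimal("12345.67"); // precision 7, scale 2
    }

    // All reflection can see: the class, with no precision/scale attached.
    public static String staticReturnType() {
        try {
            Method m = DecimalTypeParams.class.getMethod("evaluate");
            return m.getReturnType().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Precision and scale exist only on the runtime value.
    public static int[] runtimeTypeParams() {
        BigDecimal v = evaluate();
        return new int[] { v.precision(), v.scale() };
    }

    public static void main(String[] args) {
        System.out.println(staticReturnType());
        int[] ps = runtimeTypeParams();
        System.out.println("(" + ps[0] + ", " + ps[1] + ")");
    }
}
```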
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845682#comment-13845682 ] Sergey Shelukhin commented on HIVE-5356: Maybe some special test can be added for this? Run a set of simple queries with and without, and ensure results are the same. That will solve the problem before commits. Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs and java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive decimal type participates in these operations just like int or double. Different from double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. With the default of decimal without precision/scale, then (10, 0) will be the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. 
It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as its precision/scale, which might not be desirable. This needs to be documented. This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845711#comment-13845711 ] Hive QA commented on HIVE-5679: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618282/HIVE-5679.01.patch {color:green}SUCCESS:{color} +1 4762 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/614/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/614/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618282 add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6003) bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845714#comment-13845714 ] Thejas M Nair commented on HIVE-6003: - bq. I think it is sufficient to just remove HADOOP_CLIENT_OPTS from HADOOP_OPTS to make it work. That is what I am doing in the patch. bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS --- Key: HIVE-6003 URL: https://issues.apache.org/jira/browse/HIVE-6003 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6003.1.patch Hadoop (0.20.2, 1.x, 2.x) appends HADOOP_CLIENT_OPTS to HADOOP_OPTS. So it is unnecessary to have this statement in bin/hive under debug mode - export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" This causes HADOOP_CLIENT_OPTS to be appended twice, which produces this error in debug mode.
{code}
bin/hive --debug
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5872) Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types
[ https://issues.apache.org/jira/browse/HIVE-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845715#comment-13845715 ] Xuefu Zhang commented on HIVE-5872: --- Thanks for the review, Prasad. {quote} In general, it looks like we need more exception logic for handling decimals in UDAF (HIVE-5872, HIVE-5866). It might be useful to add a note in the dev guide for future work .. {quote} I assume you are referring to the following code snippet:
{code}
if (t == null) {
  return warnedOnceNullMapKey;
}
{code}
I agree with your assessment. Currently Hive emits null as the only error-handling option. Thus, null checks are (or are missed) everywhere in the code, not specific to decimal. In the long run, I agree we need better exception handling, especially when we introduce different error-handling modes (HIVE-5438). Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types --- Key: HIVE-5872 URL: https://issues.apache.org/jira/browse/HIVE-5872 Project: Hive Issue Type: Improvement Components: Types, UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5872.1.patch, HIVE-5872.2.patch, HIVE-5872.3.patch, HIVE-5872.4.patch, HIVE-5872.patch Currently UDAFs are still reporting the system default precision/scale (38, 18) for decimal results. Not only is this coarse, but it can also cause problems in subsequent operators such as division, where the result depends on the precision/scale of the input, which can go out of bounds (38,38). Thus, these UDAFs should correctly report the precision/scale of the result. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone
Johndee Burks created HIVE-6009: --- Summary: Add from_unixtime UDF that has controllable Timezone Key: HIVE-6009 URL: https://issues.apache.org/jira/browse/HIVE-6009 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.10.0 Environment: CDH4.4 Reporter: Johndee Burks Priority: Trivial Currently the from_unixtime UDF takes into account the timezone of the system doing the transformation. I think that implementation is good, but it would be nice to add to or change the current UDF to support a configurable timezone. It would be useful for looking at timestamp data from different regions in the native region's timezone. Example:
{code}
from_unixtime(unix_time, format, timezone)
from_unixtime(129384, 'dd MMM', 'GMT-5')
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
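The proposed three-argument semantics can be sketched with java.time; this is an illustrative re-derivation, not Hive's implementation, and the method name is made up. The key behavior the request asks for is that the zone is an explicit parameter instead of being taken from the JVM's default, so the same instant renders differently per region:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class FromUnixTimeTz {

    // Hypothetical from_unixtime(unix_time, format, timezone) semantics:
    // format the epoch-seconds instant in the caller-supplied zone,
    // rather than the system default zone.
    public static String fromUnixTime(long epochSeconds, String pattern, String zone) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .withZone(ZoneId.of(zone))
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        // The same instant rendered in two different zones.
        System.out.println(fromUnixTime(129384L, "dd MMM yyyy HH:mm", "UTC"));
        System.out.println(fromUnixTime(129384L, "dd MMM yyyy HH:mm", "GMT-5"));
    }
}
```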
Review Request 16184: Hive should be able to skip header and footer rows when reading data file for a table (HIVE-5795)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/ --- Review request for hive, Eric Hanson and Thejas Nair. Bugs: hive-5795 https://issues.apache.org/jira/browse/hive-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (follow up with review https://reviews.apache.org/r/15663/diff/#index_header) Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 conf/hive-default.xml.template c61a0bb data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml c3cbb89 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java d2b2526 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 974a5d6 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975 ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b ql/src/test/queries/clientnegative/file_with_header_footer_negative.q PRE-CREATION ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out PRE-CREATION ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION serde/if/serde.thrift 2ceb572 serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 22a6168 Diff: https://reviews.apache.org/r/16184/diff/ Testing --- Thanks, Shuaishuai Nie
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845731#comment-13845731 ] Shuaishuai Nie commented on HIVE-5795: -- Updated the code review at https://reviews.apache.org/r/16174/ Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch Hive should be able to skip header and footer lines when reading the data file for a table. In this way, users don't need to preprocess data generated by another application with a header or footer, and can directly use the file for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer should look like this:
{code}
create external table testtable (name string, message string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
location '/testtable'
tblproperties ("skip.header.number"="1", "skip.footer.number"="2");
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
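The footer part is the interesting bit, since a record reader streams forward and cannot know in advance which lines are the last F. A minimal, Hive-independent sketch of the idea (class and method names here are made up, not from the patch): drop the first H lines outright, and delay output by F lines through a bounded queue so the trailing footer lines are never emitted — no second pass over the file is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HeaderFooterReader {

    // Emit every line except the first 'header' and the last 'footer' ones,
    // in one forward pass: the queue delays output by 'footer' lines.
    public static List<String> readSkipping(Reader in, int header, int footer) {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            for (int i = 0; i < header; i++) {
                if (r.readLine() == null) return out; // fewer lines than the header
            }
            Deque<String> buf = new ArrayDeque<>();
            String line;
            while ((line = r.readLine()) != null) {
                buf.addLast(line);
                if (buf.size() > footer) out.add(buf.removeFirst());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out; // whatever is left in buf is the footer, dropped
    }

    public static void main(String[] args) {
        String data = "col_a\tcol_b\nr1\nr2\nr3\nfooter1\nfooter2\n";
        System.out.println(readSkipping(new StringReader(data), 1, 2));
    }
}
```

With header=1 and footer=2, the six-line input above yields only the three data rows.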
[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845745#comment-13845745 ] Jitendra Nath Pandey commented on HIVE-5979: Committed to trunk. Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
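The IllegalArgumentException in the trace comes from java.sql.Timestamp.setNanos, which only accepts a sub-second nanos field in [0, 999999999]. A hedged sketch of the conversion a helper like assignTimeInNanoSec has to perform (a plain-Java re-derivation, not Hive's actual code): split a signed nanoseconds-since-epoch value with floor semantics, so the nanos field stays in range even for instants before the epoch.

```java
import java.sql.Timestamp;

public class NanosToTimestamp {

    // Split signed nanoseconds-since-epoch into whole seconds plus a
    // sub-second nanos field guaranteed to lie in [0, 999999999].
    public static Timestamp fromNanos(long nanosSinceEpoch) {
        long seconds = Math.floorDiv(nanosSinceEpoch, 1_000_000_000L);
        int subNanos = (int) Math.floorMod(nanosSinceEpoch, 1_000_000_000L);
        Timestamp ts = new Timestamp(seconds * 1000L);
        ts.setNanos(subNanos); // rejects anything outside 0..999999999
        return ts;
    }

    public static void main(String[] args) {
        Timestamp t = fromNanos(1_500_000_000L); // 1.5 s after the epoch
        System.out.println(t.getTime() + " ms, nanos=" + t.getNanos());
        // Floor semantics keep nanos non-negative even before the epoch.
        System.out.println(fromNanos(-1L).getNanos());
    }
}
```

Using truncating `/` and `%` instead of floorDiv/floorMod would hand setNanos a negative value for pre-epoch instants, which is exactly the kind of out-of-range input the exception guards against.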
[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5979: --- Release Note: (was: Committed to trunk.) Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474) Caused 
by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached. Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5979: --- Resolution: Fixed Fix Version/s: 0.13.0 Release Note: Committed to trunk. Status: Resolved (was: Patch Available) Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845760#comment-13845760 ] Xuefu Zhang commented on HIVE-5996: --- {quote} Having sum(bigint) return bigint is long-standing behavior in Hive and is reasonable. As a side note, SQL Server returns bigint for sum(bigint). If users need more digits, they can cast the input to sum to a decimal. {quote} My concern is not about the number of digits that long can hold. Hive processes large numbers of rows that traditional DBs shy away from, and the chance of an overflow error is bigger. With the proposed change, Hive can guarantee 10B (or a certain number of) rows without worrying about this problem. Without it, Hive has no such guarantee, and two valid rows can overflow, as demonstrated in this JIRA. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l                   bigint              None
hive> select * from test2;
OK
6666666666666666666
5555555555555555555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+----------------------+
| sum(l)               |
+----------------------+
| 12222222222222222221 |
+----------------------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
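The wrap-around under discussion can be reproduced in a few lines of plain Java (the row values below are illustrative, not the ones from this JIRA): a long accumulator silently wraps past Long.MAX_VALUE, while a decimal accumulator — the kind of fix proposed in the patch — keeps the exact sum.

```java
import java.math.BigDecimal;

public class SumOverflow {

    public static long naiveSum(long[] xs) {
        long s = 0;
        for (long x : xs) s += x; // wraps silently past Long.MAX_VALUE
        return s;
    }

    public static BigDecimal decimalSum(long[] xs) {
        BigDecimal s = BigDecimal.ZERO;
        for (long x : xs) s = s.add(BigDecimal.valueOf(x)); // exact, never wraps
        return s;
    }

    public static void main(String[] args) {
        // Two in-range bigint values whose true sum exceeds Long.MAX_VALUE.
        long[] rows = { 6_660_000_000_000_000_000L, 5_550_000_000_000_000_000L };
        System.out.println(naiveSum(rows));   // negative: wrapped around
        System.out.println(decimalSum(rows)); // the exact 20-digit sum
    }
}
```

Each input fits comfortably in a bigint; only the sum overflows, which is why the bug shows up with just two rows.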
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845784#comment-13845784 ] Sergey Shelukhin commented on HIVE-5356: Filed HIVE-6010 Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs, and Java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive's decimal type participates in these operations just like int or double. Unlike double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. Because decimal defaults to no precision/scale, (10, 0) will be used as the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning an ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. It's worth mentioning that, for a user UDF implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as precision/scale, which might not be desirable. This needs to be documented. 
This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
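To make the motivation concrete, here is a plain-Java sketch of why the result's type params must be derived from the operand types rather than from evaluate()'s return type. It assumes the commonly cited SQL Server-style rule for decimal addition (p = max(p1-s1, p2-s2) + max(s1, s2) + 1, s = max(s1, s2)); the rule Hive actually adopts is defined by the patch, not here.

```java
import java.math.BigDecimal;

public class DecimalTypeParams {
    // Result type params for decimal(p1,s1) + decimal(p2,s2), following the
    // widely used SQL Server-style rule (assumed here for illustration; real
    // implementations also cap precision at a maximum, e.g. 38).
    static int[] addResultType(int p1, int s1, int p2, int s2) {
        int s = Math.max(s1, s2);
        int p = Math.max(p1 - s1, p2 - s2) + s + 1;
        return new int[] {p, s};
    }

    public static void main(String[] args) {
        // decimal(10,2) + decimal(5,4): integer part needs max(8,1) = 8 digits
        // plus one carry digit, fractional part max(2,4) = 4 digits -> (13,4).
        int[] t = addResultType(10, 2, 5, 4);
        System.out.println("decimal(" + t[0] + "," + t[1] + ")"); // decimal(13,4)

        // Reflection on evaluate()'s return type (a bare decimal) alone could
        // never recover these params -- they depend on the operand types.
        BigDecimal sum = new BigDecimal("99999999.99").add(new BigDecimal("9.9999"));
        System.out.println(sum); // 100000009.9899
    }
}
```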
[jira] [Created] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
Sergey Shelukhin created HIVE-6010: -- Summary: create a test that would ensure vectorization produces same results as non-vectorized execution Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845783#comment-13845783 ] Sergey Shelukhin commented on HIVE-6010: I will likely take it later this week if no one else takes it first. create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Component/s: Vectorization Tests create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16146: HIVE-5993: JDBC Driver should not hard-code the database name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/ --- (Updated Dec. 11, 2013, 10:33 p.m.) Review request for hive and Prasad Mujumdar. Changes --- Thanks for the suggestion. Refactored the getInfo logic into a single method. Bugs: HIVE-5993 https://issues.apache.org/jira/browse/HIVE-5993 Repository: hive-git Description --- Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string Hive. This should instead call the existing Hive-server2 api to return the db name. Incidentally, the server returns Apache Hive. Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 1ba8ad3 jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 5087ded Diff: https://reviews.apache.org/r/16146/diff/ Testing --- Ran TestJdbcDriver2. Thanks, Szehon Ho
[jira] [Updated] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-5993: Attachment: HIVE-5993.1.patch Incorporating review feedback. JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.1.patch, HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5521) Remove CommonRCFileInputFormat
[ https://issues.apache.org/jira/browse/HIVE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845825#comment-13845825 ] Jitendra Nath Pandey commented on HIVE-5521: +1 Remove CommonRCFileInputFormat -- Key: HIVE-5521 URL: https://issues.apache.org/jira/browse/HIVE-5521 Project: Hive Issue Type: Bug Components: File Formats, Vectorization Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5521.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6011) correlation optimizer unit tests are failing on tez
Ashutosh Chauhan created HIVE-6011: -- Summary: correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6011: --- Attachment: HIVE-6011-tez-branch.patch correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845856#comment-13845856 ] Steven Wong commented on HIVE-4574: --- [https://bugs.openjdk.java.net/browse/JDK-8028054] now says that the bug is fixed in 8 and backported to 7u60. XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0, 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can be accessed from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If JDK 7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
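A minimal sketch of the proposed synchronization (the helper class and lock name are hypothetical, not Hive's actual fix): every writeObject call funnels through one JVM-wide lock, so MethodFinder's static WeakHashMap is only ever touched by one thread at a time.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SafeXmlEncoding {
    // One JVM-wide lock serializes all XMLEncoder.writeObject calls, so the
    // non-thread-safe static map inside MethodFinder is never mutated from
    // two threads at once. (Hypothetical helper, for illustration only.)
    private static final Object ENCODER_LOCK = new Object();

    public static String encode(Object obj) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {
            XMLEncoder encoder = new XMLEncoder(out);
            encoder.writeObject(obj);
            encoder.close(); // flushes the XML footer
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = encode("hello");
        System.out.println(xml.contains("<string>hello</string>")); // true
    }
}
```

The coarse lock trades some throughput for safety, which is acceptable here because the JDK bug can otherwise hang the whole server.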
[jira] [Created] (HIVE-6012) restore backward compatibility of arithmetic operations
Thejas M Nair created HIVE-6012: --- Summary: restore backward compatibility of arithmetic operations Key: HIVE-6012 URL: https://issues.apache.org/jira/browse/HIVE-6012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5356 changed the behavior of some of the arithmetic operations, and the change is not backward compatible, as pointed out in this [jira comment|https://issues.apache.org/jira/browse/HIVE-5356?focusedCommentId=13813398page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13813398]
{code}
int / int = decimal
float / float = double
float * float = double
float + float = double
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: QuotedIdentifier.html Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Attachments: QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6013) Supporting Quoted Identifiers in Column Names
Harish Butani created HIVE-6013: --- Summary: Supporting Quoted Identifiers in Column Names Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Attachments: QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845878#comment-13845878 ] Gunther Hagleitner commented on HIVE-6011: -- LGTM +1 correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.1.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani reassigned HIVE-6013: --- Assignee: Harish Butani Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6014) Stage ids differ in the tez branch
Vikram Dixit K created HIVE-6014: Summary: Stage ids differ in the tez branch Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-3183) case expression should allow different types per ISO-SQL 2011
[ https://issues.apache.org/jira/browse/HIVE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845883#comment-13845883 ] Szehon Ho commented on HIVE-3183: - I guess we can resolve this one as duplicate, unless there is something in this JIRA not captured by the other? case expression should allow different types per ISO-SQL 2011 - Key: HIVE-3183 URL: https://issues.apache.org/jira/browse/HIVE-3183 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.8.0 Reporter: N Campbell Attachments: Hive-3183.patch.txt, udf_when_type_wrong2.q.out, udf_when_type_wrong3.q.out The ISO-SQL standard specification for CASE allows the specification to include different types in the WHEN and ELSE blocks, including this example, which mixes smallint and integer types: select case when vsint.csint is not null then vsint.csint else 1 end from cert.vsint vsint The Apache Hive docs do not state how it deviates from the standard or list any such restrictions, so it is unclear whether this is a bug or an enhancement. Many SQL applications mix types, so this seems to be a restrictive implementation if it is by design. Argument type mismatch '1': The expression after ELSE should have the same type as those after THEN: smallint is expected but int is found -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6014: - Status: Patch Available (was: Open) Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6014: - Attachment: HIVE-6014.1.patch Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845892#comment-13845892 ] Gunther Hagleitner commented on HIVE-6014: -- LGTM +1 Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-6011. -- Resolution: Fixed Committed to branch. Thanks Ashutosh! correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6011: - Fix Version/s: tez-branch correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: tez-branch Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
Sergey Shelukhin created HIVE-6015: -- Summary: vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845900#comment-13845900 ] Hive QA commented on HIVE-5993: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618310/HIVE-5993.1.patch {color:green}SUCCESS:{color} +1 4763 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/615/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/615/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618310 JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.1.patch, HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
adding ANSI flag for hive
Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use as an example; I am assuming ANSI SQL is meant). The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought into compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues such as double-quoted identifiers. In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to Hive to be able to force ANSI compliance. When this flag (or these flags) is not set, for example, int/int division could return double for backward compat/perf, vectorization could skip the special-case handling for division by zero, etc. Wdyt?
[jira] [Commented] (HIVE-6012) restore backward compatibility of arithmetic operations
[ https://issues.apache.org/jira/browse/HIVE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845916#comment-13845916 ] Sergey Shelukhin commented on HIVE-6012: I started a dev alias thread about having an ANSI flag to choose between the old Hive mode and an ANSI SQL mode restore backward compatibility of arithmetic operations --- Key: HIVE-6012 URL: https://issues.apache.org/jira/browse/HIVE-6012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5356 changed the behavior of some of the arithmetic operations, and the change is not backward compatible, as pointed out in this [jira comment|https://issues.apache.org/jira/browse/HIVE-5356?focusedCommentId=13813398page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13813398]
{code}
int / int = decimal
float / float = double
float * float = double
float + float = double
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Status: Patch Available (was: Open) vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Attachment: HIVE-6015.patch Small (logically) patch vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-6005) BETWEEN is broken after using KRYO
[ https://issues.apache.org/jira/browse/HIVE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-6005. Resolution: Duplicate HIVE-5263 appears to fix this. Can you try that patch? BETWEEN is broken after using KRYO -- Key: HIVE-6005 URL: https://issues.apache.org/jira/browse/HIVE-6005 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Eric Chu After taking in HIVE-1511, HIVE-5422, and HIVE-5257 on top of Hive 0.12 to use Kryo, queries with BETWEEN start to fail with the following exception: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantBooleanObjectInspector Serialization trace: argumentOIs (org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) filters (org.apache.hadoop.hive.ql.plan.JoinDesc) conf (org.apache.hadoop.hive.ql.exec.JoinOperator) reducer (org.apache.hadoop.hive.ql.plan.ReduceWork) at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097) at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109) at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526) ... A workaround is to replace BETWEEN with >= and <=, but I think this failure is a bug and not by design. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
Sushanth Sowmyan created HIVE-6016: -- Summary: Hadoop23Shims has a bug in listLocatedStatus impl. Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
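The failure mode generalizes to any lazily filtered iterator. Below is a self-contained sketch (names are hypothetical and unrelated to the actual Hadoop23Shims code) of the correct approach: when computing the next element, skip past filtered-out entries in a loop, rather than treating a single non-matching element as end-of-stream the way the buggy variant did.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class FilteredIteratorDemo {

    // Correct lazy filtering: advance past non-matching elements instead of
    // interpreting a single filtered-out element as the end of the iteration.
    static Iterator<String> filtered(Iterator<String> src, Predicate<String> keep) {
        return new Iterator<String>() {
            private String next = advance();

            // Scan forward until an element passes the filter, or the source
            // is genuinely exhausted. The buggy variant returned after one
            // src.next() call, so a filtered-out element looked like null/EOF.
            private String advance() {
                while (src.hasNext()) {
                    String candidate = src.next();
                    if (keep.test(candidate)) {
                        return candidate;
                    }
                }
                return null;
            }

            @Override
            public boolean hasNext() { return next != null; }

            @Override
            public String next() {
                String result = next;
                next = advance();
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Predicate<String> notHidden = s -> !s.startsWith("_");
        // All three orderings from the description must yield [a, b].
        for (List<String> in : Arrays.asList(
                Arrays.asList("a", "b", "_s"),
                Arrays.asList("a", "_s", "b"),
                Arrays.asList("_s", "a", "b"))) {
            List<String> out = new ArrayList<>();
            filtered(in.iterator(), notHidden).forEachRemaining(out::add);
            System.out.println(in + " -> " + out);
        }
    }
}
```

The key invariant is that null is produced only when the source is exhausted, which keeps hasNext()'s null check sound regardless of where the filtered-out entries sit.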
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845932#comment-13845932 ] Sushanth Sowmyan commented on HIVE-6016: Thanks for the correction, Prashanth, I've edited the bug report to remove that case. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6016: - Attachment: HIVE-6016.1.patch Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6016: --- Description: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. was: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. Hadoop23Shims has a bug in listLocatedStatus impl. 
-- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845931#comment-13845931 ] Prasanth J commented on HIVE-6016: -- There is a correction to the description. I think only (_s,a,b) is a problem. The logic fails to apply the PathFilter to the first file alone. The other cases work fine because there is a while loop in next() that keeps iterating to the next valid file by applying the filter. So in the case of (a,_s,b), the first file is a, for which no filter is applied. For the next file, _s, the filter is applied and next becomes null, but the while loop continues to the next valid file, which is b. So finally only (a,b) is returned. The iterator will not return null in any case. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prasanth and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). 
Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
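The corrected behavior described in the comment above can be sketched as a plain-Java filtering iterator that pre-fetches the next matching element, applying the filter before the first element is ever returned. This is a standalone illustration with hypothetical names, not the actual Hadoop23Shims code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Sketch (not Hadoop23Shims itself) of a filtering iterator that applies
// the filter to EVERY element, including the first one -- the boundary
// case the original implementation missed for inputs like (_s,a,b).
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Predicate<T> filter;
    private T next; // pre-fetched next matching element, or null when exhausted

    FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
        this.inner = inner;
        this.filter = filter;
        advance(); // apply the filter before the first next() call
    }

    private void advance() {
        next = null;
        while (inner.hasNext()) {
            T candidate = inner.next();
            if (filter.test(candidate)) { // skip filtered-out entries
                next = candidate;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public T next() {
        if (next == null) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        // (_s,a,b) with a filter dropping names starting with '_'
        List<String> files = Arrays.asList("_s", "a", "b");
        FilteringIterator<String> it =
            new FilteringIterator<>(files.iterator(), f -> !f.startsWith("_"));
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) out.append(it.next()).append(",");
        System.out.println(out); // a,b,
    }
}
```

Because the constructor pre-fetches, hasNext() never has to peek at an unfiltered value, so a leading _s is dropped just like one in the middle or at the end.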
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6016: --- Description: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. was: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. Hadoop23Shims has a bug in listLocatedStatus impl. 
-- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845936#comment-13845936 ] Prasanth J commented on HIVE-6016: -- This should fix hcatalog unit test failure TestOrcDynamicPartitioned in hadoop2. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Labels: vectorization (was: ) vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: vectorization Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6016: - Status: Patch Available (was: Open) Marking it as Patch Available for precommit tests. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prasanth and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845946#comment-13845946 ] Sushanth Sowmyan commented on HIVE-6016: Patch looks good to me. +1. Paging [~ashutoshc]/[~owen.omalley] for another review. :) Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-6010: -- Assignee: Sergey Shelukhin create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: adding ANSI flag for hive
Having too many configs complicates things for the user, and also complicates the code, and you also end up having many untested combinations of config flags. I think we should identify a bunch of incompatible changes that we think are important, fix them in a branch, and make a major version release (say 1.x). This is also related to HIVE-5875, where there is a discussion on switching the defaults for some of the configs to more desirable, but not backward compatible, values. On Wed, Dec 11, 2013 at 4:33 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use as an example; I am assuming ANSI SQL is meant). The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought into compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues such as double-quoted identifiers. In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to Hive to be able to force ANSI compliance. When this flag (or these flags) is not set, for example, int/int division could return double for backward compat/perf, vectorization can skip the special-case handling for division by zero, etc. Wdyt? -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. 
If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
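The semantic differences discussed in the thread (int/int division and division-by-zero handling) can be seen in plain Java. This sketch is illustrative only, not Hive code; it shows why an engine must pick a convention for these cases, and hence why a compliance flag changes results:

```java
// Plain-Java illustration of the division semantics the thread is about.
// (Not Hive code; Hive and MySQL pick their own conventions, e.g. MySQL
// returns NULL on division by zero.)
public class DivisionSemantics {
    public static void main(String[] args) {
        // int/int truncates: 7 / 2 == 3, whereas SQL-standard exact-numeric
        // division would yield 3.5.
        System.out.println(7 / 2);        // 3
        System.out.println(7 / 2.0);      // 3.5

        // Division by zero: doubles follow IEEE 754 and produce Infinity/NaN,
        // while integer division throws. This is the "special case handling"
        // vectorized execution would have to emulate or skip.
        System.out.println(1.0 / 0.0);    // Infinity
        System.out.println(0.0 / 0.0);    // NaN
        try {
            System.out.println(1 / 0);
        } catch (ArithmeticException e) {
            System.out.println("int 1/0 throws: " + e.getMessage()); // / by zero
        }
    }
}
```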
Re: doc on predicate pushdown in joins
Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. But this is the Design doc (unless there's another one somewhere -- maybe attached to a JIRA ticket?) and it's in the Resources for Contributors part of the wiki, so it seems appropriate to me. I'll delete the implementation section if that's your preference. Here are the links again, with fixes: - Design Docs https://cwiki.apache.org/confluence/display/Hive/DesignDocs (bottom of list) - Predicate Pushdown Rules https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-PredicatePushdownRules Speaking of JIRA tickets, is there one for this and should I add any version information? -- Lefty On Wed, Dec 11, 2013 at 7:59 AM, Harish Butani hbut...@hortonworks.com wrote: getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? 
It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API http://hive.apache.org/docs/r0.12.0/api/):
- *getQMap()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hadoop/hive/ql/QTestUtil.html#getQMap()) - Method in class org.apache.hadoop.hive.ql.QTestUtil
- *getQualifiedName()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.html#getQualifiedName()) - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo. String representing the qualified type name.
- *getQualifiers()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hive/service/cli/thrift/TTypeQualifiers.html#getQualifiers()) - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
- *getQualifiersSize()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hive/service/cli/thrift/TTypeQualifiers.html#getQualifiersSize()) - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? 
Hive Implementation https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-HiveImplementation Also, I moved the link on the Design Docs page https://cwiki.apache.org/confluence/display/Hive/DesignDocs from *Proposed* to *Other*. (It's called SQL Outer Join Predicate Pushdown Rules https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845963#comment-13845963 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618322/HIVE-6013.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 4768 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_alter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_smb org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_columns {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/616/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/616/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12618322 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions(as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. 
- At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845975#comment-13845975 ] Hive QA commented on HIVE-6014: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618323/HIVE-6014.1.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/617/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/617/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-617/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java' Reverted 'itests/qtest/pom.xml' Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/assembly/target shims/0.20S/target shims/0.23/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientpositive/quotedid_alter.q.out ql/src/test/results/clientpositive/quotedid_partition.q.out ql/src/test/results/clientpositive/quotedid_basic.q.out ql/src/test/results/clientpositive/quotedid_skew.q.out ql/src/test/results/clientpositive/quotedId_smb.q.out ql/src/test/queries/clientpositive/quotedId_alter.q ql/src/test/queries/clientpositive/quotedId_skew.q 
ql/src/test/queries/clientpositive/quotedid_basic.q ql/src/test/queries/clientpositive/quotedid_partition.q ql/src/test/queries/clientpositive/quotedId_smb.q + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1550329. At revision 1550329. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12618323 Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: HIVE-6002.1.patch Bumped the ORC write version number to 0.12.1. [~owen.omalley] Can you please review this change? Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: (was: HIVE-6002.1.patch) Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: HIVE-6002.1.patch Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5975) [WebHCat] templeton mapreduce job failed if provide define parameters
[ https://issues.apache.org/jira/browse/HIVE-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845986#comment-13845986 ] Eugene Koifman commented on HIVE-5975: -- +1 [WebHCat] templeton mapreduce job failed if provide define parameters --- Key: HIVE-5975 URL: https://issues.apache.org/jira/browse/HIVE-5975 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0, 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: hive-5975.2.patch, hive-5975.patch Trying to submit a mapreduce job through templeton failed: curl -k -u user:pass -d user.name=user -d define=JobName=MRPiJob -d class=pi -d arg=16 -d arg=100 -d jar=hadoop-mapreduce-examples.jar https://xxx/templeton/v1/mapreduce/jar The error message is: Usage: org.apache.hadoop.examples.QuasiMonteCarlo nMaps nSamples Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath. -archives comma separated list of archives specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] templeton: job failed with exit code 2 Note that if we remove the define parameter it works fine. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
Eric Hanson created HIVE-6017: - Summary: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive Key: HIVE-6017 URL: https://issues.apache.org/jira/browse/HIVE-6017 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura. This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also mutable so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
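The Decimal128 API itself is not shown in this issue, but the mutability argument can be illustrated with a small sketch: an immutable type like java.math.BigDecimal allocates a new object on every operation, while a mutable accumulator updates one object in place. The MutableSum class below is a hypothetical stand-in for the update-in-place style, not the actual Decimal128 API.

```java
import java.math.BigDecimal;

public class MutableVsImmutable {
    // Hypothetical mutable accumulator, standing in for Decimal128's
    // update-in-place style; NOT the real Decimal128 API.
    static final class MutableSum {
        private long value; // fixed-width state, reused across updates
        void addDestructive(long x) { value += x; } // mutates this object
        long get() { return value; }
    }

    public static void main(String[] args) {
        // Immutable style: each add() returns a brand-new BigDecimal,
        // so aggregating a million rows means a million allocations.
        BigDecimal immutable = BigDecimal.ZERO;
        for (int i = 1; i <= 1000; i++) {
            immutable = immutable.add(BigDecimal.valueOf(i)); // new object per row
        }

        // Mutable style: one object reused for the whole aggregation.
        MutableSum mutable = new MutableSum();
        for (int i = 1; i <= 1000; i++) {
            mutable.addDestructive(i); // no allocation per row
        }

        System.out.println(immutable);     // 500500
        System.out.println(mutable.get()); // 500500
    }
}
```

This allocation difference, not the arithmetic itself, is where the reduced new() and garbage-collection cost described above comes from.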
[jira] [Commented] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845998#comment-13845998 ] Prasanth J commented on HIVE-6002: -- Do we need to discard the 0.12 version completely? It is no longer valid, but the config option still allows users to specify it. In that case, can we forcefully bump the version to 0.12.1? Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of the original data written with ORC write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 from the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Attachment: HIVE-5936.10.patch.txt Fixed error message analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Status: Open (was: Patch Available) analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Status: Patch Available (was: Open) analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: doc on predicate pushdown in joins
I see. Let's leave it in. This is old code, hard to attribute to jiras: - The PPD code comes from: HIVE-279, HIVE-2337 - I cannot tell when the join condition parsing code was added. regards, Harish. On Dec 11, 2013, at 5:17 PM, Lefty Leverenz leftylever...@gmail.com wrote: Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. But this is the Design doc (unless there's another one somewhere -- maybe attached to a JIRA ticket?) and it's in the Resources for Contributors part of the wiki, so it seems appropriate to me. I'll delete the implementation section if that's your preference. Here are the links again, with fixes: Design Docs (bottom of list) Predicate Pushdown Rules Speaking of JIRA tickets, is there one for this and should I add any version information? -- Lefty On Wed, Dec 11, 2013 at 7:59 AM, Harish Butani hbut...@hortonworks.com wrote: getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API): getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo String representing the qualified type name. getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers Most mysterious. 
-- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation Also, I moved the link on the Design Docs page from Proposed to Other. (It's called SQL Outer Join Predicate Pushdown Rules which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn' ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function. ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) { -- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. 
Minor point: Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition logic applies this rule. Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) page and the Joins
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846029#comment-13846029 ] Sergey Shelukhin commented on HIVE-6010: I looked at CliDriver generation/code/flow... the plan is as such (this can also be used for other stuff later if needed). There will be new CliDriver template, called TestCompareCliDriver, with separate set of .q files. Unlike normal CliDriver, it will not use .out files; instead, there will be multiple .qv (query version) initialization files; I haven't decided yet whether these should be a set per query (q file), or a set applied to all queries. The latter is simpler and solves the problem for vectorization, but the former may make sense for other things, esp. if we need to compare more things, Nqv x Nq combinations to run will quickly become ugly. Perhaps per-query qv files can be added when needed. The test, for each of its q files, will concatenate all the requisite qv files in turn with the q file, run each of the resulting queries w/different output files, and diff the outputs with each other. It will fail if they don't match. So, for vectorization we can have some simple queries (arithmetics, functions, etc.), with qv files being one-liners to enable and disable vectorization. [~ehans] [~jnp] opinions? create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
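The proposed flow (prepend each .qv version prelude to the .q file, execute each combined script, then diff the outputs against one another) can be sketched generically. The executor below is a placeholder lambda, not Hive's actual CliDriver, and runAndCompare is a hypothetical name for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CompareDriverSketch {
    // Runs the query once per version prelude and checks all outputs agree,
    // mirroring the proposed TestCompareCliDriver failure condition.
    static String runAndCompare(List<String> preludes, String query,
                                Function<String, String> executor) {
        List<String> outputs = new ArrayList<>();
        for (String prelude : preludes) {
            // Concatenate the .qv prelude with the .q file, as proposed.
            outputs.add(executor.apply(prelude + "\n" + query));
        }
        for (String out : outputs) {
            if (!out.equals(outputs.get(0))) {
                throw new AssertionError("outputs diverge across versions");
            }
        }
        return outputs.get(0);
    }

    public static void main(String[] args) {
        // Toy executor: ignores the prelude, so all outputs match.
        Function<String, String> exec = script -> "result";
        List<String> preludes = List.of(
            "set hive.vectorized.execution.enabled=true;",
            "set hive.vectorized.execution.enabled=false;");
        System.out.println(runAndCompare(preludes, "select 1;", exec));
    }
}
```

With real execution plugged in, the two preludes above would correspond to the one-liner .qv files that enable and disable vectorization.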
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846030#comment-13846030 ] Sergey Shelukhin commented on HIVE-6010: Actually, this can also be used instead of VerifyingObjectStore to verify MetaStoreDirectSql matches JDO, come to think of it. Will reduce the coverage but also remove the crutch from that part of the code. create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846033#comment-13846033 ] Thejas M Nair commented on HIVE-5996: - I am curious what the datatype of sum(l) is in MySQL, where l is a bigint. Is it also using decimal? Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens only with two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+---------+
| sum(l)  |
+---------+
|    1221 |
+---------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows; overflowing with only two rows is unacceptable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
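As the description notes, 666 + 555 is nowhere near the 64-bit range, so plain wrap-around alone does not explain the two-row result; still, the suspected failure mode itself is easy to demonstrate. The sketch below shows silent long wrap-around and one way a long-based sum can at least detect it (Math.addExact); it is an illustration, not Hive's actual sum UDAF.

```java
public class LongSumOverflow {
    public static void main(String[] args) {
        // Silent wrap-around: the failure mode suspected in the report.
        long wrapped = Long.MAX_VALUE + 1; // wraps to Long.MIN_VALUE
        System.out.println(wrapped == Long.MIN_VALUE); // true

        // Math.addExact throws instead of wrapping, making overflow detectable.
        try {
            Math.addExact(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }

        // The two-row case from the report does not overflow at all,
        // which is why the -6224521851487329395 result is so surprising.
        System.out.println(666L + 555L); // 1221
    }
}
```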
[jira] [Created] (HIVE-6018) FetchTask should not reference metastore classes
Navis created HIVE-6018: --- Summary: FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6018: Status: Patch Available (was: Open) FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6018: Attachment: HIVE-6018.1.patch.txt FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
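The NoClassDefFoundError arises because PartitionDesc links statically against MetaStoreUtils, so merely loading PartitionDesc on a classpath without the metastore jars fails. One common decoupling tactic (a sketch of the general technique, not the actual HIVE-6018 patch) is to resolve optional classes reflectively, which turns the hard link-time failure into a catchable ClassNotFoundException:

```java
public class LazyClassLookup {
    // Attempts to load a class by name without creating a static,
    // link-time dependency on it. Returns null when the class is absent.
    static Class<?> tryLoad(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // A reflective lookup fails softly here; a static reference
            // would instead surface later as NoClassDefFoundError.
            return null;
        }
    }

    public static void main(String[] args) {
        // Present on every JVM:
        System.out.println(tryLoad("java.util.ArrayList") != null); // true
        // Hypothetical absent class, standing in for metastore classes
        // missing from an execution-side classpath:
        System.out.println(tryLoad("com.example.Missing") == null); // true
    }
}
```

The class name "com.example.Missing" is a made-up placeholder; the point is only that the caller can decide what to do when the dependency is unavailable, instead of failing at class-loading time.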
[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846046#comment-13846046 ] Phabricator commented on HIVE-2093: --- thejas has commented on the revision HIVE-2093 [jira] create/drop database should populate inputs/outputs and check concurrency and user permission. +1 REVISION DETAIL https://reviews.facebook.net/D12807 To: JIRA, navis Cc: thejas create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: Bug Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)