[jira] [Commented] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031801#comment-14031801
 ] 

Hive QA commented on HIVE-7228:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650319/HIVE-7228.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/469/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/469/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-469/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650319

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output of a 
 process) with the output stream of a Session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream to 
 the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD. 
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished execution by the time the 
 calling thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that, without the fix, TestCliDriverMethods#testRun failed 
 intermittently (roughly 1 in 30 runs). The test does not fail with this fix.
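 For illustration, here is a minimal, hypothetical sketch of the pattern the fix 
 describes (this is not the actual StreamPrinter/SessionState code; the names 
 and buffers are made up). The point is the join() call before the caller reads 
 the captured output:
 {code}
 // Hypothetical example only -- illustrates join-before-read, not Hive's code.
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 
 public class PipeJoinExample {
   public static void main(String[] args) throws Exception {
     InputStream processOutput = new ByteArrayInputStream("some output".getBytes());
     ByteArrayOutputStream sessionOut = new ByteArrayOutputStream();
 
     // Copier thread, analogous to the transfer a StreamPrinter performs.
     Thread printer = new Thread(() -> {
       byte[] buf = new byte[1024];
       int n;
       try {
         while ((n = processOutput.read(buf)) != -1) {
           sessionOut.write(buf, 0, n);
         }
       } catch (IOException e) {
         throw new RuntimeException(e);
       }
     });
     printer.start();
 
     // Without this join, the caller might read sessionOut before the copy is done.
     printer.join();
 
     System.out.println(sessionOut.toString());
   }
 }
 {code}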



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez

2014-06-15 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031804#comment-14031804
 ] 

Gunther Hagleitner commented on HIVE-7212:
--

No new failures.

 Use resource re-localization instead of restarting sessions in Tez
 --

 Key: HIVE-7212
 URL: https://issues.apache.org/jira/browse/HIVE-7212
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch


 scriptfile1.q is failing on Tez because of a recent breakage in localization. 
 On top of that we're currently restarting sessions if the resources have 
 changed. (add file/add jar/etc). Instead of doing this we should just have 
 tez relocalize these new resources. This way no session/AM restart is 
 required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6385) UDF degrees() doesn't take decimal as input

2014-06-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031811#comment-14031811
 ] 

Lefty Leverenz commented on HIVE-6385:
--

[~lars_francke] documented this in the wiki in February 2014 (and I added 
version information in March):

* [UDFs -- Mathematical Functions | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-MathematicalFunctions]
* [doc diffs for HIVE-6385 (degrees) and other Hive 0.13.0 jiras | 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362046selectedPageVersions=79selectedPageVersions=77]

 UDF degrees() doesn't take decimal as input
 ---

 Key: HIVE-6385
 URL: https://issues.apache.org/jira/browse/HIVE-6385
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
  Labels: TODOC13
 Fix For: 0.13.0

 Attachments: HIVE-6385.patch


 HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical 
 UDFs, including radians(). However, such support is still missing for UDF 
 degrees(). This fills the gap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6385) UDF degrees() doesn't take decimal as input

2014-06-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031814#comment-14031814
 ] 

Lefty Leverenz commented on HIVE-6385:
--

Also documented in Data Types:

* [Hive Data Types -- Mathematical UDFs | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-MathematicalUDFs]
 

 UDF degrees() doesn't take decimal as input
 ---

 Key: HIVE-6385
 URL: https://issues.apache.org/jira/browse/HIVE-6385
 Project: Hive
  Issue Type: Improvement
  Components: UDF
Affects Versions: 0.12.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.13.0

 Attachments: HIVE-6385.patch


 HIVE-6246 and HIVE-6327 added decimal support in most of the mathematical 
 UDFs, including radians(). However, such support is still missing for UDF 
 degrees(). This fills the gap.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2014-06-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-3976:
-

Labels:   (was: TODOC13)

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor, Types
Affects Versions: 0.11.0
Reporter: Mark Grover
Assignee: Xuefu Zhang
 Fix For: 0.13.0

 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, 
 HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, 
 HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, 
 HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff


 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.
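 As a sketch of what the corresponding Hive DDL could look like (hypothetical 
 table name; this mirrors the MySQL example above):
 {code}
  -- Hypothetical table: a DECIMAL column with precision 20 and scale 2.
  CREATE TABLE hive_numbers (a DECIMAL(20,2));
  -- A bounded decimal cast would follow the same syntax.
  SELECT CAST('123456.78' AS DECIMAL(20,2)) FROM hive_numbers;
 {code}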



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2014-06-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031816#comment-14031816
 ] 

Lefty Leverenz commented on HIVE-3976:
--

[~lars_francke] documented this in the wiki:

* [Hive Data Types -- Decimals | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-Decimals]
* [doc diffs for HIVE-3976 | 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27838462selectedPageVersions=26selectedPageVersions=25]

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor, Types
Affects Versions: 0.11.0
Reporter: Mark Grover
Assignee: Xuefu Zhang
 Fix For: 0.13.0

 Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, 
 HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, 
 HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, 
 HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff


 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2014-06-15 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031824#comment-14031824
 ] 

Lefty Leverenz commented on HIVE-1466:
--

[~prasadm] documented this in the DDL and DML wikidocs:

* [DDL:  Create Table (row_format) | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTable]
* [DDL:  Row Format, Storage Format, and SerDe | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormat,StorageFormat,andSerDe]
** [DDL doc diffs for HIVE-1466 | 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362034selectedPageVersions=72selectedPageVersions=71]
* [DML:  Writing data into the filesystem from queries | 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintothefilesystemfromqueries]
** [DML doc diffs for HIVE-1466 | 
https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=27362036selectedPageVersions=21selectedPageVersions=20]

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
  Labels: TODOC13
 Fix For: 0.13.0

 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.
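 For illustration, a hypothetical sketch of where such a clause could sit in the 
 DDL (table name and delimiter choices are made up):
 {code}
  -- Hypothetical table: the null representation is declared in ROW FORMAT.
  CREATE TABLE null_demo (a STRING, b INT)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    NULL DEFINED AS '\N'
  STORED AS TEXTFILE;
 {code}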



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-1466) Add NULL DEFINED AS to ROW FORMAT specification

2014-06-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-1466:
-

Labels:   (was: TODOC13)

 Add NULL DEFINED AS to ROW FORMAT specification
 ---

 Key: HIVE-1466
 URL: https://issues.apache.org/jira/browse/HIVE-1466
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Adam Kramer
Assignee: Prasad Mujumdar
 Fix For: 0.13.0

 Attachments: HIVE-1466.1.patch, HIVE-1466.2.patch


 NULL values are passed to transformers as a literal backslash and a literal 
 N. NULL values are saved when INSERT OVERWRITing LOCAL DIRECTORies as NULL. 
 This is inconsistent.
 The ROW FORMAT specification of tables should be able to specify the manner 
 in which a null character is represented. ROW FORMAT NULL DEFINED AS '\N' or 
 '\003' or whatever should apply to all instances of table export and saving.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Documentation Policy

2014-06-15 Thread Lefty Leverenz

 Should we create JIRA for these so that the work to be done on these does
 not get lost?


... or should we schedule a doc blitz to take care of as many as possible
right away?  (Inclusive OR.)

-- Lefty


On Sat, Jun 14, 2014 at 10:35 PM, kulkarni.swar...@gmail.com 
kulkarni.swar...@gmail.com wrote:

 A few more from older releases:

 *0.10*:

 https://issues.apache.org/jira/browse/HIVE-2397?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC10%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

 *0.11:*

 https://issues.apache.org/jira/browse/HIVE-3073?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC11%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

 *0.12:*

 https://issues.apache.org/jira/browse/HIVE-5161?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC12%20AND%20status%20in%20(Resolved%2C%20Closed)%20ORDER%20BY%20priority%20DESC

 Should we create  JIRA for these so that the work to be done on these does
 not get lost?



 On Fri, Jun 13, 2014 at 5:59 PM, Lefty Leverenz leftylever...@gmail.com
 wrote:

  Agreed, deleting TODOC## simplifies the labels field, so we should just
 use
  comments to keep track of docs done.
 
  Besides, doc tasks can get complicated -- my gmail inbox has a few
 messages
  with simultaneous done and to-do labels -- so comments are best for
  tracking progress.  Also, as Szehon noticed, links in the comments make
 it
  easy to find the docs.
 
  +1 on (a):  delete TODOCs when done; don't add any new labels.
 
  -- Lefty
 
 
  On Fri, Jun 13, 2014 at 1:31 PM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
   +1 on deleting the TODOC tag as I think it's assumed by default that
 once
   an enhancement is done, it will be doc'ed. We may consider adding an
   additional docdone tag but I think we can instead just wait for a +1
  from
   the contributor that the documentation is satisfactory (and assume an
   implicit +1 for no reply) before deleting the TODOC tag.
  
  
   On Fri, Jun 13, 2014 at 1:32 PM, Szehon Ho sze...@cloudera.com
 wrote:
  
Yea, I'd imagine the TODOC tag pollutes the query of TODOC's and
  confuses
 the state of a JIRA, so it's probably best to remove it.
   
The idea of docdone is to query what docs got produced and needs
   review?
 It might be nice to have a tag for that, to easily signal to
  contributor
or interested parties to take a look.
   
On a side note, I already find very helpful your JIRA comments with
  links
to doc-wikis, both to inform the contributor and just as reference
 for
others.  Thanks again for the great work.
   
   
On Fri, Jun 13, 2014 at 1:33 AM, Lefty Leverenz 
  leftylever...@gmail.com
   
wrote:
   
 One more question:  what should we do after the documentation is
 done
for a
 JIRA ticket?

 (a) Just remove the TODOC## label.
 (b) Replace TODOC## with docdone (no caps, no version number).
 (c) Add a docdone label but keep TODOC##.
 (d) Something else.


 -- Lefty


 On Thu, Jun 12, 2014 at 12:54 PM, Brock Noland br...@cloudera.com
 
wrote:

  Thank you guys! This is great work.
 
 
  On Wed, Jun 11, 2014 at 6:20 PM, kulkarni.swar...@gmail.com 
  kulkarni.swar...@gmail.com wrote:
 
   Going through the issues, I think overall Lefty did an awesome
  job
  catching
   and documenting most of them in time. Following are some of the
   0.13
 and
   0.14 ones which I found which either do not have documentation
 or
have
   outdated one and probably need one to be consumeable.
  Contributors,
 feel
   free to remove the label if you disagree.
  
   *TODOC13:*
  
  
 

   
  
 
 https://issues.apache.org/jira/browse/HIVE-6827?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC13%20AND%20status%20in%20(Resolved%2C%20Closed)
  
   *TODOC14:*
  
  
 

   
  
 
 https://issues.apache.org/jira/browse/HIVE-6999?jql=project%20%3D%20HIVE%20AND%20labels%20%3D%20TODOC14%20AND%20status%20in%20(Resolved%2C%20Closed)
  
   I'll continue digging through the queue going backwards to 0.12
  and
 0.11
   and see if I find similar stuff there as well.
  
  
  
   On Wed, Jun 11, 2014 at 10:36 AM, kulkarni.swar...@gmail.com 
   kulkarni.swar...@gmail.com wrote:
  
 Feel free to label such jiras with this keyword and ask the
   contributors
for more information if you need any.
   
Cool. I'll start chugging through the queue today adding
 labels
   as
 apt.
   
   
On Tue, Jun 10, 2014 at 9:45 PM, Thejas Nair 
the...@hortonworks.com
 
wrote:
   
 Shall we lump 0.13.0 and 0.13.1 doc tasks as TODOC13?
Sounds good to me.
   
 

[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031827#comment-14031827
 ] 

Hive QA commented on HIVE-5771:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650333/HIVE-5771.12.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 5615 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_25
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_views
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cross_product_check_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_exists
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/470/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/470/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-470/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650333

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
 HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
 HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, 
 HIVE-5771.patch.javaonly


 Currently there is no constant folding/propagation optimizer; all expressions 
 are evaluated at runtime. 
 HIVE-2470 did a great job of evaluating constants in the UDF initialization 
 phase; however, that is still a runtime evaluation, and it doesn't propagate 
 constants from a subquery to the outside.
 Introducing such an optimizer may reduce I/O and speed up processing.
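 As a hypothetical example of the kind of rewrite such an optimizer could do 
 (the table t(a int, b int) is made up):
 {code}
  -- Without the optimizer, the arithmetic and both predicates are evaluated per
  -- row at runtime. With constant propagation, a = 5 from the subquery lets the
  -- outer filter become b = 5, and 1 + 2 folds to 3 at compile time.
  SELECT b + (1 + 2)
  FROM (SELECT a, b FROM t WHERE a = 5) s
  WHERE b = a;
 {code}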



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5607) Hive fails to parse the % (mod) sign after brackets.

2014-06-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-5607:
-

Labels: TODOC14  (was: )

 Hive fails to parse the % (mod) sign after brackets.
 --

 Key: HIVE-5607
 URL: https://issues.apache.org/jira/browse/HIVE-5607
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: dima machlin
Assignee: Xuefu Zhang
Priority: Minor
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-5607.1.patch, HIVE-5607.patch


 the scenario:
 create table t(a int);
 select * from t order by (a)%7;
 will fail with the following exception:
 FAILED: ParseException line 1:28 mismatched input '%' expecting <EOF> near ')'
 I must mention that this *does* work in 0.7.1 and doesn't work in 0.10



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6810) Provide example and update docs to show use of back tick when doing SHOW GRANT

2014-06-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6810:
-

Labels: TODOC12  (was: )

 Provide example and update docs to show use of back tick when doing SHOW GRANT
 --

 Key: HIVE-6810
 URL: https://issues.apache.org/jira/browse/HIVE-6810
 Project: Hive
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 0.12.0
Reporter: Udai Kiran Potluri
  Labels: TODOC12

 The docs at 
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization#LanguageManualAuthorization-ViewingGrantedPrivileges
 do not show an example or mention the need to use the back tick (`) character, 
 especially when there are special characters. Per HIVE-2074, all GRANT/REVOKE 
 statements need a back tick character when the name contains a '-'. Similarly 
 with SHOW GRANT USER, back ticks are needed if the user id contains a '.'.
 For example: SHOW GRANT USER `abc.xyz` ON TABLE mock_opt; 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6684) Beeline does not accept comments that are preceded by spaces

2014-06-15 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-6684:
-

Labels: TODOC14  (was: )

 Beeline does not accept comments that are preceded by spaces
 

 Key: HIVE-6684
 URL: https://issues.apache.org/jira/browse/HIVE-6684
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.10.0
Reporter: Jeremy Beard
  Labels: TODOC14
 Fix For: 0.14.0

 Attachments: HIVE-6684.1.patch, HIVE-6684.2.patch


 Beeline throws an error if single-line comments are indented with spaces. 
 This works in the embedded Hive CLI.
 For example:
 SELECT
-- this is the field we want
field
 FROM
table;
 Error: Error while processing statement: FAILED: ParseException line 1:71 
 cannot recognize input near '<EOF>' '<EOF>' '<EOF>' in select clause 
 (state=42000,code=4)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-15 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Attachment: HIVE-7159.5.patch

.5 fixes some of the failures.

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.
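 Spelled out as HiveQL (hypothetical tables A(x int) and B(y int)), the 
 transformation amounts to:
 {code}
  -- Original inner join:
  SELECT * FROM A JOIN B ON A.x = B.y;
 
  -- Equivalent form with the 'is not null' predicates pushed to the sources,
  -- so null-keyed rows are dropped before the shuffle:
  SELECT *
  FROM (SELECT * FROM A WHERE x IS NOT NULL) a
  JOIN (SELECT * FROM B WHERE y IS NOT NULL) b
    ON a.x = b.y;
 {code}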



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-15 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Status: Patch Available  (was: Open)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-15 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-7159:
-

Status: Open  (was: Patch Available)

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-15 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031858#comment-14031858
 ] 

Gunther Hagleitner commented on HIVE-7159:
--

+1 once the tests pass

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7159) For inner joins push a 'is not null predicate' to the join sources for every non nullSafe join condition

2014-06-15 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031859#comment-14031859
 ] 

Gunther Hagleitner commented on HIVE-7159:
--

rb: https://reviews.apache.org/r/22553/

 For inner joins push a 'is not null predicate' to the join sources for every 
 non nullSafe join condition
 

 Key: HIVE-7159
 URL: https://issues.apache.org/jira/browse/HIVE-7159
 Project: Hive
  Issue Type: Bug
Reporter: Harish Butani
Assignee: Harish Butani
 Attachments: HIVE-7159.1.patch, HIVE-7159.2.patch, HIVE-7159.3.patch, 
 HIVE-7159.4.patch, HIVE-7159.5.patch


 A join B on A.x = B.y
 can be transformed to
 (A where x is not null) join (B where y is not null) on A.x = B.y
 Apart from avoiding shuffling null keyed rows it also avoids issues with 
 reduce-side skew when there are a lot of null values in the data.
 Thanks to [~gopalv] for the analysis and coming up with the solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031894#comment-14031894
 ] 

Hive QA commented on HIVE-6584:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650362/HIVE-6584.4.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_correlationoptimizer2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/472/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-472/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650362

 Add HiveHBaseTableSnapshotInputFormat
 -

 Key: HIVE-6584
 URL: https://issues.apache.org/jira/browse/HIVE-6584
 Project: Hive
  Issue Type: Improvement
  Components: HBase Handler
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
 Fix For: 0.14.0

 Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, 
 HIVE-6584.3.patch, HIVE-6584.4.patch


 HBASE-8369 provided mapreduce support for reading from HBase table snapshots. 
 This allows a MR job to consume a stable, read-only view of an HBase table 
 directly off of HDFS. Bypassing the online region server API provides a nice 
 performance boost for the full scan. HBASE-10642 is backporting that feature 
 to 0.94/0.96 and also adding a {{mapred}} implementation. Once that's 
 available, we should add an input format. A follow-on patch could work out 
 how to integrate this functionality into the StorageHandler, similar to how 
 HIVE-6473 integrates the HFileOutputFormat into existing table definitions.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7233) File hive-hwi-0.13.1 not found on lib folder

2014-06-15 Thread Dinh Hoang Luong (JIRA)
Dinh Hoang Luong created HIVE-7233:
--

 Summary: File hive-hwi-0.13.1 not found on lib folder
 Key: HIVE-7233
 URL: https://issues.apache.org/jira/browse/HIVE-7233
 Project: Hive
  Issue Type: New Feature
  Components: Web UI
Affects Versions: 0.13.1
Reporter: Dinh Hoang Luong


I found that line 27 of the file 
.../apache-hive-0.13.1-sr/hwi/pom.xml has 
<package>jar</package> instead of <package>war</package>.

Sorry, my English is bad. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-15 Thread David Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031977#comment-14031977
 ] 

David Chen commented on HIVE-7094:
--

See HIVE-7230 for the patch for adding the Eclipse formatter file.

 Separate out static/dynamic partitioning code in FileRecordWriterContainer
 --

 Key: HIVE-7094
 URL: https://issues.apache.org/jira/browse/HIVE-7094
 Project: Hive
  Issue Type: Sub-task
  Components: HCatalog
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7094.1.patch, HIVE-7094.3.patch


 There are two major places in FileRecordWriterContainer that have the {{if 
 (dynamicPartitioning)}} condition: the constructor and write().
 This is the approach that I am taking:
 # Move the DP and SP code into two subclasses: 
 DynamicFileRecordWriterContainer and StaticFileRecordWriterContainer.
 # Make FileRecordWriterContainer an abstract class that contains the common 
 code for both implementations. For write(), FileRecordWriterContainer will 
 call an abstract method that will provide the local RecordWriter, 
 ObjectInspector, SerDe, and OutputJobInfo.
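 A rough, hypothetical skeleton of that shape (not the actual HCatalog classes; 
 the writer type and class names below are stand-ins):
 {code}
 // Hypothetical sketch: common write path in an abstract base, with an abstract
 // hook that the static and dynamic partitioning subclasses implement.
 import java.util.HashMap;
 import java.util.Map;
 
 interface SimpleWriter {            // stand-in for the wrapped RecordWriter
   void write(String record);
 }
 
 abstract class WriterContainerSketch {
   public final void write(String partitionValue, String record) {
     getWriterFor(partitionValue).write(record);   // shared code path
   }
   protected abstract SimpleWriter getWriterFor(String partitionValue);
 }
 
 class StaticSketch extends WriterContainerSketch {
   private final SimpleWriter writer;
   StaticSketch(SimpleWriter writer) { this.writer = writer; }
   @Override protected SimpleWriter getWriterFor(String partitionValue) {
     return writer;                                // single writer, known up front
   }
 }
 
 class DynamicSketch extends WriterContainerSketch {
   private final Map<String, SimpleWriter> writers = new HashMap<>();
   @Override protected SimpleWriter getWriterFor(String partitionValue) {
     // lazily create one writer per partition value
     return writers.computeIfAbsent(partitionValue,
         p -> record -> System.out.println(p + ": " + record));
   }
 }
 {code}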



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7210) NPE with No plan file found when running Driver instances on multiple threads

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031990#comment-14031990
 ] 

Hive QA commented on HIVE-7210:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650358/HIVE-7210.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/474/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/474/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650358

 NPE with No plan file found when running Driver instances on multiple 
 threads
 ---

 Key: HIVE-7210
 URL: https://issues.apache.org/jira/browse/HIVE-7210
 Project: Hive
  Issue Type: Bug
Reporter: Jason Dere
Assignee: Gunther Hagleitner
 Attachments: HIVE-7210.1.patch


 Informatica has a multithreaded application running multiple instances of 
 CLIDriver.  When running concurrent queries they sometimes hit the following 
 error:
 {noformat}
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.hive.ql.exec.Utilities: No plan file found: 
 hdfs://ICRHHW21NODE1:8020/tmp/hive-qamercury/hive_2014-05-30_10-24-57_346_890014621821056491-2/-mr-10002/6169987c-3263-4737-b5cb-38daab882afb/map.xml
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :INFO 
 org.apache.hadoop.mapreduce.JobSubmitter: Cleaning up the staging area 
 /tmp/hadoop-yarn/staging/qamercury/.staging/job_1401360353644_0078
 2014-05-30 10:24:59 pool-10-thread-1 INFO: Hadoop_Native_Log :ERROR 
 org.apache.hadoop.hive.ql.exec.Task: Job Submission failed with exception 
 'java.lang.NullPointerException(null)'
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:255)
 at 
 org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:271)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
 at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
 at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
 at 
 org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
 at 
 org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
 at 
 org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
 at 
 org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
 at 
 org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
 at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
 at 
 org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
 at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
 at 
 

[jira] [Updated] (HIVE-7228) StreamPrinter should be joined to calling thread

2014-06-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7228:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Pankit!

 StreamPrinter should be joined to calling thread 
 -

 Key: HIVE-7228
 URL: https://issues.apache.org/jira/browse/HIVE-7228
 Project: Hive
  Issue Type: Bug
  Components: CLI
Affects Versions: 0.13.0
Reporter: Pankit Thapar
Assignee: Pankit Thapar
Priority: Minor
 Fix For: 0.14.0

 Attachments: HIVE-7228.patch


 ISSUE:
 The StreamPrinter class connects an input stream (attached to the output of a 
 process) with the output stream of a Session (CliSessionState/SessionState 
 class).
 It acts as a pipe between the two and transfers data from the input stream to 
 the output stream. THE TRANSFER OPERATION RUNS IN A SEPARATE THREAD. 
 In some of the current usages of this class, I noticed that the calling 
 threads do not wait for the transfer operation to complete. That is, the 
 calling thread does not join the StreamPrinter threads.
 The calling thread moves forward assuming that the respective output stream 
 already has the data it needs. That is not always a safe assumption, since 
 the StreamPrinter thread may not have finished execution by the time the 
 calling thread expects it to.
 FIX:
 To ensure that the calling thread waits for the StreamPrinter threads to 
 complete, the StreamPrinter threads are joined to the calling thread.
 Please note that, without the fix, TestCliDriverMethods#testRun failed 
 intermittently (roughly 1 in 30 runs). The test does not fail with this fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7212) Use resource re-localization instead of restarting sessions in Tez

2014-06-15 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14031998#comment-14031998
 ] 

Vikram Dixit K commented on HIVE-7212:
--

+1 LGTM

 Use resource re-localization instead of restarting sessions in Tez
 --

 Key: HIVE-7212
 URL: https://issues.apache.org/jira/browse/HIVE-7212
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Attachments: HIVE-7212.1.patch, HIVE-7212.2.patch, HIVE-7212.3.patch


 scriptfile1.q is failing on Tez because of a recent breakage in localization. 
 On top of that we're currently restarting sessions if the resources have 
 changed. (add file/add jar/etc). Instead of doing this we should just have 
 tez relocalize these new resources. This way no session/AM restart is 
 required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032019#comment-14032019
 ] 

Hive QA commented on HIVE-7230:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650394/HIVE-7230.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5611 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_scriptfile1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.conf.TestHiveConf.testConfProperties
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/475/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/475/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-475/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650394

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-15 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032029#comment-14032029
 ] 

Swarnim Kulkarni commented on HIVE-7230:


Duplicate of HIVE-6317

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7230) Add Eclipse formatter file for Hive coding conventions

2014-06-15 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032030#comment-14032030
 ] 

Swarnim Kulkarni commented on HIVE-7230:


As noted on HIVE-6317, you might be able to fix this by simply adding the 
following to the pom file:

{noformat}
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-eclipse-plugin</artifactId>
  <version>${maven.eclipse.plugin.version}</version>
  <configuration>
    <downloadJavadocs>true</downloadJavadocs>
    <downloadSources>true</downloadSources>
    <workspaceActiveCodeStyleProfileName>GoogleStyle</workspaceActiveCodeStyleProfileName>
    <workspaceCodeStylesURL>https://google-styleguide.googlecode.com/svn/trunk/eclipse-java-google-style.xml</workspaceCodeStylesURL>
  </configuration>
</plugin>
{noformat}

 Add Eclipse formatter file for Hive coding conventions
 --

 Key: HIVE-7230
 URL: https://issues.apache.org/jira/browse/HIVE-7230
 Project: Hive
  Issue Type: Improvement
Reporter: David Chen
Assignee: David Chen
 Attachments: HIVE-7230.1.patch


 Eclipse's formatter is a convenient way to clean up formatting for Java code. 
 Currently, there is no Eclipse formatter file checked into Hive's codebase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-6317) Add eclipse code formatter to hive projects

2014-06-15 Thread Swarnim Kulkarni (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Swarnim Kulkarni resolved HIVE-6317.


Resolution: Duplicate

 Add eclipse code formatter to hive projects
 ---

 Key: HIVE-6317
 URL: https://issues.apache.org/jira/browse/HIVE-6317
 Project: Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.12.0
Reporter: Swarnim Kulkarni

 Currently on Hive trunk, it seems like the Eclipse formatter doesn't get 
 automatically imported (it used to happen some time ago). We should probably 
 fix that so all changes going forward are formatted consistently according to 
 this formatter.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032033#comment-14032033
 ] 

Gopal V commented on HIVE-7232:
---

[~ashutoshc]: Incorrect results as well.

Ran the same query with Tez and MR, got different results.

MR doesn't hit the same scenario because of the empty Map task, which doesn't 
have any input columns named reducesinkkey0.

Tez seems to hit a corner case where there are 2 shuffle joins one after the 
other - there is an input col named KEY.reducesinkkey0 and an output col named 
reducesinkkey0, which have no relation to each other.

{code}
$ diff -y -W 72  results/q5.tez.txt results/q5.mr.txt 
CHINA   985314.0848|VIETNAM 1.897236998313891E10
INDIA   819113.441801  |CHINA   1.894405687452681E10
VIETNAM 637407.2255|INDONESIA   1.89306456994551
JAPAN   523754.9791|JAPAN   1.892184676125508E10
INDONESIA   517900.1924|INDIA   1.886882412417209E10
{code}

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7094) Separate out static/dynamic partitioning code in FileRecordWriterContainer

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032064#comment-14032064
 ] 

Hive QA commented on HIVE-7094:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650393/HIVE-7094.3.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/478/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/478/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-478/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-478/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'conf/hive-default.xml.template'
Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java'
Reverted 'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitPack.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestNewIntegerEncoding.java'
Reverted 
'ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReader.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerReaderV2.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriter.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java'
Reverted 
'ql/src/java/org/apache/hadoop/hive/ql/io/orc/RunLengthIntegerWriterV2.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/SerializationUtils.java'
Reverted 'ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
itests/target itests/hcatalog-unit/target itests/test-serde/target 
itests/qtest/target itests/hive-minikdc/target itests/hive-unit/target 
itests/custom-serde/target itests/util/target hcatalog/target 
hcatalog/core/target hcatalog/streaming/target 
hcatalog/server-extensions/target hcatalog/hcatalog-pig-adapter/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hwi/target 
common/target common/src/gen service/target contrib/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target 
ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestUnrolledBitPack.java
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1602783.

At revision 1602783.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650393

 Separate out static/dynamic 

[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032062#comment-14032062
 ] 

Hive QA commented on HIVE-7219:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650413/HIVE-7219.3.patch

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 5578 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_aggregate
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_mapjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hive.hcatalog.pig.TestHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testStoreFuncAllSimpleTypes
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/476/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/476/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-476/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650413

 Improve performance of serialization utils in ORC
 -

 Key: HIVE-7219
 URL: https://issues.apache.org/jira/browse/HIVE-7219
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, 
 orc-read-perf-jmh-benchmark.png


 ORC uses serialization utils heavily for reading and writing data. The 
 bitpacking and unpacking code in writeInts() and readInts() can be unrolled 
 for better performance. Also double reader/writer performance can be improved 
 by bulk reading/writing from/to byte array.
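 For illustration only, a minimal sketch of the bulk-read idea (an assumed example, 
 not the actual ORC SerializationUtils code; little-endian byte order is assumed): 
 assemble the 64-bit pattern of a double from a byte[] slice in one pass instead of 
 issuing one InputStream.read() call per byte.
 {code}
 // Hypothetical sketch: read a double from an in-memory buffer in one pass.
 static double readDoubleFromBuffer(byte[] buf, int pos) {
   long bits = 0L;
   for (int i = 0; i < 8; i++) {
     bits |= (buf[pos + i] & 0xffL) << (8 * i);  // OR each byte into its slot
   }
   return Double.longBitsToDouble(bits);
 }
 {code}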



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7234) Select on decimal column throws NPE

2014-06-15 Thread Ashish Kumar Singh (JIRA)
Ashish Kumar Singh created HIVE-7234:


 Summary: Select on decimal column throws NPE
 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh


Select on decimal column throws NPE for values greater than maximum permissible 
value (99)
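
For illustration, a hedged sketch of the suspected failure mode (the threshold and
behavior are assumptions, not taken from the patch): HiveDecimal.create() returns
null when a value exceeds the maximum supported precision, so any caller that uses
the result without a null check can hit an NPE.
{code}
import java.math.BigDecimal;
import org.apache.hadoop.hive.common.type.HiveDecimal;

public class DecimalOverflowExample {
  public static void main(String[] args) {
    // 41 integer digits exceeds HiveDecimal's maximum precision (assumed 38),
    // so create() is expected to return null rather than a value.
    HiveDecimal d = HiveDecimal.create(BigDecimal.TEN.pow(40));
    System.out.println(d);  // null -> code that dereferences this blindly throws NPE
  }
}
{code}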



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HIVE-7198) HiveServer2 CancelOperation does not work for long running queries

2014-06-15 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis resolved HIVE-7198.
-

Resolution: Duplicate

It's fixed by HIVE-5901. Feel free to reopen this issue if it's reproduced in 
hive-0.13.0.

 HiveServer2 CancelOperation does not work for long running queries
 --

 Key: HIVE-7198
 URL: https://issues.apache.org/jira/browse/HIVE-7198
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Romain Rigaux

 Sending the CancelOperation() call does not always stop the query and its 
 related MapReduce jobs.
 e.g. from https://issues.cloudera.org/browse/HUE-2144
 {code}
 I guess you're right. But the strange thing is that the canceled query shows 
 in job browser as 'Running' and the percents go up - 0%, 50%, then the job is 
 failed.
 How does the cancelling actually work? Is it like the hadoop kill command? It 
 seems to me like it works until certain phase of map reduce is done.
 And another thing - after cancelling the job in Hue I can kill it with hadoop 
 job -kill job_id. If it was killed already, it would show no such job.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 22612: HIVE-7234: Handle nulls from decimal columns elegantly

2014-06-15 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22612/
---

Review request for hive, Szehon Ho and Xuefu Zhang.


Bugs: HIVE-7234
https://issues.apache.org/jira/browse/HIVE-7234


Repository: hive-git


Description
---

HIVE-7234: Handle nulls from decimal columns elegantly


Diffs
-

  common/src/java/org/apache/hadoop/hive/common/type/HiveDecimal.java 
ad0901548217fbb828a01f8f5edda64581ac2c1e 
  data/files/decimal_10_0.txt PRE-CREATION 
  data/files/decimal_9_0.txt PRE-CREATION 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestDecimal.java 
PRE-CREATION 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyHiveDecimal.java 
78cc3819c61f5a1bcb0cdd3425a0105416c26861 
  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyUtils.java 
5a4623729ec955bbe8fcf662503b42ff8735eaad 

Diff: https://reviews.apache.org/r/22612/diff/


Testing
---

Added unit tests to test the scenario.


Thanks,

Ashish Singh



[jira] [Updated] (HIVE-7234) Select on decimal column throws NPE

2014-06-15 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7234:
-

Status: Patch Available  (was: Open)

 Select on decimal column throws NPE
 ---

 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7234.patch


 Select on decimal column throws NPE for values greater than maximum 
 permissible value (99)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7234) Select on decimal column throws NPE

2014-06-15 Thread Ashish Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish Kumar Singh updated HIVE-7234:
-

Attachment: HIVE-7234.patch

 Select on decimal column throws NPE
 ---

 Key: HIVE-7234
 URL: https://issues.apache.org/jira/browse/HIVE-7234
 Project: Hive
  Issue Type: Bug
Reporter: Ashish Kumar Singh
Assignee: Ashish Kumar Singh
 Attachments: HIVE-7234.patch


 Select on decimal column throws NPE for values greater than maximum 
 permissible value (99)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis reassigned HIVE-7232:
---

Assignee: Navis

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032088#comment-14032088
 ] 

Ashutosh Chauhan commented on HIVE-7232:


It seems this can also get triggered on the MR path. I think the latest patch on 
HIVE-5771 is failing for tests like subquery_in.q because they are hitting this 
issue.

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7219) Improve performance of serialization utils in ORC

2014-06-15 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032090#comment-14032090
 ] 

Gunther Hagleitner commented on HIVE-7219:
--

Some of these failures look related to the patch. orc_analyze: the golden files 
need to be updated with the new sizes. orc_split_elimination: the order of records 
has changed in some queries; it is not obvious how this patch causes that, but it 
should be looked at.

 Improve performance of serialization utils in ORC
 -

 Key: HIVE-7219
 URL: https://issues.apache.org/jira/browse/HIVE-7219
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Attachments: HIVE-7219.1.patch, HIVE-7219.2.patch, HIVE-7219.3.patch, 
 orc-read-perf-jmh-benchmark.png


 ORC uses serialization utils heavily for reading and writing data. The 
 bitpacking and unpacking code in writeInts() and readInts() can be unrolled 
 for better performance. Also double reader/writer performance can be improved 
 by bulk reading/writing from/to byte array.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7235) TABLESAMPLE on join table is regarded as alias

2014-06-15 Thread Navis (JIRA)
Navis created HIVE-7235:
---

 Summary: TABLESAMPLE on join table is regarded as alias
 Key: HIVE-7235
 URL: https://issues.apache.org/jira/browse/HIVE-7235
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial


{noformat}
SELECT c_custkey, o_custkey
FROM customer tablesample (1000 ROWS) join orders tablesample (1000 ROWS) on 
c_custkey = o_custkey;
{noformat}
Fails with NPE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()

2014-06-15 Thread steve, Oh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

steve, Oh updated HIVE-7182:


Status: Patch Available  (was: Open)

 ResultSet is not closed in JDBCStatsPublisher#init()
 

 Key: HIVE-7182
 URL: https://issues.apache.org/jira/browse/HIVE-7182
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: steve, Oh
Priority: Minor
 Attachments: HIVE-7182.1.patch, HIVE-7182.2.patch, HIVE-7182.patch


 {code}
 ResultSet rs = dbm.getTables(null, null, 
 JDBCStatsUtils.getStatTableName(), null);
 boolean tblExists = rs.next();
 {code}
 rs is not closed upon return from init().
 If stmt.executeUpdate() throws an exception, stmt.close() would be skipped - the 
 close() call should be placed in a finally block.
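 A minimal sketch of that fix direction (an assumed illustration, not the actual 
 patch): return the result from a try block and close the ResultSet in finally so 
 it cannot leak when next() or a later statement throws.
 {code}
 import java.sql.DatabaseMetaData;
 import java.sql.ResultSet;
 import java.sql.SQLException;

 public class StatsTableCheck {
   // Hypothetical helper mirroring the quoted snippet with the ResultSet closed safely.
   static boolean statsTableExists(DatabaseMetaData dbm, String statTableName)
       throws SQLException {
     ResultSet rs = dbm.getTables(null, null, statTableName, null);
     try {
       return rs.next();
     } finally {
       rs.close();  // always runs, even if next() throws
     }
   }
 }
 {code}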



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7182) ResultSet is not closed in JDBCStatsPublisher#init()

2014-06-15 Thread steve, Oh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

steve, Oh updated HIVE-7182:


Attachment: HIVE-7182.2.patch

I reattached the patch after fixing the compile error and rebasing. HIVE-7182.2.patch 
is rebased against the current trunk.

 ResultSet is not closed in JDBCStatsPublisher#init()
 

 Key: HIVE-7182
 URL: https://issues.apache.org/jira/browse/HIVE-7182
 Project: Hive
  Issue Type: Bug
Reporter: Ted Yu
Assignee: steve, Oh
Priority: Minor
 Attachments: HIVE-7182.1.patch, HIVE-7182.2.patch, HIVE-7182.patch


 {code}
 ResultSet rs = dbm.getTables(null, null, 
 JDBCStatsUtils.getStatTableName(), null);
 boolean tblExists = rs.next();
 {code}
 rs is not closed upon return from init().
 If stmt.executeUpdate() throws an exception, stmt.close() would be skipped - the 
 close() call should be placed in a finally block.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HIVE-7236) Tez progress monitor should indicate running/failed tasks

2014-06-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-7236 started by Gopal V.

 Tez progress monitor should indicate running/failed tasks
 -

 Key: HIVE-7236
 URL: https://issues.apache.org/jira/browse/HIVE-7236
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor

 Currently, the only logging in TezJobMonitor is for completed tasks. 
 This makes it hard to locate task stalls and task failures. Failure scenarios 
 are harder to debug, in particular when analyzing query runs on a cluster 
 with bad nodes.
 Change the job monitor to log running & failed tasks as follows.
 {code}
 Map 1: 0(+157,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+168,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 
 {code}
 That is 189 tasks running, 1 failure and 0 complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7236) Tez progress monitor should indicate running/failed tasks

2014-06-15 Thread Gopal V (JIRA)
Gopal V created HIVE-7236:
-

 Summary: Tez progress monitor should indicate running/failed tasks
 Key: HIVE-7236
 URL: https://issues.apache.org/jira/browse/HIVE-7236
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor


Currently, the only logging in TezJobMonitor is for completed tasks. 

This makes it hard to locate task stalls and task failures. Failure scenarios 
are harder to debug, in particular when analyzing query runs on a cluster with 
bad nodes.

Change the job monitor to log running & failed tasks as follows.

{code}
Map 1: 0(+157,-1)/1755 Reducer 2: 0/1  
Map 1: 0(+168,-1)/1755 Reducer 2: 0/1  
Map 1: 0(+189,-1)/1755 Reducer 2: 0/1  
Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 
{code}

That is 189 tasks running, 1 failure and 0 complete.
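
For illustration, a hypothetical helper (names assumed, not from the attached patch) 
that renders the proposed completed(+running,-failed)/total format:
{code}
// e.g. vertexProgress("Map 1", 0, 189, 1, 1755) returns "Map 1: 0(+189,-1)/1755"
static String vertexProgress(String vertex, int completed, int running,
    int failed, int total) {
  return String.format("%s: %d(+%d,-%d)/%d", vertex, completed, running, failed, total);
}
{code}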



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7236) Tez progress monitor should indicate running/failed tasks

2014-06-15 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-7236:
--

Attachment: HIVE-7236.1.patch

 Tez progress monitor should indicate running/failed tasks
 -

 Key: HIVE-7236
 URL: https://issues.apache.org/jira/browse/HIVE-7236
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Attachments: HIVE-7236.1.patch


 Currently, the only logging in TezJobMonitor is for completed tasks. 
 This makes it hard to locate task stalls and task failures. Failure scenarios 
 are harder to debug, in particular when analyzing query runs on a cluster 
 with bad nodes.
 Change the job monitor to log running & failed tasks as follows.
 {code}
 Map 1: 0(+157,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+168,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1  
 Map 1: 0(+189,-1)/1755 Reducer 2: 0/1 
 {code}
 That is 189 tasks running, 1 failure and 0 complete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032106#comment-14032106
 ] 

Navis commented on HIVE-7232:
-

The failure of subquery_in.q in HIVE-5771 does not seem to be caused by HIVE-4867, 
but it is strongly related to it, because HIVE-4867 has (intentionally) broken an 
internal assumption about the keys/values of RS. With the constant propagation 
optimizer, subquery_in.q produces different keys for each alias of the join, which 
does not look valid.
{code}
-- sq_1
Reduce Output Operator
  key expressions: _col1 (type: int)
  sort order: ++
  Map-reduce partition columns: _col1 (type: int)
{code}
and
{code}
-- others
Reduce Output Operator
  key expressions: _col0 (type: int), _col1 (type: int)
  sort order: ++
  Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
{code}

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032108#comment-14032108
 ] 

Navis commented on HIVE-7232:
-

For this problem, I cannot see how the RS which is a child of JOIN can get a ROW 
of the format
{noformat}
{reducesinkkey0:442}
{noformat}
In my reading, the join would emit a ROW and rowOI labeled with the output 
columns, like below
{noformat}
_col0{KEY.reducesinkkey0} 
_col3{VALUE._col2}
_col10  {VALUE._col0}
_col11  {KEY.reducesinkkey0} 
_col14  {VALUE._col3}
{noformat}

I don't have an environment for hadoop-2, so it's hard to verify; it might take 
some time.

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7231) Improve ORC padding

2014-06-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032109#comment-14032109
 ] 

Hive QA commented on HIVE-7231:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12650431/HIVE-7231.1.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 5536 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_parquet_columnar
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rand_partitionpruner3
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_ctas
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/479/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-Build-479/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12650431

 Improve ORC padding
 ---

 Key: HIVE-7231
 URL: https://issues.apache.org/jira/browse/HIVE-7231
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
  Labels: orcfile
 Attachments: HIVE-7231.1.patch


 Current ORC padding is not optimal because of the fixed stripe sizes within a 
 block. The padding overhead will be significant in some cases. Also, the padding 
 percentage relative to the stripe size is not configurable.
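 As a back-of-the-envelope example with assumed sizes (not taken from the patch), 
 the waste from padding up to a fixed stripe boundary can be computed directly:
 {code}
 public class PaddingOverhead {
   public static void main(String[] args) {
     long blockSize  = 256L << 20;  // assumed 256MB HDFS block
     long stripeSize = 200L << 20;  // assumed fixed 200MB stripe
     long padding = blockSize % stripeSize;  // 56MB of padding per block
     System.out.printf("wasted = %.1f%%%n", 100.0 * padding / blockSize);  // ~21.9%
   }
 }
 {code}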



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-5771) Constant propagation optimizer for Hive

2014-06-15 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5771:
---

Status: Open  (was: Patch Available)

[~tedxu] Navis's observation 
[here|https://issues.apache.org/jira/browse/HIVE-7232?focusedCommentId=14032108page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032108]
 seems correct about the failing tests for the subquery_in.q test case.

 Constant propagation optimizer for Hive
 ---

 Key: HIVE-5771
 URL: https://issues.apache.org/jira/browse/HIVE-5771
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Ted Xu
Assignee: Ted Xu
 Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
 HIVE-5771.11.patch, HIVE-5771.12.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, 
 HIVE-5771.4.patch, HIVE-5771.5.patch, HIVE-5771.6.patch, HIVE-5771.7.patch, 
 HIVE-5771.8.patch, HIVE-5771.9.patch, HIVE-5771.patch, 
 HIVE-5771.patch.javaonly


 Currently there is no constant folding/propagation optimizer; all expressions 
 are evaluated at runtime. 
 HIVE-2470 did a great job of evaluating constants in the UDF initialization phase; 
 however, that is still a runtime evaluation and it doesn't propagate constants 
 from a subquery to the outside.
 Introducing such an optimizer may reduce I/O and accelerate processing.
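 A hypothetical example (table and column names assumed) of the kind of compile-time 
 rewrite such an optimizer enables:
 {code}
 SELECT value FROM src WHERE key = 1 AND value = key + 2;
 -- with constant propagation the predicate can be folded at planning time to
 SELECT value FROM src WHERE key = 1 AND value = 3;
 -- so the simplified predicate can be pushed down instead of being evaluated per row.
 {code}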



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7198) HiveServer2 CancelOperation does not work for long running queries

2014-06-15 Thread Romain Rigaux (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032115#comment-14032115
 ] 

Romain Rigaux commented on HIVE-7198:
-

Nice! I'll give it a try!

 HiveServer2 CancelOperation does not work for long running queries
 --

 Key: HIVE-7198
 URL: https://issues.apache.org/jira/browse/HIVE-7198
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Affects Versions: 0.12.0
Reporter: Romain Rigaux

 Sending the CancelOperation() call does not always stop the query and its 
 related MapReduce jobs.
 e.g. from https://issues.cloudera.org/browse/HUE-2144
 {code}
 I guess you're right. But the strange thing is that the canceled query shows 
 in job browser as 'Running' and the percents go up - 0%, 50%, then the job is 
 failed.
 How does the cancelling actually work? Is it like the hadoop kill command? It 
 seems to me like it works until certain phase of map reduce is done.
 And another thing - after cancelling the job in Hue I can kill it with hadoop 
 job -kill job_id. If it was killed already, it would show no such job.
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7232) ReduceSink is emitting NULL keys due to failed keyEval

2014-06-15 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14032123#comment-14032123
 ] 

Gopal V commented on HIVE-7232:
---

[~navis]: I can run tests for you, if you have a patch file with log lines.

I can reproduce this issue consistently for all recent runs of this query.

 ReduceSink is emitting NULL keys due to failed keyEval
 --

 Key: HIVE-7232
 URL: https://issues.apache.org/jira/browse/HIVE-7232
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.14.0
Reporter: Gopal V
Assignee: Navis

 After HIVE-4867 has been merged in, some queries have exhibited a very weird 
 skew towards NULL keys emitted from the ReduceSinkOperator.
 Added extra logging to print expr.column() in ExprNodeColumnEvaluator  in 
 reduce sink.
 {code}
 2014-06-14 00:37:19,186 INFO [TezChild] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator:
 numDistributionKeys = 1 {null -- ExprNodeColumnEvaluator(_col10)}
 key_row={reducesinkkey0:442}
 {code}
 {code}
   HiveKey firstKey = toHiveKey(cachedKeys[0], tag, null);
   int distKeyLength = firstKey.getDistKeyLength();
   if (distKeyLength <= 1) {
     StringBuffer x1 = new StringBuffer();
     x1.append("numDistributionKeys = " + numDistributionKeys + "\n");
     for (int i = 0; i < numDistributionKeys; i++) {
       x1.append(cachedKeys[0][i] + " -- " + keyEval[i] + "\n");
     }
     x1.append("key_row=" + SerDeUtils.getJSONString(row, keyObjectInspector));
     LOG.info("GOPAL: " + x1.toString());
   }
 {code}
 The query is tpc-h query5, with extra NULL checks just to be sure.
 {code}
 SELECT n_name,
sum(l_extendedprice * (1 - l_discount)) AS revenue
 FROM customer,
  orders,
  lineitem,
  supplier,
  nation,
  region
 WHERE c_custkey = o_custkey
   AND l_orderkey = o_orderkey
   AND l_suppkey = s_suppkey
   AND c_nationkey = s_nationkey
   AND s_nationkey = n_nationkey
   AND n_regionkey = r_regionkey
   AND r_name = 'ASIA'
   AND o_orderdate >= '1994-01-01'
   AND o_orderdate < '1995-01-01'
   and l_orderkey is not null
   and c_custkey is not null
   and l_suppkey is not null
   and c_nationkey is not null
   and s_nationkey is not null
   and n_regionkey is not null
 GROUP BY n_name
 ORDER BY revenue DESC;
 {code}
 The reducer which has the issue has the following plan
 {code}
 Reducer 3
 Reduce Operator Tree:
   Join Operator
 condition map:
  Inner Join 0 to 1
 condition expressions:
   0 {KEY.reducesinkkey0} {VALUE._col2}
   1 {VALUE._col0} {KEY.reducesinkkey0} {VALUE._col3}
 outputColumnNames: _col0, _col3, _col10, _col11, _col14
 Statistics: Num rows: 18344 Data size: 95229140992 Basic 
 stats: COMPLETE Column stats: NONE
 Reduce Output Operator
   key expressions: _col10 (type: int)
   sort order: +
   Map-reduce partition columns: _col10 (type: int)
   Statistics: Num rows: 18344 Data size: 95229140992 
 Basic stats: COMPLETE Column stats: NONE
   value expressions: _col0 (type: int), _col3 (type: int), 
 _col11 (type: int), _col14 (type: string)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)