[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202126#comment-14202126
 ] 

Owen O'Malley commented on HIVE-8732:
-

I should also point out that I added a line about the version to the 
orcfiledump output. New files will get the line:

File Version: 0.12 with HIVE_8732

Files written by the old writer will say either:

File Version: 0.12 with ORIGINAL
or
File Version: 0.11 with ORIGINAL



 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-07 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14202230#comment-14202230
 ] 

Prasanth J commented on HIVE-8732:
--

I have verified the file version line in the file dump with old ORC formats. 

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1416#comment-1416
 ] 

Hive QA commented on HIVE-8732:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679704/HIVE-8732.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6680 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationComplexExpr
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationLargeMaxSplit
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationSmallMaxSplit
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1658/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679704 - PreCommit-HIVE-TRUNK-Build

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201442#comment-14201442
 ] 

Prasanth J commented on HIVE-8732:
--

The new changes look good to me. +1.
Can you create a followup for dealing with NaN in double column statistics?

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201628#comment-14201628
 ] 

Hive QA commented on HIVE-8732:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1267/HIVE-8732.patch

{color:green}SUCCESS:{color} +1 6665 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1678/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1678/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1678/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 1267 - PreCommit-HIVE-TRUNK-Build

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-06 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201738#comment-14201738
 ] 

Gunther Hagleitner commented on HIVE-8732:
--

+1 for Hive 0.14

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197848#comment-14197848
 ] 

Hive QA commented on HIVE-8732:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679346/HIVE-8732.patch

{color:green}SUCCESS:{color} +1 6677 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1640/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1640/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1640/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12679346 - PreCommit-HIVE-TRUNK-Build

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199199#comment-14199199
 ] 

Owen O'Malley commented on HIVE-8732:
-

I've created the timestamp bug as HIVE-8746. The fix for that one is pretty 
touchy, and I think I'll do it in 0.15 rather than risk the 0.14 release.

I don't want to create a new write format since the old reader will read the 
corrected files. I will add a flag that I can use to suppress using the split 
elimination code for files with broken stripe/file indexes.

Does that sound reasonable?
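
A minimal sketch of what such a flag check could look like on the reader side. 
The enum, class, and method names here are hypothetical (the constants simply 
reuse the ORIGINAL and HIVE_8732 labels that orcfiledump prints); this is an 
illustration of the idea, not the code in the patch.

```java
public class WriterVersionSketch {
    // Hypothetical enum, ordered oldest to newest. The constant names mirror
    // the labels shown in the orcfiledump "File Version" line.
    enum WriterVersion { ORIGINAL, HIVE_8732 }

    // Assumption: statistics are only trusted for split elimination when the
    // file was written by a writer at or after the HIVE_8732 fix.
    static boolean canUseStatsForSplitElimination(WriterVersion writer) {
        return writer.compareTo(WriterVersion.HIVE_8732) >= 0;
    }

    public static void main(String[] args) {
        for (WriterVersion v : WriterVersion.values()) {
            System.out.println(v + " -> use stats: "
                + canUseStatsForSplitElimination(v));
        }
    }
}
```

Ordering the constants lets later writer fixes be appended without changing 
the check, which is one reason to prefer a comparison over an equality test.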

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199268#comment-14199268
 ] 

Dain Sundstrom commented on HIVE-8732:
--

How will the code know that the stripe/file indexes are broken (i.e., written 
with the current writer and not the new one)? BTW, the current reader will 
read files with future version numbers; you only get a warning message:
{code:java}
  if (major > OrcFile.Version.CURRENT.getMajor() ||
      (major == OrcFile.Version.CURRENT.getMajor() &&
       minor > OrcFile.Version.CURRENT.getMinor())) {
    log.warn("ORC file " + path +
             " was written by a future Hive version " +
             versionString(version) +
             ". This file may not be readable by this version of Hive.");
  }
{code}

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199353#comment-14199353
 ] 

Dain Sundstrom commented on HIVE-8732:
--

DoubleStatisticsImpl merge and update methods don't handle NaN properly.  Any 
comparison with NaN returns false, so if the first value is NaN you end up with 
min and max of NaN, which implies that the column only contains NaNs.  We 
should consider tracking NaN specially in the stats.

Regardless, for now any code reading the DoubleStatistic should discard a stat 
containing a NaN.
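
The behavior described above can be reproduced with a small stand-alone 
sketch. NaiveDoubleStats is a hypothetical stand-in written for this example, 
not the real DoubleStatisticsImpl, and isUsable is an assumed name for the 
reader-side guard suggested here.

```java
public class NaNStatsSketch {
    // Hypothetical min/max tracker that uses plain < and > comparisons.
    static final class NaiveDoubleStats {
        private boolean hasValue = false;
        private double min;
        private double max;

        void update(double value) {
            if (!hasValue) {
                // The first value seeds both bounds, even if it is NaN.
                hasValue = true;
                min = value;
                max = value;
            } else {
                // Any comparison involving NaN is false, so NaN bounds
                // are never replaced by later values.
                if (value < min) { min = value; }
                if (value > max) { max = value; }
            }
        }

        double getMin() { return min; }
        double getMax() { return max; }
    }

    // Reader-side guard: discard a statistic whose bounds contain NaN.
    static boolean isUsable(double min, double max) {
        return !Double.isNaN(min) && !Double.isNaN(max);
    }

    public static void main(String[] args) {
        NaiveDoubleStats stats = new NaiveDoubleStats();
        stats.update(Double.NaN);
        stats.update(1.0);
        stats.update(5.0);
        System.out.println("min=" + stats.getMin() + " max=" + stats.getMax());
        System.out.println("usable=" + isUsable(stats.getMin(), stats.getMax()));
    }
}
```

With NaN as the first value, both bounds stay NaN after the later updates, so 
the guard reports the statistic as unusable.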

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199460#comment-14199460
 ] 

Dain Sundstrom commented on HIVE-8732:
--

This seems like a reasonable plan.

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-05 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199468#comment-14199468
 ] 

Dain Sundstrom commented on HIVE-8732:
--

The patch looks reasonable.  

We still need to decide how to deal with doubles with NaN. 

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch, HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196843#comment-14196843
 ] 

Dain Sundstrom commented on HIVE-8732:
--

Decimal and Date are also broken

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196847#comment-14196847
 ] 

Owen O'Malley commented on HIVE-8732:
-

Timestamp too. *sigh*

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197045#comment-14197045
 ] 

Prasanth J commented on HIVE-8732:
--

LGTM, +1. Pending tests

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197280#comment-14197280
 ] 

Dain Sundstrom commented on HIVE-8732:
--

The tests cover a.merge(b) where one side is wider, but not b.merge(a).  I'd 
test both directions to make sure someone doesn't introduce the opposite bug.
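
A rough sketch of what testing both directions could look like. StringRange 
is a hypothetical min/max holder made up for this example, not Hive's string 
statistics class, and the check helper stands in for whatever assertion 
library the real test uses.

```java
public class MergeSymmetrySketch {
    // Hypothetical string min/max holder with a straightforward merge.
    static final class StringRange {
        String min;
        String max;

        StringRange(String min, String max) {
            this.min = min;
            this.max = max;
        }

        void merge(StringRange other) {
            if (other.min.compareTo(min) < 0) { min = other.min; }
            if (other.max.compareTo(max) > 0) { max = other.max; }
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Direction 1: the narrow range absorbs the wide one.
        StringRange narrow = new StringRange("d", "m");
        narrow.merge(new StringRange("a", "z"));
        check(narrow.min.equals("a") && narrow.max.equals("z"), "narrow.merge(wide)");

        // Direction 2: the wide range absorbs the narrow one.
        StringRange wide = new StringRange("a", "z");
        wide.merge(new StringRange("d", "m"));
        check(wide.min.equals("a") && wide.max.equals("z"), "wide.merge(narrow)");

        System.out.println("merge is correct in both directions");
    }
}
```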

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.





[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly

2014-11-04 Thread Dain Sundstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14197283#comment-14197283
 ] 

Dain Sundstrom commented on HIVE-8732:
--

You will need to bump the file version so a reader knows the stats are good.  
You should also disable using these stats for predicate pushdown in the current 
version of the file.

And if you are bumping the version, you should fix the Timestamp epoch bug.

 ORC string statistics are not merged correctly
 --

 Key: HIVE-8732
 URL: https://issues.apache.org/jira/browse/HIVE-8732
 Project: Hive
  Issue Type: Bug
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
 Fix For: 0.14.0

 Attachments: HIVE-8732.patch


 Currently ORC's string statistics do not merge correctly causing incorrect 
 maximum values.


