[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202126#comment-14202126 ] Owen O'Malley commented on HIVE-8732:
I should also point out that I added a line to the orcfiledump output that reports the version. New files will get the line:
File Version: 0.12 with HIVE_8732
Files written by the old writer will say either:
File Version: 0.12 with ORIGINAL
or
File Version: 0.11 with ORIGINAL
ORC string statistics are not merged correctly
Key: HIVE-8732
URL: https://issues.apache.org/jira/browse/HIVE-8732
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Priority: Blocker
Fix For: 0.14.0
Attachments: HIVE-8732.patch, HIVE-8732.patch, HIVE-8732.patch
Currently ORC's string statistics do not merge correctly, causing incorrect maximum values.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202230#comment-14202230 ] Prasanth J commented on HIVE-8732: I have verified the file version in the file dump with old ORC formats.
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1416#comment-1416 ] Hive QA commented on HIVE-8732:
{color:red}Overall{color}: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679704/HIVE-8732.patch
{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6680 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge5
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge6
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_decimal_10_0
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_mapjoin_reduce
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationComplexExpr
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationLargeMaxSplit
org.apache.hadoop.hive.ql.io.orc.TestOrcSplitElimination.testSplitEliminationSmallMaxSplit
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testInterleavedTransactionBatchCommits
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchEmptyCommit
{noformat}
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1658/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1658/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12679704 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201442#comment-14201442 ] Prasanth J commented on HIVE-8732: The new changes look good to me. +1. Can you create a followup for dealing with NaN in double column statistics?
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201628#comment-14201628 ] Hive QA commented on HIVE-8732:
{color:green}Overall{color}: +1 all checks pass
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/1267/HIVE-8732.patch
{color:green}SUCCESS:{color} +1 6665 tests passed
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1678/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1678/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1678/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated.
ATTACHMENT ID: 1267 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201738#comment-14201738 ] Gunther Hagleitner commented on HIVE-8732: +1 for hive .14
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197848#comment-14197848 ] Hive QA commented on HIVE-8732:
{color:green}Overall{color}: +1 all checks pass
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12679346/HIVE-8732.patch
{color:green}SUCCESS:{color} +1 6677 tests passed
Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1640/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1640/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1640/
Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}
This message is automatically generated.
ATTACHMENT ID: 12679346 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199199#comment-14199199 ] Owen O'Malley commented on HIVE-8732: I've created the timestamp bug as HIVE-8746. The fix for that one is pretty touchy and I'll do it in 0.15 I think rather than risk the 0.14 release. I don't want to create a new write format since the old reader will read the corrected files. I will add a flag that I can use to suppress using the split elimination code for files with broken stripe/file indexes. Does that sound reasonable?
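For readers following the thread, the flag idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the class StatsGate, the enum WriterTag, and both method names are hypothetical, and the only things taken from the thread are the tag strings ORIGINAL and HIVE_8732 and the "File Version: ... with ..." line format reported for orcfiledump.

```java
/** Hypothetical sketch: decide whether a file's statistics can drive split elimination. */
final class StatsGate {
    /** Hypothetical writer tags, named after the strings shown in the orcfiledump line. */
    enum WriterTag { ORIGINAL, HIVE_8732 }

    /** Extracts the tag from a line such as "File Version: 0.12 with HIVE_8732". */
    static WriterTag parseDumpLine(String line) {
        String marker = " with ";
        int idx = line.lastIndexOf(marker);
        if (idx < 0) {
            throw new IllegalArgumentException("no writer tag in: " + line);
        }
        // valueOf throws IllegalArgumentException for tags this sketch does not know.
        return WriterTag.valueOf(line.substring(idx + marker.length()).trim());
    }

    /** Files from the old writer may carry badly merged statistics, so skip them. */
    static boolean canUseStatsForElimination(WriterTag tag) {
        return tag != WriterTag.ORIGINAL;
    }

    public static void main(String[] args) {
        String[] lines = {
            "File Version: 0.12 with HIVE_8732",
            "File Version: 0.12 with ORIGINAL",
            "File Version: 0.11 with ORIGINAL"
        };
        for (String line : lines) {
            WriterTag tag = parseDumpLine(line);
            System.out.println(line + " -> use stats: " + canUseStatsForElimination(tag));
        }
    }
}
```

The point of the design is that the file format version stays the same, so old readers keep working, while a new reader can still tell which writer produced the statistics.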
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199268#comment-14199268 ] Dain Sundstrom commented on HIVE-8732:
How will the code know that the stripe/file indexes are broken (i.e., written with the current writer and not the new one)? BTW, the current reader will read files with future version numbers; you only get a warning message:
{code:java}
if (major > OrcFile.Version.CURRENT.getMajor() ||
    (major == OrcFile.Version.CURRENT.getMajor() &&
     minor > OrcFile.Version.CURRENT.getMinor())) {
  log.warn("ORC file " + path +
           " was written by a future Hive version " +
           versionString(version) +
           ". This file may not be readable by this version of Hive.");
}
{code}
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199353#comment-14199353 ] Dain Sundstrom commented on HIVE-8732: DoubleStatisticsImpl merge and update methods don't handle NaN properly. Any comparison with NaN returns false, so if the first value is NaN you end up with min and max of NaN, which implies that the column only contains NaNs. We should consider tracking NaN specially in the stats. Regardless, for now any code reading the DoubleStatistic should discard a stat containing a NaN.
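The NaN behaviour described above is easy to reproduce in isolation. The sketch below is not DoubleStatisticsImpl; NanAwareDoubleStats and all of its methods are hypothetical names used only for illustration. It shows a naive min/max loop that gets stuck on a leading NaN, one possible way to track NaN separately, and a reader-side check that discards a statistic whose min or max is NaN.

```java
/** Hypothetical sketch of NaN handling in double min/max statistics. */
final class NanAwareDoubleStats {
    private boolean hasValue = false; // true once a non-NaN value has been seen
    private boolean hasNaN = false;   // NaN is tracked separately from min/max
    private double min;
    private double max;

    /** Naive loop seeded with the first value; assumes a non-empty array. */
    static double[] naiveMinMax(double[] values) {
        double lo = values[0];
        double hi = values[0];
        for (double v : values) {
            if (v < lo) lo = v; // always false when lo is NaN
            if (v > hi) hi = v; // always false when hi is NaN
        }
        return new double[] {lo, hi};
    }

    void update(double v) {
        if (Double.isNaN(v)) {
            hasNaN = true;
            return;
        }
        if (!hasValue) {
            hasValue = true;
            min = v;
            max = v;
        } else {
            if (v < min) min = v;
            if (v > max) max = v;
        }
    }

    void merge(NanAwareDoubleStats other) {
        hasNaN |= other.hasNaN;
        if (!other.hasValue) {
            return;
        }
        if (!hasValue) {
            hasValue = true;
            min = other.min;
            max = other.max;
        } else {
            if (other.min < min) min = other.min;
            if (other.max > max) max = other.max;
        }
    }

    boolean hasValue() { return hasValue; }
    boolean hasNaN() { return hasNaN; }
    double min() { return min; } // meaningful only when hasValue() is true
    double max() { return max; } // meaningful only when hasValue() is true

    /** Reader-side guard: discard a statistic whose min or max is NaN. */
    static boolean isTrustworthy(double min, double max) {
        return !Double.isNaN(min) && !Double.isNaN(max);
    }

    public static void main(String[] args) {
        double[] naive = naiveMinMax(new double[] {Double.NaN, 1.0, 2.0});
        System.out.println("naive min=" + naive[0] + " max=" + naive[1]);

        NanAwareDoubleStats stats = new NanAwareDoubleStats();
        for (double v : new double[] {Double.NaN, 1.0, 2.0}) {
            stats.update(v);
        }
        System.out.println("tracked min=" + stats.min() + " max=" + stats.max()
            + " hasNaN=" + stats.hasNaN());
    }
}
```

Whether a range with hasNaN set can still be used for predicate pushdown depends on how the predicate treats NaN, which this sketch deliberately leaves open.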
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199460#comment-14199460 ] Dain Sundstrom commented on HIVE-8732: This seems like a reasonable plan.
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199468#comment-14199468 ] Dain Sundstrom commented on HIVE-8732: The patch looks reasonable. We still need to decide how to deal with doubles with NaN.
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196843#comment-14196843 ] Dain Sundstrom commented on HIVE-8732: Decimal and Date are also broken
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14196847#comment-14196847 ] Owen O'Malley commented on HIVE-8732: Timestamp too. *sigh*
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197045#comment-14197045 ] Prasanth J commented on HIVE-8732: LGTM, +1. Pending tests
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197280#comment-14197280 ] Dain Sundstrom commented on HIVE-8732: The tests cover a.merge(b) where one side is wider, but not b.merge(a). I'd test both directions to make sure someone doesn't introduce the opposite bug.
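A two-direction check of the kind suggested here might look like the sketch below. It does not use the real StringStatisticsImpl or the tests in the patch; StringRange and checkBothDirections are hypothetical stand-ins that keep only a minimum and a maximum, compared with String.compareTo.

```java
/** Hypothetical stand-in for string column statistics: just a min and a max. */
final class StringRange {
    final String min;
    final String max;

    StringRange(String min, String max) {
        this.min = min;
        this.max = max;
    }

    /** Returns the smallest range that covers both this range and the other one. */
    StringRange merge(StringRange other) {
        String lo = min.compareTo(other.min) <= 0 ? min : other.min;
        String hi = max.compareTo(other.max) >= 0 ? max : other.max;
        return new StringRange(lo, hi);
    }

    /** Checks a.merge(b) and b.merge(a) against the same expected bounds. */
    static void checkBothDirections(StringRange a, StringRange b,
                                    String expectedMin, String expectedMax) {
        StringRange[] results = { a.merge(b), b.merge(a) };
        for (StringRange r : results) {
            if (!r.min.equals(expectedMin) || !r.max.equals(expectedMax)) {
                throw new AssertionError("expected [" + expectedMin + ", " + expectedMax
                    + "] but got [" + r.min + ", " + r.max + "]");
            }
        }
    }

    public static void main(String[] args) {
        // One range fully contains the other.
        checkBothDirections(new StringRange("b", "y"), new StringRange("a", "z"), "a", "z");
        // The ranges only partly overlap.
        checkBothDirections(new StringRange("a", "m"), new StringRange("k", "z"), "a", "z");
        System.out.println("merge is symmetric for the sample ranges");
    }
}
```

Running every case through both call orders means a merge that only updates one bound, or only works when the receiver is the narrower range, fails at least one of the two checks.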
[jira] [Commented] (HIVE-8732) ORC string statistics are not merged correctly
[ https://issues.apache.org/jira/browse/HIVE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14197283#comment-14197283 ] Dain Sundstrom commented on HIVE-8732: You will need to bump the file version so a reader knows the stats are good. You should also disable using these stats for predicate pushdown in the current version of the file. And if you are bumping the version, you should fix the Timestamp epoch bug.