[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311024#comment-14311024 ]

Lefty Leverenz commented on HIVE-4639:
--------------------------------------

Doc note: [~prasanth_j] documented this in the ORC wiki.

* [ORC -- Column Statistics | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ColumnStatistics]

But it says the hasNull flag is added in 1.2.0 -- shouldn't that be 1.1.0, since this jira's fix version is 0.15?

Add has null flag to ORC internal index
---------------------------------------

Key: HIVE-4639
URL: https://issues.apache.org/jira/browse/HIVE-4639
Project: Hive
Issue Type: Improvement
Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth Jayachandran
Fix For: 0.15.0
Attachments: HIVE-4639.1.patch, HIVE-4639.2.patch, HIVE-4639.3.patch

It would enable more predicate pushdown if we added a flag to the index entry recording if there were any null values in the column for the 10k rows.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
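To illustrate the predicate pushdown that the issue description has in mind, here is a minimal Python sketch of how a reader might use a per-row-group null flag to skip row groups for a `col IS NULL` filter. The `IndexEntry` class and `row_groups_for_is_null` function are hypothetical names invented for this example, not ORC's actual data structures or API. Treating a missing flag (`None`) as "may contain nulls" is also an assumption, made so that index entries without the flag are never skipped incorrectly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexEntry:
    """Hypothetical per-row-group index entry (not ORC's real structure)."""
    first_row: int
    row_count: int
    # True/False if the flag was recorded; None if the entry has no flag
    # (assumption: such entries must be treated as "may contain nulls").
    has_null: Optional[bool]


def row_groups_for_is_null(entries: List[IndexEntry]) -> List[IndexEntry]:
    """Keep only the row groups that could contain a row where the column IS NULL."""
    return [e for e in entries if e.has_null is not False]


entries = [
    IndexEntry(first_row=0, row_count=10_000, has_null=False),      # skipped
    IndexEntry(first_row=10_000, row_count=10_000, has_null=True),  # read
    IndexEntry(first_row=20_000, row_count=10_000, has_null=None),  # read (unknown)
]

to_read = row_groups_for_is_null(entries)
print([e.first_row for e in to_read])
```

Only row groups whose flag is definitely false are dropped; everything else is still read and filtered row by row, so the result of the query does not change.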
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311026#comment-14311026 ]

Prasanth Jayachandran commented on HIVE-4639:
---------------------------------------------

Good catch, [~leftylev]! Updated the docs!
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272090#comment-14272090 ]

Hive QA commented on HIVE-4639:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691023/HIVE-4639.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6747 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2311/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2311/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2311/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691023 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272122#comment-14272122 ]

Gopal V commented on HIVE-4639:
-------------------------------

For the sake of documentation: this does not change the ORC format version (i.e., ORC files with hasNull flags can be read by hive-14).

[~leftylev]: FYI.
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272141#comment-14272141 ]

Lefty Leverenz commented on HIVE-4639:
--------------------------------------

Thanks [~gopalv]. I assume that means no documentation is needed, since this is internal and backward-compatible.
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270348#comment-14270348 ]

Hive QA commented on HIVE-4639:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690690/HIVE-4639.2.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 6747 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testColumnsWithNullAndCompression
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithoutNull
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplex
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplexOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testSerdeStatsOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testStringAndBinaryStatistics
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2296/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2296/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2296/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690690 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268884#comment-14268884 ]

Gopal V commented on HIVE-4639:
-------------------------------

Added this patch to my daily TPC-H 1 TB ETL and reloaded lineitem with the new format.

Testing {{select * from lineitem where l_shipdate is null;}}.

Before: 66.728 seconds (208774320430 bytes read)
After: 7.87 seconds (539046900 bytes read)

LGTM - +1.
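For a sense of scale, the before/after figures in the comment above work out to roughly an 8.5x shorter run time and about 387x fewer bytes read. The short Python calculation below reproduces that arithmetic. Only the four measurements come from the comment; the variable names are illustrative.

```python
# Measurements quoted in the comment above.
before_seconds, after_seconds = 66.728, 7.87
before_bytes, after_bytes = 208_774_320_430, 539_046_900

speedup = before_seconds / after_seconds    # how many times faster the query ran
read_reduction = before_bytes / after_bytes  # how many times fewer bytes were read
fraction_read = after_bytes / before_bytes   # share of the original I/O still needed

print(f"speedup:        {speedup:.2f}x")
print(f"read reduction: {read_reduction:.1f}x")
print(f"fraction read:  {fraction_read:.2%}")
```

The run time improves far less than the bytes read, which suggests that fixed costs other than I/O account for much of the remaining 7.87 seconds. The comment does not break that down, so this is only a plausible reading.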
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268053#comment-14268053 ]

Owen O'Malley commented on HIVE-4639:
-------------------------------------

You should encode four values: no_values, all_nulls, some_nulls, no_nulls.

This will allow you to support a richer set of sargs.
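As a rough illustration of why four states allow richer sargs, the Python sketch below maps each state to a truth value for the `IS NULL` and `IS NOT NULL` predicates over one row group. The four state names come from the comment above. The `NullState` and `Truth` enums and the two `eval_*` functions are hypothetical, and the three-valued `Truth` is a simplification chosen for this example rather than Hive's actual SearchArgument truth values.

```python
from enum import Enum


class NullState(Enum):
    NO_VALUES = "no_values"
    ALL_NULLS = "all_nulls"
    SOME_NULLS = "some_nulls"
    NO_NULLS = "no_nulls"


class Truth(Enum):
    YES = "yes"      # every row in the row group satisfies the predicate
    NO = "no"        # no row can satisfy it, so the row group can be skipped
    MAYBE = "maybe"  # some rows may satisfy it, so the row group must be read


def eval_is_null(state: NullState) -> Truth:
    """Truth value of `col IS NULL` for a row group in the given state."""
    if state is NullState.ALL_NULLS:
        return Truth.YES
    if state is NullState.SOME_NULLS:
        return Truth.MAYBE
    return Truth.NO  # NO_VALUES or NO_NULLS: nothing can match


def eval_is_not_null(state: NullState) -> Truth:
    """Truth value of `col IS NOT NULL` for a row group in the given state."""
    if state is NullState.NO_NULLS:
        return Truth.YES
    if state is NullState.SOME_NULLS:
        return Truth.MAYBE
    return Truth.NO  # NO_VALUES or ALL_NULLS: nothing can match


for state in NullState:
    print(state.value, eval_is_null(state).value, eval_is_not_null(state).value)
```

With only a boolean flag, a reader can separate `no_nulls` from everything else. The extra states let it also answer `NO` for `IS NOT NULL` on an all-null row group, and `YES` without inspecting individual rows.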
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268071#comment-14268071 ]

Gopal V commented on HIVE-4639:
-------------------------------

Yes, we have that granularity locked up in two states (as a tri-state, now - all_nulls, some_nulls, no_nulls).

We actually have all_nulls/no_values encoded as min=null/max=null. This patch is the some_nulls/no_nulls boolean on top of that - though that information is a somewhat non-obvious detail.

Another thought occurs: since we already have a whole long stream of IS_PRESENT, I suspect storing the actual NULL count would be somewhat helpful if we need a heuristic for IS_NULL row-level predicate evaluation on wide de-normalized tables (i.e., read the filter column first and then avoid creating large vector batches for the rest).
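The NULL-count idea in the comment above could, under some assumptions, look like the following Python sketch. If the index stored how many NULLs a row group contains, a reader could estimate the selectivity of an `IS NULL` filter and decide whether to read the filter column first. The function name, its parameters, and the 10% threshold are all hypothetical; nothing in the thread specifies them, and the patch discussed here stores only a boolean.

```python
def read_filter_column_first(null_count: int, row_count: int,
                             max_selectivity: float = 0.1) -> bool:
    """Decide whether to evaluate `col IS NULL` on the filter column alone
    before materializing the other columns of a wide table.

    max_selectivity is an illustrative threshold, not a value from the thread.
    """
    if row_count == 0:
        return False  # nothing to read either way
    selectivity = null_count / row_count  # estimated share of matching rows
    return selectivity <= max_selectivity


# Few NULLs: filter first, then fetch the remaining columns for matching rows only.
print(read_filter_column_first(null_count=5, row_count=10_000))
# Mostly NULLs: most rows match, so reading the filter column first saves little.
print(read_filter_column_first(null_count=9_000, row_count=10_000))
```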
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268099#comment-14268099 ]

Prasanth Jayachandran commented on HIVE-4639:
---------------------------------------------

As Gopal mentioned, we can infer the other stats from the existing information:

all_nulls - min = null
no_nulls - hasNull = false
some_nulls - hasNull = true, min != null
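The three rules in the comment above can be written as a small Python function. This is a sketch only: `infer_null_state` is a hypothetical name, and the statistics are passed as plain values rather than read from a real ORC index. Per the earlier comment in this thread, `no_values` shares the `min = null` encoding with `all_nulls`, so the function cannot tell the two apart.

```python
from typing import Any, Optional


def infer_null_state(has_null: bool, minimum: Optional[Any]) -> str:
    """Derive the null state of a row group from its hasNull flag and min statistic."""
    if minimum is None:
        # min = null: every value is NULL (or, per the thread, there are no values).
        return "all_nulls"
    if not has_null:
        # hasNull = false: no NULL values were seen.
        return "no_nulls"
    # hasNull = true and min != null: a mix of NULL and non-NULL values.
    return "some_nulls"


print(infer_null_state(has_null=True, minimum=None))
print(infer_null_state(has_null=False, minimum=3))
print(infer_null_state(has_null=True, minimum=3))
```

The `min` check comes first so that an all-null row group is classified as `all_nulls` rather than `some_nulls`, regardless of how its flag is set.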
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267182#comment-14267182 ]

Hive QA commented on HIVE-4639:
-------------------------------

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690444/HIVE-4639.1.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6731 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testColumnsWithNullAndCompression
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithoutNull
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplex
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplexOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testSerdeStatsOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testStringAndBinaryStatistics
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2274/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2274/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2274/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690444 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index
[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675338#comment-13675338 ]

Prasanth J commented on HIVE-4639:
----------------------------------

[~owen.omalley], are you working on this issue? If not, I can take it over.

Add has null flag to ORC internal index
---------------------------------------

Key: HIVE-4639
URL: https://issues.apache.org/jira/browse/HIVE-4639
Project: Hive
Issue Type: Improvement
Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley

It would enable more predicate pushdown if we added a flag to the index entry recording if there were any null values in the column for the 10k rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira