[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057672#comment-17057672 ]

Hive QA commented on HIVE-22980:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12996390/HIVE-22980.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18099 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.tez.TestDynamicPartitionPruner.testSingleSourceMultipleFiltersOrdering1 (batchId=353)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/21078/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21078/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21078/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12996390 - PreCommit-HIVE-Build

> Support custom path filter for ORC tables
> -----------------------------------------
>
>                 Key: HIVE-22980
>                 URL: https://issues.apache.org/jira/browse/HIVE-22980
>             Project: Hive
>          Issue Type: New Feature
>          Components: ORC
>            Reporter: Oleksiy Sayankin
>            Assignee: Oleksiy Sayankin
>            Priority: Major
>         Attachments: HIVE-22980.1.patch, HIVE-22980.2.patch,
> HIVE-22980.3.patch
>
>
> The customer is looking for an option to specify a custom path filter for
> ORC tables. Please find the details below from the customer requirement.
> Problem Statement/Approach in customer words:
> {quote}
> Currently, Orc file input format does not take in path filters set in the
> property "mapreduce.input.pathfilter.class" OR
> "mapred.input.pathfilter.class". So, we cannot use custom filters with Orc
> files.
> AcidUtils class has a static filter called "hiddenFilters" which is used by
> ORC to filter input paths. If we can pass the custom filter classes (set in
> the property mentioned above) to AcidUtils and replace hiddenFilter with a
> filter that does an "and" operation over hiddenFilter+customFilters, the
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations of
> the {{InputFormat}} interface. Property "{{mapreduce.input.pathfilter.class}}"
> is respected only by {{FileInputFormat}}, not by any other implementation
> of {{InputFormat}}. The customer wants the ability to filter out rows
> based on path/filenames; current ORC features like bloomfilters and indexes
> are not good enough for them to minimize the number of disk read operations.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
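The "and" composition proposed in the description can be sketched as follows. This is a minimal illustration, not the code from the attached patches: it uses {{java.util.function.Predicate<String>}} on file names as a stand-in for Hadoop's {{PathFilter}} so that it runs without Hadoop on the classpath, and the class name {{CombinedFilterSketch}}, the hidden-file rule (names starting with "_" or "."), and the sample file names are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: Predicate<String> stands in for Hadoop's PathFilter. */
public class CombinedFilterSketch {

    // Assumed hidden-file rule: reject names that start with '_' or '.'.
    static final Predicate<String> HIDDEN_FILTER =
        name -> !name.startsWith("_") && !name.startsWith(".");

    // Stand-in for a user-supplied filter: accept names that start with "a".
    static final Predicate<String> CUSTOM_FILTER = name -> name.startsWith("a");

    // A name is accepted only if BOTH filters accept it ("and" composition).
    static final Predicate<String> COMBINED = HIDDEN_FILTER.and(CUSTOM_FILTER);

    /** Returns the names that pass the combined filter, in input order. */
    static List<String> accepted(List<String> names) {
        return names.stream().filter(COMBINED).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a_000000", "b_000000", "_tmp.a", ".a.crc");
        System.out.println(accepted(names)); // prints [a_000000]
    }
}
```

Composing with "and" means the custom filter can only narrow the set of files that the hidden-file filter already lets through; it cannot re-admit hidden or temporary files.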
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057638#comment-17057638 ]

Hive QA commented on HIVE-22980:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 39s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 13s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 52s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-21078/dev-support/hive-personality.sh |
| git revision | master / 812a626 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-21078/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-21078/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056825#comment-17056825 ]

Hive QA commented on HIVE-22980:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12996315/HIVE-22980.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18097 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestScheduledReplicationScenarios.testAcidTablesReplLoadBootstrapIncr (batchId=270)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/21064/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21064/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21064/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12996315 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056784#comment-17056784 ]

Hive QA commented on HIVE-22980:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 41s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 2 new + 196 unchanged - 0 fixed = 198 total (was 196) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 24s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-21064/dev-support/hive-personality.sh |
| git revision | master / f86ca3e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053325#comment-17053325 ]

Hive QA commented on HIVE-22980:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995720/HIVE-22980.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18103 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/20977/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20977/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20977/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995720 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053292#comment-17053292 ]

Hive QA commented on HIVE-22980:

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 3m 46s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 40s{color} | {color:red} ql: The patch generated 14 new + 196 unchanged - 0 fixed = 210 total (was 196) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests | asflicense javac javadoc findbugs checkstyle compile |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-20977/dev-support/hive-personality.sh |
| git revision | master / 3bed626 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus/diff-checkstyle-ql.txt |
| asflicense | http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus/patch-asflicense-problems.txt |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables
[ https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052071#comment-17052071 ]

Oleksiy Sayankin commented on HIVE-22980:
-----------------------------------------

*FIXED*

*SOLUTION*

Add processing of the {{mapreduce.input.pathFilter.class}} property in {{AcidUtils}} for ORC tables. For example, to enable a custom filter:
\\
\\
1. Implement {{CustomPathFilter}}:

{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Accepts only paths whose last component starts with "a".
class CustomPathFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    String name = path.getName();
    return name.startsWith("a");
  }
}
{code}

2. Add {{CustomPathFilter}} to the configuration:

{code}
PathFilter customPathFilter = new CustomPathFilter();
Configuration conf = new Configuration();
// Register the filter class under the property that AcidUtils now reads.
conf.setClass("mapreduce.input.pathFilter.class", customPathFilter.getClass(), PathFilter.class);
{code}

3. Pass the {{Configuration}} to Hive:

{code}
AcidUtils.Directory dir = AcidUtils.getAcidState(fs,
    new MockPath(fs, "/tbl/part1"), conf,
    new ValidReaderWriteIdList("tbl:100:" + Long.MAX_VALUE + ":"),
    null, false, null, false);
{code}

*EFFECTS*

ORC processing.
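The solution comment registers the filter by class name under a configuration key, which implies that the reader side resolves that name and instantiates the class. A minimal sketch of that lookup follows, under stated assumptions: {{java.util.Properties}} stands in for Hadoop's {{Configuration}}, {{Predicate<String>}} stands in for {{PathFilter}}, and {{FilterLoaderSketch}}, {{StartsWithA}}, and {{loadFilter}} are hypothetical names. It is not the code from the attached patches.

```java
import java.util.Properties;
import java.util.function.Predicate;

/** Hypothetical sketch: resolve a filter class from a config key by reflection. */
public class FilterLoaderSketch {

    /** Property name taken from the comment above. */
    static final String KEY = "mapreduce.input.pathFilter.class";

    /** Example filter, analogous to CustomPathFilter: accept names starting with "a". */
    public static class StartsWithA implements Predicate<String> {
        @Override
        public boolean test(String name) {
            return name.startsWith("a");
        }
    }

    /**
     * Returns the filter named by KEY, or an accept-all filter when the key is unset.
     * The named class needs a public no-argument constructor.
     */
    static Predicate<String> loadFilter(Properties conf) {
        String className = conf.getProperty(KEY);
        if (className == null) {
            return name -> true; // no custom filter configured
        }
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            if (!(instance instanceof Predicate)) {
                throw new IllegalArgumentException(className + " is not a Predicate");
            }
            @SuppressWarnings("unchecked")
            Predicate<String> filter = (Predicate<String>) instance;
            return filter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, StartsWithA.class.getName());
        Predicate<String> filter = loadFilter(conf);
        System.out.println(filter.test("abc") + " " + filter.test("xyz")); // prints true false
    }
}
```

Falling back to an accept-all filter when the key is unset keeps the default behaviour unchanged for tables that do not configure a custom filter.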