[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-12 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057672#comment-17057672
 ] 

Hive QA commented on HIVE-22980:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12996390/HIVE-22980.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18099 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.exec.tez.TestDynamicPartitionPruner.testSingleSourceMultipleFiltersOrdering1
 (batchId=353)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/21078/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21078/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21078/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12996390 - PreCommit-HIVE-Build

> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch, HIVE-22980.2.patch, 
> HIVE-22980.3.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property "{{mapreduce.input.pathfilter.class}}" is 
> only respected by {{FileInputFormat}}, but not by any other implementations 
> of {{InputFormat}}. The customer wants to have the ability to filter out rows 
> based on path/filenames, current ORC features like bloomfilters and indexes 
> are not good enough for them to minimize number of disk read operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-12 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057638#comment-17057638
 ] 

Hive QA commented on HIVE-22980:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
 7s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
39s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
21s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
13s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 52s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-21078/dev-support/hive-personality.sh
 |
| git revision | master / 812a626 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21078/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21078/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch, HIVE-22980.2.patch, 
> HIVE-22980.3.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property "{{mapreduce.input.pathfilter.class}}" is 
> only respected by {{FileInputFormat}}, but not by any other implementations 
> 

[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-11 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056825#comment-17056825
 ] 

Hive QA commented on HIVE-22980:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12996315/HIVE-22980.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 18097 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestScheduledReplicationScenarios.testAcidTablesReplLoadBootstrapIncr
 (batchId=270)
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/21064/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/21064/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-21064/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12996315 - PreCommit-HIVE-Build

> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch, HIVE-22980.2.patch, 
> HIVE-22980.3.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property "{{mapreduce.input.pathfilter.class}}" is 
> only respected by {{FileInputFormat}}, but not by any other implementations 
> of {{InputFormat}}. The customer wants to have the ability to filter out rows 
> based on path/filenames, current ORC features like bloomfilters and indexes 
> are not good enough for them to minimize number of disk read operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-11 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056784#comment-17056784
 ] 

Hive QA commented on HIVE-22980:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
41s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
41s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
3s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 2 new + 196 unchanged - 0 
fixed = 198 total (was 196) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
55s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
14s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-21064/dev-support/hive-personality.sh
 |
| git revision | master / f86ca3e |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-21064/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch, HIVE-22980.2.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. 

[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-06 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053325#comment-17053325
 ] 

Hive QA commented on HIVE-22980:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12995720/HIVE-22980.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 18103 tests passed

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-Build/20977/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/20977/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-20977/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12995720 - PreCommit-HIVE-Build

> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property "{{mapreduce.input.pathfilter.class}}" is 
> only respected by {{FileInputFormat}}, but not by any other implementations 
> of {{InputFormat}}. The customer wants to have the ability to filter out rows 
> based on path/filenames, current ORC features like bloomfilters and indexes 
> are not good enough for them to minimize number of disk read operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-06 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17053292#comment-17053292
 ] 

Hive QA commented on HIVE-22980:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  9m 
26s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
43s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 
46s{color} | {color:blue} ql in master has 1531 extant Findbugs warnings. 
{color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
56s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
26s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
40s{color} | {color:red} ql: The patch generated 14 new + 196 unchanged - 0 
fixed = 210 total (was 196) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
15s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 24m 45s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/data/hiveptest/working/yetus_PreCommit-HIVE-Build-20977/dev-support/hive-personality.sh
 |
| git revision | master / 3bed626 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.1 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus/diff-checkstyle-ql.txt
 |
| asflicense | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus/patch-asflicense-problems.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-20977/yetus.txt |
| Powered by | Apache Yetushttp://yetus.apache.org |


This message was automatically generated.



> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property 

[jira] [Commented] (HIVE-22980) Support custom path filter for ORC tables

2020-03-05 Thread Oleksiy Sayankin (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17052071#comment-17052071
 ] 

Oleksiy Sayankin commented on HIVE-22980:
-

*FIXED*

*SOLUTION*

Add processing of {{mapreduce.input.pathFilter.class}} property in 
{{AcidUtils}} for ORC tables. E.g. to enable custom filter:
\\
\\
1. Implement {{CustomPathFilter}}:
{code}
class CustomPathFilter implements PathFilter{
  @Override
  public boolean accept(Path path) {
String name = path.getName();
return name.startsWith("a");
  }
}
{code}

2. Add {{CustomPathFilter}} to configuration.
{code}
PathFilter customPathFilter =  new CustomPathFilter();
Configuration conf = new Configuration();
conf.setClass("mapreduce.input.pathFilter.class", customPathFilter.getClass(), 
PathFilter.class);
{code}

3. Pass {{Configuration}} to Hive:
{code}
AcidUtils.Directory dir = AcidUtils.getAcidState(fs, new MockPath(fs, 
"/tbl/part1"), conf, new ValidReaderWriteIdList("tbl:100:" + Long.MAX_VALUE + 
":"), null, false, null, false);
{code}

*EFFECTS*

ORC processing.

> Support custom path filter for ORC tables
> -
>
> Key: HIVE-22980
> URL: https://issues.apache.org/jira/browse/HIVE-22980
> Project: Hive
>  Issue Type: New Feature
>  Components: ORC
>Reporter: Oleksiy Sayankin
>Assignee: Oleksiy Sayankin
>Priority: Major
> Attachments: HIVE-22980.1.patch
>
>
> The customer is looking for an option to specify custom path filter for ORC 
> tables. Please find the details below from customer requirement.
> Problem Statement/Approach in customer words :
> {quote} 
> Currently, Orc file input format does not take in path filters set in the 
> property "mapreduce.input.pathfilter.class" OR " 
> mapred.input.pathfilter.class ". So, we cannot use custom filters with Orc 
> files. 
> AcidUtils class has a static filter called "hiddenFilters" which is used by 
> ORC to filter input paths. If we can pass the custom filter classes(set in 
> the property mentioned above) to AcidUtils and replace hiddenFilter with a 
> filter that does an "and" operation over hiddenFilter+customFilters, the 
> filters would work well.
> On local testing, mapreduce.input.pathfilter.class seems to be working for 
> Text tables but not for ORC tables.
> {quote}
> Our analysis:
> {{OrcInputFormat}} and {{FileInputFormat}} are different implementations for 
> {{Inputformat}} interface. Property "{{mapreduce.input.pathfilter.class}}" is 
> only respected by {{FileInputFormat}}, but not by any other implementations 
> of {{InputFormat}}. The customer wants to have the ability to filter out rows 
> based on path/filenames, current ORC features like bloomfilters and indexes 
> are not good enough for them to minimize number of disk read operations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)