[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374981#comment-17374981 ]

Steve Loughran commented on HDFS-14788:
---------------------------------------

a modtime filter bundled into hadoop-distcp could be nice

> Use dynamic regex filter to ignore copy of source files in Distcp
> -----------------------------------------------------------------
>
>                 Key: HDFS-14788
>                 URL: https://issues.apache.org/jira/browse/HDFS-14788
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: distcp
>   Affects Versions: 3.2.1
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Major
>             Fix For: 3.3.0
>
>
> Distcp has a feature that lets us exclude specific files from being copied
> to the destination. This is currently based on a filter regex which is read
> from a specific file. Creating a different regex file for each distcp job
> is tedious. We propose exposing a regex_filter parameter which can be set
> during Distcp job creation and using this filter in a new CopyFilter
> implementation class.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
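The proposal in the issue description (a regex supplied at job-creation time rather than read from a file) can be illustrated with a minimal, self-contained sketch. The class name `RegexExcludeFilterSketch` and its `shouldCopy(String)` method are hypothetical and chosen only for illustration; this is not the code committed for HDFS-14788 and it does not use any Hadoop types.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a regex-based exclusion filter.
 * Not Hadoop's CopyFilter API; names here are assumptions for illustration.
 */
final class RegexExcludeFilterSketch {
    // Compiled exclusion pattern; null means no filter was configured.
    private final Pattern exclude;

    /** @param regex exclusion regex passed in at job creation; may be null or empty */
    RegexExcludeFilterSketch(String regex) {
        this.exclude = (regex == null || regex.isEmpty()) ? null : Pattern.compile(regex);
    }

    /** Returns false only when a regex is configured and the whole path matches it. */
    boolean shouldCopy(String path) {
        return exclude == null || !exclude.matcher(path).matches();
    }
}
```

With a regex such as `.*\.tmp`, a path ending in `.tmp` would be skipped and every other path copied; with no regex configured, every path is copied. Matching the full path (rather than searching for a substring) is a design choice of this sketch, not something the thread specifies.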
[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374569#comment-17374569 ]

Mukund Thakur commented on HDFS-14788:
--------------------------------------

[~wanghongbing] You could always write a new implementation and configure as per the doc [https://github.com/apache/hadoop/pull/1702/files#diff-aabf0a2eb6a65a9c67335f493b233fddbf6f177ffcdcea32792bba24498c38a0R445]
[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372761#comment-17372761 ]

Hongbing Wang commented on HDFS-14788:
--------------------------------------

Is there a plan to filter files by modtime? In the scenario of incremental data synchronization, if files in certain time windows can be specified, efficiency can be greatly improved.
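The time-window idea raised in this comment can be sketched in a few lines. This is only an illustration under stated assumptions: `ModTimeWindowFilterSketch` is a hypothetical class, not something the thread says exists in hadoop-distcp, and it assumes the caller already has each file's modification time in epoch milliseconds. How a real filter would obtain that timestamp is not covered here.

```java
/**
 * Hypothetical sketch of a modification-time window filter.
 * Not part of hadoop-distcp; names and signature are assumptions for illustration.
 */
final class ModTimeWindowFilterSketch {
    private final long startMillis; // inclusive lower bound, epoch milliseconds
    private final long endMillis;   // exclusive upper bound, epoch milliseconds

    ModTimeWindowFilterSketch(long startMillis, long endMillis) {
        if (startMillis > endMillis) {
            throw new IllegalArgumentException("window start must not be after window end");
        }
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    /** Returns true when the modification time falls inside [startMillis, endMillis). */
    boolean shouldCopy(long modificationTimeMillis) {
        return modificationTimeMillis >= startMillis && modificationTimeMillis < endMillis;
    }
}
```

A half-open window is used so that consecutive incremental runs (for example, one window per day) would neither overlap nor leave gaps at the boundaries.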
[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009117#comment-17009117 ]

Hudson commented on HDFS-14788:
-------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17818 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/17818/])
HDFS-14788. Use dynamic regex filter to ignore copy of source files in (stevel: rev 819159fa060897bcf7c9ae09bf4b2fc97292f92b)
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyFilter.java
* (edit) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyFilter.java
* (add) hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/RegexpInConfigurationFilter.java
* (add) hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestRegexpInConfigurationFilter.java
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm
[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp
[ https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009113#comment-17009113 ]

Steve Loughran commented on HDFS-14788:
---------------------------------------

committed to trunk. Unlikely to cause problems if backported to 3.2/3.1