[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp

2021-07-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374981#comment-17374981
 ] 

Steve Loughran commented on HDFS-14788:
---

a modtime filter bundled into hadoop-distcp could be nice

> Use dynamic regex filter to ignore copy of source files in Distcp
> -
>
> Key: HDFS-14788
> URL: https://issues.apache.org/jira/browse/HDFS-14788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.0
>
>
> There is a feature in Distcp where we can ignore specific files to get copied 
> to the destination. This is currently based on a filter regex which is read 
> from a specific file. The process of creating different regex file for 
> different distcp jobs seems like a tedious task. What we are proposing is to 
> expose a regex_filter parameter which can be set during Distcp job creation 
> and use this filter in a new implementation CopyFilter class. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp

2021-07-05 Thread Mukund Thakur (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17374569#comment-17374569
 ] 

Mukund Thakur commented on HDFS-14788:
--

[~wanghongbing] You could always write a new implementation and configure as 
per the doc 
[https://github.com/apache/hadoop/pull/1702/files#diff-aabf0a2eb6a65a9c67335f493b233fddbf6f177ffcdcea32792bba24498c38a0R445]

 

> Use dynamic regex filter to ignore copy of source files in Distcp
> -
>
> Key: HDFS-14788
> URL: https://issues.apache.org/jira/browse/HDFS-14788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.0
>
>
> There is a feature in Distcp where we can ignore specific files to get copied 
> to the destination. This is currently based on a filter regex which is read 
> from a specific file. The process of creating different regex file for 
> different distcp jobs seems like a tedious task. What we are proposing is to 
> expose a regex_filter parameter which can be set during Distcp job creation 
> and use this filter in a new implementation CopyFilter class. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp

2021-07-01 Thread Hongbing Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17372761#comment-17372761
 ] 

Hongbing Wang commented on HDFS-14788:
--

Is there a plan to filter files by modtime? In the scenario of incremental data 
synchronization, if files in certain time windows can be specified, efficiency 
can be greatly improved.

> Use dynamic regex filter to ignore copy of source files in Distcp
> -
>
> Key: HDFS-14788
> URL: https://issues.apache.org/jira/browse/HDFS-14788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.0
>
>
> There is a feature in Distcp where we can ignore specific files to get copied 
> to the destination. This is currently based on a filter regex which is read 
> from a specific file. The process of creating different regex file for 
> different distcp jobs seems like a tedious task. What we are proposing is to 
> expose a regex_filter parameter which can be set during Distcp job creation 
> and use this filter in a new implementation CopyFilter class. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp

2020-01-06 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009117#comment-17009117
 ] 

Hudson commented on HDFS-14788:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #17818 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/17818/])
HDFS-14788. Use dynamic regex filter to ignore copy of source files in (stevel: 
rev 819159fa060897bcf7c9ae09bf4b2fc97292f92b)
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java
* (add) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestCopyFilter.java
* (edit) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/CopyFilter.java
* (add) 
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/RegexpInConfigurationFilter.java
* (add) 
hadoop-tools/hadoop-distcp/src/test/java/org/apache/hadoop/tools/TestRegexpInConfigurationFilter.java
* (edit) hadoop-tools/hadoop-distcp/src/site/markdown/DistCp.md.vm


> Use dynamic regex filter to ignore copy of source files in Distcp
> -
>
> Key: HDFS-14788
> URL: https://issues.apache.org/jira/browse/HDFS-14788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.0
>
>
> There is a feature in Distcp where we can ignore specific files to get copied 
> to the destination. This is currently based on a filter regex which is read 
> from a specific file. The process of creating different regex file for 
> different distcp jobs seems like a tedious task. What we are proposing is to 
> expose a regex_filter parameter which can be set during Distcp job creation 
> and use this filter in a new implementation CopyFilter class. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14788) Use dynamic regex filter to ignore copy of source files in Distcp

2020-01-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17009113#comment-17009113
 ] 

Steve Loughran commented on HDFS-14788:
---

committed to trunk. Unlikely to cause problems if backport to 3.2/3.1

> Use dynamic regex filter to ignore copy of source files in Distcp
> -
>
> Key: HDFS-14788
> URL: https://issues.apache.org/jira/browse/HDFS-14788
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.0
>
>
> There is a feature in Distcp where we can ignore specific files to get copied 
> to the destination. This is currently based on a filter regex which is read 
> from a specific file. The process of creating different regex file for 
> different distcp jobs seems like a tedious task. What we are proposing is to 
> expose a regex_filter parameter which can be set during Distcp job creation 
> and use this filter in a new implementation CopyFilter class. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org