[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2024-07-04 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863112#comment-17863112
 ] 

ZanderXu commented on HDFS-2139:


Thanks [~zero45] [~liuguanghua] for the reminder. I have force-updated the 
development branch HDFS-2139 to the latest trunk. You can submit PRs to 
this development branch and I will review them. 

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Pritam Damania
> Assignee: Rituraj
> Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, it
> reports to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing
> top-of-rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode.
> [~xuzq_zander] provided a design doc:
> https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing
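
The six steps in the quoted description can be sketched as a small in-memory 
model. This is only an illustration under stated assumptions: NameNode, 
DataNode, allocateBlock, blockReceived and copyBlockLocally are hypothetical 
names made up for this sketch, not HDFS classes or RPCs, and the copy runs 
synchronously, so the "wait" in step 6 is trivial here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the six fast-copy steps; none of these types are HDFS classes. */
final class FastCopySketch {

    /** Hypothetical datanode: block id -> block bytes stored locally. */
    static final class DataNode {
        final String name;
        final Map<Long, byte[]> blocks = new HashMap<>();

        DataNode(String name) { this.name = name; }

        /** Step 4: copy a block on the same node, so no bytes cross the network. */
        void copyBlockLocally(long srcBlockId, long dstBlockId) {
            byte[] src = blocks.get(srcBlockId);
            if (src == null) {
                throw new IllegalStateException(name + " has no replica of block " + srcBlockId);
            }
            blocks.put(dstBlockId, src.clone());
        }
    }

    /** Hypothetical namenode: file -> block ids, block id -> replica locations. */
    static final class NameNode {
        final Map<String, List<Long>> fileBlocks = new HashMap<>();
        final Map<Long, List<DataNode>> locations = new HashMap<>();
        private long nextBlockId = 1;

        /** Step 3: add an empty block (no replicas yet) to a file. */
        long allocateBlock(String file) {
            long id = nextBlockId++;
            fileBlocks.computeIfAbsent(file, f -> new ArrayList<>()).add(id);
            locations.put(id, new ArrayList<>());
            return id;
        }

        /** Step 5: a datanode reports a finished replica. */
        void blockReceived(long blockId, DataNode dn) {
            locations.get(blockId).add(dn);
        }
    }

    /** Runs steps 1-6 for one file and returns the number of replicas copied. */
    static int fastCopy(NameNode nn, String src, String dst) {
        List<Long> srcBlocks = nn.fileBlocks.get(src);                 // step 1
        if (srcBlocks == null) {
            throw new IllegalArgumentException("no such file: " + src);
        }
        int copied = 0;
        for (long srcBlock : new ArrayList<>(srcBlocks)) {
            List<DataNode> replicas = new ArrayList<>(nn.locations.get(srcBlock)); // step 2
            long dstBlock = nn.allocateBlock(dst);                     // step 3
            for (DataNode dn : replicas) {
                dn.copyBlockLocally(srcBlock, dstBlock);               // step 4
                nn.blockReceived(dstBlock, dn);                        // step 5
                copied++;
            }
        }
        return copied;                                                 // step 6
    }

    /** Two blocks, each replicated on two datanodes, copied from /src to /dst. */
    static int demo() {
        NameNode nn = new NameNode();
        DataNode[] nodes = { new DataNode("dn1"), new DataNode("dn2") };
        for (int i = 0; i < 2; i++) {
            long id = nn.allocateBlock("/src");
            for (DataNode dn : nodes) {
                dn.blocks.put(id, new byte[] { (byte) i });
                nn.blockReceived(id, dn);
            }
        }
        return fastCopy(nn, "/src", "/dst");
    }

    public static void main(String[] args) {
        System.out.println("replicas copied: " + demo());
    }
}
```

Every byte is copied on the datanode that already holds the source replica, 
which is the property the description relies on. The hardlink variant from 
the note would replace the body of copyBlockLocally and is not modelled here.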



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2024-07-04 Thread Plamen Jeliazkov (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863070#comment-17863070
 ] 

Plamen Jeliazkov commented on HDFS-2139:


Hey folks, it has been a while since I've contributed but this is a feature I 
have also been deeply interested in and would use in a production environment. 
Would love to collaborate and help drive this forward. Thank you.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2024-05-15 Thread liuguanghua (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846574#comment-17846574
 ] 

liuguanghua commented on HDFS-2139:
---

[~haiyang Hu] Hello sir. Are you still working on this? I am interested 
in taking on some of this work. [~xuzq_zander] I also plan to use fastcopy in 
a production environment.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2024-05-11 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845565#comment-17845565
 ] 

ZanderXu commented on HDFS-2139:


https://docs.google.com/document/d/1uGHA2dXLldlNoaYF-4c63baYjCuft_T88wdvhwVgh6c/edit?usp=sharing




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2024-05-07 Thread liuguanghua (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844497#comment-17844497
 ] 

liuguanghua commented on HDFS-2139:
---

[~xuzq_zander] Hello, sir. The design doc cannot be viewed because of 
permissions. 

[https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing]

Can you upload a new version to the attachments? Thanks very much.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2023-05-30 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727476#comment-17727476
 ] 

ZanderXu commented on HDFS-2139:


{quote}[~xuzq_zander] Hi bro, I am interested in doing some things in this job.
{quote}
[~pengbei] Thank you very much; I'm glad you are interested in this 
ticket as well. I'm sorry, I have already assigned those sub-tasks to 
[~haiyang Hu]. Let's review and push it forward together. Thanks again. 




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2023-05-30 Thread Haiyang Hu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727473#comment-17727473
 ] 

Haiyang Hu commented on HDFS-2139:
--

Hi [~xuzq_zander] bro, I am very interested in this work and will work hard 
to complete it. Thanks.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2023-05-30 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727471#comment-17727471
 ] 

Bei Peng commented on HDFS-2139:


[~xuzq_zander] Hi bro, I am interested in helping with this work.

I can take some sub-tasks, such as 
https://issues.apache.org/jira/browse/HDFS-16758 and 
https://issues.apache.org/jira/browse/HDFS-16760 




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2023-05-30 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727465#comment-17727465
 ] 

ZanderXu commented on HDFS-2139:


[~ferhui] Master, I'm so sorry that I can't push this ticket forward due to 
work reasons. cc [~ayushtkn] 

[~haiyang Hu] Hi bro, are you interested in taking this on? I can help 
review your PRs. 




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-09-06 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600719#comment-17600719
 ] 

Steve Loughran commented on HDFS-2139:
--

As you are going to be making changes in the public/stable filesystem 
APIs, I'd like to keep an eye on that.

Anything that goes in 
* should have its own HADOOP- JIRA, even if all the work is in the hdfs branch.
* needs to be able to work well in cloud infrastructures which implement 
similar capabilities but with different latencies etc.
* shouldn't be a copy(src, dest) -> boolean call. Whatever copy command goes 
in should instead return a builder subclass of FSDataOutputStreamBuilder which 
can allow for extra options, and return a Future which the caller can block on, 
etc.
* needs to have something in the filesystem markdown spec and a matching 
contract test
* and a PathCapabilities probe which can check for the API being available 
under a path.
* and fail by throwing exceptions, not returning true/false. A return value is 
needed for the future; something which implements IOStatisticsSource is useful.

Any new API should work identically with Azure storage as/when it adds the 
needed operation; S3's file-by-file COPY call would also be supported. It is 
not going to be as fast as anything in HDFS, but as it doesn't use any network 
IO outside the S3 store it is higher bandwidth and scales better than this CP 
would normally do. (The Hive team have asked for S3 copying before, but it gets 
complex once you start to think about encryption; s3a support might need to add 
extra source files.)


{code}
// proposed API shape only; a copy() call returning a builder does not exist yet
Future<?> r = fs.copy(src, dest)
    .withFileStatus(srcStatus)             // as with openFile
    .withProgress(progressable)
    // example of a builder option, here one requiring atomic file/dir copy
    .must("fs.option.copy.atomic", true)
    .build();

r.get();   // block for result
{code}
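
To make the builder-plus-Future shape concrete, here is a self-contained 
sketch against a toy in-memory store. Everything in it is an assumption made 
for illustration: ToyFs, CopyBuilder and CopyResult are hypothetical names, 
not Hadoop classes, and only the option key "fs.option.copy.atomic" is taken 
from the example above. It shows mandatory options being rejected at build 
time and failures arriving as exceptions through the future rather than as a 
boolean.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical shape of a builder-based copy call; not an existing Hadoop API. */
final class CopyBuilderSketch {

    /** Result object; a real one might also expose IO statistics. */
    static final class CopyResult {
        final long bytesCopied;
        CopyResult(long bytesCopied) { this.bytesCopied = bytesCopied; }
    }

    /** Toy in-memory store: path -> file contents. */
    static final class ToyFs {
        final Map<String, byte[]> files = new ConcurrentHashMap<>();

        CopyBuilder copy(String src, String dest) {
            return new CopyBuilder(this, src, dest);
        }
    }

    static final class CopyBuilder {
        private final ToyFs fs;
        private final String src;
        private final String dest;
        private final Map<String, String> mandatory = new HashMap<>();

        CopyBuilder(ToyFs fs, String src, String dest) {
            this.fs = fs;
            this.src = src;
            this.dest = dest;
        }

        /** Mandatory option: build() rejects keys this store does not understand. */
        CopyBuilder must(String key, boolean value) {
            mandatory.put(key, Boolean.toString(value));
            return this;
        }

        /** Starts the copy; failures surface as exceptions from the future. */
        CompletableFuture<CopyResult> build() {
            for (String key : mandatory.keySet()) {
                if (!key.equals("fs.option.copy.atomic")) {
                    throw new IllegalArgumentException("unsupported mandatory option: " + key);
                }
            }
            return CompletableFuture.supplyAsync(() -> {
                byte[] data = fs.files.get(src);
                if (data == null) {
                    throw new UncheckedIOException(new FileNotFoundException(src));
                }
                fs.files.put(dest, data.clone());
                return new CopyResult(data.length);
            });
        }
    }

    /** Copies a 3-byte file and blocks for the result. */
    static long demo() {
        ToyFs fs = new ToyFs();
        fs.files.put("/src", new byte[] { 1, 2, 3 });
        CompletableFuture<CopyResult> r = fs.copy("/src", "/dest")
            .must("fs.option.copy.atomic", true)
            .build();
        return r.join().bytesCopied;   // join() blocks like get(), without checked exceptions
    }

    /** A missing source fails the future with an exception instead of returning false. */
    static boolean missingSourceFails() {
        ToyFs fs = new ToyFs();
        try {
            fs.copy("/nope", "/dest").build().join();
            return false;
        } catch (CompletionException e) {
            return e.getCause() instanceof UncheckedIOException;
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes copied: " + demo());
        System.out.println("missing source fails: " + missingSourceFails());
    }
}
```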

I'd also propose it as a new interface which both FileContext and FileSystem 
implement.

Also, fs shell could be a good, simple place for this to be used too; easier to 
get working/stabilise there.






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-09-05 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600565#comment-17600565
 ] 

Hui Fei commented on HDFS-2139:
---

[~xuzq_zander] I have cut a feature branch HDFS-2139 from trunk. You can start 
your work, thanks!




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-29 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597152#comment-17597152
 ] 

ZanderXu commented on HDFS-2139:


[~ayushtkn] Sir, thanks for your ideas. I will consider them during coding.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-28 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596982#comment-17596982
 ] 

Ayush Saxena commented on HDFS-2139:


Had a quick look at the design; looks good to me.

Just a point regarding the migration conditions:
 * I think the storage type also needs to be the same, e.g. Disk to Disk.
 * Both source & target, if encrypted, should be within encryption zones with 
the same keys.

Since this is a copy and the same block data will be pointed to by two different 
files in different namespaces (ns1 & ns2), we should make sure append & truncate 
on one file don't affect the data in the other.
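
The two conditions above could be checked before choosing the fast path, 
roughly as follows. This is a sketch under assumptions: StorageType, Endpoint 
and canFastCopy are hypothetical names defined only for this example, and 
treating "both sides unencrypted" as eligible is my reading of the second 
bullet rather than something the design doc is known to state.

```java
import java.util.Objects;

/** Hypothetical eligibility check for the fast path; not taken from the design doc. */
final class FastCopyEligibilitySketch {

    /** Illustrative storage types, defined locally for this sketch. */
    enum StorageType { DISK, SSD, ARCHIVE }

    /** One side of the copy: its storage type and encryption zone key, if any. */
    static final class Endpoint {
        final StorageType storageType;
        final String encryptionKeyName;   // null when the path is not in an encryption zone

        Endpoint(StorageType storageType, String encryptionKeyName) {
            this.storageType = storageType;
            this.encryptionKeyName = encryptionKeyName;
        }
    }

    /** True only when storage types match and both sides use the same key (or none). */
    static boolean canFastCopy(Endpoint src, Endpoint dst) {
        boolean sameStorage = src.storageType == dst.storageType;
        boolean sameKey = Objects.equals(src.encryptionKeyName, dst.encryptionKeyName);
        return sameStorage && sameKey;
    }

    public static void main(String[] args) {
        Endpoint a = new Endpoint(StorageType.DISK, "key1");
        Endpoint b = new Endpoint(StorageType.DISK, "key1");
        Endpoint c = new Endpoint(StorageType.SSD, "key1");
        System.out.println(canFastCopy(a, b));   // same storage type, same key
        System.out.println(canFastCopy(a, c));   // storage types differ
    }
}
```

When the check fails, a caller would presumably fall back to an ordinary 
byte-for-byte copy. The append and truncate concern is a separate issue that 
this check does not address.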




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-28 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596981#comment-17596981
 ] 

ZanderXu commented on HDFS-2139:


Copy, sir. Thanks




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-28 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17596967#comment-17596967
 ] 

Hui Fei commented on HDFS-2139:
---

[~xuzq_zander] The design doc looks good to me, thanks.
We can wait for others' feedback until the end of this week. If there are no 
other comments, I will create a feature branch and you can start your work.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as
> follows :
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top of
> the rack data transfers.
> Note : An extra improvement, would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-25 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585081#comment-17585081
 ] 

fanshilun commented on HDFS-2139:
-

[~ferhui] Thank you very much for your detailed explanation!

Very much looking forward to your completion of this feature!

Thanks again for your contribution!!!

[~ferhui] [~xuzq_zander] 

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-24 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584592#comment-17584592
 ] 

Hui Fei commented on HDFS-2139:
---

[~slfan1989] Thanks for your comments.

> keep the original Assignee of this jira

It seems that [~rituraj] hasn't watched this ticket for several years, so I 
assigned it to [~xuzq_zander] to give him a chance to take over this task.
Anyway, I have reassigned it to [~rituraj].

> Task1: Add a new method LocalBlockCopyViaHardLink to Datanode
> This doesn't seem to be described in the documentation

I think it is clear in the initial patch. I guess the design document aims to 
describe the improvements.

> Is there enough performance test data for HDFS-15294? What is the expected 
> performance improvement of HDFS-2139 after implementation?

Fast copy aims to replace the regular copy in DistCp. It seems the benchmark is 
enough to show the expected result, right?




> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-24 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584147#comment-17584147
 ] 

fanshilun commented on HDFS-2139:
-

[~ferhui] Personally, I think this jira has helped a lot of people, so we 
should keep its original Assignee. Should we create subtasks and assign them?

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-24 Thread fanshilun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584145#comment-17584145
 ] 

fanshilun commented on HDFS-2139:
-

[~xuzq_zander]

Very happy that this feature can be restarted, but I have the following 
questions:
  1. Is there enough performance test data for HDFS-15294? What is the expected 
performance improvement of HDFS-2139 after implementation? 
  2. The planning of tasks in the design document does not seem very clear. Can 
you explain in detail the specific changes each task involves?



 

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-24 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17584144#comment-17584144
 ] 

Hui Fei commented on HDFS-2139:
---

[~weichiu] Could you check the design doc [~xuzq_zander] provided?

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> [~xuzq_zander]Provided a design doc 
> https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing






[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-23 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583489#comment-17583489
 ] 

ZanderXu commented on HDFS-2139:


[~pengbei] Thanks for your suggestion. Yes, you are right. The design doc 
already contains a description of this block-level input format, 
[here|https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit#heading=h.jxwxyrx0d7f3].
 Or you can refer to the BlockLevel InputFormat section in the [Design 
Doc|https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing].





> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-23 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583461#comment-17583461
 ] 

Bei Peng commented on HDFS-2139:


[~xuzq_zander] In my experience, the speed of FastCopy depends on the number 
of blocks and the number of files (the amount of metadata). DistCp's existing 
Map task input splitting strategy will cause data skew when using FastCopy. 
For example, one Map will copy a single 128 MB file with only 1 block, while 
another Map will copy 128 1 MB files with 128 blocks, which leads to 
long-tailed tasks. So I think we need a Map task input splitting strategy 
based on the number of blocks.
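
To make the skew argument concrete, here is a minimal sketch of a block-count-based split, assuming a simple greedy rule (always give the next file to the split with the fewest blocks so far). It is not the BlockLevel InputFormat from the design doc and does not use any DistCp API; BlockCountSplitter, FileEntry and Split are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Greedy assignment of files to splits by block count. All names are hypothetical. */
class BlockCountSplitter {

    /** A file to copy, described only by its path and number of blocks. */
    static final class FileEntry {
        final String path;
        final long blockCount;

        FileEntry(String path, long blockCount) {
            this.path = path;
            this.blockCount = blockCount;
        }
    }

    /** The work assigned to one Map task. */
    static final class Split {
        final List<FileEntry> files = new ArrayList<>();
        long totalBlocks;
    }

    static List<Split> split(List<FileEntry> files, int numSplits) {
        List<Split> splits = new ArrayList<>();
        // Min-heap ordered by the number of blocks already assigned to each split.
        PriorityQueue<Split> byLoad =
            new PriorityQueue<>(Comparator.comparingLong((Split s) -> s.totalBlocks));
        for (int i = 0; i < numSplits; i++) {
            Split s = new Split();
            splits.add(s);
            byLoad.add(s);
        }
        // Place the files with the most blocks first so small files fill the gaps.
        List<FileEntry> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((FileEntry f) -> f.blockCount).reversed());
        for (FileEntry f : sorted) {
            Split lightest = byLoad.poll();
            lightest.files.add(f);
            lightest.totalBlocks += f.blockCount;
            byLoad.add(lightest); // re-insert with its updated load
        }
        return splits;
    }
}
```

For the example above (one 128 MB file with 1 block plus 128 files of 1 MB with 1 block each, 129 blocks in total), two splits built this way hold 65 and 64 blocks, instead of 1 and 128 when the input is divided by bytes.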

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-19 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581740#comment-17581740
 ] 

ZanderXu commented on HDFS-2139:


https://docs.google.com/document/d/1OHdUpQmKD3TZ3xdmQsXNmlXJetn2QFPinMH31Q4BqkI/edit?usp=sharing

[~ferhui][~weichiu][~ayushtkn][~pengbei] Sorry for the late design doc. 
Please help review this design. I will start the development work in 
parallel.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578340#comment-17578340
 ] 

Hui Fei commented on HDFS-2139:
---

Glad to receive positive feedback, thank you! [~xuzq_zander] is interested in 
this feature, so I will assign this ticket to him. We can help review.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578322#comment-17578322
 ] 

ZanderXu commented on HDFS-2139:


Thanks [~weichiu] [~ayushtkn] [~ferhui] [~pengbei] for your comments. I will 
prepare a detailed design this weekend; please help review it once it is 
complete.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578296#comment-17578296
 ] 

Bei Peng commented on HDFS-2139:


{quote}Many companies backport it into their internal branches and use it.
 * DistCp supports fastcopy
 * Implement block based strategy{quote}
me too.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-10 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578258#comment-17578258
 ] 

Ayush Saxena commented on HDFS-2139:


{quote}Many companies backport it into their internal branches and use it.
{quote}
Yep, I am one of those who did so long ago for our internal branch. :)

It would be great if we could get it pushed here as well.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-10 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578236#comment-17578236
 ] 

ZanderXu commented on HDFS-2139:


[~weichiu] [~ferhui] Thanks for planning to push this feature forward. 
{quote}Some questions I have as I wasn't involved in this from the beginning. 
How is this different from other similar features? E.g. HDFS-3370 HDFS-15294 
(federation rename/balance)
{quote}
HDFS-3370 proposes hard-linking a file within one NameService, reusing the same 
block list.

HDFS-15294 proposes a solution to balance files across different NameServices 
using DistCp. 

HDFS-2139 proposes a high-performance data migration tool, FastCp. If the 
source DN belongs to both the source NameService and the target NameService, 
we can use hard links instead of copying data, which improves performance.

!SeaTalk_IMG_1660188087.png|width=1175,height=415!

I have some practical experience with it. I'd like to take over and push this 
feature forward if I can. [~ferhui] [~weichiu] [~hexiaoqiao] 
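
As a rough illustration of the hard-link idea at the file system level, the sketch below links a block file when possible and falls back to a byte copy otherwise. It uses only java.nio.file and is not DataNode code; LocalBlockCopySketch and linkOrCopy are hypothetical names, and the real implementation would also have to handle the block's meta file and replica bookkeeping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hard link with copy fallback for a single block file. Names are hypothetical. */
class LocalBlockCopySketch {

    /**
     * Makes dstBlockFile refer to the same data as srcBlockFile.
     * Returns true if a hard link was created, false if the bytes were copied.
     */
    static boolean linkOrCopy(Path srcBlockFile, Path dstBlockFile) throws IOException {
        try {
            // A hard link adds a second directory entry for the same data,
            // so no block data is read or written.
            Files.createLink(dstBlockFile, srcBlockFile);
            return true;
        } catch (UnsupportedOperationException | IOException e) {
            // Linking can fail, for example when the two paths are on different
            // file systems; fall back to copying the bytes.
            Files.copy(srcBlockFile, dstBlockFile);
            return false;
        }
    }
}
```

A hard link only works when both paths live on the same file system, which matches the condition above that the same DN (and volume) serves both NameServices.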

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, SeaTalk_IMG_1660188087.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-10 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578063#comment-17578063
 ] 

Wei-Chiu Chuang commented on HDFS-2139:
---

Sounds great!
Some questions I have as I wasn't involved in this from the beginning. How is 
this different from other similar features? E.g. HDFS-3370 HDFS-15294 
(federation rename/balance)

A small design doc describing your enhancements would be greatly appreciated.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-10 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578053#comment-17578053
 ] 

Hui Fei commented on HDFS-2139:
---

Fast Copy really is a good feature. In my experience it can help us copy files 
from one namespace to another and solve the scalability issues of the namenode. 
I know that many companies backport it into their internal branches and use it.
We made several improvements based on this initial patch.
* DistCp supports fastcopy
* Implement block based strategy
* Fastcopy supports EC

So I plan to push this feature into trunk. What do you think of it? [~weichiu] 
[~aajisaka] [~hexiaoqiao]

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2017-10-30 Thread feiwei (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225268#comment-16225268
 ] 

feiwei commented on HDFS-2139:
--

In FsDatasetImpl.java, hardLinkOneBlock should be modified to ensure 
synchronization:

  public void hardLinkOneBlock(ExtendedBlock srcBlock, ExtendedBlock dstBlock)
      throws IOException {
    BlockLocalPathInfo blpi = getBlockLocalPathInfo(srcBlock);
    File src = new File(blpi.getBlockPath());
    File srcMeta = new File(blpi.getMetaPath());

    // Reject the request if the source volume has less free space than the
    // size of the destination block.
    if (getVolume(srcBlock).getAvailable() < dstBlock.getNumBytes()) {
      throw new DiskOutOfSpaceException(
          "Insufficient space for hardlink block " + srcBlock);
    }

    BlockPoolSlice dstBPS =
        getVolume(srcBlock).getBlockPoolSlice(dstBlock.getBlockPoolId());

    // Create the link, add the block, and register the replica under the
    // dataset lock.
    synchronized (this) {
      File dstBlockFile =
          dstBPS.hardLinkOneBlock(src, srcMeta, dstBlock.getLocalBlock());
      dstBlockFile = dstBPS.addBlock(dstBlock.getLocalBlock(), dstBlockFile);
      ReplicaInfo replicaInfo = new FinalizedReplica(dstBlock.getLocalBlock(),
          getVolume(srcBlock), dstBlockFile.getParentFile());
      volumeMap.add(dstBlock.getBlockPoolId(), replicaInfo);
    }
  }




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2017-08-21 Thread Doris Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136200#comment-16136200
 ] 

Doris Gu commented on HDFS-2139:


[~mopishv0] Could you please tell me the plan for this issue? If there is no 
plan, I would be glad to hear about your practical experience with this tool 
or any test results.
By the way, one more question: you mentioned that "hard-links at the HDFS file 
level won't work when copying files between two namespaces (with the same 
datanodes) in federation". Could you please explain this in more detail?
Thanks very much!




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2016-02-01 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127543#comment-15127543
 ] 

M. C. Srivas commented on HDFS-2139:


Just a curiosity ... why was this method chosen? Why not simply implement 
hard-links at the HDFS file level? Then a simple NN transaction is sufficient 
to create the new pathname and add a ref count to the Inode. What is the 
benefit of doing it this way? Is it because we are going across two different 
NNs? If so, how do you prevent a simultaneous delete of the file at the srcNN 
from running ahead and removing some of the blocks of the src file?




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2016-02-01 Thread cuixin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127423#comment-15127423
 ] 

cuixin commented on HDFS-2139:
--

Thanks for your code diff. Is there a plan to release this patch?



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2016-02-01 Thread Liu Junhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127584#comment-15127584
 ] 

Liu Junhong commented on HDFS-2139:
---

We are about to start using federation and need to copy files between two 
namespaces (with the same datanodes), so hard-links at the HDFS file level 
won't work. fastcp is needed, and it is better than distcp for this case.



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2016-02-01 Thread Liu Junhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127616#comment-15127616
 ] 

Liu Junhong commented on HDFS-2139:
---

How fastcopy works with two different NNs (assume the src file is in NN1 and 
the dst is in NN2):
1: send create file to NN2
2: getblocklocation for the src file
3: send addblock to NN2 using the favornodes
4: send copyblock to the datanodes returned by step 3
So, if the src file is deleted before step 1, step 2 will fail, and the dst 
file will be deleted by the lease manager.
If the src file is deleted before step 4, some of the dst blocks will never 
reach the finalized state, and the dst file will be deleted by FastCopy at 
line 745.
But there is a worse case: if a runtime exception occurs, it will lead to 
missing blocks.
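
The four steps and the failure cases above can be sketched as follows. This
is only an illustration: CrossNamespaceCopySketch and the Cluster interface
are hypothetical, not the code in the attached patch, and the try/finally
cleanup is one assumed way to make sure a runtime exception also removes the
partial dst file.

```java
import java.util.List;

/** Sketch of the two-namenode copy with cleanup. Hypothetical types only. */
final class CrossNamespaceCopySketch {

  /** Hypothetical client-side view of NN1, NN2 and the shared datanodes. */
  interface Cluster {
    void createFile(String dst);                      // step 1, sent to NN2
    List<Long> getBlockLocations(String src);         // step 2, sent to NN1
    long addBlock(String dst, long srcBlockId);       // step 3, favored nodes omitted
    void copyBlock(long srcBlockId, long dstBlockId); // step 4, sent to a datanode
    void deleteFile(String dst);                      // cleanup on NN2
  }

  /** Returns the number of blocks copied; removes dst if anything fails. */
  static int copy(Cluster cluster, String src, String dst) {
    cluster.createFile(dst);
    boolean completed = false;
    int copied = 0;
    try {
      for (long srcBlockId : cluster.getBlockLocations(src)) {
        long dstBlockId = cluster.addBlock(dst, srcBlockId);
        cluster.copyBlock(srcBlockId, dstBlockId);
        copied++;
      }
      completed = true;
      return copied;
    } finally {
      // Runs for checked and runtime exceptions alike, so a partial dst file
      // is not left behind; the original exception still propagates.
      if (!completed) {
        cluster.deleteFile(dst);
      }
    }
  }
}
```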



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2016-02-01 Thread Liu Junhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127592#comment-15127592
 ] 

Liu Junhong commented on HDFS-2139:
---

I think it's useful when copying files between two namespaces (with the same 
datanodes).
I need a reviewer [~dhruba] 



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2014-05-02 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13988059#comment-13988059
 ] 

Daryn Sharp commented on HDFS-2139:
---

I glanced through the patch but haven't studied it.  Initial questions:
# Are block tokens being checked for this operation?
# Does the DN enforce no linking of UC blocks?



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2014-04-29 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984296#comment-13984296
 ] 

Guo Ruijing commented on HDFS-2139:
---

comment 1: Using hardlinks to copy data is a good idea, but we can still keep 
copy in the RPC, as follows, and use hardlinks in the implementation.

message CopyBlockRequestProto {
  required ExtendedBlockProto srcBlock = 1;
  required ExtendedBlockProto dstBlock = 2;
  required uint64 length = 3;
}

In the implementation:
if the platform doesn't support hardlinks, we can use copy
if length == srcBlock length, we can use a hardlink
if length != srcBlock length, we can use copy
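
The three rules above could look roughly like this on a local file system.
It is a minimal sketch: LinkOrCopySketch, linkOrCopy and copyPrefix are
hypothetical names, and only java.nio.file is used rather than any datanode
code. The block metadata (checksum) file is ignored here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of "hardlink when possible, otherwise copy". Hypothetical names. */
final class LinkOrCopySketch {

  /** Returns true if dst was created as a hard link, false if bytes were copied. */
  static boolean linkOrCopy(Path src, Path dst, long length) throws IOException {
    if (length == Files.size(src)) {
      try {
        Files.createLink(dst, src);   // whole block requested: try a hard link
        return true;
      } catch (UnsupportedOperationException e) {
        // This platform or file system cannot create hard links: copy instead.
      }
    }
    copyPrefix(src, dst, length);     // partial length, or no hardlink support
    return false;
  }

  /** Copies exactly the first length bytes of src into a new file dst. */
  private static void copyPrefix(Path src, Path dst, long length) throws IOException {
    try (InputStream in = Files.newInputStream(src);
         OutputStream out = Files.newOutputStream(
             dst, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      byte[] buf = new byte[8192];
      long remaining = length;
      while (remaining > 0) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (n < 0) {
          throw new EOFException("source is shorter than the requested length");
        }
        out.write(buf, 0, n);
        remaining -= n;
      }
    }
  }
}
```

Other link failures, such as dst already existing, are left to propagate as
IOException in this sketch instead of silently falling back to a copy.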



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2014-04-29 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984343#comment-13984343
 ] 

Guo Ruijing commented on HDFS-2139:
---

comment 2: We could implement file clone/copy.

1. API: public long clone(Path src, Path dest, long length)

The src file will be copied/cloned to the dest file, up to length bytes.
FileSystem.java in hadoop-common is changed to support file clone with length.
length = -1 means the whole file is cloned/copied.
The return value is the number of bytes copied.

2. The public API will call an internal API, cloneInternal, in HDFS

long cloneInternal(String src, String destination,
DistributedFileSystem srcFs, DistributedFileSystem dstFs, long length)
Most of the code can be moved to cloneInternal in 
hadoop-hdfs-project/hadoop-hdfs.

3. The tools directory keeps less code and calls cloneInternal.
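
The length convention in point 1 can be pinned down with a small helper.
This is a sketch under assumptions: CloneLengthSketch, WHOLE_FILE and
resolveLength are hypothetical names, and rejecting a length larger than the
source file is an assumption, since the comment does not say what should
happen in that case.

```java
/** Sketch of the proposed length convention for clone. Hypothetical names. */
final class CloneLengthSketch {

  /** Sentinel from the proposal: -1 means clone the whole file. */
  static final long WHOLE_FILE = -1L;

  /** Returns how many bytes a clone call should copy from a source of srcLength bytes. */
  static long resolveLength(long srcLength, long requested) {
    if (requested == WHOLE_FILE) {
      return srcLength;
    }
    // Assumption: any other negative value, or a value past the end of the
    // source file, is treated as a caller error.
    if (requested < 0 || requested > srcLength) {
      throw new IllegalArgumentException(
          "length " + requested + " is not valid for a " + srcLength + "-byte source");
    }
    return requested;
  }
}
```

Under this convention the value returned by clone would equal the resolved
length whenever the copy succeeds.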




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2011-09-26 Thread Hairong Kuang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115079#comment-13115079
 ] 

Hairong Kuang commented on HDFS-2139:
-

Hi Pritam, could you please create several subtasks and contribute what you did 
for our internal branch to Apache? Thanks!


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira