[ 
https://issues.apache.org/jira/browse/HADOOP-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei (Eddy) Xu updated HADOOP-5353:
----------------------------------

    Description: 
This is only of relevance to people building front ends to FS 
operations, and as they could take the code in FSUtil and add this feature 
themselves, it's a blocker to none of them. 

The current FileUtil.copy can take a long time to move large files around, but 
there is no progress indicator for GUIs, nor a way to cancel the operation 
mid-way, short of interrupting the thread or closing the filesystem.

I propose adding a FileIOProgress interface to the copy ops, one with a single 
method to notify listeners of bytes read and written, and the number of files 
handled.

{code}
interface FileIOProgress {
  boolean progress(int files, long bytesRead, long bytesWritten);
}
{code}

The return value would be true to continue the operation, or false to stop the 
copy and leave the FS in whatever incomplete state it is in currently. 

It could even be fancier: have beginFileOperation and endFileOperation 
callbacks to pass in the name of the current file being worked on, though I 
don't have a personal need for that.
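
For illustration only, the fancier variant might be sketched as below. The two callback names come from the description above; their signatures, the FileOperationListener name and the CurrentFileTracker class are assumptions, not anything in the attached patch.

```java
// The callback proposed in this issue.
interface FileIOProgress {
  boolean progress(int files, long bytesRead, long bytesWritten);
}

// Hypothetical extension: the signatures below are assumed for this sketch.
interface FileOperationListener extends FileIOProgress {
  // Called before work starts on a file; no-op unless overridden.
  default void beginFileOperation(String path) {}

  // Called after work on a file ends; no-op unless overridden.
  default void endFileOperation(String path) {}
}

// Hypothetical listener that remembers which file is being worked on,
// for example to label a progress bar.
final class CurrentFileTracker implements FileOperationListener {
  private volatile String currentFile;

  String currentFile() {
    return currentFile;
  }

  @Override
  public void beginFileOperation(String path) {
    currentFile = path;
  }

  @Override
  public void endFileOperation(String path) {
    currentFile = null;
  }

  @Override
  public boolean progress(int files, long bytesRead, long bytesWritten) {
    return true; // this listener never cancels
  }
}
```

Using default methods would let callers that only care about byte counts ignore the two extra callbacks.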

GUIs could show progress bars and cancel buttons; other tools could use the 
interface to pass any cancellation notice upstream.
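
As a rough sketch of the cancel-button case, a listener could record the latest byte count and return false once cancellation has been requested. The CancellableProgress class and its method names are hypothetical; only the FileIOProgress signature is taken from this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// The callback proposed in this issue.
interface FileIOProgress {
  boolean progress(int files, long bytesRead, long bytesWritten);
}

// Hypothetical listener: a GUI thread calls cancel(), and the copying
// thread sees the request at its next progress() callback.
final class CancellableProgress implements FileIOProgress {
  private final AtomicBoolean cancelled = new AtomicBoolean(false);
  private final AtomicLong bytesWritten = new AtomicLong();

  // Safe to call from any thread, e.g. a cancel button handler.
  void cancel() {
    cancelled.set(true);
  }

  // Latest byte count reported by the copy, e.g. for a progress bar.
  long bytesWritten() {
    return bytesWritten.get();
  }

  @Override
  public boolean progress(int files, long bytesRead, long written) {
    bytesWritten.set(written);
    return !cancelled.get(); // false asks the copy to stop
  }
}
```

Because the flag is only checked inside the callback, cancellation takes effect after the block that is currently being copied.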

The FileUtil.copy operations would call this interface synchronously, 
blocking the copy, after every block copied, so the frequency of invocation 
would depend on block size and network/disk speeds. That is also why I don't 
propose having any percentage-done indicators; it's too hard to predict the 
percentage of time done for distributed file IO with any degree of accuracy.
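
A minimal sketch of such a copy loop over plain java.io streams follows. ProgressCopy is a hypothetical helper, not part of FileUtil, and the buffer here stands in for whatever block size the real copy would use.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The callback proposed in this issue.
interface FileIOProgress {
  boolean progress(int files, long bytesRead, long bytesWritten);
}

// Hypothetical helper for illustration; not part of Hadoop's FileUtil.
final class ProgressCopy {
  private ProgressCopy() {}

  /**
   * Copies one stream to another, calling the listener after every buffer.
   * bufferSize must be positive.
   *
   * @return true if the copy completed, false if the listener cancelled it
   */
  static boolean copy(InputStream in, OutputStream out, int bufferSize,
                      FileIOProgress listener) throws IOException {
    byte[] buffer = new byte[bufferSize];
    long total = 0;
    int n;
    while ((n = in.read(buffer)) != -1) {
      out.write(buffer, 0, n);
      total += n;
      // In a plain stream copy, bytes read and bytes written are equal.
      // The call blocks the copy until the listener returns.
      if (!listener.progress(0, total, total)) {
        return false; // stop, leaving the destination incomplete
      }
    }
    // Final notification: one file handled. The return value is ignored
    // because there is nothing left to cancel.
    listener.progress(1, total, total);
    return true;
  }
}
```

A caller might pass a lambda such as `(files, read, written) -> !cancelRequested`, where cancelRequested is a flag it maintains itself.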

  was:
This is something only of relevance of people doing front ends to FS 
operations, and as they could take the code in FSUtil and add something with 
this feature, its a blocker to none of them. 

Current FileUtil.copy can take a long time to move large files around, but 
there is no progress indicator to GUIs, or a way to cancel the operation 
mid-way, short of interrupting the thread or closing the filesystem.

I propose a FileIOProgress interface to the copy ops, one that had a single 
method to notify listeners of bytes read and written, and the number of files 
handled.

{code}
interface FileIOProgress {
 boolean progress(int files, long bytesRead, long bytesWritten);
}

The return value would be true to continue the operation, or false to stop the 
copy and leave the FS in whatever incomplete state it is in currently. 

it could even be fancier: have  beginFileOperation and endFileOperation 
callbacks to pass in the name of the current file being worked on, though I 
don't have a personal need for that.

GUIs could show progress bars and cancel buttons, other tools could use the 
interface to pass any cancellation notice upstream.

The FileUtil.copy operations would call this interface (blocking) after every 
block copy, so the frequency of invocation would depend on block size and 
network/disk speeds. Which is also why I don't propose having any percentage 
done indicators; it's too hard to predict percentage of time done for 
distributed file IO with any degree of accuracy.


> add progress callback feature to the slow FileUtil operations with ability to 
> cancel the work
> ---------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5353
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5353
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 0.21.0
>            Reporter: Steve Loughran
>            Assignee: Lei (Eddy) Xu
>            Priority: Minor
>         Attachments: HADOOP-5353.000.patch
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
