[ http://issues.apache.org/jira/browse/HADOOP-619?page=all ]

Doug Cutting updated HADOOP-619:
--------------------------------

    Status: Open  (was: Patch Available)

The spacing in this patch is still non-standard.

I note that the "mapred.input.subdir" feature is removed.  I think this is 
okay, as no one uses it, but thought it should be noted.

Why can't validateInput() simply call globPaths() and check that the results 
exist?  The current implementation is not only much more complicated, but I'm 
not sure that it's correct, since it fails if any glob pattern fails to have 
matches.  Is that what we want?  I would think that non-matching glob 
expressions, like empty directories, should be ignored so long as some of the 
inputs exist.
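
A minimal sketch of the simpler validation suggested above, not the code in the patch. The class name, the expandAll() helper, and the globber parameter are hypothetical; the globber stands in for globPaths() so that the example runs without a FileSystem. A pattern with no matches contributes nothing, and validation fails only when no pattern matched anything.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the patch.
class ValidateInputSketch {

    // Expands every pattern with the supplied globber (a stand-in for
    // globPaths()) and collects all matches. A pattern that matches
    // nothing simply adds no entries.
    static List<String> expandAll(List<String> patterns,
                                  Function<String, List<String>> globber) {
        List<String> matches = new ArrayList<>();
        for (String pattern : patterns) {
            matches.addAll(globber.apply(pattern));
        }
        return matches;
    }

    // Fails only if the patterns, taken together, match no input at all.
    static void validateInput(List<String> patterns,
                              Function<String, List<String>> globber)
            throws IOException {
        if (expandAll(patterns, globber).isEmpty()) {
            throw new IOException("No input paths matched: " + patterns);
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake globber: only "/data/*" has matches.
        Function<String, List<String>> globber = p ->
            p.equals("/data/*")
                ? Arrays.asList("/data/part-0", "/data/part-1")
                : Collections.<String>emptyList();

        // Passes: one pattern is empty, but some input exists.
        validateInput(Arrays.asList("/data/*", "/logs/*"), globber);
        System.out.println("mixed patterns accepted");

        // Fails: nothing matched at all.
        try {
            validateInput(Arrays.asList("/logs/*"), globber);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whether an entirely empty input set should be an error is the open question in the comment above; the sketch treats it as one.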

Finally, it looks like globPaths() still calls Path#toString() instead of 
Path#toUri().getPath().
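
To illustrate the difference, here is a sketch that uses java.net.URI directly rather than Hadoop's Path, so it runs on its own. The host name, port, and pathOnly() helper are made up for the example. It assumes the concern is that the string form of a fully qualified path carries the scheme and authority, while the URI's path component does not.

```java
import java.net.URI;

// Hypothetical illustration; uses java.net.URI in place of Hadoop's Path.
class PathStringSketch {

    // Returns only the path component, dropping any scheme and authority.
    static String pathOnly(String qualified) {
        return URI.create(qualified).getPath();
    }

    public static void main(String[] args) {
        URI uri = URI.create("hdfs://namenode:9000/user/data/part-*");

        // Full string form, including scheme and authority:
        // hdfs://namenode:9000/user/data/part-*
        System.out.println(uri.toString());

        // Path component only:
        // /user/data/part-*
        System.out.println(uri.getPath());
    }
}
```

A glob matcher that is handed the first form has to cope with "hdfs://namenode:9000" as part of the pattern; the second form contains only the part that names files.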

> Unify Map-Reduce and Streaming to take the same globbed input specification
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-619
>                 URL: http://issues.apache.org/jira/browse/HADOOP-619
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.9.1
>            Reporter: eric baldeschwieler
>         Assigned To: Sanjay Dahiya
>             Fix For: 0.10.0
>
>         Attachments: Hadoop-619.patch, Hadoop-619.patch, Hadoop-619.patch, 
> Hadoop-619_1.patch, Hadoop-619_1.patch, Hadoop-619_2.patch, Hadoop-619_2.patch
>
>
> Right now streaming input is specified very differently from other map-reduce 
> input.  It would be good if these two apps could take much more similar input 
> specs.
> In particular -input in streaming expects a file or glob pattern while MR 
> takes a directory.  It would be cool if both could take a glob pattern of 
> files and if both took a directory by default (with some pattern excluded to 
> allow logs, metadata and other framework output to be safely stored).
> We want to be sure that MR input is backward compatible over this change.  I 
> propose that a single file should be accepted as an input or a single 
> directory.  Globs should only match directories if the pattern is '/' 
> terminated, to avoid massive inputs specified by mistake.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
