[ 
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002328#comment-15002328
 ] 

Michael Joyce commented on NUTCH-2166:
--------------------------------------

Small change in dump format. Instead of making a bajillion nested folders, it 
seems like it might be nicer to simply use the reverse URL as the file name.

So the file for
http://bar.foo.com:8983/to/index.htm
would dump to the URL-encoded file name
<output folder>/com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm
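
For illustration, here is a minimal Python sketch of that mapping. It is not
the FileDumper code itself (Nutch is written in Java). The function names are
hypothetical, and the segment order (reversed host labels, then port, then
scheme, then path) is inferred from the example above.

```python
from urllib.parse import quote, urlsplit


def reverse_url_path(url: str) -> str:
    """Return the reverse-URL form, e.g. com/foo/bar/8983/http/to/index.htm."""
    parts = urlsplit(url)
    # Reverse the host labels: bar.foo.com -> com, foo, bar
    labels = [label for label in (parts.hostname or "").split(".") if label]
    segments = list(reversed(labels))
    if parts.port is not None:
        segments.append(str(parts.port))
    segments.append(parts.scheme)
    reversed_url = "/".join(segments)
    path = parts.path.lstrip("/")
    if path:
        reversed_url += "/" + path
    if parts.query:
        # Keep the query string, as in the issue description's example
        reversed_url += "?" + parts.query
    return reversed_url


def reverse_url_file_name(url: str) -> str:
    """Percent-encode the whole reverse URL so it is a single file name."""
    # safe="" forces "/" to be encoded as %2F
    return quote(reverse_url_path(url), safe="")


print(reverse_url_file_name("http://bar.foo.com:8983/to/index.htm"))
```

Passing safe="" to quote is what turns every "/" into %2F, so the result is
one flat file name rather than a directory path.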

Of course, we may then run into file name length limits this way. Perhaps 
supporting both formats will eventually be useful?
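
One way "having both" could look, sketched in Python purely as an
illustration: use the flat encoded name when it fits, otherwise fall back to
nested folders. The 255-byte limit is an assumption (a common per-name limit,
e.g. on ext4), and choose_dump_path is a hypothetical helper, not part of
Nutch.

```python
import os
from urllib.parse import quote

# Assumption: the file system limits a single file name to 255 bytes.
MAX_NAME_BYTES = 255


def choose_dump_path(output_dir: str, reversed_url: str) -> str:
    """Pick a flat encoded file name if it fits, else a nested folder path."""
    flat_name = quote(reversed_url, safe="")
    if len(flat_name.encode("utf-8")) <= MAX_NAME_BYTES:
        return os.path.join(output_dir, flat_name)
    # Fallback: one folder per segment, each segment encoded on its own
    segments = [quote(seg, safe="") for seg in reversed_url.split("/") if seg]
    return os.path.join(output_dir, *segments)


print(choose_dump_path("out", "com/foo/bar/8983/http/to/index.htm"))
```

A single very long segment could still exceed the limit in the nested form,
so a real implementation would need to handle that case as well.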

> Add reverse URL format to dump tool
> -----------------------------------
>
>                 Key: NUTCH-2166
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2166
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 2.3, 1.10
>            Reporter: Michael Joyce
>            Assignee: Michael Joyce
>             Fix For: 2.4, 1.11
>
>
> Update the FileDumper tool with an option for dumping files to the output 
> directory in reverse URL format.
> So the file for 
> http://bar.foo.com:8983/to/index.html?a=b
> Would dump to
> <output folder>/com/foo/bar/8983/http/to/index.html?a=b



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
