[ https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002328#comment-15002328 ]
Michael Joyce commented on NUTCH-2166:
--------------------------------------

Small change in dump format. Instead of making a bajillion nested folders, it seems like it might be nicer to simply use the reverse URL as the file name. So the file for
    http://bar.foo.com:8983/to/index.htm
Would dump to the encoded
    <output folder>/com%2Ffoo%2Fbar%2F8983%2Fhttp%2Fto%2Findex.htm

Of course, we may then run into file name length issues this way. Perhaps having both formats will eventually be useful?

> Add reverse URL format to dump tool
> -----------------------------------
>
>                 Key: NUTCH-2166
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2166
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 2.3, 1.10
>            Reporter: Michael Joyce
>            Assignee: Michael Joyce
>             Fix For: 2.4, 1.11
>
>
> Update the FileDumper tool with an option for dumping files to the output
> directory in reverse URL format.
> So the file for
>     http://bar.foo.com:8983/to/index.html?a=b
> Would dump to
>     <output folder>/com/foo/bar/8983/http/to/index.html?a=b

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
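
For illustration only, here is a minimal Python sketch of the two layouts discussed in this message: the nested reverse-URL path from the issue description and the single encoded file name from the comment. It is not the FileDumper code (Nutch is Java). The function names `reverse_url_segments`, `nested_path`, and `flat_name` are hypothetical, and the handling of the query string, a missing port, and empty path segments is an assumption, since the issue does not specify them.

```python
from urllib.parse import quote, urlsplit


def reverse_url_segments(url):
    """Split a URL into reverse-URL segments: reversed host labels,
    port (if present), scheme, then the path segments."""
    parts = urlsplit(url)
    # Note: urlsplit lowercases the hostname.
    segments = list(reversed((parts.hostname or "").split(".")))
    if parts.port is not None:
        segments.append(str(parts.port))
    segments.append(parts.scheme)
    # Assumption: empty path segments (e.g. a trailing slash) are dropped.
    path_segments = [s for s in parts.path.split("/") if s]
    if parts.query:
        # Assumption: the query string stays attached to the last segment,
        # as in the ".../index.html?a=b" example.
        if path_segments:
            path_segments[-1] += "?" + parts.query
        else:
            path_segments.append("?" + parts.query)
    segments.extend(path_segments)
    return segments


def nested_path(url):
    """Nested-folder layout from the issue description."""
    return "/".join(reverse_url_segments(url))


def flat_name(url):
    """Single-file-name layout from the comment: percent-encode the
    nested path so that '/' becomes '%2F'."""
    return quote(nested_path(url), safe="")


if __name__ == "__main__":
    print(nested_path("http://bar.foo.com:8983/to/index.html?a=b"))
    print(flat_name("http://bar.foo.com:8983/to/index.htm"))
```

The flat form avoids deep directory trees, but each encoded `/` takes three characters, so long URLs reach file name length limits sooner. That is the trade-off raised in the comment.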