[ 
https://issues.apache.org/jira/browse/NUTCH-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004191#comment-15004191
 ] 

Michael Joyce commented on NUTCH-2166:
--------------------------------------

Here is the output from a small example run. I'm not entirely happy with the 
_file solution, so I'm open to ideas on that.

{code}
dumpoutputtest/
├── edu
│   └── caltech
│       └── www
│           └── http
│               └── _file
└── gov
    └── nasa
        ├── eyes
        │   └── http
        │       ├── _file
        │       ├── earth
        │       │   └── _file
        │       └── exoplanets
        │           └── _file
        ├── jpl
        │   ├── blogs
        │   │   └── http
        │   │       └── _file
        │   ├── http
        │   │   └── _file
        │   ├── mars
        │   │   └── http
        │   │       └── _file
        │   ├── photojournal
        │   │   └── http
        │   │       └── _file
        │   ├── planetquest
        │   │   └── http
        │   │       └── _file
        │   └── www
        │       └── http
        │           ├── _file
        │           ├── about
        │           │   ├── _file
        │           │   ├── exec.php
        │           │   ├── history.php
        │           │   └── reports.php
        │           ├── apps
        │           │   └── _file
        │           ├── asteroidwatch
        │           │   └── _file
        │           ├── contact_JPL.php
        │           ├── edu
        │           │   ├── _file
        │           │   ├── events
        │           │   │   ├── 2015
        │           │   │   │   └── 11
        │           │   │   │       └── 1
        │           │   │   │           └── see-the-phases-of-the-moon-by-day-and-by-night
        │           │   │   │               └── _file
        │           │   │   └── _file
        │           │   ├── intern
        │           │   │   └── _file
        │           │   ├── learn
        │           │   │   └── _file
        │           │   ├── news
        │           │   │   └── _file
        │           │   └── teach
        │           │       └── _file
        │           ├── events
        │           │   ├── _file
        │           │   ├── lectures.php
        │           │   ├── open-house.php
        │           │   ├── speakers-bureau.php
        │           │   ├── team-competitions.php
        │           │   └── tours
        │           │       └── views
        │           │           └── _file
        │           ├── infographics
        │           │   └── _file
        │           ├── missions
        │           │   └── _file
        │           ├── multimedia
        │           │   └── audio.php
        │           ├── news
        │           │   ├── _file
        │           │   ├── factsheets.php
        │           │   ├── mediaroom.php
        │           │   └── presskits.php
        │           ├── opportunities
        │           │   └── _file
        │           ├── social
        │           │   └── _file
        │           ├── spaceimages
        │           │   └── _file
        │           └── videos
        │               └── _file
        ├── solarsystem
        │   └── http
        │       └── _file
        └── www
            └── http
                ├── _file
                └── earthrightnow
                    └── _file

51 directories, 44 files

{code}
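
For anyone who wants to experiment with the layout above, here is a minimal sketch of the mapping in Python. It is not the code from the attached patch (FileDumper is Java). The function name `reverse_url_path` and its `placeholder` parameter are hypothetical, and the rule that `_file` is written when the URL path is empty or ends in a slash is an assumption inferred from the tree output, not something the patch is confirmed to do. Appending the query string to `_file` in that case is likewise a guess.

```python
from urllib.parse import urlsplit


def reverse_url_path(url, placeholder="_file"):
    """Map a URL to a relative dump path in reverse URL format.

    Illustrative only: the name, the signature, and the "_file" rule are
    assumptions, not the behaviour of the actual Nutch FileDumper patch.
    Expects an absolute URL with a host.
    """
    parts = urlsplit(url)

    # Reverse the host labels: bar.foo.com -> com/foo/bar
    segments = list(reversed(parts.hostname.split(".")))

    # An explicit port, if present, comes next, followed by the scheme.
    if parts.port is not None:
        segments.append(str(parts.port))
    segments.append(parts.scheme)

    # Keep the non-empty path components in their original order.
    path_parts = [p for p in parts.path.split("/") if p]

    # Assumption: a path that is empty or ends in "/" names a directory,
    # so the content is stored under a placeholder file name.
    if parts.path == "" or parts.path.endswith("/"):
        path_parts.append(placeholder)

    # Keep the query string attached to the final component.
    if parts.query:
        path_parts[-1] += "?" + parts.query

    return "/".join(segments + path_parts)


if __name__ == "__main__":
    print(reverse_url_path("http://bar.foo.com:8983/to/index.html?a=b"))
    print(reverse_url_path("http://www.caltech.edu/"))
```

The placeholder seems to be needed because a URL such as http://www.jpl.nasa.gov/about/ has to live alongside http://www.jpl.nasa.gov/about/exec.php, so `about` must be a directory and the page itself needs some file name inside it.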

> Add reverse URL format to dump tool
> -----------------------------------
>
>                 Key: NUTCH-2166
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2166
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 2.3, 1.10
>            Reporter: Michael Joyce
>            Assignee: Michael Joyce
>             Fix For: 2.4, 1.11
>
>         Attachments: NUTCH-2166_joyce_13Nov2015.patch
>
>
> Update the FileDumper tool with an option for dumping files to the output 
> directory in reverse URL format. For example, the file for 
> http://bar.foo.com:8983/to/index.html?a=b
> would be dumped to
> <output folder>/com/foo/bar/8983/http/to/index.html?a=b



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
