[ https://issues.apache.org/jira/browse/NUTCH-451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrzej Bialecki updated NUTCH-451:
------------------------------------
Attachment: LocalFetchRecover.java
> Tool to recover partial fetcher output
> --------------------------------------
>
> Key: NUTCH-451
> URL: https://issues.apache.org/jira/browse/NUTCH-451
> Project: Nutch
> Issue Type: Improvement
> Components: fetcher
> Affects Versions: 0.9.0
> Reporter: Andrzej Bialecki
> Assigned To: Andrzej Bialecki
> Fix For: 0.9.0
>
> Attachments: LocalFetchRecover.java
>
>
> This class may help you to recover partial data from a failed Fetcher run.
> NOTE 1: this works ONLY if you ran Fetcher using the "local" file system, i.e.
> you didn't use DFS. Partial output to DFS is permanently lost if a process
> fails to properly close the output streams.
> NOTE 2: if Fetcher was stopped abruptly (killed or crashed), then partial
> SequenceFile-s will be corrupted at the end. This means that it won't be
> possible to recover all data from them - most likely only the data up to the
> last sync marker can be recovered.
> The recovery process requires some preparation:
> * determine the map directories corresponding to the map task outputs of the
> failed job. These map directories contain SequenceFile-s consisting of pairs
> of <Text, FetcherOutput>, named e.g. part-0.out, or file.out, or spill0.out.
> * create a new input directory, let's say input/. Copy all SequenceFile-s
> into this directory, renaming them sequentially like this:
> input/part-00000
> input/part-00001
> input/part-00002
> input/part-00003
> ...
>
> * specify the "input" directory as the input to this tool.
> If all goes well, a new segment will be created as a subdirectory of the
> output dir.
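
The preparation steps above (collecting the map output files and renaming
them sequentially) could be scripted roughly as follows. This is only a sketch
and is not part of LocalFetchRecover.java: the class name PrepareRecoveryInput
and its prepare() method are hypothetical, and it assumes that every
SequenceFile of interest sits directly inside a map directory and has a name
ending in ".out", as in the examples above. It only copies files; it does not
read or validate the SequenceFile contents.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: builds the input/ directory
// expected by the recovery tool from a list of map task directories.
public class PrepareRecoveryInput {

    // Copies every "*.out" file found directly in each map directory into
    // inputDir, naming the copies part-00000, part-00001, and so on.
    // ASSUMPTION: the ".out" suffix identifies the partial SequenceFile-s.
    public static List<Path> prepare(List<Path> mapDirs, Path inputDir) throws IOException {
        Files.createDirectories(inputDir);

        List<Path> sources = new ArrayList<>();
        for (Path dir : mapDirs) {
            List<Path> found = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.out")) {
                for (Path p : stream) {
                    if (Files.isRegularFile(p)) {
                        found.add(p);
                    }
                }
            }
            // Sort within each directory so the numbering is reproducible.
            Collections.sort(found);
            sources.addAll(found);
        }

        List<Path> copied = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            Path target = inputDir.resolve(String.format(Locale.ROOT, "part-%05d", i));
            // Files.copy fails if the target already exists, so an earlier
            // recovery attempt is never silently overwritten.
            Files.copy(sources.get(i), target);
            copied.add(target);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: PrepareRecoveryInput <inputDir> <mapDir> [<mapDir> ...]");
            return;
        }
        List<Path> mapDirs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            mapDirs.add(Paths.get(args[i]));
        }
        for (Path p : prepare(mapDirs, Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

It would be invoked with the new input directory first, followed by the map
directories of the failed job. The files are copied rather than moved so that
the original partial output stays untouched if the recovery has to be retried.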
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.