[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728 ]
Andrzej Bialecki commented on NUTCH-392:
-----------------------------------------

> I think it is okay to allow BLOCK compression for linkdb, crawldb, crawl_*,
> content, parse_data. Because I don't think that people will need fast
> random-access on anything but parse_text.

LinkDb is accessed randomly on-line through LinkDbInlinks, when users request anchors. Similarly, parse_data is accessed when requesting "explain", and may also be accessed to retrieve other hit metadata. Content is accessed randomly when displaying a cached preview.

I think in all these cases we can use at most RECORD compression, or NONE.

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>
>         Attachments: NUTCH-392.patch
>
>
> OutputFormat implementations should pass the Progressable they are passed to
> underlying SequenceFile implementations. This will keep reduce tasks from
> timing out when block writes are slow. This issue depends on
> http://issues.apache.org/jira/browse/HADOOP-636.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
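
For illustration only, the rule proposed in the comment on NUTCH-392 (data that is read randomly at search time gets at most RECORD compression) could be sketched roughly as below. This is not taken from NUTCH-392.patch: the class `CompressionPolicy`, the method `maxCompression`, and the local `CompressionType` enum are hypothetical names, with the enum standing in for the NONE/RECORD/BLOCK constants of Hadoop's `SequenceFile.CompressionType`. Treating crawldb and crawl_* as eligible for BLOCK is an assumption based on the quoted text, which the comment does not dispute.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Nutch or Hadoop.
final class CompressionPolicy {

    // Local stand-in mirroring the constants of Hadoop's SequenceFile.CompressionType.
    enum CompressionType { NONE, RECORD, BLOCK }

    // Parts that the comment and the quoted text describe as needing
    // random access at search time.
    private static final Set<String> RANDOM_ACCESS_PARTS = new HashSet<>(
        Arrays.asList("linkdb", "parse_data", "content", "parse_text"));

    // Returns the strongest compression type this sketch allows for a part:
    // RECORD for randomly accessed data, BLOCK for everything else (assumption).
    static CompressionType maxCompression(String part) {
        return RANDOM_ACCESS_PARTS.contains(part)
            ? CompressionType.RECORD
            : CompressionType.BLOCK;
    }

    public static void main(String[] args) {
        for (String part : new String[] {"linkdb", "crawldb", "parse_data", "content"}) {
            System.out.println(part + " -> " + maxCompression(part));
        }
    }
}
```

A caller could still choose NONE for any part; the sketch only caps the choice, which matches the "at most RECORD compression, or NONE" wording of the comment.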