[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508900 ]

Andrzej Bialecki commented on NUTCH-392:
----------------------------------------

Excellent work, Doğacan - thank you. The numbers for RECORD compression probably depend on some sweet spot in the environment, related to CPU usage, how the OS pulls data from the disk / disk buffers, the size of the hard drive cache, the size of the internal memory buffers in Hadoop, etc. I would venture a guess that compression NONE is raw disk I/O bound, whereas BLOCK compression suffers from the poor performance of seeking in compressed data.

I agree with your conclusions regarding the type of compression to use for each segment part.

Re: Nutch not doing any internal compression for Content and ParseText: Content is a versioned Writable, so we can change its implementation and provide compatibility code to read older data. The same goes for ParseText.

> OutputFormat implementations should pass on Progressable
> --------------------------------------------------------
>
>                 Key: NUTCH-392
>                 URL: https://issues.apache.org/jira/browse/NUTCH-392
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>            Reporter: Doug Cutting
>            Assignee: Andrzej Bialecki
>             Fix For: 1.0.0
>         Attachments: NUTCH-392.patch, ParseTextBenchmark.java
>
> OutputFormat implementations should pass the Progressable they are passed to
> underlying SequenceFile implementations. This will keep reduce tasks from
> timing out when block writes are slow. This issue depends on
> http://issues.apache.org/jira/browse/HADOOP-636.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508861 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

After changing ParseText to not do any internal compression, the segment directory looks like this:

828M  crawl/segments/20070626163143/content
 35M  crawl/segments/20070626163143/crawl_fetch
 23M  crawl/segments/20070626163143/crawl_generate
 44M  crawl/segments/20070626163143/crawl_parse        # BLOCK compression
218M  crawl/segments/20070626163143/parse_data
524M  crawl/segments/20070626163143/parse_text
192M  crawl/segments/20070626163143/parse_text_block
242M  crawl/segments/20070626163143/parse_text_record

As you can see, parse_text_block is around 20 percent smaller than parse_text_record.

I also wrote a simple benchmark that randomly requests n urls from each parse text sequentially (but it requests the same urls in the same order from all parse texts). All parse texts contain a single part with ~250K urls. Here are the results (Trial 0 is NONE, Trial 1 is RECORD, Trial 2 is BLOCK):

for n = 1000:
Trial 0 has taken 9947 ms.
Trial 1 has taken 6794 ms.
Trial 2 has taken 9717 ms.

for n = 5000:
Trial 0 has taken 40918 ms.
Trial 1 has taken 19969 ms.
Trial 2 has taken 52622 ms.

for n = 1
Trial 0 has taken 57622 ms.
Trial 1 has taken 24291 ms.
Trial 2 has taken 96292 ms.

Overall, RECORD compression is the fastest and BLOCK compression is the slowest (by a large margin). Assuming my benchmark code is correct (feel free to show me where it is wrong), these are my conclusions:

* I don't know what others think, but to me it still looks like we can use BLOCK compression for structures like content, linkdb, etc. Even though it is much slower than RECORD, it can still serve ~100 parse texts per second. While this is certainly not good enough for parse text, it probably is good enough for the others.

* We should definitely enable RECORD compression for parse text and BLOCK compression for crawl_*. For some reason, RECORD compression on parse text performs better than O(n), which makes me think that something is wrong with my benchmark code.

* Nutch should not do any compression internally. Hadoop can do this better with its native compression. Content and ParseText compress their data on their own (and they can be converted to Hadoop's compression in a backward-compatible way). I don't know if anything else does compression.

PS: The native Hadoop library is loaded. I haven't specified which compression codec to use, so I guess it uses zlib. Lzo results would probably have been better.
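The attached ParseTextBenchmark.java is not reproduced in this thread, but the protocol described above (pick n random keys once, then fetch them in the same order from each store, timing each trial) can be sketched roughly as follows. This is a hypothetical stand-in: the Map-backed lookups here take the place of MapFile.Reader.get() calls over the three differently compressed parse_text directories, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

public class RandomAccessBench {

    /** Time fetching the same keys, in the same order, from each store. */
    static long[] run(List<Function<Integer, String>> stores, int[] keys) {
        long[] millis = new long[stores.size()];
        for (int t = 0; t < stores.size(); t++) {
            long start = System.nanoTime();
            for (int k : keys) {
                stores.get(t).apply(k);  // real benchmark: MapFile.Reader.get()
            }
            millis[t] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Stand-in data; each "store" would really be one parse_text
        // directory written with NONE, RECORD, or BLOCK compression.
        Map<Integer, String> data = new HashMap<>();
        for (int i = 0; i < 250_000; i++) data.put(i, "parse text " + i);
        List<Function<Integer, String>> stores =
            List.of(data::get, data::get, data::get);

        // Fixed seed so every trial sees the same keys in the same order.
        int[] keys = new Random(42).ints(1000, 0, 250_000).toArray();
        long[] times = run(stores, keys);
        for (int t = 0; t < times.length; t++)
            System.out.println("Trial " + t + " has taken " + times[t] + " ms.");
    }
}
```

Reusing one fixed key sequence across trials is what makes the trial times comparable; only the store under test varies.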
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508823 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

> data of parse_text is already compressed so recompressing it does not give
> huge gains

Wow, I am certainly not at my sharpest today. Thanks for pointing it out. I will change ParseText and report back with the results.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508820 ]

Sami Siren commented on NUTCH-392:
----------------------------------

> But why is parse_text_block's size so close to parse_text

The data of parse_text is already compressed, so recompressing it does not give huge gains.
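Sami's point can be illustrated with a quick sketch using plain java.util.zip (standing in for Hadoop's zlib codec; the class and method names here are hypothetical): deflating page text once gives large savings, but deflating the already-deflated bytes again gains essentially nothing, which is why a BLOCK-compressed parse_text barely shrinks while ParseText still compresses internally.

```java
import java.util.zip.Deflater;

public class Recompress {

    /** Deflate the input with zlib defaults and return the compressed bytes. */
    static byte[] deflate(byte[] in) {
        Deflater d = new Deflater();
        d.setInput(in);
        d.finish();
        // Buffer large enough for worst-case expansion on inputs this size.
        byte[] buf = new byte[in.length * 2 + 64];
        int n = d.deflate(buf);
        d.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        // Redundant "page text": compresses very well the first time.
        byte[] text = "the quick brown fox ".repeat(500).getBytes();
        byte[] once = deflate(text);
        byte[] twice = deflate(once);  // compressed data is high-entropy
        System.out.println("original:       " + text.length + " bytes");
        System.out.println("deflated once:  " + once.length + " bytes");
        System.out.println("deflated twice: " + twice.length + " bytes");
        // The second pass gains almost nothing, and can even grow the data.
    }
}
```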
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508818 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

> Re: Content versioning - we can use negative int values as version numbers.
> I'm still not sure what is the impact of BLOCK compression on MapFile random
> access.

Good idea!

(Btw, I still believe that BLOCK compression's performance hit is irrelevant for anything but parse_text. That's why I am trying to do the second test. I was trying to test how fast random access on parse_text is under different compressions. BLOCK compression will probably not be fast enough for parse_text. But if the impact is minor, it can be used for everything else.)

> Regarding the sizes: parse_text_record size is larger, because for small
> chunks of data the compression overhead may far outweigh the compression
> gains. Re: the large size of crawl_parse - is this related to your patch? It
> could be simply related to the fact that there are many outlinks in those
> pages ... Or is crawl_parse using BLOCK compression too?

OK, I understand why parse_text_record is larger, thanks for the explanation. But why is parse_text_block's size so close to parse_text? (Why is content so different from parse_text? BLOCK works wonders on content but does not even give a 10% reduction on parse_text.) The feed plugin wasn't enabled, so my patch shouldn't matter. Also, crawl_parse is using NONE compression.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508816 ]

Andrzej Bialecki commented on NUTCH-392:
----------------------------------------

Re: Content versioning - we can use negative int values as version numbers. I'm still not sure what the impact of BLOCK compression on MapFile random access is.

Regarding the sizes: the parse_text_record size is larger because, for small chunks of data, the compression overhead may far outweigh the compression gains.

Re: the large size of crawl_parse - is this related to your patch? It could simply be related to the fact that there are many outlinks in those pages ... Or is crawl_parse using BLOCK compression too?
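The small-chunk overhead described above is easy to demonstrate with plain java.util.zip (standing in for Hadoop's per-record codec; the class and value names are hypothetical): per-record compression of one short value can actually grow it, because the fixed header and checksum overhead exceeds any savings, while one large block of many records compresses well.

```java
import java.util.zip.Deflater;

public class SmallRecordOverhead {

    /** Return the zlib-compressed size of the input, in bytes. */
    static int deflatedSize(byte[] in) {
        Deflater d = new Deflater();
        d.setInput(in);
        d.finish();
        // Room for the zlib header/trailer plus worst-case expansion.
        byte[] buf = new byte[in.length + 128];
        int n = d.deflate(buf);
        d.end();
        return n;
    }

    public static void main(String[] args) {
        // One short record, as RECORD compression would see it.
        byte[] small = "http://example.com/page?id=4711".getBytes();
        // Many records concatenated, as BLOCK compression would see them.
        byte[] large = "some anchor text for a link ".repeat(200).getBytes();
        System.out.println("small record: " + small.length + " -> "
                + deflatedSize(small) + " bytes (grew!)");
        System.out.println("large block:  " + large.length + " -> "
                + deflatedSize(large) + " bytes");
    }
}
```

The short record carries the full zlib framing cost on its own, while the large block amortizes it and lets the codec exploit redundancy across records.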
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508812 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

OK, I have done a bit of testing on compression but I'm stuck. Here it is:

* I changed Content to be a regular Writable instead of a CompressedWritable and turned on BLOCK compression. The results were pretty impressive: Content size went down from ~1GB to ~500MB. Unfortunately, I haven't figured out how we can change Content in a backward-compatible way. Reading the first byte as a version won't work (because the first byte is not the version; the first thing written is the size of the compressed data as an int).

* This is where it gets strange. I was trying to test the performance impact of BLOCK compression (when generating summaries). I fetched a sample 250K url segment (a subset of dmoz). Then I made a small modification to ParseOutputFormat so that it outputs parse_text in all three compression formats ( http://www.ceng.metu.edu.tr/~e1345172/comp_parse.patch ). After parsing, the segment looks like this:

828M  crawl/segments/20070626163143/content
 35M  crawl/segments/20070626163143/crawl_fetch
 23M  crawl/segments/20070626163143/crawl_generate
345M  crawl/segments/20070626163143/crawl_parse
196M  crawl/segments/20070626163143/parse_data
244M  crawl/segments/20070626163143/parse_text         # NONE
232M  crawl/segments/20070626163143/parse_text_block   # BLOCK
246M  crawl/segments/20070626163143/parse_text_record  # RECORD

Not only is parse_text_record larger than parse_text, and parse_text_block only slightly smaller, but crawl_parse is larger than any of them! I probably messed up somewhere and I can't see it. Any help would be welcome.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500951 ]

Andrzej Bialecki commented on NUTCH-392:
----------------------------------------

I don't think it's a good idea; it creates too many cryptic options. Average users won't be able to assess what the best choices are there, and advanced users are able to change this directly in the source anyway ...
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500935 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

Perhaps we can allow a user to configure this on a per-structure basis by adding new properties: compression.type.{parse_text,crawldb,parse_data,linkdb} or whatever. Then we can make such a property take one of four valid values - BLOCK, NONE, RECORD, or DEFAULT - where DEFAULT is the value of io.sequence.file.compression.
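A minimal sketch of the lookup logic such per-structure properties would imply, with a plain Map standing in for a Hadoop Configuration. The property names are the ones proposed above; the class, method, and enum here are hypothetical and only illustrate the DEFAULT fallback.

```java
import java.util.Map;

public class CompressionTypeLookup {

    // Mirrors the three SequenceFile compression types; DEFAULT is handled
    // as a fallback string rather than an enum constant.
    enum CompressionType { NONE, RECORD, BLOCK }

    /**
     * Resolve the compression type for one segment part, e.g. "parse_text".
     * "DEFAULT" (or an unset per-part property) falls back to the global
     * io.sequence.file.compression setting, and finally to NONE.
     */
    static CompressionType resolve(Map<String, String> conf, String part) {
        String val = conf.getOrDefault("compression.type." + part, "DEFAULT");
        if (val.equals("DEFAULT")) {
            val = conf.getOrDefault("io.sequence.file.compression", "NONE");
        }
        return CompressionType.valueOf(val);
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
            "compression.type.parse_text", "RECORD",
            "io.sequence.file.compression", "BLOCK");
        // parse_text is pinned to RECORD; linkdb falls through to the global.
        System.out.println("parse_text -> " + resolve(conf, "parse_text"));
        System.out.println("linkdb     -> " + resolve(conf, "linkdb"));
    }
}
```

The real implementation would read these from the job Configuration and pass the result to the MapFile.Writer constructor instead of the hard-coded types.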
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500822 ]

Doug Cutting commented on NUTCH-392:
------------------------------------

Anchors, explain, and the cache are used relatively infrequently, considerably less than once per query, and hence *much* less than once per displayed hit. So it might be acceptable if they're somewhat slower. Block compression should still be fast enough for interactive use, and these uses would never dominate CPU use in an application, would they?
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500728 ]

Andrzej Bialecki commented on NUTCH-392:
----------------------------------------

> I think it is okay to allow BLOCK compression for linkdb, crawldb, crawl_*,
> content, parse_data. Because I don't think that people will need fast
> random-access on anything but parse_text.

LinkDb is accessed on-line randomly through LinkDbInlinks, when users request anchors. Similarly, parse_data is accessed when requesting "explain", and may also be accessed to retrieve other hit metadata. Content is accessed randomly when displaying the cached preview. I think in all these cases we can use at most RECORD compression, or NONE.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500665 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

I think it is okay to allow BLOCK compression for linkdb, crawldb, crawl_*, content, and parse_data, because I don't think people will need fast random access on anything but parse_text. I agree that we need to test the performance impact of BLOCK compression before committing such a change. Unfortunately, our setup doesn't include BLOCK compression right now. I will try to test it and report some results once I get the chance.

PS: Compressing content will not yield significant savings right now, since it is already compressed internally, but once Content stops doing that I think there will be _huge_ savings there.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500635 ]

Andrzej Bialecki commented on NUTCH-392:
----------------------------------------

Good point. We can change it to use the following pattern (as Hadoop uses internally), e.g.:

  contentOut = new MapFile.Writer(job, fs, content.toString(),
      Text.class, Content.class,
      SequenceFile.getCompressionType(job), progress);

However, the original patch had some merits, too. Some types of data are not that compressible in themselves (using RECORD compression), i.e. it takes more effort to compress/decompress than the space savings are worth. In the case of crawl_parse and crawl_fetch it would make sense to enforce the BLOCK or NONE compression type, and disallow the RECORD type.

I know that BLOCK compression gives better space savings, and incidentally may increase the writing speed. But I'm not sure what the performance impact of using BLOCK-compressed MapFile-s is when doing random reading - this is the scenario in LinkDbInlinks, FetchedSegments and similar places. Could you perhaps test it? The original patch used RECORD compression for MapFile-s, probably for this reason.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ https://issues.apache.org/jira/browse/NUTCH-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500603 ]

Doğacan Güney commented on NUTCH-392:
-------------------------------------

From what I understand of the MapFile.Writer code in Hadoop, if you give a CompressionType as an argument in its constructor, it overrides the compression value in the config. So since Nutch manually sets parse_text and parse_data to RECORD compression (and crawl_parse to NONE), we will not get the advantages of BLOCK compression even if we set it in the config.

BLOCK compression seems to work really great if you have the native libraries in place, so IMHO it would be better to not manually set the CompressionType and allow people to set it to whatever they want in the config.
[jira] Commented: (NUTCH-392) OutputFormat implementations should pass on Progressable
[ http://issues.apache.org/jira/browse/NUTCH-392?page=comments#action_12444719 ]

Doug Cutting commented on NUTCH-392:
------------------------------------

This should not be applied until Nutch uses Hadoop 0.8. It also contains a patch required to make Nutch work correctly with Hadoop 0.8 (where LocalFileSystem.rename() of a non-existing file now throws an exception).