[jira] Updated: (NUTCH-506) Nutch should delegate compression to Hadoop

2007-07-11 Thread JIRA

 [ https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-506:


Attachment: NUTCH-506.patch

New version. I missed ProtocolStatus and ParseStatus. This patch updates them 
in a backward-compatible way.
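
For illustration, a backward-compatible update of a Writable status class can key its readFields on a leading version byte. The sketch below is not the attached patch (the StatusRecord name and its field layout are made up); it only shows the pattern of always writing the new layout while still reading the old one:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Hypothetical sketch, not the actual ProtocolStatus/ParseStatus code:
    // version 1 stored only an int code, version 2 adds a message string.
    public class StatusRecord implements Writable {
      private static final byte VERSION = 2;
      private int code;
      private String message = "";

      public void write(DataOutput out) throws IOException {
        out.writeByte(VERSION);              // always write the new layout
        out.writeInt(code);
        Text.writeString(out, message);
      }

      public void readFields(DataInput in) throws IOException {
        byte version = in.readByte();
        switch (version) {
          case 1:                            // old layout: code only
            code = in.readInt();
            message = "";
            break;
          case 2:                            // new layout: code + message
            code = in.readInt();
            message = Text.readString(in);
            break;
          default:
            throw new IOException("Unknown StatusRecord version " + version);
        }
      }
    }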

 Nutch should delegate compression to Hadoop
 ---

 Key: NUTCH-506
 URL: https://issues.apache.org/jira/browse/NUTCH-506
 Project: Nutch
  Issue Type: Improvement
Reporter: Doğacan Güney
 Fix For: 1.0.0

 Attachments: compress.patch, NUTCH-506.patch


 Some data structures within Nutch (such as Content and ParseText) handle their 
 own compression. We should delegate all compression to Hadoop. 
 Also, Nutch should respect the io.seqfile.compression.type setting. Currently, 
 even if io.seqfile.compression.type is BLOCK or RECORD, Nutch overrides it 
 for some structures and sets it to NONE. (However, IMO, ParseText should 
 always be compressed as RECORD for performance reasons.)
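
To make the io.seqfile.compression.type point concrete, here is a rough sketch (not Nutch code; the SegmentWriterSketch class and the Text/Text key-value pair are placeholders) of a writer that asks Hadoop for the configured compression type instead of hard-coding NONE:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.Text;

    // Illustrative only: open a SequenceFile writer that honours
    // io.seqfile.compression.type (NONE, RECORD or BLOCK) from the config.
    public class SegmentWriterSketch {
      public static SequenceFile.Writer open(Configuration conf, Path part)
          throws IOException {
        FileSystem fs = FileSystem.get(conf);
        CompressionType type = SequenceFile.getCompressionType(conf);
        return SequenceFile.createWriter(fs, conf, part,
            Text.class, Text.class, type);
      }
    }

With this approach, a structure that really benefits from RECORD compression (like ParseText above) could still request it explicitly when opening its writer, while everything else follows the site-wide setting.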

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (NUTCH-506) Nutch should delegate compression to Hadoop

2007-06-29 Thread JIRA

 [ https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-506:


Attachment: compress.patch

This patch changes Content (it is no longer a CompressedWritable) and 
ParseText (from VersionedWritable to a plain Writable). These changes are 
backward compatible, so old segments can still be read after this patch.

The patch also changes Content's public API very slightly: the 
Content.forceInflate method is removed because it is no longer needed.
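
As a rough illustration of the direction only (not the actual patched class, which also has to keep reading old self-compressed segments for backward compatibility), a ParseText-style value that leaves compression entirely to the SequenceFile layer reduces to a plain Writable:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    // Hypothetical ParseTextSketch: no per-record deflate/inflate of its own;
    // RECORD or BLOCK compression is applied by the surrounding SequenceFile.
    public class ParseTextSketch implements Writable {
      private String text = "";

      public ParseTextSketch() {}
      public ParseTextSketch(String text) { this.text = text; }

      public void write(DataOutput out) throws IOException {
        Text.writeString(out, text);
      }

      public void readFields(DataInput in) throws IOException {
        text = Text.readString(in);
      }

      public String getText() { return text; }
    }

This is also why a method like forceInflate becomes unnecessary: there is no class-level compressed form left to inflate.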

 Nutch should delegate compression to Hadoop
 ---

 Key: NUTCH-506
 URL: https://issues.apache.org/jira/browse/NUTCH-506
 Project: Nutch
  Issue Type: Improvement
Reporter: Doğacan Güney
 Fix For: 1.0.0

 Attachments: compress.patch


 Some data structures within Nutch (such as Content and ParseText) handle their 
 own compression. We should delegate all compression to Hadoop. 
 Also, Nutch should respect the io.seqfile.compression.type setting. Currently, 
 even if io.seqfile.compression.type is BLOCK or RECORD, Nutch overrides it 
 for some structures and sets it to NONE. (However, IMO, ParseText should 
 always be compressed as RECORD for performance reasons.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.