[ https://issues.apache.org/jira/browse/NUTCH-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508234 ]
Doğacan Güney commented on NUTCH-434:
-------------------------------------
You mean the one we output from the reducer? As you noted, we can't wrap it in
NutchWritable since it is not a Writable, and we don't have to wrap it. Indexer
just wraps a Lucene document in ObjectWritable (Hadoop forces you to output
Writable values from the reducer) so that Indexer.OutputFormat can pick it up.
Since this is done locally (without any serialization/deserialization), there
is no cost to wrapping it with ObjectWritable.
Now there is a good chance that you knew all this :). If your point was that we
should replace all ObjectWritables, perhaps I can add something like a
LameLuceneDocumentWrapper that is a Writable (with empty readFields and write
methods) and use it to pass the Lucene document from the reducer to OutputFormat.
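
A minimal sketch of what such a wrapper could look like, under stated
assumptions: the Writable interface below is a local stand-in that mirrors the
two methods of Hadoop's Writable so the example compiles without Hadoop on the
classpath, the type parameter D stands in for the Lucene document class, and
WrapperSketch is a hypothetical helper that exists only for this illustration.
It is not the actual Nutch code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in for org.apache.hadoop.io.Writable (assumption: same two methods).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical wrapper: a Writable in name only. It just carries a reference
// from the reducer to the output format inside the same JVM.
final class LameLuceneDocumentWrapper<D> implements Writable {
    private final D doc;

    LameLuceneDocumentWrapper(D doc) {
        this.doc = doc;
    }

    D get() {
        return doc;
    }

    @Override
    public void write(DataOutput out) {
        // Intentionally empty: the document is never serialized.
    }

    @Override
    public void readFields(DataInput in) {
        // Intentionally empty: the document is never deserialized.
    }
}

// Hypothetical helper used only to exercise the wrapper.
final class WrapperSketch {
    // Number of bytes the wrapper emits if something does call write().
    static int bytesWritten(LameLuceneDocumentWrapper<?> wrapper) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        wrapper.write(new DataOutputStream(buf));
        return buf.size();
    }

    public static void main(String[] args) {
        // A String plays the role of the Lucene document here.
        LameLuceneDocumentWrapper<String> wrapper =
                new LameLuceneDocumentWrapper<>("doc-1");
        Writable asValue = wrapper; // accepted wherever a Writable value is required
        System.out.println(asValue.getClass().getSimpleName()
                + " carries " + wrapper.get()
                + ", bytes written: " + bytesWritten(wrapper));
    }
}
```

The catch is the same as with the ObjectWritable trick: this only works as long
as the value really stays local, because anything that did serialize it would
silently lose the document.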
> Replace usage of ObjectWritable with something based on GenericWritable
> -----------------------------------------------------------------------
>
> Key: NUTCH-434
> URL: https://issues.apache.org/jira/browse/NUTCH-434
> Project: Nutch
> Issue Type: Improvement
> Reporter: Sami Siren
> Attachments: NUTCH-434.patch, NUTCH-434_v2.patch
>
>
> We should replace the usage of ObjectWritable, and of classes extending it,
> with a class extending GenericWritable. Classes based on GenericWritable have
> a smaller footprint on disk, and the base class does not contain any classes
> that are deprecated.
> There is one problem though: ParseData currently needs a Configuration
> object before it can deserialize itself, and GenericWritable
> doesn't provide a way to inject a configuration. We could either a) remove
> the need for Configuration, or b) write a class similar to GenericWritable
> that does the conf injection.
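
For reference, option b) from the description above could be sketched roughly
as follows. Everything here is an assumption made for illustration: Writable
and Configurable are local stand-ins, a plain Map plays the role of
Configuration, and ConfDependentData, ConfInjectingWritable and
ConfInjectionSketch are hypothetical names. The wrapper writes a single type
byte followed by the wrapped instance, and injects the configuration into
the new instance before asking it to read its fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Local stand-in for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Local stand-in for a Configurable type; Map replaces Configuration.
interface Configurable {
    void setConf(Map<String, String> conf);
}

// Hypothetical payload that cannot deserialize itself without a configuration.
final class ConfDependentData implements Writable, Configurable {
    private Map<String, String> conf;
    String value = "";

    public void setConf(Map<String, String> conf) {
        this.conf = conf;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(value);
    }

    public void readFields(DataInput in) throws IOException {
        if (conf == null) {
            throw new IllegalStateException("configuration not injected");
        }
        value = conf.getOrDefault("prefix", "") + in.readUTF();
    }
}

// Hypothetical GenericWritable-like wrapper with conf injection.
final class ConfInjectingWritable implements Writable {
    private final List<Supplier<Writable>> types; // index in this list = type byte
    private final Map<String, String> conf;
    private byte type = -1;
    private Writable instance;

    ConfInjectingWritable(List<Supplier<Writable>> types, Map<String, String> conf) {
        this.types = types;
        this.conf = conf;
    }

    void set(int typeIndex, Writable value) {
        this.type = (byte) typeIndex;
        this.instance = value;
    }

    Writable get() {
        return instance;
    }

    public void write(DataOutput out) throws IOException {
        out.writeByte(type);   // one byte identifies the wrapped type
        instance.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        type = in.readByte();
        instance = types.get(type).get();
        if (instance instanceof Configurable) {
            ((Configurable) instance).setConf(conf); // inject before reading
        }
        instance.readFields(in);
    }
}

// Hypothetical helper: serializes one value and reads it back.
final class ConfInjectionSketch {
    static String roundTrip(String value, Map<String, String> conf) {
        Supplier<Writable> factory = ConfDependentData::new;
        List<Supplier<Writable>> types = Collections.singletonList(factory);
        try {
            ConfDependentData data = new ConfDependentData();
            data.value = value;
            ConfInjectingWritable out = new ConfInjectingWritable(types, conf);
            out.set(0, data);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            out.write(new DataOutputStream(buf));

            ConfInjectingWritable in = new ConfInjectingWritable(types, conf);
            in.readFields(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
            return ((ConfDependentData) in.get()).value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The "prefix" key is only there to make the dependency on the configuration
visible in the round trip; a real payload would use the configuration for
whatever it needs during deserialization.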
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers