[ 
http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364699 ] 

Stefan Groschupf commented on NUTCH-192:
----------------------------------------

* plus whatever it takes to put the class name->id mapping in the MapWritable 
header (the mapping table): let's assume 40 bytes. 

I do not write the mapping table to the output stream in any form. For now the 
id is calculated as a hash of the class name. 
I will change this so that the ids become part of the class, where I will manually 
assign LongWritable id = (byte)1, UTF8 id = (byte)2, etc.
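
A minimal sketch of what such a manually assigned id table could look like. 
The class name ClassIdTable, its methods, and the use of simple class names as 
keys are assumptions made for illustration; this is not the code from the patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: fixed one-byte ids for value classes, compiled into the code. */
final class ClassIdTable {
    private static final Map<String, Byte> NAME_TO_ID = new HashMap<>();
    private static final Map<Byte, String> ID_TO_NAME = new HashMap<>();

    static {
        // Ids are assigned by hand, so reader and writer agree on them
        // without any table being written to the stream.
        register("LongWritable", (byte) 1);
        register("UTF8", (byte) 2);
    }

    private static void register(String className, byte id) {
        NAME_TO_ID.put(className, id);
        ID_TO_NAME.put(id, className);
    }

    /** Returns the one-byte id written in front of a value of the given class. */
    static byte idFor(String className) {
        Byte id = NAME_TO_ID.get(className);
        if (id == null) {
            throw new IllegalArgumentException("unregistered class: " + className);
        }
        return id;
    }

    /** Returns the class name for an id read back from the stream. */
    static String classNameFor(byte id) {
        String name = ID_TO_NAME.get(id);
        if (name == null) {
            throw new IllegalArgumentException("unknown class id: " + id);
        }
        return name;
    }
}
```

With a fixed table like this, only the single id byte travels with each value.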

For example, writing a long (e.g. a timestamp) as UTF8 takes 15 bytes, while 
writing it as a LongWritable takes 8 bytes.
8 bytes plus 1 byte for the class type is 9 bytes, which is 60 % of the space 
required for a String. 
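
The byte counts can be checked with a small sketch using only java.io. It 
assumes a 13-digit millisecond timestamp and uses DataOutputStream.writeUTF 
(2-byte length prefix plus one byte per digit) as a stand-in for the UTF8 
writable; the class name TimestampSizeDemo and the sample value are 
hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch comparing the two encodings of a timestamp. */
final class TimestampSizeDemo {

    /** One class-id byte followed by the 8-byte long. */
    static int sizeAsTaggedLong(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(1);      // assumed id for LongWritable
            out.writeLong(value);  // always 8 bytes
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The decimal digits as a length-prefixed string, without a class-id byte. */
    static int sizeAsString(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(Long.toString(value)); // 2-byte length + 1 byte per digit
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long timestamp = 1138579200000L; // sample 13-digit millisecond value
        System.out.println("tagged long: " + sizeAsTaggedLong(timestamp) + " bytes");
        System.out.println("string:      " + sizeAsString(timestamp) + " bytes");
    }
}
```

For the sample value this gives 9 bytes against 15 bytes, i.e. 9 / 15 = 60 %.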

I guess the main misunderstanding is that I do not write the class-to-id map 
into the stream at any time.
Does that make sense?
 


> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help to realize a set of new Nutch 
> features and would make a lot possible for smaller, specially focused search 
> engines.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
