[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364699 ]
Stefan Groschupf commented on NUTCH-192:
----------------------------------------

* plus whatever it takes to put the class name->id mapping in the MapWritable header (the mapping table): let's assume 40 bytes.

I do not write the mapping table to the output stream in any form. For now the id is calculated as a hash of the class name. I will change this so that the mapping becomes part of the class, where I will manually assign LongWritable id = (byte)1, UTF8 id = (byte)2, etc.

For example, writing a long (e.g. a timestamp) as UTF8 takes 15 bytes, while writing it as a LongWritable takes 8 bytes. 8 bytes plus 1 byte for the class type is 9 bytes, which is 60% of the space required when using a String.

I guess the main misunderstanding is that I do not write the class->id map into the stream at any time.
Does that make sense?

> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help to realize a set of new Nutch
> features and would make a lot possible for smaller, specially focused search
> engines.
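The size comparison in the comment (8 bytes plus a 1-byte type id versus 15 bytes for the string form) can be checked with a small sketch. This is not the code from metadata300106.patch: the class name TypedValueSize, the id constants, and the sample timestamp are hypothetical, and plain java.io.DataOutputStream stands in for the Writable classes. It assumes the string form is written as a 2-byte length followed by the decimal digits, which is how DataOutputStream.writeUTF encodes it.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class TypedValueSize {
    // Hypothetical, manually assigned type id, in the spirit of
    // "LongWritable id = (byte)1" from the comment.
    public static final byte LONG_ID = 1;

    // Writes a 1-byte type id followed by the 8-byte long value.
    public static byte[] writeTypedLong(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(LONG_ID);
            out.writeLong(value);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the long as a string: 2-byte length prefix plus one byte per digit.
    public static byte[] writeAsString(long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(Long.toString(value));
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample 13-digit millisecond timestamp (an arbitrary illustrative value).
        long timestamp = 1138622400000L;
        System.out.println("typed long: " + writeTypedLong(timestamp).length + " bytes");
        System.out.println("as string:  " + writeAsString(timestamp).length + " bytes");
    }
}
```

For a 13-digit millisecond timestamp this yields 9 bytes for the typed long and 15 bytes for the string, i.e. the 60% ratio mentioned above. No class name or class->id table is written to the stream in either case; the reader is assumed to know the same fixed id assignment.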
