[ https://issues.apache.org/jira/browse/NUTCH-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13574846#comment-13574846 ]
Lewis John McGibbney commented on NUTCH-1511:
---------------------------------------------

Hi Roland. Are you able to look into this at all?

> Metadata in MYSQL updated with 'garbage'
> ----------------------------------------
>
>                 Key: NUTCH-1511
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1511
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, injector, storage
>    Affects Versions: 2.1
>         Environment: Ubuntu 12.04
>            Reporter: J. Gobel
>              Labels: metadata, mysql, nutch, scoring-opic
>             Fix For: 2.2
>
>
> After applying the patch for the Metadata parser (NUTCH-1478) I notice that,
> just before the crawl ends, the metadata field is populated with the correct
> information. However, when the crawl is completely finished the metadata field
> is populated with 'garbage': _csh_�����
> I notice in my SQL log file that the scoring plugin is overwriting the
> metadata field in a final data insertion with '_csh_ \0\0\0\0\'. When I
> remove 'scoring-opic' from the 'plugin.includes' property in nutch-site.xml,
> the metadata field is crisp and clear.
> MYSQL LOG FILE: I did a crawl on http://nutch.apache.org. Below you will see
> fragments of my MySQL log file, showing only the moments when data is written
> to the METADATA field in the MySQL table.
> First insertion: here I suppose scoring-opic writes its information, _csh_
> ?€\0\0\0
> 58 Query INSERT INTO webpage
> (fetchInterval,fetchTime,id,markers,metadata,score )VALUES
> (2592000,1357122976493,'org.apache.nutch:http/',' dist 0 _injmrk_ y\0','
> _csh_ ?€\0\0\0',1.0) ON DUPLICATE KEY UPDATE
> fetchInterval=2592000,fetchTime=1357122976493,markers=' dist 0 _injmrk_
> y\0',metadata='
> _csh_ ?€\0\0\0',score=1.0
> Second insertion: here the scraped metadata is inserted into the metadata field.
> 81 Query INSERT INTO webpage
> (id,markers,metadata,outlinks,parseStatus,signature,text,title )VALUES
> ('org.apache.nutch:http/',
> The final insertion - please note that here the metadata field is
> overwritten with _CSH_\0\0\0\0
> 90 Query INSERT INTO webpage (fetchTime,id,inlinks,markers,metadata
> )VALUES (1359714995075,'org.apache.nutch:http/',' 0http://nutch.apache.org/
> Nutch\0',' dist 0 _injmrk_ y _updmrk_*1357122982-1745626508
> __prsmrk__*1357122982-1745626508 _gnmrk_*1357122982-1745626508
> _ftcmrk_*1357122982-1745626508\0','
> _csh_ \0\0\0\0\0') ON DUPLICATE KEY UPDATE fetchTime=1359714995075,inlinks='
> 0http://nutch.apache.org/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
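
A note on the bytes that follow _csh_ in the quoted log. The report does not state how this value is encoded, so the following is an assumption: if it is a 4-byte big-endian IEEE 754 float, and if the log rendered byte 0x3F as '?' and byte 0x80 as '€', then the value in the first insertion would decode to 1.0 and the value in the final insertion to 0.0. The sketch below only checks that arithmetic. `decode_cash` is a hypothetical helper, not Nutch code, and nothing here explains why the parsed metadata is absent from the final insert.

```python
import struct


def decode_cash(raw: bytes) -> float:
    """Interpret the first 4 bytes as a big-endian IEEE 754 single-precision float.

    Assumption: the value stored under _csh_ is serialized this way.
    The bug report itself does not confirm the encoding.
    """
    return struct.unpack(">f", raw[:4])[0]


# Assumed byte values for '?€\0\0' from the first insertion
# ('?' taken as 0x3F, '€' taken as 0x80; both are guesses about log rendering).
first_insert = bytes([0x3F, 0x80, 0x00, 0x00])

# '\0\0\0\0' from the final insertion.
final_insert = bytes([0x00, 0x00, 0x00, 0x00])

print(decode_cash(first_insert))  # 1.0
print(decode_cash(final_insert))  # 0.0
```

Under these assumptions the _csh_ value would be ordinary binary data that merely looks like garbage when printed as text, which may be worth checking before treating the bytes themselves as corrupt.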