Issue with Parse metaData while crawling RSSFeed URL

Saurabh Suman Fri, 17 Jul 2009 04:16:18 -0700

Hi,

I am crawling a feed URL, http://blog.taragana.com/n/c/india/feed/, with
depth set to 2. I am using FeedParser.java to parse it.

At depth 1, the parse data in the segments folder shows the following
Parse Metadata for the URL
http://blog.taragana.com/n/30-child-labourers-rescued-in-agra-and-firozabad-111417/

Parse Metadata: author=Ani CharEncodingForConversion=utf-8 tag=Agra
tag=Firozabad tag=Uttar Pradesh tag=India OriginalCharEncoding=utf-8
feed=http://blog.taragana.com/n published=1247778368000

As we can see, it contains the author.


But at depth 2, the Parse Metadata for the same URL looks like this:

Parse Metadata: CharEncodingForConversion=utf-8 OriginalCharEncoding=utf-8

When I search, the author is not returned. I have the following question
about this:

(1) Does Nutch overwrite the Parse Metadata from depth 1 with that from
depth 2 for this URL, or does it merge the two? If it overwrites, how can I
prevent that? I need the author and the other information obtained by
parsing the RSS feed.
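
To make the question concrete, here is a minimal sketch of the two behaviours I am asking about. It uses plain Java maps only, not Nutch classes, so it does not show what Nutch actually does; the class and method names (MetadataMergeSketch, overwrite, merge, feedParse, pageParse) are hypothetical and exist only for illustration. The values are the ones from the two Parse Metadata dumps above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain collections, not Nutch's Metadata or ParseData classes.
class MetadataMergeSketch {

    // "Overwrite": the later parse replaces the earlier one entirely,
    // so keys that only the earlier parse had (author, tag, ...) are lost.
    static Map<String, List<String>> overwrite(Map<String, List<String>> earlier,
                                               Map<String, List<String>> later) {
        return copy(later);
    }

    // "Merge": keep everything from the earlier parse and add any
    // values from the later parse that are not already present.
    static Map<String, List<String>> merge(Map<String, List<String>> earlier,
                                           Map<String, List<String>> later) {
        Map<String, List<String>> result = copy(earlier);
        for (Map.Entry<String, List<String>> e : later.entrySet()) {
            List<String> values = result.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (String v : e.getValue()) {
                if (!values.contains(v)) {
                    values.add(v);
                }
            }
        }
        return result;
    }

    // Deep copy so neither input map is modified.
    static Map<String, List<String>> copy(Map<String, List<String>> m) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : m.entrySet()) {
            out.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return out;
    }

    // Metadata seen at depth 1 (entry parsed from the feed).
    static Map<String, List<String>> feedParse() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("author", Arrays.asList("Ani"));
        m.put("CharEncodingForConversion", Arrays.asList("utf-8"));
        m.put("tag", Arrays.asList("Agra", "Firozabad", "Uttar Pradesh", "India"));
        m.put("OriginalCharEncoding", Arrays.asList("utf-8"));
        m.put("feed", Arrays.asList("http://blog.taragana.com/n"));
        m.put("published", Arrays.asList("1247778368000"));
        return m;
    }

    // Metadata seen at depth 2 (the same URL fetched and parsed as a page).
    static Map<String, List<String>> pageParse() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("CharEncodingForConversion", Arrays.asList("utf-8"));
        m.put("OriginalCharEncoding", Arrays.asList("utf-8"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println("overwrite: " + overwrite(feedParse(), pageParse()));
        System.out.println("merge:     " + merge(feedParse(), pageParse()));
    }
}
```

What I see when searching matches the "overwrite" case; what I would like is the "merge" case, where the author, tags, feed and published values from the feed survive the second parse.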




-- 
View this message in context: 
http://www.nabble.com/Issue-with-Parse-metaData-while-crawling-RSSFeed-URL-tp24532613p24532613.html
Sent from the Nutch - User mailing list archive at Nabble.com.
