On Jan 4, 2007, at 1:55 PM, Brian Whitman wrote:

> I did that kill -SIGQUIT thing on the parse hang-- looks like
> jid3lib has a problem... but if jid3lib throws an exception,
> shouldn't the parse-mp3 plugin and nutch pick it up and continue?
> (Excuse my lack of Java knowledge...)

As suspected, jid3lib had a nasty bug in parsing certain bad id3v2.X
files. I've patched it, only to find that someone else had found the
bug in 2004 but the devs never integrated the change. Does anyone know
a better (more actively maintained) Java ID3 parsing library? I could
volunteer to write parse-mp3 against something else instead.


More to the topic of Nutch, it's interesting that the hangs only
happened on a re-parse. These files had been parsed before, during
the crawl. If a parse subprocess hangs like this during a crawl,
doesn't a reaper come around and kill the thread and ignore the URL?
If so, shouldn't the same thing happen during explicit parses?
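
To make the question concrete: a hang is not an exception, so a try/catch around the parser call never gets a chance to run. The usual workaround is to run the parse on its own thread and give up after a timeout. Below is a minimal sketch of that idea in plain Java. It is not Nutch's actual code, and TimedParse and parseWithTimeout are names made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Nutch or jid3lib.
final class TimedParse {

    /**
     * Runs the parser on a separate daemon thread and waits at most
     * timeoutMillis for it. Returns the parser's result, or null if
     * the parser threw or did not finish in time.
     */
    static <T> T parseWithTimeout(Callable<T> parser, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timed-parse");
            t.setDaemon(true); // a stuck parser must not keep the JVM alive
            return t;
        });
        Future<T> result = executor.submit(parser);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The parser is hung or too slow: stop waiting and skip this input.
            // cancel(true) only interrupts the worker; a tight loop that never
            // checks for interruption keeps running in the background.
            result.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The parser threw: this is the case a plain try/catch would handle.
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            result.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would wrap the library call in a lambda and treat a null result as "skip this URL". Note that Java cannot safely force-kill a thread, so a truly stuck parser thread is abandoned rather than reaped.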

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
