Maybe you can reproduce the problem in your environment with publicly available URLs.
What is the mime type for the documents without titles?

Alexander

2008/10/21 John Mendenhall <[EMAIL PROTECTED]>

> > Can u post some of the urls for which parse text is missing.
>
> I am unable to post the actual urls.  This is a private
> project for which exact urls cannot be shared.
>
> JohnM
>
> > On Tue, Oct 21, 2008 at 6:44 AM, John Mendenhall <[EMAIL PROTECTED]> wrote:
> >
> > > We are using nutch version nutch-2008-07-22_04-01-29.
> > > We have a crawldb with over 1 million urls.
> > >
> > > We have noticed some of the urls in search results
> > > do not have titles.  After some research comparing
> > > urls with titles and urls without titles, the urls
> > > without titles have empty parsetext.
> > >
> > > Why would some urls have empty parsetext?
> > > Is there some place I can look to determine why
> > > parsetext is missing?
> > >
> > > Is the only way to reparse those urls with empty
> > > parsetext to remove the crawl_parse directory for
> > > the corresponding segment and run the nutch parse
> > > command?
> > >
> > > Is there something I should do to guarantee all
> > > urls get a parsetext, and hopefully, a title?
> > >
> > > Thanks in advance for any assistance or pointers
> > > to other resources or ideas.
> > >
> > > JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services

--
Best Regards
Alexander Aristov