All right. Take a look at this output of the segread command:
060803 132735 PARSED? STARTED           FINISHED          COUNT DIR NAME
060803 132735 true    20060717-14:41:58 20060717-14:41:58 1     crawl-legislacao_copia/segments/20060717144154
060803 132735 true    20060717-14:42:03 20060717-14:43:22 77    crawl-legislacao_copia/segments/20060717144201
060803 132735 true    20060717-14:43:29 20060717-15:08:10 1464  crawl-legislacao_copia/segments/20060717144327
060803 132735 true    20060717-15:08:17 20060717-15:11:58 223   crawl-legislacao_copia/segments/20060717150815
060803 132736 true    20060718-09:02:56 20060718-09:03:10 5     crawl-legislacao_copia/segments/20060718090250
060803 132736 true    20060803-10:55:18 20060803-12:53:49 1541  crawl-legislacao_copia/segments/20060803105509
060803 132736 true    20060803-13:07:15 20060803-13:07:20 4     crawl-legislacao_copia/segments/20060803130707
060803 132736 TOTAL: 3315 entries in 7 segments.
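To compare the per-segment counts with the index size, the listing above can be read into records and summed. This is only a rough sketch: the regular expression and the `parse_segread` helper are hypothetical names written against the exact output pasted here, not part of Nutch, and other Nutch versions may print a different layout.

```python
import re

# The segread listing from this message, as it was pasted (rows may be wrapped).
SEGREAD_OUTPUT = """\
060803 132735 true 20060717-14:41:58 20060717-14:41:58
1 crawl-legislacao_copia/segments/20060717144154
060803 132735 true 20060717-14:42:03 20060717-14:43:22
77 crawl-legislacao_copia/segments/20060717144201
060803 132735 true 20060717-14:43:29 20060717-15:08:10
1464 crawl-legislacao_copia/segments/20060717144327
060803 132735 true 20060717-15:08:17 20060717-15:11:58
223 crawl-legislacao_copia/segments/20060717150815
060803 132736 true 20060718-09:02:56 20060718-09:03:10
5 crawl-legislacao_copia/segments/20060718090250
060803 132736 true 20060803-10:55:18 20060803-12:53:49
1541 crawl-legislacao_copia/segments/20060803105509
060803 132736 true 20060803-13:07:15 20060803-13:07:20
4 crawl-legislacao_copia/segments/20060803130707
060803 132736 TOTAL: 3315 entries in 7 segments.
"""

# One row = parsed flag, start time, finish time, entry count, segment dir.
# \s+ also matches newlines, so wrapped rows are handled as well.
ROW = re.compile(r"(true|false)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S*segments/\d{14})")


def parse_segread(text):
    """Return one dict per segment row found in the segread listing."""
    rows = []
    for parsed, started, finished, count, path in ROW.findall(text):
        rows.append({
            "parsed": parsed == "true",
            "started": started,
            "finished": finished,
            "count": int(count),
            "dir": path,
        })
    return rows


rows = parse_segread(SEGREAD_OUTPUT)
total = sum(r["count"] for r in rows)
print(len(rows), "segments,", total, "entries")  # 7 segments, 3315 entries
```

The sum agrees with the TOTAL line printed by segread, so the records can be grouped further, for example by the month prefix of the segment name.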
My db.default.fetch.interval is 15. Before I ran the recrawl script I had 5
segments (200607*) and the index pointed to 1537 documents. After running the
recrawl, 2 segments were created and then the script indexed everything. When I
analyzed the generated index, I saw it had 1541 documents. As you can
see, the 200607* segments are old and can be deleted. I did this:
rm -rf segments/200607*
Then I got the NPE. Right, I have to re-index the 2 remaining segments. I've
done this. Then I analyzed the index again and it has only 1417 documents!
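The rule I applied by hand can be written down as a small sketch. It assumes that db.default.fetch.interval is counted in days and that the segment directory name is its creation timestamp (yyyyMMddHHmmss); `segment_time` and `older_than` are hypothetical helper names, not Nutch commands. It only lists candidates by age. Whether such segments are really safe to delete is exactly what I am unsure about.

```python
from datetime import datetime, timedelta

# Segment directory names taken from the segread listing in this message.
SEGMENTS = [
    "20060717144154",
    "20060717144201",
    "20060717144327",
    "20060717150815",
    "20060718090250",
    "20060803105509",
    "20060803130707",
]


def segment_time(name):
    """Assumption: the directory name encodes the creation time."""
    return datetime.strptime(name.rstrip("/").split("/")[-1], "%Y%m%d%H%M%S")


def older_than(segments, interval_days, now):
    """Return the segments created more than interval_days before now."""
    cutoff = now - timedelta(days=interval_days)
    return [s for s in segments if segment_time(s) < cutoff]


# Reference time: when the segread command above was run (2006-08-03 13:27:35).
now = datetime(2006, 8, 3, 13, 27, 35)
candidates = older_than(SEGMENTS, 15, now)
for name in candidates:
    print("older than the fetch interval:", name)
```

With an interval of 15 this selects the five 200607* segments and keeps the two from 20060803, which matches what I deleted. Age alone does not say whether the current index still refers to a segment.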
My questions:
Why does this happen? How can I know which segments can be deleted?
I hope you can help me.
On 8/3/06, Marko Bauhardt <[EMAIL PROTECTED]> wrote:
Hi,
if you delete segments, then be sure that you don't have an index
built from those segments.
A segment contains the parsed content, and the index is built
from this content. If you delete the segment and then search
this index, an NPE occurs because no summary (parsed content) is
found.
HTH
Marko
On 03.08.2006 at 16:33, Lourival Júnior wrote:
> Why, when I delete some segments that have reached the
> db.default.fetch.interval, does the search application get a
> NullPointerException? Periodically I have to
> recrawl my site, and deleting old segments is a problem. Does anyone have a
> suggestion?
>
> Regards
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [EMAIL PROTECTED]