Can you send the output of ls -lR <DataDir>/<DBName>?

Gregory Kozlovsky wrote:
> Hello, Alexander,
>
> I tried twice to clean the database and start indexing anew, with the
> same result.
> When I comment out the Converter statements
>
> Converter application/pdf text/plain /usr/bin/pdftotext -q $in $out
> Converter application/postscript text/plain /usr/local/bin/pstotext
>
> the indexing goes all right.
>
> When indexing with the converters, the abnormally large files are
> 657M 00w
> 544M 01w
> 630M 02w
> 623M 03w
> 632M 04w
> 641M 05w
> 659M 06w
> 651M 07w
> 595M 08w
> 637M 09w
> 653M 10w
> 590M 11w
> 608M 12w
> 657M 13w
> 621M 14w
> 642M 15w
> 327M 16w
>
> Now I am trying to find out whether the problem lies with the .pdf or
> the .ps files.
>
> What is the format of the files in the 00w-99w directories? Is it
> described somewhere?
>
> Regards,
>
> Gregory
>
> -----Original Message-----
> From: Alexander F Avdonkin [mailto:[EMAIL PROTECTED]]
> Sent: Saturday, 6 July 2002 11:47
> To: [EMAIL PROTECTED]
> Subject: Re: [aseek-users]
>
> Possibly it happened because of corrupted delta files. See which files
> occupy most of the space inside those directories.
> The only solution here is to reindex everything from a clean DB.
>
> Alexander.
>
> Gregory Kozlovsky wrote:
>
> > Hello, ASPseekers,
> >
> > I installed aspseek-1.2.9 and started indexing into an empty database.
> > However, the indexing stopped when 95390 docs were indexed and 352506
> > were found and not indexed. The reason is that /var/aspseek/dbname
> > became huge and filled all the available space. With the old version,
> > this directory took 5.2 G for about 2 million indexed docs; now it is
> > 14 G. Here is the output of "du *" inside the directory:
> >
> > [root@isn-search]# du *
> > 657M 00w
> > 544M 01w
> > 630M 02w
> > 623M 03w
> > 632M 04w
> > 641M 05w
> > 659M 06w
> > 651M 07w
> > 595M 08w
> > 637M 09w
> > 653M 10w
> > 590M 11w
> > 608M 12w
> > 657M 13w
> > 621M 14w
> > 642M 15w
> > 327M 16w
> > 39M 17w
> > 41M 18w
> > 43M 19w
> > 43M 20w
> > 37M 21w
> >
> > The rest of the subdirectories are of normal size, around 50M. What is
> > going wrong? One more suspicious thing is that I started indexing .pdf
> > and .ps documents. Maybe the converters produce junk words? What
> > converters do you people use?
> >
> > Gregory Kozlovsky
> >
> > Project Manager for Information Systems              Tel: +41 (0)1 632 63 70
> > International Relations and Security Network (ISN)   Fax: +41 (0)1 632 14 13
> > Center for Security Studies and Conflict Research    Email: [EMAIL PROTECTED]
> > Swiss Federal Institute of Technology (ETH)          http://www.isn.ch
> > Leonhardshalde 21, ETH-Zentrum / LEH
> > CH-8092 Zürich, Switzerland
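
The thread suggests checking which files occupy most of the space inside the oversized directories. A minimal sketch of one way to do that is shown here, assuming a POSIX shell with the standard find, du, sort and head utilities. The function name largest_files is hypothetical and not part of ASPseek.

```shell
# List the largest regular files under a directory, biggest first.
# largest_files is a hypothetical helper name, not an ASPseek tool.
#   $1 = directory to scan
#   $2 = number of entries to show (defaults to 20)
largest_files() {
    dir=$1
    count=${2:-20}
    # du -k prints "<size in KB> <path>" for each file; sort -rn orders
    # the lines by that leading number, largest first.
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}
```

Called as largest_files /var/aspseek/dbname 20, it would print the twenty largest files together with their sizes in kilobytes, which may show whether a few files account for the growth in 00w through 16w.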
