Re: bibindex error

Ferran Jorba Thu, 15 Mar 2012 03:36:30 -0700

Hello all,
 
[...]
> This is definitely the case. The records in question cover the Sciences,
> Arts and Humanities and Social Sciences from all over the world, so in
> short "Life, Universe and Everything"...
>
>> Still, 16M of different index terms seems like a lot.  Isn't there some
>> other problem such as using English stemming on German text or something
>> similar?
>
> IMHO could happen as well. Not all journals are in English, though the
> majority surely is.
>
> So, one conclusion here might be again that we probably hit a problem
> here that never occurred in Invenio till now as we just have a very
> broad scope. So not that much an issue of the number of the records but
> of their content, and that a solution might be to just shorten the
> textual content in that area if this is possible. Interesting.


I fear we are close to hitting the same number here, at least for our
full-text index:

 $ echo 'select count(*) from idxWORD09F;' | dbexec
 count(*)
 15868566

At DDD we have 94,610 PDF documents with slightly more than 2.5 million
pages in several languages, from Basque to Greek and everything in
between, plus all the garbage that OCR generates when digitising old
documents.
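
To put these figures in proportion, here is a minimal sketch in Python
that uses only the numbers quoted in this message. It assumes that each
row of idxWORD09F is one distinct index term, and it rounds the page
count to 2.5 million, so the per-page ratio is only approximate:

```python
# Figures quoted in this message.
index_terms = 15_868_566   # rows in idxWORD09F (assumed: one row per distinct term)
documents = 94_610         # PDF documents at DDD
pages = 2_500_000          # "slightly more than 2.5 million" (rounded, an assumption)

# Average number of distinct index terms contributed per document and per page.
terms_per_document = index_terms / documents
terms_per_page = index_terms / pages

print(f"{terms_per_document:.1f} distinct index terms per document")
print(f"{terms_per_page:.1f} distinct index terms per page")
```

That works out to roughly 168 distinct terms per document and a little
over 6 per page, which says nothing yet about how many of them are OCR
noise.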

Should we start to panic?

Ferran
