Hello all,

[...]
> This is definitely the case. The records in question cover the Sciences,
> Arts and Humanities and Social Sciences from all over the world, so in
> short "Life, Universe and Everything"...
>
>> Still, 16M of different index terms seems like a lot. Isn't there some
>> other problem such as using English stemming on German text or something
>> similar?
>
> IMHO could happen as well. Not all journals are in English, though the
> majority surely is.
>
> So, one conclusion here might be again that we probably hit a problem
> here that never occurred in Invenio till now as we just have a very
> broad scope. So not that much an issue of the number of the records but
> of their content, and that a solution might be to just shorten the
> textual content in that area if this is possible.

Interesting.
I fear we are close to hitting the same number here, at least for our
full-text index:

$ echo 'select count(*) from idxWORD09F;' | dbexec
count(*)
15868566

At DDD we have 94,610 PDF documents with slightly more than 2.5 million
pages in several languages, from Basque to Greek and everything in
between, plus all the garbage that OCR generates when digitizing old
documents.

Should we start to panic?

Ferran
