On Wed, 29 Feb 2012, Alexander Wagner wrote:
> I'm really not involved in our project here, but it might very well be
> that we could leave out a lot of the actual "words" without any loss
> of information and thus break the index down considerably.

This is exactly what I had in mind when I mentioned tweaking CFG
variables that influence word breaking procedures.  One should not need
to create 16M different index terms to provide a good user search
experience, I'd think.

> So from the discussion, the question that is raised in me is "how could
> I check how the index in question looks like, what is it's contents, in
> a way to be able to judge if we really need this type of indexing?"

`SELECT term FROM idxWORD01F LIMIT 100' and friends.

> @Sebastian: could it be that the headache is caused by 995 C5?

Based on your description, I would think so.  You may want to include
only certain subfields from your 999 reference lines in the global
word index.  This will indeed reduce your index size considerably.  You
can influence which subfields get indexed into the global `any field'
index via the BibIndex Admin Interface, menu Manage Logical Fields.

> If it is http://zb0035.zb.kfa-juelich.de/record/767550/ (Sebastian?) and
> my assumption about 995 C5 above is correct this record is quite
> screwed.

BTW and FWIW, I cannot access this URL from here.  Maybe it's behind a
firewall.  But it's not necessary for me to access it anyway, since
Alex already nicely described what the record looks like.

Best regards
-- 
Tibor Simko
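
A minimal sketch of what the `and friends' queries might look like. It assumes a MySQL backend and relies only on the `term' column of idxWORD01F that appears in the query quoted above. The counting, sorting and filtering below are illustrative assumptions, not queries from the original message.

```sql
-- How many distinct index terms does the word index hold?
SELECT COUNT(*) FROM idxWORD01F;

-- The longest terms are often a sign of poorly tokenized input.
SELECT term FROM idxWORD01F ORDER BY CHAR_LENGTH(term) DESC LIMIT 100;

-- Purely numeric terms (e.g. page or volume numbers from reference lines).
SELECT term FROM idxWORD01F WHERE term REGEXP '^[0-9]+$' LIMIT 100;
```

Skimming such samples may help judge whether a large share of the terms comes from reference lines and could be left out of the global index.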
