The field size is restricted to 1 million tokens, for the very reasons you mentioned. So even with a separate field for each file's content, I might still hit the limit if a file is really big, but I can't help that. What I want to avoid is that the entire content of some files cannot be found because I used a single field for the content of all files and the later ones simply could not be appended anymore.
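
To make the concern concrete, here is a minimal plain-Java sketch of the two field layouts. It does not use any Lucene classes: the names `FieldLimitSketch`, `indexField`, `combinedField` and `fieldPerFile` are made up for illustration, and the "limit" is simply a cap on whitespace-separated tokens, assumed to behave roughly like a per-field token limit that silently drops everything past the cap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; no Lucene classes are used here.
class FieldLimitSketch {

    // Keep at most maxTokens whitespace-separated tokens (hypothetical per-field cap).
    static List<String> indexField(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return kept;
        }
        for (String token : trimmed.split("\\s+")) {
            if (kept.size() == maxTokens) {
                break; // everything past the cap is dropped
            }
            kept.add(token);
        }
        return kept;
    }

    // Layout A: all files appended into one field; the cap applies to the concatenation.
    static List<String> combinedField(Map<String, String> files, int maxTokens) {
        return indexField(String.join(" ", files.values()), maxTokens);
    }

    // Layout B: one field per file; the cap applies to each file separately.
    static Map<String, List<String>> fieldPerFile(Map<String, String> files, int maxTokens) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            fields.put(e.getKey(), indexField(e.getValue(), maxTokens));
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("big.txt", "alpha beta gamma delta epsilon");
        files.put("small.txt", "needle");
        int limit = 3;
        // Layout A: "needle" from small.txt never makes it into the field.
        System.out.println(combinedField(files, limit));
        // Layout B: big.txt loses its tail, but small.txt is fully searchable.
        System.out.println(fieldPerFile(files, limit));
    }
}
```

With one combined field, a single large file can push every later file past the cap, so those files are not searchable at all. With one field per file, only the tail of the oversized file is lost.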
--- Erick Erickson <erickerick...@gmail.com> wrote on Tue, 19 Jan 2010:

> From: Erick Erickson <erickerick...@gmail.com>
> Subject: Re: Indexing and Searching linked files
> To: java-user@lucene.apache.org
> Date: Tuesday, 19 January 2010, 14:43
>
> What field size limit are you talking about here? Because 10,000
> tokens is the default, but you can increase it to Integer.MAX_VALUE.
>
> So are you really talking billions of tokens here? Your index
> quickly becomes unmanageable if you're allowing it to grow
> by such increments.
>
> One can argue, IMO, that the first N (10M, say) tokens/file is
> "enough" and there's not much real value in the rest, but that
> can be a weak argument depending on the problem space....
>
> But if you're really committed to indexing an unbounded number
> of arbitrarily large files...you'll fail. Sometime, somewhere, somebody
> will want to index enough to violate whatever limits you have (disk,
> memory, time, whatever). So I think you'd be farther ahead to ask your
> product manager what limits are reasonable and go from there...
>
> HTH
> Erick
>
> On Tue, Jan 19, 2010 at 7:57 AM, Anna Hunecke <annahune...@yahoo.de> wrote:
>
> > Hi!
> > I have been working with Lucene for a while now. So far, I found helpful
> > tips on this list, so I hope somebody can help me with my problem:
> >
> > In our app information is grouped in so-called cards. Now, it should be
> > made possible to also search on files linked to the cards. You can link
> > arbitrarily many files to a card and the size of the files is also not
> > restricted.
> > So, as far as I can see, there are two ways to do this:
> >
> > 1. Add the content of the files to the search index of the card. First, I
> > thought that I could just have an additional field in the index which
> > contains the content of all the files. But then, if the files are very big,
> > I could hit the field size limit, and would possibly not get the content of
> > all files indexed. So, I would need one field per file. The problem I have
> > then is that I don't know how many files I have and how large the index
> > would get. This is risky, because some customers have a lot of data.
> >
> > 2. Create a separate index for files. The documents in this index would
> > contain one file each, so I would not have the problem that I don't know how
> > many fields I have. But then, the searching is a problem:
> > I would need to search on both the card and the document index, and somehow
> > merge the results together. I sort by score always, but, as I understand it,
> > the scores of the results of two different indexes are not comparable.
> >
> > So, which way do you think is better?
> >
> > Best,
> > Anna

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org