Thank you four fast, straight and usefull comments. Keeping in mind what was said, did anybody actually think about implementing a kind of database layer on top of a lucene index. A database would be an index, collumns would be fields and entries documents. At least everything which would only require a single table could be done. A SELECT would be search ...
:-) Karl > > On Feb 5, 2005, at 10:04 AM, Karl Koch wrote: > > 1) Can I store all the information of the text file, but also apply a > > analyser. E.g. I use the StopAnalyzer. After finding the document, I > > want to > > extract the original text also from the index. Does this require that I > > store the information twice in two different fields (one indexed and > > one > > unindexed) ? > > You should use a single stored, tokenized, and indexed field for this > purpose. Be cautious of how you construct the Field object to achieve > this. > > > 2) I would like to extract information from the index which can found > > in a > > boolean way. I know that Lucene is a VSM which provides Boolean > > operators. > > This however does not change its functioning. For example, I have a > > field > > with contains an ID number and I want to use the search like a database > > operatation (e.g. to find the document with id=1). I can solve the > > problem > > by searching with query "id:1". However, this does not ensure that I > > will > > only get one result. Usually the first result is the document I want. > > But it > > could happen, that this sometimes does not work. > > Why wouldn't it work? For ID-type fields, use a Field.Keyword (stored, > indexed, but not tokenized). Search for a specific ID using a > TermQuery (don't use QueryParser for this, please). If the ID values > are unique, you'll either get zero or one result. > > > What happens if I should > > get no results? I guess if I search for id=5 and 5 did not exist I > > would > > probably get 50, 51, .. just because the contain 5. Did somebody work > > with > > this and can suggest a stable solution? > > No, this would not be the case, unless you're analyzing the ID field > with some strange character-by-character analyzer or doing a wildcard > "*5*" type query. > > > A good solution for these two questions would help me avoiding a > > database > > which would need to replicate most the data which I already have in my > > Lucene index... > > You're on the right track and avoiding a database when it is overkill > or duplicative is commendable :) > > Erik > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > -- Lassen Sie Ihren Gedanken freien Lauf... z.B. per FreeSMS GMX bietet bis zu 100 FreeSMS/Monat: http://www.gmx.net/de/go/mail --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]