Re: Some questions about index...

Karl Koch Sat, 05 Feb 2005 13:40:00 -0800

Thank you for the fast, straightforward and useful comments. Keeping in mind
what was said, has anybody actually thought about implementing a kind of
database layer on top of a Lucene index? A database would be an index, columns
would be fields, and entries would be documents. At least everything that
requires only a single table could be done. A SELECT would be a search ...


:-)

Karl
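
The mapping Karl describes (one index as a single table, fields as columns, documents as entries, SELECT as a search) can be sketched with plain Java collections. This is only a toy model under stated assumptions: it does not use Lucene's API, and the names SingleTableIndex, insert and select are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: plain Java collections, not Lucene's API.
class SingleTableIndex {
    // Each "document" is one entry of the table: field name -> value.
    private final List<Map<String, String>> rows = new ArrayList<>();
    // Inverted index: field name -> exact value -> row numbers.
    private final Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    // INSERT: store the row and index every field value as one whole term.
    void insert(Map<String, String> row) {
        int rowNumber = rows.size();
        rows.add(new HashMap<>(row));
        for (Map.Entry<String, String> e : row.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new ArrayList<>())
                 .add(rowNumber);
        }
    }

    // SELECT * WHERE field = value, implemented as an exact-term search.
    List<Map<String, String>> select(String field, String value) {
        List<Map<String, String>> result = new ArrayList<>();
        Map<String, List<Integer>> byValue = index.get(field);
        if (byValue == null) {
            return result; // unknown field: no rows
        }
        List<Integer> hits = byValue.get(value);
        if (hits == null) {
            return result; // no entry has this exact value
        }
        for (int rowNumber : hits) {
            result.add(rows.get(rowNumber));
        }
        return result;
    }
}
```

Because every value is indexed as one whole term, select("id", "1") can only return entries whose id is exactly "1". Joins across several tables are outside what this sketch covers, in line with the single-table limit mentioned above.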

> 
> On Feb 5, 2005, at 10:04 AM, Karl Koch wrote:
> > 1) Can I store all the information of the text file, but also apply an
> > analyzer? E.g. I use the StopAnalyzer. After finding the document, I want
> > to extract the original text from the index as well. Does this require
> > that I store the information twice in two different fields (one indexed
> > and one unindexed)?
> 
> You should use a single stored, tokenized, and indexed field for this 
> purpose.  Be cautious of how you construct the Field object to achieve 
> this.
> 
> > 2) I would like to extract information from the index which can be found
> > in a boolean way. I know that Lucene is a VSM which provides Boolean
> > operators. This, however, does not change how it functions. For example,
> > I have a field which contains an ID number and I want to use the search
> > like a database operation (e.g. to find the document with id=1). I can
> > solve the problem by searching with the query "id:1". However, this does
> > not ensure that I will get only one result. Usually the first result is
> > the document I want. But it could happen that this sometimes does not
> > work.
> 
> Why wouldn't it work?  For ID-type fields, use a Field.Keyword (stored, 
> indexed, but not tokenized).  Search for a specific ID using a 
> TermQuery (don't use QueryParser for this, please).  If the ID values 
> are unique, you'll either get zero or one result.
> 
> > What happens if I should get no results? I guess if I search for id=5
> > and 5 did not exist, I would probably get 50, 51, ... just because they
> > contain 5. Has somebody worked with this and can suggest a stable
> > solution?
> 
> No, this would not be the case, unless you're analyzing the ID field 
> with some strange character-by-character analyzer or doing a wildcard 
> "*5*" type query.
> 
> > A good solution for these two questions would help me avoid a database
> > which would need to replicate most of the data which I already have in
> > my Lucene index...
> 
> You're on the right track and avoiding a database when it is overkill 
> or duplicative is commendable :)
> 
>       Erik
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
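
Erik's two answers above (one stored, tokenized, indexed field for the text, and an untokenized ID field queried by exact term) can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Lucene code: the ToyIndex class, its method names and the stop-word list are hypothetical, and the analysis step is a simplified stand-in for what an analyzer such as StopAnalyzer does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: plain Java collections, not Lucene's API.
class ToyIndex {
    // Hypothetical stop-word list, chosen for illustration.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    // "Stored" side: the original text, kept untouched, keyed by the whole ID.
    private final Map<String, String> storedTextById = new HashMap<>();
    // "Indexed" side: analyzed tokens of the text -> IDs of matching documents.
    private final Map<String, Set<String>> idsByToken = new HashMap<>();

    void add(String id, String text) {
        // The ID is never split: the complete string is the only key.
        storedTextById.put(id, text);
        // The text is lowercased, split on non-word characters, and stop
        // words are dropped. Only these tokens are searchable.
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                idsByToken.computeIfAbsent(token, t -> new HashSet<>()).add(id);
            }
        }
    }

    // Exact-term lookup on the ID: the original text, or null if no such ID.
    String findById(String id) {
        return storedTextById.get(id);
    }

    // Term search on the analyzed text: IDs of documents containing the term.
    Set<String> search(String term) {
        return idsByToken.getOrDefault(term.toLowerCase(), new HashSet<>());
    }
}
```

With documents "50" and "51" added, findById("5") finds nothing, because "5" is compared against whole ID terms rather than against substrings. The stored text comes back unchanged, stop words included, even though those words were dropped from the searchable tokens, so the text does not need to be kept in a second field.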


