Any comments, suggestions? Maybe I should rephrase my original message or describe it in detail? I really would like to get any response if possible.
Thanks a lot in advance!

On Mon, Sep 1, 2008 at 10:25 AM, Leonid Maslov <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> First of all, sorry for my poor English. It's not my native language.
>
> I'm trying to use Lucene to index hierarchical information: I have
> structured HTML and PDF/Word documents, and I want to index them so that
> I can search in titles, text, paragraphs, or tables only, or in any
> combination of these. At the moment I see 3 possible solutions:
>
>    - Create the set of all possible fields, like: contents, title,
>    heading, table, etc., and index the data into each of them
>    accordingly. Possible impacts:
>       - a large number of fields
>       - data duplication (a search in paragraphs has to look inside
>       all the inner elements too, so every outer element indexed will
>       contain the content of all its inner elements as well)
>    - Create a hierarchy of fields, like "title", "paragraph/title",
>    "paragraph/title/subparagraph/table". Possible impacts:
>       - count of fields remains the same
>       - soft set of fields (not consistent)
>       - I'm not sure how I would process the required information
>       and perform searches.
>       - Performance issues?
>    - Use one field for content and just add a location prefix to the
>    content. For example "contents:*paragraph/heading:*token1 token2".
>    *paragraph/heading:* here is used as an additional information
>    prefix. So, I (possibly?) could reuse PrefixQuery functionality or
>    something similar. Impacts:
>       - Strong (small) set of index fields
>       - Additional information processing: all the queries I use will
>       have to work as PrefixQuery
>       - Performance issues?
>
> So, has anyone tried to make things work like that? Or am I trying to
> use a wrench to hammer in nails? I assume Lucene wasn't designed to be
> used like that, but it's worth trying (or at least asking).
> Any results / suggestions are welcome!
>
> --
> Best regards,
> Leonid Maslov!
> Adrienne Gusoff - "Opportunity knocked. My doorman threw him out."
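
To make the third option from the quoted message more concrete, here is a rough sketch of the idea in plain Java. It does not use the Lucene API at all: the class LocationPrefixSketch and the methods tagTokens and matches are hypothetical names made up for illustration, and the lowercasing and whitespace splitting are my own simplifying assumptions, not what a real analyzer would do.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of the "location prefix" idea; nothing here is Lucene API. */
public class LocationPrefixSketch {

    /** Tags each whitespace-separated token with its location, e.g. "paragraph/heading:token1". */
    public static List<String> tagTokens(String location, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(location + ":" + token.toLowerCase());
            }
        }
        return terms;
    }

    /** True if the term's location is the scope itself or nested below it, and its token equals the word. */
    public static boolean matches(String term, String scope, String word) {
        int sep = term.lastIndexOf(':');
        if (sep < 0) {
            return false;
        }
        String location = term.substring(0, sep);
        String token = term.substring(sep + 1);
        boolean inScope = location.equals(scope) || location.startsWith(scope + "/");
        return inScope && token.equals(word.toLowerCase());
    }

    public static void main(String[] args) {
        List<String> terms = new ArrayList<>();
        terms.addAll(tagTokens("paragraph/heading", "Token1 token2"));
        terms.addAll(tagTokens("paragraph/table", "token3"));
        terms.addAll(tagTokens("title", "token1"));

        // Look for "token1" anywhere inside a paragraph, including nested elements.
        for (String term : terms) {
            if (matches(term, "paragraph", "token1")) {
                System.out.println(term);
            }
        }
    }
}
```

Running main prints only paragraph/heading:token1; the token1 under title is out of scope. One thing the sketch makes visible: with the location written first, "token1 anywhere under paragraph" fixes both the start and the end of the term, so it is not a pure prefix match. If the token were written first (for example "token1:paragraph/heading"), the same search would become a prefix match on "token1:paragraph", but that ordering is only an assumption I have not tried.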
--
Best regards,
Leonid Maslov!
Adrienne Gusoff - "Opportunity knocked. My doorman threw him out."