I think the last structure is good. The index should be structured
according to how you want to search it. If your needs change, you
should simply build another index. One index for everything is rarely
a good idea. An index trades space for time, so duplication is not
really a concern.

The first structure omits some hobby data, and the second structure
will return duplicated people that need to be pruned.
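
To make the duplication point concrete, here is a rough sketch in plain
Python (not Lucene code). The rows mirror the one-row-per-hobby layout,
and the people_in helper is a name made up purely for illustration.

```python
# One row per (person, hobby), as in the second structure.
rows = [
    ("USA", "George W Bush", "Demolition", "Comedy"),
    ("USA", "George W Bush", "Demolition", "Mime"),
    ("USA", "George W Bush", "Demolition", "Ant Farms"),
    ("USA", "Bill Clinton", "Retired", "Smoking Cigars"),
]


def people_in(place, rows):
    """Return (raw hits, pruned hits) for everyone in the given place."""
    hits = [name for (p, name, _occupation, _hobby) in rows if p == place]
    # dict.fromkeys drops repeats while keeping first-seen order.
    return hits, list(dict.fromkeys(hits))


raw, pruned = people_in("USA", rows)
```

The raw hit list repeats a person once per hobby, so the caller has to
prune it. With one document per person that step goes away.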

--
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com


On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:
Thanks Erick,

I'll give a representation (in XML) of the data structure that I am
trying to index. It is a relational data structure, because all the
Persons belonging to a Place (e.g. Kazakhstan) are grouped together, etc.

    <Example>
        <Place name="United States of America">
           <PlaceAlias>USA</PlaceAlias>
           <PlaceAlias>U.S.A</PlaceAlias>
           <PlaceAlias>US</PlaceAlias>
           <Person>
              <Name>George W Bush</Name>
              <Occupation>Demolition</Occupation>
              <Hobby alias="Funny">Comedy</Hobby>
              <Hobby alias="Pretend">Mime</Hobby>
              <Hobby>Ant Farms</Hobby>
           </Person>
           <Person>
              <Name>Bill Clinton</Name>
              <Occupation>Retired</Occupation>
              <Hobby>Smoking Cigars</Hobby>
           </Person>
           <!-- many more persons here... -->
        </Place>
        <Place name="kazakhstan">
           <PlaceAlias>kazak</PlaceAlias>
           <PlaceAlias>kazzi</PlaceAlias>
           <PlaceAlias>kzh</PlaceAlias>
           <Person>
              <Name>Borat</Name>
              <Occupation>TV Reporter</Occupation>
              <Hobby alias="Boogie">Dancing</Hobby>
              <Hobby alias="Soccer">Football</Hobby>
              <Hobby>Swimming</Hobby>
              <!-- many more hobbies here (or even none), with or without aliases -->
           </Person>
           <!-- many more persons here... -->
        </Place>
        <!-- many more places, persons and hobbies here... -->
    </Example>


I am expecting someone to say that this relational/3NF structure should
simply be placed into a flat index: the index replaces the one-to-many
relational approach by grouping all "documents" with the same "Place"
together, or at least makes searching fast enough to be a usable
solution.

    Place        Person_Name    Person_Occupation  Hobby
    ================================================================
    USA          George W Bush  Demolition         Comedy
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing



I do however ask: how would one handle repeated fields, such as the
hobbies below? Should these be a single tokenized field in the Lucene
index, or should everything be *duplicated*, like this (I have ignored
the aliases for simplicity)?


    Place        Person_Name    Person_Occupation  Hobby
    ================================================================
    USA          George W Bush  Demolition         Comedy
    USA          George W Bush  Demolition         Mime
    USA          George W Bush  Demolition         Ant Farms
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing
    Kazakhstan   Borat          TV Reporter        Football
    Kazakhstan   Borat          TV Reporter        Swimming

    OR

    Place        Person_Name    Person_Occupation  Hobby
    ================================================================
    USA          George W Bush  Demolition         Comedy + Mime + Ant Farms
    USA          Bill Clinton   Retired            Smoking Cigars
    Kazakhstan   Borat          TV Reporter        Dancing + Football + Swimming
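
For what it's worth, the last layout (one row per person, with all of
that person's hobbies kept together) could be produced from the XML
roughly as follows. This is only a sketch in standard-library Python,
not Lucene code. SAMPLE and flatten are names made up for illustration,
and the XML is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example data; aliases are left out for simplicity.
SAMPLE = """
<Example>
  <Place name="United States of America">
    <Person>
      <Name>George W Bush</Name>
      <Occupation>Demolition</Occupation>
      <Hobby alias="Funny">Comedy</Hobby>
      <Hobby alias="Pretend">Mime</Hobby>
      <Hobby>Ant Farms</Hobby>
    </Person>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
  </Place>
</Example>
"""


def flatten(xml_text):
    """Return one flat dict ("document") per Person, hobbies kept as a list."""
    docs = []
    root = ET.fromstring(xml_text)
    for place in root.findall("Place"):
        for person in place.findall("Person"):
            docs.append({
                "place": place.get("name"),
                "name": person.findtext("Name"),
                "occupation": person.findtext("Occupation"),
                # A person may have zero, one or many hobbies.
                "hobby": [h.text for h in person.findall("Hobby")],
            })
    return docs


docs = flatten(SAMPLE)
```

Each dict stands in for one index document. As far as I understand it,
a Lucene document can carry the same field name more than once, so the
hobby list would not have to be glued into one string with "+".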


I guess my final question, which is really what I am trying to achieve,
is this: I want to search for all Persons in the "~United States of
America" whose name is like "~Klinton" and who enjoy "~Smoking" as a
hobby. An important part of this is that I won't know which token is to
be matched against which field, just as when you use an internet search
engine. So do I tokenize all fields from the XML, put them into a single
field in the index, and query that with tokens?
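
To show what I mean by a single catch-all field, here is a rough
sketch in standard-library Python, not Lucene code. The helper names
(tokens, catch_all, fuzzy_hit, search) and the 0.7 threshold are
assumptions of mine, and difflib's similarity ratio is only a stand-in
for whatever fuzzy matching the real index would provide.

```python
from difflib import SequenceMatcher

# One flat "document" per person, hobbies kept as a list.
docs = [
    {"place": "United States of America", "name": "George W Bush",
     "occupation": "Demolition", "hobby": ["Comedy", "Mime", "Ant Farms"]},
    {"place": "United States of America", "name": "Bill Clinton",
     "occupation": "Retired", "hobby": ["Smoking Cigars"]},
    {"place": "Kazakhstan", "name": "Borat",
     "occupation": "TV Reporter", "hobby": ["Dancing", "Football", "Swimming"]},
]


def tokens(text):
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def catch_all(doc):
    """Tokens of every field value combined, as a single 'all' field would hold."""
    parts = [doc["place"], doc["name"], doc["occupation"]] + doc["hobby"]
    return tokens(" ".join(parts))


def fuzzy_hit(query_token, doc_tokens, threshold=0.7):
    """True if any document token is similar enough to the query token."""
    return any(
        SequenceMatcher(None, query_token, t).ratio() >= threshold
        for t in doc_tokens
    )


def search(query, docs):
    """Names of documents in which every query token finds a fuzzy match."""
    q = tokens(query)
    return [d["name"] for d in docs if all(fuzzy_hit(t, catch_all(d)) for t in q)]


result = search("america klinton smoking", docs)
```

The query never says which field a token belongs to; each token is
just compared against everything the person's document contains. The
price is that "smoking" would match a place or a name just as readily
as a hobby.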


I realize that I'm posting lots of complicated questions, and I am
probably just looking at the equivalent of an HTML indexing/search
implementation.



Many Thanks....

--AH




Erick Erickson wrote:
> Tell us more about the problem you are trying to solve. Lucene is
> designed for large text searching, not relations. Trying to "index a
> data structure" seems like mis-application of Lucene. Without some
> idea of what you are trying to accomplish, any advice you get is
> irrelevant at best...
>
>
> Best
> Erick
>
> On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:
>>
>> Hey All,
>>
>> I am very interested in indexing a 3NF data structure. Is there any
>> advice that someone can provide on this? From what I have seen, a
>> Lucene index is typically a flat ("First Normal Form") data
>> structure. The only way I can see to combine the relational links
>> between multiple indexes is to compare documents.
>>
>>
>> Any Help is Appreciated.
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>

