I think the last structure is good. An index should be structured according to how you want to search it. If your needs change, simply build another index; a single index for every purpose rarely works well. An index trades space for time, so duplication is not really a concern.
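To make the "last structure" concrete, here is a minimal sketch of one document per person with all hobbies kept in a single multi-valued field. It is a toy in-memory inverted index in plain Python, not Lucene code; the names `PEOPLE`, `tokenize`, `build_index` and `search` are made up for illustration, and the whitespace tokenizer is only a stand-in for a real analyzer. Query tokens are matched against every field, since the searcher does not know which token belongs to which field.

```python
from collections import defaultdict

# Hypothetical sample records: one document per person, hobbies combined
# into one multi-valued field (the last structure in the quoted message).
PEOPLE = [
    {"place": "United States of America", "name": "George W Bush",
     "occupation": "Demolition", "hobbies": ["Comedy", "Mime", "Ant Farms"]},
    {"place": "United States of America", "name": "Bill Clinton",
     "occupation": "Retired", "hobbies": ["Smoking Cigars"]},
    {"place": "Kazakhstan", "name": "Borat",
     "occupation": "TV Presenter", "hobbies": ["Dancing", "Football", "Swimming"]},
]


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real analyzer."""
    return text.lower().split()


def build_index(docs):
    """Map each token to the set of document ids whose fields contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        values = [doc["place"], doc["name"], doc["occupation"], *doc["hobbies"]]
        for value in values:
            for token in tokenize(value):
                index[token].add(doc_id)
    return index


def search(index, docs, query):
    """Return names of documents containing every query token, in any field."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return [docs[i]["name"] for i in sorted(hits)]


INDEX = build_index(PEOPLE)
print(search(INDEX, PEOPLE, "america smoking clinton"))
```

Because each person is exactly one document, no result pruning is needed. The sketch only does exact token matches; fuzzy matching such as "~Klinton" is left out.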
The first structure omits some hobby data, and the second structure will contain duplicated people that need to be pruned.

--
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:
Thanks Erick,

I'll give a representation of the data structure that I am trying to index (in XML). This represents a relational data structure, because all Persons of a Place (i.e. Kazakhstan) are grouped together, etc.

<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <PlaceAlias>U.S.A</PlaceAlias>
    <PlaceAlias>US</PlaceAlias>
    <Person>
      <Name>George W Bush</Name>
      <Occupation>Demolition</Occupation>
      <Hobby alias="Funny">Comedy</Hobby>
      <Hobby alias="Pretend">Mime</Hobby>
      <Hobby>Ant Farms</Hobby>
    </Person>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
    <!-- many more persons here.... -->
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <PlaceAlias>kazzi</PlaceAlias>
    <PlaceAlias>kzh</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
      <Hobby alias="Boogie">Dancing</Hobby>
      <Hobby alias="Soccer">Football</Hobby>
      <Hobby>Swimming</Hobby>
      <!-- many more hobbies in here.. (or even none) with or without aliases -->
    </Person>
    <!-- many more persons here.... -->
  </Place>
  <!-- many more places, persons and hobbies here.... -->
</Example>

I am expecting someone to say that this relational/3NF structure should simply be placed into a flat index: the concept of an index replaces the 1-to-many relational approach by grouping/indexing all "documents" with the same "Place" together, or at least makes the search fast enough to give a usable solution.
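As a rough sketch of the flattening described here, the following uses Python's standard `xml.etree.ElementTree` to turn an abbreviated copy of the XML into one flat record per Person, copying the parent Place (and its aliases) into each record. The function name `flatten` and the record keys are hypothetical, and folding hobby aliases into the same list as the hobbies is only one possible choice.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the XML from this message.
SAMPLE = """
<Example>
  <Place name="United States of America">
    <PlaceAlias>USA</PlaceAlias>
    <Person>
      <Name>Bill Clinton</Name>
      <Occupation>Retired</Occupation>
      <Hobby>Smoking Cigars</Hobby>
    </Person>
  </Place>
  <Place name="kazakhstan">
    <PlaceAlias>kazak</PlaceAlias>
    <Person>
      <Name>Borat</Name>
      <Occupation>TV Reporter</Occupation>
      <Hobby alias="Boogie">Dancing</Hobby>
      <Hobby alias="Soccer">Football</Hobby>
      <Hobby>Swimming</Hobby>
    </Person>
  </Place>
</Example>
"""


def flatten(xml_text):
    """Yield one flat record per Person, copying the parent Place into it."""
    root = ET.fromstring(xml_text)
    for place in root.findall("Place"):
        # The place name and all of its aliases become one multi-valued field.
        place_names = [place.get("name")]
        place_names += [alias.text for alias in place.findall("PlaceAlias")]
        for person in place.findall("Person"):
            hobbies = []
            for hobby in person.findall("Hobby"):
                hobbies.append(hobby.text)
                if hobby.get("alias"):
                    # Assumption: a hobby alias is searchable like the hobby.
                    hobbies.append(hobby.get("alias"))
            yield {
                "place": place_names,
                "name": person.findtext("Name"),
                "occupation": person.findtext("Occupation"),
                "hobbies": hobbies,
            }


RECORDS = list(flatten(SAMPLE))
for record in RECORDS:
    print(record)
```

Each record could then be fed to whatever indexer is in use as a single document, so the 1-to-many links are resolved at indexing time rather than at query time.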
Place        Person_Name     Person_Occupation   Hobby
===========================================================================
USA          George W Bush   Demolition          Comedy
USA          Bill Clinton    Retired             Smoking Cigars
Kazakhstan   Borat           TV Presenter        Dancing

I do however ask: how would one group repeated fields, such as the hobbies below? Should these simply be a single, tokenized field in the Lucene index? Or should everything be *duplicated*, like this (I have ignored aliases for simplicity)?

Place        Person_Name     Person_Occupation   Hobby
===========================================================================
USA          George W Bush   Demolition          Comedy
USA          George W Bush   Demolition          Mime
USA          George W Bush   Demolition          Ant Farms
USA          Bill Clinton    Retired             Smoking Cigars
Kazakhstan   Borat           TV Presenter        Dancing
Kazakhstan   Borat           TV Presenter        Football
Kazakhstan   Borat           TV Presenter        Swimming

OR

Place        Person_Name     Person_Occupation   Hobby
===========================================================================
USA          George W Bush   Demolition          Comedy + Mime + Ant Farms
USA          Bill Clinton    Retired             Smoking Cigars
Kazakhstan   Borat           TV Presenter        Dancing + Football + Swimming

I guess my final question, which is really what I am trying to achieve, is this: I want to search for all Persons in the "~United States of America" whose name is like "~Klinton" and who enjoy "~Smoking" as a hobby. An important part of this is that I won't know which token is to be matched to which field, as when you go to an internet search engine. So do I tokenize and put all fields from the XML into a single field in the index and query that with tokens?

I realize that I'm posting LOTS of complicated questions, and I am probably just looking at the equivalent of an HTML indexing/search implementation.

Many Thanks,
--AH

Erick Erickson wrote:
> Tell us more about the problem you are trying to solve. Lucene is designed
> for large text searching, not relations.
> Trying to "index a data structure"
> seems like mis-application of Lucene. Without some idea of what you are
> trying to accomplish, any advice you get is irrelevant at best...
>
> Best
> Erick
>
> On 12/13/06, Andrew Hughes <[EMAIL PROTECTED]> wrote:
>>
>> Hey All,
>>
>> I am very interested in indexing a 3NF Data Structure. Is there any
>> advice that someone can provide with this? From what I have seen Lucene
>> is typically a flat "First Normal Form" (Flat) data structure.... The
>> only way I can see to combine the relational links between multiple
>> indexes is to compare documents.
>>
>> Any Help is Appreciated.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]