Re: How to make Relationships work for Multi-valued Index Fields?
Hey Gunaranjan,

I have the same scenario as you. A Lucene index is denormalized; it should not contain entity relationships. When I need to do something like what you are doing, I group the related values in one field.

Let's say we have 2 credit cards. The first has id 30459673 and taxes at 1.5%/month, and the second has id 56305 and taxes at 2.5%. What I do is create a multivalued field in which I index each value as "id ^ taxes". On the client side I put the logic to parse the string into a form that is convenient to work with.

I hope that helps you.

2009/1/25 Gunaranjan Chandraraju

> Paul,
>
> It's not just about merging the fields or resource usage. If you look at the scenario below, the issue is that it mixes up my fields (shipping and billing address, for instance). I can't merge them and still keep the 'distinction' for search. Your case is a 'generalization' field, so the search will work. I know mine is a trivial example and can be overcome with just two fields (shipping_address & billing_address), but I am talking about cases where we have many such 'groups of fields'.
>
> In general, such a one-to-many relationship for indices in a 'document' is also really, really common :). Again, I am not trying to argue a point - I would be happy to get some ideas on how to do it, and to be corrected if I'm wrong.
>
> Lastly (while that's not my worry point right now), I tend to be careful with resources. When dealing with very large data, I will avoid any unnecessary overhead as far as possible and take every optimization I get :)
>
> Guna
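The "id ^ taxes" grouping described in this reply can be sketched on the client side roughly as follows. This is only an illustration in plain Python: the separator, the `Card` type and the function names are assumptions made for the example, not part of Solr or of the poster's actual code.

```python
from typing import List, NamedTuple

# Assumed delimiter, taken from the "id ^ taxes" example in the message.
SEPARATOR = " ^ "


class Card(NamedTuple):
    """Hypothetical client-side view of one credit card."""
    card_id: str
    monthly_tax_percent: float


def pack(card: Card) -> str:
    """Join one card's values into a single string for a multivalued field."""
    return f"{card.card_id}{SEPARATOR}{card.monthly_tax_percent}"


def unpack(value: str) -> Card:
    """Split a stored value back into its parts on the client side."""
    card_id, tax = value.split(SEPARATOR, 1)
    return Card(card_id, float(tax))


cards: List[Card] = [Card("30459673", 1.5), Card("56305", 2.5)]

# One string per card; each string keeps the id and its tax rate together.
field_values = [pack(card) for card in cards]
parsed = [unpack(value) for value in field_values]
```

Because each value carries its own id and rate, the pairing survives even though the index itself stores only a flat list of strings.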
Re: How to make Relationships work for Multi-valued Index Fields?
Paul,

It's not just about merging the fields or resource usage. If you look at the scenario below, the issue is that it mixes up my fields (shipping and billing address, for instance). I can't merge them and still keep the 'distinction' for search. Your case is a 'generalization' field, so the search will work. I know mine is a trivial example and can be overcome with just two fields (shipping_address & billing_address), but I am talking about cases where we have many such 'groups of fields'.

In general, such a one-to-many relationship for indices in a 'document' is also really, really common :). Again, I am not trying to argue a point - I would be happy to get some ideas on how to do it, and to be corrected if I'm wrong.

Lastly (while that's not my worry point right now), I tend to be careful with resources. When dealing with very large data, I will avoid any unnecessary overhead as far as possible and take every optimization I get :)

Guna

On Jan 25, 2009, at 1:50 AM, Paul Libbrecht wrote:

> Guna,
>
> it's really, really normal to duplicate stuff to be merged into a field.
>
> We do this all the time, for example to have a field "text-in-any-language" while a field "text-in-english" is also there, and the queries boost matches in text-in-any-language less than in text-in-english (if the user is in English).
>
> This difference in weighting is the gold of Lucene, I feel (of retrieval generally). Also, depending on the field, you can index differently while still copying it in Solr (for example, use a different analyzer per language).
>
> paul
>
> PS: don't be scared about resources, this is the side of the world where resources are the least of the problems! (Typically a "catch-all" field wouldn't be stored, though, as this would then load the memory.)
Re: How to make Relationships work for Multi-valued Index Fields?
Thanks,

Much appreciate the guidance. I think I will go with the single field approach for now. I will also take a look at the URL below and come back if I have any ideas.

Guna

On Jan 25, 2009, at 12:49 AM, Shalin Shekhar Mangar wrote:

> There had been some discussion on having wildcards in field names, but I guess nobody contributed to (or had the need for?) the complete proposal. Copy fields give a lot of flexibility, and they are what most people use.
>
> http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
Guna,

it's really, really normal to duplicate stuff to be merged into a field.

We do this all the time, for example to have a field "text-in-any-language" while a field "text-in-english" is also there, and the queries boost matches in text-in-any-language less than in text-in-english (if the user is in English).

This difference in weighting is the gold of Lucene, I feel (of retrieval generally). Also, depending on the field, you can index differently while still copying it in Solr (for example, use a different analyzer per language).

paul

PS: don't be scared about resources, this is the side of the world where resources are the least of the problems! (Typically a "catch-all" field wouldn't be stored, though, as this would then load the memory.)

On 25 Jan 2009, at 09:35, Gunaranjan Chandraraju wrote:

> Thanks.
>
> This sounds redundant to me - to store the fields separately and then concat all of them into one copy field again.
>
> My XML is like this
>
> I am currently using XPATH or XSL to separate them into individual indexed fields like address_state_1, address_type_1 etc. in SOLR.
>
> From what you say, it looks to me that I might as well just treat the entire address as a single 'text field' and search within the text after tokenizing. This way I don't need the _1, _2 suffixes, as the single text field will contain the information together (and thus grouped, so I know which is shipping/billing etc.?). Will there be any performance difference between this and the copy field approach?
>
> Is there no other (programmatic) way to search across multiple fields? I did take a quick look at dismax, but again it needs the field names to be specifically mentioned in the config file or in the query. I can't do this, as I am not able to predict the number of fields (e.g. how many credit cards can a person have?).
>
> I like SOLR, but to me this seems to be a very common and simple search scenario/pattern, yet its implementation in SOLR does not appear to be very straightforward. (My apologies if I am on the wrong track here because I don't understand SOLR well.)
>
> Regards,
> Guna
Re: How to make Relationships work for Multi-valued Index Fields?
On Sun, Jan 25, 2009 at 2:05 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:

> Thanks.
>
> This sounds redundant to me - to store the fields separately and then concat all of them into one copy field again.

Sometimes that may be the only way. For example, if you want to facet on some of those fields, as well as search them all.

> My XML is like this
>
> I am currently using XPATH or XSL to separate them into individual indexed fields like address_state_1, address_type_1 etc. in SOLR.
>
> From what you say, it looks to me that I might as well just treat the entire address as a single 'text field' and search within the text after tokenizing. This way I don't need the _1, _2 suffixes, as the single text field will contain the information together (and thus grouped, so I know which is shipping/billing etc.?). Will there be any performance difference between this and the copy field approach?

No. I think one field may even be better, since you are creating fewer fields. If you never need to do faceting and you don't need to get the contents of each address field separately, this is your best option.

> Is there no other (programmatic) way to search across multiple fields? I did take a quick look at dismax, but again it needs the field names to be specifically mentioned in the config file or in the query. I can't do this, as I am not able to predict the number of fields (e.g. how many credit cards can a person have?).
>
> I like SOLR, but to me this seems to be a very common and simple search scenario/pattern, yet its implementation in SOLR does not appear to be very straightforward. (My apologies if I am on the wrong track here because I don't understand SOLR well.)

There had been some discussion on having wildcards in field names, but I guess nobody contributed to (or had the need for?) the complete proposal. Copy fields give a lot of flexibility, and they are what most people use.

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

--
Regards,
Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
Thanks.

This sounds redundant to me - to store the fields separately and then concat all of them into one copy field again.

My XML is like this

I am currently using XPATH or XSL to separate them into individual indexed fields like address_state_1, address_type_1 etc. in SOLR.

From what you say, it looks to me that I might as well just treat the entire address as a single 'text field' and search within the text after tokenizing. This way I don't need the _1, _2 suffixes, as the single text field will contain the information together (and thus grouped, so I know which is shipping/billing etc.?). Will there be any performance difference between this and the copy field approach?

Is there no other (programmatic) way to search across multiple fields? I did take a quick look at dismax, but again it needs the field names to be specifically mentioned in the config file or in the query. I can't do this, as I am not able to predict the number of fields (e.g. how many credit cards can a person have?).

I like SOLR, but to me this seems to be a very common and simple search scenario/pattern, yet its implementation in SOLR does not appear to be very straightforward. (My apologies if I am on the wrong track here because I don't understand SOLR well.)

Regards,
Guna

On Jan 24, 2009, at 10:54 PM, Noble Paul നോബിള് नोब्ळ् wrote:

> For searching you need to put them in a single field. Use <copyField> in schema.xml to achieve that.
Re: How to make Relationships work for Multi-valued Index Fields?
For searching you need to put them in a single field. Use <copyField> in schema.xml to achieve that.

On Sun, Jan 25, 2009 at 7:39 AM, Gunaranjan Chandraraju wrote:

> I make this approach work with XPATH and XSL. However, this approach creates multiple fields like this:
>
> address_state_1
> address_state_2
> ...
> address_state_10
>
> and
>
> credit_card_1
> credit_card_2
> credit_card_3
>
> How do I search for a credit_card? The query syntax does not seem to support wildcards in field names. For example, I can't seem to do this:
>
> credit_card*:1234 4567 7890 1234
>
> On the search side I would not know how many credit card fields got created for a document, so I need that to be dynamic.
>
> -g
>
> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>
>> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>
>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>
>>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
>>>
>>>> I have setup my DIH to treat these as entities as below
>>>>
>>>> baseDir="***" fileName=".*xml" rootEntity="false" dataSource="null" >
>>>> name="record" processor="XPathEntityProcessor" stream="false" forEach="/record" url="${f.fileAbsolutePath}">
>>>> name="record_adr" processor="XPathEntityProcessor" stream="false" forEach="/record/address" url="${f.fileAbsolutePath}">
>>>> xpath="/record/address/@street" />
>>>> xpath="/record/address//@state" />
>>>> xpath="/record/address//@type" />
>>>
>>> I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your data config to appropriately named dynamic fields (e.g. street_1, state_1, etc.). To maintain this counter you will need to get/store it with Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>
>>> I can't think of an easier way.
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
--Noble Paul
Re: How to make Relationships work for Multi-valued Index Fields?
I make this approach work with XPATH and XSL. However, this approach creates multiple fields like this:

address_state_1
address_state_2
...
address_state_10

and

credit_card_1
credit_card_2
credit_card_3

How do I search for a credit_card? The query syntax does not seem to support wildcards in field names. For example, I can't seem to do this:

credit_card*:1234 4567 7890 1234

On the search side I would not know how many credit card fields got created for a document, so I need that to be dynamic.

-g

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>
> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
>>
>>> I have setup my DIH to treat these as entities as below
>>
>> I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your data config to appropriately named dynamic fields (e.g. street_1, state_1, etc.). To maintain this counter you will need to get/store it with Context#getSessionAttribute(name, val, Context.SCOPE_DOC) and Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>
>> I can't think of an easier way.
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
Hi Fergus,

XPathEntityProcessor can read multivalued fields easily, e.g.

***change**

In this case all of address_street, address_state and address_type will be returned as separate lists while parsing. If you wish to put them into multiple fields, you can write a transformer, iterate through the lists and put them into separate fields.

If there are 3 tags then you get a List for each field, where the length of the list == 3. If an item is missing it will be added as a null.

Ensure that the fields are marked as multiValued="true" in the schema.xml. Otherwise it does not return a List. If there is no corresponding mapping in schema.xml you can explicitly put it here in the dataconfig.xml, e.g.:

I saw the syntax '/record/address//@state'. '//' is not supported. You will have to explicitly give the full path.

--Noble

On Sat, Jan 24, 2009 at 2:57 PM, Noble Paul നോബിള് नोब्ळ् wrote:

> Nesting of an XPathEntityProcessor into another XPathEntityProcessor is possible only if a field in an XML is a filename/URL. What is the purpose of nesting like this? Is it because you have multiple addresses? The possible solutions are discussed elsewhere in this thread.
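The "iterate through the lists and put them into separate fields" step from this reply can be sketched as below. Note the assumptions: this is plain Python operating on a dictionary, not an actual DIH transformer (which would be written against the DataImportHandler API), and the function name and sample values are made up for illustration.

```python
from typing import Dict, List, Optional


def number_fields(row: Dict[str, List[Optional[str]]]) -> Dict[str, str]:
    """Turn parallel lists into numbered fields such as address_street_1.

    Position i in every list is assumed to belong to the same address,
    so the shared suffix keeps the grouping. Missing items (None) are skipped.
    """
    numbered: Dict[str, str] = {}
    for name, values in row.items():
        for position, value in enumerate(values, start=1):
            if value is not None:
                numbered[f"{name}_{position}"] = value
    return numbered


# Hypothetical parsed row: three addresses, the second one has no state.
row = {
    "address_street": ["XYZ1", "XY2", "ABC"],
    "address_state": ["CA", None, "NY"],
    "address_type": ["Home", "Office", "Other"],
}

numbered = number_fields(row)
```

The null padding mentioned above is what makes this safe: because a missing state still occupies its slot, the third state cannot slide into the second position and attach itself to the wrong address.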
Re: How to make Relationships work for Multi-valued Index Fields?
Nesting an XPathEntityProcessor inside another XPathEntityProcessor is possible only if a field in the XML is a filename/URL.

What is the purpose of nesting like this? Is it because you have multiple addresses? The possible solutions are discussed elsewhere in this thread.

On Sat, Jan 24, 2009 at 2:41 PM, Fergus McMenemie wrote:
> Hello,
>
> I am also a newbie and was wanting to do almost the exact same thing.
> I was planning on doing the equivalent of:-
>
> baseDir="***"
> fileName=".*xml"
> rootEntity="false"
> dataSource="null" >
> name="record"
> processor="XPathEntityProcessor"
> stream="false"
> rootEntity="false" ***changed***
> forEach="/record"
> url="${f.fileAbsolutePath}">
>
> ***change**
>
> name="record_adr"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record/address"
> url="${f.fileAbsolutePath}">
>
> xpath="/record/address//@state" />
>
> ID is no longer unique within Solr; there would be multiple "documents"
> with a given ID, one for each address. You can then search on ID and get
> the three addresses; you can also search on an address more sensibly.
>
> I have not been able to try this yet as other issues are still to be
> dealt with.
>
> Comments?
>
>>Hi
>>I may be completely off on this, being new to SOLR, but I am not sure
>>how to index related groups of fields in a document and preserve
>>their 'grouping'. I would appreciate any help on this. Detailed
>>description of the problem below.
>>
>>I am trying to index an entity that can have multiple occurrences in
>>the same document - e.g. Address. The address could be Shipping,
>>Home, Office etc. Each address element has multiple values in it
>>like street, state etc. Thus each address element is a group with
>>the state and street in one address element being related to each other.
>>
>>It looks like this in my source xml
>>
>>I have set up my DIH to treat these as entities as below
>>
>> baseDir="***"
>> fileName=".*xml"
>> rootEntity="false"
>> dataSource="null" >
>> name="record"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record"
>> url="${f.fileAbsolutePath}">
>>
>> name="record_adr"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record/address"
>> url="${f.fileAbsolutePath}">
>>
>> xpath="/record/address//@state" />
>>
>>The problem is as follows. DIH seems to treat these as entities but
>>Solr seems to flatten them out on indexing to fields in a document
>>(losing the entity part).
>>
>>So when I search for an ID - in the response all the street fields
>>are bunched together, followed by all the state fields, type fields, etc.
>>Thus I can't associate which street address corresponds to which
>>address type in the response.
>>
>>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>>type="Office". This should NOT return a document since the street for
>>the office address is "XY2" and not "XYZ1". However when I query for
>>address_state:"XYZ1" and address_type:"Office" I get back this document.
>>
>>The problem seems to be that while DIH allows 'entities' within a
>>document, the SOLR schema does not preserve them - it 'flattens' all
>>of them out as indices for the document.
>>
>>I could work around the problem by creating SOLR fields like
>>"home_address_street" and "office_address_street" and do some xpath
>>mapping. However I don't want to do it as we can have multiple
>>'other' addresses. Also I have other fields whose type is not easily
>>distinguished like address.
>>
>>As I mentioned, being new to SOLR I might have completely goofed on a
>>way to set it up - I would much appreciate any direction on it. I am using
>>SOLR 1.3
>>
>>Regards,
>>Guna
>
> --
> ===
> Fergus McMenemie Email:fer...@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets Analyst Programmer
> ===

--
--Noble Paul
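
The false match Guna describes can be reproduced outside Solr. The following is a minimal plain-Python sketch, not Solr code: the sample address values (XYZ1, XYZ2, CA, NY) and the helper names `flatten` and `matches` are assumptions made up for illustration, since the source XML did not survive in the thread. It only shows why parallel multi-valued fields cannot tell which street belongs to which address type.

```python
# Plain-Python illustration only; this is not Solr code.
# Sample values are hypothetical stand-ins for the elided source XML.
addresses = [
    {"type": "Home", "street": "XYZ1", "state": "CA"},
    {"type": "Office", "street": "XYZ2", "state": "NY"},
]


def flatten(record_id, address_rows):
    """Merge every address into parallel multi-valued fields on one document."""
    doc = {"id": record_id, "address_street": [], "address_state": [], "address_type": []}
    for row in address_rows:
        doc["address_street"].append(row["street"])
        doc["address_state"].append(row["state"])
        doc["address_type"].append(row["type"])
    return doc


def matches(doc, **criteria):
    """AND query: each field only needs to contain its value somewhere."""
    return all(value in doc[field] for field, value in criteria.items())


doc = flatten("1", addresses)
false_hit = matches(doc, address_street="XYZ1", address_type="Office")
print(false_hit)  # True, although no single address is both XYZ1 and Office
```

Each clause is satisfied by a different address, so the document matches even though the pairing is wrong.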
Re: How to make Relationships work for Multi-valued Index Fields?
Hello,

I am also a newbie and was wanting to do almost the exact same thing.
I was planning on doing the equivalent of:-

***change**

ID is no longer unique within Solr; there would be multiple "documents"
with a given ID, one for each address. You can then search on ID and get
the three addresses; you can also search on an address more sensibly.

I have not been able to try this yet as other issues are still to be
dealt with.

Comments?

>Hi
>I may be completely off on this, being new to SOLR, but I am not sure
>how to index related groups of fields in a document and preserve
>their 'grouping'. I would appreciate any help on this. Detailed
>description of the problem below.
>
>I am trying to index an entity that can have multiple occurrences in
>the same document - e.g. Address. The address could be Shipping,
>Home, Office etc. Each address element has multiple values in it
>like street, state etc. Thus each address element is a group with
>the state and street in one address element being related to each other.
>
>It looks like this in my source xml
>
>I have set up my DIH to treat these as entities as below
>
> baseDir="***"
> fileName=".*xml"
> rootEntity="false"
> dataSource="null" >
> name="record"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record"
> url="${f.fileAbsolutePath}">
>
> name="record_adr"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record/address"
> url="${f.fileAbsolutePath}">
>
> xpath="/record/address//@state" />
>
>The problem is as follows. DIH seems to treat these as entities but
>Solr seems to flatten them out on indexing to fields in a document
>(losing the entity part).
>
>So when I search for an ID - in the response all the street fields
>are bunched together, followed by all the state fields, type fields, etc.
>Thus I can't associate which street address corresponds to which
>address type in the response.
>
>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>type="Office". This should NOT return a document since the street for
>the office address is "XY2" and not "XYZ1". However when I query for
>address_state:"XYZ1" and address_type:"Office" I get back this document.
>
>The problem seems to be that while DIH allows 'entities' within a
>document, the SOLR schema does not preserve them - it 'flattens' all
>of them out as indices for the document.
>
>I could work around the problem by creating SOLR fields like
>"home_address_street" and "office_address_street" and do some xpath
>mapping. However I don't want to do it as we can have multiple
>'other' addresses. Also I have other fields whose type is not easily
>distinguished like address.
>
>As I mentioned, being new to SOLR I might have completely goofed on a
>way to set it up - I would much appreciate any direction on it. I am using
>SOLR 1.3
>
>Regards,
>Guna

--
===
Fergus McMenemie Email:fer...@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===
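
Fergus's idea of one Solr document per address, all sharing the record ID, can be sketched in plain Python as follows. This is an illustration under assumptions, not Solr code and not something tested against Solr: the address values and the helpers `explode` and `search` are hypothetical. It shows that a search on ID returns all addresses while a mismatched street/type pair no longer matches.

```python
# Plain-Python illustration only; values and helper names are hypothetical.
addresses = [
    {"type": "Home", "street": "XYZ1", "state": "CA"},
    {"type": "Office", "street": "XYZ2", "state": "NY"},
    {"type": "Shipping", "street": "XYZ3", "state": "TX"},
]


def explode(record_id, address_rows):
    """Emit one flat document per address, repeating the record ID on each."""
    return [
        {
            "id": record_id,
            "address_type": row["type"],
            "address_street": row["street"],
            "address_state": row["state"],
        }
        for row in address_rows
    ]


def search(docs, **criteria):
    """AND query: every criterion must hold within the same document."""
    return [d for d in docs if all(d.get(f) == v for f, v in criteria.items())]


docs = explode("1", addresses)
by_id = search(docs, id="1")                                            # all three addresses
wrong_pair = search(docs, address_street="XYZ1", address_type="Office")  # no match
right_pair = search(docs, address_street="XYZ2", address_type="Office")  # the office address
```

One caveat on the design: in Solr, documents that share a value for the schema's uniqueKey replace one another, so this approach presumably only works if ID is not the uniqueKey (or if a separate per-address key is used).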
Re: How to make Relationships work for Multi-valued Index Fields?
Yes, Solr does, but DataImportHandler in the 1.3 release does not support it. However, you can use the trunk DataImportHandler jar with Solr 1.3 if you do not feel comfortable using Solr 1.4 trunk.

On Fri, Jan 23, 2009 at 1:36 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
>
> I thought 1.3 supported dynamic fields in schema.xml?
>
> Guna
>
> On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:
>
> Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.
>>
>> On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>
>> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
>>>
>>> I have set up my DIH to treat these as entities as below
>>>
>>> baseDir="***" fileName=".*xml" rootEntity="false" dataSource="null" >
>>> name="record" processor="XPathEntityProcessor" stream="false" forEach="/record" url="${f.fileAbsolutePath}">
>>> name="record_adr" processor="XPathEntityProcessor" stream="false" forEach="/record/address" url="${f.fileAbsolutePath}">
>>> xpath="/record/address/@street" />
>>> xpath="/record/address//@state" />
>>> xpath="/record/address//@type" />
>>>
>>> I think the only way is to create a dynamic field for each attribute
>>> (street, state etc.). Write a transformer to copy the fields from your
>>> data config to appropriately named dynamic fields (e.g. street_1, state_1,
>>> etc). To maintain this counter you will need to get/store it with
>>> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
>>> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>>>
>>> I can't think of an easier way.
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
I thought 1.3 supported dynamic fields in schema.xml?

Guna

On Jan 22, 2009, at 11:54 PM, Shalin Shekhar Mangar wrote:

Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:

On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:

I have set up my DIH to treat these as entities as below

I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your data config to appropriately named dynamic fields (e.g. street_1, state_1, etc). To maintain this counter you will need to get/store it with Context#getSessionAttribute(name, Context.SCOPE_DOC) and Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
--
Regards,
Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
Oops, one more gotcha. The dynamic field support is only in 1.4 trunk.

On Fri, Jan 23, 2009 at 1:24 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
>
>> I have set up my DIH to treat these as entities as below
>>
>> baseDir="***"
>> fileName=".*xml"
>> rootEntity="false"
>> dataSource="null" >
>> name="record"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record"
>> url="${f.fileAbsolutePath}">
>>
>> name="record_adr"
>> processor="XPathEntityProcessor"
>> stream="false"
>> forEach="/record/address"
>> url="${f.fileAbsolutePath}">
>> xpath="/record/address/@street" />
>> xpath="/record/address//@state" />
>> xpath="/record/address//@type" />
>
> I think the only way is to create a dynamic field for each attribute
> (street, state etc.). Write a transformer to copy the fields from your data
> config to appropriately named dynamic fields (e.g. street_1, state_1, etc).
> To maintain this counter you will need to get/store it with
> Context#getSessionAttribute(name, Context.SCOPE_DOC) and
> Context#setSessionAttribute(name, val, Context.SCOPE_DOC).
>
> I can't think of an easier way.
> --
> Regards,
> Shalin Shekhar Mangar.

--
Regards,
Shalin Shekhar Mangar.
Re: How to make Relationships work for Multi-valued Index Fields?
On Fri, Jan 23, 2009 at 1:08 PM, Gunaranjan Chandraraju <chandrar...@apple.com> wrote:
> I have set up my DIH to treat these as entities as below
>
> baseDir="***"
> fileName=".*xml"
> rootEntity="false"
> dataSource="null" >
> name="record"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record"
> url="${f.fileAbsolutePath}">
>
> name="record_adr"
> processor="XPathEntityProcessor"
> stream="false"
> forEach="/record/address"
> url="${f.fileAbsolutePath}">
> xpath="/record/address/@street" />
> xpath="/record/address//@state" />
> xpath="/record/address//@type" />

I think the only way is to create a dynamic field for each attribute (street, state etc.). Write a transformer to copy the fields from your data config to appropriately named dynamic fields (e.g. street_1, state_1, etc). To maintain this counter you will need to get/store it with Context#getSessionAttribute(name, Context.SCOPE_DOC) and Context#setSessionAttribute(name, val, Context.SCOPE_DOC).

I can't think of an easier way.
--
Regards,
Shalin Shekhar Mangar.
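
The numbering scheme suggested above (street_1, state_1, ...) can be sketched in plain Python to show the resulting document shape. This is only an illustration under assumptions: it is not a DIH transformer, the local `counter` variable merely stands in for the per-document counter that would be kept through the Context session attribute, and the address values and helper names (`number_fields`, `groups_matching`) are hypothetical.

```python
# Plain-Python illustration only; not a DataImportHandler transformer.
# Sample values and helper names are hypothetical.
addresses = [
    {"type": "Home", "street": "XYZ1", "state": "CA"},
    {"type": "Office", "street": "XYZ2", "state": "NY"},
]


def number_fields(address_rows):
    """Copy each address into suffixed fields: street_1, state_1, street_2, ..."""
    doc = {}
    counter = 0  # stands in for the per-document counter described above
    for row in address_rows:
        counter += 1
        for name, value in row.items():
            doc[f"{name}_{counter}"] = value
    return doc, counter


def groups_matching(doc, group_count, **criteria):
    """Return the group numbers whose own fields satisfy every criterion."""
    return [
        i
        for i in range(1, group_count + 1)
        if all(doc.get(f"{name}_{i}") == value for name, value in criteria.items())
    ]


doc, count = number_fields(addresses)
wrong_pair = groups_matching(doc, count, street="XYZ1", type="Office")  # []
right_pair = groups_matching(doc, count, street="XYZ2", type="Office")  # [2]
```

The suffix keeps each street tied to its own type and state, at the cost that a query has to be expressed per group number rather than against a single field name.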