Re: multi-valued associated fields
Here's the problem with mixing dissimilar text: relevance. Text relevance depends on how a document differs from all the other documents in the index. If you index nothing but technical papers, searching for a technical term will find what you expect. If you mix technical papers and movie titles, text queries will be useless.

--
Lance Norskog
goks...@gmail.com
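One way to see the relevance point in numbers is a minimal sketch of inverse document frequency, assuming the classic Lucene-style formula 1 + ln(N / (df + 1)). The document counts below are made up for illustration and do not come from any real index.

```python
import math


def classic_idf(num_docs: int, doc_freq: int) -> float:
    """Classic Lucene-style inverse document frequency: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts: "matrix" appears in 400 of 1,000 technical papers.
papers_only = classic_idf(num_docs=1_000, doc_freq=400)

# Hypothetical counts: add 99,000 movie titles, 50 of which contain "matrix".
mixed = classic_idf(num_docs=100_000, doc_freq=450)

print(f"papers only: {papers_only:.2f}")
print(f"mixed index: {mixed:.2f}")
```

The same term gets a very different weight once unrelated documents share the index, so scores tuned on one kind of content stop meaning the same thing.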
Re: multi-valued associated fields
Hi Ahmed,

Thanks again for sharing your insight and experience. I will discuss the multi-core approach with members of our team.

Regards
Eric
Re: multi-valued associated fields
Hello Eric,

Certainly it is possible. I would strongly advise having a field that differentiates the record type (RECORD_TYPE:CAR / PROPERTY).

"In general I was also wondering how Solr developers implement websites that use tag filters. For example, a user clicks on Hard drives then gets tags External, Internal, then clicks on External and gets usb, firewire etc."

By using faceting queries, you could achieve this.

Regards
Aditya
www.findbestopensource.com

On Wed, May 12, 2010 at 12:29 PM, Eric Grobler impalah...@googlemail.com wrote:

Hallo Solr community,

We are considering Solr for searching on content from various partners with wildly different content. Is it possible or practical to work with multi-valued associated fields like this?

Make:Audi, Model:A4, Color:Blue, Year:1998, KM:20, Extras:GPS
Type:Flat, Rooms:2, Period:6 months
Make:Toshiba, Model:Tecra, RAM:4GB, Extras:BlueRay;Lock
Breed:Siamese, Age:9 weeks

and do:
- searching on individual keys
- range queries within multi-valued fields
- faceting

I suppose an alternative would be to create unnamed fields like range1, range2, range3 with a descriptor field like Year,KM,EngineSize for a car document and Rooms for a property document, for example.

In general I was also wondering how Solr developers implement websites that use tag filters. For example, a user clicks on Hard drives then gets tags External, Internal, then clicks on External and gets usb, firewire etc.

Any suggestions and feedback would be greatly appreciated.

Regards
Eric
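The faceted drill-down suggested in this message could look roughly like the sketch below. It only builds the query string; q, fq, facet, facet.field, facet.mincount and rows are standard Solr request parameters, RECORD_TYPE is the field proposed above, and the tags field name and the ELECTRONICS value are assumptions made for illustration.

```python
from urllib.parse import urlencode


def facet_query(record_type, selected_tags, facet_field="tags"):
    """Build Solr query parameters for one step of a tag drill-down."""
    params = [
        ("q", "*:*"),
        # Restrict to one kind of record, as suggested in the thread.
        ("fq", f"RECORD_TYPE:{record_type}"),
    ]
    # Each tag the user has clicked so far narrows the result set further.
    for tag in selected_tags:
        params.append(("fq", f'{facet_field}:"{tag}"'))
    params += [
        ("facet", "true"),
        ("facet.field", facet_field),  # hypothetical multi-valued tag field
        ("facet.mincount", "1"),       # hide tags with no matching documents
        ("rows", "0"),                 # only the facet counts are needed here
    ]
    return urlencode(params)


qs = facet_query("ELECTRONICS", ["Hard drives", "External"])
print(qs)
```

The facet counts that come back for the remaining documents would supply the next level of tags (usb, firewire and so on) to offer the user.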
Re: multi-valued associated fields
Hi Aditya,

Thanks for your response. Yes, a category type would be needed.

One thing I am not clear about: if you have multi-values like toshiba, tecra, LCD, it is clear that you can run Solr queries like:

fq=mymultivaluefield:LCD

but for associated fields like:

make=toshiba, model=tecra, screen=LCD
make=toshiba, model=tecra, screen=TFT

is there a way for Solr to understand key=value pairs from a multi-value field? For example, you may want to do a filter on screen type:

fq=mymultivaluefield:screen

and not

fq=mymultivaluefield:screen:LCD
Re: multi-valued associated fields
Hi Erick,

Thank you for your thoughts. I had exactly the same idea as your screenLCD suggestion (but with a semicolon). For example:

range1  range2  range3  range_flags                        properties
8       90      2001    range1;km, range2;kw, range3;year  group;auto, make;audi, model;a4, color;red
5       75      2003    range1;km, range2;kw, range3;year  group;auto, make;nissan, model;primeria, ABS
4       75      2004    range1;km, range2;kw, range3;year  group;auto, make;nissan, model;primeria, ABS
4       null    null    range1;rooms                       group;immo, type;flat
16      null    null    range1;memory                      group;handy, make;iphone, model;3GS, memory;16GB

The range_flags column describes the range1, range2, range3 columns so that you can do queries like:

q=*:*
fq=properties:group;auto
fq=range_flags:range1;km
fq=range1:[2 TO 8]

The range_flags filter is not absolutely necessary, but it ensures that you run range queries on the data you intend to.

Regards
Eric

On Wed, May 12, 2010 at 3:12 PM, Erick Erickson erickerick...@gmail.com wrote:

I'm not entirely sure this is germane, but there's absolutely no requirement that all documents in SOLR have the same fields. So it's possible for you to index the wildly different content in wildly different fields <G>. Then searching for screen:LCD would be straightforward. Of course this may not map into your problem space at all well, but I thought I'd mention it.

There's nothing in SOLR that I know of that understands KEY:VALUE pairs in a multi-valued field. However, you could do some trick like prefixing your key, so you'd effectively be indexing screenLCD and screenTFT, but that's kind of awkward if your values have multiple words, although you could still prefix each word (e.g. to index a screen with the value "really big one", you'd have to index screenreally, screenbig, screenone or some such).

And have you looked at dynamic fields? Again I'm not sure that works for you, but might be worth a look.

Best
Erick
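A rough sketch of the key;value token idea discussed in this message follows. It assumes the properties field is a multi-valued, non-tokenized string field so each token is indexed verbatim; the helper names to_tokens and filter_for are hypothetical and not part of Solr.

```python
def to_tokens(attributes, sep=";"):
    """Flatten a dict of attributes into 'key;value' tokens for one multi-valued field."""
    return [f"{key.lower()}{sep}{str(value).lower()}" for key, value in attributes.items()]


def filter_for(field, key, value, sep=";"):
    """Build an fq clause that matches one exact key;value token."""
    return f'{field}:"{key.lower()}{sep}{str(value).lower()}"'


# Example document attributes taken from the table in the message.
doc = {"group": "auto", "make": "Audi", "model": "A4", "color": "red"}
tokens = to_tokens(doc)

print(tokens)
print(filter_for("properties", "make", "Audi"))
```

Lowercasing both at index time and at query time keeps the two sides consistent without relying on an analyzer, which sidesteps some of the delimiter issues if the field really is an untokenized string.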
Re: multi-valued associated fields
I had the same problem as you last year, i.e. indexing stuff from different sources with different characteristics. The way I approached it was by setting up a multi-core environment, with each core representing one type of data. Within each core, I had a "data type" sort of field that would define what kind of data is stored (i.e. in your case, it would be auto or real estate etc.).

The advantage of this setup is that it allows you to make changes to individual cores without affecting anything else. Also, faceting based on category is achieved by the data type field. You can search on multiple cores like you would on a single core, meaning that all the search parameters can be applied. Solr will automatically merge all the data into one result set.

Another advantage is that if you index frequently, this setup allows you to index at different times and reduce the overall load.

Just a thought and an approach...

--
View this message in context: http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813275.html
Sent from the Solr - User mailing list archive at Nabble.com.
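Searching several cores as one result set is, I assume, Solr's distributed search via the shards request parameter. A minimal sketch of building such a request is below; the host, the core names and the data_type field name are placeholders invented for illustration.

```python
from urllib.parse import urlencode


def distributed_query(base_url, shard_cores, q, filters=()):
    """Build a select URL that fans one query out to several cores."""
    params = [("q", q)]
    params += [("fq", f) for f in filters]
    # shards lists every core to search, written as host:port/path without the scheme.
    params.append(("shards", ",".join(shard_cores)))
    return f"{base_url}/select?{urlencode(params)}"


url = distributed_query(
    "http://localhost:8983/solr/auto",                            # hypothetical core receiving the request
    ["localhost:8983/solr/auto", "localhost:8983/solr/property"],  # hypothetical cores
    q="blue",
    filters=["data_type:auto"],                                    # hypothetical field name
)
print(url)
```

As far as I know, distributed search expects the cores to share a unique key field and the fields used for sorting, which matters for the "sort across all cores" requirement raised elsewhere in the thread.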
Re: multi-valued associated fields
Hi Ahmed,

Interesting, I did not think of a multi-core approach. I am not sure, but we might have up to 10 different kinds of data to contend with (property, pets, farming, electronics, travel, auto, jobs, sport etc.), which might complicate things.

Also, one practical limitation we have is that at times we need to sort across all cores, as it were, and return the top x rows according to a specific filter.

I will definitely meditate on your multi-core solution.

Regards
Eric
Re: multi-valued associated fields
In our deployment, we thought that complications might arise when attempting to hit the Solr server with the addresses of too many cores. For instance, we have 15+ cores running at the moment. In the worst case, we would have to use the addresses of all 15+ cores to search all our data.

What we eventually did was combine all the cores into a single core, which gives us a cleaner solution. You get the simplicity of querying one core, but the flexibility of modifying cores separately.

Basically, we have all the cores indexing separately. We set up a script that uses the index merge functionality of Solr to combine all the indexes into a single index accessible through one core. Yes, there will be some overhead on the server, but I believe that it's a good compromise. In our case, we have multiple servers at our disposal, so this was not a problem to implement.

It all depends on your data set and the volume of documents that you will be indexing.

--
View this message in context: http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html
Sent from the Solr - User mailing list archive at Nabble.com.
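The merge script itself is not shown in the thread. As a hedged sketch, the request such a script might send could be built as below, assuming it uses the CoreAdmin mergeindexes action with indexDir parameters; the Solr root URL, the target core name and the index paths are all placeholders.

```python
from urllib.parse import urlencode


def merge_indexes_url(solr_root, target_core, source_index_dirs):
    """Build a CoreAdmin URL that merges several index directories into one core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One indexDir parameter per source index to merge into the target core.
    params += [("indexDir", path) for path in source_index_dirs]
    return f"{solr_root}/admin/cores?{urlencode(params)}"


url = merge_indexes_url(
    "http://localhost:8983/solr",                  # hypothetical Solr root
    "combined",                                    # hypothetical target core
    ["/data/auto/index", "/data/property/index"],  # hypothetical index directories
)
print(url)
```

My understanding is that the source indexes should not be receiving writes while the merge runs, and that a commit on the target core is needed afterwards before the merged documents become searchable.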
Re: multi-valued associated fields
If you do go this route, I'd use something besides a colon; it's too easily confused with a field delimiter. That's just *asking* for trouble...

Any non-alpha character you use will cause some grief if you choose an incompatible Analyzer. Even upper/lower case can be split in ways you wouldn't want with WordDelimiterFilterFactory. Just a heads-up.

But the multiple cores idea is also interesting...

Best
Erick

On Wed, May 12, 2010 at 12:29 PM, Eric Grobler impalah...@googlemail.com wrote:

Hallo Solr community,

We are considering Solr for searching on content from various partners with wildly different content. Is it possible or practical to work with multi-valued associated fields like this?
Make:Audi, Model:A4, Color:Blue, Year:1998, KM:20, Extras:GPS Type:Flat, Rooms:2, Period:6 months Make:Toshiba, Model:Tecra, RAM:4GB, Extras:BlueRay;Lock Breed:Siamese, Age:9 weeks and do: - searching on individual keys - range queries within multi-valued fields. - faceting I suppose an alternative would be to create unnamed fields like range1, range2, range3 with a descripter field like Year,KM,EngineSize for a car document and Rooms for a property document for example. In general I was also wondering how Solr developers implement websites that uses tag filters. For example, a user clicks on