Re: multi-valued associated fields

2010-05-15 Thread Lance Norskog
Here's the problem with mixing dissimilar text: relevance. A document's
relevance score depends on how its terms compare with those of all the
other documents in the index. If you index nothing but technical papers,
searching for a technical term will find what you expect. If you mix
technical papers and movie titles, text queries will be useless.
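
A rough sketch of why the mix matters, assuming a classic Lucene-style
inverse document frequency of 1 + ln(N / (df + 1)). The two tiny corpora
and the idf helper are made up for illustration; the point is only that
the weight of one and the same term changes once unrelated documents
share the index.

```python
import math

def idf(term, docs):
    """Inverse document frequency: 1 + ln(N / (df + 1))."""
    n = len(docs)
    df = sum(1 for d in docs if term in d.lower().split())
    return 1 + math.log(n / (df + 1))

# Hypothetical corpora, not taken from the thread.
papers = [
    "sparse matrix factorization",
    "matrix inversion methods",
    "graph search algorithms",
]
movies = [
    "the matrix",
    "the matrix reloaded",
    "the matrix revolutions",
    "toy story",
]

idf_papers_only = idf("matrix", papers)
idf_mixed = idf("matrix", papers + movies)
print(idf_papers_only, idf_mixed)
```

The same query term is weighted differently in the two indexes, so the
ranking of the technical papers shifts even though the papers themselves
did not change.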

-- 
Lance Norskog
goks...@gmail.com


Re: multi-valued associated fields

2010-05-13 Thread Eric Grobler
Hi Ahmed

Thanks again for sharing your insight and experience.
I will discuss the multi-core approach with members of our team.

Regards
Eric


Re: multi-valued associated fields

2010-05-12 Thread findbestopensource
Hello Eric,

Certainly it is possible. I would strongly advise having a field that
differentiates the record type (RECORD_TYPE:CAR / PROPERTY).

As for your question about how Solr developers implement websites that use
tag filters (for example, a user clicks on Hard drives and gets the tags
External and Internal, then clicks on External and gets usb, firewire etc.):
by using faceting queries, you could achieve this.
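
A minimal sketch of such a drill-down, assuming hypothetical field names
(category, subcategory, interface) that are not from the thread. Each
click adds one fq filter and asks Solr to facet on the next level; q, fq,
rows, facet, facet.field and facet.mincount are standard Solr request
parameters, while drilldown_params is an illustrative helper.

```python
from urllib.parse import urlencode

def drilldown_params(selected, facet_field):
    """Build a query string: one fq per clicked tag, facet on the next level."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),              # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", "1"),    # hide tags with no matching documents
    ]
    for field, value in selected:
        params.append(("fq", f'{field}:"{value}"'))
    return urlencode(params)

# Step 1: user clicks "Hard drives"; show the subcategory tags.
step1 = drilldown_params([("category", "Hard drives")], "subcategory")

# Step 2: user clicks "External"; show the interface tags (usb, firewire, ...).
step2 = drilldown_params(
    [("category", "Hard drives"), ("subcategory", "External")], "interface"
)
print(step1)
print(step2)
```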

Regards
Aditya
www.findbestopensource.com




On Wed, May 12, 2010 at 12:29 PM, Eric Grobler impalah...@googlemail.com wrote:

 Hello Solr community,

 We are considering Solr for searching on content from various partners
 with wildly different content.

 Is it possible or practical to work with multi-valued associated fields
 like
 this?
 Make:Audi, Model:A4, Color:Blue, Year:1998, KM:20, Extras:GPS
 Type:Flat, Rooms:2, Period:6 months
 Make:Toshiba, Model:Tecra, RAM:4GB, Extras:BlueRay;Lock
 Breed:Siamese, Age:9 weeks

 and do:
 - searching on individual keys
 - range queries within multi-valued fields.
 - faceting

 I suppose an alternative would be to create unnamed fields like
 range1, range2, range3 with a descriptor field like
  Year,KM,EngineSize for a car document and
  Rooms for a property document for example.

 In general I was also wondering how Solr developers implement websites that
 use tag filters.
 For example, a user clicks on Hard drives and gets the tags External and
 Internal, then clicks on External and gets usb, firewire etc.

 Any suggestions and feedback would be greatly appreciated.

 Regards
 Eric



Re: multi-valued associated fields

2010-05-12 Thread Eric Grobler
Hi Aditya,

Thanks for your response.
Yes, a category type would be needed.

One thing I am not clear about:
if you have multi-values like toshiba, tecra, LCD
it is then clear that you can run Solr queries like:
  fq=mymultivaluefield:LCD

but for associated fields like:
  make=toshiba, model=tecra, screen=LCD
  make=toshiba, model=tecra, screen=TFT
is there a way for Solr to understand key=value pairs from a multi-valued
field?
For example you may want to do a filter on screen type:
  fq=mymultivaluefield:screen
and not
  fq=mymultivaluefield:screen:LCD





Re: multi-valued associated fields

2010-05-12 Thread Eric Grobler
Hi Erick,

Thank you for your thoughts.
I had exactly the same idea as your screenLCD suggestion (but with a
semicolon).
For example:
range1  range2  range3  range_flags                        properties
8       90      2001    range1;km, range2;kw, range3;year  group;auto, make;audi, model;a4, color;red
5       75      2003    range1;km, range2;kw, range3;year  group;auto, make;nissan, model;primeria, ABS
4       75      2004    range1;km, range2;kw, range3;year  group;auto, make;nissan, model;primeria, ABS
4       null    null    range1;rooms                       group;immo, type;flat
16      null    null    range1;memory                      group;handy, make;iphone, model;3GS, memory;16GB
The range_flags column will describe the range1, range2, range3 columns so
that you can do queries like:
q=*:*
fq=properties:group;auto
fq=range_flags:range1;km
fq=range1:[2 TO 8]

The range_flags filter is not absolutely necessary, but it ensures that you
do range queries on the data you intend to.
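
A small sketch of how a client could generate those three filters,
assuming a per-group lookup table that records which generic slot holds
which quantity. RANGE_SLOTS and range_filters are hypothetical names used
only for illustration; the field names and the key;value convention are
the ones from the example above.

```python
# Which generic range slot holds which quantity, per document group.
RANGE_SLOTS = {
    "auto": {"km": "range1", "kw": "range2", "year": "range3"},
    "immo": {"rooms": "range1"},
    "handy": {"memory": "range1"},
}

def range_filters(group, key, low, high):
    """Return the fq values for a range query on one quantity of one group."""
    slot = RANGE_SLOTS[group][key]
    return [
        f"properties:group;{group}",   # restrict to the document group
        f"range_flags:{slot};{key}",   # make sure the slot really holds `key`
        f"{slot}:[{low} TO {high}]",   # the actual range query
    ]

filters = range_filters("auto", "km", 2, 8)
for fq in filters:
    print("fq=" + fq)
```

Keeping the slot lookup in one place means the caller only names the
quantity (km, rooms, memory) and never hard-codes range1, range2 or range3.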

Regards
Eric

On Wed, May 12, 2010 at 3:12 PM, Erick Erickson erickerick...@gmail.com wrote:

 I'm not entirely sure this is germane, but there's absolutely no
 requirement that all documents in SOLR have the same fields. So it's
 possible for you to index the wildly different content in wildly
 different fields <G>. Then searching for screen:LCD would be
 straightforward. Of course this may not map into your problem space at
 all well, but I thought I'd mention it.

 There's nothing in SOLR that I know of that understands KEY:VALUE pairs in
 a multi-valued field. However, you could do some trick like prefixing your
 key, so you'd effectively be indexing screenLCD and screenTFT, but that's
 kind of awkward if your values have multiple words, although you could
 still prefix each word (e.g. to index a screen with the value "really big
 one", you'd have to index screenreally, screenbig, screenone or some such).
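
 The prefix-each-word trick can be sketched in a few lines. The helper
 name prefixed_tokens is hypothetical; it only shows how the values for
 the multi-valued field would be produced before indexing.

```python
def prefixed_tokens(key, value):
    """Prefix every word of the value with its key, one token per word."""
    return [f"{key}{word}" for word in value.split()]

single = prefixed_tokens("screen", "LCD")
multi = prefixed_tokens("screen", "really big one")
print(single, multi)
```

 A filter on one word of the value then becomes a plain term query on the
 multi-valued field, such as fq=mymultivaluefield:screenbig.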

 And have you looked at dynamic fields? Again I'm not sure that works for
 you, but it might be worth a look.
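
 A minimal sketch of the dynamic-field idea, assuming the schema declares
 dynamicField patterns such as *_s (string) and *_i (integer), as the
 example schema shipped with Solr does. The helper to_dynamic_doc and the
 record_type_s field are hypothetical; the attribute values come from the
 first post in this thread.

```python
def to_dynamic_doc(doc_id, record_type, attrs):
    """Map arbitrary key/value attributes onto dynamic field names."""
    doc = {"id": doc_id, "record_type_s": record_type}
    for key, value in attrs.items():
        # Integers go to *_i so range queries work; everything else to *_s.
        suffix = "_i" if isinstance(value, int) else "_s"
        doc[key.lower() + suffix] = value
    return doc

car = to_dynamic_doc("1", "car", {"Make": "Audi", "Model": "A4", "Year": 1998})
flat = to_dynamic_doc("2", "property", {"Type": "Flat", "Rooms": 2})
print(car)
print(flat)
```

 With documents shaped like this, fq=year_i:[1995 TO 2000] is an ordinary
 range query and facet.field=make_s an ordinary facet, with no key=value
 parsing inside a multi-valued field.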

 Best
 Erick



Re: multi-valued associated fields

2010-05-12 Thread ahammad

I had the same problem as you last year, i.e. indexing stuff from different
sources with different characteristics. The way I approached it is by
setting up a multi-core environment, with each core representing one type of
data. Within each core, I had a data type sort of field that would define
what kind of data is stored (i.e. in your case, it would be auto or real
estate etc...).

One advantage of this setup is that it allows you to make changes to
individual cores without affecting anything else. Also, faceting based on
category is achieved by the data type field. You can search multiple
cores like you would a single core, meaning that all the search
parameters can be applied. Solr will automatically merge all the data into
one result set. Another advantage is that if you index frequently, this
setup allows you to index at different times and reduce the overall load.
Just a thought on an approach...
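
A sketch of what searching several cores at once could look like, assuming
Solr's distributed search via the shards request parameter. The host, port
and core names (auto, immo) and the helper distributed_query are
hypothetical; shard entries are written as host:port/path without the
http:// prefix.

```python
from urllib.parse import urlencode

def distributed_query(base_core_url, shard_cores, q, extra=None):
    """Build a select URL that fans the query out to every listed core."""
    params = {"q": q, "shards": ",".join(shard_cores)}
    params.update(extra or {})
    return f"{base_core_url}/select?{urlencode(params)}"

url = distributed_query(
    "http://localhost:8983/solr/auto",
    ["localhost:8983/solr/auto", "localhost:8983/solr/immo"],
    "audi",
    extra={"rows": "10"},
)
print(url)
```

The core that receives the request collects the responses from the listed
cores and returns one merged result set.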
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813275.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multi-valued associated fields

2010-05-12 Thread Eric Grobler
Hi Ahmed,

Interesting, I did not think of a multi-core approach.

I am not sure, but we might have up to 10 different kinds of data to contend
with (property, pets, farming, electronics, travel, auto, jobs, sport,
etc.), which might complicate things.
Also, one practical limitation we have is that at times we need to sort
across all cores, as it were, and return the top x rows according to a
specific filter.

I will definitely meditate on your multi-core solution.

Regards
Eric





Re: multi-valued associated fields

2010-05-12 Thread ahammad

In our deployment, we thought that complications might arise when attempting
to hit the Solr server with addresses of too many cores. For instance, we
have 15+ cores running at the moment. In the worst case, we will have to use
all 15+ addresses of all the cores to search all our data. What we
eventually did was to combine all the cores into a single core, which
gives us a cleaner solution. You get the simplicity of
querying one core, but keep the flexibility of modifying cores separately.

Basically, we have all the cores indexing separately. We set up a script
that would use the index merge functionality of Solr to combine all the
indexes into a single index accessible through one core. Yes, there will be
some overhead on the server, but I believe that it's a good compromise. In
our case, we have multiple servers at our disposal, so this was not a
problem to implement. It all depends on your data set and the volume of
documents that you will be indexing. 
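
A sketch of what such a merge script might call, assuming the CoreAdmin
mergeindexes action with one indexDir parameter per source index. The base
URL, the target core name (combined) and the index paths are hypothetical,
as is the helper merge_indexes_url; this is not the script described above.

```python
from urllib.parse import urlencode

def merge_indexes_url(solr_base, target_core, index_dirs):
    """Build the CoreAdmin request that merges source indexes into one core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return f"{solr_base}/admin/cores?{urlencode(params)}"

merge_url = merge_indexes_url(
    "http://localhost:8983/solr",
    "combined",
    ["/data/auto/index", "/data/immo/index"],
)
print(merge_url)
```

After the merge request, a commit on the target core is typically needed
before the merged documents become visible to searches.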

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-tp811883p813419.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: multi-valued associated fields

2010-05-12 Thread Erick Erickson
If you do go this route, I'd use something besides a colon; it's too easily
confused with a field delimiter. That's just *asking* for trouble...

Any non-alpha character you use will cause some grief if you choose an
incompatible Analyzer. Even upper/lower case can be split in ways
you wouldn't want with WordDelimiterFilterFactory.
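
To make the colon problem concrete, here is a small sketch that
backslash-escapes the characters the Lucene query parser treats as syntax.
The helper escape_term is hypothetical, and the character list is an
assumption based on the query parser documentation. Escaping only protects
the query string; it does not stop an Analyzer from splitting the token at
index time.

```python
import re

# Characters (and the operators && and ||) with special meaning in queries.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_term(text):
    """Backslash-escape query syntax characters inside a single term."""
    return _SPECIAL.sub(r"\\\1", text)

with_colon = escape_term("screen:LCD")
with_semicolon = escape_term("screen;LCD")
print(with_colon, with_semicolon)
```

A value containing a colon has to be escaped in every query, while a
delimiter that is not part of the query syntax passes through untouched.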

Just a heads-up..

But the multiple cores idea is also interesting...

Best
Erick
