Re: A schema inside a Solr Schema (Schema in a can)
Here is a thread on this subject that I did not find earlier. Sometimes discussion, thought, and 'mulling' in the subconscious get me better Google searches.

http://lucene.472066.n3.nabble.com/multi-valued-associated-fields-td811883.html

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.
Re: A schema inside a Solr Schema (Schema in a can)
From: Dennis Gearon
Sent: Mon, December 20, 2010 10:19:53 AM

Thanks James. So being accurate with fields within fields (multivalues) is probably not possible with any of the analyzers that currently exist.
RE: A schema inside a Solr Schema (Schema in a can)
From: "Dyer, James"
Sent: Mon, December 20, 2010 7:16:43 AM

Dennis,

If you need to search a key/value pair, you'll have to put them both in the same field, somehow. One way is to re-index them using the key in the fieldname. For instance, suppose you have:

contributor: dyer, james
contributor: smith, sam
role: author
role: editor

...but you want to search only for authors. You could index these again with fieldnames like:

contrib_author: dyer, james
contrib_editor: smith, sam

Then you would query q=contributor:smith to search all contributors and q=contrib_editor:smith just to get editors.

Another way to do it is to use some type of marker character sequence to define the "key" and index it like this:

contributor: dyer, james __author
contributor: smith, sam __editor

Then you can query like this:

q=contributor:"smith __editor"~50

...to search only for editors named Smith.

We are not yet fully developed here on SOLR, but we currently use both of these approaches with a different search engine. One nice thing SOLR could add to this second approach, which is not an option with our other system, is the possibility of writing a custom analyzer that could maybe take some of the complexity out of the app. Not sure exactly how it'd work though...

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
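The two indexing approaches James describes can be sketched roughly as follows. This is only an illustration in plain Python, not Solr client code: the helper names (key_in_fieldname, marker_in_value, marker_query) are hypothetical, and only the field names, the "__role" marker convention and the ~50 slop come from the message. Whether a token like __editor survives analysis depends on the field type's tokenizer, which the thread does not specify.

```python
# Sample contributors, taken from the example in the message.
contributors = [
    {"name": "dyer, james", "role": "author"},
    {"name": "smith, sam", "role": "editor"},
]


def key_in_fieldname(contribs):
    """Approach 1: index each name again under a role-specific field."""
    doc = {"contributor": []}
    for c in contribs:
        doc["contributor"].append(c["name"])
        # Hypothetical naming scheme: contrib_<role>, as in the message.
        doc.setdefault("contrib_" + c["role"], []).append(c["name"])
    return doc


def marker_in_value(contribs):
    """Approach 2: append a marker token naming the role to each value."""
    return {"contributor": [f'{c["name"]} __{c["role"]}' for c in contribs]}


def marker_query(term, role, slop=50):
    """Build the sloppy phrase query string used with approach 2."""
    return f'contributor:"{term} __{role}"~{slop}'


if __name__ == "__main__":
    print(key_in_fieldname(contributors))
    print(marker_in_value(contributors))
    print("q=" + marker_query("smith", "editor"))
```

The first approach costs extra fields but keeps queries simple; the second keeps one field but relies on a phrase query with slop, so it inherits the wildcard limitation discussed elsewhere in this thread.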
RE: A schema inside a Solr Schema (Schema in a can)
From: Dennis Gearon
Sent: Friday, December 17, 2010 6:52 PM

So this is a currently usable plugin (except for the latest bug)? And is it possible to search within just one key:value pair in a multivalued field?

Dennis Gearon
RE: A schema inside a Solr Schema (Schema in a can)
From: Ahmet Arslan
Date: Friday, December 17, 2010, 12:47 PM

> The problem with this approach is that Lucene doesn't
> support wildcards in phrases.

With https://issues.apache.org/jira/browse/SOLR-1604 you can do that.
RE: A schema inside a Solr Schema (Schema in a can)
Quite a bit of this is over my head at this point. I should NOT have duplicate fields in the column. I wonder how that affects things.

Dennis Gearon
RE: A schema inside a Solr Schema (Schema in a can)
You've given me some things to think about, James, thanks.

Dennis Gearon
RE: A schema inside a Solr Schema (Schema in a can)
From: "Dyer, James"
Date: Friday, December 17, 2010, 9:43 AM

There's also one "gotcha" we've experienced when searching across multi-valued fields: SOLR will match across field occurrences. In the example below, if you were to search q=contrib_name:(james AND smith), you will get this record back. It matches one name from one contributor and another name from a different contributor. This is not what our users want.

As a work-around, I am converting these to phrase queries with slop: "james smith"~50. Just use a slop smaller than your positionIncrementGap and bigger than the number of terms entered. This will prevent the cross-field matches yet allow the words to occur in any order.

The problem with this approach is that Lucene doesn't support wildcards in phrases. Unlucky for us, because our app automatically adds a wildcard to every term entered in Contributor searching. So when we convert to SOLR we will have to disable this "feature" for multi-word queries. I experimented with the double metaphone filter (too many false-positive matches) and the edge n-gram filter (could make the index very big) to alleviate this loss of functionality. Currently I have it set up to index each name as the full name plus the first initial (so "j dyer" would match but not "ja dyer"). If this is considered not good enough, we can probably see about doing the edge n-grams several characters out...

If anyone else has any other ideas I should try, please do speak up. Thank you.

James Dyer
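To see why a slop smaller than positionIncrementGap blocks cross-value matches, here is a toy model in plain Python. It is a deliberate simplification and not Lucene's actual algorithm: it assigns token positions to the values of a multivalued field with a gap between values, then treats a sloppy phrase as a match when one position per term fits inside a window of `slop`. The function names, the gap of 100 and the tokenization (split on whitespace, commas dropped) are assumptions for illustration only.

```python
from itertools import product


def positions(values, gap=100):
    """Map each token to its positions, adding `gap` between values.

    A simplified stand-in for what positionIncrementGap does at index time.
    """
    index = {}
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # jump ahead before the next value starts
        for token in value.replace(",", " ").split():
            index.setdefault(token, []).append(pos)
            pos += 1
    return index


def matches_and(index, terms):
    """Boolean AND: every term occurs somewhere in the field."""
    return all(t in index for t in terms)


def matches_sloppy(index, terms, slop):
    """Toy sloppy phrase: all terms occur within `slop` positions, any order."""
    if not matches_and(index, terms):
        return False
    return any(
        max(combo) - min(combo) <= slop
        for combo in product(*(index[t] for t in terms))
    )


if __name__ == "__main__":
    idx = positions(["dyer, james", "smith, sam"], gap=100)
    print(matches_and(idx, ["james", "smith"]))         # crosses two contributors
    print(matches_sloppy(idx, ["james", "smith"], 50))  # blocked by the gap
    print(matches_sloppy(idx, ["james", "dyer"], 50))   # same contributor
```

In this model the AND query matches across the two contributors, while the sloppy phrase only matches when both terms come from the same value, which is the behaviour James is after.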
RE: A schema inside a Solr Schema (Schema in a can)
From: "Dyer, James"
Date: Friday, December 17, 2010, 8:58 AM

Dennis,

I may be misunderstanding your question, but I think I've just worked through something similar. We're indexing book metadata, and a book can have more than one Contributor. We want to store the contributor's name, their Role and their id (from our rel db). With our old system, we had to do something like this:

contrib: dyer, james|author|123
contrib: smith, sam|editor|456

But Lucene/Solr will guarantee that multivalued fields return in exactly the same order you put them in. So with SOLR we can do this:

contrib_name: dyer, james
contrib_name: smith, sam
contrib_role: author
contrib_role: editor
contrib_id: 123
contrib_id: 456

The trick is to be very careful you put everything in the same order (it's easy if it is all from the same SQL query from a relational database). If one of the data elements is a NULL you have to use a placeholder (like an empty string or a zero).

Another option is to use a dynamic field:

contrib_123: dyer, james
contrib_456: smith, sam

The problem here is if you want to display and use a fieldlist (fl=), you cannot use wildcards (ex: fl=contrib_* doesn't work). Same for searching (q=, qf=). You can only use dynamic fields if you know at runtime the fieldname you need to deal with.

Both of these options might be more work for your app to deal with than the delimiter approach. And, in our case, we could stick with the delimiter field and store it and then have a separate indexed field that just has the name (as this is all we search on). You could even just have 1 field if you used a fancy analysis sequence that would only index the element(s) you wanted indexed...

James Dyer

-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Friday, December 17, 2010 12:43 AM
To: solr-user@lucene.apache.org
Subject: A schema inside a Solr Schema (Schema in a can)

Is it possible to put name value pairs of any type in a native Solr Index field type? Like JSON/XML/YML?

The reason that I ask, since you asked, is I want my main index schema to be a base object, and another multivalue column to be the attributes of base object inherited descendants.

Is there any other way to do this?

What are the limitations in searching and indexing documents with multivalue fields?

Dennis Gearon
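The parallel multivalued fields James describes (contrib_name, contrib_role, contrib_id kept in the same order) might be built and read back along these lines. This is a sketch in plain Python under stated assumptions: the function names are hypothetical, the rows stand in for the result of a single SQL query, and the placeholders (empty string, zero) follow the suggestion in his message.

```python
def to_parallel_fields(rows):
    """Turn (name, role, id) rows into three same-ordered multivalued fields.

    NULLs (None) are replaced with placeholders so the lists stay aligned.
    """
    doc = {"contrib_name": [], "contrib_role": [], "contrib_id": []}
    for name, role, contrib_id in rows:
        doc["contrib_name"].append(name if name is not None else "")
        doc["contrib_role"].append(role if role is not None else "")
        doc["contrib_id"].append(contrib_id if contrib_id is not None else 0)
    return doc


def from_parallel_fields(doc):
    """Re-associate the values by position when reading a document back."""
    return list(zip(doc["contrib_name"], doc["contrib_role"], doc["contrib_id"]))


if __name__ == "__main__":
    rows = [("dyer, james", "author", 123), ("smith, sam", None, 456)]
    doc = to_parallel_fields(rows)
    print(doc)
    print(from_parallel_fields(doc))
```

Skipping a NULL instead of writing a placeholder would shift every later value in that field by one, so the second contributor would silently pick up the wrong role or id.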