Re: Indexing a term into separate Lucene indexes
On 6/19/2014 4:51 PM, Huang, Roger wrote:
> If I have documents with a person and his email address:
> u...@domain.com
>
> How can I configure Solr (4.6) so that the email address source field is indexed as
> - the user part of the address (e.g., user) is in Lucene index X
> - the domain part of the address (e.g., domain.com) is in a separate Lucene index Y
>
> I would like to be able to search as follows:
> - Find all people whose email addresses have user part = userXyz
> - Find all people whose email addresses have domain part = domainABC.com
> - Find the person with exact email address = user...@domainabc.com
>
> Would I use a copyField declaration in my schema?
> http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different indexes. Although it is possible to search more than one separate index, that's very likely NOT what you want to do, and it comes with its own challenges.

What you most likely want is to put this data into different fields within the same index. You'll need to write custom code to accomplish this, especially if you need the stored data to contain only the parts rather than the complete email address. A copyField can get the data to additional fields, but I'm not aware of anything built into the schema that can trim the unwanted information from the new fields, and even if there is, any stored data will be the original data for all three fields.

It's up to you whether this custom code is in a user application that does your indexing or in a custom update processor that you load as a plugin to Solr itself. Extending whatever user application you are already using for indexing is very likely to be a lot easier.

Thanks,
Shawn
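As a rough illustration of the application-side approach described above (splitting the address before the document is sent to Solr), here is a minimal Python sketch. The field names email, email_user and email_domain and both helper functions are hypothetical; nothing in the thread or in Solr defines them, and the schema would need matching fields.

```python
def split_email(address):
    """Split an email address into (user, domain) at the last '@'."""
    user, sep, domain = address.strip().rpartition("@")
    if not sep or not user or not domain:
        raise ValueError("not a simple user@domain address: %r" % address)
    return user, domain


def build_person_doc(doc_id, name, address):
    """Build a document dict with the full address and its two parts.

    The field names are assumptions for illustration only. Case
    normalization is left to the field type's analysis in the schema.
    """
    user, domain = split_email(address)
    return {
        "id": doc_id,
        "name": name,
        "email": address.strip(),   # full address, for exact-match lookups
        "email_user": user,         # part before the '@'
        "email_domain": domain,     # part after the '@'
    }
```

Because the split happens before indexing, each field's stored value holds only its own part, which is the limitation of copyField noted above.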
RE: Indexing a term into separate Lucene indexes
Shawn,

Thanks for your response. Due to security requirements, I do need the name and domain parts of the email address stored in separate Lucene indexes. How do you recommend doing this? What are the challenges?

Once the name and domain parts of the email address are in different Lucene indexes, would I need to modify my Solr search string?

Thanks,
Roger

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, June 20, 2014 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a term into separate Lucene indexes
Re: Indexing a term into separate Lucene indexes
On 6/20/2014 10:04 AM, Huang, Roger wrote:
> Due to security requirements, I do need the name and domain parts of the email address stored in separate Lucene indexes. How do you recommend doing this? What are the challenges?
> Once the name and domain parts of the email address are in different Lucene indexes, would I need to modify my Solr search string?

Solr works best if all the data for an individual document is contained in a single flat schema. As soon as you try to put some of the data in one index and some of the data in another index, you'll probably run into problems combining the data and/or problems with performance. Solr does have some join capability, but when it is mentioned, usually it is to discuss the things it CAN'T do, not the things that it can do.

What kind of security requirement would necessitate splitting data that logically belongs together?

Thanks,
Shawn
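For comparison, with the parts kept as separate fields in one flat schema, the three searches from the original question become plain field queries. The sketch below only builds the query strings and request parameters. The field names are hypothetical, and it assumes exact-match (string-like) field types.

```python
from urllib.parse import urlencode


def field_query(field, value):
    """Return a Solr query clause matching `value` as a quoted phrase in `field`."""
    # Escape backslashes and double quotes so the value stays inside the quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '%s:"%s"' % (field, escaped)


def select_params(field, value, rows=10):
    """URL-encode the parameters for a /select request (field name is an assumption)."""
    return urlencode({"q": field_query(field, value), "rows": rows, "wt": "json"})


# The three searches from the question, against hypothetical fields:
by_user = field_query("email_user", "userXyz")
by_domain = field_query("email_domain", "domainABC.com")
by_address = field_query("email", "userXyz@domainABC.com")
```

No join is involved in any of the three, because every query runs against a single index.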
Re: Indexing a term into separate Lucene indexes
On 6/20/2014 12:17 PM, Huang, Roger wrote:
> How would you recommend storing the name and domain parts of the email address in separate Lucene indexes?
> To query, would I use the Solr cross-core join, fromIndex, toIndex?

I have absolutely no idea how to use Solr's join functionality. It is not required for my indexes. Here's the wiki page on the subject:

https://wiki.apache.org/solr/Join

Additional note: Your reply did not come to the mailing list, it was only sent to me.

Thanks,
Shawn
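For reference, the join syntax described on that wiki page takes the form {!join from=... to=... fromIndex=...}. The sketch below only assembles such a query string. The core name email_domains and the field names person_id, id and email_domain are hypothetical, and the sketch has not been run against a live Solr instance.

```python
def cross_core_join(from_field, to_field, from_index, inner_query):
    """Build a Solr join query string using local-params syntax.

    `inner_query` runs against the core named by `from_index`; values of
    `from_field` in its matches are then matched against `to_field` in
    the core that receives the request.
    """
    return "{!join from=%s to=%s fromIndex=%s}%s" % (
        from_field, to_field, from_index, inner_query)


# Hypothetical layout: a separate core holds one record per person with
# the domain part; the main core holds the people, keyed by id.
q = cross_core_join("person_id", "id", "email_domains",
                    'email_domain:"domainABC.com"')
```

The "to" side is simply the core the request is sent to, so only the "from" core has to be named.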