Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/19/2014 4:51 PM, Huang, Roger wrote:
 If I have documents with a person and his email address: 
 u...@domain.com

 How can I configure Solr (4.6) so that the email address source field is
 indexed such that

 -  the user part of the address (e.g., user) is in Lucene index X

 -  the domain part of the address (e.g., domain.com) is in a 
 separate Lucene index Y

 I would like to be able search as follows:

 -  Find all people whose email addresses have user part = userXyz

 -  Find all people whose email addresses have domain part = 
 domainABC.com

 -  Find the person with exact email address = user...@domainabc.com

 Would I use a copyField declaration in my schema?
 http://wiki.apache.org/solr/SchemaXml#Copy_Fields

I don't think you actually want the data to end up in entirely different
indexes.  Although it is possible to search more than one separate
index, that's very likely NOT what you want to do, and it comes with its
own challenges.  What you most likely want is to put this data into
different fields within the same index.
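To make the single-index approach concrete, here is a minimal Python sketch of the three searches from the question, written as field queries against one core. It assumes three hypothetical fields, email, email_user and email_domain, that are populated at index time; none of those names come from an actual schema, and the exact address is a made-up example.

```python
from urllib.parse import urlencode

# Hypothetical field names; the thread does not define a schema.
FULL_FIELD = "email"
USER_FIELD = "email_user"
DOMAIN_FIELD = "email_domain"


def quote_term(value: str) -> str:
    """Wrap a value in double quotes (escaping backslashes and quotes)
    so the query parser treats it as a single phrase."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def field_query(field: str, value: str) -> str:
    """Build a fielded query such as email_user:"userXyz"."""
    return f"{field}:{quote_term(value)}"


def select_params(query: str, rows: int = 10) -> str:
    """URL-encode parameters for a request to a core's /select handler."""
    return urlencode({"q": query, "rows": rows, "wt": "json"})


if __name__ == "__main__":
    # All people whose user part is userXyz
    print(select_params(field_query(USER_FIELD, "userXyz")))
    # All people whose domain part is domainABC.com
    print(select_params(field_query(DOMAIN_FIELD, "domainABC.com")))
    # One exact address (hypothetical example address)
    print(select_params(field_query(FULL_FIELD, "jdoe@example.com")))
```

All three searches hit the same core; only the field name changes, so no cross-index combining is needed.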

You'll need to write custom code to accomplish this, especially if you
need the stored data to contain only the parts rather than the complete
email address.  A copyField can copy the data into additional fields, but
I'm not aware of anything built into the schema that can trim the
unwanted information from the new fields.  Even if there is, the stored
value of all three fields will still be the original, complete address.  It's up to
you whether this custom code is in a user application that does your
indexing or in a custom update processor that you load as a plugin to
Solr itself.  Extending whatever user application you are already using
for indexing is very likely to be a lot easier.
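As a rough illustration of doing the split in the indexing application, the sketch below (Python, standard library only) breaks an address at its last '@' and builds a document with separate user and domain fields. The field names are hypothetical, and lowercasing only the domain is a design choice of this sketch, not something Solr requires. Because each field is sent only its own part, the stored values are the parts themselves, which a copyField alone would not give you.

```python
import json


def split_email(address: str) -> tuple[str, str]:
    """Split an address at its last '@'. The domain is lowercased because
    domain names are case-insensitive; the user part is left unchanged."""
    user, sep, domain = address.strip().rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return user, domain.lower()


def build_doc(person_id: str, name: str, address: str) -> dict:
    """Build one document with the full address and its two parts
    in separate (hypothetical) fields."""
    user, domain = split_email(address)
    return {
        "id": person_id,
        "name": name,
        "email": f"{user}@{domain}",  # full address, for exact matches
        "email_user": user,           # stored value is only the user part
        "email_domain": domain,       # stored value is only the domain part
    }


if __name__ == "__main__":
    docs = [build_doc("1", "Jane Doe", "jdoe@Example.com")]
    # A JSON array of documents, ready to send to the update handler.
    print(json.dumps(docs, indent=2))
```

Serialized this way, the list should be in the shape Solr's JSON update handler accepts (an array of document objects); the same split could instead live in a custom update processor, at the cost of writing and deploying a plugin.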

Thanks,
Shawn



RE: Indexing a term into separate Lucene indexes

2014-06-20 Thread Huang, Roger
Shawn,
Thanks for your response.
Due to security requirements, I do need the name and domain parts of the email 
address stored in separate Lucene indexes.
How do you recommend doing this?  What are the challenges?
Once the name and domain parts of the email address are in different Lucene
indexes, would I need to modify my Solr search string?
Thanks,
Roger


-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Friday, June 20, 2014 10:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing a term into separate Lucene indexes




Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/20/2014 10:04 AM, Huang, Roger wrote:
 Due to security requirements, I do need the name and domain parts of the 
 email address stored in separate Lucene indexes.
 How do you recommend doing this?  What are the challenges?
 Once the name and domain parts of the email address are in different Lucene
 indexes, would I need to modify my Solr search string?

Solr works best if all the data for an individual document is contained
in a single flat schema.  As soon as you try to put some of the data in
one index and some of the data in another index, you'll probably run
into problems combining the data and/or problems with performance.  Solr
does have some join capability, but when it is mentioned, usually it is
to discuss the things it CAN'T do, not the things that it can do.
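To show where the combining problem comes from, here is a toy Python sketch in which two in-memory dictionaries stand in for two separate indexes that share a hypothetical person id. An exact-address search then has to query both sides and intersect the ids in the application, instead of running one query.

```python
# Toy stand-ins for two separate indexes, keyed by a shared (hypothetical)
# person id: index X holds user parts, index Y holds domain parts.
user_index = {"1": "jdoe", "2": "asmith", "3": "jdoe"}
domain_index = {"1": "example.com", "2": "example.com", "3": "example.org"}


def ids_matching(index: dict[str, str], value: str) -> set[str]:
    """One 'search' against one index: every id whose value matches."""
    return {pid for pid, v in index.items() if v == value}


def exact_address(user: str, domain: str) -> set[str]:
    """Exact-address search across the split data: two separate searches,
    then an intersection done by the application."""
    return ids_matching(user_index, user) & ids_matching(domain_index, domain)


if __name__ == "__main__":
    print(exact_address("jdoe", "example.com"))
```

With real indexes, each side may match many documents, and all matching ids have to be retrieved (paging through results) before the intersection can be computed, which is where the performance cost tends to appear.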

What kind of security requirement would necessitate splitting data that
logically belongs together?

Thanks,
Shawn



Re: Indexing a term into separate Lucene indexes

2014-06-20 Thread Shawn Heisey
On 6/20/2014 12:17 PM, Huang, Roger wrote:
 How would you recommend storing the name and domain parts of the email 
 address in separate Lucene indexes?
 To query, would I use the Solr cross-core join, fromIndex, toIndex?

I have absolutely no idea how to use Solr's join functionality.  It is
not required for my indexes.  Here's the wiki page on the subject:

https://wiki.apache.org/solr/Join
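For reference, a cross-core join is written with the join query parser's local-params syntax. As far as the parser's documented parameters go, it takes from, to and fromIndex; there is no toIndex, because the core that receives the request is the "to" side, and only that core's documents are returned. The small Python helper below only assembles such a query string. The core name email_domains and the fields person_id and id are hypothetical; restrictions (such as the fromIndex core needing to be local to the node serving the request) are covered on the wiki page above.

```python
from urllib.parse import urlencode


def join_query(from_field: str, to_field: str, from_index: str, inner: str) -> str:
    """Build a join query: run `inner` against the core named by from_index,
    collect the values of from_field from the matches, and return documents
    in the requesting core whose to_field holds one of those values."""
    return f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}{inner}"


if __name__ == "__main__":
    # Hypothetical: core "email_domains" holds person_id + email_domain;
    # the core receiving the request holds the person documents keyed by id.
    q = join_query("person_id", "id", "email_domains", 'email_domain:"domainABC.com"')
    print(q)
    print(urlencode({"q": q}))
```

Note that fields stored in the "from" core are not returned by the join; only the matching documents of the requesting core come back.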

Additional note: Your reply did not come to the mailing list, it was
only sent to me.

Thanks,
Shawn