Re: Do we have some sort of recomposing token filter?

2013-01-08 Thread Steve Rowe
Hi Alexandre,

CombiningFilter sounds close, though it has no option to put spaces between 
the original terms, and it hasn't been committed yet: 
https://issues.apache.org/jira/browse/LUCENE-3413.
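
For illustration only, here is a rough sketch in plain Java of what a
recomposing step would have to do: drop the tokens of one type and join the
rest with a space. This is not the LUCENE-3413 patch and does not use the
Lucene TokenStream API. The Token class, the recompose method, the sample
address and the type labels are all assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class RecomposeSketch {
    /** Hypothetical stand-in for an analyzed token: its text plus a type label. */
    static final class Token {
        final String text;
        final String type;

        Token(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    /** Drops tokens of the given type and joins the remaining ones with single spaces. */
    static String recompose(List<Token> tokens, String typeToDrop) {
        List<String> kept = new ArrayList<>();
        for (Token t : tokens) {
            if (!t.type.equals(typeToDrop)) {
                kept.add(t.text);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Illustrative input: the address and the type labels are placeholders.
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token("John", "<ALPHANUM>"));
        tokens.add(new Token("Doe", "<ALPHANUM>"));
        tokens.add(new Token("jdoe@example.com", "<EMAIL>"));
        System.out.println(recompose(tokens, "<EMAIL>"));
    }
}
```

A real filter would have to buffer the whole token stream before emitting the
single joined token, which is the part this sketch leaves out.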

Steve

On Jan 8, 2013, at 4:55 PM, Alexandre Rafalovitch arafa...@gmail.com wrote:

 Hello,
 
 I want to take a composite email address such as John Doe 
 <john...@example.com> and keep just John Doe as a facet field.
 
 So far, I have the UAX29 tokenizer combined with TypeTokenFilterFactory to
 filter out the email token type.
 
 But that leaves me with John and Doe as separate tokens, and I cannot figure
 out how to join them back with a space to get John Doe again.
 
 I thought about using a regexp instead to just strip the address part, but
 that feels even less robust.
 
 Do we have anything ready to use for that, or do I need to write custom code?
 
 Regards,
   Alex.
 
 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)



Re: Do we have some sort of recomposing token filter?

2013-01-08 Thread Otis Gospodnetic
Hi,

Are you just trying to extract the personal name? I think Java Mail has the
ability to do that.
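
With JavaMail on the classpath, InternetAddress.getPersonal() should be the
call to look at. If adding that dependency is not an option, a rough
stdlib-only sketch could look like the one below. The class name, method name
and pattern are assumptions for illustration, and the pattern covers only the
simple "Name <address>" shape, not the full RFC 822 grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PersonalNameSketch {
    // Matches "Display Name <address>" and captures the display name.
    // Double quotes around the display name are optional.
    private static final Pattern NAME_ADDR =
        Pattern.compile("^\\s*\"?([^\"<]*?)\"?\\s*<[^<>]+>\\s*$");

    /** Returns the display name, or null if the input has no "Name <address>" shape. */
    static String personalName(String composite) {
        Matcher m = NAME_ADDR.matcher(composite);
        if (!m.matches()) {
            return null;
        }
        String name = m.group(1).trim();
        return name.isEmpty() ? null : name;
    }

    public static void main(String[] args) {
        // Placeholder address made up for the example.
        System.out.println(personalName("John Doe <jdoe@example.com>"));
    }
}
```

Doing this before indexing, for example in the client or in an update
processor, would let the facet field stay a plain string field with no
analysis chain to recombine.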

Otis
Solr & ElasticSearch Support
http://sematext.com/