Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Zheng Lin Edwin Yeo
Hi Ahmet,

Ok. Thanks for your advice.

Regards,
Edwin


Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Zheng Lin Edwin Yeo
Hi Rick,

Neither tokenizer splits on the hyphen in an email address like this:
solr-user@lucene.apache.org

The entire email address remains intact with both tokenizers.

Regards,
Edwin


Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Ahmet Arslan


Hi Zheng,

UAX29URLEmailTokenizer (UAX29UET) recognizes URLs and e-mail addresses. It
does not split them; it keeps each one as a single token.

StandardTokenizer produces two or more tokens for such an entity.

Please try them on the analysis page and use whichever one suits your
requirements.

Ahmet
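
For anyone who wants to compare the two on the analysis page, here is a
minimal sketch of two field types that could be added to the schema. The
field type names text_email_uax29 and text_email_classic are made up for
illustration and are not taken from Edwin's schema; the tokenizer and filter
classes are the ones named in this thread.

```xml
<!-- Hypothetical field type names; adjust to your own schema. -->
<!-- Variant 1: UAX29URLEmailTokenizerFactory followed by lowercasing. -->
<fieldType name="text_email_uax29" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Variant 2: ClassicTokenizerFactory followed by lowercasing. -->
<fieldType name="text_email_classic" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With both field types in place, the same sample address can be entered on
the analysis page against each one to see which tokens come out.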




Re: Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Rick Leir
Edwin,
There is a spec for which characters are acceptable in the name part of an
email address, and another spec for characters in a domain name. I suspect
you will have more success with a tokenizer that is specialized for email,
but I have not looked at UAX29URLEmailTokenizerFactory. Does
ClassicTokenizerFactory split on hyphens?
Cheers --Rick

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Difference between UAX29URLEmailTokenizerFactory and ClassicTokenizerFactory

2017-11-24 Thread Zheng Lin Edwin Yeo
Hi,

I am indexing email addresses into Solr via EML files. Currently, I am
using ClassicTokenizerFactory with LowerCaseFilterFactory. However, I
found that we can also use UAX29URLEmailTokenizerFactory with
LowerCaseFilterFactory.

Does anyone have a recommendation on which tokenizer is better?

I am currently using Solr 6.5.1 and am planning to upgrade to Solr 7.1.0.

Regards,
Edwin