[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2018-06-01 Thread Junte Zhang (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498309#comment-16498309 ]

Junte Zhang commented on LUCENE-8278:
-

I think I have tested the patch:
{code:bash}
patch -p1 -i LUCENE-8278.patch
patching file lucene/analysis/common/build.xml
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ASCIITLD.jflex-macro
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.java
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.jflex
patching file lucene/analysis/common/src/test/org/apache/lucene/analysis/standard/TestUAX29URLEmailTokenizer.java
patching file lucene/analysis/common/src/tools/java/org/apache/lucene/analysis/standard/GenerateJflexTLDMacros.java
{code}
 

Then I ran ant compile.

I started Solr and created a core with the following fieldType:
{code:java}

  
    
  
{code}
I then tested it in the Solr Admin UI and didn't see a difference, but perhaps I 
missed something.

> UAX29URLEmailTokenizer is not detecting some tokens as URL type
> ---
>
> Key: LUCENE-8278
> URL: https://issues.apache.org/jira/browse/LUCENE-8278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Junte Zhang
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-8278.patch
>
>
> We are using the UAX29URLEmailTokenizer so we can use the token types in our 
> plugins.
> However, I noticed that the tokenizer is not detecting certain URLs as <URL>, 
> but as <ALPHANUM> instead.
> Examples that are not working:
>  * example.com is <ALPHANUM>
>  * example.net is <ALPHANUM>
> But:
>  * https://example.com is <URL>
>  * and so is https://example.net
> Examples that work:
>  * example.ch is <URL>
>  * example.co.uk is <URL>
>  * example.nl is <URL>
> I searched this JIRA and could not find an existing issue. I have tested this on 
> Lucene (Solr) 6.4.1 and 7.3.
> Could someone confirm my findings and advise what I could do to (help) 
> resolve this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2018-05-30 Thread Junte Zhang (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495777#comment-16495777 ]

Junte Zhang commented on LUCENE-8278:
-

Hi Steve, sorry for the late response. I will check this tomorrow. Thanks for 
picking up this bug report! 




[jira] [Comment Edited] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2018-05-02 Thread Junte Zhang (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460629#comment-16460629 ]

Junte Zhang edited comment on LUCENE-8278 at 5/2/18 7:36 AM:
-

Thank you for confirming this issue, Steve. We run Lucene/Solr 6.6 on our 
production servers, and we also found that appending whitespace to the token 
works around the problem on that version. However, this workaround no longer 
works on Lucene 7.3.0. I'll see if I can fix this...


was (Author: drjz):
Thank you for confirming this issue, Steve. We run Lucene/Solr 6.6 on our 
production servers, and we also found that appending whitespace to the token 
works around the problem on that version. However, this workaround no longer 
works in Lucene 7.3.0. I'll see if I can fix this...




[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2018-05-02 Thread Junte Zhang (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460629#comment-16460629 ]

Junte Zhang commented on LUCENE-8278:
-

Thank you for confirming this issue, Steve. We run Lucene/Solr 6.6 on our 
production servers, and we also found that appending whitespace to the token 
works around the problem on that version. However, this workaround no longer 
works in Lucene 7.3.0. I'll see if I can fix this...




[jira] [Updated] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2018-04-26 Thread Junte Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junte Zhang updated LUCENE-8278:

Summary: UAX29URLEmailTokenizer is not detecting some tokens as URL type  
(was: UAX29URLEmailTokenizer is not detecting some tokens as URL tag)




[jira] [Created] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL tag

2018-04-26 Thread Junte Zhang (JIRA)
Junte Zhang created LUCENE-8278:
---

 Summary: UAX29URLEmailTokenizer is not detecting some tokens as URL tag
 Key: LUCENE-8278
 URL: https://issues.apache.org/jira/browse/LUCENE-8278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Junte Zhang


We are using the UAX29URLEmailTokenizer so we can use the token types in our 
plugins.

However, I noticed that the tokenizer is not detecting certain URLs as <URL>, 
but as <ALPHANUM> instead.

Examples that are not working:
 * example.com is <ALPHANUM>
 * example.net is <ALPHANUM>

But:
 * https://example.com is <URL>
 * and so is https://example.net

Examples that work:
 * example.ch is <URL>
 * example.co.uk is <URL>
 * example.nl is <URL>

I searched this JIRA and could not find an existing issue. I have tested this on 
Lucene (Solr) 6.4.1 and 7.3.

Could someone confirm my findings and advise what I could do to (help) resolve 
this issue?
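To check the token types directly in Lucene, without going through the Solr Admin UI, a small standalone program can print each token together with its type. This is a minimal sketch and not part of the attached patch. It assumes Lucene 7.x with the lucene-core and lucene-analyzers-common jars on the classpath (later releases may place UAX29URLEmailTokenizer in a different package), and the class name TokenTypeDump and the method termsWithTypes are hypothetical.

{code:java}
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Lucene 7.x package; later releases may place this class elsewhere.
import org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

/** Hypothetical helper: prints each token of an input with its token type. */
public class TokenTypeDump {

  /** Returns one "term<TAB>type" entry per token produced for the given text. */
  public static List<String> termsWithTypes(String text) throws IOException {
    List<String> result = new ArrayList<>();
    try (UAX29URLEmailTokenizer tokenizer = new UAX29URLEmailTokenizer()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      TypeAttribute type = tokenizer.addAttribute(TypeAttribute.class);
      tokenizer.reset(); // must be called before the first incrementToken()
      while (tokenizer.incrementToken()) {
        result.add(term.toString() + "\t" + type.type());
      }
      tokenizer.end(); // end-of-stream bookkeeping; close() runs via try-with-resources
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Inputs taken from the report; each one is tokenized as a separate text.
    String[] inputs = {
      "example.com", "example.net", "https://example.com",
      "example.ch", "example.co.uk", "example.nl"
    };
    for (String input : inputs) {
      System.out.println(input + " -> " + termsWithTypes(input));
    }
  }
}
{code}

According to this report, an affected build would print <ALPHANUM> for example.com and example.net and <URL> for the other inputs. Because each input is tokenized on its own, the token runs up to the end of the input; the whitespace workaround mentioned in the comments can be tried by passing, for example, "example.com " instead.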


