[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
    [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498309#comment-16498309 ]

Junte Zhang commented on LUCENE-8278:
-------------------------------------

I think I have tested the patch:
{code:java}
patch -p1 -i LUCENE-8278.patch
patching file lucene/analysis/common/build.xml
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/ASCIITLD.jflex-macro
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.java
patching file lucene/analysis/common/src/java/org/apache/lucene/analysis/standard/UAX29URLEmailTokenizerImpl.jflex
patching file lucene/analysis/common/src/test/org/apache/lucene/analysis/standard/TestUAX29URLEmailTokenizer.java
patching file lucene/analysis/common/src/tools/java/org/apache/lucene/analysis/standard/GenerateJflexTLDMacros.java
{code}
then

ant compile

I then started Solr and created a core with a fieldType:
{code:java}
{code}
I then tested in the Solr Admin but didn't see a difference; perhaps I missed something.

> UAX29URLEmailTokenizer is not detecting some tokens as URL type
> ---------------------------------------------------------------
>
>                 Key: LUCENE-8278
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8278
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Junte Zhang
>            Assignee: Steve Rowe
>            Priority: Minor
>         Attachments: LUCENE-8278.patch
>
>
> We are using the UAX29URLEmailTokenizer so we can use the token types in our
> plugins.
> However, I noticed that the tokenizer is not detecting certain URLs as <URL>
> but as <ALPHANUM> instead.
> Examples that are not working:
>  * example.com is <ALPHANUM>
>  * example.net is <ALPHANUM>
> But:
>  * https://example.com is <URL>
>  * as is https://example.net
> Examples that work:
>  * example.ch is <URL>
>  * example.co.uk is <URL>
>  * example.nl is <URL>
> I have checked this JIRA, and could not find an issue. I have tested this on
> Lucene (Solr) 6.4.1 and 7.3.
> Could someone confirm my findings and advise what I could do to (help)
> resolve this issue?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
    [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495777#comment-16495777 ]

Junte Zhang commented on LUCENE-8278:
-------------------------------------

Hi Steve, sorry for the late response. I will check this tomorrow. Thanks for picking up this bug report!
[jira] [Comment Edited] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
    [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460629#comment-16460629 ]

Junte Zhang edited comment on LUCENE-8278 at 5/2/18 7:36 AM:
-------------------------------------------------------------

Thank you for confirming this issue Steve. We run Lucene/Solr 6.6 on our production servers, and we also found this workaround to append a whitespace to the token to work on this version. However, this workaround is no longer working on Lucene 7.3.0. I'll see if I can fix this...


was (Author: drjz):
Thank you for confirming this issue Steve. We run Lucene/Solr 6.6 on our production servers, and we also found this workaround to append a whitespace to the token to work on this version. However, this workaround is no longer working in Lucene 7.3.0. I'll see if I can fix this...
[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
    [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460629#comment-16460629 ]

Junte Zhang commented on LUCENE-8278:
-------------------------------------

Thank you for confirming this issue Steve. We run Lucene/Solr 6.6 on our production servers, and we also found this workaround to append a whitespace to the token to work on this version. However, this workaround is no longer working in Lucene 7.3.0. I'll see if I can fix this...
[jira] [Updated] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
    [ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junte Zhang updated LUCENE-8278:
--------------------------------
    Summary: UAX29URLEmailTokenizer is not detecting some tokens as URL type  (was: UAX29URLEmailTokenizer is not detecting some tokens as URL tag)
[jira] [Created] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL tag
Junte Zhang created LUCENE-8278:
-----------------------------------

             Summary: UAX29URLEmailTokenizer is not detecting some tokens as URL tag
                 Key: LUCENE-8278
                 URL: https://issues.apache.org/jira/browse/LUCENE-8278
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Junte Zhang


We are using the UAX29URLEmailTokenizer so we can use the token types in our plugins.

However, I noticed that the tokenizer is not detecting certain URLs as <URL> but as <ALPHANUM> instead.

Examples that are not working:
 * example.com is <ALPHANUM>
 * example.net is <ALPHANUM>

But:
 * https://example.com is <URL>
 * as is https://example.net

Examples that work:
 * example.ch is <URL>
 * example.co.uk is <URL>
 * example.nl is <URL>

I have searched this JIRA and could not find an existing issue. I have tested this on Lucene (Solr) 6.4.1 and 7.3.

Could someone confirm my findings and advise what I could do to (help) resolve this issue?
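
The expectations in this report could be written down as a small regression check. The following is only a sketch in plain Java with no Lucene dependency. The class name UrlTypeCheck, its methods, and the stand-in function reportedType are all hypothetical. reportedType merely hard-codes the behaviour described in the report and is not the real tokenizer. The type strings "<URL>" and "<ALPHANUM>" are assumed to be the values the tokenizer reports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class UrlTypeCheck {

    /** Returns the inputs whose observed token type differs from the expected one. */
    public static List<String> mismatches(Map<String, String> expected,
                                          Function<String, String> typer) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, String> e : expected.entrySet()) {
            String observed = typer.apply(e.getKey());
            if (!e.getValue().equals(observed)) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }

    /** The inputs listed in the report; each one is expected to be typed as a URL. */
    public static Map<String, String> expectedTypes() {
        Map<String, String> expected = new LinkedHashMap<>();
        for (String input : new String[] {
                "example.com", "example.net",
                "https://example.com", "https://example.net",
                "example.ch", "example.co.uk", "example.nl"}) {
            expected.put(input, "<URL>");
        }
        return expected;
    }

    /**
     * Hypothetical stand-in for the tokenizer: it hard-codes the behaviour
     * described in the report (bare .com and .net hosts typed as <ALPHANUM>).
     * It does NOT call Lucene.
     */
    public static String reportedType(String input) {
        boolean affected = input.equals("example.com") || input.equals("example.net");
        return affected ? "<ALPHANUM>" : "<URL>";
    }

    public static void main(String[] args) {
        // Prints the inputs that are not typed as expected.
        System.out.println(mismatches(expectedTypes(), UrlTypeCheck::reportedType));
    }
}
```

To run the check against a real build, the stand-in could be replaced by a function that feeds the input through UAX29URLEmailTokenizer and returns the value of its TypeAttribute. If a fix works as intended, the returned list should be empty.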