[jira] [Resolved] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
[ https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-12570.
-----------------------------------
    Resolution: Fixed
      Assignee: Koji Sekiguchi
 Fix Version/s: 7.4.1

> OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields
> because pattern replacement doesn't work correctly
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12570
>                 URL: https://issues.apache.org/jira/browse/SOLR-12570
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: UpdateRequestProcessors
>    Affects Versions: 7.3, 7.3.1, 7.4
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: master (8.0), 7.5, 7.4.1
>
>         Attachments: SOLR-12570.patch
>
> Because of the following code, if resolvedDest is "body_{EntityType}_s", it
> becomes "body_PERSON_s" after the first replacement; but since the
> placeholder ({EntityType}) is overwritten in place, the destination stays
> "body_PERSON_s" for every subsequent entity type.
> {code}
> resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
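The placeholder bug described above can be reproduced in isolation. The sketch below is illustrative only (the destTemplate variable and the two-type loop are made up for the example, not the processor's actual code), but it shows why reusing the already-substituted string loses the placeholder, and why resolving from the untouched template each time fixes it.

```java
public class PlaceholderDemo {
    // The placeholder constant, as named in the issue description.
    static final String ENTITY_TYPE = "{EntityType}";

    public static void main(String[] args) {
        String destTemplate = "body_" + ENTITY_TYPE + "_s";
        String[] entityTypes = {"PERSON", "LOCATION"};

        // Buggy pattern: the substituted result is reused, so after the
        // first pass the placeholder is gone and later types have no effect.
        String resolvedDest = destTemplate;
        for (String entityType : entityTypes) {
            resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
            System.out.println(resolvedDest); // prints body_PERSON_s twice
        }

        // Fix: resolve from the untouched template for each entity type.
        for (String entityType : entityTypes) {
            System.out.println(destTemplate.replace(ENTITY_TYPE, entityType));
            // prints body_PERSON_s, then body_LOCATION_s
        }
    }
}
```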
[jira] [Resolved] (LUCENE-8420) Upgrade OpenNLP to 1.9.0
[ https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved LUCENE-8420.
------------------------------------
    Resolution: Fixed
      Assignee: Koji Sekiguchi

> Upgrade OpenNLP to 1.9.0
> ------------------------
>
>                 Key: LUCENE-8420
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8420
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: modules/analysis
>    Affects Versions: 7.4
>            Reporter: Koji Sekiguchi
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: master (8.0), 7.5
>
>         Attachments: LUCENE-8420.patch
>
> OpenNLP 1.9.0 generates a new-format model file which 1.8.x cannot read.
> 1.9.0 can read the previous format for back-compat.
[jira] [Commented] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
[ https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551521#comment-16551521 ]

Koji Sekiguchi commented on SOLR-12570:
---------------------------------------

I posted a patch in LUCENE-8420. It includes the new NER model, which can
predict LOCATION in addition to PERSON. I think we can add the test for this
after LUCENE-8420 is committed; I haven't tried the new model file for
predicting LOCATION, though.
[jira] [Commented] (LUCENE-8420) Upgrade OpenNLP to 1.9.0
[ https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551519#comment-16551519 ]

Koji Sekiguchi commented on LUCENE-8420:
----------------------------------------

I created model files for 1.9.0 by executing "ant train-test-models" under
lucene/analysis/opennlp/. As for the training data, I renamed ner_flashman.txt
to ner.txt and gave the file a location type for SOLR-12570. I deleted
opennlp-maxent, which is never used (and I think it's old; the opennlp-tools
package includes maxent).
[jira] [Updated] (LUCENE-8420) Upgrade OpenNLP to 1.9.0
[ https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-8420:
-----------------------------------
    Attachment: LUCENE-8420.patch
[jira] [Created] (SOLR-12571) Upgrade OpenNLP to 1.9.0
Koji Sekiguchi created SOLR-12571:
-------------------------------------

             Summary: Upgrade OpenNLP to 1.9.0
                 Key: SOLR-12571
                 URL: https://issues.apache.org/jira/browse/SOLR-12571
             Project: Solr
          Issue Type: Task
      Security Level: Public (Default Security Level. Issues are Public)
          Components: contrib - LangId, update
    Affects Versions: 7.4
            Reporter: Koji Sekiguchi
             Fix For: master (8.0), 7.5

OpenNLP 1.9.0 generates a new-format model file which 1.8.x cannot read.
1.9.0 can read the previous format for back-compat.
[jira] [Updated] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
[ https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-12570:
----------------------------------
    Attachment: SOLR-12570.patch
[jira] [Created] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
Koji Sekiguchi created SOLR-12570:
-------------------------------------

             Summary: OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly
                 Key: SOLR-12570
                 URL: https://issues.apache.org/jira/browse/SOLR-12570
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: UpdateRequestProcessors
    Affects Versions: 7.4, 7.3.1, 7.3
            Reporter: Koji Sekiguchi
             Fix For: master (8.0), 7.5

Because of the following code, if resolvedDest is "body_{EntityType}_s", it
becomes "body_PERSON_s" after the first replacement; but since the placeholder
({EntityType}) is overwritten in place, the destination stays "body_PERSON_s"
for every subsequent entity type.

{code}
resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
{code}
[jira] [Resolved] (SOLR-12202) failed to run solr-exporter.cmd on Windows platform
[ https://issues.apache.org/jira/browse/SOLR-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-12202.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: master (8.0)
                   7.4
                   7.3

Thanks!

> failed to run solr-exporter.cmd on Windows platform
> ---------------------------------------------------
>
>                 Key: SOLR-12202
>                 URL: https://issues.apache.org/jira/browse/SOLR-12202
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 7.3
>            Reporter: Minoru Osuka
>            Assignee: Koji Sekiguchi
>            Priority: Major
>             Fix For: 7.3, 7.4, master (8.0)
>
>         Attachments: SOLR-12202.patch, SOLR-12202_branch_7_3.patch
>
> solr-exporter.cmd failed to run on the Windows platform due to the following:
> - incorrect main class name.
> - incorrect classpath specification.
[jira] [Assigned] (SOLR-12202) failed to run solr-exporter.cmd on Windows platform
[ https://issues.apache.org/jira/browse/SOLR-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reassigned SOLR-12202:
-------------------------------------
    Assignee: Koji Sekiguchi
[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-11795.
-----------------------------------
    Resolution: Fixed

Marking as resolved. Thanks everyone!

> Add Solr metrics exporter for Prometheus
> ----------------------------------------
>
>                 Key: SOLR-11795
>                 URL: https://issues.apache.org/jira/browse/SOLR-11795
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: metrics
>    Affects Versions: 7.2
>            Reporter: Minoru Osuka
>            Assignee: Koji Sekiguchi
>            Priority: Minor
>             Fix For: 7.3, master (8.0)
>
>         Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch,
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch,
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch,
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch,
> SOLR-11795-ref-guide.patch, SOLR-11795.patch, solr-dashboard.png,
> solr-exporter-diagram.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!
[jira] [Reopened] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reopened SOLR-11795:
-----------------------------------

Thanks. I'll apply the additional patch soon.
[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-11795.
-----------------------------------
    Resolution: Fixed

Yes, thanks Uwe and all!
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385552#comment-16385552 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

Uwe's suggestion helped us verify this patch works on various platforms
without causing anyone trouble. In fact, the Java 9 Jenkins job found that
SnakeYAML uses reflection in illegal ways, which we couldn't have noticed
before committing. ... and the results look good so far. I'd like to commit
this to master and branch_7x soon.
[jira] [Comment Edited] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383080#comment-16383080 ]

Koji Sekiguchi edited comment on SOLR-11795 at 3/2/18 3:02 AM:
---------------------------------------------------------------

Hi Uwe, I created the following branches:

* SOLR-11795 (for master)
* branch_7x-SOLR-11795 (for branch_7x)

Sorry to put you to the trouble, but can you set up the Linux and Windows
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very
much in advance!

was (Author: koji):
Hi Uwe, I created the following branches:

* SOLR-1175 (for master)
* branch_7x-SOLR-1175 (for branch_7x)

Sorry to put you to the trouble but can you setup the Linux and Windows
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very
much in advance!
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383080#comment-16383080 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

Hi Uwe, I created the following branches:

* SOLR-1175 (for master)
* branch_7x-SOLR-1175 (for branch_7x)

Sorry to put you to the trouble but can you setup the Linux and Windows
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very
much in advance!
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375582#comment-16375582 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

Thanks for the kind suggestion. I'd like to create a branch for this and let
you know the name.
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375195#comment-16375195 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

I can't apologize enough for this. :(

> Now we have XML, JSON, properties files and now YAML. Why not use one that's
> already used by other places in Solr?

As for using YAML, I asked the contributor why he used it rather than JSON,
and I just accepted his reasons (more readable and understandable, able to
include comments, etc.), but I should have gotten buy-in from committers
before introducing a new config format.

> Or much simpler: Get rid of YAML!

I'll talk to him about using JSON for the config, but it'll take time to
apply. I'm going to revert this soon.
[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-11795.
-----------------------------------
    Resolution: Fixed

Thanks Minoru and everyone!
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373596#comment-16373596 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

Thanks for letting us know. Yes, we discussed the problem last night while
paying careful attention to Jenkins. I think we can fix it soon.
[jira] [Reopened] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reopened SOLR-11795:
-----------------------------------

Reopening this. We're still working on it.
[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-11795.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 7.3
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369692#comment-16369692 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

I can still see several UpdateRequestProcessors in the test solrconfig.xml.
Are they necessary? And I'm sorry if I'm wrong, but do you need the
test-files/exampledocs/*.xml files? As for schema settings, all existing Solr
contribs use schema.xml, not managed-schema. Why don't you follow them?
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369645#comment-16369645 ]

Koji Sekiguchi commented on SOLR-11795:
---------------------------------------

Thank you for updating the patch. I can see a hard-coded luceneMatchVersion in
the patch:

{code}
<luceneMatchVersion>7.1.0</luceneMatchVersion>
{code}

You can rephrase it like this:

{code}
<luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
{code}

And I think your solrconfig.xml for test is still too fat... Please consult
solr/contrib/langid for making the test config more compact.
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366903#comment-16366903 ] Koji Sekiguchi commented on SOLR-11795: --- Today I had a meeting with Minoru, the contributor of this patch. We discussed this contribution in detail, and I found it very nice! There is a similar ticket, SOLR-10654, which implements a ResponseWriter for Prometheus that is invoked via the wt parameter, but I prefer Minoru's way. I prefer this approach because: * It is highly independent of the main Solr code. He just adds a contrib/prometheus-exporter directory and provides everything under it, including SolrExporter for Prometheus in this patch. The patch doesn't change the main Solr source. * Implementing an exporter appears to be the mainstream approach in the Prometheus world; there are exporters for MySQL, Memcached, Mesos, etc. See https://prometheus.io/docs/instrumenting/exporters/ * SolrJ is used to implement SolrExporter in this patch, so it can be used in a SolrCloud environment. * It allows users to monitor not only Solr metrics, which come from /admin/metrics, but also facet counts, which come from /select (see config.yml in the patch). I asked him to update the patch to provide Ref Guide documentation (he has already written README.md, so its contents just need to be moved to the Ref Guide) and to add more tests so that we will notice if the response format of /admin/metrics changes. I'll wait for his next patch. Once I get it, and if nobody objects, I'd like to commit this next week. > Add Solr metrics exporter for Prometheus > > > Key: SOLR-11795 > URL: https://issues.apache.org/jira/browse/SOLR-11795 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 7.2 >Reporter: Minoru Osuka >Assignee: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, > SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795.patch, solr-dashboard.png, > solr-exporter-diagram.png > > > I 'd like to monitor Solr using Prometheus and Grafana. > I've already created Solr metrics exporter for Prometheus. I'd like to > contribute to contrib directory if you don't mind. > !solr-exporter-diagram.png|thumbnail! > !solr-dashboard.png|thumbnail! -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
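The exporter's core job sketched in the discussion above — turning metrics fetched via SolrJ into something Prometheus can scrape — ultimately means emitting the Prometheus text exposition format, one `name value` line per metric. A toy sketch of that flattening step; the metric names below are made up for illustration, and the real solr-exporter drives this mapping from its config.yml rather than hard-coding it:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy sketch: flatten metric name/value pairs into Prometheus text exposition lines.
public class ExpositionFormat {

    public static String toExposition(Map<String, Number> metrics) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Number> e : metrics.entrySet()) {
            // Prometheus metric names may not contain '.', so dots become underscores.
            String name = e.getKey().replace('.', '_');
            sb.append(name).append(' ').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("solr.core.QUERY.requests", 42);   // hypothetical metric name
        metrics.put("solr.jvm.memory.used", 1048576L); // hypothetical metric name
        System.out.print(toExposition(metrics));
    }
}
```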
[jira] [Updated] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-11592: -- Affects Version/s: (was: 7.1) 7.2 > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.2 >Reporter: Koji Sekiguchi >Assignee: Steve Rowe >Priority: Minor > Attachments: SOLR-11592.patch, SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-11592: - Assignee: Steve Rowe > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Assignee: Steve Rowe >Priority: Minor > Attachments: SOLR-11592.patch, SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328494#comment-16328494 ] Koji Sekiguchi commented on SOLR-11592: --- Looks good to me. :) > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11592.patch, SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-11795) Add Solr metrics exporter for Prometheus
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-11795: - Assignee: Koji Sekiguchi > Add Solr metrics exporter for Prometheus > > > Key: SOLR-11795 > URL: https://issues.apache.org/jira/browse/SOLR-11795 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 7.2 >Reporter: Minoru Osuka >Assignee: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11795.patch, solr-dashboard.png, > solr-exporter-diagram.png > > > I 'd like to monitor Solr using Prometheus and Grafana. > I've already created Solr metrics exporter for Prometheus. I'd like to > contribute to contrib directory if you don't mind. > !solr-exporter-diagram.png|thumbnail! > !solr-dashboard.png|thumbnail! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus to contrib directory
[ https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304127#comment-16304127 ] Koji Sekiguchi commented on SOLR-11795: --- +1 looks nice! > Add Solr metrics exporter for Prometheus to contrib directory > - > > Key: SOLR-11795 > URL: https://issues.apache.org/jira/browse/SOLR-11795 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Minoru Osuka >Priority: Minor > Fix For: master (8.0) > > Attachments: solr-dashboard.png, solr-exporter-diagram.png > > > I 'd like to monitor Solr using Prometheus and Grafana. > I've already created Solr metrics exporter for Prometheus. I'd like to > contribute to contrib directory if you don't mind. > !solr-exporter-diagram.png|thumbnail! > !solr-dashboard.png|thumbnail! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243314#comment-16243314 ] Koji Sekiguchi commented on SOLR-11592: --- Hi Steve, Thank you for reviewing the patch. You're right! I'll do them later, after finishing my project. Or, if you or someone else can implement this, please take it over. I think I can review it. :) > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233857#comment-16233857 ] Koji Sekiguchi edited comment on SOLR-11592 at 11/2/17 12:55 AM: - OpenNLP's model covers 103 languages. https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3/leipzig/resources/README.txt was (Author: koji): OpenNLP's model covers 103 languages. > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233857#comment-16233857 ] Koji Sekiguchi commented on SOLR-11592: --- OpenNLP's model covers 103 languages. > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11592) add another language detector using OpenNLP
[ https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-11592: -- Attachment: SOLR-11592.patch Patch attached. It doesn't have any tests yet. > add another language detector using OpenNLP > --- > > Key: SOLR-11592 > URL: https://issues.apache.org/jira/browse/SOLR-11592 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: contrib - LangId >Affects Versions: 7.1 >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-11592.patch > > > We already have two language detectors, lang-detect and Tika's lang detect. > This is a ticket that gives users third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11592) add another language detector using OpenNLP
Koji Sekiguchi created SOLR-11592: - Summary: add another language detector using OpenNLP Key: SOLR-11592 URL: https://issues.apache.org/jira/browse/SOLR-11592 Project: Solr Issue Type: New Feature Security Level: Public (Default Security Level. Issues are Public) Components: contrib - LangId Affects Versions: 7.1 Reporter: Koji Sekiguchi Priority: Minor We already have two language detectors, lang-detect and Tika's lang detect. This is a ticket that gives users a third option using OpenNLP. :) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-9184) Add convenience method to ModifiableSolrParams
[ https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-9184. -- Resolution: Fixed Fix Version/s: 6.6 Thanks, Jörg! > Add convenience method to ModifiableSolrParams > -- > > Key: SOLR-9184 > URL: https://issues.apache.org/jira/browse/SOLR-9184 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Jörg Rathlev >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 6.6 > > Attachments: SOLR-9184.patch, SOLR-9184.patch > > > Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which > returns the same instance if it already is modifiable, otherwise creates a > new {{ModifiableSolrParams}} instance. > Rationale: when writing custom SearchComponents, we find that we often need > to ensure that the SolrParams are modifiable. The copy constructor of > ModifiableSolrParams always creates a copy, even if the SolrParms already are > modifiable. > Alternatives: The method could also be added as a convenience method in > SolrParams itself, which already has static helper methods for wrapDefaults > and wrapAppended. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-9184) Add convenience method to ModifiableSolrParams
[ https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-9184: Assignee: Koji Sekiguchi > Add convenience method to ModifiableSolrParams > -- > > Key: SOLR-9184 > URL: https://issues.apache.org/jira/browse/SOLR-9184 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Jörg Rathlev >Assignee: Koji Sekiguchi >Priority: Minor > Attachments: SOLR-9184.patch > > > Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which > returns the same instance if it already is modifiable, otherwise creates a > new {{ModifiableSolrParams}} instance. > Rationale: when writing custom SearchComponents, we find that we often need > to ensure that the SolrParams are modifiable. The copy constructor of > ModifiableSolrParams always creates a copy, even if the SolrParms already are > modifiable. > Alternatives: The method could also be added as a convenience method in > SolrParams itself, which already has static helper methods for wrapDefaults > and wrapAppended. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9184) Add convenience method to ModifiableSolrParams
[ https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935705#comment-15935705 ] Koji Sekiguchi commented on SOLR-9184: -- I think this is almost ready. How about adding assertNotSame for the first test? > Add convenience method to ModifiableSolrParams > -- > > Key: SOLR-9184 > URL: https://issues.apache.org/jira/browse/SOLR-9184 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Reporter: Jörg Rathlev >Priority: Minor > Attachments: SOLR-9184.patch > > > Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which > returns the same instance if it already is modifiable, otherwise creates a > new {{ModifiableSolrParams}} instance. > Rationale: when writing custom SearchComponents, we find that we often need > to ensure that the SolrParams are modifiable. The copy constructor of > ModifiableSolrParams always creates a copy, even if the SolrParms already are > modifiable. > Alternatives: The method could also be added as a convenience method in > SolrParams itself, which already has static helper methods for wrapDefaults > and wrapAppended. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
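The `of(SolrParams)` semantics under discussion — hand back the same instance when it is already modifiable, copy otherwise — can be sketched with stand-in classes. These are not the real SolrJ types; the names here are illustrative only, and the suggested assertNotSame would distinguish the copy-constructor path (always copies) from `of` (no needless copy):

```java
import java.util.HashMap;
import java.util.Map;

public class OfDemo {

    /** Stand-in for the read-only SolrParams base type. */
    static class Params {
        protected final Map<String, String> map = new HashMap<>();
        String get(String name) { return map.get(name); }
    }

    /** Stand-in for ModifiableSolrParams. */
    static class ModifiableParams extends Params {
        ModifiableParams() {}
        ModifiableParams(Params other) { map.putAll(other.map); } // copy constructor: always copies
        void set(String name, String value) { map.put(name, value); }

        /** The proposed convenience method: no copy when already modifiable. */
        static ModifiableParams of(Params params) {
            return (params instanceof ModifiableParams)
                    ? (ModifiableParams) params
                    : new ModifiableParams(params);
        }
    }

    public static void main(String[] args) {
        ModifiableParams m = new ModifiableParams();
        System.out.println(ModifiableParams.of(m) == m); // same instance back
        Params p = new Params();
        System.out.println(ModifiableParams.of(p) == p); // a copy was made
    }
}
```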
[jira] [Closed] (SOLR-2867) Problem Wtih solr Score Display
[ https://issues.apache.org/jira/browse/SOLR-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi closed SOLR-2867. Resolution: Invalid Please ask about your problem in the solr-user mailing list. > Problem Wtih solr Score Display > --- > > Key: SOLR-2867 > URL: https://issues.apache.org/jira/browse/SOLR-2867 > Project: Solr > Issue Type: Bug > Components: SearchComponents - other >Affects Versions: 3.1 > Environment: Linux and Mysql >Reporter: Pragyanjeet Rout > Original Estimate: 24h > Remaining Estimate: 24h > > We are firing a solr query and checking its relevancy score. > But problem with relevancy score is that for some results the value for score > is been truncated. > Example:-I have a query as below > http://localhost:8983/solr/mywork/select/?q=( contractLength:12 speedScore:[4 > TO 7] dataScore:[2 TO *])&fq=( ( connectionType:"Cable" > connectionType:"Naked")AND ( monthlyCost:[* TO *])AND ( speedScore:[4 TO > *])AND ( dataScore:[2 TO > *]))&version=2.2&start=0&rows=500&indent=on&sort=score desc, planType asc, > monthlyCost1 asc, monthlyCost2 asc > The below mentioned is my xml returned from solr :- > > 3.6897283 > 12 > 3 > ABC > 120.9 > 7 > > > 3.689728 > 12 > 2 > DEF > 49.95 > 6 > > I have used the "debugQuery=true" in query and I saw solr is calculating the > correct score(PSB) but somehow is it truncating the lastdigit i.e "3" from > the second result. 
> Because of this my ranking order gets disturbed and I get wrong results while > displaying > > 3.6897283 = (MATCH) sum of:3.1476827 = (MATCH) weight(contractLength:12 > in 51), product of:0.92363054 = queryWeight(contractLength:12), product > of:3.4079456 = idf(docFreq=8, maxDocs=100) 0.27102268 = queryNorm 3.4079456 > = (MATCH) fieldWeight(contractLength:12 in 51), product of:1.0 = > tf(termFreq(contractLength:12)=1) 3.4079456 = idf(docFreq=8, > maxDocs=100) > 1.0 = fieldNorm(field=contractLength, doc=51) 0.27102268 = (MATCH) > ConstantScore(speedScore:[4 TO *]), product of: > 1.0 = boost 0.27102268 = queryNorm 0.27102268 = (MATCH) > ConstantScore(dataScore:[2 TO *]), product of: 1.0 = boost 0.27102268 > = queryNorm > > > 3.6897283 = (MATCH) sum of: 3.1476827 = (MATCH) > weight(contractLength:12 in 97), product of: 0.92363054 = > queryWeight(contractLength:12), product of: 3.4079456 = idf(docFreq=8, > maxDocs=100) 0.27102268 = queryNorm 3.4079456 = (MATCH) > fieldWeight(contractLength:12 in 97), product of: 1.0 = > tf(termFreq(contractLength:12)=1) 3.4079456 = idf(docFreq=8, > maxDocs=100) 1.0 = fieldNorm(field=contractLength, doc=97) 0.27102268 = > (MATCH) ConstantScore(speedScore:[4 TO *]), product of: 1.0 = boost > 0.27102268 = queryNorm 0.27102268 = (MATCH) > ConstantScore(dataScore:[2 TO *]), product of:1.0 = boost > 0.27102268 = queryNorm > > Please educate me for the above behaviour from solr. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-9918. -- Resolution: Fixed Fix Version/s: 6.4 master (7.0) Thanks, Tim! > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen >Assignee: Koji Sekiguchi > Fix For: master (7.0), 6.4 > > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. > * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). 
> I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. Some bits of the code were borrowed from > {{DocBasedVersionConstraintsProcessorFactory}}. > Attached patch has unit tests to confirm the behaviour. > This class can be used by configuring solrconfig.xml like so.. > {noformat} > > > class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory"> > true > false > > > > > {noformat} > and initParams defaults of > {noformat} > skipexisting > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813849#comment-15813849 ] Koji Sekiguchi commented on SOLR-9918: -- I think this is ready. > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen >Assignee: Koji Sekiguchi > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. > * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). > I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. 
Some bits of the code were borrowed from > {{DocBasedVersionConstraintsProcessorFactory}}. > Attached patch has unit tests to confirm the behaviour. > This class can be used by configuring solrconfig.xml like so.. > {noformat} > > > class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory"> > true > false > > > > > {noformat} > and initParams defaults of > {noformat} > skipexisting > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806603#comment-15806603 ] Koji Sekiguchi commented on SOLR-9918: -- Thank you for giving the great explanation which is more than I expected. :) > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen >Assignee: Koji Sekiguchi > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. > * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). 
> I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. Some bits of the code were borrowed from > {{DocBasedVersionConstraintsProcessorFactory}}. > Attached patch has unit tests to confirm the behaviour. > This class can be used by configuring solrconfig.xml like so.. > {noformat} > > > class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory"> > true > false > > > > > {noformat} > and initParams defaults of > {noformat} > skipexisting > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-9918: Assignee: Koji Sekiguchi > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen >Assignee: Koji Sekiguchi > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. > * When applying AtomicUpdate documents, if a document being updated does not > exist, quietly do nothing - do not create a new partially-populated document > and do not throw a noisy exception about missing required fields. By analogy > with SQL, {{update where id = ..}}. Our use-case relies on this because we > apply updates optimistically and have best-effort knowledge about what > documents will exist, so it's easiest to skip the updates (in the same way a > Database would). > I would have kept this in our own package hierarchy but it relies on some > package-scoped methods, and seems like it could be useful to others if they > choose to configure it. 
Some bits of the code were borrowed from > {{DocBasedVersionConstraintsProcessorFactory}}. > Attached patch has unit tests to confirm the behaviour. > This class can be used by configuring solrconfig.xml like so.. > {noformat} > > > class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory"> > true > false > > > > > {noformat} > and initParams defaults of > {noformat} > skipexisting > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803344#comment-15803344 ] Koji Sekiguchi commented on SOLR-9918: -- Thank you for your additional explanation. I agree with you that the Confluence page is the best place to put that kind of guideline note. I just wanted to see such information in the ticket, not in the javadoc, because I think it helps committers understand the requirement and importance of this proposal. As for SignatureUpdateProcessor, I thought it skipped adding the doc if the signature is the same, but when I looked into the patch on SOLR-799, I noticed that it always updates the existing document even if the doc has the same signature. > An UpdateRequestProcessor to skip duplicate inserts and ignore updates to > missing docs > -- > > Key: SOLR-9918 > URL: https://issues.apache.org/jira/browse/SOLR-9918 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: update >Reporter: Tim Owen > Attachments: SOLR-9918.patch, SOLR-9918.patch > > > This is an UpdateRequestProcessor and Factory that we have been using in > production, to handle 2 common cases that were awkward to achieve using the > existing update pipeline and current processor classes: > * When inserting document(s), if some already exist then quietly skip the new > document inserts - do not churn the index by replacing the existing documents > and do not throw a noisy exception that breaks the batch of inserts. By > analogy with SQL, {{insert if not exists}}. In our use-case, multiple > application instances can (rarely) process the same input so it's easier for > us to de-dupe these at Solr insert time than to funnel them into a global > ordered queue first. 
> * When applying AtomicUpdate documents, if a document being updated does not exist, quietly do nothing - do not create a new partially-populated document and do not throw a noisy exception about missing required fields. By analogy with SQL, {{update where id = ..}}. Our use-case relies on this because we apply updates optimistically and have best-effort knowledge about what documents will exist, so it's easiest to skip the updates (in the same way a database would).
> I would have kept this in our own package hierarchy but it relies on some package-scoped methods, and seems like it could be useful to others if they choose to configure it. Some bits of the code were borrowed from {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so:
> {noformat}
> <updateRequestProcessorChain name="skipexisting">
>   <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>     <bool name="skipInsertIfExists">true</bool>
>     <bool name="skipUpdateIfMissing">false</bool>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
> <str name="update.chain">skipexisting</str>
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
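The two behaviours described in this issue reduce to one small decision per add command. Below is a minimal sketch of that decision logic only, not the actual factory code; the flag names mirror the true/false options in the configuration, and the in-memory map stands in for the index lookup the real processor performs.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the two skip rules from SOLR-9918; an assumption-laden
// sketch, not the SkipExistingDocumentsProcessorFactory implementation.
class SkipExistingSketch {
    public boolean skipInsertIfExists = true;
    public boolean skipUpdateIfMissing = true;
    public final Map<String, String> index = new HashMap<>(); // uniqueKey -> doc

    /** Returns true if the add command should continue down the chain. */
    public boolean shouldProcessAdd(String id, boolean isAtomicUpdate) {
        boolean exists = index.containsKey(id);
        if (!isAtomicUpdate && exists && skipInsertIfExists) {
            return false; // "insert if not exists": quietly skip the duplicate
        }
        if (isAtomicUpdate && !exists && skipUpdateIfMissing) {
            return false; // "update where id = ..": ignore update to missing doc
        }
        return true;
    }
}
```

Note how the two rules are independent, which matches the description: either one can be switched off without affecting the other.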
[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
[ https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796928#comment-15796928 ]

Koji Sekiguchi commented on SOLR-9918:
--

I believe the proposal is very useful for users who need this function, but it would be better for users if there were an additional explanation of how it differs from existing processors that provide similar functionality. For example, compared to SignatureUpdateProcessor, how do users decide which UpdateRequestProcessor to use for their use cases?

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: update
> Reporter: Tim Owen
> Attachments: SOLR-9918.patch
>
> This is an UpdateRequestProcessor and Factory that we have been using in production, to handle 2 common cases that were awkward to achieve using the existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new document inserts - do not churn the index by replacing the existing documents and do not throw a noisy exception that breaks the batch of inserts. By analogy with SQL, {{insert if not exists}}. In our use-case, multiple application instances can (rarely) process the same input so it's easier for us to de-dupe these at Solr insert time than to funnel them into a global ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not exist, quietly do nothing - do not create a new partially-populated document and do not throw a noisy exception about missing required fields. By analogy with SQL, {{update where id = ..}}.
> Our use-case relies on this because we apply updates optimistically and have best-effort knowledge about what documents will exist, so it's easiest to skip the updates (in the same way a database would).
> I would have kept this in our own package hierarchy but it relies on some package-scoped methods, and seems like it could be useful to others if they choose to configure it. Some bits of the code were borrowed from {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so:
> {noformat}
> <updateRequestProcessorChain name="skipexisting">
>   <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>     <bool name="skipInsertIfExists">true</bool>
>     <bool name="skipUpdateIfMissing">false</bool>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.DistributedUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
> <str name="update.chain">skipexisting</str>
> {noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7321) Character Mapping
[ https://issues.apache.org/jira/browse/LUCENE-7321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321858#comment-15321858 ]

Koji Sekiguchi commented on LUCENE-7321:
--

What is the advantage of this compared to MappingCharFilter?

> Character Mapping
> --
>
> Key: LUCENE-7321
> URL: https://issues.apache.org/jira/browse/LUCENE-7321
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Affects Versions: 4.6.1, 6.0, 5.4.1, 6.0.1
> Reporter: Ivan Provalov
> Priority: Minor
> Labels: patch
> Fix For: 6.0.1
>
> Attachments: CharacterMappingComponent.pdf, LUCENE-7321.patch
>
> One of the challenges in search is recall of an item with a common typing variant. These cases can be as simple as lower/upper case in most languages or accented characters, or more complex morphological phenomena like prefix omission or constructing a character with some combining mark. This component addresses the cases which are not covered by the ASCII folding component or are too complex to design with other tools. The idea is that a linguist could provide the mappings in a tab-delimited file, which can then be used directly by Solr.
> The mappings are maintained in the tab-delimited file, which could be just a copy-paste from an Excel spreadsheet. This gives linguists the opportunity to create the mappings, and developers can then include them in the Solr configuration. There are a few cases, when the mappings grow complex, where some additional debugging may be required. The mappings can map any sequence of characters to any other sequence of characters.
> Some of the cases I discuss in the detailed document are handling the voiced vowels for Japanese; common typing substitutions for Korean, Russian, and Polish; transliteration for Polish and Arabic; prefix removal for Arabic; and suffix folding for Japanese. In the appendix, I give an example of implementing a Russian lightweight stemmer using this component.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
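The tab-delimited mapping approach described in LUCENE-7321 can be pictured as a longest-match-first string rewriter. The sketch below illustrates only the mapping semantics under the assumption of two columns (source, target) per line; it is not the attached component, which integrates with the analysis chain.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: load tab-delimited source->target pairs and apply them
// longest-match-first. Illustration only, not the LUCENE-7321 patch code.
class CharMappingSketch {
    private final List<String[]> mappings = new ArrayList<>(); // {source, target}

    CharMappingSketch(String tabDelimited) {
        for (String line : tabDelimited.split("\n")) {
            String[] cols = line.split("\t");
            if (cols.length == 2) mappings.add(cols);
        }
        // try longer sources first so multi-char mappings win over prefixes
        mappings.sort((a, b) -> b[0].length() - a[0].length());
    }

    String map(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        next:
        while (i < input.length()) {
            for (String[] m : mappings) {
                if (input.startsWith(m[0], i)) {
                    out.append(m[1]);       // rewrite the matched source
                    i += m[0].length();
                    continue next;
                }
            }
            out.append(input.charAt(i++));  // no mapping: copy as-is
        }
        return out.toString();
    }
}
```

With a file containing "ß→ss" and "é→e", "straße café" folds to "strasse cafe", the kind of typing-variant recall the issue describes.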
[jira] [Commented] (LUCENE-6837) Add N-best output capability to JapaneseTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954483#comment-14954483 ]

Koji Sekiguchi commented on LUCENE-6837:
--

We have our own morphological analyzer with n-best output. If nobody takes this, I'll assign it to myself. :)

> Add N-best output capability to JapaneseTokenizer
> --
>
> Key: LUCENE-6837
> URL: https://issues.apache.org/jira/browse/LUCENE-6837
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Affects Versions: 5.3
> Reporter: KONNO, Hiroharu
> Priority: Minor
> Attachments: LUCENE-6837.patch
>
> Japanese morphological analyzers often generate mis-segmented tokens. N-best output reduces the impact of mis-segmentation on search results. N-best output is more meaningful than character N-grams, and it increases the hit count too.
> If you use N-best output, you can get decompounded tokens (ex: "シニアソフトウェアエンジニア" => {"シニア", "シニアソフトウェアエンジニア", "ソフトウェア", "エンジニア"}) and overlapping tokens (ex: "数学部長谷川" => {"数学", "部", "部長", "長谷川", "谷川"}), depending on the dictionary and N-best parameter settings.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
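One way to picture the overlapping tokens in the example above: tokens from alternative segmentations that start at the same character offset share a position (posInc = 0). The sketch below shows only that merging rule; it is an assumption about how such a stream lines up, not the JapaneseTokenizer implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: merge n-best segmentations into one token stream where tokens
// sharing a start offset share a position. Not the actual tokenizer code.
class NBestMergeSketch {
    /** Each entry is (token text, start offset); returns "text@position". */
    public static List<String> merge(List<Map.Entry<String, Integer>> tokens) {
        tokens.sort(Map.Entry.comparingByValue()); // stable: ties keep order
        List<String> out = new ArrayList<>();
        int lastStart = -1, pos = -1;
        for (Map.Entry<String, Integer> t : tokens) {
            int start = t.getValue();
            if (start != lastStart) { pos++; lastStart = start; }
            out.add(t.getKey() + "@" + pos);
        }
        return out;
    }
}
```

For the "数学部長谷川" example, 部 and 部長 both start at offset 2 and so land at the same position, which is what makes phrase and proximity matching still work over the overlapping alternatives.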
[jira] [Commented] (SOLR-7488) suspicious FVH init code in DefaultSolrHighlighter even when FVH should not be used
[ https://issues.apache.org/jira/browse/SOLR-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520776#comment-14520776 ]

Koji Sekiguchi commented on SOLR-7488:
--

Thanks David and Hoss!

> suspicious FVH init code in DefaultSolrHighlighter even when FVH should not be used
> --
>
> Key: SOLR-7488
> URL: https://issues.apache.org/jira/browse/SOLR-7488
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10
> Reporter: Hoss Man
> Assignee: David Smiley
> Fix For: Trunk, 5.2
>
> Rich Hume reported getting errors from FastVectorHighlighter, evidently while using the surround query parser, even though he was not trying to "useFastVectorHighlighter".
> My naive reading of the code leads me to believe that DefaultSolrHighlighter is incorrectly attempting to initialize an FVH instance even when it shouldn't be - which appears to cause failures in cases where the query in use is not something that can be handled by the FVH.
> Not sure how to reproduce at the moment - but the code smells fishy.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3055) Use NGramPhraseQuery in Solr
[ https://issues.apache.org/jira/browse/SOLR-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259553#comment-14259553 ]

Koji Sekiguchi commented on SOLR-3055:
--

Hi Uchida-san, thank you for your effort in reworking this issue! Based on your observations (pros and cons), I'd like to go with the 1st strategy. And if you agree, why don't you add test cases for that one?

Also, don't we need to consider other n-gram-type Tokenizers and even TokenFilters, such as NGramTokenFilter and CJKBigramFilter?

And I think there is a restriction when minGramSize != maxGramSize. If it's not significant, I think we can examine the restriction separately from this issue, because we rarely set different values for those when searching CJK words. But we use NGramTokenizer with a fixed gram size a lot for searching CJK words, and we could get a nice performance gain from the patch, as you've shown us.

> Use NGramPhraseQuery in Solr
> --
>
> Key: SOLR-3055
> URL: https://issues.apache.org/jira/browse/SOLR-3055
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis, search
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: SOLR-3055-1.patch, SOLR-3055-2.patch, SOLR-3055.patch, schema.xml, solrconfig.xml
>
> Solr should use NGramPhraseQuery when searching with default slop on an n-gram field.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
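The performance gain with a fixed gram size comes from dropping redundant terms from the phrase query: with 2-grams, matching AB at position 0 and CD at position 2 already implies BC at position 1. The sketch below shows only that position-selection rule (my reading of the optimization behind NGramPhraseQuery, not Lucene's code).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the NGramPhraseQuery idea for fixed gram size n: consecutive
// n-grams overlap by n-1 chars, so it suffices to keep every n-th gram
// plus the last one, each at its original position.
class NGramPhraseSketch {
    /** Positions of the grams that must stay in the phrase query. */
    public static List<Integer> keptPositions(int numGrams, int n) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < numGrams; i++) {
            if (i % n == 0) kept.add(i);
        }
        if (numGrams > 0 && (numGrams - 1) % n != 0) {
            kept.add(numGrams - 1); // always keep the tail gram
        }
        return kept;
    }
}
```

For a query "ABCDE" tokenized into AB/BC/CD/DE, only AB@0, CD@2 and DE@3 need to be matched, so the phrase query touches fewer postings lists.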
[jira] [Commented] (SOLR-6876) Remove unused legacy scripts.conf
[ https://issues.apache.org/jira/browse/SOLR-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256358#comment-14256358 ]

Koji Sekiguchi commented on SOLR-6876:
--

I think the scripts.conf file is not for DIH; it is for the replication scripts. solr/scripts/scripts-util includes scripts.conf, and scripts-util is included from many scripts in the solr/scripts directory. I don't know whether any Solr users in the world still use shell-script-based replication, except me.

> Remove unused legacy scripts.conf
> --
>
> Key: SOLR-6876
> URL: https://issues.apache.org/jira/browse/SOLR-6876
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10.2, 5.0, Trunk
> Reporter: Alexandre Rafalovitch
> Assignee: Erick Erickson
> Priority: Minor
>
> Some of the example collections include *scripts.conf* in the *conf* directory. It is not used by anything in the distribution and is somehow left over from the Solr 1.x legacy days.
> It should be possible to safely delete it to avoid confusing users trying to understand what the different files actually do.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3055) Use NGramPhraseQuery in Solr
[ https://issues.apache.org/jira/browse/SOLR-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255811#comment-14255811 ]

Koji Sekiguchi commented on SOLR-3055:
--

Thank you for paying attention to this ticket! It sounds good to me that you are starting this in Lucene.

> Use NGramPhraseQuery in Solr
> --
>
> Key: SOLR-3055
> URL: https://issues.apache.org/jira/browse/SOLR-3055
> Project: Solr
> Issue Type: New Feature
> Components: Schema and Analysis, search
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: SOLR-3055.patch
>
> Solr should use NGramPhraseQuery when searching with default slop on an n-gram field.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6112) Compile error with FST package example code
[ https://issues.apache.org/jira/browse/LUCENE-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved LUCENE-6112.
--
Resolution: Fixed
Fix Version/s: Trunk
               5.0

Thanks, Uchida-san!

> Compile error with FST package example code
> --
>
> Key: LUCENE-6112
> URL: https://issues.apache.org/jira/browse/LUCENE-6112
> Project: Lucene - Core
> Issue Type: Task
> Components: core/FSTs
> Affects Versions: 4.10.2
> Reporter: Tomoko Uchida
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 5.0, Trunk
>
> Attachments: LUCENE-6112.patch
>
> I ran the FST construction example given in package.html with Lucene 4.10, and found a compile error.
> http://lucene.apache.org/core/4_10_2/core/index.html?org/apache/lucene/util/fst/package-summary.html
> javac complained as below. "FSTTest" is my test class, just copied from the javadoc's example.
> {code}
> $ javac -cp /opt/lucene-4.10.2/core/lucene-core-4.10.2.jar FSTTest.java
> FSTTest.java:28: error: method toIntsRef in class Util cannot be applied to given types;
>     builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
>                     ^
>   required: BytesRef,IntsRefBuilder
>   found: BytesRef,IntsRef
>   reason: actual argument IntsRef cannot be converted to IntsRefBuilder by method invocation conversion
> Note: FSTTest.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 1 error
> {code}
> I modified the scratchInts variable type from IntsRef to IntsRefBuilder, and it worked fine. (I checked the o.a.l.u.fst.TestFSTs.java TestCase and my modification seems to be correct.)
> Util.toIntsRef() has taken IntsRefBuilder as its 2nd argument instead of IntsRef since 4.10, so the javadocs should also be fixed.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-6112) Compile error with FST package example code
[ https://issues.apache.org/jira/browse/LUCENE-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned LUCENE-6112: -- Assignee: Koji Sekiguchi > Compile error with FST package example code > --- > > Key: LUCENE-6112 > URL: https://issues.apache.org/jira/browse/LUCENE-6112 > Project: Lucene - Core > Issue Type: Task > Components: core/FSTs >Affects Versions: 4.10.2 >Reporter: Tomoko Uchida >Assignee: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-6112.patch > > > I run the FST construction example guided package.html with lucene 4.10, and > found a compile error. > http://lucene.apache.org/core/4_10_2/core/index.html?org/apache/lucene/util/fst/package-summary.html > javac claimed as below. > "FSTTest" is my test class, just copied from javadoc's example. > {code} > $ javac -cp /opt/lucene-4.10.2/core/lucene-core-4.10.2.jar FSTTest.java > FSTTest.java:28: error: method toIntsRef in class Util cannot be applied to > given types; > builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]); > ^ > required: BytesRef,IntsRefBuilder > found: BytesRef,IntsRef > reason: actual argument IntsRef cannot be converted to IntsRefBuilder by > method invocation conversion > Note: FSTTest.java uses or overrides a deprecated API. > Note: Recompile with -Xlint:deprecation for details. > 1 error > {code} > I modified scratchInts variable type from IntsRef to IntsRefBuilder, it > worked fine. (I checked o.a.l.u.fst.TestFSTs.java TestCase and my > modification seems to be correct.) > Util.toIntsRef() method takes IntsRefBuilder as 2nd argument instead of > IntsRef since 4.10, so Javadocs also should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5674) A new token filter: SubSequence
[ https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009140#comment-14009140 ]

Koji Sekiguchi commented on LUCENE-5674:
--

bq. Koji: it can't do what I'm trying to do. Have you looked at my description?

Please ignore my comment, Nitzan, as it was just about what Otis described; besides, PathHierarchyTokenizer is a Tokenizer, not a TokenFilter. :)

> A new token filter: SubSequence
> --
>
> Key: LUCENE-5674
> URL: https://issues.apache.org/jira/browse/LUCENE-5674
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/other
> Reporter: Nitzan Shaked
> Priority: Minor
> Attachments: subseqfilter.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> A new configurable token filter which, given a token, breaks it into sub-parts and outputs consecutive sub-sequences of those sub-parts.
> Useful, for example, during indexing to generate variations on domain names, so that "www.google.com" can be found by searching for "google.com" or "www.google.com".
> Parameters:
> sepRegexp: A regular expression used to split incoming tokens into sub-parts.
> glue: A string used to concatenate sub-parts together when creating sub-sequences.
> minLen: Minimum length (in sub-parts) of output sub-sequences.
> maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for unlimited; negative numbers for token length in sub-parts minus specified length).
> anchor: Anchor.START to output only prefixes, or Anchor.END to output only suffixes, or Anchor.NONE to output any sub-sequence.
> withOriginal: whether to also output the original token.
> EDIT: now includes tests for filter and for factory.
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5674) A new token filter: SubSequence
[ https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008838#comment-14008838 ] Koji Sekiguchi commented on LUCENE-5674: {quote} Didn't look at this, but I remember needing/writing something like this 10+ years ago but I think back then I wanted to have output be something like: com, com.google, com.google.www - i.e. tokenized, but reversed order. {quote} PathHierarchyTokenizer can tokenize something like that. > A new token filter: SubSequence > --- > > Key: LUCENE-5674 > URL: https://issues.apache.org/jira/browse/LUCENE-5674 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nitzan Shaked >Priority: Minor > Attachments: subseqfilter.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > A new configurable token filter which, given a token breaks it into sub-parts > and outputs consecutive sub-sequences of those sub-parts. > Useful for, for example, using during indexing to generate variations on > domain names, so that "www.google.com" can be found by searching for > "google.com", or "www.google.com". > Parameters: > sepRegexp: A regular expression used split incoming tokens into sub-parts. > glue: A string used to concatenate sub-parts together when creating > sub-sequences. > minLen: Minimum length (in sub-parts) of output sub-sequences > maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for > unlimited; negative numbers for token length in sub-parts minus specified > length) > anchor: Anchor.START to output only prefixes, or Anchor.END to output only > suffixes, or Anchor.NONE to output any sub-sequence > withOriginal: whether to output also the original token > EDIT: now includes tests for filter and for factory. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
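The parameters listed in the description map naturally onto a small generator. The sketch below illustrates the sub-sequence expansion under those parameters only (anchor and withOriginal omitted); it is not the attached filter code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the sub-sequence generation described in LUCENE-5674
// (sepRegexp, glue, minLen, maxLen). Not the subseqfilter.patch code.
class SubSequenceSketch {
    public static List<String> subSequences(String token, String sepRegexp,
                                            String glue, int minLen, int maxLen) {
        String[] parts = token.split(sepRegexp);
        // 0 = unlimited; negative = number of parts minus the given length
        int max = (maxLen == 0) ? parts.length
                : (maxLen < 0 ? parts.length + maxLen : maxLen);
        List<String> out = new ArrayList<>();
        for (int len = minLen; len <= max && len <= parts.length; len++) {
            for (int start = 0; start + len <= parts.length; start++) {
                out.add(String.join(glue,
                        Arrays.copyOfRange(parts, start, start + len)));
            }
        }
        return out;
    }
}
```

For "www.google.com" with sepRegexp "\\.", glue ".", minLen 2 and maxLen 0, this yields "www.google", "google.com" and "www.google.com", matching the motivating example in the description.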
[jira] [Resolved] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
[ https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-5466. Resolution: Fixed Fix Version/s: 5.0 4.8 > query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier > -- > > Key: LUCENE-5466 > URL: https://issues.apache.org/jira/browse/LUCENE-5466 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 4.8, 5.0 > > Attachments: LUCENE-5466.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
[ https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned LUCENE-5466: -- Assignee: Koji Sekiguchi > query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier > -- > > Key: LUCENE-5466 > URL: https://issues.apache.org/jira/browse/LUCENE-5466 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Attachments: LUCENE-5466.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
[ https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5466: --- Attachment: LUCENE-5466.patch I think query must be set before calling countDocsWithClass() in train() method. > query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier > -- > > Key: LUCENE-5466 > URL: https://issues.apache.org/jira/browse/LUCENE-5466 > Project: Lucene - Core > Issue Type: Bug > Components: modules/classification >Reporter: Koji Sekiguchi >Priority: Trivial > Attachments: LUCENE-5466.patch > > -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
Koji Sekiguchi created LUCENE-5466: -- Summary: query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier Key: LUCENE-5466 URL: https://issues.apache.org/jira/browse/LUCENE-5466 Project: Lucene - Core Issue Type: Bug Components: modules/classification Reporter: Koji Sekiguchi Priority: Trivial -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5365) Bad version of common-compress
[ https://issues.apache.org/jira/browse/SOLR-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799122#comment-13799122 ] Koji Sekiguchi commented on SOLR-5365: -- Input from Guido Medina in solr ML: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C52612E0F.2080302%40temetra.com%3E {quote} Dont, commons compress 1.5 is broken, either use 1.4.1 or later. Our app stopped compressing properly for a maven update. Guido. {quote} > Bad version of common-compress > -- > > Key: SOLR-5365 > URL: https://issues.apache.org/jira/browse/SOLR-5365 > Project: Solr > Issue Type: Bug > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 4.4, 4.5 > Environment: MS Windows 2008 Release 2 >Reporter: Roland Everaert > > When a WMZ file is sent to solr on resource /update/extract, the following > exception is thrown by solr: > ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; > null:java.lang.RuntimeException: java.lang.NoSuchMethodError: > org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V > at > org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) > at > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) > at > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) > at > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) > at > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) > at > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) > at > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) > at > 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) > at > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) > at > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) > at > org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023) > at > org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) > at > org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > Caused by: java.lang.NoSuchMethodError: > org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V > at > org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102) > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) > at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) > at > org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) > at > org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) > at > org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) > at > org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) > at > org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) > at > org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) > ... 
16 more
> According to Koji Sekiguchi, Tika 1.4, the version bundled with Solr, should use commons-compress 1.5, but version 1.4.1 is present in the solr/contrib/extraction/lib/ directory.
> During our testing, the ignoreTikaException flag was set to true.
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-5252:
---
Attachment: LUCENE-5252_4x.patch

Oops, this patch replaces the previously funny-named one. Sorry for the noise.

> add NGramSynonymTokenizer
> --
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
> I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with NGramTokenizer.
> For purposes of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) on how we assign offsets to the generated synonym tokens DE, EF and FG when expanding source tokens AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is a document "…ABCY…" in the index when autoGeneratePhraseQueries is set to true, because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems in the following ways.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and doesn't tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
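The right-hand column of the table above (leaving the DEFG synonym expansion aside) can be reproduced with a simple rule: keep registered words whole, 2-gram the plain runs between them, and emit shorter "extra" grams at the edges touching a registered word. Below is a rough sketch of that rule only, not the code from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Rough sketch of the NGramSynonymTokenizer segmentation shown in the
// table (synonym expansion omitted). Not the LUCENE-5252 patch code.
class NGramSynonymSketch {
    public static List<String> tokenize(String text, Set<String> synonyms, int n) {
        List<String> out = new ArrayList<>();
        int segStart = 0, i = 0;
        boolean afterSyn = false;
        while (i < text.length()) {
            String matched = null;
            for (String syn : synonyms) {
                if (text.startsWith(syn, i)) { matched = syn; break; }
            }
            if (matched == null) { i++; continue; }
            emitSegment(out, text.substring(segStart, i), n, afterSyn, true);
            out.add(matched);            // registered word stays whole
            i += matched.length();
            segStart = i;
            afterSyn = true;
        }
        emitSegment(out, text.substring(segStart), n, afterSyn, false);
        return out;
    }

    // Emits n-grams of a plain segment, plus the shorter "extra" grams at
    // the edge that touches a registered word ("Z" and "1" in the table).
    private static void emitSegment(List<String> out, String seg, int n,
                                    boolean afterSyn, boolean beforeSyn) {
        if (seg.isEmpty()) return;
        if (seg.length() < n) { out.add(seg); return; }
        if (afterSyn)
            for (int len = 1; len < n; len++) out.add(seg.substring(0, len));
        for (int s = 0; s + n <= seg.length(); s++) out.add(seg.substring(s, s + n));
        if (beforeSyn)
            for (int len = n - 1; len >= 1; len--)
                out.add(seg.substring(seg.length() - len));
    }
}
```

For "XYZABC123" with registered word "ABC" and n = 2, this yields XY/YZ/Z/ABC/1/12/23, i.e. the table's right-hand column minus the DEFG expansion.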
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-5252:
---
Attachment: (was: LUCENE-5252_b4.patch)

> add NGramSynonymTokenizer
> --
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Koji Sekiguchi
> Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
> I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with NGramTokenizer.
> For purposes of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) on how we assign offsets to the generated synonym tokens DE, EF and FG when expanding source tokens AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is a document "…ABCY…" in the index when autoGeneratePhraseQueries is set to true, because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems in the following ways.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and doesn't tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
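The token streams described above can be sketched with a toy tokenizer in Python. This is an illustration of the behavior in the issue description, not the Lucene implementation; the segmentation rules (plain bigrams for unregistered text, registered words and their synonyms kept whole, and short posInc=0 "extra" tokens at the synonym boundaries) are inferred from the examples in the ticket.

```python
def ngrams(s, n=2):
    # plain fixed-size n-grams; a segment shorter than n is emitted as-is
    if len(s) < n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def ngram_synonym_tokenize(text, synonyms, n=2):
    """Toy sketch of the NGramSynonymTokenizer output described in the issue.

    `synonyms` maps a registered word to its expansions, e.g.
    {"ABC": ["DEFG"]} for the "ABC, DEFG" w/ expand=true setting.
    """
    out, i, after_syn = [], 0, False
    while i < len(text):
        hits = [(text.find(w, i), w) for w in synonyms]
        hits = [(p, w) for p, w in hits if p != -1]
        hit = min(hits) if hits else None
        end = hit[0] if hit else len(text)
        seg = text[i:end]
        if seg:
            if after_syn and len(seg) >= n:
                out.append(seg[:n - 1])     # "extra" token (posInc=0) after a registered word
            out.extend(ngrams(seg, n))
            if hit and len(seg) >= n:
                out.append(seg[-(n - 1):])  # "extra" token (posInc=0) before a registered word
        if not hit:
            break
        out.append(hit[1])                  # registered word, not tokenized
        out.extend(synonyms[hit[1]])        # its synonyms, also not tokenized
        i = hit[0] + len(hit[1])
        after_syn = True
    return out

print(ngram_synonym_tokenize("XYZABC123", {"ABC": ["DEFG"]}))
# -> ['XY', 'YZ', 'Z', 'ABC', 'DEFG', '1', '12', '23'], matching the table's
# XY/YZ/Z/ABC/DEFG/1/12/23, where "Z" and "1" are the extra tokens
```

Running the sketch on the "ABC" row of the table likewise yields ABC/DEFG, since the registered word and its synonym are emitted whole instead of being broken into bigrams.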
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_b4.patch New patch. Since I have given up on supporting one-way synonyms in NGramSynonymTokenizer for now, I removed the indexMode parameter in this patch. > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, > LUCENE-5252_4x.patch, LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like ABCY, it cannot be matched even if there is > a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, > because there is no "CY" token (but "GY" is there) in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. 
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch Fix code regarding one-way synonym (aaa=>bbb). > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, > LUCENE-5252_4x.patch, LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like ABCY, it cannot be matched even if there is > a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, > because there is no "CY" token (but "GY" is there) in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch Fix a bug regarding ignoreCase in the attached patch. > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, > LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like ABCY, it cannot be matched even if there is > a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, > because there is no "CY" token (but "GY" is there) in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Description: I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram). # There is no consensus (I think :-) how we assign offsets to generated synonym tokens DE, EF and FG when expanding source token AB and BC. # If the query pattern looks like ABCY, it cannot be matched even if there is a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, because there is no "CY" token (but "GY" is there) in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * The back and forth of the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, "Z" and "1" are the extra tokens. was: I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram). 
# There is no consensus (I think :-) how we assign offsets to generated synonym tokens DE, EF and FG when expanding source token AB and BC. # If the query pattern looks like XABC or ABCY, it cannot be matched even if there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to true, because there is no "XA" or "CY" tokens in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * The back and forth of the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, "Z" and "1" are the extra tokens. > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. 
> # If the query pattern looks like ABCY, it cannot be matched even if there is > a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, > because there is no "CY" token (but "GY" is there) in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch New patch that has tests. Because the original tests were developed at RONDHUIT and cover not only NGramSynonymTokenizer but also the synonym dictionary, the attached tests may be redundant with respect to SynonymMap. > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like XABC or ABCY, it cannot be matched even if > there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to > true, because there is no "XA" or "CY" tokens in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. 
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783754#comment-13783754 ] Koji Sekiguchi edited comment on LUCENE-5252 at 10/2/13 9:17 AM: - The draft patch without tests. As I don't have Java 7 environment for now, the patch is based on 4x branch. When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie for the synonym dictionary. I've tried to convert the code to Lucene's FST. As this is the first experience of FST for me, any inefficient code may exist. Comments are welcome! was (Author: koji): The draft patch without tests. As I don't have Java 7 environment for now, the patch is based on 4x branch. When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie for the synonym dictionary. I've tried to convert the code to Lucene's FST. As this is the first experience of FST for me, any inefficient code may be there. Comments are welcome! > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. 
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if > there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to > true, because there is no "XA" or "CY" tokens in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783754#comment-13783754 ] Koji Sekiguchi edited comment on LUCENE-5252 at 10/2/13 9:12 AM: - The draft patch without tests. As I don't have Java 7 environment for now, the patch is based on 4x branch. When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie for the synonym dictionary. I've tried to convert the code to Lucene's FST. As this is the first experience of FST for me, any inefficient code may be there. Comments are welcome! was (Author: koji): The draft patch without tests. When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie for the synonym dictionary. I've tried to convert the code to Lucene's FST. As this is the first experience of FST for me, any inefficient code may be there. Comments are welcome! > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like XABC or ABCY, it cannot be matched even if > there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to > true, because there is no "XA" or "CY" tokens in the index. 
> NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch The draft patch, without tests. When NGramSynonymTokenizer was developed at RONDHUIT, it used a double array trie for the synonym dictionary. I've tried to convert the code to Lucene's FST. As this is my first experience with FST, there may be some inefficient code. Comments are welcome! > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like XABC or ABCY, it cannot be matched even if > there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to > true, because there is no "XA" or "CY" tokens in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. 
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
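The patch above replaces RONDHUIT's double array trie with Lucene's FST for the synonym dictionary. The core query that structure answers for the tokenizer is a longest-match lookup: "what is the longest registered word starting at the current position?" As a rough, illustrative stand-in (the real code uses Lucene's FST classes; the Python class below and its names are invented for this sketch), a plain dict-based trie shows the same lookup:

```python
class SynonymTrie:
    """Toy stand-in for the FST-backed synonym dictionary (illustration only)."""

    def __init__(self, synonyms):
        # synonyms: registered word -> list of expansions
        self.root = {}
        for word, targets in synonyms.items():
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node["$"] = targets  # "$" marks the end of a registered word

    def longest_match(self, text, start):
        """Return (end_offset, expansions) for the longest registered word
        beginning at `start`, or None if no registered word starts there."""
        node, best = self.root, None
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break            # no registered word continues with this char
            if "$" in node:
                best = (i + 1, node["$"])  # remember longest match so far
        return best
```

With overlapping entries such as "AB" and "ABC", `longest_match` prefers the longer word, which is what lets the tokenizer keep whole registered words untokenized rather than stopping at the first prefix hit.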
[jira] [Closed] (LUCENE-5253) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi closed LUCENE-5253. -- Resolution: Duplicate Sorry, duplicated. > add NGramSynonymTokenizer > - > > Key: LUCENE-5253 > URL: https://issues.apache.org/jira/browse/LUCENE-5253 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like XABC or ABCY, it cannot be matched even if > there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to > true, because there is no "XA" or "CY" tokens in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5253) add NGramSynonymTokenizer
Koji Sekiguchi created LUCENE-5253: -- Summary: add NGramSynonymTokenizer Key: LUCENE-5253 URL: https://issues.apache.org/jira/browse/LUCENE-5253 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Koji Sekiguchi Priority: Minor I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram). # There is no consensus (I think :-) how we assign offsets to generated synonym tokens DE, EF and FG when expanding source token AB and BC. # If the query pattern looks like XABC or ABCY, it cannot be matched even if there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to true, because there is no "XA" or "CY" tokens in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * The back and forth of the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5252) add NGramSynonymTokenizer
Koji Sekiguchi created LUCENE-5252: -- Summary: add NGramSynonymTokenizer Key: LUCENE-5252 URL: https://issues.apache.org/jira/browse/LUCENE-5252 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Koji Sekiguchi Priority: Minor I'd like to propose that we have another n-gram tokenizer which can process synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. minGramSize = maxGramSize. Today, I think we have the following problems when using SynonymFilter with NGramTokenizer. For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ expand=true and N = 2 (2-gram). # There is no consensus (I think :-) how we assign offsets to generated synonym tokens DE, EF and FG when expanding source token AB and BC. # If the query pattern looks like XABC or ABCY, it cannot be matched even if there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to true, because there is no "XA" or "CY" tokens in the index. NGramSynonymTokenizer can solve these problems by providing the following methods. * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't tokenize registered words. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |ABC|AB/DE/BC/EF/FG|ABC/DEFG| * The back and forth of the registered words, NGramSynonymTokenizer generates *extra* tokens w/ posInc=0. e.g. ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| In the above sample, "Z" and "1" are the extra tokens. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept analyzer attribute
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711130#comment-13711130 ] Koji Sekiguchi commented on SOLR-3359: -- bq. Is there any particular reason why this enhancement is not targeted at 4.x as well? Well, my motivation was that CJKTokenizer(Factory) was marked as deprecated and would be gone in 5.0. If someone provides a patch for 4.x, I'm happy to commit it. bq. Also, could the title summary be updated to reflect the fact that the change specifies the analyzer class name rather than "fieldType"? Done. > SynonymFilterFactory should accept analyzer attribute > - > > Key: SOLR-3359 > URL: https://issues.apache.org/jira/browse/SOLR-3359 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi > Fix For: 5.0 > > Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch > > > I've not been realized that CJKTokenizer and its factory classes was marked > deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone talked to me. > {code} > * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and > LowerCaseFilter instead. > {code} > I agree with the idea of using the chain of the Tokenizer and TokenFilters > instead of CJKTokenizer, but it could be a problem for the existing users of > SynonymFilterFactory with CJKTokenizerFactory. > So this ticket comes to my mind again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
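For readers following the SOLR-3359 discussion, the change lets the entries in synonyms.txt be parsed by a full analyzer instead of only a tokenizer factory. A hypothetical schema.xml fragment (the field type name and the surrounding filter chain here are chosen for illustration, not taken from the patch) might look like:

```xml
<fieldType name="text_cjk_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- the "analyzer" attribute controls only how synonyms.txt entries
         are tokenized, independently of the field's own chain -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            analyzer="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    <filter class="solr.CJKBigramFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

This is the scenario the ticket describes: existing synonym files written for CJKTokenizer keep matching even after the field itself moves to the StandardTokenizer-based replacement chain.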
[jira] [Updated] (SOLR-3359) SynonymFilterFactory should accept analyzer attribute
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-3359: - Summary: SynonymFilterFactory should accept analyzer attribute (was: SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory) > SynonymFilterFactory should accept analyzer attribute > - > > Key: SOLR-3359 > URL: https://issues.apache.org/jira/browse/SOLR-3359 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi > Fix For: 5.0 > > Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch > > > I've not been realized that CJKTokenizer and its factory classes was marked > deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone talked to me. > {code} > * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and > LowerCaseFilter instead. > {code} > I agree with the idea of using the chain of the Tokenizer and TokenFilters > instead of CJKTokenizer, but it could be a problem for the existing users of > SynonymFilterFactory with CJKTokenizerFactory. > So this ticket comes to my mind again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-3359. -- Resolution: Fixed Fix Version/s: 5.0 Thanks, Onodera-san! > SynonymFilterFactory should accept fieldType attribute rather than > tokenizerFactory > --- > > Key: SOLR-3359 > URL: https://issues.apache.org/jira/browse/SOLR-3359 > Project: Solr > Issue Type: Improvement > Components: Schema and Analysis >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi > Fix For: 5.0 > > Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch > > > I've not been realized that CJKTokenizer and its factory classes was marked > deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone talked to me. > {code} > * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and > LowerCaseFilter instead. > {code} > I agree with the idea of using the chain of the Tokenizer and TokenFilters > instead of CJKTokenizer, but it could be a problem for the existing users of > SynonymFilterFactory with CJKTokenizerFactory. > So this ticket comes to my mind again. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reassigned SOLR-3359:
    Assignee: Koji Sekiguchi
[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708459#comment-13708459 ]

Koji Sekiguchi commented on SOLR-3359:

When I opened the ticket, I thought SynonymFilterFactory should accept (Solr's) fieldType attribute, as the title says. But today, since SynonymFilterFactory lives in Lucene land, I think an analyzer attribute is more natural than (Solr's) fieldType attribute. I'd like to commit the patch in a few days if no one objects.
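For illustration, the analyzer attribute the patch adds might be used from schema.xml roughly like this. This is a hypothetical sketch based on the patch title, not the committed syntax; the field type name and the surrounding filters are assumptions:

```xml
<!-- hypothetical schema.xml fragment: tokenize synonyms.txt with CJKAnalyzer
     (a full analyzer) instead of naming a bare tokenizerFactory -->
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            analyzer="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    <filter class="solr.CJKBigramFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The idea, per the comments above, is that users who previously pointed tokenizerFactory at CJKTokenizerFactory could specify CJKAnalyzer instead, so their synonym entries are analyzed the same way as the indexed text without resurrecting the deprecated tokenizer.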
[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory
[ https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707748#comment-13707748 ]

Koji Sekiguchi commented on SOLR-3359:

bq. So, I made SynonymFilterFactory accept analyzer attribute so that I can specify CJKAnalyzer.

I'd never thought of an analyzer attribute. Interesting idea. :)
[jira] [Resolved] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi resolved SOLR-4751.
    Resolution: Fixed

Committed on trunk, branch_4x and lucene_solr_4_3. Thanks!

> The replication problem of the file in a subdirectory.
>
>              Key: SOLR-4751
>              URL: https://issues.apache.org/jira/browse/SOLR-4751
>          Project: Solr
>       Issue Type: Bug
>       Components: replication (java)
> Affects Versions: 4.2.1
>         Reporter: Minoru Osuka
>         Assignee: Koji Sekiguchi
>         Priority: Minor
>          Fix For: 5.0, 4.4, 4.3.1
>
>      Attachments: SOLR-4751.patch, SOLR-4751.patch
>
> When lang/stopwords_ja.txt is set in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on the
> slave node.
[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-4751:
    Fix Version/s: 4.3.1
                   4.4
                   5.0
[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660197#comment-13660197 ]

Koji Sekiguchi commented on SOLR-4751:

Looks good! I'll commit shortly.
[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654556#comment-13654556 ]

Koji Sekiguchi commented on SOLR-4751:

Sorry for the inconvenience. I'll contact the reporter and see if we can fix the tests.
[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-4751:
    Fix Version/s: (was: 4.4)
                   (was: 5.0)
[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-4751:
    Fix Version/s: 4.4
                   5.0
[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654461#comment-13654461 ]

Koji Sekiguchi commented on SOLR-4751:

I committed the patch on trunk, branch_4x and 4.3. Osuka-san, could you check one of them and confirm that your fix solves the problem?
[jira] [Assigned] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi reassigned SOLR-4751:
    Assignee: Koji Sekiguchi
[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651578#comment-13651578 ]

Koji Sekiguchi commented on SOLR-4751:

bq. Need to find files recursively.

The other day I talked with Osuka-san and understood the problem. It has existed since 3.6, when the subdirectory (lang/) was created under the conf directory. It would be better to have test cases for replication, but as I don't have much time and the patch looks simple, I'd like to commit it in a few days if no one objects. In the meantime, updates to the patch are welcome. :)
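The fix direction in the quoted comment ("find files recursively") can be sketched as follows. This is an illustrative sketch only, not the SOLR-4751 patch; the class and method names are invented:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Collect configuration files under a conf/ root, recursing into
// subdirectories so entries like "lang/stopwords_ja.txt" keep their
// relative path (a top-level-only listing would drop the lang/ prefix).
public class ConfFileLister {
    public static List<String> listFiles(File dir, String prefix) {
        List<String> result = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return result; // not a directory, or an I/O error
        }
        for (File child : children) {
            String rel = prefix.isEmpty() ? child.getName() : prefix + "/" + child.getName();
            if (child.isDirectory()) {
                result.addAll(listFiles(child, rel)); // recurse into e.g. lang/
            } else {
                result.add(rel);
            }
        }
        return result;
    }

    // Helper for demos: create an empty file without checked exceptions.
    static void touch(File f) {
        try {
            f.createNewFile();
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A master walking the conf directory like this would ship lang/stopwords_ja.txt into the matching subdirectory on the slave instead of silently dropping it.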
[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.
[ https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646470#comment-13646470 ]

Koji Sekiguchi commented on SOLR-4751:

Hi Osuka-san, would you elaborate on the problem a little more, please?
[jira] [Commented] (SOLR-4717) SimpleFacets should respect localParams
[ https://issues.apache.org/jira/browse/SOLR-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632758#comment-13632758 ]

Koji Sekiguchi commented on SOLR-4717:

+1

> SimpleFacets should respect localParams
>
>         Key: SOLR-4717
>         URL: https://issues.apache.org/jira/browse/SOLR-4717
>     Project: Solr
>  Issue Type: Improvement
>    Reporter: Ryan McKinley
>    Assignee: Ryan McKinley
> Attachments: SOLR-4717-FacetLocalParams.patch
>
> In trying to implement http://wiki.apache.org/solr/HierarchicalFaceting I found the
> need to send multiple prefix facets in the same request on the same field.
> Currently facet params will parse the localParams, but only use them to pick out a
> name. We can easily modify things to let localParams override global ones. For example:
> {code}
> &{!key=level3 facet.prefix=3/path/to/folder}path
> &{!key=level2 facet.prefix=2/path/to}path
> &{!key=level1 facet.prefix=1/path}path
> {code}
> This can easily be supported if we use:
> {code:java}
> params = SolrParams.wrapDefaults(localParams, orig);
> {code}
> when local params exist.
>
> We have come a long way from *simple* facets!
[jira] [Commented] (LUCENE-4899) FastVectorHighlihgter fails with SIOOB if single phrase or term is > fragCharSize
[ https://issues.apache.org/jira/browse/LUCENE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623568#comment-13623568 ]

Koji Sekiguchi commented on LUCENE-4899:

Looks good! Sounds reasonable and I like the idea.

> FastVectorHighlihgter fails with SIOOB if single phrase or term is > fragCharSize
>
>              Key: LUCENE-4899
>              URL: https://issues.apache.org/jira/browse/LUCENE-4899
>          Project: Lucene - Core
>       Issue Type: Bug
>       Components: modules/highlighter
> Affects Versions: 4.0, 4.1, 4.2, 3.6.2, 4.2.1
>         Reporter: Simon Willnauer
>          Fix For: 5.0, 4.3
>
>      Attachments: LUCENE-4899.patch, LUCENE-4899.patch
>
> This has been reported on several occasions, like SOLR-4660 / SOLR-4137 or on the ES
> mailing list: https://groups.google.com/d/msg/elasticsearch/IdyMSPK5gao/nKZq8_NYWmgJ
> The reason is that the current code expects fragCharSize > matchLength, which is not
> necessarily true if you use phrases or if you have very long terms like URLs or so.
> I have a test that reproduces the issue and a fix as far as I can tell (I don't have
> much experience with the highlighter).
[jira] [Commented] (LUCENE-4899) FastVectorHighlihgter fails with SIOOB if single phrase or term is > fragCharSize
[ https://issues.apache.org/jira/browse/LUCENE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622441#comment-13622441 ]

Koji Sekiguchi commented on LUCENE-4899:

Looks good, Simon!
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563308#comment-13563308 ]

Koji Sekiguchi commented on LUCENE-1822:

I committed the above note to trunk, branch_4x and lucene_solr_4_1.

> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
>
>              Key: LUCENE-1822
>              URL: https://issues.apache.org/jira/browse/LUCENE-1822
>          Project: Lucene - Core
>       Issue Type: Improvement
>       Components: modules/highlighter
> Affects Versions: 2.9
>     Environment: any
>         Reporter: Alex Vigdor
>         Assignee: Koji Sekiguchi
>         Priority: Minor
>          Fix For: 4.1, 5.0
>
>      Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822-tests.patch
>
> The new FastVectorHighlighter performs extremely well, however I've found in testing
> that the window of text chosen per fragment is often very poor, as it is hard coded
> in SimpleFragListBuilder to always select starting 6 characters to the left of the
> first phrase match in a fragment. When selecting long fragments, this often means
> that there is barely any context before the highlighted word, and lots after; even
> worse, when highlighting a phrase at the end of a short text the beginning is cut
> off, even though the entire phrase would fit in the specified fragCharSize. For
> example, highlighting "Punishment" in "Crime and Punishment" returns "e and
> Punishment" no matter what fragCharSize is specified. I am going to attach a patch
> that improves the text window selection by recalculating the starting margin once
> all phrases in the fragment have been identified - this way if a single word is
> matched in a fragment, it will appear in the middle of the highlight, instead of 6
> characters from the beginning. This way one can also guarantee that the entirety of
> short texts are represented in a fragment by specifying a large enough fragCharSize.
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561187#comment-13561187 ]

Koji Sekiguchi commented on LUCENE-1822:

Here is the patch for trunk. I think the original reporter Alex describes the heart of the problem very well, so I'll borrow his description. :)

{code}
$ svn diff
Index: lucene/CHANGES.txt
===================================================================
--- lucene/CHANGES.txt  (revision 1437783)
+++ lucene/CHANGES.txt  (working copy)
@@ -414,6 +414,13 @@
   This only affects requests with depth>1. If you execute such requests and rely
   on the facet results being returned flat (i.e. no hierarchy), you should set
   the ResultMode to GLOBAL_FLAT. (Shai Erera, Gilad Barkai)
+
+* LUCENE-1822: Improves the text window selection by recalculating the starting margin
+  once all phrases in the fragment have been identified in FastVectorHighlighter. This
+  way if a single word is matched in a fragment, it will appear in the middle of the highlight,
+  instead of 6 characters from the beginning. This way one can also guarantee that
+  the entirety of short texts are represented in a fragment by specifying a large
+  enough fragCharSize.

 Optimizations
{code}
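The margin recalculation described here is simple arithmetic once the match offsets are known. A minimal sketch of the idea (invented names, not the committed LUCENE-1822 code):

```java
// Choose a fragment window of up to fragCharSize characters that centers
// the match instead of fixing a 6-character left margin. With the old
// fixed margin, "Punishment" in "Crime and Punishment" (match at [10,20))
// gave a window starting at 10 - 6 = 4, i.e. "e and Punishment".
public class FragmentWindow {
    public static int[] window(int matchStart, int matchEnd, int fragCharSize, int textLen) {
        int matchLen = matchEnd - matchStart;
        // margin that would place the match in the middle of the window
        int margin = Math.max(0, (fragCharSize - matchLen) / 2);
        int start = Math.max(0, matchStart - margin);
        int end = Math.min(textLen, start + fragCharSize);
        // near the end of a short text, pull the start back so the window
        // still covers up to fragCharSize characters (keeping "Crime and")
        start = Math.max(0, end - fragCharSize);
        return new int[] {start, end};
    }
}
```

For the "Crime and Punishment" example (textLen 20, fragCharSize 30) this yields the window [0, 20], i.e. the whole short text, which is the behavior Alex asks for.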
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560692#comment-13560692 ]

Koji Sekiguchi commented on LUCENE-1822:

Sure, that's great. Do you have a draft note?
[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive
[ https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560604#comment-13560604 ] Koji Sekiguchi commented on LUCENE-1822: Uh, Simon, sorry for my lack of prudence. > FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too > naive > -- > > Key: LUCENE-1822 > URL: https://issues.apache.org/jira/browse/LUCENE-1822 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter >Affects Versions: 2.9 > Environment: any >Reporter: Alex Vigdor >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 4.1, 5.0 > > Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, > LUCENE-1822-tests.patch > > > The new FastVectorHighlighter performs extremely well, however I've found in > testing that the window of text chosen per fragment is often very poor, as it > is hard coded in SimpleFragListBuilder to always select starting 6 characters > to the left of the first phrase match in a fragment. When selecting long > fragments, this often means that there is barely any context before the > highlighted word, and lots after; even worse, when highlighting a phrase at > the end of a short text the beginning is cut off, even though the entire > phrase would fit in the specified fragCharSize. For example, highlighting > "Punishment" in "Crime and Punishment" returns "e and Punishment" no > matter what fragCharSize is specified. I am going to attach a patch that > improves the text window selection by recalculating the starting margin once > all phrases in the fragment have been identified - this way if a single word > is matched in a fragment, it will appear in the middle of the highlight, > instead of 6 characters from the beginning. This way one can also guarantee > that the entirety of short texts are represented in a fragment by specifying > a large enough fragCharSize. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
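The window-centering approach described in the patch can be sketched as follows. This is a simplified standalone illustration, not the actual SimpleFragListBuilder code; the class `FragmentWindow`, the method `window`, and its `int[]` return are invented for this sketch.

```java
// Sketch (assumed names, not the real Lucene API): center a fragment window of
// fragCharSize characters around a match, clamping to the text bounds, instead
// of always starting a fixed 6 characters before the match.
public class FragmentWindow {
    // Returns {start, end} offsets of the chosen fragment window.
    static int[] window(int matchStart, int matchEnd, int textLen, int fragCharSize) {
        int matchLen = matchEnd - matchStart;
        // margin so the match sits in the middle of the fragment
        int margin = Math.max(0, (fragCharSize - matchLen) / 2);
        int start = Math.max(0, matchStart - margin);
        int end = Math.min(textLen, start + fragCharSize);
        // if the end hit the text boundary, pull the start back so a short
        // text is represented in its entirety
        start = Math.max(0, end - fragCharSize);
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int ms = text.indexOf("Punishment");
        int me = ms + "Punishment".length();
        int[] w = window(ms, me, text.length(), 20);
        // the whole short text fits in the fragment, so nothing is cut off
        System.out.println(text.substring(w[0], w[1]));
    }
}
```

With the old fixed 6-character margin, the same input yields "e and Punishment"; recomputing the margin from the match position makes the full text fit whenever fragCharSize allows.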
[jira] [Resolved] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params
[ https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-4330. -- Resolution: Fixed Fix Version/s: 3.6.3 5.0 4.2 > group.sort is ignored when using truncate and ex/tag local params > - > > Key: SOLR-4330 > URL: https://issues.apache.org/jira/browse/SOLR-4330 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.6, 4.0, 4.1, 5.0 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Fix For: 4.2, 5.0, 3.6.3 > > Attachments: SOLR-4330.patch, SOLR-4330.patch > > > In the parseParams method of SimpleFacets, group sort is not set after > creating the grouping object, so the member variable groupSort is always > null. As a result, an AbstractAllGroupHeadsCollector with the default sort > (new Sort()) is created every time. > {code} > public AbstractAllGroupHeadsCollector createAllGroupCollector() throws > IOException { > Sort sortWithinGroup = groupSort != null ? groupSort : new Sort(); > return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup); > } > {code}
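The bug shape described above can be sketched in a few lines. This is a simplified standalone illustration, not the real Solr API: the class `GroupingSketch`, the `String`-typed sort, and the method names are invented; it only mirrors the null-check fallback in createAllGroupCollector() that silently discards the requested sort when groupSort is never assigned.

```java
// Minimal sketch of the SOLR-4330 bug shape (names simplified, not Solr code):
// if groupSort is never set after constructing the grouping object, the
// null-check fallback always picks the default sort.
import java.util.Objects;

public class GroupingSketch {
    private String groupSort;          // stays null unless the setter is called

    // mirrors: Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
    String effectiveSort() {
        return groupSort != null ? groupSort : "defaultSort";
    }

    void setGroupSort(String sort) {   // the missing call the patch adds
        this.groupSort = Objects.requireNonNull(sort);
    }

    public static void main(String[] args) {
        GroupingSketch g = new GroupingSketch();
        System.out.println(g.effectiveSort());  // bug: prints "defaultSort", requested sort ignored
        g.setGroupSort("price asc");
        System.out.println(g.effectiveSort());  // after the fix: prints "price asc"
    }
}
```

The point of the patch is simply to perform the equivalent of setGroupSort before the collector is created, so the fallback branch is only taken when no group sort was actually requested.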
[jira] [Assigned] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params
[ https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-4330: Assignee: Koji Sekiguchi > group.sort is ignored when using truncate and ex/tag local params > - > > Key: SOLR-4330 > URL: https://issues.apache.org/jira/browse/SOLR-4330 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.6, 4.0, 4.1, 5.0 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Trivial > Attachments: SOLR-4330.patch, SOLR-4330.patch > > > In the parseParams method of SimpleFacets, group sort is not set after > creating the grouping object, so the member variable groupSort is always > null. As a result, an AbstractAllGroupHeadsCollector with the default sort > (new Sort()) is created every time. > {code} > public AbstractAllGroupHeadsCollector createAllGroupCollector() throws > IOException { > Sort sortWithinGroup = groupSort != null ? groupSort : new Sort(); > return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup); > } > {code}
[jira] [Updated] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params
[ https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-4330: - Attachment: SOLR-4330.patch I added a test case which fails if the patch is not applied. > group.sort is ignored when using truncate and ex/tag local params > - > > Key: SOLR-4330 > URL: https://issues.apache.org/jira/browse/SOLR-4330 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.6, 4.0, 4.1, 5.0 >Reporter: Koji Sekiguchi >Priority: Trivial > Attachments: SOLR-4330.patch, SOLR-4330.patch > > > In the parseParams method of SimpleFacets, group sort is not set after > creating the grouping object, so the member variable groupSort is always > null. As a result, an AbstractAllGroupHeadsCollector with the default sort > (new Sort()) is created every time. > {code} > public AbstractAllGroupHeadsCollector createAllGroupCollector() throws > IOException { > Sort sortWithinGroup = groupSort != null ? groupSort : new Sort(); > return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup); > } > {code}
[jira] [Updated] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params
[ https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-4330: - Attachment: SOLR-4330.patch > group.sort is ignored when using truncate and ex/tag local params > - > > Key: SOLR-4330 > URL: https://issues.apache.org/jira/browse/SOLR-4330 > Project: Solr > Issue Type: Bug > Components: search >Affects Versions: 3.6, 4.0, 4.1, 5.0 >Reporter: Koji Sekiguchi >Priority: Trivial > Attachments: SOLR-4330.patch > > > In the parseParams method of SimpleFacets, group sort is not set after > creating the grouping object, so the member variable groupSort is always > null. As a result, an AbstractAllGroupHeadsCollector with the default sort > (new Sort()) is created every time. > {code} > public AbstractAllGroupHeadsCollector createAllGroupCollector() throws > IOException { > Sort sortWithinGroup = groupSort != null ? groupSort : new Sort(); > return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup); > } > {code}