[jira] [Resolved] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly

2018-07-23 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-12570.
---
   Resolution: Fixed
 Assignee: Koji Sekiguchi
Fix Version/s: 7.4.1

> OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields 
> because pattern replacement doesn't work correctly
> -
>
> Key: SOLR-12570
> URL: https://issues.apache.org/jira/browse/SOLR-12570
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: UpdateRequestProcessors
>Affects Versions: 7.3, 7.3.1, 7.4
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5, 7.4.1
>
> Attachments: SOLR-12570.patch
>
>
> Because of the following code, if resolvedDest is "body_{EntityType}_s" it 
> becomes "body_PERSON_s" after replacement; but since the replacement 
> overwrites the placeholder ({EntityType}) in place, the destination remains 
> "body_PERSON_s" for every subsequent entity type.
> {code}
> resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8420) Upgrade OpenNLP to 1.9.0

2018-07-23 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-8420.

Resolution: Fixed
  Assignee: Koji Sekiguchi

> Upgrade OpenNLP to 1.9.0
> 
>
> Key: LUCENE-8420
> URL: https://issues.apache.org/jira/browse/LUCENE-8420
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Affects Versions: 7.4
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: LUCENE-8420.patch
>
>
> OpenNLP 1.9.0 generates model files in a new format that 1.8.x cannot read; 
> 1.9.0 can still read the previous format for backward compatibility.






[jira] [Commented] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly

2018-07-20 Thread Koji Sekiguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551521#comment-16551521
 ] 

Koji Sekiguchi commented on SOLR-12570:
---

I posted a patch in LUCENE-8420. It includes a new NER model that can predict 
LOCATION in addition to PERSON. I think we can add the test for this issue 
after LUCENE-8420 is committed; I haven't yet tried the new model file on 
LOCATION, though.

> OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields 
> because pattern replacement doesn't work correctly
> -
>
> Key: SOLR-12570
> URL: https://issues.apache.org/jira/browse/SOLR-12570
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: UpdateRequestProcessors
>Affects Versions: 7.3, 7.3.1, 7.4
>Reporter: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: SOLR-12570.patch
>
>
> Because of the following code, if resolvedDest is "body_{EntityType}_s" it 
> becomes "body_PERSON_s" after replacement; but since the replacement 
> overwrites the placeholder ({EntityType}) in place, the destination remains 
> "body_PERSON_s" for every subsequent entity type.
> {code}
> resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
> {code}






[jira] [Commented] (LUCENE-8420) Upgrade OpenNLP to 1.9.0

2018-07-20 Thread Koji Sekiguchi (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551519#comment-16551519
 ] 

Koji Sekiguchi commented on LUCENE-8420:


I created model files for 1.9.0 by executing "ant train-test-models" under 
lucene/analysis/opennlp/. As for the training data, I renamed ner_flashman.txt 
to ner.txt and added the location entity type to the file for SOLR-12570.

I also deleted opennlp-maxent, which is never used (and I think it is obsolete; 
the opennlp-tools package includes maxent).
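For context, OpenNLP's name-finder training data marks entities inline with 
<START:type> ... <END> tags, one tokenized sentence per line. A line carrying a 
location entity might look like the following (illustrative only, not taken 
from the actual ner.txt):

```
<START:person> Flashman <END> traveled to <START:location> Kabul <END> last winter .
```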

> Upgrade OpenNLP to 1.9.0
> 
>
> Key: LUCENE-8420
> URL: https://issues.apache.org/jira/browse/LUCENE-8420
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Affects Versions: 7.4
>Reporter: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: LUCENE-8420.patch
>
>
> OpenNLP 1.9.0 generates model files in a new format that 1.8.x cannot read; 
> 1.9.0 can still read the previous format for backward compatibility.






[jira] [Updated] (LUCENE-8420) Upgrade OpenNLP to 1.9.0

2018-07-20 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-8420:
---
Attachment: LUCENE-8420.patch

> Upgrade OpenNLP to 1.9.0
> 
>
> Key: LUCENE-8420
> URL: https://issues.apache.org/jira/browse/LUCENE-8420
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/analysis
>Affects Versions: 7.4
>Reporter: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: LUCENE-8420.patch
>
>
> OpenNLP 1.9.0 generates model files in a new format that 1.8.x cannot read; 
> 1.9.0 can still read the previous format for backward compatibility.






[jira] [Created] (SOLR-12571) Upgrade OpenNLP to 1.9.0

2018-07-20 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created SOLR-12571:
-

 Summary: Upgrade OpenNLP to 1.9.0
 Key: SOLR-12571
 URL: https://issues.apache.org/jira/browse/SOLR-12571
 Project: Solr
  Issue Type: Task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - LangId, update
Affects Versions: 7.4
Reporter: Koji Sekiguchi
 Fix For: master (8.0), 7.5


OpenNLP 1.9.0 generates model files in a new format that 1.8.x cannot read; 
1.9.0 can still read the previous format for backward compatibility.






[jira] [Updated] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly

2018-07-20 Thread Koji Sekiguchi (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-12570:
--
Attachment: SOLR-12570.patch

> OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields 
> because pattern replacement doesn't work correctly
> -
>
> Key: SOLR-12570
> URL: https://issues.apache.org/jira/browse/SOLR-12570
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: UpdateRequestProcessors
>Affects Versions: 7.3, 7.3.1, 7.4
>Reporter: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.5
>
> Attachments: SOLR-12570.patch
>
>
> Because of the following code, if resolvedDest is "body_{EntityType}_s" it 
> becomes "body_PERSON_s" after replacement; but since the replacement 
> overwrites the placeholder ({EntityType}) in place, the destination remains 
> "body_PERSON_s" for every subsequent entity type.
> {code}
> resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
> {code}






[jira] [Created] (SOLR-12570) OpenNLPExtractNamedEntitiesUpdateProcessor cannot support multi fields because pattern replacement doesn't work correctly

2018-07-20 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created SOLR-12570:
-

 Summary: OpenNLPExtractNamedEntitiesUpdateProcessor cannot support 
multi fields because pattern replacement doesn't work correctly
 Key: SOLR-12570
 URL: https://issues.apache.org/jira/browse/SOLR-12570
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: UpdateRequestProcessors
Affects Versions: 7.4, 7.3.1, 7.3
Reporter: Koji Sekiguchi
 Fix For: master (8.0), 7.5


Because of the following code, if resolvedDest is "body_{EntityType}_s" it 
becomes "body_PERSON_s" after replacement; but since the replacement overwrites 
the placeholder ({EntityType}) in place, the destination remains "body_PERSON_s" 
for every subsequent entity type.

{code}
resolvedDest = resolvedDest.replace(ENTITY_TYPE, entityType);
{code}
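The failure mode can be reproduced outside Solr with a minimal sketch 
(hypothetical class and method names, not the actual processor code): writing 
the resolved value back over the template loses the placeholder, while 
resolving from the untouched template each time does not.

```java
// Hypothetical sketch of the placeholder bug described above; not Solr code.
public class PlaceholderBug {
    static final String ENTITY_TYPE = "{EntityType}";

    // Buggy variant: the resolved value is written back over the template,
    // so the placeholder is gone after the first entity type is handled.
    static String buggyResolve(String[] resolvedDest, String entityType) {
        resolvedDest[0] = resolvedDest[0].replace(ENTITY_TYPE, entityType);
        return resolvedDest[0];
    }

    // Fixed variant: resolve from the unmodified template on every call.
    static String fixedResolve(String template, String entityType) {
        return template.replace(ENTITY_TYPE, entityType);
    }

    public static void main(String[] args) {
        String[] dest = {"body_{EntityType}_s"};
        System.out.println(buggyResolve(dest, "PERSON"));   // body_PERSON_s
        System.out.println(buggyResolve(dest, "LOCATION")); // still body_PERSON_s
        System.out.println(fixedResolve("body_{EntityType}_s", "LOCATION")); // body_LOCATION_s
    }
}
```

With the fixed variant, each entity type resolves to its own destination field 
(body_PERSON_s, body_LOCATION_s, ...), which is what multi-field support needs.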







[jira] [Resolved] (SOLR-12202) failed to run solr-exporter.cmd on Windows platform

2018-05-02 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-12202.
---
   Resolution: Fixed
Fix Version/s: master (8.0)
   7.4
   7.3

Thanks!

> failed to run solr-exporter.cmd on Windows platform
> ---
>
> Key: SOLR-12202
> URL: https://issues.apache.org/jira/browse/SOLR-12202
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.3
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Major
> Fix For: 7.3, 7.4, master (8.0)
>
> Attachments: SOLR-12202.patch, SOLR-12202_branch_7_3.patch
>
>
> solr-exporter.cmd fails to run on the Windows platform due to the following:
> - an incorrect main class name.
> - an incorrect classpath specification.






[jira] [Assigned] (SOLR-12202) failed to run solr-exporter.cmd on Windows platform

2018-04-28 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-12202:
-

Assignee: Koji Sekiguchi

> failed to run solr-exporter.cmd on Windows platform
> ---
>
> Key: SOLR-12202
> URL: https://issues.apache.org/jira/browse/SOLR-12202
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.3
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Major
> Attachments: SOLR-12202.patch
>
>
> solr-exporter.cmd fails to run on the Windows platform due to the following:
> - an incorrect main class name.
> - an incorrect classpath specification.






[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-11795.
---
Resolution: Fixed

Marking as resolved. Thanks everyone!

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795-ref-guide.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Reopened] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-05 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-11795:
---

Thanks. I'll apply the additional patch soon.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795-ref-guide.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-05 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-11795.
---
Resolution: Fixed

Yes, thanks Uwe and all!

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795.patch, solr-dashboard.png, solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-04 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385552#comment-16385552
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Uwe's suggestion helped us check that this patch works on various platforms 
without causing anyone trouble. In fact, the Java 9 Jenkins job found that the 
SnakeYAML code uses reflection in illegal ways, which we could not have noticed 
before committing.

... and the results look good so far. I'd like to commit this to master and 
branch_7x soon.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795.patch, solr-dashboard.png, solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Comment Edited] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383080#comment-16383080
 ] 

Koji Sekiguchi edited comment on SOLR-11795 at 3/2/18 3:02 AM:
---

Hi Uwe,

I created the following branches:

* SOLR-11795 (for master)
* branch_7x-SOLR-11795 (for branch_7x)

Sorry to put you to the trouble, but could you set up the Linux and Windows 
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very 
much in advance!


was (Author: koji):
Hi Uwe,

I created the following branches:

* SOLR-1175 (for master)
* branch_7x-SOLR-1175 (for branch_7x)

Sorry to put you to the trouble, but could you set up the Linux and Windows 
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very 
much in advance!

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795.patch, solr-dashboard.png, solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-03-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383080#comment-16383080
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Hi Uwe,

I created the following branches:

* SOLR-1175 (for master)
* branch_7x-SOLR-1175 (for branch_7x)

Sorry to put you to the trouble, but could you set up the Linux and Windows 
Policeman Jenkins jobs that you kindly suggested a week ago? Thank you very 
much in advance!

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-10.patch, SOLR-11795-11.patch, 
> SOLR-11795-2.patch, SOLR-11795-3.patch, SOLR-11795-4.patch, 
> SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795-7.patch, 
> SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795-dev-tools.patch, 
> SOLR-11795.patch, solr-dashboard.png, solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-24 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375582#comment-16375582
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Thanks for the kind suggestion. I'd like to create a branch for this and let 
you know the name.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795-8.patch, SOLR-11795-9.patch, 
> SOLR-11795-dev-tools.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-23 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375195#comment-16375195
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

I can't apologize enough for this. :(

> Now we have XML, JSON, properties files and now YAML. Why not use one that's 
> already used by other places in Solr?

As for using YAML, I asked the contributor why he chose it rather than JSON, 
and I simply accepted his reasons (it is more readable and understandable, it 
allows comments, etc.), but I should have sought the committers' agreement 
before introducing a new config format.

> Or much simpler: Get rid of YAML!

I'd like to discuss switching the config to JSON with him, but it will take 
time to apply.

I'm going to revert this soon.


> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795-8.patch, SOLR-11795-9.patch, 
> SOLR-11795-dev-tools.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-22 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-11795.
---
Resolution: Fixed

Thanks Minoru and everyone! 

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795-8.patch, SOLR-11795-9.patch, 
> SOLR-11795-dev-tools.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-22 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373596#comment-16373596
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Thanks for letting us know. Yes, we discussed the problem last night while 
paying careful attention to Jenkins. I think we can fix it soon.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: master (8.0), 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795-8.patch, SOLR-11795-9.patch, SOLR-11795.patch, 
> solr-dashboard.png, solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Reopened] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-20 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-11795:
---

Reopening this; we're still working on it.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795-8.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Resolved] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-20 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-11795.
---
   Resolution: Fixed
Fix Version/s: 7.3

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 7.3
>
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus, and I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-19 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369692#comment-16369692
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

I can still see several UpdateRequestProcessors in the solrconfig.xml used for 
tests. Are they necessary? And I'm sorry if I'm wrong, but do you need the 
test-files/exampledocs/*.xml files?

As for schema settings, all existing Solr contribs use schema.xml, not 
managed-schema. Why don't you follow them?

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, 
> SOLR-11795-7.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-19 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369645#comment-16369645
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Thank you for updating the patch. I can see a hard-coded luceneMatchVersion in 
the patch:

{code}
<luceneMatchVersion>7.1.0</luceneMatchVersion>
{code}

You can rephrase it like this:

{code}
<luceneMatchVersion>${tests.luceneMatchVersion:LATEST}</luceneMatchVersion>
{code}

And I think your solrconfig.xml for test is still fat... Please consult 
solr/contrib/langid for making the test config more compact.

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795-6.patch, SOLR-11795.patch, 
> solr-dashboard.png, solr-exporter-diagram.png
>
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-02-16 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366903#comment-16366903
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

Today I had a meeting with Minoru, the contributor of this patch. We discussed 
this contribution in detail and I found it very nice!

There is a similar ticket, SOLR-10654, which implements a ResponseWriter for 
Prometheus invoked via the wt parameter, but I prefer Minoru's way. I prefer it 
because:

* It is highly independent of Solr's core. He just adds a 
contrib/prometheus-exporter directory and provides everything under it, 
including SolrExporter for Prometheus in this patch. The patch doesn't change 
Solr's main source.
* Implementing a standalone exporter is the mainstream approach in the 
Prometheus ecosystem, used for MySQL, Memcached, Mesos, etc. See 
https://prometheus.io/docs/instrumenting/exporters/
* SolrJ is used to implement SolrExporter in this patch, so it can be used in a 
SolrCloud environment.
* It allows users to monitor not only Solr metrics, which come from 
/admin/metrics, but also facet counts, which come from /select (see config.yml 
in the patch).

I asked him to update the patch by providing Ref Guide documentation (he has 
already written a README.md, so he just needs to move its contents to the Ref 
Guide) and by adding more tests, so that we will notice if the response format 
of /admin/metrics ever changes.

I'll wait for his next patch. Once I get it, and if nobody objects, I'd like to 
commit this next week.
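For readers unfamiliar with the exporter pattern mentioned above: Prometheus pulls metrics from each exporter over HTTP, so once SolrExporter is running, wiring it up is just a scrape entry in prometheus.yml. A minimal sketch follows; the job name, host, and port are illustrative assumptions, not values from the patch:

```yaml
# prometheus.yml fragment: scrape the Solr exporter like any other exporter.
scrape_configs:
  - job_name: 'solr'                  # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9983']   # assumed exporter host:port
```

Grafana then queries Prometheus as usual; no Solr-side change is needed, which is exactly the independence argued for above.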

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11795-2.patch, SOLR-11795-3.patch, 
> SOLR-11795-4.patch, SOLR-11795-5.patch, SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Updated] (SOLR-11592) add another language detector using OpenNLP

2018-01-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-11592:
--
Affects Version/s: (was: 7.1)
   7.2

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.2
>Reporter: Koji Sekiguchi
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Assigned] (SOLR-11592) add another language detector using OpenNLP

2018-01-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-11592:
-

Assignee: Steve Rowe

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP

2018-01-17 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328494#comment-16328494
 ] 

Koji Sekiguchi commented on SOLR-11592:
---

Looks good to me. :) 

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11592.patch, SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Assigned] (SOLR-11795) Add Solr metrics exporter for Prometheus

2018-01-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-11795:
-

Assignee: Koji Sekiguchi

> Add Solr metrics exporter for Prometheus
> 
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11795.patch, solr-dashboard.png, 
> solr-exporter-diagram.png
>
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11795) Add Solr metrics exporter for Prometheus to contrib directory

2017-12-26 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16304127#comment-16304127
 ] 

Koji Sekiguchi commented on SOLR-11795:
---

+1 looks nice!

> Add Solr metrics exporter for Prometheus to contrib directory
> -
>
> Key: SOLR-11795
> URL: https://issues.apache.org/jira/browse/SOLR-11795
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Minoru Osuka
>Priority: Minor
> Fix For: master (8.0)
>
> Attachments: solr-dashboard.png, solr-exporter-diagram.png
>
>
> I'd like to monitor Solr using Prometheus and Grafana.
> I've already created a Solr metrics exporter for Prometheus. I'd like to 
> contribute it to the contrib directory if you don't mind.
> !solr-exporter-diagram.png|thumbnail!
> !solr-dashboard.png|thumbnail!






[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP

2017-11-07 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243314#comment-16243314
 ] 

Koji Sekiguchi commented on SOLR-11592:
---

Hi Steve,

Thank you for reviewing the patch. You're right! I'll do them later, after 
finishing my project. Or, if you or someone else can implement this, please 
take it over. I think I can review it. :)

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Comment Edited] (SOLR-11592) add another language detector using OpenNLP

2017-11-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233857#comment-16233857
 ] 

Koji Sekiguchi edited comment on SOLR-11592 at 11/2/17 12:55 AM:
-

OpenNLP's model covers 103 languages. 
https://svn.apache.org/repos/bigdata/opennlp/tags/langdetect-183_RC3/leipzig/resources/README.txt


was (Author: koji):
OpenNLP's model covers 103 languages.

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Commented] (SOLR-11592) add another language detector using OpenNLP

2017-11-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233857#comment-16233857
 ] 

Koji Sekiguchi commented on SOLR-11592:
---

OpenNLP's model covers 103 languages.

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Updated] (SOLR-11592) add another language detector using OpenNLP

2017-11-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-11592:
--
Attachment: SOLR-11592.patch

Patch attached. It doesn't have any tests yet.

> add another language detector using OpenNLP
> ---
>
> Key: SOLR-11592
> URL: https://issues.apache.org/jira/browse/SOLR-11592
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: contrib - LangId
>Affects Versions: 7.1
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-11592.patch
>
>
> We already have two language detectors, lang-detect and Tika's lang detect. 
> This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Created] (SOLR-11592) add another language detector using OpenNLP

2017-11-01 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created SOLR-11592:
-

 Summary: add another language detector using OpenNLP
 Key: SOLR-11592
 URL: https://issues.apache.org/jira/browse/SOLR-11592
 Project: Solr
  Issue Type: New Feature
  Security Level: Public (Default Security Level. Issues are Public)
  Components: contrib - LangId
Affects Versions: 7.1
Reporter: Koji Sekiguchi
Priority: Minor


We already have two language detectors, lang-detect and Tika's lang detect. 
This is a ticket that gives users a third option using OpenNLP. :)






[jira] [Resolved] (SOLR-9184) Add convenience method to ModifiableSolrParams

2017-03-22 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-9184.
--
   Resolution: Fixed
Fix Version/s: 6.6

Thanks, Jörg!

> Add convenience method to ModifiableSolrParams
> --
>
> Key: SOLR-9184
> URL: https://issues.apache.org/jira/browse/SOLR-9184
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Jörg Rathlev
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 6.6
>
> Attachments: SOLR-9184.patch, SOLR-9184.patch
>
>
> Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which 
> returns the same instance if it already is modifiable, otherwise creates a 
> new {{ModifiableSolrParams}} instance.
> Rationale: when writing custom SearchComponents, we find that we often need 
> to ensure that the SolrParams are modifiable. The copy constructor of 
> ModifiableSolrParams always creates a copy, even if the SolrParams are already 
> modifiable.
> Alternatives: The method could also be added as a convenience method in 
> SolrParams itself, which already has static helper methods for wrapDefaults 
> and wrapAppended.
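The proposed method is easy to visualize. Below is a minimal, self-contained sketch of the idea; the SolrParams and ModifiableSolrParams classes here are simplified stand-ins for the real SolrJ classes, not the actual implementation:

```java
// Simplified stand-ins for the SolrJ classes, for illustration only.
class SolrParams {
}

class ModifiableSolrParams extends SolrParams {
    ModifiableSolrParams() {
    }

    ModifiableSolrParams(SolrParams toCopy) {
        // The real copy constructor copies all parameters from toCopy.
    }

    /** Proposed helper: return the same instance if already modifiable, else copy. */
    static ModifiableSolrParams of(SolrParams params) {
        return (params instanceof ModifiableSolrParams)
                ? (ModifiableSolrParams) params
                : new ModifiableSolrParams(params);
    }
}

public class OfDemo {
    public static void main(String[] args) {
        ModifiableSolrParams modifiable = new ModifiableSolrParams();
        // Already modifiable: the very same instance comes back, no copy is made.
        System.out.println(ModifiableSolrParams.of(modifiable) == modifiable);

        SolrParams readOnly = new SolrParams();
        // Not modifiable: a new ModifiableSolrParams copy is created.
        System.out.println(ModifiableSolrParams.of(readOnly) == readOnly);
    }
}
```

The instanceof check is what distinguishes this from the copy constructor: it avoids the needless copy when the params are already modifiable.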






[jira] [Assigned] (SOLR-9184) Add convenience method to ModifiableSolrParams

2017-03-21 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-9184:


Assignee: Koji Sekiguchi

> Add convenience method to ModifiableSolrParams
> --
>
> Key: SOLR-9184
> URL: https://issues.apache.org/jira/browse/SOLR-9184
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Jörg Rathlev
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-9184.patch
>
>
> Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which 
> returns the same instance if it already is modifiable, otherwise creates a 
> new {{ModifiableSolrParams}} instance.
> Rationale: when writing custom SearchComponents, we find that we often need 
> to ensure that the SolrParams are modifiable. The copy constructor of 
> ModifiableSolrParams always creates a copy, even if the SolrParams are already 
> modifiable.
> Alternatives: The method could also be added as a convenience method in 
> SolrParams itself, which already has static helper methods for wrapDefaults 
> and wrapAppended.






[jira] [Commented] (SOLR-9184) Add convenience method to ModifiableSolrParams

2017-03-21 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15935705#comment-15935705
 ] 

Koji Sekiguchi commented on SOLR-9184:
--

I think this is almost ready. How about adding an assertNotSame to the first test?

> Add convenience method to ModifiableSolrParams
> --
>
> Key: SOLR-9184
> URL: https://issues.apache.org/jira/browse/SOLR-9184
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Reporter: Jörg Rathlev
>Priority: Minor
> Attachments: SOLR-9184.patch
>
>
> Add a static convenience method {{ModifiableSolrParams#of(SolrParams)}} which 
> returns the same instance if it already is modifiable, otherwise creates a 
> new {{ModifiableSolrParams}} instance.
> Rationale: when writing custom SearchComponents, we find that we often need 
> to ensure that the SolrParams are modifiable. The copy constructor of 
> ModifiableSolrParams always creates a copy, even if the SolrParms already are 
> modifiable.
> Alternatives: The method could also be added as a convenience method in 
> SolrParams itself, which already has static helper methods for wrapDefaults 
> and wrapAppended.






[jira] [Closed] (SOLR-2867) Problem Wtih solr Score Display

2017-03-21 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi closed SOLR-2867.

Resolution: Invalid

Please ask about your problem on the solr-user mailing list.

> Problem Wtih solr Score Display
> ---
>
> Key: SOLR-2867
> URL: https://issues.apache.org/jira/browse/SOLR-2867
> Project: Solr
>  Issue Type: Bug
>  Components: SearchComponents - other
>Affects Versions: 3.1
> Environment: Linux and Mysql
>Reporter: Pragyanjeet Rout
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> We are firing a Solr query and checking its relevancy score.
> The problem is that for some results the score value is truncated.
> Example: I have a query like the one below:
> http://localhost:8983/solr/mywork/select/?q=( contractLength:12 speedScore:[4 
> TO 7] dataScore:[2 TO *])&fq=( ( connectionType:"Cable" 
> connectionType:"Naked")AND ( monthlyCost:[* TO *])AND ( speedScore:[4 TO 
> *])AND ( dataScore:[2 TO 
> *]))&version=2.2&start=0&rows=500&indent=on&sort=score desc, planType asc, 
> monthlyCost1 asc, monthlyCost2  asc
> Below is the XML returned from Solr:
> 
> 3.6897283
> 12
> 3
> ABC
> 120.9
> 7
> 
> 
> 3.689728
> 12
> 2
> DEF
> 49.95
> 6
> 
> I used "debugQuery=true" in the query and saw that Solr calculates the 
> correct score (see below), but somehow the last digit, i.e. "3", is truncated 
> from the second result.
> Because of this my ranking order gets disturbed and I get wrong results when 
> displaying them.
> 
> 3.6897283 = (MATCH) sum of:
>   3.1476827 = (MATCH) weight(contractLength:12 in 51), product of:
>     0.92363054 = queryWeight(contractLength:12), product of:
>       3.4079456 = idf(docFreq=8, maxDocs=100)
>       0.27102268 = queryNorm
>     3.4079456 = (MATCH) fieldWeight(contractLength:12 in 51), product of:
>       1.0 = tf(termFreq(contractLength:12)=1)
>       3.4079456 = idf(docFreq=8, maxDocs=100)
>       1.0 = fieldNorm(field=contractLength, doc=51)
>   0.27102268 = (MATCH) ConstantScore(speedScore:[4 TO *]), product of:
>     1.0 = boost
>     0.27102268 = queryNorm
>   0.27102268 = (MATCH) ConstantScore(dataScore:[2 TO *]), product of:
>     1.0 = boost
>     0.27102268 = queryNorm
> 
> 3.6897283 = (MATCH) sum of:
>   3.1476827 = (MATCH) weight(contractLength:12 in 97), product of:
>     0.92363054 = queryWeight(contractLength:12), product of:
>       3.4079456 = idf(docFreq=8, maxDocs=100)
>       0.27102268 = queryNorm
>     3.4079456 = (MATCH) fieldWeight(contractLength:12 in 97), product of:
>       1.0 = tf(termFreq(contractLength:12)=1)
>       3.4079456 = idf(docFreq=8, maxDocs=100)
>       1.0 = fieldNorm(field=contractLength, doc=97)
>   0.27102268 = (MATCH) ConstantScore(speedScore:[4 TO *]), product of:
>     1.0 = boost
>     0.27102268 = queryNorm
>   0.27102268 = (MATCH) ConstantScore(dataScore:[2 TO *]), product of:
>     1.0 = boost
>     0.27102268 = queryNorm
> 
> Please educate me about the above behaviour of Solr.






[jira] [Resolved] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-9918.
--
   Resolution: Fixed
Fix Version/s: 6.4
   master (7.0)

Thanks, Tim!

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}
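The two behaviours described in this issue amount to one decision per incoming document. The sketch below is a self-contained illustration of that logic under simplified assumptions: a plain map stands in for the index, and the method and class names are hypothetical, not the actual processor code.

```java
import java.util.HashMap;
import java.util.Map;

public class SkipLogicDemo {
    // Minimal stand-in "index": id -> document body.
    static Map<String, String> index = new HashMap<>();

    /** Mimics the processor's decision: skip duplicate inserts, skip updates to missing docs. */
    static String process(String id, String body, boolean isAtomicUpdate,
                          boolean skipInsertIfExists, boolean skipUpdateIfMissing) {
        boolean exists = index.containsKey(id);
        if (!isAtomicUpdate && exists && skipInsertIfExists) {
            return "skipped insert";          // "insert if not exists"
        }
        if (isAtomicUpdate && !exists && skipUpdateIfMissing) {
            return "skipped update";          // "update where id = .."
        }
        index.put(id, body);
        return "applied";
    }

    public static void main(String[] args) {
        System.out.println(process("doc1", "v1", false, true, true));
        System.out.println(process("doc1", "v2", false, true, true));
        System.out.println(process("doc2", "v1", true, true, true));
    }
}
```

In the real processor the existence check consults the actual index rather than a map, but the branching is the same.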






[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-09 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15813849#comment-15813849
 ] 

Koji Sekiguchi commented on SOLR-9918:
--

I think this is ready.

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}






[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-06 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806603#comment-15806603
 ] 

Koji Sekiguchi commented on SOLR-9918:
--

Thank you for giving such a great explanation; it's more than I expected. :)

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor
> class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}
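The two skip behaviours described above can be outlined as a processAdd hook. This is a hypothetical sketch, not the attached patch: docExists(), isAtomicUpdateDoc(), and the two boolean fields are placeholder names.

```java
// Hypothetical outline of the skip logic described above. docExists(),
// isAtomicUpdateDoc(), and the boolean fields are placeholders; the attached
// patch is the real implementation.
@Override
public void processAdd(AddUpdateCommand cmd) throws IOException {
  BytesRef id = cmd.getIndexedId();
  boolean isAtomicUpdate = isAtomicUpdateDoc(cmd);  // doc holds only atomic-update maps
  if (!isAtomicUpdate && skipInsertIfExists && docExists(id)) {
    return;  // "insert if not exists": quietly drop the duplicate insert
  }
  if (isAtomicUpdate && skipUpdateIfMissing && !docExists(id)) {
    return;  // "update where id = ..": quietly ignore the update to a missing doc
  }
  super.processAdd(cmd);  // otherwise continue down the processor chain
}
```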



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-9918:


Assignee: Koji Sekiguchi

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
>Assignee: Koji Sekiguchi
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor
> class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}






[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-05 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803344#comment-15803344
 ] 

Koji Sekiguchi commented on SOLR-9918:
--

Thank you for your additional explanation. I agree with you that the Confluence 
page is the best place to put that kind of guideline note. I just wanted to 
see such information in the ticket, not the javadoc, because I think it helps 
committers understand the requirement and the importance of this proposal.

As for SignatureUpdateProcessor, I thought it skipped adding the doc if the 
signature is the same, but when I looked into the patch on SOLR-799, I noticed 
that it always updates the existing document even if the doc has the same signature.

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch, SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor
> class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}






[jira] [Commented] (SOLR-9918) An UpdateRequestProcessor to skip duplicate inserts and ignore updates to missing docs

2017-01-03 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796928#comment-15796928
 ] 

Koji Sekiguchi commented on SOLR-9918:
--

I believe the proposal is very useful for users who need this function, but it 
would be better for users if there were an additional explanation of how it 
differs from the existing processor that provides a similar function.

How should users decide, for their use cases, between this 
UpdateRequestProcessor and SignatureUpdateProcessor?

> An UpdateRequestProcessor to skip duplicate inserts and ignore updates to 
> missing docs
> --
>
> Key: SOLR-9918
> URL: https://issues.apache.org/jira/browse/SOLR-9918
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update
>Reporter: Tim Owen
> Attachments: SOLR-9918.patch
>
>
> This is an UpdateRequestProcessor and Factory that we have been using in 
> production, to handle 2 common cases that were awkward to achieve using the 
> existing update pipeline and current processor classes:
> * When inserting document(s), if some already exist then quietly skip the new 
> document inserts - do not churn the index by replacing the existing documents 
> and do not throw a noisy exception that breaks the batch of inserts. By 
> analogy with SQL, {{insert if not exists}}. In our use-case, multiple 
> application instances can (rarely) process the same input so it's easier for 
> us to de-dupe these at Solr insert time than to funnel them into a global 
> ordered queue first.
> * When applying AtomicUpdate documents, if a document being updated does not 
> exist, quietly do nothing - do not create a new partially-populated document 
> and do not throw a noisy exception about missing required fields. By analogy 
> with SQL, {{update where id = ..}}. Our use-case relies on this because we 
> apply updates optimistically and have best-effort knowledge about what 
> documents will exist, so it's easiest to skip the updates (in the same way a 
> Database would).
> I would have kept this in our own package hierarchy but it relies on some 
> package-scoped methods, and seems like it could be useful to others if they 
> choose to configure it. Some bits of the code were borrowed from 
> {{DocBasedVersionConstraintsProcessorFactory}}.
> Attached patch has unit tests to confirm the behaviour.
> This class can be used by configuring solrconfig.xml like so..
> {noformat}
>   <updateRequestProcessorChain name="skipexisting">
>     <processor class="solr.LogUpdateProcessorFactory" />
>     <processor
> class="org.apache.solr.update.processor.SkipExistingDocumentsProcessorFactory">
>       <bool name="skipInsertIfExists">true</bool>
>       <bool name="skipUpdateIfMissing">false</bool>
>     </processor>
>     <processor class="solr.DistributedUpdateProcessorFactory" />
>     <processor class="solr.RunUpdateProcessorFactory" />
>   </updateRequestProcessorChain>
> {noformat}
> and initParams defaults of
> {noformat}
>   <str name="update.chain">skipexisting</str>
> {noformat}






[jira] [Commented] (LUCENE-7321) Character Mapping

2016-06-08 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15321858#comment-15321858
 ] 

Koji Sekiguchi commented on LUCENE-7321:


What is the advantage of this compared to MappingCharFilter?

> Character Mapping
> -
>
> Key: LUCENE-7321
> URL: https://issues.apache.org/jira/browse/LUCENE-7321
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Affects Versions: 4.6.1, 6.0, 5.4.1, 6.0.1
>Reporter: Ivan Provalov
>Priority: Minor
>  Labels: patch
> Fix For: 6.0.1
>
> Attachments: CharacterMappingComponent.pdf, LUCENE-7321.patch
>
>
> One of the challenges in search is recall of an item with a common typing 
> variant.  These cases can be as simple as lower/upper case in most languages, 
> accented characters, or more complex morphological phenomena like prefix 
> omitting, or constructing a character with some combining mark.  This 
> component addresses the cases, which are not covered by ASCII folding 
> component, or more complex to design with other tools.  The idea is that a 
> linguist could provide the mappings in a tab-delimited file, which then can 
> be directly used by Solr.
> The mappings are maintained in the tab-delimited file, which could be just a 
> copy paste from Excel spreadsheet.  This gives the linguists the opportunity 
> to create the mappings, then for the developer to include them in Solr 
> configuration.  There are a few cases, when the mappings grow complex, where 
> some additional debugging may be required.  The mappings can contain any 
> sequence of characters to any other sequence of characters.
> Some of the cases I discuss in detail in the document are handling the voiced
> vowels for Japanese; common typing substitutions for Korean, Russian, and
> Polish; transliteration for Polish and Arabic; prefix removal for Arabic; and
> suffix folding for Japanese. In the appendix, I give an example of
> implementing a Russian lightweight stemmer using this component.
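For comparison with the question above, the existing MappingCharFilter already consumes a simple mapping file; a minimal field type using it might look like this (the file name and the example mappings are illustrative):

```
<fieldType name="text_mapped" class="solr.TextField">
  <analyzer>
    <!-- mapping.txt holds lines such as:  "à" => "a"  and  "ß" => "ss" -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

So the comparison comes down to what the tab-delimited format offers beyond this existing `"source" => "target"` syntax.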






[jira] [Commented] (LUCENE-6837) Add N-best output capability to JapaneseTokenizer

2015-10-12 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954483#comment-14954483
 ] 

Koji Sekiguchi commented on LUCENE-6837:


We have our own morphological analyzer with n-best output.

If nobody takes this, I'll assign it to myself. :)

> Add N-best output capability to JapaneseTokenizer
> -
>
> Key: LUCENE-6837
> URL: https://issues.apache.org/jira/browse/LUCENE-6837
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 5.3
>Reporter: KONNO, Hiroharu
>Priority: Minor
> Attachments: LUCENE-6837.patch
>
>
> Japanese morphological analyzers often generate mis-segmented tokens. N-best 
> output reduces the impact of mis-segmentation on search results. N-best output 
> is more meaningful than character N-grams, and it increases the hit count too.
> If you use N-best output, you can get decompounded tokens (ex: 
> "シニアソフトウェアエンジニア" => {"シニア", "シニアソフトウェアエンジニア", "ソフトウェア", "エンジニア"}) and 
> overwrapped tokens (ex: "数学部長谷川" => {"数学", "部", "部長", "長谷川", "谷川"}), 
> depending on the dictionary and N-best parameter settings.
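Enabling n-best output would then be a matter of tokenizer configuration; a hypothetical Solr snippet along the lines of the patch (the parameter name nBestCost is an assumption here and may differ in the committed form):

```
<tokenizer class="solr.JapaneseTokenizerFactory" mode="search" nBestCost="2000"/>
```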






[jira] [Commented] (SOLR-7488) suspicious FVH init code in DefaultSolrHighlighter even when FVH should not be used

2015-04-29 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520776#comment-14520776
 ] 

Koji Sekiguchi commented on SOLR-7488:
--

Thanks David and Hoss!

> suspicious FVH init code in DefaultSolrHighlighter even when FVH should not 
> be used
> ---
>
> Key: SOLR-7488
> URL: https://issues.apache.org/jira/browse/SOLR-7488
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.10
>Reporter: Hoss Man
>Assignee: David Smiley
> Fix For: Trunk, 5.2
>
>
> Rich Hume reported getting errors from FastVectorHighlighter, evidently 
> while using the surround query parser, even though he was not trying to 
> "useFastVectorHighlighter".
> my naive reading of the code leads me to believe that DefaultSolrHighlighter 
> is incorrectly attempting to initialize a FVH instance even when it shouldn't 
> be -- which appears to cause failures in cases where the query in use is not 
> something that can be handled by the FVH.
> Not sure how to reproduce at the moment -- but the code smells fishy.






[jira] [Commented] (SOLR-3055) Use NGramPhraseQuery in Solr

2014-12-27 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259553#comment-14259553
 ] 

Koji Sekiguchi commented on SOLR-3055:
--

Hi Uchida-san, thank you for your effort in reworking this issue!

Based on your observations (pros and cons), I like the 1st strategy. If you 
agree, why don't you add test cases for that one? Also, don't we need to 
consider other n-gram-type Tokenizers, and even TokenFilters such as 
NGramTokenFilter and CJKBigramFilter?

Also, I think there is a restriction when minGramSize != maxGramSize. If it's 
not significant, I think we can examine that restriction separately from this 
issue, because we rarely set different values for those when searching CJK words. 
But we often use NGramTokenizer with a fixed gram size for searching CJK words, 
and we could get a nice performance gain from the patch, as you've shown us.
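As a reminder of the optimization at stake, when the slop is 0 an n-gram phrase can drop redundant middle terms at rewrite time; a sketch against the Lucene 4.x API (the field name and terms are illustrative):

```java
// Bigram phrase for the string "abcd" on an n-gram field (Lucene 4.x API).
NGramPhraseQuery q = new NGramPhraseQuery(2);  // gram size n = 2
q.add(new Term("body", "ab"));
q.add(new Term("body", "bc"));
q.add(new Term("body", "cd"));
// rewrite() keeps roughly every n-th gram plus the last one ("ab", "cd"),
// matching the same phrase with fewer term lookups.
Query optimized = q.rewrite(reader);
```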

> Use NGramPhraseQuery in Solr
> 
>
> Key: SOLR-3055
> URL: https://issues.apache.org/jira/browse/SOLR-3055
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis, search
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-3055-1.patch, SOLR-3055-2.patch, SOLR-3055.patch, 
> schema.xml, solrconfig.xml
>
>
> Solr should use NGramPhraseQuery when searching with default slop on n-gram 
> field.






[jira] [Commented] (SOLR-6876) Remove unused legacy scripts.conf

2014-12-22 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14256358#comment-14256358
 ] 

Koji Sekiguchi commented on SOLR-6876:
--

I think the scripts.conf file is not for DIH; it is for the replication scripts. 
solr/scripts/scripts-util includes scripts.conf, and scripts-util is included 
from many scripts in the solr/scripts directory.

I don't know whether any Solr users in the world still use shell-script-based 
replication, except me.

> Remove unused legacy scripts.conf
> -
>
> Key: SOLR-6876
> URL: https://issues.apache.org/jira/browse/SOLR-6876
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.10.2, 5.0, Trunk
>Reporter: Alexandre Rafalovitch
>Assignee: Erick Erickson
>Priority: Minor
>
> Some of the example collections include *scripts.conf* in the *conf* 
> directory. It is not used by anything in the distribution and is somehow left 
> over from the Solr 1.x legacy days.
> It should be possible to safely delete it, to avoid confusing users trying to 
> understand what the different files actually do.






[jira] [Commented] (SOLR-3055) Use NGramPhraseQuery in Solr

2014-12-22 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255811#comment-14255811
 ] 

Koji Sekiguchi commented on SOLR-3055:
--

Thank you for paying attention to this ticket! It sounds good to me that you 
are starting this in Lucene.

> Use NGramPhraseQuery in Solr
> 
>
> Key: SOLR-3055
> URL: https://issues.apache.org/jira/browse/SOLR-3055
> Project: Solr
>  Issue Type: New Feature
>  Components: Schema and Analysis, search
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-3055.patch
>
>
> Solr should use NGramPhraseQuery when searching with default slop on n-gram 
> field.






[jira] [Resolved] (LUCENE-6112) Compile error with FST package example code

2014-12-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-6112.

   Resolution: Fixed
Fix Version/s: Trunk
   5.0

Thanks, Uchida-san!

> Compile error with FST package example code
> ---
>
> Key: LUCENE-6112
> URL: https://issues.apache.org/jira/browse/LUCENE-6112
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/FSTs
>Affects Versions: 4.10.2
>Reporter: Tomoko Uchida
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 5.0, Trunk
>
> Attachments: LUCENE-6112.patch
>
>
> I ran the FST construction example given in package.html with Lucene 4.10 and 
> found a compile error.
> http://lucene.apache.org/core/4_10_2/core/index.html?org/apache/lucene/util/fst/package-summary.html
> javac complained as below.
> "FSTTest" is my test class, just copied from the javadoc's example.
> {code}
> $ javac -cp /opt/lucene-4.10.2/core/lucene-core-4.10.2.jar FSTTest.java 
> FSTTest.java:28: error: method toIntsRef in class Util cannot be applied to 
> given types;
>   builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
>   ^
>   required: BytesRef,IntsRefBuilder
>   found: BytesRef,IntsRef
>   reason: actual argument IntsRef cannot be converted to IntsRefBuilder by 
> method invocation conversion
> Note: FSTTest.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 1 error
> {code}
> I modified the scratchInts variable type from IntsRef to IntsRefBuilder, and 
> it worked fine. (I checked the o.a.l.u.fst.TestFSTs.java TestCase, and my 
> modification seems to be correct.)
> The Util.toIntsRef() method has taken an IntsRefBuilder as its 2nd argument 
> instead of an IntsRef since 4.10, so the Javadocs should also be fixed.
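For reference, the javadoc example compiles once scratchInts is declared as an IntsRefBuilder; a sketch of the corrected snippet, based on the 4.10 javadoc example and abbreviated:

```java
// FST construction example from the package javadoc, with scratchInts
// changed from IntsRef to IntsRefBuilder so it compiles on Lucene 4.10+.
String[] inputValues = {"cat", "dog", "dogs"};
long[] outputValues = {5, 7, 12};
PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
BytesRef scratchBytes = new BytesRef();
IntsRefBuilder scratchInts = new IntsRefBuilder();  // was: IntsRef scratchInts
for (int i = 0; i < inputValues.length; i++) {
  scratchBytes.copyChars(inputValues[i]);
  builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
}
FST<Long> fst = builder.finish();
```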






[jira] [Assigned] (LUCENE-6112) Compile error with FST package example code

2014-12-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned LUCENE-6112:
--

Assignee: Koji Sekiguchi

> Compile error with FST package example code
> ---
>
> Key: LUCENE-6112
> URL: https://issues.apache.org/jira/browse/LUCENE-6112
> Project: Lucene - Core
>  Issue Type: Task
>  Components: core/FSTs
>Affects Versions: 4.10.2
>Reporter: Tomoko Uchida
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-6112.patch
>
>
> I ran the FST construction example given in package.html with Lucene 4.10 and 
> found a compile error.
> http://lucene.apache.org/core/4_10_2/core/index.html?org/apache/lucene/util/fst/package-summary.html
> javac complained as below.
> "FSTTest" is my test class, just copied from the javadoc's example.
> {code}
> $ javac -cp /opt/lucene-4.10.2/core/lucene-core-4.10.2.jar FSTTest.java 
> FSTTest.java:28: error: method toIntsRef in class Util cannot be applied to 
> given types;
>   builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
>   ^
>   required: BytesRef,IntsRefBuilder
>   found: BytesRef,IntsRef
>   reason: actual argument IntsRef cannot be converted to IntsRefBuilder by 
> method invocation conversion
> Note: FSTTest.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 1 error
> {code}
> I modified the scratchInts variable type from IntsRef to IntsRefBuilder, and 
> it worked fine. (I checked the o.a.l.u.fst.TestFSTs.java TestCase, and my 
> modification seems to be correct.)
> The Util.toIntsRef() method has taken an IntsRefBuilder as its 2nd argument 
> instead of an IntsRef since 4.10, so the Javadocs should also be fixed.






[jira] [Commented] (LUCENE-5674) A new token filter: SubSequence

2014-05-26 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009140#comment-14009140
 ] 

Koji Sekiguchi commented on LUCENE-5674:


bq. Koji: it can't do what I'm trying to do. Have you looked at my description?

Please ignore my comment, Nitzan, as it was just about what Otis described, and 
PathHierarchyTokenizer is a Tokenizer, not a TokenFilter. :)

> A new token filter: SubSequence
> ---
>
> Key: LUCENE-5674
> URL: https://issues.apache.org/jira/browse/LUCENE-5674
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nitzan Shaked
>Priority: Minor
> Attachments: subseqfilter.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A new configurable token filter which, given a token breaks it into sub-parts 
> and outputs consecutive sub-sequences of those sub-parts.
> Useful for, for example, using during indexing to generate variations on 
> domain names, so that "www.google.com" can be found by searching for 
> "google.com", or "www.google.com".
> Parameters:
> sepRegexp: A regular expression used to split incoming tokens into sub-parts.
> glue: A string used to concatenate sub-parts together when creating 
> sub-sequences.
> minLen: Minimum length (in sub-parts) of output sub-sequences
> maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for 
> unlimited; negative numbers for token length in sub-parts minus specified 
> length)
> anchor: Anchor.START to output only prefixes, or Anchor.END to output only 
> suffixes, or Anchor.NONE to output any sub-sequence
> withOriginal: whether to output also the original token
> EDIT: now includes tests for filter and for factory.
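The sub-sequence generation that the parameters describe can be demonstrated with a small self-contained sketch; the class and method names are mine, not the patch's, and negative maxLen handling is omitted for brevity:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SubSequenceDemo {
  // Sketch of the sub-sequence generation described above; class and method
  // names are placeholders, not the patch's. Negative maxLen is not handled.
  static List<String> subSequences(String token, String sepRegexp, String glue,
                                   int minLen, int maxLen) {
    String[] parts = token.split(sepRegexp);
    List<String> out = new ArrayList<>();
    int max = (maxLen <= 0) ? parts.length : maxLen;  // 0 means unlimited
    for (int len = minLen; len <= max; len++) {
      for (int start = 0; start + len <= parts.length; start++) {
        out.add(String.join(glue, Arrays.copyOfRange(parts, start, start + len)));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // → [www.google, google.com, www.google.com]
    System.out.println(subSequences("www.google.com", "\\.", ".", 2, 0));
  }
}
```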






[jira] [Commented] (LUCENE-5674) A new token filter: SubSequence

2014-05-26 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008838#comment-14008838
 ] 

Koji Sekiguchi commented on LUCENE-5674:


{quote}
Didn't look at this, but I remember needing/writing something like this 10+ 
years ago but I think back then I wanted to have output be something like: 
com, com.google, com.google.www - i.e. tokenized, but reversed order.
{quote}

PathHierarchyTokenizer can tokenize something like that.

> A new token filter: SubSequence
> ---
>
> Key: LUCENE-5674
> URL: https://issues.apache.org/jira/browse/LUCENE-5674
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nitzan Shaked
>Priority: Minor
> Attachments: subseqfilter.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A new configurable token filter which, given a token breaks it into sub-parts 
> and outputs consecutive sub-sequences of those sub-parts.
> Useful for, for example, using during indexing to generate variations on 
> domain names, so that "www.google.com" can be found by searching for 
> "google.com", or "www.google.com".
> Parameters:
> sepRegexp: A regular expression used to split incoming tokens into sub-parts.
> glue: A string used to concatenate sub-parts together when creating 
> sub-sequences.
> minLen: Minimum length (in sub-parts) of output sub-sequences
> maxLen: Maximum length (in sub-parts) of output sub-sequences (0 for 
> unlimited; negative numbers for token length in sub-parts minus specified 
> length)
> anchor: Anchor.START to output only prefixes, or Anchor.END to output only 
> suffixes, or Anchor.NONE to output any sub-sequence
> withOriginal: whether to output also the original token
> EDIT: now includes tests for filter and for factory.






[jira] [Resolved] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier

2014-02-27 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved LUCENE-5466.


   Resolution: Fixed
Fix Version/s: 5.0
   4.8

> query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
> --
>
> Key: LUCENE-5466
> URL: https://issues.apache.org/jira/browse/LUCENE-5466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/classification
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 4.8, 5.0
>
> Attachments: LUCENE-5466.patch
>
>







[jira] [Assigned] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier

2014-02-27 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned LUCENE-5466:
--

Assignee: Koji Sekiguchi

> query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
> --
>
> Key: LUCENE-5466
> URL: https://issues.apache.org/jira/browse/LUCENE-5466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/classification
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Attachments: LUCENE-5466.patch
>
>







[jira] [Updated] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier

2014-02-22 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5466:
---

Attachment: LUCENE-5466.patch

I think query must be set before calling countDocsWithClass() in train() method.

> query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier
> --
>
> Key: LUCENE-5466
> URL: https://issues.apache.org/jira/browse/LUCENE-5466
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/classification
>Reporter: Koji Sekiguchi
>Priority: Trivial
> Attachments: LUCENE-5466.patch
>
>







[jira] [Created] (LUCENE-5466) query is always null in countDocsWithClass() of SimpleNaiveBayesClassifier

2014-02-22 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created LUCENE-5466:
--

 Summary: query is always null in countDocsWithClass() of 
SimpleNaiveBayesClassifier
 Key: LUCENE-5466
 URL: https://issues.apache.org/jira/browse/LUCENE-5466
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/classification
Reporter: Koji Sekiguchi
Priority: Trivial









[jira] [Commented] (SOLR-5365) Bad version of common-compress

2013-10-18 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799122#comment-13799122
 ] 

Koji Sekiguchi commented on SOLR-5365:
--

Input from Guido Medina in solr ML:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3C52612E0F.2080302%40temetra.com%3E

{quote}
Dont, commons compress 1.5 is broken, either use 1.4.1 or later. Our app 
stopped compressing properly for a maven update.

Guido.
{quote}


> Bad version of common-compress
> --
>
> Key: SOLR-5365
> URL: https://issues.apache.org/jira/browse/SOLR-5365
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Affects Versions: 4.4, 4.5
> Environment: MS Windows 2008 Release 2
>Reporter: Roland Everaert
>
> When a WMZ file is sent to solr on resource /update/extract, the following 
> exception is thrown by solr:
> ERROR - 2013-10-17 18:13:48.902; org.apache.solr.common.SolrException; 
> null:java.lang.RuntimeException: java.lang.NoSuchMethodError: 
> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
> at 
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:673)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:383)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
> at 
> org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
> at 
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
> at 
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at 
> org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1852)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.commons.compress.compressors.CompressorStreamFactory.setDecompressConcatenated(Z)V
> at 
> org.apache.tika.parser.pkg.CompressorParser.parse(CompressorParser.java:102)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at 
> org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> ... 16 more
> According to Koji Sekiguchi, Tika 1.4, the version bundled with Solr, should 
> use commons-compress 1.5, but version 1.4.1 is present in the 
> solr/contrib/extraction/lib/ directory.
> During our testing, the ignoreTikaException flag was set to true.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

Oops: this patch replaces the previously attached one, which had a funny name. Sorry for the noise.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.
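
The missing-gram problem in point 2 above can be seen with a plain fixed-size n-gram sketch (an illustration of the behavior described in the issue, not Lucene code; the function name is my own):

```python
def ngrams(text, n=2):
    """Plain fixed-size n-grams, i.e. what an n-gram tokenizer emits
    when minGramSize == maxGramSize == n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Index side: the document text "ABC" only yields AB and BC
# (the synonym filter then stacks DE/EF/FG for the synonym DEFG).
print(ngrams("ABC"))   # ['AB', 'BC']

# Query side: "ABCY" yields a CY gram that never appears in the index,
# so an auto-generated phrase query cannot match.
print(ngrams("ABCY"))  # ['AB', 'BC', 'CY']
```

The last gram shows why autoGeneratePhraseQueries fails here: "CY" exists only on the query side.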






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: (was: LUCENE-5252_b4.patch)

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_b4.patch

New patch. As I have given up supporting one-way synonyms in 
NGramSynonymTokenizer for now, I removed the indexMode parameter in this patch.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-15 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

Fixed the code handling one-way synonyms (aaa=>bbb).

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

Fix a bug regarding ignoreCase in the attached patch.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-07 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Description: 
I'd like to propose that we have another n-gram tokenizer which can process 
synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
size is fixed, i.e. minGramSize = maxGramSize.

Today, I think we have the following problems when using SynonymFilter with 
NGramTokenizer. 
For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
expand=true and N = 2 (2-gram).

# There is no consensus (I think :-) how we assign offsets to generated synonym 
tokens DE, EF and FG when expanding source token AB and BC.
# If the query pattern looks like ABCY, it cannot be matched even if there is a 
document "…ABCY…" in index when autoGeneratePhraseQueries set to true, because 
there is no "CY" token (but "GY" is there) in the index.

NGramSynonymTokenizer can solve these problems by providing the following 
methods.

* NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
tokenize registered words. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|ABC|AB/DE/BC/EF/FG|ABC/DEFG|

* Before and after a registered word, NGramSynonymTokenizer generates 
*extra* tokens w/ posInc=0, e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|

In the above sample, "Z" and "1" are the extra tokens.


  was:
I'd like to propose that we have another n-gram tokenizer which can process 
synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
size is fixed, i.e. minGramSize = maxGramSize.

Today, I think we have the following problems when using SynonymFilter with 
NGramTokenizer. 
For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
expand=true and N = 2 (2-gram).

# There is no consensus (I think :-) how we assign offsets to generated synonym 
tokens DE, EF and FG when expanding source token AB and BC.
# If the query pattern looks like XABC or ABCY, it cannot be matched even if 
there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
true, because there is no "XA" or "CY" tokens in the index.

NGramSynonymTokenizer can solve these problems by providing the following 
methods.

* NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
tokenize registered words. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|ABC|AB/DE/BC/EF/FG|ABC/DEFG|

* Before and after a registered word, NGramSynonymTokenizer generates 
*extra* tokens w/ posInc=0, e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|

In the above sample, "Z" and "1" are the extra tokens.



> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, 
> because there is no "CY" token (but "GY" is there) in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.




[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-03 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

New patch that includes tests.

Because the original tests were developed at RONDHUIT and cover not only 
NGramSynonymTokenizer but also the synonym dictionary, the attached tests may be 
somewhat redundant with respect to SynonymMap.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if 
> there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
> true, because there is no "XA" or "CY" tokens in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Comment Edited] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783754#comment-13783754
 ] 

Koji Sekiguchi edited comment on LUCENE-5252 at 10/2/13 9:17 AM:
-

The draft patch without tests. As I don't have a Java 7 environment at the 
moment, the patch is based on the 4x branch.

When NGramSynonymTokenizer was developed at RONDHUIT, it used a double-array 
trie for the synonym dictionary.

I've tried to convert the code to Lucene's FST. As this is my first experience 
with FST, there may be inefficient code. Comments are welcome!


was (Author: koji):
The draft patch without tests. As I don't have Java 7 environment for now, the 
patch is based on 4x branch.

When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie 
for the synonym dictionary.

I've tried to convert the code to Lucene's FST. As this is the first experience 
of FST for me, any inefficient code may be there. Comments are welcome!

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if 
> there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
> true, because there is no "XA" or "CY" tokens in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Comment Edited] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13783754#comment-13783754
 ] 

Koji Sekiguchi edited comment on LUCENE-5252 at 10/2/13 9:12 AM:
-

The draft patch without tests. As I don't have a Java 7 environment at the 
moment, the patch is based on the 4x branch.

When NGramSynonymTokenizer was developed at RONDHUIT, it used a double-array 
trie for the synonym dictionary.

I've tried to convert the code to Lucene's FST. As this is my first experience 
with FST, there may be inefficient code. Comments are welcome!


was (Author: koji):
The draft patch without tests.

When NGramSynonymTokenizer was developed in RONDHUIT, it used double array trie 
for the synonym dictionary.

I've tried to convert the code to Lucene's FST. As this is the first experience 
of FST for me, any inefficient code may be there. Comments are welcome!

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if 
> there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
> true, because there is no "XA" or "CY" tokens in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

The draft patch without tests.

When NGramSynonymTokenizer was developed at RONDHUIT, it used a double-array 
trie for the synonym dictionary.

I've tried to convert the code to Lucene's FST. As this is my first experience 
with FST, there may be inefficient code. Comments are welcome!

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if 
> there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
> true, because there is no "XA" or "CY" tokens in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Closed] (LUCENE-5253) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi closed LUCENE-5253.
--

Resolution: Duplicate

Sorry, this is a duplicate.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5253
> URL: https://issues.apache.org/jira/browse/LUCENE-5253
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like XABC or ABCY, it cannot be matched even if 
> there is a document "…XABCY…" in index when autoGeneratePhraseQueries set to 
> true, because there is no "XA" or "CY" tokens in the index.
> NGramSynonymTokenizer can solve these problems by providing the following 
> methods.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Before and after a registered word, NGramSynonymTokenizer generates 
> *extra* tokens w/ posInc=0, e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.






[jira] [Created] (LUCENE-5253) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created LUCENE-5253:
--

 Summary: add NGramSynonymTokenizer
 Key: LUCENE-5253
 URL: https://issues.apache.org/jira/browse/LUCENE-5253
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Koji Sekiguchi
Priority: Minor


I'd like to propose another n-gram tokenizer that can process synonyms: 
NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. 
minGramSize = maxGramSize.

Today, I think we have the following problems when using SynonymFilter with 
NGramTokenizer.
For the purpose of illustration, assume a synonym setting "ABC, DEFG" w/ 
expand=true and N = 2 (2-gram).

# There is no consensus (I think :-) on how to assign offsets to the generated 
synonym tokens DE, EF and FG when expanding the source tokens AB and BC.
# If the query pattern looks like XABC or ABCY, it cannot be matched even if 
there is a document "…XABCY…" in the index when autoGeneratePhraseQueries is set 
to true, because there are no "XA" or "CY" tokens in the index.

NGramSynonymTokenizer solves these problems by behaving as follows.

* NGramSynonymTokenizer reads the synonym settings (synonyms.txt) and doesn't 
tokenize registered words. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|ABC|AB/DE/BC/EF/FG|ABC/DEFG|

* Immediately before and after registered words, NGramSynonymTokenizer 
generates *extra* tokens w/ posInc=0. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|

In the above sample, "Z" and "1" are the extra tokens.
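
The rule described above can be sketched in plain Java. This is only an illustrative toy (class and method names are made up, not the proposed NGramSynonymTokenizer): registered words are emitted whole, plain runs are split into fixed-size n-grams, and short *extra* tokens are emitted at the boundaries next to registered words. Synonym expansion (DEFG for ABC) and posInc/offset bookkeeping are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class NGramSynonymSketch {

    static List<String> tokenize(String text, int n, Set<String> registered) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            String word = matchRegistered(text, pos, registered);
            if (word != null) {              // registered word: emit untokenized
                tokens.add(word);
                pos += word.length();
                continue;
            }
            int end = pos;                   // find the maximal unregistered run
            while (end < text.length() && matchRegistered(text, end, registered) == null) {
                end++;
            }
            String run = text.substring(pos, end);
            boolean afterWord = pos > 0;               // run follows a registered word
            boolean beforeWord = end < text.length();  // run precedes a registered word
            if (run.length() < n) {
                tokens.add(run);             // run too short for an n-gram
            } else {
                if (afterWord) {             // extra prefix tokens (posInc=0 in the proposal)
                    for (int k = 1; k < n; k++) tokens.add(run.substring(0, k));
                }
                for (int i = 0; i + n <= run.length(); i++) {
                    tokens.add(run.substring(i, i + n));
                }
                if (beforeWord) {            // extra suffix tokens
                    for (int k = n - 1; k >= 1; k--) tokens.add(run.substring(run.length() - k));
                }
            }
            pos = end;
        }
        return tokens;
    }

    static String matchRegistered(String text, int pos, Set<String> registered) {
        for (String w : registered) {
            if (text.startsWith(w, pos)) return w;
        }
        return null;
    }

    public static void main(String[] args) {
        // DEFG would additionally be injected by synonym expansion, which is omitted here.
        System.out.println(tokenize("XYZABC123", 2, Set.of("ABC")));
        // [XY, YZ, Z, ABC, 1, 12, 23]
    }
}
```

The extra "Z" and "1" tokens are what let queries like XABC or ABCY match at the boundary of a registered word.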







[jira] [Created] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-02 Thread Koji Sekiguchi (JIRA)
Koji Sekiguchi created LUCENE-5252:
--

 Summary: add NGramSynonymTokenizer
 Key: LUCENE-5252
 URL: https://issues.apache.org/jira/browse/LUCENE-5252
 Project: Lucene - Core
  Issue Type: New Feature
  Components: modules/analysis
Reporter: Koji Sekiguchi
Priority: Minor


I'd like to propose another n-gram tokenizer that can process synonyms: 
NGramSynonymTokenizer. Note that in this ticket, the gram size is fixed, i.e. 
minGramSize = maxGramSize.

Today, I think we have the following problems when using SynonymFilter with 
NGramTokenizer.
For the purpose of illustration, assume a synonym setting "ABC, DEFG" w/ 
expand=true and N = 2 (2-gram).

# There is no consensus (I think :-) on how to assign offsets to the generated 
synonym tokens DE, EF and FG when expanding the source tokens AB and BC.
# If the query pattern looks like XABC or ABCY, it cannot be matched even if 
there is a document "…XABCY…" in the index when autoGeneratePhraseQueries is set 
to true, because there are no "XA" or "CY" tokens in the index.

NGramSynonymTokenizer solves these problems by behaving as follows.

* NGramSynonymTokenizer reads the synonym settings (synonyms.txt) and doesn't 
tokenize registered words. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|ABC|AB/DE/BC/EF/FG|ABC/DEFG|

* Immediately before and after registered words, NGramSynonymTokenizer 
generates *extra* tokens w/ posInc=0. e.g.

||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
|XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|

In the above sample, "Z" and "1" are the extra tokens.







[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept analyzer attribute

2013-07-17 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711130#comment-13711130
 ] 

Koji Sekiguchi commented on SOLR-3359:
--

bq. Is there any particular reason why this enhancement is not targeted at 4.x 
as well?

Well, my motivation was that CJKTokenizer(Factory) is marked as deprecated and 
will be gone in 5.0. If someone provides a patch for 4.x, I'm happy to commit 
it.

bq. Also, could the title summary be updated to reflect the fact that the 
change specifies the analyzer class name rather than "fieldType"?

Done.
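
For reference, with this change the synonyms file can be parsed with an Analyzer instead of a tokenizerFactory. A hypothetical schema snippet (the field type name, surrounding filters, and file names are illustrative, not part of the committed patch):

```xml
<fieldType name="text_cjk_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- the new analyzer attribute: entries in synonyms.txt are tokenized
         with CJKAnalyzer rather than via a tokenizerFactory attribute -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            analyzer="org.apache.lucene.analysis.cjk.CJKAnalyzer"/>
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```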

> SynonymFilterFactory should accept analyzer attribute
> -
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
> Fix For: 5.0
>
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3359) SynonymFilterFactory should accept analyzer attribute

2013-07-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-3359:
-

Summary: SynonymFilterFactory should accept analyzer attribute  (was: 
SynonymFilterFactory should accept fieldType attribute rather than 
tokenizerFactory)

> SynonymFilterFactory should accept analyzer attribute
> -
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
> Fix For: 5.0
>
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.




[jira] [Resolved] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory

2013-07-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-3359.
--

   Resolution: Fixed
Fix Version/s: 5.0

Thanks, Onodera-san!

> SynonymFilterFactory should accept fieldType attribute rather than 
> tokenizerFactory
> ---
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
> Fix For: 5.0
>
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.




[jira] [Assigned] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory

2013-07-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-3359:


Assignee: Koji Sekiguchi

> SynonymFilterFactory should accept fieldType attribute rather than 
> tokenizerFactory
> ---
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.




[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory

2013-07-15 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13708459#comment-13708459
 ] 

Koji Sekiguchi commented on SOLR-3359:
--

When I opened the ticket, I thought SynonymFilterFactory should accept (Solr's) 
fieldType attribute, as stated in the title.

But today, as SynonymFilterFactory lives in Lucene land, I think an analyzer 
attribute is more natural than (Solr's) fieldType attribute.

I'd like to commit the patch in a few days if no one objects.

> SynonymFilterFactory should accept fieldType attribute rather than 
> tokenizerFactory
> ---
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.




[jira] [Commented] (SOLR-3359) SynonymFilterFactory should accept fieldType attribute rather than tokenizerFactory

2013-07-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707748#comment-13707748
 ] 

Koji Sekiguchi commented on SOLR-3359:
--

bq. So, I made SynonymFilterFactory accept analyzer attribute so that I can 
specify CJKAnalyzer.

I'd never thought of using an analyzer there. Interesting idea. :)

> SynonymFilterFactory should accept fieldType attribute rather than 
> tokenizerFactory
> ---
>
> Key: SOLR-3359
> URL: https://issues.apache.org/jira/browse/SOLR-3359
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Koji Sekiguchi
> Attachments: 0001-Make-SynonymFilterFactory-accept-analyzer-attr.patch
>
>
> I hadn't realized that CJKTokenizer and its factory classes were marked 
> deprecated in 3.6/4.0 (the ticket is LUCENE-2906) until someone pointed it 
> out to me.
> {code}
>  * @deprecated Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and 
> LowerCaseFilter instead.
> {code}
> I agree with the idea of using the chain of the Tokenizer and TokenFilters 
> instead of CJKTokenizer, but it could be a problem for the existing users of 
> SynonymFilterFactory with CJKTokenizerFactory.
> So this ticket came to my mind again.




[jira] [Resolved] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-16 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-4751.
--

Resolution: Fixed

Committed on trunk, branch_4x and lucene_solr_4_3. Thanks!

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 5.0, 4.4, 4.3.1
>
> Attachments: SOLR-4751.patch, SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-16 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-4751:
-

Fix Version/s: 4.3.1
   4.4
   5.0

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 5.0, 4.4, 4.3.1
>
> Attachments: SOLR-4751.patch, SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-16 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13660197#comment-13660197
 ] 

Koji Sekiguchi commented on SOLR-4751:
--

Looks good! I'll commit shortly.

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-4751.patch, SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-10 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654556#comment-13654556
 ] 

Koji Sekiguchi commented on SOLR-4751:
--

Sorry for the inconvenience.

I'll contact the reporter and see if we can fix the tests.

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-4751:
-

Fix Version/s: (was: 4.4)
   (was: 5.0)

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Updated] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-10 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-4751:
-

Fix Version/s: 4.4
   5.0

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 5.0, 4.4
>
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-10 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654461#comment-13654461
 ] 

Koji Sekiguchi commented on SOLR-4751:
--

I committed the patch on trunk, branch_4x and 4.3.

Osuka-san, can you check one of them and confirm that your fix solves the 
problem?

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Assigned] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-07 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-4751:


Assignee: Koji Sekiguchi

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Assignee: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-07 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651578#comment-13651578
 ] 

Koji Sekiguchi commented on SOLR-4751:
--

bq. Need to find files recursively.

The other day, I talked with Osuka-san and came to understand the problem. It 
has existed since 3.6, when the subdirectory (lang/) was introduced under the 
conf directory.

Although it would be better to have test cases for replication, I don't have 
the time and the patch looks simple, so I'd like to commit it in a few days if 
no one objects.

Meanwhile, updates to the patch are welcome. :)
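
The recursive lookup the fix needs can be sketched like this. This is an illustrative toy only (class and method names are made up, and this is not the SOLR-4751 patch): walk a conf/ directory recursively so that files in subdirectories such as lang/ are found, returning paths relative to the conf root in the same form used by the confFiles setting.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ConfFileLister {

    static List<String> listConfFiles(File confDir) {
        List<String> result = new ArrayList<>();
        collect(confDir, "", result);
        Collections.sort(result);   // stable order for display
        return result;
    }

    private static void collect(File dir, String prefix, List<String> out) {
        File[] children = dir.listFiles();
        if (children == null) return;   // not a directory (or I/O error)
        for (File f : children) {
            if (f.isDirectory()) {
                collect(f, prefix + f.getName() + "/", out);   // recurse into lang/ etc.
            } else {
                out.add(prefix + f.getName());                 // e.g. "lang/stopwords_ja.txt"
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File conf = new File(System.getProperty("java.io.tmpdir"),
                             "conf-demo-" + System.nanoTime());
        new File(conf, "lang").mkdirs();
        new File(conf, "schema.xml").createNewFile();
        new File(conf, "lang/stopwords_ja.txt").createNewFile();
        System.out.println(listConfFiles(conf));
        // [lang/stopwords_ja.txt, schema.xml]
    }
}
```

A flat `dir.listFiles()` alone would miss lang/stopwords_ja.txt, which is the shape of the bug reported here.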

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles in the replication settings,
> {code:xml}
> <requestHandler name="/replication" class="solr.ReplicationHandler">
>   <lst name="master">
>     <str name="replicateAfter">commit</str>
>     <str name="replicateAfter">startup</str>
>     <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
>   </lst>
> </requestHandler>
> {code}
> only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4751) The replication problem of the file in a subdirectory.

2013-05-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646470#comment-13646470
 ] 

Koji Sekiguchi commented on SOLR-4751:
--

Hi Osuka-san, would you elaborate on the problem a little more, please?

> The replication problem of the file in a subdirectory.
> --
>
> Key: SOLR-4751
> URL: https://issues.apache.org/jira/browse/SOLR-4751
> Project: Solr
>  Issue Type: Bug
>  Components: replication (java)
>Affects Versions: 4.2.1
>Reporter: Minoru Osuka
>Priority: Minor
> Attachments: SOLR-4751.patch
>
>
> When setting lang/stopwords_ja.txt in confFiles,
> {code:xml}
> <lst name="master">
>   <str name="replicateAfter">commit</str>
>   <str name="replicateAfter">startup</str>
>   <str name="confFiles">schema.xml,stopwords.txt,lang/stopwords_ja.txt</str>
> </lst>
> {code}
> Only stopwords_ja.txt exists in the solr/collection1/conf/lang directory on 
> the slave node.




[jira] [Commented] (SOLR-4717) SimpleFacets should respect localParams

2013-04-16 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632758#comment-13632758
 ] 

Koji Sekiguchi commented on SOLR-4717:
--

+1

> SimpleFacets should respect localParams
> ---
>
> Key: SOLR-4717
> URL: https://issues.apache.org/jira/browse/SOLR-4717
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Attachments: SOLR-4717-FacetLocalParams.patch
>
>
> In trying to implement http://wiki.apache.org/solr/HierarchicalFaceting I 
> found the need to send multiple prefix facets in the same request on the same 
> field.
> Currently facet params will parse the localParams, but only use them to pick 
> out a name.  We can easily modify things to let localParams override global 
> ones.  For example:
> {code}
> &{!key=level3 facet.prefix=3/path/to/folder}path
> &{!key=level2 facet.prefix=2/path/to}path
> &{!key=level1 facet.prefix=1/path}path
> {code}
> This can easily be supported if we use:
> {code:java}
> params = SolrParams.wrapDefaults(localParams, orig);
> {code}
> when local params exist
> --
> We have come a long way from *simple* facets!
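
The wrapDefaults idea boils down to a two-level parameter lookup. A toy sketch of that lookup (class and method names here are illustrative, not Solr's SolrParams API): the local params win, and anything not set locally falls back to the global request params.

```java
import java.util.Map;

public class ParamsWithDefaults {
    private final Map<String, String> local;
    private final Map<String, String> global;

    ParamsWithDefaults(Map<String, String> local, Map<String, String> global) {
        this.local = local;
        this.global = global;
    }

    // Local params override; anything not set locally falls back to the globals.
    String get(String name) {
        String v = local.get(name);
        return v != null ? v : global.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> global = Map.of("facet.prefix", "1/path", "facet.limit", "10");
        Map<String, String> local = Map.of("facet.prefix", "3/path/to/folder");
        ParamsWithDefaults p = new ParamsWithDefaults(local, global);
        System.out.println(p.get("facet.prefix")); // 3/path/to/folder (local override)
        System.out.println(p.get("facet.limit"));  // 10 (global fallback)
    }
}
```

This is exactly what lets each `{!key=…}` facet carry its own facet.prefix while sharing the remaining facet parameters.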




[jira] [Commented] (LUCENE-4899) FastVectorHighlighter fails with SIOOB if single phrase or term is > fragCharSize

2013-04-05 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13623568#comment-13623568
 ] 

Koji Sekiguchi commented on LUCENE-4899:


Looks good! Sounds reasonable and I like the idea.

> FastVectorHighlighter fails with SIOOB if single phrase or term is > 
> fragCharSize
> -
>
> Key: LUCENE-4899
> URL: https://issues.apache.org/jira/browse/LUCENE-4899
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Affects Versions: 4.0, 4.1, 4.2, 3.6.2, 4.2.1
>Reporter: Simon Willnauer
> Fix For: 5.0, 4.3
>
> Attachments: LUCENE-4899.patch, LUCENE-4899.patch
>
>
> This has been reported on several occasions, e.g. SOLR-4660 / SOLR-4137, or 
> on the ES mailing list 
> https://groups.google.com/d/msg/elasticsearch/IdyMSPK5gao/nKZq8_NYWmgJ
> The reason is that the current code expects fragCharSize > matchLength, which 
> is not necessarily true if you use phrases or have very long terms like URLs. 
> I have a test that reproduces the issue and a fix, as far as I can tell (I 
> don't have much experience with the highlighter).
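
The failure mode can be illustrated with a toy fragment-window calculation (hypothetical names, not the actual FastVectorHighlighter code): if the window is naively centered with fragCharSize while the match itself is longer, the computed offsets fall outside the text; growing the window to cover the match and clamping it into the text avoids the out-of-bounds access.

```java
public class FragmentWindow {

    // Returns {start, end} of a fragment of roughly fragCharSize covering the match.
    static int[] window(int matchStart, int matchEnd, int fragCharSize, int textLen) {
        // Grow the window when the match is longer than the requested fragment size;
        // without this, start/end below could land outside [0, textLen).
        int len = Math.max(fragCharSize, matchEnd - matchStart);
        int start = matchStart - (len - (matchEnd - matchStart)) / 2;  // center on the match
        start = Math.min(start, textLen - len);  // clamp right edge into the text
        start = Math.max(0, start);              // clamp left edge into the text
        int end = Math.min(textLen, start + len);
        return new int[]{start, end};
    }

    public static void main(String[] args) {
        // A 40-char "URL" match inside a 50-char text, with fragCharSize = 20:
        int[] w = window(5, 45, 20, 50);
        System.out.println(w[0] + ".." + w[1]); // 5..45
    }
}
```

With fragCharSize fixed at 20, a naive `matchStart - (20 - 40) / 2 … + 20` window would end before the match does, which is the shape of the reported StringIndexOutOfBoundsException.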




[jira] [Commented] (LUCENE-4899) FastVectorHighlighter fails with SIOOB if single phrase or term is > fragCharSize

2013-04-04 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13622441#comment-13622441
 ] 

Koji Sekiguchi commented on LUCENE-4899:


Looks good, Simon!

> FastVectorHighlighter fails with SIOOB if single phrase or term is > 
> fragCharSize
> -
>
> Key: LUCENE-4899
> URL: https://issues.apache.org/jira/browse/LUCENE-4899
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Affects Versions: 4.0, 4.1, 4.2, 3.6.2, 4.2.1
>Reporter: Simon Willnauer
> Fix For: 5.0, 4.3
>
> Attachments: LUCENE-4899.patch
>
>
> This has been reported on several occasions, e.g. SOLR-4660 / SOLR-4137, or 
> on the ES mailing list 
> https://groups.google.com/d/msg/elasticsearch/IdyMSPK5gao/nKZq8_NYWmgJ
> The reason is that the current code expects fragCharSize > matchLength, which 
> is not necessarily true if you use phrases or have very long terms like URLs. 
> I have a test that reproduces the issue and a fix, as far as I can tell (I 
> don't have much experience with the highlighter).




[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

2013-01-25 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13563308#comment-13563308
 ] 

Koji Sekiguchi commented on LUCENE-1822:


I committed the above note to trunk, branch_4x and lucene_solr_4_1.

> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> --
>
> Key: LUCENE-1822
> URL: https://issues.apache.org/jira/browse/LUCENE-1822
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well, however I've found in 
> testing that the window of text chosen per fragment is often very poor, as it 
> is hard coded in SimpleFragListBuilder to always select starting 6 characters 
> to the left of the first phrase match in a fragment.  When selecting long 
> fragments, this often means that there is barely any context before the 
> highlighted word, and lots after; even worse, when highlighting a phrase at 
> the end of a short text the beginning is cut off, even though the entire 
> phrase would fit in the specified fragCharSize.  For example, highlighting 
> "Punishment" in "Crime and Punishment"  returns "e and Punishment" no 
> matter what fragCharSize is specified.  I am going to attach a patch that 
> improves the text window selection by recalculating the starting margin once 
> all phrases in the fragment have been identified - this way if a single word 
> is matched in a fragment, it will appear in the middle of the highlight, 
> instead of 6 characters from the beginning.  This way one can also guarantee 
> that the entirety of short texts are represented in a fragment by specifying 
> a large enough fragCharSize.
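The before/after margin logic Alex describes can be sketched as a toy version (hypothetical helper names, not the actual SimpleFragListBuilder code; the recentering formula is an assumption about the patch's intent):

```java
public class MarginSketch {
    static final int MARGIN = 6; // the hard-coded margin this issue criticizes

    // Old behavior: always start 6 chars left of the first match.
    static String naiveFragment(String text, int matchStart, int fragCharSize) {
        int start = Math.max(0, matchStart - MARGIN);
        return text.substring(start, Math.min(text.length(), start + fragCharSize));
    }

    // Patched idea: recompute the left margin once matches are known, so the
    // match is centered and short texts fit into the fragment whole.
    static String centeredFragment(String text, int matchStart, int matchEnd, int fragCharSize) {
        int margin = Math.max(0, (fragCharSize - (matchEnd - matchStart)) / 2);
        int start = Math.max(0, Math.min(matchStart - margin, text.length() - fragCharSize));
        return text.substring(start, Math.min(text.length(), start + fragCharSize));
    }

    public static void main(String[] args) {
        String text = "Crime and Punishment";
        int s = text.indexOf("Punishment"), e = s + "Punishment".length();
        System.out.println(naiveFragment(text, s, 20));       // "e and Punishment"
        System.out.println(centeredFragment(text, s, e, 20)); // "Crime and Punishment"
    }
}
```

With the naive margin, highlighting "Punishment" in "Crime and Punishment" yields the truncated "e and Punishment" from the issue report; recomputing the margin returns the whole short text.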




[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

2013-01-23 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13561187#comment-13561187
 ] 

Koji Sekiguchi commented on LUCENE-1822:


Here is the patch for trunk. I think the original reporter Alex describes the 
heart of the problem very well, so I borrowed his description. :)

{code}
$ svn diff
Index: lucene/CHANGES.txt
===================================================================
--- lucene/CHANGES.txt  (revision 1437783)
+++ lucene/CHANGES.txt  (working copy)
@@ -414,6 +414,13 @@
   This only affects requests with depth>1. If you execute such requests and
   rely on the facet results being returned flat (i.e. no hierarchy), you should
   set the ResultMode to GLOBAL_FLAT. (Shai Erera, Gilad Barkai) 
+
+* LUCENE-1822: Improves the text window selection by recalculating the
+  starting margin once all phrases in the fragment have been identified in
+  FastVectorHighlighter. This way if a single word is matched in a fragment,
+  it will appear in the middle of the highlight, instead of 6 characters from
+  the beginning. This way one can also guarantee that the entirety of short
+  texts are represented in a fragment by specifying a large enough fragCharSize.
 
 Optimizations
 
{code}


> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> --
>
> Key: LUCENE-1822
> URL: https://issues.apache.org/jira/browse/LUCENE-1822
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well, however I've found in 
> testing that the window of text chosen per fragment is often very poor, as it 
> is hard coded in SimpleFragListBuilder to always select starting 6 characters 
> to the left of the first phrase match in a fragment.  When selecting long 
> fragments, this often means that there is barely any context before the 
> highlighted word, and lots after; even worse, when highlighting a phrase at 
> the end of a short text the beginning is cut off, even though the entire 
> phrase would fit in the specified fragCharSize.  For example, highlighting 
> "Punishment" in "Crime and Punishment"  returns "e and Punishment" no 
> matter what fragCharSize is specified.  I am going to attach a patch that 
> improves the text window selection by recalculating the starting margin once 
> all phrases in the fragment have been identified - this way if a single word 
> is matched in a fragment, it will appear in the middle of the highlight, 
> instead of 6 characters from the beginning.  This way one can also guarantee 
> that the entirety of short texts are represented in a fragment by specifying 
> a large enough fragCharSize.




[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

2013-01-23 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560692#comment-13560692
 ] 

Koji Sekiguchi commented on LUCENE-1822:


Sure, that's great. Do you have a draft note?

> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> --
>
> Key: LUCENE-1822
> URL: https://issues.apache.org/jira/browse/LUCENE-1822
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well, however I've found in 
> testing that the window of text chosen per fragment is often very poor, as it 
> is hard coded in SimpleFragListBuilder to always select starting 6 characters 
> to the left of the first phrase match in a fragment.  When selecting long 
> fragments, this often means that there is barely any context before the 
> highlighted word, and lots after; even worse, when highlighting a phrase at 
> the end of a short text the beginning is cut off, even though the entire 
> phrase would fit in the specified fragCharSize.  For example, highlighting 
> "Punishment" in "Crime and Punishment"  returns "e and Punishment" no 
> matter what fragCharSize is specified.  I am going to attach a patch that 
> improves the text window selection by recalculating the starting margin once 
> all phrases in the fragment have been identified - this way if a single word 
> is matched in a fragment, it will appear in the middle of the highlight, 
> instead of 6 characters from the beginning.  This way one can also guarantee 
> that the entirety of short texts are represented in a fragment by specifying 
> a large enough fragCharSize.




[jira] [Commented] (LUCENE-1822) FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too naive

2013-01-23 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560604#comment-13560604
 ] 

Koji Sekiguchi commented on LUCENE-1822:


Uh, Simon, sorry for my lack of prudence. 

> FastVectorHighlighter: SimpleFragListBuilder hard-coded 6 char margin is too 
> naive
> --
>
> Key: LUCENE-1822
> URL: https://issues.apache.org/jira/browse/LUCENE-1822
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 2.9
> Environment: any
>Reporter: Alex Vigdor
>Assignee: Koji Sekiguchi
>Priority: Minor
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-1822.patch, LUCENE-1822.patch, LUCENE-1822.patch, 
> LUCENE-1822-tests.patch
>
>
> The new FastVectorHighlighter performs extremely well, however I've found in 
> testing that the window of text chosen per fragment is often very poor, as it 
> is hard coded in SimpleFragListBuilder to always select starting 6 characters 
> to the left of the first phrase match in a fragment.  When selecting long 
> fragments, this often means that there is barely any context before the 
> highlighted word, and lots after; even worse, when highlighting a phrase at 
> the end of a short text the beginning is cut off, even though the entire 
> phrase would fit in the specified fragCharSize.  For example, highlighting 
> "Punishment" in "Crime and Punishment"  returns "e and Punishment" no 
> matter what fragCharSize is specified.  I am going to attach a patch that 
> improves the text window selection by recalculating the starting margin once 
> all phrases in the fragment have been identified - this way if a single word 
> is matched in a fragment, it will appear in the middle of the highlight, 
> instead of 6 characters from the beginning.  This way one can also guarantee 
> that the entirety of short texts are represented in a fragment by specifying 
> a large enough fragCharSize.




[jira] [Resolved] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-22 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-4330.
--

   Resolution: Fixed
Fix Version/s: 3.6.3
   5.0
   4.2

> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Fix For: 4.2, 5.0, 3.6.3
>
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, the group sort is not set after 
> the grouping object is created, so the member variable groupSort is always 
> null. Because of this, an AbstractAllGroupHeadsCollector with the default 
> sort (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}
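The effect of the never-assigned groupSort can be sketched with a stand-in Sort class. This is illustration only: Sort here is a stub, not org.apache.lucene.search.Sort, and createAllGroupCollectorSort() is a hypothetical reduction of the fallback above:

```java
public class GroupSortSketch {
    // Stand-in for org.apache.lucene.search.Sort (illustration only).
    static class Sort {
        final String spec;
        Sort() { this("<default relevance sort>"); }
        Sort(String spec) { this.spec = spec; }
        public String toString() { return spec; }
    }

    static Sort groupSort; // never set in the buggy code path, so it stays null

    static Sort createAllGroupCollectorSort() {
        // Mirrors the fallback quoted above: a null groupSort silently
        // degrades to the default sort, ignoring the user's group.sort param.
        return groupSort != null ? groupSort : new Sort();
    }

    public static void main(String[] args) {
        // Bug: the caller intended group.sort=price asc, but it was never assigned.
        System.out.println(createAllGroupCollectorSort()); // <default relevance sort>

        // Fix: assign groupSort after creating the grouping object.
        groupSort = new Sort("price asc");
        System.out.println(createAllGroupCollectorSort()); // price asc
    }
}
```

Until groupSort is assigned, every collector is built with the default sort, which is exactly the "group.sort is ignored" symptom in the summary.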




[jira] [Assigned] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-21 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-4330:


Assignee: Koji Sekiguchi

> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Assignee: Koji Sekiguchi
>Priority: Trivial
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, the group sort is not set after 
> the grouping object is created, so the member variable groupSort is always 
> null. Because of this, an AbstractAllGroupHeadsCollector with the default 
> sort (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}




[jira] [Updated] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-21 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-4330:
-

Attachment: SOLR-4330.patch

I added a test case that fails if the patch is not applied.

> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Priority: Trivial
> Attachments: SOLR-4330.patch, SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, the group sort is not set after 
> the grouping object is created, so the member variable groupSort is always 
> null. Because of this, an AbstractAllGroupHeadsCollector with the default 
> sort (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}




[jira] [Updated] (SOLR-4330) group.sort is ignored when using truncate and ex/tag local params

2013-01-21 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-4330:
-

Attachment: SOLR-4330.patch

> group.sort is ignored when using truncate and ex/tag local params
> -
>
> Key: SOLR-4330
> URL: https://issues.apache.org/jira/browse/SOLR-4330
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 3.6, 4.0, 4.1, 5.0
>Reporter: Koji Sekiguchi
>Priority: Trivial
> Attachments: SOLR-4330.patch
>
>
> In the parseParams method of SimpleFacets, the group sort is not set after 
> the grouping object is created, so the member variable groupSort is always 
> null. Because of this, an AbstractAllGroupHeadsCollector with the default 
> sort (new Sort()) is created every time.
> {code}
> public AbstractAllGroupHeadsCollector createAllGroupCollector() throws 
> IOException {
>   Sort sortWithinGroup = groupSort != null ? groupSort : new Sort();
>   return TermAllGroupHeadsCollector.create(groupBy, sortWithinGroup);
> }
> {code}



