[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464841#comment-16464841 ]

Abhishek Kumar Singh commented on SOLR-11741:
---------------------------------------------

To use LearnSchemaUpdateRequestProcessorFactory, just add it to the URP chain. The new API details are:

*1. Get a training ID:*
*GET* *//schema/train/start*
Response:
{code:java}
{"schemaTrainingId" : ""}
{code}

*2. Start training:*
This API works like any other update API; the request body carries the documents to train with.
*POST* *//update?schemaTrainingId=*
{code:java}
Body: (Same as update request)
[{}]
{code}

*3. Get the schema trained so far:*
*GET* */schema/train/yield?schemaTrainingId=*
*Response:*
{code:java}
{
  "schema":{
    "add-field-type": [
      { "name":, "type":, "multivalued":},
      { "name":, "type":, "multivalued":},
      ...
    ]
  }
}
{code}

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
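The three calls described in the comment above can be sketched as plain HTTP requests. The sketch below only builds the requests with Java's standard `java.net.http` API and never sends them. The base URL with its collection segment (`http://localhost:8983/solr/demo`) is an assumption, since the comment leaves the path prefix blank; the endpoints exist only with the SOLR-11741 patch applied, and the `SchemaTrainingRequests` class and its method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds (but does not send) the three training requests.
final class SchemaTrainingRequests {

    // Step 1: ask for a training id (GET <base>/schema/train/start).
    static HttpRequest startTraining(String base) {
        return HttpRequest.newBuilder(URI.create(base + "/schema/train/start"))
                .GET()
                .build();
    }

    // Step 2: post documents to train with; the body is an ordinary update request.
    static HttpRequest sendDocuments(String base, String trainingId, String jsonDocs) {
        URI uri = URI.create(base + "/update?schemaTrainingId=" + encode(trainingId));
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }

    // Step 3: fetch the schema trained so far.
    static HttpRequest fetchTrainedSchema(String base, String trainingId) {
        URI uri = URI.create(base + "/schema/train/yield?schemaTrainingId=" + encode(trainingId));
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // URL-encode the id so it is safe to place in a query string.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/demo"; // assumed base URL and collection
        String id = "example-id";                        // placeholder, normally taken from step 1
        System.out.println(startTraining(base).uri());
        System.out.println(sendDocuments(base, id, "[{}]").uri());
        System.out.println(fetchTrainedSchema(base, id).uri());
    }
}
```

A real client would pass each request to `HttpClient.send(...)` and read `schemaTrainingId` from the first response before issuing the other two.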
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464831#comment-16464831 ]

Abhishek Kumar Singh commented on SOLR-11741:
---------------------------------------------

[^SOLR-11741.patch] Adding updated APIs.

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464831#comment-16464831 ]

Abhishek Kumar Singh edited comment on SOLR-11741 at 5/5/18 4:33 PM:
---------------------------------------------------------------------

[^SOLR-11741.patch] Adding updated patch.

was (Author: abhidemon):
[^SOLR-11741.patch] Adding updated APIs.

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464830#comment-16464830 ]

Abhishek Kumar Singh commented on SOLR-11741:
---------------------------------------------

[^SOLR-11741.patch]

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464829#comment-16464829 ]

Abhishek Kumar Singh edited comment on SOLR-11741 at 5/5/18 4:31 PM:
---------------------------------------------------------------------

Uploading the updated patch, with the following features:

A new URP, LearnSchemaUpdateRequestProcessorFactory: it simply inspects the incoming data to see what data type each field currently looks like. Based on that, it updates the metadata about each field.

was (Author: abhidemon):
Uploading the updated patch, with following features:- A new URP LearnSchemaUpdateRequestProcessorFactory: It simply learns from the incoming data to check what the current data type looks like. Based on, it updates the metadata about each field. APIs:

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464829#comment-16464829 ]

Abhishek Kumar Singh commented on SOLR-11741:
---------------------------------------------

Uploading the updated patch, with the following features:

A new URP, LearnSchemaUpdateRequestProcessorFactory: it simply inspects the incoming data to see what data type each field currently looks like. Based on that, it updates the metadata about each field.

APIs:

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Updated] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11624:
----------------------------------------
    Attachment: SOLR-11624.patch

Uploading the updated patch with corrected documentation. [~ichattopadhyaya] [~dsmiley]

> collection creation should not also overwrite/delete any configset but it can!
> ------------------------------------------------------------------------------
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.2
> Reporter: Erick Erickson
> Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch,
> SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't
> want to lose track of it. Anyone else please feel free to take it.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3089) Make ResponseBuilder.isDistrib public
[ https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-3089:
---------------------------------------
    Attachment: SOLR-3089.patch

Uploading updated patch with a test case using the method *rb.isDistributed()*. [~ichattopadhyaya]

> Make ResponseBuilder.isDistrib public
> -------------------------------------
>
> Key: SOLR-3089
> URL: https://issues.apache.org/jira/browse/SOLR-3089
> Project: Solr
> Issue Type: Improvement
> Components: Response Writers
> Affects Versions: 4.0-ALPHA
> Reporter: Rok Rejc
> Fix For: 4.9, 6.0
>
> Attachments: SOLR-3089.patch, Solr-3089.patch
>
>
> Hi,
> I have posted this issue on a mailing list but I didn't get any response.
> I am trying to write a distributed search component (a class that extends
> SearchComponent). I have checked FacetComponent and TermsComponent. If I want
> the search component to work in a distributed environment I have to set
> ResponseBuilder's isDistrib to true, like this (this is also done in
> TermsComponent for example):
>   public void prepare(ResponseBuilder rb) throws IOException {
>     SolrParams params = rb.req.getParams();
>     String shards = params.get(ShardParams.SHARDS);
>     if (shards != null) {
>       List<String> lst = StrUtils.splitSmart(shards, ",", true);
>       rb.shards = lst.toArray(new String[lst.size()]);
>       rb.isDistrib = true;
>     }
>   }
> If I have my component outside the package org.apache.solr.handler.component
> this doesn't work. Is it possible to make isDistrib public (or is this the
> wrong procedure/behaviour/design)?
> Many thanks,
> Rok
[jira] [Updated] (SOLR-11837) More information required in README.md for Setting up project in IDEs
[ https://issues.apache.org/jira/browse/SOLR-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11837:
----------------------------------------
    Attachment: SOLR-11837.patch

> More information required in README.md for Setting up project in IDEs
> ----------------------------------------------------------------------
>
> Key: SOLR-11837
> URL: https://issues.apache.org/jira/browse/SOLR-11837
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 7.2
> Reporter: Abhishek Kumar Singh
> Labels: documentation
> Attachments: SOLR-11837.patch
>
>
> Sometimes, the instructions on the README.md page are not enough to
> set up the project in the IDEs.
> The following *solr-wiki-page-links* are pretty useful, but are not present
> on the README.md page.
> https://wiki.apache.org/solr/HowToConfigureEclipse
> https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
> https://wiki.apache.org/lucene-java/HowtoConfigureNetbeans
> Having links on the README.md page will be quite helpful for beginners.
[jira] [Created] (SOLR-11837) More information required in README.md for Setting up project in IDEs
Abhishek Kumar Singh created SOLR-11837:
----------------------------------------

Summary: More information required in README.md for Setting up project in IDEs
Key: SOLR-11837
URL: https://issues.apache.org/jira/browse/SOLR-11837
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Reporter: Abhishek Kumar Singh

Sometimes, the instructions on the README.md page are not enough to set up the project in the IDEs.
The following *solr-wiki-page-links* are pretty useful, but are not present on the README.md page.
https://wiki.apache.org/solr/HowToConfigureEclipse
https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
https://wiki.apache.org/lucene-java/HowtoConfigureNetbeans
Having links on the README.md page will be quite helpful for beginners.
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: (was: RuleForMostAccomodatingField.png)

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: RuleForMostAccomodatingField.png

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png,
> RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, screenshot-1.png,
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315465#comment-16315465 ]

Abhishek Kumar Singh edited comment on SOLR-11741 at 1/7/18 9:02 PM:
---------------------------------------------------------------------

The above approach can be optimised by replacing the *Supported FieldTypes* with *_BitSets_*, as shown in the following table:

!screenshot-1.png!

We can map every FieldType to a BitSet, e.g. *String will be 1*, *Long will be 00100*, and so on.

1. For every product, get the BitSet of the FieldTypes supported by each field.
2. For every field, take the *_BITWISE OR_* of the current BitSet with the BitSet value already recorded, and store the result in its place.

Use the following rule to decide the final FieldType that the field should have.

!RuleForMostAccomodatingField.png!

Say a field called *price* has the following values:
In Product1 -> *12321 (Long, i.e. 00100)*
In Product2 -> *77261.66 (Double, i.e. 01000)*

The supported BitSet for *price* will have a final value of *[ 00100 OR 01000 = 01100 ]*, i.e. it should be assigned a Double.

The above rule can be extended to any number of types; only the number of bits increases accordingly.

Using BitSets like this will reduce the storage to 1 byte per field, make the computation easier and faster, and remove the overhead of computing the trained schema separately, since the BitSets are updated in place with every product.
Every API call asking for the *Trained Schema* will get the schema calculated up to that point using the above rule.

was (Author: abhidemon):
The above approach can be optimised by replacing the *Supported FieldTypes* by *_BitSets_* , As shown in the following table:-

!screenshot-1.png!

We can map every FieldType to a BitSet. For eg. *String will be 1* , *Long will be 00100* and so on..

1. Now For every product, get the BitSet of the fieldType supported by each field
2. For every field, Find the *_BITWISE OR_* of the current BitSet with the BitSet value already recorded, and replace it.

Use the following rule to decide the final FieldType that the field should have.

!screenshot-3.png!

Say if a field called *price* has values as following values:
In Product1 -> *12321 (Long, i.e. 00100)*
In Product2 -> *77261.66 (Double, i.e. 01000)*

The supported BitSet for *price* will have a final value of *[ 00100 OR 01000 = 01100 ]* , i.e. It should be assigned a Double.

The above rule can be extended to any number of types, just the number of bits will increase accordingly.

Using BitSets like above will decrease the storage space to 1 byte per field, will make the computation easier and faster, and will also remove the overhead of computing the trained schema separately, as they will be updated in-place with every Product.
Every api call to ask for *Trained Schema*, will get the schema calculated till that point using the above rule.

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
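The OR-based merging described in the comment above can be sketched in a few lines of Java. This is a minimal sketch, not the code from the attached patch: the `FieldTypeTrainer` class and its methods are hypothetical, the bit position for String is assumed, and the resolution rule (the highest set bit wins) is only a guess that agrees with the *price* example, since the rule image itself is not reproduced in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-field type training with bit masks.
final class FieldTypeTrainer {
    // LONG and DOUBLE use the bit patterns from the comment; STRING's position is an assumption.
    static final int LONG = 0b00100;
    static final int DOUBLE = 0b01000;
    static final int STRING = 0b10000;

    // One accumulated mask per field name, updated in place for every document.
    private final Map<String, Integer> observed = new HashMap<>();

    // Classify a single raw value using the JDK parsers (a simplification of real guessing).
    static int classify(String value) {
        try {
            Long.parseLong(value);
            return LONG;
        } catch (NumberFormatException notALong) {
            // fall through and try the next, more accommodating type
        }
        try {
            Double.parseDouble(value);
            return DOUBLE;
        } catch (NumberFormatException notADouble) {
            return STRING;
        }
    }

    // Bitwise OR the new value's type into whatever was recorded so far.
    void observe(String field, String value) {
        observed.merge(field, classify(value), (a, b) -> a | b);
    }

    int bitsOf(String field) {
        return observed.getOrDefault(field, 0);
    }

    // Assumed rule: the most accommodating type whose bit is set wins.
    String typeOf(String field) {
        int bits = bitsOf(field);
        if ((bits & STRING) != 0) return "String";
        if ((bits & DOUBLE) != 0) return "Double";
        if ((bits & LONG) != 0) return "Long";
        return "Unknown";
    }

    public static void main(String[] args) {
        FieldTypeTrainer trainer = new FieldTypeTrainer();
        trainer.observe("price", "12321");    // Long, 00100
        trainer.observe("price", "77261.66"); // Double, 01000
        System.out.println(trainer.typeOf("price")); // prints "Double"
    }
}
```

Because the mask is only ever OR-ed, the order in which documents arrive does not matter, which is the property the comment relies on when it says the trained schema can be read at any point.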
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-11741:
----------------------------------------
    Attachment: RuleForMostAccomodatingField.png

> Offline training mode for schema guessing
> -----------------------------------------
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch,
> screenshot-1.png, screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For
> example, if the first document has a field with value "0", it is guessed as
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same
> field had alphanumeric contents for a later document, those documents are
> rejected. Also, single vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and
> returns a guessed schema (without indexing). This schema can then be used for
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We
> can hash out the API soon, as we go along.
[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315469#comment-16315469 ] Abhishek Kumar Singh edited comment on SOLR-11741 at 1/7/18 8:56 PM: - Uploading patch [^SOLR-11741-temp.patch] with above implementation within *AddSchemaFieldsUpdateProcessorFactory* itself. Have added a config param called mode. The above URP will just *Train The Schema* when *mode=train* . By default *mode=update*, i.e. update the schema as usual. This patch is temporary because it still needs test cases. Also, currently the state is being stored in-memory, in a map. Have to move that to the zookeeper. Will update that design in my next comments. was (Author: abhidemon): Uploading patch with above implementation within *AddSchemaFieldsUpdateProcessorFactory* itself. Have added a config param called mode. The above URP will just *Train The Schema* when *mode=train* . By default *mode=update*, i.e. update the schema as usual. This patch is temporary because it still needs test cases. Also, currently the state is being stored in-memory, in a map. Have to move that to the zookeeper. Will update that design in my next comments. > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. 
> Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
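A minimal sketch of how a mode switch like the one described in the comment above could be parsed. Only the parameter name mode, its values train and update, and the update default come from the comment. The class name TrainingModeConfig, the method parseMode, and the plain Map used for the init args are hypothetical and not taken from the patch.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of SOLR-11741-temp.patch.
final class TrainingModeConfig {

    // The two modes named in the comment.
    enum Mode { TRAIN, UPDATE }

    // Reads the "mode" init arg; falls back to UPDATE when it is absent or blank,
    // matching the "by default mode=update" statement in the comment.
    static Mode parseMode(Map<String, String> initArgs) {
        String raw = initArgs.get("mode");
        if (raw == null || raw.trim().isEmpty()) {
            return Mode.UPDATE;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "train":
                return Mode.TRAIN;
            case "update":
                return Mode.UPDATE;
            default:
                // Rejecting unknown values is an assumption of this sketch.
                throw new IllegalArgumentException("Unknown mode: " + raw);
        }
    }
}
```

Failing fast on an unknown value is a design choice of the sketch; silently falling back to update would also be consistent with the comment.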
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315472#comment-16315472 ] Abhishek Kumar Singh commented on SOLR-11741: - @ [~ichattopadhyaya] bq. At every point in time, every field will be mapped to only one possible (most granular) field type, isn't it? Yes. > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11741: Attachment: SOLR-11741-temp.patch Uploading patch with the above implementation within *AddSchemaFieldsUpdateProcessorFactory* itself. I have added a config param called mode. The URP will only *Train The Schema* when *mode=train*. By default *mode=update*, i.e. it updates the schema as usual. This patch is temporary because it still needs test cases. Also, the state is currently stored in memory, in a map; it has to be moved to ZooKeeper. I will describe that design in my next comments. > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along.
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11741: Attachment: (was: screenshot-2.png) > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315465#comment-16315465 ] Abhishek Kumar Singh commented on SOLR-11741: - The above approach can be optimised by replacing the *Supported FieldTypes* with *_BitSets_*, as shown in the following table: !screenshot-1.png! We can map every FieldType to a BitSet, e.g. *String will be 1*, *Long will be 00100* and so on. 1. For every product, get the BitSet of the fieldType supported by each field. 2. For every field, compute the *_BITWISE OR_* of the current BitSet with the BitSet value already recorded, and store the result in its place. Use the following rule to decide the final FieldType that the field should have. !screenshot-3.png! Say a field called *price* has the following values: In Product1 -> *12321 (Long, i.e. 00100)* In Product2 -> *77261.66 (Double, i.e. 01000)* The supported BitSet for *price* will have a final value of *[ 00100 OR 01000 = 01100 ]*, i.e. it should be assigned a Double. The above rule can be extended to any number of types; only the number of bits will increase accordingly. Using BitSets as above will reduce the storage space to 1 byte per field, make the computation easier and faster, and remove the overhead of computing the trained schema separately, since the BitSets are updated in place with every Product. Every API call asking for the *Trained Schema* will get the schema calculated up to that point using the above rule. > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. 
For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
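The BitSet scheme from the comment above can be sketched roughly as follows. The bit values for Long (00100) and Double (01000) come from the comment. Everything else is an assumption of this sketch: the String bit (10000), the value classifier, the class name FieldTypeTrainer, and the "highest set bit wins" rule, which stands in for the rule in screenshot-3.png (not available here) and merely agrees with the *price* example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-field type masks combined with bitwise OR.
final class FieldTypeTrainer {

    static final int LONG = 0b00100;    // from the comment
    static final int DOUBLE = 0b01000;  // from the comment
    static final int STRING = 0b10000;  // assumed value, not confirmed by the comment

    // field name -> OR of the type bits seen so far
    private final Map<String, Integer> observed = new HashMap<>();

    // Simplified classifier: tries Long, then Double, else String.
    static int classify(String value) {
        try {
            Long.parseLong(value);
            return LONG;
        } catch (NumberFormatException ignored) {
            // not a long, fall through
        }
        try {
            Double.parseDouble(value);
            return DOUBLE;
        } catch (NumberFormatException ignored) {
            // not a double either
        }
        return STRING;
    }

    // Step 2 of the comment: OR the new bit into the recorded mask, in place.
    void observe(String field, String value) {
        observed.merge(field, classify(value), (a, b) -> a | b);
    }

    int mask(String field) {
        return observed.getOrDefault(field, 0);
    }

    // Assumed decision rule: the highest set bit is the most accommodating type.
    String guessedType(String field) {
        switch (Integer.highestOneBit(mask(field))) {
            case LONG:
                return "Long";
            case DOUBLE:
                return "Double";
            case STRING:
                return "String";
            default:
                return null; // field never observed
        }
    }
}
```

Observing 12321 and then 77261.66 for *price* leaves the mask at 01100, and the assumed rule resolves that to Double, as in the comment. Because OR is commutative, the order in which documents arrive does not change the result.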
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11741: Attachment: screenshot-2.png > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11741: Attachment: screenshot-3.png > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11741: Attachment: screenshot-1.png > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > Attachments: screenshot-1.png > > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314428#comment-16314428 ] Abhishek Kumar Singh commented on SOLR-11741: - What I was thinking of is something similar to the above implementation, except that instead of recording every *value* that ever appeared for a field, I would record all the distinct *fieldTypes of the values* that appeared for each field. This gives a mapping of *field -> supported types*, which needs very little storage. And instead of being kept in memory, this data can be stored externally (say in _zookeeper_, or in some _temporary index_ inside Solr). I think this will get rid of the following problem. bq. It doesn't play very nicely with distributed updates (you'd either have to ensure all training data was sent to the same node where you send the "commit" or add special custom logic to ensure it all got forwarded to a special node) and there are probably a lot more sophisticated / smarter ways to do it > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > > Our data driven schema guessing doesn't work under many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent fields with "0.0" are rejected. Similarly, if the same > field had alphanumeric contents for a latter document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think initial implementation can be based on an UpdateRequestProcessor. 
We > can hash out the API soon, as we go along.
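One way to picture the *field -> supported types* mapping from the comment above is a map from field name to a set of types. The sketch below is illustrative only: the class SupportedTypes, the FieldType enum and the merge helper are hypothetical names, and combining per-node state by set union is an assumption about how externally stored state might be reconciled, not something the comment specifies.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "field -> supported types" mapping.
final class SupportedTypes {

    // Illustrative subset of types; the real list would follow the schema guesser.
    enum FieldType { LONG, DOUBLE, STRING }

    // Records that a value of the given type was seen for the field.
    static void record(Map<String, EnumSet<FieldType>> state, String field, FieldType type) {
        state.computeIfAbsent(field, f -> EnumSet.noneOf(FieldType.class)).add(type);
    }

    // Copies every field's types from source into target (set union per field).
    private static void addAll(Map<String, EnumSet<FieldType>> target,
                               Map<String, EnumSet<FieldType>> source) {
        source.forEach((field, types) ->
            target.computeIfAbsent(field, f -> EnumSet.noneOf(FieldType.class)).addAll(types));
    }

    // Combines the state gathered on two nodes; neither input is modified.
    static Map<String, EnumSet<FieldType>> merge(Map<String, EnumSet<FieldType>> a,
                                                 Map<String, EnumSet<FieldType>> b) {
        Map<String, EnumSet<FieldType>> out = new HashMap<>();
        addAll(out, a);
        addAll(out, b);
        return out;
    }
}
```

Since only distinct types are kept, the state per field is bounded by the number of known types, however many documents are used for training.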
[jira] [Comment Edited] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306797#comment-16306797 ] Abhishek Kumar Singh edited comment on SOLR-11624 at 12/30/17 2:39 PM: --- Also, thanks for pointing out this inconsistency. bq. so if we have one configSet in ZooKeeper named "myconfig" and the user creates a collection "mycoll" (without specifying which config), then presumably we'll have two configSets: "myconfig" and "mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet in ZooKeeper, nor is there "_default" for that matter. Does this mean if the user goes to create another collection similarly that it will fail? I think yes, it will fail. It looks like this feature in particular will break once the new ConfigName is added. We can get rid of this problem by * either deprecating this feature of [using the only configset present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362], * or making a new configSet named {{_default}} whenever such a case arises. [~dsmiley] [~ichattopadhyaya] was (Author: abhidemon): Also, thanks for pointing out this inconsistency. bq. so if we have one configSet in ZooKeeper named "myconfig" and the user creates a collection "mycoll" (without specifying which config), then presumably we'll have two configSets: "myconfig" and "mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet in ZooKeeper, nor is there "_default" for that matter. Does this mean if the user goes to create another collection similarly that it will fail? I think yes, it will fail. It looks like this feature in particular will break once the new ConfigName is added. 
We can get rid of this problem by * either deprecating this feature of [using the only configset present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362], * or making a new configSet named {{_default}} whenever such a case arises. > collection creation should not also overwrite/delete any configset but it can! > -- > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, > SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it.
[jira] [Commented] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306797#comment-16306797 ] Abhishek Kumar Singh commented on SOLR-11624: - Also, thanks for pointing out this inconsistency. bq. so if we have one configSet in ZooKeeper named "myconfig" and the user creates a collection "mycoll" (without specifying which config), then presumably we'll have two configSets: "myconfig" and "mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet in ZooKeeper, nor is there "_default" for that matter. Does this mean if the user goes to create another collection similarly that it will fail? I think yes, it will fail. It looks like this feature in particular will break once the new ConfigName is added. We can get rid of this problem by * either deprecating this feature of [using the only configset present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362], * or making a new configSet named {{_default}} whenever such a case arises. > collection creation should not also overwrite/delete any configset but it can! > -- > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, > SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. 
[jira] [Updated] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11624: Attachment: SOLR-11624.patch [~dsmiley], thanks for the suggestions. Uploading the updated patch with the following changes: * Changed the suffix name to ".AUTOCREATED" * Explicitly set the name of the created ConfigSet in TimeRoutedAliasUpdateProcessorTest * Modified the documentation to read: bq. {color:#654982}*collection.configName*{color} : Defines the name of the configuration (which *must already be stored in ZooKeeper*) to use for this collection. If not provided, Solr will use the configuration of {{_default}} configSet *OR* the {{only configSet present}} (if there is only 1 config set in Zookeeper) to create a new (and mutable) configSet named {{.AUTOCREATED}} and will use it for the new collection. > collection creation should not also overwrite/delete any configset but it can! > -- > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, > SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it.
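The fallback described in the documentation text above can be sketched as below. This is not the code from the patch: the class ConfigSetChooser and its methods are hypothetical, the precedence of {{_default}} over the single existing configSet is an assumption (the documentation only says "OR"), and prefixing the suffix with the collection name is inferred from the "mycoll.AUTOCREATED_CONFIGSET" example elsewhere in this thread.

```java
import java.util.Set;

// Hypothetical sketch of choosing a base configSet when collection.configName is absent.
final class ConfigSetChooser {

    static final String DEFAULT = "_default";
    static final String SUFFIX = ".AUTOCREATED";

    // Picks the existing configSet to copy from.
    // Assumed precedence: _default first, then the only configSet present.
    static String baseConfigSet(Set<String> existing) {
        if (existing.contains(DEFAULT)) {
            return DEFAULT;
        }
        if (existing.size() == 1) {
            return existing.iterator().next();
        }
        throw new IllegalStateException(
            "No " + DEFAULT + " configSet and no single configSet to fall back on");
    }

    // Name of the new, mutable configSet created for the collection (inferred naming).
    static String autoCreatedName(String collection) {
        return collection + SUFFIX;
    }
}
```

With only "myconfig" present, the first collection is created from "myconfig". Once its auto-created configSet exists alongside "myconfig" and there is no {{_default}}, a second call has nothing unambiguous to fall back on, which is the failure case raised earlier in the thread.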
[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11624: Attachment: SOLR-11624.4.patch Updated the patch with documentation. [~ichattopadhyaya] [~dsmiley] Kindly review the same. > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, > SOLR-11624.4.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385 ] Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:46 PM: --- Please find the updated patch here -> [^SOLR-11624.3.patch] Changes made:- 1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to create/update and use the correct {{modifiedConfigSet}} name. 2. Refactored the {{configName}} , added suffix to the name of _auto-generated configSet_. [~ichattopadhyaya] , [~dsmiley] Please review the patch. was (Author: asingh2411): Please find the updated patch here -> [^SOLR-11624.3.patch] Changes made:- 1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to create/update and use the correct {{modifiedConfigSet}} name. 2. Refactored the {{configName}} , added suffix to the name of _auto-generated configSet_. [~ichattopadhyaya] [~dsmiley] Please review the patch. > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. 
[jira] [Created] (SOLR-11744) ConfigSetAdminRequest.CREATE should allow null in baseConfigSetName
Abhishek Kumar Singh created SOLR-11744: --- Summary: ConfigSetAdminRequest.CREATE should allow null in baseConfigSetName Key: SOLR-11744 URL: https://issues.apache.org/jira/browse/SOLR-11744 Project: Solr Issue Type: Wish Security Level: Public (Default Security Level. Issues are Public) Components: config-api, Tests Reporter: Abhishek Kumar Singh Priority: Minor Currently, [ConfigSetAdminRequest.Create|http://lucene.apache.org/solr/6_5_0/solr-solrj/org/apache/solr/client/solrj/request/ConfigSetAdminRequest.Create.html] throws an exception *_{color:red}no Base ConfigSet specified!{color}_* if [baseConfigSetName|http://lucene.apache.org/solr/6_5_0/solr-solrj/org/apache/solr/client/solrj/request/ConfigSetAdminRequest.Create.html#baseConfigSetName] is null. A configSet can still be created by passing *__default_* as the {{baseConfigSetName}}, but that is a hack. IMO *_baseConfigSetName_* should be optional, so that, instead of throwing an exception, *_ConfigSetAdminRequest.Create_* lets the user create a *_configSet_* from the *_default config set_*, i.e. *__default_*, if the *_baseConfigSetName_* is not provided.
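To make the request above concrete, here is a small sketch contrasting the behaviour described as current with the behaviour being asked for. It does not use or reproduce SolrJ code: the class BaseConfigSetDefaults and both methods are hypothetical, and the exception type is an assumption. Only the message text and the {{_default}} name come from the issue.

```java
// Hypothetical sketch; not the SolrJ ConfigSetAdminRequest.Create implementation.
final class BaseConfigSetDefaults {

    static final String DEFAULT_CONFIGSET = "_default";

    // Behaviour described as current: a null base name is rejected.
    static String strictBase(String baseConfigSetName) {
        if (baseConfigSetName == null) {
            // Exception type is an assumption; the message is quoted from the issue.
            throw new IllegalStateException("no Base ConfigSet specified!");
        }
        return baseConfigSetName;
    }

    // Behaviour proposed in the issue: a null base name falls back to _default.
    static String lenientBase(String baseConfigSetName) {
        return baseConfigSetName == null ? DEFAULT_CONFIGSET : baseConfigSetName;
    }
}
```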
[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11624: Attachment: (was: solr-11624.3.patch) > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385 ] Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:45 PM: --- Please find the updated patch here -> [^SOLR-11624.3.patch] Changes made:- 1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to create/update and use the correct {{modifiedConfigSet}} name. 2. Refactored the {{configName}} , added suffix to the name of _auto-generated configSet_. [~ichattopadhyaya] [~dsmiley] Please review the patch. was (Author: asingh2411): Please find the updated patch here -> [^SOLR-11624.3.patch] Changes made:- 1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to create/update and use the correct {{modifiedConfigSet}} name. 2. Refactored the {{configName}} , added suffix to the name of _auto-generated configSet_. > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385 ] Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:43 PM: --- Please find the updated patch here -> [^SOLR-11624.3.patch] Changes made:- 1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to create/update and use the correct {{modifiedConfigSet}} name. 2. Refactored the {{configName}} , added suffix to the name of _auto-generated configSet_. was (Author: asingh2411): Please find the updated patch here. > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11624: Attachment: solr-11624.3.patch Please find the updated patch here. > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.patch, solr-11624.3.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it.
[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.
[ https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-11624: Attachment: SOLR-11624.3.patch > _default configset overwrites a a configset if collection.configName isn't > specified even if a confiset of the same name already exists. > > > Key: SOLR-11624 > URL: https://issues.apache.org/jira/browse/SOLR-11624 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.2 >Reporter: Erick Erickson >Assignee: Ishan Chattopadhyaya > Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch > > > Looks like a problem that crept in when we changed the _default configset > stuff. > setup: > upload a configset named "wiki" > collections?action=CREATE&name=wiki&. > My custom configset "wiki" gets overwritten by _default and then used by the > "wiki" collection. > Assigning to myself only because it really needs to be fixed IMO and I don't > want to lose track of it. Anyone else please feel free to take it.
[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing
[ https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284694#comment-16284694 ] Abhishek Kumar Singh commented on SOLR-11741: - I am working on the same. > Offline training mode for schema guessing > - > > Key: SOLR-11741 > URL: https://issues.apache.org/jira/browse/SOLR-11741 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya > > Our data driven schema guessing doesn't work in many situations. For > example, if the first document has a field with value "0", it is guessed as > Long and subsequent documents with "0.0" in that field are rejected. Similarly, if the same > field has alphanumeric contents in a later document, those documents are > rejected. Also, single vs. multi valued field guessing is not ideal. > Proposing an offline training mode where Solr accepts a bunch of documents and > returns a guessed schema (without indexing). This schema can then be used for > actual indexing. I think the original idea is from Hoss. > I think the initial implementation can be based on an UpdateRequestProcessor. We > can hash out the API soon, as we go along.
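The "0" then "0.0" failure described in the issue goes away once the guesser sees every value of a field before it commits to a type. A minimal sketch of that idea follows. It is not taken from any SOLR-11741 patch: the class and method names are hypothetical, and the three-step type order (Long, Double, String) is an assumption chosen to match the examples in the description. Multi-valued detection is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical offline type guesser; not the SOLR-11741 implementation. */
class SchemaGuessSketch {
    // Ordered from narrowest to widest; a field only ever moves to a wider type.
    enum GuessedType { LONG, DOUBLE, STRING }

    /** Guesses the narrowest type that can hold a single (non-null) value. */
    static GuessedType typeOf(String value) {
        try {
            Long.parseLong(value);
            return GuessedType.LONG;
        } catch (NumberFormatException notLong) {
            // not a long, try the next wider type
        }
        try {
            Double.parseDouble(value);
            return GuessedType.DOUBLE;
        } catch (NumberFormatException notDouble) {
            return GuessedType.STRING;
        }
    }

    /** Picks the wider of two guesses, so "0" followed by "0.0" ends up DOUBLE. */
    static GuessedType widen(GuessedType a, GuessedType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    /** Scans all training documents without indexing them and returns field -> type. */
    static Map<String, GuessedType> train(List<Map<String, String>> docs) {
        Map<String, GuessedType> schema = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                schema.merge(field.getKey(), typeOf(field.getValue()), SchemaGuessSketch::widen);
            }
        }
        return schema;
    }
}
```

Because the result depends only on the widest value seen, the order in which documents arrive no longer matters, which is the property the online guesser lacks.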
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Attachment: SOLR-10263.v2.patch Uploading the updated patch > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > Attachments: SOLR-10263.v2.patch > > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create a problem in the following case: > We may want *DirectSolrSpellChecker* to always return spellcheck > suggestions (SUGGEST_ALWAYS), > but we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or performance reasons. > *UPDATE:* Recently, we also figured out that, within > {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} > should have different suggestModes as well. > We faced this problem in our case, where most of the WordJoin cases are > those in which the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > To handle this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218. 
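The *gold mine sunglasses* case can be reduced to one small decision function. The sketch below is only an illustration of why a single shared mode is not enough: the class, the `Mode` enum and `shouldSuggest` are hypothetical names (the enum constants just reuse the mode names from the description), the index is modelled as a plain set of terms, and none of this is the code from the attached patch or the PR.

```java
import java.util.Set;

/** Hypothetical per-operation suggest modes; names mirror the issue text, not a Solr API. */
class SuggestModeSketch {
    enum Mode { SUGGEST_ALWAYS, SUGGEST_WHEN_NOT_IN_INDEX }

    /**
     * Decides whether an operation configured with the given mode may propose
     * alternatives for the given token(s). With SUGGEST_WHEN_NOT_IN_INDEX a
     * suggestion is only allowed when at least one token is missing from the index.
     */
    static boolean shouldSuggest(Mode mode, Set<String> index, String... tokens) {
        if (mode == Mode.SUGGEST_ALWAYS) {
            return true;
        }
        for (String token : tokens) {
            if (!index.contains(token)) {
                return true;
            }
        }
        return false;
    }
}
```

Assuming an index that contains gold, mine, sunglasses and goldmine: a word break on "sunglasses" under SUGGEST_WHEN_NOT_IN_INDEX is suppressed, which is what we want, but a word join of "gold" and "mine" under the same mode is suppressed too, because both tokens exist. Only when the join runs under SUGGEST_ALWAYS can "goldmine" be offered, hence the request for separate modes.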
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Attachment: (was: SOLR-10263.v2.patch) > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create a problem in the following case: > We may want *DirectSolrSpellChecker* to always return spellcheck > suggestions (SUGGEST_ALWAYS), > but we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or performance reasons. > *UPDATE:* Recently, we also figured out that, within > {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} > should have different suggestModes as well. > We faced this problem in our case, where most of the WordJoin cases are > those in which the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > To handle this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218. 
[jira] [Comment Edited] (SOLR-3089) Make ResponseBuilder.isDistrib public
[ https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256 ] Abhishek Kumar Singh edited comment on SOLR-3089 at 9/1/17 9:14 AM: I am using SOLR 6.5.0 and am still facing the same issue. I feel the patch should be merged. was (Author: asingh2411): I am using SOLR 6.50 and am still facing the same issue. I feel the patch should be merged. > Make ResponseBuilder.isDistrib public > - > > Key: SOLR-3089 > URL: https://issues.apache.org/jira/browse/SOLR-3089 > Project: Solr > Issue Type: Improvement > Components: Response Writers >Affects Versions: 4.0-ALPHA >Reporter: Rok Rejc > Fix For: 4.9, 6.0 > > Attachments: Solr-3089.patch > > > Hi, > I have posted this issue on a mailing list but I didn't get any response. > I am trying to write a distributed search component (a class that extends > SearchComponent). I have checked FacetComponent and TermsComponent. If I want > the search component to work in a distributed environment I have to set > ResponseBuilder's isDistrib to true, like this (this is also done in > TermsComponent for example): > public void prepare(ResponseBuilder rb) throws IOException { > SolrParams params = rb.req.getParams(); > String shards = params.get(ShardParams.SHARDS); > if (shards != null) { > List<String> lst = StrUtils.splitSmart(shards, ",", > true); > rb.shards = lst.toArray(new String[lst.size()]); > rb.isDistrib = true; > } > } > If I have my component outside the package org.apache.solr.handler.component > this doesn't work. Is it possible to make isDistrib public (or is this the > wrong procedure/behaviour/design)? > Many thanks, > Rok
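For readers without a Solr checkout at hand, the logic of the quoted prepare() method can be shown without any Solr types. The sketch below is a stand-in, not Solr code: `DistribStateSketch` is a hypothetical class that holds only the two fields the description touches, and a plain `String.split` with trimming replaces `StrUtils.splitSmart`, so any escape handling done by the real helper is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two ResponseBuilder fields used in the description. */
class DistribStateSketch {
    public String[] shards;
    public boolean isDistrib;

    /** Mirrors the quoted prepare() logic: a non-null shards parameter switches on distributed mode. */
    static DistribStateSketch prepare(String shardsParam) {
        DistribStateSketch state = new DistribStateSketch();
        if (shardsParam != null) {
            List<String> lst = new ArrayList<>();
            // Assumption: a simple comma split with trimming is enough for this illustration.
            for (String shard : shardsParam.split(",")) {
                String trimmed = shard.trim();
                if (!trimmed.isEmpty()) {
                    lst.add(trimmed);
                }
            }
            state.shards = lst.toArray(new String[0]);
            state.isDistrib = true;
        }
        return state;
    }
}
```

In the stand-in both fields are public, so code in any package can set them. The request in this issue is for the same to hold for isDistrib on the real ResponseBuilder, so that a component living outside org.apache.solr.handler.component can do what TermsComponent does.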
[jira] [Commented] (SOLR-3089) Make ResponseBuilder.isDistrib public
[ https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256 ] Abhishek Kumar Singh commented on SOLR-3089: I am using SOLR-6.50 and am still facing the same issue. I feel the patch should be merged. > Make ResponseBuilder.isDistrib public > - > > Key: SOLR-3089 > URL: https://issues.apache.org/jira/browse/SOLR-3089 > Project: Solr > Issue Type: Improvement > Components: Response Writers >Affects Versions: 4.0-ALPHA >Reporter: Rok Rejc > Fix For: 4.9, 6.0 > > Attachments: Solr-3089.patch > > > Hi, > I have posted this issue on a mailing list but I didn't get any response. > I am trying to write a distributed search component (a class that extends > SearchComponent). I have checked FacetComponent and TermsComponent. If I want > the search component to work in a distributed environment I have to set > ResponseBuilder's isDistrib to true, like this (this is also done in > TermsComponent for example): > public void prepare(ResponseBuilder rb) throws IOException { > SolrParams params = rb.req.getParams(); > String shards = params.get(ShardParams.SHARDS); > if (shards != null) { > List<String> lst = StrUtils.splitSmart(shards, ",", > true); > rb.shards = lst.toArray(new String[lst.size()]); > rb.isDistrib = true; > } > } > If I have my component outside the package org.apache.solr.handler.component > this doesn't work. Is it possible to make isDistrib public (or is this the > wrong procedure/behaviour/design)? > Many thanks, > Rok
[jira] [Comment Edited] (SOLR-3089) Make ResponseBuilder.isDistrib public
[ https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256 ] Abhishek Kumar Singh edited comment on SOLR-3089 at 9/1/17 9:11 AM: I am using SOLR 6.50 and am still facing the same issue. I feel the patch should be merged. was (Author: asingh2411): I am using SOLR-6.50 and am still facing the same issue. I feel the patch should be merged. > Make ResponseBuilder.isDistrib public > - > > Key: SOLR-3089 > URL: https://issues.apache.org/jira/browse/SOLR-3089 > Project: Solr > Issue Type: Improvement > Components: Response Writers >Affects Versions: 4.0-ALPHA >Reporter: Rok Rejc > Fix For: 4.9, 6.0 > > Attachments: Solr-3089.patch > > > Hi, > I have posted this issue on a mailing list but I didn't get any response. > I am trying to write a distributed search component (a class that extends > SearchComponent). I have checked FacetComponent and TermsComponent. If I want > the search component to work in a distributed environment I have to set > ResponseBuilder's isDistrib to true, like this (this is also done in > TermsComponent for example): > public void prepare(ResponseBuilder rb) throws IOException { > SolrParams params = rb.req.getParams(); > String shards = params.get(ShardParams.SHARDS); > if (shards != null) { > List<String> lst = StrUtils.splitSmart(shards, ",", > true); > rb.shards = lst.toArray(new String[lst.size()]); > rb.isDistrib = true; > } > } > If I have my component outside the package org.apache.solr.handler.component > this doesn't work. Is it possible to make isDistrib public (or is this the > wrong procedure/behaviour/design)? > Many thanks, > Rok
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 1:55 PM: -- The problem with "maxCollationTries" is that - a {{collationTry}} is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, after applying the above patch, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that - a {{collationTry}} is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > Attachments: SOLR-10263.v2.patch > > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (for relevance or performance reasons) > (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE :* Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. 
> We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example:- > *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But > the actual product being looked for is *goldmine sunglasses* , where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But > this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . > For this, we should have separate suggestModes for both `wordJoin` as well as > `wordBreak`. > Related changes have been done at Latest PR. : > https://github.c
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:42 AM: -- The problem with "maxCollationTries" is that - a {{collationTry}} is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > Attachments: SOLR-10263.v2.patch > > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (for relevance or performance reasons) > (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE :* Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. 
> We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example:- > *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But > the actual product being looked for is *goldmine sunglasses* , where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But > this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . > For this, we should have separate suggestModes for both `wordJoin` as well as > `wordBreak`. > Related changes have been done at Latest PR. : > https://github.com/apache/lucene-solr/pull/218. -- This message
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:04 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions waste our precious {{maxCollationTries}} . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. 
In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions waiting for our precious {{maxCollationTries}} . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > Attachments: SOLR-10263.v2.patch > > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (for relevance or performance reasons) > (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE :* Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. 
> For example:- > *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But > the actual product being looked for is *goldmine sunglasses* , where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But > this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . > For this, we should have separate suggestModes for both `wordJoin` as well as > `wordBreak`. > Related changes have been done at Latest PR. : > https://github.com/apache/lucene-solr/pull/218. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:08 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} .

> Different SpellcheckComponents should have their own suggestMode
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
> Issue Type: Wish
> Security Level: Public (Default Security Level. Issues are Public)
> Components: spellchecker
> Reporter: Abhishek Kumar Singh
> Priority: Minor
> Attachments: SOLR-10263.v2.patch
>
> As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create a problem in the following case:
> We may want *DirectSolrSpellChecker* to always return spellcheck suggestions (SUGGEST_ALWAYS), but want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or performance reasons.
> *UPDATE:* Recently, we also figured out that, within {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} should also have different suggestModes.
> We faced this problem in our case, where most of the WordJoin cases are those in which the words are individually valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens.
> For example:
> *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the actual product being looked for is *goldmine sunglasses*, where *goldmine* is a brand.
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
> For this, we should have separate suggestModes for `wordJoin` and `wordBreak`.
> Related changes have been made in the latest PR: https://github.com/apache/lucene-solr/pull/218.

-- This message was sent by Atlas
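The *gold mine sunglasses* case can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not the Solr or Lucene implementation and not the configuration introduced by the PR. The helper names (`allowed`, `word_break`, `word_join`, `suggest`) and the tiny in-memory `INDEX` are hypothetical, and the mode semantics are simplified to "SUGGEST_WHEN_NOT_IN_INDEX stays silent when all the original tokens are already indexed".

```python
from enum import Enum


class SuggestMode(Enum):
    # Names mirror the modes discussed in the thread; the behaviour is a toy model.
    SUGGEST_ALWAYS = "always"
    SUGGEST_WHEN_NOT_IN_INDEX = "when_not_in_index"


def allowed(mode, tokens, index):
    """Return True if a checker in this mode may suggest for these original tokens."""
    if mode is SuggestMode.SUGGEST_ALWAYS:
        return True
    # Simplified SUGGEST_WHEN_NOT_IN_INDEX: suggest only if some token is missing.
    return any(t not in index for t in tokens)


def word_break(token, index, mode):
    """Split one token into two indexed words, e.g. 'sunglasses' -> 'sun glasses'."""
    if not allowed(mode, [token], index):
        return []
    return [
        f"{token[:i]} {token[i:]}"
        for i in range(1, len(token))
        if token[:i] in index and token[i:] in index
    ]


def word_join(left, right, index, mode):
    """Join two adjacent tokens when the joined form is indexed."""
    if not allowed(mode, [left, right], index):
        return []
    joined = left + right
    return [joined] if joined in index else []


def suggest(query, index, break_mode, join_mode):
    """Collect whole-query candidates from word-break first, then word-join."""
    tokens = query.split()
    candidates = []
    for i, tok in enumerate(tokens):
        for broken in word_break(tok, index, break_mode):
            candidates.append(" ".join(tokens[:i] + [broken] + tokens[i + 1:]))
    for i in range(len(tokens) - 1):
        for joined in word_join(tokens[i], tokens[i + 1], index, join_mode):
            candidates.append(" ".join(tokens[:i] + [joined] + tokens[i + 2:]))
    return candidates


# Hypothetical index contents chosen to reproduce the example in the thread.
INDEX = {"gold", "mine", "goldmine", "sunglasses", "sun", "glasses", "sung", "lasses"}
```

In this model, calling `suggest("gold mine sunglasses", INDEX, ...)` with SUGGEST_ALWAYS for both modes yields three candidates, two of which (*gold mine sun glasses*, *gold mine sung lasses*) would each consume a collation try. With SUGGEST_WHEN_NOT_IN_INDEX for both, nothing is suggested, so *goldmine sunglasses* is lost. Only the split configuration (break: SUGGEST_WHEN_NOT_IN_INDEX, join: SUGGEST_ALWAYS) returns just *goldmine sunglasses*.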
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (for relevance or performance reasons) (SUGGEST_WHEN_NOT_IN_INDEX) . *UPDATE :* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. Related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (for relevance or performance reasons) (SUGGEST_WHEN_NOT_IN_INDEX) . 
*UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. Related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218.
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:07 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different _suggestMode_ configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} .
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:06 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *gold mine sung lasses* and later waste our precious {{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions which we already know are not required. This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions waste our precious {{maxCollationTries}} . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} .
[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh commented on SOLR-10263: - The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. (For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses* , SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions ) . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:04 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions waiting for our precious {{maxCollationTries}} . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. 
In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} .
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:03 AM: -- The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses*. In this case, SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} . was (Author: asingh2411): The problem with "maxCollationTries" is that it is an expensive step. So, there is only a limit to which we can increase its value - given a certain level of response time/efficiency requirement. A control on {{wordBreak}} suggestions can give us more freedom to get the relevant suggestions in cases where we know how our queries are going to be. 
(For example: *gold mine sunglasses* will even give suggestions like *gold mine sun glasses* or even *goldmi sung lasses* , SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions ) . This is why, We faced cases wherein different {{SpellCheckComponents}} required different *suggestModes*. Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellCheck}} ) should also have different configurations because the use cases can really vary. (for the above usecase itself, we want *gold* and *mine* to be combine to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. ) This is why in our case, we had to configure {{DirectSolrSpellChecker}} to {{SUGGEST_ALWAYS}} , while only {{wordBreak}} was configured as {{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}} still had {{SUGGEST_ALWAYS}} .
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Attachment: SOLR-10263.v2.patch The above patch has the changes related to the PR : https://github.com/apache/lucene-solr/pull/218
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080375#comment-16080375 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 7/10/17 2:14 PM: -- The above patch (https://issues.apache.org/jira/secure/attachment/12876418/SOLR-10263.v2.patch) has the changes related to the PR : https://github.com/apache/lucene-solr/pull/218 was (Author: asingh2411): The above patch has the changes related to the PR : https://github.com/apache/lucene-solr/pull/218 > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > Attachments: SOLR-10263.v2.patch > > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (for relevance or performance reasons) > (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE : * Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example:- > *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But > the actual product being looked for is *goldmine sunglasses* , where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . 
But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218 -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. So related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . 
*Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > *Update:* Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. 
> For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. So related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . 
*Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. So related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE : * Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. 
> We faced this problem in our own case: most of the WordJoin cases are > those where the words are individually valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (for relevance or performance reasons) (SUGGEST_WHEN_NOT_IN_INDEX) . *UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. Related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . 
*UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. Related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (for relevance or performance reasons) > (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE : * Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. 
> We faced this problem in our own case: most of the WordJoin cases are > those where the words are individually valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218
[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079987#comment-16079987 ] Abhishek Kumar Singh commented on SOLR-10263: - After PR #218, the _solrconfig.xml_ entry for *WordBreakSolrSpellChecker* (and later, for all the components) can be configured like this: {code:xml} wordbreakspellcheck solr.WordBreakSolrSpellChecker fieldspell true true true 5 0 SUGGEST_WHEN_NOT_IN_INDEX SUGGEST_ALWAYS {code} or simply as: {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 5 0 SUGGEST_WHEN_NOT_IN_INDEX {code} > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create a problem in the following case: > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX). > *Update:* Recently, we also figured out that within > {{WordBreakSolrSpellChecker}} itself, {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our own case: most of the WordJoin cases are > those where the words are individually valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}.
But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218
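The effect of giving word join its own suggest mode can be sketched in isolation. This is a simplified model of the *gold mine sunglasses* case, not Lucene's or Solr's actual logic: the `Mode` enum is a local stand-in declared only so the sketch is self-contained, and `SuggestModeSketch`, `suggestJoin`, and the rule "suppress the join when both original tokens are in the index" are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Local stand-in for a suggest-mode enum (hypothetical, declared here so the sketch compiles alone).
enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_ALWAYS }

final class SuggestModeSketch {
    // Returns the joined word if the given mode allows suggesting it, otherwise null.
    static String suggestJoin(String left, String right, Set<String> index, Mode joinMode) {
        String joined = left + right;
        if (!index.contains(joined)) {
            return null; // the combined term is unknown, so there is nothing to suggest
        }
        boolean bothInIndex = index.contains(left) && index.contains(right);
        if (joinMode == Mode.SUGGEST_WHEN_NOT_IN_INDEX && bothInIndex) {
            return null; // both original tokens are valid, so this mode suppresses the join
        }
        return joined;
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>(Arrays.asList("gold", "mine", "goldmine", "sunglasses"));
        // With one shared mode, the join is suppressed because "gold" and "mine" are both indexed.
        System.out.println(suggestJoin("gold", "mine", index, Mode.SUGGEST_WHEN_NOT_IN_INDEX));
        // With a separate, more permissive mode for word join, "goldmine" is offered.
        System.out.println(suggestJoin("gold", "mine", index, Mode.SUGGEST_ALWAYS));
    }
}
```

Under these assumptions the first call yields no suggestion and the second yields `goldmine`, which is the behaviour the separate `wordJoin` mode is meant to allow while word break stays on the stricter mode.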
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. Related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . 
*UPDATE : * Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. So related changes have been done at Latest PR. : https://github.com/apache/lucene-solr/pull/218. > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > *UPDATE : * Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. 
> We faced this problem in our own case: most of the WordJoin cases are > those where the words are individually valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`. > Related changes are in the latest PR: > https://github.com/apache/lucene-solr/pull/218
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. 
We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > *Update:* Recently, we also figured out that, for > {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our case, wherein, Most of the WordJoin cases are > those where the words individually are valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example:- > *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But > the actual product being looked for is *goldmine sunglasses* , where > *goldmine* is a brand. 
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`.
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . *Update:* Recently, we also figured out that, for {{WordBreakSolrSpellChecker}} also, both - The {{WordBreak}} and {{WordJoin}} should also have different suggestModes. We faced this problem in our case, wherein, Most of the WordJoin cases are those where the words individually are valid tokens, but what the users are looking for is actually a combination (wordjoin) of the two tokens. For example:- *gold mine sunglasses* : Here, both *gold* and *mine* are valid tokens. But the actual product being looked for is *goldmine sunglasses* , where *goldmine* is a brand. In such cases, we should recommend {{didYouMean:goldmine sunglasses}} . But this wont be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for {{WordBreakSolrSpellChecker}} (of which, WordJoin is a part) . For this, we should have separate suggestModes for both `wordJoin` as well as `wordBreak`. was: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. 
Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create a problem in the following case: > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX). > *Update:* Recently, we also figured out that within > {{WordBreakSolrSpellChecker}} itself, {{WordBreak}} and {{WordJoin}} > should also have different suggestModes. > We faced this problem in our own case: most of the WordJoin cases are > those where the words are individually valid tokens, but what the users are > looking for is actually a combination (wordjoin) of the two tokens. > For example: > *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but > the actual product being looked for is *goldmine sunglasses*, where > *goldmine* is a brand. > In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But > this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for > {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). > For this, we should have separate suggestModes for `wordJoin` and > `wordBreak`.
[jira] [Updated] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance
[ https://issues.apache.org/jira/browse/SOLR-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Kumar Singh updated SOLR-10513:
Description:
See ConjunctionSolrSpellChecker.java

    try {
      if (stringDistance == null) {
        stringDistance = checker.getStringDistance();
      } else if (stringDistance != checker.getStringDistance()) {
        throw new IllegalArgumentException(
            "All checkers need to use the same StringDistance.");
      }
    } catch (UnsupportedOperationException uoe) {
      // ignore
    }

In the line stringDistance != checker.getStringDistance() the comparison is by reference. So if you are using 2 or more spellcheckers with the same distance algorithm, the exception will be thrown anyway.

*Update:* As of Solr 6.5, this has been changed to *stringDistance.equals(checker.getStringDistance())*. However, *LuceneLevenshteinDistance* does not override the equals method, so this does not solve the problem yet: the *default equals* method compares references anyway. Hence we are unable to use *FileBasedSolrSpellChecker*.

Moreover, a similar check should also be in the init method, so that the user does not have to wait until query time for this error. If the spellcheck components have been added to *solrconfig.xml*, the error should be thrown during core reload itself.

was:
See ConjunctionSolrSpellChecker.java

    try {
      if (stringDistance == null) {
        stringDistance = checker.getStringDistance();
      } else if (stringDistance != checker.getStringDistance()) {
        throw new IllegalArgumentException(
            "All checkers need to use the same StringDistance.");
      }
    } catch (UnsupportedOperationException uoe) {
      // ignore
    }

In the line stringDistance != checker.getStringDistance() the comparison is by reference. So if you are using 2 or more spellcheckers with the same distance algorithm, the exception will be thrown anyway.
> CLONE - ConjunctionSolrSpellChecker wrong check for same string distance > > > Key: SOLR-10513 > URL: https://issues.apache.org/jira/browse/SOLR-10513 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.9 >Reporter: Abhishek Kumar Singh >Assignee: James Dyer > Fix For: 5.5 > > > See ConjunctionSolrSpellChecker.java > try { > if (stringDistance == null) { > stringDistance = checker.getStringDistance(); > } else if (stringDistance != checker.getStringDistance()) { > throw new IllegalArgumentException( > "All checkers need to use the same StringDistance."); > } > } catch (UnsupportedOperationException uoe) { > // ignore > } > In line stringDistance != checker.getStringDistance() there is comparing by > references. So if you are using 2 or more spellcheckers with same distance > algorithm, exception will be thrown anyway. > *Update:* As of Solr 6.5, this has been changed to > *stringDistance.equals(checker.getStringDistance())* . > However, *LuceneLevenshteinDistance* does not even override equals method. > This does not solve the problem yet, because the *default equals* method > anyway compares references. > Hence unable to use *FileBasedSolrSpellChecker* . > Moreover, Some check of similar sorts should also be in the init method. So > that user does not have to wait for this error during query time. If the > spellcheck components have been added *solrconfig.xml* , it should throw > error during core-reload itself. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
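The request to surface this error at init time rather than at query time could look roughly like the Java sketch below. It is only an illustration under stated assumptions: {{StatelessDistanceSketch}}, {{CheckerSketch}} and {{ConjunctionInitSketch}} are hypothetical names, not Solr classes, and the sketch assumes that two distance objects of the same class are interchangeable.

```java
import java.util.List;

/** Hypothetical base class: stateless distances of the same class are treated as equal. */
abstract class StatelessDistanceSketch {
    @Override
    public boolean equals(Object other) {
        return other != null && getClass() == other.getClass();
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

final class EditDistanceSketch extends StatelessDistanceSketch { }

final class NGramDistanceSketch extends StatelessDistanceSketch { }

/** Hypothetical spellchecker descriptor: a name plus the distance it uses. */
final class CheckerSketch {
    final String name;
    final StatelessDistanceSketch distance;

    CheckerSketch(String name, StatelessDistanceSketch distance) {
        this.name = name;
        this.distance = distance;
    }
}

final class ConjunctionInitSketch {
    /** Called once at configuration load; throws if the checkers disagree. */
    static void validate(List<CheckerSketch> checkers) {
        StatelessDistanceSketch expected = null;
        for (CheckerSketch checker : checkers) {
            if (expected == null) {
                expected = checker.distance;
            } else if (!expected.equals(checker.distance)) {
                throw new IllegalArgumentException(
                    "All checkers need to use the same StringDistance; '"
                        + checker.name + "' differs.");
            }
        }
    }
}
```

Running such a check while the configuration is loaded would make a mismatch visible on core reload instead of on the first spellcheck query.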
[jira] [Commented] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance
[ https://issues.apache.org/jira/browse/SOLR-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972987#comment-15972987 ] Abhishek Kumar Singh commented on SOLR-10513: - As of Solr 6.5, this has been changed to *stringDistance.equals(checker.getStringDistance())*. However, *LuceneLevenshteinDistance* does not override the equals method, so this does not solve the problem yet: the *default equals* method compares references anyway. Hence we are unable to use *FileBasedSolrSpellChecker*. > CLONE - ConjunctionSolrSpellChecker wrong check for same string distance > > > Key: SOLR-10513 > URL: https://issues.apache.org/jira/browse/SOLR-10513 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.9 >Reporter: Abhishek Kumar Singh >Assignee: James Dyer > Fix For: 5.5 > > > See ConjunctionSolrSpellChecker.java > try { > if (stringDistance == null) { > stringDistance = checker.getStringDistance(); > } else if (stringDistance != checker.getStringDistance()) { > throw new IllegalArgumentException( > "All checkers need to use the same StringDistance."); > } > } catch (UnsupportedOperationException uoe) { > // ignore > } > In line stringDistance != checker.getStringDistance() there is comparing by > references. So if you are using 2 or more spellcheckers with same distance > algorithm, exception will be thrown anyway.
[jira] [Created] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance
Abhishek Kumar Singh created SOLR-10513: --- Summary: CLONE - ConjunctionSolrSpellChecker wrong check for same string distance Key: SOLR-10513 URL: https://issues.apache.org/jira/browse/SOLR-10513 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 4.9 Reporter: Abhishek Kumar Singh Assignee: James Dyer Fix For: 5.5 See ConjunctionSolrSpellChecker.java try { if (stringDistance == null) { stringDistance = checker.getStringDistance(); } else if (stringDistance != checker.getStringDistance()) { throw new IllegalArgumentException( "All checkers need to use the same StringDistance."); } } catch (UnsupportedOperationException uoe) { // ignore } In line stringDistance != checker.getStringDistance() there is comparing by references. So if you are using 2 or more spellcheckers with same distance algorithm, exception will be thrown anyway. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6271) ConjunctionSolrSpellChecker wrong check for same string distance
[ https://issues.apache.org/jira/browse/SOLR-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972895#comment-15972895 ] Abhishek Kumar Singh commented on SOLR-6271: As of Solr 6.5, this has been changed to *stringDistance.equals(checker.getStringDistance())*. However, *LuceneLevenshteinDistance* does not override the *equals* method, so this does not solve the problem yet: the default *equals* method compares references anyway. > ConjunctionSolrSpellChecker wrong check for same string distance > > > Key: SOLR-6271 > URL: https://issues.apache.org/jira/browse/SOLR-6271 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.9 >Reporter: Igor Kostromin >Assignee: James Dyer > Fix For: 5.5 > > Attachments: SOLR-6271.patch, SOLR-6271.patch > > > See ConjunctionSolrSpellChecker.java > try { > if (stringDistance == null) { > stringDistance = checker.getStringDistance(); > } else if (stringDistance != checker.getStringDistance()) { > throw new IllegalArgumentException( > "All checkers need to use the same StringDistance."); > } > } catch (UnsupportedOperationException uoe) { > // ignore > } > In line stringDistance != checker.getStringDistance() there is comparing by > references. So if you are using 2 or more spellcheckers with same distance > algorithm, exception will be thrown anyway.
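Why switching from {{!=}} to {{equals}} changes nothing for a class that does not override {{equals}} can be shown in plain Java. The two distance classes below are hypothetical stand-ins written for this illustration; they are not the Lucene implementations.

```java
/** Hypothetical distance that inherits Object.equals, i.e. reference comparison. */
final class ReferenceOnlyDistance {
    float getDistance(String a, String b) {
        return a.equals(b) ? 1.0f : 0.0f;
    }
}

/** Hypothetical stateless distance that defines equality by class. */
final class ValueEqualDistance {
    float getDistance(String a, String b) {
        return a.equals(b) ? 1.0f : 0.0f;
    }

    @Override
    public boolean equals(Object other) {
        return other != null && getClass() == other.getClass();
    }

    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

final class EqualsDemo {
    /** Mirrors the shape of the check: first.equals(second). */
    static boolean sameDistance(Object first, Object second) {
        return first.equals(second);
    }

    public static void main(String[] args) {
        // Two separate instances, no equals override: prints false.
        System.out.println(sameDistance(new ReferenceOnlyDistance(), new ReferenceOnlyDistance()));
        // Two separate instances, class-based equals: prints true.
        System.out.println(sameDistance(new ValueEqualDistance(), new ValueEqualDistance()));
    }
}
```

Under the assumption that a distance implementation carries no per-instance state, class-based equality is one possible way to make the check pass for checkers that each construct their own instance.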
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Attachment: (was: SOLR-10256.patch) > Parentheses in SpellCheckCollator > - > > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh > > SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which > have space between them. > This should be configurable, because if *_WordBreakSpellCheckComponent_* is > being used, queries like : *applejuice* will be broken down to *apple juice*. > Such suggestions are being surrounded by braces by current > *SpellCheckCollator*. > And when surrounded by brackets, they represent the same position by > _EdismaxParser_ , which is not required. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 > > A solution to this will be to have a flag, which can help disable this > parenthesisation of spell check suggestions. > ** > *Update*: > Raised PR for the same -> https://github.com/apache/lucene-solr/pull/168 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which have space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like : *applejuice* will be broken down to *apple juice*. Such suggestions are being surrounded by braces by current *SpellCheckCollator*. And when surrounded by brackets, they represent the same position by _EdismaxParser_ , which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution to this will be to have a flag, which can help disable this parenthesisation of spell check suggestions. ** *Update*: Raised PR for the same -> https://github.com/apache/lucene-solr/pull/168 was: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which have space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like : *applejuice* will be broken down to *apple juice*. Such suggestions are being surrounded by braces by current *SpellCheckCollator*. And when surrounded by brackets, they represent the same position by _EdismaxParser_ , which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution to this will be to have a flag, which can help disable this parenthesisation of spell check suggestions. *Update*: Raised PR for the same -> https://github.com/apache/lucene-solr/pull/168 > Parentheses in SpellCheckCollator > - > > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh > > SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which > have space between them. > This should be configurable, because if *_WordBreakSpellCheckComponent_* is > being used, queries like : *applejuice* will be broken down to *apple juice*. > Such suggestions are being surrounded by braces by current > *SpellCheckCollator*. > And when surrounded by brackets, they represent the same position by > _EdismaxParser_ , which is not required. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 > > A solution to this will be to have a flag, which can help disable this > parenthesisation of spell check suggestions. > ** > *Update*: > Raised PR for the same -> https://github.com/apache/lucene-solr/pull/168 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which have space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like : *applejuice* will be broken down to *apple juice*. Such suggestions are being surrounded by braces by current *SpellCheckCollator*. And when surrounded by brackets, they represent the same position by _EdismaxParser_ , which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution to this will be to have a flag, which can help disable this parenthesisation of spell check suggestions. *Update*: Raised PR for the same -> https://github.com/apache/lucene-solr/pull/168 was: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which have space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like : *applejuice* will be broken down to *apple juice*. Such suggestions are being surrounded by braces by current *SpellCheckCollator*. And when surrounded by brackets, they represent the same position by _EdismaxParser_ , which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution to this will be to have a flag, which can help disable this parenthesisation of spell check suggestions. > Parentheses in SpellCheckCollator > - > > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh > Attachments: SOLR-10256.patch > > > SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which > have space between them. > This should be configurable, because if *_WordBreakSpellCheckComponent_* is > being used, queries like : *applejuice* will be broken down to *apple juice*. > Such suggestions are being surrounded by braces by current > *SpellCheckCollator*. > And when surrounded by brackets, they represent the same position by > _EdismaxParser_ , which is not required. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 > > A solution to this will be to have a flag, which can help disable this > parenthesisation of spell check suggestions. > *Update*: Raised PR for the same -> > https://github.com/apache/lucene-solr/pull/168 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954662#comment-15954662 ] Abhishek Kumar Singh commented on SOLR-10256: - I agree with your argument that it may not be the best use case, because it all depends on how we have configured our search to work. This is why we have configurations like _mm_ for specifying the minimum match. The problem arises in cases where our _mm_ configuration guarantees a *100% token match*, but the spellcheck (due to WordBreak) ranks suggestions in which even one of the tokens of the broken word has a high frequency (Sugg A) above suggestions with a reasonable frequency but a very small Levenshtein distance (Sugg B). We would expect *Sugg B* to have a higher weight in spellcheck suggestions than *Sugg A*, but this is not happening because of the compulsory parentheses. I feel the parentheses should be on by default, but there should be a configuration option to switch them off. > Parentheses in SpellCheckCollator > - > > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh > Attachments: SOLR-10256.patch > > > SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens which > have space between them. > This should be configurable, because if *_WordBreakSpellCheckComponent_* is > being used, queries like : *applejuice* will be broken down to *apple juice*. > Such suggestions are being surrounded by braces by current > *SpellCheckCollator*. > And when surrounded by brackets, they represent the same position by > _EdismaxParser_ , which is not required. 
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 > > A solution to this will be to have a flag, which can help disable this > parenthesisation of spell check suggestions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 3/29/17 4:42 AM: -- The _solrconfig.xml_ of*WordBreakSolrSpellChecker* (and later, for all the components) can be configured like this :- {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 10 0 SUGGEST_WHEN_NOT_IN_INDEX {code} was (Author: asingh2411): The _solrconfig.xml_ of*WordBreakSolrSpellChecker* ( and later, for all the components) can be configured like this :- {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 10 0 SUGGEST_WHEN_NOT_IN_INDEX {code} > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 3/29/17 4:42 AM: -- The _solrconfig.xml_ of*WordBreakSolrSpellChecker* ( and later, for all the components) can be configured like this :- {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 10 0 SUGGEST_WHEN_NOT_IN_INDEX {code} was (Author: asingh2411): The _solrconfig.xml_ *WordBreakSolrSpellChecker* and later, for all the components can be configured like this :- {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 10 0 SUGGEST_WHEN_NOT_IN_INDEX {code} > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518 ] Abhishek Kumar Singh commented on SOLR-10263: - The _solrconfig.xml_ *WordBreakSolrSpellChecker* and later, for all the components can be configured like this :- {code:xml} spellcheckword solr.WordBreakSolrSpellChecker fieldspell true true 10 0 SUGGEST_WHEN_NOT_IN_INDEX {code} > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945363#comment-15945363 ] Abhishek Kumar Singh commented on SOLR-10263: - Raised this PR for *WordBreakSolrSpellChecker*. https://github.com/apache/lucene-solr/pull/176/files > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Summary: Different SpellcheckComponents should have their own suggestMode (was: Different SpellcheckComponents should have their own options) > Different SpellcheckComponents should have their own suggestMode > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own options
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 3/28/17 2:27 PM: -- Yes, This is what is happening in the latest code too. See this, https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120 It passes the *options* , for all the SpellCheckComponents. was (Author: asingh2411): Yes, This is what is happening in the latest code too. See this, https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120 It passes gives the *options* , for all the SpellCheckComponents. > Different SpellcheckComponents should have their own options > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own options
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226 ] Abhishek Kumar Singh edited comment on SOLR-10263 at 3/15/17 2:07 PM: -- Yes, This is what is happening in the latest code too. See this, https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120 It passes gives the *options* , for all the SpellCheckComponents. was (Author: asingh2411): Yes, This is what is happening in the latest code too. See this, https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120 It passes gives the same options, for all the SpellCheckComponents. > Different SpellcheckComponents should have their own options > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own options
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226 ] Abhishek Kumar Singh commented on SOLR-10263: - Yes, this is what is happening in the latest code too. See https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120 : it passes the same options to all the SpellCheckComponents. > Different SpellcheckComponents should have their own options > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) .
[jira] [Commented] (SOLR-6271) ConjunctionSolrSpellChecker wrong check for same string distance
[ https://issues.apache.org/jira/browse/SOLR-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905234#comment-15905234 ] Abhishek Kumar Singh commented on SOLR-6271: This issue is still occurring in my case, where I'm using _DirectSolrSpellChecker_ and _FileBasedSpellChecker_. The problem is that _DirectSolrSpellChecker_ uses *LuceneLevenshteinDistance* while _FileBasedSpellChecker_ uses *LevensteinDistance* as the StringDistance. This throws *IllegalArgumentException("All checkers need to use the same StringDistance.")*. What would be the fix for this? > ConjunctionSolrSpellChecker wrong check for same string distance > > > Key: SOLR-6271 > URL: https://issues.apache.org/jira/browse/SOLR-6271 > Project: Solr > Issue Type: Bug > Components: spellchecker >Affects Versions: 4.9 >Reporter: Igor Kostromin >Assignee: James Dyer > Fix For: 5.5 > > Attachments: SOLR-6271.patch, SOLR-6271.patch > > > See ConjunctionSolrSpellChecker.java > try { > if (stringDistance == null) { > stringDistance = checker.getStringDistance(); > } else if (stringDistance != checker.getStringDistance()) { > throw new IllegalArgumentException( > "All checkers need to use the same StringDistance."); > } > } catch (UnsupportedOperationException uoe) { > // ignore > } > In line stringDistance != checker.getStringDistance() there is comparing by > references. So if you are using 2 or more spellcheckers with same distance > algorithm, exception will be thrown anyway.
[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own options
[ https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10263: Description: As of now, common spellcheck options are applied to all the SpellCheckComponents. This can create problem in the following case:- It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions. But we may want *WordBreakSpellChecker* to suggest only if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX) . > Different SpellcheckComponents should have their own options > > > Key: SOLR-10263 > URL: https://issues.apache.org/jira/browse/SOLR-10263 > Project: Solr > Issue Type: Wish > Security Level: Public(Default Security Level. Issues are Public) > Components: spellchecker >Reporter: Abhishek Kumar Singh >Priority: Minor > > As of now, common spellcheck options are applied to all the > SpellCheckComponents. > This can create problem in the following case:- > It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST > spellcheck suggestions. > But we may want *WordBreakSpellChecker* to suggest only if the token is not > in the index (SUGGEST_WHEN_NOT_IN_INDEX) . -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-10263) Different SpellcheckComponents should have their own options
Abhishek Kumar Singh created SOLR-10263: --- Summary: Different SpellcheckComponents should have their own options Key: SOLR-10263 URL: https://issues.apache.org/jira/browse/SOLR-10263 Project: Solr Issue Type: Wish Security Level: Public (Default Security Level. Issues are Public) Components: spellchecker Reporter: Abhishek Kumar Singh Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Attachment: SOLR-10256.patch Attaching a patch for this issue. It adds a flag to SpellCheckCollator and makes it configurable. > Parentheses in SpellCheckCollator > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: spellchecker > Reporter: Abhishek Kumar Singh > Attachments: SOLR-10256.patch > > SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. > This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. > The current *SpellCheckCollator* surrounds such suggestions with parentheses. > When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as occupying the same position, which is not required. > https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 > > A solution would be a flag that disables this parenthesisation of spell check suggestions.
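The behaviour discussed in SOLR-10256 can be sketched roughly as follows. This is a simplified illustration, not the actual `SpellCheckCollator` code and not the attached patch: the class `CollationSketch`, the method `collate`, and the `parenthesize` flag are hypothetical names, and the real collator works on offsets in the original query rather than on a plain list of corrections.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a configurable parenthesisation step.
// It is not the Solr implementation; names and structure are assumptions.
final class CollationSketch {

    // Joins corrections into one collation string. A correction that contains
    // a space (for example a word-break suggestion) is wrapped in parentheses
    // only when the parenthesize flag is true.
    static String collate(List<String> corrections, boolean parenthesize) {
        StringBuilder sb = new StringBuilder();
        for (String correction : corrections) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            boolean multiWord = correction.indexOf(' ') >= 0;
            if (multiWord && parenthesize) {
                sb.append('(').append(correction).append(')');
            } else {
                sb.append(correction);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> corrections = Arrays.asList("fresh", "apple juice");
        System.out.println(collate(corrections, true));   // prints "fresh (apple juice)"
        System.out.println(collate(corrections, false));  // prints "fresh apple juice"
    }
}
```

With the flag off, the word-break suggestion for *applejuice* is emitted as plain *apple juice*, so a downstream query parser sees two ordinary terms instead of a parenthesised group.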
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. The current *SpellCheckCollator* surrounds such suggestions with parentheses. When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. was: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. The current *SpellCheckCollator* surrounds such suggestions with parentheses. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. > Parentheses in SpellCheckCollator > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: spellchecker > Reporter: Abhishek Kumar Singh
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. The current *SpellCheckCollator* surrounds such suggestions with parentheses. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. was: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. > Parentheses in SpellCheckCollator > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: spellchecker > Reporter: Abhishek Kumar Singh
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that have a space between them. This should be configurable, because if *_WordBreakSpellCheckComponent_* is being used, queries like *applejuice* will be broken down to *apple juice*. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. was: SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a space between them. This should be configurable, because if WordBreakSpellCheckComponent is being used, queries like *applejuice* will be broken down to *apple juice*. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. > Parentheses in SpellCheckCollator > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: spellchecker > Reporter: Abhishek Kumar Singh
[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator
[ https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Kumar Singh updated SOLR-10256: Description: SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a space between them. This should be configurable, because if WordBreakSpellCheckComponent is being used, queries like *applejuice* will be broken down to *apple juice*. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. was: SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a space between them. This should be configurable, because if WordBreakSpellCheckComponent is being used, queries like applejuice will be broken down to apple juice. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions. > Parentheses in SpellCheckCollator > Key: SOLR-10256 > URL: https://issues.apache.org/jira/browse/SOLR-10256 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: spellchecker > Reporter: Abhishek Kumar Singh
[jira] [Created] (SOLR-10256) Parentheses in SpellCheckCollator
Abhishek Kumar Singh created SOLR-10256: Summary: Parentheses in SpellCheckCollator Key: SOLR-10256 URL: https://issues.apache.org/jira/browse/SOLR-10256 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: spellchecker Reporter: Abhishek Kumar Singh SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a space between them. This should be configurable, because if WordBreakSpellCheckComponent is being used, queries like applejuice will be broken down to apple juice. When surrounded by parentheses, the tokens are treated as occupying the same position, which is not required. https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227 A solution would be a flag that disables this parenthesisation of spell check suggestions.