[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464841#comment-16464841
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

To use LearnSchemaUpdateRequestProcessorFactory, just add it to the update 
request processor (URP) chain.

The new API details are:

*1. Get a training ID:*

*_GET_*

*_//schema/train/start_*

Response:

 
{code:java}
{"schemaTrainingId" : ""} 
{code}
 

*2. Start Training:*

This API works like any other update API; the request body carries the documents to train on.

*POST*

*//update?schemaTrainingId=* 

 
{code:java}
Body: (Same as update request)
[{}] 
{code}
 

*3. Get the schema trained so far:*

*GET*

*/schema/train/yield?schemaTrainingId=*

*Response:*

 
{code:java}
{
"schema":{
  "add-field-type": [
 { "name":, "type":, "multivalued":},
 { "name":, "type":, "multivalued":},
...
   ]
}
}
{code}
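The three calls above can be sketched as a small Java helper that only builds the request URIs. It rests on a few assumptions that do not come from the patch. The empty segment in "//schema/train/start" and "//update" is taken to be the collection name. Solr is assumed to be served under a base URL such as http://localhost:8983/solr. SchemaTrainingUris is a hypothetical helper class.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the patch) that builds the three training URIs.
// Assumes the empty path segment in the original paths is a collection name.
final class SchemaTrainingUris {
    private final String solrBase;   // assumed, e.g. "http://localhost:8983/solr"
    private final String collection; // hypothetical collection name

    SchemaTrainingUris(String solrBase, String collection) {
        this.solrBase = solrBase;
        this.collection = collection;
    }

    private String prefix() {
        return solrBase + "/" + collection;
    }

    // URL-encode the id so it is safe to use as a query parameter value.
    private static String enc(String trainingId) {
        return URLEncoder.encode(trainingId, StandardCharsets.UTF_8);
    }

    /** Step 1: GET this URI to obtain a schemaTrainingId. */
    URI start() {
        return URI.create(prefix() + "/schema/train/start");
    }

    /** Step 2: POST training documents here, using the same body as a normal update. */
    URI trainingUpdate(String trainingId) {
        return URI.create(prefix() + "/update?schemaTrainingId=" + enc(trainingId));
    }

    /** Step 3: GET the schema trained so far. */
    URI trainedSchema(String trainingId) {
        return URI.create(prefix() + "/schema/train/yield?schemaTrainingId=" + enc(trainingId));
    }
}
```

These URIs can be sent with java.net.http.HttpClient (Java 11+): a GET for start() and trainedSchema(id), and a POST with the usual update body for trainingUpdate(id).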

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, 
> screenshot-3.png
>
>
> Our data-driven schema guessing doesn't work in many situations. For 
> example, if the first document has a field with value "0", the field is guessed as 
> Long, and subsequent documents with "0.0" in that field are rejected. Similarly, if the same 
> field has alphanumeric contents in a later document, those documents are 
> rejected. Also, single- vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.
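The following Java sketch is a simplified model of the failure described above. It is for illustration only and is not Solr's actual AddSchemaFieldsUpdateProcessorFactory code. FirstValueGuesser is a hypothetical class. The ordering LONG < DOUBLE < STRING is an assumption: a later value is accepted only if it fits the type fixed by the first value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of first-value type guessing (not Solr's code).
final class FirstValueGuesser {
    // Ordered narrowest to widest: a LONG value also parses as DOUBLE, anything is a STRING.
    enum GuessedType { LONG, DOUBLE, STRING }

    private final Map<String, GuessedType> schema = new HashMap<>();

    /** Narrowest type that can parse the value. */
    static GuessedType narrowest(String value) {
        try {
            Long.parseLong(value);
            return GuessedType.LONG;
        } catch (NumberFormatException notLong) {
            // not a long; try double next
        }
        try {
            Double.parseDouble(value);
            return GuessedType.DOUBLE;
        } catch (NumberFormatException notDouble) {
            return GuessedType.STRING;
        }
    }

    /** The first value seen fixes the field type; returns false if a later value no longer fits. */
    boolean add(String field, String value) {
        GuessedType fixed = schema.computeIfAbsent(field, f -> narrowest(value));
        return narrowest(value).compareTo(fixed) <= 0;
    }
}
```

Calling add("price", "0") followed by add("price", "0.0") returns true and then false. An offline training pass avoids this by looking at all the values before fixing the type.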



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, 
> screenshot-3.png
>
>



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464831#comment-16464831
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

[^SOLR-11741.patch]

 

Adding updated APIs. 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, 
> screenshot-3.png
>
>



[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464831#comment-16464831
 ] 

Abhishek Kumar Singh edited comment on SOLR-11741 at 5/5/18 4:33 PM:
-

[^SOLR-11741.patch]

 

Adding updated patch. 


was (Author: abhidemon):
[^SOLR-11741.patch]

 

Adding updated APIs. 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, 
> screenshot-3.png
>
>



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464830#comment-16464830
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

[^SOLR-11741.patch]

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>



[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464829#comment-16464829
 ] 

Abhishek Kumar Singh edited comment on SOLR-11741 at 5/5/18 4:31 PM:
-

Uploading the updated patch with the following features:

 A new URP, LearnSchemaUpdateRequestProcessorFactory, which learns from the 
incoming data what each field's data type currently looks like and, based on 
that, updates the metadata about each field. 
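Below is a minimal Java sketch of the kind of per-field metadata such a URP could keep. It is not the patch's code. LearnedFieldMetadata and SchemaLearner are hypothetical names. The type classification (Long, then Double, then String) and the rule "widest type seen wins" are assumptions. A field counts as multi-valued once any single document carries more than one value for it.

```java
import java.util.Collection;
import java.util.EnumSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-field metadata a schema-learning URP could keep (not the patch's code).
final class LearnedFieldMetadata {
    // Ordered narrowest to widest (assumption): resolvedKind() picks the widest seen.
    enum Kind { LONG, DOUBLE, STRING }

    final EnumSet<Kind> seen = EnumSet.noneOf(Kind.class);
    boolean multiValued;

    static Kind classify(Object value) {
        String s = String.valueOf(value);
        try {
            Long.parseLong(s);
            return Kind.LONG;
        } catch (NumberFormatException notLong) {
            // not a long; try double next
        }
        try {
            Double.parseDouble(s);
            return Kind.DOUBLE;
        } catch (NumberFormatException notDouble) {
            return Kind.STRING;
        }
    }

    void observe(Object value) {
        if (value instanceof Collection) {
            Collection<?> values = (Collection<?>) value;
            if (values.size() > 1) {
                multiValued = true; // more than one value in a single document
            }
            for (Object v : values) {
                seen.add(classify(v));
            }
        } else {
            seen.add(classify(value));
        }
    }

    /** Widest kind observed so far, or null if nothing was observed. */
    Kind resolvedKind() {
        Kind widest = null;
        for (Kind k : seen) { // EnumSet iterates in declaration order
            widest = k;
        }
        return widest;
    }
}

final class SchemaLearner {
    final Map<String, LearnedFieldMetadata> fields = new TreeMap<>();

    /** Updates per-field metadata from one incoming document; nothing is indexed. */
    void learn(Map<String, Object> doc) {
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            fields.computeIfAbsent(e.getKey(), k -> new LearnedFieldMetadata()).observe(e.getValue());
        }
    }
}
```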

 

 


was (Author: abhidemon):
Uploading the updated patch with the following features:

 A new URP, LearnSchemaUpdateRequestProcessorFactory, which learns from the 
incoming data what each field's data type currently looks like and, based on 
that, updates the metadata about each field. 

 

APIs: 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-05-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16464829#comment-16464829
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

Uploading the updated patch with the following features:

 A new URP, LearnSchemaUpdateRequestProcessorFactory, which learns from the 
incoming data what each field's data type currently looks like and, based on 
that, updates the metadata about each field. 

 

APIs: 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-04-21 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: SOLR-11741.patch

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> SOLR-11741.patch, screenshot-1.png, screenshot-3.png
>
>



[jira] [Updated] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!

2018-01-13 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: SOLR-11624.patch

Uploading the updated patch with corrected documentation. 
[~ichattopadhyaya] [~dsmiley]

> collection creation should not also overwrite/delete any configset but it can!
> --
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, 
> SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.






[jira] [Updated] (SOLR-3089) Make ResponseBuilder.isDistrib public

2018-01-09 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-3089:
---
Attachment: SOLR-3089.patch

Uploading updated patch with a test case using the method *rb.isDistributed()*

[~ichattopadhyaya] 

> Make ResponseBuilder.isDistrib public
> -
>
> Key: SOLR-3089
> URL: https://issues.apache.org/jira/browse/SOLR-3089
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 4.0-ALPHA
>Reporter: Rok Rejc
> Fix For: 4.9, 6.0
>
> Attachments: SOLR-3089.patch, Solr-3089.patch
>
>
> Hi,
> I have posted this issue on a mailing list but I didn't get any response.
> I am trying to write a distributed search component (a class that extends 
> SearchComponent). I have checked FacetComponent and TermsComponent. If I want 
> the search component to work in a distributed environment, I have to set 
> ResponseBuilder's isDistrib to true, like this (TermsComponent does this too, 
> for example):
>   public void prepare(ResponseBuilder rb) throws IOException {
>     SolrParams params = rb.req.getParams();
>     String shards = params.get(ShardParams.SHARDS);
>     if (shards != null) {
>       List<String> lst = StrUtils.splitSmart(shards, ",", true);
>       rb.shards = lst.toArray(new String[lst.size()]);
>       rb.isDistrib = true;
>     }
>   }
> If my component is outside the package org.apache.solr.handler.component, 
> this doesn't work (isDistrib is not public). Is it possible to make isDistrib 
> public (or is this the wrong procedure/behaviour/design)?
> Many thanks,
> Rok






[jira] [Updated] (SOLR-11837) More information required in README.md for Setting up project in IDEs

2018-01-09 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11837:

Attachment: SOLR-11837.patch

> More information required in README.md for Setting up project in IDEs 
> --
>
> Key: SOLR-11837
> URL: https://issues.apache.org/jira/browse/SOLR-11837
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Abhishek Kumar Singh
>  Labels: documentation
> Attachments: SOLR-11837.patch
>
>
> Sometimes the instructions on the README.md page are not enough to 
> set up the project in an IDE.
> The following *solr-wiki-page-links* are pretty useful, but are not present 
> on the README.md page.
> https://wiki.apache.org/solr/HowToConfigureEclipse
> https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
> https://wiki.apache.org/lucene-java/HowtoConfigureNetbeans
> Having links on the README.md page will be quite helpful for beginners. 






[jira] [Created] (SOLR-11837) More information required in README.md for Setting up project in IDEs

2018-01-09 Thread Abhishek Kumar Singh (JIRA)
Abhishek Kumar Singh created SOLR-11837:
---

 Summary: More information required in README.md for Setting up 
project in IDEs 
 Key: SOLR-11837
 URL: https://issues.apache.org/jira/browse/SOLR-11837
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Abhishek Kumar Singh


Sometimes the instructions on the README.md page are not enough to 
set up the project in an IDE.
The following *solr-wiki-page-links* are pretty useful, but are not present on 
the README.md page.
https://wiki.apache.org/solr/HowToConfigureEclipse
https://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ
https://wiki.apache.org/lucene-java/HowtoConfigureNetbeans

Having links on the README.md page will be quite helpful for beginners. 






[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: (was: RuleForMostAccomodatingField.png)

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> screenshot-1.png, screenshot-3.png
>
>



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: RuleForMostAccomodatingField.png

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, 
> RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, screenshot-1.png, 
> screenshot-3.png
>
>



[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315465#comment-16315465
 ] 

Abhishek Kumar Singh edited comment on SOLR-11741 at 1/7/18 9:02 PM:
-

The above approach can be optimised by replacing the *Supported FieldTypes* with 
*_BitSets_*, as shown in the following table:
!screenshot-1.png!

We can map every FieldType to a BitSet, e.g. *String will be 1*, *Long 
will be 00100*, and so on.

1. For every product (document), get the BitSet of the FieldType supported by 
each field's value.
2. For every field, compute the *_BITWISE OR_* of the current BitSet with the 
BitSet already recorded, and store the result in place of the recorded value.

Use the following rule to decide the final FieldType of the field:
!RuleForMostAccomodatingField.png!

Say a field called *price* has the following values:
In Product1 -> *12321  (Long, i.e. 00100)*
In Product2 -> *77261.66  (Double, i.e. 01000)*
The final supported BitSet for *price* will be *[ 00100 OR 01000 = 
01100 ]*, i.e. it should be assigned a Double.

The rule extends to any number of types; only the number of bits grows 
accordingly.

Using BitSets like this reduces storage to 1 byte per field (for up to 8 
types), makes the computation simpler and faster, and removes the overhead 
of computing the trained schema separately, since the BitSets are updated in 
place with every product.

Every API call asking for the *Trained Schema* returns the schema computed 
up to that point using the above rule. 
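Here is a minimal Java sketch of the BitSet steps above, using java.util.BitSet. The bit positions follow the examples in the comment (String = 1, Long = 00100, Double = 01000). The following parts are assumptions: the value classification, and the resolution order in resolve() (String over Double over Long), which stands in for the rule in the attached image. BitSetSchemaTrainer is a hypothetical name.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch of the BitSet approach; bit positions follow the comment's examples.
// Classification and resolve() order are assumptions standing in for the rule image.
final class BitSetSchemaTrainer {
    static final int STRING_BIT = 0; // 00001
    static final int LONG_BIT = 2;   // 00100
    static final int DOUBLE_BIT = 3; // 01000

    private final Map<String, BitSet> recorded = new HashMap<>();

    /** Step 1: BitSet for a single value, with only its narrowest type's bit set. */
    static BitSet bitsFor(String value) {
        BitSet bits = new BitSet();
        if (isLong(value)) {
            bits.set(LONG_BIT);
        } else if (isDouble(value)) {
            bits.set(DOUBLE_BIT);
        } else {
            bits.set(STRING_BIT);
        }
        return bits;
    }

    private static boolean isLong(String v) {
        try { Long.parseLong(v); return true; } catch (NumberFormatException e) { return false; }
    }

    private static boolean isDouble(String v) {
        try { Double.parseDouble(v); return true; } catch (NumberFormatException e) { return false; }
    }

    /** Step 2: OR the value's BitSet into the one already recorded for the field, in place. */
    void train(String field, String value) {
        recorded.computeIfAbsent(field, f -> new BitSet()).or(bitsFor(value));
    }

    /** Recorded bits as a 5-character pattern, highest bit first (e.g. "01100"). */
    String pattern(String field) {
        BitSet bits = recorded.getOrDefault(field, new BitSet());
        StringBuilder sb = new StringBuilder();
        for (int i = 4; i >= 0; i--) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    /** Assumed rule: String wins over Double, Double wins over Long; null if untrained. */
    String resolve(String field) {
        BitSet bits = recorded.get(field);
        if (bits == null || bits.isEmpty()) return null;
        if (bits.get(STRING_BIT)) return "String";
        if (bits.get(DOUBLE_BIT)) return "Double";
        return "Long";
    }
}
```

Training *price* with "12321" and then "77261.66" records 01100, which resolves to Double, as in the worked example. A primitive byte would match the 1-byte estimate for up to 8 types; java.util.BitSet is used here only for readability.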


was (Author: abhidemon):
The above approach can be optimised by replacing the *Supported FieldTypes* with 
*_BitSets_*, as shown in the following table:
!screenshot-1.png!

We can map every FieldType to a BitSet, e.g. *String will be 1*, *Long 
will be 00100*, and so on.

1. For every product (document), get the BitSet of the FieldType supported by 
each field's value.
2. For every field, compute the *_BITWISE OR_* of the current BitSet with the 
BitSet already recorded, and store the result in place of the recorded value.

Use the following rule to decide the final FieldType of the field:
!screenshot-3.png!

Say a field called *price* has the following values:
In Product1 -> *12321  (Long, i.e. 00100)*
In Product2 -> *77261.66  (Double, i.e. 01000)*
The final supported BitSet for *price* will be *[ 00100 OR 01000 = 
01100 ]*, i.e. it should be assigned a Double.

The rule extends to any number of types; only the number of bits grows 
accordingly.

Using BitSets like this reduces storage to 1 byte per field (for up to 8 
types), makes the computation simpler and faster, and removes the overhead 
of computing the trained schema separately, since the BitSets are updated in 
place with every product.

Every API call asking for the *Trained Schema* returns the schema computed 
up to that point using the above rule. 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> screenshot-1.png, screenshot-3.png
>
>



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: RuleForMostAccomodatingField.png

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: RuleForMostAccomodatingField.png, SOLR-11741-temp.patch, 
> screenshot-1.png, screenshot-3.png
>
>



[jira] [Comment Edited] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315469#comment-16315469
 ] 

Abhishek Kumar Singh edited comment on SOLR-11741 at 1/7/18 8:56 PM:
-

Uploading patch [^SOLR-11741-temp.patch] with the above implementation within 
*AddSchemaFieldsUpdateProcessorFactory* itself.
I have added a config param called {{mode}}.

With *mode=train*, the URP only *trains the schema*. By default, 
*mode=update*, i.e. it updates the schema as usual.

This patch is temporary: it still needs test cases, and the training state is 
currently stored in memory, in a map.
That has to move to ZooKeeper; I will describe that design in my next comments.


was (Author: abhidemon):
Uploading patch with the above implementation within 
*AddSchemaFieldsUpdateProcessorFactory* itself.
I have added a config param called {{mode}}.

With *mode=train*, the URP only *trains the schema*. By default, 
*mode=update*, i.e. it updates the schema as usual.

This patch is temporary: it still needs test cases, and the training state is 
currently stored in memory, in a map.
That has to move to ZooKeeper; I will describe that design in my next comments.

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315472#comment-16315472
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

@ [~ichattopadhyaya] 
bq. At every point in time, every field will be mapped to only one possible 
(most granular) field type, isn't it?

Yes.


> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: SOLR-11741-temp.patch

Uploading patch with the above implementation within 
*AddSchemaFieldsUpdateProcessorFactory* itself.
I have added a config param called {{mode}}.

With *mode=train*, the URP only *trains the schema*. By default, 
*mode=update*, i.e. it updates the schema as usual.

This patch is temporary: it still needs test cases, and the training state is 
currently stored in memory, in a map.
That has to move to ZooKeeper; I will describe that design in my next comments.
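
A minimal, Solr-free Java sketch of how a mode parameter could gate the processor: in train mode a document only updates per-training-id state held in an in-memory map, while the default update mode passes it on. ModeGatedProcessorSketch, processAdd, trainingId and the forwarded list (standing in for the rest of the update chain) are hypothetical names, not the patch's code, and for brevity the sketch records only field names rather than guessed types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, Solr-free model of a processor gated by a "mode" parameter. */
class ModeGatedProcessorSketch {
    enum Mode { TRAIN, UPDATE }

    final Mode mode;
    // In-memory training state, as in the temporary patch: trainingId -> field names seen.
    final Map<String, Set<String>> trainingState = new HashMap<>();
    // Stand-in for the rest of the chain, i.e. documents that would go on to be indexed.
    final List<Map<String, Object>> forwarded = new ArrayList<>();

    ModeGatedProcessorSketch(String modeParam) {
        // A missing parameter means "update", matching the default described above.
        this.mode = modeParam == null ? Mode.UPDATE : Mode.valueOf(modeParam.toUpperCase(Locale.ROOT));
    }

    void processAdd(String trainingId, Map<String, Object> doc) {
        if (mode == Mode.TRAIN) {
            // Train only: record what was seen and do NOT pass the document on.
            trainingState.computeIfAbsent(trainingId, id -> new TreeSet<>()).addAll(doc.keySet());
        } else {
            // Update: behave as usual and hand the document to the next processor.
            forwarded.add(doc);
        }
    }

    Set<String> trainedFields(String trainingId) {
        return trainingState.getOrDefault(trainingId, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "1");
        doc.put("price", 12321L);

        ModeGatedProcessorSketch train = new ModeGatedProcessorSketch("train");
        train.processAdd("t1", doc);
        System.out.println(train.trainedFields("t1") + " " + train.forwarded.size()); // [id, price] 0

        ModeGatedProcessorSketch update = new ModeGatedProcessorSketch(null);
        update.processAdd("t1", doc);
        System.out.println(update.forwarded.size()); // 1
    }
}
```

Keeping the state in a plain map is exactly what makes this version single-node only; moving it to ZooKeeper would replace trainingState with shared storage.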

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: SOLR-11741-temp.patch, screenshot-1.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: (was: screenshot-2.png)

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: screenshot-1.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315465#comment-16315465
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

The above approach can be optimised by replacing the *supported FieldTypes* with 
*_BitSets_*, 
as shown in the following table:
!screenshot-1.png!

We can map every FieldType to a BitSet. For example, *String will be 1*, *Long 
will be 00100*, and so on.

1. For every product, get the BitSet of the fieldType supported by each 
field.
2. For every field, compute the *_BITWISE OR_* of the current BitSet with the 
BitSet value already recorded, and replace the recorded value with the result.

Use the following rule to decide the final FieldType that the field should 
have:
!screenshot-3.png!

Say a field called *price* has the following values:
In Product1 -> *12321  (Long, i.e. 00100)*
In Product2 -> *77261.66  (Double, i.e. 01000)*
The supported BitSet for *price* will have a final value of *[ 00100 OR 01000 = 
01100 ]*, i.e. it should be assigned a Double.

The above rule can be extended to any number of types; only the number of bits 
grows accordingly.

Using BitSets like this will reduce storage to 1 byte per field (as long as there 
are at most 8 types), make the computation simpler and faster, and also remove the 
overhead of computing the trained schema separately, since the BitSets are updated 
in place with every product.

Every API call asking for the *trained schema* will get the schema computed 
up to that point using the above rule.
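
A minimal Java sketch of the BitSet idea, using a plain int mask per field instead of java.util.BitSet (a handful of bits fits easily). Only the Long (00100) and Double (01000) bit positions come from the comment above; the Boolean and String bits, the value-parsing heuristics and the resolve() rule are assumptions, since the actual rule is in screenshot-3.png and is not reproduced here. FieldTypeBitsSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the per-field bitmask idea; not the patch's actual code. */
class FieldTypeBitsSketch {
    // Only LONG (00100) and DOUBLE (01000) follow the comment; the other bits are illustrative.
    static final int BOOLEAN = 0b00001;
    static final int LONG    = 0b00100;
    static final int DOUBLE  = 0b01000;
    static final int STRING  = 0b10000;

    // field name -> OR of every type bit observed for that field so far
    final Map<String, Integer> supported = new HashMap<>();

    /** Simplified guess of the narrowest type for one raw value. */
    static int typeBitOf(String value) {
        if (value.equals("true") || value.equals("false")) return BOOLEAN;
        try {
            Long.parseLong(value);
            return LONG;
        } catch (NumberFormatException notLong) {
            // fall through to the next, wider type
        }
        try {
            Double.parseDouble(value);
            return DOUBLE;
        } catch (NumberFormatException notDouble) {
            return STRING;
        }
    }

    /** Updates the field's mask in place with a bitwise OR. */
    void train(String field, String value) {
        supported.merge(field, typeBitOf(value), (old, bit) -> old | bit);
    }

    /** One plausible resolution rule (assumed, not taken from screenshot-3.png). */
    static String resolve(int mask) {
        if (mask == BOOLEAN) return "Boolean";
        if (mask == LONG) return "Long";
        if (mask != 0 && (mask & ~(LONG | DOUBLE)) == 0) return "Double"; // numeric bits only
        return "String"; // any mix involving non-numeric bits
    }

    /** The schema "trained so far": one type per field. */
    Map<String, String> trainedSchema() {
        Map<String, String> schema = new TreeMap<>();
        supported.forEach((field, mask) -> schema.put(field, resolve(mask)));
        return schema;
    }

    public static void main(String[] args) {
        FieldTypeBitsSketch t = new FieldTypeBitsSketch();
        t.train("price", "12321");    // Long   -> 00100
        t.train("price", "77261.66"); // Double -> 01000
        System.out.println(Integer.toBinaryString(t.supported.get("price"))); // 1100
        System.out.println(t.trainedSchema()); // {price=Double}
    }
}
```

Because each value costs one OR on a small integer, the trained schema can be resolved on demand at any point without a separate computation pass.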

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: screenshot-2.png

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: screenshot-3.png

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Updated] (SOLR-11741) Offline training mode for schema guessing

2018-01-07 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11741:

Attachment: screenshot-1.png

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
> Attachments: screenshot-1.png
>
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2018-01-05 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314428#comment-16314428
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-

What I had in mind was something similar to the above implementation, except 
that instead of recording every *value* that ever appeared for a field, I would 
record all the distinct *fieldTypes of the values* that appeared for each 
field. This gives a mapping of *field -> supported types*, which needs very 
little storage.

And instead of keeping it in memory, this data can be stored externally (say in 
_ZooKeeper_, or in some _temporary index_ inside Solr). I think that would get rid 
of the following problem:

bq. It doesn't play very nicely with distributed updates (you'd either have to 
ensure all training data was sent to the same node where you send the "commit" 
or add special custom logic to ensure it all got forwarded to a special node) 
and there are probably a lot more sophisticated / smarter ways to do it
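
A minimal Java sketch of the *field -> supported types* mapping, assuming a small hypothetical GuessedType enum and guessing from the Java type of each value. Only the distinct types are kept, never the values themselves. java.util.EnumSet is documented as being represented internally as a bit vector, so this is already close to the BitSet optimization proposed in the 2018-01-07 comment. Persisting the map to ZooKeeper or an index is not shown, and all names here are illustrative.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: keep the distinct guessed types per field, not the values. */
class SupportedTypesSketch {
    // Illustrative type list; a real implementation would follow Solr's field types.
    enum GuessedType { BOOLEAN, LONG, DOUBLE, STRING }

    // field -> distinct types of the values seen so far
    final Map<String, EnumSet<GuessedType>> supportedTypes = new HashMap<>();

    static GuessedType guess(Object value) {
        if (value instanceof Boolean) return GuessedType.BOOLEAN;
        if (value instanceof Long || value instanceof Integer) return GuessedType.LONG;
        if (value instanceof Double || value instanceof Float) return GuessedType.DOUBLE;
        return GuessedType.STRING;
    }

    /** Adds each field's guessed type to that field's set; values are discarded. */
    void record(Map<String, Object> doc) {
        doc.forEach((field, value) -> supportedTypes
                .computeIfAbsent(field, f -> EnumSet.noneOf(GuessedType.class))
                .add(guess(value)));
    }

    public static void main(String[] args) {
        SupportedTypesSketch s = new SupportedTypesSketch();
        Map<String, Object> first = new HashMap<>();
        first.put("price", 12321L);
        Map<String, Object> second = new HashMap<>();
        second.put("price", 77261.66);
        s.record(first);
        s.record(second);
        System.out.println(s.supportedTypes); // {price=[LONG, DOUBLE]}
    }
}
```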



> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>
> Our data driven schema guessing doesn't work under many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long and subsequent fields with "0.0" are rejected. Similarly, if the same 
> field had alphanumeric contents for a latter document, those documents are 
> rejected. Also, single vs. multi valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think initial implementation can be based on an UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.



[jira] [Comment Edited] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!

2017-12-30 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306797#comment-16306797
 ] 

Abhishek Kumar Singh edited comment on SOLR-11624 at 12/30/17 2:39 PM:
---

Also, thanks for pointing out this consistency issue. 
bq. so if we have one configSet in ZooKeeper named "myconfig" and the user 
creates a collection "mycoll" (without specifying which config), then 
presumably we'll have two configSets: "myconfig" and 
"mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet 
in ZooKeeper, nor is there "_default" for that matter. Does this mean if the 
user goes to create another collection similarly that it will fail?

Yes, I think it will fail.
It looks like, with the new auto-created configSet being added, this feature in 
particular will break.
We can get rid of this problem by:
* either deprecating this feature of [using the only configset 
present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362],
* or creating a new configSet named {{_default}} whenever such a case arises.
[~dsmiley] [~ichattopadhyaya] 


was (Author: abhidemon):
Also, thanks for pointing out this consistency issue. 
bq. so if we have one configSet in ZooKeeper named "myconfig" and the user 
creates a collection "mycoll" (without specifying which config), then 
presumably we'll have two configSets: "myconfig" and 
"mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet 
in ZooKeeper, nor is there "_default" for that matter. Does this mean if the 
user goes to create another collection similarly that it will fail?

Yes, I think it will fail.
It looks like, with the new auto-created configSet being added, this feature in 
particular will break.
We can get rid of this problem by:
* either deprecating this feature of [using the only configset 
present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362],
* or creating a new configSet named {{_default}} whenever such a case arises.


> collection creation should not also overwrite/delete any configset but it can!
> --
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, 
> SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.



[jira] [Commented] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!

2017-12-30 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306797#comment-16306797
 ] 

Abhishek Kumar Singh commented on SOLR-11624:
-

Also, thanks for pointing out this consistency issue. 
bq. so if we have one configSet in ZooKeeper named "myconfig" and the user 
creates a collection "mycoll" (without specifying which config), then 
presumably we'll have two configSets: "myconfig" and 
"mycoll.AUTOCREATED_CONFIGSET". And this point there is no longer one configSet 
in ZooKeeper, nor is there "_default" for that matter. Does this mean if the 
user goes to create another collection similarly that it will fail?

Yes, I think it will fail.
It looks like, with the new auto-created configSet being added, this feature in 
particular will break.
We can get rid of this problem by:
* either deprecating this feature of [using the only configset 
present|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/cloud/CreateCollectionCmd.java#L362],
* or creating a new configSet named {{_default}} whenever such a case arises.


> collection creation should not also overwrite/delete any configset but it can!
> --
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, 
> SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.



[jira] [Updated] (SOLR-11624) collection creation should not also overwrite/delete any configset but it can!

2017-12-30 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: SOLR-11624.patch

[~dsmiley], thanks for the suggestions.
Uploading the updated patch with the following changes:
* Changed the suffix to ".AUTOCREATED"
* Explicitly set the name of the created ConfigSet in 
TimeRoutedAliasUpdateProcessorTest
* Modified the documentation to read:
bq. {color:#654982}*collection.configName*{color} : Defines the name of the 
configuration (which *must already be stored in ZooKeeper*) to use for this 
collection. If not provided, Solr will use the configuration of the {{_default}} 
configSet *OR* of the {{only configSet present}} (if there is only one configSet in 
ZooKeeper) to create a new (and mutable) configSet named 
{{.AUTOCREATED}} and will use it for the new collection.
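
A small, hypothetical Java model of the documented fallback rule, useful for checking the "myconfig"/"mycoll" scenario raised in the 2017-12-30 comment: with a single uploaded configSet and no _default, the first collection created without collection.configName gets a copy, and the next such request finds two configSets, neither named _default. The <collection>.AUTOCREATED naming is an assumption based on the ".AUTOCREATED" suffix above and the earlier "mycoll.AUTOCREATED_CONFIGSET" example; the class, fields and exception types are illustrative, not Solr's CreateCollectionCmd.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the documented fallback rule; not Solr's CreateCollectionCmd. */
class ConfigSetFallbackSketch {
    static final String DEFAULT = "_default";
    static final String SUFFIX = ".AUTOCREATED";

    final Set<String> configSetsInZk = new TreeSet<>();
    // auto-created configSet -> configSet it was copied from
    final Map<String, String> copiedFrom = new HashMap<>();

    /** Returns the configSet the new collection will use. */
    String createCollection(String collection, String configName) {
        if (configName != null) {
            // An explicit name must already be stored in ZooKeeper.
            if (!configSetsInZk.contains(configName)) {
                throw new IllegalArgumentException("configSet not found: " + configName);
            }
            return configName;
        }
        String base;
        if (configSetsInZk.contains(DEFAULT)) {
            base = DEFAULT;
        } else if (configSetsInZk.size() == 1) {
            base = configSetsInZk.iterator().next(); // "the only configSet present"
        } else {
            throw new IllegalStateException("no configName, no _default and not exactly one configSet");
        }
        // Copy into a new, mutable configSet; the base itself is never overwritten.
        String copy = collection + SUFFIX;
        configSetsInZk.add(copy);
        copiedFrom.put(copy, base);
        return copy;
    }

    public static void main(String[] args) {
        ConfigSetFallbackSketch zk = new ConfigSetFallbackSketch();
        zk.configSetsInZk.add("myconfig");
        System.out.println(zk.createCollection("mycoll", null)); // mycoll.AUTOCREATED
        try {
            zk.createCollection("other", null); // two configSets now, none named _default
        } catch (IllegalStateException e) {
            System.out.println("second create fails: " + e.getMessage());
        }
    }
}
```

The same model also shows the original bug being avoided: with _default and an uploaded "wiki" configSet, creating collection "wiki" yields wiki.AUTOCREATED copied from _default and leaves "wiki" untouched.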



> collection creation should not also overwrite/delete any configset but it can!
> --
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, 
> SOLR-11624.4.patch, SOLR-11624.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.



[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-12 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: SOLR-11624.4.patch

Updated the patch with documentation.

[~ichattopadhyaya] [~dsmiley] Kindly review it. 

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, 
> SOLR-11624.4.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.



[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385
 ] 

Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:46 PM:
---

Please find the updated patch here -> [^SOLR-11624.3.patch] 

Changes made:
1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to 
create/update and use the correct {{modifiedConfigSet}} name.
2. Refactored {{configName}}: added a suffix to the name of the _auto-generated 
configSet_.

[~ichattopadhyaya], [~dsmiley], please review the patch.


was (Author: asingh2411):
Please find the updated patch here -> [^SOLR-11624.3.patch] 

Changes made:
1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to 
create/update and use the correct {{modifiedConfigSet}} name.
2. Refactored {{configName}}: added a suffix to the name of the _auto-generated 
configSet_.

[~ichattopadhyaya], [~dsmiley], please review the patch.

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.



[jira] [Created] (SOLR-11744) ConfigSetAdminRequest.CREATE should allow null in baseConfigSetName

2017-12-11 Thread Abhishek Kumar Singh (JIRA)
Abhishek Kumar Singh created SOLR-11744:
---

 Summary: ConfigSetAdminRequest.CREATE should allow null in  
baseConfigSetName
 Key: SOLR-11744
 URL: https://issues.apache.org/jira/browse/SOLR-11744
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
  Components: config-api, Tests
Reporter: Abhishek Kumar Singh
Priority: Minor


Currently, 
[ConfigSetAdminRequest.Create|http://lucene.apache.org/solr/6_5_0/solr-solrj/org/apache/solr/client/solrj/request/ConfigSetAdminRequest.Create.html]
 throws the exception *_{color:red}no Base ConfigSet specified!{color}_* if 
[baseConfigSetName|http://lucene.apache.org/solr/6_5_0/solr-solrj/org/apache/solr/client/solrj/request/ConfigSetAdminRequest.Create.html#baseConfigSetName]
  is null.


However, a configSet can be created by passing *__default_* as the 
{{baseConfigSetName}}, which is a hack.

IMO *_baseConfigSetName_* should be optional so that, instead of throwing an 
exception, *_ConfigSetAdminRequest.Create_* lets the user create a 
*_configSet_* from the *_default config set_*, i.e. *__default_*, when 
*_baseConfigSetName_* is not provided.
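
A hypothetical sketch of the proposed behaviour, modelled as a tiny builder rather than the real SolrJ class: when no base configSet is given, fall back to _default instead of failing. CreateConfigSetSketch and effectiveBaseConfigSet() are illustrative names only, and the exception message is invented for the sketch.

```java
/** Hypothetical sketch of the proposed default; not SolrJ's ConfigSetAdminRequest.Create. */
class CreateConfigSetSketch {
    static final String DEFAULT_BASE = "_default";

    private String configSetName;
    private String baseConfigSetName; // optional under the proposal

    CreateConfigSetSketch setConfigSetName(String name) {
        this.configSetName = name;
        return this;
    }

    CreateConfigSetSketch setBaseConfigSetName(String base) {
        this.baseConfigSetName = base;
        return this;
    }

    /** The configSet the new one would be copied from. */
    String effectiveBaseConfigSet() {
        if (configSetName == null) {
            throw new IllegalArgumentException("configSet name is required");
        }
        // Today a null base is rejected; the proposal falls back to _default instead.
        return baseConfigSetName != null ? baseConfigSetName : DEFAULT_BASE;
    }

    public static void main(String[] args) {
        System.out.println(new CreateConfigSetSketch()
                .setConfigSetName("myconf").effectiveBaseConfigSet()); // _default
        System.out.println(new CreateConfigSetSketch()
                .setConfigSetName("myconf").setBaseConfigSetName("wiki").effectiveBaseConfigSet()); // wiki
    }
}
```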



[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: (was: solr-11624.3.patch)

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.patch
>
>
> Looks like a problem that crept in when we changed the _default configset 
> stuff.
> setup:
> upload a configset named "wiki"
> collections?action=CREATE&name=wiki&.
> My custom configset "wiki" gets overwritten by _default and then used by the 
> "wiki" collection.
> Assigning to myself only because it really needs to be fixed IMO and I don't 
> want to lose track of it. Anyone else please feel free to take it.






[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385
 ] 

Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:45 PM:
---

Please find the updated patch here: [^SOLR-11624.3.patch] 

Changes made:
1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to 
create/update and use the correct {{modifiedConfigSet}} name.
2. Refactored {{configName}}: added a suffix to the name of the _auto-generated 
configSet_. 

[~ichattopadhyaya] [~dsmiley] Please review the patch.  
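To illustrate the bug and the suffix approach, here is a toy in-memory model of the configSet store in plain Java. {{ConfigSetStore}} and its methods are hypothetical, and the {{.AUTOCREATED}} suffix is an assumption for illustration; the comment above only says that a suffix is added. The point is that copying {{_default}} under a suffixed name can never clobber a user-uploaded configSet such as "wiki".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for the configSet store, used only to
 * illustrate the SOLR-11624 overwrite and a suffix-based fix.
 */
final class ConfigSetStore {
    static final String DEFAULT_CONFIGSET = "_default";
    // Assumed suffix for auto-generated configSets (illustrative, not taken from the patch).
    static final String AUTO_SUFFIX = ".AUTOCREATED";

    private final Map<String, String> configSets = new HashMap<>();

    void upload(String name, String contents) {
        configSets.put(name, contents);
    }

    String contents(String name) {
        return configSets.get(name);
    }

    /** Picks the configSet for a new collection, given an optional collection.configName. */
    String configNameForNewCollection(String collection, String explicitConfigName) {
        if (explicitConfigName != null) {
            return explicitConfigName; // the caller chose explicitly
        }
        // The buggy behaviour copied _default to the collection's own name,
        // overwriting any user configSet with that name. Copy it under a
        // suffixed name instead, so user uploads are left untouched.
        String autoName = collection + AUTO_SUFFIX;
        if (!configSets.containsKey(autoName)) {
            configSets.put(autoName, configSets.get(DEFAULT_CONFIGSET));
        }
        return autoName;
    }
}
```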


was (Author: asingh2411):
Please find the updated patch here -> [^SOLR-11624.3.patch] 

Changes made:-
1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to 
create/update and use the correct {{modifiedConfigSet}} name.
2. Refactored the {{configName}} , added suffix to the name of _auto-generated 
configSet_. 


> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch






[jira] [Comment Edited] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286385#comment-16286385
 ] 

Abhishek Kumar Singh edited comment on SOLR-11624 at 12/11/17 6:43 PM:
---

Please find the updated patch here -> [^SOLR-11624.3.patch] 

Changes made:-
1. Modified the test case {{TimeRoutedAliasUpdateProcessorTest#test}} to 
create/update and use the correct {{modifiedConfigSet}} name.
2. Refactored the {{configName}} , added suffix to the name of _auto-generated 
configSet_. 



was (Author: asingh2411):
Please find the updated patch here. 

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch






[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: solr-11624.3.patch

Please find the updated patch here. 

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.patch, solr-11624.3.patch






[jira] [Updated] (SOLR-11624) _default configset overwrites a a configset if collection.configName isn't specified even if a confiset of the same name already exists.

2017-12-11 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-11624:

Attachment: SOLR-11624.3.patch

> _default configset overwrites a a configset if collection.configName isn't 
> specified even if a confiset of the same name already exists.
> 
>
> Key: SOLR-11624
> URL: https://issues.apache.org/jira/browse/SOLR-11624
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.2
>Reporter: Erick Erickson
>Assignee: Ishan Chattopadhyaya
> Attachments: SOLR-11624-2.patch, SOLR-11624.3.patch, SOLR-11624.patch






[jira] [Commented] (SOLR-11741) Offline training mode for schema guessing

2017-12-09 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284694#comment-16284694
 ] 

Abhishek Kumar Singh commented on SOLR-11741:
-


I am working on this. 

> Offline training mode for schema guessing
> -
>
> Key: SOLR-11741
> URL: https://issues.apache.org/jira/browse/SOLR-11741
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>
> Our data-driven schema guessing doesn't work in many situations. For 
> example, if the first document has a field with value "0", it is guessed as 
> Long, and subsequent documents with "0.0" in that field are rejected. Similarly, if the same 
> field has alphanumeric content in a later document, those documents are 
> rejected. Also, single- vs. multi-valued field guessing is not ideal.
> Proposing an offline training mode where Solr accepts a bunch of documents and 
> returns a guessed schema (without indexing). This schema can then be used for 
> actual indexing. I think the original idea is from Hoss.
> I think the initial implementation can be based on a UpdateRequestProcessor. We 
> can hash out the API soon, as we go along.
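One way to make the guessing independent of document order, sketched under assumptions: look at the whole training batch and widen each field to the most accommodating type seen so far (Long, then Double, then String), flagging the field multiValued if any document supplies more than one value. {{FieldTypeTrainer}} is a hypothetical class, not part of Solr, and the widening order is an illustrative choice.

```java
import java.util.List;

/**
 * Hypothetical per-field trainer: instead of locking in the type of the
 * first value seen, it widens to a type that accepts every value observed.
 */
final class FieldTypeTrainer {
    // Declared from narrowest to widest; ordinal() encodes the widening order.
    enum GuessedType { LONG, DOUBLE, STRING }

    private GuessedType current = null; // null until the first value is seen
    private boolean multiValued = false;

    /** Feeds all values of this field from one training document. */
    void observe(List<String> valuesInOneDoc) {
        if (valuesInOneDoc.size() > 1) {
            multiValued = true;
        }
        for (String v : valuesInOneDoc) {
            GuessedType t = classify(v);
            // Keep whichever of the old guess and the new value's type is wider.
            current = (current == null || t.ordinal() > current.ordinal()) ? t : current;
        }
    }

    GuessedType type() { return current; }

    boolean isMultiValued() { return multiValued; }

    private static GuessedType classify(String v) {
        try {
            Long.parseLong(v);
            return GuessedType.LONG;
        } catch (NumberFormatException e) {
            // not a long, try a floating-point value next
        }
        try {
            // Note: parseDouble also accepts forms such as "NaN"; a real trainer would be stricter.
            Double.parseDouble(v);
            return GuessedType.DOUBLE;
        } catch (NumberFormatException e) {
            // not numeric at all
        }
        return GuessedType.STRING;
    }
}
```

Feeding "0" and then "0.0" ends with DOUBLE rather than rejecting the second document, which is the failure described above.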






[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-09-18 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Attachment: SOLR-10263.v2.patch

Uploading the updated patch

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
> Attachments: SOLR-10263.v2.patch
>
>
> Currently, common spellcheck options are applied to all of the 
> SpellCheckComponents.
> This can cause problems in the following case:
> We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS), 
> but we may want *WordBreakSpellChecker* to suggest only if the token is not 
> in the index (for relevance or performance reasons) 
> (SUGGEST_WHEN_NOT_IN_INDEX). 
> *UPDATE:* We recently found that, within 
> {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} 
> should also have different suggestModes.
> We faced this problem in our case: most of the WordJoin cases are 
> those where the words are individually valid tokens, but what the users are 
> looking for is actually a combination (wordjoin) of the two tokens. 
> For example:
> *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but 
> the actual product being looked for is *goldmine sunglasses*, where 
> *goldmine* is a brand.
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}, but 
> this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
> {{WordBreakSolrSpellChecker}} (of which WordJoin is a part). 
> For this, we should have separate suggestModes for both `wordJoin` and 
> `wordBreak`. 
> The related changes are in the latest PR: 
> https://github.com/apache/lucene-solr/pull/218. 
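A sketch of per-checker resolution in plain Java, assuming each checker may override one global default. The mode names mirror Lucene's {{SuggestMode}} values; the {{Checker}} enum, the {{PerCheckerSuggestModes}} class, and the fallback rule are hypothetical and are not the configuration syntax of the linked PR.

```java
import java.util.EnumMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve a suggestMode per spellchecker (and per
 * word-break direction) instead of applying one global mode to all of them.
 */
final class PerCheckerSuggestModes {
    // Names mirror Lucene's SuggestMode; redeclared here to stay self-contained.
    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    // Hypothetical identifiers for the checkers discussed in this issue.
    enum Checker { DIRECT, WORD_BREAK, WORD_JOIN }

    private final SuggestMode globalDefault;
    private final Map<Checker, SuggestMode> overrides = new EnumMap<>(Checker.class);

    PerCheckerSuggestModes(SuggestMode globalDefault) {
        this.globalDefault = globalDefault;
    }

    PerCheckerSuggestModes override(Checker checker, SuggestMode mode) {
        overrides.put(checker, mode);
        return this;
    }

    /** A checker's own mode if configured, else the global default. */
    SuggestMode modeFor(Checker checker) {
        return overrides.getOrDefault(checker, globalDefault);
    }

    /** Whether a checker may suggest for a token (ignores the popularity test of SUGGEST_MORE_POPULAR). */
    boolean maySuggest(Checker checker, boolean tokenInIndex) {
        return modeFor(checker) != SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX || !tokenInIndex;
    }
}
```

Configured as in the example above (global SUGGEST_ALWAYS, with only wordBreak overridden), wordJoin can still propose *goldmine* for the indexed tokens *gold* and *mine*.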






[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-09-18 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Attachment: (was: SOLR-10263.v2.patch)

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor






[jira] [Comment Edited] (SOLR-3089) Make ResponseBuilder.isDistrib public

2017-09-01 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256
 ] 

Abhishek Kumar Singh edited comment on SOLR-3089 at 9/1/17 9:14 AM:


I am using Solr 6.5.0 and am still facing the same issue. 
I think the patch should be merged. 

 


was (Author: asingh2411):
I am using SOLR 6.50 and am still facing the same issue. 
I feel the patch should be merged. 

 

> Make ResponseBuilder.isDistrib public
> -
>
> Key: SOLR-3089
> URL: https://issues.apache.org/jira/browse/SOLR-3089
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 4.0-ALPHA
>Reporter: Rok Rejc
> Fix For: 4.9, 6.0
>
> Attachments: Solr-3089.patch
>
>
> Hi,
> I have posted this issue on a mailing list but didn't get any response.
> I am trying to write a distributed search component (a class that extends 
> SearchComponent). I have checked FacetComponent and TermsComponent. If I want 
> the search component to work in a distributed environment, I have to set 
> ResponseBuilder's isDistrib to true, like this (this is also done in 
> TermsComponent, for example):
>   public void prepare(ResponseBuilder rb) throws IOException {
>     SolrParams params = rb.req.getParams();
>     String shards = params.get(ShardParams.SHARDS);
>     if (shards != null) {
>       // Typed List<String>: with a raw List, toArray(...) returns Object[] and the assignment below fails to compile
>       List<String> lst = StrUtils.splitSmart(shards, ",", true);
>       rb.shards = lst.toArray(new String[lst.size()]);
>       rb.isDistrib = true;
>     }
>   }
> If my component is outside the package org.apache.solr.handler.component, 
> this doesn't work. Is it possible to make isDistrib public (or is this the 
> wrong procedure/behaviour/design)?
> Many thanks,
> Rok
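A self-contained analogue of the {{prepare()}} logic above, with the distributed flag behind a public setter so that code in any package could set it. {{ShardState}} and {{ShardParsing}} are hypothetical stand-ins for {{ResponseBuilder}}, not Solr classes, and the plain comma split is a simplifying assumption that does not handle escaping the way {{StrUtils.splitSmart}} may.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the request state a custom component needs to mutate. */
final class ShardState {
    String[] shards;
    private boolean distrib;

    public boolean isDistrib() { return distrib; }

    public void setDistrib(boolean distrib) { this.distrib = distrib; }
}

final class ShardParsing {
    /** Mirrors prepare(): a non-null shards parameter marks the request as distributed. */
    static void prepare(ShardState state, String shardsParam) {
        if (shardsParam == null) {
            return; // non-distributed request: leave the state untouched
        }
        // Simplified split on commas, trimming whitespace and dropping empty entries.
        state.shards = Arrays.stream(shardsParam.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
        state.setDistrib(true);
    }
}
```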






[jira] [Commented] (SOLR-3089) Make ResponseBuilder.isDistrib public

2017-09-01 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256
 ] 

Abhishek Kumar Singh commented on SOLR-3089:


I am using SOLR-6.50 and am still facing the same issue. 
I feel the patch should be merged. 

 

> Make ResponseBuilder.isDistrib public
> -
>
> Key: SOLR-3089
> URL: https://issues.apache.org/jira/browse/SOLR-3089
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 4.0-ALPHA
>Reporter: Rok Rejc
> Fix For: 4.9, 6.0
>






[jira] [Comment Edited] (SOLR-3089) Make ResponseBuilder.isDistrib public

2017-09-01 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150256#comment-16150256
 ] 

Abhishek Kumar Singh edited comment on SOLR-3089 at 9/1/17 9:11 AM:


I am using SOLR 6.50 and am still facing the same issue. 
I feel the patch should be merged. 

 


was (Author: asingh2411):
I am using SOLR-6.50 and am still facing the same issue. 
I feel the patch should be merged. 

 

> Make ResponseBuilder.isDistrib public
> -
>
> Key: SOLR-3089
> URL: https://issues.apache.org/jira/browse/SOLR-3089
> Project: Solr
>  Issue Type: Improvement
>  Components: Response Writers
>Affects Versions: 4.0-ALPHA
>Reporter: Rok Rejc
> Fix For: 4.9, 6.0
>
> Attachments: Solr-3089.patch






[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 1:55 PM:
--

The problem with "maxCollationTries" is that a {{collationTry}} is an 
expensive step, so there is a limit to how far we can increase its value 
for a given response time/efficiency requirement. 

Control over {{wordBreak}} suggestions gives us more freedom to get 
relevant suggestions in cases where we know what our queries will look like. 
For example, *gold mine sunglasses* will produce suggestions like *gold mine 
sun glasses* or even *gold mine sung lasses*, which later waste our precious 
{{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for 
{{wordBreak}} avoids these suggestions, which we already know are not 
required.  

This is why we faced cases where different {{SpellCheckComponents}} 
required different *suggestModes*. 
Also, I think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}}) 
should have separate _suggestMode_ configurations, because their use cases 
can vary a lot. (For the use case above, we want *gold* and *mine* to 
be combined into *goldmine*, so {{wordJoin}} will again need SUGGEST_ALWAYS.)

This is why, in our case, after applying the above patch, we configured 
{{DirectSolrSpellChecker}} with {{SUGGEST_ALWAYS}} and only {{wordBreak}} 
with {{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still 
had {{SUGGEST_ALWAYS}}. 
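A toy illustration of the collation-budget argument: a two-way word-break generator over a hypothetical index vocabulary. With a "suggest only when not in index" policy, the already-indexed token *sunglasses* produces no break candidates, so nothing from it reaches collation. {{WordBreakSketch}} is hypothetical and ignores term frequencies and the other options of the real {{WordBreakSolrSpellChecker}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical two-way word-break generator (not Solr's implementation).
 * Each returned candidate would otherwise cost a collation try later.
 */
final class WordBreakSketch {
    static List<String> breaks(String token, Set<String> indexTerms, boolean onlyWhenNotInIndex) {
        List<String> out = new ArrayList<>();
        if (onlyWhenNotInIndex && indexTerms.contains(token)) {
            return out; // token is already a known term: offer no break suggestions
        }
        // Try every split point; keep splits where both halves are indexed terms.
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (indexTerms.contains(left) && indexTerms.contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }
}
```

With a vocabulary containing *sun*, *glasses*, *sung*, *lasses* and *sunglasses*, the unrestricted call yields "sun glasses" and "sung lasses", while the restricted call yields nothing.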


was (Author: asingh2411):
The problem with "maxCollationTries" is that - a {{collationTry}}  is an 
expensive step. So, there is only a limit to which we can increase its value - 
given a certain level of response time/efficiency requirement. 

A control on {{wordBreak}}  suggestions can give us more freedom to get the 
relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine 
sun glasses*  or even *gold mine sung lasses* and later waste our precious 
{{maxCollationTries}}. In this case,   SUGGEST_WHEN_NOT_IN_INDEX for 
{{wordBreak}} will avoid above suggestions which we already know are not 
required.  

This is why, We faced cases wherein different {{SpellCheckComponents}}  
required different *suggestModes*. 
Also, I think _wordBreak_ and _wordJoin_  (within {{WordBreakSolrSpellCheck}} ) 
should also have different _suggestMode_ configurations because the use cases 
can really vary.  (for the above usecase itself, we want *gold* and *mine* to 
be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. )

This is why in our case, we had to configure {{DirectSolrSpellChecker}} to 
{{SUGGEST_ALWAYS}} , while only {{wordBreak}}  was configured as 
{{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}}  still had  
{{SUGGEST_ALWAYS}} . 

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
> Attachments: SOLR-10263.v2.patch

[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:42 AM:
--

The problem with "maxCollationTries" is that - a {{collationTry}}  is an 
expensive step. So, there is only a limit to which we can increase its value - 
given a certain level of response time/efficiency requirement. 

A control on {{wordBreak}}  suggestions can give us more freedom to get the 
relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine 
sun glasses*  or even *gold mine sung lasses* and later waste our precious 
{{maxCollationTries}}. In this case,   SUGGEST_WHEN_NOT_IN_INDEX for 
{{wordBreak}} will avoid above suggestions which we already know are not 
required.  

This is why, We faced cases wherein different {{SpellCheckComponents}}  
required different *suggestModes*. 
Also, I think _wordBreak_ and _wordJoin_  (within {{WordBreakSolrSpellCheck}} ) 
should also have different _suggestMode_ configurations because the use cases 
can really vary.  (for the above usecase itself, we want *gold* and *mine* to 
be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. )

This is why in our case, we had to configure {{DirectSolrSpellChecker}} to 
{{SUGGEST_ALWAYS}} , while only {{wordBreak}}  was configured as 
{{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}}  still had  
{{SUGGEST_ALWAYS}} . 


was (Author: asingh2411):
The problem with "maxCollationTries" is that it is an expensive step. So, there 
is only a limit to which we can increase its value - given a certain level of 
response time/efficiency requirement. 

A control on {{wordBreak}}  suggestions can give us more freedom to get the 
relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine 
sun glasses*  or even *gold mine sung lasses* and later waste our precious 
{{maxCollationTries}}. In this case,   SUGGEST_WHEN_NOT_IN_INDEX for 
{{wordBreak}} will avoid above suggestions which we already know are not 
required.  

This is why, We faced cases wherein different {{SpellCheckComponents}}  
required different *suggestModes*. 
Also, I think _wordBreak_ and _wordJoin_  (within {{WordBreakSolrSpellCheck}} ) 
should also have different _suggestMode_ configurations because the use cases 
can really vary.  (for the above usecase itself, we want *gold* and *mine* to 
be combine to *goldmine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. )

This is why in our case, we had to configure {{DirectSolrSpellChecker}} to 
{{SUGGEST_ALWAYS}} , while only {{wordBreak}}  was configured as 
{{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}}  still had  
{{SUGGEST_ALWAYS}} . 

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
> Attachments: SOLR-10263.v2.patch




[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:04 AM:
--

The problem with "maxCollationTries" is that it is an expensive step. So, there 
is only a limit to which we can increase its value - given a certain level of 
response time/efficiency requirement. 

A control on {{wordBreak}}  suggestions can give us more freedom to get the 
relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine 
sun glasses*  or even *goldmi sung lasses*. In this case,   
SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions waste 
our precious {{maxCollationTries}} .  

This is why, We faced cases wherein different {{SpellCheckComponents}}  
required different *suggestModes*. 
Also, I think _wordBreak_ and _wordJoin_  (within {{WordBreakSolrSpellCheck}} ) 
should also have different configurations because the use cases can really 
vary.  (for the above usecase itself, we want *gold* and *mine* to be combine 
to *gold mine* , so {{wordJoin}} will again have SUGGEST_ALWAYS. )

This is why in our case, we had to configure {{DirectSolrSpellChecker}} to 
{{SUGGEST_ALWAYS}} , while only {{wordBreak}}  was configured as 
{{SUGGEST_WHEN_NOT_IN_INDEX}} and so that {{wordJoin}}  still had  
{{SUGGEST_ALWAYS}} . 


> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
> Attachments: SOLR-10263.v2.patch
>
>
> As of now, common spellcheck options are applied to all the
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to always suggest (SUGGEST_ALWAYS)
> spellcheck suggestions,
> but we may want *WordBreakSpellChecker* to suggest only if the token is not
> in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or performance
> reasons.
> *UPDATE:* We recently also found that, within
> {{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}}
> should each have their own suggestMode.
> We faced this problem in our case: most WordJoin cases are those where
> the words are individually valid tokens, but what users are
> looking for is actually a combination (word join) of the two tokens.
> For example:
> *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but
> the product actually being looked for is *goldmine sunglasses*, where
> *goldmine* is a brand.
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But
> this won't be possible, because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for
> {{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
> Therefore, we should have separate suggestModes for `wordJoin` and
> `wordBreak`.
> The related changes are in the latest PR:
> https://github.com/apache/lucene-solr/pull/218.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-

[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:08 AM:
--

The problem with "maxCollationTries" is that it is an expensive step, so there
is a limit to how far we can increase its value for a given response-time or
efficiency requirement.

Control over {{wordBreak}} suggestions gives us more freedom to get relevant
suggestions in cases where we know what our queries will look like.
For example, *gold mine sunglasses* will even produce suggestions like *gold mine
sun glasses* or even *gold mine sung lasses*, which later waste our precious
{{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for
{{wordBreak}} would avoid these suggestions, which we already know are not
needed.

This is why we have faced cases where different {{SpellCheckComponents}}
required different *suggestModes*.
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}})
should have separate _suggestMode_ configurations, because their use cases can
vary considerably. (For the use case above, we want *gold* and *mine* to be
combined into *goldmine*, so {{wordJoin}} would still use SUGGEST_ALWAYS.)

This is why, in our case, we had to configure {{DirectSolrSpellChecker}} with
{{SUGGEST_ALWAYS}}, while only {{wordBreak}} was configured with
{{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still had
{{SUGGEST_ALWAYS}}.
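The cost argument can be illustrated with a simplified Java model. It assumes that each candidate collation costs one test query against the index and that at most {{maxCollationTries}} candidates are tested.

The following inputs are invented for illustration:
* the candidate order,
* the {{hasHits}} predicate,
* the budget of 2.

Real Solr ranks and verifies collations in more detail than this.

```java
import java.util.List;
import java.util.function.Predicate;

public class CollationBudget {

    // Simplified model: tests candidates in order, spending one try (one query)
    // per candidate, and returns the first one with hits, or null once the
    // maxCollationTries budget is used up.
    static String firstVerified(List<String> candidates, int maxCollationTries,
                                Predicate<String> hasHits) {
        int tries = 0;
        for (String candidate : candidates) {
            if (tries == maxCollationTries) {
                return null; // budget exhausted before a good collation was reached
            }
            tries++;
            if (hasHits.test(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical: only the brand query returns documents.
        Predicate<String> hasHits = "goldmine sunglasses"::equals;
        int maxCollationTries = 2;

        // wordBreak noise ranked ahead of the useful wordJoin collation.
        List<String> withWordBreakNoise = List.of(
                "gold mine sun glasses", "gold mine sung lasses", "goldmine sunglasses");
        // wordBreak suppressed (SUGGEST_WHEN_NOT_IN_INDEX), wordJoin still active.
        List<String> withoutNoise = List.of("goldmine sunglasses");

        System.out.println(firstVerified(withWordBreakNoise, maxCollationTries, hasHits)); // null
        System.out.println(firstVerified(withoutNoise, maxCollationTries, hasHits));       // goldmine sunglasses
    }
}
```

In this model, raising the budget to 3 would also find *goldmine sunglasses*, but only by paying for one more test query. That is the trade-off described above.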



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the
SpellCheckComponents.
This can create a problem in the following case:
we may want *DirectSolrSpellChecker* to always suggest (SUGGEST_ALWAYS)
spellcheck suggestions, but we may want *WordBreakSpellChecker* to suggest only
if the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or
performance reasons.

*UPDATE:* We recently also found that, within
{{WordBreakSolrSpellChecker}}, {{WordBreak}} and {{WordJoin}} should each have
their own suggestMode.

We faced this problem in our case: most WordJoin cases are those where the
words are individually valid tokens, but what users are looking for is actually
a combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the
product actually being looked for is *goldmine sunglasses*, where *goldmine* is
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But
this won't be possible, because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
Therefore, we should have separate suggestModes for `wordJoin` and `wordBreak`.

The related changes are in the latest PR:
https://github.com/apache/lucene-solr/pull/218.
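For reference, the "common spellcheck options" problem comes from how the request-level suggestMode is chosen. The sketch below restates, in simplified form, our reading of Solr's {{SpellCheckComponent}} (6.x). One mode is derived per request from {{spellcheck.onlyMorePopular}} and {{spellcheck.alternativeTermCount}}. As far as we can tell, that mode is then passed unchanged to every configured spellchecker.

The {{sharedMode}} method and the local enum are illustrative names, not Solr API. The enum only mirrors the constants of Lucene's {{SuggestMode}}.

```java
public class SharedSuggestMode {

    // Same constant names as Lucene's org.apache.lucene.search.spell.SuggestMode.
    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    // One mode per request, derived from two request parameters:
    //   spellcheck.onlyMorePopular=true     -> SUGGEST_MORE_POPULAR
    //   spellcheck.alternativeTermCount > 0 -> SUGGEST_ALWAYS
    //   otherwise                           -> SUGGEST_WHEN_NOT_IN_INDEX
    static SuggestMode sharedMode(boolean onlyMorePopular, int alternativeTermCount) {
        if (onlyMorePopular) {
            return SuggestMode.SUGGEST_MORE_POPULAR;
        }
        if (alternativeTermCount > 0) {
            return SuggestMode.SUGGEST_ALWAYS;
        }
        return SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX;
    }

    public static void main(String[] args) {
        SuggestMode mode = sharedMode(false, 5);
        // In this model every checker receives the same value, so wordbreak
        // cannot be given a different mode from direct.
        for (String checker : new String[] {"direct", "wordbreak"}) {
            System.out.println(checker + " -> " + mode);
        }
    }
}
```

Because the mode is a single per-request value, the only way to get SUGGEST_WHEN_NOT_IN_INDEX for wordBreak alone is a per-checker override, which is what this issue asks for.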





[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:07 AM:
--

The problem with "maxCollationTries" is that it is an expensive step, so there
is a limit to how far we can increase its value for a given response-time or
efficiency requirement.

Control over {{wordBreak}} suggestions gives us more freedom to get relevant
suggestions in cases where we know what our queries will look like.
For example, *gold mine sunglasses* will even produce suggestions like *gold mine
sun glasses* or even *gold mine sung lasses*, which later waste our precious
{{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for
{{wordBreak}} would avoid these suggestions, which we already know are not
needed.

This is why we have faced cases where different {{SpellCheckComponents}}
required different *suggestModes*.
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}})
should have separate _suggestMode_ configurations, because their use cases can
vary considerably. (For the use case above, we want *gold* and *mine* to be
combined into *goldmine*, so {{wordJoin}} would still use SUGGEST_ALWAYS.)

This is why, in our case, we had to configure {{DirectSolrSpellChecker}} with
{{SUGGEST_ALWAYS}}, while only {{wordBreak}} was configured with
{{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still had
{{SUGGEST_ALWAYS}}.



[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:06 AM:
--

The problem with "maxCollationTries" is that it is an expensive step, so there
is a limit to how far we can increase its value for a given response-time or
efficiency requirement.

Control over {{wordBreak}} suggestions gives us more freedom to get relevant
suggestions in cases where we know what our queries will look like.
For example, *gold mine sunglasses* will even produce suggestions like *gold mine
sun glasses* or even *gold mine sung lasses*, which later waste our precious
{{maxCollationTries}}. In this case, SUGGEST_WHEN_NOT_IN_INDEX for
{{wordBreak}} would avoid these suggestions, which we already know are not
needed.

This is why we have faced cases where different {{SpellCheckComponents}}
required different *suggestModes*.
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}})
should have separate configurations, because their use cases can vary
considerably. (For the use case above, we want *gold* and *mine* to be combined
into *goldmine*, so {{wordJoin}} would still use SUGGEST_ALWAYS.)

This is why, in our case, we had to configure {{DirectSolrSpellChecker}} with
{{SUGGEST_ALWAYS}}, while only {{wordBreak}} was configured with
{{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still had
{{SUGGEST_ALWAYS}}.




[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh commented on SOLR-10263:
-

The problem with "maxCollationTries" is that it is an expensive step, so there
is a limit to how far we can increase its value for a given response-time or
efficiency requirement.

Control over {{wordBreak}} suggestions gives us more freedom to get relevant
suggestions in cases where we know what our queries will look like.
(For example, *gold mine sunglasses* will even produce suggestions like *gold mine
sun glasses* or even *gold mine sung lasses*; SUGGEST_WHEN_NOT_IN_INDEX for
{{wordBreak}} would avoid these suggestions.)

This is why we have faced cases where different {{SpellCheckComponents}}
required different *suggestModes*.
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}})
should have separate configurations, because their use cases can vary
considerably. (For the use case above, we want *gold* and *mine* to be combined
into *goldmine*, so {{wordJoin}} would still use SUGGEST_ALWAYS.)

This is why, in our case, we had to configure {{DirectSolrSpellChecker}} with
{{SUGGEST_ALWAYS}}, while only {{wordBreak}} was configured with
{{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still had
{{SUGGEST_ALWAYS}}.




[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:04 AM:
--

The problem with "maxCollationTries" is that it is an expensive step, so there
is a limit to how far we can increase its value for a given response-time or
efficiency requirement.

Control over {{wordBreak}} suggestions gives us more freedom to get relevant
suggestions in cases where we know what our queries will look like.
For example, *gold mine sunglasses* will even produce suggestions like *gold mine
sun glasses* or even *gold mine sung lasses*. In this case,
SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} would keep these suggestions from
wasting our precious {{maxCollationTries}}.

This is why we have faced cases where different {{SpellCheckComponents}}
required different *suggestModes*.
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}})
should have separate configurations, because their use cases can vary
considerably. (For the use case above, we want *gold* and *mine* to be combined
into *goldmine*, so {{wordJoin}} would still use SUGGEST_ALWAYS.)

This is why, in our case, we had to configure {{DirectSolrSpellChecker}} with
{{SUGGEST_ALWAYS}}, while only {{wordBreak}} was configured with
{{SUGGEST_WHEN_NOT_IN_INDEX}}, so that {{wordJoin}} still had
{{SUGGEST_ALWAYS}}.



[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-11 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081922#comment-16081922
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/11/17 9:03 AM:
--

The problem with "maxCollationTries" is that it is an expensive step. So, there 
is only a limit to which we can increase its value - given a certain level of 
response time/efficiency requirement. 

A control on {{wordBreak}}  suggestions can give us more freedom to get the 
relevant suggestions in cases where we know how our queries are going to be. 
For example: *gold mine sunglasses* will even give suggestions like *gold mine 
sun glasses*  or even *goldmi sung lasses*. In this case,   
SUGGEST_WHEN_NOT_IN_INDEX for {{wordBreak}} will avoid above suggestions  .  

This is why we ran into cases where different {{SpellCheckComponents}} 
required different *suggestModes*. 
I also think _wordBreak_ and _wordJoin_ (within {{WordBreakSolrSpellChecker}}) 
should have separate configurations, because their use cases can vary widely. 
(In the use case above, we want *gold* and *mine* to be combined into 
*goldmine*, so {{wordJoin}} should still use SUGGEST_ALWAYS.)

So in our case, we had to configure {{DirectSolrSpellChecker}} with 
{{SUGGEST_ALWAYS}} and only {{wordBreak}} with {{SUGGEST_WHEN_NOT_IN_INDEX}}, 
so that {{wordJoin}} still used {{SUGGEST_ALWAYS}}.
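To illustrate the {{wordBreak}} side, here is a hedged sketch assuming a single-token, two-way split and an in-memory set standing in for the index. Both are simplifications, and this is not Solr's or Lucene's implementation. The class and method names are hypothetical; the {{Mode}} enum only reuses two of the mode names from Lucene's {{SuggestMode}}. Under {{SUGGEST_WHEN_NOT_IN_INDEX}}, an already-indexed token such as *sunglasses* yields no splits, while {{SUGGEST_ALWAYS}} produces noise such as *sung lasses*.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of word-break suggestions gated by a suggest mode. */
class WordBreakSketch {

    /** Two of the mode names from Lucene's SuggestMode; the behaviour below is simplified. */
    enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_ALWAYS }

    /** Returns every split of the token into two indexed words, honouring the mode. */
    static List<String> breakSuggestions(String token, Set<String> index, Mode mode) {
        List<String> out = new ArrayList<>();
        if (mode == Mode.SUGGEST_WHEN_NOT_IN_INDEX && index.contains(token)) {
            return out; // the token is already a valid term, so no split is offered
        }
        for (int cut = 1; cut < token.length(); cut++) {
            String left = token.substring(0, cut);
            String right = token.substring(cut);
            if (index.contains(left) && index.contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> index = Set.of("sunglasses", "sun", "glasses", "sung", "lasses");
        System.out.println(breakSuggestions("sunglasses", index, Mode.SUGGEST_WHEN_NOT_IN_INDEX)); // []
        System.out.println(breakSuggestions("sunglasses", index, Mode.SUGGEST_ALWAYS)); // [sun glasses, sung lasses]
    }
}
```

The split loop runs in order of the cut position, so shorter left-hand words come first in the output.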



> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
> Attachments: SOLR-10263.v2.patch
>
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS), but want *WordBreakSolrSpellChecker* to suggest only if 
> the token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or 
> performance reasons.
> *UPDATE:* We recently found that, within {{WordBreakSolrSpellChecker}}, 
> {{WordBreak}} and {{WordJoin}} should also have different suggestModes.
> We hit this problem because most of our WordJoin cases are ones where the 
> individual words are valid tokens, but what users are actually looking for 
> is a combination (word join) of the two tokens.
> For example:
> *gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but 
> the actual product being looked for is *goldmine sunglasses*, where 
> *goldmine* is a brand.
> In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
> this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
> {{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
> To handle this, `wordJoin` and `wordBreak` should have separate 
> suggestModes.
> The related changes are in the latest PR: 
> https://github.com/apache/lucene-solr/pull/218.
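A minimal sketch of why a single {{SUGGEST_WHEN_NOT_IN_INDEX}} setting suppresses the *goldmine sunglasses* join. This is a simplified, self-contained model written for illustration, not the code of {{WordBreakSolrSpellChecker}} or of Lucene's {{WordBreakSpellChecker}}. The class name is hypothetical, the in-memory {{Set}} stands in for the index, and the rule "skip a pair when both tokens are already indexed" is an assumption about how the mode applies to joins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of word-join suggestions gated by a suggest mode. */
class WordJoinSketch {

    /** Two of the mode names from Lucene's SuggestMode; the behaviour below is simplified. */
    enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_ALWAYS }

    /**
     * Joins each pair of adjacent tokens and keeps the rewritten query when the
     * joined word is indexed. Under SUGGEST_WHEN_NOT_IN_INDEX, a pair whose two
     * tokens are both already indexed is skipped (an assumed, simplified rule).
     */
    static List<String> joinSuggestions(List<String> tokens, Set<String> index, Mode mode) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String left = tokens.get(i);
            String right = tokens.get(i + 1);
            boolean bothIndexed = index.contains(left) && index.contains(right);
            if (mode == Mode.SUGGEST_WHEN_NOT_IN_INDEX && bothIndexed) {
                continue; // both words are valid terms, so no join is offered
            }
            String joined = left + right;
            if (index.contains(joined)) {
                // Rebuild the query with the pair replaced by the joined word.
                List<String> rewritten = new ArrayList<>(tokens.subList(0, i));
                rewritten.add(joined);
                rewritten.addAll(tokens.subList(i + 2, tokens.size()));
                out.add(String.join(" ", rewritten));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> index = Set.of("gold", "mine", "goldmine", "sunglasses");
        List<String> query = List.of("gold", "mine", "sunglasses");
        System.out.println(joinSuggestions(query, index, Mode.SUGGEST_WHEN_NOT_IN_INDEX)); // []
        System.out.println(joinSuggestions(query, index, Mode.SUGGEST_ALWAYS)); // [goldmine sunglasses]
    }
}
```

Giving the join side its own {{SUGGEST_ALWAYS}} mode, as proposed in this issue, is what would let *goldmine sunglasses* through while word breaks stay restricted.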



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Attachment: SOLR-10263.v2.patch


The above patch contains the changes from the PR: 
https://github.com/apache/lucene-solr/pull/218




[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080375#comment-16080375
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 7/10/17 2:14 PM:
--

The above patch 
(https://issues.apache.org/jira/secure/attachment/12876418/SOLR-10263.v2.patch) 
contains the changes from the PR: 
https://github.com/apache/lucene-solr/pull/218





[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create a problem in the following case:
We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS), but want *WordBreakSolrSpellChecker* to suggest only if the 
token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).

*Update:* We recently found that, within {{WordBreakSolrSpellChecker}}, 
{{WordBreak}} and {{WordJoin}} should also have different suggestModes.

We hit this problem because most of our WordJoin cases are ones where the 
individual words are valid tokens, but what users are actually looking for is 
a combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
actual product being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
To handle this, `wordJoin` and `wordBreak` should have separate suggestModes.

The related changes are in the latest PR: 
https://github.com/apache/lucene-solr/pull/218.



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create a problem in the following case:
We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS), but want *WordBreakSolrSpellChecker* to suggest only if the 
token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).

*UPDATE:* We recently found that, within {{WordBreakSolrSpellChecker}}, 
{{WordBreak}} and {{WordJoin}} should also have different suggestModes.

We hit this problem because most of our WordJoin cases are ones where the 
individual words are valid tokens, but what users are actually looking for is 
a combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
actual product being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
To handle this, `wordJoin` and `wordBreak` should have separate suggestModes.

The related changes are in the latest PR: 
https://github.com/apache/lucene-solr/pull/218.



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create a problem in the following case:
We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS), but want *WordBreakSolrSpellChecker* to suggest only if the 
token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX), for relevance or 
performance reasons.

*UPDATE:* We recently found that, within {{WordBreakSolrSpellChecker}}, 
{{WordBreak}} and {{WordJoin}} should also have different suggestModes.

We hit this problem because most of our WordJoin cases are ones where the 
individual words are valid tokens, but what users are actually looking for is 
a combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
actual product being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
To handle this, `wordJoin` and `wordBreak` should have separate suggestModes.

The related changes are in the latest PR: 
https://github.com/apache/lucene-solr/pull/218.



[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079987#comment-16079987
 ] 

Abhishek Kumar Singh commented on SOLR-10263:
-

After PR #218, the _solrconfig.xml_ configuration of *WordBreakSolrSpellChecker* 
(and later, of all the components) can look like this:

{code:xml}

wordbreakspellcheck
solr.WordBreakSolrSpellChecker
fieldspell
true
true
true
5
0
SUGGEST_WHEN_NOT_IN_INDEX
SUGGEST_ALWAYS

{code}

Or, more simply:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
5
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}
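The parameter names are not visible in the two configurations above, so the following is only a hedged sketch of how a checker could pick its mode: a word-break- or word-join-specific value first, then a checker-wide value (as in the simpler configuration), and finally the mode derived from the request. The key names {{suggestModeWordBreak}}, {{suggestModeWordJoin}} and {{suggestMode}}, the class name, and the fallback order are all assumptions for illustration, not the parameters defined by PR #218.

```java
import java.util.Map;

/** Hypothetical sketch of per-checker suggestMode resolution with fallbacks. */
class SuggestModeResolution {

    /** The three mode names defined by Lucene's SuggestMode enum. */
    enum Mode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

    /**
     * Looks up the most specific setting first (e.g. word break only), then the
     * checker-wide setting, and finally falls back to the request-level mode.
     */
    static Mode resolve(Map<String, String> checkerConfig, String specificKey,
                        String checkerKey, Mode requestMode) {
        String value = checkerConfig.get(specificKey);
        if (value == null) {
            value = checkerConfig.get(checkerKey);
        }
        return value == null ? requestMode : Mode.valueOf(value);
    }

    public static void main(String[] args) {
        // Hypothetical key names, mirroring the two configuration shapes above.
        Map<String, String> detailed = Map.of(
                "suggestModeWordBreak", "SUGGEST_WHEN_NOT_IN_INDEX",
                "suggestModeWordJoin", "SUGGEST_ALWAYS");
        Map<String, String> simple = Map.of("suggestMode", "SUGGEST_WHEN_NOT_IN_INDEX");
        Mode request = Mode.SUGGEST_ALWAYS;

        System.out.println(resolve(detailed, "suggestModeWordBreak", "suggestMode", request)); // SUGGEST_WHEN_NOT_IN_INDEX
        System.out.println(resolve(detailed, "suggestModeWordJoin", "suggestMode", request));  // SUGGEST_ALWAYS
        System.out.println(resolve(simple, "suggestModeWordJoin", "suggestMode", request));    // SUGGEST_WHEN_NOT_IN_INDEX
        System.out.println(resolve(Map.of(), "suggestModeWordJoin", "suggestMode", request));  // SUGGEST_ALWAYS
    }
}
```

Resolving from most to least specific keeps the simpler configuration valid: a single checker-wide value then applies to both word breaks and word joins.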



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create a problem in the following case:
We may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS), but want *WordBreakSolrSpellChecker* to suggest only if the 
token is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).

*UPDATE:* We recently found that, within {{WordBreakSolrSpellChecker}}, 
{{WordBreak}} and {{WordJoin}} should also have different suggestModes.

We hit this problem because most of our WordJoin cases are ones where the 
individual words are valid tokens, but what users are actually looking for is 
a combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
actual product being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this won't be possible because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
To handle this, `wordJoin` and `wordBreak` should have separate suggestModes.

The related changes are in the latest PR: 
https://github.com/apache/lucene-solr/pull/218.



[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create problem in the following case:-
 It may be the case that we want *DirectSolrSpellChecker* to ALWAYS_SUGGEST 
spellcheck suggestions. 
But we may want *WordBreakSpellChecker* to suggest only if the token is not in 
the index (SUGGEST_WHEN_NOT_IN_INDEX) . 

*Update:* We recently found that, within {{WordBreakSolrSpellChecker}} as 
well, {{WordBreak}} and {{WordJoin}} should have different suggestModes.

We hit this problem in our own use case: in most WordJoin cases, the 
individual words are valid tokens, but what users are actually looking for 
is the combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
product actually being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this is not possible, because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
Therefore, `wordJoin` and `wordBreak` should each have their own suggestMode.
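
A simplified sketch of the *gold mine sunglasses* case, assuming the index is just a set of terms and that word-join suggestions get their own mode, independent of word-break. {{JoinBreakModes}} and {{joinSuggestion}} are hypothetical; the real {{WordBreakSolrSpellChecker}} logic (frequencies, change limits, etc.) is not modelled.

```java
import java.util.List;
import java.util.Set;

/**
 * Sketch only: a word-join suggestion gated by its own suggest mode.
 * The index is a plain Set of terms; the gating rule is a hypothetical
 * simplification, not WordBreakSolrSpellChecker's actual code.
 */
public class JoinBreakModes {

  enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_ALWAYS }

  /** Returns the joined term for two adjacent tokens, or null if not suggested. */
  static String joinSuggestion(String left, String right, Set<String> index,
                               SuggestMode joinMode) {
    boolean bothIndexed = index.contains(left) && index.contains(right);
    if (joinMode == SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && bothIndexed) {
      return null; // the original tokens exist, so this mode suppresses the join
    }
    String joined = left + right;
    return index.contains(joined) ? joined : null; // only suggest indexed terms
  }

  public static void main(String[] args) {
    Set<String> index = Set.of("gold", "mine", "goldmine", "sunglasses");
    List<String> query = List.of("gold", "mine", "sunglasses");
    for (SuggestMode mode : SuggestMode.values()) {
      String joined = joinSuggestion(query.get(0), query.get(1), index, mode);
      System.out.println(mode + " -> "
          + (joined == null ? "no suggestion" : joined + " " + query.get(2)));
    }
  }
}
```

With the join mode at {{SUGGEST_WHEN_NOT_IN_INDEX}} the join is suppressed because *gold* and *mine* are both indexed; with {{SUGGEST_ALWAYS}} it yields *goldmine sunglasses*, while the break mode could stay at {{SUGGEST_WHEN_NOT_IN_INDEX}}.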




  was:
Currently, the same spellcheck options are applied to all 
SpellCheckComponents.
This causes a problem in the following case:
we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS),
but want *WordBreakSpellChecker* to suggest only if the token is not in 
the index (SUGGEST_WHEN_NOT_IN_INDEX).

*Update:* We recently found that, within {{WordBreakSolrSpellChecker}} as 
well, {{WordBreak}} and {{WordJoin}} should have different suggestModes.

We hit this problem in our own use case: in most WordJoin cases, the 
individual words are valid tokens, but what users are actually looking for 
is the combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
product actually being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this is not possible, because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
Therefore, `wordJoin` and `wordBreak` should each have their own suggestMode.









[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-07-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
Currently, the same spellcheck options are applied to all 
SpellCheckComponents.
This causes a problem in the following case:
we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS),
but want *WordBreakSpellChecker* to suggest only if the token is not in 
the index (SUGGEST_WHEN_NOT_IN_INDEX).

*Update:* We recently found that, within {{WordBreakSolrSpellChecker}} as 
well, {{WordBreak}} and {{WordJoin}} should have different suggestModes.

We hit this problem in our own use case: in most WordJoin cases, the 
individual words are valid tokens, but what users are actually looking for 
is the combination (word join) of the two tokens.
For example:
*gold mine sunglasses*: here, both *gold* and *mine* are valid tokens, but the 
product actually being looked for is *goldmine sunglasses*, where *goldmine* is 
a brand.
In such cases, we should recommend {{didYouMean:goldmine sunglasses}}. But 
this is not possible, because we had set {{SUGGEST_WHEN_NOT_IN_INDEX}} for 
{{WordBreakSolrSpellChecker}} (of which WordJoin is a part).
Therefore, `wordJoin` and `wordBreak` should each have their own suggestMode.


  was:
Currently, the same spellcheck options are applied to all 
SpellCheckComponents.
This causes a problem in the following case:
we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
(SUGGEST_ALWAYS),
but want *WordBreakSpellChecker* to suggest only if the token is not in 
the index (SUGGEST_WHEN_NOT_IN_INDEX).










[jira] [Updated] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance

2017-04-19 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10513:

Description: 
See ConjunctionSolrSpellChecker.java:

try {
  if (stringDistance == null) {
    stringDistance = checker.getStringDistance();
  } else if (stringDistance != checker.getStringDistance()) {
    throw new IllegalArgumentException(
        "All checkers need to use the same StringDistance.");
  }
} catch (UnsupportedOperationException uoe) {
  // ignore
}

The condition stringDistance != checker.getStringDistance() compares 
references. So if you use two or more spellcheckers with the same distance 
algorithm, the exception is thrown anyway, since each checker holds its own 
StringDistance instance.


*Update:* As of Solr 6.5, this check has been changed to 
*stringDistance.equals(checker.getStringDistance())*.
However, *LuceneLevenshteinDistance* does not even override the equals method.

This still does not solve the problem, because the *default equals* method 
also compares references.

Hence we are unable to use *FileBasedSolrSpellChecker*.
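
A small, self-contained Java illustration of the point above, using two hypothetical stand-in classes rather than {{LuceneLevenshteinDistance}}: without an {{equals}} override, two separately created instances are unequal both by {{!=}} and by the inherited {{Object.equals}}; a class-based {{equals}} (with a matching {{hashCode}}) treats them as the same algorithm.

```java
/**
 * Sketch only: reference comparison vs. an overridden equals.
 * Both classes are hypothetical stand-ins, not Lucene's distance classes.
 */
public class DistanceEqualityDemo {

  /** Does not override equals, so Object.equals compares references. */
  static class NoEqualsDistance { }

  /** Treats any two instances of the same class as the same algorithm. */
  static class ClassEqualsDistance {
    @Override
    public boolean equals(Object other) {
      return other != null && other.getClass() == getClass();
    }

    @Override
    public int hashCode() {
      return getClass().hashCode(); // consistent with equals above
    }
  }

  public static void main(String[] args) {
    NoEqualsDistance a = new NoEqualsDistance();
    NoEqualsDistance b = new NoEqualsDistance();
    System.out.println("a != b:      " + (a != b));    // true
    System.out.println("a.equals(b): " + a.equals(b)); // false
    ClassEqualsDistance c = new ClassEqualsDistance();
    ClassEqualsDistance d = new ClassEqualsDistance();
    System.out.println("c.equals(d): " + c.equals(d)); // true
  }
}
```

A class-based comparison is only enough for a distance without settings; one with configurable parameters would also need to compare those parameters in {{equals}}.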

Moreover, a similar check should also be done in the init method, so that 
users do not have to wait until query time to hit this error. If the 
spellcheck components have been added to *solrconfig.xml*, the error should 
be thrown during the core reload itself.
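
A sketch of the suggested fail-fast check, assuming it runs once while the spellcheckers are being initialised rather than per query. {{InitTimeDistanceCheck}} and its {{Checker}} interface are hypothetical, and returning {{null}} stands in for the {{UnsupportedOperationException}} that the real code catches and ignores.

```java
import java.util.List;
import java.util.Objects;

/**
 * Sketch only: validate up front that all checkers report an equal string
 * distance. Checker is a hypothetical simplification of Solr's spellcheckers.
 */
public class InitTimeDistanceCheck {

  interface Checker {
    /** Returns null when the checker has no distance to compare. */
    Object getStringDistance();
  }

  /** Throws on the first distance that is not equal to the first one seen. */
  static void validate(List<Checker> checkers) {
    Object expected = null;
    for (Checker checker : checkers) {
      Object distance = checker.getStringDistance();
      if (distance == null) {
        continue; // nothing to compare for this checker
      }
      if (expected == null) {
        expected = distance;
      } else if (!Objects.equals(expected, distance)) {
        throw new IllegalArgumentException(
            "All checkers need to use the same StringDistance.");
      }
    }
  }

  public static void main(String[] args) {
    // Strings stand in for distance objects because String overrides equals.
    Checker levenshtein = () -> "levenshtein";
    Checker jaroWinkler = () -> "jarowinkler";
    validate(List.of(levenshtein, levenshtein)); // passes
    try {
      validate(List.of(levenshtein, jaroWinkler));
    } catch (IllegalArgumentException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```

Because the comparison uses {{equals}}, this only helps once the distance classes override it as well.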


  was:
See ConjunctionSolrSpellChecker.java:

try {
  if (stringDistance == null) {
    stringDistance = checker.getStringDistance();
  } else if (stringDistance != checker.getStringDistance()) {
    throw new IllegalArgumentException(
        "All checkers need to use the same StringDistance.");
  }
} catch (UnsupportedOperationException uoe) {
  // ignore
}

The condition stringDistance != checker.getStringDistance() compares 
references. So if you use two or more spellcheckers with the same distance 
algorithm, the exception is thrown anyway, since each checker holds its own 
StringDistance instance.








[jira] [Commented] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance

2017-04-18 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972987#comment-15972987
 ] 

Abhishek Kumar Singh commented on SOLR-10513:
-

As of Solr 6.5, this check has been changed to 
*stringDistance.equals(checker.getStringDistance())*.
However, *LuceneLevenshteinDistance* does not even override the equals method.

This still does not solve the problem, because the *default equals* method 
also compares references.

Hence we are unable to use *FileBasedSolrSpellChecker*.

> CLONE - ConjunctionSolrSpellChecker wrong check for same string distance
> 
>
> Key: SOLR-10513
> URL: https://issues.apache.org/jira/browse/SOLR-10513
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.9
>Reporter: Abhishek Kumar Singh
>Assignee: James Dyer
> Fix For: 5.5
>
>
> See ConjunctionSolrSpellChecker.java:
> try {
>   if (stringDistance == null) {
>     stringDistance = checker.getStringDistance();
>   } else if (stringDistance != checker.getStringDistance()) {
>     throw new IllegalArgumentException(
>         "All checkers need to use the same StringDistance.");
>   }
> } catch (UnsupportedOperationException uoe) {
>   // ignore
> }
> The condition stringDistance != checker.getStringDistance() compares 
> references. So if you use two or more spellcheckers with the same distance 
> algorithm, the exception is thrown anyway, since each checker holds its own 
> StringDistance instance.






[jira] [Created] (SOLR-10513) CLONE - ConjunctionSolrSpellChecker wrong check for same string distance

2017-04-18 Thread Abhishek Kumar Singh (JIRA)
Abhishek Kumar Singh created SOLR-10513:
---

 Summary: CLONE - ConjunctionSolrSpellChecker wrong check for same 
string distance
 Key: SOLR-10513
 URL: https://issues.apache.org/jira/browse/SOLR-10513
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.9
Reporter: Abhishek Kumar Singh
Assignee: James Dyer
 Fix For: 5.5


See ConjunctionSolrSpellChecker.java:

try {
  if (stringDistance == null) {
    stringDistance = checker.getStringDistance();
  } else if (stringDistance != checker.getStringDistance()) {
    throw new IllegalArgumentException(
        "All checkers need to use the same StringDistance.");
  }
} catch (UnsupportedOperationException uoe) {
  // ignore
}

The condition stringDistance != checker.getStringDistance() compares 
references. So if you use two or more spellcheckers with the same distance 
algorithm, the exception is thrown anyway, since each checker holds its own 
StringDistance instance.






[jira] [Commented] (SOLR-6271) ConjunctionSolrSpellChecker wrong check for same string distance

2017-04-18 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972895#comment-15972895
 ] 

Abhishek Kumar Singh commented on SOLR-6271:


As of Solr 6.5, this check has been changed to 
*stringDistance.equals(checker.getStringDistance())*.

However, *LuceneLevenshteinDistance* does not even override the *equals* method.
This still does not solve the problem, because the default *equals* method 
also compares references.


> ConjunctionSolrSpellChecker wrong check for same string distance
> 
>
> Key: SOLR-6271
> URL: https://issues.apache.org/jira/browse/SOLR-6271
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.9
>Reporter: Igor Kostromin
>Assignee: James Dyer
> Fix For: 5.5
>
> Attachments: SOLR-6271.patch, SOLR-6271.patch
>
>
> See ConjunctionSolrSpellChecker.java:
> try {
>   if (stringDistance == null) {
>     stringDistance = checker.getStringDistance();
>   } else if (stringDistance != checker.getStringDistance()) {
>     throw new IllegalArgumentException(
>         "All checkers need to use the same StringDistance.");
>   }
> } catch (UnsupportedOperationException uoe) {
>   // ignore
> }
> The condition stringDistance != checker.getStringDistance() compares 
> references. So if you use two or more spellcheckers with the same distance 
> algorithm, the exception is thrown anyway, since each checker holds its own 
> StringDistance instance.






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-04-04 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Attachment: (was: SOLR-10256.patch)

> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
> being used, queries like *applejuice* will be broken down into *apple juice*.
> The current *SpellCheckCollator* surrounds such suggestions with parentheses.
> When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
> occupying the same position, which is not what we want.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
> A solution would be a flag that disables this parenthesization of 
> spellcheck suggestions.
> *Update*: 
> Raised a PR for this: https://github.com/apache/lucene-solr/pull/168






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-04-04 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
being used, queries like *applejuice* will be broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
occupying the same position, which is not what we want.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesization of spellcheck 
suggestions.
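
A hypothetical sketch of such a flag: {{collate}} joins per-token corrections and wraps multi-word corrections in parentheses only when {{parenthesizeMultiWord}} is true. {{CollationSketch}}, {{collate}} and the flag name are made up for illustration; this is not the actual {{SpellCheckCollator}} code or an existing Solr parameter.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Sketch only: build a collated query from per-token corrections and
 * optionally parenthesize multi-word corrections (e.g. from word breaking).
 * The flag name is hypothetical, not a real Solr option.
 */
public class CollationSketch {

  static String collate(List<String> corrections, boolean parenthesizeMultiWord) {
    StringJoiner query = new StringJoiner(" ");
    for (String correction : corrections) {
      boolean multiWord = correction.contains(" ");
      query.add(multiWord && parenthesizeMultiWord
          ? "(" + correction + ")"
          : correction);
    }
    return query.toString();
  }

  public static void main(String[] args) {
    // "applejuice" broken into "apple juice" by a word-break checker.
    List<String> corrections = List.of("fresh", "apple juice");
    System.out.println(collate(corrections, true));  // fresh (apple juice)
    System.out.println(collate(corrections, false)); // fresh apple juice
  }
}
```

Defaulting the flag to {{true}} would keep the current collation output unchanged for existing setups.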

*Update*: Raised a PR for this: 
https://github.com/apache/lucene-solr/pull/168

  was:
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
being used, queries like *applejuice* will be broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
occupying the same position, which is not what we want.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesization of spellcheck 
suggestions.

*Update*: Raised a PR for this: 
https://github.com/apache/lucene-solr/pull/168








[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-04-04 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
being used, queries like *applejuice* will be broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
occupying the same position, which is not what we want.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesization of spellcheck 
suggestions.

*Update*: Raised a PR for this: 
https://github.com/apache/lucene-solr/pull/168

  was:
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
being used, queries like *applejuice* will be broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
occupying the same position, which is not what we want.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesization of spellcheck 
suggestions.








[jira] [Commented] (SOLR-10256) Parentheses in SpellCheckCollator

2017-04-03 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954662#comment-15954662
 ] 

Abhishek Kumar Singh commented on SOLR-10256:
-

I agree with your argument that this may not be the best use case, because 
it all depends on how we have configured our search to work. That is why we 
have configurations like _mm_ for specifying the minimum match.
The problem arises when our _mm_ configuration guarantees a *100% token match*, 
but the spellcheck (because of WordBreak) ranks suggestions in which even one 
of the broken-out tokens has a higher frequency (Sugg A) above suggestions 
with a reasonable frequency but a much smaller Levenshtein distance (Sugg B).

We would expect *Sugg B* to be weighted higher than *Sugg A* in the 
spellcheck suggestions.
But this does not happen because of the mandatory parentheses.

In my view, the parentheses should be on by default, but there should be a 
configuration option to switch them off.

> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
> Attachments: SOLR-10256.patch
>
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that 
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is 
> being used, queries like *applejuice* will be broken down into *apple juice*.
> The current *SpellCheckCollator* surrounds such suggestions with parentheses.
> When surrounded by parentheses, the tokens are treated by _EdismaxParser_ as 
> occupying the same position, which is not what we want.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
> A solution would be a flag that disables this parenthesization of 
> spellcheck suggestions.






[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 3/29/17 4:42 AM:
--

The _solrconfig.xml_ of *WordBreakSolrSpellChecker* (and later, of all the 
components) can be configured like this:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
10
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}
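
A sketch of how a per-checker mode could be read from a configuration like the one above, with the checker's entries modelled as a {{Map<String, String>}}. The parameter name {{suggestMode}}, the class {{SuggestModeConfig}} and the fail-fast choice for unknown values are assumptions, not taken from the PR; the enum only copies the constant names of Lucene's {{SuggestMode}}.

```java
import java.util.Map;

/**
 * Sketch only: read a checker's own suggest mode from its configuration,
 * falling back to a default when none is set. The parameter name is
 * hypothetical, not the name used in the PR.
 */
public class SuggestModeConfig {

  enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_MORE_POPULAR, SUGGEST_ALWAYS }

  static final String PARAM = "suggestMode";

  /** Returns the configured mode; unknown values fail instead of being ignored. */
  static SuggestMode fromConfig(Map<String, String> checkerConfig, SuggestMode fallback) {
    String value = checkerConfig.get(PARAM);
    if (value == null) {
      return fallback;
    }
    try {
      return SuggestMode.valueOf(value.trim());
    } catch (IllegalArgumentException e) {
      throw new IllegalArgumentException("Unknown " + PARAM + ": " + value, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> wordBreak = Map.of(
        "classname", "solr.WordBreakSolrSpellChecker",
        PARAM, "SUGGEST_WHEN_NOT_IN_INDEX");
    System.out.println(fromConfig(wordBreak, SuggestMode.SUGGEST_ALWAYS));
    System.out.println(fromConfig(Map.of(), SuggestMode.SUGGEST_ALWAYS));
  }
}
```

Rejecting a misspelled mode instead of silently falling back makes configuration mistakes visible as soon as the checker is set up.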



was (Author: asingh2411):
The _solrconfig.xml_ of *WordBreakSolrSpellChecker* (and later, of all the 
components) can be configured like this:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
10
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}


> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> Currently, the same spellcheck options are applied to all 
> SpellCheckComponents.
> This causes a problem in the following case:
> we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS),
> but want *WordBreakSpellChecker* to suggest only if the token is not in the 
> index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 3/29/17 4:42 AM:
--

The _solrconfig.xml_ of *WordBreakSolrSpellChecker* (and later, of all the 
components) can be configured like this:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
10
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}



was (Author: asingh2411):
The _solrconfig.xml_ of *WordBreakSolrSpellChecker* (and later, of all the 
components) can be configured like this:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
10
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}


> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> Currently, the same spellcheck options are applied to all 
> SpellCheckComponents.
> This causes a problem in the following case:
> we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS),
> but want *WordBreakSpellChecker* to suggest only if the token is not in the 
> index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946518#comment-15946518
 ] 

Abhishek Kumar Singh commented on SOLR-10263:
-

The _solrconfig.xml_ of *WordBreakSolrSpellChecker* (and later, of all the 
components) can be configured like this:

{code:xml}

spellcheckword
solr.WordBreakSolrSpellChecker
fieldspell
true
true
10
0
SUGGEST_WHEN_NOT_IN_INDEX

{code}


> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> Currently, the same spellcheck options are applied to all 
> SpellCheckComponents.
> This causes a problem in the following case:
> we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS),
> but want *WordBreakSpellChecker* to suggest only if the token is not in the 
> index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945363#comment-15945363
 ] 

Abhishek Kumar Singh commented on SOLR-10263:
-

Raised this PR for *WordBreakSolrSpellChecker*.  
https://github.com/apache/lucene-solr/pull/176/files

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> Currently, the same spellcheck options are applied to all 
> SpellCheckComponents.
> This causes a problem in the following case:
> we may want *DirectSolrSpellChecker* to always return spellcheck suggestions 
> (SUGGEST_ALWAYS),
> but want *WordBreakSpellChecker* to suggest only if the token is not in the 
> index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own suggestMode

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Summary: Different SpellcheckComponents should have their own suggestMode  
(was: Different SpellcheckComponents should have their own options)

> Different SpellcheckComponents should have their own suggestMode
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck
> suggestions, but want *WordBreakSpellChecker* to suggest only if the token
> is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own options

2017-03-28 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 3/28/17 2:27 PM:
--

Yes, this is what happens in the latest code too.

See:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120

It passes the same *options* to all the SpellCheckComponents.



was (Author: asingh2411):
Yes, this is what happens in the latest code too.

See:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120

It passes the same *options* to all the SpellCheckComponents.


> Different SpellcheckComponents should have their own options
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck
> suggestions, but want *WordBreakSpellChecker* to suggest only if the token
> is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Comment Edited] (SOLR-10263) Different SpellcheckComponents should have their own options

2017-03-15 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226
 ] 

Abhishek Kumar Singh edited comment on SOLR-10263 at 3/15/17 2:07 PM:
--

Yes, this is what happens in the latest code too.

See:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120

It passes the same *options* to all the SpellCheckComponents.



was (Author: asingh2411):
Yes, this is what happens in the latest code too.

See:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120

It passes the same options to all the SpellCheckComponents.


> Different SpellcheckComponents should have their own options
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck
> suggestions, but want *WordBreakSpellChecker* to suggest only if the token
> is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Commented] (SOLR-10263) Different SpellcheckComponents should have their own options

2017-03-15 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926226#comment-15926226
 ] 

Abhishek Kumar Singh commented on SOLR-10263:
-

Yes, this is what happens in the latest code too.

See:
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/ConjunctionSolrSpellChecker.java#L120

It passes the same options to all the SpellCheckComponents.


> Different SpellcheckComponents should have their own options
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck
> suggestions, but want *WordBreakSpellChecker* to suggest only if the token
> is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Commented] (SOLR-6271) ConjunctionSolrSpellChecker wrong check for same string distance

2017-03-10 Thread Abhishek Kumar Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905234#comment-15905234
 ] 

Abhishek Kumar Singh commented on SOLR-6271:


This issue still occurs in my setup, where I'm using
_DirectSolrSpellChecker_ and _FileBasedSpellChecker_. The problem is that
_DirectSolrSpellChecker_ uses *LuceneLevenshteinDistance*, while
_FileBasedSpellChecker_ uses *LevensteinDistance* as its StringDistance.
This throws *IllegalArgumentException("All checkers need to use the
same StringDistance.")*.

What would be the fix for this?


> ConjunctionSolrSpellChecker wrong check for same string distance
> 
>
> Key: SOLR-6271
> URL: https://issues.apache.org/jira/browse/SOLR-6271
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.9
>Reporter: Igor Kostromin
>Assignee: James Dyer
> Fix For: 5.5
>
> Attachments: SOLR-6271.patch, SOLR-6271.patch
>
>
> See ConjunctionSolrSpellChecker.java:
> try {
>   if (stringDistance == null) {
>     stringDistance = checker.getStringDistance();
>   } else if (stringDistance != checker.getStringDistance()) {
>     throw new IllegalArgumentException(
>         "All checkers need to use the same StringDistance.");
>   }
> } catch (UnsupportedOperationException uoe) {
>   // ignore
> }
> The line stringDistance != checker.getStringDistance() compares by
> reference. So if you use two or more spellcheckers with the same distance
> algorithm, the exception is thrown anyway.
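The difference between comparing references and comparing values can be shown with a small, self-contained Java sketch. `Distance` and `ToyDistance` are hypothetical stand-ins for Lucene's `StringDistance` implementations. The sketch assumes that two instances of the same algorithm override `equals()` so that they compare equal. It illustrates the bug described above and does not reproduce the fix that was committed for this issue.

```java
import java.util.List;

public class SameDistanceCheck {
    // Hypothetical stand-in for Lucene's StringDistance interface.
    interface Distance {
        float getDistance(String a, String b);
    }

    // Toy algorithm (1 for identical strings, else 0); instances compare equal by class.
    static final class ToyDistance implements Distance {
        @Override public float getDistance(String a, String b) { return a.equals(b) ? 1f : 0f; }
        @Override public boolean equals(Object o) { return o instanceof ToyDistance; }
        @Override public int hashCode() { return ToyDistance.class.hashCode(); }
    }

    // Same shape as the quoted loop, comparing either by reference or by value.
    static void requireSameDistance(List<Distance> perChecker, boolean byValue) {
        Distance first = null;
        for (Distance d : perChecker) {
            if (first == null) {
                first = d;
            } else if (byValue ? !first.equals(d) : first != d) {
                throw new IllegalArgumentException("All checkers need to use the same StringDistance.");
            }
        }
    }

    public static void main(String[] args) {
        // Two checkers, each configured with its own instance of the same algorithm.
        List<Distance> distances = List.of(new ToyDistance(), new ToyDistance());
        requireSameDistance(distances, true); // passes: the instances are equal by value
        try {
            requireSameDistance(distances, false); // throws: the references differ
        } catch (IllegalArgumentException e) {
            System.out.println("reference check rejected equal distances: " + e.getMessage());
        }
    }
}
```

A value-based check only helps when both checkers really use the same algorithm. When one checker uses LuceneLevenshteinDistance and the other uses LevensteinDistance, as in the comment above, the distances differ, so a value-based check would presumably still reject that combination.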






[jira] [Updated] (SOLR-10263) Different SpellcheckComponents should have their own options

2017-03-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10263:

Description: 
As of now, common spellcheck options are applied to all the 
SpellCheckComponents.
This can create a problem in the following case:
we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck suggestions,
but want *WordBreakSpellChecker* to suggest only if the token is not in the
index (SUGGEST_WHEN_NOT_IN_INDEX).
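The following self-contained Java sketch (Java 16+, not Solr code) makes the request concrete. It models a conjunction of checkers in which each sub-checker may override the request-wide suggest mode. `Checker`, the name-keyed override map and the in-memory "index" set are hypothetical. The two mode constants are redeclared locally and mirror two values of Lucene's `org.apache.lucene.search.spell.SuggestMode`, where "always suggest" is spelled `SUGGEST_ALWAYS`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerCheckerModes {
    // Two of the constants from Lucene's SuggestMode enum, redeclared locally.
    enum SuggestMode { SUGGEST_WHEN_NOT_IN_INDEX, SUGGEST_ALWAYS }

    // Hypothetical stand-in for a spellchecker: a name plus one canned suggestion.
    record Checker(String name, String suggestion) {}

    // May a checker suggest for this token under the given mode?
    static boolean maySuggest(SuggestMode mode, String token, Set<String> index) {
        return switch (mode) {
            case SUGGEST_WHEN_NOT_IN_INDEX -> !index.contains(token);
            case SUGGEST_ALWAYS -> true;
        };
    }

    // Collects suggestions; a per-checker override (keyed by name) beats the shared mode.
    static List<String> suggest(List<Checker> checkers, Map<String, SuggestMode> overrides,
                                SuggestMode shared, String token, Set<String> index) {
        List<String> out = new ArrayList<>();
        for (Checker c : checkers) {
            SuggestMode mode = overrides.getOrDefault(c.name(), shared);
            if (maySuggest(mode, token, index)) {
                out.add(c.name() + ":" + c.suggestion());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> index = Set.of("apple", "juice", "applejuice");
        List<Checker> checkers = List.of(
                new Checker("direct", "applejuicy"),
                new Checker("wordbreak", "apple juice"));
        // Current behaviour: one shared mode applies to every checker.
        System.out.println(suggest(checkers, Map.of(), SuggestMode.SUGGEST_ALWAYS, "applejuice", index));
        // Wished-for behaviour: word break only fires for tokens missing from the index.
        Map<String, SuggestMode> overrides = Map.of("wordbreak", SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX);
        System.out.println(suggest(checkers, overrides, SuggestMode.SUGGEST_ALWAYS, "applejuice", index));
    }
}
```

With a single shared mode, both checkers return a suggestion for *applejuice*. With the override, only `direct` does, because the token is already in the index.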



> Different SpellcheckComponents should have their own options
> 
>
> Key: SOLR-10263
> URL: https://issues.apache.org/jira/browse/SOLR-10263
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>Priority: Minor
>
> As of now, common spellcheck options are applied to all the 
> SpellCheckComponents.
> This can create a problem in the following case:
> we may want *DirectSolrSpellChecker* to ALWAYS_SUGGEST spellcheck
> suggestions, but want *WordBreakSpellChecker* to suggest only if the token
> is not in the index (SUGGEST_WHEN_NOT_IN_INDEX).






[jira] [Created] (SOLR-10263) Different SpellcheckComponents should have their own options

2017-03-10 Thread Abhishek Kumar Singh (JIRA)
Abhishek Kumar Singh created SOLR-10263:
---

 Summary: Different SpellcheckComponents should have their own 
options
 Key: SOLR-10263
 URL: https://issues.apache.org/jira/browse/SOLR-10263
 Project: Solr
  Issue Type: Wish
  Security Level: Public (Default Security Level. Issues are Public)
  Components: spellchecker
Reporter: Abhishek Kumar Singh
Priority: Minor









[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-10 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Attachment: SOLR-10256.patch

Attaching a patch for the above issue.
It adds a flag to SpellCheckCollator and makes it configurable.
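The Java sketch below roughly illustrates what such a flag controls. It rebuilds a collation by substituting corrections into the original query. Only when a hypothetical `wrapMultiTermCorrections` flag is on does it wrap corrections that contain a space in parentheses. The flag name, the `Correction` record and the offset-based substitution are assumptions for illustration and are not taken from the attached patch or from SpellCheckCollator.

```java
import java.util.List;

public class CollationSketch {
    // A correction for the token spanning [start, end) of the original query.
    record Correction(int start, int end, String replacement) {}

    // Builds the collated query; corrections must be sorted and non-overlapping.
    static String collate(String query, List<Correction> corrections, boolean wrapMultiTermCorrections) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (Correction c : corrections) {
            sb.append(query, pos, c.start());
            String r = c.replacement();
            // Multi-word corrections (e.g. from word breaking) are optionally parenthesized.
            if (wrapMultiTermCorrections && r.contains(" ")) {
                r = "(" + r + ")";
            }
            sb.append(r);
            pos = c.end();
        }
        sb.append(query.substring(pos));
        return sb.toString();
    }

    public static void main(String[] args) {
        String q = "fresh applejuice";
        List<Correction> cs = List.of(new Correction(6, 16, "apple juice"));
        System.out.println(collate(q, cs, true));  // fresh (apple juice)
        System.out.println(collate(q, cs, false)); // fresh apple juice
    }
}
```

Single-word corrections are never wrapped, so the flag only changes the output for word-break style suggestions such as *applejuice* to *apple juice*.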


> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
> Attachments: SOLR-10256.patch
>
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is
> used, a query like *applejuice* is broken down into *apple juice*.
> The current *SpellCheckCollator* surrounds such suggestions with parentheses.
> When surrounded by parentheses, _EdismaxParser_ treats them as representing
> the same position, which is not wanted.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
>
> A solution would be a flag that disables this parenthesisation of spellcheck
> suggestions.






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-09 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is
used, a query like *applejuice* is broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, _EdismaxParser_ treats them as representing
the same position, which is not wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.

  was:
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is
used, a query like *applejuice* is broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.


> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is
> used, a query like *applejuice* is broken down into *apple juice*.
> The current *SpellCheckCollator* surrounds such suggestions with parentheses.
> When surrounded by parentheses, _EdismaxParser_ treats them as representing
> the same position, which is not wanted.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
>
> A solution would be a flag that disables this parenthesisation of spellcheck
> suggestions.






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-08 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is
used, a query like *applejuice* is broken down into *apple juice*.
The current *SpellCheckCollator* surrounds such suggestions with parentheses.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.

  was:
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is
used, a query like *applejuice* is broken down into *apple juice*.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.


> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is
> used, a query like *applejuice* is broken down into *apple juice*.
> The current *SpellCheckCollator* surrounds such suggestions with parentheses.
> When surrounded by parentheses, they represent the same position, which is
> not wanted.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
>
> A solution would be a flag that disables this parenthesisation of spellcheck
> suggestions.






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-08 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
have a space between them.
This should be configurable, because if *_WordBreakSpellCheckComponent_* is
used, a query like *applejuice* is broken down into *apple juice*.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.

  was:
SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a
space between them.
This should be configurable, because if WordBreakSpellCheckComponent is used,
a query like *applejuice* is broken down into *apple juice*.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.


> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>
> SpellCheckCollator adds parentheses ( *'('* and *')'* ) around tokens that
> have a space between them.
> This should be configurable, because if *_WordBreakSpellCheckComponent_* is
> used, a query like *applejuice* is broken down into *apple juice*.
> When surrounded by parentheses, they represent the same position, which is
> not wanted.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
>
> A solution would be a flag that disables this parenthesisation of spellcheck
> suggestions.






[jira] [Updated] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-08 Thread Abhishek Kumar Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Kumar Singh updated SOLR-10256:

Description: 
SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a
space between them.
This should be configurable, because if WordBreakSpellCheckComponent is used,
a query like *applejuice* is broken down into *apple juice*.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.

  was:
SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a
space between them.
This should be configurable, because if WordBreakSpellCheckComponent is used,
a query like applejuice is broken down into apple juice.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.


> Parentheses in SpellCheckCollator
> -
>
> Key: SOLR-10256
> URL: https://issues.apache.org/jira/browse/SOLR-10256
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: spellchecker
>Reporter: Abhishek Kumar Singh
>
> SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a
> space between them.
> This should be configurable, because if WordBreakSpellCheckComponent is used,
> a query like *applejuice* is broken down into *apple juice*.
> When surrounded by parentheses, they represent the same position, which is
> not wanted.
> https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227
>
> A solution would be a flag that disables this parenthesisation of spellcheck
> suggestions.






[jira] [Created] (SOLR-10256) Parentheses in SpellCheckCollator

2017-03-08 Thread Abhishek Kumar Singh (JIRA)
Abhishek Kumar Singh created SOLR-10256:
---

 Summary: Parentheses in SpellCheckCollator
 Key: SOLR-10256
 URL: https://issues.apache.org/jira/browse/SOLR-10256
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: spellchecker
Reporter: Abhishek Kumar Singh


SpellCheckCollator adds parentheses ( '(' and ')' ) around tokens that have a
space between them.
This should be configurable, because if WordBreakSpellCheckComponent is used,
a query like applejuice is broken down into apple juice.
When surrounded by parentheses, they represent the same position, which is not
wanted.

https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/spelling/SpellCheckCollator.java#L227

A solution would be a flag that disables this parenthesisation of spellcheck
suggestions.


