[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1321:
------------------------------
    Attachment: SOLR-1321.patch

Grant's patch, but with the logic to handle matchalldocs.

> Support for efficient leading wildcards search
> ----------------------------------------------
>
>                 Key: SOLR-1321
>                 URL: https://issues.apache.org/jira/browse/SOLR-1321
>             Project: Solr
>          Issue Type: Improvement
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Andrzej Bialecki
>            Assignee: Grant Ingersoll
>             Fix For: 1.4
>
>         Attachments: SOLR-1321.patch, SOLR-1321.patch, SOLR-1321.patch, wildcards-2.patch, wildcards-3.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for efficient leading wildcard queries.
>
> ReversedWildcardsTokenFilter reverses tokens and returns both the original token (optional) and the reversed token (with positionIncrement == 0). Reversed tokens are prepended with a marker character to avoid collisions between legitimate tokens and the reversed tokens - e.g. "DNA" would become "and" (after lowercasing), thus colliding with the regular term "and", but with the marker character it becomes "\u0001and".
>
> This TokenFilter can be added to the analyzer chain that is used during indexing.
>
> SolrQueryParser has been modified to detect the presence of such fields in the current schema, and treat them in a special way. First, SolrQueryParser examines the schema and collects a map of fields where these reversed tokens are indexed. If there is at least one such field, it also sets QueryParser.setAllowLeadingWildcard(true). When building a wildcard query (in getWildcardQuery) the term text may be optionally reversed to put wildcards further along the term text. This happens when the field uses the reversing filter during indexing (as detected above), AND if a wildcard character is at position 0 or 1 in the term. Otherwise the term text is processed as before, i.e. turned into a regular wildcard query.
>
> Unit tests are provided to test the TokenFilter and the query parsing.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
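The issue description quoted in this message can be illustrated with a small sketch. It is not the code from any of the attached patches: the class and method names (ReversedTokenSketch, indexForm, queryForm) are made up for illustration, the input is assumed to be lowercased already, and only the marker character \u0001 and the "wildcard at position 0 or 1" rule are taken from the description.

```java
/**
 * Illustrative sketch of the "reversed tokens" strategy.
 * All names here are hypothetical; this is not the API of the patch.
 */
final class ReversedTokenSketch {

    /** Marker character mentioned in the issue description. */
    static final char MARKER = '\u0001';

    /** Index-time form: the marker followed by the reversed token. */
    static String indexForm(String token) {
        return new StringBuilder(token).reverse().insert(0, MARKER).toString();
    }

    /** True if a wildcard ('*' or '?') occurs at position 0 or 1 of the term. */
    static boolean hasLeadingWildcard(String term) {
        int limit = Math.min(2, term.length());
        for (int i = 0; i < limit; i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                return true;
            }
        }
        return false;
    }

    /**
     * Query-time form: reverse the term (and add the marker) only when the
     * field indexes reversed tokens AND the wildcard is leading; otherwise
     * return the term unchanged so it becomes a regular wildcard query.
     */
    static String queryForm(String term, boolean fieldIndexesReversedTokens) {
        if (fieldIndexesReversedTokens && hasLeadingWildcard(term)) {
            return indexForm(term);
        }
        return term;
    }
}
```

With this sketch, the token "running" is indexed as "\u0001gninnur", and the query term "*ing" on a reversed field is rewritten to "\u0001gni*", so the expensive leading wildcard becomes a cheap trailing one. A term such as "ing*" is left untouched.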
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-1321:
----------------------------------
    Attachment: SOLR-1321.patch

Notwithstanding the fact that other tests that use the QP fail with this patch (and the old one), here's a patch that uses char[] instead of Strings.
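As a rough idea of what working on char[] instead of Strings can look like for the reversal step, here is a minimal sketch. It is an assumption about the general technique, not the code in the attached patch; the class and method names are hypothetical, and surrogate pairs are deliberately ignored to keep it short.

```java
/**
 * Hypothetical sketch: reverse a term held in a char[] buffer without
 * allocating intermediate Strings. Not the code from SOLR-1321.patch.
 */
final class CharBufferReverseSketch {

    /** Marker character mentioned in the issue description. */
    static final char MARKER = '\u0001';

    /**
     * Reverses buffer[0..len) in place, shifts it right by one position and
     * writes MARKER at index 0. The buffer needs room for len + 1 chars.
     * Note: surrogate pairs are not handled in this simplified version.
     *
     * @return the new length of the term in the buffer (len + 1)
     */
    static int reverseWithMarker(char[] buffer, int len) {
        if (len < 0 || buffer.length < len + 1) {
            throw new IllegalArgumentException("buffer too small for marker");
        }
        // Classic two-pointer in-place reversal.
        for (int i = 0, j = len - 1; i < j; i++, j--) {
            char tmp = buffer[i];
            buffer[i] = buffer[j];
            buffer[j] = tmp;
        }
        // System.arraycopy copes with overlapping source and destination.
        System.arraycopy(buffer, 0, buffer, 1, len);
        buffer[0] = MARKER;
        return len + 1;
    }
}
```

The caller keeps ownership of the buffer and only the returned length changes, which is the usual reason for preferring char[] over String in an analysis chain: no per-token garbage is created.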
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-1321:
----------------------------------
    Attachment: SOLR-1321.patch

Added ASL headers.

I don't understand, in the Test, the comment:
{quote}
// XXX note: this should be false, but for now we return true for any field,
// XXX if at least one field uses the reversing
assertTrue(parserThree.getAllowLeadingWildcard());
{quote}

Seems like this needs to be fixed before committing.
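The test comment quoted in this update contrasts two behaviours: a single parser-wide answer (true as soon as any field uses the reversing filter) and a per-field answer. The sketch below only illustrates that distinction. The class, its methods and the field names in the usage note are hypothetical and do not come from the patch or from SolrQueryParser.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical illustration of global vs. per-field handling of leading
 * wildcards. Not taken from the patch.
 */
final class LeadingWildcardPolicySketch {

    /** Names of fields assumed to index reversed tokens. */
    private final Set<String> reversedFields;

    LeadingWildcardPolicySketch(Set<String> reversedFields) {
        this.reversedFields = new HashSet<>(reversedFields);
    }

    /** Parser-wide answer: true if at least one field uses the reversing filter. */
    boolean allowLeadingWildcardGlobal() {
        return !reversedFields.isEmpty();
    }

    /** Per-field answer: true only for fields that index reversed tokens. */
    boolean allowLeadingWildcard(String field) {
        return reversedFields.contains(field);
    }
}
```

For a schema where only a field named "title_rev" is reversed, the global check returns true even when the query targets an unrelated field such as "body", while the per-field check returns false for "body". That gap is what the XXX comment in the test appears to be pointing at.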
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1321:
-----------------------------------
    Attachment: wildcards-3.patch

Updated patch that uses new TokenAttribute API and uses (as much as possible) the new ReverseStringFilter.
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1321:
-----------------------------------
    Attachment: wildcards-2.patch

Previous patch mistakenly included other stuff instead of ReversedWildcardFilterFactory.
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1321:
-----------------------------------
    Attachment:     (was: wildcards-2.patch)
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1321:
-----------------------------------
    Attachment: wildcards-2.patch

Updated patch with more configurable knobs. See javadoc of ReversedWildcardsFilterFactory and unit tests.
[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search
[ https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki updated SOLR-1321:
-----------------------------------
    Attachment: wildcards.patch

Patch containing the new filter, example schema and unit tests.