[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-09-10 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1321:
--

Attachment: SOLR-1321.patch

grant's patch, but with the logic to handle matchalldocs

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: SOLR-1321.patch, SOLR-1321.patch, SOLR-1321.patch, 
> wildcards-2.patch, wildcards-3.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-09-10 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1321:
--

Attachment: SOLR-1321.patch

Notwithstanding the fact that other tests that use the QP fail with this patch 
(and the old one), here's a patch that uses char[] instead of Strings.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: SOLR-1321.patch, SOLR-1321.patch, wildcards-2.patch, 
> wildcards-3.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-09-10 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated SOLR-1321:
--

Attachment: SOLR-1321.patch

Added ASL headers.

I don't understand, in the Test, the comment:
{quote}
// XXX note: this should be false, but for now we return true for any field,
// XXX if at least one field uses the reversing
assertTrue(parserThree.getAllowLeadingWildcard());
{quote}

Seems like this needs to be fixed before committing.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: SOLR-1321.patch, wildcards-2.patch, wildcards-3.patch, 
> wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-09-09 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1321:


Attachment: wildcards-3.patch

Updated patch that uses new TokenAttribute API and uses (as much as possible) 
the new ReverseStringFilter.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
>Assignee: Grant Ingersoll
> Fix For: 1.4
>
> Attachments: wildcards-2.patch, wildcards-3.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-08-03 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1321:


Attachment: wildcards-2.patch

Previous patch mistakenly included other stuff instead of 
ReversedWildcardFilterFactory.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
> Fix For: 1.4
>
> Attachments: wildcards-2.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-08-03 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1321:


Attachment: (was: wildcards-2.patch)

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
> Fix For: 1.4
>
> Attachments: wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-08-03 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1321:


Attachment: wildcards-2.patch

Updated patch with more configurable knobs. See javadoc of 
ReversedWildcardsFilterFactory and unit tests.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
> Fix For: 1.4
>
> Attachments: wildcards-2.patch, wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1321) Support for efficient leading wildcards search

2009-07-31 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated SOLR-1321:


Attachment: wildcards.patch

Patch containing the new filter, example schema and unit tests.

> Support for efficient leading wildcards search
> --
>
> Key: SOLR-1321
> URL: https://issues.apache.org/jira/browse/SOLR-1321
> Project: Solr
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
> Fix For: 1.4
>
> Attachments: wildcards.patch
>
>
> This patch is an implementation of the "reversed tokens" strategy for 
> efficient leading wildcards queries.
> ReversedWildcardsTokenFilter reverses tokens and returns both the original 
> token (optional) and the reversed token (with positionIncrement == 0). 
> Reversed tokens are prepended with a marker character to avoid collisions 
> between legitimate tokens and the reversed tokens - e.g. "DNA" would become 
> "and", thus colliding with the regular term "and", but with the marker 
> character it becomes "\u0001and".
> This TokenFilter can be added to the analyzer chain that it used during 
> indexing.
> SolrQueryParser has been modified to detect the presence of such fields in 
> the current schema, and treat them in a special way. First, SolrQueryParser 
> examines the schema and collects a map of fields where these reversed tokens 
> are indexed. If there is at least one such field, it also sets 
> QueryParser.setAllowLeadingWildcards(true). When building a wildcard query 
> (in getWildcardQuery) the term text may be optionally reversed to put 
> wildcards further along the term text. This happens when the field uses the 
> reversing filter during indexing (as detected above), AND if the wildcard 
> characters are either at 0-th or 1-st position in the term. Otherwise the 
> term text is processed as before, i.e. turned into a regular wildcard query.
> Unit tests are provided to test the TokenFilter and the query parsing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.