[jira] Commented: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850613#action_12850613
 ] 

Robert Muir commented on SOLR-1852:
---

OK, so your bug relates somehow to how the accumulated position increment gap 
is handled.

This is how your stopword fits into the situation: somehow the new code is 
handling it "better" for your case, but perhaps it's wrong.

There are quite a few tests in TestWordDelimiter, which it passes, but I'll 
spend some time tonight verifying its correctness before we declare success...
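
To make the "accumulated position increment gap" concrete, here is a minimal 
sketch in plain Java. It is a toy model and not Lucene's StopFilter: the class 
StopGapSketch, the Tok holder and removeStops are all hypothetical names. It 
only shows the general idea that, with position increments enabled, a dropped 
stop word leaves a gap that is carried on the next surviving token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of stop word removal; this is NOT Lucene's StopFilter. */
final class StopGapSketch {

    /** A surviving term and its position increment relative to the previous surviving term. */
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + ")";
        }
    }

    /** Drops stop words; optionally records the gap they leave on the next token. */
    static List<Tok> removeStops(List<String> words, Set<String> stops,
                                 boolean enablePositionIncrements) {
        List<Tok> out = new ArrayList<>();
        int skipped = 0; // stop words dropped since the last emitted token
        for (String word : words) {
            if (stops.contains(word)) {
                skipped++;
                continue;
            }
            // With the flag on, the gap left by dropped words is carried forward.
            out.add(new Tok(word, enablePositionIncrements ? 1 + skipped : 1));
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("support", "for", "identi.ca");
        Set<String> stops = new HashSet<>(Arrays.asList("for"));
        System.out.println(removeStops(words, stops, true));
        System.out.println(removeStops(words, stops, false));
    }
}
```

In this model "identi.ca" carries an increment of 2 when the flag is on and 1 
when it is off. Any later filter that splits the token has to decide what to 
do with that 2, which is the part under suspicion here.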

> enablePositionIncrements="true" can cause searches to fail when they are 
> parsed as phrase queries
> -
>
> Key: SOLR-1852
> URL: https://issues.apache.org/jira/browse/SOLR-1852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Peter Wolanin
> Attachments: SOLR-1852.patch
>
>
> Symptom: when searching for a string like a domain name containing a '.', the 
> Solr 1.4 analyzer tells me that I will get a match, but when I enter the 
> search either in the client or directly in Solr, the search fails. 
> test string:  Identi.ca
> queries that fail:  IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but if the content is on its own or in another 
> sentence:
> "Support Identi.ca"
> the search matches.  Testing suggests the word "for" is the problem, and it 
> looks like the bug occurs when a stop word precedes a word that is split up 
> by the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing 
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in 
> Solr trunk, either due to the upgraded Lucene or changes to the 
> WordDelimiterFactory

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850612#action_12850612
 ] 

Robert Muir commented on SOLR-1852:
---

bq. The changes in the patch originate at SOLR-1706 and SOLR-1657, however I 
don't think it's actually the same bug as SOLR-1706 intended to fix since in 
the admin analyzer interface the generated tokens look correct. 

Yeah, I don't like the situation at all, as it's not obvious to me at a glance 
how the trunk impl fixes your problem, but at the same time how this changed 
behavior slipped past the random tests on SOLR-1710.






[jira] Commented: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850610#action_12850610
 ] 

Peter Wolanin commented on SOLR-1852:
-

The changes in the patch originate at SOLR-1706 and SOLR-1657, however I don't 
think it's actually the same bug as SOLR-1706 intended to fix since in the 
admin analyzer interface the generated tokens look correct.





[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608
 ] 

Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:52 PM:
---

This patch was created by Mark Miller. It's a backport of Solr trunk code 
plus a tweak to let 1.4 compile.

With this updated WordDelimiterFilter, if I reindex, the bug seems to be fixed.

In terms of the symptoms needed to reproduce the bug, it looks as though 
Identi.ca is treated as a phrase query, as if I had quoted it like "Identi ca".  
That phrase search also fails.  I had expected that Identi.ca would be treated 
the same as Identi ca (i.e. 2 separate tokens, not a phrase).
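
A small sketch of why a phrase search can fail while the separate terms still 
match. This is plain Java, not Lucene's PhraseQuery; PhraseMatchSketch and its 
methods are hypothetical names, and the increments fed to it are made up for 
illustration, not taken from what Solr 1.4 actually indexed. The model assumes 
an exact phrase (no slop), so each term must sit exactly one position after 
the previous one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy positional index; this is NOT how Lucene stores or scores phrases. */
final class PhraseMatchSketch {

    /** Builds term -> absolute positions, starting at -1 and adding each increment. */
    static Map<String, List<Integer>> index(String[] terms, int[] posIncs) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = -1;
        for (int i = 0; i < terms.length; i++) {
            pos += posIncs[i];
            postings.computeIfAbsent(terms[i], k -> new ArrayList<>()).add(pos);
        }
        return postings;
    }

    /** Exact phrase: term i must occur at (position of term 0) + i. */
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String... phrase) {
        List<Integer> starts = postings.getOrDefault(phrase[0], Collections.emptyList());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < phrase.length && ok; i++) {
                ok = postings.getOrDefault(phrase[i], Collections.emptyList())
                             .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    /** Separate terms: each only has to occur somewhere. */
    static boolean matchesAllTerms(Map<String, List<Integer>> postings, String... terms) {
        for (String term : terms) {
            if (!postings.containsKey(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] terms = {"support", "identi", "ca"};
        // Hypothetical increments: adjacent subwords vs. a stray extra gap before "ca".
        Map<String, List<Integer>> adjacent = index(terms, new int[] {1, 2, 1});
        Map<String, List<Integer>> gapped = index(terms, new int[] {1, 2, 2});
        System.out.println(matchesPhrase(adjacent, "identi", "ca"));
        System.out.println(matchesPhrase(gapped, "identi", "ca"));
        System.out.println(matchesAllTerms(gapped, "identi", "ca"));
    }
}
```

Under these assumed increments the phrase matches only when "ca" directly 
follows "identi", while the two-term lookup succeeds either way. That mirrors 
the symptom (Identi ca matches, Identi.ca does not) without claiming to 
reproduce the real token positions.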

  was (Author: pwolanin):
This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.

In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is 
treated as phrase query as if I had quoted it like "Identi ca".  That phrase 
search also fails.  I had expected that Identi.ca would be the same as Identi 
ca (i.e. 2 separate tokens, not a phrase).
  




[jira] Updated: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Description: 
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in another 
sentence:

"Support Identi.ca"

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word precedes a word that is split up 
using the word delimiter filter.

Setting enablePositionIncrements="false" in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory


  was:
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in another 
sentence:

"Support Identi.ca"

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word preceeds a word that is split up 
using the whitespace delimiter.

Setting enablePositionIncrements="false" in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor




[jira] Updated: (SOLR-1852) enablePositionIncrements="true" can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Description: 
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

"would be great to have support for Identi.ca on the follow block"

fails to match "Identi.ca", but putting the content on its own or in another 
sentence:

"Support Identi.ca"

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word preceeds a word that is split up 
using the whitespace delimiter.

Setting enablePositionIncrements="false" in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor


  was:
Symptom: searching for a string like a domain
name containing a '.', the Solr 1.4 analyzer tells me that I will get
a match, but when I enter the search either in the client or directly
in Solr, the search fails.  Our default handler is dismax, but this
also fails with the standard handler.  So I'm wondering if this is a
known issue, or am I missing something subtle in the analysis chain?
Solr is 1.4.0 that I built.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Setting enablePositionIncrements="false" in the stop filter and reindexing 
causes the searches to match.

According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor


Summary: enablePositionIncrements="true" can cause searches to fail 
when they are parsed as phrase queries  (was: enablePositionIncrements="true" 
causes searches to fail when they are parse as phrase queries)


[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements="true" causes searches to fail when they are parse as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608
 ] 

Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:41 PM:
---

This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.

In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is 
treated as a phrase query, as if I had quoted it like "Identi ca".  That phrase 
search also fails.  I had expected that Identi.ca would be the same as Identi 
ca (i.e. 2 separate tokens, not a phrase).

  was (Author: pwolanin):
This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.
  
> enablePositionIncrements="true" causes searches to fail when they are parse 
> as phrase queries
> -
>
> Key: SOLR-1852
> URL: https://issues.apache.org/jira/browse/SOLR-1852
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 1.4
>Reporter: Peter Wolanin
> Attachments: SOLR-1852.patch
>
>
> Symptom: searching for a string like a domain
> name containing a '.', the Solr 1.4 analyzer tells me that I will get
> a match, but when I enter the search either in the client or directly
> in Solr, the search fails.  Our default handler is dismax, but this
> also fails with the standard handler.  So I'm wondering if this is a
> known issue, or am I missing something subtle in the analysis chain?
> Solr is 1.4.0 that I built.
> test string:  Identi.ca
> queries that fail:  IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Setting enablePositionIncrements="false" in the stop filter and reindexing 
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in 
> Solr trunk, either due to the upgraded lucene or changes to the 
> WordDelimiterFactor




[jira] Updated: (SOLR-1852) enablePositionIncrements="true" causes searches to fail when they are parse as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Attachment: SOLR-1852.patch

This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.





[jira] Resolved: (SOLR-1659) Get off deprecated Lucene API's to clear the way for a move to Lucene 3.0 +

2010-03-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-1659.
---

Resolution: Fixed

This is resolved on the new merged Lucene/Solr trunk (part of the first update).

> Get off deprecated Lucene API's to clear the way for a move to Lucene 3.0 +
> ---
>
> Key: SOLR-1659
> URL: https://issues.apache.org/jira/browse/SOLR-1659
> Project: Solr
>  Issue Type: Task
>Reporter: Mark Miller
> Attachments: SOLR-1659.patch, SOLR-1659.patch
>
>





[jira] Resolved: (SOLR-1820) Remove custom greek/russian charsets encoding

2010-03-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1820.
---

   Resolution: Fixed
Fix Version/s: 3.1
 Assignee: Robert Muir

This was resolved in revision 922964.

> Remove custom greek/russian charsets encoding
> -
>
> Key: SOLR-1820
> URL: https://issues.apache.org/jira/browse/SOLR-1820
> Project: Solr
>  Issue Type: Task
>  Components: Schema and Analysis
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: SOLR-1820.patch
>
>
> In Solr 1.4, we deprecated support for 'custom encodings embedded inside 
> unicode'.
> This is where the analyzer in Lucene itself did encoding conversions; it's 
> better to just let analyzers be analyzers, and leave encoding conversion to 
> Java.
> In order to move to Lucene 3.x, we need to remove this deprecated support, 
> and instead issue an error in the factories if you try to do this (instead 
> of a warning).




[jira] Resolved: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options

2010-03-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1706.
---

   Resolution: Fixed
Fix Version/s: 3.1
 Assignee: Mark Miller

This was resolved in revision 922957.

> wrong tokens output from WordDelimiterFilter depending upon options
> ---
>
> Key: SOLR-1706
> URL: https://issues.apache.org/jira/browse/SOLR-1706
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 1.4
>Reporter: Robert Muir
>Assignee: Mark Miller
> Fix For: 3.1
>
>
> Below you can see that when I have requested to only output numeric 
> concatenations (not words), some words are still sometimes output, ignoring 
> the options I have provided, and even then, in a very inconsistent way.
> {code}
>   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42", "AutoCoder" },
> new int[] { 18, 21 },
> new int[] { 20, 30 },
> new int[] { 1, 1 });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42", "AutoCoder", "56" },
> new int[] { 18, 21, 33 },
> new int[] { 20, 30, 35 },
> new int[] { 1, 1, 1 });
>   assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
> new String[] {  },
> new int[] {  },
> new int[] {  },
> new int[] {  });
>   assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
> new String[] { "42" },
> new int[] { 18 },
> new int[] { 20 },
> new int[] { 1 });
> {code}
> where assertWdf is 
> {code}
>   void assertWdf(String text, int generateWordParts, int generateNumberParts,
>   int catenateWords, int catenateNumbers, int catenateAll,
>   int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
>   int stemEnglishPossessive, CharArraySet protWords, String expected[],
>   int startOffsets[], int endOffsets[], String types[], int posIncs[])
>   throws IOException {
> TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
> WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
> generateNumberParts, catenateWords, catenateNumbers, catenateAll,
> splitOnCaseChange, preserveOriginal, splitOnNumerics,
> stemEnglishPossessive, protWords);
> assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
> posIncs);
>   }
> {code}




[jira] Resolved: (SOLR-1657) convert the rest of solr to use the new tokenstream API

2010-03-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1657.
---

   Resolution: Fixed
Fix Version/s: 3.1
 Assignee: Mark Miller

This was resolved in revision 922957.

> convert the rest of solr to use the new tokenstream API
> ---
>
> Key: SOLR-1657
> URL: https://issues.apache.org/jira/browse/SOLR-1657
> Project: Solr
>  Issue Type: Task
>Reporter: Robert Muir
>Assignee: Mark Miller
> Fix For: 3.1
>
> Attachments: SOLR-1657.patch, SOLR-1657.patch, SOLR-1657.patch, 
> SOLR-1657.patch, SOLR-1657_part2.patch, 
> SOLR-1657_synonyms_ugly_slightly_less_slow.patch, 
> SOLR-1657_synonyms_ugly_slow.patch
>
>
> org.apache.solr.analysis:
> -BufferedTokenStream-
>  -> -CommonGramsFilter-
>  -> -CommonGramsQueryFilter-
>  -> -RemoveDuplicatesTokenFilter-
> -CapitalizationFilterFactory-
> -HyphenatedWordsFilter-
> -LengthFilter (deprecated, remove)-
> SynonymFilter
> SynonymFilterFactory
> -WordDelimiterFilter-
> -org.apache.solr.handler:-
> -AnalysisRequestHandler-
> -AnalysisRequestHandlerBase-
> -org.apache.solr.handler.component:-
> -QueryElevationComponent-
> -SpellCheckComponent-
> -org.apache.solr.highlight:-
> -DefaultSolrHighlighter-
> -org.apache.solr.spelling:-
> -SpellingQueryConverter-




[jira] Resolved: (SOLR-1710) convert worddelimiterfilter to new tokenstream API

2010-03-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved SOLR-1710.
---

   Resolution: Fixed
Fix Version/s: 3.1
 Assignee: Mark Miller

This was resolved in revision 922957.

> convert worddelimiterfilter to new tokenstream API
> --
>
> Key: SOLR-1710
> URL: https://issues.apache.org/jira/browse/SOLR-1710
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Reporter: Robert Muir
>Assignee: Mark Miller
> Fix For: 3.1
>
> Attachments: SOLR-1710-readable.patch, SOLR-1710-readable.patch, 
> SOLR-1710.patch, SOLR-1710.patch
>
>
> This one was a doozy. Attached is a patch to convert it to the new 
> tokenstream API.
> Some of the logic was split into WordDelimiterIterator (which exposes a 
> BreakIterator-like API for iterating subwords).
> The filter is much more efficient now: no cloning.
> Before applying the patch, copy the existing WordDelimiterFilter to 
> OriginalWordDelimiterFilter.
> The patch includes a test case (TestWordDelimiterBWComp) which generates 
> random strings from various subword combinations.
> For each random string, it compares output against the existing 
> WordDelimiterFilter for all 512 combinations of boolean parameters.
> NOTE: due to bugs found (SOLR-1706), this currently only tests 256 of these 
> combinations. The bugs discovered in SOLR-1706 are fixed here.
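
For readers wondering where 512 comes from: assuming it corresponds to the nine on/off parameters that WordDelimiterFilter takes in the test helper quoted earlier in this archive, 2^9 = 512. The sketch below only shows how such an exhaustive sweep can be enumerated with a bitmask. The class and method names are hypothetical and this is not the code from TestWordDelimiterBWComp.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagCombinations {
    // The nine 0/1 parameters, in the order they are passed to the
    // WordDelimiterFilter constructor in the quoted test helper.
    static final String[] FLAG_NAMES = {
        "generateWordParts", "generateNumberParts", "catenateWords",
        "catenateNumbers", "catenateAll", "splitOnCaseChange",
        "preserveOriginal", "splitOnNumerics", "stemEnglishPossessive"
    };

    // Decode bit i of the mask into the value (0 or 1) of flag i.
    static int[] decode(int mask) {
        int[] flags = new int[FLAG_NAMES.length];
        for (int i = 0; i < flags.length; i++) {
            flags[i] = (mask >> i) & 1;
        }
        return flags;
    }

    // Enumerate every assignment of the nine flags: 2^9 = 512 in total.
    static List<int[]> allCombinations() {
        List<int[]> combos = new ArrayList<>();
        int total = 1 << FLAG_NAMES.length;
        for (int mask = 0; mask < total; mask++) {
            combos.add(decode(mask));
        }
        return combos;
    }

    public static void main(String[] args) {
        List<int[]> combos = allCombinations();
        System.out.println(combos.size() + " combinations");
        // A backwards-compatibility test would, for each combination, build the
        // old and new filters with these flags and compare their token output.
        // That comparison is omitted here.
    }
}
```

Pinning any one flag to a fixed value halves the sweep to 256, which is consistent with the NOTE above, although the description does not say which parameter was excluded.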




[jira] Created: (SOLR-1852) enablePositionIncrements="true" causes searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)
enablePositionIncrements="true" causes searches to fail when they are parsed as 
phrase queries
-

 Key: SOLR-1852
 URL: https://issues.apache.org/jira/browse/SOLR-1852
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Peter Wolanin


Symptom: when searching for a string like a domain name containing a '.', 
the Solr 1.4 analyzer tells me that I will get a match, but when I enter 
the search either in the client or directly in Solr, the search fails. Our 
default handler is dismax, but this also fails with the standard handler. 
So I'm wondering if this is a known issue, or am I missing something subtle 
in the analysis chain? This is Solr 1.4.0, which I built myself.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Setting enablePositionIncrements="false" in the stop filter and reindexing 
causes the searches to match.

According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, due either to the upgraded Lucene or to changes to the WordDelimiterFactor
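
As background on why position increments matter for phrase queries, here is a minimal, Lucene-free sketch. It assumes that an exact phrase only matches when its terms sit at consecutive positions, and it models one hypothetical way a mishandled gap could break that. The report does not establish that this is what Solr 1.4 actually does, and the class, method names, and token values are made up for illustration.

```java
public class PhrasePositions {
    // Turn per-token position increments into absolute positions.
    // A first token with increment 1 lands at position 0.
    static int[] absolutePositions(int[] increments) {
        int[] positions = new int[increments.length];
        int pos = -1;
        for (int i = 0; i < increments.length; i++) {
            pos += increments[i];
            positions[i] = pos;
        }
        return positions;
    }

    // True if the given term occurs at exactly the given position.
    static boolean occursAt(String[] terms, int[] positions, String term, int pos) {
        for (int i = 0; i < terms.length; i++) {
            if (terms[i].equals(term) && positions[i] == pos) {
                return true;
            }
        }
        return false;
    }

    // Exact phrase match (no slop): phrase term k must sit k positions after
    // the first phrase term. The phrase must contain at least one term.
    static boolean phraseMatches(String[] terms, int[] positions, String[] phrase) {
        for (int start = 0; start < terms.length; start++) {
            if (!terms[start].equals(phrase[0])) {
                continue;
            }
            boolean ok = true;
            for (int k = 1; k < phrase.length && ok; k++) {
                ok = occursAt(terms, positions, phrase[k], positions[start] + k);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "support for Identi.ca" after a stop filter removed "for" and a
        // word-delimiter step split "Identi.ca" into two subwords.
        String[] terms = {"support", "identi", "ca"};
        String[] phrase = {"identi", "ca"};

        int[] gapKept = absolutePositions(new int[] {1, 2, 1});    // gap applied once
        int[] gapRepeated = absolutePositions(new int[] {1, 2, 2}); // hypothetical mishandling
        int[] noGaps = absolutePositions(new int[] {1, 1, 1});      // increments disabled

        System.out.println(phraseMatches(terms, gapKept, phrase));
        System.out.println(phraseMatches(terms, gapRepeated, phrase));
        System.out.println(phraseMatches(terms, noGaps, phrase));
    }
}
```

In this toy model the phrase matches when the stopword gap is applied once or not at all, and fails only when the gap leaks between the two subwords. That is at least consistent with the observation that disabling position increments makes the searches match.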





[jira] Updated: (SOLR-1851) luceneAutoCommit no longer has any effect - remove it from config

2010-03-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-1851:
--

Attachment: SOLR-1851.patch

> luceneAutoCommit no longer has any effect - remove it from config
> -
>
> Key: SOLR-1851
> URL: https://issues.apache.org/jira/browse/SOLR-1851
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1851.patch
>
>
> Missed this on the upgrade to Lucene trunk: Lucene no longer has autocommit, 
> so it now has no effect in Solr. It needs to be removed, with a warning if 
> it's found.




[jira] Commented: (SOLR-1842) DataImportHandler ODBC keeps lock on the source table while optimisation is being run...

2010-03-27 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850554#action_12850554
 ] 

Noble Paul commented on SOLR-1842:
--

DIH cannot do anything specific for one type of driver. I'm not sure what the 
expected fix is.

> DataImportHandler ODBC keeps lock on the source table while optimisation is 
> being run...
> --
>
> Key: SOLR-1842
> URL: https://issues.apache.org/jira/browse/SOLR-1842
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.5
>Reporter: Marcin
>
> Hi guys,
> I don't know if it's really a bug, but I think this is a good place for it.
> The problem is with DataImportHandler and DB queries.
> For example:
> Suppose we have a big table that holds the docs to be indexed. We run a 
> query against it from DataImportHandler and the query locks the table, which 
> is obvious and desired behaviour from the SQL point of view. But while an 
> optimisation is running, it should not be allowed to issue the query, because 
> in that case the table stays locked until the optimisation process finishes, 
> which can take a while...
> As a workaround you can use a "select SQL_BUFFER_RESULT..." statement, which 
> moves everything into a temp table and releases all locks, but 
> DataImportHandler will still wait for the optimisation to finish. That means 
> you will at least be able to insert new docs into the main table.
> cheers




[jira] Created: (SOLR-1851) luceneAutoCommit no longer has any effect - remove it from config

2010-03-27 Thread Mark Miller (JIRA)
luceneAutoCommit no longer has any effect - remove it from config
-

 Key: SOLR-1851
 URL: https://issues.apache.org/jira/browse/SOLR-1851
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 1.5


Missed this on the upgrade to Lucene trunk: Lucene no longer has autocommit, 
so it now has no effect in Solr. It needs to be removed, with a warning if 
it's found.
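
A rough sketch of the "warn if it's found" behaviour, using only the JDK. It assumes the setting appears as a luceneAutoCommit element somewhere in the config XML. The class name, method name, and sample XML are hypothetical, and this is not the attached patch or Solr's actual config-loading code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeprecatedConfigCheck {
    private static final Logger LOG =
        Logger.getLogger(DeprecatedConfigCheck.class.getName());

    // Returns true, and logs a warning, if the config XML still contains an
    // element with the given name. Malformed XML is reported as an
    // IllegalArgumentException.
    static boolean warnIfPresent(String configXml, String elementName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse config XML", e);
        }
        boolean present = doc.getElementsByTagName(elementName).getLength() > 0;
        if (present) {
            LOG.warning("<" + elementName + "> no longer has any effect; "
                + "remove it from the config");
        }
        return present;
    }

    public static void main(String[] args) {
        // Sample layout is an assumption made for this example.
        String xml = "<config><indexDefaults>"
            + "<luceneAutoCommit>false</luceneAutoCommit>"
            + "</indexDefaults></config>";
        System.out.println(warnIfPresent(xml, "luceneAutoCommit"));
    }
}
```

Warning instead of failing keeps existing configs loadable while still telling users that the setting is now ignored.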




[jira] Commented: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2010-03-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850539#action_12850539
 ] 

Mark Miller commented on SOLR-896:
--

Yeah, I'll try and do that soon.

Chris was thinking about taking it and putting it up on Google Code, but I'm 
not sure where he is with that idea.

> Solr Query Parser Plugin for Mark Miller's Qsol Parser
> --
>
> Key: SOLR-896
> URL: https://issues.apache.org/jira/browse/SOLR-896
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Chris Harris
> Attachments: SOLR-896.patch, SOLR-896.patch
>
>
> An extremely basic plugin to get the Qsol query parser 
> (http://www.myhardshadow.com/qsol.php) working in Solr.




[jira] Commented: (SOLR-896) Solr Query Parser Plugin for Mark Miller's Qsol Parser

2010-03-27 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850508#action_12850508
 ] 

Otis Gospodnetic commented on SOLR-896:
---

This looks super straightforward.  The only problem is that Qsol itself seems 
to be gone.

Mark, any way you can put Qsol somewhere?  Maybe just attach the Jar to this 
issue?

> Solr Query Parser Plugin for Mark Miller's Qsol Parser
> --
>
> Key: SOLR-896
> URL: https://issues.apache.org/jira/browse/SOLR-896
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Chris Harris
> Attachments: SOLR-896.patch, SOLR-896.patch
>
>
> An extremely basic plugin to get the Qsol query parser 
> (http://www.myhardshadow.com/qsol.php) working in Solr.




Hudson build is back to normal : Solr-trunk #1101

2010-03-27 Thread Apache Hudson Server
See