[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-03-31 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852233#action_12852233
 ] 

Peter Wolanin commented on SOLR-1852:
-

I'm confused by that comment - I thought this code is already in 1.5/trunk and 
the issue is backporting to the 1.4 branch?

 enablePositionIncrements=true can cause searches to fail when they are 
 parsed as phrase queries
 -

 Key: SOLR-1852
 URL: https://issues.apache.org/jira/browse/SOLR-1852
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Robert Muir
 Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch


 Symptom: searching for a string like a domain name containing a '.', the Solr 
 1.4 analyzer tells me that I will get a match, but when I enter the search 
 either in the client or directly in Solr, the search fails. 
 test string:  Identi.ca
 queries that fail:  IdentiCa, Identi.ca, Identi-ca
 query that matches: Identi ca
 schema in use is:
 http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
 Screen shots:
 analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
 dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
 dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
 standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
 Whether or not the bug appears is determined by the surrounding text:
 "would be great to have support for Identi.ca on the follow block"
 fails to match "Identi.ca", but putting the content on its own or in another 
 sentence:
 "Support Identi.ca"
 the search matches.  Testing suggests the word "for" is the problem, and it 
 looks like the bug occurs when a stop word precedes a word that is split up 
 using the word delimiter filter.
 Setting enablePositionIncrements=false in the stop filter and reindexing 
 causes the searches to match.
 According to Mark Miller in #solr, this bug appears to be fixed already in 
 Solr trunk, either due to the upgraded lucene or changes to the 
 WordDelimiterFactory
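
To make the symptom above concrete, here is a minimal Java sketch of position-based phrase matching. It is not Solr or Lucene code: the class PhrasePositionSketch, its methods, and every token position in it are hypothetical, chosen only to illustrate how a phrase query can miss when the sub-token positions stored in the index are not consecutive, while the same terms still match as separate words. Whether this is the exact mismatch behind SOLR-1852 is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhrasePositionSketch {

    // Hypothetical index: position 2 is left empty for the removed stop word
    // "for", and the two sub-tokens of "Identi.ca" sit at adjacent positions.
    static Map<String, List<Integer>> consistentIndex() {
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("support", Arrays.asList(1));
        postings.put("identi", Arrays.asList(3));
        postings.put("ca", Arrays.asList(4));
        return postings;
    }

    // Hypothetical index: the stop-word gap has (wrongly) been applied between
    // the sub-tokens as well, so they are no longer adjacent.
    static Map<String, List<Integer>> skewedIndex() {
        Map<String, List<Integer>> postings = new HashMap<>();
        postings.put("support", Arrays.asList(1));
        postings.put("identi", Arrays.asList(3));
        postings.put("ca", Arrays.asList(5));
        return postings;
    }

    // True if the phrase terms occur at consecutive positions somewhere.
    static boolean phraseMatches(Map<String, List<Integer>> postings, String... phrase) {
        if (phrase.length == 0) {
            return false;
        }
        List<Integer> starts = postings.get(phrase[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean allAdjacent = true;
            for (int i = 1; i < phrase.length; i++) {
                List<Integer> positions = postings.get(phrase[i]);
                if (positions == null || !positions.contains(start + i)) {
                    allAdjacent = false;
                    break;
                }
            }
            if (allAdjacent) {
                return true;
            }
        }
        return false;
    }

    // True if every term occurs anywhere, ignoring positions.
    static boolean allTermsPresent(Map<String, List<Integer>> postings, String... terms) {
        for (String term : terms) {
            if (!postings.containsKey(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(phraseMatches(consistentIndex(), "identi", "ca"));
        System.out.println(phraseMatches(skewedIndex(), "identi", "ca"));
        System.out.println(allTermsPresent(skewedIndex(), "identi", "ca"));
    }
}
```

In this toy model the phrase lookup corresponds to the failing query Identi.ca, and the position-free lookup corresponds to the matching query Identi ca.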

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)
enablePositionIncrements=true causes searches to fail when they are parse as 
phrase queries
-

 Key: SOLR-1852
 URL: https://issues.apache.org/jira/browse/SOLR-1852
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
Reporter: Peter Wolanin


Symptom: searching for a string like a domain
name containing a '.', the Solr 1.4 analyzer tells me that I will get
a match, but when I enter the search either in the client or directly
in Solr, the search fails.  Our default handler is dismax, but this
also fails with the standard handler.  So I'm wondering if this is a
known issue, or am I missing something subtle in the analysis chain?
Solr is 1.4.0 that I built.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Setting enablePositionIncrements=false in the stop filter and reindexing 
causes the searches to match.

According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor





[jira] Updated: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Attachment: SOLR-1852.patch

This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.




[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608
 ] 

Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:41 PM:
---

This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.

In terms of reproducing the bug's symptoms, it looks as though Identi.ca is 
treated as a phrase query, as if I had quoted it like "Identi ca".  That phrase 
search also fails.  I had expected that Identi.ca would be the same as Identi 
ca (i.e. 2 separate tokens, not a phrase).

  was (Author: pwolanin):
This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.
  



[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Description: 
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

would be great to have support for Identi.ca on the follow block

fails to match Identi.ca, but putting the content on its own or in another 
sentence:

Support Identi.ca

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word precedes a word that is split up 
using the whitespace delimiter.

Setting enablePositionIncrements=false in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor


  was:
Symptom: searching for a string like a domain
name containing a '.', the Solr 1.4 analyzer tells me that I will get
a match, but when I enter the search either in the client or directly
in Solr, the search fails.  Our default handler is dismax, but this
also fails with the standard handler.  So I'm wondering if this is a
known issue, or am I missing something subtle in the analysis chain?
Solr is 1.4.0 that I built.

test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Setting enablePositionIncrements=false in the stop filter and reindexing 
causes the searches to match.

According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor


Summary: enablePositionIncrements=true can cause searches to fail 
when they are parsed as phrase queries  (was: enablePositionIncrements=true 
causes searches to fail when they are parse as phrase queries)


[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1852:


Description: 
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

would be great to have support for Identi.ca on the follow block

fails to match Identi.ca, but putting the content on its own or in another 
sentence:

Support Identi.ca

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word precedes a word that is split up 
using the word delimiter filter.

Setting enablePositionIncrements=false in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory


  was:
Symptom: searching for a string like a domain name containing a '.', the Solr 
1.4 analyzer tells me that I will get a match, but when I enter the search 
either in the client or directly in Solr, the search fails. 
test string:  Identi.ca

queries that fail:  IdentiCa, Identi.ca, Identi-ca

query that matches: Identi ca


schema in use is:
http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1

Screen shots:

analysis:  http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png

Whether or not the bug appears is determined by the surrounding text:

would be great to have support for Identi.ca on the follow block

fails to match Identi.ca, but putting the content on its own or in another 
sentence:

Support Identi.ca

the search matches.  Testing suggests the word "for" is the problem, and it 
looks like the bug occurs when a stop word precedes a word that is split up 
using the whitespace delimiter.

Setting enablePositionIncrements=false in the stop filter and reindexing 
causes the searches to match.


According to Mark Miller in #solr, this bug appears to be fixed already in Solr 
trunk, either due to the upgraded lucene or changes to the WordDelimiterFactor




[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608
 ] 

Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:52 PM:
---

This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated WordDelimiterFilter if I reindex the bug seems to be fixed.

In terms of reproducing the bug's symptoms, it looks as though Identi.ca is 
treated as a phrase query, as if I had quoted it like "Identi ca".  That phrase 
search also fails.  I had expected that Identi.ca would be the same as Identi 
ca (i.e. 2 separate tokens, not a phrase).

  was (Author: pwolanin):
This patch was created by Mark Miller - it's a back port of Solr trunk code 
plus a tweak to let 1.4 compile

With this updated Whitespace Delimiter if I reindex the bug seems to be fixed.

In terms of reproducing the bug's symptoms, it looks as though Identi.ca is 
treated as a phrase query, as if I had quoted it like "Identi ca".  That phrase 
search also fails.  I had expected that Identi.ca would be the same as Identi 
ca (i.e. 2 separate tokens, not a phrase).
  



[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries

2010-03-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850610#action_12850610
 ] 

Peter Wolanin commented on SOLR-1852:
-

The changes in the patch originate in SOLR-1706 and SOLR-1657; however, I don't 
think it's actually the same bug that SOLR-1706 intended to fix, since in the 
admin analyzer interface the generated tokens look correct.




[jira] Commented: (SOLR-1553) extended dismax query parser

2010-01-26 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805303#action_12805303
 ] 

Peter Wolanin commented on SOLR-1553:
-

Was some commented-out debug code left in the committed parser?

{code}
protected void addClause(List clauses, int conj, int mods, Query q) {
  //System.out.println("addClause:clauses="+clauses+" conj="+conj+" mods="+mods+" q="+q);
  super.addClause(clauses, conj, mods, q);
}
{code}

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 1.5

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch


 An improved user-facing query parser based on dismax




[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-10-30 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771932#action_12771932
 ] 

Peter Wolanin commented on SOLR-874:


Does anyone have an approach for this bug so we can get it fixed before 1.4 is done?

 Dismax parser exceptions on trailing OPERATOR
 -

 Key: SOLR-874
 URL: https://issues.apache.org/jira/browse/SOLR-874
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher
 Attachments: SOLR-874.patch


 Dismax is supposed to be immune to parse exceptions, but alas it's not:
 http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
 kaboom!
 Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered "<EOF>" at line 1, column 8.
 Was expecting one of:
     <NOT> ...
     "+" ...
     "-" ...
     "(" ...
     "*" ...
     <QUOTED> ...
     <TERM> ...
     <PREFIXTERM> ...
     <WILDTERM> ...
     "[" ...
     "{" ...
     <NUMBER> ...
     <TERM> ...
     "*" ...
 
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
   at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
   at org.apache.solr.search.QParser.getQuery(QParser.java:88)
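
One possible way to avoid the exception above is to drop dangling operators from the user's input before it reaches the Lucene query parser. The following is only a sketch under that assumption: the class TrailingOperatorSketch and the method stripTrailingOperators are hypothetical names, they are not part of Solr, and this is not necessarily how the issue was resolved. Only the keyword operators AND, OR and NOT are handled.

```java
import java.util.regex.Pattern;

public class TrailingOperatorSketch {

    // One or more AND/OR/NOT keywords at the very end of the input, each
    // preceded by whitespace or by the start of the string.
    private static final Pattern TRAILING_OPERATORS =
            Pattern.compile("(?:(?:^|\\s+)(?:AND|OR|NOT))+\\s*$");

    // Returns the query with any dangling trailing operators removed.
    static String stripTrailingOperators(String userQuery) {
        return TRAILING_OPERATORS.matcher(userQuery).replaceFirst("").trim();
    }

    public static void main(String[] args) {
        // "ipod+AND" in the URL above decodes to "ipod AND".
        System.out.println(stripTrailingOperators("ipod AND"));
    }
}
```

Operators in the middle of a query, and words that merely start with an operator keyword, are left untouched because the pattern is anchored at the end of the input and requires the keyword to be followed only by whitespace.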




[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-08 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752468#action_12752468
 ] 

Peter Wolanin commented on SOLR-1400:
-


These lines seem to vary as to whether there is whitespace between char and the []:

{code}
@@ -29,29 +30,48 @@
 public class TestTrimFilter extends BaseTokenTestCase {
   
   public void testTrim() throws Exception {
+char[] a = " a ".toCharArray();
+char [] b = "b   ".toCharArray();
+char [] ccc = "cCc".toCharArray();
+char[] whitespace = "   ".toCharArray();
+char[] empty = "".toCharArray();
{code}

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: SOLR-1400.patch, trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using the {code}<filter class="solr.TrimFilterFactory" />{code}
 causes a Java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.
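
The expected behaviour can be sketched with a small bounds-checked trim over a char buffer. This is not the TrimFilter source and not the attached patch; the class SafeTrimSketch and its methods are hypothetical, and the idea that the -1 index comes from reading the last character of a zero-length buffer is an assumption based only on the exception message above.

```java
public class SafeTrimSketch {

    // Returns {start, end} (end exclusive) of the non-whitespace core of
    // buf[0..len). Both loops compare the indexes before touching the buffer,
    // so an empty or whitespace-only input never produces an index of -1.
    static int[] trimBounds(char[] buf, int len) {
        int start = 0;
        int end = len;
        while (start < end && Character.isWhitespace(buf[start])) {
            start++;
        }
        while (end > start && Character.isWhitespace(buf[end - 1])) {
            end--;
        }
        return new int[] {start, end};
    }

    // Convenience wrapper that trims a String through trimBounds.
    static String trim(String s) {
        char[] buf = s.toCharArray();
        int[] bounds = trimBounds(buf, buf.length);
        return new String(buf, bounds[0], bounds[1] - bounds[0]);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim(" a ") + "]");
        System.out.println("[" + trim("   ") + "]");
        System.out.println("[" + trim("") + "]");
    }
}
```

For empty and whitespace-only input the bounds collapse to an empty range, so the result is simply a zero-length token rather than an exception.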




[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-07 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752245#action_12752245
 ] 

Peter Wolanin commented on SOLR-1400:
-

The patch seems to fix the bug for me, but there seems to be some code style 
inconsistency in the test code.

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
 Fix For: 1.4

 Attachments: SOLR-1400.patch, trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using the {code}filter class=solr.TrimFilterFactory /{code}
 Causes a java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at 
 org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at 
 org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.
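
The stack trace above is the kind of failure that shows up when trimming code scans a character buffer from both ends without guarding the empty case. The following is a minimal sketch of a guarded version. It is not the TrimFilter source or the attached patch; the class and method names (TrimSketch, trimBounds, trim) are made up for illustration.

```java
class TrimSketch {
    /**
     * Returns {start, end} (end exclusive) of the non-whitespace region of
     * buf[0..len). For an empty or whitespace-only buffer, start == end.
     */
    static int[] trimBounds(char[] buf, int len) {
        int start = 0;
        int end = len;
        // Both loops check the bounds first, so buf[-1] is never touched.
        while (start < end && Character.isWhitespace(buf[start])) {
            start++;
        }
        while (end > start && Character.isWhitespace(buf[end - 1])) {
            end--;
        }
        return new int[] { start, end };
    }

    /** Convenience wrapper that applies trimBounds to a String. */
    static String trim(String s) {
        char[] buf = s.toCharArray();
        int[] bounds = trimBounds(buf, buf.length);
        return new String(buf, bounds[0], bounds[1] - bounds[0]);
    }

    public static void main(String[] args) {
        System.out.println("[" + trim("") + "]");
        System.out.println("[" + trim("   ") + "]");
        System.out.println("[" + trim("  a b ") + "]");
    }
}
```

The point of the sketch is only that the bounds checks come before the array access, so an empty or whitespace-only value yields an empty result instead of an exception.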




[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.

2009-09-03 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751038#action_12751038
 ] 

Peter Wolanin commented on SOLR-756:


We are regularly hitting this wall, and users are very frustrated by not being 
able to use wildcards because we wanted the other advantages of the dismax 
parser.

Any chance to get some of these changes in 1.4?

 Make DisjunctionMaxQueryParser generally useful by supporting all query types.
 --

 Key: SOLR-756
 URL: https://issues.apache.org/jira/browse/SOLR-756
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.3
Reporter: David Smiley
 Fix For: 1.5

 Attachments: SolrPluginUtilsDisMax.patch


 This is an enhancement to the DisjunctionMaxQueryParser to work on all the 
 query variants such as wildcard, prefix, and fuzzy queries, and to support 
 working in AND scenarios that are not processed by the min-should-match 
 DisMax QParser. This was not in Solr already because DisMax was only used for 
 a very limited syntax that didn't use those features. In my opinion, this 
 makes a more suitable base parser for general use because unlike the 
 Lucene/Solr parser, this one supports multiple default fields, whereas other 
 ones (Yonik's {!prefix} one, for example) can't do dismax. The notion of 
 a single default field is antiquated and a technical under-the-hood detail of 
 Lucene that I think Solr should shield the user from by on-the-fly using a 
 DisMax when multiple fields are used. 
 (patch to be attached soon)




[jira] Updated: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter

2009-09-01 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1400:


Attachment: trim-example.xml


Post the attached document using the trunk sample schema.xml to reproduce.

 Document with empty or white-space only string causes exception with 
 TrimFilter
 ---

 Key: SOLR-1400
 URL: https://issues.apache.org/jira/browse/SOLR-1400
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
 Attachments: trim-example.xml


 Observed with Solr trunk.  Posting any empty or whitespace-only string to a 
 field using the {code}<filter class="solr.TrimFilterFactory" />{code}
 causes a Java exception:
 {code}
 Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
   at 
 org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
   at 
 org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
   at 
 org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
   at 
 org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
   at 
 org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
   at 
 org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
   at 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
   at 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
   at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
   at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 {code}
 Trim of an empty or WS-only string should not fail.




[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-08-03 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1274:


Attachment: SOLR-1274.patch

Here's a patch that's nearly there, but somehow I'm missing something in how 
Java behaves.  The param is getting picked up, but this line never evaluates to 
true, even when the param is parsed correctly:

{code}
  if (extractFormat == "text") {
{code}


If I set it to
{code}
  if (true) {
{code}

I get the desired text-only output.
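
For reference: in Java, == applied to two String values compares object references, not contents, so a value parsed from a request at runtime will generally not be == to a literal even when the characters match. Below is a small sketch of the difference, assuming extractFormat is a String. The class and helper names are hypothetical and are not taken from the patch.

```java
class ExtractFormatCheck {
    /** Hypothetical helper: true when the requested format is "text". */
    static boolean isTextFormat(String extractFormat) {
        // Literal-first equals() compares contents and is also null-safe.
        return "text".equals(extractFormat);
    }

    public static void main(String[] args) {
        // new String(...) stands in for a value built at runtime,
        // for example one read from request parameters.
        String fromRequest = new String("text");

        System.out.println(fromRequest == "text");     // false: different objects
        System.out.println(isTextFormat(fromRequest)); // true: same characters
        System.out.println(isTextFormat(null));        // false, and no exception
    }
}
```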

 Provide multiple output formats in extract-only mode for tika handler
 -

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1274.patch


 The proposed feature is to accept a URL parameter when using extract-only 
 mode to specify an output format.  This parameter might just overload the 
 existing ext.extract.only so that one can optionally specify a format, e.g. 
 false|true|xml|text  where true and xml give the same response (i.e. xml 
 remains the default)
 I had been assuming that I could choose among possible tika output
 formats when using the extracting request handler in extract-only mode
 as if from the CLI with the tika jar:
-x or --xml    Output XHTML content (default)
-h or --html   Output HTML content
-t or --text   Output plain text content
-m or --metadata   Output only metadata
 However, looking at the docs and source, it seems that only the xml
 option is available (hard-coded) in ExtractingDocumentLoader.java
 {code}
 serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
 {code}
 Providing at least a plain-text response seems to work if you change the 
 serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).




[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-08-03 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1274:


Attachment: SOLR-1274.patch

Well, indeed - something like that works better.


 Provide multiple output formats in extract-only mode for tika handler
 -

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1274.patch, SOLR-1274.patch


 The proposed feature is to accept a URL parameter when using extract-only 
 mode to specify an output format.  This parameter might just overload the 
 existing ext.extract.only so that one can optionally specify a format, e.g. 
 false|true|xml|text  where true and xml give the same response (i.e. xml 
 remains the default)
 I had been assuming that I could choose among possible tika output
 formats when using the extracting request handler in extract-only mode
 as if from the CLI with the tika jar:
-x or --xml    Output XHTML content (default)
-h or --html   Output HTML content
-t or --text   Output plain text content
-m or --metadata   Output only metadata
 However, looking at the docs and source, it seems that only the xml
 option is available (hard-coded) in ExtractingDocumentLoader.java
 {code}
 serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
 {code}
 Providing at least a plain-text response seems to work if you change the 
 serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).




[jira] Commented: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-07-15 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731437#action_12731437
 ] 

Peter Wolanin commented on SOLR-1274:
-

A minimal version of this would be pretty trivial as far as features go, and 
I'd thought Yonik was indicating on the e-mail list that it would be a 
reasonable follow on to his last patch in the linked issue.

 Provide multiple output formats in extract-only mode for tika handler
 -

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4


 The proposed feature is to accept a URL parameter when using extract-only 
 mode to specify an output format.  This parameter might just overload the 
 existing ext.extract.only so that one can optionally specify a format, e.g. 
 false|true|xml|text  where true and xml give the same response (i.e. xml 
 remains the default)
 I had been assuming that I could choose among possible tika output
 formats when using the extracting request handler in extract-only mode
 as if from the CLI with the tika jar:
-x or --xml    Output XHTML content (default)
-h or --html   Output HTML content
-t or --text   Output plain text content
-m or --metadata   Output only metadata
 However, looking at the docs and source, it seems that only the xml
 option is available (hard-coded) in ExtractingDocumentLoader.java
 {code}
 serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
 {code}
 Providing at least a plain-text response seems to work if you change the 
 serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).




[jira] Updated: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-07-14 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-874:
---

Attachment: SOLR-874.patch

Here's a simple patch that escapes with a \.  It prevents the exception; 
however, it fails to match and/or/not (after removing those from the 
stopwords file), so it's clearly not quite right.



 Dismax parser exceptions on trailing OPERATOR
 -

 Key: SOLR-874
 URL: https://issues.apache.org/jira/browse/SOLR-874
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher
 Attachments: SOLR-874.patch


 Dismax is supposed to be immune to parse exceptions, but alas it's not:
 http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
 kaboom!
 Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod 
 AND': Encountered EOF at line 1, column 8.
 Was expecting one of:
 NOT ...
 + ...
 - ...
 ( ...
 * ...
 QUOTED ...
 TERM ...
 PREFIXTERM ...
 WILDTERM ...
 [ ...
 { ...
 NUMBER ...
 TERM ...
 * ...
 
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
   at 
 org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
   at org.apache.solr.search.QParser.getQuery(QParser.java:88)




[jira] Created: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler

2009-07-13 Thread Peter Wolanin (JIRA)
Provide multiple output formats in extract-only mode for tika handler
-

 Key: SOLR-1274
 URL: https://issues.apache.org/jira/browse/SOLR-1274
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4


The proposed feature is to accept a URL parameter when using extract-only mode 
to specify an output format.  This parameter might just overload the existing 
ext.extract.only so that one can optionally specify a format, e.g. 
false|true|xml|text  where true and xml give the same response (i.e. xml 
remains the default)

I had been assuming that I could choose among possible tika output
formats when using the extracting request handler in extract-only mode
as if from the CLI with the tika jar:

   -x or --xml    Output XHTML content (default)
   -h or --html   Output HTML content
   -t or --text   Output plain text content
   -m or --metadata   Output only metadata

However, looking at the docs and source, it seems that only the xml
option is available (hard-coded) in ExtractingDocumentLoader.java
{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the 
serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).
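
As a rough illustration of the overloaded parameter proposed above (false|true|xml|text, with true and xml meaning the same thing), the value could be mapped to a small enum before a serializer is chosen. This is only a sketch: the class name, the enum, and the choice to treat unknown values as "off" are assumptions and are not part of any patch.

```java
import java.util.Locale;

class ExtractOnlyParam {
    /** Hypothetical modes for the overloaded extract-only parameter. */
    enum Mode { OFF, XML, TEXT }

    /** Maps the raw parameter value to a mode; xml remains the default format. */
    static Mode parse(String raw) {
        if (raw == null) {
            return Mode.OFF;
        }
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "true":
            case "xml":
                return Mode.XML;   // "true" and "xml" give the same response
            case "text":
                return Mode.TEXT;
            default:
                return Mode.OFF;   // assumption: "false" and unknown values disable it
        }
    }

    public static void main(String[] args) {
        for (String value : new String[] { "false", "true", "xml", "text" }) {
            System.out.println(value + " -> " + parse(value));
        }
    }
}
```

With a mapping like this, the handler would pick the XML serializer for Mode.XML and a text serializer for Mode.TEXT, keeping existing true/false requests working unchanged.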








[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-07-13 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730492#action_12730492
 ] 

Peter Wolanin commented on SOLR-874:


I get the same sort of exception with a *leading* operator and the dismax 
handler.


Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR 
vti OR aut OR author OR dll': Encountered  OR OR  at line
1, column 0.
Was expecting one of:
   NOT ...
   + ...
   - ...
   ( ...
   * ...
   QUOTED ...
   TERM ...
   PREFIXTERM ...
   WILDTERM ...
   [ ...
   { ...
   NUMBER ...
   TERM ...
   * ...

   at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

 Dismax parser exceptions on trailing OPERATOR
 -

 Key: SOLR-874
 URL: https://issues.apache.org/jira/browse/SOLR-874
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher

 Dismax is supposed to be immune to parse exceptions, but alas it's not:
 http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
 kaboom!
 Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod 
 AND': Encountered EOF at line 1, column 8.
 Was expecting one of:
 NOT ...
 + ...
 - ...
 ( ...
 * ...
 QUOTED ...
 TERM ...
 PREFIXTERM ...
 WILDTERM ...
 [ ...
 { ...
 NUMBER ...
 TERM ...
 * ...
 
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
   at 
 org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
   at org.apache.solr.search.QParser.getQuery(QParser.java:88)




[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR

2009-07-13 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730513#action_12730513
 ] 

Peter Wolanin commented on SOLR-874:


Possibly a fix could be rolled into this existing method in 
SolrPluginUtils.java?

{code}
  /**
   * Strips operators that are used illegally, otherwise returns its
   * input.  Some examples of illegal user queries are: "chocolate +-
   * chip", "chocolate - - chip", and "chocolate chip -".
   */
  public static CharSequence stripIllegalOperators(CharSequence s) {
    String temp = CONSECUTIVE_OP_PATTERN.matcher( s ).replaceAll( " " );
    return DANGLING_OP_PATTERN.matcher( temp ).replaceAll( "" );
  }
{code}

This seems only to be called from:

org/apache/solr/search/DisMaxQParser.java:156:  userQuery = 
SolrPluginUtils.stripIllegalOperators(userQuery).toString();
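
To make the idea concrete, here is a sketch of how leading and trailing boolean operators might be stripped in the same style as the method above. The two patterns and the class and method names are hypothetical; they are not the CONSECUTIVE_OP_PATTERN or DANGLING_OP_PATTERN defined in SolrPluginUtils, and this is not the attached patch.

```java
import java.util.regex.Pattern;

class OperatorStripSketch {
    // Hypothetical: one or more AND/OR tokens at the start of the query.
    // A leading NOT is left alone because the parser accepts it.
    private static final Pattern LEADING_OP =
        Pattern.compile("^\\s*(?:(?:AND|OR)\\s+)+");

    // Hypothetical: one or more AND/OR/NOT tokens at the end of the query.
    private static final Pattern TRAILING_OP =
        Pattern.compile("(?:\\s+(?:AND|OR|NOT))+\\s*$");

    /** Removes operators that have no operand before or after them. */
    static String stripDanglingBooleanOperators(String userQuery) {
        String temp = LEADING_OP.matcher(userQuery).replaceAll("");
        return TRAILING_OP.matcher(temp).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDanglingBooleanOperators("ipod AND"));      // ipod
        System.out.println(stripDanglingBooleanOperators("OR vti OR bin")); // vti OR bin
    }
}
```

Because the patterns require whitespace between the operator and its neighbour, words that merely start or end with operator letters (for example ANDROID) are not affected. A query consisting of nothing but a single operator is not handled by this sketch.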

 Dismax parser exceptions on trailing OPERATOR
 -

 Key: SOLR-874
 URL: https://issues.apache.org/jira/browse/SOLR-874
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher

 Dismax is supposed to be immune to parse exceptions, but alas it's not:
 http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND
 kaboom!
 Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod 
 AND': Encountered EOF at line 1, column 8.
 Was expecting one of:
 NOT ...
 + ...
 - ...
 ( ...
 * ...
 QUOTED ...
 TERM ...
 PREFIXTERM ...
 WILDTERM ...
 [ ...
 { ...
 NUMBER ...
 TERM ...
 * ...
 
   at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
   at 
 org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
   at org.apache.solr.search.QParser.getQuery(QParser.java:88)




[jira] Commented: (SOLR-1200) NullPointerException when unloading an absent core

2009-06-04 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716252#action_12716252
 ] 

Peter Wolanin commented on SOLR-1200:
-

Do we need to open another issue (maybe for 1.5) - I'd think the expected 
behavior would be to throw a specific exception anywhere in core admin that a 
core is not found, and then catch it and return a 404?

At the moment, however, you can request status for a non-existent core, etc, 
and get a 200 with some data, so this patch makes the behavior consistent, at 
least.
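
A sketch of the behavior suggested in the first paragraph, using a plain map in place of the real core container: a lookup helper throws a dedicated exception when a core is missing, and the handler turns that into a 404. Everything here (CoreAdminSketch, CoreNotFoundException, requireCore, the integer status codes) is hypothetical and only illustrates the idea; it is not how CoreAdminHandler is written.

```java
import java.util.HashMap;
import java.util.Map;

class CoreAdminSketch {
    /** Hypothetical exception raised wherever a named core cannot be found. */
    static class CoreNotFoundException extends RuntimeException {
        CoreNotFoundException(String name) {
            super("No such core: " + name);
        }
    }

    // Stand-in for the real core registry.
    private final Map<String, Object> cores = new HashMap<>();

    void register(String name) {
        cores.put(name, new Object());
    }

    /** Single lookup point, so every admin action fails the same way. */
    Object requireCore(String name) {
        Object core = cores.get(name);
        if (core == null) {
            throw new CoreNotFoundException(name);
        }
        return core;
    }

    /** Returns an HTTP-style status code instead of letting an NPE escape. */
    int handleUnload(String name) {
        try {
            requireCore(name);
            cores.remove(name);
            return 200;
        } catch (CoreNotFoundException e) {
            return 404;
        }
    }

    public static void main(String[] args) {
        CoreAdminSketch admin = new CoreAdminSketch();
        admin.register("core0");
        System.out.println(admin.handleUnload("core0")); // 200
        System.out.println(admin.handleUnload("core0")); // 404: already unloaded
    }
}
```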

 NullPointerException when unloading an absent core
 --

 Key: SOLR-1200
 URL: https://issues.apache.org/jira/browse/SOLR-1200
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: java version 1.6.0_07
Reporter: Peter Wolanin
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1200.patch, SOLR-1200.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 When I try to unload a core that does not exist (e.g. it has already been 
 unloaded), Solr throws a NullPointerException
 java.lang.NullPointerException
at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319)
at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   ...




[jira] Created: (SOLR-1200) NullPointerException when unloading an absent core

2009-06-03 Thread Peter Wolanin (JIRA)
NullPointerException when unloading an absent core
--

 Key: SOLR-1200
 URL: https://issues.apache.org/jira/browse/SOLR-1200
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: java version 1.6.0_07
Reporter: Peter Wolanin
Priority: Minor



When I try to unload a core that does not exist (e.g. it has already been 
unloaded), Solr throws a NullPointerException

java.lang.NullPointerException
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319)
   at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  ...




[jira] Updated: (SOLR-1200) NullPointerException when unloading an absent core

2009-06-03 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1200:


Attachment: SOLR-1200.patch

Here's a simple patch that follows the pattern in the other core admin methods.

 NullPointerException when unloading an absent core
 --

 Key: SOLR-1200
 URL: https://issues.apache.org/jira/browse/SOLR-1200
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: java version 1.6.0_07
Reporter: Peter Wolanin
Priority: Minor
 Attachments: SOLR-1200.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 When I try to unload a core that does not exist (e.g. it has already been 
 unloaded), Solr throws a NullPointerException
 java.lang.NullPointerException
at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319)
at 
 org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125)
at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at 
 org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301)
at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174)
at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   ...




[jira] Updated: (SOLR-1183) Example script not update for new analysis path from SOLR-1099

2009-05-24 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1183:


Attachment: SOLR-1183.patch

 Example script not update for new analysis path from SOLR-1099
 --

 Key: SOLR-1183
 URL: https://issues.apache.org/jira/browse/SOLR-1183
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1183.patch


 The example script example/exampleAnalysis/post.sh attempts to post to the 
 path http://localhost:8983/solr/analysis
  however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
 default as of r767412
 A simple fix is to change to http://localhost:8983/solr/analysis/document




[jira] Updated: (SOLR-1183) Example script not updated for new analysis path from SOLR-1099

2009-05-24 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1183:


Description: 

The example script example/exampleAnalysis/post.sh attempts to post to the path 
http://localhost:8983/solr/analysis
 however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
default as of r767412

A simple fix is to change to http://localhost:8983/solr/analysis/document

  was:


The example script example/exampleAnalysis/post.sh attempts to post to the path 
http://localhost:8983/solr/analysis
 however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
default as of r767412

A simple fix is to change to http://localhost:8983/solr/analysis/document

Summary: Example script not updated for new analysis path from 
SOLR-1099  (was: Example script not update for new analysis path from SOLR-1099)

 Example script not updated for new analysis path from SOLR-1099
 ---

 Key: SOLR-1183
 URL: https://issues.apache.org/jira/browse/SOLR-1183
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1183.patch


 The example script example/exampleAnalysis/post.sh attempts to post to the 
 path http://localhost:8983/solr/analysis
  however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by 
 default as of r767412
 A simple fix is to change to http://localhost:8983/solr/analysis/document




[jira] Commented: (SOLR-1167) Support module xml config files using XInclude

2009-05-17 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710200#action_12710200
 ] 

Peter Wolanin commented on SOLR-1167:
-

I think you posted a sample snippet for solrconfig to the list - can you repost 
it here and possibly include in the patch a change to the sample schema or 
solrconfig that would demonstrate this feature?

 Support module xml config files using XInclude
 --

 Key: SOLR-1167
 URL: https://issues.apache.org/jira/browse/SOLR-1167
 Project: Solr
  Issue Type: New Feature
Reporter: Bryan Talbot
Priority: Minor
 Attachments: SOLR-1167.patch


 Current configuration files (schema and solrconfig) are monolithic, which can 
 make maintenance and reuse more difficult than it needs to be.  The XML 
 standards include a feature to include content from external files.  This is 
 described at http://www.w3.org/TR/xinclude/
 This feature is to add support for XInclude features for XML configuration 
 files.
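
For readers unfamiliar with XInclude, the sketch below shows the standard JAXP switches that make a parser expand xi:include elements. It does not show how Solr loads its configuration or what the attached patch changes; the file names and element names are invented for the demonstration, and the class and method names are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeSketch {
    /** Parses a file with XInclude processing switched on. */
    static Document parseWithXInclude(File file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);  // needed so xi:include is recognised
        factory.setXIncludeAware(true);   // expand includes while parsing
        return factory.newDocumentBuilder().parse(file);
    }

    /** Writes a two-file config into a temp directory and counts included handlers. */
    static int countRequestHandlers() {
        try {
            Path dir = Files.createTempDirectory("xinclude-sketch");
            // Made-up fragment that the main file pulls in.
            Files.write(dir.resolve("handlers.xml"),
                "<requestHandler name=\"/select\"/>".getBytes(StandardCharsets.UTF_8));
            Path main = dir.resolve("solrconfig.xml");
            Files.write(main,
                ("<config xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                    + "<xi:include href=\"handlers.xml\"/>"
                    + "</config>").getBytes(StandardCharsets.UTF_8));
            Document doc = parseWithXInclude(main.toFile());
            return doc.getElementsByTagName("requestHandler").getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRequestHandlers());
    }
}
```

The relative href is resolved against the location of the including file, which is what lets a large configuration be split into smaller files kept side by side.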




[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml

2009-05-08 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1151:


Description: In this issue:  http://issues.apache.org/jira/browse/SOLR-538  
a maxLength property was added to the copyField directive.  However, this is 
not documented in the example schema to make the feature known to users.  (was: 
In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
property was added to the copyField directive.  However, this is not documented 
in the example schema to make the feature known to users.)
Summary: Document the new CopyField maxChars property in the example 
schema.xml  (was: Document the new CopyField maxLength property in the example 
schema.xml)

 Document the new CopyField maxChars property in the example schema.xml
 --

 Key: SOLR-1151
 URL: https://issues.apache.org/jira/browse/SOLR-1151
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1151.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
 property was added to the copyField directive.  However, this is not 
 documented in the example schema to make the feature known to users.




[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml

2009-05-08 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1151:


Attachment: SOLR-1151.patch

revised patch to use maxChars - still not sure if this is a useful example, but 
at least adds some documentation of this property.


 Document the new CopyField maxChars property in the example schema.xml
 --

 Key: SOLR-1151
 URL: https://issues.apache.org/jira/browse/SOLR-1151
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1151.patch, SOLR-1151.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
 property was added to the copyField directive.  However, this is not 
 documented in the example schema to make the feature known to users.




[jira] Created: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml

2009-05-07 Thread Peter Wolanin (JIRA)
Document the new CopyField maxLength property in the example schema.xml
---

 Key: SOLR-1151
 URL: https://issues.apache.org/jira/browse/SOLR-1151
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4



In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
property was added to the copyField directive.  However, this is not documented 
in the example schema to make the feature known to users.




[jira] Updated: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml

2009-05-07 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1151:


Attachment: SOLR-1151.patch

1st pass


 Document the new CopyField maxLength property in the example schema.xml
 ---

 Key: SOLR-1151
 URL: https://issues.apache.org/jira/browse/SOLR-1151
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1151.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
 property was added to the copyField directive.  However, this is not 
 documented in the example schema to make the feature known to users.




[jira] Commented: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml

2009-05-07 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707211#action_12707211
 ] 

Peter Wolanin commented on SOLR-1151:
-

needs work - the final format is maxChars NOT maxLength
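
For readers unfamiliar with the property: maxChars on a copyField caps how many characters of the source value are copied into the destination field. The sketch below models that behaviour in JavaScript. It is an illustration only, not Solr's code; the helper name copyWithMaxChars is hypothetical, and treating 0 as "no limit" is an assumption about the default.

```javascript
// Illustrative model of copyField's maxChars; not Solr's implementation.
// copyWithMaxChars is a hypothetical helper name.
function copyWithMaxChars(sourceValue, maxChars) {
  // Assumption: a missing or zero limit means "copy the whole value".
  if (!maxChars || maxChars <= 0) {
    return sourceValue;
  }
  // Keep only the first maxChars characters of the source value.
  return sourceValue.substring(0, maxChars);
}

console.log(copyWithMaxChars('The quick brown fox', 9)); // "The quick"
console.log(copyWithMaxChars('The quick brown fox', 0)); // unchanged
```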

 Document the new CopyField maxLength property in the example schema.xml
 ---

 Key: SOLR-1151
 URL: https://issues.apache.org/jira/browse/SOLR-1151
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1151.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 In this issue:  http://issues.apache.org/jira/browse/SOLR-538  a maxLength 
 property was added to the copyField directive.  However, this is not 
 documented in the example schema to make the feature known to users.




[jira] Commented: (SOLR-341) PHP Solr Client

2009-03-13 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681893#action_12681893
 ] 

Peter Wolanin commented on SOLR-341:


r6 has been bundled into a release:  
http://code.google.com/p/solr-php-client/downloads/list

We'll test this with the Drupal module soon, but it is likely to work fine.

 PHP Solr Client
 ---

 Key: SOLR-341
 URL: https://issues.apache.org/jira/browse/SOLR-341
 Project: Solr
  Issue Type: New Feature
  Components: clients - php
Affects Versions: 1.2
 Environment: PHP >= 5.2.0 (or older with JSON PECL extension or other 
 json_decode function implementation). Solr >= 1.2
Reporter: Donovan Jimenez
Priority: Trivial
 Fix For: 1.5

 Attachments: SolrPhpClient.2008-09-02.zip, 
 SolrPhpClient.2008-11-14.zip, SolrPhpClient.2008-11-25.zip, SolrPhpClient.zip


 Developed this client when the example PHP source didn't meet our needs.  The 
 company I work for agreed to release it under the terms of the Apache License.
 This version is slightly different from what I originally linked to on the 
 dev mailing list.  I've incorporated feedback from Yonik and hossman to 
 simplify the client and only accept one response format (JSON currently).
 When Solr 1.3 is released the client can be updated to use the PHP or 
 Serialized PHP response writer.
 example usage from my original mailing list post:
 <?php
 require_once('Solr/Service.php');
 $start = microtime(true);
 //Or explicitly new Solr_Service('localhost', 8180, '/solr');
 $solr = new Solr_Service();
 try
 {
     $response = $solr->search('solr', 0, 10,
         array(/* you can include other parameters here */));
     echo 'search returned with status = ',
         $response->responseHeader->status,
         ' and took ', microtime(true) - $start, ' seconds', "\n";
     //here's how you would access results
     //Notice that I've mapped the values by name into a tree of stdClass objects
     //and arrays (actually, most of this is done by json_decode )
     if ($response->response->numFound > 0)
     {
         $doc_number = $response->response->start;
         foreach ($response->response->docs as $doc)
         {
             $doc_number++;
             echo $doc_number, ': ', $doc->text, "\n";
         }
     }
     //for the purposes of seeing the available structure of the response
     //NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
     //any values are accessed may result in different behavior (in case
     //anyone has some troubles debugging)
     //print_r($response);
 }
 catch (Exception $e)
 {
     echo $e->getMessage(), "\n";
 }
 ?>




[jira] Commented: (SOLR-196) A PHP response writer for Solr

2009-03-09 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680288#action_12680288
 ] 

Peter Wolanin commented on SOLR-196:


This serialized writer produces output that is inconsistent with the other PHP 
writer and inconsistent with the JSON

 A PHP response writer for Solr
 --

 Key: SOLR-196
 URL: https://issues.apache.org/jira/browse/SOLR-196
 Project: Solr
  Issue Type: New Feature
  Components: clients - php, search
Reporter: Paul Borgermans
 Fix For: 1.3

 Attachments: SOLR-192-php-responsewriter.patch, 
 SOLR-196-PHPResponseWriter.patch


 It would be useful to have a PHP response writer that returns an array to be 
 eval-ed directly. This is especially true for PHP4.x installs, where there is 
 no built in support for JSON.
 This issue attempts to address this.




[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr

2009-03-09 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680288#action_12680288
 ] 

Peter Wolanin edited comment on SOLR-196 at 3/9/09 2:33 PM:


This serialized writer produces output that is inconsistent with the other PHP 
writer and inconsistent with the JSON.

  was (Author: pwolanin):
This serialized writer produces output that is inconsistent with the other 
PHP writer adn inconsistent with the JSON
  
 A PHP response writer for Solr
 --

 Key: SOLR-196
 URL: https://issues.apache.org/jira/browse/SOLR-196
 Project: Solr
  Issue Type: New Feature
  Components: clients - php, search
Reporter: Paul Borgermans
 Fix For: 1.3

 Attachments: SOLR-192-php-responsewriter.patch, 
 SOLR-196-PHPResponseWriter.patch


 It would be useful to have a PHP response writer that returns an array to be 
 eval-ed directly. This is especially true for PHP4.x installs, where there is 
 no built in support for JSON.
 This issue attempts to address this.




[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr

2009-03-09 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680288#action_12680288
 ] 

Peter Wolanin edited comment on SOLR-196 at 3/9/09 4:39 PM:


This PHP writer is inconsistent with the JSON writer: if you use PHP 5's 
json_decode, maps come back as objects.

  was (Author: pwolanin):
This serialized writer produces output that is inconsistent with the other 
PHP writer and inconsistent with the JSON.
  
 A PHP response writer for Solr
 --

 Key: SOLR-196
 URL: https://issues.apache.org/jira/browse/SOLR-196
 Project: Solr
  Issue Type: New Feature
  Components: clients - php, search
Reporter: Paul Borgermans
 Fix For: 1.3

 Attachments: SOLR-192-php-responsewriter.patch, 
 SOLR-196-PHPResponseWriter.patch


 It would be useful to have a PHP response writer that returns an array to be 
 eval-ed directly. This is especially true for PHP4.x installs, where there is 
 no built in support for JSON.
 This issue attempts to address this.




[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer

2009-02-27 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677627#action_12677627
 ] 

Peter Wolanin commented on SOLR-822:


Is there an issue for the CharStream API in Lucene?  The e-mail thread looks like 
people were generally in support.

 CharFilter - normalize characters before tokenizer
 --

 Key: SOLR-822
 URL: https://issues.apache.org/jira/browse/SOLR-822
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4

 Attachments: character-normalization.JPG, sample_mapping_ja.txt, 
 sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822.patch, 
 SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch


 A new plugin which can be placed in front of <tokenizer/>.
 {code:xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100" >
   <analyzer>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping_ja.txt" />
     <tokenizer class="solr.MappingCJKTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 {code}
 <charFilter/> can be multiple (chained). I'll post a JPEG file to show 
 character normalization sample soon.
 MOTIVATION:
 In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and 
 Morphological Analyzer.
 When we use morphological analyzer, because the analyzer uses Japanese 
 dictionary to detect terms,
 we need to normalize characters before tokenization.
 I'll post a patch soon, too.
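
The normalization step described above can be pictured as a character-level rewrite applied to the raw text before any tokenizer sees it. The JavaScript sketch below is only a conceptual model under that assumption: applyCharMapping and the sample mapping table are hypothetical, and it does not reproduce the patch's classes or the format of mapping_ja.txt.

```javascript
// Conceptual model of "normalize characters before the tokenizer".
// applyCharMapping and the mapping table are hypothetical illustrations.
function applyCharMapping(text, mapping) {
  // Try longer source strings first so multi-character rules win.
  var sources = Object.keys(mapping)
    .filter(function (src) { return src.length > 0; })
    .sort(function (a, b) { return b.length - a.length; });
  var out = '';
  var i = 0;
  while (i < text.length) {
    var hit = null;
    for (var k = 0; k < sources.length; k++) {
      if (text.startsWith(sources[k], i)) {
        hit = sources[k];
        break;
      }
    }
    if (hit !== null) {
      out += mapping[hit];   // emit the normalized form
      i += hit.length;       // skip the matched source characters
    } else {
      out += text[i];        // no rule applies: copy the character through
      i += 1;
    }
  }
  return out;
}

// Full-width Latin letters folded to ASCII before tokenization.
var mapping = { '\uFF21': 'A', '\uFF22': 'B' };
console.log(applyCharMapping('\uFF21\uFF22c', mapping)); // "ABc"
```

Because the rewrite happens on the character stream, a dictionary-based tokenizer placed after it only ever sees the normalized forms.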




[jira] Commented: (SOLR-1031) XSS vulnerability in schema.jsp (patch included)

2009-02-20 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675524#action_12675524
 ] 

Peter Wolanin commented on SOLR-1031:
-

To add a little more background - I ran into this bug while doing work on our 
Drupal integration module.  It's easy to demonstrate, and basically happens if 
a script is indexed in an unprocessed or untokenized field (e.g. a string 
field) and shows up as one of the top terms on the schema browser page 
(schema.jsp) when one goes to examine a particular field.

The risks of allowing such a script to execute could include modification or 
deletion of the index, as well as other XSS attacks, and the danger of a small 
JS payload is potentially enhanced by the fact that it could probably use 
jQuery functions like jQuery.post(). 

For the Drupal module we are mitigating this risk by using the PHP strip_tags() 
function prior to indexing content, but it seems like this is something Solr 
should handle more generally.

I first observed the bug in Solr 1.3, and it's still present in trunk (1.4)

Re-posting Paul's patch with the preferred naming.

 XSS vulnerability in schema.jsp (patch included)
 

 Key: SOLR-1031
 URL: https://issues.apache.org/jira/browse/SOLR-1031
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.2, 1.3
Reporter: Paul Lovvik
 Attachments: SchemaXSS.patch, SOLR-1031.patch


 If javascript is embedded in any of the fields, it is possible for that 
 javascript to be executed when viewing the schema.
 The javascript will appear in the Top Terms part of the UI.
 I have created a simple patch to prevent this problem from occurring.




[jira] Updated: (SOLR-1031) XSS vulnerability in schema.jsp (patch included)

2009-02-20 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1031:


Attachment: SOLR-1031.patch

 XSS vulnerability in schema.jsp (patch included)
 

 Key: SOLR-1031
 URL: https://issues.apache.org/jira/browse/SOLR-1031
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.2, 1.3
Reporter: Paul Lovvik
 Attachments: SchemaXSS.patch, SOLR-1031.patch


 If javascript is embedded in any of the fields, it is possible for that 
 javascript to be executed when viewing the schema.
 The javascript will appear in the Top Terms part of the UI.
 I have created a simple patch to prevent this problem from occurring.




[jira] Commented: (SOLR-1031) XSS vulnerability in schema.jsp (patch included)

2009-02-20 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675526#action_12675526
 ] 

Peter Wolanin commented on SOLR-1031:
-

Drupal ships with a little JS function for sanitizing output (works like the 
PHP function htmlspecialchars($text, ENT_QUOTES) ).  Possibly you could add 
something similar if the text() function doesn't give the desired output:


{code:javascript}
/**
 * Encode special characters in a plain-text string for display as HTML.
 */
Drupal.checkPlain = function(str) {
  str = String(str);
  var replace = { '&': '&amp;', '"': '&quot;', '<': '&lt;', '>': '&gt;' };
  for (var character in replace) {
    var regex = new RegExp(character, 'g');
    str = str.replace(regex, replace[character]);
  }
  return str;
};
{code}

http://php.net/htmlspecialchars

http://cvs.drupal.org/viewvc.py/drupal/drupal/misc/drupal.js?revision=1.50&view=markup
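
As a rough illustration of how an escaping helper like the one quoted above would neutralize a script that ends up among the top terms, here is a standalone sketch. The name escapeHtml is hypothetical, it is not taken from schema.jsp or the attached patch, and it uses a single-pass replace rather than the loop in the Drupal function.

```javascript
// Standalone sketch modeled on the idea of Drupal.checkPlain.
// escapeHtml is a hypothetical name; it is not part of Solr or Drupal.
function escapeHtml(value) {
  var replacements = { '&': '&amp;', '"': '&quot;', '<': '&lt;', '>': '&gt;' };
  // One pass over the string, so an already produced "&amp;" is never re-escaped.
  return String(value).replace(/[&"<>]/g, function (character) {
    return replacements[character];
  });
}

// A term as it might be stored in an untokenized string field.
var term = '<script>alert("xss")</script>';
console.log(escapeHtml(term));
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```

Escaping at output time means the indexed value stays untouched, while the page only ever receives inert text.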

 XSS vulnerability in schema.jsp (patch included)
 

 Key: SOLR-1031
 URL: https://issues.apache.org/jira/browse/SOLR-1031
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.2, 1.3
Reporter: Paul Lovvik
 Attachments: SchemaXSS.patch, SOLR-1031.patch


 If javascript is embedded in any of the fields, it is possible for that 
 javascript to be executed when viewing the schema.
 The javascript will appear in the Top Terms part of the UI.
 I have created a simple patch to prevent this problem from occurring.




[jira] Commented: (SOLR-1022) suggest multiValued for ignored field

2009-02-18 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674584#action_12674584
 ] 

Peter Wolanin commented on SOLR-1022:
-

The ignored field is non-indexed and non-stored.   The suggested function in 
the example schema.xml is to avoid Solr errors when there is a doc field that 
matches nothing in the schema.

I found that because the one in the example schema is single-valued, if I send 
in an unmatched multi-valued field I still get the error that enabling this 
field was intended to prevent.

I did not send to the ML, since this seemed pretty trivial, but I can do so as 
well.
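
To make "an unmatched multi-valued field" concrete: in an XML update message, a multi-valued field is simply the same field name repeated inside one doc element. The JavaScript sketch below builds such a message. The helper names buildAddXml and escapeXml and the field name unknown_tag are hypothetical and used only for illustration.

```javascript
// Hypothetical helpers that render a document as a Solr XML <add> message.
// Array values become repeated <field> elements, i.e. a multi-valued field.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function buildAddXml(doc) {
  var fields = [];
  Object.keys(doc).forEach(function (name) {
    // Normalize single values to a one-element list.
    var values = Array.isArray(doc[name]) ? doc[name] : [doc[name]];
    values.forEach(function (value) {
      fields.push('<field name="' + escapeXml(name) + '">' + escapeXml(value) + '</field>');
    });
  });
  return '<add><doc>' + fields.join('') + '</doc></add>';
}

// "unknown_tag" stands in for a field that matches nothing in the schema.
console.log(buildAddXml({ id: '1', unknown_tag: ['a', 'b'] }));
```

With a single-valued catch-all field, the two unknown_tag values in this message are what would trigger the error described above.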

 suggest multiValued for ignored field
 -

 Key: SOLR-1022
 URL: https://issues.apache.org/jira/browse/SOLR-1022
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
 Environment: Mac OS 10.5 java 1.5
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1022.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 We are actually using the suggested ignored field in the schema.  I have 
 found, however, that Solr still throws an error 400 if I send in an unmatched 
 multi-valued field.
 It seems that if I set this ignored field to be multiValued then a document 
 with unrecognized single or multiple value fields is successfully indexed.
 Attached patch alters this suggested item in the schema.




[jira] Created: (SOLR-1022) suggest multiValued for ignored field

2009-02-17 Thread Peter Wolanin (JIRA)
suggest multiValued for ignored field
-

 Key: SOLR-1022
 URL: https://issues.apache.org/jira/browse/SOLR-1022
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
 Environment: Mac OS 10.5 java 1.5
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4


We are using the suggested ignored field in the schema.  I have found, 
however, that Solr still throws an error 400 if I send in an unmatched 
multi-valued field.

It seems that if I set this ignored field to be multiValued then a document 
with unrecognized single or multiple value fields is successfully indexed.

Attached patch alters this suggested item in the schema.




[jira] Updated: (SOLR-1022) suggest multiValued for ignored field

2009-02-17 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1022:


Attachment: SOLR-1022.patch

 suggest multiValued for ignored field
 -

 Key: SOLR-1022
 URL: https://issues.apache.org/jira/browse/SOLR-1022
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
 Environment: Mac OS 10.5 java 1.5
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1022.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 We are using the suggested ignored field in the schema.  I have found, 
 however, that Solr still throws an error 400 if I send in an unmatched 
 multi-valued field.
 It seems that if I set this ignored field to be multiValued then a document 
 with unrecognized single or multiple value fields is successfully indexed.
 Attached patch alters this suggested item in the schema.




[jira] Updated: (SOLR-1022) suggest multiValued for ignored field

2009-02-17 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-1022:


Description: 
We are actually using the suggested ignored field in the schema.  I have 
found, however, that Solr still throws an error 400 if I send in an unmatched 
multi-valued field.

It seems that if I set this ignored field to be multiValued then a document 
with unrecognized single or multiple value fields is successfully indexed.

Attached patch alters this suggested item in the schema.

  was:
We are sing the suggested ignored field in the schema.  I have found, 
however, that Solr still throws a error 400 if I send in an unmatched 
multi-valued field.

It seems that if I set this ignored field to be multiValued than a document 
with unrecognized single or multiple value fields is sucessfully indexed.

Attached patch alters this suggested item in the schema.


 suggest multiValued for ignored field
 -

 Key: SOLR-1022
 URL: https://issues.apache.org/jira/browse/SOLR-1022
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.3
 Environment: Mac OS 10.5 java 1.5
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1022.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 We are actually using the suggested ignored field in the schema.  I have 
 found, however, that Solr still throws an error 400 if I send in an unmatched 
 multi-valued field.
 It seems that if I set this ignored field to be multiValued then a document 
 with unrecognized single or multiple value fields is successfully indexed.
 Attached patch alters this suggested item in the schema.




[jira] Updated: (SOLR-929) error in admin interface for dynamicField name=* type=ignored

2009-02-03 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-929:
---

Attachment: Solr-admin-page.jpg

Screen shot showing schema browser bug

 error in admin interface for dynamicField name=* type=ignored
 -

 Key: SOLR-929
 URL: https://issues.apache.org/jira/browse/SOLR-929
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.3
 Environment: java version 1.5.0_16, Mac OS 10.5.5, Jetty example 
 server.  Also see the same bug on linux with tomcat.
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: Solr-admin-page.jpg


 There appears to be an error in the admin interface (/solr/admin/schema.jsp) 
 when using a '*' field in a schema.  In the example
 schema.xml, there is a commented out sample:
 {code} 
   <!-- uncomment the following to ignore any fields that don't already match an existing
        field name or dynamic field, rather than reporting them as an error.
        alternately, change the type="ignored" to some other type e.g. "text" if you want
        unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->
 {code} 
 We have this un-commented, and in the schema browser via the admin interface 
 I see that all non-dynamic fields get a type of "ignored".
 For example, I see this in the Solr admin interface:
 Field: uid
 Dynamically Created From Pattern: *
 Field Type: ignored
 though the field definition is:
 {code} 
   <field name="uid"  type="integer" indexed="true" stored="true"/>
 {code} 




[jira] Updated: (SOLR-929) error in admin interface for dynamicField name=* type=ignored

2009-02-03 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-929:
---

Attachment: schema.xml

This schema.xml shows the problem.

 error in admin interface for dynamicField name=* type=ignored
 -

 Key: SOLR-929
 URL: https://issues.apache.org/jira/browse/SOLR-929
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.3
 Environment: java version 1.5.0_16, Mac OS 10.5.5, Jetty example 
 server.  Also see the same bug on linux with tomcat.
Reporter: Peter Wolanin
Priority: Minor
 Fix For: 1.4

 Attachments: schema.xml, Solr-admin-page.jpg


 There appears to be an error in the admin interface (/solr/admin/schema.jsp) 
 when using a '*' field in a schema.  In the example
 schema.xml, there is a commented out sample:
 {code} 
   <!-- uncomment the following to ignore any fields that don't already match an existing
        field name or dynamic field, rather than reporting them as an error.
        alternately, change the type="ignored" to some other type e.g. "text" if you want
        unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->
 {code} 
 We have this un-commented, and in the schema browser via the admin interface 
 I see that all non-dynamic fields get a type of "ignored".
 For example, I see this in the Solr admin interface:
 Field: uid
 Dynamically Created From Pattern: *
 Field Type: ignored
 though the field definition is:
 {code} 
   <field name="uid"  type="integer" indexed="true" stored="true"/>
 {code} 




[jira] Commented: (SOLR-820) replicate After startup for new replication

2009-01-30 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668953#action_12668953
 ] 

Peter Wolanin commented on SOLR-820:


Jacob and I are seeing the exact same issue today. Is there some way to set 
the timestamp in the index on the slave server?

 replicate After startup for new replication
 ---

 Key: SOLR-820
 URL: https://issues.apache.org/jira/browse/SOLR-820
 Project: Solr
  Issue Type: Improvement
  Components: replication (scripts)
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-820.patch


 add another option of 
 {code}
  <str name="replicateAfter">startup</str>
 {code}
 so that replication can be triggered w/o a commit




[jira] Updated: (SOLR-929) error in admin interface for dynamicField name=* type=ignored

2008-12-18 Thread Peter Wolanin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Wolanin updated SOLR-929:
---

Environment: java version 1.5.0_16, Mac OS 10.5.5, Jetty example server.  
Also see the same bug on linux with tomcat.  (was: java version 1.5.0_16, Mac 
OS 10.5.5, Jetty example server)

 error in admin interface for dynamicField name=* type=ignored
 -

 Key: SOLR-929
 URL: https://issues.apache.org/jira/browse/SOLR-929
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.3
 Environment: java version 1.5.0_16, Mac OS 10.5.5, Jetty example 
 server.  Also see the same bug on linux with tomcat.
Reporter: Peter Wolanin
Priority: Minor

 There appears to be an error in the admin interface (/solr/admin/schema.jsp) 
 when using a '*' field in a schema.  In the example
 schema.xml, there is a commented out sample:
 {code} 
   <!-- uncomment the following to ignore any fields that don't already match an existing
        field name or dynamic field, rather than reporting them as an error.
        alternately, change the type="ignored" to some other type e.g. "text" if you want
        unknown fields indexed and/or stored by default -->
   <!--dynamicField name="*" type="ignored" /-->
 {code} 
 We have this un-commented, and in the schema browser via the admin interface 
 I see that all non-dynamic fields get a type of "ignored".
 For example, I see this in the Solr admin interface:
 Field: uid
 Dynamically Created From Pattern: *
 Field Type: ignored
 though the field definition is:
 {code} 
   <field name="uid"  type="integer" indexed="true" stored="true"/>
 {code} 




[jira] Created: (SOLR-929) error in admin interface for dynamicField name=* type=ignored

2008-12-18 Thread Peter Wolanin (JIRA)
error in admin interface for dynamicField name=* type=ignored
-

 Key: SOLR-929
 URL: https://issues.apache.org/jira/browse/SOLR-929
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.3
 Environment: java version 1.5.0_16, Mac OS 10.5.5, Jetty example 
server
Reporter: Peter Wolanin
Priority: Minor


There appears to be an error in the admin interface (/solr/admin/schema.jsp) 
when using a '*' field in a schema.  In the example
schema.xml, there is a commented out sample:

{code} 
  <!-- uncomment the following to ignore any fields that don't already match an existing
       field name or dynamic field, rather than reporting them as an error.
       alternately, change the type="ignored" to some other type e.g. "text" if you want
       unknown fields indexed and/or stored by default -->
  <!--dynamicField name="*" type="ignored" /-->
{code} 

We have this un-commented, and in the schema browser via the admin interface I 
see that all non-dynamic fields get a type of "ignored".

For example, I see this in the Solr admin interface:

Field: uid
Dynamically Created From Pattern: *
Field Type: ignored

though the field definition is:
{code} 
  <field name="uid"  type="integer" indexed="true" stored="true"/>
{code} 
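
The lookup order this report expects can be sketched as follows: an explicitly declared field should be resolved first, and a dynamic pattern such as "*" should only apply when no explicit field matches. This JavaScript is an illustration of that expectation only. The function names and data shapes are hypothetical, and it says nothing about how schema.jsp or Solr's schema classes are actually written.

```javascript
// Illustrative lookup order only; names are hypothetical and this is not Solr's code.
function matchesPattern(pattern, name) {
  // Dynamic field patterns are assumed to carry a leading or trailing "*".
  if (pattern.charAt(0) === '*') {
    return name.endsWith(pattern.slice(1));
  }
  if (pattern.charAt(pattern.length - 1) === '*') {
    return name.startsWith(pattern.slice(0, -1));
  }
  return pattern === name;
}

function resolveFieldType(name, fields, dynamicFields) {
  // Explicit definitions win over any dynamic pattern, including "*".
  if (Object.prototype.hasOwnProperty.call(fields, name)) {
    return fields[name];
  }
  for (var i = 0; i < dynamicFields.length; i++) {
    if (matchesPattern(dynamicFields[i].pattern, name)) {
      return dynamicFields[i].type;
    }
  }
  return null; // nothing matched
}

var fields = { uid: 'integer' };
var dynamicFields = [
  { pattern: '*_s', type: 'string' },
  { pattern: '*', type: 'ignored' }
];
console.log(resolveFieldType('uid', fields, dynamicFields));      // integer
console.log(resolveFieldType('title_s', fields, dynamicFields));  // string
console.log(resolveFieldType('whatever', fields, dynamicFields)); // ignored
```

Under this ordering the uid field would be reported with type integer, which is what the schema browser was expected to show.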

