[jira] Updated: (SOLR-319) changes SynonymFilterFactory for N-gram tokenizer

2007-07-25 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-319:


Attachment: SOLR-319-UTF-8.patch

The patch includes TestSynonymMap. To test SynonymMap, I removed the "private"
modifier from the parseRules() method.
This patch includes CJKTokenizerFactory, too.

> changes SynonymFilterFactory for N-gram tokenizer
> -
>
> Key: SOLR-319
> URL: https://issues.apache.org/jira/browse/SOLR-319
> Project: Solr
>  Issue Type: Improvement
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: SOLR-319-UTF-8.patch
>
>
> WHAT:
> Currently, SynonymFilterFactory works very well with N-gram tokenizers
> (CJKTokenizer, for example).
> But we have to take care with how rules are written in synonyms.txt.
> For example, if I use CJKTokenizer (which works as a bi-gram tokenizer for CJK
> characters) and want C1C2C3 to map to C4C5C6,
> I have to write the rule as follows:
> C1C2 C2C3 => C4C5 C5C6
> But I want to write it as "C1C2C3=>C4C5C6". This patch allows that. It is also
> helpful for sharing synonyms.txt.
> HOW:
> A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
> If the attribute is specified, SynonymFilterFactory uses that TokenizerFactory
> to create a Tokenizer.
> Then SynonymFilterFactory uses the Tokenizer to get tokens from the rules in
> the synonyms.txt file.
> sample-1: CJKTokenizer
> <fieldType ... positionIncrementGap="100">
>   ...
>     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt"
>             ignoreCase="true" expand="true"
>             tokenFactory="solr.CJKTokenizerFactory"/>
>   ...
> </fieldType>
>
> sample-2: NGramTokenizer
> <fieldType ... positionIncrementGap="100">
>   ...
>     <... maxGramSize="2"/>
>   ...
>     <... maxGramSize="2"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt"
>             ignoreCase="true" expand="true"
>             tokenFactory="solr.NGramTokenizerFactory"
>             minGramSize="2" maxGramSize="2"/>
>   ...
> </fieldType>
>
> backward compatibility:
> Yes. If you omit the tokenFactory attribute from the <filter class="solr.SynonymFilterFactory"/> tag, it works as before.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-319) changes SynonymFilterFactory for N-gram tokenizer

2007-07-25 Thread Koji Sekiguchi (JIRA)
changes SynonymFilterFactory for N-gram tokenizer
-

 Key: SOLR-319
 URL: https://issues.apache.org/jira/browse/SOLR-319
 Project: Solr
  Issue Type: Improvement
Reporter: Koji Sekiguchi
Priority: Minor


WHAT:
Currently, SynonymFilterFactory works very well with N-gram tokenizers
(CJKTokenizer, for example).
But we have to take care with how rules are written in synonyms.txt.
For example, if I use CJKTokenizer (which works as a bi-gram tokenizer for CJK
characters) and want C1C2C3 to map to C4C5C6,
I have to write the rule as follows:

C1C2 C2C3 => C4C5 C5C6

But I want to write it as "C1C2C3=>C4C5C6". This patch allows that. It is also
helpful for sharing synonyms.txt.
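To make the rule-writing problem concrete, here is a minimal Java sketch of overlapping bi-gram tokenization and of the hand-written rule form it forces. The class and method names (BigramRuleDemo, bigrams, toBigramRule) are hypothetical and this is not CJKTokenizer's code; unlike CJKTokenizer, it bi-grams every character, not only CJK ones.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Solr code): overlapping bi-gram tokenization and
 * the bi-gram rule form that synonyms.txt authors had to write by hand.
 * Every character is treated alike here, unlike CJKTokenizer.
 */
public class BigramRuleDemo {

    /** Overlapping 2-character grams of text; text shorter than 2 is returned whole. */
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray(); // code points, so surrogate pairs stay intact
        List<String> grams = new ArrayList<>();
        if (cps.length < 2) {
            if (cps.length == 1) {
                grams.add(text);
            }
            return grams;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            grams.add(new String(cps, i, 2));
        }
        return grams;
    }

    /** Rewrites "lhs=>rhs" into the space-separated bi-gram form, e.g. C1C2 C2C3 => C4C5 C5C6. */
    static String toBigramRule(String rule) {
        String[] sides = rule.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected lhs=>rhs: " + rule);
        }
        return String.join(" ", bigrams(sides[0].trim()))
                + " => " + String.join(" ", bigrams(sides[1].trim()));
    }

    public static void main(String[] args) {
        // "abc" and "xyz" stand in for C1C2C3 and C4C5C6.
        System.out.println(toBigramRule("abc=>xyz")); // ab bc => xy yz
    }
}
```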

HOW:
A tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>.
If the attribute is specified, SynonymFilterFactory uses that TokenizerFactory
to create a Tokenizer.
Then SynonymFilterFactory uses the Tokenizer to get tokens from the rules in
the synonyms.txt file.
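The tokenFactory idea can be sketched as a rule parser with a pluggable tokenizer. This is an illustration under assumptions, not Solr's parseRules(): a java.util.function.Function stands in for the configured TokenizerFactory, all names are hypothetical, and the rule syntax is reduced to comma-separated terms around "=>" with no escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch: each side of a "lhs => rhs" rule is split on commas
 * and every term is run through a pluggable tokenizer. With no tokenizer,
 * terms are split on whitespace (the backward-compatible default).
 */
public class TokenizedRuleParserDemo {

    /** Parsed rule: for each side, one token list per comma-separated term. */
    static final class Rule {
        final List<List<String>> from;
        final List<List<String>> to;

        Rule(List<List<String>> from, List<List<String>> to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public String toString() {
            return from + " => " + to;
        }
    }

    /** Default tokenizer: split on runs of whitespace. */
    static List<String> whitespace(String term) {
        return Arrays.asList(term.trim().split("\\s+"));
    }

    /** Example pluggable tokenizer: overlapping bi-grams (char-based for brevity). */
    static List<String> bigrams(String term) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + 1 < term.length(); i++) {
            grams.add(term.substring(i, i + 2));
        }
        if (grams.isEmpty() && !term.isEmpty()) {
            grams.add(term);
        }
        return grams;
    }

    static Rule parseRule(String line, Function<String, List<String>> tokenizer) {
        Function<String, List<String>> tok = tokenizer;
        if (tok == null) {
            tok = TokenizedRuleParserDemo::whitespace; // no tokenFactory: old behavior
        }
        String[] sides = line.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("expected 'lhs => rhs': " + line);
        }
        return new Rule(parseSide(sides[0], tok), parseSide(sides[1], tok));
    }

    private static List<List<String>> parseSide(String side, Function<String, List<String>> tok) {
        List<List<String>> terms = new ArrayList<>();
        for (String term : side.split(",")) {
            terms.add(tok.apply(term.trim()));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Old style: bi-grams written by hand, parsed on whitespace.
        System.out.println(parseRule("ab bc => xy yz", null));
        // New style: plain words, tokenized by the configured tokenizer.
        System.out.println(parseRule("abc=>xyz", TokenizedRuleParserDemo::bigrams));
        // Both print: [[ab, bc]] => [[xy, yz]]
    }
}
```

Both calls yield the same token sequences, which is what the patch aims for: the rule file holds the plain words and the configured tokenizer produces the grams. Passing null keeps whitespace splitting, in line with the backward-compatibility note.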

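sample-1: CJKTokenizer

<fieldType ... positionIncrementGap="100">
  ...
    <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt"
            ignoreCase="true" expand="true"
            tokenFactory="solr.CJKTokenizerFactory"/>
  ...
</fieldType>

sample-2: NGramTokenizer

<fieldType ... positionIncrementGap="100">
  ...
    <... maxGramSize="2"/>
  ...
    <... maxGramSize="2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt"
            ignoreCase="true" expand="true"
            tokenFactory="solr.NGramTokenizerFactory"
            minGramSize="2" maxGramSize="2"/>
  ...
</fieldType>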

backward compatibility:
Yes. If you omit the tokenFactory attribute from the <filter class="solr.SynonymFilterFactory"/> tag, it works as before.





[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-25 Thread Mike Klaas (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515424
 ] 

Mike Klaas commented on SOLR-139:
-

Darn, you're right: writer.addDocument() is outside of the synchronized block.

We could do as you suggested and downgrade from the commit lock to a read lock.
That should only reduce concurrency when the document is in the pending state.
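A minimal sketch of the scheme discussed here, using java.util.concurrent's ReentrantReadWriteLock, which supports downgrading (take the read lock while holding the write lock, then release the write lock) but not upgrading. The in-memory lists stand in for the writer's buffered documents and what a reader can see; all names are hypothetical and this is not DirectUpdateHandler2's code. The exclusive lock is taken only when the requested document is still pending.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch: adds share the read lock, commit takes the write
 * ("commit") lock, and a lookup of a pending document takes the write lock
 * to flush, then downgrades to the read lock so adds can resume.
 */
public class DowngradeLockDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> pending = new ArrayList<>();   // added but not yet visible
    private final List<String> committed = new ArrayList<>(); // visible to lookups

    void addDoc(String id) {
        lock.readLock().lock(); // shared: many adders at once, but never during a flush
        try {
            synchronized (pending) {
                pending.add(id);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    void commit() {
        lock.writeLock().lock();
        try {
            flush();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Caller must hold the write lock. */
    private void flush() {
        synchronized (pending) {
            committed.addAll(pending);
            pending.clear();
        }
    }

    /** Stand-in for a stored-field lookup by id. */
    boolean lookup(String id) {
        lock.readLock().lock();
        try {
            boolean isPending;
            synchronized (pending) {
                isPending = pending.contains(id);
            }
            if (!isPending) {
                return committed.contains(id); // fast path: no exclusive lock needed
            }
        } finally {
            lock.readLock().unlock(); // a read lock cannot be upgraded, so release it first
        }
        lock.writeLock().lock();
        try {
            flush();                // like closing the writer so a reader can see the doc
            lock.readLock().lock(); // downgrade: take the read lock before dropping the write lock
        } finally {
            lock.writeLock().unlock();
        }
        try {
            return committed.contains(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        DowngradeLockDemo idx = new DowngradeLockDemo();
        idx.addDoc("doc1");
        System.out.println(idx.lookup("doc1")); // true: flushed on demand
        System.out.println(idx.lookup("doc2")); // false
    }
}
```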

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Reporter: Ryan McKinley
>Assignee: Ryan McKinley
> Attachments: getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way Lucene is structured, (for now) one can only modify stored
> fields.
> While we are at it, we can support incrementing an existing value - I think
> this only makes sense for numbers.
> For background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
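The core of this request can be sketched as a merge over a document's stored fields: copy them, overwrite the fields being set, add to the fields being incremented, and re-add the result as a whole document. Plain Java maps stand in for Solr documents; the names are hypothetical and this is not code from the attached patches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: build a modified document from stored fields.
 * Only stored fields can be carried over; indexed-only values cannot be read back.
 */
public class ModifyStoredFieldsDemo {

    static Map<String, Object> modify(Map<String, Object> stored,
                                      Map<String, Object> set,
                                      Map<String, Long> increment) {
        Map<String, Object> doc = new LinkedHashMap<>(stored); // copy, keeping field order
        doc.putAll(set);                                     // replace values wholesale
        for (Map.Entry<String, Long> e : increment.entrySet()) {
            Object old = doc.get(e.getKey());
            if (old == null) {
                doc.put(e.getKey(), e.getValue()); // missing field: treat as 0
            } else if (old instanceof Number) {
                // Integral increments only in this sketch; fractional parts are truncated.
                doc.put(e.getKey(), ((Number) old).longValue() + e.getValue());
            } else {
                throw new IllegalArgumentException("cannot increment non-numeric field: " + e.getKey());
            }
        }
        return doc;
    }

    static String demo() {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "doc1");
        stored.put("title", "old title");
        stored.put("views", 5L);
        Map<String, Object> set = new LinkedHashMap<>();
        set.put("title", "new title");
        Map<String, Long> inc = new LinkedHashMap<>();
        inc.put("views", 1L);
        return modify(stored, set, inc).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {id=doc1, title=new title, views=6}
    }
}
```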




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515406
 ] 

Yonik Seeley commented on SOLR-139:
---

The locking logic for getStoredFields() is indeed flawed.
Closing the writer inside the sync block of getStoredFields() doesn't protect
callers of addDoc() from concurrently using that writer. Acquiring the commit
lock will be needed after all... no getting around it, I think.




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515366
 ] 

Yonik Seeley commented on SOLR-139:
---

I disabled logging on all of "org.apache.solr" via a filter, and voila, the OOM
problems are gone.
Perhaps the logger could not keep up with the number of records and they piled
up over time (does any component of the logging framework use another
thread that might be getting starved?)

Anyway, it doesn't look like Solr has a memory leak.
On to the next issue.
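One way to do what is described above with java.util.logging, assuming Solr's logging goes through JUL (as Solr 1.x did): a Filter on a handler that rejects records whose logger name is "org.apache.solr" or below it. The class, the collecting handler, and the logger names in the demo are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Filter;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Hypothetical sketch: drop JUL records from the "org.apache.solr" logger hierarchy. */
public class SolrLogFilterDemo {

    /** Rejects records from "org.apache.solr" and its child loggers. */
    static final Filter NO_SOLR = record -> {
        String name = record.getLoggerName();
        return name == null
                || !(name.equals("org.apache.solr") || name.startsWith("org.apache.solr."));
    };

    static boolean allows(String loggerName) {
        LogRecord r = new LogRecord(Level.INFO, "probe");
        r.setLoggerName(loggerName);
        return NO_SOLR.isLoggable(r);
    }

    /** Handler that just keeps the records its level and filter let through. */
    static final class CollectingHandler extends Handler {
        final List<LogRecord> kept = new ArrayList<>();

        @Override
        public void publish(LogRecord r) {
            if (isLoggable(r)) { // checks this handler's level and filter
                kept.add(r);
            }
        }

        @Override public void flush() {}
        @Override public void close() {}
    }

    /** Logs one Solr and one non-Solr record; returns how many got through the filter. */
    static int demo() {
        Logger root = Logger.getLogger("");
        CollectingHandler h = new CollectingHandler();
        h.setFilter(NO_SOLR); // in a real setup, set this on the existing root handlers
        root.addHandler(h);
        try {
            Logger.getLogger("org.apache.solr.demo").warning("dropped");
            Logger.getLogger("com.example.App").warning("kept");
        } finally {
            root.removeHandler(h);
        }
        return h.kept.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 1
    }
}
```

A filter set on a Logger only applies to records logged directly on that logger, which is why this sketch attaches it to a Handler. A simpler alternative is Logger.getLogger("org.apache.solr").setLevel(Level.OFF), keeping a strong reference to that logger so the setting is not lost when it is garbage-collected.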




[jira] Commented: (SOLR-139) Support updateable/modifiable documents

2007-07-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12515306
 ] 

Yonik Seeley commented on SOLR-139:
---

OOM still happens from the command line too, even after updating Lucene to 2.2.
Looks like it's time for old-school instrumentation (printfs, etc.).




[jira] Updated: (SOLR-317) A XSLT stylesheet that pretty-prints out the response from the LukeRequestHandler

2007-07-25 Thread Thomas Peuss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Peuss updated SOLR-317:
--

Attachment: prettyluke.xsl

SVG graphics tuning.

> A XSLT stylesheet that pretty-prints out the response from the 
> LukeRequestHandler
> -
>
> Key: SOLR-317
> URL: https://issues.apache.org/jira/browse/SOLR-317
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Thomas Peuss
>Priority: Minor
> Attachments: prettyluke.xsl, prettyluke.xsl, prettyluke.xsl
>
>
> A first version of an XSLT stylesheet for pretty-printing the response from
> the LukeRequestHandler. It uses inline SVG graphics for the histograms if you
> are on Firefox, Safari or Opera. On IE you get a list for the histograms.
> When you put it in /admin you can try it with 
> http://localhost:8080/apache-solr-1.3-dev/admin/luke?stylesheet=../apache-solr-1.3-dev/admin/prettyluke




Build failed in Hudson: Solr-Nightly #153

2007-07-25 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/153/changes

--
[...truncated 900 lines...]
A client/ruby/solr-ruby/test/unit/field_test.rb
AU client/ruby/solr-ruby/test/unit/solr_mock_base.rb
A client/ruby/solr-ruby/test/unit/add_document_test.rb
AU client/ruby/solr-ruby/test/unit/request_test.rb
A client/ruby/solr-ruby/test/unit/commit_test.rb
AU client/ruby/solr-ruby/test/unit/xpath_mapper_test.rb
AU client/ruby/solr-ruby/test/unit/suite.rb
A client/ruby/solr-ruby/test/unit/ping_test.rb
A client/ruby/solr-ruby/test/unit/dismax_request_test.rb
A client/ruby/solr-ruby/test/unit/response_test.rb
AU client/ruby/solr-ruby/test/unit/indexer_test.rb
AU client/ruby/solr-ruby/test/unit/connection_test.rb
A client/ruby/solr-ruby/test/unit/delete_test.rb
AU client/ruby/solr-ruby/test/unit/tab_delimited.txt
A client/ruby/solr-ruby/test/unit/hpricot_test_file.xml
AU client/ruby/solr-ruby/test/unit/standard_request_test.rb
A client/ruby/solr-ruby/test/unit/hpricot_mapper_test.rb
AU client/ruby/solr-ruby/test/unit/data_mapper_test.rb
AU client/ruby/solr-ruby/test/unit/util_test.rb
A client/ruby/solr-ruby/test/functional
A client/ruby/solr-ruby/test/functional/test_solr_server.rb
A client/ruby/solr-ruby/test/functional/server_test.rb
A client/ruby/solr-ruby/test/conf
AU client/ruby/solr-ruby/test/conf/schema.xml
A client/ruby/solr-ruby/test/conf/protwords.txt
A client/ruby/solr-ruby/test/conf/stopwords.txt
AU client/ruby/solr-ruby/test/conf/solrconfig.xml
A client/ruby/solr-ruby/test/conf/scripts.conf
A client/ruby/solr-ruby/test/conf/admin-extra.html
A client/ruby/solr-ruby/test/conf/synonyms.txt
A client/ruby/solr-ruby/LICENSE.txt
A client/ruby/solr-ruby/Rakefile
A client/ruby/solr-ruby/script
AU client/ruby/solr-ruby/script/setup.rb
AU client/ruby/solr-ruby/script/solrshell
A client/ruby/solr-ruby/lib
A client/ruby/solr-ruby/lib/solr
AU client/ruby/solr-ruby/lib/solr/util.rb
A client/ruby/solr-ruby/lib/solr/document.rb
A client/ruby/solr-ruby/lib/solr/exception.rb
AU client/ruby/solr-ruby/lib/solr/indexer.rb
AU client/ruby/solr-ruby/lib/solr/response.rb
AU client/ruby/solr-ruby/lib/solr/connection.rb
A client/ruby/solr-ruby/lib/solr/importer
AU client/ruby/solr-ruby/lib/solr/importer/delimited_file_source.rb
AU client/ruby/solr-ruby/lib/solr/importer/solr_source.rb
AU client/ruby/solr-ruby/lib/solr/importer/array_mapper.rb
AU client/ruby/solr-ruby/lib/solr/importer/mapper.rb
AU client/ruby/solr-ruby/lib/solr/importer/xpath_mapper.rb
A client/ruby/solr-ruby/lib/solr/importer/hpricot_mapper.rb
A client/ruby/solr-ruby/lib/solr/xml.rb
AU client/ruby/solr-ruby/lib/solr/importer.rb
A client/ruby/solr-ruby/lib/solr/field.rb
AU client/ruby/solr-ruby/lib/solr/solrtasks.rb
A client/ruby/solr-ruby/lib/solr/request
A client/ruby/solr-ruby/lib/solr/request/ping.rb
A client/ruby/solr-ruby/lib/solr/request/select.rb
AU client/ruby/solr-ruby/lib/solr/request/optimize.rb
AU client/ruby/solr-ruby/lib/solr/request/standard.rb
A client/ruby/solr-ruby/lib/solr/request/delete.rb
AU client/ruby/solr-ruby/lib/solr/request/index_info.rb
A client/ruby/solr-ruby/lib/solr/request/update.rb
A client/ruby/solr-ruby/lib/solr/request/dismax.rb
A client/ruby/solr-ruby/lib/solr/request/add_document.rb
A client/ruby/solr-ruby/lib/solr/request/commit.rb
A client/ruby/solr-ruby/lib/solr/request/base.rb
AU client/ruby/solr-ruby/lib/solr/request.rb
A client/ruby/solr-ruby/lib/solr/response
A client/ruby/solr-ruby/lib/solr/response/ping.rb
AU client/ruby/solr-ruby/lib/solr/response/optimize.rb
A client/ruby/solr-ruby/lib/solr/response/standard.rb
A client/ruby/solr-ruby/lib/solr/response/xml.rb
A client/ruby/solr-ruby/lib/solr/response/ruby.rb
A client/ruby/solr-ruby/lib/solr/response/delete.rb
AU client/ruby/solr-ruby/lib/solr/response/index_info.rb
A client/ruby/solr-ruby/lib/solr/response/dismax.rb
A client/ruby/solr-ruby/lib/solr/response/add_document.rb
A client/ruby/solr-ruby/lib/solr/response/commit.rb
A client/ruby/solr-ruby/lib/solr/response/base.rb
AU client/ruby/solr-ruby/lib/solr.rb
A client/ruby/solr-ruby/CHANGES.yml
A client/ruby/solr-ruby/README
A client/ruby/solr-ruby/examples
A client/ruby/solr-ruby/examples/marc
AU client/ruby/solr-ruby/examples/marc/marc_importer.rb
A client/ruby/solr-ruby/examples/delicious_library
A client/ruby/s

PHP Response Writer for Solr

2007-07-25 Thread Pieter Berkel

I've been using the proposed PHP response writer code from SOLR-196
(eval-able PHP code) and SOLR-275 (serialized PHP data) for some time now
and would like to work towards getting these included in the main Solr
distribution.

http://www.nabble.com/Created%3A-%28SOLR-196%29-A-PHP-response-writer-for-Solr-tf3458434.html
http://www.nabble.com/-jira--Created%3A-%28SOLR-275%29-PHP-Serialized-Response-Writer-tf3980951.html

There is quite a bit of code duplication in SOLR-196 which I'd like to
eliminate if possible. Also, due to the way PHP serializes data (e.g. storing
the number of elements in an array), the JSONWriter may have to be refactored,
specifically where arrays are written directly using writer.write('{') and
writer.write('}') rather than the writeArray() method.
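To illustrate the serialization constraint, here is a minimal Java sketch of PHP's serialize() format for a few types. An array is written as a:<count>:{...}, so the element count must be known before the opening brace, which is why emitting '{' directly and streaming elements does not fit. Class and method names are hypothetical; string lengths are byte counts, assumed to be UTF-8 here.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of PHP's serialize() format for null, Boolean,
 * Integer/Long, String, List and Map values.
 */
public class PhpSerializeDemo {

    static String serialize(Object v) {
        StringBuilder sb = new StringBuilder();
        write(sb, v);
        return sb.toString();
    }

    private static void write(StringBuilder sb, Object v) {
        if (v == null) {
            sb.append("N;");
        } else if (v instanceof Boolean) {
            sb.append("b:").append(((Boolean) v) ? 1 : 0).append(';');
        } else if (v instanceof Integer || v instanceof Long) {
            sb.append("i:").append(v).append(';');
        } else if (v instanceof String) {
            String s = (String) v;
            // The length prefix counts bytes, not Java chars.
            sb.append("s:").append(s.getBytes(StandardCharsets.UTF_8).length)
              .append(":\"").append(s).append("\";");
        } else if (v instanceof List) {
            List<?> list = (List<?>) v;
            // Count first: the size must be known before '{' is written.
            sb.append("a:").append(list.size()).append(":{");
            for (int i = 0; i < list.size(); i++) {
                sb.append("i:").append(i).append(';'); // implicit integer keys 0..n-1
                write(sb, list.get(i));
            }
            sb.append('}');
        } else if (v instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) v;
            sb.append("a:").append(map.size()).append(":{");
            for (Map.Entry<?, ?> e : map.entrySet()) {
                write(sb, String.valueOf(e.getKey())); // keys written as strings
                write(sb, e.getValue());
            }
            sb.append('}');
        } else {
            throw new IllegalArgumentException("unsupported type: " + v.getClass());
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(java.util.Arrays.asList("foo", "bar")));
        // a:2:{i:0;s:3:"foo";i:1;s:3:"bar";}
    }
}
```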

In order to differentiate between the two, I propose we rename the
serialized writer to PHPSerializedResponseWriter to avoid any conflicts with
the original "eval" PHPResponseWriter and configure them as such:




I'd just like to get some opinions / feedback on the above, and also to figure
out the best approach to achieving this goal before I start making any
changes.

thanks,
Pieter