[jira] Assigned: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley reassigned SOLR-139: -- Assignee: Ryan McKinley Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Attachments: SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-XmlUpdater.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Solr nightly build failure
init-forrest-entities: [mkdir] Created dir: /tmp/apache-solr-nightly/build checkJunitPresence: compile-common: [mkdir] Created dir: /tmp/apache-solr-nightly/build/common [javac] Compiling 25 source files to /tmp/apache-solr-nightly/build/common [javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/common/params/DisMaxParams.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. compile: [mkdir] Created dir: /tmp/apache-solr-nightly/build/core [javac] Compiling 194 source files to /tmp/apache-solr-nightly/build/core [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. compile-solrj-core: [mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj [javac] Compiling 21 source files to /tmp/apache-solr-nightly/build/client/solrj [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compile-solrj: [javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/embedded/JettySolrRunner.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. compileTests: [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests [javac] Compiling 55 source files to /tmp/apache-solr-nightly/build/tests [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. 
[javac] Note: Recompile with -Xlint:unchecked for details. junit: [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results [junit] Running org.apache.solr.BasicFunctionalityTest [junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 21.204 sec [junit] Running org.apache.solr.ConvertedLegacyTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.779 sec [junit] Running org.apache.solr.DisMaxRequestHandlerTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 6.733 sec [junit] Running org.apache.solr.EchoParamsTest [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.036 sec [junit] Running org.apache.solr.OutputWriterTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.864 sec [junit] Running org.apache.solr.SampleTest [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.056 sec [junit] Running org.apache.solr.analysis.TestBufferedTokenStream [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.056 sec [junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.069 sec [junit] Running org.apache.solr.analysis.TestKeepWordFilter [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.067 sec [junit] Running org.apache.solr.analysis.TestPatternReplaceFilter [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.088 sec [junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.083 sec [junit] Running org.apache.solr.analysis.TestPhoneticFilter [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.101 sec [junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.073 sec [junit] Running org.apache.solr.analysis.TestSynonymFilter [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.126 sec [junit] Running org.apache.solr.analysis.TestTrimFilter [junit] 
Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.061 sec [junit] Running org.apache.solr.analysis.TestWordDelimiterFilter [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 10.284 sec [junit] Running org.apache.solr.common.SolrDocumentTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.078 sec [junit] Running org.apache.solr.common.params.SolrParamTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.078 sec [junit] Running org.apache.solr.common.util.ContentStreamTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.171 sec [junit] Running org.apache.solr.common.util.IteratorChainTest [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.064 sec
Build failed in Hudson: Solr-Nightly #130
See http://lucene.zones.apache.org:8080/hudson/job/Solr-Nightly/130/changes

Changes:

[ryan] SOLR-133 -- found a bug in the delete XML parsing. for id's and queries with , it did not behave correctly. Adds a fix and test. Another side effect that should be noted is that this parser now accepts multiple delete commands: <delete><id>id1</id><id>id3</id><id>id4</id></delete>

[ryan] changes note for multiple deletes with delete command

[ryan] SOLR-269 -- missing changes for something committed a while ago

[ryan] SOLR-280 -- changing the SolrDocument/SolrInputDocument implementation so each one is as efficient as it can be. The API changes mostly affect solrj users.

-- [...truncated 869 lines...]

A client/ruby/solr-ruby/test/unit/document_test.rb A client/ruby/solr-ruby/test/unit/standard_response_test.rb AUclient/ruby/solr-ruby/test/unit/delimited_file_source_test.rb A client/ruby/solr-ruby/test/unit/xpath_test_file.xml AUclient/ruby/solr-ruby/test/unit/array_mapper_test.rb A client/ruby/solr-ruby/test/unit/field_test.rb AUclient/ruby/solr-ruby/test/unit/solr_mock_base.rb A client/ruby/solr-ruby/test/unit/add_document_test.rb AUclient/ruby/solr-ruby/test/unit/request_test.rb A client/ruby/solr-ruby/test/unit/commit_test.rb AUclient/ruby/solr-ruby/test/unit/xpath_mapper_test.rb AUclient/ruby/solr-ruby/test/unit/suite.rb A client/ruby/solr-ruby/test/unit/ping_test.rb A client/ruby/solr-ruby/test/unit/dismax_request_test.rb A client/ruby/solr-ruby/test/unit/response_test.rb AUclient/ruby/solr-ruby/test/unit/indexer_test.rb AUclient/ruby/solr-ruby/test/unit/connection_test.rb A client/ruby/solr-ruby/test/unit/delete_test.rb AUclient/ruby/solr-ruby/test/unit/tab_delimited.txt A client/ruby/solr-ruby/test/unit/hpricot_test_file.xml AUclient/ruby/solr-ruby/test/unit/standard_request_test.rb A client/ruby/solr-ruby/test/unit/hpricot_mapper_test.rb AUclient/ruby/solr-ruby/test/unit/data_mapper_test.rb AUclient/ruby/solr-ruby/test/unit/util_test.rb A client/ruby/solr-ruby/test/functional A
client/ruby/solr-ruby/test/functional/test_solr_server.rb A client/ruby/solr-ruby/test/functional/server_test.rb A client/ruby/solr-ruby/test/conf AUclient/ruby/solr-ruby/test/conf/schema.xml A client/ruby/solr-ruby/test/conf/protwords.txt A client/ruby/solr-ruby/test/conf/stopwords.txt AUclient/ruby/solr-ruby/test/conf/solrconfig.xml A client/ruby/solr-ruby/test/conf/scripts.conf A client/ruby/solr-ruby/test/conf/admin-extra.html A client/ruby/solr-ruby/test/conf/synonyms.txt A client/ruby/solr-ruby/LICENSE.txt A client/ruby/solr-ruby/Rakefile A client/ruby/solr-ruby/script AUclient/ruby/solr-ruby/script/setup.rb AUclient/ruby/solr-ruby/script/solrshell A client/ruby/solr-ruby/lib A client/ruby/solr-ruby/lib/solr AUclient/ruby/solr-ruby/lib/solr/util.rb A client/ruby/solr-ruby/lib/solr/document.rb A client/ruby/solr-ruby/lib/solr/exception.rb AUclient/ruby/solr-ruby/lib/solr/indexer.rb AUclient/ruby/solr-ruby/lib/solr/response.rb AUclient/ruby/solr-ruby/lib/solr/connection.rb A client/ruby/solr-ruby/lib/solr/importer AUclient/ruby/solr-ruby/lib/solr/importer/delimited_file_source.rb AUclient/ruby/solr-ruby/lib/solr/importer/solr_source.rb AUclient/ruby/solr-ruby/lib/solr/importer/array_mapper.rb AUclient/ruby/solr-ruby/lib/solr/importer/mapper.rb AUclient/ruby/solr-ruby/lib/solr/importer/xpath_mapper.rb A client/ruby/solr-ruby/lib/solr/importer/hpricot_mapper.rb A client/ruby/solr-ruby/lib/solr/xml.rb AUclient/ruby/solr-ruby/lib/solr/importer.rb A client/ruby/solr-ruby/lib/solr/field.rb AUclient/ruby/solr-ruby/lib/solr/solrtasks.rb A client/ruby/solr-ruby/lib/solr/request A client/ruby/solr-ruby/lib/solr/request/ping.rb A client/ruby/solr-ruby/lib/solr/request/select.rb AUclient/ruby/solr-ruby/lib/solr/request/optimize.rb AUclient/ruby/solr-ruby/lib/solr/request/standard.rb A client/ruby/solr-ruby/lib/solr/request/delete.rb AUclient/ruby/solr-ruby/lib/solr/request/index_info.rb A client/ruby/solr-ruby/lib/solr/request/update.rb A 
client/ruby/solr-ruby/lib/solr/request/dismax.rb A client/ruby/solr-ruby/lib/solr/request/add_document.rb A client/ruby/solr-ruby/lib/solr/request/commit.rb A client/ruby/solr-ruby/lib/solr/request/base.rb AUclient/ruby/solr-ruby/lib/solr/request.rb A client/ruby/solr-ruby/lib/solr/response A client/ruby/solr-ruby/lib/solr/response/ping.rb AU
[jira] Closed: (SOLR-163) libxml/rexml-related test case failure
[ https://issues.apache.org/jira/browse/SOLR-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher closed SOLR-163. - Resolution: Fixed libxml/rexml-related test case failure -- Key: SOLR-163 URL: https://issues.apache.org/jira/browse/SOLR-163 Project: Solr Issue Type: Bug Components: clients - ruby - flare Reporter: Erik Hatcher http://www.nabble.com/solrb-testing--tf3213880.html#a8949745 1) Failure: test_delete_by_i18n_query_request(DeleteTest) [./test/unit/ delete_test.rb:53]: delete\n queryëäïöü/query\n/ delete expected to be =~ /delete[\s]*query\303\253\303\244\303\257\303\266\303\274\/ query[\s]*\/delete/m. 2) Failure: test_i18n_xml(FieldTest) [./test/unit/field_test.rb:39]: field name=\i18nstring\Äêâîôû Öëäïöü/field expected to be =~ /field name=[']i18nstring[']\303\204\303\252\303\242\303\256\303 \264\303\273 \303\226\303\253\303\244\303\257\303\266\303\274\/ field/m. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-280) slightly more efficient SolrDocument implementation
[ https://issues.apache.org/jira/browse/SOLR-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-280:
--
Attachment: SOLR-280-SolrDocument2-API-Compatibility.patch

> The API changes mostly affect solrj users.

being one of those heavily affected users, i created the attached patch to make us unaffected (or at least i went from a few hundred compile errors to 0). the following methods were added back and are mostly 1-5 line wrappers around the existing methods or underlying data structures:

setField(String, Object)
getFieldValue(String)
getFieldValues(String)
addField(String, Object)
getFieldNames()

- will

slightly more efficient SolrDocument implementation
---
Key: SOLR-280
URL: https://issues.apache.org/jira/browse/SOLR-280
Project: Solr
Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Priority: Minor
Attachments: SOLR-280-SolrDocument2-API-Compatibility.patch, SOLR-280-SolrDocument2.patch, SOLR-280-SolrDocument2.patch

Following discussion in SOLR-272

This implementation stores fields as a Map<String,Object> rather than a Map<String,Collection<Object>>. The API changes slightly in that:

getFieldValue( name ) returns a Collection if there is more than one value for the field and an Object if there is only one.
getFirstValue( name ) returns a single value for the field.

This is intended to make things easier for client applications.
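The storage scheme described in the SOLR-280 summary (one map entry per field, promoted to a collection only when a second value arrives) can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: the class name SketchDocument and the choice of LinkedHashMap/ArrayList are assumptions, and only the method names come from the messages above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; NOT the actual SolrDocument class.
class SketchDocument {
    // A field maps to a bare Object while it has one value,
    // and to a Collection once a second value is added.
    private final Map<String, Object> fields = new LinkedHashMap<>();

    /** Replaces whatever was stored for the field. */
    void setField(String name, Object value) {
        fields.put(name, value);
    }

    /** Adds a value; a single value is promoted to a collection on the second add. */
    @SuppressWarnings("unchecked")
    void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            fields.put(name, value);
        } else if (existing instanceof Collection) {
            // Assumes the stored collection is mutable.
            ((Collection<Object>) existing).add(value);
        } else {
            Collection<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            fields.put(name, values);
        }
    }

    /** Returns a Collection for multi-valued fields, a plain Object otherwise. */
    Object getFieldValue(String name) {
        return fields.get(name);
    }

    /** Always returns a single value (the first one for multi-valued fields). */
    Object getFirstValue(String name) {
        Object v = fields.get(name);
        if (v instanceof Collection) {
            Collection<?> c = (Collection<?>) v;
            return c.isEmpty() ? null : c.iterator().next();
        }
        return v;
    }

    Collection<String> getFieldNames() {
        return fields.keySet();
    }
}
```

The trade-off is visible in getFieldValue: callers get less allocation for the common single-valued case, but have to check whether they received a Collection, which is what getFirstValue is meant to spare them.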
[jira] Commented: (SOLR-280) slightly more efficient SolrDocument implementation
[ https://issues.apache.org/jira/browse/SOLR-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509602 ]

Ryan McKinley commented on SOLR-280:

in rev552521, I changed the Float variables to float and default everything to 1.0. if we have wrapper functions, this seems better than autoboxing/checking null values.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509615 ]

Ryan McKinley commented on SOLR-139:

So you are suggesting pulling this out of the UpdateHandler and managing the document merging in the UpdateRequestProcessor? (this might make sense - it was not an option when the patch started in feb)

How can the UpdateHandler get access to pending documents? should it just use req.getSearcher()?

> example1: a userTag field that represents tags on objects of the form user#tagstring. If user==member, then add tagstring to the indexed-only ownerTags field, else add the tagstring to the socialTags field.
> example2: an UpdateRequestProcessor is used to encode the value of a field with rot13... this should obviously only be done for new field values, and not values that are just being re-stored, so the UpdateRequestProcessor needs to be able to distinguish between the two.

1 & 2 seem pretty straightforward

> example3: some field values are pulled from a database when missing rather than being stored values.

Do you mean as input or output? The UpdateRequestProcessor could not affect whether a field is stored or not; it could augment a document with more fields *before* it is indexed. To add fields from a database rather than storing them, we would need a hook at the end.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509622 ]

Yonik Seeley commented on SOLR-139:
---

> So you are suggesting [...]

I don't have a concrete implementation idea, I'm just going over all the things I know people will want to do (and many of these I have an immediate use for).

> Do you mean as input or output?

Input, for index-only fields. Normally field values need to be stored for an update to work, but we could also allow the user to get these field values from an external source.

> we would need a hook at the end.

Yes, it might make sense to have more than one callback method per UpdateRequestProcessor.

Of course now that I finally look at the code, UpdateRequestProcessor isn't quite what I expected. I was originally thinking more along the lines of DocumentMutator(s) that manipulate a document, not that actually initiate the add/delete/update calls. But there is a certain greater power to what you are exposing/allowing too (as long as you don't need multiple of them).

In UpdateRequestProcessor, instead of
    protected final NamedList<Object> response;
why not just expose SolrQueryRequest, SolrQueryResponse?
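For readers following the SOLR-139 thread, the basic idea in the issue summary (rebuild a full document from the stored field values plus the changed fields, with an optional increment for numbers) might look something like the sketch below. It is only an illustration under stated assumptions: PartialUpdateSketch, Mode and merge are hypothetical names, plain maps stand in for documents, and none of this is taken from the attached patches or from UpdateRequestProcessor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of merging a partial update into stored field values.
class PartialUpdateSketch {
    // How a changed field is combined with the stored value (assumed modes).
    enum Mode { OVERWRITE, INCREMENT }

    static Map<String, Object> merge(Map<String, Object> stored,
                                     Map<String, Object> changes,
                                     Map<String, Mode> modes) {
        // Start from the stored values: fields not mentioned in the update are kept.
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        for (Map.Entry<String, Object> change : changes.entrySet()) {
            String name = change.getKey();
            Object value = change.getValue();
            Mode mode = modes.getOrDefault(name, Mode.OVERWRITE);
            if (mode == Mode.INCREMENT) {
                Object old = merged.get(name);
                // Incrementing only makes sense for numbers, as the issue notes.
                if (!(value instanceof Number) || (old != null && !(old instanceof Number))) {
                    throw new IllegalArgumentException("cannot increment non-numeric field: " + name);
                }
                long base = (old == null) ? 0L : ((Number) old).longValue();
                merged.put(name, base + ((Number) value).longValue());
            } else {
                merged.put(name, value);
            }
        }
        return merged;
    }
}
```

The merged map would then be indexed as a complete document. The limitation raised in the thread shows up directly: a field that is indexed but not stored is absent from `stored`, so it is lost unless something (for example a hook that consults an external source) puts it back before indexing.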
Re: removing most @author tags
: In the spirit of shared ownership, what do people think of getting rid
: of @author tags (for committers or other dev people that consent?).
: Other apache projects have done so, for a host of reasons.

+1

: $ find . -name \*.java | xargs grep '@author'| grep -i hoss | wc
:       2       8     152

wow ... that's 2 more than i expected to see.

-Hoss
[jira] Commented: (SOLR-277) Character Entity of XHTML is not supported with XmlUpdateRequestHandler .
[ https://issues.apache.org/jira/browse/SOLR-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509638 ]

Hoss Man commented on SOLR-277:
---

(FWIW: the old XmlUpdateRequestHandler has been renamed XppUpdateRequestHandler and deprecated, the current XmlUpdateRequestHandler uses stax.)

This type of functionality seems like it might be a handy option for people to have if they know they have non standard entities in their input -- but i think by default we want to be strict about our XML parsing.

perhaps an optional init param could be added to the XmlUpdateRequestHandler where a filename containing mappings like this could be specified? (this is assuming stax has something akin to the parser.defineEntityReplacementText method used in the patch)

Character Entity of XHTML is not supported with XmlUpdateRequestHandler .
-
Key: SOLR-277
URL: https://issues.apache.org/jira/browse/SOLR-277
Project: Solr
Issue Type: Improvement
Components: update
Affects Versions: 1.3
Reporter: Toru Matsuzawa
Attachments: XmlUpdateRequestHandler.patch

Character Entity of XHTML is not supported with XmlUpdateRequestHandler .
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-special.ent
http://www.w3.org/TR/xhtml1/DTD/xhtml-symbol.ent

This needs to be handled in XmlUpdateRequestHandler because xpp3 cannot use <!DOCTYPE>. I think it is necessary until StaxUpdateRequestHandler becomes /update.
Re: removing most @author tags
> Thoughts?

+1 It does feel a bit awkward.
[jira] Created: (SOLR-284) Parsing Rich Document Types
Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Affects Versions: 1.3 Reporter: Eric Pugh Fix For: 1.3 I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, Excel, or PDF document into Solr. I am attaching a patch file with the code changes, and if this looks good, will add a page similar to http://wiki.apache.org/solr/UpdateCSV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Pugh updated SOLR-284:
---
Attachment: test-files.zip

test files to go in test/test-files for unit testing.
[jira] Commented: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509676 ]

Ryan McKinley commented on SOLR-284:

I haven't run this patch, but have a few questions...

What is the *general* approach to extract a lucene document (list of fields) from a PDF? Word? Powerpoint? Is this just access to a few common fields like author, keywords, text, etc? Is this something that realistically would need to be custom for each case?

Perhaps it makes sense to add a contrib section for this sort of stuff. It seems weird to add 10 library dependencies to the core distribution. How does nutch handle this?
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Pugh updated SOLR-284:
---
Attachment: libs.zip

new jars to go in trunk/lib for pdf and office parsing...
Re: removing most @author tags
On 2-Jul-07, at 11:32 AM, Yonik Seeley wrote:

> In the spirit of shared ownership, what do people think of getting rid of @author tags (for committers or other dev people that consent?). Other apache projects have done so, for a host of reasons.
> - some people don't use author tags, hence credit is uneven
> - author tags tend to only credit the original author, and not everyone that works on the code after (or does code reviews, lends ideas, etc, etc)
> - we have CHANGES.txt to generally credit people (and it prob does a better job)
>
> I've seen a better list of reasons elsewhere, but my main motivation was that it didn't feel right having my name splashed all over code that many other people are contributing to now.
>
> Thoughts?

+0, though I think it is mostly a decision for those who already have tons of @author tags in the repo.

FWIW, our internal repository was in a similar situation: I was __author__ of 90% of the files, though certainly not the sole contributor to all of those files. I decided to strip this attribution for precisely the reasons you enumerated.

-Mike
[jira] Commented: (SOLR-225) Allow pluggable Highlighting classes -- Formatters and Fragmenters
[ https://issues.apache.org/jira/browse/SOLR-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509709 ]

Mike Klaas commented on SOLR-225:
-

Looking great Ryan (again, only commenting on the Highlighting configurability parts)

should:

    protected boolean emptyArray(String[] arr) {
      return (arr == null || arr.length == 0 || arr[0] == null || arr[0].trim().length() == 0);
    }

perhaps be defined as

    protected boolean emptyArray(String[] arr) {
      return (arr == null || arr.length == 0 || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0)));
    }

Allow pluggable Highlighting classes -- Formatters and Fragmenters
--
Key: SOLR-225
URL: https://issues.apache.org/jira/browse/SOLR-225
Project: Solr
Issue Type: Improvement
Reporter: Brian Whitman
Assignee: Ryan McKinley
Attachments: SOLR-225+260-HighlightPlugins.patch, SOLR-225+260-HighlightPlugins.patch, SOLR-225-HighlightingConfig.patch, SOLR-225-HighlightingConfig.patch, SOLR-225-HighlightingConfig.patch, SOLR-225-HighlightingConfig.patch, SOLR-225-HighlightingConfig.patch

Highlighting should support a pluggable architecture similar to what is seen with RequestHandlers, Fields, FieldTypes, etc.

For more background: http://www.nabble.com/Custom-fragmenter-tf3681588.html#a10289335
[jira] Issue Comment Edited: (SOLR-225) Allow pluggable Highlighting classes -- Formatters and Fragmenters
[ https://issues.apache.org/jira/browse/SOLR-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509709 ]

Mike Klaas edited comment on SOLR-225 at 7/2/07 4:14 PM:
-

Looking great Ryan (again, only commenting on the Highlighting configurability parts)

should:

    protected boolean emptyArray(String[] arr) {
      return (arr == null || arr.length == 0 || arr[0] == null || arr[0].trim().length() == 0);
    }

perhaps be defined as

    protected boolean emptyArray(String[] arr) {
      return (arr == null || arr.length == 0 || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0)));
    }

Params:

+ public static final String HIGHLIGHT = "hl";
+ public static final String PREFIX = "hl.";
+ public static final String FIELDS = PREFIX+"fl";
+ public static final String SNIPPETS = PREFIX+"snippets";
+ public static final String FRAGSIZE = PREFIX+"fragsize";
+ public static final String INCREMENT = PREFIX+"increment";
+ public static final String SLOP = PREFIX+"slop";

perhaps this should be PREFIX + 'regex.slop'?

+ public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars";

similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice to have them all in one place...
[jira] Updated: (SOLR-285) Server Side XSLT for update processing
[ https://issues.apache.org/jira/browse/SOLR-285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-285:
--
Attachment: xslt_updater.diff

this is mainly just a proof of concept ... there is a lot of room for improvement here .. this reuses the same TransformerProvider as the XSLTResponseWriter but doesn't even try to use the cache (even if it did, using it in conjunction with XSLTResponseWriter would constantly invalidate the cache)

the biggest improvement would be to find some way to pipeline the XSLT transformation into the Stax parsing ... i tried to at least use a DOMResult for the transformer and a DOMSource for the XMLStreamReader but i got this exception...

SEVERE: java.lang.UnsupportedOperationException: XMLInputFactory.createXMLStreamReader(javax.xml.transform.dom.DOMSource) not yet implemented
    at com.bea.xml.stream.MXParserFactory.createXMLStreamReader(MXParserFactory.java:70)

...oh well.

patch also includes a simple rss2solr.xml stylesheet that does some very simplistic/silly transformations to match the example schema.xml

comments from people who understand javax.xml.* better than i do would be greatly appreciated.

Server Side XSLT for update processing
--
Key: SOLR-285
URL: https://issues.apache.org/jira/browse/SOLR-285
Project: Solr
Issue Type: New Feature
Reporter: Hoss Man
Attachments: xslt_updater.diff

Ideally, we should support a way for people to send XML ContentStreams to Solr and do server side XSLT processing to convert them (much like the XSLTResponseWriter supports server side XSLT processing of responses).
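As a point of comparison for the pipelining question in SOLR-285, here is a rough sketch of the simplest non-streaming arrangement using only standard javax.xml APIs: run the XSLT into an in-memory buffer, then hand that buffer to a StAX reader. It sidesteps the DOMSource limitation by going through text, at the cost of holding the whole transformed document in memory. The class name, the inline stylesheet and the field layout are assumptions made for the example; this is not the code in xslt_updater.diff or the rss2solr.xml stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration: transform first, buffer the result, then parse with StAX.
class XsltThenStaxSketch {
    // Made-up stylesheet that turns RSS items into <add><doc><field .../></doc></add>.
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/rss'>"
        + "<add><xsl:for-each select='channel/item'>"
        + "<doc><field name='title'><xsl:value-of select='title'/></field></doc>"
        + "</xsl:for-each></add>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Returns the title field values found in the transformed document. */
    static List<String> titles(String rssXml) throws Exception {
        // Step 1: XSLT into an in-memory buffer (not streaming).
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter buffer = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(rssXml)),
                              new StreamResult(buffer));

        // Step 2: StAX over the buffered text.
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(buffer.toString()));
        List<String> result = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "field".equals(reader.getLocalName())
                    && "title".equals(reader.getAttributeValue(null, "name"))) {
                result.add(reader.getElementText());
            }
        }
        reader.close();
        return result;
    }
}
```

A true pipeline would need the transformer to feed the parser incrementally, which is the part the comment above asks for help with; the buffered version is only a baseline that works with any StAX implementation.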
[jira] Commented: (SOLR-225) Allow pluggable Highlighting classes -- Formatters and Fragmenters
[ https://issues.apache.org/jira/browse/SOLR-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509721 ]
Ryan McKinley commented on SOLR-225:

> perhaps be defined as
> protected boolean emptyArray(String[] arr) { return (arr == null || arr.length == 0 || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0))); }

seems good. This patch tried not to change any highlighting logic; it just moved it from the existing HighlightingUtils.java. I will add this change.

> + public static final String MAX_CHARS = PREFIX+"maxAnalyzedChars";
> similarly. Seems somewhat inelegant to define/hardcode the plugin-specific parameters here, though it is nice to have them all in one place...

I'm torn on what is more/less elegant. Should we have a new class in o.a.s.common.params for each plugin? Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections for the core plugin options seems ok. But I can easily be talked out of this...
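To make the difference between the two emptyArray definitions concrete, here is a small self-contained sketch of the revised check. The holder class name is hypothetical; only the method body follows the definition discussed in this thread. The original check looked at arr[0] regardless of length, so an array such as {"", "title"} was reported as empty; the revised check only treats a blank first element as "empty" when it is the sole element.

```java
// Hypothetical holder class; only emptyArray itself comes from the thread.
final class HighlightArrays {
    private HighlightArrays() {}

    /**
     * True when the array carries no usable value: it is null, has no
     * elements, or has exactly one element that is null or blank.
     */
    static boolean emptyArray(String[] arr) {
        return arr == null
            || arr.length == 0
            || (arr.length == 1 && (arr[0] == null || arr[0].trim().length() == 0));
    }
}
```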
Re: removing most @author tags
Yonik Seeley wrote:
> In the spirit of shared ownership, what do people think of getting rid of @author tags (for committers or other dev people that consent?). Other apache projects have done so, for a host of reasons.
> - some people don't use author tags, hence credit is uneven
> - author tags tend to only credit the original author, and not everyone that works on the code after (or does code reviews, lends ideas, etc, etc)
> - we have CHANGES.txt to generally credit people (and it prob does a better job)

you forgot another big reason: people tend to email people in the @author tags directly, instead of using the lists.
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509725 ]
Yonik Seeley commented on SOLR-269:

Looking at UpdateRequestProcessor further, it seems like these should be singletons (instance per entry in solrconfig, no factory needed), and any extra state that is needed should be added to classes we already have (like AddCommand, etc), no?

UpdateRequestProcessorFactory - process requests before submitting them
Key: SOLR-269
URL: https://issues.apache.org/jira/browse/SOLR-269
Project: Solr
Issue Type: New Feature
Reporter: Ryan McKinley
Assignee: Ryan McKinley
Fix For: 1.3
Attachments: SOLR-269-UpdateRequestProcessorFactory.patch

A simple UpdateRequestProcessor was added to a bloated SOLR-133 commit. An UpdateRequestProcessor lets clients plug in logic after a document has been parsed and before it has been 'updated' with the index. This is a good place to add custom logic for:
* transforming the document fields
* fine grained authorization (can user X update document Y?)
* allow update, but not delete (by query?)

<requestHandler name="/update" class="solr.StaxUpdateRequestHandler">
  <str name="update.processor.class">org.apache.solr.handler.UpdateRequestProcessor</str>
  <lst name="update.processor.args">
    ... (optionally pass in arguments to the factory init method) ...
  </lst>
</requestHandler>

http://www.nabble.com/Re%3A-svn-commit%3A-r547495---in--lucene-solr-trunk%3A-example-solr-conf-solrconfig.xml-src-java-org-apache-solr-handler-StaxUpdateRequestHandler.java-src-java-org-apache-solr-handler-UpdateRequestProcessor.jav-tf3950072.html#a11206583
[jira] Commented: (SOLR-225) Allow pluggable Highlighting classes -- Formatters and Fragmenters
[ https://issues.apache.org/jira/browse/SOLR-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509728 ]
Mike Klaas commented on SOLR-225:

> I'm torn on what is more/less elegant. Should we have a new class in o.a.s.common.params for each plugin? Since the number of 'core' plugins will be relatively small, having a single HighlightParams class with sections for the core plugin options seems ok. But I can easily be talked out of this...

Seems ok to me too. Spreading everything into a jumble of classes won't exactly help coherence.
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509732 ]
Yonik Seeley commented on SOLR-269:

I think the newly added incremental time should not be on by default, as well as logging per id for deletes and adds. Mike added the id aggregation code specifically because logging each add was taking so much time.
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509733 ]
Ryan McKinley commented on SOLR-269:

maybe. I'm not sure I totally understand your suggestion though.

I need something that is easily subclassed and can cleanly hold state across an entire request cycle. The alternative is to pass the SolrQueryRequest/Response into each action and maybe pull out the schema/updateHandler/logged in user/etc for each command (each document in the list of 100).

Is the factory a performance concern?

(to my tastes) it seems nicer to work with:

  processDelete( DeleteUpdateCommand cmd )
  {
    if( user.isAdmin() ) {
      updateHandler.delete( cmd );
    }
    else { ... }
  }

than:

  processDelete( DeleteUpdateCommand cmd, SolrQueryRequest req, SolrQueryResponse rsp )
  {
    User user = req.getContext().get( "user" );
    if( user.isAdmin() ) {
      SolrCore core = req.getCore();
      SolrSchema schema = core.getSchema();
      UpdateHandler updateHandler = core.getUpdateHandler();
      updateHandler.delete( cmd );
    }
    else { ... }
  }

I'm fine either way, but I like the easy one-per-request interface.
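As a rough illustration of the factory-per-request shape argued for in this comment, here is a self-contained sketch. Every type in it (User, DeleteCommand, PerRequestProcessor, PerRequestProcessorFactory) is a hypothetical stand-in rather than a class from the SOLR-269 patch, and a plain list of ids stands in for the update handler. The point is only that state resolved once per request can live in fields, so processDelete takes nothing but the command.

```java
import java.util.ArrayList;
import java.util.List;

// Every type below is a hypothetical stand-in, not one of Solr's classes.
final class User {
    final boolean admin;
    User(boolean admin) { this.admin = admin; }
}

final class DeleteCommand {
    final String id;
    DeleteCommand(String id) { this.id = id; }
}

/** Created once per request, so per-request state lives in plain fields. */
final class PerRequestProcessor {
    private final User user;
    private final List<String> deletedIds; // stand-in for the update handler

    PerRequestProcessor(User user, List<String> deletedIds) {
        this.user = user;
        this.deletedIds = deletedIds;
    }

    /** Applies the delete only for admins; returns whether it was applied. */
    boolean processDelete(DeleteCommand cmd) {
        if (!user.admin) {
            return false;
        }
        deletedIds.add(cmd.id);
        return true;
    }
}

/** Configured once; hands out one processor instance per request. */
final class PerRequestProcessorFactory {
    final List<String> deletedIds = new ArrayList<>();

    PerRequestProcessor getInstance(User user) {
        return new PerRequestProcessor(user, deletedIds);
    }
}
```

The cost of this design is one extra object creation per request, which is the overhead questioned for single-document adds.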
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509734 ]
Ryan McKinley commented on SOLR-269:

> I think the newly added incremental time should not be on by default, as well as logging per id for deletes and adds. Mike added the id aggregation code specifically because logging each add was taking so much time.

sounds good. the testing I did showed that lots of time is spent in the logging phase. I will remove it from the default implementation.
[jira] Resolved: (SOLR-160) [Patch] Get Test Solr Server working in Windows environment
[ https://issues.apache.org/jira/browse/SOLR-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mel Riffe resolved SOLR-160.
Resolution: Fixed

let me know if anyone else has problems with this patch; i'm moving the majority (99%) of my Rails development to the Mac but will support as needed.

[Patch] Get Test Solr Server working in Windows environment
Key: SOLR-160
URL: https://issues.apache.org/jira/browse/SOLR-160
Project: Solr
Issue Type: Improvement
Components: clients - ruby - flare
Environment: Windows XP Home
Reporter: Mel Riffe
Attachments: win32_functional_tests.patch, win32_functional_tests.results

Because Windows does not support forking processes I created a patch that uses the Win32 API to create and destroy a process to control the test solr server. I have attached two files: 1) the patch and 2) the results from running 'rake test'. In my environment I have two failures. My approach was to still support the including/requiring of the single file test/functional/test_solr_server.rb and have it further require the platform dependent start/stop api.
[jira] Commented: (SOLR-269) UpdateRequestProcessorFactory - process requests before submitting them
[ https://issues.apache.org/jira/browse/SOLR-269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509737 ]
Yonik Seeley commented on SOLR-269:

> I need something that is easily subclassed and can cleanly holds state across an entire request cycle.

Having a factory and separate object so that one can use core instead of req.getCore(), etc, seems like overkill for the normal case though, since getCore(), getSchema(), getUpdateHandler() all just return instance variables. I was thinking any state like that could be on the UpdateCommand.

I'd like to have potentially several request processors, but if people start doing single doc add requests, instantiating and initializing all those request processors will get expensive.

I do see your use case though, in the case of multiple docs per add where you have some expensive state you only want to calculate once. If it's a relatively rare case, one could put it in the request context. The tradeoff would be an extra hash lookup per document of a multi-document add vs an extra object creation for single-doc adds.

Different Q on usage: is this where my document mutator stuff should go??? If I want a transformation done on a field, regardless of where the data is coming from (XML update handler, CSV update handler, future REST update handler, etc), how should that be done? Is there a single place I can register a plugin to do this, and is UpdateRequestProcessor where you see it happening?
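The singleton alternative described in this comment, with rarely needed expensive state parked in the request context, could look roughly like the sketch below. The class name, the context key, and the use of a plain Map as the request context are all assumptions made for illustration; they are not Solr's actual API. The sketch shows the tradeoff named above: one shared processor instance, one hash lookup per document, and the expensive state computed once per request.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical singleton processor: one instance is shared by all requests.
final class SharedProcessor {
    // Hypothetical key under which per-request state is cached.
    static final String STATE_KEY = "expensiveState";

    // Counts how often the expensive state was actually computed.
    final AtomicInteger computations = new AtomicInteger();

    /**
     * Processes one document. The request context (a plain Map here, by
     * assumption) is consulted once per document; the expensive state is
     * computed only on the first document of each request.
     */
    String processAdd(String docId, Map<Object, Object> requestContext) {
        Object state = requestContext.computeIfAbsent(STATE_KEY, k -> computeExpensiveState());
        return state + ":" + docId;
    }

    private String computeExpensiveState() {
        computations.incrementAndGet();
        return "state";
    }
}
```

A multi-document add that passes the same context map for each document pays for the computation once; a single-document add pays for no extra processor object at all.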