[jira] Commented: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508993 ]

Yonik Seeley commented on SOLR-272:
---

> The one big difference to Yonik's suggestion above is that it returns a
> Collection for getFieldValues() even if it is a single valued field

That's a good change as it leads to simpler client code. I think that getFieldValue() should perhaps return the raw entry (an Object or a Collection) for those (like the indexer) who would want the most efficient access.

> SolrDocument performance testing
>
> Key: SOLR-272
> URL: https://issues.apache.org/jira/browse/SOLR-272
> Project: Solr
> Issue Type: Test
> Affects Versions: 1.3
> Reporter: Ryan McKinley
> Attachments: SOLR-272-SolrDocumentPerformanceTesting.patch,
> SOLR-272-SolrDocumentPerformanceTesting.patch,
> SolrDocumentPerformanceTester.java, SolrDocumentPerformanceTester.java,
> SolrInputDoc.patch, SolrInputDoc.patch
>
> In 1.3, we added SolrInputDocument -- a temporary class to hold document
> information. There is concern that this may be less than ideal
> performance-wise.
> To settle some concerns (mine included) I want to compare a few SolrDocument
> implementations to make sure we are not doing something crazy.
> I implemented a LuceneInputDocument subclass of SolrInputDocument that stores
> its values directly in Lucene Document (rather than a Map).
> This is a quick test comparing:
> 1. Building documents with SolrInputDocument
> 2. Building documents with LuceneInputDocument (same interface writing
> directly to Document)
> 3. using DocumentBuilder (solr 1.2, solr 1.1)

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
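The accessor split discussed in this comment can be illustrated with a small sketch. This is not the SolrInputDocument code from the attached patch: the class name SketchDocument and its addField method are hypothetical, and the sketch assumes values are non-null and are not themselves Collections. It only shows one way getFieldValue() could hand back the raw entry while getFieldValues() always returns a Collection.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not Solr's SolrInputDocument.
// Assumes field values are non-null and are not themselves Collections.
class SketchDocument {
    // Each entry is either a bare Object (one value) or a Collection<Object> (several values).
    private final Map<String, Object> fields = new LinkedHashMap<>();

    @SuppressWarnings("unchecked")
    void addField(String name, Object value) {
        Object existing = fields.get(name);
        if (existing == null) {
            // First value: store it bare so no Collection is allocated.
            fields.put(name, value);
        } else if (existing instanceof Collection) {
            ((Collection<Object>) existing).add(value);
        } else {
            // Second value: upgrade the bare entry to a Collection.
            Collection<Object> values = new ArrayList<>();
            values.add(existing);
            values.add(value);
            fields.put(name, values);
        }
    }

    // Raw access: returns whatever is stored, an Object or a Collection.
    Object getFieldValue(String name) {
        return fields.get(name);
    }

    // Uniform access: always a Collection, even for a single-valued field.
    @SuppressWarnings("unchecked")
    Collection<Object> getFieldValues(String name) {
        Object raw = fields.get(name);
        if (raw == null) {
            return null;
        }
        if (raw instanceof Collection) {
            return (Collection<Object>) raw;
        }
        return Collections.singletonList(raw);
    }
}
```

In this sketch, client code that only wants to iterate can call getFieldValues() without type checks, while a caller such as an indexer can use getFieldValue() and skip the wrapper allocation for single-valued fields.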
Re: Multiple indexes/cores (aka solr-215) functional value?
This is precisely what I want to do. Yes, I can add JNDI entries to various Jetty XML files, but this is good only if you have a fixed set of indices known ahead of time (before starting the servlet container). I want the ability to add and remove indices on the fly, while the servlet container with Solr is running. This is where SOLR-215 comes in.

Henri, hang in there. :)

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/ - Tag - Search - Share

----- Original Message -----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Wednesday, June 27, 2007 5:00:08 PM
Subject: Re: Multiple indexes/cores (aka solr-215) functional value?

On 6/27/07, Henrib <[EMAIL PROTECTED]> wrote:
> This http://www.nabble.com/multiple-indices-tf3982573.html thread triggers
> the question again.
> Solr-215 makes it easier to deploy multiple indexes than using multiple web
> applications; but is "easier" enough for not being just a superfluous
> feature?

With a fixed handful of indices, IMO, no. Though if one needs to programmatically add new indices/schemas, SOLR-215 becomes interesting. I don't know how common a case that is, though. There are probably other use cases I've not considered.

SOLR-215 does seem unrelated to distributed search though.

-Yonik
[jira] Commented: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508890 ]

Yonik Seeley commented on SOLR-272:
---

> To be honest, I'm not sure the complexity of dealing with a
> Map (where the Object may be a
> collection or not) is worth the marginal speedup.

I'm not sure either, but one reason the speedup is marginal is that it's not the bottleneck. Other things are taking more time, like dynamic copy-field checking. I've never checked that code to see if it could be optimized, but things are quite a bit faster when all the dynamic fields are removed.

SolrInputDocument could similarly be sped up by getting rid of the Map for boosts. One could either store a bare value, or a BoostedValue.

  class BoostedValue {
    float boost;
    Object value;
  }
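A minimal sketch of the BoostedValue idea follows. Only the two fields of BoostedValue come from the comment above; the surrounding BoostSketch class, its method names, and the choice of 1.0f as the "no boost" default are assumptions made for illustration, not code from Solr or from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Wrapper used only when a field carries a non-default boost.
class BoostedValue {
    final float boost;
    final Object value;

    BoostedValue(Object value, float boost) {
        this.value = value;
        this.boost = boost;
    }
}

// Hypothetical container: one Map holds either bare values or BoostedValue
// wrappers, so no second Map of boosts is needed.
class BoostSketch {
    private static final float DEFAULT_BOOST = 1.0f; // assumed default

    private final Map<String, Object> fields = new HashMap<>();

    void setField(String name, Object value, float boost) {
        // Store the value bare in the common unboosted case.
        fields.put(name, boost == DEFAULT_BOOST ? value : new BoostedValue(value, boost));
    }

    Object getValue(String name) {
        Object entry = fields.get(name);
        return (entry instanceof BoostedValue) ? ((BoostedValue) entry).value : entry;
    }

    float getBoost(String name) {
        Object entry = fields.get(name);
        return (entry instanceof BoostedValue) ? ((BoostedValue) entry).boost : DEFAULT_BOOST;
    }
}
```

The trade-off is the same one raised in the quoted text: every read needs an instanceof check, in exchange for one Map lookup instead of two and no per-field boost entry when the boost is the default.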
[jira] Updated: (SOLR-272) SolrDocument performance testing
[ https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated SOLR-272:
---
Attachment: SolrInputDoc.patch

This is an alternative version of SolrDocument that only creates Collections for multivalued fields...

The one big difference to Yonik's suggestion above is that it returns a Collection for getFieldValues() even if it is a single valued field.

Running the perf test for 1M docs 5 times for each implementation:

[100] SolrInputDocument: 9992 9827 9823 9854 9948
[100] SolrInputDocument2: 9636 9719 9699 9807 9729
[100] DocumentBuilder: 8866 8818 8946 8812 8953

To be honest, I'm not sure the complexity of dealing with a Map (where the Object may be a collection or not) is worth the marginal speedup. I suppose if the docs are all single valued it would be a more substantial difference.
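To put a number on "marginal", the five runs per implementation can be averaged and compared. The helper below is a sketch written for this purpose, not part of the attached SolrDocumentPerformanceTester; the class and method names are hypothetical, and the message does not state the unit of the timings, so the results are only meaningful relative to one another.

```java
// Hypothetical helper for summarizing the timings quoted above.
class TimingSummary {
    // Arithmetic mean of a set of timing samples.
    static double mean(long[] samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    // How much faster the candidate is than the baseline, in percent of the baseline.
    static double percentFaster(double baseline, double candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        long[] solrInputDocument  = {9992, 9827, 9823, 9854, 9948};
        long[] solrInputDocument2 = {9636, 9719, 9699, 9807, 9729};
        long[] documentBuilder    = {8866, 8818, 8946, 8812, 8953};

        double base = mean(solrInputDocument);
        System.out.printf("SolrInputDocument  mean: %.1f%n", base);
        System.out.printf("SolrInputDocument2 mean: %.1f (%.1f%% faster)%n",
                mean(solrInputDocument2), percentFaster(base, mean(solrInputDocument2)));
        System.out.printf("DocumentBuilder    mean: %.1f (%.1f%% faster)%n",
                mean(documentBuilder), percentFaster(base, mean(documentBuilder)));
    }
}
```

The means work out to 9888.8, 9718.0, and 8879.0, so the alternative SolrInputDocument is roughly 1.7% faster than the original, while DocumentBuilder is roughly 10.2% faster.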
Re: Running Unit Tests from inside Eclipse
: > In the case of the unit tests, though, it seems like a
: > simplification of the tests to make them not dependent on external
: > configuration that is provided via Ant or any other tool
:
: Yes, I agree.
:
: Any objections to committing the setProperty part?

i'm still not sure what we're talking about exactly (am i the only one not getting the attachments?) but i'm okay with changing tests to make them run more universally ... i was just leery of some mentioned changes to Config.java (not that i know what they were mind you .. just that it seemed odd to need to change the actual code to get the tests to run in an IDE).

-Hoss
Re: Running Unit Tests from inside Eclipse
On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:
> > I have a PDF handler modeled on the CSVHandler that allows
> > you to stream a PDF document to Solr and extract the text and store
> > it.
> >
> > Cool!
> >
> > Any thoughts of a general framework for going from unstructured
> > document -> lucene document with fields? It feels like utilizing
> > Apache Tika here would be the way to go (although it's in the really
> > early stages).
> >
> > -Yonik
>
> Humm... So I have a PDF, Word, Excel, and Powerpoint, all as separate handlers. And there is a lot of duplication between them... I may try and pull out the common stuff into some sort of AbstractRichDocumentHandler, and then just add the special sauce for each one. I am close to having the basic unit tests, modeled on CSVHandler, and will post a JIRA issue with it.

Another thing to consider is document type/charset/language detection. People may not want to have to hit a different URL for each different type of document.

> I looked for Tika, but didn't see it, what is the URL?

It's *really* early (entered the incubator in March)
http://incubator.apache.org/tika/
http://www.nabble.com/Apache-Tika---Development-f20913.html

-Yonik
Re: Running Unit Tests from inside Eclipse
> > I have a PDF handler modeled on the CSVHandler that allows
> > you to stream a PDF document to Solr and extract the text and store
> > it.
>
> Cool!
>
> Any thoughts of a general framework for going from unstructured document -> lucene document with fields? It feels like utilizing Apache Tika here would be the way to go (although it's in the really early stages).
>
> -Yonik

Humm... So I have a PDF, Word, Excel, and Powerpoint, all as separate handlers. And there is a lot of duplication between them... I may try and pull out the common stuff into some sort of AbstractRichDocumentHandler, and then just add the special sauce for each one. I am close to having the basic unit tests, modeled on CSVHandler, and will post a JIRA issue with it.

I looked for Tika, but didn't see it, what is the URL?
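The refactoring described here, common code in an AbstractRichDocumentHandler plus a small format-specific piece per handler, is the template-method pattern. The sketch below only illustrates that shape. Apart from the class name AbstractRichDocumentHandler, which is taken from the message, everything is hypothetical: the method names, the field names "id" and "text", and the PlainTextHandler stand-in are assumptions. It does not use Solr's request handler API and does no real PDF, Word, Excel, or Powerpoint parsing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class: the shared steps live here, once.
abstract class AbstractRichDocumentHandler {
    // Template method: builds a simple field map from a streamed document.
    final Map<String, String> handle(String id, InputStream stream) {
        try {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", id);
            doc.put("text", extractText(stream).trim());
            return doc;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The format-specific part that each concrete handler supplies.
    protected abstract String extractText(InputStream stream) throws IOException;
}

// Stand-in for a real format handler: treats the stream as UTF-8 text.
class PlainTextHandler extends AbstractRichDocumentHandler {
    @Override
    protected String extractText(InputStream stream) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int read;
        while ((read = stream.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

A PDF or Word handler would then override only extractText, which is where the duplication between the four existing handlers would be expected to disappear.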
Re: Running Unit Tests from inside Eclipse
On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:
> Sounds great to me! In the future, should I be communicating via JIRA issues?

Code should go in JIRA issues, but you can discuss it beforehand on the dev list if you like.

> I have a PDF handler modeled on the CSVHandler that allows you to stream a PDF document to Solr and extract the text and store it.

Cool!

Any thoughts of a general framework for going from unstructured document -> lucene document with fields? It feels like utilizing Apache Tika here would be the way to go (although it's in the really early stages).

-Yonik
Re: Running Unit Tests from inside Eclipse
On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:
> > In the case of the unit tests, though, it seems like a
> > simplification of the tests to make them not dependent on external
> > configuration that is provided via Ant or any other tool
>
> Yes, I agree.
> Any objections to committing the setProperty part?
>
> -Yonik

Sounds great to me! In the future, should I be communicating via JIRA issues?

I have a PDF handler modeled on the CSVHandler that allows you to stream a PDF document to Solr and extract the text and store it.
Re: Running Unit Tests from inside Eclipse
On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:
> In the case of the unit tests, though, it seems like a simplification of the tests to make them not dependent on external configuration that is provided via Ant or any other tool

Yes, I agree.
Any objections to committing the setProperty part?

-Yonik
Re: Running Unit Tests from inside Eclipse
I agree with the thought about bending your code to fit your IDE. In the case of the unit tests, though, it seems like a simplification of the tests to make them not dependent on external configuration that is provided via Ant or any other tool.

I'm coming from the "new to Solr and don't know the ins and outs" end of things, hence why I like defining the System properties inside the Java test code.

Eric Pugh

On Jun 27, 2007, at 4:11 PM, Chris Hostetter wrote:

> : the path in Config.java. Attached is a patch file for these two
> : changes.
>
> FYI; apache mailing lists strip most attachments ... i think it works if the mime-type is text/plain, but the simplest thing to do is just include it inline in your message.
>
> (as a general philosophy, i'm opposed to code changes solely for the purpose of making IDEs happy ... IDEs should make developing code easier, not the other way around)
>
> -Hoss

---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467
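Since the patch itself was stripped from the list, the following is only a guess at the general shape of "defining the System properties inside the Java test code". The helper class, the property key example.test.home, and the fallback path are all hypothetical; the thread does not say which property the actual patch sets. The sketch sets a property only when nothing has supplied it already, so a value passed by Ant (or by -D on the command line) would still win, while a bare run from an IDE gets a usable default.

```java
// Hypothetical test helper; the property key and path below are placeholders,
// not the ones used by the patch discussed in this thread.
class TestEnvironment {
    // Returns the effective value: the existing one if present, else the fallback,
    // which is also recorded as a system property for code that reads it later.
    static String ensureProperty(String key, String fallback) {
        String current = System.getProperty(key);
        if (current == null) {
            System.setProperty(key, fallback);
            return fallback;
        }
        return current;
    }
}
```

A test could call TestEnvironment.ensureProperty("example.test.home", "build/test-home") from its setUp() method before any code under test reads the property, which keeps the main sources (such as Config.java) untouched.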
Solr nightly build failure
init-forrest-entities:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build

checkJunitPresence:

compile-common:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/common
    [javac] Compiling 24 source files to /tmp/apache-solr-nightly/build/common
    [javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/common/params/DisMaxParams.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/core
    [javac] Compiling 193 source files to /tmp/apache-solr-nightly/build/core
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
    [javac] Compiling 21 source files to /tmp/apache-solr-nightly/build/client/solrj
    [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
    [javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj
    [javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/embedded/JettySolrRunner.java uses or overrides a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.

compileTests:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
    [javac] Compiling 57 source files to /tmp/apache-solr-nightly/build/tests
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.

junit:
    [mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
    [junit] Running org.apache.solr.BasicFunctionalityTest
    [junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 21.688 sec
    [junit] Running org.apache.solr.ConvertedLegacyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.233 sec
    [junit] Running org.apache.solr.DisMaxRequestHandlerTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.904 sec
    [junit] Running org.apache.solr.EchoParamsTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.425 sec
    [junit] Running org.apache.solr.OutputWriterTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.273 sec
    [junit] Running org.apache.solr.SampleTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.231 sec
    [junit] Running org.apache.solr.analysis.TestBufferedTokenStream
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.045 sec
    [junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
    [junit] Running org.apache.solr.analysis.TestKeepWordFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.046 sec
    [junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.045 sec
    [junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.061 sec
    [junit] Running org.apache.solr.analysis.TestPhoneticFilter
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.061 sec
    [junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.046 sec
    [junit] Running org.apache.solr.analysis.TestSynonymFilter
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.072 sec
    [junit] Running org.apache.solr.analysis.TestTrimFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.049 sec
    [junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.71 sec
    [junit] Running org.apache.solr.common.SolrDocumentTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.047 sec
    [junit] Running org.apache.solr.common.params.SolrParamTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.055 sec
    [junit] Running org.apache.solr.common.util.ContentStreamTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.52 sec
    [junit] Running org.apache.solr.common.util.IteratorChainTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
    [jun