[jira] Commented: (SOLR-272) SolrDocument performance testing

2007-06-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508993
 ] 

Yonik Seeley commented on SOLR-272:
---

> The one big difference to Yonik's suggestion above is that it returns a 
> Collection for getFieldValues() even if it is a single valued field

That's a good change as it leads to simpler client code.
I think that getFieldValue() should perhaps return the raw entry (an Object or 
a Collection) for those (like the indexer) who would want the most 
efficient access.
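
A minimal sketch of the split described above, assuming a map-backed document. SketchDocument and its layout are hypothetical and are not the attached patch; it also assumes values are non-null and are not themselves Collections.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical class for illustration only; not the real SolrInputDocument.
class SketchDocument {
  // Each field maps to a bare value (single valued) or a Collection (multivalued).
  private final Map<String, Object> fields = new HashMap<String, Object>();

  @SuppressWarnings("unchecked")
  void addField(String name, Object value) {
    Object existing = fields.get(name);
    if (existing == null) {
      fields.put(name, value);                      // first value is stored bare
    } else if (existing instanceof Collection) {
      ((Collection<Object>) existing).add(value);   // already multivalued
    } else {
      Collection<Object> values = new ArrayList<Object>();
      values.add(existing);                         // promote to a Collection
      values.add(value);
      fields.put(name, values);
    }
  }

  // Raw entry: an Object or a Collection, for callers that want the cheapest access.
  Object getFieldValue(String name) {
    return fields.get(name);
  }

  // Always a Collection (or null if the field is absent), so client code has one path.
  @SuppressWarnings("unchecked")
  Collection<Object> getFieldValues(String name) {
    Object entry = fields.get(name);
    if (entry == null) {
      return null;
    }
    if (entry instanceof Collection) {
      return (Collection<Object>) entry;
    }
    return Collections.singletonList(entry);
  }
}
```

With this layout a single valued field costs no extra Collection allocation until a caller asks for getFieldValues().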


> SolrDocument performance testing
> 
>
> Key: SOLR-272
> URL: https://issues.apache.org/jira/browse/SOLR-272
> Project: Solr
>  Issue Type: Test
>Affects Versions: 1.3
>Reporter: Ryan McKinley
> Attachments: SOLR-272-SolrDocumentPerformanceTesting.patch, 
> SOLR-272-SolrDocumentPerformanceTesting.patch, 
> SolrDocumentPerformanceTester.java, SolrDocumentPerformanceTester.java, 
> SolrInputDoc.patch, SolrInputDoc.patch
>
>
> In 1.3, we added SolrInputDocument -- a temporary class to hold document 
> information.  There is concern that this may be less than ideal 
> performance-wise.
> To settle some concerns (mine included) I want to compare a few SolrDocument 
> implementations to make sure we are not doing something crazy.
> I implemented a LuceneInputDocument subclass of SolrInputDocument that stores 
> its values directly in Lucene Document (rather than a Map).
> This is a quick test comparing:
> 1. Building documents with SolrInputDocument 
> 2. Building documents with LuceneInputDocument (same interface writing 
> directly to Document)
> 3. using DocumentBuilder (solr 1.2, solr 1.1)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Multiple indexes/cores (aka solr-215) functional value?

2007-06-28 Thread Otis Gospodnetic
This is precisely what I want to do.  Yes, I can add JNDI entries to various 
Jetty XML files, but this is good only if you have a fixed set of indices known 
ahead of time (before starting the servlet container).  I want the ability to 
add and remove indices on the fly, while the servlet container with Solr is 
running.  This is where SOLR-215 comes in.  Henri, hang in there. :)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message -----
From: Yonik Seeley <[EMAIL PROTECTED]>
To: solr-dev@lucene.apache.org
Sent: Wednesday, June 27, 2007 5:00:08 PM
Subject: Re: Multiple indexes/cores (aka solr-215) functional value?

On 6/27/07, Henrib <[EMAIL PROTECTED]> wrote:
> This  http://www.nabble.com/multiple-indices-tf3982573.html thread  triggers
> the question again.
> Solr-215 makes it easier to deploy multiple indexes than using multiple web
> applications; but is "easier" enough for not being just a superfluous
> feature?

With a fixed handful of indices, IMO, no.
Though if one needs to programmatically add new indices/schemas,
SOLR-215 becomes interesting.  I don't know how common of a case that
is though.  There are probably other use cases I've not considered.

SOLR-215 does seem unrelated to distributed search though.

-Yonik





[jira] Commented: (SOLR-272) SolrDocument performance testing

2007-06-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508890
 ] 

Yonik Seeley commented on SOLR-272:
---

> To be honest, I'm not sure the complexity of dealing with a 
> Map (where the Object may be a 
> collection or not) is worth the marginal speedup.

I'm not sure either, but one reason the speedup is marginal is that it's not 
the bottleneck (other things are taking more time, like dynamic copy-field 
checking... I've never checked that code to see if it could be optimized, but 
things are quite a bit faster when all the dynamic fields are removed).

SolrInputDocument could similarly be sped up by getting rid of the Map for 
boosts.
One could either store a bare value, or a BoostedValue.

class BoostedValue {
  float boost;
  Object value;
}
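
A rough sketch of how "a bare value or a BoostedValue" could be stored and read back without a separate boost Map. BoostSketch and its method names are hypothetical, and treating 1.0f as the default boost is an assumption of this sketch.

```java
// BoostedValue follows the shape given above; the constructor is added for the sketch.
class BoostedValue {
  final float boost;
  final Object value;

  BoostedValue(Object value, float boost) {
    this.value = value;
    this.boost = boost;
  }
}

// Hypothetical helper showing how entries could be wrapped and unwrapped.
class BoostSketch {
  static final float DEFAULT_BOOST = 1.0f; // assumed default

  // Store the value bare unless it carries a non-default boost.
  static Object wrap(Object value, float boost) {
    return boost == DEFAULT_BOOST ? value : new BoostedValue(value, boost);
  }

  // A bare entry implies the default boost.
  static float boostOf(Object entry) {
    return entry instanceof BoostedValue ? ((BoostedValue) entry).boost : DEFAULT_BOOST;
  }

  // Unwrap a BoostedValue, or return the bare entry unchanged.
  static Object valueOf(Object entry) {
    return entry instanceof BoostedValue ? ((BoostedValue) entry).value : entry;
  }
}
```

Only boosted values pay for the wrapper object; the common unboosted case stores the value directly.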






[jira] Updated: (SOLR-272) SolrDocument performance testing

2007-06-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-272:
---

Attachment: SolrInputDoc.patch

This is an alternative version of SolrDocument that only creates Collections 
for multivalued fields... The one big difference to Yonik's suggestion above is 
that it returns a Collection for getFieldValues() even if it is a 
single valued field.  

Running the perf test for 1M docs 5 times for each implementation:

[100] SolrInputDocument:   9992  9827  9823  9854  9948
[100] SolrInputDocument2:  9636  9719  9699  9807  9729
[100] DocumentBuilder:     8866  8818  8946  8812  8953

To be honest, I'm not sure the complexity of dealing with a Map 
(where the Object may be a collection or not) is worth the marginal speedup.  I 
suppose if the docs are all single valued it would be a more substantial 
difference.




Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Chris Hostetter

: > In the case of the unit tests and  though, it seems like a
: > simplification of the tests to make them not dependent on external
: > configuration that is provided via Ant or any other tool
:
: Yes, I agree.
:
: Any objections to committing the setProperty part?

i'm still not sure what we're talking about exactly (am i the only one not
getting the attachments?) but i'm okay with changing tests to make them
run more universally ... i was just leery of some mentioned changes to
Config.java (not that i know what they were mind you .. just that it seemed
odd to need to change the actual code to get the tests to run in an IDE).



-Hoss



Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Yonik Seeley

On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:

> > >  I have a PDF handler modeled on the CSVHandler that allows
> > > you to stream a PDF document to Solr and extract the text and store
> > > it.
> >
> > Cool!
> >
> > Any thoughts of a general framework for going from unstructured
> > document -> lucene document with fields?  It feels like utilizing
> > Apache Tika here would be the way to go (although it's in the really
> > early stages).
> >
> > -Yonik
> >
> Humm...  So I have a PDF, Word, Excel, and PowerPoint, all as separate
> handlers.  And there is a lot of duplication between them...  I may
> try and pull out the common stuff into some sort of
> AbstractRichDocumentHandler, and then just add the special sauce for
> each one.   I am close to having the basic unit tests, modeled on
> CSVHandler, and will post a JIRA issue with it.

Another thing to consider is document type/charset/language detection.
People may not want to have to hit a different URL for each different
type of document.

> I looked for Tika, but didn't see it, what is the URL?


It's *really* early (entered the incubator in March)
http://incubator.apache.org/tika/
http://www.nabble.com/Apache-Tika---Development-f20913.html


-Yonik


Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Eric Pugh


> >  I have a PDF handler modeled on the CSVHandler that allows
> > you to stream a PDF document to Solr and extract the text and store
> > it.
>
> Cool!
>
> Any thoughts of a general framework for going from unstructured
> document -> lucene document with fields?  It feels like utilizing
> Apache Tika here would be the way to go (although it's in the really
> early stages).
>
> -Yonik


Humm...  So I have a PDF, Word, Excel, and PowerPoint, all as separate
handlers.  And there is a lot of duplication between them...  I may
try and pull out the common stuff into some sort of
AbstractRichDocumentHandler, and then just add the special sauce for
each one.   I am close to having the basic unit tests, modeled on
CSVHandler, and will post a JIRA issue with it.
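
One way the refactoring described above might look, as a sketch only. The class name comes from the message, but every method, the field names "id" and "text", and the use of a byte array as input are assumptions made for illustration; the real handlers are not shown in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class: shared steps live here, format-specific parsing in subclasses.
abstract class AbstractRichDocumentHandler {
  // The "special sauce": each format supplies its own text extraction.
  protected abstract String extractText(byte[] content);

  // Common logic: turn the extracted text into named fields.
  public Map<String, String> toFields(String id, byte[] content) {
    Map<String, String> fields = new LinkedHashMap<String, String>();
    fields.put("id", id);                          // field names are assumptions
    fields.put("text", extractText(content).trim());
    return fields;
  }
}

// Stand-in subclass; a PDF or Word handler would call its parsing library here.
class PlainTextHandler extends AbstractRichDocumentHandler {
  @Override
  protected String extractText(byte[] content) {
    return new String(content, StandardCharsets.UTF_8);
  }
}
```

Each new format then needs only an extractText implementation, which keeps the duplicated plumbing in one place.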

I looked for Tika, but didn't see it, what is the URL?


Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Yonik Seeley

On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:

> Sounds great to me!  In the future, should I be communicating via JIRA
> issues?

Code should go in JIRA issues, but you can discuss it beforehand on
the dev list if you like.

>  I have a PDF handler modeled on the CSVHandler that allows
> you to stream a PDF document to Solr and extract the text and store
> it.


Cool!

Any thoughts of a general framework for going from unstructured
document -> lucene document with fields?  It feels like utilizing
Apache Tika here would be the way to go (although it's in the really
early stages).

-Yonik


Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Eric Pugh

On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:
> > In the case of the unit tests and  though, it seems like a
> > simplification of the tests to make them not dependent on external
> > configuration that is provided via Ant or any other tool
>
> Yes, I agree.
>
> Any objections to committing the setProperty part?
>
> -Yonik



Sounds great to me!  In the future, should I be communicating via JIRA
issues?  I have a PDF handler modeled on the CSVHandler that allows
you to stream a PDF document to Solr and extract the text and store
it.


Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Yonik Seeley

On 6/28/07, Eric Pugh <[EMAIL PROTECTED]> wrote:

> In the case of the unit tests and  though, it seems like a
> simplification of the tests to make them not dependent on external
> configuration that is provided via Ant or any other tool


Yes, I agree.

Any objections to committing the setProperty part?

-Yonik


Re: Running Unit Tests from inside Eclipse

2007-06-28 Thread Eric Pugh

I agree with the thought about bending your code to fit your IDE.

In the case of the unit tests and  though, it seems like a
simplification of the tests to make them not dependent on external
configuration that is provided via Ant or any other tool.  I'm coming
from the "new to Solr and don't know the ins and outs" end of things!


Hence why I like defining the System properties inside the Java test  
code.
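
For illustration, a sketch of setting such defaults from test code rather than from the Ant build. The property name "example.config.dir" and its value are hypothetical placeholders, not the properties Solr actually reads; the helper only fills in a value when none was supplied externally.

```java
// Hypothetical helper that a test's setUp method could call before anything reads the property.
class TestProperties {
  // Placeholder name; substitute the property the code under test really uses.
  static final String CONFIG_DIR_PROPERTY = "example.config.dir";
  static final String DEFAULT_CONFIG_DIR = "src/test/resources"; // assumed location

  // Apply the default only if Ant, the IDE, or the command line did not set one.
  static void applyDefaults() {
    if (System.getProperty(CONFIG_DIR_PROPERTY) == null) {
      System.setProperty(CONFIG_DIR_PROPERTY, DEFAULT_CONFIG_DIR);
    }
  }
}
```

Because an externally supplied value wins, the same tests can still be run from Ant with a different setting.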


Eric Pugh


On Jun 27, 2007, at 4:11 PM, Chris Hostetter wrote:


> : the path in Config.java.  Attached is a patch file for these two
> : changes.
>
> FYI; apache mailing lists strip most attachments ... i think it works if
> the mime-type is text/plain, but the simplest thing to do is just include
> it inline in your message.
>
> (as a general philosophy, i'm opposed to code changes solely for the
> purpose of making IDEs happy ... IDEs should make developing code easier,
> not the other way around)
>
> -Hoss



---
Principal
OpenSource Connections
Site: http://www.opensourceconnections.com
Blog: http://blog.opensourceconnections.com
Cell: 1-434-466-1467






Solr nightly build failure

2007-06-28 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build

checkJunitPresence:

compile-common:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/common
[javac] Compiling 24 source files to /tmp/apache-solr-nightly/build/common
[javac] Note: /tmp/apache-solr-nightly/src/java/org/apache/solr/common/params/DisMaxParams.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/core
[javac] Compiling 193 source files to /tmp/apache-solr-nightly/build/core
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile-solrj-core:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/client/solrj
[javac] Compiling 21 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/impl/CommonsHttpSolrServer.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compile-solrj:
[javac] Compiling 2 source files to /tmp/apache-solr-nightly/build/client/solrj
[javac] Note: /tmp/apache-solr-nightly/client/java/solrj/src/org/apache/solr/client/solrj/embedded/JettySolrRunner.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 57 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 24, Failures: 0, Errors: 0, Time elapsed: 21.688 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.233 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.904 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.425 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.273 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.231 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.045 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.046 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.045 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.061 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.061 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.046 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.072 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.049 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.71 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.047 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.055 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.52 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.043 sec
[jun