good performance news

2009-08-16 Thread Yonik Seeley
I just profiled a CSV upload, and aside from the CSV parsing, Solr
adds pretty much no overhead!
I was expecting some non-trivial overhead due to Solr's
SolrInputDocument, update processing pipeline, and update handler...
but profiling showed that it amounted to less than 1%.

85% of the time was spent in Lucene's IndexWriter
12% of the time was spent in the CSV parser
2% of the time was spent merging segments in the IndexWriter

-Yonik
http://www.lucidimagination.com


[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

2009-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743860#action_12743860
 ] 

Yonik Seeley commented on SOLR-1353:


FYI, with all these changes, but with reuse turned off, I was seeing 10% slower 
performance than the pre-reflection code.  Some of that performance impact 
could have been due to more mixing of old and new style APIs, or proper 
clearing of attributes, etc.

> implement reusable token streams for all Solr tokenizers / token filters
> 
>
> Key: SOLR-1353
> URL: https://issues.apache.org/jira/browse/SOLR-1353
> Project: Solr
> Issue Type: Bug
> Affects Versions: 1.4
> Reporter: Yonik Seeley
> Assignee: Yonik Seeley
> Priority: Blocker
> Fix For: 1.4
>
> Attachments: SOLR-1353.patch
>
>
> The new lucene token architecture causes bad indexing performance if you 
> don't happen to use reusable token streams.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

2009-08-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743859#action_12743859
 ] 

Yonik Seeley commented on SOLR-1353:


Yes, on my simple short field test, I got about a 90% increase in performance 
vs the pre-reflection (but still attribute based) code.
I don't know how it compares to the code pre-attributes.





[jira] Commented: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

2009-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743858#action_12743858
 ] 

Robert Muir commented on SOLR-1353:
---

seems to almost double throughput... how does this compare to pre-reflection 
etc... is it actually any faster?





[jira] Resolved: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

2009-08-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1353.


Resolution: Fixed

Committed.





[jira] Updated: (SOLR-1353) implement reusable token streams for all Solr tokenizers / token filters

2009-08-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1353:
---

Attachment: SOLR-1353.patch

Patch implementing reusable analyzers.
Simple filters have been converted to use the new API.
Complex filters such as synonym and WFD have not been converted.
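
For readers unfamiliar with the reuse idea, here is a minimal sketch of the 
pattern in plain Java. It is not the code from SOLR-1353.patch and it does not 
use Lucene's API: ReusableStream and ReusingAnalyzer are hypothetical stand-ins 
that only illustrate keeping one stream per thread and resetting it for each 
new field value instead of allocating a fresh one.

```java
/** Hypothetical stand-in for a token stream; NOT Lucene's TokenStream API. */
final class ReusableStream {
    private String input = "";
    private int pos = 0;

    /** Re-point this existing instance at new input instead of allocating a new one. */
    void reset(String newInput) {
        this.input = newInput;
        this.pos = 0;
    }

    /** Returns the next whitespace-delimited token, or null when the input is exhausted. */
    String next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos >= input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return input.substring(start, pos);
    }
}

/** Hypothetical analyzer that caches one stream per thread and resets it per call. */
final class ReusingAnalyzer {
    private final ThreadLocal<ReusableStream> cached =
            ThreadLocal.withInitial(ReusableStream::new);

    ReusableStream reusableStream(String text) {
        ReusableStream stream = cached.get(); // same instance on every call from this thread
        stream.reset(text);
        return stream;
    }
}
```

Calling reusableStream twice on the same thread hands back the same object, 
so the per-document allocation cost goes away. The trade-off is that a caller 
must finish with one field's tokens before asking for the next stream.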





[jira] Updated: (SOLR-1143) Return partial results when a connection to a shard is refused

2009-08-16 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-1143:


Attachment: SOLR-1143-2.patch

I have added a test to the _TestDistributedSearch_ class. The test sets up a 
cluster of shards, kills one shard, and then expects the search request as a 
whole to still succeed. In general, the _TestDistributedSearch_ class tests 
distributed search by loading the same documents into a non-distributed 
instance and a cluster of shards, then comparing all results from the cluster 
with the results from the non-distributed instance. Some things, like facets 
and maxScore, could not be checked in the test I added because one shard in the 
cluster is down (so part of the corpus is missing). Only the documents that are 
returned from the shards are compared against the documents in the 
non-distributed instance.

I have also included the option to disable / enable partial results as Lance 
described. I agree with Lance that ignoring a shard failure should *not* be 
enabled by default: if you do not know about this feature, finding the cause 
of the actual problem might be difficult.

With this patch, a partial result is returned when a shard has failed if you 
set _partialResults_ to _true_ in the request or, if you want it for all 
requests, set the _return-partial-results_ property to _true_ on your search 
handler in your solrconfig.xml. If neither is specified, partial results are 
disabled. Currently the _partialResults_ parameter overrides the 
_return-partial-results_ property in the search handler.
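
The precedence described above can be sketched in a few lines of Java. This is 
only an illustration of the rule as stated in this comment, not code from the 
patch: PartialResultsPolicy and its method are hypothetical names, and null is 
used here to mean "not specified".

```java
/** Hypothetical helper illustrating the precedence rule; not part of SOLR-1143-2.patch. */
final class PartialResultsPolicy {
    /**
     * @param requestParam   value of partialResults on the request, or null if absent
     * @param handlerDefault value of return-partial-results on the search handler,
     *                       or null if absent
     * @return true if a failed shard should be ignored and partial results returned
     */
    static boolean allowPartialResults(Boolean requestParam, Boolean handlerDefault) {
        if (requestParam != null) {
            return requestParam; // the request parameter overrides the handler property
        }
        if (handlerDefault != null) {
            return handlerDefault; // otherwise fall back to the handler property
        }
        return false; // neither specified: partial results are disabled
    }
}
```

Under this reading, a request can also switch partial results off for itself 
by passing false, even when the handler enables them for all requests.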

> Return partial results when a connection to a shard is refused
> --
>
> Key: SOLR-1143
> URL: https://issues.apache.org/jira/browse/SOLR-1143
> Project: Solr
> Issue Type: Improvement
> Components: search
> Reporter: Nicolas Dessaigne
> Fix For: 1.4
>
> Attachments: SOLR-1143-2.patch, SOLR-1143.patch
>
>
> If any shard is down in a distributed search, a ConnectException is thrown.
> Here's a little patch that changes this behaviour: if we can't connect to a 
> shard (ConnectException), we get partial results from the active shards. As 
> for the TimeOut parameter (https://issues.apache.org/jira/browse/SOLR-502), 
> we set the parameter "partialResults" to true.
> This patch also addresses a problem expressed on the mailing list about a 
> year ago 
> (http://www.nabble.com/partialResults,-distributed-search---SOLR-502-td19002610.html)
> We have a use case that needs this behaviour and we would like to know your 
> thoughts about such a behaviour. Should it be the default behaviour for 
> distributed search?




Re: lucene upgrade

2009-08-16 Thread Yonik Seeley
In the process of upgrading - there were changes to Weight that
require some work.

-Yonik
http://www.lucidimagination.com

On Sat, Aug 15, 2009 at 10:54 AM, Yonik Seeley wrote:
> FYI, I plan on updating the lucene libs once LUCENE-1794 is checked in.
>


[jira] Commented: (SOLR-1315) new replication command needed to force a backup when there is no committed index data

2009-08-16 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743827#action_12743827
 ] 

Noble Paul commented on SOLR-1315:
--

jay, I guess this fix should be good enough for you

> new replication command needed to force a backup when there is no committed 
> index data
> --
>
> Key: SOLR-1315
> URL: https://issues.apache.org/jira/browse/SOLR-1315
> Project: Solr
> Issue Type: New Feature
> Components: replication (java)
> Affects Versions: 1.4
> Environment: Mac OS
> Reporter: Jay
> Assignee: Noble Paul
> Priority: Minor
> Fix For: 1.4
>
> Attachments: SOLR-1315.patch
>
> Original Estimate: 5h
> Remaining Estimate: 5h
>
> Here is an email describing the problem, and a possible solution.
> I agree. I think both options could be useful - perhaps a 'forceBackup' as
> well? Documentation would take care of the rest. Have you added this info to
> the wiki yet?
> --
> - Mark
> http://www.lucidimagination.com
> On Thu, Jul 23, 2009 at 12:56 PM, solr jay wrote:
> > Hi,
> >
> > I noticed that the backup request
> >
> > http://master_host:port/solr/replication?command=backup
> >
> > works only if there is committed index data, i.e.
> > core.getDeletionPolicy().getLatestCommit() is not null. Otherwise, no
> > backup is created. It sounds logical because if nothing has been committed
> > since your last backup, it doesn't help much to do a new backup. However,
> > consider this scenario:
> >
> > 1. a backup process is scheduled at 1:00AM every Monday
> > 2. just before 1:00AM, the system is shut down (for whatever reason), and
> > then restarts
> > 3. No index is committed before 1:00AM
> > 4. at 1:00AM, the backup process starts and no committed index is found,
> > and therefore no backup is made (until next week)
> >
> > The probability of this scenario is probably small, but it still could
> > happen, and it seems to me that if I want to back up the index, a backup
> > should be created whether or not there is a newly committed index.
> >
> > Your thoughts?
> >
> > Thanks,
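
The scenario quoted above comes down to one decision, sketched here in Java. 
This is an assumption-laden illustration, not the code in SOLR-1315.patch: 
BackupDecision is a hypothetical class, the latestCommit argument stands in for 
the result of core.getDeletionPolicy().getLatestCommit(), and the forceBackup 
flag is modeled on the 'forceBackup' idea floated in the reply. How a forced 
backup would actually locate index files when nothing has been committed is 
not covered.

```java
/** Hypothetical illustration of the backup decision; not the actual ReplicationHandler code. */
final class BackupDecision {
    /**
     * @param latestCommit stand-in for core.getDeletionPolicy().getLatestCommit();
     *                     null when nothing has been committed since startup
     * @param forceBackup  hypothetical flag: back up even without a new commit
     * @return true if a backup should be attempted
     */
    static boolean shouldBackup(Object latestCommit, boolean forceBackup) {
        // Reported behaviour: a backup is only taken when a commit point exists.
        // Proposed behaviour: the caller can force one regardless.
        return latestCommit != null || forceBackup;
    }
}
```

With forceBackup set to false this reproduces the reported behaviour, where 
the 1:00AM job after a restart is silently skipped. With it set to true the 
job is attempted anyway.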




Build failed in Hudson: Solr-trunk #897

2009-08-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/897/changes

Changes:

[yonik] add back timestamps to more example docs

--
[...truncated 7182 lines...]
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.512 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.106 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.186 sec
[junit] Running org.apache.solr.analysis.TestStopFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.013 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 4.457 sec
[junit] Running org.apache.solr.analysis.TestSynonymMap
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.278 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.664 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 28.585 sec
[junit] Running org.apache.solr.client.solrj.SolrExceptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.638 sec
[junit] Running org.apache.solr.client.solrj.SolrQueryTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.571 sec
[junit] Running org.apache.solr.client.solrj.TestBatchUpdate
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 16.132 sec
[junit] Running org.apache.solr.client.solrj.TestLBHttpSolrServer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 11.049 sec
[junit] Running org.apache.solr.client.solrj.beans.TestDocumentObjectBinder
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.028 sec
[junit] Running org.apache.solr.client.solrj.embedded.JettyWebappTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.321 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.511 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.379 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 10.832 sec
[junit] Running org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.062 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.466 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.902 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 15.576 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 25.283 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 26.877 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.297 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.662 sec
[junit] Running org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
[junit] Running org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.838 sec
[junit] Running org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.674 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.298 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.287 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.472 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.468 sec
[junit] Running org.a

Solr nightly build failure

2009-08-16 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 84 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 371 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 165 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 41.251 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 15.153 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 13.187 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 5.74 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 10.281 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 6.948 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.73 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 89.011 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 25.963 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.764 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.422 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.974 sec
[junit] Running org.apache.solr.analysis.HTMLStripCharFilterTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 2.855 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.939 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 4.955 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.976 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.022 sec
[junit] Running org.apache.solr.analysis.TestDelimitedPayloadTokenFilterFactory
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 12.122 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.351 sec
[junit] Running org.apache.solr.analysis.TestKeepFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.816 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.013 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.482 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 8.407 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.663 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.845 sec
[junit] Running org.apache.solr.a