[jira] Updated: (SOLR-1301) Solr + Hadoop

2010-01-31 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1301:
---

Attachment: SOLR-1301.patch

This update include's Kevin's recommended path change

> Solr + Hadoop
> -
>
> Key: SOLR-1301
> URL: https://issues.apache.org/jira/browse/SOLR-1301
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 1.4
>Reporter: Andrzej Bialecki 
> Fix For: 1.5
>
> Attachments: commons-logging-1.0.4.jar, 
> commons-logging-api-1.0.4.jar, hadoop-0.19.1-core.jar, hadoop.patch, 
> log4j-1.2.15.jar, README.txt, SOLR-1301.patch, SOLR-1301.patch, 
> SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, SOLR-1301.patch, 
> SolrRecordWriter.java
>
>
> This patch contains  a contrib module that provides distributed indexing 
> (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
> twofold:
> * provide an API that is familiar to Hadoop developers, i.e. that of 
> OutputFormat
> * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
> SolrOutputFormat consumes data produced by reduce tasks directly, without 
> storing it in intermediate files. Furthermore, by using an 
> EmbeddedSolrServer, the indexing task is split into as many parts as there 
> are reducers, and the data to be indexed is not sent over the network.
> Design
> --
> Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
> which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
> instantiates an EmbeddedSolrServer, and it also instantiates an 
> implementation of SolrDocumentConverter, which is responsible for turning 
> Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
> batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
> task completes, and the OutputFormat is closed, SolrRecordWriter calls 
> commit() and optimize() on the EmbeddedSolrServer.
> The API provides facilities to specify an arbitrary existing solr.home 
> directory, from which the conf/ and lib/ files will be taken.
> This process results in the creation of as many partial Solr home directories 
> as there were reduce tasks. The output shards are placed in the output 
> directory on the default filesystem (e.g. HDFS). Such part-N directories 
> can be used to run N shard servers. Additionally, users can specify the 
> number of reduce tasks, in particular 1 reduce task, in which case the output 
> will consist of a single shard.
> An example application is provided that processes large CSV files and uses 
> this API. It uses a custom CSV processing to avoid (de)serialization overhead.
> This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
> issue, you should put it in contrib/hadoop/lib.
> Note: the development of this patch was sponsored by an anonymous contributor 
> and approved for release under Apache License.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug

2010-01-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806833#action_12806833
 ] 

Robert Muir commented on SOLR-1670:
---

Steven, i don't have a problem with your patch (I do not wish to be in the way 
of anyone trying to work on SynonymFilter)

But i want to explain some of where i was coming from.

The main reason i got myself into this mess was to try to add wordnet support 
to solr. However, this is currently not possible without duplicating a lot of 
code.
We need to be really careful about allowing any order, it does matter in some 
situations.
For example, in Lucene's synonymfilter (with wordnet support), it has an option 
to limit the number of expansions (so its like a top-N synonym expansion).
Solr doesnt currently have this, so its N/A for now, but just an example where 
the order suddenly becomes important.

only slightly related: we added some improvements to this assertion in lucene 
recently and found a lot of bugs, better checking for clearAttribute() and end()
at some I would like to port these test improvements over to solr, too. 


> synonymfilter/map repeat bug
> 
>
> Key: SOLR-1670
> URL: https://issues.apache.org/jira/browse/SOLR-1670
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 1.4
>Reporter: Robert Muir
>Assignee: Yonik Seeley
> Attachments: SOLR-1670.patch, SOLR-1670.patch, SOLR-1670_test.patch
>
>
> as part of converting tests for SOLR-1657, I ran into a problem with 
> synonymfilter
> the test for 'repeats' has a flaw, it uses this assertTokEqual construct 
> which does not really validate that two lists of token are equal, it just 
> stops at the shorted one.
> {code}
> // repeats
> map.add(strings("a b"), tokens("ab"), orig, merge);
> map.add(strings("a b"), tokens("ab"), orig, merge);
> assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
> /* in reality the result from getTokList is ab ab ab! */
> {code}
> when converted to assertTokenStreamContents this problem surfaced. attached 
> is an additional assertion to the existing testcase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1741) NPE when deletionPolicy sets maxOptimizedCommitsTokeep=0

2010-01-31 Thread Noble Paul (JIRA)
NPE when deletionPolicy sets maxOptimizedCommitsTokeep=0


 Key: SOLR-1741
 URL: https://issues.apache.org/jira/browse/SOLR-1741
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
Affects Versions: 1.4
Reporter: Noble Paul
Priority: Minor
 Fix For: 1.5


This is a user reported issue http://markmail.org/thread/bjcwiw3s66b5x76h



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Solr-trunk #1048

2010-01-31 Thread Apache Hudson Server
See 

--
[...truncated 2365 lines...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 13.772 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.047 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.916 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.64 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.056 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.419 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 50.798 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 44.358 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 50.666 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.421 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.491 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.367 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.465 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.365 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.64 sec
[junit] Running org.apache.solr.client.solrj.response.TermsResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.12 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 12.225 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.43 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.433 sec
[junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.413 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.435 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.426 sec
[junit] Running org.apache.solr.common.util.DOMUtilTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.483 sec
[junit] Running org.apache.solr.common.util.FileUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.425 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.411 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.383 sec
[junit] Running org.apache.solr.common.util.TestFastInputStream
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.376 sec
[junit] Running org.apache.solr.common.util.TestHash
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.483 sec
[junit] Running org.apache.solr.common.util.TestNamedListCodec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.49 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.511 sec
[junit] Running org.apache.solr.core.AlternateDirectoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.81 sec
[junit] Running org.apache.solr.core.AlternateIndexReaderTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.704 sec
[junit] Running org.apache.solr.core.IndexReaderFactoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.432 sec
[junit] Running org.apache.solr.core.RequestHandlersTest
[junit] Tests r

[jira] Updated: (SOLR-1670) synonymfilter/map repeat bug

2010-01-31 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated SOLR-1670:
--

Attachment: SOLR-1670.patch

New version of the patch, introducing the possibility to test token streams 
that are allowed to vary the order of their overlapping tokens.  All tests pass 
for me.

Yonik wrote:
bq. The description of the synonym filter never specifies the order of 
overlapping tokens, thus the tests should accept either.

Yonik, can you check that the attached patch does what you are suggesting?

Robert wrote:
{quote}
I guess in my opinion, overall its better for the tests to be overly strict, 
and if in the future we make a valid change to the implementation that breaks a 
test, we can discuss it during said change, and people can comment on whether 
this behavior was actually important: for example the aa/a versus a/aa i would 
probably say not a big deal, but the aa versus aa/aa/aa thing to me is a big 
deal.

The alternative, being overly lax, presents the possibility of introducing 
incorrect behavior without being caught, and I think this is very dangerous.
{quote}

Robert, do you think that the attached patch crosses over into overly lax land?

> synonymfilter/map repeat bug
> 
>
> Key: SOLR-1670
> URL: https://issues.apache.org/jira/browse/SOLR-1670
> Project: Solr
>  Issue Type: Bug
>  Components: Schema and Analysis
>Affects Versions: 1.4
>Reporter: Robert Muir
>Assignee: Yonik Seeley
> Attachments: SOLR-1670.patch, SOLR-1670.patch, SOLR-1670_test.patch
>
>
> as part of converting tests for SOLR-1657, I ran into a problem with 
> synonymfilter
> the test for 'repeats' has a flaw, it uses this assertTokEqual construct 
> which does not really validate that two lists of token are equal, it just 
> stops at the shorted one.
> {code}
> // repeats
> map.add(strings("a b"), tokens("ab"), orig, merge);
> map.add(strings("a b"), tokens("ab"), orig, merge);
> assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
> /* in reality the result from getTokList is ab ab ab! */
> {code}
> when converted to assertTokenStreamContents this problem surfaced. attached 
> is an additional assertion to the existing testcase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.