Solr nightly build failure

2009-03-02 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 74 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 358 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 142 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 19, Failures: 0, Errors: 0, Time elapsed: 20.271 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.824 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.983 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.686 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 2.515 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.231 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.863 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 23.811 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.417 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.375 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.206 sec
[junit] Running org.apache.solr.analysis.HTMLStripReaderTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.819 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.957 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.357 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.214 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.207 sec
[junit] Running org.apache.solr.analysis.TestCharFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.501 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.138 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.991 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilter
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.427 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.448 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.942 sec
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.31 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.815 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.348 sec
[junit] Running org.apache.solr.a

[jira] Updated: (SOLR-940) TrieRange support

2009-03-02 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-940:
---

Attachment: SOLR-940.patch

Changes:
# Added support for open ranges
# Changed precision step in the test schema.xml to 4
# Renamed TrieTokenizerFactory to TrieIndexTokenizerFactory
# Added a TrieQueryTokenizerFactory which converts query tokens to 
xxxToPrefixCoded form, so term queries (in q or fq) are now supported (see the 
sketch after this list)
# Updated tests for open ranges and term queries
# Minor javadoc updates
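
To make the prefix-coding idea concrete, here is a plain-Java illustration of 
the trie encoding scheme behind TrieRange. It is only a sketch of the concept: 
the class and method names are invented for the demo, and the actual patch 
relies on Lucene's trie utilities, whose exact prefix-coded term format 
differs.

{code}
// Plain-Java demo of the trie idea: index each value at several precisions so
// a range query needs only a handful of terms. Names are invented for
// illustration; this is not the actual Lucene/Solr term format.
public class TriePrefixDemo {

    // Index-time view: emit one term per precision level. Each step strips
    // precisionStep low-order bits, so high-shift terms match whole blocks
    // of values at once.
    static void printTerms(long value, int precisionStep) {
        for (int shift = 0; shift < 64; shift += precisionStep) {
            System.out.printf("shift=%2d term=%x%n", shift, value >>> shift);
        }
    }

    public static void main(String[] args) {
        printTerms(20090302L, 4); // precisionStep=4, as in the updated test schema.xml
    }
}
{code}

At query time the same trick runs in reverse: a range is covered by a few 
coarse (high-shift) terms plus a few fine ones at the edges, instead of one 
term per matching value.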

TODO:
# Date support
# Wiki updates
# Example schema updates

> TrieRange support
> -
>
> Key: SOLR-940
> URL: https://issues.apache.org/jira/browse/SOLR-940
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-940.patch, SOLR-940.patch, SOLR-940.patch
>
>
> We need support in Solr for the new TrieRange Lucene functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-940) TrieRange support

2009-03-02 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677936#action_12677936
 ] 

Shalin Shekhar Mangar commented on SOLR-940:


Thanks Uwe for spotting those problems. The latest patch should take care of 
these issues.

bq. For trie fields it would be good, to have something like "sorting on the 
first term of the document".

Hmm, yeah. This looks like the easiest solution.

> TrieRange support
> -
>
> Key: SOLR-940
> URL: https://issues.apache.org/jira/browse/SOLR-940
> Project: Solr
>  Issue Type: New Feature
>Reporter: Yonik Seeley
> Fix For: 1.4
>
> Attachments: SOLR-940.patch, SOLR-940.patch, SOLR-940.patch
>
>
> We need support in Solr for the new TrieRange Lucene functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-844) A SolrServer impl to front-end multiple urls

2009-03-02 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-844:


Attachment: SOLR-844.patch

# Implementation changed to do the pings in a separate thread. This keeps the 
code simple (see the sketch below)
# Javadocs added
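
For readers following along, here is a minimal sketch of the idea described 
above: round-robin load balancing with failover, plus a separate ping thread 
that moves recovered servers back into rotation. It is not the SOLR-844 patch; 
the class name and the 10-second ping interval are made up, and the failure 
handling is deliberately simplistic.

{code}
// Round-robin load balancing over several SolrJ servers with failover; a
// daemon thread pings dead servers off the query path and resurrects them.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.SolrParams;

public class RoundRobinSolrClient {
    private final List<CommonsHttpSolrServer> alive = new CopyOnWriteArrayList<CommonsHttpSolrServer>();
    private final List<CommonsHttpSolrServer> dead = new CopyOnWriteArrayList<CommonsHttpSolrServer>();
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinSolrClient(List<CommonsHttpSolrServer> servers) {
        alive.addAll(servers);
        Thread pinger = new Thread() {
            public void run() {
                while (true) {
                    for (CommonsHttpSolrServer s : dead) {
                        try {
                            s.ping();          // answered: move back to rotation
                            dead.remove(s);
                            alive.add(s);
                        } catch (Exception stillDown) { /* try again next round */ }
                    }
                    try { Thread.sleep(10000); } catch (InterruptedException e) { return; }
                }
            }
        };
        pinger.setDaemon(true); // pings happen in a separate thread
        pinger.start();
    }

    public QueryResponse query(SolrParams params) throws SolrServerException {
        for (int i = 0, n = alive.size(); i < n && !alive.isEmpty(); i++) {
            CommonsHttpSolrServer s =
                    alive.get((counter.getAndIncrement() & Integer.MAX_VALUE) % alive.size());
            try {
                return s.query(params);  // round-robin pick
            } catch (Exception e) {
                alive.remove(s);         // failover: park it for the pinger
                dead.add(s);
            }
        }
        throw new SolrServerException("no live SolrServers");
    }
}
{code}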

> A SolrServer impl to front-end multiple urls
> 
>
> Key: SOLR-844
> URL: https://issues.apache.org/jira/browse/SOLR-844
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java
>Affects Versions: 1.3
>Reporter: Noble Paul
>Assignee: Shalin Shekhar Mangar
> Fix For: 1.4
>
> Attachments: SOLR-844.patch, SOLR-844.patch, SOLR-844.patch, 
> SOLR-844.patch, SOLR-844.patch, SOLR-844.patch
>
>
> Currently a {{CommonsHttpSolrServer}} can talk to only one server. This 
> demands that the user have a load balancer or do the round-robin on their own. 
> We must have a {{LBHttpSolrServer}} which automatically does the 
> load balancing between multiple hosts. This can be backed by the 
> {{CommonsHttpSolrServer}}.
> This can have the following other features:
> * Automatic failover
> * Optionally take in a file/url containing the urls of servers so that 
> the server list can be automatically updated by periodically reloading the 
> config
> * Support for adding/removing servers at runtime
> * Pluggable load-balancing mechanism (round-robin, weighted round-robin, 
> random, etc.)
> * Pluggable failover mechanisms

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li
Hi,

I wonder if there is interest in a contrib module that builds Solr
index using Hadoop MapReduce?

It is different from the Solr support in Nutch. The Solr support in
Nutch sends a document to a Solr server in a reduce task. Here, I aim
at building/updating Solr index within map/reduce tasks. Also, it
achieves better parallelism when the number of map tasks is greater
than the number of reduce tasks, which is usually the case.
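
A minimal sketch of this map-side indexing idea (assuming the Hadoop 0.19 
"mapred" API and a Lucene 2.4-era IndexWriter; the class is illustrative, not 
the proposed contrib code):

{code}
// Sketch of map-side indexing: each map task feeds documents to a local
// Lucene IndexWriter, so the index is built in the map phase rather than by
// posting documents to a Solr server in a reduce task. A real job would copy
// the shard to HDFS in close() and merge shards afterwards.
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class IndexingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private IndexWriter writer;

    public void configure(JobConf job) {
        try {
            // one local index shard per map task
            writer = new IndexWriter(FSDirectory.getDirectory("shard-local"),
                    new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void map(LongWritable key, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        // the indexing happens here, in the map task, not in a reducer
        Document doc = new Document();
        doc.add(new Field("body", line.toString(), Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
    }

    public void close() throws IOException {
        writer.close(); // flush the shard before the task exits
    }
}
{code}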

I worked out a very simple initial version. But I want to check if
there is any interest before proceeding. If so, I'll open a Jira
issue.

Cheers,
Ning


Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Shalin Shekhar Mangar
On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:

> Hi,
>
> I wonder if there is interest in a contrib module that builds Solr
> index using Hadoop MapReduce?
>

Absolutely!


> It is different from the Solr support in Nutch. The Solr support in
> Nutch sends a document to a Solr server in a reduce task. Here, I aim
> at building/updating Solr index within map/reduce tasks. Also, it
> achieves better parallelism when the number of map tasks is greater
> than the number of reduce tasks, which is usually the case.
>
> I worked out a very simple initial version. But I want to check if
> there is any interest before proceeding. If so, I'll open a Jira
> issue.
>

+1

Please do. It'd be great to see this in Solr.

-- 
Regards,
Shalin Shekhar Mangar.


[jira] Commented: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678074#action_12678074
 ] 

Steven Rowe commented on SOLR-346:
--

Jason, I think you're right.

The following script:
{code}
#!/bin/bash
data_dir=something
name=`perl -e 'print q|${data_dir}|;'`
echo "As it is: $name"
name=`perl -e 'print q|'${data_dir}'|;'`
echo "As you would have it: $name"
{code}

Produces this output:

{noformat}
As it is: ${data_dir}
As you would have it: something
{noformat}


> need to improve snapinstaller to ignore non-snapshots in data directory
> ---
>
> Key: SOLR-346
> URL: https://issues.apache.org/jira/browse/SOLR-346
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (scripts)
>Affects Versions: 1.2, 1.3
>Reporter: Bill Au
>Assignee: Bill Au
>Priority: Minor
> Fix For: 1.3.1
>
> Attachments: solr-346.patch
>
>
> http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
> > latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
> > installed
> A directory in the Solr data directory is causing snapinstaller to fail. 
> Snapinstaller should be improved to ignore as much non-snapshot content as 
> possible. It can use a regular expression to look for snapshot.dd where d 
> is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678105#action_12678105
 ] 

Yonik Seeley commented on SOLR-1044:


Is our use of HTTP really a bottleneck?

My feeling has been that if we go to a call mechanism, it should be based on 
something more standard that will have many off-the-shelf bindings - Perl, 
Python, PHP, C, etc.

On the plus side of Hadoop RPC, it could handle multiple requests per socket. 
That can also be a potential weakness though, I think... a slow reader or writer 
for one request/response hangs up all the others.

> Use Hadoop RPC for inter Solr communication
> ---
>
> Key: SOLR-1044
> URL: https://issues.apache.org/jira/browse/SOLR-1044
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if 
> we use an RPC mechanism which is more lightweight/efficient. 
> Hadoop RPC looks like a good candidate for this.
> The implementation should have just one protocol. It should follow Solr's 
> idiom of making remote calls: a uri + params + [optional stream(s)]. The 
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in 
> distributed search. Users should be able to choose between the current 
> CommonsHttpSolrServer or a HadoopRpcSolrServer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678108#action_12678108
 ] 

Ken Krugler commented on SOLR-1044:
---

I agree with both of Yonik's points:

# We'd first want to measure real-world performance before deciding that using 
something other than HTTP was important.
# Using something other than HTTP has related costs that should be considered.

At Krugle we used Hadoop RPC to handle remote searchers. In general it worked 
well, but we did run into a problem similar to what Yonik voiced as a 
potential concern - occasionally a remote searcher would hang, and when that 
happened the socket would essentially become a zombie. Under very heavy load 
testing, this eventually caused the entire system to lock up.

We did hear, though, that subsequent changes to Hadoop RPC fixed a number of 
similar bugs. I'm not sure about the details, and we never re-ran tests with 
the latest Hadoop (at that time, which was about a year ago).

If there are performance issues, I would be curious whether using a 
long-lasting connection via keep-alive significantly reduces the overhead. I 
know that Jetty (for example) has a very efficient implementation of the Comet 
web app model, where you don't wind up needing a gazillion threads to handle 
many requests/second.
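
As a starting point for such an experiment, connection reuse is mostly a 
configuration matter in SolrJ: CommonsHttpSolrServer is backed by 
commons-httpclient 3.x, which speaks HTTP/1.1 keep-alive and pools 
connections. A minimal sketch (the URL and pool sizes are arbitrary):

{code}
// Sketch: persistent, pooled HTTP connections from SolrJ. Sockets are reused
// across calls instead of being re-created per request.
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class KeepAliveClient {
    public static void main(String[] args) throws Exception {
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        mgr.getParams().setDefaultMaxConnectionsPerHost(32); // arbitrary pool sizes
        mgr.getParams().setMaxTotalConnections(128);

        HttpClient http = new HttpClient(mgr);
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr", http);
        System.out.println(solr.ping().getStatus()); // connections reused across calls
    }
}
{code}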

> Use Hadoop RPC for inter Solr communication
> ---
>
> Key: SOLR-1044
> URL: https://issues.apache.org/jira/browse/SOLR-1044
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if 
> we use an RPC mechanism which is more lightweight/efficient. 
> Hadoop RPC looks like a good candidate for this.
> The implementation should have just one protocol. It should follow Solr's 
> idiom of making remote calls: a uri + params + [optional stream(s)]. The 
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in 
> distributed search. Users should be able to choose between the current 
> CommonsHttpSolrServer or a HadoopRpcSolrServer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Stax-API jar

2009-03-02 Thread Tricia Williams
Thanks.  I just realized that this type of information is included in 
the NOTICE.txt file.


Shalin Shekhar Mangar wrote:

Solr is using the Woodstox implementation:
1. wstx-asl-3.2.7.jar
2. geronimo-stax-api_1.0_spec-1.0.1.jar




Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Marc Sturlese

I am doing some research about creating a Lucene/Solr index using Hadoop, but
there's not much info around; it would be great to see some code! (I am
experiencing problems especially with duplicate detection.)
Thanks

Shalin Shekhar Mangar wrote:
> 
> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:
> 
>> Hi,
>>
>> I wonder if there is interest in a contrib module that builds Solr
>> index using Hadoop MapReduce?
>>
> 
> Absolutely!
> 
> 
>> It is different from the Solr support in Nutch. The Solr support in
>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>> at building/updating Solr index within map/reduce tasks. Also, it
>> achieves better parallelism when the number of map tasks is greater
>> than the number of reduce tasks, which is usually the case.
>>
>> I worked out a very simple initial version. But I want to check if
>> there is any interest before proceeding. If so, I'll open a Jira
>> issue.
>>
> 
> +1
> 
> Please do. It'd be great to see this in Solr.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
Sent from the Solr - Dev mailing list archive at Nabble.com.



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678121#action_12678121
 ] 

Eks Dev commented on SOLR-1044:
---

I do not know much about Solr's needs there, but we are using one of the 
prehistoric versions of Hadoop RPC (the pre-NIO version), as everything else 
proved to eat far too much time (in an 800+ req/sec environment every 
millisecond counts). Creating new sockets does not work there, as OSs start 
having problems keeping up with this rate (especially with Java, given slower 
socket release due to gc() latency).

We are anyhow contemplating giving Etch (or Thrift) a try. Etch looks like a 
really good piece of work, with great flexibility. Has anyone tried it?

> Use Hadoop RPC for inter Solr communication
> ---
>
> Key: SOLR-1044
> URL: https://issues.apache.org/jira/browse/SOLR-1044
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if 
> we use an RPC mechanism which is more lightweight/efficient. 
> Hadoop RPC looks like a good candidate for this.
> The implementation should have just one protocol. It should follow Solr's 
> idiom of making remote calls: a uri + params + [optional stream(s)]. The 
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in 
> distributed search. Users should be able to choose between the current 
> CommonsHttpSolrServer or a HadoopRpcSolrServer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-346:
--

Fix Version/s: (was: 1.3.1)
   1.4

> need to improve snapinstaller to ignore non-snapshots in data directory
> ---
>
> Key: SOLR-346
> URL: https://issues.apache.org/jira/browse/SOLR-346
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (scripts)
>Affects Versions: 1.2, 1.3
>Reporter: Bill Au
>Assignee: Bill Au
>Priority: Minor
> Fix For: 1.4
>
> Attachments: solr-346.patch
>
>
> http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
> > latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
> > installed
> A directory in the Solr data directory is causing snapinstaller to fail. 
> Snapinstaller should be improved to ignore as much non-snapshot content as 
> possible. It can use a regular expression to look for snapshot.dd where d 
> is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic reopened SOLR-346:
---


> need to improve snapinstaller to ignore non-snapshots in data directory
> ---
>
> Key: SOLR-346
> URL: https://issues.apache.org/jira/browse/SOLR-346
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (scripts)
>Affects Versions: 1.2, 1.3
>Reporter: Bill Au
>Assignee: Bill Au
>Priority: Minor
> Fix For: 1.4
>
> Attachments: solr-346.patch
>
>
> http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
> > latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
> > installed
> A directory in the Solr data directory is causing snapinstaller to fail. 
> Snapinstaller should be improved to ignore as much non-snapshot content as 
> possible. It can use a regular expression to look for snapshot.dd where d 
> is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678137#action_12678137
 ] 

Steven Rowe commented on SOLR-346:
--

BTW, I tested the similar change made to snappuller under SOLR-830, and it does 
*not* have the same interpolation issue:

{code}
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} \
"perl -e 'chdir ${master_data_dir}|; print ((sort grep 
{/^snapshot[.][1-9][0-9]{13}$/} <*>)[-1])'"`
{code}

Since the perl script is enclosed in double quotes, ${master_data_dir} is 
properly interpolated before perl sees it.

> need to improve snapinstaller to ignore non-snapshots in data directory
> ---
>
> Key: SOLR-346
> URL: https://issues.apache.org/jira/browse/SOLR-346
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (scripts)
>Affects Versions: 1.2, 1.3
>Reporter: Bill Au
>Assignee: Bill Au
>Priority: Minor
> Fix For: 1.4
>
> Attachments: solr-346.patch
>
>
> http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
> > latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
> > installed
> A directory in the Solr data directory is causing snapinstaller to fail. 
> Snapinstaller should be improved to ignore as much non-snapshot content as 
> possible. It can use a regular expression to look for snapshot.dd where d 
> is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-346) need to improve snapinstaller to ignore non-snapshots in data directory

2009-03-02 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678137#action_12678137
 ] 

steve_rowe edited comment on SOLR-346 at 3/2/09 2:39 PM:
--

BTW, I tested the similar change made to snappuller under SOLR-830, and it does 
*not* have the same interpolation issue:

{code}
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} \
"perl -e 'chdir q|${master_data_dir}|; print ((sort grep 
{/^snapshot[.][1-9][0-9]{13}$/} <*>)[-1])'"`
{code}

Since the shell script containing the perl script is enclosed in double quotes, 
${master_data_dir} is properly interpolated before perl sees it.

  was (Author: steve_rowe):
BTW, I tested the similar change made to snappuller under SOLR-830, and it 
does *not* have the same interpolation issue:

{code}
snap_name=`ssh -o StrictHostKeyChecking=no ${master_host} \
"perl -e 'chdir ${master_data_dir}|; print ((sort grep 
{/^snapshot[.][1-9][0-9]{13}$/} <*>)[-1])'"`
{code}

Since the perl script is enclosed in double quotes, ${master_data_dir} is 
properly interpolated before perl sees it.
  
> need to improve snapinstaller to ignore non-snapshots in data directory
> ---
>
> Key: SOLR-346
> URL: https://issues.apache.org/jira/browse/SOLR-346
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (scripts)
>Affects Versions: 1.2, 1.3
>Reporter: Bill Au
>Assignee: Bill Au
>Priority: Minor
> Fix For: 1.4
>
> Attachments: solr-346.patch
>
>
> http://www.mail-archive.com/solr-u...@lucene.apache.org/msg05734.html
> > latest snapshot /opt/solr/data/temp-snapshot.20070816120113 already
> > installed
> A directory in the Solr data directory is causing snapinstaller to fail. 
> Snapinstaller should be improved to ignore as much non-snapshot content as 
> possible. It can use a regular expression to look for snapshot.dd where d 
> is a digit.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Replication page failure :(

2009-03-02 Thread Jeff Newburn
I have tracked down the error to the specific date and file. On 2009-02-13,
changes were apparently made to fix the replication page for replication off
of a server acting as both master and slave. This causes the Jasper error in
the replication/index.jsp file. I am not sure why it does, as most of the file
was rewritten during that update. I have included the revision information
below.

Date: 2009-02-13

svn up -r {2009-02-13} src/webapp/web/admin/replication/index.jsp (This
breaks the page)

Updated to revision 744021

Items added into CHANGES.txt
-31. SOLR-1015: Incomplete information in replication admin page and http
command response when server
-is both master and slave i.e. when server is a repeater (Akshay Ukey
via shalin)
 
-32. SOLR-1018: Slave is unable to replicate when server acts as repeater
(as both master and slave)
-(Akshay Ukey, Noble Paul via shalin)

Any help would be appreciated.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



From: Jeff Newburn 
Date: Fri, 27 Feb 2009 10:46:13 -0800
To: "solr-dev@lucene.apache.org" 
Conversation: Replication page failure :(
Subject: Replication page failure :(

After updating to trunk, the replication page no longer works; it renders a
page of exception output.

Info:
Slave server viewing the replication page. Just a stack dump, as follows.

Feb 27, 2009 8:44:37 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.jasper.JasperException: java.lang.NullPointerException
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:418)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:337)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
    at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
    at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
    at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
    at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
    at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:294)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:374)
    ... 24 more
-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



[jira] Created: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)
Build Solr index using Hadoop MapReduce
---

 Key: SOLR-1045
 URL: https://issues.apache.org/jira/browse/SOLR-1045
 Project: Solr
  Issue Type: New Feature
Reporter: Ning Li


The goal is a contrib module that builds Solr index using Hadoop MapReduce.

It is different from the Solr support in Nutch. The Solr support in Nutch sends 
a document to a Solr server in a reduce task. Here, the goal is to build/update 
Solr index within map/reduce tasks. Also, it achieves better parallelism when 
the number of map tasks is greater than the number of reduce tasks, which is 
usually the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li
SOLR-1045 it is. More details will be available in that issue.

Marc, you can check out Hadoop contrib/index which builds a Lucene
index using Hadoop MapReduce. However, it does not handle duplicate
detection.

Cheers,
Ning


On Mon, Mar 2, 2009 at 4:25 PM, Marc Sturlese  wrote:
>
> I am doing some research about creating a Lucene/Solr index using Hadoop, but
> there's not much info around; it would be great to see some code! (I am
> experiencing problems especially with duplicate detection.)
> Thanks
>
> Shalin Shekhar Mangar wrote:
>>
>> On Mon, Mar 2, 2009 at 11:24 PM, Ning Li  wrote:
>>
>>> Hi,
>>>
>>> I wonder if there is interest in a contrib module that builds Solr
>>> index using Hadoop MapReduce?
>>>
>>
>> Absolutely!
>>
>>
>>> It is different from the Solr support in Nutch. The Solr support in
>>> Nutch sends a document to a Solr server in a reduce task. Here, I aim
>>> at building/updating Solr index within map/reduce tasks. Also, it
>>> achieves better parallelism when the number of map tasks is greater
>>> than the number of reduce tasks, which is usually the case.
>>>
>>> I worked out a very simple initial version. But I want to check if
>>> there is any interest before proceeding. If so, I'll open a Jira
>>> issue.
>>>
>>
>> +1
>>
>> Please do. It'd be great to see this in Solr.
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Build-Solr-index-using-Hadoop-MapReduce-tp22293172p22296832.html
> Sent from the Solr - Dev mailing list archive at Nabble.com.
>
>


[jira] Updated: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Li updated SOLR-1045:
--

Attachment: SOLR-1045.0.patch

> Build Solr index using Hadoop MapReduce
> ---
>
> Key: SOLR-1045
> URL: https://issues.apache.org/jira/browse/SOLR-1045
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ning Li
> Attachments: SOLR-1045.0.patch
>
>
> The goal is a contrib module that builds Solr index using Hadoop MapReduce.
> It is different from the Solr support in Nutch. The Solr support in Nutch 
> sends a document to a Solr server in a reduce task. Here, the goal is to 
> build/update Solr index within map/reduce tasks. Also, it achieves better 
> parallelism when the number of map tasks is greater than the number of reduce 
> tasks, which is usually the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1045) Build Solr index using Hadoop MapReduce

2009-03-02 Thread Ning Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678200#action_12678200
 ] 

Ning Li commented on SOLR-1045:
---

The purpose of this simple initial version is to give people an idea of the 
functionality. It uses Hadoop contrib/index, which uses the Hadoop mapred 
package. Future versions will be very different from this version. The main 
difference is that in this version, after a Solr input document is converted 
to a Lucene document, a Lucene index writer is used to build the index. In 
future versions, a Solr writer/core will be used.

Here are some prerequisites for this issue:
  - Hadoop 0.20, which is yet to be released. There are two features in 0.20 
that are important for this issue.
First is the new mapreduce package. The flexibility of the new mapreduce 
API makes it possible to use a Solr writer/core in mapper tasks.
Second is the upgrade to Jetty 6 (6.1.14). The current release, 0.19, uses 
Jetty 5.

  - There are a couple of changes required in Solr.
First is to make SolrCore support an indexing-only mode (i.e. no search). 
Only then is it feasible to use it for indexing in a map task.
Second is to upgrade from Jetty 6.1.3 to Jetty 6.1.14. Hadoop 0.20 uses a 
feature that is not available in 6.1.3.

What do you think about making "SolrCore support an indexing-only mode"?


> Build Solr index using Hadoop MapReduce
> ---
>
> Key: SOLR-1045
> URL: https://issues.apache.org/jira/browse/SOLR-1045
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ning Li
> Attachments: SOLR-1045.0.patch
>
>
> The goal is a contrib module that builds Solr index using Hadoop MapReduce.
> It is different from the Solr support in Nutch. The Solr support in Nutch 
> sends a document to a Solr server in a reduce task. Here, the goal is to 
> build/update Solr index within map/reduce tasks. Also, it achieves better 
> parallelism when the number of map tasks is greater than the number of reduce 
> tasks, which is usually the case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678242#action_12678242
 ] 

Noble Paul commented on SOLR-1044:
--

bq. Is our use of HTTP really a bottleneck?

We are limited by the servlet engine's ability to serve requests. I guess it 
would easily peak out at 600-800 req/sec, whereas an NIO-based system can 
serve far more with lower latency (http://www.jboss.org/netty/performance.html). 
If a request is served out of cache (no Lucene search involved), the only 
overhead is that of HTTP. Then there is the overhead of the servlet engine 
itself. Moreover, HTTP is not very efficient for a large volume of small 
requests.

bq.My feeling has been that if we go to a call mechanism, it should be based on 
something more standard that will have many off the shelf bindings - perl, 
python, php, C, etc.

I agree. Hadoop just looked like a simple RPC mechanism.

bq. That can also be a potential weakness though I think... a slow reader or 
writer for one request/response hangs up all the others.

The requests on the server are served by multiple handlers (each one is a 
thread). One request will not block another if there are enough 
handlers/threads (see the sketch below).
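
For reference, a rough sketch of the Hadoop RPC model being discussed, using 
the 0.19-era API. The SolrRpcProtocol interface and its request method are 
hypothetical stand-ins for the "uri + params -> stream of bytes" idiom from 
the issue description; note the handler-count argument to RPC.getServer, which 
is the knob referred to above.

{code}
// Rough sketch of the Hadoop RPC model (0.19-era API). SolrRpcProtocol is a
// hypothetical protocol; argument and return types must be Writable.
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.ipc.VersionedProtocol;

interface SolrRpcProtocol extends VersionedProtocol {
    long VERSION = 1L;
    Text request(Text uri, Text params); // hypothetical call
}

public class RpcSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        SolrRpcProtocol impl = new SolrRpcProtocol() {
            public Text request(Text uri, Text params) { return new Text("ok"); }
            public long getProtocolVersion(String protocol, long clientVersion) {
                return VERSION;
            }
        };

        // 10 handler threads: one slow request does not block the others as
        // long as a handler is free
        Server server = RPC.getServer(impl, "0.0.0.0", 9999, 10, false, conf);
        server.start();

        // client side: many calls can be multiplexed over one socket
        SolrRpcProtocol proxy = (SolrRpcProtocol) RPC.getProxy(
                SolrRpcProtocol.class, SolrRpcProtocol.VERSION,
                new InetSocketAddress("localhost", 9999), conf);
        System.out.println(proxy.request(new Text("/select"), new Text("q=*:*")));
        server.stop();
    }
}
{code}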


> Use Hadoop RPC for inter Solr communication
> ---
>
> Key: SOLR-1044
> URL: https://issues.apache.org/jira/browse/SOLR-1044
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Noble Paul
>
> Solr uses HTTP for distributed search. We can make it a whole lot faster if 
> we use an RPC mechanism which is more lightweight/efficient. 
> Hadoop RPC looks like a good candidate for this.
> The implementation should have just one protocol. It should follow Solr's 
> idiom of making remote calls: a uri + params + [optional stream(s)]. The 
> response can be a stream of bytes.
> To make this work we must make the SolrServer implementation pluggable in 
> distributed search. Users should be able to choose between the current 
> CommonsHttpSolrServer or a HadoopRpcSolrServer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.