Stemming questions

2012-08-07 Thread Alexander Cougarman
Dear friends,

A few questions on stemming support in Solr 3.6.1:
 - Can you do non-English stemming?
 - We're using solr.PorterStemFilterFactory on the "text_en" field type. We 
will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the 
best filter factory to use for stemming? 
 - For words like "run", "runners", "running", "ran", we need all to be 
returned. Is there a factory that will return all those? When searching on 
"run", Porter returned "run", "running", "runners" but not "ran". Not sure if 
anything could pick that up. 
 - Is it possible to turn off the stemming filter via code, so it could be a 
checkbox on a web page? We will be writing this in C#. 

Thank you for your help :)

Sincerely,
Alex 



Re: Updating document with the Solr Java API

2012-08-07 Thread Sami Siren
On Tue, Jul 31, 2012 at 5:16 PM, Jonatan Fournier
 wrote:
> Hi,
>
> What is the Java syntax to create an update document?

Currently to update a field in a document with solrj you need to do
something like this:

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "unique");
doc.addField("_version_", version); // needed only if optimistic locking is used
Map<String, Object> value = new HashMap<String, Object>();
value.put("set", 100); // "set" replaces the field value; "add" and "inc" are the other atomic operations
doc.addField("price_f", value);

--
 Sami Siren


Re: Stemming questions

2012-08-07 Thread Tanguy Moal
Dear Alexander,

A few questions on stemming support in Solr 3.6.1:
>  - Can you do non-English stemming?
>
With Solr, many languages are supported; see
http://wiki.apache.org/solr/LanguageAnalysis

 - We're using solr.PorterStemFilterFactory on the "text_en" field type. We
> will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the
> best filter factory to use for stemming?
>
That's a hard question to answer, so maybe someone else will have a better
answer than mine!
My answer to that question would be: the best thing to do is to test the
available alternatives and then make a decision.
There are different implementations depending on the language. For English,
there is the EnglishMinimalStemFilterFactory, which does, as its name says,
minimal stemming. I think that's essentially about plural/singular and a
few other things.

 - For words like "run", "runners", "running", "ran", we need all to be
> returned. Is there a factory that will return all those? When searching on
> "run", Porter returned "run", "running", "runners" but not "ran". Not sure
> if anything could pick that up.
>
If you read the page linked above, down to
http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming, you'll
see that you can add custom mapping rules for unsupported cases you need to
cover.

 - Is it possible to turn off the stemming filter via code, so it could be
> a checkbox on a web page? We will be writing this in C#.

Yes it is. In practice you will not be turning stemming on or off; instead
you'll have the same content indexed in two distinct fields, say
text_unstemmed and text_en, where text_unstemmed does not have the stemmer
in its analysis pipeline and text_en does.

Checking the checkbox on the webpage would then simply change the query
made to solr so that the stemmed field is queried or not ;-)

Practically, you could use dismax queries: checking "[x] activate
stemming" would make the "qf" parameter "text_unstemmed^2 text_en", and
unchecking "[ ] activate stemming" would make the "qf" parameter
"text_unstemmed" only.
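
For example, a minimal sketch of the unstemmed field in schema.xml (the
field and type names here are only illustrative, not anything Solr ships
with):

  <fieldType name="text_unstemmed" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="text_en" type="text_en" indexed="true" stored="true"/>
  <field name="text_unstemmed" type="text_unstemmed" indexed="true" stored="false"/>
  <copyField source="text_en" dest="text_unstemmed"/>

The copyField means you index the content once and Solr populates both
fields, one stemmed and one not.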

You can test all these using  a web browser and Solr's HTTP API before
digging into the C# client to make sure you get what you expected ;-)

Thank you for your help :)

I hope this helps :-)

Sincerely,
> Alex
>

Best regards,
Tanguy


Re: termFrequncy off and still use fastvector highlighter?

2012-08-07 Thread Tanguy Moal
Maybe it wasn't clear in my response, sorry!
You can use a different field for searching (the qf parameter for dismax) than
the one for highlighting (hl.fl):
q="a phrase
query"&qf="text_without_termFreqs"&hl=on&hl.fl="text_with_termFreqs".

Scoring will be based on qf's fields only (i.e. those without termFreqs).
Highlighting will be based on hl.fl's fields only (i.e. those with termFreqs,
as required by the fast vector highlighter).

Is it any clearer? :-)

Best regards,

Tanguy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590p3999544.html
Sent from the Solr - User mailing list archive at Nabble.com.


Select where in select

2012-08-07 Thread JoniJnm
Hi!

I'm trying to do a query with a select in another.

I would like to do something like:

select?q=*:* AND id_user=5&fl=id_other

select?q=test AND -id(the result of the other select)

So:

select?q=test AND -id(select?q=* AND id_user=5&fl=id_other)

Is it possible? Or do I have to do two separate selects?

$ids = select?q=*:* AND id_user=5&fl=id_other;
$result = select?q=test AND -id(implode(' AND ', $ids))

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Select-where-in-select-tp3999545.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: termFrequncy off and still use fastvector highlighter?

2012-08-07 Thread Tanguy Moal
Hmm, sorry, I think I didn't get your point right!

Maybe what you want to do is more like providing a custom similarity for
scoring of matches, see
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/core/org/apache/lucene/search/package-summary.html#changingSimilarity

That way you can keep the termPositions (enabling phrase searches) but make
the frequency have no impact in scoring by making the tf() method return a
constant value (say 1) instead of the real terms' frequencies.
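
A sketch of such a similarity against the Lucene 3.x API (the class name is
mine):

import org.apache.lucene.search.DefaultSimilarity;

public class ConstantTfSimilarity extends DefaultSimilarity {
    // Ignore how often a term occurs; any match contributes the same amount.
    @Override
    public float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}

It would then be registered in schema.xml with:

  <similarity class="com.example.ConstantTfSimilarity"/>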

Is this what you were looking for ?
If so, you'll have to package your own code into a jar and make that jar
accessible to Solr; see http://wiki.apache.org/solr/SolrPlugins for how to
plug your custom code into Solr.

The main drawback of that approach is that it will be activated for all
queries and all fields...

--
Tanguy

2012/8/7 Tanguy Moal 

> May be it wasn't clear in my response, sorry!
> You can use a different field for searching (qf parameter for dismax) than
> the one for highlighting (hl.fl) :
> q="a phrase
> query"&qf="text_without_termFreqs"&hl=on&hl.fl="text_with_termFreqs".
>
> Scoring will be based on qf's fields only (i.e. those without termFreqs).
> Highlighting will be based on hl.fl's fields only (i.e. those with
> termFreqs,
> as required by fast vector highlighter)
>
> Is it any clearer ? :-)
>
> Best regards,
>
> Tanguy
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590p3999544.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Recovery problem in solrcloud

2012-08-07 Thread Markus Jelsma
Perhaps this describes your problem:
https://issues.apache.org/jira/browse/SOLR-3685

 
 
-Original message-
> From:Jam Luo 
> Sent: Tue 07-Aug-2012 11:52
> To: solr-user@lucene.apache.org
> Subject: Recovery problem in solrcloud
> 
> Hi
> I have big index data files, more than 200GB; there are two solr
> instances in a shard. The leader starts up and is ok, but the peer always
> OOMs when it starts up. The peer always downloads index files from the
> leader because of the recoveringAfterStartup property in RecoveryStrategy;
> total time taken for download: 2350 secs. If the peer's data is empty it is
> ok, but the leader and the peer have the same generation number, so why
> does the peer do recovery?
> 
> thanks
> cooljam
> 


Re: Recovery problem in solrcloud

2012-08-07 Thread Mark Miller

On Aug 7, 2012, at 5:49 AM, Jam Luo  wrote:

> Hi
>I have  big index data files  more then 200g, there are two solr
> instance in a shard.  leader startup and is ok, but the peer alway OOM
> when  it startup.  

Can you share the OOM msg and stacktrace please?

> The peer alway download index files from leader because
> of  recoveringAfterStartup property in RecoveryStrategy, total time taken
> for download : 2350 secs.  if  data of the peer is empty, it is ok, but the
> leader and the peer have a same generation number,  why the peer
> do recovering?

We are looking into this.

> 
> thanks
> cooljam

- Mark Miller
lucidimagination.com



Re: Stemming questions

2012-08-07 Thread Jack Krupansky

You could use a synonym filter to map "ran" to "run".

ran => run (apply the same filter at both index and query time)

or

ran, run (apply the filter at index time only; synonym filtering is then not 
needed at query time)


But you would have to manually add all such word forms.
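
For reference, a sketch of what that looks like in practice (the entries
and attribute values are just examples):

In synonyms.txt:

  ran => run
  mice => mouse

In the field type's analyzer in schema.xml:

  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
          ignoreCase="true" expand="true"/>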

-- Jack Krupansky

-Original Message- 
From: Alexander Cougarman

Sent: Tuesday, August 07, 2012 4:18 AM
To: solr-user@lucene.apache.org
Subject: Stemming questions

Dear friends,

A few questions on stemming support in Solr 3.6.1:
- Can you do non-English stemming?
- We're using solr.PorterStemFilterFactory on the "text_en" field type. We 
will index a ton of PDF, DOCX, etc. docs in multiple languages. Is this the 
best filter factory to use for stemming?
- For words like "run", "runners", "running", "ran", we need all to be 
returned. Is there a factory that will return all those? When searching on 
"run", Porter returned "run", "running", "runners" but not "ran". Not sure 
if anything could pick that up.
- Is it possible to turn off the stemming filter via code, so it could be a 
checkbox on a web page? We will be writing this in C#.


Thank you for your help :)

Sincerely,
Alex 



Synonym file for American-British words

2012-08-07 Thread Alexander Cougarman
Dear friends,

Is there a downloadable synonym file for American-British words? This page has 
some, for example the VarCon file, but it's not in the Solr synonym.txt file. 

We need something that can normalize words like "center" to "centre". The 
VarCon file has it, but it's in the wrong format.

Thank you in advance :)

Sincerely,
Alex 



RE: Synonym file for American-British words

2012-08-07 Thread Alexander Cougarman
Sorry, the VarCon file is here: http://wordlist.sourceforge.net/

Sincerely,
Alex 


-Original Message-
From: Alexander Cougarman [mailto:acoug...@bwc.org] 
Sent: 7 August 2012 5:09 PM
To: solr-user@lucene.apache.org
Subject: Synonym file for American-British words

Dear friends,

Is there a downloadable synonym file for American-British words? This page has 
some, for example the VarCon file, but it's not in the Solr synonym.txt file. 

We need something that can normalize words like "center" to "centre". The 
VarCon file has it, but it's in the wrong format.

Thank you in advance :)

Sincerely,
Alex 



Re: Multiple Embedded Servers Pointing to single solrhome/index

2012-08-07 Thread Bing Hua
Thanks Lance. The use case is to have a cluster of nodes which run the same
application with an EmbeddedSolrServer on each of them, and they all point to
the same index on NFS. Every application instance is equal, meaning that any
of them may index and/or search.

That way, after every commit the writer needs to be closed so the index is
available to the other nodes.

Do you see any issues of this use case? Is the EmbeddedSolrServer able to
release its write lock without shutting down?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Embedded-Servers-Pointing-to-single-solrhome-index-tp3999451p3999591.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Recovery problem in solrcloud

2012-08-07 Thread Mark Miller
Still no idea on the OOM - please send the stacktrace if you can.

As for doing a replication recovery when it should not be necessary, Yonik just 
committed a fix for that a bit ago.

On Aug 7, 2012, at 9:41 AM, Mark Miller  wrote:

> 
> On Aug 7, 2012, at 5:49 AM, Jam Luo  wrote:
> 
>> Hi
>>   I have  big index data files  more then 200g, there are two solr
>> instance in a shard.  leader startup and is ok, but the peer alway OOM
>> when  it startup.  
> 
> Can you share the OOM msg and stacktrace please?
> 
>> The peer alway download index files from leader because
>> of  recoveringAfterStartup property in RecoveryStrategy, total time taken
>> for download : 2350 secs.  if  data of the peer is empty, it is ok, but the
>> leader and the peer have a same generation number,  why the peer
>> do recovering?
> 
> We are looking into this.
> 
>> 
>> thanks
>> cooljam
> 
> - Mark Miller
> lucidimagination.com
> 

- Mark Miller
lucidimagination.com


replication from lucene to solr

2012-08-07 Thread Robert Stewart
Hi,

I have a client who uses Lucene in a home grown CMS system they
developed in Java.  They have a lot of code that uses the Lucene API
directly and they can't change it now.  But they also need to use SOLR
for some other apps which must use the same Lucene index data.  So I
need to make a good way to periodically replicate the Lucene index to
SOLR.  I know how to make efficient Lucene index snapshots from within
their CMS Java app (basically using the same method as the old
replication scripts, using hard-links, etc.) - assuming I have a new
index snapshot, how can I tell a running SOLR instance to start using
the new index snapshot instead of its current index, and also how can
I configure SOLR to use the latest "snapshot" directory on re-start?
Assume I create new index snapshots into a directory such that each
new snapshot is a folder in format MMHHMMDDSS (timestamp). Is
there any way to configure SOLR to look someplace for new index
snapshots (some multi-core setup)?
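
One thing worth testing for this is the CoreAdmin API: create a new core
whose dataDir points at the freshly built snapshot, then swap it with the
live core. The core names and path below are hypothetical, and I have not
tried this with an externally built index:

  http://localhost:8983/solr/admin/cores?action=CREATE&name=snapshot-new&instanceDir=cms&dataDir=/indexes/20120807103000
  http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=snapshot-new

SWAP exchanges the two core names in one step, so searchers move to the new
index without downtime, and the old core can then be unloaded.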

Thanks!


Wildcard searches in phrases throws exception

2012-08-07 Thread Alexander Cougarman
Hi,

Is it possible to do wildcard searches on multiple words? Here's an example: We 
need to search on the words "Dearly loved friends" using this

 text:dearly * friends

This blows up Solr with the exception below. From my Googling, I see that the 
error has to do with too many clauses being created. So how do you do this kind 
of wildcard search where the "*" goes in the middle of a phrase? Thank you.

HTTP ERROR 500

Problem accessing /solr/select. Reason:

maxClauseCount is set to 1024

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 
1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at 
org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:51)
at 
org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:55)
at 
org.apache.lucene.search.ScoringRewrite$3.collect(ScoringRewrite.java:95)
at 
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
at 
org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
at 
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:391)
at 
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
at 
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Sincerely,
Alex 



null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

2012-08-07 Thread Markus Jelsma
Hello,

We sometimes see the error below in our `master` when indexing. Our master is 
currently the node we send documents to - we've not yet implemented 
CloudSolrServer in Apache Nutch. This causes the indexer to crash when using 
Nutch locally, the task is retried when running on Hadoop. We're running it 
locally in this test set up so there's only one indexing thread.

Anyway, for me it's quite a cryptic error because I don't know which connection 
has timed out; I assume a connection from the indexing node to some other node 
in the cluster when it passes a document to the correct leader? Each node of 
the 10-node cluster has the same configuration; Tomcat is configured with 
maxThreads=512 and a time out of one second.

We're using today's trunk in this test set up and we cannot reliably reproduce 
the error. We've seen the error before so it's not a very recent issue. No 
errors are found in the other node's logs.

2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - 
[http-8080-exec-6] - : null:java.lang.RuntimeException: [was class 
java.net.SocketTimeoutException] null
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
at 
io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException
at 
org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
at 
org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:229)
at 
org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:210)
at 
org.apache.coyote.http11.InternalNioInputBuffer.readSocket(InternalNioInputBuffer.java:643)
at 
org.apache.coyote.http11.InternalNioInputBuffer.fill(InternalNioInputBuffer.java:945)
at org.apache.coyote.http11.InternalNioInputBuffer$SocketInputBuffer.doR
ead(InternalNioInputBuffer.java:969)
at 
org.apache.coyote.http11.filters.ChunkedInputFilter.readBytes(ChunkedInputFilter.java:268)
at 
org.apache.coyote.http11.filters.ChunkedInputFilter.doRead(ChunkedInputFilter.java:167)
at 
org.apache.coyote.http11.InternalNioInputBuffer.doRead(InternalNioInputBuffer.java:916)
at org.apache.coyote.Request.doRead(Request.java:427)
at 
org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:304)
at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:419)
at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:327)
at 
org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:162)
at com.ctc.wstx.io.UTF8Reader.loadMore(UTF8Reader.java:365)
at c

Re: Wildcard searches in phrases throws exception

2012-08-07 Thread Tomás Fernández Löbbe
Maybe you can take a look at this Jira:
https://issues.apache.org/jira/browse/SOLR-1604
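
As a side note: if the intent is simply "one arbitrary word may appear
between the two terms", a sloppy phrase query already handles that without
any patch:

  text:"dearly friends"~1

matches "dearly loved friends", since the slop of 1 allows one position of
movement between the phrase terms.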

On Tue, Aug 7, 2012 at 2:54 PM, Alexander Cougarman wrote:

> Hi,
>
> Is it possible to do wildcard searches on multiple words? Here's an
> example: We need to search on the words "Dearly loved friends" using this
>
>  text:dearly * friends
>
> This blows up Solr with this exception. From my Googling, I see that the
> error has to do with too many tokens being created. So how do you do this
> kind of wildcard searches where the "*" goes in the middle of some phrase?
> Thank you.
>
> HTTP ERROR 500
>
> Problem accessing /solr/select. Reason:
>
> maxClauseCount is set to 1024
>
> org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is
> set to 1024
> at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
> at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
> at
> org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:51)
> at
> org.apache.lucene.search.ScoringRewrite$1.addClause(ScoringRewrite.java:55)
> at
> org.apache.lucene.search.ScoringRewrite$3.collect(ScoringRewrite.java:95)
> at
> org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:38)
> at
> org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:93)
> at
> org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
> at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
> at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:98)
> at
> org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:391)
> at
> org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
> at
> org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:185)
> at
> org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
> at
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:490)
> at
> org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
> at
> org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:131)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:186)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> at
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at
> org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
> at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
> at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
>
> Sincerely,
> Alex
>
>


Re: Synonym file for American-British words

2012-08-07 Thread SUJIT PAL
Hi Alex,

I implemented something similar using the rules described in this page:

http://en.wikipedia.org/wiki/American_and_British_English_spelling_differences 

The idea is to normalize the British spelling form to the American form during 
indexing and query using a tokenizer that takes in a word and if matched to one 
of the rules, returns the converted form.

My rules were modeled as a chain of transformations. Each transformation had a 
set of (pattern, action) pairs. The transformations were:
a. word_replacement (such as artefact => artifact) - in this case the source 
word would directly be normalized into the specified target word.
b) prefix rules (eg anae => ane for anemic) - in this case the prefix 
characters of the word, if matched, would be transformed into the target.
c) suffix rules (eg tre => ter for center) - similar to prefix rules except it 
works on suffix.
d) infix rules (eg moeb => meb for ameba) - replaces characters in the middle 
of the word. 

I cannot share the actual rules, but they should be relatively simple to figure 
out from the wiki page, if you want to go that route.
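
If you do end up converting VarCon (or hand-building a list) instead, the
Solr synonyms.txt format for this kind of normalization is one mapping per
line, for example:

  centre => center
  colour => color
  artefact => artifact

applied with a solr.SynonymFilterFactory at both index and query time.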

HTH,
Sujit

On Aug 7, 2012, at 7:08 AM, Alexander Cougarman wrote:

> Dear friends,
> 
> Is there a downloadable synonym file for American-British words? This page 
> has some, for example the VarCon file, but it's not in the Solr synonym.txt 
> file. 
> 
> We need something that can normalize words like "center" to "centre". The 
> VarCon file has it, but it's in the wrong format.
> 
> Thank you in advance :)
> 
> Sincerely,
> Alex 
> 



Syntax for parameter substitution in function queries?

2012-08-07 Thread Timothy Hill
Hello, all ...

According to http://wiki.apache.org/solr/FunctionQuery/#What_is_a_Function.3F,
it is possible under Solr 4.0 to perform parameter substitutions
within function queries.

However, I can't get the syntax provided in the documentation there to
work *at all* with Solr 4.0 out of the box: the only location at which
function queries can be specified, it seems, is in the 'fl' parameter.
And attempts at parameter substitutions here fail. Using (haphazardly
guessed) syntax like

select?q=*:*&fl=*, test_id:if(exists(employee), employee_id,
socialsecurity_id), boost_id:sum($test_id, 10)&wt=xml

results in the following error

Error parsing fieldname: Missing param test_id while parsing function
'sum($test_id, 10)'

Right now I'm entertaining the following hypotheses:

(i) I'm somehow borking the syntax, probably in an embarrassingly obvious way
(ii) param substitutions of this type work with the syntax given in
the wiki, but not with the standard query handler
(iii) the wiki documentation is an artefact of development-in-progress
for a feature that was subsequently dropped

Can anyone on the list shed any light on this?

Thanks,

Tim


Connect to SOLR over socket file

2012-08-07 Thread Jason Axelson
Hi,

Is it possible to connect to SOLR over a socket file, as is possible
with mysql? I've looked around and I get the feeling that I may be
misunderstanding part of SOLR's architecture.

Any pointers are welcome.

Thanks,
Jason


Re: Connect to SOLR over socket file

2012-08-07 Thread Walter Underwood
Yes. You connect over a socket and talk HTTP. --wunder
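
For example, from Java the whole protocol is just HTTP over an ordinary TCP
socket; no special driver or unix-domain socket file is involved. The URL
below assumes a default local install:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class SolrQueryExample {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://localhost:8983/solr/select?q=*:*&wt=json");
        // openStream() opens the TCP socket and performs the HTTP GET
        BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line); // raw JSON response from Solr
        }
        in.close();
    }
}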

On Aug 7, 2012, at 12:43 PM, Jason Axelson wrote:

> Hi,
> 
> Is it possible to connect to SOLR over a socket file as is possible
> with mysql? I've looked around and I get the feeling that I may be
> misunderstanding part of SOLR's architecture.
> 
> Any pointers are welcome.
> 
> Thanks,
> Jason






Custom Search Logic

2012-08-07 Thread joshy_m
I am a new user to Solr and I am still learning the techniques used here. 

I have a requirement to do a relative search based on specific logic. It's
something like this: I have a text string which, when searched, should return
all the items that exactly match that string on an attribute, plus all the
other products that are close matches. Since the search has to find close
matches for a string, there is some custom logic that needs to be performed.

Did any one of you have similar requirements in your past experience? Or
if anyone can throw in some ideas, that would be helpful as well.

Thanks in Advance





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Search-Logic-tp3999642.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Select where in select

2012-08-07 Thread in.abdul
Yes, you can use a filter query for this; check the fq (filter query) parameter.
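
For example, with the field names taken from your mail: if you are on Solr 4,
the join query parser can even express this in a single request (a sketch;
it assumes the id_other values are the ids of the documents to exclude):

  select?q=test&fq=-{!join from=id_other to=id}id_user:5

On Solr 3.x there is no sub-select, so you would issue the first query,
collect the id_other values client-side, and send them back in a filter:

  select?q=test&fq=-id:(3 OR 7 OR 11)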


Syed Abdul kather
send from Samsung S3
On Aug 7, 2012 2:28 PM, "JoniJnm [via Lucene]" <
ml-node+s472066n3999545...@n3.nabble.com> wrote:

> Hi!
>
> I'm trying to do a query with a select in another.
>
> I would like to do something like:
>
> select?q=*:* AND id_user=5&fl=id_other
>
> select?q=test AND -id(the result of the other select)
>
> So:
>
> select?q=test AND -id(select?q=* AND id_user=5&fl=id_other)
>
> Is it possible? Or I have to do two separates selects?
>
> $ids = select?q=*:* AND id_user=5&fl=id_other;
> $result = select?q=test AND -id(implode(' AND ', $ids))
>
> Thanks!
>
> --
>  If you reply to this email, your message will be added to the discussion
> below:
> http://lucene.472066.n3.nabble.com/Select-where-in-select-tp3999545.html
>  To unsubscribe from Lucene, click 
> here
> .
> NAML
>




-
THANKS AND REGARDS,
SYED ABDUL KATHER
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Select-where-in-select-tp3999545p3999644.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Custom Search Logic

2012-08-07 Thread Michael Della Bitta
Hello Joshy,

You might want to look at MoreLikeThis:

http://wiki.apache.org/solr/MoreLikeThis

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Tue, Aug 7, 2012 at 3:55 PM, joshy_m  wrote:
> I am a new user to Solr and I am still learning the techniques used here.
>
> I had a requirement to do a relative search based on a specific logic. Its
> something like, I have a text string which when searched, should return all
> the items that are matching that text string for an attribute and all the
> other products which are having closeby matches. Since the search is to
> check the closeby matches for a string, there is a logic that needs to be
> performed.
>
> Did any one of you have a similar requirements in your past experience? Or
> if any one can throw in some ideas that would be helpful as well
>
> Thanks in Advance
>
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Custom-Search-Logic-tp3999642.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-07 Thread anarchos78
Greetings friends,
I have successfully indexed PDF (using Tika) and plain text (fetched from a
database) in one single collection. Now I am trying to implement
highlighting. When I query Solr I place the following in the URL:
http://localhost:8090/solr/ktimatologio/select/?q=BlahBlah&start=0&rows=120&indent=on&hl=true&wt=json
Everything is OK. The received output has the original (not highlighted)
content under "docs" and the highlighted snippets under "highlighting". But I
noticed that the documents extracted by Tika don't have a "highlighting"
snippet. That kind of response causes me many problems (zero-length rows). Is
there any workaround to tackle this? I have already tried copyField (at index
time) but the response comes out blank *({"highlighting":{}})*. I really need
help on this.

With honor,

Tom

Greece 




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-search-Tika-extracted-text-from-PDF-not-return-highlighting-snippet-tp3999647.html
Sent from the Solr - User mailing list archive at Nabble.com.


Does Solr support 'Value Search'?

2012-08-07 Thread Bing Hua
Hi folks,

Just wondering if there is a query handler that simply takes a query string
and searches all (or a subset of) fields for matching field values?

e.g. 
q=*admin*

Response may look like
author: [admin, system_admin, sub_admin]
last_modifier: [admin, system_admin, sub_admin]
doctitle: [AdminGuide, AdminManual]



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr search – Tika extracted text from PDF not return highlighting snippet

2012-08-07 Thread Jack Krupansky
The out-of-the-box example for SolrCell/Tika redirects the Tika "content" to 
the "text" field, which is not stored/highlighted, so the Tika content is 
indexed but not retrievable/highlightable.


What field are you highlighting for your database text?

You should direct your Tika "content" to a stored field, and then copy it to 
"text" for indexing and to whatever field you are highlighting.
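
As a rough sketch of that mapping, an extract request can redirect Tika's
"content" output into a stored field via the fmap parameter. The field name
stored_content below is an assumption for illustration, not from the original
posts:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractRequestUrl {
    // Assembles a /update/extract URL; fmap.content renames Tika's
    // "content" output to a field of our choosing before indexing.
    static String buildUrl(String base, Map<String, String> params) throws Exception {
        StringBuilder sb = new StringBuilder(base + "/update/extract?");
        String enc = StandardCharsets.UTF_8.name();
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            sb.append(e.getKey()).append('=').append(URLEncoder.encode(e.getValue(), enc));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("literal.id", "doc1");             // unique key for the PDF
        p.put("fmap.content", "stored_content"); // a stored, highlightable field
        p.put("commit", "true");
        System.out.println(buildUrl("http://localhost:8090/solr/ktimatologio", p));
    }
}
```

A copyField from stored_content to "text" in schema.xml then keeps the
indexed-only text field populated for searching, while highlighting is
requested against stored_content.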


-- Jack Krupansky

-Original Message- 
From: anarchos78

Sent: Tuesday, August 07, 2012 4:28 PM
To: solr-user@lucene.apache.org
Subject: Solr search – Tika extracted text from PDF not return highlighting 
snippet


Greetings friends,
I have successfully indexed Pdf –using Tika- and pure text –fetched from
database- in one single collection. Now I am trying to implement
highlighting. When I querying Solr i placing in the url the following:
http://localhost:8090/solr/ktimatologio/select/?q=BlahBlah&;
&start=0&rows=120&indent=on&hl=true&wt=json . Everything is OK. The received
output has the original (not highlighted text) content under “docs” and the
highlighted snippets under “highlighting”. But I had noticed the documents
that have been extracted by Tika don’t have “highlighting” snippet. That
kind of response, cause me many troubles (zero length rows). Is there any
workaround in order to tackle it? I have already tried to copyField (at
index time) but the response come out blank *({“highlighting”:{}})*. I
really need help on this.

With honor,

Tom

Greece




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-search-Tika-extracted-text-from-PDF-not-return-highlighting-snippet-tp3999647.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Syntax for parameter substitution in function queries?

2012-08-07 Thread Yonik Seeley
On Tue, Aug 7, 2012 at 3:01 PM, Timothy Hill  wrote:
> Hello, all ...
>
> According to http://wiki.apache.org/solr/FunctionQuery/#What_is_a_Function.3F,
> it is possible under Solr 4.0 to perform parameter substitutions
> within function queries.
>
> However, I can't get the syntax provided in the documentation there to
> work *at all* with Solr 4.0 out of the box: the only location at which
> function queries can be specified, it seems, is in the 'fl' parameter.
> And attempts at parameter substitutions here fail. Using (haphazardly
> guessed) syntax like
>
> select?q=*:*&fl=*, test_id:if(exists(employee), employee_id,
> socialsecurity_id), boost_id:sum($test_id, 10)&wt=xml
>
> results in the following error
>
> Error parsing fieldname: Missing param test_id while parsing function
> 'sum($test_id, 10)'

test_id needs to be an actual request parameter.

This worked for me on the example data:
http://localhost:8983/solr/query?q=*:*&fl=*,%20test_id:if(exists(price),id,name),%20boost_id:sum($param,10)&param=price
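
The dereferencing shown here works like a simple macro expansion over the
request parameters; a toy sketch of the idea (purely illustrative, not Solr's
actual parser):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamDeref {
    private static final Pattern REF = Pattern.compile("\\$(\\w+)");

    // Replaces every $name reference in a function-query string with the
    // value of the corresponding request parameter.
    static String resolve(String expr, Map<String, String> params) {
        Matcher m = REF.matcher(expr);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String val = params.get(m.group(1));
            if (val == null) {
                // Mirrors the "Missing param" error seen in the thread.
                throw new IllegalArgumentException("Missing param " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(val));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // sum($param,10) with &param=price expands to sum(price,10)
        System.out.println(resolve("sum($param,10)", Map.of("param", "price")));
    }
}
```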

-Yonik
http://lucidimagination.com


Solr index storage strategy on FileSystem

2012-08-07 Thread Bing Hua
Hi folks,

With StandardDirectoryFactory, the index is stored under data/index in the
form of frq, tim, tip and a few other files. As the index grows larger, more
files are generated, and sometimes a few of them are merged. It looks like
there are segment-creation and merging strategies at work.

My question is: are these segmentation / merging strategies configurable?
Basically I want to add a size limit on any individual file. Is that feasible
without changing Solr core code?

Thanks!
Bing



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-index-storage-strategy-on-FileSystem-tp3999661.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

2012-08-07 Thread Markus Jelsma
A significant detail is the batch size, which we set to 64 documents due to 
earlier memory limitations. We index segments of roughly 300-500k records each 
time. Lowering the batch size to 32 led to an early internal server error and 
the stack trace below. Increasing it to 128 allowed us to index some more 
records, but it still throws the error after 200k+ indexed records.

Increasing it even more to 256 records per batch allowed us to index an entire 
segment without errors.

Another detail is that we do not restart the cluster between indexing attempts 
so it seems that something only builds up during indexing (nothing seems to 
leak afterwards) and throws an error.

Any hints?

Thanks,
Markus

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Tue 07-Aug-2012 20:08
> To: solr-user@lucene.apache.org
> Subject: null:java.lang.RuntimeException: [was class 
> java.net.SocketTimeoutException] null
> 
> Hello,
> 
> We sometimes see the error below in our `master` when indexing. Our master is 
> currently the node we send documents to - we've not yet implemented 
> CloudSolrServer in Apache Nutch. This causes the indexer to crash when using 
> Nutch locally, the task is retried when running on Hadoop. We're running it 
> locally in this test set up so there's only one indexing thread.
> 
> Anyway, for me it's quite a cryptic error because i don't know what 
> connection has timed out, i assume a connection from the indexing node to 
> some other node in the cluster when it passes a document to the correct 
> leader? Each node of the 10 node cluster has the same configuration, Tomcat 
> is configured with maxThreads=512 and a time out of one second.
> 
> We're using today's trunk in this test set up and we cannot reliably 
> reproduce the error. We've seen the error before so it's not a very recent 
> issue. No errors are found in the other node's logs.
> 
> 2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - 
> [http-8080-exec-6] - : null:java.lang.RuntimeException: [was class 
> java.net.SocketTimeoutException] null
> at 
> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> at 
> com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> at 
> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> at 
> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> at 
> org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
> at 
> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
> at 
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
> at 
> io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at 
> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
> at 
> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
> at 
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.SocketTimeoutException
> at 
> org.apache.tomcat.util.net.NioBlockingSelector.read(NioBlockingSelector.java:185)
> at 
> org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:229)
> at 
> org.apache.tomcat.util.net.NioSelectorPool.read(NioSelectorPool.java:210)
> at 
> org.apa

Re: java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

2012-08-07 Thread Jack Krupansky
I'm wondering if the timeout occurs during a JVM garbage collection triggered 
by a large number of Lucene segments to merge.


What is the JVM heap usage like, compared to the total heap space available?

In other words, maybe the JVM needs more heap memory.

-- Jack Krupansky

-Original Message- 
From: Markus Jelsma

Sent: Tuesday, August 07, 2012 5:39 PM
To: solr-user@lucene.apache.org ; Markus Jelsma
Subject: RE: null:java.lang.RuntimeException: [was class 
java.net.SocketTimeoutException] null


A signicant detail is the batch size which we set to 64 documents due to 
earlier memory limitations. We index segments of roughly 300-500k records 
each time. Lowering the batch size to 32 lead to an early internal server 
error and the stack trace below. Increasing it to 128 allowed us to index 
some more records but it still throws the error after 200k+ indexed records.


Increasing it even more to 256 records per batch allowed us to index an 
entire segment without errors.


Another detail is that we do not restart the cluster between indexing 
attempts so it seems that something only builds up during indexing (nothing 
seems to leak afterwards) and throws an error.


Any hints?

Thanks,
Markus



-Original message-

From:Markus Jelsma 
Sent: Tue 07-Aug-2012 20:08
To: solr-user@lucene.apache.org
Subject: null:java.lang.RuntimeException: [was class 
java.net.SocketTimeoutException] null


Hello,

We sometimes see the error below in our `master` when indexing. Our master 
is currently the node we send documents to - we've not yet implemented 
CloudSolrServer in Apache Nutch. This causes the indexer to crash when 
using Nutch locally, the task is retried when running on Hadoop. We're 
running it locally in this test set up so there's only one indexing 
thread.


Anyway, for me it's quite a cryptic error because i don't know what 
connection has timed out, i assume a connection from the indexing node to 
some other node in the cluster when it passes a document to the correct 
leader? Each node of the 10 node cluster has the same configuration, 
Tomcat is configured with maxThreads=512 and a time out of one second.


We're using today's trunk in this test set up and we cannot reliably 
reproduce the error. We've seen the error before so it's not a very recent 
issue. No errors are found in the other node's logs.


2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - 
[http-8080-exec-6] - : null:java.lang.RuntimeException: [was class 
java.net.SocketTimeoutException] null
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at 
com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at 
com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at 
org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
at 
org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
at 
io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(T

Solr 3.5 vs 3.6

2012-08-07 Thread bbarani
Hi,

I heard Solr 3.5 performs better than Solr 3.6 (I haven't tested, though; I
will do that very soon). I just want to hear thoughts from this forum
regarding that. Also, is Solr 3.6 a stable release?

Thanks,
BB



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-5-vs-3-6-tp3999667.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

2012-08-07 Thread Yonik Seeley
Could this be just a simple case of a socket timeout?  Can you raise
the timeout on request threads in Tomcat?
It's a lot easier to reproduce/diagnose stuff like this when people
use the stock jetty server shipped with Solr.

-Yonik
http://lucidimagination.com


On Tue, Aug 7, 2012 at 5:39 PM, Markus Jelsma
 wrote:
> A signicant detail is the batch size which we set to 64 documents due to 
> earlier memory limitations. We index segments of roughly 300-500k records 
> each time. Lowering the batch size to 32 lead to an early internal server 
> error and the stack trace below. Increasing it to 128 allowed us to index 
> some more records but it still throws the error after 200k+ indexed records.
>
> Increasing it even more to 256 records per batch allowed us to index an 
> entire segment without errors.
>
> Another detail is that we do not restart the cluster between indexing 
> attempts so it seems that something only builds up during indexing (nothing 
> seems to leak afterwards) and throws an error.
>
> Any hints?
>
> Thanks,
> Markus
>
>
>
> -Original message-
>> From:Markus Jelsma 
>> Sent: Tue 07-Aug-2012 20:08
>> To: solr-user@lucene.apache.org
>> Subject: null:java.lang.RuntimeException: [was class 
>> java.net.SocketTimeoutException] null
>>
>> Hello,
>>
>> We sometimes see the error below in our `master` when indexing. Our master 
>> is currently the node we send documents to - we've not yet implemented 
>> CloudSolrServer in Apache Nutch. This causes the indexer to crash when using 
>> Nutch locally, the task is retried when running on Hadoop. We're running it 
>> locally in this test set up so there's only one indexing thread.
>>
>> Anyway, for me it's quite a cryptic error because i don't know what 
>> connection has timed out, i assume a connection from the indexing node to 
>> some other node in the cluster when it passes a document to the correct 
>> leader? Each node of the 10 node cluster has the same configuration, Tomcat 
>> is configured with maxThreads=512 and a time out of one second.
>>
>> We're using today's trunk in this test set up and we cannot reliably 
>> reproduce the error. We've seen the error before so it's not a very recent 
>> issue. No errors are found in the other node's logs.
>>
>> 2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - 
>> [http-8080-exec-6] - : null:java.lang.RuntimeException: [was class 
>> java.net.SocketTimeoutException] null
>> at 
>> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>> at 
>> com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>> at 
>> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>> at 
>> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>> at 
>> org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
>> at 
>> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
>> at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
>> at 
>> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> at 
>> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
>> at 
>> io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>> at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>> at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>> at 
>> org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
>> at 
>> org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
>> at 
>> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa

RE: null:java.lang.RuntimeException: [was class java.net.SocketTimeoutException] null

2012-08-07 Thread Markus Jelsma
Jack,

There are no peculiarities in the JVM graphs, only an increase in used threads and 
GC time. Heap space is collected quickly and doesn't suddenly increase. There's 
only 256MB available for the heap, but it's fine.


Yonik,

I'll increase the timeout to five seconds tomorrow and try to reproduce it 
with a low batch size of 32. Judging from what I've seen, it should throw an 
error quickly with such a low batch size. However, what is timing out here? My 
client connection to the indexing node, or something else that I don't see?

Unfortunately no Jetty here (yet).

Thanks
Markus
 
 
-Original message-
> From:Yonik Seeley 
> Sent: Tue 07-Aug-2012 23:54
> To: solr-user@lucene.apache.org
> Subject: Re: null:java.lang.RuntimeException: [was class 
> java.net.SocketTimeoutException] null
> 
> Could this be just a simple case of a socket timeout?  Can you raise
> the timout on request threads in Tomcat?
> It's a lot easier to reproduce/diagnose stuff like this when people
> use the stock jetty server shipped with Solr.
> 
> -Yonik
> http://lucidimagination.com
> 
> 
> On Tue, Aug 7, 2012 at 5:39 PM, Markus Jelsma
>  wrote:
> > A signicant detail is the batch size which we set to 64 documents due to 
> > earlier memory limitations. We index segments of roughly 300-500k records 
> > each time. Lowering the batch size to 32 lead to an early internal server 
> > error and the stack trace below. Increasing it to 128 allowed us to index 
> > some more records but it still throws the error after 200k+ indexed records.
> >
> > Increasing it even more to 256 records per batch allowed us to index an 
> > entire segment without errors.
> >
> > Another detail is that we do not restart the cluster between indexing 
> > attempts so it seems that something only builds up during indexing (nothing 
> > seems to leak afterwards) and throws an error.
> >
> > Any hints?
> >
> > Thanks,
> > Markus
> >
> >
> >
> > -Original message-
> >> From:Markus Jelsma 
> >> Sent: Tue 07-Aug-2012 20:08
> >> To: solr-user@lucene.apache.org
> >> Subject: null:java.lang.RuntimeException: [was class 
> >> java.net.SocketTimeoutException] null
> >>
> >> Hello,
> >>
> >> We sometimes see the error below in our `master` when indexing. Our master 
> >> is currently the node we send documents to - we've not yet implemented 
> >> CloudSolrServer in Apache Nutch. This causes the indexer to crash when 
> >> using Nutch locally, the task is retried when running on Hadoop. We're 
> >> running it locally in this test set up so there's only one indexing thread.
> >>
> >> Anyway, for me it's quite a cryptic error because i don't know what 
> >> connection has timed out, i assume a connection from the indexing node to 
> >> some other node in the cluster when it passes a document to the correct 
> >> leader? Each node of the 10 node cluster has the same configuration, 
> >> Tomcat is configured with maxThreads=512 and a time out of one second.
> >>
> >> We're using today's trunk in this test set up and we cannot reliably 
> >> reproduce the error. We've seen the error before so it's not a very recent 
> >> issue. No errors are found in the other node's logs.
> >>
> >> 2012-08-07 17:52:05,260 ERROR [solr.servlet.SolrDispatchFilter] - 
> >> [http-8080-exec-6] - : null:java.lang.RuntimeException: [was class 
> >> java.net.SocketTimeoutException] null
> >> at 
> >> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
> >> at 
> >> com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
> >> at 
> >> com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
> >> at 
> >> com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
> >> at 
> >> org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:376)
> >> at 
> >> org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:229)
> >> at 
> >> org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
> >> at 
> >> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >> at 
> >> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >> at 
> >> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
> >> at 
> >> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
> >> at 
> >> io.openindex.solr.servlet.HttpResponseSolrDispatchFilter.doFilter(HttpResponseSolrDispatchFilter.java:219)
> >> at 
> >> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >> at 
> >> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >> at 
> >> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java

Re: Custom Search Logic

2012-08-07 Thread Lance Norskog
Does "closeby" mean other words nearby in the text? For that, you want
Lucene or Solr. Lucene is a text search library that does this kind of nearby
search very quickly, and Solr is an app that wraps Lucene.
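
For reference, "nearby in the text" is usually expressed as a proximity
(sloppy phrase) query in Lucene/Solr syntax; a small sketch of building one
(the field name "text" is hypothetical):

```java
public class ProximityQuery {
    // Builds a Lucene/Solr proximity query: the phrase's terms must occur
    // within `slop` positions of each other in the given field.
    static String proximity(String field, String phrase, int slop) {
        return field + ":\"" + phrase + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(proximity("text", "custom search", 5));
        // -> text:"custom search"~5
    }
}
```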

On Tue, Aug 7, 2012 at 1:14 PM, Michael Della Bitta
 wrote:
> Hello Joshy,
>
> You might want to look at MoreLikeThis:
>
> http://wiki.apache.org/solr/MoreLikeThis
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
> www.appinions.com
> Where Influence Isn’t a Game
>
>
> On Tue, Aug 7, 2012 at 3:55 PM, joshy_m  wrote:
>> I am a new user to Solr and I am still learning the techniques used here.
>>
>> I had a requirement to do a relative search based on a specific logic. Its
>> something like, I have a text string which when searched, should return all
>> the items that are matching that text string for an attribute and all the
>> other products which are having closeby matches. Since the search is to
>> check the closeby matches for a string, there is a logic that needs to be
>> performed.
>>
>> Did any one of you have a similar requirements in your past experience? Or
>> if any one can throw in some ideas that would be helpful as well
>>
>> Thanks in Advance
>>
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/Custom-Search-Logic-tp3999642.html
>> Sent from the Solr - User mailing list archive at Nabble.com.



-- 
Lance Norskog
goks...@gmail.com


exclusions by query and many values

2012-08-07 Thread caddmngr
We have Solr docs for manufacturer parts, where each part is available from
hundreds of suppliers. Those suppliers are stored within each mfg part
document in a multi-valued field.

Customers search our parts by keyword against part titles and descriptions,
and each customer has a unique list of suppliers they are not allowed to view.
For example:

solr docs:
doc_id = 1, mfg_part = abc, suppliers = s1, s2, s3, s4, s5, s6, s7
doc_id = 2, mfg_part = def, suppliers = s4, s5, s6, s7
doc_id = 3, mfg_part = ghi, suppliers = s4
doc_id = 4, mfg_part = jkl, suppliers = s1, s2, s3, s4
doc_id = 5, mfg_part = mno, suppliers = s1, s2, s3, s5

customer A:  exclude suppliers: s4, s5, s6, s7
customer B:  exclude suppliers: s1, s2, s3, s4

when customer A searches, documents 2 & 3 should not be returned in any
result set
when customer B searches, documents 3 & 4 should not be returned in any
result set

One thought we have is to restructure our docs so that there is one doc per
supplier + mfg part combination, instead of per mfg part, but the result would
be an index 3000 times the size! Many of our mfg parts have 1000 or more
suppliers. Currently we shove the exclude lists into the filter query, but
it's getting to be quite long.

We have looked at Solr "join" in Solr 4, but since this is a production
site generating millions of dollars per week, we cannot afford to use alpha
or beta versions of software.
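
The current approach described above (pushing the per-customer exclude list
into a filter query) might be sketched like this; the "suppliers" field comes
from the example docs, everything else is hypothetical:

```java
import java.util.List;

public class SupplierExcludeFilter {
    // Builds a negative filter query excluding any document that lists
    // one of the forbidden suppliers in its multi-valued "suppliers" field.
    static String excludeFq(List<String> excluded) {
        return "-suppliers:(" + String.join(" OR ", excluded) + ")";
    }

    public static void main(String[] args) {
        // Customer A from the example: may not see s4..s7.
        System.out.println(excludeFq(List.of("s4", "s5", "s6", "s7")));
        // -> -suppliers:(s4 OR s5 OR s6 OR s7)
    }
}
```

With hundreds of exclusions per customer this fq does grow long, which matches
the pain described; on the other hand, Solr caches filter queries, so repeated
searches by the same customer stay cheap.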






--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclusions-by-query-and-many-values-tp3999672.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 4.0 Alpha incompatible with Index created with 3 Months old trunk code

2012-08-07 Thread Jack Krupansky
The last index format change I recall seeing was on June 11, which was like 
two weeks before the 4.0 Alpha.


So, yeah, any 4.0 index created before that June 11 commit would have to be 
reindexed.


-- Jack Krupansky

-Original Message- 
From: roz dev

Sent: Tuesday, August 07, 2012 7:38 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.0 Alpha incompatible with Index created with 3 Months old 
trunk code


Hi All

We recently upgraded to Solr 4.0 Alpha. After restarting the server, we got an
error indicating that the index version is not compatible. We had been using
Solr 4 from trunk, from 4/12/12.

Has the index format changed in the last 3 months?

Exception that we got is

2012-08-07 18:25:10,728 2733 INFO [org.apache.solr.core.SolrCore] (main -
[tools] CLOSING SolrCore org.apache.solr.core.SolrCore@63e5a3e
2012-08-07 18:25:10,731 2736 INFO [org.apache.solr.core.SolrCore] (main -
[tools] Closing main searcher on request.
2012-08-07 18:25:10,732 2737 ERROR
[org.apache.solr.core.CoreContainer](main-
null:org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.(SolrCore.java:688)
at org.apache.solr.core.SolrCore.(SolrCore.java:551)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:854)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:539)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:360)
at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:309)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:106)
at
org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at
org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at
org.apache.catalina.core.ApplicationFilterConfig.(ApplicationFilterConfig.java:115)
at
org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4071)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4725)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1267)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1379)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:663)
... 24 more
Caused by: org.apache.lucene.index.IndexFormatTooNewException: Format
version is not supported (resource:
ChecksumIndexInput(MMapIndexInput(path="/opt/gid/solr/tools/data/index/segments_2k"))):
-12 (needs to be between -9 and -11)
at
org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacySegmentInfo(Lucene3xSegmentInfoReader.java:132)
at
org.apache.lucene.codecs.lucene3x.Lucene3xSegmentInfoReader.readLegacyInfos(Lucene3xSegmentInfoReader.java:54)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:297)
at
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:752)
at
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:81)
at
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
at
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:119)
at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1243)
... 26 more 



Re: exclusions by query and many values

2012-08-07 Thread Jack Krupansky
The usual technique is via filter queries that constrain what documents a 
user query can "see", either by OR-ing the doc classes it can see or 
starting with "*:*" and NOT-ing the doc classes it can't see, or a 
combination of the two techniques.


The filter queries could either be supplied as query request parameters or 
added to the base query by a custom search component. For example, if you 
have some authorization system that you need to communicate with to 
determine the access authorization for a given user.
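As a sketch of the two filter-query styles (the field name doc_class and the class values here are hypothetical, and the strings would be sent as fq request parameters):

```java
import java.util.List;
import java.util.stream.Collectors;

public class FilterQueryDemo {
    // OR-style allow list: only documents whose doc_class is in the list are visible.
    static String allowFq(List<String> allowed) {
        return "doc_class:(" + String.join(" OR ", allowed) + ")";
    }

    // NOT-style deny list: start from all documents ("*:*") and subtract
    // the classes the user must not see.
    static String denyFq(List<String> denied) {
        return "*:* " + denied.stream()
                .map(c -> "-doc_class:" + c)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(allowFq(List.of("public", "internal")));
        System.out.println(denyFq(List.of("secret", "restricted")));
        // Either string would be passed as a filter, e.g.
        //   q=user+query&fq=doc_class:(public OR internal)
    }
}
```

Because fq results are cached independently of the main query, reusing the same per-role filter string across requests also keeps the filter cache effective.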


-- Jack Krupansky

-Original Message- 
From: caddmngr

Sent: Tuesday, August 07, 2012 6:59 PM
To: solr-user@lucene.apache.org
Subject: exclusions by query and many values

We have Solr docs for manufacturer parts, where each part is available from
hundreds of suppliers. Those suppliers are stored within each mfg part document
in a multi-valued field.

Customers search our parts by keyword against part titles and descriptions,
and each customer has a unique list of suppliers they are not allowed to view.
For example:

solr docs:
doc_id = 1, mfg_part = abc, suppliers = s1, s2, s3, s4, s5, s6, s7
doc_id = 2, mfg_part = def, suppliers = s4, s5, s6, s7
doc_id = 3, mfg_part = ghi, suppliers = s4
doc_id = 4, mfg_part = jkl, suppliers = s1, s2, s3, s4
doc_id = 5, mfg_part = mno, suppliers = s1, s2, s3, s5

customer A:  exclude suppliers: s4, s5, s6, s7
customer B:  exclude suppliers: s1, s2, s3, s4

when customer A searches, documents 2 & 3 should not be returned in any
result set
when customer B searches, documents 3 & 4 should not be returned in any
result set

One thought we have is to restructure our docs so that there is one doc per
supplier mfg part, instead of per mfg part, but the result would be an index
3000 times the size!!! Many of our mfg parts have 1000 or more suppliers.
Currently we shove the exclude lists into the filter query, but it's getting
to be quite long.

We have looked at Solr "join" in Solr 4, but since this is a production
site generating millions of dollars per week, we cannot afford to use alpha
or beta versions of software.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclusions-by-query-and-many-values-tp3999672.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: replication from lucene to solr

2012-08-07 Thread Lance Norskog
Look at how the older rsync-based snapshooter works: it uses the Unix
rsync program to very efficiently spot and copy updated files in the
master index. It runs from each query slave, just like Java
replication. Unlike Java replication, it just uses the SSH copy
protocol, and does not talk to the master indexing Solr program.

You can run the snapshooter against any directory with a Lucene index.
An actively updated index will work great.

The key to this replicator is that Lucene never saves inconsistent
data on disk: it writes new data and then updates the master list of
what is new data, then deletes the old data. You can copy a Lucene
index at any point in time and it will be consistent.
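That property is what makes a plain hard-link copy safe. A minimal sketch in Java follows (the directory layout and naming are up to you; for an actively written index you would pin a commit point first, e.g. with Lucene's SnapshotDeletionPolicy, so the files you link cannot be deleted mid-copy):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkSnapshot {
    /**
     * Snapshot an index directory by hard-linking its files: near-instant,
     * costs no extra disk space, and safe because Lucene never rewrites
     * committed files in place.
     */
    static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    // Hard link: both names point at the same on-disk data.
                    Files.createLink(snapshotDir.resolve(f.getFileName()), f);
                }
            }
        }
    }
}
```

Note that hard links only work when the snapshot directory is on the same filesystem as the index, which is also what the rsync-based snapshooter relies on.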

On Tue, Aug 7, 2012 at 9:25 AM, Robert Stewart  wrote:
> Hi,
>
> I have a client who uses Lucene in a home grown CMS system they
> developed in Java.  They have a lot of code that uses the Lucene API
> directly and they can't change it now.  But they also need to use SOLR
> for some other apps which must use the same Lucene index data.  So I
> need to make a good way to periodically replicate the Lucene index to
> SOLR.  I know how to make efficient Lucene index snapshots from within
> their CMS Java app (basically using the same method as the old
> replication scripts, using hard-links, etc.) - assuming I have a new
> index snapshot, how can I tell a running SOLR instance to start using
> the new index snapshot instead of its current index, and also how can
> I configure SOLR to use the latest "snapshot" directory on re-start?
> Assume I create new index snapshots into a directory such that each
> new snapshot is a folder in format MMHHMMDDSS (timestamp).  Is
> there any way to configure SOLR to look someplace for new index
> snapshots (some multi-core setup?).
>
> Thanks!



-- 
Lance Norskog
goks...@gmail.com


Solr Ping Request Handler Response problem

2012-08-07 Thread vempap
Hello,

  I have a problem with the Solr 4.0 Alpha ping request handler. If there are
many cores and I start all the Solr instances and they come up and run
successfully, creating an index right away fails with logs saying that one of
the instances is down. I really don't know why this happens, since starting
all the Solr instances succeeded. But if I allow a couple of seconds of wait
time, I am then able to create the index without any problem. When I did a
little debugging in the Solr code, the response from the PingRequestHandler
seems to be empty when index creation fails just after all the Solr instances
have started successfully.

I have no idea why this happens. Is it really necessary to wait a few seconds
before trying to create an index right after successfully starting all the
Solr instances?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Ping-Request-Handler-Response-problem-tp3999694.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is this too much time for full Data Import?

2012-08-07 Thread Mikhail Khludnev
Hello,

Does your indexer saturate CPU/IO? Check with iostat/vmstat.
If it doesn't, take several thread dumps with the jvisualvm sampler or jstack
and try to understand what blocks your threads from making progress.
You may need to speed up your SQL data consumption; to do this, you can
enable threads in DIH (only in 3.6.1) or move from N+1 SQL queries to a
select-all/cache approach:
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor and
https://issues.apache.org/jira/browse/SOLR-2382
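As a sketch of the cache approach (the entity, table, and column names here are made up for illustration; check the wiki page above for the exact attributes supported by your version):

```xml
<document>
  <!-- Parent entity: one SQL query for all items. In 3.6.x DIH you can also
       try threads="..." on the root entity to parallelize document building. -->
  <entity name="item" query="SELECT id, name FROM item">
    <!-- Child entity: fetched once and cached in memory, then joined locally,
         instead of issuing one SQL query per parent row. -->
    <entity name="price" processor="CachedSqlEntityProcessor"
            query="SELECT item_id, price FROM price"
            where="item_id=item.id"/>
  </entity>
</document>
```

This trades memory for round trips, so it pays off most when the child query is small enough to cache but was previously executed millions of times.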

Good luck

On Wed, Aug 8, 2012 at 9:16 AM, Pranav Prakash  wrote:

> Folks,
>
> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
> queries for each document. The database servers are different from Solr
> Servers. Each document has an update processor chain which (a) calculates
> signature of the document using SignatureUpdateProcessorFactory and (b)
> Finds out terms which have term frequency > 2; using a custom processor.
> The index size is ~ 480GiB
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics


 


Re: Recovery problem in solrcloud

2012-08-07 Thread Jam Luo
Aug 06, 2012 10:05:55 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java
heap space
at
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:456)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:284)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.FixedBitSet.<init>(FixedBitSet.java:54)
at
org.apache.lucene.search.MultiTermQueryWrapperFilter.getDocIdSet(MultiTermQueryWrapperFilter.java:104)
at
org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:129)
at
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:318)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:507)
at
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
at
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1394)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1269)
at
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:384)
at
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:420)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1544)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:442)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:263)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:499)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHa

Re: Does Solr support 'Value Search'?

2012-08-07 Thread Mikhail Khludnev
Hello,

Have you checked
http://lucidworks.lucidimagination.com/display/lweug/Wildcard+Queries ?
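As a rough sketch of how a client might assemble one wildcard query across the candidate fields (this only builds the query string; note that leading wildcards like *admin* can be slow unless you use something like ReversedWildcardFilterFactory):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WildcardQueryDemo {
    // OR together an infix wildcard clause for each field of interest.
    static String buildQuery(String term, String... fields) {
        StringBuilder q = new StringBuilder();
        for (String f : fields) {
            if (q.length() > 0) q.append(" OR ");
            q.append(f).append(":*").append(term).append("*");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        String q = buildQuery("admin", "author", "last_modifier", "doctitle");
        // URL-encode before appending as the q parameter of a /select request.
        System.out.println("/select?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8));
    }
}
```

This returns matching documents rather than the per-field value lists shown in the question; collecting distinct field values would additionally need faceting on those fields.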

On Wed, Aug 8, 2012 at 12:56 AM, Bing Hua  wrote:

> Hi folks,
>
> Just wondering if there is a query handler that simply takes a query string
> and search on all/part of fields for field values?
>
> e.g.
> q=*admin*
>
> Response may look like
> author: [admin, system_admin, sub_admin]
> last_modifier: [admin, system_admin, sub_admin]
> doctitle: [AdminGuide, AdminManual]
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Does-Solr-support-Value-Search-tp3999654.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics