RE: out of memory during indexing due to large incoming queue

2013-06-03 Thread Yoni Amir
Solrconfig.xml - http://apaste.info/dsbv

Schema.xml - http://apaste.info/67PI

This solrconfig.xml file has optimization enabled. I had another file which I 
can't locate at the moment, in which I defined a custom merge scheduler in 
order to disable optimization.

When I say 1000 segments, I mean that's the number I saw in the Solr UI. I assume 
there were many more files than that.

Thanks,
Yoni



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 22:53
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 12:25 PM, Yoni Amir wrote:
 Hi Shawn and Shreejay, thanks for the response.
 Here is some more information:
 1) The machine is a virtual machine on ESX server. It has 4 CPUs and 
 8GB of RAM. I don't remember what CPU but something modern enough. It 
 is running Java 7 without any special parameters, and 4GB allocated 
 for Java (-Xmx)
 2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
 This is the size after it was optimized.
 3) I plan to upgrade to 4.3 just didn't have time. 4.0 beta is what was 
 available at the time that we had a release deadline.
 4) The setup uses master-slave replication, not SolrCloud. The server that I 
 am discussing is the indexing server, and in these tests there were actually 
 no slaves involved, and virtually zero searches performed.
 5) Attached is my configuration. I tried to disable the warm-up and opening 
 of searchers, it didn't change anything. The commits are done by Solr, using 
 autocommit. The client sends the updates without a commit command.
 6) I want to disable optimization, but when I disabled it, the OOME occurred 
 even faster. The number of segments reached around a thousand within an hour 
 or so. I don't know if it's normal or not, but at that point if I restarted 
 Solr it immediately took about 1GB of heap space just on start-up, instead of 
 the usual 50MB or so.
 
 If I commit less frequently, don't I increase the risk of losing data, e.g., 
 if the power goes down, etc.?
 If I disable optimization, is it necessary to avoid such a large number of 
 segments? Is it possible?

Last part first: Losing data is much less of a risk with Solr 4.x, if you have 
enabled the updateLog.
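
For reference, the relevant solrconfig.xml pieces look roughly like the sketch below
(values are illustrative, not taken from your config): the updateLog is the transaction
log that makes replay after a crash possible, and a hard autoCommit with
openSearcher=false keeps that log bounded without paying the cost of opening a new
searcher on every commit.

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log: lets Solr replay uncommitted updates after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>

  <!-- hard commit periodically to flush segments and truncate the tlog,
       without opening a searcher each time -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>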

We'll need some more info.  See the end of the message for specifics.

Right off the bat, I can tell you that with an index that's 117GB, you're going 
to need a LOT of RAM.

Each of my 4.2.1 servers has 42GB of index and about 37 million documents 
between all the index shards.  The web application never uses facets, which 
tend to use a lot of memory.  My index is a lot smaller than yours, and I need 
a 6GB heap, seeing OOM errors if it's only 4GB.
You probably need at least an 8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance you 
will also need a large amount of memory for OS disk caching.  Unless the server 
is using SSD, you need to allocate at least 64GB of real memory to the virtual 
machine.  If you've got your index on SSD, 32GB might be enough.  I've got 64GB 
total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems

When you say that there are over 1000 segments, are you seeing 1000 files, or 
are there literally 1000 segments, giving you between 12000 and 15000 files?  
Even if your mergeFactor were higher than the default 10, that just shouldn't 
happen.

Can you share your solrconfig.xml and schema.xml?  Use a paste website like 
http://apaste.info and share the URLs.

Thanks,
Shawn





Solr + Groovy

2013-06-03 Thread Achim Domma
Hi,

I have some query building and result processing code which is currently 
running as a normal Solr client outside of Solr. I think it would make a lot of 
sense to move parts of this code into a custom SearchHandler or 
SearchComponent. Because I'm not a big fan of the Java language, I would like 
to use Groovy.

Searching the web I got the impression that Solr + alternative JVM languages 
is not a very common topic. So before starting my project, I would like to 
know: Is there a well known good reason not to use Groovy (or Clojure, Scala, 
...) for implementing custom Solr code?

kind regards,
Achim

how are you handling killer queries?

2013-06-03 Thread Bernd Fehling
How are you handling killer queries with solr?

While Solr/Lucene (currently 4.2.1) tries to do its best, I sometimes see stupid
queries in my logs, recognizable by their extremely long query times.

Example:
q=???+and+??+and+???+and++and+???+and+??

I even get hits for this (hits=34091309 status=0 QTime=88667).

But the jetty log says:
WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
 (broken pipe),trace=org.eclipse.jetty.io.EofException...
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 
35 more|,code=500}
WARN:oejs.ServletHandler:/solr/base/select
java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

Because I get hits and a QTime, the search was successful, right?

But Jetty/HTTP has already closed the connection and Solr doesn't know about
this?

How are you handling killer queries, just ignoring them?
Or is there something to tune (a Jetty timeout setting) or filter (query filtering)?

Would be pleased to hear your comments.

Bernd


Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Thanks for your answer.
Can you please elaborate on
"mssql text searching is pretty primitive compared to Solr"?
(A link or anything.)
Thanks.


On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com wrote:

 1 Maybe, maybe not. mssql text searching is pretty primitive
 compared to Solr, just as Solr's db-like operations are
 primitive compared to mssql. They address different use-cases.

 So, you can store the docs in Solr and not touch your SQL db
 at all to return the docs. You can store just the IDs in Solr and
 retrieve your docs from the SQL store. You can store just
 enough data in Solr to display the results page and when the user
 tries to drill down you can go to your SQL database for assembling
 the full document. You can. It all depends on your use case, data
size, all that rot.

Very often, something like the DB is considered the system-of-record
and it's indexed to Solr (See DIH or SolrJ) periodically.

   There is no underlying connection between your SQL store and Solr.
   You control when data is fetched from SQL and put into Solr. You
control what the search experience is. etc.

 2 Not really :(.  See:

 http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

 Best
 Erick

 On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote:
  Hi,
 
  I am just starting to learn about solr.
  I want to test it in my env working with ms sql server.
 
  I have followed the tutorial and imported some rows into Solr.
  Now I have a few noob questions regarding the benefits of implementing
  Solr in a SQL environment.
 
  1. As I understand it, when I send a query request over HTTP, I receive a
  result with IDs from the Solr system and then I query the full object row
  from the db.
  Is that right?
  Is there a comparison with MS SQL full-text search, which retrieves the
  full object in the same select?
  Is there a comparison that relates to db/server clusters and multiple
  machines?
  2. Is there a technique that will assist me in estimating the volume size I
  will need for the indexed data (obviously, based on the indexed data
  properties)?



Re: Removing a single value from a multiValue field

2013-06-03 Thread Dotan Cohen
On Thu, May 30, 2013 at 5:01 PM, Jack Krupansky j...@basetechnology.com wrote:
 You gave an XML example, so I assumed you were working with XML!


Right, I did give the output as XML. I find XML to be a great document
markup language, but a terrible command format! Mostly due to the
(mis-)use of attributes.


 In JSON...

 [{"id": "doc-id", "tags": {"add": ["a", "b"]}}]

 and

 [{"id": "doc-id", "tags": {"set": null}}]


Thank you! That is much more intuitive and less ambiguous than the
XML, would you not agree?
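
(For anyone who lands on this thread later: assuming a core named collection1
and the stock JSON update support, the add command above can be sent with
something like the following; the document id is of course a placeholder.)

curl -X POST -H 'Content-type: application/json' \
  'http://localhost:8983/solr/collection1/update?commit=true' \
  --data-binary '[{"id": "doc-id", "tags": {"add": ["a", "b"]}}]'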

 BTW, this kind of stuff is covered in the book, separate chapters for XML
 and JSON, each with dozens of examples like this.


I have not posted on the book postings, but I will definitely order
one. My vote is for spiral bound, though I know that the perfect-bound
will look more professional on a bookshelf. I don't even care what the
book costs, within reason. Any resource that compiles in a single
package the wonderful methods that yourself and other contributors
mention here and in other places online, will pay for itself in short
order. Apache Solr is an amazing product, but it is often obtuse and
unintuitive. Other times one does not even know what Solr is capable
of, as was the case in this thread, where I was parsing entire
documents to change a multiValued field value.

Thank you very much!

--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


/non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
Hi,

I am constantly getting this error in my solr log:

Can't find (or read) directory to add to classloader:
/non/existent/dir/yields/warning (resolved as:
E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

Anyone got any idea on how to solve this


-- 
Regards,
Raheel Hasan


Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello!

You should remove that entry from your solrconfig.xml file. It is
something like this:

  <lib dir="/non/existent/dir/yields/warning" />


-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 Hi,

 I am constantly getting this error in my solr log:

 Can't find (or read) directory to add to classloader:
 /non/existent/dir/yields/warning (resolved as:
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

 Anyone got any idea on how to solve this




Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
ok thanks :)

But why was it there anyway? I mean, the comment says:
"If a 'dir' option (with or without a regex) is used and nothing
is found that matches, a warning will be logged."

So it looks like a kind of exception handling or logging for libs not
found... so shouldn't this folder actually exist?





On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 You should remove that entry from your solrconfig.xml file. It is
 something like this:

   <lib dir="/non/existent/dir/yields/warning" />


 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I am constantly getting this error in my solr log:

  Can't find (or read) directory to add to classloader:
  /non/existent/dir/yields/warning (resolved as:
 
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

  Anyone got any idea on how to solve this





-- 
Regards,
Raheel Hasan


HostPort attribute of core tag in solr.xml

2013-06-03 Thread Prathik Puthran
Hi,

I am not very sure what the hostPort attribute in the core tag of solr.xml
means. Can someone please let me know?

Thanks,
Prathik


Constant score for more like this reference document

2013-06-03 Thread Achim Domma
I call the mlt handler using a query which searches for a certain document 
(?q=id:some_document_id). The reference document is included in the result and 
the score is also returned. I found out that the score is fixed, independent 
of the document. So for each document id I get the same score. The score varies 
between cores, but is fixed per core.

I'm aware of all the warnings about scores not being absolute values and that 
you cannot compare them. But I wonder why the value is fixed per core. Is it 
just a random value or is it possible to explain how it's calculated?

I'm just digging into the code to get a better understanding of the inner 
workings, but I'm not yet deep enough. Feel free to point me to the relevant 
code snippets!

kind regards,
Achim

Re: /non/existent/dir/yields/warning

2013-06-03 Thread Rafał Kuć
Hello!

That's a good question. I suppose it's there to show users how to set up
a custom path to libraries.

-- 
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

 ok thanks :)

 But why was it there anyway? I mean it says in comments:
 If a 'dir' option (with or without a regex) is used and nothing
 is found that matches, a warning will be logged.

 So it looks like a kind of exception handling or logging for libs not
 found... so shouldnt this folder actually exist?





 On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 You should remove that entry from your solrconfig.xml file. It is
 something like this:

   <lib dir="/non/existent/dir/yields/warning" />


 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  Hi,

  I am constantly getting this error in my solr log:

  Can't find (or read) directory to add to classloader:
  /non/existent/dir/yields/warning (resolved as:
 
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).

  Anyone got any idea on how to solve this







Re: How can a Tokenizer be CoreAware?

2013-06-03 Thread Michael Sokolov
Benson, I think the idea is that Tokenizers are created as needed (from 
the TokenizerFactory), while those other objects are singular (one 
created for each corresponding stanza in solrconfig.xml).  So Tokenizers 
should be short-lived; they'll be cleaned up after each use, and the 
assumption is you wouldn't need to do any cleanup yourself; rather just 
let the garbage collector do its work -- assuming these are per-document 
resources.  But if you have longer-lived resources, maybe you could 
manage them in the TokenizerFactory, which will be a singleton?  Or in 
UpdateRequestProcessorFactory, like you suggested.
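
A rough sketch of what I mean (the class name is a placeholder, and the
WhitespaceTokenizer is only there so the sketch compiles on its own; your real
factory would hand the shared cache to its own Tokenizer subclass):

import java.io.Reader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.Version;

public class CachingTokenizerFactory extends TokenizerFactory {

    // Long-lived state lives here: one factory instance exists per field type
    // in the schema, and it outlives every short-lived Tokenizer it hands out.
    private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

    @Override
    public Tokenizer create(Reader input) {
        // A real implementation would pass 'cache' into its own Tokenizer;
        // WhitespaceTokenizer just keeps this example self-contained.
        return new WhitespaceTokenizer(Version.LUCENE_40, input);
    }
}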


-Mike

On 5/29/13 7:36 AM, Benson Margulies wrote:

I am currently testing some things with Solr 4.0.0. I tried to make a
tokenizer CoreAware, and was rewarded with:

Caused by: org.apache.solr.common.SolrException: Invalid 'Aware'
object: com.basistech.rlp.solr.RLPTokenizerFactory@19336006 --
org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
[org.apache.solr.request.SolrRequestHandler]
[org.apache.solr.response.QueryResponseWriter]
[org.apache.solr.handler.component.SearchComponent]
[org.apache.solr.update.processor.UpdateRequestProcessorFactory]
[org.apache.solr.handler.component.ShardHandlerFactory]

I need this to allow cleanup of some cached items in the tokenizer.

Questions:

1: will a newer version allow me to do this directly?
2: is there some other approach that anyone would recommend? I could,
for example, make a fake object in the list above to act as a
singleton with a static accessor, but that seems pretty ugly.




Re: Solr + Groovy

2013-06-03 Thread Michael Sokolov

On 6/3/13 3:07 AM, Achim Domma wrote:

Hi,

I have some query building and result processing code, which is currently running as 
normal Solr client outside of Solr. I think it would make a lot of sense to 
move parts of this code into a custom SearchHandler or SearchComponent. Because I'm not a 
big fan of the Java language, I would like to use Groovy.

Searching the web I got the impression that Solr + alternative JVM languages 
is not a very common topic. So before starting my project, I would like to know: Is there 
a well known good reason not to use Groovy (or Clojure, Scala, ...) for implementing 
custom Solr code?

kind regards,
Achim

Check out Paul Nelson's work, presented at Lucene Revolution 2013:

http://www.lucenerevolution.org/sites/default/files/Advanced%20Query%20Parsing%20Techniques.pdf

He reported success using Groovy embedded in Solr to generate queries.
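
If it helps, the piece you would be writing is just a SearchComponent subclass;
the Java skeleton below is a sketch with made-up names (a Groovy class compiled
onto Solr's classpath would look essentially the same), registered via a
searchComponent entry in solrconfig.xml and added to a handler's last-components
list like any other component.

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class QueryRewriteComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // runs before the query executes -- the place for query building/rewriting
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // runs after the query executes -- the place for result post-processing
    }

    @Override
    public String getDescription() {
        return "example query building / result processing component";
    }

    @Override
    public String getSource() {
        return "";
    }
}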

-Mike


Re: Reindexing strategy

2013-06-03 Thread Dotan Cohen
On Fri, May 31, 2013 at 3:57 AM, Michael Sokolov
msoko...@safaribooksonline.com wrote:
 On UNIX platforms, take a look at vmstat for basic I/O measurement, and
 iostat for more detailed stats.  One coarse measurement is the number of
 blocked/waiting processes - usually this is due to I/O contention, and you
 will want to look at the paging and swapping numbers - you don't want any
 swapping at all.  But the best single number to look at is overall disk
 activity, which is the I/O percentage utilized number Shaun was mentioning.

 -Mike

Great, thanks! I've got some terms to google. For those who follow in
my footsteps, on Ubuntu the package 'sysstat' needs to be installed to
use iostat. Here are my reference stats before starting to experiment,
both for my own use later to compare and also if anybody sees anything
amiss here then I would love to know about it. If there is any fine
manual that is particularly urgent that I should read, please do
mention it. Thanks!
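
For those who want the exact commands (assuming Ubuntu/Debian), roughly:

sudo apt-get install sysstat   # provides iostat
vmstat 5                       # paging/swapping and blocked-process counts, every 5 seconds
iostat -x 5                    # extended per-device utilization, every 5 seconds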


--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com


SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Hi,
I'm seeing really slow query times: 7-25 seconds when I run a simple filter
query that uses my SpatialRecursivePrefixTreeFieldType field.

My index is about 30k documents. Prior to adding the spatial field, the
on-disk space was about 100MB, so it's a really tiny index. Once I add the
spatial field (which is multi-valued), the index size jumps up to 2GB. (Is
this normal?)

Only about 10k documents will have any spatial data. Typically, they will
have at most 10 shapes each, but the majority are all one of two
rectangles.

This is my fieldType definition.

   <fieldType name="date_availability"
              class="solr.SpatialRecursivePrefixTreeFieldType"
              geo="false"
              worldBounds="0 0 3650 1"
              distErrPct="0"
              maxDistErr="1"
              units="degrees" />

And the field

 <field name="availability_spatial" type="date_availability"
        indexed="true" stored="false" multiValued="true" />


I am using the field to represent approximately 10 years after January 1st
2013, where each day is along the X-axis. Because the availability starts
and ends at 2pm and 10am, I was using a decimal place when creating my
shape to show that detail. (Is this approach wrong?)

So a typical rectangle when indexed would be (minX minY maxX maxY)

Rectangle 100.6 0 120.4 1
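
For illustration, a document carrying two such availability windows would be
indexed with one rectangle per value, something like this (the id and the
second rectangle are made up):

<add>
  <doc>
    <field name="id">listing-123</field>
    <field name="availability_spatial">100.6 0 120.4 1</field>
    <field name="availability_spatial">182.6 0 199.4 1</field>
  </doc>
</add>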

Is it wrong that my Y and X values are not of the same scale? Since I don't
care about the Y axis at all, I just set it to be of 1 height always.

I'm running Solr 4.3, with a small JVM of 768M (can be increased). And I
have 2GB RAM. (Again can be increased).

Thanks


ContributorsGroup

2013-06-03 Thread Emrah Kara
Hi,
Could you please add EmrahKara to ContributorsGroup in solr wiki?

-- 
  CNT - http://www.cntbilisim.com.tr/
Emrah Kara
Developer at CNT

Email / Gtalk: em...@cntbilisim.com.tr   Skype: rockipsiz
TEL: +90 232 3481851   GSM: +90 533 3634362   FAX: +90 232 3481861
283/14 Sk No 4 Ender Apt. D:4 Mansuroglu Mah. Bayrakli IZMIR TURKEY
www.tamindir.com


Re: ContributorsGroup

2013-06-03 Thread Erick Erickson
Done, looking forward to your contributions!

Erick

On Mon, Jun 3, 2013 at 7:22 AM, Emrah Kara em...@cntbilisim.com.tr wrote:
 Hi,
 Could you please add EmrahKara to ContributorsGroup in solr wiki?

 --
   *[image: CNT logo] http://www.cntbilisim.com.tr/
 **Emrah Kara*
 Developer at CNT

 Email / Gtalk: em...@cntbilisim.com.tr   Skype: rockipsiz
 TEL: +90 232 3481851   GSM: +90 533 3634362   FAX: +90 232 3481861
 283/14 Sk No 4 Ender Apt. D:4 Mansuroglu Mah. Bayrakli IZMIR TURKEY
 www.tamindir.com


Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Chris Atkinson
Also, here is a sample query, and the debugQuery output

fq={!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)

In case the formatting is bad, here is a raw paste of the debugQuery:

http://pastie.org/pastes/872/text?key=ksjyboect4imrha0rck8sa


<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">8171</int>
    <lst name="params">
      <str name="debugQuery">true</str>
      <str name="indent">true</str>
      <str name="q">*:*</str>
      <str name="_">1370259235923</str>
      <str name="wt">xml</str>
      <str name="fq">{!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)</str>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="16137" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">*:*</str>
    <str name="querystring">*:*</str>
    <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
    <str name="parsedquery_toString">*:*</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
      <str>{!cost=200}*:* -availability_spatial:Intersects(182.6 0 199.4 1)</str>
    </arr>
    <arr name="parsed_filter_queries">
      <str>+MatchAllDocsQuery(*:*) -ConstantScore(org.apache.lucene.spatial.prefix.IntersectsPrefixTreeFilter@42ce603b)</str>
    </arr>
    <lst name="timing">
      <double name="time">8171.0</double>
      <lst name="prepare">
        <double name="time">1.0</double>
        <lst name="query"><double name="time">0.0</double></lst>
        <lst name="facet"><double name="time">0.0</double></lst>
        <lst name="mlt"><double name="time">1.0</double></lst>
        <lst name="highlight"><double name="time">0.0</double></lst>
        <lst name="stats"><double name="time">0.0</double></lst>
        <lst name="debug"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">8170.0</double>
        <lst name="query"><double name="time">8170.0</double></lst>
        <lst name="facet"><double name="time">0.0</double></lst>
        <lst name="mlt"><double name="time">0.0</double></lst>
        <lst name="highlight"><double name="time">0.0</double></lst>
        <lst name="stats"><double name="time">0.0</double></lst>
        <lst name="debug"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>



On Mon, Jun 3, 2013 at 12:27 PM, Chris Atkinson chrisa...@gmail.com wrote:

 Hi,
 I'm seeing really slow query times. 7-25 seconds when I run a simple
 filter query that uses my SpatialRecursivePrefixTreeFieldType field.

 My index is about 30k documents. Prior to adding the Spatial field, the on
 disk space was about 100Mb, so it's a really tiny index. Once I add the
 spatial field (which is multi-values), the index size jumps up to 2GB. (Is
 this normal?).

 Only about 10k documents will have any spatial data. Typically, they will
 have at most 10 shapes each, but the majority are all one of two
 rectangles.

 This is my fieldType definition.

    <fieldType name="date_availability"
               class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="false"
               worldBounds="0 0 3650 1"
               distErrPct="0"
               maxDistErr="1"
               units="degrees" />

 And the field

  <field name="availability_spatial" type="date_availability"
         indexed="true" stored="false" multiValued="true" />


 I am using the field to represent approximately 10 years after January 1st
 2013, where each day is along the X-axis. Because the availability starts
 and ends at 2pm and 10am, I was using a decimal place when creating my
 shape to show that detail. (Is this approach wrong?)

 So a typical rectangle when indexed would be (minX minY maxX maxY)

 Rectangle 100.6 0 120.4 1

 Is it wrong that my Y and X values are not of the same scale? Since I
 don't care about the Y axis at all, I just set it to be of 1 height always.

 I'm running Solr 4.3, with a small JVM of 768M (can be increased). And I
 have 2GB RAM. (Again can be increased).

 Thanks






Re: Estimating the required volume to

2013-06-03 Thread Erick Erickson
Here's a link to various transformations you can do
while indexing and searching in Solr:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Consider
 stemming
 ngrams
 WordDelimiterFilterFactory
 ASCIIFoldingFilterFactory
 phrase queries
 boosting
 synonyms
 blah blah blah

You can't do a lot of these transformations, at least not easily
in SQL. OTOH, you can't do 5-way joins in Solr. Different problems,
different tools.
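
As one concrete illustration, a schema.xml field type wiring several of those
together might look like the sketch below (purely an example, not a
recommendation for any particular data set); none of it has a direct
equivalent in a SQL full-text index:

<fieldType name="text_en_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" catenateWords="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>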

All that said, there's no good reason to use Solr if your use-case
is satisfied by simple keyword searches that have no transformations;
MySQL etc. work just fine in those cases. It's all about selecting the
right tool for the use-case.

FWIW,
Erick

On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail stammail...@gmail.com wrote:
 Thanks for your answer.
 Can you please elaborate on
 mssql text searching is pretty primitive compared to Solr
 (Link or anything)
 Thanks.


 On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.comwrote:

 1 Maybe, maybe not. mssql text searching is pretty primitive
 compared to Solr, just as Solr's db-like operations are
 primitive compared to mssql. They address different use-cases.

 So, you can store the docs in Solr and not touch your SQL db
 at all to return the docs. You can store just the IDs in Solr and
 retrieve your docs from the SQL store. You can store just
 enough data in Solr to display the results page and when the user
 tries to drill down you can go to your SQL database for assembling
 the full document. You can. It all depend on your use case, data
size, all that rot.

Very often, something like the DB is considered the system-of-record
and it's indexed to Solr (See DIH or SolrJ) periodically.

   There is no underlying connection between your SQL store and Solr.
   You control when data is fetched from SQL and put into Solr. You
control what the search experience is. etc.

 2 Not really :(.  See:

 http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

 Best
 Erick

 On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com wrote:
  Hi,
 
  I am just starting to learn about solr.
  I want to test it in my env working with ms sql server.
 
  I have followed the tutorial and imported some rows to the Solr.
  Now I have a few noob question regarding the benefits of implementing
 Solr
  on a sql environment.
 
  1. As I understand, When I send a query request over http, I receive a
  result with ID from the Solr system and than I query the full object row
  from the db.
  Is that right?
  Is there a comparison  next to ms sql full text search which retrieves
 the
  full object in the same select?
  Is there a comparison that relates to db/server cluster and multiple
  machines?
  2. Is there a technic that will assist me to estimate the volume size I
  will need for the indexed data (obviously, based on the indexed data
  properties) ?



Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
Hi,

but the path looks like it shows how to set up a "nonexistent lib" warning...
:D


On Mon, Jun 3, 2013 at 2:56 PM, Rafał Kuć r@solr.pl wrote:

 Hello!

 That's a good question. I suppose its there to show users how to setup
 a custom path to libraries.

 --
 Regards,
  Rafał Kuć
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

  ok thanks :)

  But why was it there anyway? I mean it says in comments:
  If a 'dir' option (with or without a regex) is used and nothing
  is found that matches, a warning will be logged.

  So it looks like a kind of exception handling or logging for libs not
  found... so shouldnt this folder actually exist?





  On Mon, Jun 3, 2013 at 2:06 PM, Rafał Kuć r@solr.pl wrote:

  Hello!
 
  You should remove that entry from your solrconfig.xml file. It is
  something like this:
 
    <lib dir="/non/existent/dir/yields/warning" />
 
 
  --
  Regards,
   Rafał Kuć
   Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch -
 ElasticSearch
 
   Hi,
 
   I am constantly getting this error in my solr log:
 
   Can't find (or read) directory to add to classloader:
   /non/existent/dir/yields/warning (resolved as:
  
 
 E:\Projects\apache_solr\solr-4.3.0\example\solr\genesis_experimental\non\existent\dir\yields\warning).
 
   Anyone got any idea on how to solve this
 
 
 





-- 
Regards,
Raheel Hasan


Re: FieldCache insanity with field used as facet and group

2013-06-03 Thread Elodie Sannier

I'm reproducing the problem with the 4.2.1 example with 2 shards.

1) started up solr shards, indexed the example data, and confirmed empty
fieldCaches
[sanniere@funlevel-dx example]$ java
-Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
[sanniere@funlevel-dx example2]$ java -Djetty.port=7574
-DzkHost=localhost:9983 -jar start.jar

2) used both grouping and faceting on the popularity field, then checked
the fieldcache insanity count
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/select?q=*:*&group=true&group.field=popularity"
> /dev/null
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=popularity"
> /dev/null
[sanniere@funlevel-dx example]$ curl -sS
"http://localhost:8983/solr/admin/mbeans?stats=true&key=fieldCache&wt=json&indent=true"
| grep "entries_count\|insanity_count"
"entries_count":10,
"insanity_count":2,

insanity#0:VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_g(4.2.1):C1)+popularity\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',class
org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#12129794\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n\t'SegmentCoreReader(owner=_g(4.2.1):C1)'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#12298774\n,
insanity#1:VALUEMISMATCH: Multiple distinct value objects for
SegmentCoreReader(owner=_f(4.2.1):C9)+popularity\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#16648315\n\t'SegmentCoreReader(owner=_f(4.2.1):C9)'='popularity',class
org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#1130715\n}}},
HIGHLIGHTING,{},
OTHER,{}]}

I've updated https://issues.apache.org/jira/browse/SOLR-4866

Elodie

Le 28.05.2013 10:22, Elodie Sannier a écrit :

I've created https://issues.apache.org/jira/browse/SOLR-4866

Elodie

Le 07.05.2013 18:19, Chris Hostetter a écrit :

: I am using the Lucene FieldCache with SolrCloud and I have insane instances
: with messages like:

FWIW: I'm the one that named the result of these sanity checks
FieldCacheInsanity and I have regretted it ever since -- a better label
would have been inconsistency.

: VALUEMISMATCH: Multiple distinct value objects for
: SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)+merchantid
: 'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'='merchantid',class
: 
org.apache.lucene.index.SortedDocValues,0.5=org.apache.lucene.search.FieldCacheImpl$SortedDocValuesImpl#557711353
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'='merchantid',int,null=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
: 
'SegmentCoreReader(​owner=_11i(​4.2.1):C4493997/853637)'='merchantid',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=org.apache.lucene.search.FieldCacheImpl$IntsFromArray#1105988713
:
: All insane instances are for a field merchantid of type int used as facet
: and group field.

Interesting: it appears that the grouping code and the facet code are not
being consistent in how they are building the field cache, so you are
getting two objects in the cache for each segment.

I haven't checked if this happens much with the example configs, but if
you could: please file a bug with the details of which Solr version you
are using along with the schema fieldType and field declarations for your
merchantid field, along with the mbean stats output showing the field
cache insanity after executing two queries like...

/select?q=*:*&facet=true&facet.field=merchantid
/select?q=*:*&group=true&group.field=merchantid

(that way we can rule out your custom SearchComponent as having a bug in
it)

: This insanity can have performance impact ?
: How can I fix it ?

the impact is just that more RAM is being used than is probably strictly
necessary.  Unless there is something unusual in your fieldType
declaration, I don't think there is an easy fix you can apply -- we need to
fix the underlying code.

-Hoss


--
Kelkoo

Elodie Sannier - Software engineer

E: elodie.sann...@kelkoo.fr
Y!Messenger: kelkooelodies
T: +33 (0)4 56 09 07 55
A: 4/6 Rue des Méridiens 38130 Echirolles





Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
Hi,

I am importing multiple tables (by a join) into Solr using DIH. All is set,
except for 1 confusion:
what to do with *uniqueKey* in the schema?

When I had only 1 table, it was fine. Now how do I put 2 uniqueKeys (both
from different tables)?

For example:

<uniqueKey>table1_id</uniqueKey>
<uniqueKey>table2_id</uniqueKey>

Will this work?

-- 
Regards,
Raheel Hasan


Re: Estimating the required volume to

2013-06-03 Thread Mysurf Mail
Hi,
Thanks for your answer.
I want to refer to your message, because I am trying to choose the right
tool.


1. Regarding stemming:
I am running in MS SQL

SELECT * FROM sys.dm_fts_parser ('FORMSOF(INFLECTIONAL,provide)', 1033, 0, 0)

and I receive

group_id  phrase_id  occurrence  special_term  display_term  expansion_type  source_term
1         0          1           Exact Match   provided      2               provide
1         0          1           Exact Match   provides      2               provide
1         0          1           Exact Match   providing     2               provide
1         0          1           Exact Match   provide       0               provide

Isn't that stemming?
2. Regarding synonyms:
SQL Server has a full thesaurus feature
(http://msdn.microsoft.com/en-us/library/ms142491.aspx).
Doesn't that mean synonyms?


On Mon, Jun 3, 2013 at 2:43 PM, Erick Erickson erickerick...@gmail.com wrote:

 Here's a link to various transformations you can do
 while indexing and searching in Solr:
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
 Consider
  stemming
  ngrams
  WordDelimiterFilterFactory
  ASCIIFoldingFilterFactory
  phrase queries
  boosting
  synonyms
  blah blah blah

 You can't do a lot of these transformations, at least not easily
 in SQL. OTOH, you can't do 5-way joins in Solr. Different problems,
 different tools

 All that said, there's no good reason to use Solr if your use-case
 is satisfied by simple keyword searches that have no transformations,
 mysql etc. work just fine in those cases. It's all about selecting the
 right tool for the use-case.

 FWIW,
 Erick

 On Mon, Jun 3, 2013 at 4:44 AM, Mysurf Mail stammail...@gmail.com wrote:
  Thanks for your answer.
  Can you please elaborate on
  mssql text searching is pretty primitive compared to Solr
  (Link or anything)
  Thanks.
 
 
  On Sun, Jun 2, 2013 at 4:54 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  1 Maybe, maybe not. mssql text searching is pretty primitive
  compared to Solr, just as Solr's db-like operations are
  primitive compared to mssql. They address different use-cases.
 
  So, you can store the docs in Solr and not touch your SQL db
  at all to return the docs. You can store just the IDs in Solr and
  retrieve your docs from the SQL store. You can store just
  enough data in Solr to display the results page and when the user
  tries to drill down you can go to your SQL database for assembling
  the full document. You can. It all depend on your use case, data
 size, all that rot.
 
 Very often, something like the DB is considered the system-of-record
 and it's indexed to Solr (See DIH or SolrJ) periodically.
 
There is no underlying connection between your SQL store and Solr.
You control when data is fetched from SQL and put into Solr. You
 control what the search experience is. etc.
 
  2 Not really :(.  See:
 
 
 http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
 
  Best
  Erick
 
  On Sat, Jun 1, 2013 at 1:07 PM, Mysurf Mail stammail...@gmail.com
 wrote:
   Hi,
  
   I am just starting to learn about solr.
   I want to test it in my env working with ms sql server.
  
   I have followed the tutorial and imported some rows to the Solr.
   Now I have a few noob question regarding the benefits of implementing
  Solr
   on a sql environment.
  
   1. As I understand, When I send a query request over http, I receive a
   result with ID from the Solr system and than I query the full object
 row
   from the db.
   Is that right?
   Is there a comparison  next to ms sql full text search which retrieves
  the
   full object in the same select?
   Is there a comparison that relates to db/server cluster and multiple
   machines?
   2. Is there a technic that will assist me to estimate the volume size
 I
   will need for the indexed data (obviously, based on the indexed data
   properties) ?
 



Re: how are you handling killer queries?

2013-06-03 Thread Shawn Heisey
On 6/3/2013 2:39 AM, Bernd Fehling wrote:
 How are you handling killer queries with solr?
 
 While solr/lucene (currently 4.2.1) is trying to do its best I see sometimes 
 stupid queries
 in my logs, located with extremly long query time.
 
 Example:
 q=???+and+??+and+???+and++and+???+and+??
 
 I even get hits for this (hits=34091309 status=0 QTime=88667).
 
 But the jetty log says:
 WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
  (broken pipe),trace=org.eclipse.jetty.io.EofException...
 org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 
 35 more|,code=500}
 WARN:oejs.ServletHandler:/solr/base/select
 java.lang.IllegalStateException: Committed
 at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)
 
 Because I get hits and qtime the search is successful, right?
 
 But jetty/http has already closed the connection and solr doesn't know about 
 this?
 
 How are you handling killer queries, just ignoring?
 Or something to tune (jetty config about timeout) or filter (query filtering)?

As you might know, EofException happens when one end (usually the
client) closes the TCP connection before the response is delivered.
This is usually caused by explicitly setting timeouts, or by using a
load balancer in front of Solr, because these will normally limit how
long the response can take.  The timeout involved is probably 60 seconds
in this case, and the query took nearly 90 seconds.

It doesn't cause any *direct* problems for Solr, though the nasty
exception that gets logged every time is annoying.  A query like that
does use a lot of resources, so if the server doesn't have a lot of
spare capacity, it can cause problems for everyone else.

Assuming that this isn't happening due to bugs in your application, the
only way to really handle this problem is to first locate the problem
user and educate them.  If the problem continues and it's a viable
option, you might need to ban that user from your system.

Thanks,
Shawn



Re: HostPort attribute of core tag in solr.xml

2013-06-03 Thread Shawn Heisey
On 6/3/2013 3:16 AM, Prathik Puthran wrote:
 I am not very sure what the hostPort attribute in core tag of solr.xml
 mean. Can someone please let me know?

This only has meaning if you are using SolrCloud.  This is how each Solr
server in the cloud informs the cloud what port it is using.

http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params
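
In the legacy solr.xml layout it is an attribute on the <cores> element; a
sketch (not your actual file) looks like this, where hostPort is the port this
node advertises to ZooKeeper:

<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}" hostContext="solr">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>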

Thanks,
Shawn



Re: /non/existent/dir/yields/warning

2013-06-03 Thread Shawn Heisey
On 6/3/2013 5:58 AM, Raheel Hasan wrote:
 but the path looks like it shows how to setup non existent lib warning...
 :D

The reason for its existence is encoded in its name.  A nonexistent path
results in a warning.  It's a way to illustrate to a novice what happens
when you have a non-fatal misconfiguration.  The message is a warning
and doesn't prevent Solr startup.

Thanks,
Shawn



Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Eric Wilson
I would like to have the min-match set differently for different fields in
my dismax handler. Is this possible?


Re: how are you handling killer queries?

2013-06-03 Thread Bernd Fehling
Hi Shawn,
Well, the user base is the world and the servers have enough capacity,
so it's nothing really to worry about.
OK, I could raise the timeout from the standard 60 to 90, 120 or even 180 seconds.
I just wanted to know how other Solr developers handle this.

The technical question: what is the difference between hitting
the stop button in the browser while a search is running and
the timeout of the HTTP connection in my container (in my case Jetty)?

I guess the stop button from the browser will inform all parts involved
whereas the timeout just leaves an open end somewhere in the container (broken 
pipe)?

And the container has no way to simulate a browser stop button in case of a 
timeout
to get a sane termination?

Bernd


Am 03.06.2013 16:20, schrieb Shawn Heisey:
 On 6/3/2013 2:39 AM, Bernd Fehling wrote:
 How are you handling killer queries with solr?

 While solr/lucene (currently 4.2.1) is trying to do its best I see sometimes 
 stupid queries
 in my logs, located with extremly long query time.

 Example:
 q=???+and+??+and+???+and++and+???+and+??

 I even get hits for this (hits=34091309 status=0 QTime=88667).

 But the jetty log says:
 WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
  (broken pipe),trace=org.eclipse.jetty.io.EofException...
 org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?...
  35 more|,code=500}
 WARN:oejs.ServletHandler:/solr/base/select
 java.lang.IllegalStateException: Committed
 at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

 Because I get hits and qtime the search is successful, right?

 But jetty/http has already closed the connection and solr doesn't know about 
 this?

 How are you handling killer queries, just ignoring?
 Or something to tune (jetty config about timeout) or filter (query 
 filtering)?
 
 As you might know, EofException happens when one end (usually the
 client) closes the TCP connection before the response is delivered.
 This is usually caused by explicitly setting timeouts, or by using a
 load balancer in front of Solr, because these will normally limit how
 long the response can take.  The timeout involved is probably 60 seconds
 in this case, and the query took nearly 90 seconds.
 
 It doesn't cause any *direct* problems for Solr, though the nasty
 exception that gets logged every time is annoying.  A query like that
 does use a lot of resources, so if the server doesn't have a lot of
 spare capacity, it can cause problems for everyone else.
 
 Assuming that this isn't happening due to bugs in your application, the
 only way to really handle this problem is to first locate the problem
 user and educate them.  If the problem continues and it's a viable
 option, you might need to ban that user from your system.
 
 Thanks,
 Shawn
 


Re: Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
Hi,

Thanks for the replies. Actually, I had only a small confusion:

From table_1 I got key_1; using this I join into table_2. But table_2 also
gave another key, key_2, which is needed for joining with table_3.

So for table_1 and table_2 it's obviously just fine... but what will happen
when table_3 is also added? Will the 3 tables stay intact in terms of their
relationships?

Thanks.



On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky j...@basetechnology.com wrote:

 If the respective table IDs are not globally unique, then you (the
 developer) will have to supplement the raw ID with a prefix or suffix or
 other form of global ID (e.g., UUID) to assure that they are unique. You
 could just add the SQL table name as a prefix or suffix.

 The bottom line: What do you WANT the Solr key field to look like? I mean,
 YOU are the data architect, right? What requirements do you have? When your
 Solr application users receive the key values in the responses to queries,
 what expectations do you expect to set for them?

 -- Jack Krupansky

 -Original Message- From: Raheel Hasan
 Sent: Monday, June 03, 2013 9:12 AM
 To: solr-user@lucene.apache.org
 Subject: Multitable import - uniqueKey


 Hi,

 I am importing multiple table (by join) into solr using DIH. All is set,
 except for 1 confusion:
 what to do with *uniqueKey* in schema?


 When I had only 1 table, I had it fine. Now how to put 2 uniqueKeys (both
 from different table).

 For example:

 <uniqueKey>table1_id</uniqueKey>
 <uniqueKey>table2_id</uniqueKey>

 Will this work?

 --
 Regards,
 Raheel Hasan




-- 
Regards,
Raheel Hasan


Re: /non/existent/dir/yields/warning

2013-06-03 Thread Raheel Hasan
OK, fantastic... now I will comment it out to be sure. Thanks a lot.

Regards,
Raheel


On Mon, Jun 3, 2013 at 7:27 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/3/2013 5:58 AM, Raheel Hasan wrote:
  but the path looks like it shows how to setup non existent lib warning...
  :D

 The reason for its existence is encoded in its name.  A nonexistent path
 results in a warning.  It's a way to illustrate to a novice what happens
 when you have a non-fatal misconfiguration.  The message is a warning
 and doesn't prevent Solr startup.

 Thanks,
 Shawn




-- 
Regards,
Raheel Hasan


Re: how are you handling killer queries?

2013-06-03 Thread Shawn Heisey
On 6/3/2013 8:43 AM, Bernd Fehling wrote:
 Hi Shawn,
 well, the user is the world and the servers have enough capacity.
 So its nothing really to worry about.
 OK, could raise timeout from standard 60 to 90, 120 or even 180 seconds.
 Just wanted to know how other solr developer handle this.
 
 The technical question, where is the difference between hitting
 the stop button from the browser while a search is running and
 the timeout of http connection in my container (in my case jetty)?
 
 I guess the stop button from the browser will inform all parts involved
 whereas the timeout just leaves an open end somewhere in the container 
 (broken pipe)?
 
 And the container has no way to simulate a browser stop button in case of a 
 timeout
 to get a sane termination?

The result is probably the same, no matter how the connection gets
closed.  I've seen it mostly from my load balancer, and most often with
the layer 7 check that uses my ping handler.  It has a timeout of 5
seconds, and occasionally (usually due to garbage collection pauses) the
query will take longer than 5 seconds.  The load balancer closes the
connection with a TCP reset, which is a perfectly valid (and very fast)
way to close a TCP connection.  The exception isn't coming from unclean
closes, it's coming from ANY close.

I think that Solr shouldn't log a full stacktrace when this happens, but
I'm not sure whether Solr has any control over it, because the exception
comes from Jetty.

Thanks,
Shawn



Re: Multitable import - uniqueKey

2013-06-03 Thread Jack Krupansky
Same answer. Whether it is 2, 3, 10 or 1000 tables, you, the data architect 
must decide how to uniquely identify Solr documents. In general, when 
joining n tables, combine the n keys into one composite key. Either do it on 
the SQL query side, or with a Solr update request processor.
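
On the SQL side that can be as simple as concatenating the keys in the DIH
entity query; the table and column names below are invented, and CONCAT can be
swapped for whatever string concatenation your database prefers:

SELECT CONCAT(t1.table1_id, '-', t2.table2_id, '-', t3.table3_id) AS id,
       t1.col_a, t2.col_b, t3.col_c
FROM table1 t1
JOIN table2 t2 ON t2.table1_id = t1.table1_id
JOIN table3 t3 ON t3.table2_id = t2.table2_id

The schema then keeps a single <uniqueKey>id</uniqueKey>.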


-- Jack Krupansky

-Original Message- 
From: Raheel Hasan

Sent: Monday, June 03, 2013 10:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Multitable import - uniqueKey

Hi,

Thanks for the replies. Actually, I had only a small confusion:


From table_1 I got key_1; using this I join into table_2. But table_2 also

gave another key key_2 which is needed for joining with table_3.

So for Table1 and Table2 its obviously just fine... but what will happen
when table3 is also added? will the 3 tables be intact in terms of
relationship?

Thanks.



On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky 
j...@basetechnology.comwrote:



If the respective table IDs are not globally unique, then you (the
developer) will have to supplement the raw ID with a prefix or suffix or
other form of global ID (e.g., UUID) to assure that they are unique. You
could just add the SQL table name as a prefix or suffix.

The bottom line: What do you WANT the Solr key field to look like? I mean,
YOU are the data architect, right? What requirements do you have? When 
your

Solr application users receive the key values in the responses to queries,
what expectations do you expect to set for them?

-- Jack Krupansky

-Original Message- From: Raheel Hasan
Sent: Monday, June 03, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Multitable import - uniqueKey


Hi,

I am importing multiple table (by join) into solr using DIH. All is set,
except for 1 confusion:
what to do with *uniqueKey* in schema?


When I had only 1 table, I had it fine. Now how to put 2 uniqueKeys (both
from different table).

For example:

<uniqueKey>table1_id</uniqueKey>
<uniqueKey>table2_id</uniqueKey>

Will this work?

--
Regards,
Raheel Hasan





--
Regards,
Raheel Hasan 



RE: Spell Checker (DirectSolrSpellChecker) correct settings

2013-06-03 Thread Dyer, James
My first guess is that no documents match the query "provincial court".
Because you have spellcheck.maxCollationTries set to a non-zero value, it
will not return these as collations unless the correction would return hits.
You can test my theory by removing spellcheck.maxCollationTries from the
request and seeing if it returns "provincial court" as expected.
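
You can run that test without editing solrconfig.xml by overriding the
parameter on the request itself, for example (the handler path and the other
spellcheck parameters are assumed to come from your existing defaults):

/select?q=Provincial+Courtt&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=0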

If this isn't it, then give us the full query request and also the full 
spellcheck response for your failing case.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Raheel Hasan [mailto:raheelhasan@gmail.com] 
Sent: Friday, May 31, 2013 9:38 AM
To: solr-user@lucene.apache.org
Subject: Spell Checker (DirectSolrSpellChecker) correct settings

Hi guys, I am new to Solr. Here is the thing I have:

When I search "Courtt", I get a correct suggestion saying:



"spellcheck": {
  "suggestions": [
    "courtt",
    {
      "numFound": 1,
      "startOffset": 0,
      "endOffset": 6,
      "suggestion": [
        "court"
      ]
    },
    "collation",
    [
      "collationQuery",
      "court",
      "hits",
      53,
      "misspellingsAndCorrections",
      [
        "courtt",
        "court"
      ]
    ]
  ]
},



But when I try "Provincial Courtt", it gives me no suggestions; instead it
searches for "Provincial" only.


Here are the spell check settings in *solrconfig.xml*:
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_en_splitting</str>

  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">text</str>

    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- Require terms to occur in 1% of documents in order to be included in the dictionary -->
    <float name="thresholdTokenFrequency">.01</float>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <!--<str name="distanceMeasure">internal</str>-->
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">1</int>
    <!-- the minimum number of characters the terms should share -->
    <int name="minPrefix">3</int>
    <!-- maximum number of possible matches to review before returning results -->
    <int name="maxInspections">3</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
  </lst>

  <!-- a spellchecker that can break or combine words.  See the /spell handler below for usage -->
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">text</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>



Here is the *requestHandler*:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="df">text</str>

    <!-- Spell checking defaults -->
    <str name="spellcheck">on</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.extendedResults">false</str>

    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">3</str>
    <str name="spellcheck.maxCollationTries">3</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>

  <!-- append spellchecking to our list of components -->
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

</requestHandler>



-- 
Regards,
Raheel Hasan



Re: how are you handling killer queries?

2013-06-03 Thread Jack Krupansky

There are two radically distinct use cases:

1. Consumers on the open Internet. They do stupid things. Give them a very 
constrained search experience, enforced with query preprocessing. Maybe give 
them only dismax queries.
2. Professional power users. They typically have credentials for using the 
application, so if they are detected as performing long or stupid queries, 
log the details and administratively take action, such as denying them 
access (or billing them for excessive resource usage.)


-- Jack Krupansky

-Original Message- 
From: Bernd Fehling

Sent: Monday, June 03, 2013 4:39 AM
To: solr-user@lucene.apache.org
Subject: how are you handling killer queries?

How are you handling killer queries with solr?

While solr/lucene (currently 4.2.1) is trying to do its best I see sometimes 
stupid queries

in my logs, located with extremly long query time.

Example:
q=???+and+??+and+???+and++and+???+and+??

I even get hits for this (hits=34091309 status=0 QTime=88667).

But the jetty log says:
WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
(broken pipe),trace=org.eclipse.jetty.io.EofException...
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?... 
35 more|,code=500}

WARN:oejs.ServletHandler:/solr/base/select
java.lang.IllegalStateException: Committed
   at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

Because I get hits and qtime the search is successful, right?

But jetty/http has already closed the connection and solr doesn't know about 
this?


How are you handling killer queries, just ignoring?
Or something to tune (jetty config about timeout) or filter (query 
filtering)?


Would be pleased to hear your comments.

Bernd 



Re: Multitable import - uniqueKey

2013-06-03 Thread Raheel Hasan
OK. But do we need it? That's what I am confused about. Should 1 key from
table_1 pull all the related data, as it was inserted?


On Mon, Jun 3, 2013 at 7:53 PM, Jack Krupansky j...@basetechnology.com wrote:

 Same answer. Whether it is 2, 3, 10 or 1000 tables, you, the data
 architect must decide how to uniquely identify Solr documents. In general,
 when joining n tables, combine the n keys into one composite key. Either do
 it on the SQL query side, or with a Solr update request processor.


 -- Jack Krupansky

 -Original Message- From: Raheel Hasan
 Sent: Monday, June 03, 2013 10:44 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Multitable import - uniqueKey


 Hi,

 Thanks for the replies. Actually, I had only a small confusion:

 From table_1 I got key_1; using this I join into table_2. But table_2 also
 gave another key key_2 which is needed for joining with table_3.

 So for Table1 and Table2 its obviously just fine... but what will happen
 when table3 is also added? will the 3 tables be intact in terms of
 relationship?

 Thanks.



 On Mon, Jun 3, 2013 at 7:33 PM, Jack Krupansky j...@basetechnology.com**
 wrote:

  If the respective table IDs are not globally unique, then you (the
 developer) will have to supplement the raw ID with a prefix or suffix or
 other form of global ID (e.g., UUID) to assure that they are unique. You
 could just add the SQL table name as a prefix or suffix.

 The bottom line: What do you WANT the Solr key field to look like? I mean,
 YOU are the data architect, right? What requirements do you have? When
 your
 Solr application users receive the key values in the responses to queries,
 what expectations do you expect to set for them?

 -- Jack Krupansky

 -Original Message- From: Raheel Hasan
 Sent: Monday, June 03, 2013 9:12 AM
 To: solr-user@lucene.apache.org
 Subject: Multitable import - uniqueKey


 Hi,

 I am importing multiple table (by join) into solr using DIH. All is set,
 except for 1 confusion:
 what to do with *uniqueKey* in schema?


 When I had only 1 table, I had it fine. Now how to put 2 uniqueKeys (both
 from different table).

 For example:

  <uniqueKey>table1_id</uniqueKey>
  <uniqueKey>table2_id</uniqueKey>


 Will this work?

 --
 Regards,
 Raheel Hasan




 --
 Regards,
 Raheel Hasan




-- 
Regards,
Raheel Hasan


Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jack Krupansky

No, but you can with the LucidWorks Search query parser:

f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2

See:
http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries

-- Jack Krupansky

-Original Message- 
From: Eric Wilson 
Sent: Monday, June 03, 2013 10:30 AM 
To: solr-user@lucene.apache.org 
Subject: Can mm (min-match) be specified by field in dismax or edismax? 


I would like to have the min-match set differently for different fields in
my dismax handler. Is this possible?


Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jason Hellman
Well, there is a hack(ish) way to do it:

_query_:"{!type=edismax qf='someField' v='$q' mm=100%}"

This is clearly not a solrconfig.xml settings, but part of your query string 
using LocalParam behavior.

This is going to get really messy if you have plenty of fields you'd like to 
search, where you'd need a similar construct for each.  I cannot attest to 
performance at scale with such a construct…but just showing a way you can go 
about this if you feel compelled enough to do so.
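For illustration only, a full request using that construct against two
hypothetical fields (title and body), passing the user's terms once through a
dereferenced parameter, might look roughly like this (spaces and quotes would
need URL-encoding in a real request):

q=_query_:"{!type=edismax qf='title' v=$userq mm='100%'}" OR _query_:"{!type=edismax qf='body' v=$userq mm='2'}"&userq=cat dog fox bat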

Jason

On Jun 3, 2013, at 8:08 AM, Jack Krupansky j...@basetechnology.com wrote:

 No, but you can with the LucidWorks Search query parser:
 
 f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2
 
 See:
 http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries
 
 -- Jack Krupansky
 
 -Original Message- From: Eric Wilson Sent: Monday, June 03, 2013 
 10:30 AM To: solr-user@lucene.apache.org Subject: Can mm (min-match) be 
 specified by field in dismax or edismax? 
 I would like to have the min-match set differently for different fields in
 my dismax handler. Is this possible?



Re: updating docs in solr cloud hangs

2013-06-03 Thread Yago Riveiro
Hi,

My cluster hangs again when running an update process; the HTTP POST request was 
aborted because of a timeout error. After the hang, I couldn't do any more updates 
without restarting the cluster.

I could see this error in the node's log after killing it. It is as if Solr waits for 
the update response forever … and no more operations can be handled until this 
one finishes.

[qtp301150411-1248] ERROR org.apache.solr.core.SolrCore  – 
org.apache.solr.common.SolrException: interrupted waiting for shard update 
response
at 
org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:429)
at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:99)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:447)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1140)
at 
org.apache.solr.update.processor.LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:179)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:937)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:998)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:856)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.solr.update.SolrCmdDistributor.checkResponses(SolrCmdDistributor.java:408)
... 35 more

--  
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, June 3, 2013 at 2:18 AM, Erick Erickson wrote:

 Did you take a stack trace of your _server_ and see if the
 fragment I posted is the place a bunch of threads are
 stuck? If so, then it's what I mentioned, and the patch
 I pointed to should fix it up (when it's ready)...
  
 The fact that it hangs more frequently with replication  1
 is consistent with the JIRA.
  
 Shawn:
  
 Thanks, you beat me to the punch for clarifying replication!
  
 Best
 Erick
  
 On Sun, Jun 2, 2013 at 12:41 PM, Yago Riveiro yago.rive...@gmail.com 
 (mailto:yago.rive...@gmail.com) wrote:
  Shawn:
   
  replicationFactor higher than one yes.
   
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
   
   
  On Sunday, June 2, 2013 at 4:07 PM, Shawn Heisey wrote:
   
   On 6/2/2013 8:28 AM, Yago Riveiro wrote:
Erick:
 
In my case, when server hangs, no exception is thrown, the logs on both 
servers stop registering the update INFO messages. if a shutdown one 
node, immediately the log of the alive node register some update INFO 
messages that appears was stuck 

RE: Spell Checker (DirectSolrSpellChecker) correct settings

2013-06-03 Thread Dyer, James
For each of the 4 cases listed below, can you give your query request string 
(q=...fq=...qt=...etc) and also the spellchecker output?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Raheel Hasan [mailto:raheelhasan@gmail.com] 
Sent: Monday, June 03, 2013 10:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Spell Checker (DirectSolrSpellChecker) correct settings

Hi, thanks a lot for the reply.

Actually, Provincial Courtt is mentioned in many documents (sorry about
the typo earlier).

Secondly, I tried your idea, but not much of help. The issue is very
microscopic:

1) When I search for Provinciaal Courtt = it only suggests `<str name="courtt">court</str>`
and not Provincial
2) Search for Provincial Courtt = returns result for 'Provincial' keyword
and no suggestion for 'court'.
3) Search for Provinciaal Court = no suggestion; instead searches for
court and returns result.
4) Search for Provinciall Courtt = correct suggestions..






On Mon, Jun 3, 2013 at 7:55 PM, Dyer, James james.d...@ingramcontent.com wrote:

 My first guess is that no documents match the query provinical court.
  Because you have spellcheck.maxCollationTries set to a non-zero value,
 it will not return these as collations unless the correction will return
 hits.  You can test my theory out by removing
 spellcheck.maxCollationTries from the request and see if it returns
 provinical court as expected.
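 One way to test that without editing solrconfig.xml is to override the value at
 request time, something like the request below (host, core, and handler are
 placeholders; a value of 0 disables collation testing):

 http://localhost:8983/solr/select?q=Provincial%20Courtt&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=0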

 If this isn't it, then give us the full query request and also the full
 spellcheck response for your failing case.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Raheel Hasan [mailto:raheelhasan@gmail.com]
 Sent: Friday, May 31, 2013 9:38 AM
 To: solr-user@lucene.apache.org
 Subject: Spell Checker (DirectSolrSpellChecker) correct settings

 Hi guyz, I am new to solr. Here is the thing I have:

 When I search Courtt, I get a correct suggestion saying:

 

  "spellcheck": {
    "suggestions": [
      "courtt",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "suggestion": [
          "court"
        ]
      },
      "collation",
      [
        "collationQuery",
        "court",
        "hits",
        53,
        "misspellingsAndCorrections",
        [
          "courtt",
          "court"
        ]
      ]
    ]
  },

 

 But when I try Provincial Courtt, it gives me no suggestions, instead it
 searches for Provincial only.


 Here is the spell check settings in *solrconfig.xml*:
 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">

   <str name="queryAnalyzerFieldType">text_en_splitting</str>

   <!-- a spellchecker built from a field of the main index -->
   <lst name="spellchecker">
     <str name="name">default</str>
     <str name="classname">solr.DirectSolrSpellChecker</str>
     <str name="field">text</str>

     <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
     <float name="accuracy">0.5</float>
     <!-- Require terms to occur in 1% of documents in order to be included in the dictionary -->
     <float name="thresholdTokenFrequency">.01</float>
     <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
     <!-- <str name="distanceMeasure">internal</str> -->
     <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
     <int name="maxEdits">1</int>
     <!-- the minimum number of characters the terms should share -->
     <int name="minPrefix">3</int>
     <!-- maximum number of possible matches to review before returning results -->
     <int name="maxInspections">3</int>
     <!-- minimum length of a query term to be considered for correction -->
     <int name="minQueryLength">4</int>
     <!-- maximum threshold of documents a query term can appear in to be considered for correction -->
     <float name="maxQueryFrequency">0.01</float>
   </lst>

   <!-- a spellchecker that can break or combine words.  See /spell handler below for usage -->
   <lst name="spellchecker">
     <str name="name">wordbreak</str>
     <str name="classname">solr.WordBreakSolrSpellChecker</str>
     <str name="field">text</str>
     <str name="combineWords">true</str>
     <str name="breakWords">true</str>
     <int name="maxChanges">5</int>
   </lst>
 </searchComponent>

 

 Here is the *requestHandler*:

 <requestHandler name="/select" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">20</int>
     <str name="df">text</str>

     <!-- Spell checking defaults -->
     <str name="spellcheck">on</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.maxResultsForSuggest">5</str>
     <str name="spellcheck.alternativeTermCount">2</str>
     <str name="spellcheck.extendedResults">false</str>

     <str name="spellcheck.collate">true</str>
     <str

Re: how are you handling killer queries?

2013-06-03 Thread Roman Chyla
I think you should take a look at the TimeLimitingCollector (it is used
also inside SolrIndexSearcher).
My understanding is that it will stop your server from consuming
unnecessary resources.

--roman


On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling 
bernd.fehl...@uni-bielefeld.de wrote:

 How are you handling killer queries with solr?

 While solr/lucene (currently 4.2.1) is trying to do its best I see
 sometimes stupid queries
 in my logs, located with extremly long query time.

 Example:
 q=???+and+??+and+???+and++and+???+and+??

 I even get hits for this (hits=34091309 status=0 QTime=88667).

 But the jetty log says:
 WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
  (broken pipe),trace=org.eclipse.jetty.io.EofException...
 org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?...
 35 more|,code=500}
 WARN:oejs.ServletHandler:/solr/base/select
 java.lang.IllegalStateException: Committed
 at
 org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

 Because I get hits and qtime the search is successful, right?

 But jetty/http has already closed the connection and solr doesn't know
 about this?

 How are you handling killer queries, just ignoring?
 Or something to tune (jetty config about timeout) or filter (query
 filtering)?

 Would be pleased to hear your comments.

 Bernd



Solr: separating index and storage

2013-06-03 Thread Sourajit Basak
Consider the following use case.

Certain words are extracted from a document and indexed. The exact sentence
containing the word cannot be stored alongside the extracted word because
of the volume at which the documents grow. How can the index and, let's call
them, the doc servers be separated?

An option is to store the sentences in MongoDB or a RDBMS. But there seems
to be a schema level design issue. Assuming 'word' to be a multivalued
field, how do we associate to it a reference to the corresponding entry in
the doc server.

May create (word_1, ref_1) tuples. Is there any other in-built feature ?

Any related project which separates index and doc servers?

Thanks,
Sourajit


Re: Solr query performance tool

2013-06-03 Thread bbarani
You can use this tool to analyze the logs..

https://github.com/dfdeshom/solr-loganalyzer

We use solrmeter to test the performance / Stress testing.

https://code.google.com/p/solrmeter/

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-performance-tool-tp4066900p4067869.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how are you handling killer queries?

2013-06-03 Thread Jack Krupansky

There is the timeAllowed parameter:

http://wiki.apache.org/solr/CommonQueryParameters#timeAllowed
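For example, capping a request at roughly two seconds (the value is in
milliseconds; the URL is only an illustration):

http://localhost:8983/solr/base/select?q=foo+AND+bar&timeAllowed=2000

When the limit is hit, Solr should return whatever it has found so far and flag
partialResults in the response header instead of running the search to
completion.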

-- Jack Krupansky

-Original Message- 
From: Roman Chyla

Sent: Monday, June 03, 2013 11:53 AM
To: solr-user@lucene.apache.org
Subject: Re: how are you handling killer queries?

I think you should take a look at the TimeLimitingCollector (it is used
also inside SolrIndexSearcher).
My understanding is that it will stop your server from consuming
unnecessary resources.

--roman


On Mon, Jun 3, 2013 at 4:39 AM, Bernd Fehling 
bernd.fehl...@uni-bielefeld.de wrote:


How are you handling killer queries with solr?

While solr/lucene (currently 4.2.1) is trying to do its best I see
sometimes stupid queries
in my logs, located with extremly long query time.

Example:
q=???+and+??+and+???+and++and+???+and+??

I even get hits for this (hits=34091309 status=0 QTime=88667).

But the jetty log says:
WARN:oejs.Response:Committed before 500 {msg=Datenübergabe unterbrochen
 (broken pipe),trace=org.eclipse.jetty.io.EofException...
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:838)|?...
35 more|,code=500}
WARN:oejs.ServletHandler:/solr/base/select
java.lang.IllegalStateException: Committed
at
org.eclipse.jetty.server.Response.resetBuffer(Response.java:1136)

Because I get hits and qtime the search is successful, right?

But jetty/http has already closed the connection and solr doesn't know
about this?

How are you handling killer queries, just ignoring?
Or something to tune (jetty config about timeout) or filter (query
filtering)?

Would be pleased to hear your comments.

Bernd





Saravanan Chinnadurai/Actionimages is out of the office.

2013-06-03 Thread Saravanan . Chinnadurai
I will be out of the office starting  03/06/2013 and will not return until
04/06/2013.

Please email to itsta...@actionimages.com  for any urgent issues.




Solr 4.2.1 higher memory footprint vs Solr 3.5

2013-06-03 Thread SandeepM
Hi,

I am using the same schema for both Solr 3.5 and Solr 4.2.1 and posting the same
data to both servers, and the memory requirements seem to have gone up
sharply during request handling.
. Requests come in at around 200 QPS.
. Document sizes are very large, but that did not seem to be a problem with
3.5 (lots of multivalued fields with large array lengths).
Could you help me understand what change in Solr 4.2.1 would account for
this higher memory requirement?

Also, in a different test with no load, I ran a single query to just get a list of all
unique IDs. It completes in 500 ms, but the time it takes to ship the data back to
the client seems to be very large. Any idea what could be causing this behavior?

Would appreciate any help.

Regards,
-- Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-2-1-higher-memory-footprint-vs-Solr-3-5-tp4067879.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Can mm (min-match) be specified by field in dismax or edismax?

2013-06-03 Thread Jack Krupansky
Also, just to be clear, MM/minMatch is not an option for a field but for 
a full BooleanQuery. I mean, you can't have two different MM values within 
the same BooleanQuery, except with nested BooleanQuerys, where each BQ has 
its own MM.


-- Jack Krupansky

-Original Message- 
From: Jason Hellman

Sent: Monday, June 03, 2013 11:40 AM
To: solr-user@lucene.apache.org
Subject: Re: Can mm (min-match) be specified by field in dismax or edismax?

Well, there is a hack(ish) way to do it:

_query_:"{!type=edismax qf='someField' v='$q' mm=100%}"

This is clearly not a solrconfig.xml settings, but part of your query string 
using LocalParam behavior.


This is going to get really messy if you have plenty of fields you'd like to 
search, where you'd need a similar construct for each.  I cannot attest to 
performance at scale with such a construct…but just showing a way you can go 
about this if you feel compelled enough to do so.


Jason

On Jun 3, 2013, at 8:08 AM, Jack Krupansky j...@basetechnology.com wrote:


No, but you can with the LucidWorks Search query parser:

f1:(cat dog fox bat fish cow)~50% f2:(cat dog fox bat fish zebra)~2

See:
http://docs.lucidworks.com/display/lweug/Minimum+Match+for+Simple+Queries

-- Jack Krupansky

-Original Message- From: Eric Wilson Sent: Monday, June 03, 2013 
10:30 AM To: solr-user@lucene.apache.org Subject: Can mm (min-match) be 
specified by field in dismax or edismax?

I would like to have the min-match set differently for different fields in
my dismax handler. Is this possible?




Re: Disable all caches in solr

2013-06-03 Thread bbarani
You can also check out this link.

http://lucene.472066.n3.nabble.com/Is-there-a-way-to-remove-caches-in-SOLR-td4061216.html#a4061219





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Disable-all-caches-in-solr-tp4066517p4067870.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr + Groovy

2013-06-03 Thread Achim Domma
Looks interesting, but it's just for the UpdateHandler. Right? Does a similar 
handler for searching already exist?

Achim

On 03.06.2013 at 17:22, Jack Krupansky wrote:

 Check out the support for external scripting of update request processors:
 
 http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
 
 Are there any of your requirements that that doesn't address?
 
 -- Jack Krupansky
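 For reference, the usual way to wire that factory in is an update processor
 chain in solrconfig.xml; the chain and script names below are placeholders.
 Because it is JSR-223 based, the script can be JavaScript, or Groovy if a
 Groovy engine is on the classpath (the file extension selects the engine), and
 it implements hooks such as processAdd(cmd).

 <updateRequestProcessorChain name="script">
   <processor class="solr.StatelessScriptUpdateProcessorFactory">
     <str name="script">update-script.js</str>
   </processor>
   <processor class="solr.RunUpdateProcessorFactory"/>
 </updateRequestProcessorChain>

 The chain is then selected per request with update.chain=script, or set as a
 default on the /update handler.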
 
 -Original Message- From: Achim Domma
 Sent: Monday, June 03, 2013 3:07 AM
 To: solr-user@lucene.apache.org
 Subject: Solr + Groovy
 
 Hi,
 
 I have some query building and result processing code, which is currently 
 running as normal Solr client outside of Solr. I think it would make a lot 
 of sense to move parts of this code into a custom SearchHandler or 
 SearchComponent. Because I'm not a big fan of the Java language, I would like 
 to use Groovy.
 
 Searching the web I got the impression that Solr + alternative JVM 
 languages is not a very common topic. So before starting my project, I would 
 like to know: Is there a well known good reason not to use Groovy (or 
 Clojure, Scala, ...) for implementing custom Solr code?
 
 kind regards,
  Achim



Re: Solr + Groovy

2013-06-03 Thread Jack Krupansky
Sorry about that. Unfortunately, scripting is only on the update side. But I 
imagine that a lot of the logic could be repurposed for the query side.


-- Jack Krupansky

-Original Message- 
From: Achim Domma

Sent: Monday, June 03, 2013 2:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr + Groovy

Looks interesting, but it's just for the UpdateHandler. Right? Does a 
similar handler for searching already exist?


Achim

On 03.06.2013 at 17:22, Jack Krupansky wrote:


Check out the support for external scripting of update request processors:

http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html

Are there any of your requirements that that doesn't address?

-- Jack Krupansky

-Original Message- From: Achim Domma
Sent: Monday, June 03, 2013 3:07 AM
To: solr-user@lucene.apache.org
Subject: Solr + Groovy

Hi,

I have some query building and result processing code, which is currently 
running as normal Solr client outside of Solr. I think it would make a 
lot of sense to move parts of this code into a custom SearchHandler or 
SearchComponent. Because I'm not a big fan of the Java language, I would 
like to use Groovy.


Searching the web I got the impression that Solr + alternative JVM 
languages is not a very common topic. So before starting my project, I 
would like to know: Is there a well known good reason not to use Groovy 
(or Clojure, Scala, ...) for implementing custom Solr code?


kind regards,
Achim




Re: Solr + Groovy

2013-06-03 Thread Erik Hatcher
Yeah, it's currently just for the update side of things.  But this issue is 
open https://issues.apache.org/jira/browse/SOLR-3669 and assigned to me, for 
one of these days.  I set it on my 5.0 radar.  Certainly, if anyone wants to 
make this happen sooner than I maybe/possibly/hopefully will one of these weeks, 
delve in and go for it!

Erik

p.s. [infomercial] We do have update-side scripting (JavaScript) and business 
rules (via Drools) capabilities in our LucidWorks Search platform* 
http://www.lucidworks.com/products/lucidworks-search with the update-side 
scripting running in the connector framework by design rather than on the Solr 
side of things to allow it to scale in a separate tier.

On Jun 3, 2013, at 14:31 , Achim Domma wrote:

 Looks interesting, but it's just for the UpdateHandler. Right? Does a similar 
 handler for searching already exist?
 
 Achim
 
 On 03.06.2013 at 17:22, Jack Krupansky wrote:
 
 Check out the support for external scripting of update request processors:
 
 http://lucene.apache.org/solr/4_3_0/solr-core/org/apache/solr/update/processor/StatelessScriptUpdateProcessorFactory.html
 
 Are there any of your requirements that that doesn't address?
 
 -- Jack Krupansky
 
 -Original Message- From: Achim Domma
 Sent: Monday, June 03, 2013 3:07 AM
 To: solr-user@lucene.apache.org
 Subject: Solr + Groovy
 
 Hi,
 
 I have some query building and result processing code, which is currently 
 running as normal Solr client outside of Solr. I think it would make a lot 
 of sense to move parts of this code into a custom SearchHandler or 
 SearchComponent. Because I'm not a big fan of the Java language, I would 
 like to use Groovy.
 
 Searching the web I got the impression that Solr + alternative JVM 
 languages is not a very common topic. So before starting my project, I 
 would like to know: Is there a well known good reason not to use Groovy (or 
 Clojure, Scala, ...) for implementing custom Solr code?
 
 kind regards,
  Achim
 



Re: Dynamic Indexing using DB and DIH

2013-06-03 Thread Shawn Heisey

On 6/3/2013 12:35 PM, PeriS wrote:

I noticed the delta-import is creating a new indexed entry on top of the 
existing one..is that normal?


Not sure what you are asking here, so I'll give an answer to the 
question I think you're asking:  If you have a uniqueKey defined in your 
schema, then new documents with matching values in the uniqueKey field 
will replace the existing documents.  Solr will delete the old one 
before inserting the new one.


Thanks,
Shawn



Re: Dynamic Indexing using DB and DIH

2013-06-03 Thread PeriS
Shawn,

You got the point; I do have the unique key defined, but for some reason, 
when I run the delta-import, a new entry is created for the same record with a 
new unique key. It's almost as if it doesn't detect the existing record.

On Jun 3, 2013, at 3:51 PM, Shawn Heisey s...@elyograg.org wrote:

 On 6/3/2013 12:35 PM, PeriS wrote:
 I noticed the delta-import is creating a new indexed entry on top of the 
 existing one..is that normal?
 
 Not sure what you are asking here, so I'll give an answer to the question I 
 think you're asking:  If you have a uniqueKey defined in your schema, then 
 new documents with matching values in the uniqueKey field will replace the 
 existing documents.  Solr will delete the old one before inserting the new 
 one.
 
 Thanks,
 Shawn
 
 
 
 
 





Re: Custom Response Handler

2013-06-03 Thread vibhoreng04
Hi Erik,

In my case I have to calculate a custom value, depending on the retrieved
candidates, for each document, so my choice will be a DocTransformer.
Let's say I need to include a Java class which does the computation;
how do I tie that in with the DocTransformer?

Solr wiki (http://wiki.apache.org/solr/DocTransformers) talks about the
Custom Transformers but does not include an example.

Please help.

Regards,
Vibhor Jaiswal



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Response-Handler-tp4067558p4067923.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom Response Handler

2013-06-03 Thread bbarani
You can refer to this post on using DocTransformers:

http://java.dzone.com/news/solr-40-doctransformers-first
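For the custom-computation part of the question, a rough sketch of a transformer
factory is below, assuming the Solr 4.x DocTransformer API; the class names, the
price field, and the calculator helper are hypothetical stand-ins for your own code.

import java.io.IOException;

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class CustomValueTransformerFactory extends TransformerFactory {

  @Override
  public DocTransformer create(final String name, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() {
        return name;
      }

      @Override
      public void transform(SolrDocument doc, int docid) throws IOException {
        // Run the custom computation per retrieved document and attach the
        // result to the response as an extra pseudo-field.
        Object price = doc.getFieldValue("price");  // hypothetical stored field
        doc.setField(name, MyCustomCalculator.compute(price));
      }
    };
  }

  // Hypothetical stand-in for your own computation class.
  static final class MyCustomCalculator {
    static double compute(Object price) {
      return (price instanceof Number) ? ((Number) price).doubleValue() * 1.2 : 0.0;
    }
  }
}

It would be registered in solrconfig.xml with something like
<transformer name="custom" class="com.example.CustomValueTransformerFactory"/>
and requested with fl=*,[custom].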





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-Response-Handler-tp4067558p4067926.html
Sent from the Solr - User mailing list archive at Nabble.com.


Inconsistent Full import document index counts.

2013-06-03 Thread chris . donaher

Hello All,
 
I've been working on a 2-shard SolrCloud instance with several million 
documents, and the import process has recently begun to miss documents as they 
are added to the underlying Postgres database. There are no glaring failures in 
the log files (all SEVERE and WARNING level errors in the log are from 
malformed queries). To ensure that it is not an issue with my delta-import 
query, I've tried running full imports to no avail. Strangely, when I modify my 
data-import query to only search for a specific id that is missed in the 
full-import, all of the relevant documents are indexed. Any ideas for possible 
causes of missed document imports in long-running full-imports?
 
Thanks,
Chris Donaher
 
 

RE: Solr query performance tool

2013-06-03 Thread Greg Harris

You have to be careful looking at the QTime's. They do not include garbage 
collection. I've run into issues where QTime is short (cause it was), it just 
happened that the query came in during a long garbage collection where 
everything was paused. So you can get into situations where once the 15 second 
GC is done everything performs as expected! I'd make sure and have an external 
querying tool and you can monitor GC times as well via JMX.
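If it helps, cumulative GC counts and pause totals are easy to pull from the
GarbageCollectorMXBeans, either in-process or over a remote JMX connection;
a minimal sketch:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
  public static void main(String[] args) {
    // Values are cumulative since JVM start; sample periodically and diff
    // them to spot long pauses that line up with slow requests.
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.printf("%s: %d collections, %d ms total%n",
          gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
    }
  }
}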



From: bbarani [bbar...@gmail.com]
Sent: Monday, June 03, 2013 8:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr query performance tool

You can use this tool to analyze the logs..

https://github.com/dfdeshom/solr-loganalyzer

We use solrmeter to test the performance / Stress testing.

https://code.google.com/p/solrmeter/





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-query-performance-tool-tp4066900p4067869.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr query performance tool

2013-06-03 Thread Shawn Heisey

On 6/3/2013 3:33 PM, Greg Harris wrote:


You have to be careful looking at the QTime's. They do not include garbage 
collection. I've run into issues where QTime is short (cause it was), it just 
happened that the query came in during a long garbage collection where 
everything was paused. So you can get into situations where once the 15 second 
GC is done everything performs as expected! I'd make sure and have an external 
querying tool and you can monitor GC times as well via JMX.


The QTime value in the response is calculated using 
System.currentTimeMillis(), so it should include the GC time, unless the 
GC happens to hit just after the QTime is calculated but before the 
final response with all the results is sent.  If you are requesting a 
lot of documents or you have very large documents where most/all of the 
fields are stored, having long GCs hit during that particular moment 
might actually be a common occurrence.


Thanks,
Shawn



SolrCloud Load Balancer weight

2013-06-03 Thread Tim Vaillancourt
Hey guys,

I have recently looked into an issue with my Solrcloud related to very high
load when performing a full-import on DIH.

While some work could be done to improve my queries, etc. in DIH, this led
me to a new feature idea in Solr: weighted internal load balancing.

Basically, I can think of two use cases where a weight on load
balancing could help:

1) My situation from above - I'm doing a huge import and want SolrCloud to
direct fewer queries to the node handling the DIH full-import, say weight
10/100 (10%) instead of 100/100.
2) Mixed hardware - Although I wouldn't recommend doing this, some people
may have mixed hardware, some capable of handling more or less traffic.

These weights wouldn't be expected to be exact, just a best-effort way to
generally influence load on nodes inside the cluster. They would of course
only matter on reads (/get, /select, etc.).

A full blown approach would have weight awareness in the Zookeeper-aware
client implementation, and on inter-node replica requests.

Should I JIRA this? Thoughts?

Tim


Re: SpatialRecursivePrefixTreeFieldType Spatial Searching

2013-06-03 Thread Smiley, David W.
Hi Chris:

Have you read: http://wiki.apache.org/solr/SpatialForTimeDurations
You're modeling your data sub-optimally.  Full precision rectangles
(distErrPct=0) doesn't scale well and you're seeing that.  You should
represent your durations as a point and it will take up a fraction of the
space (see above).  Furthermore, because your detail gets into one digit
to the right of the decimal, your maxDistErr should definitely be smaller
than 1 -- use something like 0.5 (given you have two levels of precision
below a full day) but to be safer (more certain it's not a problem) use
0.3 -- a little less.  Please report back how that goes.

~ David
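As a first cut along those lines, the poster's field type with only the
maxDistErr change applied might look like the sketch below (worldBounds kept
as-is; the bigger win is the duration-as-point modeling described on the wiki
page above, so treat this as a starting point rather than a tuned config):

<fieldType name="date_availability"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false"
           worldBounds="0 0 3650 1"
           maxDistErr="0.3"
           units="degrees"/>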

On 6/3/13 7:27 AM, Chris Atkinson chrisa...@gmail.com wrote:

Hi,
I'm seeing really slow query times: 7-25 seconds when I run a simple
filter
query that uses my SpatialRecursivePrefixTreeFieldType field.

My index is about 30k documents. Prior to adding the Spatial field, the on
disk space was about 100Mb, so it's a really tiny index. Once I add the
spatial field (which is multi-valued), the index size jumps up to 2GB. (Is
this normal?).

Only about 10k documents will have any spatial data. Typically, they will
have at most 10 shapes each, but the majority are all one of two
rectangles.

This is my fieldType definition.

   <fieldType name="date_availability"
       class="solr.SpatialRecursivePrefixTreeFieldType"
       geo="false"
       worldBounds="0 0 3650 1"
       distErrPct="0"
       maxDistErr="1"
       units="degrees"
   />

And the field

 <field name="availability_spatial" type="date_availability"
  indexed="true" stored="false" multiValued="true" />


I am using the field to represent approximately 10 years after January 1st
2013, where each day is along the X-axis. Because the availability starts
and ends at 2pm and 10am, I was using a decimal place when creating my
shape to show that detail. (Is this approach wrong?)

So a typical rectangle when indexed would be (minX minY maxX maxY)

Rectangle 100.6 0 120.4 1

Is it wrong that my Y and X values are not of the same scale? Since I
don't
care about the Y axis at all, I just set it to be of 1 height always.

I'm running Solr 4.3, with a small JVM of 768M (can be increased). And I
have 2GB RAM. (Again can be increased).

Thanks



Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread John Guerrero
SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1 SMP),
java 6u27 64 bit
6 nodes, 2 shards, 3 replicas each.  Names changed to r1s2 (replica1 - shard
2), r2s2, and r3s2 for each replica in shard 2.

What we see:

* Under production load, we restart a leader (r1s2), and observe in the
cloud admin
that the old leader is in state Down and no new leader is ever elected.
* The system will stay like this until we stop the old leader (or cause a ZK
timeout...see below).

*Please note:* the leader is killed, then kill -9'd 5 seconds later, before
restarting.  We have since changed this.

Digging into the logs on the old leader (r1s2 = replica1-shard 2):

* The old leader restarted at 5:23:29 PM, but appears to be stuck in
SolrDispatchFilter.init() -- (See recovery at bottom).
* It doesn't want to become leader, possibly due to the unclean shutdown.
May 28, 2013 5:24:42 PM org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=browse url=http://r1s2:8080/solr  Our versions are too
old. ourHighThreshold=1436325665147191297
otherLowThreshold=1436325775374548992
* It then tries to recover, but cannot, because there is no leader.
May 28, 2013 5:24:43 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover.
core=browse:org.apache.solr.common.SolrException: No registered leader was
found, collection:browse slice:shard2
* Meanwhile, it appears that blocking in init(), prevents the http-8080
handler from starting (See recovery at bottom).

Digging into the other replicas (r2s2):

* For some reason, the old leader (r1s2) remains in the list of replicas
that r2s2 attempts to sync to.
May 28, 2013 5:23:42 PM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=browse url=http://r2s2:8080/solr START
replicas=[http://r1s2:8080/solr/browse/, http://r3s2:8080/solr/browse/]
nUpdates=100
* This apparently fails (30 second timeout), possibly due to http-8080
handler not being started on r1s2.
May 28, 2013 5:24:12 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=browse url=http://r2s2:8080/solr  exception talking
to http://r1s2:8080/solr/browse/, failed
org.apache.solr.client.solrj.SolrServerException: Timeout occured while
waiting response from server at: http://r1s2:8080/solr/browse

*At this point, the cluster will remain indefinitely without a leader, if
nothing else changes.*

But in this particular instance, we took some stack and heap dumps from
r1s2, which paused java
long enough to cause a *zookeeper timeout on the old leader (r1s2)*:
May 28, 2013 5:33:26 PM org.apache.zookeeper.ClientCnxn$SendThread run
INFO: Client session timed out, have not heard from server in 38226ms for
sessionid 0x23d28e0f584005d, closing socket connection and attempting
reconnect

Then, one of the replicas (r3s2) finally stopped trying to sync to r1s2 and
succeeded in becoming leader:
May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=browse url=http://r3s2:8080/solr START
replicas=[http://r2s2:8080/solr/browse/] nUpdates=100
May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=browse url=http://r3s2:8080/solr  Received 100 versions
from r2s2:8080/solr/browse/
May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions
INFO: PeerSync: core=browse url=http://r3s2:8080/solr  Our versions are
newer. ourLowThreshold=1436325775374548992 otherHigh=1436325775805513730
May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=browse url=http://r3s2:8080/solr DONE. sync succeeded

Now that we have a leader, r1s2 can succeed in recovery and finish
SolrDispatchFilter.init(),
apparently allowing the http-8080 handler to start (r1s2).
May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy replay
INFO: No replay needed. core=browse
May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Replication Recovery was successful - registering as Active.
core=browse
May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=browse state=active
May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish
INFO: numShards not found on descriptor - reading it from system property
May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Finished recovery process. core=browse
May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=browse recoveringAfterStartup=false
May 28, 2013 5:34:49 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState
INFO: Updating cloud state from ZooKeeper...
May 28, 2013 5:34:49 PM org.apache.solr.servlet.SolrDispatchFilter init
INFO: user.dir=/
May 28, 2013 5:34:49 PM org.apache.solr.servlet.SolrDispatchFilter init
*INFO: SolrDispatchFilter.init() done*
May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish
INFO: publishing core=browse state=recovering
May 28, 2013 5:34:49 PM 

Re: SolrCloud Load Balancer weight

2013-06-03 Thread Mark Miller

On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt t...@elementspace.com wrote:

 Should I JIRA this? Thoughts?

Yeah - it's always been in the back of my mind - it's come up a few times - 
eventually we would like nodes to report some stats to zk to influence load 
balancing. 

- mark

How to Get Cluster State By Solrj?

2013-06-03 Thread Furkan KAMACI
I want to get cluster state of my SolrCloud by Solrj (I know that admin
page shows it but I want to customize it at my application).

Firstly wiki says that:

CloudSolrServer server = new CloudSolrServer("localhost:9983");

why CloudSolrServer takes only one Zookeeper host:port as an argument? I
have a quorum of Zookeeper and some of them maybe down even quorum works?

Secondly how can I get the current state of clusters properly?


Re: How to Get Cluster State By Solrj?

2013-06-03 Thread Mark Miller
It actually accepts a comma-separated list of zk host addresses (your quorum). 
Same format as zk describes in its docs.

To get the cluster state, get the ZkStateReader from the CloudSolrServer and 
then it's getClusterState or something.

- Mark
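A minimal sketch of that (SolrJ 4.x API assumed; the ZooKeeper hosts and the
collection name are placeholders):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class ClusterStateDump {
  public static void main(String[] args) throws Exception {
    // Pass the whole quorum, comma separated, so losing one ZK node is fine.
    CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
    server.connect();  // force the ZK connection so cluster state is available now

    ZkStateReader reader = server.getZkStateReader();
    ClusterState state = reader.getClusterState();

    System.out.println("Live nodes: " + state.getLiveNodes());
    for (Slice slice : state.getSlices("collection1")) {
      System.out.println("shard " + slice.getName() + " -> "
          + slice.getReplicasMap().keySet());
    }

    server.shutdown();
  }
}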

On Jun 3, 2013, at 5:30 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I want to get cluster state of my SolrCloud by Solrj (I know that admin
 page shows it but I want to customize it at my application).
 
 Firstly wiki says that:
 
  CloudSolrServer server = new CloudSolrServer("localhost:9983");
 
 why CloudSolrServer takes only one Zookeeper host:port as an argument? I
 have a quorum of Zookeeper and some of them maybe down even quorum works?
 
 Secondly how can I get the current state of clusters properly?



Re: Leader election deadlock after restarting leader in 4.2.1

2013-06-03 Thread Mark Miller
Thanks - I can try and look into this perhaps next week. You might copy the 
details into a JIRA issue to prevent it from getting lost though...

- Mark

On Jun 3, 2013, at 4:46 PM, John Guerrero jguerr...@tagged.com wrote:

 SOLR 4.2.1, tomcat 6.0.35, CentOS 6.2 (2.6.32-220.4.1.el6.x86_64 #1 SMP),
 java 6u27 64 bit
 6 nodes, 2 shards, 3 replicas each.  Names changed to r1s2 (replica1 - shard
 2), r2s2, and r3s2 for each replica in shard 2.
 
 What we see:
 
 * Under production load, we restart a leader (r1s2), and observe in the
 cloud admin
 that the old leader is in state Down and no new leader is ever elected.
 * The system will stay like this until we stop the old leader (or cause a ZK
 timeout...see below).
 
 *Please note:* the leader is killed, then kill -9'd 5 seconds later, before
 restarting.  We have since changed this.
 
 Digging into the logs on the old leader (r1s2 = replica1-shard 2):
 
 * The old leader restarted at 5:23:29 PM, but appears to be stuck in
 SolrDispatchFilter.init() -- (See recovery at bottom).
 * It doesn't want to become leader, possibly due to the unclean shutdown.
 May 28, 2013 5:24:42 PM org.apache.solr.update.PeerSync handleVersions
 INFO: PeerSync: core=browse url=http://r1s2:8080/solr  Our versions are too
 old. ourHighThreshold=1436325665147191297
 otherLowThreshold=1436325775374548992
 * It then tries to recover, but cannot, because there is no leader.
 May 28, 2013 5:24:43 PM org.apache.solr.common.SolrException log
 SEVERE: Error while trying to recover.
 core=browse:org.apache.solr.common.SolrException: No registered leader was
 found, collection:browse slice:shard2
 * Meanwhile, it appears that blocking in init(), prevents the http-8080
 handler from starting (See recovery at bottom).
 
 Digging into the other replicas (r2s2):
 
 * For some reason, the old leader (r1s2) remains in the list of replicas
 that r2s2 attempts to sync to.
 May 28, 2013 5:23:42 PM org.apache.solr.update.PeerSync sync
 INFO: PeerSync: core=browse url=http://r2s2:8080/solr START
 replicas=[http://r1s2:8080/solr/browse/, http://r3s2:8080/solr/browse/]
 nUpdates=100
 * This apparently fails (30 second timeout), possibly due to http-8080
 handler not being started on r1s2.
 May 28, 2013 5:24:12 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=browse url=http://r2s2:8080/solr  exception talking
 to http://r1s2:8080/solr/browse/, failed
 org.apache.solr.client.solrj.SolrServerException: Timeout occured while
 waiting response from server at: http://r1s2:8080/solr/browse
 
 *At this point, the cluster will remain indefinitely without a leader, if
 nothing else changes.*
 
 But in this particular instance, we took some stack and heap dumps from
 r1s2, which paused java
 long enough to cause a *zookeeper timeout on the old leader (r1s2)*:
 May 28, 2013 5:33:26 PM org.apache.zookeeper.ClientCnxn$SendThread run
 INFO: Client session timed out, have not heard from server in 38226ms for
 sessionid 0x23d28e0f584005d, closing socket connection and attempting
 reconnect
 
 Then, one of the replicas (r3s2) finally stopped trying to sync to r1s2 and
 succeeded in becoming leader:
 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync
 INFO: PeerSync: core=browse url=http://r3s2:8080/solr START
 replicas=[http://r2s2:8080/solr/browse/] nUpdates=100
 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions
 INFO: PeerSync: core=browse url=http://r3s2:8080/solr  Received 100 versions
 from r2s2:8080/solr/browse/
 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync handleVersions
 INFO: PeerSync: core=browse url=http://r3s2:8080/solr  Our versions are
 newer. ourLowThreshold=1436325775374548992 otherHigh=1436325775805513730
 May 28, 2013 5:33:34 PM org.apache.solr.update.PeerSync sync
 INFO: PeerSync: core=browse url=http://r3s2:8080/solr DONE. sync succeeded
 
 Now that we have a leader, r1s2 can succeed in recovery and finish
 SolrDispatchFilter.init(),
 apparently allowing the http-8080 handler to start (r1s2).
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy replay
 INFO: No replay needed. core=browse
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
 INFO: Replication Recovery was successful - registering as Active.
 core=browse
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish
 INFO: publishing core=browse state=active
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.ZkController publish
 INFO: numShards not found on descriptor - reading it from system property
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
 INFO: Finished recovery process. core=browse
 May 28, 2013 5:34:49 PM org.apache.solr.cloud.RecoveryStrategy run
 INFO: Starting recovery process.  core=browse recoveringAfterStartup=false
 May 28, 2013 5:34:49 PM org.apache.solr.common.cloud.ZkStateReader
 updateClusterState
 INFO: Updating cloud state from ZooKeeper...
 May 28, 2013 5:34:49 PM