Re: OutOfMemoryError

2013-03-27 Thread Arkadi Colson

I upgraded java to version 7 and everything seems to be stable now!

BR,
Arkadi

On 03/25/2013 09:54 PM, Shawn Heisey wrote:

On 3/25/2013 1:34 AM, Arkadi Colson wrote:

I changed my system memory to 12GB. Solr now gets -Xms2048m -Xmx8192m as
parameters. I also added -XX:+UseG1GC to the java process. But now the
whole machine crashes! Any idea why?

Mar 22 20:30:01 solr01-gs kernel: [716098.077809] java invoked
oom-killer: gfp_mask=0x201da, order=0, oom_adj=0


Linux (the out of memory killer, or oom-killer) is deciding to kill 
the java process because the entire machine is out of memory.  
Normally it kills off the process using the most memory. This will 
only happen when all RAM is fully allocated to programs as well as all 
available swap space.  At this point, this is not a direct problem 
with Solr.  It *could* be a problem with Java itself, but that is not 
very likely.


Because Java is set to use only 8GB out of the 12GB you have on the 
machine, this suggests that you have at least one other 
memory-intensive application on the same server.  Are you using the 
same hardware to run a website and/or database?  Solr works best on 
dedicated hardware.


Thanks,
Shawn
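
To check from the logs which process triggered the oom-killer, here is a minimal sketch. It assumes syslog lines shaped like the one quoted above (joined onto a single line); the function name is made up for illustration. Note that the process that "invoked" the oom-killer is the one whose allocation triggered it, which is not necessarily the one that gets killed.

```python
import re

# Matches kernel lines such as:
#   "... kernel: [716098.077809] java invoked oom-killer: gfp_mask=0x201da, ..."
OOM_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>[\d.]+)\] (?P<proc>\S+) invoked oom-killer"
)


def find_oom_invocations(lines):
    """Return (uptime_seconds, process_name) for every oom-killer line."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((float(match.group("uptime")), match.group("proc")))
    return hits
```

Feeding it the contents of the kernel log should show whether java is the only process involved or whether something else on the box is also under memory pressure.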







Disc space and replication

2013-03-27 Thread Arkadi Colson

Hi


When replication has been down for some time, or an instance has crashed for 
some reason, replication always starts over from the beginning. This 
means it copies the whole shard, about 150GB, so we need a disc of at 
least about 300 GB.


I've read somewhere that Solr will replicate everything when 100 entries 
are missing? Why is that? Is it configurable?


What about optimization? Is it still needed in SolrCloud? Will it reduce 
the disc usage? Does it also need twice the shard size to run successfully?


Is it correct that, for now, the only option to reduce the disc space is to 
make more shards?


Is there any progress on the resharding option the developers are working on?


Thx!

--
Met vriendelijke groeten

Arkadi Colson

Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
T +32 11 64 08 80 • F +32 11 64 08 81



Elasticsearch with kerberos

2013-03-27 Thread Debika Mukherjee
Hi,

Is there any integration of Solr with Kerberos?


Thanks and regards,
Debika Mukherjee
CLOUD BBSR
VOIP 6743071561



Re: [ScriptUpdateProcessor] Params aren't being picked up from solrconfig

2013-03-27 Thread Rene Nederhand
I cannot believe I overlooked this :}
Thanks for helping me out. It works fine now.

I'd like to contribute to the wiki page
http://wiki.apache.org/solr/ScriptUpdateProcessor and add a
python example. So, if anyone could allow me write access or tell
me how to do this without, I'd be happy to contribute.

On Wed, Mar 27, 2013 at 12:38 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : none of the params I specify in solrconfig.xml are being picked up. The
 : error I'm getting is: NameError: global name 'params' is not defined.

 ...

 : <updateRequestProcessorChain name="script">
 :   <processor class="solr.StatelessScriptUpdateProcessorFactory">
 :     <str name="script">summarize.py</str>
 :   </processor>
 :   <!-- optional parameters passed to script -->
 :   <lst name="params">
 :     <str name="from_field">abstract</str>
 :     <str name="to_field">summary</str>
 :   </lst>

 ...that list of params isn't inside the processor tag, so
 StatelessScriptUpdateProcessorFactory doesn't know anything about it, so
 it's not passing it to the ScriptEngineManager


 -Hoss
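
For reference, a minimal sketch of what a summarize.py along these lines might look like once the params list sits inside the processor element. The truncation logic, MAX_CHARS and the helper name are assumptions for illustration, not the script from this thread; the sketch also assumes that Solr injects `params` as a global and calls the process* functions by name.

```python
# summarize.py - sketch of a script for StatelessScriptUpdateProcessorFactory.
# `params` is NOT defined in this file: it is assumed to be injected by Solr
# as a global, and only when <lst name="params"> is inside <processor>.

MAX_CHARS = 200  # hypothetical summary length, not taken from the thread


def summarize_into(doc, from_field, to_field):
    """Copy a truncated version of from_field into to_field on the document."""
    text = doc.getFieldValue(from_field)
    if text is not None:
        doc.setField(to_field, text[:MAX_CHARS])


def processAdd(cmd):
    # cmd.solrDoc is the SolrInputDocument being added.
    summarize_into(cmd.solrDoc, params.get("from_field"), params.get("to_field"))


# The remaining hooks are no-ops in this sketch.
def processDelete(cmd):
    pass


def processMergeIndexes(cmd):
    pass


def processCommit(cmd):
    pass


def processRollback(cmd):
    pass


def finish():
    pass
```

If `params` is still undefined at the point processAdd runs, the NameError from the original report comes back, which is a quick way to tell whether the config change took effect.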



Re: [ScriptUpdateProcessor] Params aren't being picked up from solrconfig

2013-03-27 Thread Steve Rowe
Hi Rene,

Thanks for offering to help with wiki documentation.

You'll need to register on the wiki first, then tell us your wiki username, and 
we'll add you to ContributorsGroup, which will allow you to make edits.

Steve

On Mar 27, 2013, at 7:40 AM, Rene Nederhand r...@nederhand.net wrote:

 I cannot believe I overlooked this :}
 Thanks for helping me out. It works fine now.
 
 I'd like to contribute to the wiki page
 http://wiki.apache.org/solr/ScriptUpdateProcessor and add a
 python example. So, if anyone could allow me write access or tell
 me how to do this without, I'd be happy to contribute.
 
 On Wed, Mar 27, 2013 at 12:38 AM, Chris Hostetter
 hossman_luc...@fucit.org wrote:
 
 
 : none of the params I specify in solrconfig.xml are being picked up. The
 : error I'm getting is: NameError: global name 'params' is not defined.
 
...
 
 : <updateRequestProcessorChain name="script">
 :   <processor class="solr.StatelessScriptUpdateProcessorFactory">
 :     <str name="script">summarize.py</str>
 :   </processor>
 :   <!-- optional parameters passed to script -->
 :   <lst name="params">
 :     <str name="from_field">abstract</str>
 :     <str name="to_field">summary</str>
 :   </lst>
 
 ...that list of params isn't inside the processor tag, so
 StatelessScriptUpdateProcessorFactory doesn't know anything about it, so
 it's not passing it to the ScriptEngineManager
 
 
 -Hoss
 



Re: Disc space and replication

2013-03-27 Thread Mark Miller

On Mar 27, 2013, at 3:57 AM, Arkadi Colson ark...@smartbit.be wrote:

 Hi
 
 
 When replication has been down for some time, or an instance has crashed for 
 some reason, replication always starts over from the beginning. This means it 
 copies the whole shard, about 150GB, so we need a disc of at least 
 about 300 GB.
 
 I've read somewhere that Solr will replicate everything when 100 entries are 
 missing? Why is that? Is it configurable?

Not configurable. Are you using 4.2? It will not recopy any segment files that 
already exist on the replica - 4.0 and 4.1 copied all the files regardless in 
SolrCloud mode.

 
 What about optimization? Is it still needed in SolrCloud? Will it reduce the 
 disc usage? Does it also need twice the shard size to run successfully?

I wouldn't optimize if you will continue to add/update documents. Use merge 
policy settings to control the segment count.

 
 Is it correct that, for now, the only option to reduce the disc space is to 
 make more shards?

??

 
 Is there any progress on the resharding option the developers are working on?

Yes, see the JIRA issue on shard splitting.

- Mark

 
 
 Thx!
 
 -- 
 Met vriendelijke groeten
 
 Arkadi Colson
 
 Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen
 T +32 11 64 08 80 • F +32 11 64 08 81
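
The disc-space arithmetic in this thread can be written down as a rough model: during recovery a replica holds its existing index plus whatever it downloads. This is a simplification under stated assumptions (the function and parameter names are invented here, and real usage also depends on merges and commit points).

```python
def peak_disk_gb(shard_gb, reused_gb=0.0):
    """Rough peak disc use on a replica while it recovers.

    shard_gb:  size of the index already on disc
    reused_gb: portion of the segment files that already exists on the
               replica and is therefore not copied again (0 = full copy)
    """
    if not 0 <= reused_gb <= shard_gb:
        raise ValueError("reused_gb must be between 0 and shard_gb")
    downloaded_gb = shard_gb - reused_gb
    return shard_gb + downloaded_gb
```

With a full copy, as described for 4.0 and 4.1, a 150GB shard gives peak_disk_gb(150) = 300, matching the estimate in the question. When existing segment files are reused, as described for 4.2, the peak drops accordingly.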
 



Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-27 Thread Chris R
So - I must be missing something very basic here and I've gone back to the
Wiki example.  After setting up the two shard example in the first tutorial
and indexing the three example documents, look at the shards in the Admin
UI.  The documents are stored in the index where the update was directed -
they aren't distributed across both shards.

Release notes state that the compositeId router is the default when using
the numShards parameter.  I want an even distribution of documents based on
ID across all shards. Any suggestions on what I'm screwing up?

Chris

On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote:

 I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
 don't specify numShards it goes into a mode where it's up to you to
 distribute updates.

 - Mark

 On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote:

  I have two issues and I'm unsure if they are related:
 
  Problem:  After setting up a multiple collection Solrcloud 4.1 instance on
  seven servers, when I index the documents they aren't distributed across
  the index slices.  It feels as though I don't actually have a cloud
  implementation, yet everything I see in the admin interface and zookeeper
  implies I do.  I feel as if I'm overlooking something obvious, but have not
  been able to figure out what.
 
  Configuration: Seven servers and four collections, each with 12 slices (no
  replica shards yet).  Zookeeper configured in a three node ensemble.  When
  I send documents to Server1/Collection1 (which holds two slices of
  collection1), all the documents show up in a single index shard (core).
  Perhaps related, I have found it impossible to get Solr to recognize the
  server names with anything but a literal host=servername parameter in the
  solr.xml.  hostname parameters, host files, network, dns, are all
  configured correctly.
 
  I have a Solr 4.0 single collection set up similarly and it works just
  fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
  implementation with only the luceneMatchVersion changed to LUCENE_41.
 
  sample solr.xml from server1
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores" hostPort="8080" host="server1"
           shareSchema="true" zkClientTimeout="6">
      <core collection="col201301" shard="col201301s04"
            instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
            dataDir="/solr/col201301/col201301s04sh01/data"/>
      <core collection="col201301" shard="col201301s11"
            instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
            dataDir="/solr/col201301/col201301s11sh01/data"/>
      <core collection="col201302" shard="col201302s06"
            instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
            dataDir="/solr/col201302/col201302s06sh01/data"/>
      <core collection="col201303" shard="col201303s01"
            instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
            dataDir="/solr/col201303/col201303s01sh01/data"/>
      <core collection="col201303" shard="col201303s08"
            instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
            dataDir="/solr/col201303/col201303s08sh01/data"/>
      <core collection="col201304" shard="col201304s03"
            instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
            dataDir="/solr/col201304/col201304s03sh01/data"/>
      <core collection="col201304" shard="col201304s10"
            instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
            dataDir="/solr/col201304/col201304s10sh01/data"/>
    </cores>
  </solr>
 
  Thanks
  Chris
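
To illustrate what "distributing by ID" means once numShards is specified, here is a rough sketch of hash-range routing. It is not Solr's implementation: zlib.crc32 stands in for whatever hash function the router really uses, and the function names are made up. The idea it shows is only that each shard owns a contiguous slice of the hash space and a document goes to the shard whose slice contains the hash of its ID.

```python
import zlib


def shard_ranges(num_shards):
    """Split the unsigned 32-bit hash space into contiguous ranges."""
    size = 2**32 // num_shards
    ranges = []
    for i in range(num_shards):
        lo = i * size
        # The last shard absorbs the remainder so the space is fully covered.
        hi = 2**32 - 1 if i == num_shards - 1 else lo + size - 1
        ranges.append((lo, hi))
    return ranges


def pick_shard(doc_id, ranges):
    """Return the index of the shard whose range contains hash(doc_id)."""
    # crc32 is a stand-in hash chosen for this sketch only.
    h = zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF
    for index, (lo, hi) in enumerate(ranges):
        if lo <= h <= hi:
            return index
    raise AssertionError("ranges must cover the whole hash space")
```

Without such ranges assigned (the mode Mark describes when numShards is missing), nothing maps an ID to a shard, so a document simply stays on the core that received it.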




Using multiple text files for Suggester dictionaries

2013-03-27 Thread Eric Wilson
I'm using the Suggester component for autocomplete. I have a variety of
types of suggestions that I would like to offer, such as locations, company
names, products, and dictionary words.

These lists vary in size and volatility, so keeping them all in the same
text file is not the most convenient.

I'm using text files because I want the ability to add weights to the terms
suggested.

Is it possible to use multiple text files? I tried the following:

<!-- WFSTLookup suggest component -->
<searchComponent class="solr.SpellCheckComponent" name="suggestword">
  <lst name="spellchecker">
    <str name="name">suggestword</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="storeDir">suggestword</str>
    <str name="buildOnCommit">false</str>
    <!-- Suggester properties -->
    <bool name="exactMatchFirst">true</bool>
    <str name="sourceLocation">../data/words.txt</str>
    <str name="sourceLocation">../data/cities.txt</str>
  </lst>

But the second list, the cities, is apparently not picked up, even after
restarting Tomcat and rebuilding the dictionary. Can this be done? If
not, how would you recommend managing different dictionaries?

Thanks,

Eric Wilson
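
One possible workaround, if only a single sourceLocation is honoured, is to keep the lists in separate files and merge them into one file before rebuilding. The sketch below assumes the dictionary format is one term per line with an optional tab-separated weight, treats lines starting with # as comments, and gives unweighted terms a weight of 1.0; all three are assumptions, as is the function name.

```python
def merge_dictionaries(paths, out_path):
    """Merge weighted dictionary files, keeping the highest weight per term.

    Returns the number of distinct terms written to out_path.
    """
    best = {}
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for raw in handle:
                line = raw.rstrip("\n")
                if not line.strip() or line.startswith("#"):
                    continue  # skip blank lines and comments
                term, _, weight = line.partition("\t")
                value = float(weight) if weight.strip() else 1.0
                if term not in best or value > best[term]:
                    best[term] = value
    with open(out_path, "w", encoding="utf-8") as out:
        for term in sorted(best):
            out.write("%s\t%s\n" % (term, best[term]))
    return len(best)
```

The volatile lists can then be regenerated on their own schedule, with the merge step run just before the dictionary is rebuilt.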


Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-27 Thread Furkan KAMACI
Hi Nate;

This may be off topic, but could you explain why you want to use
Tomcat instead of Jetty or Embedded Jetty?


2013/3/27 Michael Della Bitta michael.della.bi...@appinions.com

 You're using the blocking IO connector, which isn't so great for heavy
 loads.

 Give this a shot... You'll end up with 8192 max connections by
 default, although this is tunable too:

 Run:
 apt-get install libapr1 libtcnative-1

 Add this to the list of Listeners at the top of server.xml:

 <Listener className="org.apache.catalina.core.AprLifecycleListener"
           SSLEngine="off" />

 These instructions assume you're running Tomcat 6 or 7.

 Here's some documentation:
 http://tomcat.apache.org/tomcat-7.0-doc/apr.html
 http://tomcat.apache.org/tomcat-7.0-doc/config/http.html


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Tue, Mar 26, 2013 at 5:31 PM, Nate Fox n...@neogov.com wrote:
  We're not using ELB and I have no idea which connector I'm using - I'm
  guessing whatever is default (I'm a total noob). This is from my server.xml:
  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="6"
             URIEncoding="UTF-8" redirectPort="8443" />
 
 
 
  --
  Nate Fox
  Sr Systems Engineer
 
  o: 310.658.5775
  m: 714.248.5350
 
  Follow us @NEOGOV http://twitter.com/NEOGOV and on
  Facebookhttp://www.facebook.com/neogov
 
  NEOGOV http://www.neogov.com/ is among the top fastest growing
 software
  companies in the USA, recognized by Inc 500|5000, Deloitte Fast 500, and
  the LA Business Journal. We are hiring!
 http://www.neogov.com/#/company/careers
 
 
 
  On Tue, Mar 26, 2013 at 1:02 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com wrote:
 
  Nate,
 
  We just cleared up a problem similar to this by ditching Elastic Load
  Balancer and switching over to the APR connector in Tomcat. Are you
  using either of those?
 
  Michael Della Bitta
 
  
 
 
  On Tue, Mar 26, 2013 at 2:58 PM, Otis Gospodnetic
  otis.gospodne...@gmail.com wrote:
   Hi Nate,
  
   Try adding some warmup queries and making sure the setting for using
   the cold searcher in solrconfig.xml is set to false.  Your warmup
   queries should use facets and sorting if your normal queries use them.
   In SPM you'll actually see how much time warming up takes, so you'll
   get a better idea of the cost of that (when you don't do it).
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Tue, Mar 26, 2013 at 2:50 PM, Nate Fox n...@neogov.com wrote:
   I was wondering if the warmup stuff was one of the culprits (we don't have
   warmups at all - the configs are pretty stock).
   As for the system, it seems capable of quite a bit more: memory usage is
   ~30%, jvm-memory (from the dashboard) is very low (~220Mb out of 3Gb) and
   load below 1.00.
  
   The seed data and queries were put together by one of our developers. I've
   put all the solrmeter files here:
   https://gist.github.com/natefox/ee5cef3d4fbbc73e9bce
   Unfortunately I'm quite new to solr (and tomcat) so I'm not entirely sure
   which file does which specifically.
  
   Does the system's reaction to a 'fast load' without a warmup sound normal?
   I would have expected the first couple hundred queries to be very slow
   (500ms) and then the system catch up after a while. But it just dies very
   quickly and never recovers.
  
   I'll check out your SPM - I've seen it mentioned before. Thanks!
  
  
  
   --
   Nate Fox
   Sr Systems Engineer
  
   o: 310.658.5775
   m: 714.248.5350
  
  
  
  
   On Tue, Mar 26, 2013 at 11:12 AM, Otis Gospodnetic 
   otis.gospodne...@gmail.com wrote:
  
   Hi,
  
   In short, certain data structures need to load from index in the
   beginning, (for sorting and faceting) caches need to warm up, JVM
   needs to warm up, etc., so going slowly in the beginning makes sense.
   Why things die after that is a different Q.  Maybe it OOMs?  Maybe
   queries are very complex?  What do your queries look like?  I see
   newrelic.jar in the command-line.  May want to try SPM for Solr, it
   has better Solr metrics.
  
   Otis
   --
   Solr  ElasticSearch Support
   http://sematext.com/
  
  
  
  
  
   On Tue, Mar 26, 2013 at 1:24 PM, Nate Fox n...@neogov.com wrote:
I'm new to solr and I'm load testing our setup to see what we can
  handle.

How do I recover the position and offset of a highlight for solr (4.1/4.2)?

2013-03-27 Thread Skealler Nametic
Hi,

I would like to retrieve the position and offset of each highlight found.
I searched on the internet, but I have not found the exact solution to my
problem...


Re: Elasticsearch with kerberos

2013-03-27 Thread Shawn Heisey

On 3/27/2013 5:29 AM, Debika Mukherjee wrote:

Is there any integration of Solr with Kerberos?


I am pretty sure that the answer is no.  Solr has no security features 
at all - it is intended to live where regular users cannot get to it.


Thanks,
Shawn



Querying a transitive closure?

2013-03-27 Thread Jack Park
This is a question about isA?

We want to know if M isA B   isA?(M,B)

For some M, one might be able to look into M to see its type or the
class(es) of which it is a subclass. We're talking taxonomic queries
now.
But, for some M, one might need to ripple up the transitive closure,
looking at all the super classes, etc, recursively.

It seems unreasonable to do that over HTTP; it seems more reasonable
to grab a core and write a custom isA query handler. But, how do you
do that in a SolrCloud?

Really curious...

Many thanks in advance for ideas.
Jack
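
One way to avoid rippling up the hierarchy at query time is to compute the closure once, for example at index time, and store it with the document so that isA?(M,B) becomes a single lookup. The sketch below shows only the closure computation; the `parents` mapping and the idea of storing the result in a multi-valued field are assumptions for illustration, not an existing Solr feature.

```python
from collections import deque


def ancestors(cls, parents):
    """Return all transitive superclasses of cls.

    parents maps a class name to an iterable of its direct superclasses.
    """
    seen = set()
    queue = deque(parents.get(cls, ()))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already expanded; also guards against cycles
        seen.add(current)
        queue.extend(parents.get(current, ()))
    return seen


def is_a(m, b, parents):
    """True if m is b or b is reachable from m via superclass links."""
    return m == b or b in ancestors(m, parents)
```

If the set returned by ancestors(M, parents) were indexed alongside M, the taxonomic check would reduce to one term query against that field, at the cost of reindexing affected documents when the hierarchy changes.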


Re: Elasticsearch with kerberos

2013-03-27 Thread Otis Gospodnetic
Debika,

Did you really mean to ask about Solr or ElasticSearch (see subject)?

I think your best bet is ManifoldCF, where I see some mention of it
http://search-lucene.com/?q=kerberos

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 11:55 AM, Shawn Heisey s...@elyograg.org wrote:
 On 3/27/2013 5:29 AM, Debika Mukherjee wrote:

 Is there any integration of Solr with Kerberos?


 I am pretty sure that the answer is no.  Solr has no security features at
 all - it is intended to live where regular users cannot get to it.

 Thanks,
 Shawn



Re: Querying a transitive closure?

2013-03-27 Thread Otis Gospodnetic
Hi Jack,

Is this really about HTTP and Solr vs. SolrCloud or more whether
Solr(Cloud) is the right tool for the job and if so how to structure
the schema and queries to make such lookups efficient?

Otis
--
Solr  ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:
 This is a question about isA?

 We want to know if M isA B   isA?(M,B)

 For some M, one might be able to look into M to see its type or which
 class(es) for which it is a subClass. We're talking taxonomic queries
 now.
 But, for some M, one might need to ripple up the transitive closure,
 looking at all the super classes, etc, recursively.

 It seems unreasonable to do that over HTTP; it seems more reasonable
 to grab a core and write a custom isA query handler. But, how do you
 do that in a SolrCloud?

 Really curious...

 Many thanks in advance for ideas.
 Jack


Re: Elasticsearch with kerberos

2013-03-27 Thread Chris Hostetter

:  Is there any integration of Solr with Kerberos?

:  I am pretty sure that the answer is no.  Solr has no security features at
:  all - it is intended to live where regular users cannot get to it.

The key question is how you define integration of Solr with Kerberos. 
What is your goal?  How do you want Kerberos to be used?

Because Solr is a webapp that can run in any servlet container, you may be 
able to achieve your goals by using a servlet container that already 
supports Kerberos (i.e., if your goal is Kerberos authentication of 
clients talking to Solr).

But w/o more details as to what it is you actually care about, there's no 
real way to give you a meaningful answer other than to say nothing in Solr 
requires or directly knows about Kerberos authentication.

-Hoss


Solr Cloud update process

2013-03-27 Thread Walter Underwood
What do people do for updating, say from 4.1 to 4.2.1, on a live cluster?

I need to help our release engineering team create the Jenkins scripts for 
deployment.

wunder
--
Walter Underwood
wun...@wunderwood.org




Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
replicas, 4 nodes total.  Do query requests sent to a node only query the
replica on that node, or are they load-balanced to the entire cluster?

Bill


Re: Solr Cloud update process

2013-03-27 Thread Shawn Heisey

On 3/27/2013 12:34 PM, Walter Underwood wrote:

What do people do for updating, say from 4.1 to 4.2.1, on a live cluster?

I need to help our release engineering team create the Jenkins scripts for 
deployment.


Aside from replacing the .war file and restarting your container, there 
hopefully won't be anything additional required.


The subject says SolrCloud, so your config(s) should be in zookeeper. 
It would generally be a good idea to update luceneMatchVersion to 
LUCENE_42 in the config(s), unless you happen to know that you're 
relying on behavior from the old version that changed in the new version.


I also make a point of deleting the old extracted version of the .war 
before restarting, just to be sure there won't be any problems.  In 
theory a servlet container should be able to handle this without 
intervention, but I don't like taking the chance.


Thanks,
Shawn



Re: Querying a transitive closure?

2013-03-27 Thread Jack Park
Hi Otis,

I fully expect to grow to SolrCloud -- many shards. For now, it's
solo. But, my thinking relates to cloud. I look for ways to reduce the
number of HTTP round trips through SolrJ. Maybe you have some ideas?

Thanks
Jack

On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi Jack,

 Is this really about HTTP and Solr vs. SolrCloud or more whether
 Solr(Cloud) is the right tool for the job and if so how to structure
 the schema and queries to make such lookups efficient?

 Otis
 --
 Solr  ElasticSearch Support
 http://sematext.com/





 On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:
 This is a question about isA?

 We want to know if M isA B   isA?(M,B)

 For some M, one might be able to look into M to see its type or which
 class(es) for which it is a subClass. We're talking taxonomic queries
 now.
 But, for some M, one might need to ripple up the transitive closure,
 looking at all the super classes, etc, recursively.

 It seems unreasonable to do that over HTTP; it seems more reasonable
 to grab a core and write a custom isA query handler. But, how do you
 do that in a SolrCloud?

 Really curious...

 Many thanks in advance for ideas.
 Jack


Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-27 Thread Nate Fox
Update: issue resolved!
Cranking up the maxThreads did the trick. Default is 200. I went with 2500
for grins and giggles and things work great. Now, even if I overwhelm the
box with too many requests, when the requests back off the box continues to
respond. And when I slam the server after it's been restarted (without
having warmup queries), it acts as I wanted: queries are slow to respond
(upwards of 30s) for the first couple minutes then they start to all be
under 25ms and normalize at a very fast pace (obviously as the cache is
warmed).

Christopher, I could have sworn I tried upping acceptCount, maxConnections
and maxThreads in my testing, but with your prodding I tried it again - and
that was the solution.

I have a couple quick followup questions:
- What is the downside of having a maxThreads, acceptCount and
maxConnections really high? Obviously defaults are there for a reason - I'd
like to know what the reasoning is.
- Any reason I shouldn't use Tomcat? I just went with it because I figured
it was extremely mature and was easy to use with apt-get :)

I'll probably toy with the APR as suggested by Michael, as I like the idea
of a non-blocking connector.





--
Nate Fox
Sr Systems Engineer

o: 310.658.5775
m: 714.248.5350




On Tue, Mar 26, 2013 at 5:56 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : * When I set solrmeter to run 4000 queries/min, it will handle a few
 : hundred queries and then tomcat will stop responding completely to requests
 : (even though according to lsof -i it is still listening and the java
 : process is still running).

 have you tried using jstack to generate a thread dump of the
 server to see what it's doing?

 : * When I set solrmeter to run 1000 queries/min it runs fine. I can stop
 : solrmeter after a couple of minutes at that pace and then run at 4000/min
 : without issue.
 :
 : It's as if it needs a ramp up time? Also, I noticed (regardless of ramp up)
 : that my setup cannot handle 8000/min. The reaction at 8k/min is the same as
 : if I were to run 4k/min without the ramp up. Of note, only the shard that
 : solrmeter is pointed to stops responding. The other shard hums along
 : without incident.

 Just to clarify: you're running a 2 node SolrCloud cluster, where each
 node contains a unique shard, and pointing solrmeter at a single node for
 the queries -- correct?

 Here's my hunch: you are probably hitting the limit of the number of
 concurrent connections tomcat will allow (whatever it may be configured
 to in your setup).

 In the 8000/min case, you are probably maxing out that limit with direct
 connections you issue from solrmeter to that single node.

 In the 4000/min case, each request you issue causes that single node to
 fire off multiple requests to each shard, and since each shard exists on
 only one node, you are guaranteeing that you double the number of
 concurrent requests hitting that first node.

 in the case where you start w/ 1000/min, and then later ramp up to
 4000/min, you are probably causing enough of the queries to be warmed up
 that they are in the caches on both nodes, so they can be served really
 fast and return their results before you reach that max number of
 concurrent connections after you ramp up.

 I'm no tomcat expert, but skimming the docs, you may want to look at
 settings like acceptCount, maxConnections, maxThreads, etc...

 -Hoss
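
The numbers in this thread fit a back-of-the-envelope estimate using Little's law (requests in flight = arrival rate x time each request spends in the system). The sketch below is an approximation, not a model of Tomcat's connector; the function name and the fanout parameter are assumptions added for illustration.

```python
def required_threads(queries_per_minute, latency_seconds, fanout=1):
    """Estimate concurrent requests on a node via Little's law.

    fanout is an assumed multiplier for internal per-shard requests that
    land on the same node (1 means no distributed fan-out).
    """
    rate_per_second = queries_per_minute / 60.0
    return rate_per_second * latency_seconds * fanout
```

At 4000 queries/min with cold queries taking up to 30s, this comes to roughly 2000 requests in flight, far above a limit of 200 threads and just under 2500. Once queries settle at around 25ms, the same rate needs fewer than two threads on average, which is consistent with the server recovering after the caches warm.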



Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Mark Miller
They are load-balanced across the cluster unless you pass the distrib=false 
param.

- Mark

On Mar 27, 2013, at 2:51 PM, Bill Au bill.w...@gmail.com wrote:

 I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
 replicas, 4 nodes total.  Do query requests sent to a node only query the
 replica on that node, or are they load-balanced to the entire cluster?
 
 Bill



Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Erik Hatcher
Requests to a node in your example would be answered by that node (no need to 
distribute; it's a single shard system) and it would not internally be routed 
otherwise either.  Ultimately it is up to the client to load-balance the 
initial requests into a SolrCloud cluster, but internally in a multi-shard 
distributed search request it will be load balanced beyond that initial node.

CloudSolrServer does load balance, so if you're using that client it'll 
randomly pick a shard to send to from the client side.  If you're using some 
other mechanism, the initial request goes directly to whatever node you've 
specified.

Erik
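
For clients that do not balance on their own, the initial-node choice could be handled along these lines. This is only a sketch: the host names, the collection name and both function names are placeholders, and it mirrors the idea of picking a node at random rather than any particular client's behaviour. The distrib=false parameter is the one Mark mentions in this thread.

```python
import random
from urllib.parse import urlencode


def pick_node(nodes, rng=random):
    """Pick the node that receives the initial request, uniformly at random."""
    if not nodes:
        raise ValueError("no nodes configured")
    return rng.choice(nodes)


def query_url(node, collection, q, local_only=False):
    """Build a select URL; local_only adds distrib=false to the request."""
    params = [("q", q)]
    if local_only:
        params.append(("distrib", "false"))
    return "%s/%s/select?%s" % (node.rstrip("/"), collection, urlencode(params))
```

For example, query_url(pick_node(nodes), "collection1", "*:*") targets a random node, so repeated calls spread the initial requests across the cluster.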

p.s. Thanks for attending the webinar, Bill!   I saw your name as one of the 
question askers.  Hopefully all that stuff I made up is close to the truth :)



On Mar 27, 2013, at 14:51 , Bill Au wrote:

 I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
 replicas, 4 nodes total.  Do query requests sent to a node only query the
 replica on that node, or are they load-balanced to the entire cluster?
 
 Bill



Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-27 Thread Shawn Heisey

On 3/27/2013 1:16 PM, Nate Fox wrote:

I have a couple quick followup questions:
- What is the downside of having a maxThreads, acceptCount and
maxConnections really high? Obviously defaults are there for a reason - I'd
like to know what the reasoning is.
- Any reason I shouldnt use Tomcat? I just went with it because I figured
it was extremely mature and was easy to use with apt-get :)


The maxThreads parameter in the jetty config that's included with Solr 
is set to 1 - this is the value chosen by Solr's development team. 
Your setting of 2500 should be perfectly fine, and it is definitely not 
really high.  The default of 200 in your distribution is very low.


Tomcat is certainly a viable solution, one used by many.  It is very 
mature and has proven itself.  The really nice thing with using an 
OS-packaged version is that you don't have to write or change the init 
script.  I use the jetty that was included with Solr, and had to write 
my own init script.


Jetty, especially the stripped-down version included with Solr, has a 
smaller footprint than tomcat.  The bells and whistles are not required. 
 It is not better or worse than tomcat, just another choice.


Thanks,
Shawn



Re: Loadtesting solr/tomcat7 and tomcat stops responding entirely

2013-03-27 Thread Mark Miller

On Mar 27, 2013, at 3:29 PM, Shawn Heisey s...@elyograg.org wrote:

 The maxThreads parameter in the jetty config that's included with Solr is set 
 to 1

Yonik raised this at some point if I remember right - it helps avoid some 
distrib deadlock issue.

- Mark



Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Timothy Potter
When running in SolrCloud mode, does it make sense to disable distributed
mode for warming queries? i.e. distrib=false in my warming query config

I actually asked this on Erik's informative Webinar this morning but had to
drop off before I heard the answer ... so Erik might have answered this
already ;-)

My thinking here is that a hard commit gets sent around the cluster
automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
commit, all 36 nodes will be warming up. If my warming queries are
distributed, then all nodes are going to be sending the same query
needlessly around the cluster 36 times - seems unnecessary.

Thoughts?

Cheers,
Tim
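
A rough count of the internal traffic makes the concern concrete. The model below assumes that every node runs each warming query once per commit and that a distributed query sends one sub-request per shard; both are simplifications, and the function name is invented for this sketch.

```python
def warming_subrequests(nodes, shards, distributed):
    """Count shard-level requests caused by one warming query per node."""
    per_node = shards if distributed else 1
    return nodes * per_node
```

For the 36-node, 18-shard example, distributed warming gives 36 x 18 = 648 shard-level requests per warming query, versus 36 when each node only warms its own core.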


Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Mark Miller
Yup. You only want to warm locally. We should add that to the wiki.

- Mark

On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote:

 When running in SolrCloud mode, does it make sense to disable distributed
 mode for warming queries? i.e. distrib=false in my warming query config
 
 I actually asked this on Erik's informative Webinar this morning but had to
 drop off before I heard the answer ... so Erik might have answered this
 already ;-)
 
 My thinking here is that a hard commit gets sent around the cluster
 automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
 commit, all 36 nodes will be warming up. If my warming queries are
 distributed, then all nodes are going to be sending the same query
 needlessly around the cluster 36 times - seems unnecessary.
 
 Thoughts?
 
 Cheers,
 Tim



Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Timothy Potter
Ok - thanks for confirming Mark - I'll add that to the wiki.

Cheers,
Tim

On Wed, Mar 27, 2013 at 1:59 PM, Mark Miller markrmil...@gmail.com wrote:

 Yup. You only want to warm locally. We should add that to the wiki.

 - Mark

 On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote:

  When running in SolrCloud mode, does it make sense to disable distributed
  mode for warming queries? i.e. distrib=false in my warming query config
 
  I actually asked this on Erik's informative Webinar this morning but had
 to
  drop off before I heard the answer ... so Erik might have answered this
  already ;-)
 
  My thinking here is that a hard commit gets sent around the cluster
  automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
  commit, all 36 nodes will be warming up. If my warming queries are
  distributed, then all nodes are going to be sending the same query
  needlessly around the cluster 36 times - seems unnecessary.
 
  Thoughts?
 
  Cheers,
  Tim




Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread santoash
This is interesting. I'm looking into doing something similar too. 

Quick question: Would you be targeting each of the shards with exactly the same 
set of queries? 


On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote:

 Yup. You only want to warm locally. We should add that to the wiki.
 
 - Mark
 
 On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote:
 
 When running in SolrCloud mode, does it make sense to disable distributed
 mode for warming queries? i.e. distrib=false in my warming query config
 
 I actually asked this on Erik's informative Webinar this morning but had to
 drop off before I heard the answer ... so Erik might have answered this
 already ;-)
 
 My thinking here is that a hard commit gets sent around the cluster
 automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
 commit, all 36 nodes will be warming up. If my warming queries are
 distributed, then all nodes are going to be sending the same query
 needlessly around the cluster 36 times - seems unnecessary.
 
 Thoughts?
 
 Cheers,
 Tim
 



Re: Solr 4.1 SolrCloud with 1 shard and 3 replicas

2013-03-27 Thread Bill Au
Thanks for the info, Erik.

I had gone through the tutorial in the SolrCloud Wiki and verified that
queries are load balanced in the two shard cluster with shard replicas
setup.  I was wondering if I need to explicitly specify distrib=false in my
single shard setup.  Glad to see that Solr is doing the right thing by
default in my case.

Bill

ps thanks for a very informative webinar.  I am going to recommend it to my
co-workers once the recording is available


On Wed, Mar 27, 2013 at 3:26 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 Requests to a node in your example would be answered by that node (no need
 to distribute; it's a single shard system) and it would not internally be
 routed otherwise either.  Ultimately it is up to the client to load-balance
 the initial requests into a SolrCloud cluster, but internally in a
 multi-shard distributed search request it will be load balanced beyond that
 initial node.

 CloudSolrServer does load balance, so if you're using that client it'll
 randomly pick a shard to send to from the client-side.  If you're using
 some other mechanism, it'll request directly to whatever node that you've
 specified directly for that initial request.

 Erik

 p.s. Thanks for attending the webinar, Bill!   I saw your name as one of
 the question askers.  Hopefully all that stuff I made up is close to the
 truth :)



 On Mar 27, 2013, at 14:51 , Bill Au wrote:

  I am running Solr 4.1.  I have set up SolrCloud with 1 leader and 3
  replicas, 4 nodes total.  Do query requests sent to a node only query the
  replica on that node, or are they load-balanced to the entire cluster?
 
  Bill




Query on all dynamic fields or wildcard field query

2013-03-27 Thread Luis Lebolo
Hi All,

First I have to apologize and admit that I'm asking this question before
doing any real research =( Was hoping for some preliminary help before I
start this endeavor tomorrow. So here goes:

Can I query for a value in multiple (wildcarded) fields?

For example, if I have dynamic fields fieldName_someToken (e.g.
fieldName_1, fieldName_2, fieldName_3), can I construct a query like
fieldName_*:someValue?

The query itself doesn't work, but is there a way to query numerous dynamic
fields without explicitly listing them?

Thanks,
Luis


Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Timothy Potter
In our case, yes - same non-distrib query is warmed on each node. Seems
like you'd need something a little more dynamic than statically configured
warming queries in solrconfig.xml for targeting specific shards.

Tim

On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:

 This is interesting. I'm looking into doing something similar too.

 Quick question: Would you be targeting each of the shards with exactly the
 same set of queries?


 On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote:

  Yup. You only want to warm locally. We should add that to the wiki.
 
  - Mark
 
  On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
 wrote:
 
  When running in SolrCloud mode, does it make sense to disable
 distributed
  mode for warming queries? i.e. distrib=false in my warming query config
 
  I actually asked this on Erik's informative Webinar this morning but
 had to
  drop off before I heard the answer ... so Erik might have answered this
  already ;-)
 
  My thinking here is that a hard commit gets sent around the cluster
  automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
  commit, all 36 nodes will be warming up. If my warming queries are
  distributed, then all nodes are going to be sending the same query
  needlessly around the cluster 36 times - seems unnecessary.
 
  Thoughts?
 
  Cheers,
  Tim
 




Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Joel Bernstein
This jira looks like it addresses this.

https://issues.apache.org/jira/browse/SOLR-3081

I'll run a quick test.


On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote:

 In our case, yes - same non-distrib query is warmed on each node. Seems
 like you'd need something a little more dynamic than statically configured
 warming queries in solrconfig.xml for targeting specific shards.

 Tim

 On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:

  This is interesting. I'm looking into doing something similar too.
 
  Quick question: Would you be targeting each of the shards with exactly the
  same set of queries?
 
 
  On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com wrote:
 
   Yup. You only want to warm locally. We should add that to the wiki.
  
   - Mark
  
   On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
  wrote:
  
   When running in SolrCloud mode, does it make sense to disable
  distributed
   mode for warming queries? i.e. distrib=false in my warming query
 config
  
   I actually asked this on Erik's informative Webinar this morning but
  had to
   drop off before I heard the answer ... so Erik might have answered
 this
   already ;-)
  
   My thinking here is that a hard commit gets sent around the cluster
   automatically. Say I have 36 nodes (18 leaders and 18 replicas), on
 hard
   commit, all 36 nodes will be warming up. If my warming queries are
   distributed, then all nodes are going to be sending the same query
   needlessly around the cluster 36 times - seems unnecessary.
  
   Thoughts?
  
   Cheers,
   Tim
  
 
 




-- 
Joel Bernstein
Professional Services LucidWorks


Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Joel Bernstein
I ran a quick test and distrib=false is being tacked on automatically. Here
is the log record:

INFO: [collection1] webapp=null path=null
params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1
status=0 QTime=17

So I think this is OK.
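
For anyone who wants to check their own logs, here is a small sketch that pulls the params out of a log record like the one above and confirms distrib=false is present. It assumes the parameters are separated by & in the real log (the list archive tends to strip that character), and the helper name is invented for this example.

```python
import re
from urllib.parse import parse_qs


def warming_params(log_line: str) -> dict:
    """Extract the params={...} section of a Solr request log line as a dict."""
    match = re.search(r"params=\{(.*?)\}", log_line)
    if match is None:
        return {}
    # parse_qs decodes '+' as a space and returns a list of values per key.
    return parse_qs(match.group(1))


line = (
    "INFO: [collection1] webapp=null path=null "
    "params={sort=price+asc&event=newSearcher&q=solr&distrib=false} "
    "hits=1 status=0 QTime=17"
)
params = warming_params(line)
print(params.get("distrib"))  # ['false']
```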





On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote:

 This jira looks like it addresses this.

 https://issues.apache.org/jira/browse/SOLR-3081

 I'll run a quick test.


 On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote:

 In our case, yes - same non-distrib query is warmed on each node. Seems
 like you'd need something a little more dynamic than statically configured
 warming queries in solrconfig.xml for targeting specific shards.

 Tim

 On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:

  This is interesting. I'm looking into doing something similar too.
 
  Quick question: Would you be targeting each of the shards with exactly
 the
  same set of queries?
 
 
  On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
   Yup. You only want to warm locally. We should add that to the wiki.
  
   - Mark
  
   On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
  wrote:
  
   When running in SolrCloud mode, does it make sense to disable
  distributed
   mode for warming queries? i.e. distrib=false in my warming query
 config
  
   I actually asked this on Erik's informative Webinar this morning but
  had to
   drop off before I heard the answer ... so Erik might have answered
 this
   already ;-)
  
   My thinking here is that a hard commit gets sent around the cluster
   automatically. Say I have 36 nodes (18 leaders and 18 replicas), on
 hard
   commit, all 36 nodes will be warming up. If my warming queries are
   distributed, then all nodes are going to be sending the same query
   needlessly around the cluster 36 times - seems unnecessary.
  
   Thoughts?
  
   Cheers,
   Tim
  
 
 




 --
 Joel Bernstein
 Professional Services LucidWorks




-- 
Joel Bernstein
Professional Services LucidWorks


Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Mark Miller
Ah, interesting. Forgot about doing that issue entirely.

- Mark

On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote:

 I ran a quick test and distrib=false is being tacked on automatically. Here
 is the log record:
 
 INFO: [collection1] webapp=null path=null
 params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1
 status=0 QTime=17
 
 So I think this is OK.
 
 
 
 
 
 On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com wrote:
 
 This jira looks like it addresses this.
 
 https://issues.apache.org/jira/browse/SOLR-3081
 
 I'll run a quick test.
 
 
 On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com wrote:
 
 In our case, yes - same non-distrib query is warmed on each node. Seems
 like you'd need something a little more dynamic than statically configured
 warming queries in solrconfig.xml for targeting specific shards.
 
 Tim
 
 On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:
 
 This is interesting. I'm looking into doing something similar too.
 
 Quick question: Would you be targeting each of the shards with exactly
 the
 same set of queries?
 
 
 On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Yup. You only want to warm locally. We should add that to the wiki.
 
 - Mark
 
 On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
 wrote:
 
 When running in SolrCloud mode, does it make sense to disable
 distributed
 mode for warming queries? i.e. distrib=false in my warming query
 config
 
 I actually asked this on Erik's informative Webinar this morning but
 had to
 drop off before I heard the answer ... so Erik might have answered
 this
 already ;-)
 
 My thinking here is that a hard commit gets sent around the cluster
 automatically. Say I have 36 nodes (18 leaders and 18 replicas), on
 hard
 commit, all 36 nodes will be warming up. If my warming queries are
 distributed, then all nodes are going to be sending the same query
 needlessly around the cluster 36 times - seems unnecessary.
 
 Thoughts?
 
 Cheers,
 Tim
 
 
 
 
 
 
 
 --
 Joel Bernstein
 Professional Services LucidWorks
 
 
 
 
 -- 
 Joel Bernstein
 Professional Services LucidWorks



Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Timothy Potter
lol - you know you're a bad ass when you've forgotten more about Solr cloud
than the rest of us know ;-)

On Wed, Mar 27, 2013 at 4:41 PM, Mark Miller markrmil...@gmail.com wrote:

 Ah, interesting. Forgot about doing that issue entirely.

 - Mark

 On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote:

  I ran a quick test and distrib=false is being tacked on automatically.
 Here
  is the log record:
 
  INFO: [collection1] webapp=null path=null
  params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1
  status=0 QTime=17
 
  So I think this is OK.
 
 
 
 
 
  On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com
 wrote:
 
  This jira looks like it addresses this.
 
  https://issues.apache.org/jira/browse/SOLR-3081
 
  I'll run a quick test.
 
 
  On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com
 wrote:
 
  In our case, yes - same non-distrib query is warmed on each node. Seems
  like you'd need something a little more dynamic than statically
 configured
  warming queries in solrconfig.xml for targeting specific shards.
 
  Tim
 
  On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:
 
  This is interesting. I'm looking into doing something similar too.
 
  Quick question: Would you be targeting each of the shards with exactly
  the
  same set of queries?
 
 
  On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Yup. You only want to warm locally. We should add that to the wiki.
 
  - Mark
 
  On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
  wrote:
 
  When running in SolrCloud mode, does it make sense to disable
  distributed
  mode for warming queries? i.e. distrib=false in my warming query
  config
 
  I actually asked this on Erik's informative Webinar this morning but
  had to
  drop off before I heard the answer ... so Erik might have answered
  this
  already ;-)
 
  My thinking here is that a hard commit gets sent around the cluster
  automatically. Say I have 36 nodes (18 leaders and 18 replicas), on
  hard
  commit, all 36 nodes will be warming up. If my warming queries are
  distributed, then all nodes are going to be sending the same query
  needlessly around the cluster 36 times - seems unnecessary.
 
  Thoughts?
 
  Cheers,
  Tim
 
 
 
 
 
 
 
  --
  Joel Bernstein
  Professional Services LucidWorks
 
 
 
 
  --
  Joel Bernstein
  Professional Services LucidWorks




Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Joel Bernstein
That was a good fix Mark. I had this picture in my head of a large Solr
Cloud sending around thousands of simultaneous searches and crashing itself.


On Wed, Mar 27, 2013 at 6:47 PM, Timothy Potter thelabd...@gmail.com wrote:

 lol - you know you're a bad ass when you've forgotten more about Solr cloud
 than the rest of us know ;-)

 On Wed, Mar 27, 2013 at 4:41 PM, Mark Miller markrmil...@gmail.com
 wrote:

  Ah, interesting. Forgot about doing that issue entirely.
 
  - Mark
 
  On Mar 27, 2013, at 6:25 PM, Joel Bernstein joels...@gmail.com wrote:
 
   I ran a quick test and distrib=false is being tacked on automatically.
  Here
   is the log record:
  
   INFO: [collection1] webapp=null path=null
   params={sort=price+asc&event=newSearcher&q=solr&distrib=false} hits=1
   status=0 QTime=17
  
   So I think this is OK.
  
  
  
  
  
   On Wed, Mar 27, 2013 at 6:02 PM, Joel Bernstein joels...@gmail.com
  wrote:
  
   This jira looks like it addresses this.
  
   https://issues.apache.org/jira/browse/SOLR-3081
  
   I'll run a quick test.
  
  
   On Wed, Mar 27, 2013 at 5:41 PM, Timothy Potter thelabd...@gmail.com
  wrote:
  
   In our case, yes - same non-distrib query is warmed on each node.
 Seems
   like you'd need something a little more dynamic than statically
  configured
   warming queries in solrconfig.xml for targeting specific shards.
  
   Tim
  
   On Wed, Mar 27, 2013 at 2:04 PM, santoash santo...@me.com wrote:
  
   This is interesting. I'm looking into doing something similar too.
  
   Quick question: Would you be targeting each of the shards with
 exactly
   the
   same set of queries?
  
  
   On Mar 27, 2013, at 12:59 PM, Mark Miller markrmil...@gmail.com
   wrote:
  
   Yup. You only want to warm locally. We should add that to the wiki.
  
   - Mark
  
   On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com
   wrote:
  
   When running in SolrCloud mode, does it make sense to disable
   distributed
   mode for warming queries? i.e. distrib=false in my warming query
   config
  
   I actually asked this on Erik's informative Webinar this morning
 but
   had to
   drop off before I heard the answer ... so Erik might have answered
   this
   already ;-)
  
   My thinking here is that a hard commit gets sent around the
 cluster
   automatically. Say I have 36 nodes (18 leaders and 18 replicas),
 on
   hard
   commit, all 36 nodes will be warming up. If my warming queries are
   distributed, then all nodes are going to be sending the same query
   needlessly around the cluster 36 times - seems unnecessary.
  
   Thoughts?
  
   Cheers,
   Tim
  
  
  
  
  
  
  
   --
   Joel Bernstein
   Professional Services LucidWorks
  
  
  
  
   --
   Joel Bernstein
   Professional Services LucidWorks
 
 




-- 
Joel Bernstein
Professional Services LucidWorks


Re: Query on all dynamic fields or wildcard field query

2013-03-27 Thread Jack Krupansky
No, but you can use the dismax feature of the dismax and edismax query 
parsers to specify a static list of any number of fields to be searched for 
terms in a query that do not have an explicit field specified.


And, no harm filing a Jira to request support for a wildcard field search 
feature.


-- Jack Krupansky

-----Original Message-----
From: Luis Lebolo

Sent: Wednesday, March 27, 2013 5:08 PM
To: solr-user
Subject: Query on all dynamic fields or wildcard field query

Hi All,

First I have to apologize and admit that I'm asking this question before
doing any real research =( Was hoping for some preliminary help before I
start this endeavor tomorrow. So here goes:

Can I query for a value in multiple (wildcarded) fields?

For example, if I have dynamic fields fieldName_someToken (e.g.
fieldName_1, fieldName_2, fieldName_3), can I construct a query like
fieldName_*:someValue?

The query itself doesn't work, but is there a way to query numerous dynamic
fields without explicitly listing them?

Thanks,
Luis 



Re: Solr index Backup and restore of large indexs

2013-03-27 Thread Joel Bernstein
Hi,

Are you running Solr Cloud or Master/Slave? I'm assuming with 1TB a day
you're sharding.

With master/slave you can configure incremental index replication to
another core. The backup core can be local on the server, on a separate
server, or in a separate data center.

With Solr Cloud, replicas can be set up to automatically maintain redundant
copies of the index. These copies, though, are live copies and will handle
queries. Replicating data to a separate data center is typically not done
through Solr Cloud replication.

Joel


On Mon, Mar 25, 2013 at 11:43 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 Try something like this: http://host/solr/replication?command=backup

 See: http://wiki.apache.org/solr/SolrReplication

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Thu, Mar 21, 2013 at 3:23 AM, Sandeep Kumar Anumalla
 sanuma...@etisalat.ae wrote:
 
  Hi,
 
  We load approximately 1TB of index data daily. Please let me know the
 best procedure for backing up and restoring the indexes. I am using Solr
 4.2.
 
 
 
  Thanks & Regards
  Sandeep A
  Ext : 02618-2856
  M : 0502493820
 
 
  




-- 
Joel Bernstein
Professional Services LucidWorks


Solr sorting and relevance

2013-03-27 Thread scallawa
We are using solr for search on our ecommerce site that primarily sells
clothing.  We index search terms based on a title field and description
field.  

We want to be able to sort by relevance and by how much inventory we have
(there is a field for that).  We have done some coding outside of Solr to
try to achieve this, but it causes the following problem.

Let's take jeans and boots as an example.  A customer might search on boots
and Solr returns a bunch of boots and jeans.  The jeans are included because
the description might contain text like "pant legs fit easily over boots".
Now, if we have more inventory of a particular pair of jeans than of the boots
Solr returned, the user will get back a list that shows mostly jeans at the
top, with boots showing up somewhere further down the list.

There isn't a problem with the jeans showing up, but the boots should
be displayed first, ordered by most inventory; the jeans can then
appear somewhere near the bottom of the list.
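
The ordering described above can be sketched as a two-level sort: relevance tier first, inventory second. This is only an illustration of the desired result, not of how Solr would compute it; the "matches_title" flag is an assumed stand-in for relevance, and all names and numbers are made up.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    matches_title: bool  # assumed relevance signal: query term appears in the title
    inventory: int


def rank(hits):
    """Order by relevance tier first, then by inventory within each tier."""
    # False sorts before True, so title matches come first;
    # negating inventory puts the largest stock at the top of each tier.
    return sorted(hits, key=lambda h: (not h.matches_title, -h.inventory))


hits = [
    Hit("slim jeans", False, 500),
    Hit("ankle boots", True, 40),
    Hit("riding boots", True, 120),
]
print([h.name for h in rank(hits)])
# ['riding boots', 'ankle boots', 'slim jeans']
```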

I want to eliminate the hacks that have been done to try to incorporate
inventory, i.e. have Solr return the results in the right order rather than
manipulating them in code.

I hope I have explained the problem enough for you to get the gist of what I
am trying to accomplish.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-27 Thread Erick Erickson
First, three documents isn't enough to really test. The formula for
assigning shards is to hash on the unique ID. It _is_ possible that
all three just happened to land on the same shard. If you index all 32
docs in the example dir and they're all on the same shard, we should
talk.
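
To see why three documents prove little, here is a small sketch. It assumes IDs hash uniformly, and it uses md5 with a modulo purely as a stand-in: Solr's real router uses its own hash function and hash ranges, so the actual shard assignments will differ.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index (md5 modulo is a stand-in, not Solr's hash)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def prob_all_on_one_shard(num_docs: int, num_shards: int) -> float:
    """Chance that uniformly hashed docs all land on the same shard."""
    return num_shards * (1.0 / num_shards) ** num_docs


print(prob_all_on_one_shard(3, 2))   # 0.25
print(prob_all_on_one_shard(32, 2))  # about 4.7e-10

# Tally 32 made-up IDs across two shards.
counts = [0, 0]
for i in range(32):
    counts[shard_for(f"doc-{i}", 2)] += 1
print(counts)
```

With two shards, three documents end up together one time in four, while 32 documents doing so would be a strong sign that something is misconfigured.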

Second, a regular query to the cluster will always search all the
shards. Use distrib=false on the URL to restrict the search to just
the node you fire the request at.

Let us know if you index more docs and still see the problem.

Best
Erick

On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote:
 So - I must be missing something very basic here and I've gone back to the
 Wiki example.  After setting up the two shard example in the first tutorial
 and indexing the three example documents, look at the shards in the Admin
 UI.  The documents are stored in the index where the update was directed -
 they aren't distributed across both shards.

 Release notes state that the compositeId router is the default when using
 the numshards parameter?  I want an even distribution of documents based on
 ID across all shards.  Any suggestions on what I'm screwing up?

 Chris

 On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote:

 I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
 don't specify numShards it goes into a mode where it's up to you to
 distribute updates.

 - Mark

 On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote:

  I have two issues and I'm unsure if they are related:
 
  Problem:  After setting up a multiple collection Solrcloud 4.1 instance
 on
  seven servers, when I index the documents they aren't distributed across
  the index slices.  It feels as though, I don't actually have a cloud
  implementation, yet everything I see in the admin interface and zookeeper
  implies I do.  I feel as if I'm overlooking something obvious, but have not
  been able to figure out what.
 
  Configuration: Seven servers and four collections, each with 12 slices
 (no
  replica shards yet).  Zookeeper configured in a three node ensemble.
  When
  I send documents to Server1/Collection1 (which holds two slices of
  collection1), all the documents show up in a single index shard (core).
  Perhaps related, I have found it impossible to get Solr to recognize the
  server names with anything but a literal host=servername parameter in
 the
  solr.xml.  hostname parameters, host files, network, dns, are all
  configured correctly
 
  I have a Solr 4.0 single collection set up similarly and it works just
  fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
  implementation with only the luceneMatchVersion changed to LUCENE_41.
 
  sample solr.xml from server1
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores" hostPort="8080" host="server1"
           shareSchema="true" zkClientTimeout="6">
      <core collection="col201301" shard="col201301s04"
            instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
            dataDir="/solr/col201301/col201301s04sh01/data"/>
      <core collection="col201301" shard="col201301s11"
            instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
            dataDir="/solr/col201301/col201301s11sh01/data"/>
      <core collection="col201302" shard="col201302s06"
            instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
            dataDir="/solr/col201302/col201302s06sh01/data"/>
      <core collection="col201303" shard="col201303s01"
            instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
            dataDir="/solr/col201303/col201303s01sh01/data"/>
      <core collection="col201303" shard="col201303s08"
            instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
            dataDir="/solr/col201303/col201303s08sh01/data"/>
      <core collection="col201304" shard="col201304s03"
            instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
            dataDir="/solr/col201304/col201304s03sh01/data"/>
      <core collection="col201304" shard="col201304s10"
            instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
            dataDir="/solr/col201304/col201304s10sh01/data"/>
    </cores>
  </solr>
 
  Thanks
  Chris




Re: Warming queries and Solr Cloud - just curious ...

2013-03-27 Thread Erick Erickson
Tim:

Unfortunately, due to the increase in spam pages from bots, we had to
lock down the Solr wiki. Post a request for us to add your Wiki ID
(and give us the ID!) to the list of authorized IDs and we'll get you
added (just takes a second). Or send me (or Steve Rowe) a private
e-mail if you'd prefer.

Best
Erick

On Wed, Mar 27, 2013 at 4:03 PM, Timothy Potter thelabd...@gmail.com wrote:
 Ok - thanks for confirming Mark - I'll add that to the wiki.

 Cheers,
 Tim

 On Wed, Mar 27, 2013 at 1:59 PM, Mark Miller markrmil...@gmail.com wrote:

 Yup. You only want to warm locally. We should add that to the wiki.

 - Mark

 On Mar 27, 2013, at 3:54 PM, Timothy Potter thelabd...@gmail.com wrote:

  When running in SolrCloud mode, does it make sense to disable distributed
  mode for warming queries? i.e. distrib=false in my warming query config
 
  I actually asked this on Erik's informative Webinar this morning but had
 to
  drop off before I heard the answer ... so Erik might have answered this
  already ;-)
 
  My thinking here is that a hard commit gets sent around the cluster
  automatically. Say I have 36 nodes (18 leaders and 18 replicas), on hard
  commit, all 36 nodes will be warming up. If my warming queries are
  distributed, then all nodes are going to be sending the same query
  needlessly around the cluster 36 times - seems unnecessary.
 
  Thoughts?
 
  Cheers,
  Tim




Re: Solrcloud 4.1 Collection with multiple slices only use

2013-03-27 Thread corgone
I realized my error shortly afterward: more docs, better spread.  I continued to do some 
testing to see how I could manually lay out the shards in what I thought was a 
more organized manner and with more descriptive names than the numShards 
parameter alone produced.  I also generated a few thousand docs and a schema to 
test with.

Appreciate the help.



----- Reply message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Subject: Solrcloud 4.1 Collection with multiple slices only use
Date: Wed, Mar 27, 2013 9:30 pm


First, three documents isn't enough to really test. The formula for
assigning shards is to hash on the unique ID. It _is_ possible that
all three just happened to land on the same shard. If you index all 32
docs in the example dir and they're all on the same shard, we should
talk.

Second, a regular query to the cluster will always search all the
shards. Use distrib=false on the URL to restrict the search to just
the node you fire the request at.

Let us know if you index more docs and still see the problem.

Best
Erick

On Wed, Mar 27, 2013 at 9:39 AM, Chris R corg...@gmail.com wrote:
 So - I must be missing something very basic here and I've gone back to the
 Wiki example.  After setting up the two shard example in the first tutorial
 and indexing the three example documents, look at the shards in the Admin
 UI.  The documents are stored in the index where the update was directed -
 they aren't distributed across both shards.

 Release notes state that the compositeId router is the default when using
 the numshards parameter?  I want an even distribution of documents based on
 ID across all shards.  Any suggestions on what I'm screwing up?

 Chris

 On Mon, Mar 25, 2013 at 11:34 PM, Mark Miller markrmil...@gmail.com wrote:

 I'm guessing you didn't specify numShards. Things changed in 4.1 - if you
 don't specify numShards it goes into a mode where it's up to you to
 distribute updates.

 - Mark

 On Mar 25, 2013, at 10:29 PM, Chris R corg...@gmail.com wrote:

  I have two issues and I'm unsure if they are related:
 
  Problem:  After setting up a multiple collection Solrcloud 4.1 instance
 on
  seven servers, when I index the documents they aren't distributed across
  the index slices.  It feels as though, I don't actually have a cloud
  implementation, yet everything I see in the admin interface and zookeeper
  implies I do.  I feel as if I'm overlooking something obvious, but have not
  been able to figure out what.
 
  Configuration: Seven servers and four collections, each with 12 slices
 (no
  replica shards yet).  Zookeeper configured in a three node ensemble.
  When
  I send documents to Server1/Collection1 (which holds two slices of
  collection1), all the documents show up in a single index shard (core).
  Perhaps related, I have found it impossible to get Solr to recognize the
  server names with anything but a literal host=servername parameter in
 the
  solr.xml.  hostname parameters, host files, network, dns, are all
  configured correctly
 
  I have a Solr 4.0 single collection set up similarly and it works just
  fine.  I'm using the same schema.xml and solrconfig.xml files on the 4.1
  implementation with only the luceneMatchVersion changed to LUCENE_41.
 
  sample solr.xml from server1
 
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true">
    <cores adminPath="/admin/cores" hostPort="8080" host="server1"
           shareSchema="true" zkClientTimeout="6">
      <core collection="col201301" shard="col201301s04"
            instanceDir="/solr/col201301/col201301s04sh01" name="col201301s04sh01"
            dataDir="/solr/col201301/col201301s04sh01/data"/>
      <core collection="col201301" shard="col201301s11"
            instanceDir="/solr/col201301/col201301s11sh01" name="col201301s11sh01"
            dataDir="/solr/col201301/col201301s11sh01/data"/>
      <core collection="col201302" shard="col201302s06"
            instanceDir="/solr/col201302/col201302s06sh01" name="col201302s06sh01"
            dataDir="/solr/col201302/col201302s06sh01/data"/>
      <core collection="col201303" shard="col201303s01"
            instanceDir="/solr/col201303/col201303s01sh01" name="col201303s01sh01"
            dataDir="/solr/col201303/col201303s01sh01/data"/>
      <core collection="col201303" shard="col201303s08"
            instanceDir="/solr/col201303/col201303s08sh01" name="col201303s08sh01"
            dataDir="/solr/col201303/col201303s08sh01/data"/>
      <core collection="col201304" shard="col201304s03"
            instanceDir="/solr/col201304/col201304s03sh01" name="col201304s03sh01"
            dataDir="/solr/col201304/col201304s03sh01/data"/>
      <core collection="col201304" shard="col201304s10"
            instanceDir="/solr/col201304/col201304s10sh01" name="col201304s10sh01"
            dataDir="/solr/col201304/col201304s10sh01/data"/>
    </cores>
  </solr>
 
  Thanks
  Chris




Re: Querying a transitive closure?

2013-03-27 Thread Otis Gospodnetic
Hi Jack,

I don't fully understand the exact taxonomy structure and your needs,
but in terms of reducing the number of HTTP round trips, you can do it
by writing a custom SearchComponent that, upon getting the initial
request, does everything locally, meaning that it talks to the
local/specified shard before returning to the caller.  In a SolrCloud
setup with N shards, each of these N shards could be queried in such a
way in parallel, running the query or queries on their local shards.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:
 Hi Otis,

 I fully expect to grow to SolrCloud -- many shards. For now, it's
 solo. But, my thinking relates to cloud. I look for ways to reduce the
 number of HTTP round trips through SolrJ. Maybe you have some ideas?

 Thanks
 Jack

 On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
 otis.gospodne...@gmail.com wrote:
 Hi Jack,

 Is this really about HTTP and Solr vs. SolrCloud or more whether
 Solr(Cloud) is the right tool for the job and if so how to structure
 the schema and queries to make such lookups efficient?

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:
 This is a question about isA?

 We want to know if M isA B   isA?(M,B)

 For some M, one might be able to look into M to see its type or the
 class(es) of which it is a subclass. We're talking taxonomic queries
 now.
 But, for some M, one might need to ripple up the transitive closure,
 looking at all the super classes, etc, recursively.

 It seems unreasonable to do that over HTTP; it seems more reasonable
 to grab a core and write a custom isA query handler. But, how do you
 do that in a SolrCloud?

 Really curious...

 Many thanks in advance for ideas.
 Jack


Re: Querying a transitive closure?

2013-03-27 Thread Jack Park
Hi Otis,
That's essentially the answer I was looking for: each shard (are we
talking master + replicas?) has the plug-in custom query handler.  I
need to build it to find out.

What I mean is that there is a taxonomy, say one with a single root
for the sake of illustration, from which all the classes, subclasses, and
instances grow. If I have an object that is somewhere in that taxonomy,
then it has a zigzag chain of parents up that tree (I've seen that
called a transitive closure). If class B is way up that tree from M,
there is no telling how many queries it will take to find it.  Hmmm...
recursive ascent, I suppose.
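
A minimal sketch of that ascent, written iteratively and outside Solr. The `parents_of` callable is a stand-in for whatever lookup returns an object's direct parents (one query per call if done against an index), and the small taxonomy is invented purely for illustration. Tracking visited nodes keeps the walk finite even if two branches share ancestors.

```python
from collections import deque


def is_a(node, target, parents_of):
    """Return True if target is node itself or any ancestor of node."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current == target:
            return True
        # Each call here would be one lookup against the index.
        for parent in parents_of(current):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Hypothetical taxonomy: each key maps to its direct parents.
TAXONOMY = {
    "M": ["L"],
    "L": ["K", "B"],
    "K": ["Root"],
    "B": ["Root"],
    "Root": [],
}


def lookup(node):
    return TAXONOMY.get(node, [])


print(is_a("M", "B", lookup))  # True: M -> L -> B
print(is_a("B", "M", lookup))  # False: M is below B, not above it
```

Run from a client, every `parents_of` call is an HTTP round trip, which is the cost of concern here; run inside the node that holds the data, the same loop touches only the local index.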

Many thanks
Jack

On Wed, Mar 27, 2013 at 6:52 PM, Otis Gospodnetic
otis.gospodne...@gmail.com wrote:
 Hi Jack,

 I don't fully understand the exact taxonomy structure and your needs,
 but in terms of reducing the number of HTTP round trips, you can do it
 by writing a custom SearchComponent that, upon getting the initial
 request, does everything locally, meaning that it talks to the
 local/specified shard before returning to the caller.  In a SolrCloud
 setup with N shards, each of these N shards could be queried in such a
 way in parallel, running the query or queries on their local shards.

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Mar 27, 2013 at 3:11 PM, Jack Park jackp...@topicquests.org wrote:
 Hi Otis,

 I fully expect to grow to SolrCloud -- many shards. For now, it's
 solo. But, my thinking relates to cloud. I look for ways to reduce the
 number of HTTP round trips through SolrJ. Maybe you have some ideas?

 Thanks
 Jack

 On Wed, Mar 27, 2013 at 10:04 AM, Otis Gospodnetic
 otis.gospodne...@gmail.com wrote:
 Hi Jack,

 Is this really about HTTP and Solr vs. SolrCloud or more whether
 Solr(Cloud) is the right tool for the job and if so how to structure
 the schema and queries to make such lookups efficient?

 Otis
 --
 Solr & ElasticSearch Support
 http://sematext.com/





 On Wed, Mar 27, 2013 at 12:53 PM, Jack Park jackp...@topicquests.org wrote:
 This is a question about isA?

 We want to know if M isA B   isA?(M,B)

 For some M, one might be able to look into M to see its type or the
 class(es) of which it is a subclass. We're talking taxonomic queries
 now.
 But, for some M, one might need to ripple up the transitive closure,
 looking at all the super classes, etc, recursively.

 It seems unreasonable to do that over HTTP; it seems more reasonable
 to grab a core and write a custom isA query handler. But, how do you
 do that in a SolrCloud?

 Really curious...

 Many thanks in advance for ideas.
 Jack


Re: Solr sorting and relevance

2013-03-27 Thread Joel Bernstein
It sounds like you might be able to get the mix you want with three
different boosts:

1) High boost on title
2) Lower boost on description
3) Function query boost on inventory

The high boost on title will help push products with matches in the title
to the top. The function query boost on inventory will help move higher
inventory to the top. You can also use the QueryElevationComponent to move
specific docs to the top for specific queries but this might not be
effective for your use case.

There is also a patch (SOLR-4465) which is experimental at this point but
is designed for people to move custom sort algorithms into Solr through
custom collectors. This is an advanced approach and would take a strong
understanding of Lucene collectors.
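
As a rough sketch of the three boosts, the request parameters for the edismax query parser might be assembled as below. The field names `title`, `description`, and `inventory`, the weights 10 and 2, and the exact shape of the inventory function are all assumptions for illustration; they would need tuning against real data.

```python
from urllib.parse import urlencode


def build_search_params(user_query, title_boost=10, description_boost=2):
    """Assemble edismax parameters: weighted fields plus an inventory boost."""
    return {
        "defType": "edismax",
        "q": user_query,
        # 1) high boost on title, 2) lower boost on description
        "qf": f"title^{title_boost} description^{description_boost}",
        # 3) multiplicative function boost on inventory; the outer sum(1, ...)
        #    keeps zero-inventory items from being multiplied down to zero
        "boost": "sum(1,log(sum(inventory,1)))",
    }


params = build_search_params("boots")
print(urlencode(params))
```

Because the inventory term is logarithmic, a large stock of jeans nudges their score rather than overwhelming a title match on boots, which is the mix described above.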




On Wed, Mar 27, 2013 at 9:02 PM, scallawa dami...@altrec.com wrote:

 We are using Solr for search on our ecommerce site that primarily sells
 clothing.  We index search terms based on a title field and a description
 field.

 We want to be able to sort by relevance and by how much inventory we have
 (there is a field for that).  We have done some coding outside of Solr to
 try to achieve this, but it causes the following problem.

 Let's take jeans and boots as an example.  A customer might search on boots
 and Solr returns a bunch of boots and jeans.  The jeans are included because
 the description might contain some text like "pant legs fits easily over
 boots".  Now if we have more inventory of the particular jeans than of the
 boots Solr returned, the user will get back a list that shows mostly jeans
 at the top, and then somewhere down the list boots will show up.

 There isn't a problem with the jeans showing up, but the boots should
 actually be displayed first, ordered by inventory, and then the jeans can
 be somewhere at the bottom of the list.

 I want to eliminate the hacks that have been done to try to incorporate
 inventory, i.e. have Solr return the results and not manipulate them in code.

 I hope I have explained the problem well enough for you to get the gist of
 what I am trying to accomplish.





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-sorting-and-relevance-tp4051918.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Joel Bernstein
Professional Services LucidWorks


Re: Could not load config for solrconfig.xml

2013-03-27 Thread A. Lotfi


 
Hi Hoss,

Thank you for replying to my question.

The solrconfig.xml in the example-DIH directory of the Solr download is exactly
the same as the one at the links you posted in your reply, so where is the big
difference?

I think I made a mistake in my last question: instead of saying
db-data-config.xml, I said solrconfig.xml.

But I still do not understand where that exception comes from.
Your help will be appreciated.

Abdel.


 From: Chris Hostetter hossman_luc...@fucit.org
To: gene...@lucene.apache.org gene...@lucene.apache.org; A. Lotfi 
majidna...@yahoo.com 
Sent: Wednesday, March 27, 2013 6:00 PM
Subject: Re: Could not load config for solrconfig.xml
 

1) the email list you want to be using is solr-user@lucene, not 
general@lucene

2) there is a big difference between solrconfig.xml (which controls in 
general how solr works for managing a SolrCore); and the config files 
for DIH (which can be used to tell Solr where/how to fetch data to index) 
typically called data-config.xml (but you can name them anything you 
want).

what you have described below is a data config file for DIH, if you are 
trying to use it as a solrconfig.xml file you aren't going to get very 
far.
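
A quick way to tell the two kinds of file apart is the root element: a solrconfig.xml starts with `<config>`, while a DIH config starts with `<dataConfig>`. The helper below is only an illustrative sketch (the function name and return strings are made up), not part of Solr.

```python
import xml.etree.ElementTree as ET


def config_kind(xml_text):
    """Classify a Solr-related XML file by its root element."""
    root = ET.fromstring(xml_text).tag
    if root == "config":
        return "solrconfig.xml (core configuration)"
    if root == "dataConfig":
        return "DIH data config"
    return f"unknown (root element <{root}>)"


print(config_kind("<dataConfig><document/></dataConfig>"))
print(config_kind("<config><luceneMatchVersion>LUCENE_41</luceneMatchVersion></config>"))
```

A file that reports as a DIH data config belongs next to solrconfig.xml in the core's conf directory, referenced from it, rather than in its place.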

I suggest you take a gander at the example config set for using DIH with a 
database...

https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/
https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/
https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/solrconfig.xml?view=markup
https://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/example/example-DIH/solr/db/conf/db-data-config.xml?view=log

...and keep them in mind while reviewing the DIH docs...

http://wiki.apache.org/solr/DataImportHandler



-Hoss

Could not load config for solrconfig.xml

2013-03-27 Thread A. Lotfi
Hi,
I am trying Solr with an Oracle database. It's working, but an exception
appears at the top of the page:

SolrCore Initialization Failures 
solr: 
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Could not load config for solrconfig.xml


Here is my db-data-config.xml :

<dataConfig>
    <dataSource driver="oracle.jdbc.OracleDriver"
            url="jdbc:oracle:thin:@ourIPaddress:1521:ourDB"
            user="username"
            password="password"/>
    <document>
        <entity name="residential"
                query="select * from tsunami.consumer_data_01 where state='MA'"
                deltaQuery="select LEMSMATCHCODE, STREETNAME from residential where last_modified &gt; '${dataimporter.last_index_time}'">
            <field column="LEMSMATCHCODE" name="lemsmatchcode" />
            <field column="STREETNAME" name="streetname" />
        </entity>
    </document>
</dataConfig>

Thanks, your help is appreciated.

Re: Too many fields to Sort in Solr

2013-03-27 Thread adityab
Hi Joel,
You are correct, the boost function populates the field cache. I am not
familiar with docValues, so while trying the example you provided I see this
error when I define the field type:

Caused by: org.apache.solr.common.SolrException: FieldType 'dvLong' is
configured with a docValues format, but the codec does not support it: class
org.apache.solr.core.SolrCore$3
at org.apache.solr.core.SolrCore.initCodec(SolrCore.java:854)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:719)
... 13 more

My field definition:
<fieldType name="dvLong" class="solr.TrieLongField" precisionStep="0"
positionIncrementGap="0" docValuesFormat="Disk"/>

What am I missing here?

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Too-many-fields-to-Sort-in-Solr-tp4049139p4051960.html