RE: Solr hardware memory question

2013-12-12 Thread Hoggarth, Gil
Thanks for this - I haven't any previous experience with utilising SSDs in the
way you suggest, so I guess I need to start learning! And thanks for the
Danish-webscale URL, which looks like very informed reading. (Yes, I think
we're working in similar industries with similar constraints and expectations.)

Compiling my answers into one email: "Curious how many documents per shard
you were planning? The number of documents per shard and field type will drive
the amount of RAM needed to sort and facet."
- Number of documents per shard: I think about 200 million. That's a bit of a
rough estimate based on other Solrs we run, though, which I think means we hold
a lot of data for each document, although I keep arguing to keep this to the
truly required minimum. We also have many facets, some of which are pretty
large (I'm stretching my understanding here, but I think most documents have
many 'entries' in many facets, so these really hit us performance-wise).

I try to keep a 1-to-1 ratio of Solr nodes to CPUs, with a few spare for the
operating system, and I utilise MMapDirectory to manage memory via the OS. So
at this moment I'm guessing that we'll have 56 CPUs dedicated to Solr across 2
physical 32-CPU servers and _hopefully_ 256GB RAM on each. This would give 28
shards, each with 5GB of Java heap (in Tomcat), leaving 126GB on each server
for the OS and MMap. (I believe the Solr theory for this doesn't accurately
work out, but we can accept the edge cases where this will fail.)
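
For reference, a rough sketch of how the per-node heap would be set, one
Tomcat instance per Solr node (this assumes our init scripts source these
files; the exact figures are still to be confirmed by profiling):

# /etc/sysconfig/solrnodeN - one file per Tomcat/Solr node
JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx5120m"
# The rest of the physical RAM is deliberately left unallocated so the OS
# page cache can serve the MMapDirectory-mapped index files.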

I can also see that our hardware requirements will depend on usage as well as
on the volume of data, and I've been pondering how best we can structure our
index/es to support a long-term service (which means that, given it's a lot
of data, I need to structure the data so that new usage doesn't require
re-indexing). But at this early stage, as people say, we need to prototype,
test, profile etc., and to do that I need the hardware to run the trials
(policy dictates that I buy the production hardware now, before profiling - I
get to control much of the design and construction, so I don't argue with this!)

Thanks for all the comments everyone, all very much appreciated :)
Gil


-----Original Message-----
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: 11 December 2013 12:02
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On Tue, 2013-12-10 at 17:51 +0100, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset 
> of ~60TB, which for our data and schema typically gives a Solr index 
> size of 1/10th - i.e., 6TB. Given the general rule that the amount of
> hardware memory should exceed the size of the Solr index (exceed it, to
> also allow for the operating system etc.), how have people handled this
> situation?

By acknowledging that it is cheaper to buy SSDs instead of trying to compensate 
for slow spinning drives with excessive amounts of RAM. 

Our plan for an estimated 20TB of indexes out of 372TB of raw web data is to 
use SSDs controlled by a single machine with 512GB of RAM (or was it 256GB? 
I'll have to ask the hardware guys):
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

As always YMMV, and the numbers you quote elsewhere indicate that your queries
are quite complex. You might want to do a bit of profiling to see if they are
heavy enough to make the CPU the bottleneck.

Regards,
Toke Eskildsen, State and University Library, Denmark




RE: Solr hardware memory question

2013-12-10 Thread Hoggarth, Gil
Thanks Shawn. You're absolutely right about the performance balance,
though it's good to hear it from an experienced source (if you don't
mind me calling you that!) Fortunately we don't have a top performance
requirement, and we have a small audience so a low query volume. On
similar systems we're "managing" to just provide a Solr service with a
3TB index size on 160GB RAM, though we have scripts to handle the
occasionally necessary service restart when someone submits a more
exotic query. This, btw, gives a response time of ~45-90 seconds for
uncached queries. My question I suppose comes from my hope that we can
do away with the restart scripts as I doubt they help the Solr service
(as they can, if necessary, just kill processes and restart), and get to
response times < 20 seconds.

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 10 December 2013 17:37
To: solr-user@lucene.apache.org
Subject: Re: Solr hardware memory question

On 12/10/2013 9:51 AM, Hoggarth, Gil wrote:
> We're probably going to be building a Solr service to handle a dataset
> of ~60TB, which for our data and schema typically gives a Solr index
> size of 1/10th - i.e., 6TB. Given the general rule that the amount of
> hardware memory should exceed the size of the Solr index (exceed it, to
> also allow for the operating system etc.), how have people handled this
> situation? Do I really need, for example, 12 servers with 512GB RAM, or
> are there other techniques for handling this?

That really depends on what kind of query volume you'll have and what
kind of performance you want.  If your query volume is low and you can
deal with slow individual queries, then you won't need that much memory.
If either of those requirements increases, you'd probably need more
memory, up to the 6TB total -- or 12TB if you need to double the total
index size for redundancy purposes.  If your index is constantly growing
like most are, you need to plan for that too.

Putting the entire index into RAM is required for *top* performance, but
not for base functionality.  It might be possible to put only a fraction
of your index into RAM.  Only testing can determine what you really need
to obtain the performance you're after.

Perhaps you've already done this, but you should try as much as possible
to reduce your index size.  Store as few fields as possible, only just
enough to build a search result list/grid and retrieve the full document
from the canonical data store.  Save termvectors and docvalues on as few
fields as possible.  If you can, reduce the number of terms produced by
your analysis chains.
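
As a rough illustration in schema.xml terms (the field names are made up and
the types are from the stock example schema; the attributes are what matter):

<!-- hypothetical fields: keep stored/termVectors off unless actually needed -->
<field name="content"    type="text_general" indexed="true" stored="false"
       termVectors="false"/>
<field name="doc_id"     type="string"       indexed="true" stored="true"/>
<!-- docValues only where sorting/faceting really needs it -->
<field name="crawl_date" type="tdate"        indexed="true" stored="false"
       docValues="true"/>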

Thanks,
Shawn



Solr hardware memory question

2013-12-10 Thread Hoggarth, Gil
We're probably going to be building a Solr service to handle a dataset
of ~60TB, which for our data and schema typically gives a Solr index
size of 1/10th - i.e., 6TB. Given the general rule that the amount of
hardware memory should exceed the size of the Solr index (exceed it, to
also allow for the operating system etc.), how have people handled this
situation? Do I really need, for example, 12 servers with 512GB RAM, or
are there other techniques for handling this?

 

Many thanks in advance for any general/conceptual/specific
ideas/comments/answers!

Gil

 

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ



RE: How to work with remote solr safely?

2013-11-22 Thread Hoggarth, Gil
You could also use one of the proxy scripts, such as
http://code.google.com/p/solr-php-client/, which is coincidentally
linked (eventually) from Michael's suggested SolrSecurity URL.

-----Original Message-----
From: michael.boom [mailto:my_sky...@yahoo.com] 
Sent: 22 November 2013 14:53
To: solr-user@lucene.apache.org
Subject: Re: How to work with remote solr safely?

http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication

Maybe you could achieve write/read access limitation by setting up path-based
authentication:
The update handler "/solr/core/update"  should be protected by
authentication, with credentials only known to you. But then of course,
your indexing client will need to authenticate in order to add docs to
solr.
Your select handler "/solr/core/select" could then be open or protected
by http auth with credentials open to developers.

That's the first idea that comes to mind - haven't tested it. 
If you do, feedback and let us know how it went.
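
A rough sketch of how that might look in the container's web.xml (untested;
"core", the role name and the realm are placeholders):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>solr-update</web-resource-name>
    <url-pattern>/core/update</url-pattern>  <!-- protect writes -->
  </web-resource-collection>
  <auth-constraint>
    <role-name>indexer</role-name>  <!-- credentials only the owner holds -->
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>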



-
Thanks,
Michael
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-work-with-remote-solr-savely-tp4102612p4102618.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: How to work with remote solr safely?

2013-11-22 Thread Hoggarth, Gil
We solved this issue outside of Solr. As you've done, restrict Solr on the
server to localhost access, add firewall rules to allow your developers on
port 80, and ProxyPass the allowed port 80 traffic through to Solr.
Remember to include the ProxyPassReverse too.
(This runs on Linux and Apache httpd btw; a rough sketch is below.)
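
A rough sketch of the httpd 2.2-style config (hostnames, addresses, port and
paths are illustrative; the developer restriction could equally stay in the
firewall):

# Solr itself listens on localhost:8080 only; httpd fronts it on port 80
<Location /solr>
    Order deny,allow
    Deny from all
    Allow from 192.168.0.0/16    # the developers' addresses
</Location>
ProxyPass        /solr http://localhost:8080/solr
ProxyPassReverse /solr http://localhost:8080/solr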

-----Original Message-----
From: Stavros Delisavas [mailto:stav...@delisavas.de] 
Sent: 22 November 2013 14:24
To: solr-user@lucene.apache.org
Subject: How to work with remote solr safely?

Hello Solr-Friends,
I have a question about working with solr which is installed on a remote
server.
I have a php-project with a very big mysql-database of about 10gb and I
am also using solr for about 10,000,000 entries indexed for fast search
and access of the mysql-data.
I have a local copy myself so I can continue to work on the php-project
itself, but I want to make it available for more developers too. How can
I make solr accessible ONLY to those exclusive developers? For mysql
it's no problem to add an additional mysql-user with limited access.

But for Solr it seems difficult to me. I have had my administrator
restrict the java-port 8080 to localhost only. That way no one outside
can access solr or the solr-admin interface.
How can I allow access to other developers without making the whole
solr-interface (port 8080) available to the public?

Thanks,

Stavros


RE: Why do people want to deploy to Tomcat?

2013-11-12 Thread Hoggarth, Gil
For me, a side-effect of 'example' is that it implies it's just that: not
appropriate for production. But there's also the organisational factor beyond
Solr that is about staff expertise - we don't have any systems that utilise
jetty, so we're unfamiliar with its configuration, issues, or oddities. Tomcat
is our de facto container, so it makes sense for us to implement Solr within
Tomcat.

If we ruled out these reasons, I'd still be looking for a container that:
- was a standalone installation (i.e., outside of Solr tarball) so that it 
would be "managed" via yum (we run on RHEL). This separates any issues of Solr 
from issues of jetty, which given a current lack of jetty knowledge would be a 
helpful thing.
- the container service could be managed via standard SysV startup processes. 
To be fair, I've implemented our own for Tomcat and could do this for jetty, 
but I'd prefer jetty included this (which would suggest it is more prepared for 
enterprise use).
- Likewise, I assume all of jetty's configuration can be reset to use normal 
RHEL /etc/ and /var/ directories, but I'd prefer that jetty did this for me (to
demonstrate again its enterprise-ready status).

Yes, I could do all the necessary bespoke configuration so that jetty meets
the above requirements, but because I'd have to, I question whether it's ready
for our enterprise setup (which mainly means that our Operations team will
fight against unusual configurations).

Having added all of this, I have to admit that I like the idea of using jetty
because you guys tell me that Solr is effectively pre-configured for jetty. But
then I'd want to know what in particular these jetty configurations were!

BTW Very pleased that this is being discussed - the views can help me argue our 
case to use jetty if it is indeed more beneficial to do so.

Gil

-----Original Message-----
From: Sebastián Ramírez [mailto:sebastian.rami...@senseta.com] 
Sent: 12 November 2013 13:38
To: solr-user@lucene.apache.org
Subject: Re: Why do people want to deploy to Tomcat?

I agree with Doug; when I started I had to spend some time figuring out what
was just an "example" and what I would have to change in a "production"
environment... until I found that all of the "example" was ready for production.

Of course, you commonly have to change the settings, parameters, fields, etc. 
of your Solr system, but the "example" doesn't have anything that is not for 
production.


Sebastián Ramírez


On Tue, Nov 12, 2013 at 8:18 AM, Amit Aggarwal wrote:

> Agreed with Doug
> On 12-Nov-2013 6:46 PM, "Doug Turnbull" < 
> dturnb...@opensourceconnections.com>
> wrote:
>
> > As an aside, I think one reason people feel compelled to deviate 
> > from the distributed jetty distribution is because the folder is named 
> > "example".
> > I've had to explain to a few clients that this is a bit of a misnomer.
> The
> > IT dept especially sees "example" and feels uncomfortable using that 
> > as a starting point for a jetty install. I wish it was called 
> > "default" or
> "bin"
> > or something where its more obviously the default jetty distribution 
> > of Solr.
> >
> >
> > On Tue, Nov 12, 2013 at 7:06 AM, Roland Everaert 
> >  > >wrote:
> >
> > > In my case, the first time I had to deploy and configure solr on 
> > > tomcat (and jboss) it was a requirement to reuse as much as 
> > > possible the application/web server already in place. The next 
> > > deployment I also use tomcat, because I was used to deploy on 
> > > tomcat and I don't know jetty
> at
> > > all.
> > >
> > > I could ask the same question with regard to jetty. Why 
> > > use/bundle(/ if
> > not
> > > recommend) jetty with solr over other webserver solutions?
> > >
> > > Regards,
> > >
> > >
> > > Roland Everaert.
> > >
> > >
> > >
> > > On Tue, Nov 12, 2013 at 12:33 PM, Alvaro Cabrerizo 
> > >  > > >wrote:
> > >
> > > > In my case, the selection of the servlet container has never 
> > > > been a
> > hard
> > > > requirement. I mean, some customers provide us a virtual machine
> > > configured
> > > > with java/tomcat , others have a tomcat installed and want to 
> > > > share
> it
> > > with
> > > > solr, others prefer jetty because their sysadmins are used to
> configure
> > > > it...  At least in the projects I've been working in, the 
> > > > selection
> of
> > > the
> > > > servlet engine has not been a key factor in the project success.
> > > >
> > > > Regards.
> > > >
> > > >
> > > > On Tue, Nov 12, 2013 at 12:11 PM, Andre Bois-Crettez
> > > > wrote:
> > > >
> > > > > We are using Solr running on Tomcat.
> > > > >
> > > > > I think the top reasons for us are :
> > > > >  - we already have nagios monitoring plugins for tomcat that 
> > > > > trace queries ok/error, http codes / response time etc in 
> > > > > access logs,
> > number
> > > > > of threads, jvm memory usage etc
> > > > >  - start, stop, watchdogs, logs : we also use our standard 
> > > > > tools
> for
> > > that
> > > 

RE: How to cancel a collection 'optimize'?

2013-11-11 Thread Hoggarth, Gil
Hi Otis, thanks for the response. I could stop the whole Solr service as,
as yet, there's no audience access to it, but might it be left in an
incomplete state and thus try to complete the optimisation when the service
is restarted?

[Yes, we did speak in Dublin - you can see we need that monitoring
service! Must set up the demo version, asap!]

-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: 11 November 2013 16:02
To: solr-user@lucene.apache.org
Subject: Re: How to cancel a collection 'optimize'?

Hi Gil,
(we spoke in Dublin, didn't we?)

Short of stopping Solr I have a feeling there isn't much you can do...
hmm. Or, I wonder if you could somehow get a thread dump, get the PID
of the thread (since I believe threads in Linux are run as processes),
and then kill that thread... It feels scary and I'm not sure what this
might do to the index, but maybe somebody else can jump in and comment
on this approach or suggest a better one.
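
For the thread-dump part, something like the JDK's jstack should do it
(<solr_pid> is the Tomcat/Solr process id; the output path is just an
example):

jstack <solr_pid> > /tmp/solr-threads.txt

Matching the dump's native thread ids (the hex nid values) to Linux thread
PIDs, and then killing one, is the scary bit.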

Otis
--
Performance Monitoring * Log Analytics * Search Analytics Solr &
Elasticsearch Support * http://sematext.com/


On Mon, Nov 11, 2013 at 10:44 AM, Hoggarth, Gil 
wrote:
> We have an internal Solr collection with ~1 billion documents. It's 
> split across 24 shards and uses ~3.2TB of disk space. Unfortunately 
> we've triggered an 'optimize' on the collection (via a restarted 
> browser tab), which has raised the disk usage to 4.6TB, with 130GB 
> left on the disk volume.
>
>
>
> As I fully expect Solr to use up all of the disk space as the 
> collection is more than 50% of the disk volume, how can I cancel this 
> optimize? And separately, if I were to reissue with maxSegments=(high 
> number, eg 40), should I still expect the same disk usage? (I'm 
> presuming so as doesn't it need to gather the whole index to determine
> which docs should go into which segments?)
>
>
>
> Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard.
>
>
>
> (Great conference last week btw - so much to learn!)
>
>
>
>
>
> Gil Hoggarth
>
> Web Archiving Technical Services Engineer
>
> The British Library, Boston Spa, West Yorkshire, LS23 7BQ
>
> Tel: 01937 546163
>
>
>


How to cancel a collection 'optimize'?

2013-11-11 Thread Hoggarth, Gil
We have an internal Solr collection with ~1 billion documents. It's
split across 24 shards and uses ~3.2TB of disk space. Unfortunately
we've triggered an 'optimize' on the collection (via a restarted browser
tab), which has raised the disk usage to 4.6TB, with 130GB left on the
disk volume.

 

As I fully expect Solr to use up all of the disk space as the collection
is more than 50% of the disk volume, how can I cancel this optimize? And
separately, if I were to reissue with maxSegments=(high number, eg 40),
should I still expect the same disk usage? (I'm presuming so as doesn't
it need to gather the whole index to determine which docs should go into
which segments?)
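
For reference, the sort of request I'd be reissuing (host and collection name
are illustrative):

curl 'http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=40'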

 

Solr 4.4 on RHEL6.4, 160GB RAM, 5GB per shard.

 

(Great conference last week btw - so much to learn!)

 

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

Tel: 01937 546163

 



RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
I think my question is easier, because I think the problem below was
caused by the very first startup of the 'ldwa01' collection/'ldwa01cfg'
zk collection name not specifying the number of shards (and thus
defaulting to 1).

So, how can I change the number of shards for an existing collection/zk
collection name, especially when the ZK ensemble in question is the
production one and supports other Solr collections that I do not
want to interrupt? (Which I think means that I can't just delete the
clusterstate.json and restart the ZKs, as this would also lose the other
Solr collections' information.)
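
For reference, I can at least inspect what the production ensemble currently
has recorded (a sketch using the zkCli.sh bundled with ZooKeeper; the install
path is illustrative):

/opt/zookeeper/bin/zkCli.sh -server zk01.solr.wa.bl.uk:9983 get /clusterstate.json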

Thanks in advance, Gil

-----Original Message-
From: Hoggarth, Gil [mailto:gil.hogga...@bl.uk] 
Sent: 24 October 2013 10:13
To: solr-user@lucene.apache.org
Subject: RE: New shard leaders or existing shard replicas depends on
zookeeper?

Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- DnumShards=24 seen on each 'ps' line (two nodes listed here):
" tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start"

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil 
wrote:

> Hi solr-users,
>
>
>
> I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
> shed some light on what's happening/how I can correct it.
>
>
>
> We have two physical servers running automated builds of RedHat 6.4 
> and Solr 4.4.0 that host two separate Solr services. The first server 
> (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
> the second server (ld02) also has 24 shards and hosts a different 
> collection called 'ldwa01'. It's evidently important to note that 
> previou

RE: New shard leaders or existing shard replicas depends on zookeeper?

2013-10-24 Thread Hoggarth, Gil
Absolutely, the scenario I'm seeing does _sound_ like I've not specified
the number of shards, but I think I have - the evidence is:
- DnumShards=24 defined within the /etc/sysconfig/solrnode* files

- DnumShards=24 seen on each 'ps' line (two nodes listed here):
" tomcat   26135 1  5 09:51 ?00:00:22 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode1/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode1 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode1/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode1/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode1
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode1/tmp
org.apache.catalina.startup.Bootstrap start
tomcat   26225 1  5 09:51 ?00:00:19 /opt/java/bin/java
-Djava.util.logging.config.file=/opt/tomcat_instances/solrnode2/conf/log
ging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
-Xms512m -Xmx5120m -Dsolr.solr.home=/opt/solrnode2 -Duser.language=en
-Duser.country=uk -Dbootstrap_confdir=/opt/solrnode2/ldwa01/conf
-Dcollection.configName=ldwa01cfg -DnumShards=24
-Dsolr.data.dir=/opt/data/solrnode2/ldwa01/data
-DzkHost=zk01.solr.wa.bl.uk:9983,zk02.solr.wa.bl.uk:9983,zk03.solr.wa.bl
.uk:9983 -Djava.endorsed.dirs=/opt/tomcat/endorsed -classpath
/opt/tomcat/bin/bootstrap.jar:/opt/tomcat/bin/tomcat-juli.jar
-Dcatalina.base=/opt/tomcat_instances/solrnode2
-Dcatalina.home=/opt/tomcat
-Djava.io.tmpdir=/opt/tomcat_instances/solrnode2/tmp
org.apache.catalina.startup.Bootstrap start"

- The Solr node dashboard shows "-DnumShards=24" in its list of Args for
each node

And yet, the ldwa01 nodes are leader and replica of shard 17 and there
are no other shard leaders created. Plus, if I only change the ZK
ensemble declarations in /etc/sysconfig/solrnode* to the different dev ZK
servers, all 24 leaders are created before any replicas are added.

I can also mention, when I browse the Cloud view, I can see both the
ldwa01 collection and the ukdomain collection listed, suggesting that
this information comes from the ZKs - I assume this is as expected.
Plus, the correct node addresses (e.g., 192.168.45.17:8984) are listed
for ldwa01 but these addresses are also listed as 'Down' in the ukdomain
collection (except for :8983 which only shows in the ldwa01 collection).

Any help very gratefully received.
Gil

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: 23 October 2013 18:50
To: solr-user@lucene.apache.org
Subject: Re: New shard leaders or existing shard replicas depends on
zookeeper?

My first impulse would be to ask how you created the collection. It sure
_sounds_ like you didn't specify 24 shards and thus have only a single
shard, one leader and 23 replicas

bq: ...to point to the zookeeper ensemble also used for the ukdomain
collection...

so my guess is that this ZK ensemble has the ldwa01 collection defined
as having only one shard

I admit I pretty much skimmed your post though...

Best,
Erick


On Wed, Oct 23, 2013 at 12:54 PM, Hoggarth, Gil 
wrote:

> Hi solr-users,
>
>
>
> I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
> shed some light on what's happening/how I can correct it.
>
>
>
> We have two physical servers running automated builds of RedHat 6.4 
> and Solr 4.4.0 that host two separate Solr services. The first server 
> (called ld01) has 24 shards and hosts a collection called 'ukdomain'; 
> the second server (ld02) also has 24 shards and hosts a different 
> collection called 'ldwa01'. It's evidently important to note that 
> previously both of these physical servers provided the 'ukdomain'
> collection, but the 'ldwa01' server has been rebuilt for the new 
> collection.
>
>
>
> When I start the ldwa01 solr nodes with their zookeeper configuration 
> (defined in /etc/sysconfig/solrnode* and with collection.configName as
> 'ldwa01cfg') pointing to the development zookeeper ensemble, all nodes
> initially become shard leaders and then replicas as I'd expect. But if
> I change the ldwa01 solr nodes to point to the zookeeper ensemble also
> used for the ukdomain collection, all ldwa01 solr nodes start on the 
> same shard (that is, the first ldwa01 solr node becomes the shard 
> leader, then every other solr node becomes a replica for this shard). 
> The significant point here is no other ldwa01 shards gain leaders (or
repl

New shard leaders or existing shard replicas depends on zookeeper?

2013-10-23 Thread Hoggarth, Gil
Hi solr-users,

 

I'm seeing some confusing behaviour in Solr/zookeeper and hope you can
shed some light on what's happening/how I can correct it.

 

We have two physical servers running automated builds of RedHat 6.4 and
Solr 4.4.0 that host two separate Solr services. The first server
(called ld01) has 24 shards and hosts a collection called 'ukdomain';
the second server (ld02) also has 24 shards and hosts a different
collection called 'ldwa'. It's evidently important to note that
previously both of these physical servers provided the 'ukdomain'
collection, but the 'ldwa' server has been rebuilt for the new
collection.

 

When I start the ldwa solr nodes with their zookeeper configuration
(defined in /etc/sysconfig/solrnode* and with collection.configName as
'ldwacfg') pointing to the development zookeeper ensemble, all nodes
initially become shard leaders and then replicas as I'd expect. But if I
change the ldwa solr nodes to point to the zookeeper ensemble also used
for the ukdomain collection, all ldwa solr nodes start on the same shard
(that is, the first ldwa solr node becomes the shard leader, then every
other solr node becomes a replica for this shard). The significant point
here is no other ldwa shards gain leaders (or replicas).

 

The ukdomain collection uses a zookeeper collection.configName of
'ukdomaincfg', and prior to the creation of this ldwa service the
collection.configName of 'ldwacfg' has never previously been used. So
I'm confused why the ldwa service would differ when the only difference
is which zookeeper ensemble is used (both zookeeper ensembles are
automatically built using version 3.4.5).

 

If anyone can explain why this is happening and how I can get the ldwa
services to start correctly using the non-development zookeeper
ensemble, I'd be very grateful! If more information or explanation is
needed, just ask.

 

Thanks, Gil

 

Gil Hoggarth

Web Archiving Technical Services Engineer 

The British Library, Boston Spa, West Yorkshire, LS23 7BQ

 



RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your response Shawn, very much appreciated.
Gil

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: 16 May 2013 15:59
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

> The dataDir is set in each solrconfig.xml; each one has been checked
> to ensure it points to its corresponding location. The error we see is
> that on machine reboot not all of the shards start successfully, and
> if the failed shard was a leader, the replicas can't take its place
> (presumably because the leader's incorrect data directory is
> inconsistent with their own).

Although you can set the dataDir in solrconfig.xml, I would strongly
recommend that you don't.

If you are using the old-style solr.xml (which has cores and core tags)
then set the dataDir in each core tag in solr.xml. This gets read and
set before the core is created, so there's less chance of it getting
scrambled. The solrconfig is read as part of core creation.

If you are using the new style solr.xml (new with 4.3.0) then you'll
need absolute dataDir paths, and they need to go in each core.properties
file.
Due to a bug, relative paths won't work as expected. I need to see if I
can make sure the fix makes it into 4.3.1.

If moving dataDir out of solrconfig.xml fixes it, then we probably have
a bug.

Your Zookeeper problems might be helped by increasing zkClientTimeout.
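
For illustration, an old-style solr.xml along those lines might look like this
(names, paths and the timeout value are only examples):

<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="30000">
    <core name="collection1"
          instanceDir="/usr/local/solrshard3/collection1"
          dataDir="/var/local/solrshard3/collection1/data"/>
  </cores>
</solr>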

Thanks,
Shawn




RE: Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Thanks for your reply Daniel.

The dataDir is set in each solrconfig.xml; each one has been checked to
ensure it points to its corresponding location. The error we see is that
on machine reboot not all of the shards start successfully, and if the
failed shard was a leader, the replicas can't take its place (presumably
because the leader's incorrect data directory is inconsistent with their
own).

More detail that I can add is that the catalina.out log for failed
shards reports:
May 15, 2013 5:56:02 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@524e13f6]) and a
value of type
[org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value
[org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads
are going to be renewed over time to try and avoid a probable memory
leak.

This doesn't (to me) relate to the problem, but that doesn't necessarily
mean it's not. Plus, it's the only SEVERE reported and only reported in
the failed shard catalina.out log.

Checking the zookeeper logs, we're seeing:
2013-05-16 13:25:46,839 [myid:1] - WARN
[RecvWorker:3:QuorumCnxManager$RecvWorker@762] - Connection broken for
id 3, my id = 1, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(Quoru
mCnxManager.java:747)
2013-05-16 13:25:46,841 [myid:1] - WARN
[RecvWorker:3:QuorumCnxManager$RecvWorker@765] - Interrupting SendWorker
2013-05-16 13:25:46,842 [myid:1] - WARN
[SendWorker:3:QuorumCnxManager$SendWorker@679] - Interrupted while
waiting for message on queue
java.lang.InterruptedException
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.re
portInterruptAfterWait(AbstractQueuedSynchronizer.java:2017)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.aw
aitNanos(AbstractQueuedSynchronizer.java:2095)
at
java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389
)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(Quorum
CnxManager.java:831)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnx
Manager.java:62)
at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(Quoru
mCnxManager.java:667)
2013-05-16 13:25:46,843 [myid:1] - WARN
[SendWorker:3:QuorumCnxManager$SendWorker@688] - Send worker leaving
thread

This is, I think, a separate issue in that this happens immediately after
I restart a zookeeper. (I.e., I see this in a log, restart that
zookeeper, and immediately see a similar issue in one of the other two
zookeeper logs).



-----Original Message-----
From: Daniel Collins [mailto:danwcoll...@gmail.com] 
Sent: 16 May 2013 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.0: Shard instances using incorrect data directory
on machine boot

What actual error do you see in Solr?  Is there an exception and if so,
can you post that?  As I understand it, dataDir is set from the
solrconfig.xml file, so either your instances are picking up the "wrong"
file, or you have some override which is incorrect?  Where do you set
solr.data.dir, at the environment when you start Solr or in solrconfig?


On 16 May 2013 12:23, Hoggarth, Gil  wrote:

> Hi all, I hope you can advise a solution to our incorrect data 
> directory issue.
>
>
>
> We have 2 physical servers using Solr 4.3.0, each with 24 separate 
> tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a
> solr shard in each. This configuration means that each shard has its 
> own data directory declared. (Server OS, tomcat and solr, including 
> shards, created via automated builds.)
>
>
>
> That is, for example,
>
> - tomcat instance, /var/local/tomcat/solrshard3/, port 8985
>
> - corresponding solr instance, /usr/local/solrshard3/, with 
> /usr/local/solrshard3/collection1/conf/solrconfig.xml
>
> - corresponding solr data directory,
> /var/local/solrshard3/collection1/data/
>
>
>
> We process ~1.5 billion documents, which is why we use 48 shards 
> (24 leaders, 24 replicas). These physical servers are rebooted 
> regularly to fsck their drives. When rebooted, we always see several 
> (~10-20) shards failing to start (UI cloud view shows them as 'Down'
> or 'Recovering'
> though they never recover without intervention), though there is not a
> pattern to which shards fail to start - we haven't recorded any that 
> always or never fail. On inspection, the UI dashboard for these failed
> shards displays,

Solr 4.3.0: Shard instances using incorrect data directory on machine boot

2013-05-16 Thread Hoggarth, Gil
Hi all, I hope you can advise a solution to our incorrect data directory
issue.

 

We have 2 physical servers using Solr 4.3.0, each with 24 separate
tomcat instances (RedHat 6.4, java 1.7.0_10-b18, tomcat 7.0.34) with a
solr shard in each. This configuration means that each shard has its own
data directory declared. (Server OS, tomcat and solr, including shards,
created via automated builds.) 

 

That is, for example,

- tomcat instance, /var/local/tomcat/solrshard3/, port 8985

- corresponding solr instance, /usr/local/solrshard3/, with
/usr/local/solrshard3/collection1/conf/solrconfig.xml

- corresponding solr data directory,
/var/local/solrshard3/collection1/data/

 

We process ~1.5 billion documents, which is why we use 48 shards (24
leaders, 24 replicas). These physical servers are rebooted regularly to
fsck their drives. When rebooted, we always see several (~10-20) shards
failing to start (UI cloud view shows them as 'Down' or 'Recovering'
though they never recover without intervention), though there is not a
pattern to which shards fail to start - we haven't recorded any that
always or never fail. On inspection, the UI dashboard for these failed
shards displays, for example:

- Host      Server1

- Instance  /usr/local/solrshard3/collection1

- Data      /var/local/solrshard6/collection1/data

- Index     /var/local/solrshard6/collection1/data/index

 

To fix such failed shards, I manually restart the shard leader and
replicas, which fixes the issue. However, of course, I would like to
know a permanent cure for this, not a remedy.

 

We use a separate zookeeper service, spread across 3 Virtual Machines
within our private network of ~200 servers (physical and virtual).
Network traffic is constant but relatively little across 1GB bandwidth.

 

Any advice or suggestions greatly appreciated.

Gil

 

Gil Hoggarth

Web Archiving Engineer

The British Library, Boston Spa, West Yorkshire, LS23 7BQ