Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Okay - thanks for the confirmation, Shalin. Could this be a feature request
for the Collections API - a split-shard dry-run call that accepts the desired
number of sub-shards as a request parameter and returns the optimal hash
ranges for those sub-shards, along with the respective document counts for
each? Users could then pass these ranges to the actual split.
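
For reference, once suitable ranges are known they can be passed to the
existing SPLITSHARD call - a minimal sketch, with the host, collection and
shard names as placeholders, and the two hex ranges chosen only as an example
halving of a shard that covers 80000000-ffffffff, not values computed from
any real index:

http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1&ranges=80000000-bfffffff,c0000000-ffffffff

Each range is a 32-bit hash interval written in hex; how many documents land
in each sub-shard depends entirely on how the indexed route keys hash into
those intervals, which is exactly what the proposed dry run would report.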






Re: Finding out optimal hash ranges for shard split

2015-05-06 Thread anand.mahajan
Yes - I'm using 2-level composite ids and that has caused the imbalance for
some shards.
It's car data, and the composite ids combine year and make, then model and a
couple of other specifications - e.g. 2013Ford!Edge!123456 - but there are
just far too many 2013 or 2011 Fords, and they all land on the same shards.
This was done because co-location of these docs is required for a few of the
search requirements - to avoid hitting all shards all the time. All queries
always specify the year and make combination, so it's easy to work out the
target shard for the query.
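
For what it's worth, a query that already knows the year and make can pass
the compositeId prefix explicitly so that only the shard(s) owning that
prefix are searched - a hedged sketch, with the host, collection, query and
field names as placeholders, and the _route_ value reusing the example id
prefix above:

http://localhost:8983/solr/collection1/select?q=model:Edge&_route_=2013Ford!

With compositeId routing, the _route_ parameter restricts the request to the
shard or shards whose hash ranges cover that prefix, which matches the "work
out the target shard for the query" behaviour described above.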

Regarding storing the hash against each document and then querying to find
the optimal ranges - could Solr instead maintain incremental counters for
each hash value in the shard's range, so that the Collections SPLITSHARD API
could use them internally to propose the optimal sub-shard ranges for the
split?
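
Until something along those lines exists, one possible workaround sketch: if
the computed route hash were indexed alongside each document (a hypothetical
long field, say route_hash_l, populated at index time - Solr does not store
this today), ordinary range faceting could report how many documents fall
into each slice of the hash space, and balanced split ranges could be read
off the bucket counts. The field name, bucket width and host/collection names
below are all assumptions:

http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.range=route_hash_l&facet.range.start=-2147483648&facet.range.end=2147483648&facet.range.gap=268435456

That gives 16 equal-width buckets over the full 32-bit hash space; narrowing
start/end to the target shard's own range (or adding distrib=false against
the specific core) and shrinking the gap gives finer-grained, per-shard
counts.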





Re: Finding out optimal hash ranges for shard split

2015-05-05 Thread anand.mahajan
Looks like it's not possible to find out the optimal hash ranges for a split
before you actually perform it. So the only way out is to keep splitting the
large sub-shards?





Finding out optimal hash ranges for shard split

2015-05-03 Thread anand.mahajan
Hi all,

Before doing a SPLITSHARD - is there a way to figure out the hash ranges that
will split the shard's documents evenly across the new sub-shards? Sort of a
dry run of the split-shard command with the ranges parameter: something that
just shows the number of docs that would end up on each new sub-shard if the
command were executed with a given set of hash ranges?

Thanks,
Anand





Re: Leaders in Recovery Failed state

2015-02-09 Thread anand.mahajan
Erick Erickson erickerickson at gmail.com writes:

> What version of Solr?
>
> On Tue, Jan 20, 2015 at 7:07 AM, anand.mahajan anand at zerebral.co.in wrote:
> > Hi all,
> >
> > I have a cluster with 36 shards and 3 replicas per shard. I recently had
> > to restart the entire cluster - most of the shards and replicas are back
> > up - but a few shards have not had any leaders for a long time (close to
> > 18 hours now). I tried reloading these cores and even the servlet
> > containers hosting these cores. It's only now that all the shards have
> > leaders allocated - but a few of these leaders are still shown in
> > Recovery Failed status on the Solr Cloud tree view.
> >
> > I see the following in the logs for these shards -
> > INFO  - 2015-01-20 14:38:19.797;
> > org.apache.solr.handler.admin.CoreAdminHandler; In WaitForState(recovering):
> > collection=collection1, shard=shard1, thisCore=collection1_shard1_replica3,
> > leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
> > currentState=recovering, localState=recovery_failed,
> > nodeName=10.68.77.9:8983_solr, coreNodeName=core_node2,
> > onlyIfActiveCheckResult=true, nodeProps:
> > core_node2:{state:recovering,core:collection1_shard1_replica1,node_name:10.68.77.9:8983_solr,base_url:http://10.68.77.9:8983/solr}
> >
> > And on the other server hosting the replica for this shard -
> > ERROR - 2015-01-20 14:38:20.768; org.apache.solr.common.SolrException;
> > org.apache.solr.common.SolrException: I was asked to wait on state
> > recovering for shard3 in collection1 on 10.68.77.9:8983_solr but I still
> > do not see the requested state. I see state: recovering live:true leader
> > from ZK: http://10.68.77.3:8983/solr/collection1_shard3_replica3/
> >
> > I see that there is no replica catch-up going on between any of these
> > servers now.
> > Couple of questions -
> > 1. What is it that the Solr cloud is waiting on to allocate the leaders
> > for such shards?

Delete Replica API Async Calls not being processed

2015-02-09 Thread anand.mahajan
Hi,

I needed to delete a couple of replicas for a shard and used the async
Collections API calls to do that. I see all my requests in the 'submitted'
state, but none have been processed yet (it's been 4 hours or so).

How do I know whether these requests are being processed at all? And if
required, how could I delete these replicas now? I'm using Solr 4.10.

Thanks,
Anand
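
For reference, a hedged sketch of checking on an async request - the
requestid is whatever value was passed as the async parameter on the original
call, and the host, collection, shard and replica names below are
placeholders:

http://localhost:8983/solr/admin/collections?action=REQUESTSTATUS&requestid=1001

and the async delete itself would have looked something like:

http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=collection1&shard=shard1&replica=core_node2&async=1001

REQUESTSTATUS reports a state such as submitted, running, completed or failed
for the given id; a request that stays in submitted usually means the
Overseer has not yet picked the task off its work queue.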





Leaders in Recovery Failed state

2015-01-20 Thread anand.mahajan
Hi all,


I have a cluster with 36 shards and 3 replicas per shard. I recently had to
restart the entire cluster - most of the shards and replicas are back up -
but a few shards have not had any leaders for a long time (close to 18 hours
now). I tried reloading these cores and even the servlet containers hosting
these cores. It's only now that all the shards have leaders allocated - but a
few of these leaders are still shown in Recovery Failed status on the Solr
Cloud tree view.


I see the following in the logs for these shards - 
INFO  - 2015-01-20 14:38:19.797;
org.apache.solr.handler.admin.CoreAdminHandler; In WaitForState(recovering):
collection=collection1, shard=shard1, thisCore=collection1_shard1_replica3,
leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true,
currentState=recovering, localState=recovery_failed,
nodeName=10.68.77.9:8983_solr, coreNodeName=core_node2,
onlyIfActiveCheckResult=true, nodeProps:
core_node2:{state:recovering,core:collection1_shard1_replica1,node_name:10.68.77.9:8983_solr,base_url:http://10.68.77.9:8983/solr}


And on the other server hosting the replica for this shard - 
ERROR - 2015-01-20 14:38:20.768; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: I was asked to wait on state
recovering for shard3 in collection1 on 10.68.77.9:8983_solr but I still do
not see the requested state. I see state: recovering live:true leader from
ZK: http://10.68.77.3:8983/solr/collection1_shard3_replica3/
    at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:999)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:245)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)


I see that there is no replica catch-up going on between any of these
servers now. 
A couple of questions - 
1. What is it that SolrCloud is waiting on to allocate the leaders for such
shards?
2. Why do a few of these shards show leaders in Recovery Failed state? And
how do I recover such shards?

Thanks,
Anand
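
One way to see exactly what state ZooKeeper holds for these shards while they
are stuck - a hedged sketch, assuming the cluster is on Solr 4.8 or later
(which added CLUSTERSTATUS), with the host and collection names as
placeholders:

http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=collection1&shard=shard1

The response lists each replica's state (active, recovering, recovery_failed,
down) and the leader per shard - the same information the Cloud tree view
renders, but easier to capture and compare across restarts.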





SolrCloud Slow to boot up

2014-09-25 Thread anand.mahajan
Hello all,

Hosted a SolrCloud - 6 nodes - 36 shards x 3 replicas each - 108 cores across
6 servers. Moved about 250M documents into this cluster. When I restart the
cluster, only the leader of each shard comes up live instantly (within a
minute); all the replicas are shown as Recovering on the Cloud screen, and
all 6 servers are doing some processing (consuming about 4 CPUs each and
doing a lot of network IO too). In essence it's not doing any reads or writes
to the index, and I don't see any replication/catch-up activity going on
either, yet RAM usage grows until it consumes all 96GB available on each box.
All the Recovering replicas then recover one by one over an hour or so. Why
is it taking so long to boot up, and what is it doing that consumes so much
CPU, RAM and network IO? All disks are reading at 100% on all servers during
this boot-up. Is there a setting I might have missed that would help?

FYI - The Zookeeper cluster is on the same 6 boxes.  Size of the Solr data
dir is about 150GB per server and each box has 96GB RAM.

Thanks,
Anand





Re: SolrCloud Slow to boot up

2014-09-25 Thread anand.mahajan
1. I've hosted it with Helios v0.07, which ships with Solr 4.10.
2. Changes to solrconfig.xml - 
   a. hard commits every 10 mins
   b. soft commits every 10 secs
   c. disabled all caches, as the usage is very random (no end users, only
services doing the searches) and mostly single requests
   d. use cold searcher = true
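
In solrconfig.xml terms those changes would look roughly like the sketch
below (inside <updateHandler> and <query> respectively). The 10 minute / 10
second values are the intervals described above; openSearcher=false on the
hard commit is an assumption on my part, the usual pairing when soft commits
handle visibility:

<autoCommit>
   <maxTime>600000</maxTime>
   <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxTime>10000</maxTime>
</autoSoftCommit>

<useColdSearcher>true</useColdSearcher>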





Re: SolrCloud Scale Struggle

2014-08-10 Thread anand.mahajan
Hello all,

Thank you for your suggestions. With the autoCommit (every 10 mins) and
softCommit (every 10 secs) frequencies reduced, things work much better now.
CPU usage has gone down considerably too (by about 60%) and the read/write
throughput is showing considerable improvement as well.

There are certain shards giving poor response times - these have over 10M
listings - I guess because they are starving for RAM? Would it help if I
split these up into smaller shards, but on the existing set of hardware? (I
cannot allocate more machines to the cloud yet.)

Thanks,
Anand





Re: SolrCloud Scale Struggle

2014-08-02 Thread anand.mahajan
Thank you everyone for your responses. Increased the hard commit to 10 mins
and autoSoftCommit to 10 secs. (I won't really need a real-time get - I
tweaked the app code to cache the doc and use the app-side cached version
instead of fetching it from Solr.) Will watch it for a day or two and clock
the throughput.

For this deployment the peak is throughout the day as more data keeps
streaming in - there are no direct users with search queries here (as of
now) - but every incoming doc is compared against the existing set of docs in
Solr, to check whether it's a new one or an updated version of an existing
one, and only then is the doc inserted/updated. Right now it's adding about
1100 docs a minute (~20 docs a second) [but that's because it has to run a
search first to determine whether it's an insert or an update].

Also, since there are already 18 JVMs per machine - how do I go about merging
these existing cores under just 1 JVM? Would I need to create 1 Solr instance
with 18 cores inside and then migrate data from these separate JVMs into the
new instance?





Re: SolrCloud Scale Struggle

2014-08-02 Thread anand.mahajan
Thanks Shawn. I'm using 2-level composite id routing right now. These are all
used-car listings, and all search queries always have the car year and make
in the search criteria - hence it made sense to have year+make as level 1 of
the composite id. Beyond that, the second level of the composite id is based
on about 8 car attributes, which means all listings for a similar type of
car, and all listings of any one car, are grouped together and co-located in
the SolrCloud. Even with this there is still an imbalance in the cluster -
certain car makes are popular, there are more listings for such cars, and
they all go to the same shard. Will splitting these up on the existing set of
hardware help at all?





SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Hello all,

Struggling to get this going with SolrCloud - 

Requirements in brief:
 - Ingest about 4M used-car listings a day and track all unique cars for
changes
 - 4M automated searches a day (during the ingestion phase, to check whether
a doc already exists in the index (based on the values of 4-5 key fields) or
is a new or updated version)
 - Of the 4M, about 3M are updates to existing docs (for every non-key value
change)
 - About 1M inserts a day (I'm assuming this many new listings come in every
day)
 - Daily bulk CSV exports of the inserts/updates from the last 24 hours, for
various snapshots of the data, to various clients

My current deployment: 
 i) I'm using Solr 4.8 and have set up a SolrCloud with 6 dedicated machines
- 24 cores + 96GB RAM each.
 ii) There are over 190M docs in the SolrCloud at the moment (across all
replicas it consumes about 2340GB of disk overall, which implies each doc is
about 5-8KB in size).
 iii) The docs are split into 36 shards, with 3 replicas per shard (108 Solr
Jetty processes in all, split over 6 servers, leaving about 18 Jetty JVMs
running on each host).
 iv) There are 60 fields per doc and all fields are stored at the moment :( 
(the backend is only Solr at the moment).
 v) The current shard/routing key is a combination of car year, make and some
other car-level attributes that help classify the cars.
vi) We are mostly using the default Solr config as of now - no heavy caching,
as the search is pretty random in nature.
vii) Autocommit is on - with maxDocs = 1.

Current throughput & issues:
With the above-mentioned deployment the daily throughput is only about 1.5M
on average (inserts + updates) - falling way short of what is required.
Search is slow - some queries take about 15 seconds to return - and since
every insert depends on at least one search, that degrades the write
throughput too. (This is not a Solr issue - the app demands it.)

Questions :

1. Autocommit with maxDocs = 1 - is that a goof-up, and could it be slowing
down indexing? It's a requirement that all docs are available as soon as they
are indexed.

2. Would I have been better served by deploying a single Jetty Solr instance
per server with multiple cores running inside? The servers do start to swap
after a couple of days of Solr uptime - right now we reboot the entire
cluster every 4 days.

3. The routing key is not able to balance the docs effectively across the
available shards - there are a few shards with just about 2M docs, and others
with over 11M. Shall I split the larger shards? I do not have more nodes /
hardware to allocate to this deployment - in that case, would splitting up
the large shards still give better read/write throughput?

4. To remain on the current hardware - would it help if I removed 1 replica
from each shard? But that would mean that when even just 1 node for a shard
goes down, there would be only 1 live node left, which would not serve the
write requests.

5. Also, is there a way to control where the Split Shard replicas would go?
Is there a pattern / rule that Solr follows when it creates replicas for
split shards?

6. I read somewhere that creating a core costs the OS one thread and a file
handle. Since a core represents an index in its entirety, would it not be
allocated the configured number of write threads? (The default, that is, 8.)

7. The Zookeeper cluster is deployed on the same boxes as the Solr instance
- Would separating the ZK cluster out help?

Sorry for the long thread - I thought I'd ask all of these at once rather
than posting separate ones.

Thanks,
Anand





Re: SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Oops - my bad - it's autoSoftCommit that is set after every doc, not
autoCommit. 

Following snippet from the solrconfig - 

<autoCommit>
   <maxTime>1</maxTime>
   <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
   <maxDocs>1</maxDocs>
</autoSoftCommit>

Shall I increase the autoCommit time as well? But would that mean more RAM
is consumed by all instances running on the box?





Re: SolrCloud Scale Struggle

2014-08-01 Thread anand.mahajan
Thanks for the reply Shalin.

1. I'll try increasing the softCommit interval and the autoSoftCommit too.
One mistake I made, which I realized just now, is that I am using /solr/select
and expecting it to do an NRT lookup - for NRT search it's the /select/get
handler that needs to be used. Please confirm. (A hedged sketch of the
real-time get handler follows after this list.)

2. Also, on the number of shards - I made 36 (even with 6 machines) as I was
hoping I'd get more hardware and would be able to distribute the existing
shards onto the new boxes. That has not happened yet. But even with the
current deployment - fewer shards would mean more docs per shard; would that
slow down search queries?

3. Increasing the commit interval would mean more RAM usage - could that make
the situation worse, as there is already less RAM than the total doc size
(with all fields stored)? [FYI - ramBufferSizeMB and maxBufferedDocs are set
to the defaults - 100MB and 1000 respectively.]

4. I read that DataStax Enterprise could be an answer here? Is there an easy
way to migrate to DSE - something that would not require too many code
changes? (I had a discussion with the DSE folks a few weeks ago and they
mentioned migration from Solr to DSE would be a breeze and there would not be
'any' code changes required on the ingestion and search code. (Perhaps I was
talking to the sales guy, maybe?)) With DSE the data would sit in Cassandra
and the search would still be Solr plugged into DSE - but would that work
with a 6-node cluster? (Sorry if I'm deviating here a bit from the core
problem I'm trying to fix - but if DSE could work with very minimal time and
effort, I won't mind trying it out.)
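
On point 1 above - a hedged sketch of the real-time get handler: in the stock
4.x solrconfig it is mapped to /get (solr.RealTimeGetHandler), so the
existence check could be a direct lookup rather than a /select search. The
host, collection and id values are placeholders:

http://localhost:8983/solr/collection1/get?id=2013Ford!Edge!123456

/get returns the latest version of a document straight from the update log,
even before a commit, which is what an insert-or-update check needs; a plain
/select query only sees documents once a (soft) commit has opened a new
searcher.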


