Re: SolrCloud replication question

2012-07-30 Thread Jan Høydahl
Hi,

Interesting article in your link. What servlet container do you use, and how is 
it configured with respect to threads etc.? You should be able to utilize all CPUs with a 
single Solr index, given that you are not I/O bound. Also, what is your 
mergeFactor?
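
If it helps, this is roughly where both knobs live (a sketch only - the paths 
assume the stock Solr 4 "example" layout, and the exact property names can 
differ per container):

# mergeFactor (if nothing is set, the defaults apply) lives in solrconfig.xml's indexConfig:
grep -i mergeFactor example/solr/collection1/conf/solrconfig.xml
# request thread limit for the bundled Jetty:
grep -i maxThreads example/etc/jetty.xml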

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 9 July 2012, at 22:11, avenka wrote:

> Hmm, never mind my question about replicating using symlinks. Given that
> replication on a single machine improves throughput, I should be able to get
> a similar improvement by simply sharding on a single machine. As also
> observed at
> 
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/
> 
> I am now benchmarking my workload to compare replication vs. sharding
> performance on a single machine.
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrCloud replication question

2012-07-10 Thread Erick Erickson
The symlink thing sounds... complicated, but as you say, you're going
another route...

The indexing speed you're seeing is surprisingly slow; I'd get to the root
of the timeouts before giving up. SolrCloud simply _can't_ be that slow
by design, so I suspect something about your setup is causing it. The
timeouts you're seeing are certainly a clue here. Incoming updates have
a couple of things happen:

1> The incoming request is pulled apart. Any docs for this shard are
 indexed and forwarded to any replicas.
2> Any docs that are for a different shard are packed up and forwarded
 to the leader for that shard, which in turn distributes them to any
 replicas.

So I _suspect_ that indexing will be a bit slower, since there's some additional
communication going on. But not _that_ much slower. Any clue what your
slow server is doing that would cause it to time out?
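
As a quick sanity check of raw add speed, you could time a small JSON batch 
sent to a single node and let the cloud layer do the routing in 1> and 2> 
above (just a sketch - host, port, core name, and the name_s field are 
placeholders for your setup):

# build a 100-doc JSON batch
batch="["
for i in $(seq 1 100); do
  batch+="{\"id\":\"timing-$i\",\"name_s\":\"doc $i\"},"
done
batch="${batch%,}]"
# time the add against one node; docs get routed/forwarded as described above
time curl -s 'http://localhost:8983/solr/collection1/update?commit=true' \
     -H 'Content-Type: application/json' \
     --data-binary "$batch"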

Best
Erick

On Mon, Jul 9, 2012 at 4:11 PM, avenka  wrote:
> Hmm, never mind my question about replicating using symlinks. Given that
> replication on a single machine improves throughput, I should be able to get
> a similar improvement by simply sharding on a single machine. As also
> observed at
>
> http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/
>
> I am now benchmarking my workload to compare replication vs. sharding
> performance on a single machine.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud replication question

2012-07-09 Thread avenka
Hmm, never mind my question about replicating using symlinks. Given that
replication on a single machine improves throughput, I should be able to get
a similar improvement by simply sharding on a single machine. As also
observed at
 
http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/

I am now benchmarking my workload to compare replication vs. sharding
performance on a single machine.
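
For the record, the sharded single-machine setup can be brought up more or 
less straight from the SolrCloud wiki example - a rough sketch, with ports 
and paths as illustrative placeholders:

# terminal 1: first node, embedded ZooKeeper, 2 shards
cd example
java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf \
     -DzkRun -DnumShards=2 -jar start.jar

# terminal 2: a copy of the example dir on another port, pointing at the same ZooKeeper
cd example2
java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar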

--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3994017.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud replication question

2012-07-09 Thread avenka
Erick, thanks. I now do see segment files in an index. directory at 
the replicas. Not sure why they were not getting populated earlier. 

I have a couple more questions, the second is more elaborate - let me know if I 
should move it to a separate thread.

(1) The speed of adding documents in SolrCloud is excruciatingly slow. It takes 
about 30-50 seconds to add a batch of 100 documents (and about twice that to 
add 200, etc.) to the primary but just ~10 seconds to add 5K documents in 
batches of 200 on a standalone solr 4 server. The log files indicate that the 
primary is timing out with messages like the one below, and Cloud->Graph in the UI shows 
the other two replicas in orange after starting out green.
 org.apache.solr.client.solrj.SolrServerException: Timeout occured while 
waiting response from server at: http://localhost:7574/solr

Any idea why?

(2) I am seriously considering using symbolic links for a replicated solr setup 
with completely independent instances on a *single machine*. Tell me if I am 
thinking about this incorrectly. Here is my reasoning: 

(a) Master/slave replication in 3.6 simply seems old school, as it doesn't have 
the nice consistency properties of SolrCloud. Polling, say, every 20 seconds 
means I don't know exactly how up to date each replica is, which will 
complicate my request re-distribution.

(b) SolrCloud seems like a great alternative to master/slave replication. But 
it seems slow (see 1) and having played with it, I don't feel comfortable with 
the maturity of ZK integration (or my comprehension of it) in solr 4 alpha. 

(c) Symbolic links seem like the fastest and most space-efficient solution 
*provided* there is only a single writer, which is just fine for me. I plan to 
run completely separate solr instances with one designated as the primary and 
do the following operations in sequence: Add a batch to the primary and commit 
--> From each replica's index directory, remove all symlinks and re-create 
symlinks to segment files in the primary (but not the write.lock file) --> Call 
update?commit=true to force replicas to re-load their in-memory index --> Do 
whatever read-only processing is required on the batch using the primary and 
all replicas by manually (randomly) distributing read requests --> Repeat 
sequence.
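
A minimal sketch of that refresh step, assuming one primary and two replica 
instances (all paths and ports are made-up placeholders for whatever the real 
script uses):

PRIMARY=/data/solr-primary/index
REPLICAS="/data/solr-replica1/index /data/solr-replica2/index"
for r in $REPLICAS; do
  # drop the old links, then re-link every current segment file except write.lock
  find "$r" -maxdepth 1 -type l -delete
  for f in "$PRIMARY"/*; do
    [ "$(basename "$f")" = "write.lock" ] && continue
    ln -s "$f" "$r/$(basename "$f")"
  done
done
# force each replica to reopen its searcher on the new segments
curl 'http://localhost:8984/solr/update?commit=true'
curl 'http://localhost:8985/solr/update?commit=true'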

Is there any downside to 2(c) (other than maintaining a trivial script to 
manage symlinks and call commit)? I tested it on small index sizes and it seems 
to work fine. The throughput improves with more replicas (for 2-4 replicas), as 
a single replica is not enough to saturate the machine (due to high query 
latency). Am I overlooking something in this setup?

Overall, I need high throughput and minimal latency from the time a document is 
added to the time it is available at a replica. SolrCloud's automated request 
redirection, consistency, and fault-tolerance are awesome for a physically 
distributed setup, but I don't see how it beats 2(c) in a single-writer, 
single-machine, replicated setup.

AV

On Jul 9, 2012, at 9:43 AM, Erick Erickson [via Lucene] wrote:

> No, you're misunderstanding the setup. Each replica has a complete 
> index. Updates get automatically forwarded to _both_ nodes for a 
> particular shard. So, when a doc comes in to be indexed, it gets 
> sent to the leader for, say, shard1. From there: 
> 1> it gets indexed on the leader 
> 2> it gets forwarded to the replica(s) where it gets indexed locally. 
> 
> Each replica has a complete index (for that shard). 
> 
> There is no master/slave setup any more. And you do 
> _not_ have to configure replication. 
> 
> Best 
> Erick 
> 
> On Sun, Jul 8, 2012 at 1:03 PM, avenka <[hidden email]> wrote:
> 
> > I am trying to wrap my head around replication in SolrCloud. I tried the 
> > setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication 
> > for high query throughput. The setup at the URL above appears to maintain 
> > just one copy of the index at the primary node (instead of a replicated 
> > index as in a master/slave configuration). Will I still get roughly an 
> > n-fold increase in query throughput with n replicas? And if so, why would 
> > one do master/slave replication with multiple copies of the index at all? 
> > 
> > -- 
> > View this message in context: 
> > http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html
> > Sent from the Solr - User mailing list archive at Nabble.com. 
> 
> 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761p3993960.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SolrCloud replication question

2012-07-09 Thread Erick Erickson
No, you're misunderstanding the setup. Each replica has a complete
index. Updates get automatically forwarded to _both_ nodes for a
particular shard. So, when a doc comes in to be indexed, it gets
sent to the leader for, say, shard1. From there:
1> it gets indexed on the leader
2> it gets forwarded to the replica(s) where it gets indexed locally.

Each replica has a complete index (for that shard).

There is no master/slave setup any more. And you do
_not_ have to configure replication.
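
And since every node ends up with a full copy of its shard, a query sent to 
any node covers the whole collection by default - e.g. (host, port, and 
collection name are just placeholders):

curl 'http://localhost:7574/solr/collection1/select?q=*:*&rows=0&wt=json'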

Best
Erick

On Sun, Jul 8, 2012 at 1:03 PM, avenka  wrote:
> I am trying to wrap my head around replication in SolrCloud. I tried the
> setup at http://wiki.apache.org/solr/SolrCloud/. I mainly need replication
> for high query throughput. The setup at the URL above appears to maintain
> just one copy of the index at the primary node (instead of a replicated
> index as in a master/slave configuration). Will I still get roughly an
> n-fold increase in query throughput with n replicas? And if so, why would
> one do master/slave replication with multiple copies of the index at all?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SolrCloud-replication-question-tp3993761.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrCloud Replication Question

2012-02-16 Thread Jamie Johnson
Ok, great.  Just wanted to make sure someone was aware.  Thanks for
looking into this.

On Thu, Feb 16, 2012 at 8:26 AM, Mark Miller  wrote:
>
> On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote:
>
>>  Not sure if this is
>> expected or not.
>
> Nope - should be already resolved or will be today though.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-16 Thread Mark Miller

On Feb 14, 2012, at 10:57 PM, Jamie Johnson wrote:

>  Not sure if this is
> expected or not.

Nope - should be already resolved or will be today though.

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
All of the nodes now show as being Active.  When starting the replicas
I did receive the following message though.  Not sure if this is
expected or not.

INFO: Attempting to replicate from
http://JamiesMac.local:8501/solr/slice2_shard2/
Feb 14, 2012 10:53:34 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to
recover:org.apache.solr.common.SolrException: null
java.lang.NullPointerException  at
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)  at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

null  java.lang.NullPointerExceptionat
org.apache.solr.handler.admin.CoreAdminHandler.handlePrepRecoveryAction(CoreAdminHandler.java:646)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:358)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:172)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326) at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

request: 
http://JamiesMac.local:8501/solr/admin/cores?action=PREPRECOVERY&core=slice2_shard2&nodeName=JamiesMac.local:8502_solr&coreNodeName=JamiesMac.local:8502_solr_slice2_shard1&wt=javabin&version=2
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:433)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:251)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:208)

Feb 14, 2012 10:53:34 PM org.apache.solr.update.UpdateLog dropBufferedUpdates

Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Doing so now, will let you know if I continue to see the same issues

On Tue, Feb 14, 2012 at 4:59 PM, Mark Miller  wrote:
> Doh - looks like I was just seeing a test issue. Do you mind updating and 
> trying the latest rev? At the least there should be some better logging 
> around the recovery.
>
> I'll keep working on tests in the meantime.
>
> - Mark
>
> On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:
>
>> Sounds good, if I pull the latest from trunk and rerun will that be
>> useful or were you able to duplicate my issue now?
>>
>> On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
>>> Okay Jamie, I think I have a handle on this. It looks like an issue with 
>>> what config files are being used by cores created with the admin core 
>>> handler - I think it's just picking up default config and not the correct 
>>> config for the collection. This means they end up using config that has no 
>>> UpdateLog defined - and so recovery fails.
>>>
>>> I've added more logging around this so that it's easy to determine that.
>>>
>>> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
>>> soon as well.
>>>
>>> - Mark
>>>
>>> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>>>
 Thanks Mark, not a huge rush, just me trying to get to use the latest
 stuff on our project.

 On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  
 wrote:
> Sorry, have not gotten it yet, but will be back trying later today - 
> monday, tuesday tend to be slow for me (meetings and crap).
>
> - Mark
>
> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>
>> Has there been any success in replicating this?  I'm wondering if it
>> could be something with my setup that is causing the issue...
>>
>>
>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>>> Yes, I have the following layout on the FS
>>>
>>> ./bootstrap.sh
>>> ./example (standard example directory from distro containing jetty
>>> jars, solr confs, solr war, etc)
>>> ./slice1
>>>  - start.sh
>>>  -solr.xml
>>>  - slice1_shard1
>>>   - data
>>>  - slice2_shard2
>>>   -data
>>> ./slice2
>>>  - start.sh
>>>  - solr.xml
>>>  -slice2_shard1
>>>    -data
>>>  -slice1_shard2
>>>    -data
>>>
>>> if it matters I'm running everything from localhost, zk and the solr 
>>> shards
>>>
>>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Doh - looks like I was just seeing a test issue. Do you mind updating and 
trying the latest rev? At the least there should be some better logging around 
the recovery.

I'll keep working on tests in the meantime.

- Mark

On Feb 14, 2012, at 3:15 PM, Jamie Johnson wrote:

> Sounds good, if I pull the latest from trunk and rerun will that be
> useful or were you able to duplicate my issue now?
> 
> On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
>> Okay Jamie, I think I have a handle on this. It looks like an issue with 
>> what config files are being used by cores created with the admin core 
>> handler - I think it's just picking up default config and not the correct 
>> config for the collection. This means they end up using config that has no 
>> UpdateLog defined - and so recovery fails.
>> 
>> I've added more logging around this so that it's easy to determine that.
>> 
>> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
>> soon as well.
>> 
>> - Mark
>> 
>> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>> 
>>> Thanks Mark, not a huge rush, just me trying to get to use the latest
>>> stuff on our project.
>>> 
>>> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
 Sorry, have not gotten it yet, but will be back trying later today - 
 monday, tuesday tend to be slow for me (meetings and crap).
 
 - Mark
 
 On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
 
> Has there been any success in replicating this?  I'm wondering if it
> could be something with my setup that is causing the issue...
> 
> 
> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>> Yes, I have the following layout on the FS
>> 
>> ./bootstrap.sh
>> ./example (standard example directory from distro containing jetty
>> jars, solr confs, solr war, etc)
>> ./slice1
>>  - start.sh
>>  -solr.xml
>>  - slice1_shard1
>>   - data
>>  - slice2_shard2
>>   -data
>> ./slice2
>>  - start.sh
>>  - solr.xml
>>  -slice2_shard1
>>-data
>>  -slice1_shard2
>>-data
>> 
>> if it matters I'm running everything from localhost, zk and the solr 
>> shards
>> 
>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>>> Do you have unique dataDir for each instance?
>>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Sounds good, if I pull the latest from trunk and rerun will that be
useful or were you able to duplicate my issue now?

On Tue, Feb 14, 2012 at 3:00 PM, Mark Miller  wrote:
> Okay Jamie, I think I have a handle on this. It looks like an issue with what 
> config files are being used by cores created with the admin core handler - I 
> think it's just picking up default config and not the correct config for the 
> collection. This means they end up using config that has no UpdateLog defined 
> - and so recovery fails.
>
> I've added more logging around this so that it's easy to determine that.
>
> I'm investigating more and working on a test + fix. I'll file a JIRA issue 
> soon as well.
>
> - Mark
>
> On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:
>
>> Thanks Mark, not a huge rush, just me trying to get to use the latest
>> stuff on our project.
>>
>> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
>>> Sorry, have not gotten it yet, but will be back trying later today - 
>>> monday, tuesday tend to be slow for me (meetings and crap).
>>>
>>> - Mark
>>>
>>> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>>>
 Has there been any success in replicating this?  I'm wondering if it
 could be something with my setup that is causing the issue...


 On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
> Yes, I have the following layout on the FS
>
> ./bootstrap.sh
> ./example (standard example directory from distro containing jetty
> jars, solr confs, solr war, etc)
> ./slice1
>  - start.sh
>  -solr.xml
>  - slice1_shard1
>   - data
>  - slice2_shard2
>   -data
> ./slice2
>  - start.sh
>  - solr.xml
>  -slice2_shard1
>    -data
>  -slice1_shard2
>    -data
>
> if it matters I'm running everything from localhost, zk and the solr 
> shards
>
> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>> Do you have unique dataDir for each instance?
>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Okay Jamie, I think I have a handle on this. It looks like an issue with what 
config files are being used by cores created with the admin core handler - I 
think it's just picking up default config and not the correct config for the 
collection. This means they end up using config that has no UpdateLog defined - 
and so recovery fails.
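
(A quick way to see whether a given config actually defines an update log - 
the path below is only an example, adjust it to wherever the collection's 
solrconfig.xml lives:)

# no output here would mean no <updateLog> is defined in that config
grep -A 2 -i 'updateLog' example/solr/conf/solrconfig.xml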

I've added more logging around this so that it's easy to determine that.

I'm investigating more and working on a test + fix. I'll file a JIRA issue soon 
as well.

- Mark

On Feb 14, 2012, at 11:39 AM, Jamie Johnson wrote:

> Thanks Mark, not a huge rush, just me trying to get to use the latest
> stuff on our project.
> 
> On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
>> Sorry, have not gotten it yet, but will be back trying later today - monday, 
>> tuesday tend to be slow for me (meetings and crap).
>> 
>> - Mark
>> 
>> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>> 
>>> Has there been any success in replicating this?  I'm wondering if it
>>> could be something with my setup that is causing the issue...
>>> 
>>> 
>>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
 Yes, I have the following layout on the FS
 
 ./bootstrap.sh
 ./example (standard example directory from distro containing jetty
 jars, solr confs, solr war, etc)
 ./slice1
  - start.sh
  -solr.xml
  - slice1_shard1
   - data
  - slice2_shard2
   -data
 ./slice2
  - start.sh
  - solr.xml
  -slice2_shard1
-data
  -slice1_shard2
-data
 
 if it matters I'm running everything from localhost, zk and the solr shards
 
 On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
> Do you have unique dataDir for each instance?
> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Thanks Mark, not a huge rush, just me trying to get to use the latest
stuff on our project.

On Tue, Feb 14, 2012 at 10:53 AM, Mark Miller  wrote:
> Sorry, have not gotten it yet, but will be back trying later today - monday, 
> tuesday tend to be slow for me (meetings and crap).
>
> - Mark
>
> On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:
>
>> Has there been any success in replicating this?  I'm wondering if it
>> could be something with my setup that is causing the issue...
>>
>>
>> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>>> Yes, I have the following layout on the FS
>>>
>>> ./bootstrap.sh
>>> ./example (standard example directory from distro containing jetty
>>> jars, solr confs, solr war, etc)
>>> ./slice1
>>>  - start.sh
>>>  -solr.xml
>>>  - slice1_shard1
>>>   - data
>>>  - slice2_shard2
>>>   -data
>>> ./slice2
>>>  - start.sh
>>>  - solr.xml
>>>  -slice2_shard1
>>>    -data
>>>  -slice1_shard2
>>>    -data
>>>
>>> if it matters I'm running everything from localhost, zk and the solr shards
>>>
>>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
 Do you have unique dataDir for each instance?
 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-14 Thread Mark Miller
Sorry, have not gotten it yet, but will be back trying later today - monday, 
tuesday tend to be slow for me (meetings and crap).

- Mark

On Feb 14, 2012, at 9:10 AM, Jamie Johnson wrote:

> Has there been any success in replicating this?  I'm wondering if it
> could be something with my setup that is causing the issue...
> 
> 
> On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
>> Yes, I have the following layout on the FS
>> 
>> ./bootstrap.sh
>> ./example (standard example directory from distro containing jetty
>> jars, solr confs, solr war, etc)
>> ./slice1
>>  - start.sh
>>  -solr.xml
>>  - slice1_shard1
>>   - data
>>  - slice2_shard2
>>   -data
>> ./slice2
>>  - start.sh
>>  - solr.xml
>>  -slice2_shard1
>>-data
>>  -slice1_shard2
>>-data
>> 
>> if it matters I'm running everything from localhost, zk and the solr shards
>> 
>> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>>> Do you have unique dataDir for each instance?
>>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-14 Thread Jamie Johnson
Has there been any success in replicating this?  I'm wondering if it
could be something with my setup that is causing the issue...


On Mon, Feb 13, 2012 at 8:55 AM, Jamie Johnson  wrote:
> Yes, I have the following layout on the FS
>
> ./bootstrap.sh
> ./example (standard example directory from distro containing jetty
> jars, solr confs, solr war, etc)
> ./slice1
>  - start.sh
>  -solr.xml
>  - slice1_shard1
>   - data
>  - slice2_shard2
>   -data
> ./slice2
>  - start.sh
>  - solr.xml
>  -slice2_shard1
>    -data
>  -slice1_shard2
>    -data
>
> if it matters I'm running everything from localhost, zk and the solr shards
>
> On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
>> Do you have unique dataDir for each instance?
>> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:


Re: SolrCloud Replication Question

2012-02-13 Thread Jamie Johnson
Yes, I have the following layout on the FS

./bootstrap.sh
./example (standard example directory from distro containing jetty
jars, solr confs, solr war, etc)
./slice1
  - start.sh
  - solr.xml
  - slice1_shard1
    - data
  - slice2_shard2
    - data
./slice2
  - start.sh
  - solr.xml
  - slice2_shard1
    - data
  - slice1_shard2
    - data

If it matters, I'm running everything from localhost - zk and the solr shards.

On Mon, Feb 13, 2012 at 8:42 AM, Sami Siren  wrote:
> Do you have unique dataDir for each instance?
> 13.2.2012 14.30 "Jamie Johnson"  kirjoitti:


Re: SolrCloud Replication Question

2012-02-13 Thread Sami Siren
Do you have unique dataDir for each instance?
13.2.2012 14.30 "Jamie Johnson"  wrote:


Re: SolrCloud Replication Question

2012-02-13 Thread Jamie Johnson
I don't see any errors in the log. Here are the scripts I'm running (attached),
and to create the cores I run the following commands:

curl 
'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1'
curl 
'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice2_shard2&collection=collection1&shard=slice2&collection.configName=config1'
curl 
'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1'
curl 
'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice1_shard2&collection=collection1&shard=slice1&collection.configName=config1'

After doing this the nodes are immediately marked as down in
clusterstate.json. Restarting the solr instances, I see that whichever
I start first shows up as active, and the other is down. There are no
errors in the logs either.
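
One variant I could try, in case the cores are somehow sharing an index 
directory: give each CREATE call an explicit dataDir (a sketch only - the 
paths are made up, and only the dataDir parameter is new relative to the 
commands above):

curl 'http://localhost:8501/solr/admin/cores?action=CREATE&name=slice1_shard1&collection=collection1&shard=slice1&collection.configName=config1&dataDir=/tmp/solr/slice1_shard1/data'
curl 'http://localhost:8502/solr/admin/cores?action=CREATE&name=slice2_shard1&collection=collection1&shard=slice2&collection.configName=config1&dataDir=/tmp/solr/slice2_shard1/data'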



On Sat, Feb 11, 2012 at 9:48 PM, Mark Miller  wrote:
> Yeah, that is what I would expect - for a node to be marked as down, it 
> either didn't finish starting, or it gave up recovering...either case should 
> be logged. You might try searching for the recover keyword and see if there 
> are any interesting bits around that.
>
> Meanwhile, I have dug up a couple issues around recovery and committed fixes 
> to trunk - still playing around...
>
> On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote:
>
>> I didn't see anything in the logs, would it be an error?
>>
>> On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller  wrote:
>>>
>>> On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote:
>>>
 I wiped the zk and started over (when I switch networks I get
 different host names and honestly haven't dug into why).  That being
 said the latest state shows all in sync, why would the cores show up
 as down?
>>>
>>>
>>> If recovery fails X times (say because the leader can't be reached from the 
>>> replica), a node is marked as down. It can't be active, and technically it 
>>> has stopped trying to recover (it tries X times and eventually give up 
>>> until you restart it).
>>>
>>> Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not 
>>> sure if you have looked at your logs or not, but perhaps it's involved.
>>>
>>> - Mark Miller
>>> lucidimagination.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


bootstrap.sh
Description: Bourne shell script


start.sh
Description: Bourne shell script


start.sh
Description: Bourne shell script


  
  



Re: SolrCloud Replication Question

2012-02-11 Thread Mark Miller
Yeah, that is what I would expect - for a node to be marked as down, it either 
didn't finish starting, or it gave up recovering... either case should be 
logged. You might try searching for the "recover" keyword and see if there are 
any interesting bits around that.
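
Something like this, assuming each instance writes (or redirects) its log to 
its own file - the paths are placeholders for however you start them:

grep -in "recover" slice1/solr.log | tail -n 40
grep -in "recover" slice2/solr.log | tail -n 40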

Meanwhile, I have dug up a couple issues around recovery and committed fixes to 
trunk - still playing around...

On Feb 11, 2012, at 8:44 PM, Jamie Johnson wrote:

> I didn't see anything in the logs, would it be an error?
> 
> On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller  wrote:
>> 
>> On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote:
>> 
>>> I wiped the zk and started over (when I switch networks I get
>>> different host names and honestly haven't dug into why).  That being
>>> said the latest state shows all in sync, why would the cores show up
>>> as down?
>> 
>> 
>> If recovery fails X times (say because the leader can't be reached from the 
>> replica), a node is marked as down. It can't be active, and technically it 
>> has stopped trying to recover (it tries X times and eventually give up until 
>> you restart it).
>> 
>> Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not 
>> sure if you have looked at your logs or not, but perhaps it's involved.
>> 
>> - Mark Miller
>> lucidimagination.com
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-11 Thread Jamie Johnson
I didn't see anything in the logs, would it be an error?

On Sat, Feb 11, 2012 at 3:58 PM, Mark Miller  wrote:
>
> On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote:
>
>> I wiped the zk and started over (when I switch networks I get
>> different host names and honestly haven't dug into why).  That being
>> said the latest state shows all in sync, why would the cores show up
>> as down?
>
>
> If recovery fails X times (say because the leader can't be reached from the 
> replica), a node is marked as down. It can't be active, and technically it 
> has stopped trying to recover (it tries X times and eventually give up until 
> you restart it).
>
> Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not 
> sure if you have looked at your logs or not, but perhaps it's involved.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-11 Thread Mark Miller

On Feb 11, 2012, at 3:08 PM, Jamie Johnson wrote:

> I wiped the zk and started over (when I switch networks I get
> different host names and honestly haven't dug into why).  That being
> said the latest state shows all in sync, why would the cores show up
> as down?


If recovery fails X times (say because the leader can't be reached from the 
replica), a node is marked as down. It can't be active, and technically it has 
stopped trying to recover (it tries X times and eventually gives up until you 
restart it).

Side note, I recently ran into this issue: SOLR-3122 - fix coming soon. Not 
sure if you have looked at your logs or not, but perhaps it's involved.

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-11 Thread Jamie Johnson
I wiped the zk and started over (when I switch networks I get
different host names and honestly haven't dug into why). That being
said, the latest state shows all in sync, so why would the cores show up
as down?

On Sat, Feb 11, 2012 at 11:08 AM, Mark Miller  wrote:
>
> On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote:
>
>>
>>
>> how'd you resolve this issue?
>>
>
>
> I was basing my guess on seeing "JamiesMac.local" and "jamiesmac" in your 
> first cluster state dump - your latest doesn't seem to mismatch like that 
> though.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-11 Thread Mark Miller

On Feb 10, 2012, at 9:40 PM, Jamie Johnson wrote:

> 
> 
> how'd you resolve this issue?
> 


I was basing my guess on seeing "JamiesMac.local" and "jamiesmac" in your first 
cluster state dump - your latest doesn't seem to mismatch like that though.

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Hmm... perhaps I'm seeing the issue you're speaking of. I have
everything running right now and my state is as follows:
{"collection1":{
"slice1":{
  "JamiesMac.local:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"leader":"true",
"state":"active",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"state":"down",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"}},
"slice2":{
  "JamiesMac.local:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"leader":"true",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "JamiesMac.local:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"down",
"core":"slice2_shard2",
"collection":"dataspace",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"


how'd you resolve this issue?


On Fri, Feb 10, 2012 at 8:49 PM, Mark Miller  wrote:
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> jamiesmac
>
> Another note:
>
> Have no idea if this is involved, but when I do tests with my linux box and 
> mac I run into the following:
>
> My linux box auto finds the address of halfmetal and my macbook mbpro.local. 
> If I accept those defaults, my mac connect reach my linux box. It can only 
> reach the linux box through halfmetal.local, and so I have to override the 
> host on the linux box to advertise as halfmetal.local and then they can talk.
>
> In the bad case, if my leaders where on the linux box, they would be able to 
> forward to the mac no problem, but then if shards on the mac needed to 
> recover, they would fail to reach the linux box through the halfmetal address.
>
> - Mark Miller
> lucidimagination.com
>
>
>
>
>
>
>
>
>
>
>


Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller

On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

> jamiesmac

Another note:

Have no idea if this is involved, but when I do tests with my linux box and mac 
I run into the following:

My linux box auto-resolves its address as halfmetal and my macbook as mbpro.local. If 
I accept those defaults, my mac cannot reach my linux box. It can only reach 
the linux box through halfmetal.local, and so I have to override the host on 
the linux box to advertise as halfmetal.local, and then they can talk.

In the bad case, if my leaders were on the linux box, they would be able to 
forward to the mac no problem, but then if shards on the mac needed to recover, 
they would fail to reach the linux box through the halfmetal address.

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Thanks.

If the given ZK snapshot was the end state, then two nodes are marked as
down. Generally that happens because replication failed - if you have not
already, I'd check the logs for those two nodes.

- Mark

On Fri, Feb 10, 2012 at 7:35 PM, Jamie Johnson  wrote:

> nothing seems that different.  In regards to the states of each I'll
> try to verify tonight.
>
> This was using a version I pulled from SVN trunk yesterday morning
>
> On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller 
> wrote:
> > Also, it will help if you can mention the exact version of solrcloud you
> are talking about in each issue - I know you have one from the old branch,
> and I assume a version off trunk you are playing with - so a heads up on
> which and if trunk, what rev or day will help in the case that I'm trying
> to dupe issues that have been addressed.
> >
> > - Mark
> >
> > On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:
> >
> >> I'm trying, but so far I don't see anything. I'll have to try and mimic
> your setup closer it seems.
> >>
> >> I tried starting up 6 solr instances on different ports as 2 shards,
> each with a replication factor of 3.
> >>
> >> Then I indexed 20k documents to the cluster and verified doc counts.
> >>
> >> Then I shutdown all the replicas so that only one instance served each
> shard.
> >>
> >> Then I indexed 20k documents to the cluster.
> >>
> >> Then I started the downed nodes and verified that they where in a
> recovery state.
> >>
> >> After enough time went by I checked and verified document counts on
> each instance - they where as expected.
> >>
> >> I guess next I can try a similar experiment using multiple cores, but
> if you notice anything that stands out that is largely different in what
> you are doing, let me know.
> >>
> >> The cores that are behind, does it say they are down, recovering, or
> active in zookeeper?
> >>
> >> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
> >>
> >>> Sorry for pinging this again, is more information needed on this?  I
> >>> can provide more details but am not sure what to provide.
> >>>
> >>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson 
> wrote:
>  Sorry, I shut down the full solr instance.
> 
>  On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller 
> wrote:
> > Can you explain a little more how you doing this? How are you
> bringing the cores down and then back up? Shutting down a full solr
> instance, unloading the core?
> >
> > On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
> >
> >> I know that the latest Solr Cloud doesn't use standard replication
> but
> >> I have a question about how it appears to be working.  I currently
> >> have the following cluster state
> >>
> >> {"collection1":{
> >>   "slice1":{
> >> "JamiesMac.local:8501_solr_slice1_shard1":{
> >>   "shard_id":"slice1",
> >>   "state":"active",
> >>   "core":"slice1_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8501_solr",
> >>   "base_url":"http://JamiesMac.local:8501/solr"},
> >> "JamiesMac.local:8502_solr_slice1_shard2":{
> >>   "shard_id":"slice1",
> >>   "state":"active",
> >>   "core":"slice1_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8502_solr",
> >>   "base_url":"http://JamiesMac.local:8502/solr"},
> >> "jamiesmac:8501_solr_slice1_shard1":{
> >>   "shard_id":"slice1",
> >>   "state":"down",
> >>   "core":"slice1_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8501_solr",
> >>   "base_url":"http://jamiesmac:8501/solr"},
> >> "jamiesmac:8502_solr_slice1_shard2":{
> >>   "shard_id":"slice1",
> >>   "leader":"true",
> >>   "state":"active",
> >>   "core":"slice1_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8502_solr",
> >>   "base_url":"http://jamiesmac:8502/solr"}},
> >>   "slice2":{
> >> "JamiesMac.local:8501_solr_slice2_shard2":{
> >>   "shard_id":"slice2",
> >>   "state":"active",
> >>   "core":"slice2_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8501_solr",
> >>   "base_url":"http://JamiesMac.local:8501/solr"},
> >> "JamiesMac.local:8502_solr_slice2_shard1":{
> >>   "shard_id":"slice2",
> >>   "state":"active",
> >>   "core":"slice2_shard1",
> >>   "collection":"collection1",
> >>   "node_name":"JamiesMac.local:8502_solr",
> >>   "base_url":"http://JamiesMac.local:8502/solr"},
> >> "jamiesmac:8501_solr_slice2_shard2":{
> >>   "shard_id":"slice2",
> >>   "state":"down",
> >>   "core":"slice2_shard2",
> >>   "collection":"collection1",
> >>   "node_name":"jamiesmac:8501_solr",
> >> 

Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Nothing seems that different. In regards to the states of each, I'll
try to verify tonight.

This was using a version I pulled from SVN trunk yesterday morning

On Fri, Feb 10, 2012 at 6:22 PM, Mark Miller  wrote:
> Also, it will help if you can mention the exact version of solrcloud you are 
> talking about in each issue - I know you have one from the old branch, and I 
> assume a version off trunk you are playing with - so a heads up on which and 
> if trunk, what rev or day will help in the case that I'm trying to dupe 
> issues that have been addressed.
>
> - Mark
>
> On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:
>
>> I'm trying, but so far I don't see anything. I'll have to try and mimic your 
>> setup closer it seems.
>>
>> I tried starting up 6 solr instances on different ports as 2 shards, each 
>> with a replication factor of 3.
>>
>> Then I indexed 20k documents to the cluster and verified doc counts.
>>
>> Then I shutdown all the replicas so that only one instance served each shard.
>>
>> Then I indexed 20k documents to the cluster.
>>
>> Then I started the downed nodes and verified that they where in a recovery 
>> state.
>>
>> After enough time went by I checked and verified document counts on each 
>> instance - they where as expected.
>>
>> I guess next I can try a similar experiment using multiple cores, but if you 
>> notice anything that stands out that is largely different in what you are 
>> doing, let me know.
>>
>> The cores that are behind, does it say they are down, recovering, or active 
>> in zookeeper?
>>
>> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
>>
>>> Sorry for pinging this again, is more information needed on this?  I
>>> can provide more details but am not sure what to provide.
>>>
>>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
 Sorry, I shut down the full solr instance.

 On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
> Can you explain a little more how you doing this? How are you bringing 
> the cores down and then back up? Shutting down a full solr instance, 
> unloading the core?
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> I know that the latest Solr Cloud doesn't use standard replication but
>> I have a question about how it appears to be working.  I currently
>> have the following cluster state
>>
>> {"collection1":{
>>   "slice1":{
>>     "JamiesMac.local:8501_solr_slice1_shard1":{
>>       "shard_id":"slice1",
>>       "state":"active",
>>       "core":"slice1_shard1",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8501_solr",
>>       "base_url":"http://JamiesMac.local:8501/solr"},
>>     "JamiesMac.local:8502_solr_slice1_shard2":{
>>       "shard_id":"slice1",
>>       "state":"active",
>>       "core":"slice1_shard2",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8502_solr",
>>       "base_url":"http://JamiesMac.local:8502/solr"},
>>     "jamiesmac:8501_solr_slice1_shard1":{
>>       "shard_id":"slice1",
>>       "state":"down",
>>       "core":"slice1_shard1",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8501_solr",
>>       "base_url":"http://jamiesmac:8501/solr"},
>>     "jamiesmac:8502_solr_slice1_shard2":{
>>       "shard_id":"slice1",
>>       "leader":"true",
>>       "state":"active",
>>       "core":"slice1_shard2",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8502_solr",
>>       "base_url":"http://jamiesmac:8502/solr"}},
>>   "slice2":{
>>     "JamiesMac.local:8501_solr_slice2_shard2":{
>>       "shard_id":"slice2",
>>       "state":"active",
>>       "core":"slice2_shard2",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8501_solr",
>>       "base_url":"http://JamiesMac.local:8501/solr"},
>>     "JamiesMac.local:8502_solr_slice2_shard1":{
>>       "shard_id":"slice2",
>>       "state":"active",
>>       "core":"slice2_shard1",
>>       "collection":"collection1",
>>       "node_name":"JamiesMac.local:8502_solr",
>>       "base_url":"http://JamiesMac.local:8502/solr"},
>>     "jamiesmac:8501_solr_slice2_shard2":{
>>       "shard_id":"slice2",
>>       "state":"down",
>>       "core":"slice2_shard2",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8501_solr",
>>       "base_url":"http://jamiesmac:8501/solr"},
>>     "jamiesmac:8502_solr_slice2_shard1":{
>>       "shard_id":"slice2",
>>       "leader":"true",
>>       "state":"active",
>>       "core":"slice2_shard1",
>>       "collection":"collection1",
>>       "node_name":"jamiesmac:8502_solr",
>>       "base_url":"http://jamiesmac:8502/solr"
>>
>> I then added some docs to the following shards using Solr

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Also, it will help if you can mention the exact version of solrcloud you are 
talking about in each issue - I know you have one from the old branch, and I 
assume a version off trunk you are playing with - so a heads up on which one, and if 
trunk, what rev or day, will help in case I'm trying to dupe issues 
that have already been addressed.

- Mark

On Feb 10, 2012, at 6:09 PM, Mark Miller wrote:

> I'm trying, but so far I don't see anything. I'll have to try and mimic your 
> setup closer it seems.
> 
> I tried starting up 6 solr instances on different ports as 2 shards, each 
> with a replication factor of 3.
> 
> Then I indexed 20k documents to the cluster and verified doc counts.
> 
> Then I shutdown all the replicas so that only one instance served each shard.
> 
> Then I indexed 20k documents to the cluster.
> 
> Then I started the downed nodes and verified that they where in a recovery 
> state.
> 
> After enough time went by I checked and verified document counts on each 
> instance - they where as expected.
> 
> I guess next I can try a similar experiment using multiple cores, but if you 
> notice anything that stands out that is largely different in what you are 
> doing, let me know.
> 
> The cores that are behind, does it say they are down, recovering, or active 
> in zookeeper?
> 
> On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:
> 
>> Sorry for pinging this again, is more information needed on this?  I
>> can provide more details but am not sure what to provide.
>> 
>> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
>>> Sorry, I shut down the full solr instance.
>>> 
>>> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
 Can you explain a little more how you doing this? How are you bringing the 
 cores down and then back up? Shutting down a full solr instance, unloading 
 the core?
 
 On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
 
> I know that the latest Solr Cloud doesn't use standard replication but
> I have a question about how it appears to be working.  I currently
> have the following cluster state
> 
> {"collection1":{
>   "slice1":{
> "JamiesMac.local:8501_solr_slice1_shard1":{
>   "shard_id":"slice1",
>   "state":"active",
>   "core":"slice1_shard1",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8501_solr",
>   "base_url":"http://JamiesMac.local:8501/solr"},
> "JamiesMac.local:8502_solr_slice1_shard2":{
>   "shard_id":"slice1",
>   "state":"active",
>   "core":"slice1_shard2",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8502_solr",
>   "base_url":"http://JamiesMac.local:8502/solr"},
> "jamiesmac:8501_solr_slice1_shard1":{
>   "shard_id":"slice1",
>   "state":"down",
>   "core":"slice1_shard1",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8501_solr",
>   "base_url":"http://jamiesmac:8501/solr"},
> "jamiesmac:8502_solr_slice1_shard2":{
>   "shard_id":"slice1",
>   "leader":"true",
>   "state":"active",
>   "core":"slice1_shard2",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8502_solr",
>   "base_url":"http://jamiesmac:8502/solr"}},
>   "slice2":{
> "JamiesMac.local:8501_solr_slice2_shard2":{
>   "shard_id":"slice2",
>   "state":"active",
>   "core":"slice2_shard2",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8501_solr",
>   "base_url":"http://JamiesMac.local:8501/solr"},
> "JamiesMac.local:8502_solr_slice2_shard1":{
>   "shard_id":"slice2",
>   "state":"active",
>   "core":"slice2_shard1",
>   "collection":"collection1",
>   "node_name":"JamiesMac.local:8502_solr",
>   "base_url":"http://JamiesMac.local:8502/solr"},
> "jamiesmac:8501_solr_slice2_shard2":{
>   "shard_id":"slice2",
>   "state":"down",
>   "core":"slice2_shard2",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8501_solr",
>   "base_url":"http://jamiesmac:8501/solr"},
> "jamiesmac:8502_solr_slice2_shard1":{
>   "shard_id":"slice2",
>   "leader":"true",
>   "state":"active",
>   "core":"slice2_shard1",
>   "collection":"collection1",
>   "node_name":"jamiesmac:8502_solr",
>   "base_url":"http://jamiesmac:8502/solr"
> 
> I then added some docs to the following shards using SolrJ
> http://localhost:8502/solr/slice2_shard1
> http://localhost:8502/solr/slice1_shard2
> 
> I then bring back up the other cores and I don't see replication
> happening.  Looking at the stats for each core I see that on the 8501
> instance (the instance that was off) the number of docs is 0, so I

Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
I'm trying, but so far I don't see anything. I'll have to try and mimic your 
setup closer it seems.

I tried starting up 6 solr instances on different ports as 2 shards, each with 
a replication factor of 3.

Then I indexed 20k documents to the cluster and verified doc counts.

Then I shutdown all the replicas so that only one instance served each shard.

Then I indexed 20k documents to the cluster.

Then I started the downed nodes and verified that they were in a recovery 
state.

After enough time went by I checked and verified document counts on each 
instance - they were as expected.

I guess next I can try a similar experiment using multiple cores, but if you 
notice anything that stands out that is largely different in what you are 
doing, let me know.

The cores that are behind, does it say they are down, recovering, or active in 
zookeeper?

On Feb 10, 2012, at 4:48 PM, Jamie Johnson wrote:

> Sorry for pinging this again, is more information needed on this?  I
> can provide more details but am not sure what to provide.
> 
> On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
>> Sorry, I shut down the full solr instance.
>> 
>> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
>>> Can you explain a little more how you doing this? How are you bringing the 
>>> cores down and then back up? Shutting down a full solr instance, unloading 
>>> the core?
>>> 
>>> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>>> 
 I know that the latest Solr Cloud doesn't use standard replication but
 I have a question about how it appears to be working.  I currently
 have the following cluster state
 
 {"collection1":{
"slice1":{
  "JamiesMac.local:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice1_shard1":{
"shard_id":"slice1",
"state":"down",
"core":"slice1_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice1_shard2":{
"shard_id":"slice1",
"leader":"true",
"state":"active",
"core":"slice1_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"}},
"slice2":{
  "JamiesMac.local:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"JamiesMac.local:8501_solr",
"base_url":"http://JamiesMac.local:8501/solr"},
  "JamiesMac.local:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"JamiesMac.local:8502_solr",
"base_url":"http://JamiesMac.local:8502/solr"},
  "jamiesmac:8501_solr_slice2_shard2":{
"shard_id":"slice2",
"state":"down",
"core":"slice2_shard2",
"collection":"collection1",
"node_name":"jamiesmac:8501_solr",
"base_url":"http://jamiesmac:8501/solr"},
  "jamiesmac:8502_solr_slice2_shard1":{
"shard_id":"slice2",
"leader":"true",
"state":"active",
"core":"slice2_shard1",
"collection":"collection1",
"node_name":"jamiesmac:8502_solr",
"base_url":"http://jamiesmac:8502/solr"
 
 I then added some docs to the following shards using SolrJ
 http://localhost:8502/solr/slice2_shard1
 http://localhost:8502/solr/slice1_shard2
 
 I then bring back up the other cores and I don't see replication
 happening.  Looking at the stats for each core I see that on the 8501
 instance (the instance that was off) the number of docs is 0, so I've
 clearly set something up incorrectly.  Any help on this would be
 greatly appreciated.
>>> 
>>> - Mark Miller
>>> lucidimagination.com
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 

- Mark Miller
lucidimagination.com













Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry for pinging this again, is more information needed on this?  I
can provide more details but am not sure what to provide.

On Fri, Feb 10, 2012 at 10:26 AM, Jamie Johnson  wrote:
> Sorry, I shut down the full solr instance.
>
> On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
>> Can you explain a little more how you doing this? How are you bringing the 
>> cores down and then back up? Shutting down a full solr instance, unloading 
>> the core?
>>
>> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>>
>>> I know that the latest Solr Cloud doesn't use standard replication but
>>> I have a question about how it appears to be working.  I currently
>>> have the following cluster state
>>>
>>> {"collection1":{
>>>    "slice1":{
>>>      "JamiesMac.local:8501_solr_slice1_shard1":{
>>>        "shard_id":"slice1",
>>>        "state":"active",
>>>        "core":"slice1_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8501_solr",
>>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>>      "JamiesMac.local:8502_solr_slice1_shard2":{
>>>        "shard_id":"slice1",
>>>        "state":"active",
>>>        "core":"slice1_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8502_solr",
>>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>>      "jamiesmac:8501_solr_slice1_shard1":{
>>>        "shard_id":"slice1",
>>>        "state":"down",
>>>        "core":"slice1_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8501_solr",
>>>        "base_url":"http://jamiesmac:8501/solr"},
>>>      "jamiesmac:8502_solr_slice1_shard2":{
>>>        "shard_id":"slice1",
>>>        "leader":"true",
>>>        "state":"active",
>>>        "core":"slice1_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8502_solr",
>>>        "base_url":"http://jamiesmac:8502/solr"}},
>>>    "slice2":{
>>>      "JamiesMac.local:8501_solr_slice2_shard2":{
>>>        "shard_id":"slice2",
>>>        "state":"active",
>>>        "core":"slice2_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8501_solr",
>>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>>      "JamiesMac.local:8502_solr_slice2_shard1":{
>>>        "shard_id":"slice2",
>>>        "state":"active",
>>>        "core":"slice2_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"JamiesMac.local:8502_solr",
>>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>>      "jamiesmac:8501_solr_slice2_shard2":{
>>>        "shard_id":"slice2",
>>>        "state":"down",
>>>        "core":"slice2_shard2",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8501_solr",
>>>        "base_url":"http://jamiesmac:8501/solr"},
>>>      "jamiesmac:8502_solr_slice2_shard1":{
>>>        "shard_id":"slice2",
>>>        "leader":"true",
>>>        "state":"active",
>>>        "core":"slice2_shard1",
>>>        "collection":"collection1",
>>>        "node_name":"jamiesmac:8502_solr",
>>>        "base_url":"http://jamiesmac:8502/solr"}}}}
>>>
>>> I then added some docs to the following shards using SolrJ
>>> http://localhost:8502/solr/slice2_shard1
>>> http://localhost:8502/solr/slice1_shard2
>>>
>>> I then bring back up the other cores and I don't see replication
>>> happening.  Looking at the stats for each core I see that on the 8501
>>> instance (the instance that was off) the number of docs is 0, so I've
>>> clearly set something up incorrectly.  Any help on this would be
>>> greatly appreciated.
>>
>> - Mark Miller
>> lucidimagination.com
>>


Re: SolrCloud Replication Question

2012-02-10 Thread Jamie Johnson
Sorry, I shut down the full solr instance.

On Fri, Feb 10, 2012 at 9:42 AM, Mark Miller  wrote:
> Can you explain a little more how you're doing this? How are you bringing the
> cores down and then back up? Shutting down a full solr instance, or unloading
> the core?
>
> On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:
>
>> I know that the latest Solr Cloud doesn't use standard replication but
>> I have a question about how it appears to be working.  I currently
>> have the following cluster state
>>
>> {"collection1":{
>>    "slice1":{
>>      "JamiesMac.local:8501_solr_slice1_shard1":{
>>        "shard_id":"slice1",
>>        "state":"active",
>>        "core":"slice1_shard1",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8501_solr",
>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>      "JamiesMac.local:8502_solr_slice1_shard2":{
>>        "shard_id":"slice1",
>>        "state":"active",
>>        "core":"slice1_shard2",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8502_solr",
>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>      "jamiesmac:8501_solr_slice1_shard1":{
>>        "shard_id":"slice1",
>>        "state":"down",
>>        "core":"slice1_shard1",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8501_solr",
>>        "base_url":"http://jamiesmac:8501/solr"},
>>      "jamiesmac:8502_solr_slice1_shard2":{
>>        "shard_id":"slice1",
>>        "leader":"true",
>>        "state":"active",
>>        "core":"slice1_shard2",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8502_solr",
>>        "base_url":"http://jamiesmac:8502/solr"}},
>>    "slice2":{
>>      "JamiesMac.local:8501_solr_slice2_shard2":{
>>        "shard_id":"slice2",
>>        "state":"active",
>>        "core":"slice2_shard2",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8501_solr",
>>        "base_url":"http://JamiesMac.local:8501/solr"},
>>      "JamiesMac.local:8502_solr_slice2_shard1":{
>>        "shard_id":"slice2",
>>        "state":"active",
>>        "core":"slice2_shard1",
>>        "collection":"collection1",
>>        "node_name":"JamiesMac.local:8502_solr",
>>        "base_url":"http://JamiesMac.local:8502/solr"},
>>      "jamiesmac:8501_solr_slice2_shard2":{
>>        "shard_id":"slice2",
>>        "state":"down",
>>        "core":"slice2_shard2",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8501_solr",
>>        "base_url":"http://jamiesmac:8501/solr"},
>>      "jamiesmac:8502_solr_slice2_shard1":{
>>        "shard_id":"slice2",
>>        "leader":"true",
>>        "state":"active",
>>        "core":"slice2_shard1",
>>        "collection":"collection1",
>>        "node_name":"jamiesmac:8502_solr",
>>        "base_url":"http://jamiesmac:8502/solr"}}}}
>>
>> I then added some docs to the following shards using SolrJ
>> http://localhost:8502/solr/slice2_shard1
>> http://localhost:8502/solr/slice1_shard2
>>
>> I then bring back up the other cores and I don't see replication
>> happening.  Looking at the stats for each core I see that on the 8501
>> instance (the instance that was off) the number of docs is 0, so I've
>> clearly set something up incorrectly.  Any help on this would be
>> greatly appreciated.
>
> - Mark Miller
> lucidimagination.com
>
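
An aside on the indexing path being discussed: rather than targeting individual core
URLs, updates can be sent through SolrJ's ZooKeeper-aware client, which reads the
cluster state and routes documents to a live node itself. A rough sketch, assuming a
ZooKeeper ensemble at localhost:2181 (not stated in the thread) and the collection name
shown in the cluster state; the field values are illustrative only.

    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CloudIndexer {
        public static void main(String[] args) throws Exception {
            // The client watches clusterstate.json in ZooKeeper and picks a live node to send to.
            CloudSolrServer server = new CloudSolrServer("localhost:2181");
            server.setDefaultCollection("collection1");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-2");           // assumes the schema's uniqueKey is "id"
            doc.addField("name", "another test");  // illustrative field only

            server.add(doc);    // the update ends up on the shard leader and its live replicas
            server.commit();
            server.shutdown();
        }
    }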


Re: SolrCloud Replication Question

2012-02-10 Thread Mark Miller
Can you explain a little more how you're doing this? How are you bringing the
cores down and then back up? Shutting down a full solr instance, or unloading the
core?

On Feb 10, 2012, at 9:33 AM, Jamie Johnson wrote:

> I know that the latest Solr Cloud doesn't use standard replication but
> I have a question about how it appears to be working.  I currently
> have the following cluster state
> 
> {"collection1":{
>"slice1":{
>  "JamiesMac.local:8501_solr_slice1_shard1":{
>"shard_id":"slice1",
>"state":"active",
>"core":"slice1_shard1",
>"collection":"collection1",
>"node_name":"JamiesMac.local:8501_solr",
>"base_url":"http://JamiesMac.local:8501/solr"},
>  "JamiesMac.local:8502_solr_slice1_shard2":{
>"shard_id":"slice1",
>"state":"active",
>"core":"slice1_shard2",
>"collection":"collection1",
>"node_name":"JamiesMac.local:8502_solr",
>"base_url":"http://JamiesMac.local:8502/solr"},
>  "jamiesmac:8501_solr_slice1_shard1":{
>"shard_id":"slice1",
>"state":"down",
>"core":"slice1_shard1",
>"collection":"collection1",
>"node_name":"jamiesmac:8501_solr",
>"base_url":"http://jamiesmac:8501/solr"},
>  "jamiesmac:8502_solr_slice1_shard2":{
>"shard_id":"slice1",
>"leader":"true",
>"state":"active",
>"core":"slice1_shard2",
>"collection":"collection1",
>"node_name":"jamiesmac:8502_solr",
>"base_url":"http://jamiesmac:8502/solr"}},
>"slice2":{
>  "JamiesMac.local:8501_solr_slice2_shard2":{
>"shard_id":"slice2",
>"state":"active",
>"core":"slice2_shard2",
>"collection":"collection1",
>"node_name":"JamiesMac.local:8501_solr",
>"base_url":"http://JamiesMac.local:8501/solr"},
>  "JamiesMac.local:8502_solr_slice2_shard1":{
>"shard_id":"slice2",
>"state":"active",
>"core":"slice2_shard1",
>"collection":"collection1",
>"node_name":"JamiesMac.local:8502_solr",
>"base_url":"http://JamiesMac.local:8502/solr"},
>  "jamiesmac:8501_solr_slice2_shard2":{
>"shard_id":"slice2",
>"state":"down",
>"core":"slice2_shard2",
>"collection":"collection1",
>"node_name":"jamiesmac:8501_solr",
>"base_url":"http://jamiesmac:8501/solr"},
>  "jamiesmac:8502_solr_slice2_shard1":{
>"shard_id":"slice2",
>"leader":"true",
>"state":"active",
>"core":"slice2_shard1",
>"collection":"collection1",
>"node_name":"jamiesmac:8502_solr",
>    "base_url":"http://jamiesmac:8502/solr"}}}}
> 
> I then added some docs to the following shards using SolrJ
> http://localhost:8502/solr/slice2_shard1
> http://localhost:8502/solr/slice1_shard2
> 
> I then bring back up the other cores and I don't see replication
> happening.  Looking at the stats for each core I see that on the 8501
> instance (the instance that was off) the number of docs is 0, so I've
> clearly set something up incorrectly.  Any help on this would be
> greatly appreciated.

- Mark Miller
lucidimagination.com
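
One way to check whether the restarted cores ever caught up is to query each core with
distributed search disabled, so the count reflects only that core's local index. A rough
sketch, again assuming the 4.x-era SolrJ API (HttpSolrServer) and that the cores listed
in the cluster state are reachable on localhost at the ports used in this thread.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class PerCoreDocCount {
        public static void main(String[] args) throws Exception {
            String[] coreUrls = {
                "http://localhost:8501/solr/slice1_shard1",
                "http://localhost:8501/solr/slice2_shard2",
                "http://localhost:8502/solr/slice1_shard2",
                "http://localhost:8502/solr/slice2_shard1"
            };
            for (String url : coreUrls) {
                HttpSolrServer server = new HttpSolrServer(url);
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);               // only the count is needed, not the documents
                q.set("distrib", "false");  // restrict the query to this core's local index
                QueryResponse rsp = server.query(q);
                System.out.println(url + " -> " + rsp.getResults().getNumFound() + " docs");
                server.shutdown();          // release the underlying HTTP client
            }
        }
    }

If the counts on the 8501 cores stay at zero after they rejoin, recovery from the
leaders is not happening, and the logs on that instance are the next place to look.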