Re: Two instances of solr - the same datadir?

2013-07-03 Thread Roman Chyla
I spent a lot of time over the past day playing with this setup and
finally made it work. Here are a few bits of interest:

- Solr 4.0
- linux, java7, local filesystem
- big index, 1 RW instance + 2 RO instances (sharing the same index)


A lock is acquired while Solr is writing data - if you happen to be
starting your RO instance at that moment and you are using the 'native'
lock, the start will fail. However, when using the RW instance with the
'native' lock and the 2 RO instances with the 'single' lock, the RO
instances can start, but they will eventually get into trouble too - our
index is big, so when a core RELOAD is called while indexing is under way,
the RO instances time out.
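
A minimal sketch of the failure mode above, assuming the 'native' lock type
is backed by OS-level file locks taken through java.nio on a lock file in
the index directory. This is plain JDK code, not Solr or Lucene code, and
the class and method names are hypothetical. It shows a second attempt
being refused while the first lock is held, which is roughly what an RO
instance runs into when it starts during a write:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only; not part of Solr or Lucene.
final class NativeLockSketch {

    // Tries to take an exclusive OS-level lock on lockFile without blocking.
    // Returns the held lock, or null if the lock is already taken.
    static FileLock tryAcquire(Path lockFile) {
        FileChannel channel = null;
        try {
            channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
            // tryLock() returns null when another process holds the lock.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                channel.close();
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            // The lock is already held inside this JVM.
            // Caveat: on some platforms closing any channel on the file can
            // drop locks this process holds on it, so real code should keep
            // a single channel per lock file.
            closeQuietly(channel);
            return null;
        } catch (IOException e) {
            closeQuietly(channel);
            throw new UncheckedIOException(e);
        }
    }

    // Releases the lock and closes the channel it was acquired on.
    static void release(FileLock lock) {
        try {
            lock.release();
            lock.channel().close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void closeQuietly(FileChannel channel) {
        if (channel == null) {
            return;
        }
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }
}
```

Calling tryAcquire twice on the same path without a release in between
returns a lock the first time and null the second time.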

Core reload, when using the 'native' lock, seems to work fine - if you were
lucky and all instances managed to start. HOWEVER, the core is
unresponsive until fully loaded (makes sense), and this is actually
terrible - your search is gone for seconds/minutes.

The best setup is as described in my original post - RO instances MUST NOT
commit anything, nor use reload (because during reload Solr tries to
acquire the lock). Instead, they should just reopen the searcher - I repeat:
you should make sure that nothing is ever going to write on the RO
instance. And because there is no public API for reopening the searcher, I
wrote a simple handler which just calls:

req.getCore().getSearcher(true, false, null, false);

When called, the RO instances continue to handle requests using the old
searcher while the new one warms in the background; once ready, the new
searcher takes over [to repeat: I am triggering this refresh from the RW
instance, it does 'curl http://foo/solr/myhandler?command=reopenSearcher']
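
The hand-over described above (the old searcher keeps serving while the new
one warms, then the new one takes over) can be modelled in a few lines of
plain Java. This is a toy model under stated assumptions, not Solr's
implementation: SearcherSwapSketch and its methods are hypothetical names,
and the "searcher" can be any object.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Toy model for illustration only; not Solr code.
final class SearcherSwapSketch<S> {

    // The searcher that currently answers requests.
    private final AtomicReference<S> current;

    // Single background thread that opens and warms the replacement.
    private final ExecutorService warmer =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "searcher-warmer");
                t.setDaemon(true);
                return t;
            });

    SearcherSwapSketch(S initial) {
        this.current = new AtomicReference<>(initial);
    }

    // Request threads call this; it never blocks on warming.
    S current() {
        return current.get();
    }

    // Opens and warms a new searcher in the background and swaps it in
    // only once it is ready. The returned future completes after the swap.
    CompletableFuture<Void> reopen(Supplier<S> openAndWarm) {
        return CompletableFuture.supplyAsync(openAndWarm, warmer)
                .thenAccept(current::set);
    }

    void shutdown() {
        warmer.shutdown();
    }
}
```

The point of the design is that readers only ever see a fully warmed
searcher: until the future returned by reopen completes, current() keeps
returning the old one.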


The bad thing: when an RO instance dies (e.g. OOM error) while the RW is
in the middle of writing data, you can't restart the RO instance (unless
you use the 'single' lock or some other lock type).

HTH,

  roman




On Tue, Jul 2, 2013 at 5:35 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Wouldn't it be better to do a RELOAD?
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD
>
> Michael Della Bitta
>
> Applications Developer
>
> o: +1 646 532 3062  | c: +1 917 477 7906
>
> appinions inc.
>
> “The Science of Influence Marketing”
>
> 18 East 41st Street
>
> New York, NY 10017
>
> t: @appinions  | g+:
> plus.google.com/appinions
> w: appinions.com 
>
>
> On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge 
> wrote:
>
> > The RO instance commit isn't (or shouldn't be) doing any real writing,
> just
> > an empty commit to force new searchers, autowarm/refresh caches etc.
> > Admittedly, we do all this on 3.6, so 4.0 could have different behaviour
> in
> > this area.
> > As long as you don't have autocommit in solrconfig.xml, there wouldn't be
> > any commits 'behind the scenes' (we do all our commits via a local solrj
> > client so it can be fully managed).
> > The only caveat might be NRT/soft commits, but I'm not too familiar with
> > this in 4.0.
> > In any case, your RO instance must be getting updated somehow, otherwise
> > how would it know your write instance made any changes?
> > Perhaps your write instance notifies the RO instance externally from
> Solr?
> > (a perfectly valid approach, and one that would allow a 'single' lock to
> > work without contention)
> >
> >
> >
> > On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla 
> wrote:
> >
> > > Interesting, we are running 4.0 - and solr will refuse the start (or
> > > reload) the core. But from looking at the code I am not seeing it is
> > doing
> > > any writing - but I should digg more...
> > >
> > > Are you sure it needs to do writing? Because I am not calling commits,
> in
> > > fact I have deactivated *all* components that write into index, so
> unless
> > > there is something deep inside, which automatically calls the commit,
> it
> > > should never happen.
> > >
> > > roman
> > >
> > >
> > > On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge 
> > > wrote:
> > >
> > > > Hmmm, single lock sounds dangerous. It probably works ok because
> you've
> > > > been [un]lucky.
> > > > For example, even with a RO instance, you still need to do a commit
> in
> > > > order to reload caches/changes from the other instance.
> > > > What happens if this commit gets called in the middle of the other
> > > > instance's commit? I've not tested this scenario, but it's very
> > possible
> > > > with a 'single' lock the results are indeterminate.
> > > > If the 'single' lock mechanism is making assumptions e.g. no other
> > > process
> > > > will interfere, and then one does, the Lucene index could very well
> get
> > > > corrupted.
> > > >
> > > > For the error you're seeing using 'native', we use native lockType
> for
> > > both
> > > > write and RO instances, and it works fine - no contention.
> > > > Which version of Solr are you using? Perhaps there's been a change in
> > > > behaviour?
> > > >
> > > > Peter
> > > >
> > > >
> > > > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla 
> > > wrote

Re: Two instances of solr - the same datadir?

2013-07-03 Thread Peter Sturge
You can do a reload, yes, but a commit() is considerably faster.


On Tue, Jul 2, 2013 at 10:35 PM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Wouldn't it be better to do a RELOAD?
>
> http://wiki.apache.org/solr/CoreAdmin#RELOAD

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Michael Della Bitta
Wouldn't it be better to do a RELOAD?

http://wiki.apache.org/solr/CoreAdmin#RELOAD

Michael Della Bitta


On Tue, Jul 2, 2013 at 5:05 PM, Peter Sturge  wrote:

> The RO instance commit isn't (or shouldn't be) doing any real writing, just
> an empty commit to force new searchers, autowarm/refresh caches etc.
> Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in
> this area.
> As long as you don't have autocommit in solrconfig.xml, there wouldn't be
> any commits 'behind the scenes' (we do all our commits via a local solrj
> client so it can be fully managed).
> The only caveat might be NRT/soft commits, but I'm not too familiar with
> this in 4.0.
> In any case, your RO instance must be getting updated somehow, otherwise
> how would it know your write instance made any changes?
> Perhaps your write instance notifies the RO instance externally from Solr?
> (a perfectly valid approach, and one that would allow a 'single' lock to
> work without contention)
>
>
>
> On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla  wrote:
>
> > Interesting, we are running 4.0 - and solr will refuse the start (or
> > reload) the core. But from looking at the code I am not seeing it is
> doing
> > any writing - but I should digg more...
> >
> > Are you sure it needs to do writing? Because I am not calling commits, in
> > fact I have deactivated *all* components that write into index, so unless
> > there is something deep inside, which automatically calls the commit, it
> > should never happen.
> >
> > roman
> >
> >
> > On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge 
> > wrote:
> >
> > > Hmmm, single lock sounds dangerous. It probably works ok because you've
> > > been [un]lucky.
> > > For example, even with a RO instance, you still need to do a commit in
> > > order to reload caches/changes from the other instance.
> > > What happens if this commit gets called in the middle of the other
> > > instance's commit? I've not tested this scenario, but it's very
> possible
> > > with a 'single' lock the results are indeterminate.
> > > If the 'single' lock mechanism is making assumptions e.g. no other
> > process
> > > will interfere, and then one does, the Lucene index could very well get
> > > corrupted.
> > >
> > > For the error you're seeing using 'native', we use native lockType for
> > both
> > > write and RO instances, and it works fine - no contention.
> > > Which version of Solr are you using? Perhaps there's been a change in
> > > behaviour?
> > >
> > > Peter
> > >
> > >
> > > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla 
> > wrote:
> > >
> > > > as i discovered, it is not good to use 'native' locktype in this
> > > scenario,
> > > > actually there is a note in the solrconfig.xml which says the same
> > > >
> > > > when a core is reloaded and solr tries to grab lock, it will fail -
> > even
> > > if
> > > > the instance is configured to be read-only, so i am using 'single'
> lock
> > > for
> > > > the readers and 'native' for the writer, which seems to work OK
> > > >
> > > > roman
> > > >
> > > >
> > > > On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla 
> > > wrote:
> > > >
> > > > > I have auto commit after 40k RECs/1800secs. But I only tested with
> > > manual
> > > > > commit, but I don't see why it should work differently.
> > > > > Roman
> > > > > On 7 Jun 2013 20:52, "Tim Vaillancourt" 
> > wrote:
> > > > >
> > > > >> If it makes you feel better, I also considered this approach when
> I
> > > was
> > > > in
> > > > >> the same situation with a separate indexer and searcher on one
> > > Physical
> > > > >> linux machine.
> > > > >>
> > > > >> My main concern was "re-using" the FS cache between both
> instances -
> > > If
> > > > I
> > > > >> replicated to myself there would be two independent copies of the
> > > index,
> > > > >> FS-cached separately.
> > > > >>
> > > > >> I like the suggestion of using autoCommit to reload the index. If
> > I'm
> > > > >> reading that right, you'd set an autoCommit on 'zero docs
> changing',
> > > or
> > > > >> just 'every N seconds'? Did that work?
> > > > >>
> > > > >> Best of luck!
> > > > >>
> > > > >> Tim
> > > > >>
> > > > >>
> > > > >> On 5 June 2013 10:19, Roman Chyla  wrote:
> > > > >>
> > > > >> > So here it is for a record how I am solving it right now:
> > > > >> >
> > > > >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> > > > >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> > > > >> > http://localhost:5005
> > > > >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> > > > >> > -Dmontysolr.write.master=false
> > > > >> >
> > > > >> >
> > > > >> > solrconfig.xml changes:
> > > > >> >
> > > > >> > 1. all index changing components have t

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Peter Sturge
The RO instance commit isn't (or shouldn't be) doing any real writing, just
an empty commit to force new searchers, autowarm/refresh caches etc.
Admittedly, we do all this on 3.6, so 4.0 could have different behaviour in
this area.
As long as you don't have autocommit in solrconfig.xml, there wouldn't be
any commits 'behind the scenes' (we do all our commits via a local solrj
client so it can be fully managed).
The only caveat might be NRT/soft commits, but I'm not too familiar with
this in 4.0.
In any case, your RO instance must be getting updated somehow, otherwise
how would it know your write instance made any changes?
Perhaps your write instance notifies the RO instance externally from Solr?
(a perfectly valid approach, and one that would allow a 'single' lock to
work without contention)
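
A minimal sketch of such an external notification from the write side,
using the JDK's HttpURLConnection. The handler path and command parameter
are the ones Roman reports using in this thread
(/solr/myhandler?command=reopenSearcher); the class name, method names and
timeout values are assumptions for illustration only.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration only; not part of Solr.
final class ReopenTrigger {

    // Builds the refresh URL for a read-only instance, e.g. "http://foo".
    static String reopenUrl(String readOnlyBase) {
        String base = readOnlyBase.endsWith("/")
                ? readOnlyBase.substring(0, readOnlyBase.length() - 1)
                : readOnlyBase;
        return base + "/solr/myhandler?command=reopenSearcher";
    }

    // Sends the GET request and returns the HTTP status code.
    static int trigger(String readOnlyBase) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(reopenUrl(readOnlyBase)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(2000);   // assumed value
            conn.setReadTimeout(10000);     // assumed value
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

The write side would call ReopenTrigger.trigger("http://foo") once per
read-only instance after its own commit has finished.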



On Tue, Jul 2, 2013 at 7:59 PM, Roman Chyla  wrote:

> Interesting, we are running 4.0 - and solr will refuse the start (or
> reload) the core. But from looking at the code I am not seeing it is doing
> any writing - but I should digg more...
>
> Are you sure it needs to do writing? Because I am not calling commits, in
> fact I have deactivated *all* components that write into index, so unless
> there is something deep inside, which automatically calls the commit, it
> should never happen.
>
> roman
>
>
> On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge 
> wrote:
>
> > Hmmm, single lock sounds dangerous. It probably works ok because you've
> > been [un]lucky.
> > For example, even with a RO instance, you still need to do a commit in
> > order to reload caches/changes from the other instance.
> > What happens if this commit gets called in the middle of the other
> > instance's commit? I've not tested this scenario, but it's very possible
> > with a 'single' lock the results are indeterminate.
> > If the 'single' lock mechanism is making assumptions e.g. no other
> process
> > will interfere, and then one does, the Lucene index could very well get
> > corrupted.
> >
> > For the error you're seeing using 'native', we use native lockType for
> both
> > write and RO instances, and it works fine - no contention.
> > Which version of Solr are you using? Perhaps there's been a change in
> > behaviour?
> >
> > Peter
> >
> >
> > On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla 
> wrote:
> >
> > > as i discovered, it is not good to use 'native' locktype in this
> > scenario,
> > > actually there is a note in the solrconfig.xml which says the same
> > >
> > > when a core is reloaded and solr tries to grab lock, it will fail -
> even
> > if
> > > the instance is configured to be read-only, so i am using 'single' lock
> > for
> > > the readers and 'native' for the writer, which seems to work OK
> > >
> > > roman
> > >
> > >
> > > On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla 
> > wrote:
> > >
> > > > I have auto commit after 40k RECs/1800secs. But I only tested with
> > manual
> > > > commit, but I don't see why it should work differently.
> > > > Roman
> > > > On 7 Jun 2013 20:52, "Tim Vaillancourt" 
> wrote:
> > > >
> > > >> If it makes you feel better, I also considered this approach when I
> > was
> > > in
> > > >> the same situation with a separate indexer and searcher on one
> > Physical
> > > >> linux machine.
> > > >>
> > > >> My main concern was "re-using" the FS cache between both instances -
> > If
> > > I
> > > >> replicated to myself there would be two independent copies of the
> > index,
> > > >> FS-cached separately.
> > > >>
> > > >> I like the suggestion of using autoCommit to reload the index. If
> I'm
> > > >> reading that right, you'd set an autoCommit on 'zero docs changing',
> > or
> > > >> just 'every N seconds'? Did that work?
> > > >>
> > > >> Best of luck!
> > > >>
> > > >> Tim
> > > >>
> > > >>
> > > >> On 5 June 2013 10:19, Roman Chyla  wrote:
> > > >>
> > > >> > So here it is for a record how I am solving it right now:
> > > >> >
> > > >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> > > >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> > > >> > http://localhost:5005
> > > >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> > > >> > -Dmontysolr.write.master=false
> > > >> >
> > > >> >
> > > >> > solrconfig.xml changes:
> > > >> >
> > > >> > 1. all index changing components have this bit,
> > > >> > enable="${montysolr.master:true}" - ie.
> > > >> >
> > > >> >  > > >> >  enable="${montysolr.master:true}">
> > > >> >
> > > >> > 2. for cache warming de/activation
> > > >> >
> > > >> >  > > >> >   class="solr.QuerySenderListener"
> > > >> >   enable="${montysolr.enable.warming:true}">...
> > > >> >
> > > >> > 3. to trigger refresh of the read-only-master (from write-master):
> > > >> >
> > > >> >  > > >> >   class="solr.RunExecutableListener"
> > > >> >   enable="${montysolr.master:true}">
> > > >> >   curl
> > > >> >   .
> > > >> >   false
> > > >> >${montysolr.read.master:
> > http://localhost
> > > >> >
> > 

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
Interesting, we are running 4.0 - and Solr will refuse to start (or
reload) the core. But from looking at the code, I don't see it doing
any writing - but I should dig more...

Are you sure it needs to do writing? Because I am not calling commits; in
fact I have deactivated *all* components that write into the index, so
unless there is something deep inside which automatically calls commit, it
should never happen.

roman


On Tue, Jul 2, 2013 at 2:54 PM, Peter Sturge  wrote:

> Hmmm, single lock sounds dangerous. It probably works ok because you've
> been [un]lucky.
> For example, even with a RO instance, you still need to do a commit in
> order to reload caches/changes from the other instance.
> What happens if this commit gets called in the middle of the other
> instance's commit? I've not tested this scenario, but it's very possible
> with a 'single' lock the results are indeterminate.
> If the 'single' lock mechanism is making assumptions e.g. no other process
> will interfere, and then one does, the Lucene index could very well get
> corrupted.
>
> For the error you're seeing using 'native', we use native lockType for both
> write and RO instances, and it works fine - no contention.
> Which version of Solr are you using? Perhaps there's been a change in
> behaviour?
>
> Peter
>
>
> On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla  wrote:
>
> > as i discovered, it is not good to use 'native' locktype in this
> scenario,
> > actually there is a note in the solrconfig.xml which says the same
> >
> > when a core is reloaded and solr tries to grab lock, it will fail - even
> if
> > the instance is configured to be read-only, so i am using 'single' lock
> for
> > the readers and 'native' for the writer, which seems to work OK
> >
> > roman
> >
> >
> > On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla 
> wrote:
> >
> > > I have auto commit after 40k RECs/1800secs. But I only tested with
> manual
> > > commit, but I don't see why it should work differently.
> > > Roman
> > > On 7 Jun 2013 20:52, "Tim Vaillancourt"  wrote:
> > >
> > >> If it makes you feel better, I also considered this approach when I
> was
> > in
> > >> the same situation with a separate indexer and searcher on one
> Physical
> > >> linux machine.
> > >>
> > >> My main concern was "re-using" the FS cache between both instances -
> If
> > I
> > >> replicated to myself there would be two independent copies of the
> index,
> > >> FS-cached separately.
> > >>
> > >> I like the suggestion of using autoCommit to reload the index. If I'm
> > >> reading that right, you'd set an autoCommit on 'zero docs changing',
> or
> > >> just 'every N seconds'? Did that work?
> > >>
> > >> Best of luck!
> > >>
> > >> Tim
> > >>
> > >>
> > >> On 5 June 2013 10:19, Roman Chyla  wrote:
> > >>
> > >> > So here it is for a record how I am solving it right now:
> > >> >
> > >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> > >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> > >> > http://localhost:5005
> > >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> > >> > -Dmontysolr.write.master=false
> > >> >
> > >> >
> > >> > solrconfig.xml changes:
> > >> >
> > >> > 1. all index changing components have this bit,
> > >> > enable="${montysolr.master:true}" - ie.
> > >> >
> > >> >  > >> >  enable="${montysolr.master:true}">
> > >> >
> > >> > 2. for cache warming de/activation
> > >> >
> > >> >  > >> >   class="solr.QuerySenderListener"
> > >> >   enable="${montysolr.enable.warming:true}">...
> > >> >
> > >> > 3. to trigger refresh of the read-only-master (from write-master):
> > >> >
> > >> >  > >> >   class="solr.RunExecutableListener"
> > >> >   enable="${montysolr.master:true}">
> > >> >   curl
> > >> >   .
> > >> >   false
> > >> >${montysolr.read.master:
> http://localhost
> > >> >
> > >> >
> > >>
> >
> }/solr/admin/cores?wt=json&action=RELOAD&core=collection1
> > >> > 
> > >> >
> > >> > This works, I still don't like the reload of the whole core, but it
> > >> seems
> > >> > like the easiest thing to do now.
> > >> >
> > >> > -- roman
> > >> >
> > >> >
> > >> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla  >
> > >> > wrote:
> > >> >
> > >> > > Hi Peter,
> > >> > >
> > >> > > Thank you, I am glad to read that this usecase is not alien.
> > >> > >
> > >> > > I'd like to make the second instance (searcher) completely
> > read-only,
> > >> so
> > >> > I
> > >> > > have disabled all the components that can write.
> > >> > >
> > >> > > (being lazy ;)) I'll probably use
> > >> > > http://wiki.apache.org/solr/CollectionDistribution to call the
> curl
> > >> > after
> > >> > > commit, or write some IndexReaderFactory that checks for changes
> > >> > >
> > >> > > The problem with calling the 'core reload' - is that it seems lots
> > of
> > >> > work
> > >> > > for just opening a new searcher, eeekkk...somewhere I read that it
> > is
> > >> > cheap
> > >> > > to reload a c

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Peter Sturge
Hmmm, single lock sounds dangerous. It probably works ok because you've
been [un]lucky.
For example, even with a RO instance, you still need to do a commit in
order to reload caches/changes from the other instance.
What happens if this commit gets called in the middle of the other
instance's commit? I've not tested this scenario, but it's very possible
with a 'single' lock the results are indeterminate.
If the 'single' lock mechanism is making assumptions e.g. no other process
will interfere, and then one does, the Lucene index could very well get
corrupted.
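
The concern above can be illustrated with a toy model, assuming the
'single' lock type keeps its lock state only in the memory of one process.
This is not Lucene's code and InMemoryLockSketch is a hypothetical name;
each object below stands in for one JVM.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model for illustration only: a lock registry that lives purely in
// the memory of one process.
final class InMemoryLockSketch {

    private final Set<String> held = new HashSet<>();

    // Returns true if the named lock was free inside this process.
    synchronized boolean obtain(String name) {
        return held.add(name);
    }

    synchronized void release(String name) {
        held.remove(name);
    }
}
```

Two separate InMemoryLockSketch objects, standing in for two processes, can
both obtain "write.lock" at the same time, because neither can see the
other's state. A second obtain on the same object is refused.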

For the error you're seeing using 'native', we use native lockType for both
write and RO instances, and it works fine - no contention.
Which version of Solr are you using? Perhaps there's been a change in
behaviour?

Peter


On Tue, Jul 2, 2013 at 7:30 PM, Roman Chyla  wrote:

> as i discovered, it is not good to use 'native' locktype in this scenario,
> actually there is a note in the solrconfig.xml which says the same
>
> when a core is reloaded and solr tries to grab lock, it will fail - even if
> the instance is configured to be read-only, so i am using 'single' lock for
> the readers and 'native' for the writer, which seems to work OK
>
> roman
>
>
> On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla  wrote:
>
> > I have auto commit after 40k RECs/1800secs. But I only tested with manual
> > commit, but I don't see why it should work differently.
> > Roman
> > On 7 Jun 2013 20:52, "Tim Vaillancourt"  wrote:
> >
> >> If it makes you feel better, I also considered this approach when I was
> in
> >> the same situation with a separate indexer and searcher on one Physical
> >> linux machine.
> >>
> >> My main concern was "re-using" the FS cache between both instances - If
> I
> >> replicated to myself there would be two independent copies of the index,
> >> FS-cached separately.
> >>
> >> I like the suggestion of using autoCommit to reload the index. If I'm
> >> reading that right, you'd set an autoCommit on 'zero docs changing', or
> >> just 'every N seconds'? Did that work?
> >>
> >> Best of luck!
> >>
> >> Tim
> >>
> >>
> >> On 5 June 2013 10:19, Roman Chyla  wrote:
> >>
> >> > So here it is for a record how I am solving it right now:
> >> >
> >> > Write-master is started with: -Dmontysolr.warming.enabled=false
> >> > -Dmontysolr.write.master=true -Dmontysolr.read.master=
> >> > http://localhost:5005
> >> > Read-master is started with: -Dmontysolr.warming.enabled=true
> >> > -Dmontysolr.write.master=false
> >> >
> >> >
> >> > solrconfig.xml changes:
> >> >
> >> > 1. all index changing components have this bit,
> >> > enable="${montysolr.master:true}" - ie.
> >> >
> >> >  >> >  enable="${montysolr.master:true}">
> >> >
> >> > 2. for cache warming de/activation
> >> >
> >> >  >> >   class="solr.QuerySenderListener"
> >> >   enable="${montysolr.enable.warming:true}">...
> >> >
> >> > 3. to trigger refresh of the read-only-master (from write-master):
> >> >
> >> >  >> >   class="solr.RunExecutableListener"
> >> >   enable="${montysolr.master:true}">
> >> >   curl
> >> >   .
> >> >   false
> >> >${montysolr.read.master:http://localhost
> >> >
> >> >
> >>
> }/solr/admin/cores?wt=json&action=RELOAD&core=collection1
> >> > 
> >> >
> >> > This works, I still don't like the reload of the whole core, but it
> >> seems
> >> > like the easiest thing to do now.
> >> >
> >> > -- roman
> >> >
> >> >
> >> > On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla 
> >> > wrote:
> >> >
> >> > > Hi Peter,
> >> > >
> >> > > Thank you, I am glad to read that this usecase is not alien.
> >> > >
> >> > > I'd like to make the second instance (searcher) completely
> read-only,
> >> so
> >> > I
> >> > > have disabled all the components that can write.
> >> > >
> >> > > (being lazy ;)) I'll probably use
> >> > > http://wiki.apache.org/solr/CollectionDistribution to call the curl
> >> > after
> >> > > commit, or write some IndexReaderFactory that checks for changes
> >> > >
> >> > > The problem with calling the 'core reload' - is that it seems lots
> of
> >> > work
> >> > > for just opening a new searcher, eeekkk...somewhere I read that it
> is
> >> > cheap
> >> > > to reload a core, but re-opening the index searches must be
> definitely
> >> > > cheaper...
> >> > >
> >> > > roman
> >> > >
> >> > >
> >> > > On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge <
> peter.stu...@gmail.com
> >> > >wrote:
> >> > >
> >> > >> Hi,
> >> > >> We use this very same scenario to great effect - 2 instances using
> >> the
> >> > >> same
> >> > >> dataDir with many cores - 1 is a writer (no caching), the other is
> a
> >> > >> searcher (lots of caching).
> >> > >> To get the searcher to see the index changes from the writer, you
> >> need
> >> > the
> >> > >> searcher to do an empty commit - i.e. you invoke a commit with 0
> >> > >> documents.
> >> > >> This will refresh the caches (including autowarming), [re]build the
> >> > >> relevant searchers etc. and ma

Re: Two instances of solr - the same datadir?

2013-07-02 Thread Roman Chyla
As I discovered, it is not good to use the 'native' lockType in this
scenario; there is actually a note in solrconfig.xml which says the same.

When a core is reloaded and Solr tries to grab the lock, it fails - even if
the instance is configured to be read-only - so I am using the 'single'
lock for the readers and 'native' for the writer, which seems to work OK.

roman


On Fri, Jun 7, 2013 at 9:05 PM, Roman Chyla  wrote:

> I have auto commit after 40k RECs/1800secs. But I only tested with manual
> commit, but I don't see why it should work differently.
> Roman

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Roman Chyla
I have auto commit after 40k RECs/1800secs. I have only tested with manual
commit, but I don't see why auto commit should work differently.
Roman
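
For reference, an autoCommit with those thresholds could be written roughly like this. A sketch only: the element names are the standard solrconfig.xml ones, maxTime is given in milliseconds (1800 s = 1800000 ms), and the updateHandler class is assumed to be the default one.

```xml
<!-- solrconfig.xml (sketch): autoCommit after 40k docs or 1800 seconds -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>40000</maxDocs>    <!-- commit once 40k documents are pending -->
    <maxTime>1800000</maxTime>  <!-- or after 1800 s; the value is in milliseconds -->
  </autoCommit>
</updateHandler>
```

Whichever threshold is reached first triggers the commit.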
On 7 Jun 2013 20:52, "Tim Vaillancourt"  wrote:

> If it makes you feel better, I also considered this approach when I was in
> the same situation with a separate indexer and searcher on one Physical
> linux machine.
>
> My main concern was "re-using" the FS cache between both instances - If I
> replicated to myself there would be two independent copies of the index,
> FS-cached separately.
>
> I like the suggestion of using autoCommit to reload the index. If I'm
> reading that right, you'd set an autoCommit on 'zero docs changing', or
> just 'every N seconds'? Did that work?
>
> Best of luck!
>
> Tim

Re: Two instances of solr - the same datadir?

2013-06-07 Thread Tim Vaillancourt
If it makes you feel better, I also considered this approach when I was in
the same situation with a separate indexer and searcher on one physical
Linux machine.

My main concern was "re-using" the FS cache between both instances - if I
replicated to myself there would be two independent copies of the index,
FS-cached separately.

I like the suggestion of using autoCommit to reload the index. If I'm
reading that right, you'd set an autoCommit on 'zero docs changing', or
just 'every N seconds'? Did that work?

Best of luck!

Tim


On 5 June 2013 10:19, Roman Chyla  wrote:

> So here it is for a record how I am solving it right now:
>
> Write-master is started with: -Dmontysolr.warming.enabled=false
> -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
> Read-master is started with: -Dmontysolr.warming.enabled=true
> -Dmontysolr.write.master=false
>
>
> solrconfig.xml changes:
>
> 1. all index changing components have this bit,
> enable="${montysolr.master:true}" - ie.
>
> <... enable="${montysolr.master:true}">
>
> 2. for cache warming de/activation
>
> <... class="solr.QuerySenderListener"
>   enable="${montysolr.enable.warming:true}">...
>
> 3. to trigger refresh of the read-only-master (from write-master):
>
> <... class="solr.RunExecutableListener"
>   enable="${montysolr.master:true}">
>   curl
>   .
>   false
>   ${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&action=RELOAD&core=collection1
> 
>
> This works, I still don't like the reload of the whole core, but it seems
> like the easiest thing to do now.
>
> -- roman
Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
So here it is, for the record, how I am solving it right now:

Write-master is started with: -Dmontysolr.warming.enabled=false
-Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005
Read-master is started with: -Dmontysolr.warming.enabled=true
-Dmontysolr.write.master=false


solrconfig.xml changes:

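1. all index changing components have this bit,
enable="${montysolr.master:true}" - ie.

<... enable="${montysolr.master:true}">

2. for cache warming de/activation

<... class="solr.QuerySenderListener"
  enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only-master (from write-master):

<... class="solr.RunExecutableListener"
  enable="${montysolr.master:true}">
  curl
  .
  false
  ${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&action=RELOAD&core=collection1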


This works, I still don't like the reload of the whole core, but it seems
like the easiest thing to do now.
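
Written out in full, change 3 would look roughly like the block below. It is a sketch: the postCommit event and the str/bool/arr element names follow the RunExecutableListener example in the stock solrconfig.xml and are assumptions here; only the listener class, the enable switch, the curl / . / false values and the URL come from the config above. Inside XML the & characters have to be escaped as &amp;.

```xml
<!-- solrconfig.xml on the write-master (sketch; event and element names assumed) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit"
            class="solr.RunExecutableListener"
            enable="${montysolr.master:true}">
    <str name="exe">curl</str>       <!-- program to run after each commit -->
    <str name="dir">.</str>          <!-- working directory -->
    <bool name="wait">false</bool>   <!-- do not block the commit on curl -->
    <arr name="args">
      <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str>
    </arr>
  </listener>
</updateHandler>
```

One thing to watch: a ${...} placeholder is resolved by exact property name, so the -D flags and the placeholders need to agree. The flags above say montysolr.write.master and montysolr.warming.enabled while the config says montysolr.master and montysolr.enable.warming; if that is not just a slip in these notes, one side needs adjusting, otherwise the defaults apply.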

-- roman


On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla  wrote:

> Hi Peter,
>
> Thank you, I am glad to read that this usecase is not alien.
>
> I'd like to make the second instance (searcher) completely read-only, so I
> have disabled all the components that can write.
>
> (being lazy ;)) I'll probably use
> http://wiki.apache.org/solr/CollectionDistribution to call the curl after
> commit, or write some IndexReaderFactory that checks for changes
>
> The problem with calling the 'core reload' - is that it seems lots of work
> for just opening a new searcher, eeekkk...somewhere I read that it is cheap
> to reload a core, but re-opening the index searches must be definitely
> cheaper...
>
> roman

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Roman Chyla
Hi Peter,

Thank you, I am glad to read that this usecase is not alien.

I'd like to make the second instance (searcher) completely read-only, so I
have disabled all the components that can write.

(being lazy ;)) I'll probably use
http://wiki.apache.org/solr/CollectionDistribution to call the curl after
commit, or write some IndexReaderFactory that checks for changes

The problem with calling 'core reload' is that it seems like a lot of work
for just opening a new searcher, eeekkk... somewhere I read that it is cheap
to reload a core, but re-opening the index searcher must definitely be
cheaper...

roman


On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge  wrote:

> Hi,
> We use this very same scenario to great effect - 2 instances using the same
> dataDir with many cores - 1 is a writer (no caching), the other is a
> searcher (lots of caching).
> To get the searcher to see the index changes from the writer, you need the
> searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
> This will refresh the caches (including autowarming), [re]build the
> relevant searchers etc. and make any index changes visible to the RO
> instance.
> Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
> ensure the two instances don't try to commit at the same time.
> There are several ways to trigger a commit:
> Call commit() periodically within your own code.
> Use autoCommit in solrconfig.xml.
> Use an RPC/IPC mechanism between the 2 instance processes to tell the
> searcher the index has changed, then call commit when called (more complex
> coding, but good if the index changes on an ad-hoc basis).
> Note, doing things this way isn't really suitable for an NRT environment.
>
> HTH,
> Peter

Re: Two instances of solr - the same datadir?

2013-06-05 Thread Peter Sturge
Hi,
We use this very same scenario to great effect - 2 instances using the same
dataDir with many cores - 1 is a writer (no caching), the other is a
searcher (lots of caching).
To get the searcher to see the index changes from the writer, you need the
searcher to do an empty commit - i.e. you invoke a commit with 0 documents.
This will refresh the caches (including autowarming), [re]build the
relevant searchers etc. and make any index changes visible to the RO
instance.
Also, make sure to use <lockType>native</lockType> in solrconfig.xml to
ensure the two instances don't try to commit at the same time.
There are several ways to trigger a commit:
- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the
  searcher the index has changed, then call commit when called (more complex
  coding, but good if the index changes on an ad-hoc basis).
Note, doing things this way isn't really suitable for an NRT environment.

HTH,
Peter



On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla  wrote:

> Replication is fine, I am going to use it, but I wanted it for instances
> *distributed* across several (physical) machines - but here I have one
> physical machine, it has many cores. I want to run 2 instances of solr
> because I think it has these benefits:
>
> 1) I can give less RAM to the writer (4GB), and use more RAM for the
> searcher (28GB)
> 2) I can deactivate warming for the writer and keep it for the searcher
> (this considerably speeds up indexing - each time we commit, the server is
> rebuilding a citation network of 80M edges)
> 3) saving disk space and better OS caching (OS should be able to use more
> RAM for the caching, which should result in faster operations - the two
> processes are accessing the same index)
>
> Maybe I should just forget it and go with the replication, but it doesn't
> 'feel right' IFF it is on the same physical machine. And Lucene
> specifically has a method for discovering changes and re-opening the index
> (DirectoryReader.openIfChanged)
>
> Am I not seeing something?
>
> roman


Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Replication is fine, I am going to use it, but I wanted it for instances
*distributed* across several (physical) machines - but here I have one
physical machine, it has many cores. I want to run 2 instances of solr
because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the
searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher
(this considerably speeds up indexing - each time we commit, the server is
rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (OS should be able to use more
RAM for the caching, which should result in faster operations - the two
processes are accessing the same index)

Maybe I should just forget it and go with the replication, but it doesn't
'feel right' IFF it is on the same physical machine. And Lucene
specifically has a method for discovering changes and re-opening the index
(DirectoryReader.openIfChanged)

Am I not seeing something?

roman



On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman <
jhell...@innoventsolutions.com> wrote:

> Roman,
>
> Could you be more specific as to why replication doesn't meet your
> requirements?  It was geared explicitly for this purpose, including the
> automatic discovery of changes to the data on the index master.
>
> Jason


Re: Two instances of solr - the same datadir?

2013-06-04 Thread Jason Hellman
Roman,

Could you be more specific as to why replication doesn't meet your 
requirements?  It was geared explicitly for this purpose, including the 
automatic discovery of changes to the data on the index master.  

Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla  wrote:

> OK, so I have verified the two instances can run alongside, sharing the
> same datadir
> 
> All update handlers are unaccessible in the read-only master
> 
> <... enable="${solr.can.write:true}">
>
> java -Dsolr.can.write=false .
>
> And I can reload the index manually:
>
> curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"
> 
> But this is not an ideal solution; I'd like for the read-only server to
> discover index changes on its own. Any pointers?
> 
> Thanks,
> 
>  roman
> 
> 



Re: Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
OK, so I have verified that the two instances can run alongside each other,
sharing the same datadir.

All update handlers are inaccessible in the read-only master:

 enable="${solr.can.write:true}">

java -Dsolr.can.write=false .

And I can reload the index manually:

curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like for the read-only server to
discover index changes on its own. Any pointers?

Thanks,

  roman
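
One way the read-only side could notice new commits without a manual curl
is a small external poller. The following is only a sketch under stated
assumptions, not something taken from this thread: INDEX_DIR is a
hypothetical path, and the helper names (latest_generation, reload_core,
watch) are made up for illustration. It relies on Lucene writing a new
segments_N file for each commit, with N being the commit generation in
base 36, and then issues the same RELOAD request as the curl command
above.

```python
import os
import time
import urllib.request

# Assumption: hypothetical location of the shared index directory.
INDEX_DIR = "/path/to/solr/collection1/data/index"
# The core RELOAD URL quoted in the message above.
RELOAD_URL = ("http://localhost:5005/solr/admin/cores"
              "?wt=json&action=RELOAD&core=collection1")


def latest_generation(index_dir):
    """Return the highest commit generation found in index_dir, or -1.

    Lucene names each commit point segments_N, where N is the
    generation encoded in base 36.
    """
    best = -1
    for name in os.listdir(index_dir):
        if not name.startswith("segments_"):
            continue  # skips segments.gen and every other index file
        try:
            gen = int(name[len("segments_"):], 36)
        except ValueError:
            continue  # not a plain segments_N file name
        best = max(best, gen)
    return best


def reload_core(url=RELOAD_URL, timeout=300):
    """Send the RELOAD request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def watch(index_dir=INDEX_DIR, interval=5.0, on_change=reload_core):
    """Poll index_dir forever; call on_change after each new commit."""
    seen = latest_generation(index_dir)
    while True:
        time.sleep(interval)
        current = latest_generation(index_dir)
        if current > seen:
            seen = current
            on_change()
```

Running watch() next to the read-only instance would trigger a reload
shortly after each commit by the indexing instance. Polling keeps the
sketch dependency-free; it does not guard against picking up a commit
that is still being written, and the on_change hook can be pointed at
any other refresh mechanism instead of RELOAD.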




Two instances of solr - the same datadir?

2013-06-04 Thread Roman Chyla
Hello,

I need your expert advice. I am thinking about running two instances of
Solr that share the same data directory. The *reason* being: the indexing
instance is constantly building its cache after every commit (we have a big
cache) and this slows it down. But indexing doesn't need much RAM, only
search does (and the server has lots of CPUs).

So, it is like having two Solr instances:

1. solr-indexing-master
2. solr-read-only-master

In solrconfig.xml I can disable the update components; that should be fine.
However, I don't know how to 'trigger' index re-opening on (2) after a
commit happens on (1).

Ideally, the second instance could monitor the disk and re-open the index
after new files appear there. Do I have to implement a custom
IndexReaderFactory? Or something else?

Please note: I know about replication; this use case is IMHO slightly
different. In fact, the indexing master (1) is also a replication master.

Googling turned up only this:
http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no
pointers there.

But if I am approaching the problem wrongly, please don't hesitate to
're-educate' me :)

Thanks!

  roman