Re: index size with replication

2012-03-15 Thread Erick Erickson
Or just ignore it if you have the disk space. The files will be cleaned up
eventually. I believe they'll magically disappear if you simply bounce the
server (but I work on *nix, so I can't personally guarantee it). And
replication won't replicate the stale files, so that's not a problem either.

Best
Erick

On Wed, Mar 14, 2012 at 11:54 PM, Mike Austin mike.aus...@juggle.com wrote:




Re: index size with replication

2012-03-15 Thread Mike Austin
The problem is that when replicating, the double-size index gets replicated
to the slaves. I am now doing a dummy commit, always with the same document,
and it works fine. After the optimize-and-dummy-commit process I just end up
with numDocs = x and maxDocs = x+1. I don't get the nice green checkmark
in the admin interface, but I can live with that.
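
For anyone who wants to script that sequence, here is a minimal SolrJ sketch
of the optimize-plus-dummy-commit workaround. The master URL, the sentinel
document id, and the field name are placeholders, not taken from Mike's
actual setup:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OptimizeThenDummyCommit {
    public static void main(String[] args) throws Exception {
        // Hypothetical master URL; adjust for your environment.
        SolrServer master = new CommonsHttpSolrServer("http://localhost:8983/solr");

        master.optimize();  // on Windows, the old files linger after this

        // Re-add the same sentinel document every time, then commit. The
        // new searcher that the commit opens lets Solr delete the stale
        // files, leaving numDocs = x and maxDocs = x + 1.
        SolrInputDocument dummy = new SolrInputDocument();
        dummy.addField("id", "dummy-doc");  // "id" assumed to be the unique key
        master.add(dummy);
        master.commit();
    }
}

(CommonsHttpSolrServer is the SolrJ client class of that era; later 4.x
builds rename it HttpSolrServer.)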

mike

On Thu, Mar 15, 2012 at 8:17 AM, Erick Erickson erickerick...@gmail.com wrote:



Re: index size with replication

2012-03-15 Thread Walter Underwood
No, the deleted files do not get replicated. Instead, the slaves do the same 
thing as the master, holding on to the deleted files after the new files are 
copied over.

The optimize is obsoleting all of your index files, so maybe you should quit
doing that. Without an optimize, the deleted files will be much smaller, and
so will the replicated files. Once in a while the automatic merging will
rebuild the largest files, but that will happen much less often.

You need free disk space equal to the index size anyway, to handle a full 
reindex or replicating a full reindex. So provide the free space and stop 
worrying about this.

Shawn is right that Unix handles file deletion more gracefully, but it
doesn't make a lot of difference here: even after the files are unlinked,
the open files still use disk blocks.

wunder
Search Guy, Chegg

On Mar 15, 2012, at 8:54 AM, Mike Austin wrote:








Re: index size with replication

2012-03-14 Thread Mike Austin
The odd thing is that if I optimize the index, it doubles in size. If I then
add one more document to the index, it goes back down to half size.

Is there a way to force this without needing to wait until another document
is added? Or do you have more information on what you think is going on?
I'm using a trunk build of Solr 4 from 9/12/2011, with a master and two
slaves. Everything besides this is working great!

Thanks,
Mike

On Tue, Mar 13, 2012 at 9:32 PM, Li Li fancye...@gmail.com wrote:

 



Re: index size with replication

2012-03-14 Thread Ahmet Arslan

 Another note: if I reload the Solr app, the index goes back down in size.

 Here are my replication settings on the master:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">startup</str>
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">optimize</str>
     <int name="numberToKeep">1</int>
     <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
     <str name="commitReserveDuration">00:00:30</str>
   </lst>
 </requestHandler>

Could it be https://issues.apache.org/jira/browse/SOLR-3033 ?




RE: index size with replication

2012-03-14 Thread Dyer, James
SOLR-3033 relates to the ReplicationHandler's ability to do backups. It lets
you specify how many backups you want to keep. You don't seem to have any
backups configured here, so it is not an applicable parameter. (Note that
SOLR-3033 was committed to trunk recently, but the config param was renamed
maxNumberOfBackups ... see http://wiki.apache.org/solr/SolrReplication#Master )

I can only take a wild guess at why you see the temporary increase in index
size. Could it be that something is locking the old segment files so they do
not get deleted on optimize? Then maybe they are subsequently getting cleaned
up at your next commit and restart?

Finally, keep in mind that optimizes aren't generally recommended anymore.
Everyone's situation is different, but if you have good settings for
mergeFactor and ramBufferSizeMB, then an optimize is (probably) not going
to do anything helpful.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Wednesday, March 14, 2012 4:25 PM
To: solr-user@lucene.apache.org
Subject: Re: index size with replication






Re: index size with replication

2012-03-14 Thread Mike Austin
Thanks. I might just remove the optimize. I had it planned for once a week,
but maybe I'll just run it and restart the app if performance slows.


On Wed, Mar 14, 2012 at 4:37 PM, Dyer, James james.d...@ingrambook.com wrote:






Re: index size with replication

2012-03-14 Thread Shawn Heisey

On 3/14/2012 2:54 PM, Mike Austin wrote:



The not-very-helpful-but-true answer: Don't run on Windows.  I checked 
your prior messages to the list to verify that this is your 
environment.  If you can control index updates so they don't happen at 
the same time as your optimizes, you can also get around this problem by 
doing the optimize twice.  You would have to be absolutely sure that no 
changes are made to the index between the two optimizes, so the second 
one basically doesn't do anything except take care of the deletes.
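
A minimal SolrJ sketch of that double-optimize workaround, under the
assumption stated above (nothing else writes to the index between the two
calls); the master URL is a placeholder:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DoubleOptimize {
    public static void main(String[] args) throws Exception {
        // Hypothetical master URL; adjust for your environment.
        SolrServer master = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // First optimize: rewrites the index into one segment; on Windows
        // the old segment files can't be deleted while the old reader
        // still holds them open.
        master.optimize();

        // Second optimize: the index is already one segment, so this is
        // cheap; it mostly just gives Solr another chance to delete the
        // stale files left behind by the first pass.
        master.optimize();
    }
}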


Nuts and bolts of why this happens: Solr keeps the old files open so the 
existing reader can continue to serve queries.  That reader will not be 
closed until the last query completes, which may not happen until well 
after the time the new reader is completely online and ready.  I assume 
that the delete attempt occurs as soon as the new index segments are 
completely online, before the old reader begins to close.  I've not read 
the source code to find out.


On Linux and other UNIX-like environments, you can delete files while 
they are open by a process.  They continue to exist as in-memory links 
and take up space until those processes close them, at which point they 
are truly gone.  On Windows, an attempt to delete an open file will 
fail, even if it's open read-only.
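
The platform difference is easy to demonstrate outside of Solr; here is a
toy Java sketch (not Solr code) that deletes a file while a stream still
holds it open:

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class DeleteWhileOpen {
    public static void main(String[] args) throws Exception {
        File f = new File("segment.tmp");
        FileOutputStream out = new FileOutputStream(f);
        out.write(new byte[1024]);
        out.close();

        FileInputStream in = new FileInputStream(f);  // hold the file open

        // On Linux/UNIX this prints true: the name is unlinked at once,
        // but the disk blocks stay allocated until 'in' is closed. On
        // Windows it typically prints false, because java.io opens files
        // without FILE_SHARE_DELETE, so the open handle blocks the delete.
        System.out.println("deleted while open? " + f.delete());

        in.close();
        f.delete();  // clean up in case the first delete failed (Windows)
    }
}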


There are probably a number of ways that this problem could be solved 
for Windows platforms.  The simplest that I can think of, assuming it's 
even possible, would be to wait until the old reader is closed before 
attempting the segment deletion.  That may not be possible - the 
information may not be available to the portion of code that does the 
deletion.  There are a few things standing in the way of me fixing this 
problem myself: 1) I'm a beginning Java programmer.  2) I'm not familiar 
with the Solr code at all. 3) My interest level is low because I run on 
Linux, not Windows.


Thanks,
Shawn



Re: index size with replication

2012-03-14 Thread Mike Austin
Shawn,

Thanks for the detailed answer! I will play around with this information in
hand. Maybe a second optimize, or just a dummy commit after the optimize,
will help get me past this. Neither is the best option, but maybe it's a
do-it-because-it's-running-on-Windows workaround. If it is indeed a
file-locking issue, I think I can work around it, since my indexing is
scheduled at certain times rather than live: I could try the optimize again
soon after, or do a single commit, which also seems to fix the issue. Or
just not optimize.

Thanks,
Mike

On Wed, Mar 14, 2012 at 6:34 PM, Shawn Heisey s...@elyograg.org wrote:





index size with replication

2012-03-13 Thread Mike Austin
I have a master with two slaves. For some reason, if I do an optimize after
indexing on the master, the index doubles in size from 42 MB to 90 MB.
However, when the slaves replicate, they get the 42 MB index.

Should the master and slaves always be the same size?

Thanks,
Mike


Re: index size with replication

2012-03-13 Thread Li Li
Optimize will generate new segments and delete the old ones. If your master
also serves searches during indexing, the old files may still be held open
by an old SolrIndexSearcher; they will be deleted later. So while indexing,
the index size may double, but a moment later the old index files will be
deleted.

On Wed, Mar 14, 2012 at 7:06 AM, Mike Austin mike.aus...@juggle.com wrote:
