Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Wed, Apr 7, 2010 at 3:27 PM, Earwin Burrfoot ear...@gmail.com wrote:
 No, this doesn't make sense.  The OS detects a disk full on accepting
 the write into the write cache, not [later] on flushing the write
 cache to disk.  If the OS accepts the write, then disk is not full (ie
 flushing the cache will succeed, unless some other not-disk-full
 problem happens).

 Hmmm, at least, normally.  What OS/IO system were you on when you saw
 corruption due to disk full when fsync is disabled?

 I'm still skeptical that disk full even with fsync disabled can lead
 to corruption... I'd like to see some concrete proof :)

 Linux 2.6.30-1-amd64, ext3, simple scsi drive

Hm.  Linux should detect disk full on the initial write.

 I checked with our resident DB brainiac, he says such things are possible.

 Okay, I'm not 100% sure this is the cause of my corruptions. It just happened
 that when the index got corrupted, disk space was also used up - several 
 times.
 I had that silent-fail-to-write theory and checked it with some knowledgeable
 people. Even if they are right, I can be mistaken and the root cause
 is different.

OK... if you get a more concrete case where disk full causes
corruption when you disable fsync, please post details back.  From
what I understand this should never happen.

 You're mixing up terminology a bit here -- you can't hold on to the
 latest commit then switch to it.  A commit (as sent to the deletion
 policy) means a *real* commit (ie, IW.commit or IW.close was called).
 So I think your BG thread would simply be calling IW.commit every N
 seconds?
 By "hold on to" I meant keep from being deleted, like SnapshotDP does.

But, IW doesn't let you hold on to checkpoints... only to commits.

Ie SnapshotDP will only see actual commit/close calls, not
intermediate checkpoints like a random segment merge completing, a
flush happening, etc.

Or... maybe you would in fact call commit frequently from the main
threads (but with fsync disabled), and then your DP holds onto these
fake commits, periodically picking one of them to do the real
 fsync'ing?

 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 I see... well binding durability & visibility will always be costly.
 This is why Lucene decouples them (by making NRT readers available).
 My experiments do the same, essentially.

 But after I understood that to perform deletions IW has to load term indexes
 anyway, I'm almost ready to give up and go for intertwined IW/IR mess :)

Hey if you really think it's a mess, post a patch that cleans it up :)

 BTW, if you know your OS/IO system always persists cached writes w/in
 N seconds, a safe way to avoid fsync is to use a by-time expiring
 deletion policy.  Ie, a commit stays alive as long as its age is less
 than X... DP's unit test has such a policy.  But you better really
 know for sure that the OS/IO system guarantee that :)
 Yeah. I thought of it, but it is even more shady :)

I agree.  And even if you know you're on Linux, and that your pdflush
flushes after X seconds, you still have the IO system to contend with.

Best to stick with fsync, commit only for safety as needed by the app,
and use NRT for fast visibility.

Mike

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: Getting fsync out of the loop

2010-04-08 Thread Earwin Burrfoot
 But, IW doesn't let you hold on to checkpoints... only to commits.

 Ie SnapshotDP will only see actual commit/close calls, not
 intermediate checkpoints like a random segment merge completing, a
 flush happening, etc.

 Or... maybe you would in fact call commit frequently from the main
 threads (but with fsync disabled), and then your DP holds onto these
 fake commits, periodically picking one of them to do the real
 fsync'ing?
Yeah, that's exactly what I tried to describe in my initial post :)

 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 I see... well binding durability & visibility will always be costly.
 This is why Lucene decouples them (by making NRT readers available).
 My experiments do the same, essentially.
 But after I understood that to perform deletions IW has to load term indexes
 anyway, I'm almost ready to give up and go for intertwined IW/IR mess :)
 Hey if you really think it's a mess, post a patch that cleans it up :)
Uh oh. Let me finish the current one first. Second - I don't know yet
what this should look like.
Something along the lines of deletions/norms writers being extracted
from segment reader
and reader pool being made external to IW??

-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785




Re: Getting fsync out of the loop

2010-04-08 Thread Michael McCandless
On Thu, Apr 8, 2010 at 6:21 PM, Earwin Burrfoot ear...@gmail.com wrote:
 But, IW doesn't let you hold on to checkpoints... only to commits.

 Ie SnapshotDP will only see actual commit/close calls, not
 intermediate checkpoints like a random segment merge completing, a
 flush happening, etc.

 Or... maybe you would in fact call commit frequently from the main
 threads (but with fsync disabled), and then your DP holds onto these
 fake commits, periodically picking one of them to do the real
 fsync'ing?
 Yeah, that's exactly what I tried to describe in my initial post :)

Ahh ok then it makes more sense.  But still you shouldn't commit that
often (even with fake fsync) since it must flush the segment.

 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 I see... well binding durability & visibility will always be costly.
 This is why Lucene decouples them (by making NRT readers available).
 My experiments do the same, essentially.
 But after I understood that to perform deletions IW has to load term indexes
 anyway, I'm almost ready to give up and go for intertwined IW/IR mess :)
 Hey if you really think it's a mess, post a patch that cleans it up :)
 Uh oh. Let me finish current one, first.

Heh, yes :)

 Second - I don't know yet how
 this should look like.
 Something along the lines of deletions/norms writers being extracted
 from segment reader
 and reader pool being made external to IW??

Yeah, reader pool should be pulled out of IW, and I think IW should be
split into that which manages the segment infos, that which
adds/deletes docs, and the rest (merging, addIndexes*)?  (There's
an issue open for this refactoring...).

I'm not sure about deletions/norms writers being extracted from SR
I think delete ops would still go through IW?

Mike




Re: Getting fsync out of the loop

2010-04-07 Thread Michael McCandless
On Tue, Apr 6, 2010 at 7:26 PM, Earwin Burrfoot ear...@gmail.com wrote:
 Running out of disk space with fsync disabled won't lead to corruption.
 Even kill -9 the JRE process with fsync disabled won't corrupt.
 In these cases index just falls back to last successful commit.

 It's only power loss / OS / machine crash where you need fsync to
 avoid possible corruption (corruption may not even occur w/o fsync if
 you get lucky).

 Sorry to disappoint you, but running out of disk space is worse than kill -9.
 You can write down the file (to cache in fact), close it, all without
 getting any
 exceptions. And then it won't get flushed to disk because the disk is full.
 This can happen to segments file (and old one is deleted with default deletion
 policy). This can happen to fat freq/prox files mentioned in segments file
 (and yeah, the old segments file is deleted, so no falling back).

No, this doesn't make sense.  The OS detects a disk full on accepting
the write into the write cache, not [later] on flushing the write
cache to disk.  If the OS accepts the write, then disk is not full (ie
flushing the cache will succeed, unless some other not-disk-full
problem happens).

Hmmm, at least, normally.  What OS/IO system were you on when you saw
corruption due to disk full when fsync is disabled?

 What if your background thread simply committed every couple of minutes?
 What's the difference between taking the snapshot (which means you had
 to call commit previously) and commit it, to call iw.commit by a background
 merge?
 --
 But: why do you need to commit so often?
 To see stuff on reopen? Yes, I know about NRT.

 You've reinvented autocommit=true!
 ?? I'm doing regular commits, syncing down every Nth of it.

 Doesn't this just BG the syncing?  Ie you could make a dedicated
 thread to do this.

 Yes, exactly, this BGs the syncing to a dedicated thread. Threads
 doing indexation/merging can continue unhampered.

OK.  Or you can index with N+1 threads, and each indexer thread does
the commit if it's time...

 One possible win with this approach is the cost of fsync should go
 way down the longer you wait after writing bytes to the file and
 before calling fsync.  This is because typically OS write caches
 expire by time (eg 30 seconds) so if you wait long enough the bytes
 will already at least be delivered to the IO system (but the IO system
 can do further caching which could still take time).  On Windows at
 least I definitely noticed this effect -- wait some before fsync'ing
 and it's net/net much less costly.

 Yup. In fact you can just hold on to the latest commit for N seconds,
 then switch to the new latest commit.
 OS will fsync everything for you.

You're mixing up terminology a bit here -- you can't hold on to the
latest commit then switch to it.  A commit (as sent to the deletion
policy) means a *real* commit (ie, IW.commit or IW.close was called).
So I think your BG thread would simply be calling IW.commit every N
seconds?

 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)

I see... well binding durability & visibility will always be costly.
This is why Lucene decouples them (by making NRT readers available).

 Right now it's probably best for me to save my time and cut over to current 
 NRT.
 But. An important lesson was learnt - no fsyncing blows up your index
 on out-of-disk-space.

I'm still skeptical that disk full even with fsync disabled can lead
to corruption... I'd like to see some concrete proof :)

BTW, if you know your OS/IO system always persists cached writes w/in
N seconds, a safe way to avoid fsync is to use a by-time expiring
deletion policy.  Ie, a commit stays alive as long as its age is less
than X... DP's unit test has such a policy.  But you better really
know for sure that the OS/IO system guarantee that :)
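The by-time policy can be sketched in a few lines. This is a toy illustration against plain Java, not Lucene's actual IndexDeletionPolicy API (the class and method names here are hypothetical), and the safety of the whole idea rests entirely on the unverified assumption that the OS persists cached writes within the window:

```java
// Sketch of the by-age idea above: a commit's files survive while it is
// the newest commit, or while its age is still inside the window the
// OS/IO system is trusted to flush within. Toy types, hypothetical names.
final class ExpireByAgePolicy {
    private final long maxAgeMillis;

    ExpireByAgePolicy(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
    }

    // true = keep the commit's files; false = considered safe to delete,
    // *assuming* the OS has already persisted anything older than the window.
    boolean shouldKeep(long commitTimeMillis, long nowMillis, boolean isNewest) {
        return isNewest || (nowMillis - commitTimeMillis) < maxAgeMillis;
    }
}
```

The newest commit is always kept regardless of age, since deleting it would leave no commit point at all.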

Mike




Re: Getting fsync out of the loop

2010-04-07 Thread Earwin Burrfoot
I don't have the system at hand now, but if I remember right fsync
took like 100-200ms.

2010/4/7 Shai Erera ser...@gmail.com:
 Earwin - do you have some numbers to share on the running time of the
 indexing application? You've mentioned that if you take out fsync into a BG
 thread, the running time improves, but I'm curious to know by how much.

 Shai

 On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot ear...@gmail.com wrote:

  Running out of disk space with fsync disabled won't lead to corruption.
  Even kill -9 the JRE process with fsync disabled won't corrupt.
  In these cases index just falls back to last successful commit.
 
  It's only power loss / OS / machine crash where you need fsync to
  avoid possible corruption (corruption may not even occur w/o fsync if
  you get lucky).

 Sorry to disappoint you, but running out of disk space is worse than kill
 -9.
 You can write down the file (to cache in fact), close it, all without
 getting any
 exceptions. And then it won't get flushed to disk because the disk is
 full.
 This can happen to segments file (and old one is deleted with default
 deletion
 policy). This can happen to fat freq/prox files mentioned in segments file
 (and yeah, the old segments file is deleted, so no falling back).

  What if your background thread simply committed every couple of minutes?
  What's the difference between taking the snapshot (which means you had
  to call commit previously) and commit it, to call iw.commit by a
  background merge?
 --
  But: why do you need to commit so often?
 To see stuff on reopen? Yes, I know about NRT.

  You've reinvented autocommit=true!
 ?? I'm doing regular commits, syncing down every Nth of it.

  Doesn't this just BG the syncing?  Ie you could make a dedicated
  thread to do this.
 Yes, exactly, this BGs the syncing to a dedicated thread. Threads
 doing indexation/merging can continue unhampered.

  One possible win with this approach is the cost of fsync should go
  way down the longer you wait after writing bytes to the file and
  before calling fsync.  This is because typically OS write caches
  expire by time (eg 30 seconds) so if you wait long enough the bytes
  will already at least be delivered to the IO system (but the IO system
  can do further caching which could still take time).  On Windows at
  least I definitely noticed this effect -- wait some before fsync'ing
  and it's net/net much less costly.
 Yup. In fact you can just hold on to the latest commit for N seconds,
  then switch to the new latest commit.
 OS will fsync everything for you.


 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 Right now it's probably best for me to save my time and cut over to
 current NRT.
 But. An important lesson was learnt - no fsyncing blows up your index
 on out-of-disk-space.










Re: Getting fsync out of the loop

2010-04-07 Thread Earwin Burrfoot
 No, this doesn't make sense.  The OS detects a disk full on accepting
 the write into the write cache, not [later] on flushing the write
 cache to disk.  If the OS accepts the write, then disk is not full (ie
 flushing the cache will succeed, unless some other not-disk-full
 problem happens).

 Hmmm, at least, normally.  What OS/IO system were you on when you saw
 corruption due to disk full when fsync is disabled?

 I'm still skeptical that disk full even with fsync disabled can lead
 to corruption... I'd like to see some concrete proof :)

Linux 2.6.30-1-amd64, ext3, simple scsi drive
I checked with our resident DB brainiac, he says such things are possible.

Okay, I'm not 100% sure this is the cause of my corruptions. It just happened
that when the index got corrupted, disk space was also used up - several times.
I had that silent-fail-to-write theory and checked it with some knowledgeable
people. Even if they are right, I can be mistaken and the root cause
is different.

 You're mixing up terminology a bit here -- you can't hold on to the
 latest commit then switch to it.  A commit (as sent to the deletion
 policy) means a *real* commit (ie, IW.commit or IW.close was called).
 So I think your BG thread would simply be calling IW.commit every N
 seconds?
By "hold on to" I meant keep from being deleted, like SnapshotDP does.

 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 I see... well binding durability & visibility will always be costly.
 This is why Lucene decouples them (by making NRT readers available).
My experiments do the same, essentially.

But after I understood that to perform deletions IW has to load term indexes
anyway, I'm almost ready to give up and go for intertwined IW/IR mess :)

 BTW, if you know your OS/IO system always persists cached writes w/in
 N seconds, a safe way to avoid fsync is to use a by-time expiring
 deletion policy.  Ie, a commit stays alive as long as its age is less
 than X... DP's unit test has such a policy.  But you better really
 know for sure that the OS/IO system guarantee that :)
Yeah. I thought of it, but it is even more shady :)





Re: Getting fsync out of the loop

2010-04-06 Thread Shai Erera
How often is fsync called? If it's just during calls to commit, then is it
that expensive? I mean, how often do you call commit?

If that's that expensive (do you have some numbers to share) then I think
that'd be a neat idea. Though losing a few minutes' worth of updates may
sometimes be unrecoverable, depending on the scenario, but I guess for those
cases the 'standard way' should be used.

What if your background thread simply committed every couple of minutes?
What's the difference between taking the snapshot (which means you had to
call commit previously) and committing it, versus calling iw.commit from a
background merge?

Shai

On Tue, Apr 6, 2010 at 5:11 PM, Earwin Burrfoot ear...@gmail.com wrote:

 So, I want to pump my IndexWriter hard and fast with documents.

 Removing fsync from FSDirectory helps. But for that I pay with possibility
 of
 index corruption, not only if my node suddenly loses
 power/kernelpanics, but also if it
 runs out of disk space (which happens more frequently).

 I invented the following solution:
 We write a special deletion policy that resembles SnapshotDeletionPolicy.
 At all times it takes hold of current synced commit and preserves
 it. Once every N minutes
 a special thread takes latest commit, syncs it and nominates as
 current synced commit. The
 previous one gets deleted.

 Now we are disaster-proof, and do fsync asynchronously from indexing
 threads. We pay for this with
 somewhat bigger transient disk usage, and probably losing a few
 minutes' worth of updates in
 case of a crash, but that's acceptable.

 How does this sound?
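The policy described above can be sketched minimally with toy types. This is not Lucene's SnapshotDeletionPolicy/IndexCommit API (all names here are hypothetical); it only illustrates the invariant that the newest fsync'ed commit is never deleted, so a crash falls back to it rather than to nothing:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the proposed policy: unsynced commits accumulate; every N
// minutes a sync thread fsyncs the newest one, promotes it to the
// protected "last synced" slot, and releases everything older.
final class SyncedCommitPolicy {
    static final class Commit {
        final long gen;
        boolean deleted;
        Commit(long gen) { this.gen = gen; }
    }

    private Commit lastSynced;                          // never deleted
    private final Deque<Commit> pending = new ArrayDeque<>();

    // Called on every (unsynced) commit; keep its files until superseded.
    synchronized void onCommit(Commit c) {
        pending.addLast(c);
    }

    // Called periodically by the sync thread: fsync the newest pending
    // commit (elided here), promote it, and release everything older.
    synchronized Commit syncNewest() {
        if (pending.isEmpty()) return lastSynced;
        Commit newest = pending.peekLast();
        // real code would fsync newest's files before releasing anything
        for (Commit c : pending) {
            if (c != newest) c.deleted = true;          // safe to delete
        }
        pending.clear();
        if (lastSynced != null) lastSynced.deleted = true;
        lastSynced = newest;
        return lastSynced;
    }
}
```

The transient disk cost mentioned above shows up here as the files of `lastSynced` plus all `pending` commits being retained at once.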





Re: Getting fsync out of the loop

2010-04-06 Thread Michael McCandless
On Tue, Apr 6, 2010 at 10:11 AM, Earwin Burrfoot ear...@gmail.com wrote:

 So, I want to pump my IndexWriter hard and fast with documents.

Nice.

 Removing fsync from FSDirectory helps. But for that I pay with possibility of
 index corruption, not only if my node suddenly loses
 power/kernelpanics, but also if it
 runs out of disk space (which happens more frequently).

Running out of disk space with fsync disabled won't lead to corruption.

Even kill -9 the JRE process with fsync disabled won't corrupt.

In these cases index just falls back to last successful commit.

It's only power loss / OS / machine crash where you need fsync to
avoid possible corruption (corruption may not even occur w/o fsync if
you get lucky).

But: why do you need to commit so often?

 I invented the following solution:
 We write a special deletion policy that resembles SnapshotDeletionPolicy.
 At all times it takes hold of current synced commit and preserves
 it. Once every N minutes
 a special thread takes latest commit, syncs it and nominates as
 current synced commit. The
 previous one gets deleted.

 Now we are disaster-proof, and do fsync asynchronously from indexing
 threads. We pay for this with
 somewhat bigger transient disk usage, and probably losing a few
 minutes worth of updates in
 case of a crash, but that's acceptable.

 How does this sound?

You've reinvented autocommit=true!

Doesn't this just BG the syncing?  Ie you could make a dedicated
thread to do this.

One possible win with this approach is the cost of fsync should go
way down the longer you wait after writing bytes to the file and
before calling fsync.  This is because typically OS write caches
expire by time (eg 30 seconds) so if you wait long enough the bytes
will already at least be delivered to the IO system (but the IO system
can do further caching which could still take time).  On Windows at
least I definitely noticed this effect -- wait some before fsync'ing
and it's net/net much less costly.
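The effect is easy to poke at with plain java.nio: write some bytes, optionally wait for OS writeback to run, then time FileChannel.force(true) (the JDK's fsync). This is a toy probe, not a benchmark; absolute numbers depend entirely on the OS/IO stack:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy probe: write data, sleep waitMillis to let OS writeback catch up,
// then measure how long the forced flush itself takes.
final class FsyncProbe {
    static long timedFsyncNanos(Path path, byte[] data, long waitMillis)
            throws Exception {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(data));
            Thread.sleep(waitMillis);          // give writeback a head start
            long t0 = System.nanoTime();
            ch.force(true);                    // fsync: flush data + metadata
            return System.nanoTime() - t0;
        }
    }
}
```

Comparing `timedFsyncNanos(path, data, 0)` against a run with a longer wait (on a quiet machine, averaged over many runs) is one way to observe the write-cache-expiry behavior described above.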

Mike




Re: Getting fsync out of the loop

2010-04-06 Thread Earwin Burrfoot
 Running out of disk space with fsync disabled won't lead to corruption.
 Even kill -9 the JRE process with fsync disabled won't corrupt.
 In these cases index just falls back to last successful commit.

 It's only power loss / OS / machine crash where you need fsync to
 avoid possible corruption (corruption may not even occur w/o fsync if
 you get lucky).

Sorry to disappoint you, but running out of disk space is worse than kill -9.
You can write the file (into the cache, in fact), close it, all without
getting any exceptions. And then it won't get flushed to disk because the
disk is full.
This can happen to segments file (and old one is deleted with default deletion
policy). This can happen to fat freq/prox files mentioned in segments file
(and yeah, the old segments file is deleted, so no falling back).

 What if your background thread simply committed every couple of minutes?
 What's the difference between taking the snapshot (which means you had
 to call commit previously) and commit it, to call iw.commit by a background
 merge?
--
 But: why do you need to commit so often?
To see stuff on reopen? Yes, I know about NRT.

 You've reinvented autocommit=true!
?? I'm doing regular commits, syncing down every Nth of it.

 Doesn't this just BG the syncing?  Ie you could make a dedicated
 thread to do this.
Yes, exactly, this BGs the syncing to a dedicated thread. Threads
doing indexation/merging can continue unhampered.

 One possible win with this approach is the cost of fsync should go
 way down the longer you wait after writing bytes to the file and
 before calling fsync.  This is because typically OS write caches
 expire by time (eg 30 seconds) so if you wait long enough the bytes
 will already at least be delivered to the IO system (but the IO system
 can do further caching which could still take time).  On Windows at
 least I definitely noticed this effect -- wait some before fsync'ing
 and it's net/net much less costly.
Yup. In fact you can just hold on to the latest commit for N seconds,
then switch to the new latest commit.
OS will fsync everything for you.


I'm just playing around with stupid idea. I'd like to have NRT
look-alike without binding readers and writers. :)
Right now it's probably best for me to save my time and cut over to current NRT.
But. An important lesson was learnt - no fsyncing blows up your index
on out-of-disk-space.




Re: Getting fsync out of the loop

2010-04-06 Thread Shai Erera
Earwin - do you have some numbers to share on the running time of the
indexing application? You've mentioned that if you take out fsync into a BG
thread, the running time improves, but I'm curious to know by how much.

Shai

On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot ear...@gmail.com wrote:

  Running out of disk space with fsync disabled won't lead to corruption.
  Even kill -9 the JRE process with fsync disabled won't corrupt.
  In these cases index just falls back to last successful commit.
 
  It's only power loss / OS / machine crash where you need fsync to
  avoid possible corruption (corruption may not even occur w/o fsync if
  you get lucky).

 Sorry to disappoint you, but running out of disk space is worse than kill
 -9.
 You can write down the file (to cache in fact), close it, all without
 getting any
 exceptions. And then it won't get flushed to disk because the disk is full.
 This can happen to segments file (and old one is deleted with default
 deletion
 policy). This can happen to fat freq/prox files mentioned in segments file
 (and yeah, the old segments file is deleted, so no falling back).

  What if your background thread simply committed every couple of minutes?
  What's the difference between taking the snapshot (which means you had
  to call commit previously) and commit it, to call iw.commit by a
 background merge?
 --
  But: why do you need to commit so often?
 To see stuff on reopen? Yes, I know about NRT.

  You've reinvented autocommit=true!
 ?? I'm doing regular commits, syncing down every Nth of it.

  Doesn't this just BG the syncing?  Ie you could make a dedicated
  thread to do this.
 Yes, exactly, this BGs the syncing to a dedicated thread. Threads
 doing indexation/merging can continue unhampered.

  One possible win with this approach is the cost of fsync should go
  way down the longer you wait after writing bytes to the file and
  before calling fsync.  This is because typically OS write caches
  expire by time (eg 30 seconds) so if you wait long enough the bytes
  will already at least be delivered to the IO system (but the IO system
  can do further caching which could still take time).  On Windows at
  least I definitely noticed this effect -- wait some before fsync'ing
  and it's net/net much less costly.
 Yup. In fact you can just hold on to the latest commit for N seconds,
 then switch to the new latest commit.
 OS will fsync everything for you.


 I'm just playing around with stupid idea. I'd like to have NRT
 look-alike without binding readers and writers. :)
 Right now it's probably best for me to save my time and cut over to current
 NRT.
 But. An important lesson was learnt - no fsyncing blows up your index
 on out-of-disk-space.
