Earwin - do you have some numbers to share on the running time of the indexing application? You mentioned that moving fsync into a BG thread improves the running time, but I'm curious to know by how much.
Shai

On Wed, Apr 7, 2010 at 2:26 AM, Earwin Burrfoot <ear...@gmail.com> wrote:

> > Running out of disk space with fsync disabled won't lead to corruption.
> > Even kill -9 the JRE process with fsync disabled won't corrupt.
> > In these cases index just falls back to last successful commit.
> >
> > It's "only" power loss / OS / machine crash where you need fsync to
> > avoid possible corruption (corruption may not even occur w/o fsync if
> > you "get lucky").
>
> Sorry to disappoint you, but running out of disk space is worse than
> kill -9.
> You can write down the file (to cache in fact), close it, all without
> getting any exceptions. And then it won't get flushed to disk because
> the disk is full.
> This can happen to segments file (and old one is deleted with default
> deletion policy). This can happen to fat freq/prox files mentioned in
> segments file (and yeah, the old segments file is deleted, so no
> falling back).
>
> > What if your background thread simply committed every couple of minutes?
> > What's the difference between taking the snapshot (which means you had
> > to call commit previously) and commit it, to call iw.commit by a
> > background merge?
> --
>
> > But: why do you need to commit so often?
> To see stuff on reopen? Yes, I know about NRT.
>
> > You've reinvented autocommit=true!
> ?? I'm doing regular commits, syncing down every Nth of it.
>
> > Doesn't this just BG the syncing? Ie you could make a dedicated
> > thread to do this.
> Yes, exactly, this BGs the syncing to a dedicated thread. Threads
> doing indexation/merging can continue unhampered.
>
> > One possible win with this approach is.... the cost of fsync should go
> > way down the longer you wait after writing bytes to the file and
> > before calling fsync. This is because typically OS write caches
> > expire by time (eg 30 seconds) so if you wait long enough the bytes
> > will already at least be delivered to the IO system (but the IO system
> > can do further caching which could still take time). On windows at
> > least I definitely noticed this effect -- wait some before fsync'ing
> > and it's net/net much less costly.
> Yup. In fact you can just hold on to the latest commit for N seconds,
> then switch to the new latest commit.
> OS will fsync everything for you.
>
> I'm just playing around with a stupid idea. I'd like to have NRT
> look-alike without binding readers and writers. :)
> Right now it's probably best for me to save my time and cut over to
> current NRT.
> But. An important lesson was learnt - no fsyncing blows up your index
> on out-of-disk-space.
>
> --
> Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
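
For anyone following the thread, here is a rough sketch of the "dedicated sync thread" idea quoted above, using plain java.nio only. It is not Lucene code and not Earwin's implementation: the class name BackgroundSyncer, the method syncLater and the delay parameter are made up for illustration. Writer threads finish and close a file, then hand its path to a single background thread that fsyncs it after a delay.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Lucene.
final class BackgroundSyncer implements AutoCloseable {
    // One dedicated thread performs all fsyncs, so indexing and merging
    // threads never block on disk flushes.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    /**
     * Schedules an fsync of an already written and closed file.
     * The returned Future fails with an ExecutionException wrapping the
     * IOException if the file cannot be opened or flushed.
     */
    Future<Void> syncLater(Path file, long delayMillis) {
        Callable<Void> task = () -> {
            // Reopen the file for writing and force data and metadata to disk.
            try (FileChannel channel =
                         FileChannel.open(file, StandardOpenOption.WRITE)) {
                channel.force(true);
            }
            return null;
        };
        return executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops accepting new work and waits for already scheduled syncs. */
    @Override
    public void close() throws InterruptedException {
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The design point, given the out-of-disk-space problem described above, is that the caller should treat a commit as durable (and let older commits be deleted) only after get() on the returned Future has completed without an exception. The delay is there because of the observation that fsync tends to be cheaper once the OS has already written the cached bytes out on its own.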