[ 
https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1044:
---------------------------------------

    Attachment: LUCENE-1044.take6.patch

New rev of this patch.  All tests pass.  I think it's ready to
commit, but I'll wait a few days for comments.

This patch has a small change to the segments_N file: it adds a
checksum to the end.  I added ChecksumIndexInput/Output that wrap an
existing IndexInput/Output for this.  This is used to verify the file
is "intact" before trusting its contents when opening the index.  We
need this to guard against the machine crashing after we've written
segments_N and before we've succeeded in syncing it.
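
As a rough illustration of the trailing-checksum idea (this is not the
patch's ChecksumIndexInput/Output code), the sketch below appends a CRC32
to a byte payload and verifies it before the payload is trusted.  The
class name, the method names, the choice of CRC32 and the 8-byte trailer
layout are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class ChecksumTrailerSketch {

    /** Returns payload followed by an 8-byte CRC32 of the payload (assumed layout). */
    public static byte[] appendChecksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Recomputes the CRC32 over the payload and compares it to the stored trailer. */
    public static boolean isIntact(byte[] file) {
        if (file.length < 8) {
            return false; // too short to even hold the trailer
        }
        int payloadLength = file.length - 8;
        CRC32 crc = new CRC32();
        crc.update(file, 0, payloadLength);
        long stored = ByteBuffer.wrap(file, payloadLength, 8).getLong();
        return crc.getValue() == stored;
    }
}
```

A reader would call isIntact() first and fall back to an earlier commit
if it returns false, rather than parsing a file that may have been only
partially written to disk.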

Unfortunately, in testing performance, I still see a sizable (~30-50%)
performance hit to indexing throughput on Windows computers (XP Pro
laptop & Win 2003 Server R64 computer).  It seems that calling sync
was causing IO in other threads (i.e. flushing a new segment) to
drastically slow down.  Note that this is only when autoCommit=true; if
it's false then performance is only slightly worse (because we sync
only when closing the writer).

So I tried sleeping, after writing and before syncing.  I sleep based
on number of bytes written, for up to 10 seconds, and amazingly, this
greatly reduces the performance loss on the Windows computers, and
doesn't hurt performance on Linux/OS X computers.
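
A minimal sketch of the sleep-then-sync idea, for illustration only: the
10 second cap comes from the description above, but the pacing rate
(ASSUMED_BYTES_PER_SEC), the class name and the method names are
assumptions, since the actual formula used by the patch is not given here.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class DelayedSyncSketch {

    // Cap taken from the description: sleep for up to 10 seconds.
    static final long MAX_SLEEP_MS = 10000L;

    // Assumed pacing rate; the real patch may scale the sleep differently.
    static final long ASSUMED_BYTES_PER_SEC = 10L * 1024 * 1024;

    /** Sleep time proportional to bytes written, clamped to [0, MAX_SLEEP_MS]. */
    public static long sleepMillisFor(long bytesWritten) {
        long ms = bytesWritten * 1000L / ASSUMED_BYTES_PER_SEC;
        return Math.min(Math.max(ms, 0L), MAX_SLEEP_MS);
    }

    /** Writes data, waits so the OS can schedule the writes itself, then syncs. */
    public static void writeThenSync(File file, byte[] data)
            throws IOException, InterruptedException {
        FileOutputStream out = new FileOutputStream(file);
        try {
            out.write(data);
            out.flush();
            Thread.sleep(sleepMillisFor(data.length));
            out.getFD().sync(); // ask the OS to push the file's buffers to the device
        } finally {
            out.close();
        }
    }
}
```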

I think this must be because calling sync immediately forces the OS to
write dirty buffers to disk "in a rush" (severely impacting IO writes
from other threads), whereas if you wait first, you let the OS
schedule those writes on its own, at good times (maybe when the IO
system is "relatively" idle).

It's disappointing to have to "game" the OS to gain back this
performance.  I wish Java had a "waitUntilSync'd" to do the same
thing as fsync, but without "rushing" the OS.

On Linux 2.6.22 on a RAID5 array I still see a net performance cost of
~12%, sleeping or no sleeping.  On Mac OS X it's ~3% loss.

Other fixes:
  * DirectoryIndexReader's doCommit now also syncs
  * Improved logic on when we must sync-before-CFS: it's not necessary
    if the just-merged segments are not referenced by the last commit
    point (i.e. if they were all flushed during this writer session)
  * Created SegmentInfos.commit() method, which writes and then syncs
    the next segments_N file
  * Simplified sync() logic now that merge threads are stopped before
    writer is closed
  * Changed CMS.newMergeThread to name its threads
  * More test cases
  * Various other small fixes

Here are the test details.  I index the first 200K Wikipedia docs with
this alg:

  analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
  doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
  docs.file=/Volumes/External/lucene/wiki.txt
  doc.stored = true
  doc.term.vector = true
  doc.term.vector.offsets = true
  doc.term.vector.positions = true

  doc.maker.forever = false
  directory=FSDirectory

  { "BuildIndex"
    CreateIndex
    { "AddDocs" AddDoc > : 200000
    CloseIndex
  }

  RepSumByPref BuildIndex

Win2003 R64, JVM 1.6.0_03
  trunk: 523 sec
  patch: 547 sec (5% slower)

Win XP Pro, laptop hard drive, JVM 1.4.2_15-b02
  trunk: 1237 sec
  patch: 1278 sec (3% slower)

Linux ReiserFS on 6 drive RAID 5 array, JVM 1.5.0_08
  trunk: 483 sec
  patch: 539 sec (12% slower)

Mac OS X 10.4 4-drive RAID 0 array, JVM 1.5.0_13
  trunk: 268 sec
  patch: 276 sec (3% slower)
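
The percentages above follow from (patch - trunk) / trunk, rounded to the
nearest whole percent.  The small sketch below reproduces that arithmetic
from the timings listed; the class and method names are illustrative only.

```java
public class SlowdownSketch {

    /** Relative slowdown of patch versus trunk, as a rounded whole percent. */
    public static long percentSlower(double trunkSec, double patchSec) {
        return Math.round((patchSec - trunkSec) / trunkSec * 100.0);
    }

    public static void main(String[] args) {
        System.out.println("Win2003 R64: " + percentSlower(523, 547) + "% slower");
        System.out.println("Win XP Pro:  " + percentSlower(1237, 1278) + "% slower");
        System.out.println("Linux RAID5: " + percentSlower(483, 539) + "% slower");
        System.out.println("Mac OS X:    " + percentSlower(268, 276) + "% slower");
    }
}
```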


> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 
> 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.4
>
>         Attachments: FSyncPerfTest.java, LUCENE-1044.patch, 
> LUCENE-1044.take2.patch, LUCENE-1044.take3.patch, LUCENE-1044.take4.patch, 
> LUCENE-1044.take5.patch, LUCENE-1044.take6.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. 
> pull the power cord), the index seems to get corrupted. We start a Java 
> application as a Windows Service, and feed it documents. In some cases 
> (after an index size of 1.7GB, with 30-40 index segment .cfs files), the 
> following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes 
> are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes 
> are zeros.
> Before corruption, the segments file and deleted file appear to be correct. 
> After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our 
> customer deployments to 1.9 or later version, but would be happy to back-port 
> a patch, if the patch is small enough and if this problem is already solved.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

