[ 
https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544803
 ] 

Doron Cohen commented on LUCENE-1044:
-------------------------------------

{quote}
I'll look into the separate thread to sync/close files in the
background next...
{quote}

I was wondering whether delaying the sync until the actual commit point
would run faster than a background thread. I thought it would, because the
background thread, although it does not block the current thread from
continuing to index, does force the sync *now* rather than letting the IO
subsystem write the data in its own time. I was also hoping that by doing
the syncs later, some of them would become no-ops, and hence faster. I found
out, however, that delaying the syncs (while still intending to sync) also
means keeping the file handles open, and therefore this is not
a practical approach. Still, it was interesting to compare.

So... my small test sequentially writes M characters to each of N files
and either does not sync (just closes), or syncs in one of three
ways: (1) at the end, (2) immediately, (3) in a background thread.
The results (in millis) on my Windows XP machine were:

|| num files || num chars per file || No Sync || Sync At End || Background Sync || Immediate Sync ||
|   100 | 10000 |   631 |   5778 |   5729 |   5828 |
|   100 | 10000 |   581 |   4486 |   4117 |   4687 |
|  1000 |  1000 |  1612 |  38996 |  34900 |  35852 |
|  1000 |  1000 |  1432 |  37153 |  35051 |  37263 |
| 10000 |   100 | 10335 | 154262 | 162103 | 174251 |
| 10000 |   100 | 11276 | 147752 | 159480 | 222450 |

Each configuration was run twice and there are fluctuations,
but it is obvious (as Mike noticed) that no-sync is much faster
than sync. In my test no-sync is roughly 7 to 26 times faster
than any sync approach, while in Mike's test, which uses
Lucene, the penalty is smaller. The difference might be because
in my test there is no CPU work involved, just IO.

Comparing "immediate" to "background", it is not clear that adding
a background thread is worth it (unless Mike's test proves otherwise...)
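
The test code itself was not posted, so the following is only a minimal Java sketch of the kind of benchmark described above, not the original. The class name `SyncBench`, the `Mode` enum, the `f<i>` file names and the single-threaded executor are all assumptions made for illustration. It writes the same number of bytes to each file and then either just closes, syncs immediately, hands the sync to a background thread, or keeps every handle open and syncs at the end.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical benchmark sketch; names and structure are assumptions.
class SyncBench {
    enum Mode { NO_SYNC, SYNC_AT_END, BACKGROUND_SYNC, IMMEDIATE_SYNC }

    /** Forces the file's data to disk, then closes the handle. */
    static void syncAndClose(FileOutputStream out) throws IOException {
        try (FileOutputStream o = out) {
            o.getFD().sync();
        }
    }

    /** Writes numChars bytes to each of numFiles files in dir; returns elapsed millis. */
    static long run(Path dir, int numFiles, int numChars, Mode mode) throws Exception {
        byte[] data = new byte[numChars];
        Arrays.fill(data, (byte) 'x');
        // SYNC_AT_END must keep every handle open until the end,
        // which is why it does not scale to many files.
        List<FileOutputStream> pending = new ArrayList<>();
        List<Future<?>> background = new ArrayList<>();
        ExecutorService bg = Executors.newSingleThreadExecutor();
        long start = System.nanoTime();
        try {
            for (int i = 0; i < numFiles; i++) {
                FileOutputStream out = new FileOutputStream(dir.resolve("f" + i).toFile());
                out.write(data);
                switch (mode) {
                    case NO_SYNC:
                        out.close();
                        break;
                    case IMMEDIATE_SYNC:
                        syncAndClose(out);
                        break;
                    case SYNC_AT_END:
                        pending.add(out);
                        break;
                    case BACKGROUND_SYNC:
                        // The writer thread continues; the sync happens on another thread.
                        background.add(bg.submit(() -> { syncAndClose(out); return null; }));
                        break;
                }
            }
            for (FileOutputStream out : pending) {
                syncAndClose(out);
            }
            for (Future<?> f : background) {
                f.get(); // wait for completion and surface any IO failure
            }
        } finally {
            bg.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        for (Mode mode : Mode.values()) {
            // Files are left behind in the temp directory; delete them manually if needed.
            Path dir = Files.createTempDirectory("syncbench");
            System.out.println(mode + ": " + run(dir, 100, 10000, mode) + " ms");
        }
    }
}
```

The timing for the background mode includes waiting for the outstanding syncs to finish, so that all four modes are measured up to the point where the data is as durable as that mode will make it. Absolute numbers will depend heavily on the OS, file system and disk write cache, so they should not be expected to match the table above.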

> Behavior on hard power shutdown
> -------------------------------
>
>                 Key: LUCENE-1044
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1044
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>         Environment: Windows Server 2003, Standard Edition, Sun Hotspot Java 1.5
>            Reporter: venkat rangan
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: LUCENE-1044.patch, LUCENE-1044.take2.patch, LUCENE-1044.take3.patch
>
>
> When indexing a large number of documents, upon a hard power failure (e.g. 
> pulling the power cord), the index seems to get corrupted. We start a Java 
> application as a Windows Service, and feed it documents. In some cases 
> (after an index size of 1.7GB, with 30-40 index segment .cfs files), the 
> following is observed.
> The 'segments' file contains only zeros. Its size is 265 bytes - all bytes 
> are zeros.
> The 'deleted' file also contains only zeros. Its size is 85 bytes - all bytes 
> are zeros.
> Before corruption, the segments file and deleted file appear to be correct. 
> After this corruption, the index is corrupted and lost.
> This is a problem observed in Lucene 1.4.3. We are not able to upgrade our 
> customer deployments to 1.9 or later version, but would be happy to back-port 
> a patch, if the patch is small enough and if this problem is already solved.


