[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846991#action_12846991
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-----------------------------------------

Okay, summing up.

1. Directory gets a new method, sync(Collection<String>). It will become 
abstract in 4.0, but for now it delegates by default to the current 
sync(String), which is deprecated.
2. FSDirectory tracks files that are newly written, closed, and not deleted, 
by changing FSD.IndexOutput accordingly.
3. sync() semantics change from "sync this now" to "sync this now, if you 
think it's needed". No-op sync() impls like RAMDir remain no-ops; FSDir 
syncs only those files that exist in its tracking set and ignores all others.
4. IW/IR stop tracking synced files completely (lots of garbage code gone from 
IW), and instead call sync(Collection) on commit with a list of all files that 
constitute said commit.
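
A rough sketch of what steps 1-3 could look like, to make the proposal concrete. 
This is not the actual patch: SketchDirectory, TrackingDirectory, 
onOutputClosed(), fsync(), fsyncLog and trackedCount() are hypothetical names 
made up for illustration, and the real FSDirectory would force the file to disk 
where this sketch only records the name.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for Directory (step 1). Not the real Lucene class.
abstract class SketchDirectory {
    /** Old per-file API, kept for back-compatibility. No-op by default. */
    @Deprecated
    public void sync(String name) throws IOException {
    }

    /** New batch API; by default it delegates to the deprecated per-file sync. */
    public void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            sync(name);
        }
    }
}

// Hypothetical stand-in for FSDirectory (steps 2 and 3).
class TrackingDirectory extends SketchDirectory {
    // Files that were written and closed, are not deleted, and are not yet synced.
    private final Set<String> unsynced = new HashSet<String>();
    // Records which files this sketch "fsynced", in order, for demonstration only.
    final List<String> fsyncLog = new ArrayList<String>();

    /** Assumed hook: the directory's IndexOutput calls this from close(). */
    public synchronized void onOutputClosed(String name) {
        unsynced.add(name);
    }

    /** Deleted files no longer need syncing, so they leave the tracking set. */
    public synchronized void deleteFile(String name) {
        unsynced.remove(name);
    }

    /** "Sync this now, if you think it's needed": untracked names are ignored. */
    @Override
    public synchronized void sync(Collection<String> names) throws IOException {
        for (String name : names) {
            if (unsynced.contains(name)) {
                fsync(name);
                // Forget the file only after the sync succeeded.
                unsynced.remove(name);
            }
        }
    }

    @Override
    @Deprecated
    public void sync(String name) throws IOException {
        sync(Collections.singleton(name));
    }

    /** Placeholder; a real implementation would force the file to stable storage. */
    protected void fsync(String name) throws IOException {
        fsyncLog.add(name);
    }

    public synchronized int trackedCount() {
        return unsynced.size();
    }
}
```

With something like this in place, the caller side of step 4 reduces to passing 
the commit's file names to sync(Collection) on every commit. Files synced by an 
earlier commit are skipped, and the tracking set shrinks as files are synced or 
deleted, so it stays bounded by the files that are live and not yet synced.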

These steps preserve back-compatibility (except for custom Directory impls in 
which calling sync on the same file repeatedly is costly; those will suffer 
performance degradation), ensure that each commit syncs only the strictly 
requested subset of files (the thing Mike insisted on), and completely remove 
the sync-tracking code from IW and IR.

5. We open another issue to experiment with batch syncing and various 
filesystems. Some relevant fun data: 
http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html


> IndexWriter.synced  field accumulates data leading to a Memory Leak
> -------------------------------------------------------------------
>
>                 Key: LUCENE-2328
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2328
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
>         Environment: all
>            Reporter: Gregor Kaczor
>            Priority: Minor
>             Fix For: 3.1
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am running into a strange OutOfMemoryError. My small test application
> indexes and deletes a few files. This is repeated 60k times. Optimization
> is run every 2k times a file is indexed. Index size is 50KB. I analyzed
> the heap dump file and found that the IndexWriter.synced field occupied more
> than half of the heap. That field is a private HashSet without a getter. Its
> task is to hold files which have already been synced.
> There are two calls to addAll and one call to add on synced, but no remove or
> clear throughout the lifecycle of the IndexWriter instance.
> According to the Eclipse Memory Analyzer, synced contains 32618 entries which
> look like file names such as "_e065_1.del" or "_e067.cfs".
> The index directory contains only 10 files.
> I guess synced is holding obsolete data.


