[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849076#action_12849076 ] Michael McCandless commented on LUCENE-2328: Woops, fixed, thanks! IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848632#action_12848632 ] Michael McCandless commented on LUCENE-2328: I think it's OK to make an exception to back-compat here. Users who subclass FSDir, and also borrow SimpleFDDir's IndexOutput impl, are very advanced and can change their code. The break will also be very clear -- compilation error, which you must fix to move on -- so we're not making a trap here. Uwe are you OK with the rename? I think it actually does make sense that it be in the base class... IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848778#action_12848778 ] Uwe Schindler commented on LUCENE-2328: --- I am fine now! Go for it! Policeman is happy. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848789#action_12848789 ] Michael McCandless commented on LUCENE-2328: OK I will commit shortly! Thanks Earwin :) IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Resolved: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-2328. Resolution: Fixed IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848973#action_12848973 ] Earwin Burrfoot commented on LUCENE-2328: - Mike, you missed latest patch, with Shai-requested comment: {code} @@ -85,6 +85,8 @@ * stable storage. Lucene uses this to properly commit * changes to the index, to prevent a machine/OS crash * from corrupting the index. + * @deprecated use {...@link #sync(Collection)} instead. + * For easy migration you can change your code to call sync(Collections.singleton(name)) */ @Deprecated public void sync(String name) throws IOException { // TODO 4.0 kill me {code} IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848102#action_12848102 ] Earwin Burrfoot commented on LUCENE-2328: - Ah, patch is based off LUCENE-2339. If applied over trunk, there may be some import conflicts in Directory.java IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Assigned: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-2328: -- Assignee: Michael McCandless IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848159#action_12848159 ] Michael McCandless commented on LUCENE-2328: BTW on the sync of still-open files non-supported case, if we ever did want to support, I think we'd add sync to IndexOutput. Ie it makes sense that this dir-level sync only works after the file is closed. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848212#action_12848212 ] Michael McCandless commented on LUCENE-2328: Patch looks great -- what a sweet cleanup! I love all the code removed from IW DR :) Can we remove NoFSync Ext3StyleFSync? People can write these if they want... Also, you moved SimpleFSDir.SimpleFSIndexOutput - FSDir.FSIndexOutput... but this is a break in back-compat right? Ie subclasses out there may be using this? Can you add a CHANGES entry? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-2328: Attachment: LUCENE-2328.patch New patch. FSyncStrategy removed, default inlined. All our Directory impls override deprecated sync() to preserve back-compat. Preserving back-compat for IO move is impossible, mentioned in CHANGES.txt, which probably needs some love. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-2328: Attachment: LUCENE-2328.patch Clean patch against trunk IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848324#action_12848324 ] Michael McCandless commented on LUCENE-2328: Patch looks great Earwin -- I'll commit in a day or two. Thanks! IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848341#action_12848341 ] Shai Erera commented on LUCENE-2328: Earwin, can you add a deprecation message to sync(String)? When I upgraded from 2.9 to 3.0 some methods were deprecated w/o any explanation as to what I should use instead. I think a message like @deprecated use #sync(Collection) instead. For easy migration you can change your code to call sync(Colllections.singleton(name)) ... or something along those lines. Other than that, patch looks great! I really like the code cleanup from IW. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Earwin Burrfoot updated LUCENE-2328: Attachment: LUCENE-2328.patch added comment to jdocs IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848385#action_12848385 ] Uwe Schindler commented on LUCENE-2328: --- Ahm, one question, why does this patch reimplement the deprecated and removed FSIndexInput/FSIndexOutput? They have to be and are in SimpleFSIndexOutput. You are reverting to the pre-2.9 state. This is not obvious to me, so I am -1 about this patch without explanation. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848427#action_12848427 ] Earwin Burrfoot commented on LUCENE-2328: - I do not touch *IndexInput, these should stay where they are. FSIndexOutput is used in *all* child classes, without changes, so it's only logical to move it to parent. It is also tied in now with sync tracking logic, and required for it to work properly. Preserving backwards-compatibility here is impossible because we need FSIO to call back its parent, whether it's by declaring it non-static, or passing a new explicit parameter to constructor, it is required and it is a break. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Assignee: Michael McCandless Priority: Minor Fix For: 3.1 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847448#action_12847448 ] Shai Erera commented on LUCENE-2328: Earwin, I agree that sub-classing FSDir is not that easy. So I guess you'll add another piece of jdoc to createOutput, to notify Dir when it's closed? This seems reasonable. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847492#action_12847492 ] Michael McCandless commented on LUCENE-2328: When it's opened, not closed, right? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847512#action_12847512 ] Earwin Burrfoot commented on LUCENE-2328: - I'll either jdoc this, or move createOutput to FSDir, as all three current impls are a copy of each other. In such a case someone overriding createOutput can look at the original and decide for himself if he wants to keep and call this functionality, or not. When it's opened, not closed, right? Mike, I thought about it once again. If you allow sync()ing open files, you still need to track when they are closed. Or the following may happen: io = dir.createIndexOutput(name); // registers 'name' as a stale file dir.sync(name) // syncs 'name', removes it from registry ... // do stuff io.close() dir.sync(name) // does not sync 'name', as it is no longer in the registry ... // BZZWHAM!! ... // crash happens, the data is lost IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847515#action_12847515 ] Earwin Burrfoot commented on LUCENE-2328: - Thus, I think we should officially disallow syncing open files. This operation is impossible right now and pointless, anyway. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847536#action_12847536 ] Michael McCandless commented on LUCENE-2328: bq. If you allow sync()ing open files, you still need to track when they are closed. Or the following may happen: Ahh right. OK so let's disallow that in the API. You can only sync a file after it's been closed. Trying to sync a file that hasn't yet been closed will be undefined. (and it sounds like *FSDir will silently ignore the request). IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847585#action_12847585 ] Shai Erera commented on LUCENE-2328: bq. Trying to sync a file that hasn't yet been closed will be undefined Can we avoid 'undefined'? We have an issue open about SegmentInfos.fileLength() not clearly defined and it causes confusion. If it's undefined, then someone might attempt to call sync before he closes the file, and only then close ... can we throw an exception in that case? We can have close(), sync() and closeAndSync(). Would the latter make sense? I prefer if the API will be explicit,, and I think that throwing an exception (StillOpenException?) if sync() is called before close() is very explicit, and reasonable if accompanied by a proper jdoc. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847642#action_12847642 ] Michael McCandless commented on LUCENE-2328: bq. We can have close(), sync() and closeAndSync(). Would the latter make sense? I don't think closeAndSync could be used by Lucene, at least today. Typically, at the time these files are closed, Lucene has no idea whether sync is needed (ie, whether a commit() will be called by the app before the segment gets merged). So I don't think we should add it now? (Design for today). bq. I prefer if the API will be explicit,, and I think that throwing an exception (StillOpenException?) if sync() is called before close() is very explicit, and reasonable if accompanied by a proper jdoc. This would be great... I think, especially, for something as important as sync(), we should not silently ignore you when you think you've sync'd an open file. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-2328: --- Fix Version/s: 3.1 Anyone wanna cons up a patch here...? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: IndexWriter.synced field accumulates data
Thanks! Mike On Wed, Mar 17, 2010 at 3:16 PM, Gregor Kaczor gkac...@gmx.de wrote: followup in https://issues.apache.org/jira/browse/LUCENE-2328 Original-Nachricht Datum: Wed, 17 Mar 2010 14:30:25 -0500 Von: Michael McCandless luc...@mikemccandless.com An: java-dev@lucene.apache.org Betreff: Re: IndexWriter.synced field accumulates data You're right! Really we should delete from sync'd when we delete the files. We need to tie into IndexFileDeleter for that, maybe moving this set into there. Though in practice the amount of actual RAM used should rarely be an issue? But we should fix it... Can you open an issue? Mike On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote: I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846821#action_12846821 ] Shai Erera commented on LUCENE-2328: Would that mean removing files from synced whenever 'deleter' (which is an IndexFileDeleter) calls delete*? Are there other places to look for? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846825#action_12846825 ] Michael McCandless commented on LUCENE-2328: Yes I think that's it. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846835#action_12846835 ] Earwin Burrfoot commented on LUCENE-2328: - A shot in the sky (didn't delve deep into the problem, could definetly miss stuff): What about tracking 'syncidness' from within Directory? There shouldn't be more than one writer anyway (unless your locking is broken), so that's a single set of 'files-to-be-synced' for each given moment of time. Might as well keep track of it inside the directory, and have a syncAllUnsyncedGuys() on it. This will also remove the need to transfer that list around when transferring write lock (IR hell). And all-round that sounds quite logical, as the need/method of syncing depends solely on directory. If you're working with RAMDirectory, you don't need to keep track of these files at all. Probably same for some of DB impls. Also some filesystems sync everything, when you ask to sync a single file, so if you're syncing a batch of them in a row, that's some overhead that you can theoretically work around with a special flag to FSDir. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846872#action_12846872 ] Michael McCandless commented on LUCENE-2328: I like this idea! But, we don't want to simply sync all new files. When IW commits, it's possibly a subset of all new files. EG running merges (or any still-open files) should not be sync'd. Not necessarily all closed files should be sync'd either -- eg any files that were opened closed while we were syncing (since syncing can take some time) should not then be sync'd. Maybe we change Dir.sync to take a CollectionString? Then dir would be the one place that keeps track of what's already been sync'd and what hasn't. Or... I wonder if calling sync on a file that's already been sync'd is really that wasteful... I mean it's technically a no-op, so it's just the overhead of a no-op system call from way up in javaland. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846880#action_12846880 ] Earwin Burrfoot commented on LUCENE-2328: - EG running merges (or any still-open files) should not be sync'd. Files that are still being written should not be synced, that's kinda obvious. Not necessarily all closed files should be sync'd either - eg any files that were opened closed while we were syncing (since syncing can take some time) should not then be sync'd. This one is not so obvious. I assume that on calling syncEveryoneAndHisDog() you should sync all files that have been written to, and were closed, and not yet deleted. Maybe we change Dir.sync to take a CollectionString? What does that alone give us over the current situation? You can call Dir.sync() repeatedly, it's all the same. Or... I wonder if calling sync on a file that's already been sync'd is really that wasteful... It can be on these systems, that just sync down everything. I don't believe in people writing good software : } IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846890#action_12846890 ] Shai Erera commented on LUCENE-2328: ok so let me see if I understand this. Before Earwin suggested adding synced to Directory, the approach (as I understood it) was - whenever deleter deletes a file, remove it from synced as well. After Earwin's suggestion, which I like very much, as it moves more stuff out of IW, which could use some simplification, I initially thought that we should do this: when dir.sync is called, add that file to dir.synced. Then when dir.delete is called, remove it from there. When dir.commit is called, add all changed/synced files to the set (probably all of them). Something very straightforward and simple. However, the last two posts seem to try to complicate it ... and I don't understand why. So I'd appreciate if you can explain what am I missing. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846899#action_12846899 ] Earwin Burrfoot commented on LUCENE-2328: - I'm proposing something even more dead simple. 1. We remove Directory.sync(String) completely. 2. Each time you call IndexOutput.close(), Dir adds this file to its internal set (if it cares about it at all). 3. If you call Directory.delete(), it also removes file from the set (though not strictly necessary). 4. When you commit at IW, it calls Directory.sync() and everything in its internal set gets synced. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846902#action_12846902 ] Earwin Burrfoot commented on LUCENE-2328: - Btw, initial problem stems from the fact that IW/IR keeps track of the files it *has already* synced, instead of the files it *has not yet* synced. Which is kinda upside down, and requires upkeep, unlike straightforward approach in which this set gets cleared anew after each commit call. I can conjure up a patch in a day or two. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846913#action_12846913 ] Shai Erera commented on LUCENE-2328: How would IndexInput report back to the Directory when its close() was called? I've checked a couple of Directories and when they openInput, they don't pass themselves to the IndexInput. I think what you say makes sense, but I don't see how this can be implemented w/ the current implementations (and w/o relying on broken Directory impls out there). Broken in the sense that they don't expect to get any notification from IndexInput.close(). Other than that, I like that approach. Also, what you wrote about IW keeping track on already synced files - I guess you'll change that when it moves into Directory, so that it will track the files it hasn't synced yet? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846938#action_12846938 ] Earwin Burrfoot commented on LUCENE-2328: - How would IndexInput report back to the Directory when its close() was called? I've checked a couple of Directories and when they openInput, they don't pass themselves to the IndexInput. Hmm. I guess I have to change IndexOutput impls? so that it will track the files it hasn't synced yet? Sure IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846944#action_12846944 ] Michael McCandless commented on LUCENE-2328: Keeping track of not-yet-sync'd files instead of sync'd files is better, but it still requires upkeep (ie when file is deleted you have to remove it) because files can be opened, written to, closed, deleted without ever being sync'd. And I like moving this tracking under Dir -- that's where it belongs. bq. I assume that on calling syncEveryoneAndHisDog() you should sync all files that have been written to, and were closed, and not yet deleted. This will over-sync in some situations. Ie, causing commit to take longer than it should. EG say a merge has finished with the first set of files (say _X.fdx/t, since it merges fields first) but is still working on postings, when the user calls commit. We should not then sync _X.fdx/t because they are unreferenced by the segments_N we are committing. Or the merge has finished (so _X.* has been created) but is now off building the _X.cfs file -- we don't want to sync _X.*, only _X.cfs when its done. Another example: we don't do this today, but, addIndexes should really run fully outside of IW's normal segments file, merging away, and then only on final success alter IW's segmentInfos. If we switch to that, we don't want to sync all the files that addIndexes is temporarily writing... The knowledge of which files make up the transaction lives above Directory... so I think we should retain the per-file control. I proposed the bulk-sync API so that Dir impls could choose to do a system-wide sync. Or, more generally, any Dir which can be more efficient if it knows the precise set of files that must be sync'd right now. If we stick with file-by-file API, doing a system-wide sync is somewhat trickier... because you can't assume from one call to the next that nothing had changed. Also, bulk sync better matches the semantics IW/IR require: these consumers don't care the order in which these files are sync'd. They just care that the requested set is sync'd. So it exposes a degree of freedom to the Dir impls that's otherwise hidden today. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846956#action_12846956 ] Earwin Burrfoot commented on LUCENE-2328: - Keeping track of not-yet-sync'd files instead of sync'd files is better, but it still requires upkeep (ie when file is deleted you have to remove it) because files can be opened, written to, closed, deleted without ever being sync'd. You can just skip this and handle FileNotFound exception when syncing. Have to handle it anyway, no guarantees some file won't be snatched from under your nose. This will over-sync in some situations. Don't feel this is a serious problem. If you over-sync (in fact sync some files a little bit earlier than strictly required), in a few seconds you will under-sync, so total time is still the same. But I feel you're somewhat missing the point. System-wide sync is not the original aim, it's just a possible byproduct of what is the original aim - to move sync tracking code from IW to Directory. And I don't see at all how adding batch-syncs achieves this. If you're calling sync(CollectionString), damn, you should keep that collection somewhere :) and it is supposed to be inside! IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846960#action_12846960 ] Michael McCandless commented on LUCENE-2328: {quote} bq. Keeping track of not-yet-sync'd files instead of sync'd files is better, but it still requires upkeep (ie when file is deleted you have to remove it) because files can be opened, written to, closed, deleted without ever being sync'd. You can just skip this and handle FileNotFound exception when syncing. Have to handle it anyway, no guarantees some file won't be snatched from under your nose. {quote} IW IR do in fact guarantee they will never ask for a deleted file to be sync'd. If they ever do that we have more serious problems ;) {quote} bq. This will over-sync in some situations. Don't feel this is a serious problem. If you over-sync (in fact sync some files a little bit earlier than strictly required), in a few seconds you will under-sync, so total time is still the same. {quote} I think this is important -- commit is already slow enough -- why make it slower? Further, the extra files you sync'd may never have needed to be sync'd (they will be merged away). My examples above include such cases. Turning this around... what's so bad about keeping the sync per file? bq. System-wide sync is not the original aim, it's just a possible byproduct of what is the original aim I know this is not the aim of this issue, rather just a nice by-product if we switch to a global sync method. bq. to move sync tracking code from IW to Directory. Right this is a great step forward, as long as long as we don't slow commit by dumbing down the API :) bq. And I don't see at all how adding batch-syncs achieves this. You're right: this doesn't achieve / is not required for moving sync'd file tracking down to Dir. It's orthogonal, but, is another way that we could allow Dir impls to do global sync. I'm proposing this as a different change, to make the API better match the needs of its consumers. In fact, really the OS ought to allow for this as well (but I know of none that do) since it'd give the IO scheduler more freedom on which bytes need to be moved to disk. We can open this one as a separate issue... IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846991#action_12846991 ] Earwin Burrfoot commented on LUCENE-2328: - Okay, summing up. 1. Directory gets a new method - sync(CollectionString), it will become abstract in 4.0, but now by default delegates to current sync(String), which is deprecated. 2. FSDirectory tracks newly written, closed and not deleted files, by changing FSD.IndexOutput accordingly. 3. sync() semantics changes from sync this now to sync this now, if you think it's needed. Noop sync() impls like RAMDir continue to be noop, FSDir syncs only those files that exist in its tracking set and ignores all others. 4. IW/IR stop tracking synced files completely (lots of garbage code gone from IW), and instead call sync(Collection) on commit with a list of all files that constitute said commit. These steps preserve back-compatibility (Except for cases of custom Directory impls in which calling sync on the same file sequentially is costly. They will suffer performance degradation), ensure that for each commit only strictly requested subset of files is synced (thing Mike insisted on), and will completely remove sync-tracking code from IW and IR. 5. We open another issue to experiment with batch syncing and various filesystems. Some relevant fun data: http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846996#action_12846996 ] Shai Erera commented on LUCENE-2328: bq. changing FSD.IndexOutput accordingly This worries me a bit. If only FSD.IndexOutput will do that, I'm afraid other Directory implementations won't realize that they should do so as well (NIO?). I'd prefer if IndexOutput in its contract is supposed to callback on Directory upon close ... not sure - maybe just put some heave documentation around createOutput? If we could enforce this API-wise, and let the Dirs that don't care simply ignore, then it'd be better. It'll also allow for someone to extend FSD.createOutput, return his own IndexOutput and not worry (or do, but knowingly) about calling back to Dir. Other than that - this looks great. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847010#action_12847010 ] Earwin Burrfoot commented on LUCENE-2328: - Every Directory implementation decides how to handle sync() calls on its own. The fact that FSDir (and descendants) do this performance optimization is their implementation details. I don't want to bind this somehow into the base class. But, I will note in javadocs to sync() that clients may pass the same file over and over again, so you might want to optimize for this. IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847015#action_12847015 ] Michael McCandless commented on LUCENE-2328: Must the Dir insist the file is closed in order to sync it? Why not enroll newly created files in the to be sync'd set? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847036#action_12847036 ] Shai Erera commented on LUCENE-2328: Yeah I guess I wasn't clear enough. So suppose someone sub-classes FSDir and overrides createOutput. How should he know his IndexOutput should call dir.sync()? How should he know he needs to pass the Dir to his IndexOutput? So I suggested to either mention it in the Javadocs, or somehow make all of FSDir's outputs know about that, API-wise ... So today a file is closed only upon commit (?), and it's then that it's synced? If so, why would you want to sync a file that is still open? I guess it cannot harm, but what's the use case? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847050#action_12847050 ] Michael McCandless commented on LUCENE-2328: In the current proposal, IndexOutput won't call dir.sync. All it will do is notify the dir when it was closed so the dir will record that filename as eligible for commit. Lucene today never syncs a file until after it's closed, but, conceivably some day it could. Or others who use the Dir API to write their own files could. At the OS level this is perfectly fine (in fact you have to pass an open fd to fsync). It seems presumptuous of the directory to silently ignore a call to sync just because the file hadn't been closed yet... IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
[ https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847063#action_12847063 ] Michael McCandless commented on LUCENE-2328: Yes please clean as you go Earwin -- those sound great. {quote} bq. Must the Dir insist the file is closed in order to sync it? Well, no, this can be relaxed. Because default Directory clients - IW+IR will never call sync() on a file they didn't close yet. Also this client behaviour is guaranteed with current implementation - if someone calls current sync() on an open file, it will fail on 'new RandomAccessFile'? {quote} I'd like to allow for this to work in the future, even if current FSDir impls cannot sync an open file. EG conceivably they could reach in and get the RAF that IndexOutput has open and sync it. So I think we just note this as a limitation of FSDir impls today, but, the API allows for it? IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1 Environment: all Reporter: Gregor Kaczor Priority: Minor Fix For: 3.1 Original Estimate: 1h Remaining Estimate: 1h I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
IndexWriter.synced field accumulates data
I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: IndexWriter.synced field accumulates data
You're right! Really we should delete from sync'd when we delete the files. We need to tie into IndexFileDeleter for that, maybe moving this set into there. Though in practice the amount of actual RAM used should rarely be an issue? But we should fix it... Can you open an issue? Mike On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote: I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: IndexWriter.synced field accumulates data
I will open an issue. Acually its not the size of occupied RAM. The leak is the problem. Original-Nachricht Datum: Wed, 17 Mar 2010 14:30:25 -0500 Von: Michael McCandless luc...@mikemccandless.com An: java-dev@lucene.apache.org Betreff: Re: IndexWriter.synced field accumulates data You're right! Really we should delete from sync'd when we delete the files. We need to tie into IndexFileDeleter for that, maybe moving this set into there. Though in practice the amount of actual RAM used should rarely be an issue? But we should fix it... Can you open an issue? Mike On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote: I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Created: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak
IndexWriter.synced field accumulates data leading to a Memory Leak --- Key: LUCENE-2328 URL: https://issues.apache.org/jira/browse/LUCENE-2328 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 3.0.1, 3.0, 2.9.2, 2.9.1 Environment: all Reporter: Gregor Kaczor Priority: Minor I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
Re: IndexWriter.synced field accumulates data
followup in https://issues.apache.org/jira/browse/LUCENE-2328 Original-Nachricht Datum: Wed, 17 Mar 2010 14:30:25 -0500 Von: Michael McCandless luc...@mikemccandless.com An: java-dev@lucene.apache.org Betreff: Re: IndexWriter.synced field accumulates data You're right! Really we should delete from sync'd when we delete the files. We need to tie into IndexFileDeleter for that, maybe moving this set into there. Though in practice the amount of actual RAM used should rarely be an issue? But we should fix it... Can you open an issue? Mike On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote: I am running into a strange OutOfMemoryError. My small test application does index and delete some few files. This is repeated for 60k times. Optimization is run from every 2k times a file is indexed. Index size is 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced field occupied more than half of the heap. That field is a private HashSet without a getter. Its task is to hold files which have been synced already. There are two calls to addAll and one call to add on synced but no remove or clear throughout the lifecycle of the IndexWriter instance. According to the Eclipse Memory Analyzer synced contains 32618 entries which look like file names _e065_1.del or _e067.cfs The index directory contains 10 files only. I guess synced is holding obsolete data - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org