[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-24 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849076#action_12849076
 ] 

Michael McCandless commented on LUCENE-2328:


Woops, fixed, thanks!

 IndexWriter.synced field accumulates data leading to a Memory Leak
 ---

 Key: LUCENE-2328
 URL: https://issues.apache.org/jira/browse/LUCENE-2328
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
 Environment: all
Reporter: Gregor Kaczor
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch, 
 LUCENE-2328.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I am running into a strange OutOfMemoryError. My small test application indexes
 and deletes a few files. This is repeated 60k times. Optimization is run every
 2k times a file is indexed. Index size is 50KB. I analyzed the heap dump file
 and found that the IndexWriter.synced field occupied more than half of the heap.
 That field is a private HashSet without a getter. Its task is to hold files
 which have already been synced.
 There are two calls to addAll and one call to add on synced, but no remove or
 clear throughout the lifecycle of the IndexWriter instance.
 According to the Eclipse Memory Analyzer, synced contains 32618 entries which
 look like file names, e.g. _e065_1.del or _e067.cfs.
 The index directory contains only 10 files.
 I guess synced is holding obsolete data.
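
 The growth pattern described in the report can be reproduced in isolation.
 The following is only a minimal sketch, not Lucene's code: SyncedTracker,
 markSynced and pruneTo are hypothetical names, and pruning to the set of live
 files is one possible remedy assumed here for illustration, not necessarily
 what the attached patches do.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the bookkeeping described in the report; not Lucene code.
class SyncedTracker {
    private final Set<String> synced = new HashSet<String>();

    // Mirrors the add/addAll-only usage: names are recorded but never dropped.
    void markSynced(Collection<String> fileNames) {
        synced.addAll(fileNames);
    }

    // One possible remedy (an assumption): forget names that are no longer live.
    void pruneTo(Collection<String> liveFiles) {
        synced.retainAll(liveFiles);
    }

    int size() {
        return synced.size();
    }

    public static void main(String[] args) {
        SyncedTracker tracker = new SyncedTracker();
        // Every cycle creates new segment-style names; the older files are
        // deleted from disk but stay in the set.
        for (int i = 0; i < 1000; i++) {
            String base = "_" + Integer.toString(i, 36);
            tracker.markSynced(Arrays.asList(base + ".cfs", base + "_1.del"));
        }
        System.out.println("tracked without pruning: " + tracker.size());

        // Only the newest segment's files still exist in the directory.
        String last = "_" + Integer.toString(999, 36);
        tracker.pruneTo(Arrays.asList(last + ".cfs", last + "_1.del"));
        System.out.println("tracked after pruning: " + tracker.size());
    }
}
```

 The set grows with the number of files ever written, not with the number of
 files currently in the index, which matches the 32618 entries versus 10 files
 observed in the heap dump.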

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848632#action_12848632
 ] 

Michael McCandless commented on LUCENE-2328:


I think it's OK to make an exception to back-compat here.  Users who subclass 
FSDir, and also borrow SimpleFSDir's IndexOutput impl, are very advanced and 
can change their code.  The break will also be very clear -- a compilation error, 
which you must fix to move on -- so we're not making a trap here.

Uwe, are you OK with the rename?  I think it actually does make sense that it be 
in the base class...




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848778#action_12848778
 ] 

Uwe Schindler commented on LUCENE-2328:
---

I am fine now! Go for it! Policeman is happy.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848789#action_12848789
 ] 

Michael McCandless commented on LUCENE-2328:


OK I will commit shortly!  Thanks Earwin :)




[jira] Resolved: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2328.


Resolution: Fixed




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-23 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848973#action_12848973
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Mike, you missed latest patch, with Shai-requested comment:

{code}
@@ -85,6 +85,8 @@
    * stable storage.  Lucene uses this to properly commit
    * changes to the index, to prevent a machine/OS crash
    * from corrupting the index.
+   * @deprecated use {@link #sync(Collection)} instead.
+   * For easy migration you can change your code to call sync(Collections.singleton(name))
    */
   @Deprecated
   public void sync(String name) throws IOException { // TODO 4.0 kill me
{code}
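
The migration path named in that Javadoc can be illustrated with a small stand-alone sketch. DirSketch below is a hypothetical class, not Lucene's Directory; it only mirrors the shape of the two sync overloads quoted in the patch excerpt, and the body of sync(Collection) is a placeholder.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in that mirrors the two sync overloads from the excerpt.
class DirSketch {
    final List<String> syncedNames = new ArrayList<String>();

    // Collection-based variant; a real implementation would fsync each file here.
    public void sync(Collection<String> names) throws IOException {
        syncedNames.addAll(names);
    }

    /** @deprecated use {@link #sync(Collection)} instead. */
    @Deprecated
    public void sync(String name) throws IOException {
        // The old single-file method simply delegates to the new one.
        sync(Collections.singleton(name));
    }

    public static void main(String[] args) throws IOException {
        DirSketch dir = new DirSketch();
        // Old call site:      dir.sync("_e067.cfs");
        // Migrated call site, as the deprecation note suggests:
        dir.sync(Collections.singleton("_e067.cfs"));
        System.out.println(dir.syncedNames);
    }
}
```

Because the deprecated overload delegates to the collection-based one, existing callers keep working while they migrate.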





[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848102#action_12848102
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Ah, patch is based off LUCENE-2339. If applied over trunk, there may be some 
import conflicts in Directory.java




[jira] Assigned: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-2328:
--

Assignee: Michael McCandless




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848159#action_12848159
 ] 

Michael McCandless commented on LUCENE-2328:


BTW, on the unsupported case of syncing still-open files: if we ever did want to 
support it, I think we'd add sync to IndexOutput.

I.e. it makes sense that this dir-level sync only works after the file is closed.
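
A rough illustration of the "close first, then sync at the directory level" ordering, using only java.io. This is a sketch under assumptions: CloseThenSync and syncClosedFile are hypothetical names, and reopening the file to call FileDescriptor.sync() is just one way to force a closed file to stable storage, not necessarily what the patch does.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper; shows syncing a file only after its writer has closed it.
class CloseThenSync {
    // Reopen an already-closed file and force its contents to stable storage.
    static void syncClosedFile(File file) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            raf.getFD().sync();
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("sync-sketch", ".bin");
        f.deleteOnExit();
        FileOutputStream out = new FileOutputStream(f);
        try {
            out.write(new byte[] {1, 2, 3});
        } finally {
            out.close(); // the writer is done before the directory-level sync runs
        }
        syncClosedFile(f);
        System.out.println("synced " + f.length() + " bytes");
    }
}
```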




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848212#action_12848212
 ] 

Michael McCandless commented on LUCENE-2328:


Patch looks great -- what a sweet cleanup!  I love all the code removed from IW 
& DR :)

Can we remove NoFSync & Ext3StyleFSync?  People can write these if they want...

Also, you moved SimpleFSDir.SimpleFSIndexOutput -> FSDir.FSIndexOutput... but 
this is a break in back-compat right?  Ie subclasses out there may be using 
this?

Can you add a CHANGES entry?




[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2328:


Attachment: LUCENE-2328.patch

New patch.
FSyncStrategy removed, default inlined. All our Directory impls override 
deprecated sync() to preserve back-compat.
Preserving back-compat for the IndexOutput move is impossible; this is mentioned 
in CHANGES.txt, which probably needs some love.





[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2328:


Attachment: LUCENE-2328.patch

Clean patch against trunk




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848324#action_12848324
 ] 

Michael McCandless commented on LUCENE-2328:


Patch looks great Earwin -- I'll commit in a day or two.  Thanks!




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848341#action_12848341
 ] 

Shai Erera commented on LUCENE-2328:


Earwin, can you add a deprecation message to sync(String)? When I upgraded from 
2.9 to 3.0 some methods were deprecated w/o any explanation as to what I should 
use instead. I think a message like "@deprecated use #sync(Collection) instead. 
For easy migration you can change your code to call 
sync(Collections.singleton(name))" ... or something along those lines.

Other than that, patch looks great! I really like the code cleanup from IW.




[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Earwin Burrfoot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earwin Burrfoot updated LUCENE-2328:


Attachment: LUCENE-2328.patch

added comment to jdocs




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848385#action_12848385
 ] 

Uwe Schindler commented on LUCENE-2328:
---

Ahm,

one question, why does this patch reimplement the deprecated and removed 
FSIndexInput/FSIndexOutput? They have to be and are in SimpleFSIndexOutput. You 
are reverting to the pre-2.9 state. This is not obvious to me, so I am -1 about 
this patch without explanation.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848427#action_12848427
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

I do not touch *IndexInput; these should stay where they are.
FSIndexOutput is used in *all* child classes without changes, so it is only 
logical to move it to the parent. It is also now tied in with the sync-tracking 
logic, which needs it in order to work properly.
Preserving backwards compatibility here is impossible, because FSIO has to call 
back its parent. Whether we do that by declaring it non-static or by passing a 
new explicit parameter to its constructor, the change is required and it is a 
break.
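
As a rough illustration of the two callback options named above, here is a 
minimal sketch. ToyDirectory, InnerOutput, ExplicitOutput and onOutputClosed are 
made-up names for this example; they are not classes or methods from Lucene or 
from the attached patch.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for FSDirectory; not Lucene's actual class.
class ToyDirectory {
    // Names of files that were written and closed but not yet synced.
    private final Set<String> staleFiles = new HashSet<>();

    // Callback invoked by an output when it is closed.
    void onOutputClosed(String name) {
        staleFiles.add(name);
    }

    Set<String> staleFiles() {
        return Collections.unmodifiableSet(staleFiles);
    }

    // Option 1: a non-static inner class carries an implicit reference to its parent.
    class InnerOutput implements AutoCloseable {
        private final String name;

        InnerOutput(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            onOutputClosed(name); // resolves to the enclosing ToyDirectory instance
        }
    }
}

// Option 2: a standalone class receives the parent as an explicit constructor argument.
class ExplicitOutput implements AutoCloseable {
    private final ToyDirectory parent;
    private final String name;

    ExplicitOutput(ToyDirectory parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    @Override
    public void close() {
        parent.onOutputClosed(name);
    }
}
```

Either way the output needs a reference to the directory it came from, which a 
static nested class with the old constructor signature does not have.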




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847448#action_12847448
 ] 

Shai Erera commented on LUCENE-2328:


Earwin, I agree that subclassing FSDir is not that easy. So I guess you'll add 
another piece of jdoc to createOutput, saying that the output has to notify the 
Dir when it's closed? This seems reasonable.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847492#action_12847492
 ] 

Michael McCandless commented on LUCENE-2328:


When it's opened, not closed, right?




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847512#action_12847512
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

I'll either jdoc this, or move createOutput to FSDir, as all three current 
impls are copies of each other. In that case someone overriding createOutput 
can look at the original and decide for themselves whether to keep and call 
this functionality or not.

 When it's opened, not closed, right? 
Mike, I thought about it once again. If you allow sync()ing open files, you 
still need to track when they are closed. Or the following may happen:

io = dir.createIndexOutput(name); // registers 'name' as a stale file
dir.sync(name);                   // syncs 'name', removes it from the registry
...                               // do stuff
io.close();
dir.sync(name);                   // does not sync 'name', as it is no longer in the registry
...                               // BZZWHAM!!
...                               // crash happens, the data is lost
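
The sequence above can be replayed against a toy model. This is only a sketch 
under stated assumptions: RegistryDirectory and all of its methods are 
hypothetical, 'written' stands for data the OS has buffered, and 'durable' 
stands for data that has actually been synced to disk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of a directory that registers files as stale on creation only.
class RegistryDirectory {
    private final Set<String> stale = new HashSet<>();
    private final Map<String, String> written = new HashMap<>();
    private final Map<String, String> durable = new HashMap<>();

    // Creating an output registers the file as stale (needing a sync).
    void createOutput(String name) {
        stale.add(name);
        written.put(name, "");
    }

    // Appends data to the buffered (not yet durable) content of the file.
    void append(String name, String data) {
        written.put(name, written.get(name) + data);
    }

    // Syncs only files still in the registry, then forgets them.
    // Returns true if a real sync happened, false if the request was skipped.
    boolean sync(String name) {
        if (!stale.remove(name)) {
            return false;
        }
        durable.put(name, written.get(name));
        return true;
    }

    // What would remain of the file after a crash.
    String contentAfterCrash(String name) {
        return durable.getOrDefault(name, "");
    }
}
```

With this model, createOutput, append("abc"), sync, append("def"), sync leaves 
only "abc" durable, because the second sync is skipped. Re-registering the file 
when the output is closed would avoid that.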




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847515#action_12847515
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Thus, I think we should officially disallow syncing open files. This operation 
is impossible right now and pointless, anyway.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847536#action_12847536
 ] 

Michael McCandless commented on LUCENE-2328:


bq. If you allow sync()ing open files, you still need to track when they are 
closed. Or the following may happen:

Ahh right.

OK so let's disallow that in the API.  You can only sync a file after it's been 
closed.  Trying to sync a file that hasn't yet been closed will be undefined.  
(And it sounds like *FSDir will silently ignore the request.)




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847585#action_12847585
 ] 

Shai Erera commented on LUCENE-2328:


bq. Trying to sync a file that hasn't yet been closed will be undefined

Can we avoid 'undefined'? We have an open issue about SegmentInfos.fileLength() 
not being clearly defined, and it causes confusion. If the behavior is 
undefined, someone might call sync before closing the file, and only then close 
it ... Can we throw an exception in that case?

We can have close(), sync() and closeAndSync(). Would the latter make sense?

I prefer if the API will be explicit, and I think that throwing an exception 
(StillOpenException?) if sync() is called before close() is very explicit, and 
reasonable if accompanied by a proper jdoc.
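
A minimal sketch of what such an explicit contract could look like. 
StillOpenException is the name floated above and does not exist in Lucene; 
StrictDirectory and its methods are likewise hypothetical, and the actual 
fsync is left out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical exception; the name comes from the suggestion above, not from Lucene.
class StillOpenException extends IllegalStateException {
    StillOpenException(String name) {
        super("cannot sync '" + name + "': file is still open");
    }
}

// Hypothetical directory that refuses to sync files whose output is still open.
class StrictDirectory {
    private final Set<String> open = new HashSet<>();
    private final Set<String> unsynced = new HashSet<>();

    void createOutput(String name) {
        open.add(name);
    }

    // Closing an output moves the file from 'open' to 'unsynced'.
    void closeOutput(String name) {
        if (open.remove(name)) {
            unsynced.add(name);
        }
    }

    // Fails loudly instead of silently ignoring a sync request for an open file.
    void sync(String name) {
        if (open.contains(name)) {
            throw new StillOpenException(name);
        }
        unsynced.remove(name); // a real implementation would fsync the file here
    }

    boolean needsSync(String name) {
        return unsynced.contains(name);
    }
}
```

The caller then finds out immediately that the sync did not happen, rather than 
discovering it after a crash.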




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847642#action_12847642
 ] 

Michael McCandless commented on LUCENE-2328:


bq. We can have close(), sync() and closeAndSync(). Would the latter make sense?

I don't think closeAndSync could be used by Lucene, at least today.  Typically, 
at the time these files are closed, Lucene has no idea whether sync is needed 
(ie, whether a commit() will be called by the app before the segment gets 
merged).  So I don't think we should add it now?  (Design for today).

bq. I prefer if the API will be explicit, and I think that throwing an 
exception (StillOpenException?) if sync() is called before close() is very 
explicit, and reasonable if accompanied by a proper jdoc.

This would be great... I think, especially, for something as important as 
sync(), we should not silently ignore you when you think you've sync'd an open 
file.




[jira] Updated: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2328:
---

Fix Version/s: 3.1

Anyone wanna cons up a patch here...?




Re: IndexWriter.synced field accumulates data

2010-03-18 Thread Michael McCandless
Thanks!

Mike

On Wed, Mar 17, 2010 at 3:16 PM, Gregor Kaczor gkac...@gmx.de wrote:
 followup in

 https://issues.apache.org/jira/browse/LUCENE-2328


  Original Message 
 Date: Wed, 17 Mar 2010 14:30:25 -0500
 From: Michael McCandless luc...@mikemccandless.com
 To: java-dev@lucene.apache.org
 Subject: Re: IndexWriter.synced field accumulates data

 You're right!

 Really we should delete from sync'd when we delete the files.  We need
 to tie into IndexFileDeleter for that, maybe moving this set into
 there.

 Though in practice the amount of actual RAM used should rarely be an
 issue?  But we should fix it...

 Can you open an issue?

 Mike

 On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote:
  I am running into a strange OutOfMemoryError. My small test application
  does index and delete some few files. This is repeated for 60k times.
  Optimization is run from every 2k times a file is indexed. Index size is
  50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced
  field occupied more than half of the heap. That field is a private HashSet
  without a getter. Its task is to hold files which have been synced already.

  There are two calls to addAll and one call to add on synced but no
  remove or clear throughout the lifecycle of the IndexWriter instance.

  According to the Eclipse Memory Analyzer synced contains 32618 entries
  which look like file names _e065_1.del or _e067.cfs

  The index directory contains 10 files only.

  I guess synced is holding obsolete data
 



[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846821#action_12846821
 ] 

Shai Erera commented on LUCENE-2328:


Would that mean removing files from synced whenever 'deleter' (which is an 
IndexFileDeleter) calls delete*? Are there other places to look at?
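
The idea can be sketched roughly as follows. SyncedTracker and its methods are 
hypothetical names for illustration only; this is not how IndexWriter or 
IndexFileDeleter are actually structured.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the writer-side bookkeeping of already-synced files.
class SyncedTracker {
    private final Set<String> synced = new HashSet<>();

    // Records files as synced (the existing add/addAll behavior).
    void markSynced(Collection<String> names) {
        synced.addAll(names);
    }

    // Called from wherever index files get deleted, so the set cannot
    // outgrow the set of live files.
    void onFileDeleted(String name) {
        synced.remove(name);
    }

    boolean isSynced(String name) {
        return synced.contains(name);
    }

    int size() {
        return synced.size();
    }
}
```

With the removal hooked into deletion, the set stays bounded by the number of 
live files no matter how many index/delete cycles run.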




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846825#action_12846825
 ] 

Michael McCandless commented on LUCENE-2328:


Yes I think that's it.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846835#action_12846835
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

A shot in the dark (I didn't delve deep into the problem, so I could definitely 
be missing stuff):

What about tracking 'syncidness' from within Directory?
There shouldn't be more than one writer anyway (unless your locking is broken), 
so there is a single set of 'files-to-be-synced' at any given moment. 
Might as well keep track of it inside the directory, and have a 
syncAllUnsyncedGuys() on it.

This would also remove the need to pass that list around when transferring the 
write lock (IR hell).

All round that sounds quite logical, as the need for syncing, and the method, 
depend solely on the directory. If you're working with RAMDirectory, you don't 
need to keep track of these files at all.
Probably the same goes for some of the DB impls.
Also, some filesystems sync everything when you ask them to sync a single file, 
so if you're syncing a batch of files in a row, that's overhead you can 
theoretically work around with a special flag to FSDir.
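
A minimal sketch of directory-side tracking, assuming a hypothetical 
SyncingDirectory interface. None of these names exist in Lucene, and the real 
fsync is replaced by a comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical API: each directory decides for itself what syncing means.
interface SyncingDirectory {
    // Notifies the directory that a written file has been closed.
    void fileClosed(String name);

    // Syncs whatever is pending and returns the names that were synced.
    List<String> syncAllUnsynced();
}

// File-system-like directory: has to remember which closed files still need a sync.
class FsLikeDirectory implements SyncingDirectory {
    private final Set<String> pending = new HashSet<>();

    @Override
    public void fileClosed(String name) {
        pending.add(name);
    }

    @Override
    public List<String> syncAllUnsynced() {
        List<String> done = new ArrayList<>(pending); // a real implementation would fsync each file
        pending.clear();
        return done;
    }
}

// RAM-like directory: nothing is ever written to disk, so nothing is tracked.
class RamLikeDirectory implements SyncingDirectory {
    @Override
    public void fileClosed(String name) {
        // nothing to track
    }

    @Override
    public List<String> syncAllUnsynced() {
        return new ArrayList<>();
    }
}
```

The writer no longer holds any list of its own; it only tells the directory 
when to sync.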




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846872#action_12846872
 ] 

Michael McCandless commented on LUCENE-2328:


I like this idea!

But, we don't want to simply sync all new files.  When IW commits,
it's possibly a subset of all new files.  EG running merges (or any
still-open files) should not be sync'd.

Not necessarily all closed files should be sync'd either -- eg any
files that were opened & closed while we were syncing (since syncing
can take some time) should not then be sync'd.

Maybe we change Dir.sync to take a Collection<String>?

Then dir would be the one place that keeps track of what's already
been sync'd and what hasn't.

Or... I wonder if calling sync on a file that's already been sync'd is
really that wasteful... I mean it's technically a no-op, so it's just
the overhead of a no-op system call from way up in javaland.
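
A rough sketch of the Collection-based variant, where the caller names the 
subset to sync and the directory skips whatever it has already synced. 
SubsetSyncDirectory, fileClosed, fileDeleted and fsyncCalls are hypothetical 
names, and the counter stands in for a real fsync.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical directory that tracks unsynced files and syncs only a requested subset.
class SubsetSyncDirectory {
    private final Set<String> unsynced = new HashSet<>();
    private int fsyncCalls = 0;

    void fileClosed(String name) {
        unsynced.add(name);
    }

    // Deleted files are forgotten, so the set stays bounded by the live files.
    void fileDeleted(String name) {
        unsynced.remove(name);
    }

    // Syncs only those requested names that still need it.
    void sync(Collection<String> names) {
        for (String name : names) {
            if (unsynced.remove(name)) {
                fsyncCalls++; // stand-in for a real fsync of this file
            }
        }
    }

    int fsyncCalls() {
        return fsyncCalls;
    }
}
```

Files closed while a commit is in progress simply stay in the unsynced set 
until a later commit asks for them.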





[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846880#action_12846880
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

> EG running merges (or any still-open files) should not be sync'd.
Files that are still being written should not be synced, that's kinda obvious.

> Not necessarily all closed files should be sync'd either - eg any files that 
> were opened & closed while we were syncing (since syncing can take some time) 
> should not then be sync'd.
This one is not so obvious.
I assume that on calling syncEveryoneAndHisDog() you should sync all files that 
have been written to, and were closed, and not yet deleted.

> Maybe we change Dir.sync to take a Collection<String>?
What does that alone give us over the current situation? You can call 
Dir.sync() repeatedly, it's all the same.

> Or... I wonder if calling sync on a file that's already been sync'd is really 
> that wasteful... 
It can be on those systems that just sync down everything. I don't believe in 
people writing good software : }




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846890#action_12846890
 ] 

Shai Erera commented on LUCENE-2328:


ok so let me see if I understand this. Before Earwin suggested adding synced to 
Directory, the approach (as I understood it) was - whenever deleter deletes a 
file, remove it from synced as well.

After Earwin's suggestion, which I like very much, as it moves more stuff out 
of IW, which could use some simplification, I initially thought that we should 
do this: when dir.sync is called, add that file to dir.synced. Then when 
dir.delete is called, remove it from there. When dir.commit is called, add all 
changed/synced files to the set (probably all of them). Something very 
straightforward and simple.

However, the last two posts seem to try to complicate it ... and I don't 
understand why. So I'd appreciate it if you could explain what I am missing.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846899#action_12846899
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

I'm proposing something even more dead simple.

1. We remove Directory.sync(String) completely.
2. Each time you call IndexOutput.close(), Dir adds this file to its internal 
set (if it cares about it at all).
3. If you call Directory.delete(), it also removes file from the set (though 
not strictly necessary).
4. When you commit at IW, it calls Directory.sync() and everything in its 
internal set gets synced. 
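
The four steps above can be sketched roughly as follows. This is only an illustration of the proposal, not Lucene code: TrackingDirectory, onOutputClosed, deleteFile and the fsynced list are hypothetical names, and the real fsync call is replaced by recording the file name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Directory; not Lucene's actual class.
class TrackingDirectory {
    // Files that were written and closed but not yet synced.
    private final Set<String> pending = new HashSet<>();

    // Records which names would have been fsync'd (assumption: replaces real I/O).
    final List<String> fsynced = new ArrayList<>();

    // Step 2: an output reports back when it is closed.
    synchronized void onOutputClosed(String name) {
        pending.add(name);
    }

    // Step 3: a deleted file no longer needs syncing.
    synchronized void deleteFile(String name) {
        pending.remove(name);
    }

    // Step 4: commit calls sync() and everything in the set gets synced.
    synchronized void sync() {
        for (String name : pending) {
            fsynced.add(name);
        }
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Because the set is emptied on every sync(), its size is bounded by the number of files closed since the previous commit.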




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846902#action_12846902
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Btw, initial problem stems from the fact that IW/IR keeps track of the files it 
*has already* synced, instead of the files it *has not yet* synced. Which is 
kinda upside down, and requires upkeep, unlike straightforward approach in 
which this set gets cleared anew after each commit call.

I can conjure up a patch in a day or two.
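
To make the difference concrete, here is a minimal sketch of the two bookkeeping styles. Both classes and their method names are hypothetical and only model the set sizes; no real syncing happens.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Models the reported behaviour: remember every file that was ever synced.
class AlreadySyncedTracker {
    private final Set<String> synced = new HashSet<>();

    // Syncs whatever is not yet in the set; returns how many files that was.
    int commit(Collection<String> files) {
        int count = 0;
        for (String name : files) {
            if (synced.add(name)) {
                count++; // a real implementation would fsync here
            }
        }
        return count;
    }

    int size() {
        return synced.size();
    }
}

// Models the suggested behaviour: remember only what still needs syncing.
class NotYetSyncedTracker {
    private final Set<String> pending = new HashSet<>();

    void fileWritten(String name) {
        pending.add(name);
    }

    // Syncs everything pending, then forgets it; returns how many files that was.
    int commit() {
        int count = pending.size();
        pending.clear();
        return count;
    }

    int size() {
        return pending.size();
    }
}
```

With one new file per commit, the first set grows by one entry each round and never shrinks unless someone removes deleted files from it, while the second is empty after every commit.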




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846913#action_12846913
 ] 

Shai Erera commented on LUCENE-2328:


How would IndexInput report back to the Directory when its close() was called? 
I've checked a couple of Directories and when they openInput, they don't pass 
themselves to the IndexInput. I think what you say makes sense, but I don't see 
how this can be implemented w/ the current implementations (and w/o relying on 
broken Directory impls out there). Broken in the sense that they don't expect 
to get any notification from IndexInput.close().

Other than that, I like that approach. Also, what you wrote about IW keeping 
track of already synced files - I guess you'll change that when it moves into 
Directory, so that it will track the files it hasn't synced yet?




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846938#action_12846938
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

> How would IndexInput report back to the Directory when its close() was 
> called? I've checked a couple of Directories and when they openInput, they 
> don't pass themselves to the IndexInput.
Hmm. I guess I have to change IndexOutput impls?

> so that it will track the files it hasn't synced yet?
Sure




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846944#action_12846944
 ] 

Michael McCandless commented on LUCENE-2328:


Keeping track of not-yet-sync'd files instead of sync'd files is
better, but it still requires upkeep (ie when file is deleted you have
to remove it) because files can be opened, written to, closed, deleted
without ever being sync'd.

And I like moving this tracking under Dir -- that's where it belongs.

bq. I assume that on calling syncEveryoneAndHisDog() you should sync all files 
that have been written to, and were closed, and not yet deleted.

This will over-sync in some situations.

Ie, causing commit to take longer than it should.

EG say a merge has finished with the first set of files (say _X.fdx/t,
since it merges fields first) but is still working on postings, when
the user calls commit.  We should not then sync _X.fdx/t because they
are unreferenced by the segments_N we are committing.

Or the merge has finished (so _X.* has been created) but is now off
building the _X.cfs file -- we don't want to sync _X.*, only _X.cfs
when it's done.

Another example: we don't do this today, but, addIndexes should really
run fully outside of IW's normal segments file, merging away, and then
only on final success alter IW's segmentInfos.  If we switch to that,
we don't want to sync all the files that addIndexes is temporarily
writing...

The knowledge of which files make up the transaction lives above
Directory... so I think we should retain the per-file control.

I proposed the bulk-sync API so that Dir impls could choose to do a
system-wide sync.  Or, more generally, any Dir which can be more
efficient if it knows the precise set of files that must be sync'd
right now.

If we stick with file-by-file API, doing a system-wide sync is
somewhat trickier... because you can't assume from one call to the
next that nothing had changed.

Also, bulk sync better matches the semantics IW/IR require: these
consumers don't care the order in which these files are sync'd.  They
just care that the requested set is sync'd.  So it exposes a degree of
freedom to the Dir impls that's otherwise hidden today.
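
A rough sketch of the bulk-sync idea, assuming a hypothetical BulkSyncDirectory whose sync takes the whole set at once. The caller decides which files belong to the commit; the directory only decides how and in what order to sync them (sorted order here is an arbitrary choice for the example).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical directory with a bulk sync entry point; not Lucene's API.
class BulkSyncDirectory {
    private final Set<String> files = new HashSet<>();

    // Records which names would have been fsync'd (assumption: replaces real I/O).
    final List<String> fsynced = new ArrayList<>();

    void createFile(String name) {
        files.add(name);
    }

    // The caller passes exactly the files referenced by the commit.
    // The implementation is free to pick the order, or to do one global sync.
    void sync(Collection<String> names) {
        for (String name : new TreeSet<>(names)) {
            if (files.contains(name)) {
                fsynced.add(name);
            }
        }
    }
}
```

Files written by a merge that is still running exist in the directory but are never passed to sync, so they are not touched by the commit.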





[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846956#action_12846956
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

> Keeping track of not-yet-sync'd files instead of sync'd files is better, but 
> it still requires upkeep (ie when file is deleted you have to remove it) 
> because files can be opened, written to, closed, deleted without ever being 
> sync'd.
You can just skip this and handle FileNotFound exception when syncing. Have to 
handle it anyway, no guarantees some file won't be snatched from under your 
nose.

> This will over-sync in some situations.
Don't feel this is a serious problem. If you over-sync (in fact sync some files 
a little bit earlier than strictly required), in a few seconds you will 
under-sync, so total time is still the same.

But I feel you're somewhat missing the point. System-wide sync is not the 
original aim, it's just a possible byproduct of what is the original aim - to 
move sync tracking code from IW to Directory. And I don't see at all how adding 
batch-syncs achieves this.
If you're calling sync(Collection<String>), damn, you should keep that 
collection somewhere :) and it is supposed to be inside!




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846960#action_12846960
 ] 

Michael McCandless commented on LUCENE-2328:


{quote}
bq. Keeping track of not-yet-sync'd files instead of sync'd files is better, 
but it still requires upkeep (ie when file is deleted you have to remove it) 
because files can be opened, written to, closed, deleted without ever being 
sync'd.
You can just skip this and handle FileNotFound exception when syncing. Have to 
handle it anyway, no guarantees some file won't be snatched from under your 
nose.
{quote}

IW & IR do in fact guarantee they will never ask for a deleted file to
be sync'd.  If they ever do that we have more serious problems ;)

{quote}
bq. This will over-sync in some situations.
Don't feel this is a serious problem. If you over-sync (in fact sync some files 
a little bit earlier than strictly required), in a few seconds you will 
under-sync, so total time is still the same.
{quote}

I think this is important -- commit is already slow enough -- why make
it slower?

Further, the extra files you sync'd may never have needed to be sync'd
(they will be merged away).  My examples above include such cases.

Turning this around... what's so bad about keeping the sync per file?

bq. System-wide sync is not the original aim, it's just a possible byproduct of 
what is the original aim

I know this is not the aim of this issue, rather just a nice
by-product if we switch to a global sync method.

bq. to move sync tracking code from IW to Directory.

Right, this is a great step forward, as long as we don't slow
commit by dumbing down the API :)

bq. And I don't see at all how adding batch-syncs achieves this.

You're right: this doesn't achieve / is not required for moving
sync'd file tracking down to Dir.  It's orthogonal, but, is another
way that we could allow Dir impls to do global sync.

I'm proposing this as a different change, to make the API better match
the needs of its consumers.  In fact, really the OS ought to allow for
this as well (but I know of none that do) since it'd give the IO
scheduler more freedom on which bytes need to be moved to disk.

We can open this one as a separate issue...





[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846991#action_12846991
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Okay, summing up.

1. Directory gets a new method - sync(Collection<String>), it will become 
abstract in 4.0, but now by default delegates to current sync(String), which is 
deprecated.
2. FSDirectory tracks newly written, closed and not deleted files, by changing 
FSD.IndexOutput accordingly.
3. sync() semantics changes from "sync this now" to "sync this now, if you 
think it's needed". Noop sync() impls like RAMDir continue to be noop, FSDir 
syncs only those files that exist in its tracking set and ignores all others.
4. IW/IR stop tracking synced files completely (lots of garbage code gone from 
IW), and instead call sync(Collection) on commit with a list of all files that 
constitute said commit.

These steps preserve back-compatibility (Except for cases of custom Directory 
impls in which calling sync on the same file sequentially is costly. They will 
suffer performance degradation), ensure that for each commit only strictly 
requested subset of files is synced (thing Mike insisted on), and will 
completely remove sync-tracking code from IW and IR.

5. We open another issue to experiment with batch syncing and various 
filesystems. Some relevant fun data: 
http://www.humboldt.co.uk/2009/03/fsync-across-platforms.html
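
Steps 1 to 3 of the summary might look roughly like this. It is a sketch under assumptions, not the eventual patch: SketchDirectory and SketchFSDirectory are hypothetical stand-ins for Directory and FSDirectory, onOutputClosed stands in for the changed FSD.IndexOutput, and the actual fsync is replaced by recording the file name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical base class standing in for Directory.
abstract class SketchDirectory {
    // Old per-file entry point, deprecated as in step 1.
    @Deprecated
    abstract void sync(String name);

    // New bulk entry point; by default it delegates to the per-file method.
    void sync(Collection<String> names) {
        for (String name : names) {
            sync(name);
        }
    }
}

// Hypothetical FSDirectory-like implementation (steps 2 and 3).
class SketchFSDirectory extends SketchDirectory {
    // Newly written, closed and not deleted files that were not synced yet.
    private final Set<String> stale = new HashSet<>();

    // Records which names would have been fsync'd (assumption: replaces real I/O).
    final List<String> fsynced = new ArrayList<>();

    synchronized void onOutputClosed(String name) {
        stale.add(name);
    }

    synchronized void deleteFile(String name) {
        stale.remove(name);
    }

    // "Sync this now, if you think it's needed": names outside the set are ignored.
    @Override
    @Deprecated
    synchronized void sync(String name) {
        if (stale.remove(name)) {
            fsynced.add(name);
        }
    }
}
```

In this shape the caller (step 4) can pass the full file list of every commit without tracking anything itself, since repeated names cost only a set lookup.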





[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12846996#action_12846996
 ] 

Shai Erera commented on LUCENE-2328:


bq.  changing FSD.IndexOutput accordingly

This worries me a bit. If only FSD.IndexOutput will do that, I'm afraid other 
Directory implementations won't realize that they should do so as well (NIO?). 
I'd prefer it if IndexOutput's contract required it to call back to Directory 
upon close ... not sure - maybe just put some heavy documentation around 
createOutput? If we could enforce this API-wise, and let the Dirs that don't 
care simply ignore, then it'd be better. It'll also allow for someone to extend 
FSD.createOutput, return his own IndexOutput and not worry (or do, but 
knowingly) about calling back to Dir.

Other than that - this looks great.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Earwin Burrfoot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847010#action_12847010
 ] 

Earwin Burrfoot commented on LUCENE-2328:
-

Every Directory implementation decides how to handle sync() calls on its own. 
The fact that FSDir (and descendants) do this performance optimization is an 
implementation detail of theirs.
I don't want to bind this somehow into the base class. But, I will note in 
javadocs to sync() that clients may pass the same file over and over again, so 
you might want to optimize for this.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847015#action_12847015
 ] 

Michael McCandless commented on LUCENE-2328:


Must the Dir insist the file is closed in order to sync it?

Why not enroll newly created files in the to-be-sync'd set?




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847036#action_12847036
 ] 

Shai Erera commented on LUCENE-2328:


Yeah I guess I wasn't clear enough. So suppose someone sub-classes FSDir and 
overrides createOutput. How should he know his IndexOutput should call 
dir.sync()? How should he know he needs to pass the Dir to his IndexOutput? So 
I suggested to either mention it in the Javadocs, or somehow make all of 
FSDir's outputs know about that, API-wise ...

So today a file is closed only upon commit (?), and it's then that it's synced? 
If so, why would you want to sync a file that is still open? I guess it cannot 
harm, but what's the use case?




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847050#action_12847050
 ] 

Michael McCandless commented on LUCENE-2328:


In the current proposal, IndexOutput won't call dir.sync.  All it will do is 
notify the dir when it was closed so the dir will record that filename as 
eligible for commit.
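
A minimal sketch of that notification scheme, under stated assumptions: 
TrackingDirectory, TrackedOutput and onOutputClosed are hypothetical names for 
this example only, and the actual fsync is omitted. The output never syncs 
anything; it only tells the directory that the file was closed. Note that this 
sketch ignores sync() for a name whose output is still open, which is exactly 
the behavior under discussion here.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical names throughout; not Lucene's Directory/IndexOutput API.
class TrackingDirectory {
    // Files that were written and closed, but not yet synced.
    private final Set<String> pendingSync = new HashSet<>();

    TrackedOutput createOutput(String name) {
        return new TrackedOutput(this, name);
    }

    // Called by the output from close(); the output itself never syncs.
    synchronized void onOutputClosed(String name) {
        pendingSync.add(name);
    }

    // Syncs only names recorded as closed; returns how many were synced.
    synchronized int sync(Collection<String> names) {
        int count = 0;
        for (String name : names) {
            if (pendingSync.remove(name)) {
                count++; // a real implementation would fsync the file here
            }
        }
        return count;
    }

    // Deleting a file also drops its bookkeeping entry.
    synchronized void deleteFile(String name) {
        pendingSync.remove(name);
    }
}

class TrackedOutput implements Closeable {
    private final TrackingDirectory dir;
    private final String name;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed = false;

    TrackedOutput(TrackingDirectory dir, String name) {
        this.dir = dir;
        this.name = name;
    }

    void writeByte(byte b) {
        buffer.write(b);
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            dir.onOutputClosed(name); // notify once; no sync happens here
        }
    }
}
```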

Lucene today never syncs a file until after it's closed, but, conceivably some 
day it could.  Or others who use the Dir API to write their own files could.

At the OS level this is perfectly fine (in fact you have to pass an open fd to 
fsync).  It seems presumptuous of the directory to silently ignore a call to 
sync just because the file hadn't been closed yet...
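
For what it's worth, the JDK already lets you do this on an open file. The 
following is only a small sketch: OpenFileSync and writeSyncWrite are made-up 
names, and the two payload strings are arbitrary. It writes, calls 
FileChannel.force(true) while the channel is still open, and then keeps 
writing through the same channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Lucene.
class OpenFileSync {
    // Writes, syncs while the file is still open, writes again.
    // Returns the final file size in bytes.
    static long writeSyncWrite(Path file) {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            writeFully(ch, "first");
            ch.force(true); // flush content and metadata; channel stays open
            writeFully(ch, "second");
            return ch.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // FileChannel.write may write fewer bytes than requested, so loop.
    private static void writeFully(FileChannel ch, String text) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```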




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-18 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12847063#action_12847063
 ] 

Michael McCandless commented on LUCENE-2328:


Yes please clean as you go Earwin -- those sound great.

{quote}
bq. Must the Dir insist the file is closed in order to sync it?
Well, no, this can be relaxed.
Because default Directory clients - IW+IR will never call sync() on a file they 
didn't close yet.
Also this client behaviour is guaranteed with current implementation - if 
someone calls current sync() on an open file, it will fail on 'new 
RandomAccessFile'?
{quote}

I'd like to allow for this to work in the future, even if current FSDir impls 
cannot sync an open file.  EG conceivably they could reach in and get the RAF 
that IndexOutput has open and sync it.

So I think we just note this as a limitation of FSDir impls today, but, the API 
allows for it?




IndexWriter.synced field accumulates data

2010-03-17 Thread Gregor Kaczor
I am running into a strange OutOfMemoryError. My small test application indexes 
and deletes a few files. This is repeated 60k times. Optimization is run every 
2k times a file is indexed. The index size is 50KB. I analyzed the heap dump 
and found that the IndexWriter.synced field occupied more than half of the 
heap. That field is a private HashSet without a getter. Its task is to hold 
files which have already been synced. 

There are two calls to addAll and one call to add on synced, but no remove or 
clear throughout the lifecycle of the IndexWriter instance. 

According to the Eclipse Memory Analyzer, synced contains 32618 entries which 
look like file names, e.g. _e065_1.del or _e067.cfs. 

The index directory contains only 10 files.

I guess synced is holding obsolete data.




Re: IndexWriter.synced field accumulates data

2010-03-17 Thread Michael McCandless
You're right!

Really we should delete from sync'd when we delete the files.  We need
to tie into IndexFileDeleter for that, maybe moving this set into
there.
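
To make the effect concrete, here is a small simulation, not the actual 
IndexWriter or IndexFileDeleter code. SyncedSetChurn and its run method are 
hypothetical, and the generated file names are invented. It runs 
create/sync/delete cycles and reports how many names the set still holds, with 
and without removal on delete.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the bookkeeping; not Lucene code.
class SyncedSetChurn {
    // Simulates create/sync/delete cycles and returns how many file names
    // remain in the synced set afterwards.
    static int run(int cycles, boolean forgetOnDelete) {
        Set<String> synced = new HashSet<>();
        for (int i = 0; i < cycles; i++) {
            // Invented naming scheme; each cycle produces a unique name.
            String name = "_" + Integer.toString(i, Character.MAX_RADIX) + ".cfs";
            synced.add(name); // the file was written and synced
            // ... the file is later deleted from the directory ...
            if (forgetOnDelete) {
                synced.remove(name); // drop the name together with the file
            }
        }
        return synced.size();
    }
}
```

With forgetOnDelete set to false the set grows by one entry per cycle; with 
true it stays empty once every file has been deleted.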

Though in practice the amount of actual RAM used should rarely be an
issue?  But we should fix it...

Can you open an issue?

Mike

On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote:
 I am running into a strange OutOfMemoryError. My small test application does 
 index and delete some few files. This is repeated for 60k times.  
 Optimization is run from every 2k times a file is indexed. Index size is 
 50KB. I did analyze the HeapDumpFile and realized that IndexWriter.synced 
 field occupied more than half of the heap. That field is a private HashSet 
 without a getter. Its task is to hold files which have been synced already.

 There are two calls to addAll and one call to add on synced but no remove or 
 clear throughout the lifecycle of the IndexWriter instance.

 According to the Eclipse Memory Analyzer synced contains 32618 entries which 
 look like file names _e065_1.del or _e067.cfs

 The index directory contains 10 files only.

 I guess synced is holding obsolete data




Re: IndexWriter.synced field accumulates data

2010-03-17 Thread Gregor Kaczor
I will open an issue.

Actually it's not the size of the occupied RAM. The leak is the problem.

 Original Message 
 Date: Wed, 17 Mar 2010 14:30:25 -0500
 From: Michael McCandless luc...@mikemccandless.com
 To: java-dev@lucene.apache.org
 Subject: Re: IndexWriter.synced field accumulates data

 You're right!
 
 Really we should delete from sync'd when we delete the files.  We need
 to tie into IndexFileDeleter for that, maybe moving this set into
 there.
 
 Though in practice the amount of actual RAM used should rarely be an
 issue?  But we should fix it...
 
 Can you open an issue?
 
 Mike
 
 On Wed, Mar 17, 2010 at 1:15 PM, Gregor Kaczor gkac...@gmx.de wrote:
  I am running into a strange OutOfMemoryError. My small test application
 does index and delete some few files. This is repeated for 60k times.
  Optimization is run from every 2k times a file is indexed. Index size is 
 50KB.
 I did analyze the HeapDumpFile and realized that IndexWriter.synced field
 occupied more than half of the heap. That field is a private HashSet
 without a getter. Its task is to hold files which have been synced already.
 
  There are two calls to addAll and one call to add on synced but no
 remove or clear throughout the lifecycle of the IndexWriter instance.
 
  According to the Eclipse Memory Analyzer synced contains 32618 entries
 which look like file names _e065_1.del or _e067.cfs
 
  The index directory contains 10 files only.
 
  I guess synced is holding obsolete data
 



[jira] Created: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-17 Thread Gregor Kaczor (JIRA)
IndexWriter.synced  field accumulates data leading to a Memory Leak
---

 Key: LUCENE-2328
 URL: https://issues.apache.org/jira/browse/LUCENE-2328
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0, 2.9.2, 2.9.1
 Environment: all
Reporter: Gregor Kaczor
Priority: Minor


I am running into a strange OutOfMemoryError. My small test application indexes
and deletes a few files. This is repeated 60k times. Optimization is run every
2k times a file is indexed. The index size is 50KB. I analyzed the heap dump
and found that the IndexWriter.synced field occupied more than half of the
heap. That field is a private HashSet without a getter. Its task is to hold
files which have already been synced.

There are two calls to addAll and one call to add on synced, but no remove or
clear throughout the lifecycle of the IndexWriter instance.

According to the Eclipse Memory Analyzer, synced contains 32618 entries which
look like file names, e.g. _e065_1.del or _e067.cfs.

The index directory contains only 10 files.

I guess synced is holding obsolete data.




Re: IndexWriter.synced field accumulates data

2010-03-17 Thread Gregor Kaczor
followup in

https://issues.apache.org/jira/browse/LUCENE-2328

