Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Earwin Burrfoot
On Tue, Mar 22, 2011 at 06:21, Chris Hostetter hossman_luc...@fucit.org wrote:

 (replying to the dev list, see context below)

 : Unfortunately, you can't easily recover from this (except by
 : reindexing your docs again).
 :
 : Failing to call IW.commit() or IW.close() means no segments file was written...


 I know there were good reasons for eliminating the autoCommit
 functionality from IndexWriter, but threads like this make me think that
 even though autoCommit on flush/merge/whatever was bad, having an option
 for some sort of autoClose using a finalizer might be a good idea to
 give new/novice users a safety net.

 In the case of totally successful normal operation, this would result in
 one commit at GC (assuming the JVM calls the finalizer) and if there were
 any errors it should (if i understand correctly) do an implicit rollback.

 Anyone see a downside?
Yes. Totally unexpected magical behaviour.
What if I didn't commit something on purpose?

        ...

 :  I had a program running for 2 days to build an index for around 160 million
 :  text files, and after program ended, I tried searching the index and found
 :  the index was not correctly built, *indexReader.numDocs()* returns 0. I
 :  checked the index directory, it looked good, all the index data seemed to be
 :  there, the directory is 1.5 Gigabytes in size.
 :
 :  I checked my code and found that I forgot to call *indexWriter.optimize()* and
 :  *indexWriter.close()*, I want to know if it is possible to
 :  *re-optimize()* the index so I don't need to rebuild the whole index
 :  from scratch? I don't
 :  really want the program to take another 2 days.


 -Hoss

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org





-- 
Kirill Zakharenko/Кирилл Захаренко
E-Mail/Jabber: ear...@gmail.com
Phone: +7 (495) 683-567-4
ICQ: 104465785




Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Doron Cohen
Hi,


  I know there were good reasons for eliminating the autoCommit
  functionality from IndexWriter, but threads like this make me think that
  even though autoCommit on flush/merge/whatever was bad, having an option
  for some sort of autoClose using a finalizer might be a good idea to
  give new/novice users a safety net.

  In the case of totally successful normal operation, this would result in
  one commit at GC (assuming the JVM calls the finalizer) and if there were
  any errors it should (if i understand correctly) do an implicit rollback.
 
  Anyone see a downside?


I think finalize() is not that trustworthy, in that it may
never be called, e.g. in case GC happened to not collect the specific
object, and so the way for programmers to guarantee execution of any code
at shutdown is with shutdown hooks. I guess this is what you meant,
that Lucene would add a shutdown hook?

I.e, each IndexWriter object opened for write would add its own method
as a shutdown hook, so that at shutdown, that writer would check its state,
and in case that it was not closed (and hence also not rolled-back) and
has pending uncommitted changes, those changes would be committed,
is this what you mean?
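
A minimal sketch of the shutdown-hook idea described above, assuming a hypothetical stand-in class (HookedWriter is not Lucene's IndexWriter, and none of its methods are Lucene API). It only shows the mechanics: each writer registers its own hook, and at shutdown the hook commits only if the writer was neither closed nor rolled back and still has pending changes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index writer; this is NOT Lucene's IndexWriter.
class HookedWriter {
    private final List<String> pending = new ArrayList<String>();
    private final List<String> committed = new ArrayList<String>();
    private boolean closed = false;

    // Runtime.addShutdownHook expects an unstarted Thread.
    // Note that the registered hook keeps this writer reachable until shutdown.
    private final Thread hook = new Thread(new Runnable() {
        public void run() {
            commitIfLeftOpen();
        }
    });

    HookedWriter() {
        Runtime.getRuntime().addShutdownHook(hook);
    }

    synchronized void addDocument(String doc) {
        if (closed) {
            throw new IllegalStateException("writer is closed");
        }
        pending.add(doc);
    }

    synchronized void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    // Discards pending changes on purpose, so the hook has nothing to commit.
    synchronized void rollback() {
        pending.clear();
        closed = true;
    }

    synchronized void close() {
        commit();
        closed = true;
    }

    // Body of the hook: commit only if the writer was left open with pending changes.
    synchronized void commitIfLeftOpen() {
        if (!closed && !pending.isEmpty()) {
            commit();
        }
    }

    synchronized int committedCount() {
        return committed.size();
    }
}

class ShutdownHookSketch {
    public static void main(String[] args) {
        HookedWriter writer = new HookedWriter();
        writer.addDocument("doc-1");
        writer.addDocument("doc-2");
        // No commit() or close() here: the hook runs commitIfLeftOpen() at normal JVM exit.
    }
}
```

Shutdown hooks run on normal exit and on interrupts such as Ctrl-C, but not when the JVM is killed forcibly, so this is still only a safety net.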

I think it is almost okay - it would save the use case of this thread, but could
still surprise someone...

Perhaps there's a third option - semi-commit? - that is, with the proposed
shutdown hook, iw commits without deleting the previous commit, and marks
on dir that its state is semi-commit, so that when that index
is opened for read or write it would throw a special new exception that
indicates this state, and the caller, before continuing to use this index
for either read or write, would have to call one of two new utility methods:
- commitSemiCommit(Directory)
- rollbackSemiCommit(Directory)
(Perhaps better names, rollbackSelfCommit, rollbackPartialCommit, etc.)
After that, it would be possible to open the index as usual.
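
To make the semi-commit proposal concrete, here is a rough sketch of the state handling, under heavy assumptions: SketchDirectory is a hypothetical in-memory stand-in for a Directory, SemiCommitException is an invented name for the "special new exception", and the two utility methods only borrow the names suggested above. Nothing here exists in Lucene.

```java
import java.util.ArrayList;
import java.util.List;

// Invented name for the proposed "special new exception".
class SemiCommitException extends Exception {
    SemiCommitException(String message) {
        super(message);
    }
}

// Hypothetical in-memory stand-in for a Directory holding commit points.
class SketchDirectory {
    List<String> lastCommit = new ArrayList<String>(); // previous commit, kept intact
    List<String> semiCommit = null;                    // written by the hook, or null
}

class SemiCommitSketch {
    // What the shutdown hook would do: write a new commit next to the old one
    // and leave a marker (here: a non-null semiCommit) instead of replacing it.
    static void semiCommit(SketchDirectory dir, List<String> pendingDocs) {
        List<String> snapshot = new ArrayList<String>(dir.lastCommit);
        snapshot.addAll(pendingDocs);
        dir.semiCommit = snapshot;
    }

    // Opening for read or write refuses to proceed while the marker is present.
    static List<String> open(SketchDirectory dir) throws SemiCommitException {
        if (dir.semiCommit != null) {
            throw new SemiCommitException("index is in semi-commit state");
        }
        return new ArrayList<String>(dir.lastCommit);
    }

    // Accept the hook's commit: it becomes the current commit.
    static void commitSemiCommit(SketchDirectory dir) {
        if (dir.semiCommit != null) {
            dir.lastCommit = dir.semiCommit;
            dir.semiCommit = null;
        }
    }

    // Reject the hook's commit: the previous commit stays current.
    static void rollbackSemiCommit(SketchDirectory dir) {
        dir.semiCommit = null;
    }
}
```

The point of the sketch is that the caller is forced to decide: open() fails until either commitSemiCommit or rollbackSemiCommit has been called.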

It seems to me that something like this can work.
Not totally convinced that it is worth the effort...?



 Yes. Totally unexpected magical behaviour.
 What if I didn't commit something on purpose?


Applications can call rollback() in this case.

Regards,
Doron


Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Robert Muir
On Mon, Mar 21, 2011 at 11:21 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 Anyone see a downside?


I don't think we should do anything serious in a gc finalizer.

sounds like it's asking for a JRE crash.




RE: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Uwe Schindler
Hi,

 I know there were good reasons for eliminating the autoCommit
 functionality from IndexWriter, but threads like this make me think that
 even though autoCommit on flush/merge/whatever was bad, having an option
 for some sort of autoClose using a finalizer might be a good idea to
 give new/novice users a safety net.

 In the case of totally successful normal operation, this would result in
 one commit at GC (assuming the JVM calls the finalizer) and if there were any
 errors it should (if i understand correctly) do an implicit rollback.
 
 Anyone see a downside?

I am against all finalizer stuff, because it also leads to problems and is
unreliable - we already removed all finalizer stuff in Lucene left over from
the early days, so we should not add them again. This error is made by this
user only once; the second time this user will have a try...finally block
around his stuff.
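
For readers new to the pattern, a small sketch of the try...finally block mentioned above. TrackingWriter is a hypothetical stand-in (not Lucene's IndexWriter) so the example runs on its own; the point is only that close() sits in the finally clause and therefore runs whether indexing succeeds or throws.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for an index writer; this is NOT Lucene's IndexWriter.
class TrackingWriter implements Closeable {
    private int added = 0;
    private boolean closed = false;

    void addDocument(String doc) throws IOException {
        if (doc == null) {
            throw new IOException("cannot index a null document");
        }
        added++;
    }

    public void close() {
        closed = true;
    }

    int addedCount() {
        return added;
    }

    boolean isClosed() {
        return closed;
    }
}

class TryFinallySketch {
    // close() is in the finally clause, so it runs on success and on failure alike.
    static void indexAll(TrackingWriter writer, String[] docs) throws IOException {
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingWriter writer = new TrackingWriter();
        indexAll(writer, new String[] {"doc-1", "doc-2"});
        System.out.println("added=" + writer.addedCount() + " closed=" + writer.isClosed());
    }
}
```

Whether a real application should close or roll back after a failure is a separate decision; the sketch closes in both cases for simplicity.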

A comparison is relational databases with autocommit off. If I crash my app
or don't correctly commit my stuff, it's also reverted on loss of
connection or forceful shutdown of the JDBC driver! Where is the difference?

But I am for adding a recovery tool for uncommitted segments to CheckIndex.
I think this should not be too hard. Something like looking for cfs/other
filetypes and creating SegmentReaders that are then added using addIndex().

Uwe





Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Erick Erickson
I like Uwe's idea. As for Hoss's original suggestion, my initial
reaction is that if a user understands the need to set the option
in the first place, they're also more likely to understand the need
for close().

FWIW
Erick




Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Chris Hostetter

: I like Uwe's idea. As for Hoss's original suggestion, my initial
: reaction is that if a user understands the need to set the option
: in the first place, they're also more likely to understand the need
: for close().

my intention was that if the user used a novice type API for getting an
IndexWriter, it would default to true but any of the non-trivial
constructors would default to false.
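
A rough sketch of what that split could look like, purely as an illustration of the proposal: AutoCloseWriter, openSimple() and closeIfAbandoned() are hypothetical names, not Lucene API, and Lucene has no autoClose option. The finalizer is only a safety net here; the JVM may never call it, and finalize() is deprecated in newer JDKs.

```java
// Hypothetical stand-in for a writer with the proposed autoClose option.
// This is NOT Lucene's IndexWriter, and no such option exists in Lucene.
class AutoCloseWriter {
    private final boolean autoClose;
    private int pending = 0;
    private int committed = 0;
    private boolean closed = false;

    // "Non-trivial" constructor: the caller decides, and would typically pass false.
    AutoCloseWriter(boolean autoClose) {
        this.autoClose = autoClose;
    }

    // "Novice" entry point: autoClose defaults to true.
    static AutoCloseWriter openSimple() {
        return new AutoCloseWriter(true);
    }

    synchronized void addDocument(String doc) {
        pending++;
    }

    // Commits pending documents and marks the writer closed; safe to call twice.
    synchronized void close() {
        if (closed) {
            return;
        }
        committed += pending;
        pending = 0;
        closed = true;
    }

    // Safety-net body, kept separate so it can be exercised without waiting for GC.
    synchronized void closeIfAbandoned() {
        if (autoClose && !closed) {
            close();
        }
    }

    synchronized int committedCount() {
        return committed;
    }

    // Called by the garbage collector, if at all, when the writer is unreachable.
    @Override
    @SuppressWarnings({"deprecation", "removal"})
    protected void finalize() throws Throwable {
        try {
            closeIfAbandoned();
        } finally {
            super.finalize();
        }
    }
}
```

With this shape, a writer obtained through openSimple() gets the safety net, while anyone calling the explicit constructor with false keeps today's behaviour of losing uncommitted changes.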

:  I am against all finalizer stuff, because it also leads to problems and is
:  unreliable - we already removed all finalizer stuff in Lucene left over from

generally i agree with you, you shouldn't *expect* finalizers to be 
called, but i'm not aware of any problems that can happen by using the 
finalizer as a safety net ... rmuir mentioned it could cause a JRE crash 
but i don't understand how that would happen.

:  A comparison is relational databases with autocommit off. If I crash my app
:  or don't correctly commit my stuff, it's also reverted on loss of
:  connection or forceful shutdown of the JDBC driver! Where is the difference?

the difference is a lot of DBs do default to autocommit, and we not only
don't have autocommit (or autoclose as i'm suggesting) as a
default, we don't even offer it as an option.

it just seems like the kind of thing that could easily bite someone in the 
ass that we could help prevent.

not just in the case of a person who writes their first Lucene app and
doesn't know to call close() or commit() at all, but in the case of
someone who has an app that works fine 90% of the time, but doesn't
realize they have a stray code path where they aren't committing/closing
properly ... so *most* of the time their app works fine and all of their
data is there, but sometimes, for reasons they can't understand, data is
missing when they do searches (even though their indexing code logs that
it was added successfully)

-Hoss




Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Chris Hostetter

: I think finalize() is not that trustworthy, in that it may
: never be called, e.g. in case GC happened to not collect the specific
: object, and so the way for programmers to guarantee execution of any code
: at shutdown is with shutdown hooks. I guess this is what you meant,

i'm not suggesting that this be documented as a *reliable* guaranteed way
to get a commit, just as a safety net for novice users.  I don't know
enough about finer points of shutdown hooks to comment on the distinction,
but my off the cuff assumption is that a shutdown hook would be a bad idea
... in a long running program wouldn't that keep the IndexWriter
from being GCed until shutdown?

:  Yes. Totally unexpected magical behaviour.
:  What if I didn't commit something on purpose?
...
: Applications can call rollback() in this case.

or more specifically along the lines of my original point: people who read
the docs carefully are more likely to know about rollback and call it
explicitly, or to see the autoClose option and explicitly set it to false
(or use a constructor where it defaults to false)

-Hoss




Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-22 Thread Doron Cohen
Hi,


 I don't know
 enough about finer points of shutdown hooks to comment on the distinction,
 but my off the cuff assumption is that a shutdown hook would be a bad idea
 ... in a long running program wouldn't that keep the IndexWriter
 from being GCed until shutdown?


Could be, haven't used them either...
...If IW.close() calls RT.removeShutdownHook() I think this should work.
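
A minimal sketch of that idea, assuming a hypothetical ReleasableWriter (not Lucene's IndexWriter): the constructor registers the hook, and close() de-registers it with Runtime.removeShutdownHook(), so the runtime no longer holds a reference to the writer once it is closed. The pendingWork flag merely stands in for uncommitted changes.

```java
// Hypothetical writer; only the shutdown-hook bookkeeping is shown.
// This is NOT Lucene's IndexWriter.
class ReleasableWriter {
    private boolean closed = false;
    private boolean pendingWork = true;      // stands in for uncommitted changes
    private boolean committedByHook = false;

    private final Thread hook = new Thread(new Runnable() {
        public void run() {
            onShutdown();
        }
    });

    ReleasableWriter() {
        // While registered, the hook keeps this writer reachable.
        Runtime.getRuntime().addShutdownHook(hook);
    }

    // Body of the hook: only acts if the writer was never closed.
    synchronized void onShutdown() {
        if (!closed && pendingWork) {
            pendingWork = false;
            committedByHook = true;
        }
    }

    // Returns true if the hook was still registered and has now been removed.
    // removeShutdownHook throws IllegalStateException once shutdown has begun.
    synchronized boolean close() {
        if (closed) {
            return false;
        }
        pendingWork = false;
        closed = true;
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    synchronized boolean wasCommittedByHook() {
        return committedByHook;
    }
}

class RemoveHookSketch {
    public static void main(String[] args) {
        ReleasableWriter writer = new ReleasableWriter();
        boolean removed = writer.close();
        System.out.println("hook removed on close: " + removed);
    }
}
```

After close() the writer can be garbage collected like any other object, which addresses the concern about long-running programs holding writers until shutdown.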

Doron


Re: Urgent! Forgot to close IndexWriter after adding Documents to the index.

2011-03-21 Thread Chris Hostetter

(replying to the dev list, see context below)

: Unfortunately, you can't easily recover from this (except by
: reindexing your docs again).
: 
: Failing to call IW.commit() or IW.close() means no segments file was written...


I know there were good reasons for eliminating the autoCommit
functionality from IndexWriter, but threads like this make me think that
even though autoCommit on flush/merge/whatever was bad, having an option
for some sort of autoClose using a finalizer might be a good idea to
give new/novice users a safety net.

In the case of totally successful normal operation, this would result in
one commit at GC (assuming the JVM calls the finalizer) and if there were
any errors it should (if i understand correctly) do an implicit rollback.

Anyone see a downside?

...

:  I had a program running for 2 days to build an index for around 160 million
:  text files, and after program ended, I tried searching the index and found
:  the index was not correctly built, *indexReader.numDocs()* returns 0. I
:  checked the index directory, it looked good, all the index data seemed to be
:  there, the directory is 1.5 Gigabytes in size.
: 
:  I checked my code and found that I forgot to call *indexWriter.optimize()* and
:  *indexWriter.close()*, I want to know if it is possible to
:  *re-optimize()* the index so I don't need to rebuild the whole index
:  from scratch? I don't
:  really want the program to take another 2 days.


-Hoss
