Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Chris Hostetter
:
: As 2.1 is soon coming, I wonder if NewIndexModifier is a proper name for
: the public API.
: (It would be the first NewXYZ and there is no OldXYZ either...)
:
: How about renaming it to something like DeletingIndexWriter?

I haven't been following the Jira issue that closely (LUCENE-565), but as
i recall the name question comes up because the class was originally
intended to replace IndexModifier, but its API/purpose has evolved so that
it is no longer a suitable "drop in replacement" for the current
IndexModifier ... is that correct?

NewIndexModifier currently subclasses IndexWriter and adds the following
public methods...

  public void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
  public int getMaxBufferedDeleteTerms()
  public void updateDocument(Term term, Document doc)
  public void updateDocument(Term term, Document doc, Analyzer analyzer)
  public synchronized void deleteDocuments(Term term)
  public synchronized void deleteDocuments(Term[] terms)

from an API standpoint, it seems like this could easily replace the
current IndexModifier (which would have the nice side effect of resolving
the issue of whether the name NewIndexModifier is good enough) assuming the
semantics of the classes/methods are the same -- i'm not sure if they are.
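
To make the semantics question concrete, here is a minimal sketch of what "buffered delete terms" and "update = delete by term, then add" could mean. It is a toy model, not the LUCENE-565 code: the class BufferedDeleteWriterSketch, the map-based documents, and the helper accessors are all hypothetical, and only the method names mirror the list above. It assumes a buffered delete must apply only to documents added before it, which is why each pending delete remembers the document count at the time it was issued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of a writer that buffers delete terms. Not Lucene code. */
class BufferedDeleteWriterSketch {
    /** A buffered delete: the term plus how many docs existed when it was issued. */
    private static final class PendingDelete {
        final String field;
        final String value;
        final int docCountAtDelete;

        PendingDelete(String field, String value, int docCountAtDelete) {
            this.field = field;
            this.value = value;
            this.docCountAtDelete = docCountAtDelete;
        }
    }

    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<PendingDelete> pending = new ArrayList<>();
    private final int maxBufferedDeleteTerms;

    BufferedDeleteWriterSketch(int maxBufferedDeleteTerms) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Buffers the term; deletes are applied only once the buffer is full. */
    void deleteDocuments(String field, String value) {
        pending.add(new PendingDelete(field, value, docs.size()));
        if (pending.size() >= maxBufferedDeleteTerms) {
            flushDeletes();
        }
    }

    /** Delete-by-term followed by an add, so the new doc replaces the old ones. */
    void updateDocument(String field, String value, Map<String, String> doc) {
        deleteDocuments(field, value);
        addDocument(doc);
    }

    /** Applies every pending delete, but only to docs that predate it. */
    void flushDeletes() {
        List<Map<String, String>> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            boolean deleted = false;
            for (PendingDelete d : pending) {
                if (i < d.docCountAtDelete && d.value.equals(docs.get(i).get(d.field))) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        pending.clear();
    }

    int docCount() {
        return docs.size();
    }

    int pendingDeleteCount() {
        return pending.size();
    }

    Map<String, String> document(int i) {
        return docs.get(i);
    }
}
```

In this toy, the old document stays in the list until the buffer fills and the deletes are flushed; whether the real classes agree on that timing is exactly the kind of semantic difference worth checking before swapping one for the other.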

skimming the history of LUCENE-565 it's not clear to me why this was
implemented as a new class with the name NewIndexModifier ... was that
just how it evolved organically?


if it's not possible to make this class replace IndexModifier, then
DeletingIndexWriter or BufferedDeletingIndexWriter seem like they would be
fine to me.

-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread robert engels

Maybe IndexMaintainer or IndexUpdater ?




new issue

2007-02-08 Thread Gaurav Srivastava
I want the document objects from the search results in the form of a
resultset so that I can use them in my application. Please suggest how to
do this.





Re: new issue

2007-02-08 Thread robert engels

Ask your question on the user list.

On Feb 8, 2007, at 3:18 AM, Gaurav Srivastava wrote:

i want the document object of the result of the search in form of  
resultset so that i could use it in my application please suggest





Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Michael McCandless


I like the name BufferedDeletingIndexWriter best so far.


Chris Hostetter wrote:

> from an API standpoint, it seems like this could easily replace the
> current IndexModifier (which would have the nice side effect of
> resolving the issue of whether the name NewIndexModifier is good
> enough) assuming the semantics of the classes/methods are the same
> -- i'm not sure if they are.

The one method missing vs IndexModifier, which prevents this being a
drop in replacement, is deleteDocument(int docNum).  This specific
issue was discussed here:

  https://issues.apache.org/jira/browse/LUCENE-565#action_12430130

and I would tend to agree with that logic (exposing a
deleteDocument(int docNum) is dangerous), meaning we can't drop in
replace the current IndexModifier.  Maybe we could deprecate the
existing IndexModifier?
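
For readers who have not followed the linked comment, one commonly cited hazard of deleting by document number is that numbers are positional and can shift when deleted documents are compacted away during a merge. The sketch below illustrates that general hazard only; it is an assumption about the reasoning, not a quote from the Jira discussion, and DocNumberShiftSketch is a hypothetical toy class rather than Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy index where a document's number is simply its position. Not Lucene code. */
class DocNumberShiftSketch {
    private final List<String> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    /** Adds a document and returns the number it currently has. */
    int add(String doc) {
        docs.add(doc);
        deleted.add(false);
        return docs.size() - 1;
    }

    /** Marks the document at this position as deleted. */
    void deleteByNumber(int docNum) {
        deleted.set(docNum, true);
    }

    /** Returns the live document at this position, or null if it is deleted. */
    String get(int docNum) {
        return deleted.get(docNum) ? null : docs.get(docNum);
    }

    /** Merging drops deleted docs, which renumbers everything after them. */
    void merge() {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        docs.addAll(kept);
        deleted.clear();
        for (int i = 0; i < kept.size(); i++) {
            deleted.add(false);
        }
    }
}
```

If a caller remembers the number of "b" in an index holding "a", "b", "c", and the writer then deletes "a" and merges, the remembered number now points at "c". A writer that merges on its own schedule gives the caller no safe moment to use such a number.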

> skimming the history of LUCENE-565 it's not clear to me why this was
> implemented as a new class with the name NewIndexModifier ... was
> that just how it evolved organically?

Well, the patch started life as direct improvements to
IndexWriter.  But people were concerned w/ that approach and suggested
enabling sub-classing of IndexWriter instead, which led to the
current NewIndexModifier class.

Long-ish term I think we should aim for one reader class (IndexReader)
that you use to do read-only things and one writer class
(NewIndexModifier being closest to this now) to make changes (adds,
deletes, optimize, etc.) to an index.

Mike




Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Otis Gospodnetic
It's a temporary name, no?  In the end we probably want to keep the _name_
IndexWriter, so why not just call it IndexWriter2 and when we are happy with
it, we make it be the new IndexWriter and we deprecate IW2.

Otis




Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Doron Cohen
Otis Gospodnetic wrote:

> It's a temporary name, no?  In the end we probably want to keep the
> _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
> happy with it, we make it be the new IndexWriter and we deprecate IW2.

For a temporary solution it seems good. But do you also mean releasing
2.1 with IW2? If so we need to javadoc very clearly that this is very
probably a temporary class.

Otherwise -

Chris Hostetter wrote:

> if it's not possible to make this class replace IndexModifier, then
> DeletingIndexWriter or BufferedDeletingIndexWriter seem like they would
> be fine to me.

I prefer shorter names (when they are clear enough).
BufferedDeletingIndexWriter seems quite long.
Since IndexWriter too is buffering added documents, seems
it is mostly the deletion that distinguishes the two.

So my preference is DeletingIndexWriter.

Also,

Michael McCandless wrote:

> Long-ish term I think we should aim for one reader class (IndexReader)
> that you use to do read-only things and one writer class
> (NewIndexModifier being closest to this now) to make changes (adds,
> deletes, optimize, etc.) to an index.

This sounds great. But at least one use case may no longer be possible this
way: there are probably applications 'out there' deleting documents with
this logic: search the index, examine returned docs - post-processing
them using some app-specific logic not well encapsulated in the index,
select a few, delete them by id.
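
The workflow above can be sketched as follows. This is a toy model under stated assumptions, not Lucene code: SearchThenDeleteSketch, its substring "search", and the Predicate standing in for app-specific logic are all hypothetical. The point is only that the selection step happens outside the index, so a delete-by-term API cannot express it directly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy reader supporting search-then-delete-by-number. Not Lucene code. */
class SearchThenDeleteSketch {
    private final List<String> docs;
    private final boolean[] deleted;

    SearchThenDeleteSketch(List<String> docs) {
        this.docs = new ArrayList<>(docs);
        this.deleted = new boolean[docs.size()];
    }

    /** Step 1: "search" returns the numbers of live docs containing the text. */
    List<Integer> search(String text) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted[i] && docs.get(i).contains(text)) {
                hits.add(i);
            }
        }
        return hits;
    }

    String document(int docNum) {
        return docs.get(docNum);
    }

    void deleteDocument(int docNum) {
        deleted[docNum] = true;
    }

    /** Counts the documents that have not been deleted. */
    int numDocs() {
        int live = 0;
        for (boolean d : deleted) {
            if (!d) {
                live++;
            }
        }
        return live;
    }

    /** Steps 2-4: examine each hit with app logic, delete the selected ones by number. */
    int deleteSelected(String query, Predicate<String> appLogic) {
        int removed = 0;
        for (int docNum : search(query)) {
            if (appLogic.test(document(docNum))) {
                deleteDocument(docNum);
                removed++;
            }
        }
        return removed;
    }
}
```

Because the numbers come from and are used against the same object within one pass, they stay valid for the whole loop in this toy.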





Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Chris Hostetter

: It's a temporary name, no?  In the end we probably want to keep the
: _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
: happy with it, we make it be the new IndexWriter and we deprecate IW2.

Um... actually that's a really good point, this is first and foremost an
extension of IndexWriter ... is there any reason not to rename
"NewIndexModifier" as "IndexWriter" (refactoring the existing IndexWriter
code into it, or renaming the current IndexWriter to
"OldIndexWriter", or "NonDeletingIndexWriter")

the only reason i can think of not to do this would be if we are worried
about people who currently subclass IndexWriter getting a change in
behavior if we change the IndexWriter out from under them ... is this a
significant concern? NewIndexModifier doesn't seem to change any of the
semantics of the IndexWriter methods it extends.




-Hoss





Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Michael McCandless

Chris Hostetter wrote:

> : It's a temporary name, no?  In the end we probably want to keep the
> : _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
> : happy with it, we make it be the new IndexWriter and we deprecate IW2.
>
> Um... actually that's a really good point, this is first and foremost an
> extension of IndexWriter ... is there any reason not to rename
> "NewIndexModifier" as "IndexWriter" (refactoring the existing IndexWriter
> code into it, or renaming the current IndexWriter to
> "OldIndexWriter", or "NonDeletingIndexWriter")
>
> the only reason i can think of not to do this would be if we are worried
> about people who currently subclass IndexWriter getting a change in
> behavior if we change the IndexWriter out from under them ... is this a
> significant concern? NewIndexModifier doesn't seem to change any of the
> semantics of the IndexWriter methods it extends.


+1

I think the new methods in NewIndexModifier are low-risk to the
existing IndexWriter, so, we should just add them into IndexWriter and
not create a new class?  Then we don't have a naming problem anymore :)

Mike




Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Michael McCandless

Doron Cohen wrote:

> Michael McCandless wrote:
>
>> Long-ish term I think we should aim for one reader class (IndexReader)
>> that you use to do read-only things and one writer class
>> (NewIndexModifier being closest to this now) to make changes (adds,
>> deletes, optimize, etc.) to an index.
>
> This sounds great. But at least one use case may no longer be possible this
> way: there are probably applications 'out there' deleting documents with
> this logic: search the index, examine returned docs - post-processing
> them using some app-specific logic not well encapsulated in the index,
> select a few, delete them by id.


This is a very good point.  I think we must keep the deleteDocument*
methods in IndexReader around to continue to support this use case.

Mike




Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Yonik Seeley

On 2/8/07, Michael McCandless <[EMAIL PROTECTED]> wrote:

> I think the new methods in NewIndexModifier are low-risk to the
> existing IndexWriter, so, we should just add them into IndexWriter and
> not create a new class?  Then we don't have a naming problem anymore :)

The original versions of that patch have been removed, but I was
originally concerned about overhead to the IndexWriter for people who
didn't use that delete functionality (opening readers, keeping track
of the segment number for adds, etc).

Also, I think the extension points are important since
NewIndexModifier does not (and probably never will be able to) do
everything people need.  There can be very complex ways of identifying
documents to be deleted.  It's also nice knowing when documents in ram
will be flushed to disk... one reason being that it correlates with
document visibility if you open a new reader.

-Yonik




Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Chris Hostetter

: The original versions of that patch have been removed, but I was
: originally concerned about overhead to the IndexWriter for people who
: didn't use that delete functionality (opening readers, keeping track
: of the segment number for adds, etc).

do you still have those concerns with the version of NewIndexModifier on
the trunk? ... skimming it i don't see any reason why it would add
overhead to people using only the existing IndexWriter methods.

: Also, I think the extension points are important since
: NewIndexModifier does not (and probably never will be able to) do
: everything people need.  There can be very complex ways of identifying
: documents to be deleted.  It's also nice knowing when documents in ram
: will be flushed to disk... one reason being that it correlates with
: document visibility if you open a new reader.

Which extension points are you referring to? ... the existing protected
methods in IndexWriter and NewIndexModifier can of course be left as they
are for other subclasses to override -- i'm just wondering if there is any
benefit in IndexWriter and NewIndexModifier being separate classes.



-Hoss





Re: NewIndexModifier - - - DeletingIndexWriter

2007-02-08 Thread Michael McCandless

Yonik Seeley wrote:

> Also, I think the extension points are important since
> NewIndexModifier does not (and probably never will be able to) do
> everything people need.

I agree extension points are nice.  Maybe we could leave the
extension points ("doAfterFlushRamSegments", etc.) but merge
NewIndexModifier into IndexWriter?

Though I do worry that by adding these extension points we tie our
hands for later.  EG if we at some point improve IndexWriter to be
more like KinoSearch or Ferret, which have drastically different ways
to use RAM to buffer documents, then we will break people who rely on
these extension points?  I guess we could also just make a whole new
writer class at that point.  It just feels like it may be too early to
expose the implementation details of how the current writer buffers
things in RAM and builds/flushes a segment on commit.
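
A minimal sketch of the kind of extension point being discussed, to show where the coupling comes from. This is a toy, not the IndexWriter code: HookedWriterSketch, FlushCountingWriter, and the doAfterFlush(int) signature are hypothetical, loosely modelled on the "doAfterFlushRamSegments" name mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer with a protected flush hook. Not Lucene code. */
class HookedWriterSketch {
    private final List<String> ramBuffer = new ArrayList<>();
    private final List<String> onDisk = new ArrayList<>();
    private final int maxBufferedDocs;

    HookedWriterSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Moves buffered docs to "disk", then notifies subclasses. */
    void flush() {
        int flushed = ramBuffer.size();
        onDisk.addAll(ramBuffer);
        ramBuffer.clear();
        doAfterFlush(flushed);
    }

    /** Extension point: called after each flush. Does nothing by default. */
    protected void doAfterFlush(int flushedDocCount) {
    }

    /** Docs a freshly opened reader would see in this toy model. */
    int visibleDocCount() {
        return onDisk.size();
    }
}

/** Subclass that uses the hook to learn when docs become visible. */
class FlushCountingWriter extends HookedWriterSketch {
    int flushes = 0;

    FlushCountingWriter(int maxBufferedDocs) {
        super(maxBufferedDocs);
    }

    @Override
    protected void doAfterFlush(int flushedDocCount) {
        flushes++;
    }
}
```

The subclass only works as long as the base class has a discrete "flush the RAM buffer" step. If the buffering strategy changed so that no such step existed, the hook would have nothing meaningful to report, which is the hand-tying concern.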

> There can be very complex ways of identifying documents to be
> deleted.

Agreed.  I think we must continue to support deleting from an
IndexReader for these advanced use cases.

> It's also nice knowing when documents in ram will be flushed to
> disk... one reason being that it correlates with document visibility
> if you open a new reader.

Do you mean this is a good use case for the extension point
"doAfterFlushRamSegments"?  I agree.  Though with LUCENE-710
(adding "commit on close" to IndexWriter) you then have complete
control on visibility to readers.

I also think it's now confusing to users which class (IndexModifier,
NewIndexModifier, IndexWriter) to use to write to an index.  I would
prefer a single IndexWriter class now because this moves us closer
to our eventual goal of "use IndexWriter to make changes; use
IndexReader to search/read".

Mike




[jira] Resolved: (LUCENE-762) [PATCH] Efficiently retrieve sizes of field values

2007-02-08 Thread Grant Ingersoll (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-762.


Resolution: Fixed

I have committed the original patch.  All tests pass.  In the end, I could not 
find a way I felt comfortable with for getting rid of the if-then-else clause 
in FieldsReader.  I did add a TODO item there to remind us to go back and take 
a look at it again later.

Since the if clauses are ordered according to their most common usages (I 
think), I don't think there will be much of a performance issue w/ the current 
approach.

> [PATCH] Efficiently retrieve sizes of field values
> --
>
>                  Key: LUCENE-762
>                  URL: https://issues.apache.org/jira/browse/LUCENE-762
>              Project: Lucene - Java
>           Issue Type: New Feature
>           Components: Store
>     Affects Versions: 2.1
>             Reporter: Chuck Williams
>          Assigned To: Grant Ingersoll
>             Priority: Minor
>          Attachments: SizeFieldSelector.patch
>
>
> Sometimes an application would like to know how large a document is before 
> retrieving it.  This can be important for memory management or choosing 
> between algorithms, especially in cases where documents might be very large.
> This patch extends the existing FieldSelector mechanism with two new 
> FieldSelectorResults:  SIZE and SIZE_AND_BREAK.  SIZE creates fields on the 
> retrieved document that store field sizes instead of actual values.  
> SIZE_AND_BREAK is especially efficient if one field comprises the bulk of the 
> document size (e.g., the body field) and can thus be used as a reasonable 
> size approximation.
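
The behaviour described above can be sketched with a toy field loader. This is not the committed patch or the FieldsReader code: SizeSelectorSketch, its Result enum (including the LOAD and NO_LOAD values), the string-valued fields, and the "two bytes per character" size rule are all assumptions made for illustration. Only the names SIZE and SIZE_AND_BREAK come from the issue text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy stored-field loader with size-only results. Not Lucene code. */
class SizeSelectorSketch {
    enum Result { LOAD, NO_LOAD, SIZE, SIZE_AND_BREAK }

    /**
     * Walks stored fields in order. Each field is loaded, skipped, or
     * replaced by its size, depending on what the selector returns.
     */
    static Map<String, String> load(Map<String, String> stored,
                                    Function<String, Result> selector) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : stored.entrySet()) {
            Result r = selector.apply(field.getKey());
            if (r == Result.LOAD) {
                out.put(field.getKey(), field.getValue());
            } else if (r == Result.SIZE || r == Result.SIZE_AND_BREAK) {
                // Assumption: size is counted as two bytes per character.
                int size = 2 * field.getValue().length();
                out.put(field.getKey(), Integer.toString(size));
                if (r == Result.SIZE_AND_BREAK) {
                    break; // stop reading any further fields
                }
            }
            // NO_LOAD: skip the field entirely.
        }
        return out;
    }
}
```

With a selector that returns SIZE_AND_BREAK for the body field and NO_LOAD otherwise, the loader reports one size and never touches fields stored after the body, which is where the efficiency claim comes from.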

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

