Re: NewIndexModifier - - - DeletingIndexWriter
: As 2.1 is soon coming, I wonder if NewIndexModifier is a proper name for
: the public API.
: (It would be the first NewXYZ and there is no OldXYZ either...)
:
: How about renaming it to something like DeletingIndexWriter?

I haven't been following the Jira issue that closely (LUCENE-565), but as I recall the name question comes up because the class was originally intended to replace IndexModifier, but its API/purpose has evolved so that it is no longer a suitable "drop in replacement" for the current IndexModifier. Is that correct?

NewIndexModifier currently subclasses IndexWriter and adds the following public methods...

    public void setMaxBufferedDeleteTerms(int maxBufferedDeleteTerms)
    public int getMaxBufferedDeleteTerms()
    public void updateDocument(Term term, Document doc)
    public void updateDocument(Term term, Document doc, Analyzer analyzer)
    public synchronized void deleteDocuments(Term term)
    public synchronized void deleteDocuments(Term[] terms)

From an API standpoint, it seems like this could easily replace the current IndexModifier (which would have the nice side effect of resolving the issue of whether the name NewIndexModifier is good enough), assuming the semantics of the classes/methods are the same -- I'm not sure if they are.

Skimming the history of LUCENE-565, it's not clear to me why this was implemented as a new class with the name NewIndexModifier ... was that just how it evolved organically?

If it's not possible to make this class replace IndexModifier, then DeletingIndexWriter or BufferedDeletingIndexWriter seem like they would be fine to me.

-Hoss

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
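To make the method list above a little more concrete, here is a minimal sketch of what "buffered deletes" and an update-by-term could mean. It is a toy model, not Lucene code: the class `ToyBufferedDeletingWriter`, its string keys standing in for `Term`, and the helpers `flushedBodies()` and `pendingDeleteTerms()` are all invented for illustration, and the real NewIndexModifier may well behave differently in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model, NOT Lucene code: a document is a (key, body) pair and a "term"
// is just the key string. All names here are hypothetical.
class ToyBufferedDeletingWriter {
    private final List<String[]> flushed = new ArrayList<>();  // stands in for on-disk segments
    private final List<String[]> ramDocs = new ArrayList<>();  // added but not yet flushed
    private final Set<String> bufferedDeleteTerms = new LinkedHashSet<>();
    private int maxBufferedDeleteTerms = 1000;                 // arbitrary default for this sketch

    void setMaxBufferedDeleteTerms(int max) {
        if (max < 1) throw new IllegalArgumentException("max must be >= 1");
        maxBufferedDeleteTerms = max;
    }

    int getMaxBufferedDeleteTerms() { return maxBufferedDeleteTerms; }

    void addDocument(String key, String body) {
        ramDocs.add(new String[] {key, body});
    }

    // Drops matching docs still in RAM right away and remembers the term so
    // that matching flushed docs are removed at the next flush.
    void deleteDocuments(String key) {
        ramDocs.removeIf(d -> d[0].equals(key));
        bufferedDeleteTerms.add(key);
        if (bufferedDeleteTerms.size() >= maxBufferedDeleteTerms) flush();
    }

    // Update is modeled as "delete by term, then add the replacement".
    void updateDocument(String key, String body) {
        deleteDocuments(key);
        addDocument(key, body);
    }

    // Applies buffered deletes to older docs first, then appends the RAM docs,
    // so a replacement added after a delete is never removed by that delete.
    void flush() {
        flushed.removeIf(d -> bufferedDeleteTerms.contains(d[0]));
        flushed.addAll(ramDocs);
        bufferedDeleteTerms.clear();
        ramDocs.clear();
    }

    List<String> flushedBodies() {
        List<String> bodies = new ArrayList<>();
        for (String[] d : flushed) bodies.add(d[1]);
        return bodies;
    }

    int pendingDeleteTerms() { return bufferedDeleteTerms.size(); }

    public static void main(String[] args) {
        ToyBufferedDeletingWriter w = new ToyBufferedDeletingWriter();
        w.addDocument("id:1", "first draft");
        w.addDocument("id:2", "other doc");
        w.flush();
        w.updateDocument("id:1", "second draft");
        System.out.println(w.flushedBodies() + " pending=" + w.pendingDeleteTerms());
        w.flush();
        System.out.println(w.flushedBodies() + " pending=" + w.pendingDeleteTerms());
    }
}
```

In this sketch the old version of a document stays in the flushed data until the buffered delete is applied, which is one plausible reading of why a "max buffered delete terms" knob exists at all.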
Re: NewIndexModifier - - - DeletingIndexWriter
Maybe IndexMaintainer or IndexUpdater?

On Feb 8, 2007, at 2:59 AM, Chris Hostetter wrote:

> : How about renaming it to something like DeletingIndexWriter?
>
> if it's not possible to make this class replace IndexModifier, then
> DeletingIndexWriter or BufferedDeletingIndexWriter seem like they
> would be fine to me.
new issue
I want the Document objects from the search results in the form of a result set, so that I can use them in my application. Please suggest how to do this.
Re: new issue
Ask your question on the user list.

On Feb 8, 2007, at 3:18 AM, Gaurav Srivastava wrote:

> i want the document object of the result of the search in form of
> resultset so that i could use it in my application please suggest
Re: NewIndexModifier - - - DeletingIndexWriter
I like the name BufferedDeletingIndexWriter best so far.

Chris Hostetter wrote:

> from an API standpoint, it seems like this could easily replace the
> current IndexModifier (which would have the nice side effect of
> resolving the issue of whether the name NewIndexModifier is good
> enough) assuming the semantics of the classes/methods are the same
> -- i'm not sure if they are.

The one method missing vs IndexModifier, which prevents this from being a drop-in replacement, is deleteDocument(int docNum). This specific issue was discussed here:

    https://issues.apache.org/jira/browse/LUCENE-565#action_12430130

and I would tend to agree with that logic (exposing a deleteDocument(int docNum) is dangerous), meaning we can't make this a drop-in replacement for the current IndexModifier. Maybe we could deprecate the existing IndexModifier?

> skimming the history of LUCENE-565 it's not clear to me why this was
> implemented as a new class with the name NewIndexModifier ... was
> that just how it evolved organically?

Well, the patch started life as direct improvements to IndexWriter. But people were concerned w/ that approach and suggested enabling sub-classing of IndexWriter instead, which led to the current NewIndexModifier class.

Long-ish term I think we should aim for one reader class (IndexReader) that you use to do read-only things and one writer class (NewIndexModifier being closest to this now) to make changes (adds, deletes, optimize, etc.) to an index.

Mike
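The message above only links to the Jira comment for why deleteDocument(int docNum) is considered dangerous on a writer. One commonly cited reason, stated here as an assumption rather than a summary of that comment, is that document numbers are positions that can shift when a writer merges segments and squeezes out deleted documents. The toy below (plain lists, hypothetical names, no Lucene classes) shows a remembered number ending up on a different document:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration, NOT Lucene code: a doc number is just a list position,
// and a null entry marks a deleted document.
class ToyDocNumbers {
    // Models a merge that drops deleted slots, renumbering the survivors.
    static List<String> mergeAwayDeletes(List<String> segment) {
        List<String> merged = new ArrayList<>();
        for (String doc : segment) {
            if (doc != null) merged.add(doc);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
        int staleNum = index.indexOf("c");   // the application remembers this number
        index.set(1, null);                  // "b" gets deleted
        List<String> merged = mergeAwayDeletes(index);

        System.out.println("number of c before merge: " + staleNum);
        System.out.println("number of c after merge:  " + merged.indexOf("c"));
        System.out.println("stale number now points at: " + merged.get(staleNum));
    }
}
```

A reader that holds a fixed view of the index does not have this problem for as long as that view is open, which fits the idea of keeping delete-by-number on the reader side.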
Re: NewIndexModifier - - - DeletingIndexWriter
It's a temporary name, no? In the end we probably want to keep the _name_ IndexWriter, so why not just call it IndexWriter2, and when we are happy with it, we make it the new IndexWriter and deprecate IW2?

Otis

----- Original Message ----
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Thursday, February 8, 2007 4:08:31 AM
Subject: Re: NewIndexModifier - - - DeletingIndexWriter

Maybe IndexMaintainer or IndexUpdater ?
Re: NewIndexModifier - - - DeletingIndexWriter
Otis Gospodnetic wrote:

> It's a temporary name, no? In the end we probably want to keep the
> _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
> happy with it, we make it be the new IndexWriter and we deprecate IW2.

For a temporary solution it seems good. But do you also mean releasing 2.1 with IW2? If so, we need to javadoc very clearly that this is very probably a temporary class.

Otherwise -

Chris Hostetter wrote:

> if it's not possible to make this class replace IndexModifier, then
> DeletingIndexWriter or BufferedDeletingIndexWriter seem like they would be
> fine to me.

I prefer shorter names (when they are clear enough). BufferedDeletingIndexWriter seems quite long. Since IndexWriter also buffers added documents, it seems it is mostly the deletion that distinguishes the two. So my preference is DeletingIndexWriter.

Also,

Michael McCandless wrote:

> Long-ish term I think we should aim for one reader class (IndexReader)
> that you use to do read-only things and one writer class
> (NewIndexModifier being closest to this now) to make changes (adds,
> deletes, optimize, etc.) to an index.

This sounds great. But at least one use case may no longer be possible this way: there are probably applications 'out there' deleting documents with this logic: search the index, examine the returned docs (post-processing them using some app-specific logic not well encapsulated in the index), select a few, and delete them by id.
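The "search, examine, select, delete by id" pattern described above can be sketched as follows. This is a toy stand-in, not Lucene code: `ToyReader` and `DeleteByIdDemo` are invented for illustration, and only the method names `document`, `deleteDocument` and `numDocs` are chosen to echo IndexReader. The application rule is passed in as a predicate because, by assumption, it cannot be expressed as a query term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy reader, NOT Lucene code: doc numbers are list positions and stay fixed
// for the lifetime of this object.
class ToyReader {
    private final List<String> docs;
    private final boolean[] deleted;

    ToyReader(List<String> docs) {
        this.docs = new ArrayList<>(docs);
        this.deleted = new boolean[docs.size()];
    }

    // Returns the numbers of live documents containing the word.
    List<Integer> search(String word) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted[i] && docs.get(i).contains(word)) hits.add(i);
        }
        return hits;
    }

    String document(int docNum) { return docs.get(docNum); }

    void deleteDocument(int docNum) { deleted[docNum] = true; }

    int numDocs() {
        int live = 0;
        for (boolean d : deleted) if (!d) live++;
        return live;
    }
}

class DeleteByIdDemo {
    // Search, let application logic inspect each hit, delete the chosen ones
    // by number. Returns how many documents were deleted.
    static int deleteSelected(ToyReader reader, String word, Predicate<String> appRule) {
        int count = 0;
        for (int docNum : reader.search(word)) {
            if (appRule.test(reader.document(docNum))) {
                reader.deleteDocument(docNum);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        ToyReader reader = new ToyReader(Arrays.asList(
                "spam offer", "real offer from a colleague", "weekly report"));
        int removed = deleteSelected(reader, "offer", doc -> doc.startsWith("spam"));
        System.out.println("removed=" + removed + " live=" + reader.numDocs());
    }
}
```

The point of the sketch is that the selection step needs a document number to act on, which a delete-by-term API alone does not offer.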
Re: NewIndexModifier - - - DeletingIndexWriter
: It's a temporary name, no? In the end we probably want to keep the
: _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
: happy with it, we make it be the new IndexWriter and we deprecate IW2.

Um... actually that's a really good point, this is first and foremost an extension of IndexWriter ... is there any reason not to rename "NewIndexModifier" as "IndexWriter" (refactoring the existing IndexWriter code into it, or renaming the current IndexWriter to "OldIndexWriter" or "NonDeletingIndexWriter")?

The only reason I can think of not to do this would be if we are worried about people who currently subclass IndexWriter getting a change in behavior if we change the IndexWriter out from under them ... is this a significant concern? NewIndexModifier doesn't seem to change any of the semantics of the IndexWriter methods it extends.

-Hoss
Re: NewIndexModifier - - - DeletingIndexWriter
Chris Hostetter wrote:

> : It's a temporary name, no? In the end we probably want to keep the
> : _name_ IndexWriter, so why not just call it IndexWriter2 and when we are
> : happy with it, we make it be the new IndexWriter and we deprecate IW2.
>
> Um... actually that's a really good point, this is first and foremost an
> extension of IndexWriter ... is there any reason not to rename
> "NewIndexModifier" as "IndexWriter" (refactoring the existing IndexWriter
> code into it, or renaming the current IndexWriter to "OldIndexWriter",
> or "NonDeletingIndexWriter")
>
> the only reason i can think of not to do this would be if we are worried
> about people who currently subclass IndexWriter getting a change in
> behavior if we change the IndexWriter out from under them ... is this a
> significant concern? NewIndexModifier doesn't seem to change any of the
> semantics of the IndexWriter methods it extends.

+1

I think the new methods in NewIndexModifier are low-risk to the existing IndexWriter, so we should just add them into IndexWriter and not create a new class? Then we don't have a naming problem anymore :)

Mike
Re: NewIndexModifier - - - DeletingIndexWriter
Doron Cohen wrote:

> Michael McCandless wrote:
>
>> Long-ish term I think we should aim for one reader class (IndexReader)
>> that you use to do read-only things and one writer class
>> (NewIndexModifier being closest to this now) to make changes (adds,
>> deletes, optimize, etc.) to an index.
>
> This sounds great. But at least one use case may no longer be possible
> this way: there are probably applications 'out there' deleting documents
> in this logic: search the index, examine returned docs - post-processing
> them using some app-specific logic not well encapsulated in the index,
> select a few, delete them by id.

This is a very good point. I think we must keep the deleteDocument* methods in IndexReader around to continue to support this use case.

Mike
Re: NewIndexModifier - - - DeletingIndexWriter
On 2/8/07, Michael McCandless <[EMAIL PROTECTED]> wrote:

> I think the new methods in NewIndexModifier are low-risk to the
> existing IndexWriter, so, we should just add them into IndexWriter and
> not create a new class? Then we don't have a naming problem anymore :)

The original versions of the patches have been removed, but I was originally concerned about overhead to the IndexWriter for people who didn't use the delete functionality (opening readers, keeping track of the segment number for adds, etc).

Also, I think the extension points are important since NewIndexModifier does not (and probably never will be able to) do everything people need. There can be very complex ways of identifying documents to be deleted. It's also nice knowing when documents in RAM will be flushed to disk... one reason being that it correlates with document visibility if you open a new reader.

-Yonik
Re: NewIndexModifier - - - DeletingIndexWriter
: The original versions of the patches have been removed, but I was
: originally concerned about overhead to the IndexWriter for people who
: didn't use the delete functionality (opening readers, keeping track
: of the segment number for adds, etc).

Do you still have those concerns with the version of NewIndexModifier on the trunk? ... Skimming it, I don't see any reason why it would add overhead for people using only the existing IndexWriter methods.

: Also, I think the extension points are important since
: NewIndexModifier does not (and probably never will be able to) do
: everything people need. There can be very complex ways of identifying
: documents to be deleted. It's also nice knowing when documents in ram
: will be flushed to disk... one reason being that it correlates with
: document visibility if you open a new reader.

Which extension points are you referring to? ... The existing protected methods in IndexWriter and NewIndexModifier can of course be left as they are for other subclasses to override -- I'm just wondering if there is any benefit in IndexWriter and NewIndexModifier being separate classes.

-Hoss
Re: NewIndexModifier - - - DeletingIndexWriter
Yonik Seeley wrote:

> Also, I think the extension points are important since
> NewIndexModifier does not (and probably never will be able to) do
> everything people need.

I agree extension points are nice. Maybe we could leave the extension points ("doAfterFlushRamSegments", etc.) but merge NewIndexModifier into IndexWriter?

Though I do worry that by adding these extension points we tie our hands for later. E.g., if we at some point improve IndexWriter to be more like KinoSearch or Ferret, which have drastically different ways to use RAM to buffer documents, then we will break people who rely on these extension points? I guess we could also just make a whole new writer class at that point. It just feels like it may be too early to expose the implementation details of how the current writer buffers things in RAM and builds/flushes a segment on commit.

> There can be very complex ways of identifying documents to be
> deleted.

Agreed. I think we must continue to support deleting from an IndexReader for these advanced use cases.

> It's also nice knowing when documents in ram will be flushed to
> disk... one reason being that it correlates with document visibility
> if you open a new reader.

Do you mean this is a good use case for the extension point "doAfterFlushRamSegments"? I agree. Though with LUCENE-710 (adding "commit on close" to IndexWriter) you then have complete control over visibility to readers.

I also think it's now confusing to users which class (IndexModifier, NewIndexModifier, IndexWriter) to use to write to an index. I would prefer a single IndexWriter class now because this is closer to our eventual goal of "use IndexWriter to make changes; use IndexReader to search/read".

Mike
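As a rough picture of the kind of extension point being debated, here is a minimal template-method sketch. It is a toy, not the actual IndexWriter: `ToyWriter`, `ReopeningWriter` and the hook name `doAfterFlush` are hypothetical, loosely modeled on the "doAfterFlushRamSegments" hook named in the thread, whose real signature is not given here.

```java
import java.util.ArrayList;
import java.util.List;

// Toy writer, NOT Lucene code: buffers docs in RAM and "flushes" them to a
// second list once a threshold is reached.
class ToyWriter {
    private final List<String> ramDocs = new ArrayList<>();
    private final List<String> diskDocs = new ArrayList<>();
    private final int maxBufferedDocs;

    ToyWriter(int maxBufferedDocs) { this.maxBufferedDocs = maxBufferedDocs; }

    void addDocument(String doc) {
        ramDocs.add(doc);
        if (ramDocs.size() >= maxBufferedDocs) flush();
    }

    // The flush algorithm is fixed; subclasses can only react after it ran.
    final void flush() {
        if (ramDocs.isEmpty()) return;
        diskDocs.addAll(ramDocs);
        ramDocs.clear();
        doAfterFlush();
    }

    // Hypothetical extension point: called after buffered docs reach "disk".
    protected void doAfterFlush() { }

    int docsOnDisk() { return diskDocs.size(); }
}

// A subclass that records how many docs a freshly opened reader would see
// after each flush.
class ReopeningWriter extends ToyWriter {
    final List<Integer> visibleCounts = new ArrayList<>();

    ReopeningWriter(int maxBufferedDocs) { super(maxBufferedDocs); }

    @Override
    protected void doAfterFlush() { visibleCounts.add(docsOnDisk()); }

    public static void main(String[] args) {
        ReopeningWriter w = new ReopeningWriter(2);
        w.addDocument("a");
        w.addDocument("b");   // threshold reached, flush runs, hook fires
        w.addDocument("c");
        w.flush();            // explicit flush, hook fires again
        System.out.println(w.visibleCounts);
    }
}
```

The sketch also hints at the worry raised above: once subclasses depend on a hook that fires "after the RAM buffer is flushed", the writer is committed to having a RAM buffer that is flushed in that way.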
[jira] Resolved: (LUCENE-762) [PATCH] Efficiently retrieve sizes of field values
[ https://issues.apache.org/jira/browse/LUCENE-762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-762.

    Resolution: Fixed

I have committed the original patch. All tests pass.

In the end, I could not find a way I felt comfortable with for getting rid of the if-then-else clause in FieldsReader. I did add a TODO item there to remind us to go back and take a look at it again later. Since the if clauses are ordered according to their most common usages (I think), I don't think there will be much of a performance issue w/ the current approach.

> [PATCH] Efficiently retrieve sizes of field values
> --
>
>                 Key: LUCENE-762
>                 URL: https://issues.apache.org/jira/browse/LUCENE-762
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Store
>    Affects Versions: 2.1
>            Reporter: Chuck Williams
>         Assigned To: Grant Ingersoll
>            Priority: Minor
>         Attachments: SizeFieldSelector.patch
>
>
> Sometimes an application would like to know how large a document is before
> retrieving it. This can be important for memory management or choosing
> between algorithms, especially in cases where documents might be very large.
> This patch extends the existing FieldSelector mechanism with two new
> FieldSelectorResults: SIZE and SIZE_AND_BREAK. SIZE creates fields on the
> retrieved document that store field sizes instead of actual values.
> SIZE_AND_BREAK is especially efficient if one field comprises the bulk of the
> document size (e.g., the body field) and can thus be used as a reasonable
> size approximation.
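To illustrate the SIZE and SIZE_AND_BREAK idea described in the issue, here is a small self-contained sketch. It is a toy model, not the committed patch: `ToyFieldSelection`, its `Result` enum, the `load` helper and `demoSelector` are all hypothetical, stored fields are modeled as an ordered map of strings, and "size" is simply the character count, which is an assumption about units made only for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model, NOT the Lucene FieldSelector API.
class ToyFieldSelection {
    enum Result { LOAD, NO_LOAD, SIZE, SIZE_AND_BREAK }

    // Walks the stored fields in order and asks the selector what to do with
    // each one. SIZE records the length instead of the value; SIZE_AND_BREAK
    // does the same and then stops reading any further fields.
    static Map<String, Object> load(Map<String, String> stored,
                                    Function<String, Result> selector) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : stored.entrySet()) {
            switch (selector.apply(field.getKey())) {
                case LOAD:
                    doc.put(field.getKey(), field.getValue());
                    break;
                case SIZE:
                    doc.put(field.getKey(), field.getValue().length());
                    break;
                case SIZE_AND_BREAK:
                    doc.put(field.getKey(), field.getValue().length());
                    return doc;
                default:
                    break; // NO_LOAD: skip the field entirely
            }
        }
        return doc;
    }

    // Example policy: load the author, measure the title, measure the body
    // and stop there because the body dominates the document size.
    static Result demoSelector(String fieldName) {
        if (fieldName.equals("body")) return Result.SIZE_AND_BREAK;
        if (fieldName.equals("title")) return Result.SIZE;
        return Result.LOAD;
    }

    public static void main(String[] args) {
        Map<String, String> stored = new LinkedHashMap<>();
        stored.put("author", "Ann");
        stored.put("title", "Hello");
        stored.put("body", "0123456789");
        stored.put("footer", "x");
        System.out.println(load(stored, ToyFieldSelection::demoSelector));
    }
}
```

The chain of cases in `load` is a loose analogue of the if-then-else clause in FieldsReader that the resolution comment mentions; the sketch makes no claim about how that code is actually ordered.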