Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 10:06 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir wrote:
>> Thanks Benson: looks like the problem revolves around indexing
>> Document/Fields you get back from IR.document... this has always been
>> 'lossy', but I think this is a real API trap.
>>
>> Please keep testing :)
>
> Got a suggestion for sneaking around this in the meantime?

I just put a comment on the issue: you have to build a new Document rather than re-index a Document loaded from IR.document.

Mike McCandless
http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
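The workaround above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyField`, `load` and `rebuild` are hypothetical stand-ins, not Lucene classes, and the tokenizer is a crude split on non-alphanumerics. It assumes, as the thread suggests, that a field loaded back from the stored data no longer carries its original "not tokenized" type.

```java
import java.util.ArrayList;
import java.util.List;

public class LossyRoundTrip {
    /** Hypothetical stand-in for a field: name, value and a "tokenized" flag. */
    public static final class ToyField {
        public final String name;
        public final String value;
        public final boolean tokenized;

        public ToyField(String name, String value, boolean tokenized) {
            this.name = name;
            this.value = value;
            this.tokenized = tokenized;
        }
    }

    /** Toy "load from the index": only name and value survive, the type falls back to tokenized. */
    public static ToyField load(ToyField original) {
        return new ToyField(original.name, original.value, true);
    }

    /** The workaround: copy the value into a fresh field with an explicit type. */
    public static ToyField rebuild(ToyField loaded) {
        return new ToyField(loaded.name, loaded.value, false);
    }

    /** Terms that would be indexed: the whole value if untokenized, otherwise the split pieces. */
    public static List<String> indexTerms(ToyField field) {
        List<String> terms = new ArrayList<>();
        if (!field.tokenized) {
            terms.add(field.value);
            return terms;
        }
        for (String piece : field.value.split("[^A-Za-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        ToyField original = new ToyField("field1", "value-1", false);
        ToyField loaded = load(original);
        System.out.println(indexTerms(loaded));          // prints [value, 1]
        System.out.println(indexTerms(rebuild(loaded))); // prints [value-1]
    }
}
```

In this model the re-indexed loaded field no longer produces the single term "value-1", which is why an exact-term query stops matching after the delete/add, while the rebuilt field behaves like the original.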
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 10:04 AM, Robert Muir wrote:
> Thanks Benson: looks like the problem revolves around indexing
> Document/Fields you get back from IR.document... this has always been
> 'lossy', but I think this is a real API trap.
>
> Please keep testing :)

Got a suggestion for sneaking around this in the meantime?
Re: Problem with updating a document or TermQuery with current trunk
Thanks Benson: looks like the problem revolves around indexing Document/Fields you get back from IR.document... this has always been 'lossy', but I think this is a real API trap.

Please keep testing :)

On Tue, Mar 6, 2012 at 9:58 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler wrote:
>> String field is analyzed, but with KeywordTokenizer, so all should be fine.
>
> I filed LUCENE-3854.

--
lucidimagination.com
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 9:47 AM, Uwe Schindler wrote:
> String field is analyzed, but with KeywordTokenizer, so all should be fine.

I filed LUCENE-3854.
RE: Problem with updating a document or TermQuery with current trunk
String field is analyzed, but with KeywordTokenizer, so all should be fine.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, March 06, 2012 3:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: Problem with updating a document or TermQuery with current trunk
>
> Hmm something is up here... I'll dig. Seems like we are somehow analyzing
> StringField when we shouldn't...
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies wrote:
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
> thanks Benson, you are right!

So, should I attach this to a JIRA?
Re: Problem with updating a document or TermQuery with current trunk
Hmm something is up here... I'll dig. Seems like we are somehow analyzing StringField when we shouldn't...

Mike McCandless
http://blog.mikemccandless.com

On Tue, Mar 6, 2012 at 9:33 AM, Robert Muir wrote:
> On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies wrote:
>> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
> thanks Benson, you are right!
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies wrote:
> On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir wrote:
>> I think the issue is that your analyzer is standardanalyzer, yet field
>> text value is "value-1"
>
> Robert,
>
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.

thanks Benson, you are right!

--
lucidimagination.com
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 9:23 AM, Benson Margulies wrote:
> Why is this field analyzed at all? It's built with StringField.TYPE_STORED.
>
> I'll push another copy that shows that it works fine when the doc is
> first added, and gets bad after the 'update', when the field acquires
> the 'tokenized' boolean mysteriously.

I pushed a new copy that runs the query successfully before the 'delete/add' sequence, and then fails afterwards.
Re: Problem with updating a document or TermQuery with current trunk
On Tue, Mar 6, 2012 at 9:20 AM, Robert Muir wrote:
> I think the issue is that your analyzer is standardanalyzer, yet field
> text value is "value-1"

Robert,

Why is this field analyzed at all? It's built with StringField.TYPE_STORED.

I'll push another copy that shows that it works fine when the doc is first added, and gets bad after the 'update', when the field acquires the 'tokenized' boolean mysteriously.

--benson
Re: Problem with updating a document or TermQuery with current trunk
I think the issue is that your analyzer is StandardAnalyzer, yet the field text value is "value-1".

So StandardAnalyzer will tokenize this into two terms: "value" and "1".

But later, you proceed to do TermQueries on "value-1". This term won't exist... TermQuery etc. that take a Term don't analyze any text.

Instead, usually higher-level things like QueryParsers analyze text into Terms.

On Tue, Mar 6, 2012 at 8:35 AM, Benson Margulies wrote:
> To be a bit more specific, the doc has a field "field1" which is a
> StringField.TYPE_STORED, and it is a query on that field which comes
> up empty.

--
lucidimagination.com
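The distinction Robert describes, an exact term lookup versus a query whose text is analyzed first, can be sketched with a toy inverted index. Everything here is an assumption for illustration: `analyze` is a crude lowercase-and-split stand-in rather than StandardAnalyzer, and `termQuery` and `parsedQuery` only play the roles that TermQuery and a query parser play in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class TermVsParsedQuery {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics. */
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    /** Index a document through the analyzer, as happens for a tokenized field. */
    public void addAnalyzed(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    /** Exact lookup with no analysis of the query text. */
    public Set<Integer> termQuery(String term) {
        return new HashSet<>(postings.getOrDefault(term, new HashSet<>()));
    }

    /** Analyze the query text first, then require every resulting term to match. */
    public Set<Integer> parsedQuery(String text) {
        Set<Integer> result = null;
        for (String term : analyze(text)) {
            Set<Integer> hits = termQuery(term);
            if (result == null) {
                result = hits;
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        TermVsParsedQuery index = new TermVsParsedQuery();
        index.addAnalyzed(0, "value-1");
        System.out.println(index.termQuery("value-1"));   // prints []
        System.out.println(index.parsedQuery("value-1")); // prints [0]
    }
}
```

The exact lookup misses because the index only holds "value" and "1". As the rest of the thread shows, the real surprise was that the field was being tokenized at all.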
Problem with updating a document or TermQuery with current trunk
I've posted a self-contained test case to github of a mystery.

git://github.com/bimargulies/lucene-4-update-case.git

The code can be seen at
https://github.com/bimargulies/lucene-4-update-case/blob/master/src/test/java/org/apache/lucene/BadFieldTokenizedFlagTest.java

I write a doc to an index, close the index, then reopen and do a delete/add on the doc to add a field. If I iterate the docs in the index, all looks well, but when I try to query for the doc, it isn't found.

To be a bit more specific, the doc has a field "field1" which is a StringField.TYPE_STORED, and it is a query on that field which comes up empty.

I expect to learn that I've missed something obvious, and I offer thanks and apologies in advance.
Re: Updating a document.
Document ids are subject to change. Each segment's document ids start from zero, and after a merge the document ids will also change.

On Mon, Mar 5, 2012 at 12:31 AM, Benson Margulies wrote:
> So I can call remove and add, but, then, what's the document's number
> after that? Or is that not a meaningful question until I make a new
> reader?
Re: Updating a document.
If you want to identify a document, you should use a field such as url as a unique key, like the unique key in Solr.

On Mon, Mar 5, 2012 at 12:31 AM, Benson Margulies wrote:
> I am walking down the document in an index by number, and I find that
> I want to update one. The updateDocument API only works on queries and
> terms, not numbers.
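The two replies above (document numbers are renumbered by merges, so identify documents by a unique key field) can be illustrated with a toy model. This is a sketch only: a "segment" here is just a list of hypothetical unique keys, `null` marks a deleted document, and `merge` is an invented helper, not how Lucene merges segments internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DocIdsAfterMerge {
    /** Toy merge: concatenate the segments, drop deleted entries (null), renumber from zero. */
    public static List<String> merge(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String key : segment) {
                if (key != null) {
                    merged.add(key);
                }
            }
        }
        return merged;
    }

    /** In this model a document's number is simply its position in the list. */
    public static int docNumberOf(List<String> segment, String key) {
        return segment.indexOf(key);
    }

    public static void main(String[] args) {
        List<String> seg0 = Arrays.asList("url-a", null, "url-c"); // document 1 was deleted
        List<String> seg1 = Arrays.asList("url-d");
        List<String> merged = merge(Arrays.asList(seg0, seg1));

        System.out.println(docNumberOf(seg0, "url-c"));   // prints 2
        System.out.println(docNumberOf(merged, "url-c")); // prints 1
        System.out.println(docNumberOf(merged, "url-d")); // prints 2
    }
}
```

The key "url-c" finds the same document before and after the merge, while its number moves from 2 to 1. That is the reason an update is addressed by a term on a key field rather than by number.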
Updating a document.
I am walking down the document in an index by number, and I find that I want to update one. The updateDocument API only works on queries and terms, not numbers.

So I can call remove and add, but, then, what's the document's number after that? Or is that not a meaningful question until I make a new reader?
Re: Updating a document
Well, taking the code all together, what I expect is that you'll have a document after all is done that only has a "DocId" in it. Nowhere do you fetch the document from the index.

What is your evidence that you haven't deleted the document? If you haven't reopened your reader after the above, you'll see the old view of the index, so be sure you close/open your searcher afterwards.

Ian's suggestion is the best I think, since the code fragments you've provided don't tell the complete story. If there's an error in contract.getDocID(), we can't see it to help.

It might help to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

On Fri, Jun 10, 2011 at 7:34 AM, Ian Lea wrote:
> In different code samples you've got both DocId and DocID. If that
> isn't the problem I suggest you post a complete little program that
> demonstrates the problem. As small as possible, no external
> dependencies.
Re: Updating a document
In different code samples you've got both DocId and DocID. If that isn't the problem I suggest you post a complete little program that demonstrates the problem. As small as possible, no external dependencies.

--
Ian.

On Fri, Jun 10, 2011 at 12:24 PM, Pranav goyal wrote:
> String q1 = contract.getDocId(); // Here I am getting my DocId
> Term term = new Term("DocID",contract.getDocId());
Re: Updating a document
Hi Danny,

I have explained it above.

It has many fields, out of which DocId is the field which I am storing as well as indexing, while the other fields I am just storing. And each document has a unique DocId.

d = new Document();
File indexDir = new File("./index-dir");
StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_31, analyzer);
try {
    writer = new IndexWriter(FSDirectory.open(indexDir), conf);
} catch (IOException e1) {
    e1.printStackTrace();
}
String q1 = contract.getDocId(); // Here I am getting my DocId
Term term = new Term("DocID", contract.getDocId());

The rest I have stated above.

On Fri, Jun 10, 2011 at 4:44 PM, Danny Lade wrote:
> Please explain:
> How looks your document BEFORE you try to delete it? (Which fields has it?)

--
I'm very responsible, when ever something goes wrong they always say I'm responsible --
Re: Updating a document
You delete it first using your id:

> writer.deleteDocuments(term);

and then re-add it with the same id:

> writer.addDocument(d);

Please explain:
What does your document look like BEFORE you try to delete it? (Which fields does it have?)

Greetings
Danny
Re: Updating a document
When I use deleteAll() instead of deleteDocuments(), it works fine. What can be the problem? I am still not able to figure it out.

On Fri, Jun 10, 2011 at 3:50 PM, Pranav goyal wrote:
> Hi Ian,
>
> Thanks for your reply. But even this isn't working.
> My document is not getting deleted.
Re: Updating a document
Hi Ian,

Thanks for your reply. But even this isn't working. My document is not getting deleted.

Can you please suggest something else?

On Fri, Jun 10, 2011 at 3:21 PM, Ian Lea wrote:
> Try Term term = new Term("DocId", contract.getDocId());. See the
> javadocs for the difference between that and what you have.
>
> You don't need to call optimize() all the time, if at all.
Re: Updating a document
Try Term term = new Term("DocId", contract.getDocId());. See the javadocs for the difference between that and what you have.

You don't need to call optimize() all the time, if at all.

--
Ian.

On Fri, Jun 10, 2011 at 9:24 AM, Pranav goyal wrote:
> String q1 = contract.getDocId();
> Term term = new Term(contract.getDocId()); // where DocId is my field
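Why the two-argument form matters can be shown with a toy delete-by-term. This is a sketch under assumptions: `ToyTerm` and `delete` are hypothetical stand-ins, not Lucene's classes, and the first call below models my reading of the one-argument constructor (the argument becomes the field name and the text is empty), which is worth checking against the javadocs Ian points to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class DeleteByTerm {
    /** Hypothetical stand-in for a term: a field name plus the exact text to match. */
    public static final class ToyTerm {
        public final String field;
        public final String text;

        public ToyTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /** Removes every document whose field holds exactly the term's text; returns the count. */
    public static int delete(List<Map<String, String>> docs, ToyTerm term) {
        int removed = 0;
        for (Iterator<Map<String, String>> it = docs.iterator(); it.hasNext();) {
            if (term.text.equals(it.next().get(term.field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** One sample document with DocId = 42 (made-up data). */
    public static List<Map<String, String>> sampleDocs() {
        Map<String, String> doc = new HashMap<>();
        doc.put("DocId", "42");
        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc);
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        System.out.println(delete(docs, new ToyTerm("42", "")));      // prints 0: value used as field name
        System.out.println(delete(docs, new ToyTerm("DocID", "42"))); // prints 0: field name differs in case
        System.out.println(delete(docs, new ToyTerm("DocId", "42"))); // prints 1
    }
}
```

Both failing calls are silent: nothing matches, nothing is deleted, and the re-added document then sits next to the old one, which matches the symptom described in the thread.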
Updating a document
Hi,

I tried 3-4 ways to delete a document but still no results. I am using Lucene 3.1.

I used writer.updateDocument(Term term, Document d), as well as writer.addDocument(d); and after that writer.deleteDocuments(d);

Using both I am not able to delete the previous document.

Is there any problem in my code?

String q1 = contract.getDocId();
Term term = new Term(contract.getDocId()); // where DocId is my field
try {
    writer.deleteDocuments(term);
    System.out.println("Deleting Document with the term " + term);
} catch (IOException e) {
    e.printStackTrace(); // To change body of catch statement use File | Settings | File Templates.
}
d.add(new Field("DocId", q1, Field.Store.YES, Field.Index.NOT_ANALYZED));
writer.addDocument(d);
writer.optimize();
writer.close();

The result is the same when I use writer.updateDocument(term, d).
RE: problem updating a document: no segments file?
Just for the archives, in case anyone else runs into this.

My Lucene implementations index to a different directory, which lets the searcher keep working over the previous index while the indexer builds the new one. At the end of the build, the indexing code tells the searcher to use the new directory and then proceeds to delete any other directory it finds in that area, i.e. any previous indexes.

This worked fine on OS X on the G4 and G5 I was using. On the server, running Windows Server 2003, it didn't. The delete code I pasted was bad: it was deleting everything. However, on the non-MS-server OSes the delete couldn't get rid of the active files, whereas on the Windows server OS it was able to delete "segments" and none of the others.

So the difference lies in how the various OSes treat locked, busy, and inactive files, combined with what Lucene is doing with the segments file (and the other files, really) at that moment. And all of this was brought on by dumb code that allowed the attempted deletion of the current index in the first place. It only made it to the staging server because the other OSes didn't have this problem. So, live and learn!

-----Original Message-----
From: John Powers [mailto:[EMAIL PROTECTED]]
Sent: Sat 1/28/2006 9:13 PM
To: java-user@lucene.apache.org
Subject: RE: problem updating a document: no segments file?

[snip]
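The failure mode described in this thread can be guarded against in the cleanup step itself. What follows is a minimal sketch in plain Java, assuming the layout from the thread (one subdirectory per index build under a common root); OldIndexCleaner and deleteAllExcept are hypothetical names, not part of Lucene. It compares normalized paths rather than raw strings, and it fails loudly when a delete does not succeed instead of ignoring the return value of delete().

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: removes stale index directories but never the live one. */
final class OldIndexCleaner {
    private OldIndexCleaner() {}

    /**
     * Deletes every subdirectory of indexRoot except liveIndex.
     * Returns the directories that were removed.
     */
    static List<File> deleteAllExcept(File indexRoot, File liveIndex) {
        // Normalize so that doubled separators or ".." segments cannot defeat the comparison.
        Path live = liveIndex.toPath().toAbsolutePath().normalize();

        File[] children = indexRoot.listFiles();
        if (children == null) {
            throw new IllegalStateException("Cannot list " + indexRoot);
        }

        List<File> removed = new ArrayList<>();
        for (File child : children) {
            if (!child.isDirectory()) {
                continue; // only whole index directories are candidates
            }
            if (child.toPath().toAbsolutePath().normalize().equals(live)) {
                continue; // never touch the index the searcher is using
            }
            deleteRecursively(child);
            removed.add(child);
        }
        return removed;
    }

    /** Deletes a file or directory tree, throwing as soon as any single delete fails. */
    private static void deleteRecursively(File target) {
        File[] children = target.listFiles(); // null when target is a plain file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        if (!target.delete()) {
            throw new IllegalStateException("Could not delete " + target + " (still open or locked?)");
        }
    }
}
```

A call such as OldIndexCleaner.deleteAllExcept(new File(getIndexLocation()), new File(newPath)) would take the place of the nested delete loops. On Windows a file that is still open generally cannot be deleted, so closing the old searcher before the cleanup is probably necessary as well.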
RE: problem updating a document: no segments file?
I feel confident in the delete sequence. I will run the things you ask for, though. This does work on my laptop.

The code that changed was an update method that was used in the first release. Before, the only writes needed were done by this method, and it replaces the index wholesale, whereas the new feature updates documents. It worked fine in testing, and when I look at the product of this method in other environments, they all have segments files and update fine.

I do appreciate your response, though. At least I have things to do to move this forward now. I will print out all the paths that get touched by this delete block, and the result of each delete(). It's good to get rid of possibilities. Barring this being the problem, what would cause Lucene to delete its segments file?

--JN

: : if (!newPath.equals(subDirs[i].getPath())) {

That part keeps it out of the new path.

No, the boxes have different Windows operating systems. Probably a slight difference in JVM.

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Chris Hostetter
Sent: Sat 1/28/2006 2:37 AM
To: java-user@lucene.apache.org
Subject: RE: problem updating a document: no segments file?

[snip]
RE: problem updating a document: no segments file?
: This code works on a couple of other boxes as is. That deleting code

Are those boxes running the same OS? The same JVM?

: removes the active index after this one builds in a different location.
: Then the searcher is told to make this newest one the current one, and the
: old one is deleted. It affects directories and their entire contents.
: It wouldn't select just a segments file. Also, like I said, this runs

I'm not convinced. If your getIndexLocation() method is returning a string from a config file (or something like that) and it includes a trailing separator (in addition to the separator your code adds), then the index directory would be created fine (if I remember right, extra separators don't generate an exception), but your string equality test would fail and you'd try to delete the files in the directory you just created.

If your searcher has already opened some of the files, then (depending on your OS) the delete code may not be able to delete them -- you're not checking the return value from subFiles[j].delete(), so you have no way of knowing. The segments file may be the only file getting deleted, because it may be the only file your searcher doesn't have open at the moment the delete code runs. If I remember right, IndexSearcher only opens the segments file to get a list of all the individual segments and then immediately closes it, but keeps the rest of the files open permanently.

: fine on my laptop and G5. And actually, this code was fine until
: recently on that box. I modified a different method and updated the
: code on this server, and now I have this problem.

What was the code that changed?

: : String newPath = getIndexLocation() + File.separator +
: :     System.currentTimeMillis();
: :
: : IndexWriter writer = new IndexWriter(new File(newPath),
: :     analyzer, true);
: : ...
: : writer.optimize();
: : writer.close();
: :
: : SearchSO.setSearcher(newPath);
: :
: : File[] subFiles;
: : File[] subDirs = new File(getIndexLocation()).listFiles();
: :
: : for (int i=0;i [...]
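The trailing-separator theory is easy to check in isolation. Here is a small sketch that assumes nothing beyond java.io.File; PathCompareDemo and its method names are made up for illustration. java.io.File collapses doubled separators when it is constructed, so the paths handed back by listFiles() are normalized, while a string built by concatenation is not.

```java
import java.io.File;

/** Hypothetical demo of why comparing a concatenated path string with File.getPath() can fail. */
final class PathCompareDemo {
    private PathCompareDemo() {}

    /** Builds the path the way the quoted code does: root + separator + name. */
    static String concatenated(String root, String name) {
        return root + File.separator + name;
    }

    /** String comparison, as in newPath.equals(subDirs[i].getPath()). */
    static boolean sameByString(String newPath, File listed) {
        return newPath.equals(listed.getPath());
    }

    /** Comparison of File objects, whose pathnames are normalized on construction. */
    static boolean sameByFile(String newPath, File listed) {
        return new File(newPath).equals(listed);
    }
}
```

If the configured root already ends with a separator, concatenated(root, name) contains a doubled separator, sameByString reports a mismatch for the very directory that was just created, and sameByFile still reports a match. Comparing File objects (or normalized paths) instead of strings would sidestep this particular trap.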
RE: problem updating a document: no segments file?
This code works on a couple of other boxes as is. That deleting code removes the active index after this one builds in a different location. Then the searcher is told to make this newest one the current one, and the old one is deleted. It affects directories and their entire contents. It wouldn't select just a segments file. Also, like I said, this runs fine on my laptop and G5. And actually, this code was fine until recently on that box. I modified a different method and updated the code on this server, and now I have this problem.

I noticed the segments file has some non-readable stuff in there... I can't just make an empty one in the directory, can I? Even if I could, I don't want to put in a sloppy workaround like that.

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Chris Hostetter
Sent: Fri 1/27/2006 7:00 PM
To: java-user@lucene.apache.org
Subject: RE: problem updating a document: no segments file?

[snip]
RE: problem updating a document: no segments file?
: It's still not keeping the segments file around. Is that necessary?

You seem to have some code at the end that (I'm guessing) is supposed to remove older copies of the index. Are you sure that code does what you think it does? Have you tried commenting it out and seeing if that fixes your problem? There may be a bug in that code (I didn't read it that closely) ...

: String newPath = getIndexLocation() + File.separator +
:     System.currentTimeMillis();
:
: IndexWriter writer = new IndexWriter(new File(newPath),
:     analyzer, true);
...
: writer.optimize();
: writer.close();
:
: SearchSO.setSearcher(newPath);
:
: File[] subFiles;
: File[] subDirs = new File(getIndexLocation()).listFiles();
:
: for (int i=0;i [...]
RE: problem updating a document: no segments file?
            [...] "keyword";
            doc.add(Field.Text(SD, StringFormat.processString(rs.getString("shortdesc"))));
            doc.add(Field.Text(LD, StringFormat.processString(rs.getString("longdesc"))));
            doc.add(Field.Text(CATNAME, StringFormat.processString(rs.getString("cDesc"))));
            doc.add(Field.Text(CATKEY, StringFormat.processString(rs.getString("cKey"))));
            doc.add(Field.Text(CAT, StringFormat.processString(rs.getString("ancestry"))));
            p = rs.getString("seq");
            try {
                doc.add(Field.Keyword(SEQ, p));
            } catch (Exception ep) {
                doc.add(Field.Keyword(SEQ, "0"));
            }
            doc.add(Field.Keyword(PRICE, "0"));
            doc.add(Field.UnIndexed(PRODUCT, "1"));
            doc.add(Field.UnIndexed(IMAGE, StringFormat.processString(rs.getString("product_img_file"))));
        } else {
            doc.add(Field.Text(CATNAME, StringFormat.processString(rs.getString("cDesc"))));
            doc.add(Field.Text(CATKEY, StringFormat.processString(rs.getString("cKey"))));
            doc.add(Field.Text(CAT, StringFormat.processString(rs.getString("ancestry"))));
            p = rs.getString("seq");
            try {
                doc.add(Field.Keyword(SEQ, p));
            } catch (Exception ep) {
                doc.add(Field.Keyword(SEQ, "0"));
            }
        }
        lastID = thisID;
    }
    writer.addDocument(doc);
    dbm.runUpdate("update co_cart_prefs set indexUpdated = getDate()");
    rs = dbm.runQuery("select indexUpdated, tablesUpdated from co_cart_prefs ");
    if (rs.next()) {
        setIndexUpdated(new Date(DateFormat.string_Long(rs.getString("indexUpdated"))));
        setTablesUpdated(new Date(DateFormat.string_Long(rs.getString("tablesUpdated"))));
    }
} catch (Exception e) {
    System.out.println("Indexer index err: " + e);
} finally {
    try {
        rs.close();
    } catch (Exception e) {}
}
writer.optimize();
writer.close();

SearchSO.setSearcher(newPath);

File[] subFiles;
File[] subDirs = new File(getIndexLocation()).listFiles();

for (int i=0;i [...]

-----Original Message-----
From: [mailto:[EMAIL PROTECTED]]
Sent: Thursday, January 26, 2006 9:07 PM
To: java-user@lucene.apache.org
Subject: problem updating a document: no segments file?

[snip]
problem updating a document: no segments file?
Hello,

I have a couple of instances of Lucene. I just altered one implementation, and now it's not keeping a segments file. While indexing occurs, there is a segments file, but once it's done, there isn't. All the other indexes have one. The problem comes when I try to update a document: it says "segments file not found", and that stops it. This code was working fine on my development box, but now that I've gone to production it's not keeping that segments file. And it searches just fine. I can reindex over and over, and the file keeps disappearing.

Any ideas?
Re: Updating a Document without re-analyzing
That could indeed be a good way for today.

I'm still dreaming of finding a

    ((DocumentOfSomeSort) document).getTokenStream(fieldName)

for stored and non-stored fields!

paul

On 8 Sept. 05, at 11:56, [EMAIL PROTECTED] wrote:

> My understanding is that by splitting your fields into two indexes,
> putting your keyword fields into one and your complicated stuff into
> the other, you can update your keyword index in the usual way
> (delete/re-add) without having to update your other index, avoiding
> the re-analyzing.
Re: Updating a Document without re-analyzing
Hello Paul,

I came across this yesterday.

http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200504.mbox/[EMAIL PROTECTED]

My understanding is that by splitting your fields into two indexes, putting your keyword fields into one and your complicated stuff into the other, you can update your keyword index in the usual way (delete/re-add) without having to update your other index, avoiding the re-analyzing.

Regards
Paul I.

Paul Libbrecht <[EMAIL PROTECTED]> wrote on 08/09/2005 10:29:00:

> Hi,
>
> Some time ago I posted a comment asking this question (which is
> by no means new) about updating a Lucene document without re-analyzing,
> that is, where we expect the token streams to be copied into the new
> document and where I intend to change only a few keyword values.
>
> Sadly, I cannot find the answer anymore... It was something like "look
> in Nutch", but I didn't find it there, at least.
>
> Could someone point me to this bit again? Updating a document this way
> would be a very important optimization for us, as we could offload most
> of the content into Lucene's document and later modify the document
> with results that first need to have swallowed the whole repository.
>
> Thank you very much.
>
> paul
Updating a Document without re-analyzing
Hi,

Some time ago I posted a comment asking this question (which is by no means new) about updating a Lucene document without re-analyzing, that is, where we expect the token streams to be copied into the new document and where I intend to change only a few keyword values.

Sadly, I cannot find the answer anymore... It was something like "look in Nutch", but I didn't find it there, at least.

Could someone point me to this bit again? Updating a document this way would be a very important optimization for us, as we could offload most of the content into Lucene's document and later modify the document with results that first need to have swallowed the whole repository.

Thank you very much.

paul