Re: UUID processor handling of empty string
I did a quick experiment (admittedly with 5.x, but even if this is a bug it won't be back-ported to 5.3) and this works exactly as I expect. I have three docs with IDs as follows:

doc1: an id field that is present but empty. This is equivalent to your ""
doc2: whatever
doc3: no id field at all

As expected, when the output comes back doc1 has an empty field, doc2 has "whatever" and doc3 has a newly-generated uuid that happens to start with "f".

Adding sort=id asc returns:
doc1: (empty string)
doc3: fblahblah
doc2: whatever

Adding sort=id desc returns:
doc2: whatever
doc3: fblahblah
doc1: (empty string)

So for about the third time, "what do you mean by 'doesn't work'?"

Provide simple example data (just how you specify the "id" field is sufficient). Provide the requests you're using. Point out what's not as you expect.

You might want to review: http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Sat, Apr 16, 2016 at 9:54 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> Remove that line of code from your client, or... add the remove blank field
> update processor as Hoss suggested. Your code is violating the contract for
> the UUID update processor. An empty string is still a value, and the
> presence of a value is an explicit trigger to suppress the UUID update
> processor.
>
> -- Jack Krupansky
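The ordering reported in this experiment matches plain lexicographic string comparison, which can be checked outside Solr. The following is a minimal sketch under that assumption; it uses only the Java standard library, does not call Solr, and the class name IdSortSketch is made up for illustration. "fblahblah" stands in for the generated uuid, as in the message above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: plain Java string ordering, not Solr's sort implementation.
final class IdSortSketch {

    // Returns a sorted copy of the ids, ascending or descending.
    static List<String> sortIds(List<String> ids, boolean ascending) {
        List<String> copy = new ArrayList<>(ids);
        Comparator<String> order = ascending
                ? Comparator.<String>naturalOrder()
                : Comparator.<String>reverseOrder();
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // doc1 has an empty id, doc2 has "whatever", doc3 has a generated uuid.
        List<String> ids = Arrays.asList("", "whatever", "fblahblah");
        System.out.println("asc:  " + sortIds(ids, true));
        System.out.println("desc: " + sortIds(ids, false));
    }
}
```

The empty string sorts before every non-empty string, so it comes first ascending and last descending, which is the order described for doc1.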
Re: UUID processor handling of empty string
Remove that line of code from your client, or... add the remove blank field update processor as Hoss suggested. Your code is violating the contract for the UUID update processor. An empty string is still a value, and the presence of a value is an explicit trigger to suppress the UUID update processor.

-- Jack Krupansky

On Sat, Apr 16, 2016 at 12:41 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> I am seeing the UUID getting generated when I set the field as empty string
> like this - solrDoc.addField("id", ""); with solr 5.3.1 and based on the
> above schema.
> The resulting documents in the index are searchable but not sortable.
> Someone could verify if this bug exists and file a jira.
>
> Thanks,
> Susmit
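The first option above, not sending an empty id from the client, could be sketched as a small guard. This is a hedged illustration, not SolrJ code: a Map stands in for SolrInputDocument so the example has no Solr dependency, and the helper addIfNotBlank and the class name are hypothetical. With SolrJ the same check would wrap the solrDoc.addField("id", value) call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a Map stands in for SolrInputDocument.
final class ClientGuardSketch {

    // Hypothetical helper: adds the field only when the value has non-whitespace content,
    // so a blank value leaves the field absent instead of present-but-empty.
    static void addIfNotBlank(Map<String, Object> doc, String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            doc.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        addIfNotBlank(doc, "id", "");          // skipped: the field stays absent
        addIfNotBlank(doc, "title", "hello");  // added
        System.out.println("has id: " + doc.containsKey("id"));
        System.out.println(doc);
    }
}
```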
Re: UUID processor handling of empty string
I am seeing the UUID getting generated when I set the field as empty string like this - solrDoc.addField("id", ""); with solr 5.3.1 and based on the above schema.
The resulting documents in the index are searchable but not sortable.
Someone could verify if this bug exists and file a jira.

Thanks,
Susmit

On Sat, Apr 16, 2016 at 8:56 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> "UUID processor factory is generating uuid even if it is empty."
>
> The processor will generate the UUID only if the id field is not specified
> in the input document. Empty value and value not present are not the same
> thing.
>
> So, please clarify your specific situation.
>
> -- Jack Krupansky
Re: UUID processor handling of empty string
"UUID processor factory is generating uuid even if it is empty."

The processor will generate the UUID only if the id field is not specified in the input document. Empty value and value not present are not the same thing.

So, please clarify your specific situation.

-- Jack Krupansky

On Thu, Apr 14, 2016 at 7:20 PM, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> Hi Chris/Erick,
>
> Does not work in the sense the order of documents does not change on
> changing sort from asc to desc.
> This could be just a trivial bug where UUID processor factory is generating
> uuid even if it is empty.
> This is on solr 5.3.0
>
> Thanks,
> Susmit
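The distinction between an empty value and an absent one can be made concrete with a small model. This is only a sketch of the rule as stated in this message, not Solr's implementation: a Map stands in for the input document, and the class and method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Illustration only: models "generate a UUID only if the field is not specified".
final class AbsentVsEmptySketch {

    // Fills in a random UUID only when the document has no entry for the field.
    // An empty string counts as a value, so it is left untouched.
    static void addUuidIfAbsent(Map<String, Object> doc, String field) {
        if (!doc.containsKey(field)) {
            doc.put(field, UUID.randomUUID().toString());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withEmptyId = new HashMap<>();
        withEmptyId.put("id", "");
        Map<String, Object> withoutId = new HashMap<>();

        addUuidIfAbsent(withEmptyId, "id");
        addUuidIfAbsent(withoutId, "id");

        System.out.println("empty id kept as-is: " + "".equals(withEmptyId.get("id")));
        System.out.println("absent id generated: " + withoutId.get("id"));
    }
}
```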
Re: UUID processor handling of empty string
The suggestions from Chris and Erick are probably an answer, but I just wanted to say that you are also looking at this as too much of a black-box situation. You are trying to troubleshoot an effect of something specified on the client from the search results.

You can bisect this problem and look at the way the records actually got indexed. So, if the record had "id" set to "" in the client, what do you get in the index for that field? If that behavior is inconsistent with what you get when you don't set the "id" at all, concentrate on fixing that. Most likely using Chris' approach.

Regards,
Alex.
Newsletter and resources for Solr beginners and intermediates: http://www.solr-start.com/

On 15 April 2016 at 09:20, Susmit Shukla <shukla.sus...@gmail.com> wrote:
> Hi Chris/Erick,
>
> Does not work in the sense the order of documents does not change on
> changing sort from asc to desc.
> This could be just a trivial bug where UUID processor factory is generating
> uuid even if it is empty.
> This is on solr 5.3.0
>
> Thanks,
> Susmit
Re: UUID processor handling of empty string
Hi Chris/Erick,

Does not work in the sense the order of documents does not change on changing sort from asc to desc.
This could be just a trivial bug where UUID processor factory is generating uuid even if it is empty.
This is on solr 5.3.0

Thanks,
Susmit

On Thu, Apr 14, 2016 at 2:30 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> I'm also confused by what exactly you mean by "doesn't work" but a general
> suggestion you can try is putting the
> RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...
>
> https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html
>
> If you are also worried about strings that aren't exactly empty, but
> consist only of whitespace, you can put TrimFieldUpdateProcessorFactory
> before RemoveBlankFieldUpdateProcessorFactory ...
>
> https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html
>
> -Hoss
> http://www.lucidworks.com/
Re: UUID processor handling of empty string
I'm also confused by what exactly you mean by "doesn't work" but a general suggestion you can try is putting the RemoveBlankFieldUpdateProcessorFactory before your UUID Processor...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/RemoveBlankFieldUpdateProcessorFactory.html

If you are also worried about strings that aren't exactly empty, but consist only of whitespace, you can put TrimFieldUpdateProcessorFactory before RemoveBlankFieldUpdateProcessorFactory ...

https://lucene.apache.org/solr/6_0_0/solr-core/org/apache/solr/update/processor/TrimFieldUpdateProcessorFactory.html

: Date: Thu, 14 Apr 2016 12:30:24 -0700
: From: Erick Erickson <erickerick...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user <solr-user@lucene.apache.org>
: Subject: Re: UUID processor handling of empty string
:
: What do you mean "doesn't work"? An empty string is
: different than not being present. The UUID update
: processor (I'm pretty sure) only adds a field if it
: is _absent_. Specifying it as an empty string
: fails that test so no value is added.
:
: At that point, if this uuid field is also the <uniqueKey>,
: then each doc that comes in with an empty field will replace
: the others.
:
: If it's _not_ the <uniqueKey>, the sorting will be confusing.
: All the empty string fields are equal, so the tiebreaker is
: the internal Lucene doc ID, which may change as merges
: happen. You can specify secondary sort fields to make the
: sort predictable (the <uniqueKey> field is popular for this).
:
: Best,
: Erick

-Hoss
http://www.lucidworks.com/
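The effect of the ordering suggested here (trim, then remove blanks, then generate the UUID) can be modelled in a few lines. This is a sketch under stated assumptions: it imitates the three steps on a plain Map and is not the code of the Solr processor factories named above; the class name, method names, and the hard-coded "id" field are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Illustration only: imitates the order trim -> remove blank -> UUID on a plain Map.
final class UpdateChainSketch {

    // Step 1: trim leading and trailing whitespace from String values.
    static void trimFields(Map<String, Object> doc) {
        doc.replaceAll((name, value) ->
                value instanceof String ? ((String) value).trim() : value);
    }

    // Step 2: drop fields whose value is an empty String.
    static void removeBlankFields(Map<String, Object> doc) {
        doc.values().removeIf(value ->
                value instanceof String && ((String) value).isEmpty());
    }

    // Step 3: generate a UUID only if the field is now absent.
    static void addUuidIfAbsent(Map<String, Object> doc, String field) {
        if (!doc.containsKey(field)) {
            doc.put(field, UUID.randomUUID().toString());
        }
    }

    // Runs the three steps in order on a copy of the input document.
    static Map<String, Object> process(Map<String, Object> input) {
        Map<String, Object> doc = new LinkedHashMap<>(input);
        trimFields(doc);
        removeBlankFields(doc);
        addUuidIfAbsent(doc, "id");
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        input.put("id", "   ");      // whitespace only
        input.put("title", " hi ");
        System.out.println(process(input));
    }
}
```

The order matters in this model: if the UUID step ran first, the whitespace-only id would count as present and no UUID would be generated.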
Re: UUID processor handling of empty string
What do you mean "doesn't work"? An empty string is different than not being present. The UUID update processor (I'm pretty sure) only adds a field if it is _absent_. Specifying it as an empty string fails that test so no value is added.

At that point, if this uuid field is also the <uniqueKey>, then each doc that comes in with an empty field will replace the others.

If it's _not_ the <uniqueKey>, the sorting will be confusing. All the empty string fields are equal, so the tiebreaker is the internal Lucene doc ID, which may change as merges happen. You can specify secondary sort fields to make the sort predictable (the <uniqueKey> field is popular for this).

Best,
Erick

On Thu, Apr 14, 2016 at 12:18 PM, Susmit Shukla wrote:
> Hi,
>
> I have configured solr schema to generate unique id for a collection using
> UUIDUpdateProcessorFactory
>
> I am seeing a peculiar behavior - if the unique 'id' field is explicitly
> set as empty string in the SolrInputDocument, the document gets indexed
> with UUID update processor generating the id.
> However, sorting does not work if uuid was generated in this way. Also
> cursor functionality that depends on unique id sort also does not work.
> I guess the correct behavior would be to fail the indexing if user provides
> an empty string for a uuid field.
>
> The issues do not happen if I omit the id field from the SolrInputDocument .
>
> SolrInputDocument
>
> solrDoc.addField("id", "");
>
> ...
>
> I am using schema similar to below-
>
> id
>
> id
>
> uuid
>
> Thanks,
> Susmit
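The point about ties and secondary sort fields can be illustrated with an ordinary comparator. This is a minimal sketch, not Solr's sorting code: the Doc class and its uuid and key fields are hypothetical, and the arrival order of the list stands in for the internal Lucene doc ID that would otherwise break ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: a secondary sort key makes the order independent of arrival order.
final class TieBreakSketch {

    static final class Doc {
        final String uuid; // the possibly-empty field being sorted on
        final String key;  // a unique secondary field
        Doc(String uuid, String key) {
            this.uuid = uuid;
            this.key = key;
        }
    }

    // Sorts by uuid, then by key, and returns the keys in the resulting order.
    static List<String> sortedKeys(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparing((Doc d) -> d.uuid).thenComparing(d -> d.key));
        List<String> keys = new ArrayList<>();
        for (Doc d : copy) {
            keys.add(d.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Two docs tie on the empty uuid; they arrive in different orders in each list.
        List<Doc> first = Arrays.asList(new Doc("", "k2"), new Doc("", "k1"), new Doc("abc", "k3"));
        List<Doc> second = Arrays.asList(new Doc("abc", "k3"), new Doc("", "k1"), new Doc("", "k2"));
        System.out.println(sortedKeys(first));
        System.out.println(sortedKeys(second));
    }
}
```

Both lists produce the same key order because the secondary key resolves the tie between the two empty values.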
UUID processor handling of empty string
Hi,

I have configured solr schema to generate unique id for a collection using UUIDUpdateProcessorFactory

I am seeing a peculiar behavior - if the unique 'id' field is explicitly set as empty string in the SolrInputDocument, the document gets indexed with UUID update processor generating the id.
However, sorting does not work if uuid was generated in this way. Also cursor functionality that depends on unique id sort also does not work.
I guess the correct behavior would be to fail the indexing if user provides an empty string for a uuid field.

The issues do not happen if I omit the id field from the SolrInputDocument .

SolrInputDocument

solrDoc.addField("id", "");

...

I am using schema similar to below-

id

id

uuid

Thanks,
Susmit