Re: Why no composite primary-key in lucene ?

Erick Erickson Sat, 29 Apr 2017 09:49:58 -0700

Internal magic. This is the point behind the admonitions against
relying on the _internal_ Lucene doc ID for anything... it changes.
For the same document. Even if the document doesn't change.

Essentially when two segments are merged (and this is conceptual, not
the code) all the live docs from the first segment are written into
the new segment starting at 0 (or 1, like I said this is conceptual).
This (plus the base for the segment) is the internal Lucene doc ID.
Once the first segment has been written, say the last internal ID is
100, the first live doc of the next segment gets 101, etc.
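
To make the renumbering concrete, here is a toy model in plain Java. It
is a sketch of the conceptual description above, not Lucene code: the
names MergeSketch, Doc, merge, idBeforeMerge and idAfterMerge are all
made up for illustration, and the "ID" is simply a position in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical classes, no Lucene API is used here.
final class MergeSketch {

    // Stand-in for a document in a segment: a key plus a "live" flag
    // (live == false means the doc has been deleted).
    static final class Doc {
        final String key;
        final boolean live;

        Doc(String key, boolean live) {
            this.key = key;
            this.live = live;
        }
    }

    // Copies only the live docs of each segment, in order, into one new
    // segment. A doc's index in the returned list is its new internal ID.
    static List<Doc> merge(List<List<Doc>> segments) {
        List<Doc> merged = new ArrayList<>();
        for (List<Doc> segment : segments) {
            for (Doc doc : segment) {
                if (doc.live) {
                    merged.add(doc);
                }
            }
        }
        return merged;
    }

    // ID before the merge: the segment's base (total size of all earlier
    // segments, deleted docs included) plus the position inside the
    // segment. Returns -1 if no live doc has this key.
    static int idBeforeMerge(List<List<Doc>> segments, String key) {
        int base = 0;
        for (List<Doc> segment : segments) {
            for (int i = 0; i < segment.size(); i++) {
                Doc doc = segment.get(i);
                if (doc.live && doc.key.equals(key)) {
                    return base + i;
                }
            }
            base += segment.size();
        }
        return -1;
    }

    // ID after the merge: just the position in the merged segment.
    static int idAfterMerge(List<Doc> merged, String key) {
        for (int i = 0; i < merged.size(); i++) {
            if (merged.get(i).key.equals(key)) {
                return i;
            }
        }
        return -1;
    }
}
```

With two segments [a, b(deleted), c] and [d, e(deleted), f], the IDs
before the merge are a=0, c=2, d=3, f=5. After the merge they are a=0,
c=1, d=2, f=3. None of the documents changed, but three of the four
IDs did.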

Now here's the magic. The <uniqueKey> in Solr is just some random
field as far as Lucene is concerned. Solr knows what field this is. So
when you try to, say, update a doc with a <uniqueKey>, _Solr_ looks
around and asks Lucene "do you have any docs with this value in this
field?". If the answer is yes, then Solr has the internal Lucene doc ID
and can do the right thing. So for atomic updates, Solr gets the right
Lucene doc, pulls all the stored data from the internal Lucene doc and
constructs another brand-new Lucene doc and applies the updates
specified. Then Solr tells Lucene to delete the doc with the internal
Lucene doc ID and then tells Lucene to index this new doc. Lucene
assigns it a new internal Lucene doc ID.
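
The same flow as a toy model in plain Java. Again this is only a sketch
of the conceptual steps above, not Solr or Lucene code: the class
UniqueKeySketch and its methods find, add, atomicUpdate and get are
hypothetical, a document is just a map of field names to stored values,
and the internal ID is a slot number in an append-only list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical class, no Solr or Lucene API is used here.
final class UniqueKeySketch {

    // "Lucene side": an append-only list of docs. The list index plays
    // the role of the internal doc ID; a deleted doc is a null slot.
    private final List<Map<String, String>> docs = new ArrayList<>();

    // "Solr side": the one thing the index itself does not know, namely
    // which field is the unique key.
    private final String uniqueKeyField;

    UniqueKeySketch(String uniqueKeyField) {
        this.uniqueKeyField = uniqueKeyField;
    }

    // "Do you have any docs with this value in this field?"
    // Returns the internal ID of the live doc, or -1 if there is none.
    int find(String keyValue) {
        for (int id = 0; id < docs.size(); id++) {
            Map<String, String> doc = docs.get(id);
            if (doc != null && keyValue.equals(doc.get(uniqueKeyField))) {
                return id;
            }
        }
        return -1;
    }

    // Plain append: no key check at this level, always a new internal ID.
    private int append(Map<String, String> doc) {
        docs.add(new HashMap<>(doc));
        return docs.size() - 1;
    }

    // Add with unique-key semantics: delete any live doc that has the
    // same key, then index the new doc under a new internal ID.
    int add(Map<String, String> doc) {
        String key = doc.get(uniqueKeyField);
        int old = (key == null) ? -1 : find(key);
        if (old >= 0) {
            docs.set(old, null);
        }
        return append(doc);
    }

    // Atomic update: look up the old doc by key, copy its stored fields
    // into a brand-new doc, apply the changes, delete the old doc and
    // index the new one. Returns the new internal ID, or -1 if not found.
    int atomicUpdate(String keyValue, Map<String, String> changes) {
        int old = find(keyValue);
        if (old < 0) {
            return -1;
        }
        Map<String, String> rebuilt = new HashMap<>(docs.get(old));
        rebuilt.putAll(changes);
        docs.set(old, null);
        return append(rebuilt);
    }

    // Returns the doc stored at an internal ID, or null if it was deleted.
    Map<String, String> get(int id) {
        return docs.get(id);
    }
}
```

Adding a doc with id=42 puts it in slot 0. An atomic update of its
title leaves slot 0 empty and puts the rebuilt doc in slot 1. The
uniqueKey value is the same, the internal ID is not.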

Best,
Erick

On Sat, Apr 29, 2017 at 5:38 AM, Dorian Hoxha <dorian.ho...@gmail.com> wrote:
> Hi Shawn,
>
> ES has the same thing. You're right that no 'id' is needed when adding a
> document.
>
> But lucene has updateDocument:
> https://lucene.apache.org/core/6_4_0/core/org/apache/lucene/index/IndexWriter.html#updateDocument-org.apache.lucene.index.Term-java.lang.Iterable-
> ?
>
> Or that doesn't need a commit before deleting docs by Term, so it can
> atomically delete+insert  ?
>
> So how do solr/es connect the internal docId to the outside primary-key (to
> support get,delete) ? Or by just storing/indexing the primary-key in a field
> ?
>
> Thanks,
> Dorian
>
>
> On Fri, Apr 28, 2017 at 10:34 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>
>> On 4/28/2017 6:16 AM, Dorian Hoxha wrote:
>> > I searched for this on mailing-list,issues etc, but couldn't find any
>> > post.
>> >
>> > So, why not have the possibility of <composite_id> ?
>> > Or nobody cared enough to implement it ? Or no gains ?
>>
>> To my knowledge, and I hope someone can correct me if I'm wrong, Lucene
>> generally has absolutely no concept of a primary key at all, much less
>> one that's composite.  At its core, Lucene won't complain if you index
>> the same document twice -- both copies will be present.
>>
>> Solr (and probably a LOT of user-written Lucene code before that)
>> introduced the concept of a uniqueKey field.  When a duplicate document
>> is indexed to Solr, it is Solr that finds/deletes the original, not
>> Lucene.  I feel quite confident in saying that ES has the same
>> functionality, though I have not confirmed it.
>>
>> Thanks,
>> Shawn
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>

