It seems that the real-time get handler (/get) doesn't play nicely with
aliases. The current (and past) behavior seems to be that it only works for
the first collection listed in the alias. This looks pretty clearly like a
bug: one would certainly expect a /get executed against an alias either to
refuse to work with aliases or to work across all collections in the alias,
rather than silently working only on the first collection.

However, this has opened another can of worms after some discussion with
Erick on Slack. What's the expected behavior for this handler in the event
that the same ID shows up in more than one collection in the alias?

My first impulse was that it should return both. Then I looked at /select
to see what it did, and found that /select on an alias to collections that
contain duplicate IDs is not in a happy state either: it seems to randomly
return one or the other document, but not both (probably based on the order
in which the docs are returned from sub-requests, which is not
deterministic).

So from a user perspective I can see arguments for either of two behaviors
(in both cases, /get and /select), but no reason to like the current
behaviors, which silently hide the situation and do not return all
documents.

Reasonable Behavior 1: Throw an error if a second document with the same ID
is encountered.
Reasonable Behavior 2: Return all documents, including both (or more)
documents that have colliding IDs.
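
To make the two behaviors concrete, here is a rough sketch of how the
per-collection results might be merged under either policy. This is plain
Python for illustration only, not Solr code: merge_docs, DuplicateIdError,
the on_duplicate values and the "collection" key added to each doc are all
hypothetical names I made up for this example.

```python
from typing import Dict, Iterable, List


class DuplicateIdError(Exception):
    """Raised when the same id is found in more than one collection."""


def merge_docs(per_collection: Dict[str, Iterable[dict]],
               on_duplicate: str = "error") -> List[dict]:
    """Merge docs from several collections under a duplicate-id policy.

    on_duplicate="error"      -> Reasonable Behavior 1
    on_duplicate="return_all" -> Reasonable Behavior 2
    """
    if on_duplicate not in ("error", "return_all"):
        raise ValueError(f"unknown policy: {on_duplicate}")

    first_seen: Dict[str, str] = {}  # id -> collection where it was first seen
    merged: List[dict] = []
    # Iterate in sorted order so the result does not depend on response order.
    for collection in sorted(per_collection):
        for doc in per_collection[collection]:
            doc_id = doc["id"]
            if doc_id in first_seen and on_duplicate == "error":
                raise DuplicateIdError(
                    f"id {doc_id!r} found in both {first_seen[doc_id]!r} "
                    f"and {collection!r}")
            first_seen.setdefault(doc_id, collection)
            # Tag each doc with its source so duplicates can be told apart.
            merged.append({**doc, "collection": collection})
    return merged
```

Either way the caller finds out about the collision, which is the point:
nothing gets silently dropped.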

I can think of scenarios where either would be desirable, so I think we
want to make the behavior something that users can select. For this I see
two possible points at which the user might express their preference:

   1. At configuration time, with an alias property
   2. At query time, with a query parameter
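
If we ended up supporting both, one plausible arrangement (purely an
assumption on my part, nothing here is decided) would be for the query
parameter to override the alias property, with a fixed default when neither
is set. A minimal sketch, where the "duplicateIds" key, the policy values
and the default are all hypothetical:

```python
from typing import Mapping

VALID_POLICIES = {"error", "return_all"}
DEFAULT_POLICY = "error"  # assumed default; the real choice is open


def resolve_duplicate_policy(query_params: Mapping[str, str],
                             alias_props: Mapping[str, str]) -> str:
    """Pick the duplicate-id policy: query param first, then alias property."""
    for source in (query_params, alias_props):
        value = source.get("duplicateIds")  # hypothetical setting name
        if value is not None:
            if value not in VALID_POLICIES:
                raise ValueError(f"unknown duplicateIds value: {value!r}")
            return value
    return DEFAULT_POLICY
```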

This also implies a downside to routed aliases: it's probably possible to
index the same ID multiple times if it repeats less often than the
collection creation interval (for time routed aliases) or doesn't repeat
within the same category (for category routed aliases). The responses to
queries may then hide the duplicates in a non-deterministic fashion, which
is clearly bad.

I am possibly OK with just documenting that aliases require the user to
provide their own guarantees about ID uniqueness too, though part of me
really wants to have a mode that detects this problem for the user
somehow. (&facet.mincount=2&facet.field=id seems to work, but requires
active checking.) In any case, the behavior with /get not returning docs
from any but the first collection probably needs to be fixed.
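
For anyone who wants to try the facet check by hand, here is a small sketch
of what that active checking could look like from a client. It builds a
/select URL with the facet parameters above (plus facet=true, rows=0 and
facet.limit=-1, which I am assuming are wanted here) and reads the default
flat JSON facet format, where terms and counts alternate in one list. The
function names and the example URL are hypothetical, and faceting on id is
likely to be expensive on a large index.

```python
from typing import Dict
from urllib.parse import urlencode


def duplicate_id_query(select_url: str) -> str:
    """Build a facet query that lists ids occurring at least twice."""
    params = {
        "q": "*:*",
        "rows": 0,               # only the facet counts are needed
        "facet": "true",
        "facet.field": "id",
        "facet.mincount": 2,     # keep only ids seen two or more times
        "facet.limit": -1,       # no cap on the number of facet values
        "wt": "json",
    }
    return select_url + "?" + urlencode(params)


def duplicate_ids(response: dict) -> Dict[str, int]:
    """Turn the flat [term, count, term, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"]["id"]
    return dict(zip(flat[0::2], flat[1::2]))
```

Usage would be something like fetching
duplicate_id_query("http://localhost:8983/solr/myalias/select") and passing
the parsed JSON to duplicate_ids(); an empty dict means no collisions were
found.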

Thoughts?

-Gus

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)
