TL;DR: Cayenne using DISTINCT for column queries in 4.2 affects Agrest 
algorithms, but is nevertheless correct and should stay around.

----
When testing M1, I noticed a difference in ordering of the second-level results 
in Agrest vs 4.0/4.1. Those are generated in Agrest via a SelectQuery with 
columns roughly looking like this:

  List<Property<?>> properties = new ArrayList<>();
  properties.add(Property.createSelf(E3.class));

  Expression exp = ExpressionFactory.dbPathExp("e2.id");
  properties.add(Property.create(exp, Integer.class));

  SelectQuery query = new SelectQuery(E3.class);
  query.setColumns(properties);

Here "e2" is a to-one relationship. The difference in generated SQL between 4.0 
and 4.2 is the DISTINCT keyword added in the latter:

4.0: 
  SELECT t0.name, t0.e2_id, t0.id_, t1.id_ FROM utest.e3 t0 JOIN utest.e2 t1 ON 
(t0.e2_id = t1.id_)

4.2:
  SELECT DISTINCT t0.name, t0.e2_id, t0.id_, t1.id_ FROM utest.e3 t0 JOIN 
utest.e2 t1 ON t0.e2_id = t1.id_ 

Both produce the correct result, but since there's no explicit ordering, the 
actual order of the objects returned by Derby is different. I am less worried 
about the ordering (this was just an indicator to me that something has 
changed), but DISTINCT has a performance impact, and now it seems it will 
affect the main execution path of Agrest. 

I suppose Cayenne behavior in 4.2 is correct as with column queries there are 
no simple rules for when there may be duplicate result rows. Our case doesn't 
require DISTINCT only because of the special combination (entity and a related 
to-one id). And we need to fix this on Agrest end (that join is redundant in 
case of to-one). 

Still figured I'd mention...

Andrus


Reply via email to