Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-10 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Robert Leach):

 Replying to [comment:6 Mariusz Felisiak]:
 > This would be another logic that's implicit and probably unexpected by
 users (at least in some cases). As far as I'm aware it's preferable to
 fail loudly even with a `ProgrammingError`. I'm not sure how to improve
 this note, maybe it's enough to add a correct example:
 >
 > {{{#!diff
 > diff --git a/docs/ref/models/querysets.txt
 b/docs/ref/models/querysets.txt
 > index a9da1dcf7e..891b8255b0 100644
 > --- a/docs/ref/models/querysets.txt
 > +++ b/docs/ref/models/querysets.txt
 > @@ -565,7 +565,9 @@ Examples (those after the first will only work on
 PostgreSQL)::
 >  ...wouldn't work because the query would be ordered by
 ``blog__name`` thus
 >  mismatching the ``DISTINCT ON`` expression. You'd have to
 explicitly order
 >  by the relation ``_id`` field (``blog_id`` in this case) or the
 referenced
 > -one (``blog__pk``) to make sure both expressions match.
 > +one (``blog__pk``) to make sure both expressions match::
 > +
 > +Entry.objects.order_by('blog_id').distinct('blog_id')
 >
 >  ``values()``
 >  
 > }}}

 I'm not sure that would have been enough for me to have anticipated or
 correctly interpreted the error I encountered.  I just corrected our code
 base and I now feel I have a clearer picture of how I got here...  Let me
 explain why.  I'd thought my expressions did match:

 {{{
 distinct_fields = ['name', 'pk',
 'compounds__testsynonyms__compound', 'compounds__testsynonyms__name',
 'compounds__testsynonyms__pk', 'compounds__name', 'compounds__pk']
 qs =
 
TestPeak.objects.filter(q_exp).order_by(*distinct_fields).distinct(*distinct_fields)
 }}}

 I was like "but I'm supplying the exact same fields for both order_by and
 distinct".

 What I think I'd missed was that "referenced field" in the phrase
 `explicitly order by the relation _id or referenced field` doesn't just
 apply to related model fields in the queried("/root") model (e.g.
 `Entry`'s fields such as `blog` in
 `Entry.objects.order_by('blog').distinct('blog')`, but applies to all
 related models through all my joins - **and** (indirectly/potentially)
 their `meta.ordering` values... because I wasn't explicitly adding the
 `compounds__testsynonyms__compound` "field" above in my codebase where I
 ran into this error - I was implicitly adding them -> I had written code
 to straightforwardly grab and prepend the meta.ordering fields in order to
 satisfy distinct's requirements and still have things ordered nicely - and
 that's where I was running into this problem, because it didn't occur to
 me that the fields in meta.ordering could be changed differently by
 `order_by` and `distinct`.  meta.ordering lets you add those "blog" fields
 and I wasn't the developer that had added them to our code base, but I
 didn't even think to check whether those fields I was adding would be
 subject to the distinct/order_by gotcha.

 I mean - all the information is there in the doc for me to have avoided
 this.  You just have to piece together doc info on `meta.ordering`,
 `order_by`, and `distinct` and extend the examples to realize it applies
 to joined models' fields - not just the root model.  It was just a bit too
 complex for me at my level of django experience to anticipate correctly
 the implications of what I was doing.

 **Maybe the best thing to do would be to anticipate the motivations that
 lead me down this path...** I wanted to use distinct on a join query,
 which meant that I needed to add fields to order_by that I didn't really
 care about - I just needed to add them to meet the requirement that the
 fields matched.  BUT - I didn't want to change the desired ordering - so I
 needed to explicitly add the default orderings so that the IDs wouldn't
 override them.  So maybe the doc should just point out that if you're
 adding fields to order_by solely to be able to use distinct, but you don't
 want to change the default ordering specified in meta.ordering - you need
 to consider the fact that the meta.ordering fields explicitly added can

Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-09 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Mariusz Felisiak):

 Replying to [comment:5 Robert Leach]:
 > When those methods are assessed individually, I understand why those
 fields are the preferred solution (e.g. the meta ordering may not be
 unique), but given that `distinct` requires the same fields be present at
 the beginning of the order-by, I don't know what prevents the code to be
 written to have those fields be resolved in a way that is copacetic.
 Like, why not convert the reference into 2 additional fields that
 together, meet both requirements (`name` AND `compound_id`)? Order-by
 would be satisfied and distinct would be satisfied.  Or... in my case,
 `name` is unique, so distinct could resolve to the meta ordering without
 issue...
 >
 > Is there a technical reason the code doesn't already do this?

 This would be another logic that's implicit and probably unexpected by
 users (at least in some cases). As far as I'm aware it's preferable to
 fail loudly even with a `ProgrammingError`. I'm not sure how to improve
 this note, maybe it's enough to add a correct example:

 {{{#!diff
 diff --git a/docs/ref/models/querysets.txt b/docs/ref/models/querysets.txt
 index a9da1dcf7e..891b8255b0 100644
 --- a/docs/ref/models/querysets.txt
 +++ b/docs/ref/models/querysets.txt
 @@ -565,7 +565,9 @@ Examples (those after the first will only work on
 PostgreSQL)::
  ...wouldn't work because the query would be ordered by ``blog__name``
 thus
  mismatching the ``DISTINCT ON`` expression. You'd have to explicitly
 order
  by the relation ``_id`` field (``blog_id`` in this case) or the
 referenced
 -one (``blog__pk``) to make sure both expressions match.
 +one (``blog__pk``) to make sure both expressions match::
 +
 +Entry.objects.order_by('blog_id').distinct('blog_id')

  ``values()``
  
 }}}

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070180ac6ab9dc-f364073d-6c35-48f3-a491-0f1f38ab81f1-00%40eu-central-1.amazonses.com.


Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-09 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Robert Leach):

 Replying to [comment:4 Mariusz Felisiak]:
 > Robert, Can you propose a documentation improvement via GitHub's PR?

 I can certainly give it a shot, though I'm not the best writer when it
 comes to brevity.

 Also, I don't have a deep understanding of the related Django code, so my
 understanding could be empirically correct, but technically flawed (like
 Bohr's model of the atom).  For example, when the same field reference is
 supplied to both `.order_by()` and `.distinct()`, such as in Simon's
 example:

 {{{
 TestSynonym.objects.distinct('compound').order_by('compound')
 }}}

 ...why is the inserted field in each case not coordinated?  Why does the
 conversion from the reference (`compound`) differ?  Simon says it resolves
 to:

 {{{
 list(TestSynonym.objects.distinct('compound').order_by('compound__name'))
 }}}

 but based on my debug output of another test using that above call, that's
 imprecise.  It shows:

 {{{
 QUERY: SELECT DISTINCT ON ("DataRepo_testsynonym"."compound_id")
 "DataRepo_testsynonym"."name", "DataRepo_testsynonym"."compound_id" FROM
 "DataRepo_testsynonym" INNER JOIN "DataRepo_testcompound" ON
 ("DataRepo_testsynonym"."compound_id" = "DataRepo_testcompound"."id")
 ORDER BY "DataRepo_testcompound"."name" ASC
 }}}

 which means that the distinct field resolution and order by field
 resolutions are:

 - `distinct`: `compound_id`
 - `order_by`: `name`

 When those methods are assessed individually, I understand why those
 fields are the preferred solution (e.g. the meta ordering may not be
 unique), but given that `distinct` requires the same fields be present at
 the beginning of the order-by, I don't know what prevents the code to be
 written to have those fields be resolved in a way that is copacetic.
 Like, why not convert the reference into 2 additional fields that
 together, meet both requirements (`name` AND `compound_id`)?  Order-by
 would be satisfied and distinct would be satisfied.  Or... in my case,
 `name` is unique, so distinct could resolve to the meta ordering without
 issue...

 Is there a technical reason the code doesn't already do this?

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070180a9e08284-b3b04ee2-2650-4ce8-86a1-e633fe78db55-00%40eu-central-1.amazonses.com.


Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-08 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Mariusz Felisiak):

 Robert, Can you propose a documentation improvement via GitHub's PR?

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070180a74b7e6b-73bea91a-542b-4c88-bd98-e361f50061ba-00%40eu-central-1.amazonses.com.


Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-06 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Robert Leach):

 I believe that the note at the bottom of the distinct section of this doc:

 [https://docs.djangoproject.com/en/4.0/ref/models/querysets/#distinct]

 answers my above question, but to dissect the verbiage it tells you *what*
 to do without the *why* fully described.  It tells you that adding the
 `_id` field would be necessary to **make the expressions match** (which is
 "a" reason why to add the `id` field), but it doesn't explicitly explain
 why that makes them match, or say whether the `id` field is precisely
 required or if any field will do.

 If I'd better understood the *why* in that doc, I might have coded the
 right solution to the gotcha and not overlooked the other cases.

 My updated understanding is that it seems that the reason *a* related
 model field is necessary is because the related model "field" in the model
 definition that links to the related model isn't a "field".  It's a
 reference that gets turned into a field that by default uses the
 `meta.ordering`.  (I didn't even notice that the distinct clause had
 `compound_id` and the order by clause had `name` in that position.)  So
 I'm guessing that *any*(?) related model field in front of a (non-field)
 related model reference (whether it's at the beginning of the distinct
 list or "just before" the non-field related model reference) would solve
 the issue?  Or will *any* explicit inclusion of a non-field related model
 reference cause the problem?  **Or** perhaps even explicit inclusion of
 such a (non) field would cause the problem.

 I think these are areas in which the doc could be improved just a bit
 more.  Understanding the /why/ **better**, I think, could be helpful to
 avoid these pitfals, and also help to understand an otherwise cryptic
 error message.

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/010701809a2375a7-2fa0c291-d8a3-429a-8dbe-a18ca44cd1d4-00%40eu-central-1.amazonses.com.


Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-06 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Robert Leach):

 I figured I might have been missing something.  I actually have code to
 avoid the gotcha, but apparently, I only did it for the model from which
 the query came from (i.e. the "root model"):
 {{{
 # If there are any split_rows manytomany related tables, we will
 need to prepend the ordering (and pk) fields of
 # the root model
 if len(distinct_fields) > 0:
 distinct_fields.insert(0, "pk")
 tmp_distincts =
 self.getOrderByFields(model_name=self.rootmodel.__name__)
 tmp_distincts.reverse()
 for fld_nm in tmp_distincts:
 distinct_fields.insert(0, fld_nm)

 if order_by is not None and order_by not in distinct_fields:
 distinct_fields.insert(0, order_by)
 }}}

 {{{
 def getOrderByFields(self, mdl_inst_nm=None, model_name=None):
 """
 Retrieves a model's default order by fields, given a model
 instance name.
 """
 ... brevity edit ...

 # Get a model object
 mdl = apps.get_model("DataRepo", mdl_nm)

 if "ordering" in mdl._meta.__dict__:
 return mdl._meta.__dict__["ordering"]
 return []
 }}}

 I know this ticket system is not the place for getting support, but if
 you'll indulge me... would prepending all the meta ordering fields avoid
 the gotcha if I inserted the meta ordering field(s) before any other
 fields?  (In my use case, the order is unimportant - only the distinct
 is).  I'd found that it did in the case of the "root model".

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/0107018099f50604-119e789f-ef03-417f-99bc-f67f3ddb2895-00%40eu-central-1.amazonses.com.


Re: [Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-05 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
 Reporter:  Robert Leach |Owner:  nobody
 Type:  Bug  |   Status:  new
Component:  Database layer   |  Version:  3.2
  (models, ORM)  |
 Severity:  Normal   |   Resolution:
 Keywords:  sql, distinct,   | Triage Stage:
 |  Unreviewed
Has patch:  0|  Needs documentation:  0
  Needs tests:  0|  Patch needs improvement:  0
Easy pickings:  0|UI/UX:  0
-+-

Comment (by Simon Charette):

 The `QuerySet.distinct` and `order_by` methods
 
[https://docs.djangoproject.com/en/4.0/ref/models/querysets/#django.db.models.query.QuerySet.order_by
 don't resolve references to foreign key the same way] when referenced
 models define a `Meta.ordering`.

 You can get a similar crash with way less code by doing

 {{{#!python
 list(TestSynonym.objects.distinct('compound').order_by('compound'))
 }}}

 As the above actually translates to

 {{{#!python
 list(TestSynonym.objects.distinct('compound').order_by('compound__name'))
 }}}

 Due to `TestCompound.Meta.ordering = ('name',)`

 --

 I would argue that this is ''invalid'' but I'm curious to hear what others
 have to say as the resulting error is definitely cryptic there's possibly
 ways we could do things better.

 Given `DISTINCT ON` usage requires a matching `ORDER BY` I see three
 options

 1. Do nothing, users should learn the gotchas of `Meta.ordering` before
 using it. We've got a precedent against that by making aggregations ignore
 `Meta.ordering` in the recent versions.
 2. Make `distinct('related_with_ordering')` behave like `order_by` with
 regards to ordering expansion to make both APIs coherent given they a
 dependent (resulting clause could match). This will require a deprecation
 period and spreads the arguable bad related ordering expansion pattern to
 another method.
 3. Refuse the temptation to guess and make
 `distinct('related_with_ordering')` error out loudly so at least users are
 not presented this cryptic error. This maintains the arguably confusing
 mismatch between both APIs but we stop the spread of `ordering` expansion.

-- 
Ticket URL: 
Django 
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/010701809761d05f-54fbd00f-8188-4a52-8af0-b6693aba8e3b-00%40eu-central-1.amazonses.com.


[Django] #33682: SQL generation bug in `.distinct()` when supplied fields go through multiple many-related tables

2022-05-05 Thread Django
#33682: SQL generation bug in `.distinct()` when supplied fields go through
multiple many-related tables
-+-
   Reporter:  Robert |  Owner:  nobody
  Leach  |
   Type:  Bug| Status:  new
  Component:  Database   |Version:  3.2
  layer (models, ORM)|
   Severity:  Normal |   Keywords:  sql, distinct,
   Triage Stage: |  Has patch:  0
  Unreviewed |
Needs documentation:  0  |Needs tests:  0
Patch needs improvement:  0  |  Easy pickings:  0
  UI/UX:  0  |
-+-
 I have a rather complex database and an advanced search interface that
 creates complex queries.  It’s been working great for over a year now.

 I recently added a feature to count distinct related table records in the
 joined results. When I added fields to these many-related tables to
 `.distinct()` (and to `.order_by()`), I couldn’t get the test to execute
 without hitting an `InvalidColumnReference` error. And though I’m
 supplying the same expanded fields list to `.order_by()` that I am to
 `.distinct()`, the error claims that `SELECT DISTINCT ON expressions must
 match initial ORDER BY expressions`…

 When I print the SQL, the place where it notes a difference has a weird
 `T8` reference where the model name should be in an `order by` clause.
 The corresponding `distinct` clause has the full table name instead of the
 reference, which is what I suspect is triggering the exception.

 I was able to create a set of toy models and a test that minimally
 reproduces the exception...

 toymodels.py:
 {{{
 from django.db.models import Model, CharField, AutoField, ForeignKey,
 ManyToManyField, CASCADE

 class TestPeak(Model):
 id = AutoField(primary_key=True)
 name = CharField(max_length=10)
 compounds = ManyToManyField(
 to="TestCompound",
 related_name="testpeaks",
 )
 class Meta:
 verbose_name = "testpeak"
 verbose_name_plural = "testpeaks"
 ordering = ["name"]

 class TestCompound(Model):
 id = AutoField(primary_key=True)
 name = CharField(max_length=10)
 class Meta:
 verbose_name = "testcompound"
 verbose_name_plural = "testcompounds"
 ordering = ["name"]

 class TestSynonym(Model):
 name = CharField(max_length=10, primary_key=True)
 compound = ForeignKey(
 TestCompound, related_name="testsynonyms", on_delete=CASCADE
 )
 class Meta:
 verbose_name = "testsynonym"
 verbose_name_plural = "testsynonyms"
 ordering = ["compound", "name"]
 }}}

 test_bug.py:
 {{{
 from DataRepo.tests.tracebase_test_case import TracebaseTestCase
 from DataRepo.models.toymodels import TestPeak, TestCompound, TestSynonym
 from django.db.models import Q

 class DjangoSQLBug(TracebaseTestCase):
 maxDiff = None

 @classmethod
 def setUpTestData(cls):
 TestCompound.objects.create(name="testcpd")
 cpd = TestCompound.objects.get(id__exact=1)
 TestSynonym.objects.create(name="testsyn",compound=cpd)
 TestPeak.objects.create(name="testpk")
 pk = TestPeak.objects.get(id__exact=1)
 pk.compounds.add(cpd)

 def test_mm_om_query(self):
 q_exp = Q(name__iexact="testpk")
 distinct_fields = ['name', 'pk',
 'compounds__testsynonyms__compound', 'compounds__testsynonyms__name',
 'compounds__testsynonyms__pk', 'compounds__name', 'compounds__pk']
 qs =
 
TestPeak.objects.filter(q_exp).order_by(*distinct_fields).distinct(*distinct_fields)
 self.assertEqual(qs.count(), 1)
 }}}

 `python manage.py test` output:
 {{{
 Creating test database for alias 'default'...
 Creating test database for alias 'validation'...
 System check identified no issues (0 silenced).
 E
 ==
 ERROR: test_mm_om_query (DataRepo.tests.sqlbugtest.test_bug.DjangoSQLBug)
 --
 Traceback (most recent call last):
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/.venv/lib/python3.9/site-
 packages/django/db/backends/utils.py", line 84, in _execute
 return self.cursor.execute(sql, params)
 psycopg2.errors.InvalidColumnReference: SELECT DISTINCT ON expressions
 must match initial ORDER BY expressions
 LINE 1: ...peak"."id", "DataRepo_testsynonym"."compound_id", "DataRepo_...
  ^


 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/Users/rleach/PROJECT-
 local/TRACEBASE/tracebase/DataRepo/tests/sqlbugtest/test_bug.py", line 21,
 in test_mm_om_query
 self.ass