#26522: Non-deterministic crash in django.db.models.sql.Query.combine()
-------------------------------------+-------------------------------------
     Reporter:  Ole Laursen          |                    Owner:  nobody
         Type:  Bug                  |                   Status:  closed
    Component:  Database layer       |                  Version:  1.9
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:  fixed
     Keywords:                       |             Triage Stage:  Ready for
                                     |  checkin
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Comment (by Mikalai Radchuk):

 It's a very tricky issue. I think it can damage data in some cases.

 In my case, the Django ORM didn't fail with an assertion error: it generated
 a wrong SQL query, so I got wrong data from my DB. I didn't hit this issue in
 my development environment because it appears randomly. I'm lucky: in my
 case I only query some data for the UI. But if you perform calculations
 using data from a queryset that silently returns wrong data, you are in
 trouble. It's quite possible that four out of five of your processes work
 properly while one gives wrong results.

 Here is a simplified version of my issue.

 Models:
 {{{
 #!python

 from django.db import models


 class Category(models.Model):
     title = models.CharField(max_length=255, unique=True)

     class Meta:
         verbose_name_plural = 'categories'

     def __str__(self):
         return self.title


 class Document(models.Model):
     title = models.CharField(blank=True, max_length=255)
     categories = models.ManyToManyField(
         'Category', blank=True, related_name='+')

     def __str__(self):
         return self.title

 }}}

 Build a queryset that can return wrong data:
 {{{
 #!python

 from app_with_models.models import Document, Category

 queryset = Document.objects.all()

 # Exclude documents without taxonomy (Category) tags
 queryset = queryset.filter(categories__in=Category.objects.all())

 # Apply predefined taxonomy filters
 category_predefined = [1]  # Should be in fixtures
 queryset = queryset.filter(categories__in=category_predefined)

 queryset = queryset.distinct()

 used_category_ids = queryset.values_list('categories__pk', flat=True)
 print(used_category_ids.query)

 }}}

 This code should generate the following SQL query (formatted):

 {{{
 #!sql

 SELECT DISTINCT
   "app_with_models_document_categories"."category_id"
 FROM "app_with_models_document"
   LEFT OUTER JOIN "app_with_models_document_categories" ON
     ("app_with_models_document"."id" =
      "app_with_models_document_categories"."document_id")
   INNER JOIN "app_with_models_document_categories" T4 ON
     ("app_with_models_document"."id" = T4."document_id")
 WHERE (
   "app_with_models_document_categories"."category_id" IN
     (SELECT U0."id" AS Col1 FROM "app_with_models_category" U0)
   AND T4."category_id" IN (1)
 );
 }}}

 But in some cases it generates this:

 {{{
 #!sql

 SELECT DISTINCT
   T4."category_id"
 FROM "app_with_models_document"
   LEFT OUTER JOIN "app_with_models_document_categories" ON
     ("app_with_models_document"."id" =
      "app_with_models_document_categories"."document_id")
   INNER JOIN "app_with_models_document_categories" T4 ON
     ("app_with_models_document"."id" = T4."document_id")
 WHERE (
   "app_with_models_document_categories"."category_id" IN
     (SELECT U0."id" AS Col1 FROM "app_with_models_category" U0)
   AND T4."category_id" IN (1)
 );
 }}}

 This query returns completely wrong results. It selects `T4."category_id"`,
 which is exactly the column the `WHERE` clause restricts to `IN (1)`, so in
 my example it always returns `1` as `category_id`.
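 The effect of selecting the wrong alias can be reproduced with plain SQL,
 without Django. The sketch below uses the standard library `sqlite3` module
 with shortened, hypothetical table names (`document`, `category`,
 `document_categories`) and made-up rows; only the selected column differs
 between the two queries, mirroring the two SQL statements above.

```python
import sqlite3

# Simplified schema mirroring the tables in the SQL above (names shortened;
# these are illustrative, not the tables Django would create).
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE document (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE document_categories (
    id INTEGER PRIMARY KEY,
    document_id INTEGER REFERENCES document(id),
    category_id INTEGER REFERENCES category(id)
);
"""

# Hypothetical data: documents 1 and 2 are tagged with category 1,
# document 3 is not.
DATA = """
INSERT INTO category (id, title) VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO document (id, title) VALUES (1, 'd1'), (2, 'd2'), (3, 'd3');
INSERT INTO document_categories (document_id, category_id)
VALUES (1, 1), (1, 2), (2, 1), (2, 3), (3, 2);
"""

# Shared FROM/WHERE clause; the two queries differ only in the SELECT list.
FROM_WHERE = """
FROM document
  LEFT OUTER JOIN document_categories
    ON document.id = document_categories.document_id
  INNER JOIN document_categories T4
    ON document.id = T4.document_id
WHERE document_categories.category_id IN (SELECT U0.id FROM category U0)
  AND T4.category_id IN (1)
"""

EXPECTED_SQL = "SELECT DISTINCT document_categories.category_id " + FROM_WHERE
WRONG_SQL = "SELECT DISTINCT T4.category_id " + FROM_WHERE

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA + DATA)

expected = sorted(row[0] for row in conn.execute(EXPECTED_SQL))
wrong = sorted(row[0] for row in conn.execute(WRONG_SQL))
print("selecting the unfiltered join:", expected)  # [1, 2, 3]
print("selecting T4 (filtered join): ", wrong)     # [1]
```

 Because `T4` is the join filtered by `T4.category_id IN (1)`, selecting from
 it can only ever yield `1`, whatever the data looks like.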

 Some more details:

 * Can be reproduced on `Django==1.10.7`, `Django==1.11.4` and, most
 likely, on older versions too. Fixed in
 9bbb6e2d2536c4ac20dc13a94c1f80494e51f8d9.
 * Can be reproduced on Python 3.5, but not on Python 3.6 (at least, I
 didn't manage to). I guess it's because of the
 [https://docs.python.org/3/whatsnew/3.6.html#new-dict-implementation new
 dict implementation].
 * I've tested with Postgres 9.6.3 and `psycopg2==2.6.1` and
 `psycopg2==2.7.3.1`, but it looks like the issue is not DB engine/adapter
 specific.
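 The dict-ordering guess can be illustrated without Django. Before CPython
 3.6, dict iteration order depended on key hashes, and `str` hashes are
 randomized per process (controlled by `PYTHONHASHSEED`), so code that
 walks a dict of table aliases could visit them in a different order in
 different worker processes. This is a standalone sketch, not Django's code:
 the alias names are merely borrowed from the SQL above. It runs a child
 interpreter under several hash seeds and compares the iteration order of a
 `dict` and a `set` built from the same names.

```python
import os
import subprocess
import sys

# Child program: build a dict and a set from the same alias names
# and print their iteration orders, one per line.
CHILD = r'''
aliases = ["app_with_models_document", "app_with_models_document_categories", "T4"]
d = dict.fromkeys(aliases)
s = set(aliases)
print(",".join(d))
print(",".join(s))
'''


def iteration_orders(seed):
    """Run CHILD under a fixed PYTHONHASHSEED; return (dict_order, set_order)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return out[0], out[1]


results = [iteration_orders(seed) for seed in range(5)]
dict_orders = {d for d, _ in results}
set_orders = {s for _, s in results}
# dict keeps insertion order (guaranteed since Python 3.7), so this is 1.
print("distinct dict orders:", len(dict_orders))
# set order follows the randomized string hashes and may differ per seed.
print("distinct set orders:", len(set_orders))
```

 On a current interpreter the dict order is the same for every seed, while
 the set order can change from seed to seed. Under Python 3.5 the dict order
 can vary as well, which is consistent with the bug appearing in only some
 processes.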

 Hope it helps.

 Can it be backported into the supported versions (or at least into the
 LTS)? I would really appreciate it. My colleague and I spent 4 days
 debugging our code, and I'm afraid that it could cause more serious issues
 for other people.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/26522#comment:12>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.