#36494: Various failures in JSONField lookups when using expressions in 
right-hand
side
-------------------------------------+-------------------------------------
     Reporter:  Jacob Walls          |                    Owner:  (none)
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  dev
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:
                                     |  Unreviewed
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

 * resolution:  wontfix =>
 * status:  closed => new
 * summary:  Various failures for JSONField lookups and right-hand side
     subqueries =>
     Various failures in JSONField lookups when using expressions in right-
     hand side


Old description:

> After #35972, I worked on a
> [https://github.com/jacobtylerwalls/django/pull/2/files test] to verify
> support for `JSONField` lookups against right-hand side subqueries.
> Support is mostly complete on Postgres, but largely incomplete on other
> backends.
>
> I'm suggesting we close this ticket (in separate PRs) with a combination
> of adding support where possible (perhaps for `__range`?), with failing
> at the python layer with more appropriate exceptions, or possibly adding
> a feature flag to guard some failures. I agree with the
> [https://code.djangoproject.com/ticket/31779#comment:3 sentiment] that we
> cannot maintain endless feature flags for database quirks or insulate
> users from every bug in their database vendor, but some of these failures
> seem to me to be on the Django side.
>
> Postgres:
> 1. `__range`
> ([https://github.com/django/django/blob/6df19412aabb7d969f5eab4b2ff41269de89b233/django/db/models/lookups.py#L649
> source]):
> {{{#!py
> django.db.utils.ProgrammingError: the query has 2 placeholders but 3
> parameters were passed
> }}}
> 2. `__has_keys` or `__has_any_keys`:
> {{{#!py
>   File "/django/source/django/db/models/fields/json.py", line 278, in
> get_prep_lookup
>     return [str(item) for item in self.rhs]
>                                   ^^^^^^^^
> TypeError: 'Subquery' object is not iterable
> }}}
> ----
> SQLite & MySQL:
> 1. `__range`: same as Postgres
> 2. `__has_keys` or `__has_any_keys`: same as Postgres
> 3. `__has_key`:
> {{{#!py
>   File "/django/source/django/db/models/fields/json.py", line 186, in
> compile_json_path_final_key
>     return ".%s" % json.dumps(key_transform)
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^
> TypeError: Object of type Subquery is not JSON serializable
> }}}
> 4. `__gt`, `__gte`, `__lt`, `__lte`:
> {{{#!py
>   File "/django/source/django/db/models/fields/json.py", line 610, in
> process_rhs
>     rhs_params = [json.loads(value) for value in rhs_params]
>                   ^^^^^^^^^^^^^^^^^
> json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
> }}}
> 5. `__exact`:
> {{{#!py
> sqlite3.OperationalError: malformed JSON
> }}}
> ----
> Additional failure on MySQL:
> 6. `__in`:
> {{{#!py
> django.db.utils.NotSupportedError: (1235, "This version of MySQL doesn't
> yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
> }}}
> ----
> Oracle:
> 1. `__range`: same as others (database exception)
> 2. `__has_keys` or `__has_any_keys`: same as others
> 3. `__has_key`: same as SQLite & MySQL
> 4. Certain text lookups (`__icontains`, `__startswith`, `__istartswith`,
> `__endswith`, `__iendswith`): return wrong results, they appear to return
> any results having the key on the left-hand side regardless of the
> contents of the right-hand side, refer to the modifications commented out
> in the linked test.
> ----
> This is all pretty niche, but if the context helps, I could envision
> realistic use cases for `__has_keys=queryset`, as I work on a project
> that uses stringified UUIDs identifying other entities as JSON keys.

New description:

 After #35972, I worked on a
 
[https://github.com/jacobtylerwalls/django/commit/3c56106e4c60639f274c4acbfb582f6d1251051b
 test] to verify support for `JSONField` lookups against right-hand side
 subqueries. Support is mostly complete on Postgres, but largely incomplete
 on other backends.

 I'm suggesting we close this ticket (in separate PRs) with a combination
 of adding support where possible (perhaps for `__range`?), with failing at
 the python layer with more appropriate exceptions, or possibly adding a
 feature flag to guard some failures. I agree with the
 [https://code.djangoproject.com/ticket/31779#comment:3 sentiment] that we
 cannot maintain endless feature flags for database quirks or insulate
 users from every bug in their database vendor, but some of these failures
 seem to me to be on the Django side.

 Postgres:
 1. `__range`
 
([https://github.com/django/django/blob/6df19412aabb7d969f5eab4b2ff41269de89b233/django/db/models/lookups.py#L649
 source]):
 {{{#!py
 django.db.utils.ProgrammingError: the query has 2 placeholders but 3
 parameters were passed
 }}}
 2. `__has_keys` or `__has_any_keys`:
 {{{#!py
   File "/django/source/django/db/models/fields/json.py", line 278, in
 get_prep_lookup
     return [str(item) for item in self.rhs]
                                   ^^^^^^^^
 TypeError: 'Subquery' object is not iterable
 }}}
 ----
 SQLite & MySQL:
 1. `__range`: same as Postgres
 2. `__has_keys` or `__has_any_keys`: same as Postgres
 3. `__has_key`:
 {{{#!py
   File "/django/source/django/db/models/fields/json.py", line 186, in
 compile_json_path_final_key
     return ".%s" % json.dumps(key_transform)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
 TypeError: Object of type Subquery is not JSON serializable
 }}}
 4. `__gt`, `__gte`, `__lt`, `__lte`:
 {{{#!py
   File "/django/source/django/db/models/fields/json.py", line 610, in
 process_rhs
     rhs_params = [json.loads(value) for value in rhs_params]
                   ^^^^^^^^^^^^^^^^^
 json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
 }}}
 5. `__exact`:
 {{{#!py
 sqlite3.OperationalError: malformed JSON
 }}}
 ----
 Additional failure on MySQL:
 6. `__in`:
 {{{#!py
 django.db.utils.NotSupportedError: (1235, "This version of MySQL doesn't
 yet support 'LIMIT & IN/ALL/ANY/SOME subquery'")
 }}}
 ----
 Oracle:
 1. `__range`: same as others (database exception)
 2. `__has_keys` or `__has_any_keys`: same as others
 3. `__has_key`: same as SQLite & MySQL
 4. Certain text lookups (`__icontains`, `__startswith`, `__istartswith`,
 `__endswith`, `__iendswith`): return wrong results, they appear to return
 any results having the key on the left-hand side regardless of the
 contents of the right-hand side, refer to the modifications commented out
 in the linked test.
 ----
 This is all pretty niche, but if the context helps, I could envision
 realistic use cases for `__has_keys=queryset`, as I work on a project that
 uses stringified UUIDs identifying other entities as JSON keys.

--
Comment:

 Simon suggested bracketing `Subquery` for the moment and just focusing on
 expression support generally.

 With just `F()`...
 {{{#!py
     def test_lookups_using_expression(self):
         contains_lookups = {"contains", "contained_by"}
         lookups_not_matching = {"gt", "lt", "has_key"}
         text_lookups = {
             "has_key",
             "iexact",
             "icontains",
             "startswith",
             "istartswith",
             "endswith",
             "iendswith",
             "regex",
             "iregex",
         }
         for lookup in JSONField.get_lookups():
             with self.subTest(lookup=lookup):
                 if lookup == "isnull":
                     continue  # not allowed, rhs must be a literal
 boolean.
                 if (
                     lookup in contains_lookups
                     and not
 connection.features.supports_json_field_contains
                 ):
                     continue
                 if lookup in text_lookups:
                     rhs = KT("value__bar")
                 else:
                     rhs = F("value__bar")
                 qs =
 NullableJSONModel.objects.filter(**{f"value__bar__{lookup}": rhs})
                 if lookup in lookups_not_matching:
                     self.assertIs(qs.exists(), False)
                 else:
                     # might need tweaking
                     self.assertQuerySetEqual(qs,
 NullableJSONModel.objects.filter(value__has_key="bar"))
 }}}

 ... we see three classes of failures:

 1. `gt`/`gte`/`lt`/`lte`

 {{{#!py
   File "/Users/jwalls/django/django/db/models/fields/json.py", line 614,
 in process_rhs
     rhs_params = [json.loads(value) for value in rhs_params]
                   ~~~~~~~~~~^^^^^^^
   File "/Users/jwalls/cpython/Lib/json/__init__.py", line 346, in loads
     return _default_decoder.decode(s)
            ~~~~~~~~~~~~~~~~~~~~~~~^^^
   File "/Users/jwalls/cpython/Lib/json/decoder.py", line 345, in decode
     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
   File "/Users/jwalls/cpython/Lib/json/decoder.py", line 363, in
 raw_decode
     raise JSONDecodeError("Expecting value", s, err.value) from None
 json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
 }}}

 2. `range`
 {{{#!py
 django.db.utils.OperationalError: near "AND": syntax error
 }}}

 3. `has_keys`/`has_any_keys`
 {{{#!py
   File "/Users/jwalls/django/django/db/models/fields/json.py", line 278,
 in get_prep_lookup
     return [str(item) for item in self.rhs]
                                   ^^^^^^^^
 TypeError: 'KeyTransform' object is not iterable
 }}}

 ----
 > I'm suggesting we close this ticket (in separate PRs) with a combination
 of adding support where possible (perhaps for __range?), with failing at
 the python layer with more appropriate exceptions,

 I think we should invite PRs per lookup to either raise a better exception
 (case 2 above is especially poor, sending malformed SQL) or add support if
 the effort is not massive. (At Django on the Med 2025 I made a small
 experiment with adding `has_any_keys` support on PostgreSQL: seemed
 slightly promising?)

 All of the commits would probably be "Refs ...", and then I imagine
 closing this ticket manually once we decide the remaining edge cases are
 too difficult to implement.

 Going through the new features features process *could* help ensure that
 "reviewer resources" are equitably competed for in this case, but I can't
 think of many other reasons to go through that process. I think in the
 absence of any glaring warning in the docs, my expectation is that all of
 JSONField's registered lookups work with expressions. I doubt we would add
 a release note.
-- 
Ticket URL: <https://code.djangoproject.com/ticket/36494#comment:2>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/django-updates/01070199c3959ebc-3bf5c9f9-2beb-418b-8247-6cc663671a31-000000%40eu-central-1.amazonses.com.

Reply via email to