Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Alban Hertroys


> On 2 Jun 2020, at 9:30, Shaheed Haque  wrote:
> 
> 
>> I do something like this to get a set of sub-paths in a JSONB field (no idea 
>> how to write that in Django):
>> 
>> select snapshot->’pay_definition’->k.value->’name’
>>   from MyModel
>>   join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true
>> 
> I was unaware of the LATERAL keyword, so thanks. After a bit of Googling 
> however, it seems that it is tricky/impossible to use from the ORM (barring a 
> full scale escape to a "raw" query). One question: as a novice here, I think 
> I understand the right hand side of your JOIN "... k(value)" is shorthand for:
> 
> ... AS table_name(column_name)
> 
> except that I don't see any clues in the docs that jsonb_object_keys() is a 
> "table function". Can you kindly clarify?

Correct. Thomas already explained the return type, but the plural form of the 
function name is also an indication that it returns multiple results.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Thomas Kellerer
> One question: as a novice here, I think I understand the right hand
> side of your JOIN "... k(value)" is shorthand for:
>
> ... AS table_name(column_name)
>
> except that I don't see any clues in the docs that
> jsonb_object_keys() is a "table function".> Can you kindly clarify?

The clue is in the column "return type" which states: "setof text" for 
jsonb_object_keys()
A function returning "setof" is the same as a "table function"





Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-02 Thread Shaheed Haque
>
> Hi,
>
> On Mon, 1 Jun 2020 at 23:50, Alban Hertroys  wrote:


> > On 1 Jun 2020, at 20:18, Shaheed Haque  wrote:
> >
> > Hi,
> >
> > I'm using Django's ORM to access Postgres12. My "MyModel" table has a
> JSONB column called 'snapshot'. In Python terms, each row's 'snapshot'
> looks like this:
> >
> > ==
> > snapshot = {
> > 'pay_definition' : {
> > '1234': {..., 'name': 'foo', ...},
> > '99': {..., 'name': 'bar', ...},
> > }
> > ==
> >
> > I'd like to find all unique values of 'name' in all rows of MyModel. I
> have this working using native JSON functions from the ORM like this:
> >
> > =
> > class PayDef(Func):
> > function='to_jsonb'
> >
>  
> template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
> >
> >
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
> flat=True)
> > =
> >
> > So, skipping the ordering/distinct/ORM parts, the core looks like this:
> >
> >
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)
>
>
> I do something like this to get a set of sub-paths in a JSONB field (no
> idea how to write that in Django):
>
> select snapshot->’pay_definition’->k.value->’name’
>   from MyModel
>   join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on
> true
>
>
I was unaware of the LATERAL keyword, so thanks. After a bit of Googling
however, it seems that it is tricky/impossible to use from the ORM (barring
a full scale escape to a "raw" query). One question: as a novice here, I
think I understand the right hand side of your JOIN "... k(value)" is
shorthand for:

... AS table_name(column_name)

except that I don't see any clues in the docs that jsonb_object_keys() is a
"table function". Can you kindly clarify?

> I don’t know how that compares performance-wise to using jsonb_each, but
> perhaps worth a try. Obviously, the way it’s written above it doesn’t
> return distinct values of ’name’ yet, but that’s fairly easy to remedy.
>
> Indeed; this is what I managed to get to:

SELECT DISTINCT snapshot -> 'pay_definition' -> k.value -> 'name' AS name
FROM paiyroll_payrun
 JOIN LATERAL jsonb_object_keys(snapshot -> 'pay_definition')
AS k(value) ON true
ORDER BY name;

At any rate, I'll have to ponder the "raw" route absent some way to "JOIN
LATERAL".

Thanks, Shaheed


> Alban Hertroys
> --
> If you can't see the forest for the trees,
> cut the trees and you'll find there is no forest.
>
>
>


Re: Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-01 Thread Alban Hertroys


> On 1 Jun 2020, at 20:18, Shaheed Haque  wrote:
> 
> Hi,
> 
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB 
> column called 'snapshot'. In Python terms, each row's 'snapshot' looks like 
> this:
> 
> ==
> snapshot = {
> 'pay_definition' : {
> '1234': {..., 'name': 'foo', ...},
> '99': {..., 'name': 'bar', ...},
> }
> ==
> 
> I'd like to find all unique values of 'name' in all rows of MyModel. I have 
> this working using native JSON functions from the ORM like this:
> 
> =
> class PayDef(Func):
> function='to_jsonb'
> 
> template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
> 
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
>  flat=True)
> =
> 
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
> 
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)


I do something like this to get a set of sub-paths in a JSONB field (no idea 
how to write that in Django):

select snapshot->’pay_definition’->k.value->’name’
  from MyModel
  join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true

I don’t know how that compares performance-wise to using jsonb_each, but 
perhaps worth a try. Obviously, the way it’s written above it doesn’t return 
distinct values of ’name’ yet, but that’s fairly easy to remedy.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.





Using JSONB with nested key-value structures, and skipping/wildcarding the key

2020-06-01 Thread Shaheed Haque
Hi,

I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB
column called 'snapshot'. In Python terms, each row's 'snapshot' looks like
this:

==
snapshot = {
'pay_definition' : {
'1234': {..., 'name': 'foo', ...},
'99': {..., 'name': 'bar', ...},
}
==

I'd like to find all unique values of 'name' in all rows of MyModel. I have
this working using native JSON functions from the ORM like this:

=
class PayDef(Func):
function='to_jsonb'

template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"

MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef',
flat=True)
=

So, skipping the ordering/distinct/ORM parts, the core looks like this:

to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')

My question is if this the best way to solve this problem? The way my
current logic works, reading from inside out is, I think:

   1. Pass in the 'snapshot'.
   2. Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this
   key.
   3. To skip the unknown numeric keys, "jsonb_each()" turns each key,
   value pair into an inner row like ['1234', {...}].
   4. To get to the value column of the inner row "row_to_json()->'value'".
   5. To get the name field's value "->'name'".
   6. A final call to "to_jsonb" in the PayDefs class. This bit is clearly
   Django-specific.

For example, I think the pair of calls row_to_json(jsonb_each()) is needed
because there is no jsonb_object_values() to complement
jsonb_object_keys(). Likewise, since all I care about is the string value
of 'name', is there a way to get rid of the PayDefs class, and its
invocation of to_jsonb (this is probably Django-specific)?

To provide context on what "better" might be:

   - Snapshot JSONs might easily be 20MB in size.
   - Each 'pay_definition' is probablyonly about 1kB in size, and there
   might be 50 of them in a snapshot.
   - There might be 1000 MyModel instances in a given query.
   - I'm using PostgreSQL 12

so my concern is not have the database server or Django perform extraneous
work converting between strings and JSON for example.

Thanks, Shaheed

P.S. I posted a Django-centric version of this to the relevant mailing list
but got no replies; nevertheless, apologies for the cross post.