On Tue, 15 Jul 2025 at 23:02, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Mon, Jul 14, 2025 at 2:01 PM David G. Johnston <
> david.g.johns...@gmail.com> wrote:
>
>> On Mon, Jul 14, 2025 at 12:54 PM Adrian Klaver <adrian.kla...@aklaver.com>
>> wrote:
>>
>>> On 7/14/25 12:51, veem v wrote:
>>> > So I want to
>>> > understand the experts' opinion on this which I believe will be
>>> > crucial during design itself.
>>>
>>> It is spelled out here:
>>>
>>> https://www.postgresql.org/docs/current/datatype-json.html
>>>
>>
>> I've taken to heart the main takeaway from that page:
>>
>> "In general, most applications should prefer to store JSON data as jsonb,
>> unless there are quite specialized needs, such as legacy assumptions about
>> ordering of object keys."
>>
>
> I don't think the documentation is accurate at all, unless one of those
> specialized needs is to 'be faster'. json serialization is more than 2x
> faster based on simple testing (see below). This is absolutely not a
> trivial difference.
>
> I would say: use json for serialization, use jsonb for data storage,
> unless the precise structure of the input document is important.
> merlin
>
> leaselock_iam@leaselock_prod=> explain analyze select json_agg(l) from (
> select l from llcore.lease l limit 10000) q;
>                                                           QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=405.52..405.53 rows=1 width=32) (actual time=69.043..69.048 rows=1 loops=1)
>    ->  Limit  (cost=0.00..380.52 rows=10000 width=247) (actual time=0.017..9.764 rows=10000 loops=1)
>          ->  Seq Scan on lease l  (cost=0.00..100383.89 rows=2638089 width=247) (actual time=0.016..8.831 rows=10000 loops=1)
>  Planning Time: 0.109 ms
>  Execution Time: 69.088 ms
> (5 rows)
>
> Time: 160.560 ms
>
> leaselock_iam@leaselock_prod=> explain analyze select jsonb_agg(l) from (
> select l from llcore.lease l limit 10000) q;
>                                                           QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=405.52..405.53 rows=1 width=32) (actual time=146.139..146.141 rows=1 loops=1)
>    ->  Limit  (cost=0.00..380.52 rows=10000 width=247) (actual time=0.017..20.837 rows=10000 loops=1)
>          ->  Seq Scan on lease l  (cost=0.00..100383.89 rows=2638089 width=247) (actual time=0.016..19.975 rows=10000 loops=1)
>  Planning Time: 0.108 ms
>  Execution Time: 152.277 ms

Thank you. I tested this with some sample data. Loading (serialization) is roughly twice as slow with jsonb as compared to json, whereas storage is more efficient with jsonb. On the other hand, reading nested fields is 7-8 times slower with json than with jsonb (and of course the index support makes jsonb a better choice there). I hope I am testing it correctly:

https://dbfiddle.uk/6P7sjL22

So I am a bit confused here.
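For reference, the indexed-read advantage mentioned above comes from the fact that jsonb columns can carry a GIN index while json columns cannot. A minimal sketch of what that looks like (the table and column names here are made up for illustration):

```sql
-- Hypothetical table; only jsonb supports GIN indexing.
CREATE TABLE events (id bigserial PRIMARY KEY, payload jsonb);

-- The default jsonb_ops opclass supports containment (@>), existence (?)
-- and related operators; jsonb_path_ops is smaller but supports fewer.
CREATE INDEX events_payload_gin ON events USING gin (payload);

-- This containment query is a candidate for the GIN index:
SELECT id FROM events WHERE payload @> '{"status": "active"}';

-- With a json column, the same lookup would have to re-parse the stored
-- text on every row, typically as a sequential scan.
```

This is the usual explanation for the 7-8x nested-field read difference: jsonb is stored in a decomposed binary form, so field access does not re-parse the document, and indexes can target its contents.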
Also, one of our use cases is that, along with persisting this data and querying it in the Postgres database, we are going to move the data from Postgres (the upstream OLTP system) to a downstream OLAP system in Snowflake, which has data types like VARIANT and VARCHAR. So, will it make a significant difference whether we store it as json vs jsonb in Postgres, i.e. the source/upstream database?
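One concrete difference that may matter for the downstream copy: jsonb normalizes the document on input (whitespace is discarded, object keys are reordered, and duplicate keys are collapsed to the last value), while json stores the input text verbatim. A quick way to see this, assuming you care whether Snowflake receives the original text or the normalized form:

```sql
-- json keeps the input text exactly as written:
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
--  {"b": 1, "a": 2, "a": 3}

-- jsonb reorders keys and keeps only the last duplicate:
SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
--  {"a": 3, "b": 1}
```

If the downstream system only needs semantically equivalent JSON, this normalization is harmless; if it must reproduce the source document byte-for-byte, that is exactly the "precise structure of the input document" case where json is the safer choice.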