Hi Mate, Thanks for the detailed response. This will help others in the same situation
Dave Cramer da...@postgresintl.com www.postgresintl.com On Thu, 6 Sep 2018 at 05:03, Mate Varga <m...@matevarga.net> wrote: > Hi, > > summarizing: > we had a table that had an OID column, referencing an object in > pg_largeobject. This was mapped to a (Java) entity with a byte array field, > annotated with @Lob. The problem was that we were fetching thousands of > these entities in one go, and LOB fetching is not batched by Hibernate/JDBC > (so each row is fetched separately). Because we were abusing LOBs (they > were small, often less than 10 kB), we have chosen to move the binary blobs > from the LO table to a simple bytea column. So the entity that had a byte > array field mapped to an OID column now has a byte array field mapped to a > bytea column, and we have manually moved data from the LO table to the > bytea column. Now Hibernate/JDBC fetches all the content we need in > batches. Random benchmark: fetching 20k rows used to take 7 seconds (250 > msec query execution time, 6.7 sec for transfer) and now it takes 1.5 > seconds (250 msec query + 1.3 sec transfer). > > Regards, > Mate > > On Thu, Sep 6, 2018 at 10:56 AM Dave Cramer <p...@fastcrypt.com> wrote: > >> Hi >> >> Can you be more explicit how you fixed the problem ? >> >> Thanks >> Dave Cramer >> >> da...@postgresintl.com >> www.postgresintl.com >> >> >> On Thu, 6 Sep 2018 at 03:46, Mate Varga <m...@matevarga.net> wrote: >> >>> After inlining the data, performance issues have been solved. Thanks for >>> the help. >>> >>> On Mon, Sep 3, 2018 at 9:57 PM Mate Varga <m...@matevarga.net> wrote: >>> >>>> Thanks, >>>> 1) we'll try to move stuff out from LOBs >>>> 2) we might raise a PR for the JDBC driver >>>> >>>> Mate >>>> >>>> On Mon, 3 Sep 2018, 19:35 Dave Cramer, <p...@fastcrypt.com> wrote: >>>> >>>>> >>>>> >>>>> On Mon, 3 Sep 2018 at 13:00, Mate Varga <m...@matevarga.net> wrote: >>>>> >>>>>> More precisely: when fetching 10k rows, JDBC driver just does a large >>>>>> bunch of socket reads. With lobs, it's ping-pong: one read, one write per >>>>>> lob... >>>>>> >>>>>> >>>>> Ok, this is making more sense. In theory we could fetch them all but >>>>> since they are LOB's we could run out of memory. >>>>> >>>>> Not sure what to tell you at this point. I'd entertain a PR if you >>>>> were motivated. >>>>> >>>>> Dave Cramer >>>>> >>>>> da...@postgresintl.com >>>>> www.postgresintl.com >>>>> >>>>> >>>>> >>>>>> >>>>>> On Mon, Sep 3, 2018 at 6:54 PM Mate Varga <m...@matevarga.net> wrote: >>>>>> >>>>>>> So I have detailed profiling results now. Basically it takes very >>>>>>> long that for each blob, the JDBC driver reads from the socket then it >>>>>>> creates the byte array on the Java side. Then it reads the next blob, >>>>>>> etc. >>>>>>> I guess this takes many network roundtrips. >>>>>>> >>>>>>> On Mon, Sep 3, 2018 at 5:58 PM Dave Cramer <p...@fastcrypt.com> wrote: >>>>>>> >>>>>>>> >>>>>>>> On Mon, 3 Sep 2018 at 10:48, Mate Varga <m...@matevarga.net> wrote: >>>>>>>> >>>>>>>>> That's 1690 msec (1.69 seconds, and that is how long it takes to >>>>>>>>> fetch 20k (small-ish) rows without LOBs (LOBs are a few lines below >>>>>>>>> on the >>>>>>>>> screenshot) >>>>>>>>> >>>>>>>> >>>>>>>> that sound high as well! >>>>>>>> >>>>>>>> Something isn't adding up.. >>>>>>>> >>>>>>>> >>>>>>>> Dave Cramer >>>>>>>> >>>>>>>> da...@postgresintl.com >>>>>>>> www.postgresintl.com >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Sep 3, 2018 at 4:40 PM Dave Cramer <p...@fastcrypt.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> the one you have highlighted ~1.69ms >>>>>>>>>> >>>>>>>>>> Dave Cramer >>>>>>>>>> >>>>>>>>>> da...@postgresintl.com >>>>>>>>>> www.postgresintl.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 3 Sep 2018 at 10:38, Mate Varga <m...@matevarga.net> wrote: >>>>>>>>>> >>>>>>>>>>> Which frame do you refer to? >>>>>>>>>>> >>>>>>>>>>> On Mon, Sep 3, 2018 at 3:57 PM Dave Cramer <p...@fastcrypt.com> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Not sure why reading from a socket is taking 1ms ? >>>>>>>>>>>> >>>>>>>>>>>> Dave Cramer >>>>>>>>>>>> >>>>>>>>>>>> da...@postgresintl.com >>>>>>>>>>>> www.postgresintl.com >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, 3 Sep 2018 at 09:39, Mate Varga <m...@matevarga.net> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, >>>>>>>>>>>>> >>>>>>>>>>>>> https://imgur.com/a/ovsJPRv -- I've uploaded the profiling >>>>>>>>>>>>> info (as an image, sorry). It seems this is a JDBC-level problem. >>>>>>>>>>>>> I >>>>>>>>>>>>> understand that the absolute timing is not meaningful at all >>>>>>>>>>>>> because you >>>>>>>>>>>>> don't know how large the resultset is, but I can tell that this >>>>>>>>>>>>> is only a >>>>>>>>>>>>> few thousands rows + few thousand largeobjects, each largeobject >>>>>>>>>>>>> is around >>>>>>>>>>>>> 1 kByte. (Yes I know this is not a proper use of LOBs -- it's a >>>>>>>>>>>>> legacy db >>>>>>>>>>>>> structure that's hard to change.) >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks. >>>>>>>>>>>>> Mate >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:52 AM Mate Varga <m...@matevarga.net> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hey, >>>>>>>>>>>>>> >>>>>>>>>>>>>> we'll try to test this with pure JDBC versus hibernate. >>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Sep 3, 2018 at 11:48 AM Dave Cramer <p...@fastcrypt.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, 3 Sep 2018 at 03:55, Mate Varga <m...@matevarga.net> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Basically there's a class with a byte[] field, the class is >>>>>>>>>>>>>>>> mapped to table T and the byte field is annotated with @Lob so >>>>>>>>>>>>>>>> it goes to >>>>>>>>>>>>>>>> the pg_largeobject table. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ah, so hibernate is in the mix. I wonder if that is causing >>>>>>>>>>>>>>> some challenges ? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The DB is on separate host but relatively close to the app, >>>>>>>>>>>>>>>> and I can reproduce the problem locally as well. One >>>>>>>>>>>>>>>> interesting bit is >>>>>>>>>>>>>>>> that turning of SSL between the app and PSQL speeds up things >>>>>>>>>>>>>>>> by at least >>>>>>>>>>>>>>>> 50%. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ah, one addition -- the binary objects are encrypted, so >>>>>>>>>>>>>>>> their entropy is very high. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Any chance you could write a simple non-hibernate test code >>>>>>>>>>>>>>> to time the code ? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Dave Cramer >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> dave.cra...@crunchydata.ca >>>>>>>>>>>>>>> www.crunchydata.ca >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Mate >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Sun, Sep 2, 2018 at 12:55 AM Dave Cramer < >>>>>>>>>>>>>>>> p...@fastcrypt.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, 31 Aug 2018 at 10:15, Mate Varga <m...@matevarga.net> >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I see -- we could try that, though we're mostly using an >>>>>>>>>>>>>>>>>> ORM (Hibernate) to do this. Thanks! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Aug 31, 2018 at 3:57 PM Dmitry Igrishin < >>>>>>>>>>>>>>>>>> dmit...@gmail.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> пт, 31 авг. 2018 г. в 16:35, Mate Varga <m...@matevarga.net >>>>>>>>>>>>>>>>>>> >: >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > Hi, >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > we're fetching binary data from pg_largeobject table. >>>>>>>>>>>>>>>>>>> The data is not very large, but we ended up storing it >>>>>>>>>>>>>>>>>>> there. If I'm >>>>>>>>>>>>>>>>>>> copying the data to a file from the psql console, then it >>>>>>>>>>>>>>>>>>> takes X time >>>>>>>>>>>>>>>>>>> (e.g. a second), fetching it through the JDBC driver takes >>>>>>>>>>>>>>>>>>> at least 10x >>>>>>>>>>>>>>>>>>> more. We don't see this difference between JDBC and >>>>>>>>>>>>>>>>>>> 'native' performance >>>>>>>>>>>>>>>>>>> for anything except largeobjects (and bytea columns, for >>>>>>>>>>>>>>>>>>> the record). >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> > Does anyone have any advice about whether this can be >>>>>>>>>>>>>>>>>>> tuned or what the cause is? >>>>>>>>>>>>>>>>>>> I don't know what a reason of that, but I think it's >>>>>>>>>>>>>>>>>>> reasonable and >>>>>>>>>>>>>>>>>>> quite simple to call lo_import()/lo_export() via JNI. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can't imagine that's any faster. The driver simply >>>>>>>>>>>>>>>>> implements the protocol >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Do you have any code to share ? Any other information ? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is the JDBC connection significantly further away network >>>>>>>>>>>>>>>>> wise ? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Dave Cramer >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> da...@postgresintl.com >>>>>>>>>>>>>>>>> www.postgresintl.com >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>