Hello Pawan,

Approach looks good, my only suggestion is to use batch processing only
when we are dealing with large data set, as this method takes a longer time
compared to the normal method specially on a distributed environment, which
may negatively impact the performance.

Kind Regards,
Chandan Khandelwal
Senior Manager, Enterprise Software Development

*HotWax Systems*
*Enterprise open source experts*
cell: +91-98934-81076
office: 0731-409-3684
http://www.hotwaxsystems.com


On Fri, Jun 5, 2020 at 4:07 PM Pawan Verma <pawan.ve...@hotwaxsystems.com>
wrote:

> Thanks, Pritam and Scott for the discussion.
>
> I've created Jira OFBIZ-11789 for this improvement and also created a PR
> with the proposed changes.
>
> I request everyone to review the PR and suggest your thought on this.
> Thanks!
> --
> Thanks & Regards
> Pawan Verma
> Technical Consultant
> *HotWax Systems*
> *Enterprise open source experts*
> http://www.hotwaxsystems.com
>
>
> On Mon, Jun 1, 2020 at 12:36 PM Pritam Kute <pritam.k...@hotwaxsystems.com
> >
> wrote:
>
> > Thanks Scott for your detailed explanation.
> >
> > The solution looks good to me too. My confusion was with why we are going
> > to implement new method if we can achieve that using the current
> > EntityQuery methods.
> >
> > +1 for adding queryBatchIterator() to EntityQuery.
> >
> > Kind Regards,
> > --
> > Pritam Kute
> >
> >
> > On Thu, May 28, 2020 at 6:32 AM Scott Gray <scott.g...@hotwaxsystems.com
> >
> > wrote:
> >
> > > Hi Pritam,
> > >
> > > I'm not sure about PostgreSQL or Derby but I know with MySQL that
> using a
> > > cursor doesn't really work.  You have to set the result set to
> > > TYPE_FORWARD_ONLY and CONCUR_READ_ONLY and also set the fetch size to
> > > INTEGER.MIN_VALUE.  Only then will the driver stream the results and
> even
> > > then, you may not execute any other SQL commands on the connection
> until
> > > you have fully read or closed the resultset.
> > >
> > > So if an EntityListIterator doesn't really conserve memory, then you
> need
> > > to take a paging query approach such as this:
> > > EntityQuery query =
> > EntityQuery.use(delegator).from("SomeTable").limit(100)
> > > List results = null
> > > while (!(results = query.queryList()).isEmpty()) {
> > >  for (value : results) {
> > >   // do something with each value
> > >  }
> > >  query.offset(query.getOffset() + query.getLimit())
> > > }
> > >
> > > Or with the proposed EntityBatchIterator:
> > > Iterator query =
> > >
> > >
> >
> EntityQuery.use(delegator).from("SomeTable").limit(100).queryBatchIterator()
> > > while (iterator.hasNext()) {
> > >  result = iterator.next()
> > >  // do something with each value
> > > }
> > >
> > > I guess an alternative approach would be to implement something similar
> > > within the EntityListIterator and perhaps a flag to turn it off or on
> > > depending on which database is being used and how well it supports
> > > iterating over results without loading the entire resultset into
> memory.
> > >
> > > Regards
> > > Scott
> > >
> > >
> > >
> > > On Sat, 23 May 2020 at 20:59, Pritam Kute <
> pritam.k...@hotwaxsystems.com
> > >
> > > wrote:
> > >
> > > > Hello Pawan,
> > > >
> > > > I just had a look into the EntityQuery.queryIterator() method and
> looks
> > > > like we can achieve that by using fetchSize(), fowardOnly(),
> > > > cursorScrollInsensitive(), cursorScrollSensitive() and offset()
> methods
> > > in
> > > > EntityQuery class. Let me know if I am missing anything.
> > > >
> > > > It will be good if you can post a pseudo code or something here so
> that
> > > we
> > > > could get an understanding of the exact design which you have in your
> > > mind.
> > > >
> > > > Kind Regards,
> > > > --
> > > > Pritam Kute
> > > >
> > > >
> > > > On Thu, May 21, 2020 at 7:41 PM Pawan Verma <
> > > pawan.ve...@hotwaxsystems.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hello Devs,
> > > > >
> > > > > While working on the large database we have figured out that very
> > large
> > > > > queries consume all memory and crash ofbiz(because queryIterator()
> > > > doesn't
> > > > > really work, it's no different from queryList())
> > > > >
> > > > > The EntityListIterator attempts to use a cursor to iterate over
> large
> > > > > result sets but in reality most databases do not give us a cursor
> > > unless
> > > > we
> > > > > ask for it in a very specific way, and instead you get back the
> full
> > > > result
> > > > > set and potentially consume a large amount of memory.  For example,
> > the
> > > > > MySql details are here (ResultSet section):
> > > > >
> > > > >
> > > >
> > >
> >
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
> > > > >
> > > > > To work around this we've built the EntityBatchIterator which
> instead
> > > > > basically just pages through results like you would do on a webpage
> > > that
> > > > > uses pagination.  The iterator takes an EntityQuery and when next()
> > is
> > > > > called it grabs the first 500 records and returns the first record,
> > > call
> > > > > next again and you get the second record, once you've called next
> 500
> > > > times
> > > > > it runs the query again with an offset and grabs the next 500 and
> so
> > on
> > > > > until there are no more records.
> > > > >
> > > > > The main downsides to this approach are:
> > > > > 1. Same as when using limit/offset you want to be sure your results
> > > will
> > > > > come back in a consistent order to avoid accidentally skipping some
> > > rows
> > > > > and seeing other rows twice.
> > > > > 2. Because the results are a combination of many queries rather
> than
> > a
> > > > > single query, some of the data may change while you are paging
> > through
> > > > it.
> > > > > i.e. if you were to sort by lastUpdatedTxStamp you may see some
> rows
> > > > twice
> > > > > as they are updated by other transactions (this might be a good
> thing
> > > in
> > > > > some cases).
> > > > >
> > > > > So, the main proposal is to introduce a new EntityBatchIterator for
> > > large
> > > > > queries.  If we agree on the proposal I'll create a Jira and
> provide
> > a
> > > > > patch for the community review. Thanks!
> > > > >
> > > > > --
> > > > > Thanks & Regards
> > > > > Pawan Verma
> > > > > Technical Consultant
> > > > > *HotWax Systems*
> > > > > *Enterprise open source experts*
> > > > > http://www.hotwaxsystems.com
> > > > >
> > > >
> > >
> >
>

Reply via email to