Hi Francisco,

thanks for your work on this important topic!

I would like to share some test results here, which might help to improve the 
codebase even further. I am using the jmeter based test case from Pere and 
Enrique (thanks guys!) [1] which uses a load of 30 threads to

1) start a new process instance (POST)
2) retrieve tasks for a user (GET)
3) fetches task details (GET)
4) complete a task (POST)
5) execute a query on data-audit

With this test setup, I noticed that the performance for the POST requests, in 
particular the one to start a new process instance, degrades over time - see 
graph [2]. If I run the same test without data-index, then there is no such 
performance degradation [3]. You can find a thread dump captured a few minutes 
into the first test here [4] that might help to see some of the contention 
points.

I'd appreciate if you could take a look and see if there is something that can 
be further improved based on your previous work. If you need any additional 
data, let me know, but otherwise it is straightforward to run the jmeter test 
as well.

Thanks,
Martin

[1] https://github.com/pefernan/job-service-refactor-test/
[2] 
https://drive.google.com/file/d/1Gqn-ixE05kXv2jdssAUlnMuUVcHxIYZ0/view?usp=sharing
[3] 
https://drive.google.com/file/d/10gVNyb4JYg_bA18bNhY9dEDbPn3TOxL7/view?usp=sharing
[4] 
https://drive.google.com/file/d/1jVrtsO49gCvUlnaC9AUAtkVKTm4PbdUv/view?usp=sharing

________________________________________
From: Francisco Javier Tirado Sarti <[email protected]>
Sent: Wednesday, January 17, 2024 9:13 AM
To: [email protected]
Cc: Pere Fernandez Perez
Subject: [EXTERNAL] Re: [DISCUSSION] Performance issues with data-index 
persistence addon

Hi Alex,
I did not take times (which depends on a number of variables that
drastically change between environments), but verify that the number of
updates has been reduced drastically without losing functionality, which is
objectively a good thing. If before the change, for every node executed, we
have an update for every node previously executed, so if a process have 50
nodes to execute, we were performing nearly 50*51/2 updates, which gives us
a total of  1275 updates, now we have just one for every node being
executed, implying a total of 50 updates.


On Wed, Jan 17, 2024 at 3:18 PM Alex Porcelli <[email protected]> wrote:

> Francisco,
>
> I noticed that your PR has been merged, but I was expecting (at least
> was my understanding from this thread) that before merging some
> benchmark data would be shared in advance - to assess the cost/benefit
> of such a decent size change.
>
> Do you have any information to share?
>
> On Sat, Dec 23, 2023 at 4:02 AM Francisco Javier Tirado Sarti
> <[email protected]> wrote:
> >
> > Yes, as intended, now we have one select and one insert/update per node
> > event.
> > I moved the PR as ready for review and give @Pere Fernandez Perez
> > <[email protected]> permission to the branch so he can edit it  in the
> > next two weeks (Ill be on PTO)  if desired, before merging.
> >
> >
> > On Thu, Dec 21, 2023 at 5:58 PM Alex Porcelli <[email protected]> wrote:
> >
> > > Cool, thank you Francisco!
> > >
> > > Did you manage to get some preliminary data about improvements?
> > >
> > > On Thu, Dec 21, 2023 at 11:52 AM Francisco Javier Tirado Sarti
> > > <[email protected]> wrote:
> > > >
> > > > Yes, after some delay because of quarkus 3 migration. Im refining
> this
> > > > draft PR
> > > > https://github.com/apache/incubator-kie-kogito-apps/pull/1941
> > > >
> > > > On Thu, Dec 21, 2023 at 5:48 PM Alex Porcelli <[email protected]>
> wrote:
> > > >
> > > > > Any update or new findings on this topic?
> > > > >
> > > > > On Tue, Nov 28, 2023 at 8:38 AM Francisco Javier Tirado Sarti
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hi Alex,
> > > > > > After considering different options to improve performance, we
> feel
> > > that
> > > > > it
> > > > > > is time to "partially" move away from the current Map style
> > > interface (
> > > > > >
> > > > >
> > >
> https://github.com/apache/incubator-kie-kogito-apps/blob/main/persistence-commons/persistence-commons-api/src/main/java/org/kie/kogito/persistence/api/Storage.java
> > > > > )
> > > > > > which was shared with Trusty, to one more suitable for usage
> with a
> > > > > > relational DB like postgresql (but still compatible with big
> table
> > > dbs).
> > > > > > The idea will be to replace generic Storage interface by four
> > > specific
> > > > > > interfaces (which will inherit from a common one that keeps the
> query
> > > > > part
> > > > > > at is it. with get and query methods), that will include the
> required
> > > > > > modification operations for the four DataIndex "domains":
> > > > > processinstance,
> > > > > > usertask, processdefinitions and jobs. Those interfaces will
> define
> > > > > methods
> > > > > > like addNode, addVariable, updateTask, addAttachment..... that
> will
> > > allow
> > > > > > the persistent layer implementation  to just update the needed
> info
> > > in
> > > > > the
> > > > > > DB  (for example, for addNode in Postgres, just insert a row into
> > > nodes
> > > > > > table, for addNode in Mongo, basically the same atomic upsert
> > > operation
> > > > > > that is currently done). Therefore, we increase performance for
> > > Postgres
> > > > > > and keep the current one for Mongo. The current DB schemas won't
> be
> > > > > > touched.
> > > > > > Since the code change is large, I do not think I'll be able to
> have
> > > the
> > > > > PR
> > > > > > ready till next week.
> > > > > > But before starting, please let me know if that approach is fine
> for
> > > you.
> > > > > > Best regards.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Nov 24, 2023 at 6:55 PM Alex Porcelli <[email protected]>
> > > wrote:
> > > > > >
> > > > > > > Thank you Francisco to getting deeper on this…
> > > > > > >
> > > > > > > Looking forward to see the results of your suggested
> improvements.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Nov 24, 2023 at 9:40 AM Francisco Javier Tirado Sarti <
> > > > > > > [email protected]> wrote:
> > > > > > >
> > > > > > > > I forgot to attach the queries
> > > > > > > >
> > > > > > > > On Fri, Nov 24, 2023 at 3:04 PM Francisco Javier Tirado
> Sarti <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > >> Hi,
> > > > > > > >> A brief update on this topic.
> > > > > > > >> After doing a simple test with example
> > > > > > > >>
> > > > > > >
> > > > >
> > >
> https://github.com/apache/incubator-kie-kogito-examples/tree/stable/serverless-workflow-examples/serverless-workflow-data-index-quarkus
> > > > > > > ,
> > > > > > > >> the number of updates over Nodes table is n*n, so we manage
> to
> > > > > obtain a
> > > > > > > >> perfect quadratic performance degradation. The problem is
> worse
> > > in
> > > > > the
> > > > > > > case
> > > > > > > >> of Serverless Workflow than in BPMN because we the number of
> > > nodes
> > > > > is
> > > > > > > >> greater than the number of states. In that example N is 16,
> but
> > > for
> > > > > a
> > > > > > > more
> > > > > > > >> complex workflow it would be certainly large.
> > > > > > > >> I think that this is more related to how we are handling
> JPA in
> > > the
> > > > > > > code,
> > > > > > > >> in particular the mapping from model to entity (basically
> JPA is
> > > > > blind
> > > > > > > and
> > > > > > > >> has to update all nodes for every write because it believes
> the
> > > > > node has
> > > > > > > >> been updated, although it is not) than an issue in the table
> > > > > definition.
> > > > > > > >> In fact, when using JPA, separating the server model from
> the
> > > JPA
> > > > > > > entity is
> > > > > > > >> not a good idea, especially if the entity contains
> collections.
> > > I
> > > > > will
> > > > > > > try
> > > > > > > >> to change that without breaking anything.
> > > > > > > >>
> > > > > > > >> On Wed, Nov 22, 2023 at 12:10 PM Enrique Gonzalez Martinez <
> > > > > > > >> [email protected]> wrote:
> > > > > > > >>
> > > > > > > >>> After the events split you now will need to create a node
> > > instance
> > > > > > > >>> model instance of making independent from the process
> instance.
> > > > > > > >>> That should do the trick.
> > > > > > > >>>
> > > > > > > >>> Regarding deleting/inserting it was fixed at some point.
> > > > > > > >>>
> > > > > > > >>> El mar, 21 nov 2023 a las 20:22, Francisco Javier Tirado
> Sarti
> > > > > > > >>> (<[email protected]>) escribió:
> > > > > > > >>> >
> > > > > > > >>> > Hi Martin,
> > > > > > > >>> > I have a task to review performance of
> > > > > > > >>> >
> > > > > > > >>> > ProcessInstanceNodeDataEventMerger
> > > > > > > >>> > My idea is to reduce the number of delete inserts when
> > > processing
> > > > > > > >>> events
> > > > > > > >>> > and try to do it incremental.
> > > > > > > >>> > That should improve performance.
> > > > > > > >>> > PS:
> > > > > > > >>> > I was planning to send an e-mail tomorrow announcing
> that in
> > > > > case you
> > > > > > > >>> were
> > > > > > > >>> > already working on a fix for that. I assume you are not
> and I
> > > > > would
> > > > > > > be
> > > > > > > >>> > sending a PR soon.
> > > > > > > >>> >
> > > > > > > >>> > On Tue, Nov 21, 2023 at 6:09 PM Martin Weiler
> > > > > > > <[email protected]
> > > > > > > >>> >
> > > > > > > >>> > wrote:
> > > > > > > >>> >
> > > > > > > >>> > > I looked into the new examples using data-index
> persistence
> > > > > addon -
> > > > > > > >>> Neus'
> > > > > > > >>> > > PR#1813 [1] for serverless and Pere's branch [2] for
> > > workflow
> > > > > > > (great
> > > > > > > >>> job
> > > > > > > >>> > > both!) - and they work without issues using single
> > > requests.
> > > > > > > >>> However, under
> > > > > > > >>> > > some load (I used 'ab' for testing with a light
> > > concurrency of
> > > > > 10
> > > > > > > >>> parallel
> > > > > > > >>> > > requests) I ran into the following problems:
> > > > > > > >>> > >
> > > > > > > >>> > > (1) Large number of insert/delete calls (eg. for tables
> > > such as
> > > > > > > >>> nodes,
> > > > > > > >>> > > definitions, etc)
> > > > > > > >>> > >
> > > > > > > >>> > > (2) Hibernate OptimisticLockExceptions /
> > > StaleStateExceptions
> > > > > > > >>> > >
> > > > > > > >>> > > (3) DB deadlocks
> > > > > > > >>> > >
> > > > > > > >>> > > (4) Error responses, slow response times
> > > > > > > >>> > >
> > > > > > > >>> > > The reason I am reaching out with this topic here is to
> > > find
> > > > > out if
> > > > > > > >>> we are
> > > > > > > >>> > > aware of this issue, and if someone is already looking
> > > into or
> > > > > > > being
> > > > > > > >>> > > assigned to it?
> > > > > > > >>> > >
> > > > > > > >>> > > Thanks,
> > > > > > > >>> > > Martin
> > > > > > > >>> > >
> > > > > > > >>> > > [1]
> > > > > > > >>>
> > > https://github.com/apache/incubator-kie-kogito-examples/pull/1813
> > > > > > > >>> > > [2]
> > > > > > > >>> > >
> > > > > > > >>>
> > > > > > >
> > > > >
> > >
> https://github.com/pefernan/kogito-examples/tree/example_data-index_persistence
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > >>> > > To unsubscribe, e-mail: [email protected]
> > > > > > > >>> > > For additional commands, e-mail:
> [email protected]
> > > > > > > >>> > >
> > > > > > > >>> > >
> > > > > > > >>>
> > > > > > > >>>
> > > > >
> ---------------------------------------------------------------------
> > > > > > > >>> To unsubscribe, e-mail: [email protected]
> > > > > > > >>> For additional commands, e-mail: [email protected]
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail: [email protected]
> > > > > > > > For additional commands, e-mail: [email protected]
> > > > > > >
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [email protected]
> > > > > For additional commands, e-mail: [email protected]
> > > > >
> > > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [email protected]
> > > For additional commands, e-mail: [email protected]
> > >
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to