I think for the tracing use case we need to publish events one by one from
each mediator (we can't aggregate all such events, as each one also contains
the message payload)

---------- Forwarded message ----------
From: Supun Sethunga <sup...@wso2.com>
Date: Mon, Feb 8, 2016 at 2:54 PM
Subject: Re: ESB Analytics Mediation Event Publishing Mechanism
To: Anjana Fernando <anj...@wso2.com>
Cc: "engineering-gr...@wso2.com" <engineering-gr...@wso2.com>, Srinath
Perera <srin...@wso2.com>, Sanjiva Weerawarana <sanj...@wso2.com>, Kasun
Indrasiri <ka...@wso2.com>, Isuru Udana <isu...@wso2.com>


Hi all,

Ran some simple performance tests against the new relational provider, in
comparison with the existing one. Following are the results:

*Records in Backend DB Table*: *1,054,057*

*Conversion:*

Backend DB Table:
id   data
1    [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]
2    [{'a':'aaa','b':'bbb','c':'ccc'},{'a':'xxx','b':'yyy','c':'zzz'},{'a':'ppp','b':'qqq','c':'rrr'}]

To --> Spark Table:
id   a    b    c
1    aaa  bbb  ccc
1    xxx  yyy  zzz
1    ppp  qqq  rrr
2    aaa  bbb  ccc
2    xxx  yyy  zzz
2    ppp  qqq  rrr
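The conversion above amounts to exploding the JSON array held in the "data"
column into one flat row per array element, repeating the parent id. A minimal
sketch of that mapping (the function name explode_rows and the single sample
backend row are illustrative only; this is not the actual Spark relation
provider code):

```python
import json

def explode_rows(backend_rows):
    """Split each backend row's JSON-array 'data' column into one
    flat row per array element, repeating the parent 'id'."""
    flat = []
    for row in backend_rows:
        for element in json.loads(row["data"]):
            flat.append({"id": row["id"], **element})
    return flat

# One sample backend row, matching the conversion example above.
backend = [
    {"id": 1, "data": '[{"a":"aaa","b":"bbb","c":"ccc"},'
                      '{"a":"xxx","b":"yyy","c":"zzz"},'
                      '{"a":"ppp","b":"qqq","c":"rrr"}]'},
]

for r in explode_rows(backend):
    print(r)  # three flat rows, all with id=1
```

Each backend row with a 3-element array yields 3 flat rows, which is why the
exploded table below ends up with 3x the original row count.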



*Avg Time for Query Execution:*

Query                                                                      Execution time (~ sec)
                                                                           Existing Analytics    New (ESB) Analytics
                                                                           Relation Provider     Relation Provider*
SELECT COUNT(*) FROM <Table>;                                              13                    16
SELECT * FROM <Table> ORDER BY id ASC;                                     13                    16
SELECT * FROM <Table> WHERE id=98435;                                      13                    16
SELECT id,a,first(b),first(c) FROM <Table> GROUP BY id,a ORDER BY id ASC;  18                    26

* The new relational provider splits a single row into multiple rows. Hence
the number of rows in the table is 3 times that of the original table (as
each row is split into 3 rows).
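Since each backend row is split into 3 Spark-side rows, the new provider
effectively scans roughly 3x the row count, which is consistent with the
slightly higher times above. The benchmark queries can be reproduced in
miniature against the exploded layout; here is a sketch using sqlite3 purely
as a stand-in for the Spark SQL backend (the table name "t" and the two
sample ids are made up for illustration):

```python
import sqlite3

# Tiny stand-in for the exploded table: each backend id yields
# 3 flat rows, so COUNT(*) is 3x the backend row count.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, a TEXT, b TEXT, c TEXT)")
for rid in (1, 2):
    con.executemany("INSERT INTO t VALUES (?,?,?,?)",
                    [(rid, "aaa", "bbb", "ccc"),
                     (rid, "xxx", "yyy", "zzz"),
                     (rid, "ppp", "qqq", "rrr")])

# 2 backend rows x 3 array elements = 6 exploded rows
print(con.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 6
print(con.execute("SELECT * FROM t WHERE id=2 ORDER BY a").fetchall())
```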

Regards,
Supun

On Wed, Feb 3, 2016 at 3:36 PM, Supun Sethunga <sup...@wso2.com> wrote:

> Hi all,
>
> I have started working on implementing a new "relation" / "relation
> provider", to serve the above requirement. This basically is a modified
> version of the existing "Carbon Analytics" relation provider.
>
> Here I have assumed that the encapsulated data for a single execution flow is
> stored in a single row, and that the data about the mediators invoked during
> the flow are stored in a known column of each row (say "data"), as an array
> (say a JSON array). When each row is read into Spark, this relation
> provider creates separate rows for each element of the array stored
> in the "data" column. I have tested this with some mocked data, and it works
> as expected.
>
> Need to test with the real data/data-formats, and modify the mapping
> accordingly. Will update the thread with the details.
>
> Regards,
> Supun
>
>
> On Tue, Feb 2, 2016 at 2:36 AM, Anjana Fernando <anj...@wso2.com> wrote:
>
>> Hi,
>>
>> In a meeting I had with Kasun and the ESB team, I got to know that, for
>> their tracing mechanism, they were instructed to publish one event for each
>> of the mediator invocations, whereas earlier they had an approach where they
>> published one event which encapsulated the data of a whole execution flow. I
>> would actually like to support the latter approach, mainly due to
>> performance / resource requirements, and also considering the fact that this
>> is a feature that could be enabled in production. Simply put, if we do one
>> event per mediator, this does not scale that well. For example, if the ESB
>> is doing 1k TPS, for a sequence that has 20 mediators, that is 20k TPS of
>> analytics traffic. Combine that with a possible ESB cluster hitting a DAS
>> cluster with a single backend database, and this may be too many rows per
>> second written to the database. The main problem here is that one event is a
>> single row/record in the backend database in DAS, so it may come to a
>> state where the frequency of row creations by events coming from ESBs
>> cannot be sustained.
>>
>> If we create a single event from the 20 mediators, then it is just 1k TPS
>> for DAS event receivers and the database too, even though the message size
>> is bigger. Publishing lots of small events does not necessarily give the
>> same performance as publishing bigger events. Throughput wise, comparatively
>> bigger events will win (even if we consider that small operations
>> will be batched at the transport level etc., still one event = one database
>> row). So I would suggest we try out a single sequence flow = single event
>> approach, and from the Spark processing side, we consider one of these big
>> rows as multiple rows in Spark. I was first thinking of whether UDFs could
>> help in splitting a single column into multiple rows, but that is not
>> possible, and it is also a bit troublesome, considering we would have to
>> delete the original data table after we converted it using a script, not
>> forgetting that we would actually have to schedule and run a separate script
>> to do this post-processing. So a much cleaner way to do this would be to
>> create a new "relation provider" in Spark (which is like a data adapter for
>> their DataFrames), and in our relation provider, when we are reading rows,
>> we convert a single row's column to multiple rows and return that for
>> processing. So Spark will not know that physically it was a single row in
>> the data layer, and it can summarize the data as usual and write to the
>> target summary tables. [1] is our existing implementation of a Spark
>> relation provider, which directly maps to our DAS analytics tables; we can
>> create the new one extending / based on it. So I suggest we try out this
>> approach and see if everyone is okay with it.
>>
>> [1]
>> https://github.com/wso2/carbon-analytics/blob/master/components/analytics-processors/org.wso2.carbon.analytics.spark.core/src/main/java/org/wso2/carbon/analytics/spark/core/sources/AnalyticsRelationProvider.java
>>
>> Cheers,
>> Anjana.
>> --
>> *Anjana Fernando*
>> Senior Technical Lead
>> WSO2 Inc. | http://wso2.com
>> lean . enterprise . middleware
>>
>>
>
>
>
> --
> *Supun Sethunga*
> Software Engineer
> WSO2, Inc.
> http://wso2.com/
> lean | enterprise | middleware
> Mobile : +94 716546324
>



-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324



-- 
Kasun Indrasiri
Software Architect
WSO2, Inc.; http://wso2.com
lean.enterprise.middleware

cell: +94 77 556 5206
Blog : http://kasunpanorama.blogspot.com/
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
