Re: synchronization question about datasketches aggregator

2018-07-23 Thread Anastasia Braginsky
 Hi Guys,
Just wanted to pay your attention that once OakIncrementalIndex will be in 
place there is no need to manage the issue of synchronization between 
aggregators and ingestions. Part of Oak benefits is the synchronization for the 
simultaneous writes and reads of the same key in the map.

On Thursday, July 19, 2018, 10:17:18 PM GMT+3, Gian Merlino 
 wrote:  
 
 Hi Will,

Check out also this thread for related discussion:

https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer  wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>
  

Re: synchronization question about datasketches aggregator

2018-07-19 Thread Gian Merlino
Hi Will,

Check out also this thread for related discussion:

https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E

On Thu, Jul 19, 2018 at 11:21 AM Will Lauer  wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>


Re: synchronization question about datasketches aggregator

2018-07-19 Thread Roman Leventov
There is a race between aggregators and ingestion updates. Actually, many
aggregators are vulnerable now. See this issue:
https://github.com/apache/incubator-druid/pull/3956 and a conversation
starting from this message:
https://github.com/apache/incubator-druid/pull/5148#discussion_r170906998.

However, you could replace a simple synchronized with ReadWriteLock or
Striped (see ArrayOfDoublesSketchMergeBufferAggregator for
example), that would be a useful contribution to Druid.

On Thu, 19 Jul 2018 at 13:21, Will Lauer  wrote:

> A colleague recently pointed out to me that all the sketch operations that
> take place in SketchAggregator (in the datasketches module) use a
> SychronizedUnion class that basically wraps a normal sketch Union and
> synchronizes all operations. From what I can tell with other aggregators in
> the Druid code base, there doesn't appear to be a need to synchronize. It
> looks like Aggregators are always processed from within a single thread. Is
> it reasonable to remove all the syncrhonizations from the SketchAggregator
> and avoid the performance hit that they impose at runtime?
>
> Will
>
> Will Lauer
> Senior Principal Architect
>
> Progress requires pain
>
> m: 508.561.6427
>
> o: 217.255.4262
>