Re: synchronization question about datasketches aggregator
Hi Guys, Just wanted to pay your attention that once OakIncrementalIndex will be in place there is no need to manage the issue of synchronization between aggregators and ingestions. Part of Oak benefits is the synchronization for the simultaneous writes and reads of the same key in the map. On Thursday, July 19, 2018, 10:17:18 PM GMT+3, Gian Merlino wrote: Hi Will, Check out also this thread for related discussion: https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E On Thu, Jul 19, 2018 at 11:21 AM Will Lauer wrote: > A colleague recently pointed out to me that all the sketch operations that > take place in SketchAggregator (in the datasketches module) use a > SychronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tell with other aggregators in > the Druid code base, there doesn't appear to be a need to synchronize. It > looks like Aggregators are always processed from within a single thread. Is > it reasonable to remove all the syncrhonizations from the SketchAggregator > and avoid the performance hit that they impose at runtime? > > Will > > Will Lauer > Senior Principal Architect > > Progress requires pain > > m: 508.561.6427 > > o: 217.255.4262 >
Re: synchronization question about datasketches aggregator
Hi Will, Check out also this thread for related discussion: https://lists.apache.org/thread.html/9899aa790a7eb561ab66f47b35c8f66ffe695432719251351339521a@%3Cdev.druid.apache.org%3E On Thu, Jul 19, 2018 at 11:21 AM Will Lauer wrote: > A colleague recently pointed out to me that all the sketch operations that > take place in SketchAggregator (in the datasketches module) use a > SychronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tell with other aggregators in > the Druid code base, there doesn't appear to be a need to synchronize. It > looks like Aggregators are always processed from within a single thread. Is > it reasonable to remove all the syncrhonizations from the SketchAggregator > and avoid the performance hit that they impose at runtime? > > Will > > Will Lauer > Senior Principal Architect > > Progress requires pain > > m: 508.561.6427 > > o: 217.255.4262 >
Re: synchronization question about datasketches aggregator
There is a race between aggregators and ingestion updates. Actually, many aggregators are vulnerable now. See this issue: https://github.com/apache/incubator-druid/pull/3956 and a conversation starting from this message: https://github.com/apache/incubator-druid/pull/5148#discussion_r170906998. However, you could replace a simple synchronized with ReadWriteLock or Striped (see ArrayOfDoublesSketchMergeBufferAggregator for example), that would be a useful contribution to Druid. On Thu, 19 Jul 2018 at 13:21, Will Lauer wrote: > A colleague recently pointed out to me that all the sketch operations that > take place in SketchAggregator (in the datasketches module) use a > SychronizedUnion class that basically wraps a normal sketch Union and > synchronizes all operations. From what I can tell with other aggregators in > the Druid code base, there doesn't appear to be a need to synchronize. It > looks like Aggregators are always processed from within a single thread. Is > it reasonable to remove all the syncrhonizations from the SketchAggregator > and avoid the performance hit that they impose at runtime? > > Will > > Will Lauer > Senior Principal Architect > > Progress requires pain > > m: 508.561.6427 > > o: 217.255.4262 >