This sounds like hotspotting, where too much of the load lands on the same part of the keyspace. Ideally the workload would be distributed more evenly over the keyspace, so partitioning and keying strategy are another avenue of attack.
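
To make the keying idea concrete, here is a minimal sketch of key salting, one common way to spread row keys over the keyspace. It is plain Scala with no HBase dependency. The names `SaltedKeys`, `saltedKey` and `NumBuckets` are hypothetical and made up for illustration, and the sketch does not establish whether the event store's row-key layout can actually be changed this way.

```scala
object SaltedKeys {
  // Hypothetical bucket count; in practice it would be chosen to match
  // the number of regions the table is pre-split into.
  val NumBuckets = 16

  // Prefix the entity id with a bucket derived from its hash, so that
  // lexicographically adjacent ids no longer sort into the same key range.
  def saltedKey(entityId: String): String = {
    val bucket = Math.floorMod(entityId.hashCode, NumBuckets)
    f"$bucket%02d-$entityId"
  }

  def main(args: Array[String]): Unit = {
    val ids = (1 to 1000).map(i => s"item-$i")
    // Count how many keys fall into each bucket prefix.
    val perBucket = ids
      .groupBy(id => saltedKey(id).take(2))
      .map { case (bucket, keys) => bucket -> keys.size }
    perBucket.toSeq.sortBy(_._1).foreach { case (bucket, n) =>
      println(s"bucket $bucket: $n keys")
    }
  }
}
```

The trade-off is that a range scan over the original ids then has to fan out into one scan per bucket, so this mostly helps when a narrow key range is what gets hammered.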
> On Oct 13, 2016, at 6:10 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
>
> The DAG for a template just happens to schedule 2 tasks that do something
> like this:
>
> val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties(
>   appName = dsp.appName,
>   entityType = "item")(sc)
>
> to execute in parallel
>
> The PEventStore calls from 2 separate closures start hitting HBase and it
> fails, no matter how high I set the RPC and Scanner Timeout.
>
> This has only come up recently with some restructuring, which I assume caused
> the 2 tasks to end up at the same point in the DAG. Is there a way to force
> one HBase related task to complete before the other is started? They both
> return RDDs, which are lazy evaluated like promises until the data is needed.
> Can I force the promise to be kept?