Is it possible to run compaction asynchronously while upserting via Spark DataSource writer?

Zuyeu, Anton Wed, 24 Jun 2020 15:56:25 -0700

Hi team,

We are upserting incremental DataFrames into our MoR Spark table using the Spark 
datasource writer. Currently we run compaction inline, but we would like to 
run it asynchronously. As far as I understand, our only option for that is to 
use DeltaStreamer. The problem is that DeltaStreamer seems to have been built to 
orchestrate all writes to the table (upserts and compaction), whereas we want 
our own job to handle upserts and DeltaStreamer to handle compaction only 
(scheduling compactions, running them, rerunning failed ones, etc.). So the 
question is: is this even possible? After looking at the DeltaStreamer 
parameters: what if we supply a mock class as --source-class, so that 
DeltaStreamer pulls empty incremental data and therefore doesn't upsert anything 
but still runs compactions on its schedule? Would that work? Please also share 
whether there are other ways to achieve async compaction without using 
DeltaStreamer.
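
For context, here is a minimal sketch of the writer options that an upsert job 
with inline compaction might use. The helper name, table name, key fields, and 
the delta-commit threshold are hypothetical and chosen only for illustration. 
The option keys are the Hudi datasource and compaction keys as commonly 
documented, so please verify them against the Hudi version in use.

```python
from typing import Dict


def build_upsert_options(
    table_name: str,
    record_key: str,
    precombine_field: str,
    inline_compaction: bool = True,
    max_delta_commits: int = 5,
) -> Dict[str, str]:
    """Build Hudi writer options for upserting into a MERGE_ON_READ table.

    All argument values are placeholders supplied by the caller; the keys are
    assumed to match the Hudi configuration names for the version in use.
    """
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Inline compaction runs as part of the write itself, which is the
        # behaviour the question is trying to move away from.
        "hoodie.compact.inline": "true" if inline_compaction else "false",
    }
    if inline_compaction:
        # Number of delta commits after which an inline compaction is triggered.
        options["hoodie.compact.inline.max.delta.commits"] = str(max_delta_commits)
    return options


# Intended use with a Spark DataFrame `df` and a table path `base_path`
# (both assumed to exist in the calling job):
#   df.write.format("org.apache.hudi") \
#       .options(**build_upsert_options("my_table", "uuid", "ts")) \
#       .mode("append") \
#       .save(base_path)
```

Passing inline_compaction=False only turns the inline step off in this sketch; 
something else would then have to schedule and run the compactions, which is 
exactly what the question above is asking about.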


Thank you,
Anton Zuyeu
