Hi team,

We are upserting incremental DataFrames into our MoR table using the Spark 
datasource writer. Currently we run compaction inline, but we would like 
compaction to run asynchronously instead. As far as I understand, our only 
option for that is DeltaStreamer. The problem is that DeltaStreamer seems to 
have been built to orchestrate all writes to the table (upserts and 
compaction), whereas we want our own job to handle the upserts and 
DeltaStreamer to handle compaction only (scheduling compactions, running them, 
rerunning failed ones, etc.). So the question is: is this even possible? 
After looking into DeltaStreamer's parameters: what if we supply a mock class 
as --source-class, so that DeltaStreamer pulls empty incremental data and 
therefore doesn't upsert anything but still runs compactions on its schedule? 
Would that work? Please also share whether there are other ways to achieve 
async compaction without using DeltaStreamer.
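
For context, here is a rough sketch of the kind of writer configuration I am 
describing. It is illustrative only: the table name, key field, and ordering 
field are placeholders rather than our real ones, the helper function is 
hypothetical, and the exact option keys may differ between Hudi versions.

```python
def hudi_upsert_options(table_name, record_key, precombine_field,
                        inline_compaction=True):
    """Build datasource write options for upserting into a MoR table.

    Hypothetical helper for illustration; all argument values are placeholders.
    """
    return {
        "hoodie.table.name": table_name,
        # Merge-on-read table, written with the upsert operation.
        "hoodie.datasource.write.table.type": "MERGE_ON_READ",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Today compaction runs inline with the write; this is the part
        # we would like to hand off to a separate asynchronous process.
        "hoodie.compact.inline": "true" if inline_compaction else "false",
    }


# Placeholder values, not our real table or columns.
options = hudi_upsert_options("my_table", "id", "ts")

# With a SparkSession, a DataFrame `df`, and a `base_path`, the write is roughly:
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```

Passing inline_compaction=False would turn off inline compaction in the 
writer, which is the setting we assume we would need once something else 
takes over compaction.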

Thank you,
Anton Zuyeu
