Re: [PR] Enhance Compaction task to be able to write to a different/new datasource (druid)

via GitHub Sat, 18 Oct 2025 14:26:22 -0700


maytasm commented on PR #18612:
URL: https://github.com/apache/druid/pull/18612#issuecomment-3384406099


   @kfaraz I just added more detail to the PR description. The Compaction task
doesn't require you to specify all the specs (metricsSpec, etc.) and can
discover them from the existing segments. Specifying all the specs of a
datasource is a big pain, as Druid doesn't have the concept of a schema/catalog
and the schema can evolve and differ between segments. For example, the
metricsSpec requires you to specify the combining aggregators instead of the
original aggregators. The Compaction task does all of this for you.

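To make the combining-aggregator point concrete, here is a simplified sketch of the conversion a user would otherwise have to do by hand when re-ingesting already rolled-up data. The helper names (COMBINING_TYPE, to_combining_metrics_spec) are hypothetical, and only a few simple built-in aggregator types are covered. The key idea is that the combining aggregator reads the already-aggregated column (named after the metric) instead of the raw input field, and that a "count" becomes a "longSum" over the stored counts. Sketch types like sketches or hyperUnique have their own combining rules and are deliberately not handled here.

```python
# Hypothetical mapping from an aggregator "type" to the type of the aggregator
# that combines its already-aggregated values. Only simple built-ins are covered.
COMBINING_TYPE = {
    "count": "longSum",   # stored counts must be summed, not counted again
    "longSum": "longSum",
    "doubleSum": "doubleSum",
    "longMin": "longMin",
    "longMax": "longMax",
    "doubleMin": "doubleMin",
    "doubleMax": "doubleMax",
}


def to_combining_metrics_spec(metrics_spec):
    """Rewrite an ingestion-time metricsSpec into one that re-aggregates
    the metric columns already stored in existing segments."""
    combined = []
    for agg in metrics_spec:
        agg_type = agg["type"]
        if agg_type not in COMBINING_TYPE:
            raise ValueError(f"no combining rule for aggregator type {agg_type!r}")
        name = agg["name"]
        # Read from the stored metric column (named after the metric),
        # not from the original raw input field.
        combined.append({"type": COMBINING_TYPE[agg_type], "name": name, "fieldName": name})
    return combined


original = [
    {"type": "count", "name": "count"},
    {"type": "longSum", "name": "added", "fieldName": "added_raw"},
]
print(to_combining_metrics_spec(original))
```

This is exactly the kind of bookkeeping the Compaction task spares the user, since it derives the metricsSpec from the existing segments.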
   The Compaction task is just a layer that creates an index_parallel task
whose input source reads from an existing datasource. We could add a new task
type, but I think that would make Druid harder to use and understand, just like
adding new runtime properties and tuning configs does. You can already use the
Compaction task for more than compacting: there are cases where users change
the schema, drop dimensions, change the granularity, etc. You can even give it
a finer segmentGranularity and it will expand the datasource into more segments
rather than compact it.
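For readers less familiar with the layering described above, the following is a rough, hand-written sketch of the kind of index_parallel spec that reads from one datasource and writes to another. It uses Druid's documented "druid" input source with a different dataSchema.dataSource, but it is only an illustration: it is not the spec the Compaction task actually generates, it does not show the new field this PR adds, and the helper name build_reindex_spec plus the example datasource names are hypothetical. Dropping dimensions corresponds to listing fewer of them, and a finer segmentGranularity splits the data into more segments.

```python
def build_reindex_spec(source, target, interval, segment_granularity,
                       dimensions, metrics_spec):
    """Hypothetical helper: build an index_parallel task spec that re-ingests
    `interval` of datasource `source` into datasource `target`."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                # Writing to a different datasource than the one being read.
                "dataSource": target,
                # Rows read via the "druid" input source expose the row time
                # as __time in epoch milliseconds.
                "timestampSpec": {"column": "__time", "format": "millis"},
                # Dimensions not listed here are dropped from the output.
                "dimensionsSpec": {"dimensions": list(dimensions)},
                # Must contain combining aggregators for already rolled-up data.
                "metricsSpec": list(metrics_spec),
                "granularitySpec": {
                    "segmentGranularity": segment_granularity,
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "druid", "dataSource": source, "interval": interval},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


spec = build_reindex_spec(
    source="wiki",
    target="wiki_hourly",
    interval="2025-01-01/2025-02-01",
    segment_granularity="HOUR",
    dimensions=["page", "user"],
    metrics_spec=[{"type": "longSum", "name": "count", "fieldName": "count"}],
)
```

Writing all of this by hand, and keeping it consistent with how each segment's schema evolved, is the burden the comment argues the Compaction task already removes.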


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
