kfaraz commented on PR #18612: URL: https://github.com/apache/druid/pull/18612#issuecomment-3384543274
Yes, @maytasm, I agree that we should be able to leverage the auto-discover capabilities of the compaction task for re-indexing too. For that reason, it makes sense to extend that feature.

> We can have a new task type but I think that's making Druid harder to use/understand...just like adding new runtime properties and tuning configs.

Yeah, more runtime properties and tuning configs often make Druid harder to use and understand. In fact, my concern is the same: overloading one feature to satisfy completely different use cases makes things confusing and less maintainable.

The question is one of intent. There is no reason a user trying to move data from one DS to another should have to launch a `compact` task. Since the use case here is a new capability altogether, there is no harm in adding a new task type or a new input source type, whichever seems simpler to implement.

> You can already use Compaction task for more than Compacting. There are case where users change the schema, drop dimensions, change granularity level, etc. You can even give it a finer segmentGranularity and it will be expanding the datasource

Absolutely, this is one of the things we have been discussing: a `compact` task should ideally never change the meaning of the data, only how it is laid out/partitioned (the change in this PR would only add to that discrepancy). Since we have already added this capability, we don't want to get rid of it right now, as users may already be using it. In the future, the compaction templates in #18402 will have the capability to validate that a template does not change the meaning of the data and does only "compaction".

> The schema detection, spec-autofill etc here only make sense for only one of those input source, the Druid input source. For compaction task, the input source is always Druid but for native batch it isn't.

Oh, absolutely.
To clarify, I meant that we should try to bring the auto-detection capabilities of the `compact` task into native batch + `druid` input source, not native batch in general. But I suppose that might be more involved to implement, and perhaps overkill anyway. We might as well just extend the `compact` task, as that seems simpler.

Also, to add to the suggestions from @clintropolis, you could consider using an MSQ INSERT/REPLACE statement. I don't know for sure whether MSQ provides all the auto-discover niceties, but it is probably worth a shot.
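
For illustration only, here is a rough sketch of what the MSQ route might look like when copying one datasource into another. It only builds the request body for a SQL-based ingestion task and does not send it. The helper name `build_msq_replace_payload` and the datasource names are hypothetical, and whether a plain `SELECT *` carries over everything needed (rollup, metrics, auto-discovered schema) is an assumption that would need to be verified.

```python
import json


def build_msq_replace_payload(source: str, target: str, granularity: str = "DAY") -> dict:
    """Build the JSON body for an MSQ task that copies `source` into `target`.

    Hypothetical helper for illustration; not part of Druid.
    """
    # REPLACE ... OVERWRITE ALL rewrites the whole target datasource.
    # Assumption: SELECT * is enough to carry over the columns you need.
    sql = (
        f'REPLACE INTO "{target}" OVERWRITE ALL\n'
        f'SELECT * FROM "{source}"\n'
        f"PARTITIONED BY {granularity}"
    )
    return {"query": sql}


if __name__ == "__main__":
    # Datasource names below are placeholders.
    payload = build_msq_replace_payload("source_ds", "target_ds")
    print(json.dumps(payload, indent=2))
```

The resulting body would be submitted to the SQL task endpoint (`POST /druid/v2/sql/task`). Unlike the `compact` task, this makes the intent of moving data between datasources explicit in the statement itself.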
