Hello everyone,

We see a ~60% improvement in query runtime for some datasets. See an example documented here <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation>.

Please try out this feature and share any feedback. I have included commands to run async clustering in the example section <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation>. You could also set up inline clustering using the commands in this section <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-Commandstoscheduleandrunclustering>.

Thanks
Satish

On Tue, Dec 22, 2020 at 10:32 PM Vinoth Chandar <vin...@apache.org> wrote:

> Please help us test this more, before RC is cut! :)
>
> On Tue, Dec 22, 2020 at 10:23 PM Satish Kotha <satishko...@uber.com.invalid> wrote:
>
> > Hello all,
> >
> > The clustering feature landed <https://github.com/apache/hudi/pull/2263> on the master branch and is available in beta. This feature can be used to do the following:
> > 1) Stitch small files into larger files
> > 2) Change data layout on disk by sorting data using different columns (for query/storage optimization)
> >
> > If you are interested in the above use cases, I would appreciate it if you could try out this feature. I have included commands to run clustering in this section <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance#RFC19Clusteringdataforspeedandqueryperformance-Commandstoscheduleandrunclustering> (along with caveats, as this feature is still in beta).
> >
> > Any feedback is welcome. I'm also in the #general room on Slack. Please feel free to ping me if you have any questions/comments.
> >
> > Thanks
> > Satish