Hello everyone,

We see ~60% improvement in query runtime for some datasets. See an example
documented here
<https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation>.
Please try out this feature and share any feedback.
I have included commands to run async clustering in the example section
<https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-PerformanceEvaluation>.
You could also set up inline clustering using the commands in this section
<https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+freshness+and+query+performance#RFC19Clusteringdataforfreshnessandqueryperformance-Commandstoscheduleandrunclustering>.
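For reference, inline clustering is switched on through write configs. The sketch below is a minimal illustration that assumes the config keys documented for Hudi 0.7.0 (please double-check them against the section linked above, since the feature is still in beta). The helper name `inline_clustering_options`, the example column names, and the default values are illustrative assumptions, not recommended settings:

```python
# Minimal sketch: build the write options that enable inline clustering.
# Assumes the Hudi 0.7.0 config key names; the helper itself is hypothetical
# and the default thresholds are illustrative only.

def inline_clustering_options(sort_columns, max_commits=4,
                              target_file_max_bytes=1024 * 1024 * 1024,
                              small_file_limit=600 * 1024 * 1024):
    """Return a dict of Hudi write options that turn on inline clustering."""
    if not sort_columns:
        raise ValueError("at least one sort column is required")
    return {
        # Run clustering synchronously as part of the write.
        "hoodie.clustering.inline": "true",
        # Schedule clustering once every `max_commits` commits.
        "hoodie.clustering.inline.max.commits": str(max_commits),
        # Target upper bound for the size of files written by clustering.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_max_bytes),
        # Files smaller than this are candidates for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_limit),
        # Comma-separated columns used to sort records while rewriting data.
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }


# Example with hypothetical column names.
opts = inline_clustering_options(["city", "ts"])
```

The resulting dict would be merged with the usual Hudi table options passed to the Spark datasource writer. Note that inline clustering runs synchronously with the write, so it adds to write latency; the async option mentioned above avoids blocking ingestion.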

Thanks
Satish

On Tue, Dec 22, 2020 at 10:32 PM Vinoth Chandar <vin...@apache.org> wrote:

> Please help us test this more, before RC is cut! :)
>
> On Tue, Dec 22, 2020 at 10:23 PM Satish Kotha <satishko...@uber.com.invalid>
> wrote:
>
> > Hello all,
> >
> > The clustering feature has landed <https://github.com/apache/hudi/pull/2263>
> > on the master branch and is available in beta. It can be used to do the
> > following:
> > 1) Stitch small files into larger files
> > 2) Change the data layout on disk by sorting data on different columns
> > (for query/storage optimization)
> >
> > If you are interested in the above use cases, I would appreciate it if you
> > could try out this feature. I have included commands to run clustering in
> > this section
> > <https://cwiki.apache.org/confluence/display/HUDI/RFC+-+19+Clustering+data+for+speed+and+query+performance#RFC19Clusteringdataforspeedandqueryperformance-Commandstoscheduleandrunclustering>
> > (along with caveats, as this feature is still in beta).
> >
> > Any feedback is welcome. I'm also in the #general room on Slack. Please feel
> > free to ping me if you have any questions or comments.
> >
> > Thanks
> > Satish
> >
>
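To make the two use cases in the quoted announcement (stitching small files and sorting data on chosen columns) a bit more concrete, here is a toy Python sketch. It is not Hudi's implementation: "files" are plain in-memory lists of records, the greedy size-based grouping is just one simple packing choice (Hudi's own planning strategy may differ), and the function names `plan_groups` and `rewrite_group` are hypothetical:

```python
# Toy illustration of clustering: (1) pack small files into groups that can be
# rewritten as larger files, (2) rewrite each group sorted by chosen columns.
# NOT Hudi's implementation; names and data structures are hypothetical.

from typing import Dict, List, Sequence

Record = Dict[str, object]


def plan_groups(file_sizes: Dict[str, int], small_file_limit: int,
                target_size: int) -> List[List[str]]:
    """Greedily pack files smaller than small_file_limit into groups whose
    total size stays within target_size (largest files considered first)."""
    small = sorted((name for name, size in file_sizes.items() if size < small_file_limit),
                   key=lambda name: file_sizes[name], reverse=True)
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name in small:
        size = file_sizes[name]
        # Close the current group if adding this file would exceed the target.
        if current and current_size + size > target_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


def rewrite_group(files: Dict[str, List[Record]], group: Sequence[str],
                  sort_columns: Sequence[str]) -> List[Record]:
    """Merge the records of one group into a single output, sorted by sort_columns."""
    merged = [rec for name in group for rec in files[name]]
    return sorted(merged, key=lambda rec: tuple(rec[c] for c in sort_columns))


# Example: f4 is already large, so only f1, f2 and f3 are candidates.
groups = plan_groups({"f1": 10, "f2": 40, "f3": 30, "f4": 500},
                     small_file_limit=100, target_size=60)
```

Sorting on columns that queries commonly filter by keeps related records close together, which is the kind of layout change the second use case refers to.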
