lw309637554 commented on issue #2652: URL: https://github.com/apache/hudi/issues/2652#issuecomment-796829101
> 1. Does the mapping of [<key, partitionpath> -> fileGroupId] change after clustering? Could a record be written to another file group?
> 2. Clustering sorts the columns. Does it change the physical path of a record to a different location that is not a partition path, by using inlinefs?
> 3. Does clustering work on the full Hudi table, or can we choose some partitions?
> 4. Why does clustering ignore files whose size is over the targetFileSize? If we ignore such a file, we still pay the cost of a full scan of it.
> 5. When some files are being compacted, will the clustering scheduler ignore these files, and then clustering running will still

@shenbinglife @vinothchandar Hello, I can answer these.

1. Yes, the mapping changes. The record will be written to another file group.
2. The clustering sort only ensures that the records within a file group are sorted. It uses Spark's RDDCustomColumnsSortPartitioner.
3. Currently it works on the full table. Each run selects "hoodie.clustering.plan.strategy.daybased.lookback.partitions" partitions to cluster. You can set this parameter.
4. Clustering combines small files into large files. Spark or Presto can split a large file when reading it, so performance is better.
5. Yes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
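
For reference, the lookback setting mentioned in answer 3 could be passed as a write option along these lines. This is only a sketch: the helper name `clustering_options`, the values, and the sort column names are hypothetical, and the two keys other than the lookback one are assumed from Hudi's clustering configuration rather than taken from this thread, so check them against the Hudi version in use.

```python
def clustering_options(lookback_partitions=2, sort_columns=("event_ts",)):
    """Build a dict of Hudi clustering options (illustrative values only)."""
    if lookback_partitions < 1:
        raise ValueError("lookback_partitions must be >= 1")
    return {
        # Assumed key: run clustering inline with writes.
        "hoodie.clustering.inline": "true",
        # Key quoted in the reply: how many partitions each run picks.
        "hoodie.clustering.plan.strategy.daybased.lookback.partitions": str(
            lookback_partitions
        ),
        # Assumed key: columns used to sort records within a file group.
        "hoodie.clustering.plan.strategy.sort.columns": ",".join(sort_columns),
    }


# Hypothetical column names, purely for illustration.
opts = clustering_options(lookback_partitions=3, sort_columns=("city", "event_ts"))
for key, value in sorted(opts.items()):
    print(f"{key}={value}")
```

With PySpark, a dict like this would typically be handed to the writer through `.options(**opts)` together with the usual Hudi table options.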