[GitHub] [hudi] danny0405 commented on pull request #9006: [HUDI-6404] Implement ParquetToolsExecutionStrategy for clustering

2023-07-02 Thread via GitHub


danny0405 commented on PR #9006:
URL: https://github.com/apache/hudi/pull/9006#issuecomment-1617101256

   The tests have passed: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=18153=results


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] danny0405 commented on pull request #9006: [HUDI-6404] Implement ParquetToolsExecutionStrategy for clustering

2023-06-30 Thread via GitHub


danny0405 commented on PR #9006:
URL: https://github.com/apache/hudi/pull/9006#issuecomment-1614194198

   > 2. Column pruning. This change can be used to run the parquet_tools prune 
   > command on unused columns to reduce the storage footprint.
   
   So you mean that a user action like `alter table drop column a, b, c` could 
use this new strategy? Makes sense to me.
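   To make the cost difference behind this proposal concrete, here is a minimal Python sketch under simplified assumptions. A file is modeled as a mapping from column name to a list of values, standing in for Parquet column chunks. Row-wise pruning rebuilds every record, while column-wise pruning (roughly the idea behind the parquet_tools prune command mentioned above) simply skips the dropped columns. `prune_row_wise` and `prune_column_wise` are hypothetical helpers, not part of Hudi or parquet-tools.

```python
from typing import Dict, List, Set

# Hypothetical columnar layout: column name -> list of values
# (a stand-in for Parquet column chunks, not a real file format).
ColumnarFile = Dict[str, List[object]]


def prune_row_wise(file: ColumnarFile, drop: Set[str]) -> ColumnarFile:
    """Rewrite record by record: every value of every row is touched."""
    columns = list(file)
    num_rows = len(file[columns[0]]) if columns else 0
    # Materialize each row as a dict, like a record-level rewrite would.
    rows = [{c: file[c][i] for c in columns} for i in range(num_rows)]
    kept = [c for c in columns if c not in drop]
    trimmed = [{c: row[c] for c in kept} for row in rows]
    # Re-assemble the surviving fields into columnar form (new lists).
    return {c: [row[c] for row in trimmed] for c in kept}


def prune_column_wise(file: ColumnarFile, drop: Set[str]) -> ColumnarFile:
    """Reuse surviving column chunks unchanged; dropped chunks are never read."""
    return {c: values for c, values in file.items() if c not in drop}
```

   Both functions produce the same logical result; the difference is that the row-wise version does work proportional to rows times columns, while the column-wise version only decides per column whether to keep it.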





[GitHub] [hudi] danny0405 commented on pull request #9006: [HUDI-6404] Implement ParquetToolsExecutionStrategy for clustering

2023-06-28 Thread via GitHub


danny0405 commented on PR #9006:
URL: https://github.com/apache/hudi/pull/9006#issuecomment-1611075324

   > If there is a use case for pruning some columns to save storage, the 
   > current approach of clustering iterates over every record and removes the 
   > unused columns, which is very time-consuming.
   
   Thanks @suryaprasanna, can you clarify the relationship between column 
pruning and clustering? In the regular notion of Hudi clustering, it only 
merges small file groups into larger ones, with optional sorting on columns; 
no pruning happens there. How do users expect to improve efficiency overall 
with this patch?
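   For reference, the regular clustering behaviour described in this comment (merge small file groups into larger ones, optionally sort on columns, no pruning) can be sketched in Python. This is a simplified model under stated assumptions, not Hudi's implementation: `FileGroup`, `plan_clustering`, `execute_batch` and their parameters are hypothetical, and the greedy largest-first packing is just one possible planning heuristic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileGroup:
    """Hypothetical stand-in for a file group: an id, a size, and its records."""
    group_id: str
    size_bytes: int
    records: List[Dict[str, object]] = field(default_factory=list)


def plan_clustering(groups: List[FileGroup], small_file_limit: int,
                    target_bytes: int) -> List[List[FileGroup]]:
    """Greedily pack file groups smaller than small_file_limit into batches
    whose total size stays within target_bytes (largest groups first)."""
    small = sorted((g for g in groups if g.size_bytes < small_file_limit),
                   key=lambda g: g.size_bytes, reverse=True)
    batches: List[List[FileGroup]] = []
    current: List[FileGroup] = []
    current_size = 0
    for g in small:
        if current and current_size + g.size_bytes > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(g)
        current_size += g.size_bytes
    if current:
        batches.append(current)
    # Design choice: a batch with a single group would not merge anything.
    return [b for b in batches if len(b) > 1]


def execute_batch(batch: List[FileGroup], new_id: str,
                  sort_column: Optional[str] = None) -> FileGroup:
    """Merge a batch into one new file group, optionally sorted on sort_column.
    Every field of every record is carried over: nothing is pruned."""
    records = [r for g in batch for r in g.records]
    if sort_column is not None:
        records.sort(key=lambda r: r[sort_column])
    return FileGroup(new_id, sum(g.size_bytes for g in batch), records)
```

   Note that `execute_batch` still reads and rewrites every record, which is why column pruning would need a different execution path than this merge-and-sort step.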

