[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-669299989 I also thought we could add an abstract class UserDefinedBulkInsertPartitioner and make arePartitionRecordsSorted() false by default, but this also requires users to change from "implements" to "extends" ;) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-669298777 We have introduced new sort modes as part of this PR and respective bulk insert partitioners. All these were using UserDefinedBulkInsertPartitioner interface, but then these newly added ones are not user defined but thats the only interface we had in place. So it made sense to rename this interface as BulkInsertPartitioner as it is being used by newly added readily available partitioners. But I didn't think too much about existing users. So, open to adding a new empty interface called UserDefinedBulkInsertPartitioner that extends from BulkInsertPartitioner. But my point was, even if we do that, we can't avoid users needing to make changes to accommodate the new method. So, my point was why not users implement BulkInsertPartitioner instead of UserDefinedBulkInsertPartitioner incase some changes are required from their end. (BulkInsertPartitioner interface is exactly same as old UserDefinedBulkInsertPartitioner with just an addition of one method arePartitionRecordsSorted() ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-668327303 @n3nash ^ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-667721158 I thought of adding an empty interface called UserDefinedBulkInsertPartitioner, but then BulkInsertPartitioner itself has a new method boolean arePartitionRecordsSorted(); So, anyways, existing users will have to make code changes to give implementation to this method. So, not sure if there will be benefit adding an empty UserDefinedBulkInsertPartitioner interface extending from BulkInsertPartitioner. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-667082374 I noticed that base interface don't need to be defined as "UserDefined" anymore since its applicable to our own partitioners as well. So, have stripped off "UserDefined" from the naming. Also, have renamed args and variables to userDefinedBulkInsertPartitioner wherever applicable. Once CI succeeds, will land this PR. @yihua : thanks for helping out with this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-661451741 sure, thanks. Once done, do ping me and vinoth for review. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org