[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-08-05 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-669299989


   I also thought we could add an abstract class 
UserDefinedBulkInsertPartitioner and make arePartitionRecordsSorted() false by 
default, but this also requires users to change from "implements" to "extends" 
;) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-08-05 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-669298777


   We have introduced new sort modes as part of this PR and respective bulk 
insert partitioners. All these were using UserDefinedBulkInsertPartitioner 
interface, but then these newly added ones are not user defined but thats the 
only interface we had in place. So it made sense to rename this interface as 
BulkInsertPartitioner as it is being used by newly added readily available 
partitioners. But I didn't think too much about existing users. So, open to 
adding a new empty interface called UserDefinedBulkInsertPartitioner that 
extends from BulkInsertPartitioner. 
   But my point was, even if we do that, we can't avoid users needing to make 
changes to accommodate the new method. So, my point was why not users implement 
BulkInsertPartitioner instead of UserDefinedBulkInsertPartitioner incase some 
changes are required from their end. (BulkInsertPartitioner interface is 
exactly same as old UserDefinedBulkInsertPartitioner with just an addition of 
one method arePartitionRecordsSorted() ) 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-08-03 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-668327303


   @n3nash ^ 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-08-02 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-667721158


   I thought of adding an empty interface called 
UserDefinedBulkInsertPartitioner, but then BulkInsertPartitioner itself has a 
new method  boolean arePartitionRecordsSorted(); So, anyways, existing users 
will have to make code changes to give implementation to this method. So, not 
sure if there will be benefit adding an empty UserDefinedBulkInsertPartitioner 
interface extending from BulkInsertPartitioner. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-31 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-667082374


   I noticed that base interface don't need to be defined as "UserDefined" 
anymore since its applicable to our own partitioners as well. So, have stripped 
off "UserDefined" from the naming. Also, have renamed args and variables to 
userDefinedBulkInsertPartitioner wherever applicable. Once CI succeeds, will 
land this PR. @yihua : thanks for helping out with this. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-20 Thread GitBox


nsivabalan commented on pull request #1149:
URL: https://github.com/apache/hudi/pull/1149#issuecomment-661451741


   sure, thanks. Once done, do ping me and vinoth for review. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org