vikaskr22 commented on PR #569: URL: https://github.com/apache/ranger/pull/569#issuecomment-3116762042
@fateh288, thanks for the implementation. The following review comments are on the design, not on the implementation.

Based on our earlier discussion and my understanding, consider this scenario: the format of the data in one particular column changes, say Table1.Col1. Under the current approach, the schema upgrade creates a new column of the required type (say col2) and copies the old data into it after transformation. Now suppose the new binaries have not yet been applied, so reads/writes continue against the old column. How are you handling the case where the application updates a row after its value has already been copied from col1 to col2? If we use a trigger on each data modification to copy the value to the new column, then multiple modifications of the same column fire multiple triggers, and those triggers may also overlap with the job/cursor that is copying old-format data to the new column.

This scenario poses two risks:

1. Trigger executions for the same column (when multiple updates happen) may be processed in a different order, so older data may end up written to the new column.
2. A trigger execution may fail. We can log and report that, but do we have any systematic way, out of the box from the framework itself, to detect this and retry?

I can think of one solution that may work; please let me know if it fits the current approach:

1. Step 1: Apply the schema changes, i.e., create the new column.
2. Step 2: Using dynamic configuration (which needs to be implemented), let the running instances know that the ZDU process has started. While it is in progress, they keep writing to the old column in the old format and additionally write an event to a new table (say a ZDU upgrade-event table). Only after both writes succeed does the transaction commit.
3. Step 3: The part of your framework that migrates data from col1 to col2 should read the events in insertion order and delete each event only after it has been processed. If a runtime error occurs, the event has not been deleted, so it will be retried. This ensures events are processed in the order in which they occurred, and, out of the box, the framework has a way to know whether all migration is done.

We should also consider adding a check to the "Finalisation" step that this new event table is empty. A rough sketch of the event-driven part is appended below.

This is just an idea; feel free to add your input if the above scenario can be handled in a different or better way.
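To make Steps 2 and 3 concrete, here is a minimal JDBC sketch. Everything in it is illustrative only: the event table `x_zdu_event`, the `table1`/`col1`/`col2` names, and the `readCol1()`/`transform()` helpers are my assumptions, not anything from this PR, and real code would presumably go through Ranger's DAO layer rather than raw JDBC.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the proposed event-driven migration. Assumed schema:
 *   CREATE TABLE x_zdu_event (id BIGINT AUTO_INCREMENT PRIMARY KEY,
 *                             row_id BIGINT NOT NULL);
 */
public class ZduEventMigrator {

    /** Step 2: dual write. The old-format update and the event insert share
     *  one transaction, so either both are persisted or neither is. */
    static void writeOldFormat(Connection conn, long rowId, String oldValue) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement upd = conn.prepareStatement(
                     "UPDATE table1 SET col1 = ? WHERE id = ?");
             PreparedStatement evt = conn.prepareStatement(
                     "INSERT INTO x_zdu_event (row_id) VALUES (?)")) {
            upd.setString(1, oldValue);
            upd.setLong(2, rowId);
            upd.executeUpdate();
            evt.setLong(1, rowId);
            evt.executeUpdate();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    /** Step 3: drain events in insertion order. An event row is deleted in
     *  the same transaction that commits the transformed value, so any
     *  failure leaves the event behind to be retried on the next pass. */
    static void drainEvents(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        List<long[]> events = new ArrayList<>();   // each entry: {eventId, rowId}
        try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT id, row_id FROM x_zdu_event ORDER BY id");
             ResultSet rs = sel.executeQuery()) {
            while (rs.next()) {
                events.add(new long[] { rs.getLong(1), rs.getLong(2) });
            }
        }
        for (long[] e : events) {
            try (PreparedStatement copy = conn.prepareStatement(
                         "UPDATE table1 SET col2 = ? WHERE id = ?");
                 PreparedStatement del = conn.prepareStatement(
                         "DELETE FROM x_zdu_event WHERE id = ?")) {
                copy.setString(1, transform(readCol1(conn, e[1])));
                copy.setLong(2, e[1]);
                copy.executeUpdate();
                del.setLong(1, e[0]);
                del.executeUpdate();
                conn.commit();
            } catch (SQLException ex) {
                conn.rollback();
                throw ex;  // event row survives, so this entry is retried
            }
        }
    }

    /** Finalisation check: migration is complete only when no events remain. */
    static boolean isMigrationComplete(Connection conn) throws SQLException {
        try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT COUNT(*) FROM x_zdu_event");
             ResultSet rs = sel.executeQuery()) {
            rs.next();
            return rs.getLong(1) == 0;
        }
    }

    // Hypothetical helpers: read the old-format value and convert it.
    static String readCol1(Connection conn, long rowId) throws SQLException {
        try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT col1 FROM table1 WHERE id = ?")) {
            sel.setLong(1, rowId);
            try (ResultSet rs = sel.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }

    static String transform(String oldFormat) {
        return oldFormat; // placeholder for the real format conversion
    }
}
```

Deleting the event in the same transaction that commits col2 is what gives the retry guarantee: a crash anywhere before that commit leaves the event row intact, so the next drain pass picks it up again in id order. It also tolerates the race where a row is updated while its event is being processed, because the later update produces its own event, which replays the then-current col1 value afterwards.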