vikaskr22 commented on PR #569:
URL: https://github.com/apache/ranger/pull/569#issuecomment-3116762042

   @fateh288, thanks for the implementation.
   
   The following are review comments on the design, not on the implementation.
   
   Based on one of our discussions and my understanding, consider the following scenario:
   
   There is a change in the format of the data in one particular column, say Table1.Col1. Under the current approach, during the schema upgrade a new column of the required type (say col2) is created and the older data is copied into it after transformation. Now we may have two scenarios:
   1. New binaries are not yet applied, meaning reads/writes continue against the old column. How are you handling the case where the application updates a row after its value has already been copied from col1 to col2? If we use a trigger on each data modification to copy the value to the new column, then we have multiple trigger executions for the same column (if multiple modifications happen), and these triggers may also overlap with the job/cursor that is copying old-format data to the new column.
   
   This scenario poses two risks:
   1. Trigger executions (due to multiple updates on the same column) may be processed in a different order, and older data may be written to the new column; the sketch below illustrates this.
   2. What if some trigger processing/execution fails? We may log and report this, but do we have any systematic way (out of the box, from the framework itself) to detect this and retry?
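   To make risk 1 concrete, here is a minimal JDBC sketch of one backfill step. `table1`, `col1`, `col2`, and the `transform` conversion are the illustrative names from the scenario above, not actual Ranger schema; the comment marks the window in which a concurrent application write (and its copy trigger) can land, after which the backfill clobbers `col2` with a stale value.
   
   ```java
   import java.sql.Connection;
   import java.sql.PreparedStatement;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   
   // Sketch of risk 1: the backfill cursor snapshots col1, a concurrent
   // application write (and its copy trigger) lands in between, and the
   // backfill then overwrites col2 with the transform of the older value.
   public class BackfillRace {
   
       static String transform(String oldFormat) {
           return oldFormat == null ? null : oldFormat.toUpperCase(); // placeholder conversion
       }
   
       static void backfillRow(Connection conn, long id) throws SQLException {
           String oldValue;
           try (PreparedStatement ps = conn.prepareStatement(
                   "SELECT col1 FROM table1 WHERE id = ?")) {
               ps.setLong(1, id);
               try (ResultSet rs = ps.executeQuery()) {
                   oldValue = rs.next() ? rs.getString(1) : null;
               }
           }
           // <-- window: if the application updates col1 here and its trigger
           //     refreshes col2, the write below replaces col2 with stale data.
           try (PreparedStatement ps = conn.prepareStatement(
                   "UPDATE table1 SET col2 = ? WHERE id = ?")) {
               ps.setString(1, transform(oldValue));
               ps.setLong(2, id);
               ps.executeUpdate();
           }
       }
   }
   ```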
   
   I can think of one solution that may work, but please let me know whether it fits into the current approach:
   
   Step 1: Apply the schema changes, i.e., create the new column.
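   A minimal sketch of that step over JDBC, using the illustrative names from the scenario; keeping the new column nullable is what lets old binaries keep running untouched:
   
   ```java
   import java.sql.Connection;
   import java.sql.SQLException;
   import java.sql.Statement;
   
   public class ZduSchemaStep {
       static void addNewColumn(Connection conn) throws SQLException {
           try (Statement st = conn.createStatement()) {
               // Additive, nullable column: old binaries keep reading/writing
               // col1 and never notice that col2 exists.
               st.execute("ALTER TABLE table1 ADD COLUMN col2 VARCHAR(4000)");
           }
       }
   }
   ```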
   Step 2: Using dynamic configuration (which needs to be implemented), let the running instances know that the ZDU process has started. In that case they will keep writing to the old column in the old format, and additionally they will write an event into one table (say a ZDU upgrade table, a new table). Only after writing to both places will the transaction be successful.
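   A sketch of that dual write, assuming a hypothetical zdu_upgrade_event table with an auto-increment `id`. The old-column update and the event insert share one transaction, so either both land or neither does:
   
   ```java
   import java.sql.Connection;
   import java.sql.PreparedStatement;
   import java.sql.SQLException;
   
   public class ZduDualWrite {
   
       static void writeWithEvent(Connection conn, long rowId, String oldFormatValue)
               throws SQLException {
           boolean previousAutoCommit = conn.getAutoCommit();
           conn.setAutoCommit(false);
           try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE table1 SET col1 = ? WHERE id = ?");
                PreparedStatement evt = conn.prepareStatement(
                    "INSERT INTO zdu_upgrade_event(table_name, row_id, old_value) VALUES (?, ?, ?)")) {
               // Write in the old format, exactly as the old binaries expect.
               upd.setString(1, oldFormatValue);
               upd.setLong(2, rowId);
               upd.executeUpdate();
   
               // Record the same change as a ZDU event for later migration.
               evt.setString(1, "table1");
               evt.setLong(2, rowId);
               evt.setString(3, oldFormatValue);
               evt.executeUpdate();
   
               conn.commit(); // the transaction succeeds only if both writes did
           } catch (SQLException e) {
               conn.rollback();
               throw e;
           } finally {
               conn.setAutoCommit(previousAutoCommit);
           }
       }
   }
   ```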
   
   Step 3: As part of your framework that contains the logic to migrate data from col1 to col2, your code/cursor should read the events in insertion order and process them, and only once an event has been processed should it be deleted from the table. If any runtime error occurs, the event has not been deleted, so it will be retried.
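   A sketch of that drain loop under the same assumed zdu_upgrade_event table: events are read in insertion order via the auto-increment `id`, and each event row is deleted in the same transaction as the copy into col2, so a failure leaves the row behind to be retried on the next pass.
   
   ```java
   import java.sql.Connection;
   import java.sql.PreparedStatement;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   import java.util.ArrayList;
   import java.util.List;
   
   public class ZduEventDrainer {
   
       // Mirrors the hypothetical zdu_upgrade_event columns.
       private static final class Event {
           final long id;
           final long rowId;
           final String oldValue;
           Event(long id, long rowId, String oldValue) {
               this.id = id; this.rowId = rowId; this.oldValue = oldValue;
           }
       }
   
       static String transform(String oldFormat) {
           return oldFormat == null ? null : oldFormat.toUpperCase(); // placeholder conversion
       }
   
       static int drainBatch(Connection conn, int batchSize) throws SQLException {
           // 1. Read a batch of events in insertion order (auto-increment id).
           List<Event> batch = new ArrayList<>();
           try (PreparedStatement sel = conn.prepareStatement(
                   "SELECT id, row_id, old_value FROM zdu_upgrade_event ORDER BY id LIMIT ?")) {
               sel.setInt(1, batchSize);
               try (ResultSet rs = sel.executeQuery()) {
                   while (rs.next()) {
                       batch.add(new Event(rs.getLong("id"), rs.getLong("row_id"),
                                           rs.getString("old_value")));
                   }
               }
           }
           // 2. Copy each value to col2 and delete the event in one transaction.
           conn.setAutoCommit(false);
           int processed = 0;
           for (Event e : batch) {
               try (PreparedStatement upd = conn.prepareStatement(
                        "UPDATE table1 SET col2 = ? WHERE id = ?");
                    PreparedStatement del = conn.prepareStatement(
                        "DELETE FROM zdu_upgrade_event WHERE id = ?")) {
                   upd.setString(1, transform(e.oldValue));
                   upd.setLong(2, e.rowId);
                   upd.executeUpdate();
                   del.setLong(1, e.id);
                   del.executeUpdate();
                   conn.commit(); // the delete is visible only after the copy succeeded
                   processed++;
               } catch (SQLException ex) {
                   conn.rollback(); // the event survives and will be retried
                   throw ex;
               }
           }
           return processed;
       }
   }
   ```
   
   Ordering by a single auto-increment id gives the total order that overlapping per-row triggers cannot guarantee, and the leftover rows double as the retry queue.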
   
   This approach ensures events are processed in the order in which they occurred. And, out of the box, the framework would have a way to know whether all of the migration is done or not.
   
   We should also consider adding one step to the "Finalisation Step": checking that this new event table is empty.
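   That check can be a short gate against the same hypothetical event table, failing finalisation while anything is still pending:
   
   ```java
   import java.sql.Connection;
   import java.sql.ResultSet;
   import java.sql.SQLException;
   import java.sql.Statement;
   
   public class ZduFinalizer {
   
       static void assertEventTableEmpty(Connection conn) throws SQLException {
           try (Statement st = conn.createStatement();
                ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM zdu_upgrade_event")) {
               rs.next();
               long pending = rs.getLong(1);
               if (pending != 0) {
                   throw new IllegalStateException(
                           pending + " ZDU events still pending; finalisation blocked");
               }
           }
       }
   }
   ```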
   
   This is just an idea; feel free to add your input if the above scenario can be handled in a different or better way.
   
   
   

