Jackie-Jiang commented on pull request #8413:
URL: https://github.com/apache/pinot/pull/8413#issuecomment-1079496589


   > The problem that we are running into is that for GDPR etc., we need to be 
able to purge records from a segment based on values of a particular field and 
if we change the name of the field that is being ingested into Pinot, then we 
lose information that column 'x' in the Pinot table actually came from field 
'y' in the Kafka event / avro schema and hence cannot purge records 
automatically based on original avro schema field name 'y' in minion.
   
   If you transform the value within column 'x', then even if you can find the 
column, the value is no longer the original value, so how do you apply the 
purge logic?
   Also, even if you can modify the value properly, generating the new segment 
runs the records through the record transformer again, which applies the 
transform a second time. If the record transform step is skipped, then there 
is no guarantee that the value type is correct.
   Making all transforms idempotent would make this much more robust.
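
   To make the "transform twice" concern concrete, here is a minimal sketch in 
plain Java. It does not use any Pinot API; the class name, the two transforms, 
and the `isIdempotentFor` helper are all hypothetical and only illustrate what 
happens when a segment's records are fed through the same transform a second 
time.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: none of these names come from Pinot.
class TransformIdempotenceSketch {

    // Assumed non-idempotent transform: epoch seconds -> epoch milliseconds.
    // Applying it to an already converted value multiplies by 1000 again.
    static final UnaryOperator<Long> SECONDS_TO_MILLIS = v -> v * 1000L;

    // Assumed idempotent transform: trim and lower-case an identifier.
    // Applying it to an already normalized value changes nothing.
    static final UnaryOperator<String> NORMALIZE_ID =
            v -> v.trim().toLowerCase(Locale.ROOT);

    // Returns true when a second application leaves the value unchanged,
    // i.e. f(f(x)) equals f(x) for this particular input.
    static <T> boolean isIdempotentFor(UnaryOperator<T> transform, T input) {
        T once = transform.apply(input);   // value stored at first ingestion
        T twice = transform.apply(once);   // value after re-processing the segment
        return once.equals(twice);
    }

    public static void main(String[] args) {
        // Re-processing corrupts the timestamp: prints false.
        System.out.println(isIdempotentFor(SECONDS_TO_MILLIS, 1_648_000_000L));
        // Re-processing keeps the id stable: prints true.
        System.out.println(isIdempotentFor(NORMALIZE_ID, "  User-42 "));
    }
}
```

   A check like this only covers the inputs it is given, so it can show that a 
transform is not idempotent but cannot prove that one is.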
   
   > Definitely open to suggestions and discussion, but my understanding is 
that ingestion transform functions are applied only during ingestion where the 
original field is in kafka/avro and the transformed value goes into Pinot 
column, so this should be safe right? If you have any particular usecase that 
may not be safe I can try them out?
   
   Ingestion transforms can be used during ingestion and also during reload to 
generate derived columns. Also, on the minion side, the segment can be read 
as a source file and fed into the ingestion engine again, which may transform 
the records again. We have to take extra care to get this right if the 
transform is not idempotent.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


