bvaradar commented on issue #1556:
URL: https://github.com/apache/incubator-hudi/issues/1556#issuecomment-620303776


   @HariprasadAllaka1612 I'm not sure I completely understand the context here. 
   
   A few questions about the steps you described:
   
   1. "Reading CDC table from hive (hoodie table) to get the latest marker":
        What do you mean by "marker"? Is it the Hudi commit time, or some timestamped directory that you are using as the input folder?
   2. "Read the files from S3 based on the latest marked read in step1":
        Are you reading the files directly, or by running an incremental query?
   
   In general, this could also be an eventual consistency issue. Does the path s3a://gat-datalake-refined-dev/reports/player/dat/2020/04/23 belong to the CDC table? Does it actually exist when you run `aws s3 ls`? Did the CDC pipeline run with the consistency guard enabled?
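
   As a rough sketch of the two checks above (not taken from the reporter's pipeline): the helper `s3_ls_command` is a hypothetical name that only builds the `aws s3 ls` command string for the path in question, and `hudi_options` assumes the consistency guard is switched on through the Hudi write config `hoodie.consistency.check.enabled`.

```python
from urllib.parse import urlparse


def s3_ls_command(path: str) -> str:
    """Build an `aws s3 ls` command for an s3a:// or s3:// path (hypothetical helper)."""
    parsed = urlparse(path)
    # The AWS CLI expects the s3:// scheme rather than Hadoop's s3a://.
    return f"aws s3 ls s3://{parsed.netloc}{parsed.path.rstrip('/')}/"


# Assumption: the CDC writer accepts Hudi options as a dict; this key enables
# the consistency guard for writes on eventually consistent storage.
hudi_options = {"hoodie.consistency.check.enabled": "true"}

print(s3_ls_command("s3a://gat-datalake-refined-dev/reports/player/dat/2020/04/23"))
```

   Running the printed command shows whether the partition path is actually visible in S3.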
   

