maheshguptags commented on issue #12738: URL: https://github.com/apache/hudi/issues/12738#issuecomment-2656023027
Hi

> do you mean after the checkpoint failed, records ingested after that will loss?

@cshuo Let's assume 10 million records are ingested into the job. While these records are being processed, the job manager may start a new Task Manager (TM) because of auto-scaling, or a TM may be removed manually (to test the scenario without auto-scaling). Either event can cause a checkpoint failure, and all the previously ingested data (the 10 million records) is then discarded.

If new data (e.g., 1 million records) is ingested after the checkpoint failure, that data is processed and written to Hudi successfully, provided the next checkpoint succeeds.

To summarize:
- Ingest 10M records → checkpoint fails (due to the TM change) → all of that data is discarded.
- Ingest 1M new records → checkpoint succeeds → only these 1M records are written to Hudi.

Thanks
Mahesh
