Hi Gaurav,

Currently Celeborn doesn't compare the row counts read and written.
However, Celeborn integrates with Spark's metrics, so you can check
them through the Spark UI. I also think it's possible to add an
integrity check based on those metrics.
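
For reference, here is a rough sketch of what such a metrics-based check
could look like with a plain Spark listener (illustrative only, the
listener and the final comparison are not something Celeborn ships today):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    import java.util.concurrent.atomic.LongAdder

    // Illustrative sketch: sum shuffle records written by map tasks and
    // read by reduce tasks, then compare the totals after the job.
    class ShuffleRecordCountListener extends SparkListener {
      val written = new LongAdder
      val read = new LongAdder

      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        val m = taskEnd.taskMetrics
        if (m != null) {
          written.add(m.shuffleWriteMetrics.recordsWritten)
          read.add(m.shuffleReadMetrics.recordsRead)
        }
      }
    }

    // Usage, assuming a single shuffle and no stage retries:
    //   val listener = new ShuffleRecordCountListener
    //   spark.sparkContext.addSparkListener(listener)
    //   ... run the job ...
    //   require(listener.written.sum() == listener.read.sum(),
    //     s"written ${listener.written.sum()} != read ${listener.read.sum()}")

Note that with multiple shuffle stages or stage retries the two totals
will not match exactly, so a real check would have to scope the counts
to a single shuffle.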

Regards,
Keyong Zhou

Ethan Feng <[email protected]> wrote on Fri, Oct 18, 2024, at 18:35:

> Hi Gaurav,
>
> I hope this message finds you well.
>
> As you may have read in the provided link, Celeborn has successfully
> implemented exactly-once processing for data batches. To ensure data
> integrity within these batches, the shuffle data is compressed. If any
> issues arise with the shuffle data, decompression will fail, and the
> client will be notified, ensuring that only correct data batches are
> processed.
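>
> A toy illustration of that principle, assuming lz4-java is on the
> classpath (this is not Celeborn's actual codec or framing, just the
> general idea that corrupted compressed bytes usually fail to decode):
>
>     import net.jpountz.lz4.{LZ4Exception, LZ4Factory}
>
>     val factory = LZ4Factory.fastestInstance()
>     val original = Array.fill[Byte](4096)(42.toByte)
>     val compressed = factory.fastCompressor().compress(original)
>
>     // Flip one byte to simulate corruption in transit or on disk.
>     val i = compressed.length / 2
>     compressed(i) = (compressed(i) ^ 0xFF).toByte
>
>     try {
>       val restored = factory.fastDecompressor()
>         .decompress(compressed, original.length)
>       if (!java.util.Arrays.equals(restored, original)) {
>         // Rare case: decompression "succeeds" with wrong bytes, which
>         // is part of why an explicit checksum is still worth adding.
>         println("silent corruption")
>       }
>     } catch {
>       case e: LZ4Exception =>
>         println(s"corrupted batch rejected: ${e.getMessage}")
>     }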
>
> However, it's important to note that there is currently no data
> integrity check for the header of a data batch. To address this, we
> plan to implement a checksum feature [0] to provide comprehensive data
> validation.
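>
> As a rough sketch of the idea (purely illustrative, not the actual
> design for [0]), the writer would store a checksum of the header bytes
> and the reader would verify it before trusting the header:
>
>     import java.util.zip.CRC32
>
>     def headerChecksum(header: Array[Byte]): Long = {
>       val crc = new CRC32
>       crc.update(header, 0, header.length)
>       crc.getValue
>     }
>
>     // Reader side: recompute and compare before using the header.
>     def verifyHeader(header: Array[Byte], expected: Long): Unit = {
>       val actual = headerChecksum(header)
>       if (actual != expected) {
>         throw new java.io.IOException(
>           s"corrupted batch header: crc $actual != expected $expected")
>       }
>     }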
>
> If you have any further questions or need additional clarification on
> any specific checks, please don't hesitate to reach out.
>
>
> Ethan Feng
>
> [0] https://issues.apache.org/jira/browse/CELEBORN-894
>
> Gaurav Mittal <[email protected]> wrote on Thu, Oct 17, 2024, at 04:44:
>
> >
> > Hi Celeborn devs,
> >
> > I am trying to better understand the end-to-end data integrity checks
> > that exist in Celeborn today.
> > * I saw some details about the invariants that allow for Exactly Once
> > behavior here:
> > https://celeborn.apache.org/docs/latest/developers/faulttolerant/#exactly-once
> > * Are there other checks that help guarantee data correctness, such as
> > row count validation, i.e. that the total number of rows read by the
> > reducers for a partition equals the number of rows written by the
> > mappers for that partition?
> >
> > Thanks
> > Gaurav
>
