Re: Are segment files identical in different replicas?
Hi Ivan,

I think they are completely identical at the byte level. The format is the same whether records are persisted on disk, produced, or consumed, which is why Kafka can use "zero copy" to send messages to consumers. As we know, replication behaves the same way as consumption.

You can use a tool to verify this: bin/kafka-run-class.sh kafka.tools.DumpLogSegments. It lets you check the magic byte, checksum, etc. of a segment.

Best,
Lisheng

Ivan Yurchenko wrote on Mon, 2 Sep 2019 at 5:58 PM:
> [...]
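To illustrate the kind of per-batch checks Lisheng mentions (magic byte and checksum), here is a minimal Python sketch. It is not DumpLogSegments itself. It assumes an uncompacted segment whose batches all use magic 2 (the DefaultRecordBatch layout: baseOffset int64, batchLength int32, partitionLeaderEpoch int32, magic int8, crc uint32, then attributes onward) and that each batch's stored CRC-32C covers the bytes from the attributes field to the end of the batch. The names crc32c and check_batches are hypothetical.

```python
import struct
from typing import Iterator, Tuple

# Assumed v2 (magic = 2) record batch header, big-endian, positions relative to batch start:
# 0 baseOffset int64 | 8 batchLength int32 | 12 partitionLeaderEpoch int32 |
# 16 magic int8 | 17 crc uint32 | 21 attributes int16 | ... rest of header and records
LOG_OVERHEAD = 12          # baseOffset (8) + batchLength (4)
MAGIC_POS = 16             # position of the magic byte
CRC_POS = 17               # position of the crc field
CRC_COVERAGE_START = 21    # CRC-32C covers attributes .. end of batch


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def check_batches(segment: bytes) -> Iterator[Tuple[int, int, int, bool]]:
    """Yield (file_position, base_offset, magic, crc_ok) for each complete batch."""
    pos = 0
    while pos + LOG_OVERHEAD <= len(segment):
        base_offset, batch_length = struct.unpack_from(">qi", segment, pos)
        end = pos + LOG_OVERHEAD + batch_length
        if end > len(segment):
            break  # stop at a truncated trailing batch
        magic = segment[pos + MAGIC_POS]
        (stored_crc,) = struct.unpack_from(">I", segment, pos + CRC_POS)
        crc_ok = stored_crc == crc32c(segment[pos + CRC_COVERAGE_START:end])
        yield pos, base_offset, magic, crc_ok
        pos = end
```

Pass it the bytes of a .log file, e.g. open(path, "rb").read(). Because the sketch hard-codes the v2 header positions, results for older magic values would be meaningless.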
Re: Are segment files identical in different replicas?
Hi Lisheng,

Thank you.

By "byte level" I literally mean the bytes stored on disk.

I'm looking into scenarios like this: in a cluster of Kafka brokers, a topic is replicated in N replicas. Messages are being produced to the topic. At some point, a segment, e.g. 0123, overflows and becomes inactive, and the log file 0123.log is closed on all brokers. Let's assume the same range of offsets is written to this segment on all the brokers. So I wonder whether these files are identical at the byte level, that is, in a literal byte-by-byte comparison of the files.

Let's say we stick to magic number 2 or higher, Kafka version 2.3 or higher, and, as I mentioned previously, ignore compaction.

I looked into the replication code and the DefaultRecordBatch and DefaultRecord implementations, and it seems they should be, since the format is rather straightforward. I also did some tests, and it seems to be true, at least for a normally operating cluster. But I'd like to tap into the community's knowledge on this.

Br,
Ivan

On Mon, 2 Sep 2019 at 05:52, Lisheng Wang wrote:
> [...]
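Ivan's definition of "byte level" (a literal byte-by-byte comparison of the closed .log files) can be checked with a short Python sketch like the one below, assuming copies of the same segment from two brokers are available locally. The function name first_difference is hypothetical. Unlike a plain equality check, it reports where the files first diverge, which helps narrow down which batch differs if the answer to question 2 turns out to be no.

```python
from typing import BinaryIO, Optional


def first_difference(a: BinaryIO, b: BinaryIO, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the first byte position where two binary streams differ, or None if identical.

    A length mismatch counts as a difference at the end of the shorter stream.
    """
    pos = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a != chunk_b:
            # Locate the exact differing byte within this chunk.
            for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                if x != y:
                    return pos + i
            # One chunk is a prefix of the other: the streams differ in length.
            return pos + min(len(chunk_a), len(chunk_b))
        if not chunk_a:
            return None  # both streams ended at the same point with no difference
        pos += len(chunk_a)
```

Open both copies with open(path, "rb") and pass the file objects; None means the files are identical. The standard library's filecmp.cmp(path_a, path_b, shallow=False) gives the same yes/no answer without the position.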
Re: Are segment files identical in different replicas?
Hi Ivan,

Your assumption about question 1 is true: they will have the same records in the same order.

About question 2, I'm not following what you mean by "byte level". Could you please explain in more detail?

Best,
Lisheng

Ivan Yurchenko wrote on Fri, 30 Aug 2019 at 8:39 PM:
> [...]
Are segment files identical in different replicas?
Hi,

Let's say I have a topic-partition replicated to several replicas. On each replica there is a segment of this topic-partition containing records with offsets N..M. I'm trying to figure out:
1. Will the content of these segment files be identical at the logical level? I.e., will they contain the same records in the same order, nothing skipped, nothing extra? (I assume this is true; it sounds pretty logical.)
2. Will the content of these segment files be identical at the byte level?
3. If the answer to 2 is no, will at least the record layout (positions) be identical?

Let's keep topic compaction out of consideration here.

Could you please help me figure this out?

Br,
Ivan
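The three questions can be separated by indexing each replica's copy of the segment batch by batch. The Python sketch below assumes magic 2 batches (lastOffsetDelta is the int32 at byte 23 of a batch) and no compaction; batch_index and compare_segments are hypothetical names. Note that "same_offsets" only compares batch boundaries and offset ranges, not record contents, so it is a weaker check than question 1 as phrased.

```python
import struct
from typing import Dict, List, Tuple

# Assumed v2 (magic = 2) batch layout, big-endian, positions relative to batch start:
# 0 baseOffset int64 | 8 batchLength int32 | 12 partitionLeaderEpoch int32 |
# 16 magic int8 | 17 crc uint32 | 21 attributes int16 | 23 lastOffsetDelta int32 | ...
LOG_OVERHEAD = 12              # baseOffset + batchLength
LAST_OFFSET_DELTA_POS = 23


def batch_index(segment: bytes) -> List[Tuple[int, int, int]]:
    """Return (file_position, first_offset, last_offset) for each complete batch."""
    index = []
    pos = 0
    while pos + LOG_OVERHEAD <= len(segment):
        base_offset, batch_length = struct.unpack_from(">qi", segment, pos)
        end = pos + LOG_OVERHEAD + batch_length
        if end > len(segment):
            break  # stop at a truncated trailing batch
        (last_delta,) = struct.unpack_from(">i", segment, pos + LAST_OFFSET_DELTA_POS)
        index.append((pos, base_offset, base_offset + last_delta))
        pos = end
    return index


def compare_segments(a: bytes, b: bytes) -> Dict[str, bool]:
    """Compare two replicas' copies of one segment at three levels of strictness."""
    ia, ib = batch_index(a), batch_index(b)
    return {
        # Weak proxy for question 1: same batch boundaries and offset ranges, same order
        # (record contents are NOT compared here).
        "same_offsets": [entry[1:] for entry in ia] == [entry[1:] for entry in ib],
        # Question 3: additionally, every batch starts at the same file position.
        "same_layout": ia == ib,
        # Question 2: the files are byte-for-byte identical.
        "same_bytes": a == b,
    }
```

If "same_layout" holds but "same_bytes" does not, the difference lies inside batches of equal size rather than in how records are laid out, which narrows the search considerably.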