Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
On Tue, Aug 2, 2022 at 6:44 AM Chitvan Chhabra wrote:
>
> Though I could be wrong here, what I understand is:
>
> After-rollback scenario:
>
> A (Primary, snapshot rolled back to, say, 12:10:00 PM), B (Secondary,
> snapshot rolled back to, say, 12:09:00 PM)

Yes.

> Current time, say: 12:30:00 PM
> Now A must have received acks (in the past, of course) from B between
> 12:09:00 and 12:10:00 PM. Now, at 12:09:01, B says "I don't have the data",
> which might have confused A, as it must be saying "you have already acked
> some data to me, so how can you now say that you don't have it?" Hence
> the error. This is just my thought. Or does DRBD support such a scenario? If
> yes, then that is awesome, as it prevents a complete resynchronization of data.

Normally, yes, this works and it's just as awesome as it sounds. But in this particular case, it's utterly broken. With the resource down on both nodes, trying to raise it on either basically wigs out and says "what is this block device?" Any calls to drbdadm, drbdmeta, etc., result in a couple dozen "extent beyond end of bitmap" error messages. Rolling back to an older snapshot results in the exact same error messages, which was quite unexpected.

> Anyway, with DRBD down, you can always get the data back from the ZVOL
> snapshot (or its clone) itself (assuming the DRBD metadata does not contain
> actual data?)

Yes, I was able to create a new DRBD resource and copy my data into it... but the initial sync for this resource takes 3-4 days... and I don't want to have to do that all the time.

-- 
Michael D Labriola
401-316-9844 (cell)
___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
On Tue, Aug 2, 2022 at 5:04 AM Joel Colledge wrote:
>
> Hi Michael,
>
> Are you using the most recent version of drbd-utils? There have been a
> few fixes over the years which might be related.

I was using 9.20.2 this last time. I'm fairly certain I've been using the focal ppa from linbit for the entire life of this particular system, so I've probably always been newer than the Ubuntu version.

> Perhaps the hardware problems affected the metadata long ago and now
> the corrupted metadata is present in all the snapshots.

Possible. But I'm fairly certain we recreated the DRBD resources from scratch (new meta-data, initial sync, etc.) after we fixed the problems... granted, I could still have problems. This particular system, for whatever reason, is cursed.

> If that is not the case, this looks to me more like a bug than a
> misunderstanding of how DRBD works. Are you able to reproduce the
> issue starting from a fresh volume? It could be that this particular
> combination of device size and bitmap slot count triggers a bug that
> no-one else has yet encountered. A reproducer would be necessary to
> work on fixing it.

Well, half of what I was looking for here was somebody else to tell me this is odd. I *should* be able to recover by rolling back to an old snapshot of the backing ZVOL on both nodes. I know I've done it for proof of concept and to roll back to fix "human error" type problems... This was the first time I've had to try to recover from something actually going wrong (from DRBD's standpoint).

For the record, I did not lose any data... I could still access the ZVOL directly (e.g., mounted EXT4) and rsync into a newly created DRBD resource... but this particular resource is large and takes 3-4 days to finish the initial sync. I'd obviously like to avoid that.

> Best regards,
> Joel

-- 
Michael D Labriola
401-316-9844 (cell)
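The recovery path described in this message (resource down on both nodes, backing ZVOL rolled back to an older snapshot on both nodes, resource brought up again) can be sketched as a dry-run plan. This is only an illustration under stated assumptions: the resource name `r0`, the dataset `tank/drbd-r0`, the snapshot name `known-good`, and the node names are hypothetical and do not come from the thread. The sketch only builds and prints the commands; it runs nothing. Note that `zfs rollback -r` destroys any snapshots newer than the target.

```python
import shlex


def rollback_plan(resource, zvol, snapshot, nodes):
    """Build an ordered list of (node, argv) steps. Nothing is executed."""
    steps = []
    # Take the DRBD resource down on every node before touching the backing device.
    for node in nodes:
        steps.append((node, ["drbdadm", "down", resource]))
    # Roll the backing ZVOL back to the chosen snapshot on every node.
    # -r also destroys snapshots more recent than the target.
    for node in nodes:
        steps.append((node, ["zfs", "rollback", "-r", f"{zvol}@{snapshot}"]))
    # Bring the resource back up on every node.
    for node in nodes:
        steps.append((node, ["drbdadm", "up", resource]))
    return steps


def render(steps):
    """Format each step as 'node: command line' for review."""
    return [f"{node}: {shlex.join(argv)}" for node, argv in steps]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    plan = rollback_plan("r0", "tank/drbd-r0", "known-good", ["alpha", "bravo"])
    for line in render(plan):
        print(line)
```

Keeping the plan as data makes it easy to review the exact order of operations on both nodes before anyone runs them by hand.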
Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
Though I could be wrong here, what I understand is:

After-rollback scenario:

A (Primary, snapshot rolled back to, say, 12:10:00 PM), B (Secondary, snapshot rolled back to, say, 12:09:00 PM)

Current time, say: 12:30:00 PM

Now A must have received acks (in the past, of course) from B between 12:09:00 and 12:10:00 PM. Now, at 12:09:01, B says "I don't have the data", which might have confused A, as it must be saying "you have already acked some data to me, so how can you now say that you don't have it?" Hence the error. This is just my thought. Or does DRBD support such a scenario? If yes, then that is awesome, as it prevents a complete resynchronization of data.

Anyway, with DRBD down, you can always get the data back from the ZVOL snapshot (or its clone) itself (assuming the DRBD metadata does not contain actual data?)
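The scenario above can be made concrete with a toy model. It is a sketch only: it does not model DRBD's own metadata or how DRBD decides what to resync, and the write log, block numbers, and the `at` helper are invented for illustration. It just lists the blocks that were written after the older rollback point and no later than the newer one, which are the blocks on which A and B would disagree.

```python
from datetime import datetime


def diverging_blocks(write_log, rollback_a, rollback_b):
    """Blocks written after the older rollback point and up to the newer one."""
    older, newer = sorted((rollback_a, rollback_b))
    return sorted({block for ts, block in write_log if older < ts <= newer})


def at(hhmmss):
    # Hypothetical helper: a time of day on 2 Aug 2022, the date of this thread.
    return datetime.strptime(f"2022-08-02 {hhmmss}", "%Y-%m-%d %H:%M:%S")


# Invented write log: (time of write, block number).
WRITE_LOG = [
    (at("12:08:30"), 3),   # before both rollback points: present on A and B
    (at("12:09:01"), 7),   # inside the window: only A still has it
    (at("12:09:40"), 3),   # block 3 rewritten inside the window
    (at("12:09:59"), 9),   # inside the window
    (at("12:15:00"), 11),  # after both rollback points: gone on A and B
]

if __name__ == "__main__":
    print(diverging_blocks(WRITE_LOG, at("12:10:00"), at("12:09:00")))
```

With this invented log the result is `[3, 7, 9]`. Block 3 shows why comparing "which blocks exist" is not enough: both nodes hold block 3, but with different contents.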
Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
On Tue, Aug 02, 2022 at 02:54:02PM +0530, Chitvan Chhabra wrote:
> Unable to see the older thread. Maybe it is just me. Request to share the
> older conversation as well, please.

We have an archive:
https://lists.linbit.com/pipermail/drbd-user/2022-July/026252.html
Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
Unable to see the older thread. Maybe it is just me. Request to share the older conversation as well, please.

Thanks and Regards,
Chitvan Chhabra
Re: [DRBD-user] corrupted resource can't be fixed be rolling back to old snapshot
Hi Michael,

Are you using the most recent version of drbd-utils? There have been a few fixes over the years which might be related.

Perhaps the hardware problems affected the metadata long ago and now the corrupted metadata is present in all the snapshots.

If that is not the case, this looks to me more like a bug than a misunderstanding of how DRBD works. Are you able to reproduce the issue starting from a fresh volume? It could be that this particular combination of device size and bitmap slot count triggers a bug that no-one else has yet encountered. A reproducer would be necessary to work on fixing it.

Best regards,
Joel