Re: [Sks-devel] Clocks, timers, PTree and wiki advice
On 2012-03-25 at 03:01 -0500, John Clizbe wrote:
> Phil Pennock wrote:
>> My best guess, which is *only* a guess and I haven't had time to
>> investigate and so I didn't provide it on the page, is that the
>> timestamp is being put into a key somewhere, so when you get two
>> events with the same time, you have two items in a log and one in
>> the tree and the consistency check at the SKS level discovers that
>> one of the two logged items isn't in the tree, and things bail on
>> the corruption.
>
> Not certain of the cause either, but I see the same error on Windows,
> which is coarsely clocked, when trying to get SKS to run on top of
> Cygwin.

I did briefly look at the code after writing the mail. It's been too long since I learnt the little bit of O'Caml I needed to hack on SKS, so I've not picked it apart very far ...

But it looks as though SKS maintains an in-memory copy of an entry and an on-disk copy, and tries to keep the two sets in sync. One is a list; I think the one on disk was a binary tree?

So yes: SKS sees two entries in the list in memory (morally the log in what I wrote, but really a cache), but only one can be found/deleted on disk, therefore the disk version is deemed corrupt.

Thus SKS's key indexing system fundamentally requires a clock granular enough that two entries never occur at the same timestamp.

I suspect that the best fix is to use side-effect programming: store the timestamp of the last generated key index, and if the new timestamp is <= the previous timestamp, then take the previous timestamp, add 1 nanosecond, use that, and store it back as the new previous timestamp. This will fail when we have systems with instruction cycles measured in picoseconds and clocks of matching granularity, but then we can just switch to adding 1 picosecond instead. ;)

The less-than check also protects against clocks being stepped backward and replaying, with any conflicts that might arise then; with a leap second coming up in June, systems which replay the second instead of slowing time are especially likely to benefit from this protection.

Note that here I am back-seat programming, as I don't intend to provide the patch myself. Sorry. Hopefully the reasoning and suggestion are of merit.

-Phil

___
Sks-devel mailing list
Sks-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/sks-devel
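The fix suggested in the message above (remember the last timestamp handed out, and bump it whenever the clock fails to advance) can be sketched roughly as follows. This is only an illustration in Python, not SKS code: SKS is written in OCaml, and the class name `UniqueTimestamps`, the `STEP_NS` constant and the injectable `clock` parameter are all hypothetical names chosen for this sketch.

```python
import time


class UniqueTimestamps:
    """Hand out strictly increasing timestamps, even on a coarse or stepped clock."""

    STEP_NS = 1  # assumed bump size; the mail suggests adding 1 nanosecond

    def __init__(self, clock=time.time_ns):
        self._clock = clock    # injectable so that a coarse clock can be simulated
        self._previous = None  # last timestamp handed out, if any

    def next(self):
        now = self._clock()
        if self._previous is not None and now <= self._previous:
            # The clock did not advance, or was stepped backward:
            # move just past the last value that was handed out.
            now = self._previous + self.STEP_NS
        self._previous = now
        return now


# Simulated coarse clock: the same tick three times, then a step backward.
ticks = iter([1000, 1000, 1000, 999, 2000])
stamps = UniqueTimestamps(clock=lambda: next(ticks))
issued = [stamps.next() for _ in range(5)]
print(issued)  # [1000, 1001, 1002, 1003, 2000]
```

The sketch keeps its state in a single object and is not thread-safe; a real patch would also have to decide whether the last timestamp needs to survive a restart, which the message does not address.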
Re: [Sks-devel] Clocks, timers, PTree and wiki advice
On 24.03.2012 at 23:26, Phil Pennock wrote:
> Virtual Machine issues
>
> There are some issues with clock-keeping mechanisms in some virtual
> machines (VMs) affecting the Berkeley DB used for PTrees; if the
> clock resolution is too low, multiple entries occur at the same
> timestamp and the DB becomes corrupted.

After moving to the tsc clocksource, the PTree problem seems to be gone :-) This may save some people a few sleepless nights...

> [...] kernels locking up on SMP instances. If running SKS in a VM
> instance, you should probably constrain it to a single CPU.

That is not true for KVM virtualization; my VM guest uses multiple CPUs without problems. SKS seems to be stable now.

> Does this make sense to the folks who've encountered and fixed this
> problem? Is it accurate?

Is this problem really caused by Berkeley DB? The error message should tell as much as possible about the cause.

> ...issue. I also have usually run systems on bare metal, rather than
> in VMs, so this is beyond my expertise. Folks?

If people do so, I don't want to ask them why, because they (hopefully) have a good reason for doing so. In my case I have big iron in a hosting center, and this host has to run several different operating systems.

Christian
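To check whether a given guest is affected by a coarse clock, something like the following Python sketch might help. It reads the active Linux clocksource from sysfs and samples the wall clock to see how often two consecutive readings are identical. The function names are hypothetical, and which clock call SKS itself uses is not established in this thread, so `time.time` is only a stand-in.

```python
import time
from pathlib import Path

# Linux sysfs entry for the active kernel clocksource; absent on other systems.
CLOCKSOURCE = Path("/sys/devices/system/clocksource/clocksource0/current_clocksource")


def current_clocksource():
    """Return the active kernel clocksource name, or None if it cannot be read."""
    try:
        return CLOCKSOURCE.read_text().strip()
    except OSError:
        return None


def observe_clock(samples=100_000, clock=time.time):
    """Return (smallest positive step seen, number of identical consecutive readings)."""
    repeats = 0
    smallest = None
    previous = clock()
    for _ in range(samples):
        now = clock()
        delta = now - previous
        if delta == 0:
            repeats += 1  # two readings in a row with the same timestamp
        elif delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = now
    return smallest, repeats


print("clocksource:", current_clocksource())
step, repeats = observe_clock()
print("smallest observed step (s):", step)
print("identical consecutive readings:", repeats)
```

A large count of identical consecutive readings would be consistent with the low-resolution clock described above, though it does not by itself show that this is what upsets SKS.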
Re: [Sks-devel] Clocks, timers, PTree and wiki advice
On Mar 25, 2012, at 2:07 AM, Phil Pennock wrote:
> On 2012-03-24 at 21:08 -0400, Jeffrey Johnson wrote:
>> I question the analysis (but not what is observed).
>
> I do too. I've no idea why this is happening, which is why the
> summary is so vague. But I'm not an SKS developer.
>
> My best guess, which is *only* a guess and I haven't had time to
> investigate and so I didn't provide it on the page, is that the
> timestamp is being put into a key somewhere, so when you get two
> events with the same time, you have two items in a log and one in
> the tree and the consistency check at the SKS level discovers that
> one of the two logged items isn't in the tree, and things bail on
> the corruption.
>
> Absent investigating that, I summarised the view expressed on the
> list by those experiencing the problem.

Apologies: I did not mean to blame anyone. I'd just like to see accuracy. I'm a bit overly sensitive to seeing obscure problems blamed on Berkeley DB.

(aside) Diagnosing problems given only a corruption hint in a problem report has been at least a weekly experience for me for years because of RPM.

Don't take the comments personally: documenting how to change the time source in a VM is a reasonable approach to running SKS in a VM instance.

73 de Jeff