Re: [Sks-devel] Clocks, timers, PTree and wiki advice

2012-03-26 Thread Phil Pennock
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

On 2012-03-25 at 03:01 -0500, John Clizbe wrote:
 Phil Pennock wrote:
  My best guess, which is *only* a guess and I haven't had time to
  investigate and so I didn't provide it on the page, is that the
  timestamp is being put into a key somewhere, so when you get two events
  with the same time, you have two items in a log and one in the tree and
  the consistency check at the SKS level discovers that one of the two
  logged items isn't in the tree, and things bail on the corruption.

 Not certain of the cause either, but I see the same error on Windows, which is
 coarsely clocked, when trying to get SKS to run on top of Cygwin.

I did briefly look at the code after writing the mail.  It's been too
long since I learnt the little bit of O'Caml I learnt to hack on SKS, so
I've not picked it apart very far ...

But it looks as though SKS maintains an in-memory copy of an entry and
an on-disk copy, and tries to keep the two sets in sync.  One is a list,
I think the one on disk was a binary tree?

So yes: SKS sees two entries in the list in memory (morally the log in
what I wrote, but really a cache), but only one can be found/deleted on
disk, therefore the disk version is deemed corrupt.
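A toy model of the failure mode described above may help. It assumes, as the guess above does (this is not confirmed from the SKS source), that the on-disk index is keyed by timestamp alone. The names `add_and_reconcile` and `CorruptionError` are hypothetical, and a Python dict stands in for the Berkeley DB tree:

```python
class CorruptionError(Exception):
    """Raised when the in-memory list and the on-disk index disagree."""


def add_and_reconcile(events):
    """Hypothetical model: events is a list of (timestamp, key_id) pairs."""
    memory_log = []  # in-memory list: keeps every entry, duplicates included
    disk_tree = {}   # stand-in for the on-disk index, keyed by timestamp only
    for ts, key_id in events:
        memory_log.append((ts, key_id))
        # A second entry with the same timestamp silently overwrites the first.
        disk_tree[ts] = key_id
    # Consistency pass: every logged entry must be deletable from the disk copy.
    for ts, key_id in memory_log:
        if ts not in disk_tree:
            raise CorruptionError(f"entry at timestamp {ts!r} missing on disk")
        del disk_tree[ts]
    return len(memory_log)
```

With distinct timestamps the two copies agree. With two events at the same timestamp, the first deletion succeeds and the second finds nothing, which is reported as corruption even though no data on disk was damaged.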

Thus SKS's key indexing system fundamentally requires a clock granular
enough that no two entries ever occur at the same timestamp.  I suspect
that the best fix is to use side-effect programming: store the timestamp
of the last generated key index, and if the new timestamp is <= the
previous timestamp, take the previous timestamp, add 1 nanosecond, use
that, and store it back as the new previous timestamp.

This will fail when we have systems with instruction cycles measured in
picoseconds and clocks of matching granularity, but then we can just
switch to adding 1 picosecond instead.  ;)

The less-than part of the check also protects against clocks being
stepped backward and replaying timestamps, along with any conflicts that
might arise then; with a leap second coming up in June, systems which
replay the second instead of slowing time are especially likely to
benefit from this protection.
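The proposed fix, including the less-than-or-equal check, can be sketched as follows. This is an illustration in Python, not an SKS patch (SKS itself is O'Caml). The class `UniqueTimestamps` and its `issue` method are hypothetical names, and timestamps are assumed to be integer nanoseconds:

```python
import time


class UniqueTimestamps:
    """Hand out strictly increasing timestamps, in integer nanoseconds.

    Remembers the last timestamp issued; if the clock returns a value that is
    <= that one (coarse clock, or clock stepped backward), it issues
    previous + 1 ns instead.  No locking: wrap issue() in a lock if it is
    called from several threads.
    """

    def __init__(self, clock=time.time_ns):
        self._clock = clock  # injectable so a coarse clock can be simulated
        self._last = None    # the "side effect": last timestamp handed out

    def issue(self):
        now = self._clock()
        if self._last is not None and now <= self._last:
            now = self._last + 1  # bump 1 ns past the previous value
        self._last = now
        return now
```

For example, a simulated clock returning 100, 100, 100, 90, 200 yields 100, 101, 102, 103, 200. The repeated reads and the backward step are pushed forward, and ordinary clock values are used again once the clock passes the last value issued. Integer nanoseconds are used deliberately. If timestamps were floating-point seconds since the epoch, adding 1e-9 would simply be rounded away, because a double resolves only about 2.4e-7 s at values around 1.3e9.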

Note that here I am back-seat programming as I don't intend to provide
the patch myself.  Sorry.  Hopefully the reasoning and suggestion are of
merit.

- -Phil
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAk9wUpAACgkQQDBDFTkDY38UtwCfXSdgv4Ehj6C4g3nD/qXF0YmK
jfQAn17t6nynNwVwP1+0Xcf2hE1A811i
=yKbq
-----END PGP SIGNATURE-----

___
Sks-devel mailing list
Sks-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/sks-devel


Re: [Sks-devel] Clocks, timers, PTree and wiki advice

2012-03-25 Thread Christian Felsing
On 24.03.2012 23:26, Phil Pennock wrote:
   Virtual Machine issues
 
   There are some issues with clock-keeping mechanisms in some virtual
   machines (VMs) affecting the Berkeley DB used for PTrees; if the clock
   resolution is too low, multiple entries occur at the same timestamp
   and the DB becomes corrupted.
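To check whether a machine's clock is coarse enough for this to happen, a rough diagnostic is to count how often back-to-back clock reads return identical values. The sketch below is a hypothetical helper, not part of SKS. Python's `time.time()` may not read the clock the same way SKS's O'Caml runtime does, so treat the result as indicative only:

```python
import time


def duplicate_reads(samples=100_000, clock=time.time):
    """Count how often consecutive clock reads return the same value."""
    dupes = 0
    prev = clock()
    for _ in range(samples):
        now = clock()
        if now == prev:
            dupes += 1  # two reads landed on the same tick
        prev = now
    return dupes
```

A large count relative to `samples` suggests that two events in quick succession could share a timestamp. `time.get_clock_info("time").resolution` reports the resolution Python believes the clock has, which can be compared against the empirical result.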

After moving to the tsc clocksource, the PTree problem seems to be gone :-)
This may save some people a few sleepless nights...
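The message does not say how the clocksource was switched. On Linux, the kernel reports the active and available clocksources through sysfs. As root, a new one can be selected by writing its name to `current_clocksource`, or set at boot with the `clocksource=` kernel parameter. Whether tsc is a sound choice depends on the host CPU and hypervisor. The sketch below only reads the current state; `clocksource_status` is a hypothetical helper name:

```python
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(base=SYSFS_DIR):
    """Return (current, available) clocksources as reported by the kernel."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

On a KVM guest this might report something like `kvm-clock` as current, with `tsc` among the available options. The `base` parameter exists only so the function can be pointed at a copy of those files.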

   kernels locking up on SMP instances. If running SKS in a VM instance,
   you should probably constrain it to a single CPU.

That is not true for KVM virtualization; my VM guest uses multiple CPUs
without problems. SKS seems to be stable now.

 Does this make sense to the folks who've encountered and fixed this
 problem?  Is it accurate?

Is this problem really caused by Berkeley DB? The error message should say
as much as possible about the cause.

 ...issue.  I also have usually run systems on bare metal, rather than in
 VMs, so this is beyond my expertise.  Folks?

If people do so, I would not ask them why, because they (hopefully) have
a good reason for doing so. In my case I have a big iron server in a
hosting center, and this host has to run several different operating
systems.

Christian



Re: [Sks-devel] Clocks, timers, PTree and wiki advice

2012-03-25 Thread Jeffrey Johnson

On Mar 25, 2012, at 2:07 AM, Phil Pennock wrote:

 On 2012-03-24 at 21:08 -0400, Jeffrey Johnson wrote:
 I question the analysis (but not what is observed).
 
 I do too.  I've no idea why this is happening, which is why the summary
 is so vague.  But I'm not an SKS developer.
 
 My best guess, which is *only* a guess and I haven't had time to
 investigate and so I didn't provide it on the page, is that the
 timestamp is being put into a key somewhere, so when you get two events
 with the same time, you have two items in a log and one in the tree and
 the consistency check at the SKS level discovers that one of the two
 logged items isn't in the tree, and things bail on the corruption.
 
 Absent investigating that, I summarised the view expressed on the list
 by those experiencing the problem.
 

Apologies: I did not mean to blame anyone. I'd just like to see accuracy.

I'm a bit overly sensitive to seeing obscure problems blamed on Berkeley DB.

(aside)
Diagnosing problems given only a corruption hint in a problem report
has been at least a weekly experience for me for years because of RPM.

Don't take the comments personally: documenting how to change the time
source in a VM is a reasonable approach to running SKS in a VM instance.

73 de Jeff
