Re: dump to cgdNb device

2016-06-15 Thread Christos Zoulas
In article <20160615224955.GA2440@neva>,
Alexander Nasonov   wrote:
>Hi,
>
>I setup an encrypted disk cgd1 (aes-cbc 256 on top of wd0g, disklabel
>verification) with a dump device cgd1b but I can't dump to it (I enter
>ddb and type sync to dump). It prints "device bad".
>
>If I enable CGDDEBUG and set cgddebug to 1, it prints some additional
>information (typing from a photo):
>

That means that either cpu_dump or the next function returned ENXIO.
The code is very careful to dump only on partitions that are marked as
swap and on devices it knows about to avoid accidents. Dumping on
"software" devices is very dangerous since typically when you are dumping
you've probably corrupted memory already.

christos



Re: dump to cgdNb device

2016-06-15 Thread Taylor R Campbell
   Date: Wed, 15 Jun 2016 23:49:55 +0100
   From: Alexander Nasonov 

   I setup an encrypted disk cgd1 (aes-cbc 256 on top of wd0g, disklabel
   verification) with a dump device cgd1b but I can't dump to it (I enter
   ddb and type sync to dump). It prints "device bad".

   [...]

   Is it expected to work at all?

I have not witnessed it working.  But I would like it to work!

By a quick skim of cgddump and dk_dump, it looks likely that the
missing part is a definition for cgddkdriver.d_dumpblocks.  Here's a
candidate definition that you might try, attached.  (Not tested, not
even compile-tested, caveat lector, &c.)
static int
cgd_dumpblocks(device_t dev, void *va, daddr_t blkno, int nblk)
{
struct cgd_softc *sc;
struct dk_softc *dksc;
struct disk_geom *dg;
struct disklabel *lp;
size_t nbytes;
void *buf;
int error;

/* Try to get the cgd device state.  */
sc = device_private(dev);
if (sc == NULL)
/* No cgd device: give up.  */
return ENXIO;

/* Extract the disk parameters: geometry, disklabel.  */
dksc = &sc->sc_dksc;
dg = &dksc->sc_dkdev.dk_geom;
lp = dksc->sc_dkdev.dk_label;

/*
 * Compute the number of bytes in this request, which dk_dump
 * has `helpfully' converted to a number of blocks for us.
 */
nbytes = nblk*lp->d_secsize;

/* Try to acquire a buffer to store the ciphertext.  */
buf = cgd_getdata(dksc, nbytes);
if (buf == NULL)
/* Out of memory: give up.  */
return ENOMEM;

/* Encrypt the caller's data into the temporary buffer.  */
cgd_cipher(sc, buf, va, nbytes, blkno, dg->dg_secsize,
CGD_CIPHER_ENCRYPT);

/* Pass it on to the underlying disk device.  */
error = bdev_dump(sc->sc_tdev, blkno, buf, nbytes);

/* Release the buffer.  */
cgd_putdata(dksc, buf);

/* Return any error from the underlying disk device.  */
return error;
}


Re: dump to cgdNb device

2016-06-15 Thread Taylor R Campbell
   Date: Thu, 16 Jun 2016 02:52:59 +
   From: Taylor R Campbell 

   By a quick skim of cgddump and dk_dump, it looks likely that the
   missing part is a definition for cgddkdriver.d_dumpblocks.  Here's a
   candidate definition that you might try, attached.  (Not tested, not
   even compile-tested, caveat lector, &c.)

Slightly simpler version attached -- device_private is not
device_lookup; device_private never fails.  (If the device_lookup in
cgddump failed, we wouldn't have gotten to cgd_dumpblocks anyway.)
static int
cgd_dumpblocks(device_t dev, void *va, daddr_t blkno, int nblk)
{
struct cgd_softc *sc = device_private(dev);
struct dk_softc *dksc = &sc->sc_dksc;
struct disk_geom *dg = &dksc->sc_dkdev.dk_geom;
struct disklabel *lp = dksc->sc_dkdev.dk_label;
size_t nbytes;
void *buf;
int error;

/*
 * Compute the number of bytes in this request, which dk_dump
 * has `helpfully' converted to a number of blocks for us.
 */
nbytes = nblk*lp->d_secsize;

/* Try to acquire a buffer to store the ciphertext.  */
buf = cgd_getdata(dksc, nbytes);
if (buf == NULL)
/* Out of memory: give up.  */
return ENOMEM;

/* Encrypt the caller's data into the temporary buffer.  */
cgd_cipher(sc, buf, va, nbytes, blkno, dg->dg_secsize,
CGD_CIPHER_ENCRYPT);

/* Pass it on to the underlying disk device.  */
error = bdev_dump(sc->sc_tdev, blkno, buf, nbytes);

/* Release the buffer.  */
cgd_putdata(dksc, buf);

/* Return any error from the underlying disk device.  */
return error;
}


Re: dump to cgdNb device

2016-06-15 Thread Alexander Nasonov
Christos Zoulas wrote:
> That means that either cpu_dump or the next function returned ENXIO.
> The code is very careful to dump only on partitions that are marked as
> swap and on devices it knows about to avoid accidents. Dumping on
> "software" devices is very dangerous since typically when you are dumping
> you've probably corrupted memory already.

There is a risk even with hardware devices but it's smaller because less
software is involved. Dumping to cgd is a quite important usecase and
perhaps we should make an exception. Would it help to RO protect some
data structures like private keys?

Alex


re: dump to cgdNb device

2016-06-16 Thread matthew green
> That means that either cpu_dump or the next function returned ENXIO.
> The code is very careful to dump only on partitions that are marked as
> swap and on devices it knows about to avoid accidents. Dumping on
> "software" devices is very dangerous since typically when you are dumping
> you've probably corrupted memory already.

i've never heard of anyone having a corrupted file system because
of a bad dump, and this is across a large number of systems and
years.  while this case is certainly possible, it seems to be
highly improbable.

we've supported dump to raid for a long while now, and it's very
useful.  dump to cgd should work, too.


.mrg.


Re: dump to cgdNb device

2016-06-16 Thread Michael van Elst
campbell+netbsd-tech-k...@mumble.net (Taylor R Campbell) writes:

>static int
>cgd_dumpblocks(device_t dev, void *va, daddr_t blkno, int nblk)

Looks fine, but should consistently use lp->d_secsize.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: dump to cgdNb device

2016-06-16 Thread Michael van Elst
al...@yandex.ru (Alexander Nasonov) writes:

>There is a risk even with hardware devices but it's smaller because less
>software is involved. Dumping to cgd is a quite important usecase and
>perhaps we should make an exception. Would it help to RO protect some
>data structures like private keys?

You would need to protect all data that is required to dump a block,
the keys aren't more important than e.g. the disklabel or the
bus space handle of the disk controller.

-- 
-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Michael van Elst wrote:
> al...@yandex.ru (Alexander Nasonov) writes:
> 
> >There is a risk even with hardware devices but it's smaller because less
> >software is involved. Dumping to cgd is a quite important usecase and
> >perhaps we should make an exception. Would it help to RO protect some
> >data structures like private keys?
> 
> You would need to protect all data that is required to dump a block,
> the keys aren't more important than e.g. the disklabel or the
> bus space handle of the disk controller.

True, but you have to protect disklabel even for "hardware" devices. My
point was about protecting code specific to a "software" device to make
it looks more like "hardware" device.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Christos Zoulas
On Jun 16,  7:18am, al...@yandex.ru (Alexander Nasonov) wrote:
-- Subject: Re: dump to cgdNb device

| Christos Zoulas wrote:
| > That means that either cpu_dump or the next function returned ENXIO.
| > The code is very careful to dump only on partitions that are marked as
| > swap and on devices it knows about to avoid accidents. Dumping on
| > "software" devices is very dangerous since typically when you are dumping
| > you've probably corrupted memory already.
| 
| There is a risk even with hardware devices but it's smaller because less
| software is involved. Dumping to cgd is a quite important usecase and
| perhaps we should make an exception. Would it help to RO protect some
| data structures like private keys?

Well, we could make an exception on cgd... We have to think carefully
what to do to make it safer. Perhaps it should be turned on via a sysctl
or a kernel option only?

christos


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Slightly simpler version attached -- device_private is not
> device_lookup; device_private never fails.  (If the device_lookup in
> cgddump failed, we wouldn't have gotten to cgd_dumpblocks anyway.)

It dumped succesfully but savecore doesn't see any valid core. If I
run savecore -f, it prints gibberish.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Taylor R Campbell
   Date: Thu, 16 Jun 2016 20:20:14 +0100
   From: Alexander Nasonov 

   Taylor R Campbell wrote:
   > Slightly simpler version attached -- device_private is not
   > device_lookup; device_private never fails.  (If the device_lookup in
   > cgddump failed, we wouldn't have gotten to cgd_dumpblocks anyway.)

   It dumped succesfully but savecore doesn't see any valid core. If I
   run savecore -f, it prints gibberish.

Just to double-check (since I am likely to make this mistake myself!),
you're not using a random-keyed cgd, right?  Random-keyed cgd is great
for swap while the key is in memory but not so great for dump after
rebooting!

Does strings(1) on /dev/rcgd1b show anything vaguely meaningful?

Failing that, I might start sprinkling debug prints in everywhere.

Hmm...  I might have caused your dump to scribble all over another
part of the disk, by not adjusting the blkno before passing it along
to bdev_dump.  I'm not sure about this -- but the blkno adjustments
certainly need to be reviewed.


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Just to double-check (since I am likely to make this mistake myself!),
> you're not using a random-keyed cgd, right?  Random-keyed cgd is great
> for swap while the key is in memory but not so great for dump after
> rebooting!

I have both types but the one I was testing was definitely with a
persistent key.

> Does strings(1) on /dev/rcgd1b show anything vaguely meaningful?

Yes, I see this string:

NetBSD 7.99.30 (CGDDEBUG) #0: Thu Jun 16 19:40:27 BST 2016

alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/CGDDEBUG

This is the kernel I was running. The disk was initially filled
with random bits and CGDDEBUG kernel didn't exist back then.

> Hmm...  I might have caused your dump to scribble all over another
> part of the disk, by not adjusting the blkno before passing it along
> to bdev_dump.  I'm not sure about this -- but the blkno adjustments
> certainly need to be reviewed.

I don't see any problem after reboot. I cat'ed all files in neighbour
partitions (496M and 3G) to /dev/null and I don't see any error. There
aren't full, though. One if about 50% full, the other is 23% full.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Taylor R Campbell
   Date: Thu, 16 Jun 2016 21:35:11 +0100
   From: Alexander Nasonov 

   Taylor R Campbell wrote:
   > Does strings(1) on /dev/rcgd1b show anything vaguely meaningful?

   Yes, I see this string:

   NetBSD 7.99.30 (CGDDEBUG) #0: Thu Jun 16 19:40:27 BST 2016
   
alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/CGDDEBUG

   This is the kernel I was running. The disk was initially filled
   with random bits and CGDDEBUG kernel didn't exist back then.

So maybe it did get written, but there's just something funny about
it.  If you pass -v to savecore, can you find what aspect of the dump
savecore doesn't like?


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> So maybe it did get written, but there's just something funny about
> it.  If you pass -v to savecore, can you find what aspect of the dump
> savecore doesn't like?

I don't see any difference between -vf and -f.

savecore: magic number mismatch (0x0 != 0x8fca0101)
savecore: no core dump
savecore: warning: /dev/ksyms version mismatch:
NetBSD 7.99.30 (GENERIC) #0: Sun May 29 20:24:03 BST 2016

alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC

and atad

savecore: msgbuf magic incorrect
savecore: reboot after panic: <<< 1K+ of non-printable nonsense >>>
savecore: dump time is zero
savecore: writing core to /var/crash/netbsd.5.core
savecore: writing kernel to /var/crash/netbsd.5
savecore: (null): Bad address
dumplo = 110530048 (215879 * 512)

Alex


Re: dump to cgdNb device

2016-06-16 Thread Taylor R Campbell
   Date: Thu, 16 Jun 2016 21:48:18 +0100
   From: Alexander Nasonov 

   Taylor R Campbell wrote:
   > So maybe it did get written, but there's just something funny about
   > it.  If you pass -v to savecore, can you find what aspect of the dump
   > savecore doesn't like?

   I don't see any difference between -vf and -f.

   savecore: magic number mismatch (0x0 != 0x8fca0101)

Can you ktrace it to see what devices it's actually looking at?

Every time I try to scrutinize savecore it makes me dizzy.  I think if
you selected cgd1b as the dump device with `swapctl -D', rather than
using the default of b, and/or if the running kernel image
doesn't match the dumped kernel image, savecore may get confused.

Maybe savecore should have an option to specify what the dump device
is supposed to be.


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Taylor R Campbell wrote:
> Can you ktrace it to see what devices it's actually looking at?
> 
> Every time I try to scrutinize savecore it makes me dizzy.  I think if
> you selected cgd1b as the dump device with `swapctl -D', rather than
> using the default of b, and/or if the running kernel image
> doesn't match the dumped kernel image, savecore may get confused.

Indeed. When I renamed my kernel to /netbsd, savecore worked.

> Maybe savecore should have an option to specify what the dump device
> is supposed to be.

I think it has but I didn't try it.

Alex


Re: dump to cgdNb device

2016-06-16 Thread Alexander Nasonov
Alexander Nasonov wrote:
> Taylor R Campbell wrote:
> > Maybe savecore should have an option to specify what the dump device
> > is supposed to be.
> 
> I think it has but I didn't try it.

Ah, I misread your sentence. savecore understands -N but I didn't try
it.

-- 
Alex


Re: dump to cgdNb device

2016-06-16 Thread Robert Elz
Date:Fri, 17 Jun 2016 00:21:13 +0100
From:Alexander Nasonov 
Message-ID:  <20160616232113.GA2093@neva>

  | Ah, I misread your sentence. savecore understands -N but I didn't try
  | it.

Last time I tried it, it failed.   As I recall (it was a while ago now,
and so may have been fixed in the interim) the techniques savecore uses
for the "currently running kernel" case (which is assumed to be /netbsd)
and the -N case (some kernel running some other time - even if it is
-N /netbsd) were quite different.   I suspected the -N code might have
bit rotted, but never investigated enough to find out what, where, or why.

kre