Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-14 Thread Mikael Pettersson
On Sun, Feb 10, 2019 at 5:05 PM Jens Axboe  wrote:
>
> On 2/10/19 8:44 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> >> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
> >>  wrote:
> > [...]
> >>> I think the reason for this is that the block mq path doesn't feed
> >>> the kernel entropy pool correctly, hence the need to install an
> >>> entropy gatherer for systems that don't have other good random
> >>> number sources.
> >>
> >> That does sound plausible, I admit I didn't even consider the
> >> possibility that the old block I/O path also was an entropy source.
> >
> > In theory, the new one should be as well since the rotational entropy
> > collector is on the SCSI completion path.   I'd seen the same problem
> > but had assumed it was something someone had done to our internal
> > entropy pool and thus hadn't bisected it.
>
> The difference is that the old stack included ADD_RANDOM by default,
> so this check:
>
> if (blk_queue_add_random(q))
> add_disk_randomness(req->rq_disk);
>
> in scsi_end_request() would be true, and we'd add the randomness. For
> sd, it seems to set it just fine for non-rotational drives. Could this
> be because other devices don't? Maybe the below makes a difference.
>
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 6d65ac584eba..60e029911755 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1881,6 +1881,7 @@ struct request_queue *scsi_mq_alloc_queue(struct 
> scsi_device *sdev)
> sdev->request_queue->queuedata = sdev;
> __scsi_init_queue(sdev->host, sdev->request_queue);
> blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
> +   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, sdev->request_queue);
> return sdev->request_queue;
>  }

This patch eliminates my 5 minute boot-up delay problem.

/Mikael


Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-12 Thread Jens Axboe
On 2/12/19 8:24 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 19:50 -0700, Jens Axboe wrote:
>> On 2/11/19 7:13 PM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
 On 2/11/19 9:28 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
>> On 2/11/19 8:42 AM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
 On 2/11/19 8:25 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>> On 2/10/19 9:25 AM, James Bottomley wrote:
>
> [...]
>>> That check wasn't changed by the code removal.
>>
>> As I said above, for sd. This isn't true for non-
>> disks.
>
> Yes, but the behaviour above doesn't change across a
> switch
> to MQ, so I don't quite understand how it bisects back
> to
> that change.  If we're not gathering entropy for the
> device
> now, we wouldn't have been before the switch, so the
> entropy characteristics shouldn't have changed.

 But it does, as I also wrote in that first email. The
 legacy
 queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the
 MQ
 ones do not. Hence any non-sd device would previously
 ALWAYS
 have ADD_RANDOM set, now none of them do. Also see the
 patch
 I sent.
>>>
>>> So your theory is that the disk in question never gets to
>>> the
>>> rotational check?  because the check will clear the flag if
>>> it's non-rotational and set it if it's not, so the default
>>> state of the flag shouldn't matter.
>>
>> No, my point is about non-disks, devices that aren't driven
>> by
>> sd. The behavior for sd hasn't changed, as it sets/clears it
>> unconditionally. 
>
> I agree, but I don't think any of them were significant entropy
> contributors before: things like nvme have always been outside
> of
> this and sr and st don't really contribute much to the seek
> load
> during boot because they're probed but not used by the boot
> sequence, so I can't see how they would cause this
> behaviour.  I
> suppose it could be target probing, but even that seems
> unlikely
> because it should be dwarfed by the number of root disk reads
> during boot.
>
> For the rng to take an additional 5 minutes to initialize, we
> must
> have lost a significant entropy source somewhere.

 I agree it's not a significant amount of entropy, but even just
 one
 bit could mean a long stall if that put us over the edge of just
 not
 having enough for whatever is blocking on /dev/random. Mikael's
 boot
 did have a CDROM, it's not impossible that the handful of
 commands we
 end up doing to that device would have contributed enough entropy
 to
 get the boot done without stalling for minutes.

 One way to know for sure, and that's if Mikael tests the patch.
>>>
>>> I think I've got the root cause.  I have one system in my test bed
>>> exhibiting this behaviour.  It turns out the disk in it has no
>>> characteristics VPD page.  The 0xB1 VPD was a SBC-3 addition, so
>>> that's
>>> not surprising.  However, the characteristics check bails before
>>> setting the flags, so it takes the default flag which has flipped.
>>>
>>> We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if
>>> there's
>>> no 0xB1 page or by setting the default as Jens proposed.
>>
>> I'd recommend just doing my patch, since that'll be the same behavior
>> that SCSI had before.
> 
> I've got the history now, it's this patch
> 
> Author: Xuewei Zhang 
> Date:   Thu Sep 6 13:37:19 2018 -0700
> 
> scsi: sd: Contribute to randomness when running rotational device
> 
> It added the else branch to the if (rot == 1).  It's the position of
> that else branch which is wrong because not all disks have a SBC-3
> characteristics VPD page, so they're the ones under MQ which stop
> contributing entropy.  Whichever patch we go with will need a fixes:
> for this.

Ah, makes sense. I'd say we're _probably_ fine just fixing that then,
or at least it should be two separate patches.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-12 Thread James Bottomley
On Mon, 2019-02-11 at 19:50 -0700, Jens Axboe wrote:
> On 2/11/19 7:13 PM, James Bottomley wrote:
> > On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
> > > On 2/11/19 9:28 AM, James Bottomley wrote:
> > > > On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
> > > > > On 2/11/19 8:42 AM, James Bottomley wrote:
> > > > > > On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
> > > > > > > On 2/11/19 8:25 AM, James Bottomley wrote:
> > > > > > > > On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
> > > > > > > > > On 2/10/19 9:25 AM, James Bottomley wrote:
> > > > 
> > > > [...]
> > > > > > > > > > That check wasn't changed by the code removal.
> > > > > > > > > 
> > > > > > > > > As I said above, for sd. This isn't true for non-
> > > > > > > > > disks.
> > > > > > > > 
> > > > > > > > Yes, but the behaviour above doesn't change across a
> > > > > > > > switch
> > > > > > > > to MQ, so I don't quite understand how it bisects back
> > > > > > > > to
> > > > > > > > that change.  If we're not gathering entropy for the
> > > > > > > > device
> > > > > > > > now, we wouldn't have been before the switch, so the
> > > > > > > > entropy characteristics shouldn't have changed.
> > > > > > > 
> > > > > > > But it does, as I also wrote in that first email. The
> > > > > > > legacy
> > > > > > > queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the
> > > > > > > MQ
> > > > > > > ones do not. Hence any non-sd device would previously
> > > > > > > ALWAYS
> > > > > > > have ADD_RANDOM set, now none of them do. Also see the
> > > > > > > patch
> > > > > > > I sent.
> > > > > > 
> > > > > > So your theory is that the disk in question never gets to
> > > > > > the
> > > > > > rotational check?  because the check will clear the flag if
> > > > > > it's non-rotational and set it if it's not, so the default
> > > > > > state of the flag shouldn't matter.
> > > > > 
> > > > > No, my point is about non-disks, devices that aren't driven
> > > > > by
> > > > > sd. The behavior for sd hasn't changed, as it sets/clears it
> > > > > unconditionally. 
> > > > 
> > > > I agree, but I don't think any of them were significant entropy
> > > > contributors before: things like nvme have always been outside
> > > > of
> > > > this and sr and st don't really contribute much to the seek
> > > > load
> > > > during boot because they're probed but not used by the boot
> > > > sequence, so I can't see how they would cause this
> > > > behaviour.  I
> > > > suppose it could be target probing, but even that seems
> > > > unlikely
> > > > because it should be dwarfed by the number of root disk reads
> > > > during boot.
> > > > 
> > > > For the rng to take an additional 5 minutes to initialize, we
> > > > must
> > > > have lost a significant entropy source somewhere.
> > > 
> > > I agree it's not a significant amount of entropy, but even just
> > > one
> > > bit could mean a long stall if that put us over the edge of just
> > > not
> > > having enough for whatever is blocking on /dev/random. Mikael's
> > > boot
> > > did have a CDROM, it's not impossible that the handful of
> > > commands we
> > > end up doing to that device would have contributed enough entropy
> > > to
> > > get the boot done without stalling for minutes.
> > > 
> > > One way to know for sure, and that's if Mikael tests the patch.
> > 
> > I think I've got the root cause.  I have one system in my test bed
> > exhibiting this behaviour.  It turns out the disk in it has no
> > characteristics VPD page.  The 0xB1 VPD was a SBC-3 addition, so
> > that's
> > not surprising.  However, the characteristics check bails before
> > setting the flags, so it takes the default flag which has flipped.
> > 
> > We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if
> > there's
> > no 0xB1 page or by setting the default as Jens proposed.
> 
> I'd recommend just doing my patch, since that'll be the same behavior
> that SCSI had before.

I've got the history now, it's this patch

Author: Xuewei Zhang 
Date:   Thu Sep 6 13:37:19 2018 -0700

scsi: sd: Contribute to randomness when running rotational device

It added the else branch to the if (rot == 1).  It's the position of
that else branch which is wrong because not all disks have a SBC-3
characteristics VPD page, so they're the ones under MQ which stop
contributing entropy.  Whichever patch we go with will need a fixes:
for this.

James



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread James Bottomley
On Tue, 2019-02-12 at 03:37 +, Elliott, Robert (Persistent Memory)
wrote:
> > -Original Message-
> > From: linux-kernel-ow...@vger.kernel.org  > ow...@vger.kernel.org> On Behalf Of Jens Axboe
> > Sent: Monday, February 11, 2019 8:50 PM
> > To: James Bottomley ; Mikael
> > Pettersson 
> > Cc: Linux SPARC Kernel Mailing List ;
> > linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > scsi
> > 
> > Subject: Re: [5.0-rc5 regression] "scsi: kill off the legacy IO
> > path"
> > causes 5 minute delay during boot on Sun Blade 2500
> 
> ...
> 
> > > We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if
> > > there's no 0xB1 page or by setting the default as Jens proposed.
> > 
> > I'd recommend just doing my patch, since that'll be the same
> > behavior that SCSI had before.
> 
> When blk-mq and scsi-mq were introduced to boost SAS SSDs over
> 1 million IOPS, this was purposely kept off, because it generated
> a noticeable 3-12 us latency blip whenever the "fast_mix" code ran
> out of bits. If Jens' patch changes the default to be enabled for
> all scsi-mq users, that'll be a change for the newer legacy
> scsi-mq users (except for users that don't trust the default and
> always set the value in sysfs).

All high IOPs devices have a device characteristics page, so even if we
 default this to ON, it will get turned off for these devices which
should be the desired result regardless of the default setting. 

James



RE: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread Elliott, Robert (Persistent Memory)


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Jens Axboe
> Sent: Monday, February 11, 2019 8:50 PM
> To: James Bottomley ; Mikael
> Pettersson 
> Cc: Linux SPARC Kernel Mailing List ;
> linux-bl...@vger.kernel.org; linux-kernel@vger.kernel.org; linux-scsi
> 
> Subject: Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path"
> causes 5 minute delay during boot on Sun Blade 2500
...

> > We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if
> > there's no 0xB1 page or by setting the default as Jens proposed.
> 
> I'd recommend just doing my patch, since that'll be the same behavior
> that SCSI had before.

When blk-mq and scsi-mq were introduced to boost SAS SSDs over
1 million IOPS, this was purposely kept off, because it generated
a noticeable 3-12 us latency blip whenever the "fast_mix" code ran
out of bits. If Jens' patch changes the default to be enabled for
all scsi-mq users, that'll be a change for the newer legacy
scsi-mq users (except for users that don't trust the default and
always set the value in sysfs).




Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread Jens Axboe
On 2/11/19 7:13 PM, James Bottomley wrote:
> On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
>> On 2/11/19 9:28 AM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
 On 2/11/19 8:42 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
>> On 2/11/19 8:25 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
 On 2/10/19 9:25 AM, James Bottomley wrote:
>>>
>>> [...]
> That check wasn't changed by the code removal.

 As I said above, for sd. This isn't true for non-disks.
>>>
>>> Yes, but the behaviour above doesn't change across a switch
>>> to MQ, so I don't quite understand how it bisects back to
>>> that change.  If we're not gathering entropy for the device
>>> now, we wouldn't have been before the switch, so the
>>> entropy characteristics shouldn't have changed.
>>
>> But it does, as I also wrote in that first email. The legacy
>> queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ
>> ones do not. Hence any non-sd device would previously ALWAYS
>> have ADD_RANDOM set, now none of them do. Also see the patch
>> I sent.
>
> So your theory is that the disk in question never gets to the
> rotational check?  because the check will clear the flag if
> it's non-rotational and set it if it's not, so the default
> state of the flag shouldn't matter.

 No, my point is about non-disks, devices that aren't driven by
 sd. The behavior for sd hasn't changed, as it sets/clears it
 unconditionally. 
>>>
>>> I agree, but I don't think any of them were significant entropy
>>> contributors before: things like nvme have always been outside of
>>> this and sr and st don't really contribute much to the seek load
>>> during boot because they're probed but not used by the boot
>>> sequence, so I can't see how they would cause this behaviour.  I
>>> suppose it could be target probing, but even that seems unlikely
>>> because it should be dwarfed by the number of root disk reads
>>> during boot.
>>>
>>> For the rng to take an additional 5 minutes to initialize, we must
>>> have lost a significant entropy source somewhere.
>>
>> I agree it's not a significant amount of entropy, but even just one
>> bit could mean a long stall if that put us over the edge of just not
>> having enough for whatever is blocking on /dev/random. Mikael's boot
>> did have a CDROM, it's not impossible that the handful of commands we
>> end up doing to that device would have contributed enough entropy to
>> get the boot done without stalling for minutes.
>>
>> One way to know for sure, and that's if Mikael tests the patch.
> 
> I think I've got the root cause.  I have one system in my test bed
> exhibiting this behaviour.  It turns out the disk in it has no
> characteristics VPD page.  The 0xB1 VPD was a SBC-3 addition, so that's
> not surprising.  However, the characteristics check bails before
> setting the flags, so it takes the default flag which has flipped.
> 
> We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if there's
> no 0xB1 page or by setting the default as Jens proposed.

I'd recommend just doing my patch, since that'll be the same behavior
that SCSI had before.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread James Bottomley
On Mon, 2019-02-11 at 09:31 -0700, Jens Axboe wrote:
> On 2/11/19 9:28 AM, James Bottomley wrote:
> > On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
> > > On 2/11/19 8:42 AM, James Bottomley wrote:
> > > > On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
> > > > > On 2/11/19 8:25 AM, James Bottomley wrote:
> > > > > > On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
> > > > > > > On 2/10/19 9:25 AM, James Bottomley wrote:
> > 
> > [...]
> > > > > > > > That check wasn't changed by the code removal.
> > > > > > > 
> > > > > > > As I said above, for sd. This isn't true for non-disks.
> > > > > > 
> > > > > > Yes, but the behaviour above doesn't change across a switch
> > > > > > to MQ, so I don't quite understand how it bisects back to
> > > > > > that change.  If we're not gathering entropy for the device
> > > > > > now, we wouldn't have been before the switch, so the
> > > > > > entropy characteristics shouldn't have changed.
> > > > > 
> > > > > But it does, as I also wrote in that first email. The legacy
> > > > > queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ
> > > > > ones do not. Hence any non-sd device would previously ALWAYS
> > > > > have ADD_RANDOM set, now none of them do. Also see the patch
> > > > > I sent.
> > > > 
> > > > So your theory is that the disk in question never gets to the
> > > > rotational check?  because the check will clear the flag if
> > > > it's non-rotational and set it if it's not, so the default
> > > > state of the flag shouldn't matter.
> > > 
> > > No, my point is about non-disks, devices that aren't driven by
> > > sd. The behavior for sd hasn't changed, as it sets/clears it
> > > unconditionally. 
> > 
> > I agree, but I don't think any of them were significant entropy
> > contributors before: things like nvme have always been outside of
> > this and sr and st don't really contribute much to the seek load
> > during boot because they're probed but not used by the boot
> > sequence, so I can't see how they would cause this behaviour.  I
> > suppose it could be target probing, but even that seems unlikely
> > because it should be dwarfed by the number of root disk reads
> > during boot.
> > 
> > For the rng to take an additional 5 minutes to initialize, we must
> > have lost a significant entropy source somewhere.
> 
> I agree it's not a significant amount of entropy, but even just one
> bit could mean a long stall if that put us over the edge of just not
> having enough for whatever is blocking on /dev/random. Mikael's boot
> did have a CDROM, it's not impossible that the handful of commands we
> end up doing to that device would have contributed enough entropy to
> get the boot done without stalling for minutes.
> 
> One way to know for sure, and that's if Mikael tests the patch.

I think I've got the root cause.  I have one system in my test bed
exhibiting this behaviour.  It turns out the disk in it has no
characteristics VPD page.  The 0xB1 VPD was a SBC-3 addition, so that's
not surprising.  However, the characteristics check bails before
setting the flags, so it takes the default flag which has flipped.

We can either fix this by setting the QUEUE_FLAG_ADD_RANDOM if there's
no 0xB1 page or by setting the default as Jens proposed.

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d0a980915801..1f3a1474042e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -2961,6 +2961,10 @@ static void sd_read_block_characteristics(struct 
scsi_disk *sdkp)
 
buffer = kmalloc(vpd_len, GFP_KERNEL);
 
+   /* set to rotational in case no device characteristic page exists */
+   blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
+   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
+
if (!buffer ||
/* Block Device Characteristics VPD */
scsi_get_vpd_page(sdkp->device, 0xb1, buffer, vpd_len))
@@ -2971,9 +2975,6 @@ static void sd_read_block_characteristics(struct 
scsi_disk *sdkp)
if (rot == 1) {
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
-   } else {
-   blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
-   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
}
 
if (sdkp->device->type == TYPE_ZBC) {


Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread Jens Axboe
On 2/11/19 9:28 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
>> On 2/11/19 8:42 AM, James Bottomley wrote:
>>> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
 On 2/11/19 8:25 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>> On 2/10/19 9:25 AM, James Bottomley wrote:
> [...]
>>> That check wasn't changed by the code removal.
>>
>> As I said above, for sd. This isn't true for non-disks.
>
> Yes, but the behaviour above doesn't change across a switch to
> MQ, so I don't quite understand how it bisects back to that
> change.  If we're not gathering entropy for the device now, we
> wouldn't have been before the switch, so the entropy
> characteristics shouldn't have changed.

 But it does, as I also wrote in that first email. The legacy
 queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones
 do not. Hence any non-sd device would previously ALWAYS have
 ADD_RANDOM set, now none of them do. Also see the patch I sent.
>>>
>>> So your theory is that the disk in question never gets to the
>>> rotational check?  because the check will clear the flag if it's
>>> non-rotational and set it if it's not, so the default state of the
>>> flag shouldn't matter.
>>
>> No, my point is about non-disks, devices that aren't driven by sd.
>> The behavior for sd hasn't changed, as it sets/clears it
>> unconditionally.
> 
> I agree, but I don't think any of them were significant entropy
> contributors before: things like nvme have always been outside of this
> and sr and st don't really contribute much to the seek load during boot
> because they're probed but not used by the boot sequence, so I can't
> see how they would cause this behaviour.  I suppose it could be target
> probing, but even that seems unlikely because it should be dwarfed by
> the number of root disk reads during boot.
> 
> For the rng to take an additional 5 minutes to initialize, we must have
> lost a significant entropy source somewhere.

I agree it's not a significant amount of entropy, but even just one bit
could mean a long stall if that put us over the edge of just not having
enough for whatever is blocking on /dev/random. Mikael's boot did have a
CDROM, it's not impossible that the handful of commands we end up doing
to that device would have contributed enough entropy to get the boot
done without stalling for minutes.

One way to know for sure, and that's if Mikael tests the patch.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread James Bottomley
On Mon, 2019-02-11 at 08:46 -0700, Jens Axboe wrote:
> On 2/11/19 8:42 AM, James Bottomley wrote:
> > On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
> > > On 2/11/19 8:25 AM, James Bottomley wrote:
> > > > On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
> > > > > On 2/10/19 9:25 AM, James Bottomley wrote:
[...]
> > > > > > That check wasn't changed by the code removal.
> > > > > 
> > > > > As I said above, for sd. This isn't true for non-disks.
> > > > 
> > > > Yes, but the behaviour above doesn't change across a switch to
> > > > MQ, so I don't quite understand how it bisects back to that
> > > > change.  If we're not gathering entropy for the device now, we
> > > > wouldn't have been before the switch, so the entropy
> > > > characteristics shouldn't have changed.
> > > 
> > > But it does, as I also wrote in that first email. The legacy
> > > queue flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones
> > > do not. Hence any non-sd device would previously ALWAYS have
> > > ADD_RANDOM set, now none of them do. Also see the patch I sent.
> > 
> > So your theory is that the disk in question never gets to the
> > rotational check?  because the check will clear the flag if it's
> > non-rotational and set it if it's not, so the default state of the
> > flag shouldn't matter.
> 
> No, my point is about non-disks, devices that aren't driven by sd.
> The behavior for sd hasn't changed, as it sets/clears it
> unconditionally.

I agree, but I don't think any of them were significant entropy
contributors before: things like nvme have always been outside of this
and sr and st don't really contribute much to the seek load during boot
because they're probed but not used by the boot sequence, so I can't
see how they would cause this behaviour.  I suppose it could be target
probing, but even that seems unlikely because it should be dwarfed by
the number of root disk reads during boot.

For the rng to take an additional 5 minutes to initialize, we must have
lost a significant entropy source somewhere.

James

> That's not true for something driven by sr, for instance, and
> anything else non-sd. For those we defaulted to adding randomness for
> !scsi-mq, and default to not adding randomness for scsi-mq.
> 
> The patch I included would have the same behavior for scsi-mq as we
> had for non-mq.



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread Jens Axboe
On 2/11/19 8:42 AM, James Bottomley wrote:
> On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
>> On 2/11/19 8:25 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
 On 2/10/19 9:25 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
>> On 2/10/19 8:44 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
 On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
  wrote:
>>>
>>> [...]
> I think the reason for this is that the block mq path
> doesn't feed the kernel entropy pool correctly, hence
> the need to install an entropy gatherer for systems
> that don't have other good random number sources.

 That does sound plausible, I admit I didn't even consider
 the possibility that the old block I/O path also was an
 entropy source.
>>>
>>> In theory, the new one should be as well since the
>>> rotational entropy collector is on the SCSI completion
>>> path.   I'd seen the same problem but had assumed it was
>>> something someone had done to our internal entropy pool and
>>> thus hadn't bisected it.
>>
>> The difference is that the old stack included ADD_RANDOM by
>> default, so this check:
>>
>>  if (blk_queue_add_random(q))
>>  add_disk_randomness(req->rq_disk);
>>
>> in scsi_end_request() would be true, and we'd add the
>> randomness. For sd, it seems to set it just fine for non-
>> rotational drives. Could this be because other devices don't?
>> Maybe the below makes a difference.
>
> No, in both we set it per the rotational parameters of the disk
> in 
>
> sd.c:sd_read_block_characteristics()
>
>   rot = get_unaligned_be16([4]);
>
>   if (rot == 1) {
>   
>   blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>   
>   blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
>   } else {
>   
>   blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
>   
>   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
>   }
>
>
> That check wasn't changed by the code removal.

 As I said above, for sd. This isn't true for non-disks.
>>>
>>> Yes, but the behaviour above doesn't change across a switch to MQ,
>>> so I don't quite understand how it bisects back to that change.  If
>>> we're not gathering entropy for the device now, we wouldn't have
>>> been before the switch, so the entropy characteristics shouldn't
>>> have changed.
>>
>> But it does, as I also wrote in that first email. The legacy queue
>> flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones do not.
>> Hence any non-sd device would previously ALWAYS have ADD_RANDOM
>> set, now none of them do. Also see the patch I sent.
> 
> So your theory is that the disk in question never gets to the
> rotational check?  because the check will clear the flag if it's non-
> rotational and set it if it's not, so the default state of the flag
> shouldn't matter.

No, my point is about non-disks, devices that aren't driven by sd. The
behavior for sd hasn't changed, as it sets/clears it unconditionally.
That's not true for something driven by sr, for instance, and anything
else non-sd. For those we defaulted to adding randomness for !scsi-mq,
and default to not adding randomness for scsi-mq.

The patch I included would have the same behavior for scsi-mq as we had
for non-mq.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread James Bottomley
On Mon, 2019-02-11 at 08:28 -0700, Jens Axboe wrote:
> On 2/11/19 8:25 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
> > > On 2/10/19 9:25 AM, James Bottomley wrote:
> > > > On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
> > > > > On 2/10/19 8:44 AM, James Bottomley wrote:
> > > > > > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> > > > > > > On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
> > > > > > >  wrote:
> > > > > > 
> > > > > > [...]
> > > > > > > > I think the reason for this is that the block mq path
> > > > > > > > doesn't feed the kernel entropy pool correctly, hence
> > > > > > > > the need to install an entropy gatherer for systems
> > > > > > > > that don't have other good random number sources.
> > > > > > > 
> > > > > > > That does sound plausible, I admit I didn't even consider
> > > > > > > the possibility that the old block I/O path also was an
> > > > > > > entropy source.
> > > > > > 
> > > > > > In theory, the new one should be as well since the
> > > > > > rotational entropy collector is on the SCSI completion
> > > > > > path.   I'd seen the same problem but had assumed it was
> > > > > > something someone had done to our internal entropy pool and
> > > > > > thus hadn't bisected it.
> > > > > 
> > > > > The difference is that the old stack included ADD_RANDOM by
> > > > > default, so this check:
> > > > > 
> > > > >   if (blk_queue_add_random(q))
> > > > >   add_disk_randomness(req->rq_disk);
> > > > > 
> > > > > in scsi_end_request() would be true, and we'd add the
> > > > > randomness. For sd, it seems to set it just fine for non-
> > > > > rotational drives. Could this be because other devices don't?
> > > > > Maybe the below makes a difference.
> > > > 
> > > > No, in both we set it per the rotational parameters of the disk
> > > > in 
> > > > 
> > > > sd.c:sd_read_block_characteristics()
> > > > 
> > > > rot = get_unaligned_be16([4]);
> > > > 
> > > > if (rot == 1) {
> > > > 
> > > > blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
> > > > 
> > > > blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
> > > > } else {
> > > > 
> > > > blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
> > > > 
> > > > blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
> > > > }
> > > > 
> > > > 
> > > > That check wasn't changed by the code removal.
> > > 
> > > As I said above, for sd. This isn't true for non-disks.
> > 
> > Yes, but the behaviour above doesn't change across a switch to MQ,
> > so I don't quite understand how it bisects back to that change.  If
> > we're not gathering entropy for the device now, we wouldn't have
> > been before the switch, so the entropy characteristics shouldn't
> > have changed.
> 
> But it does, as I also wrote in that first email. The legacy queue
> flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones do not.
> Hence any non-sd device would previously ALWAYS have ADD_RANDOM
> set, now none of them do. Also see the patch I sent.

So your theory is that the disk in question never gets to the
rotational check?  because the check will clear the flag if it's non-
rotational and set it if it's not, so the default state of the flag
shouldn't matter.

James



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread Jens Axboe
On 2/11/19 8:25 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
>> On 2/10/19 9:25 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
 On 2/10/19 8:44 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
>> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
>>  wrote:
>
> [...]
>>> I think the reason for this is that the block mq path
>>> doesn't feed the kernel entropy pool correctly, hence the
>>> need to install an entropy gatherer for systems that don't
>>> have other good random number sources.
>>
>> That does sound plausible, I admit I didn't even consider the
>> possibility that the old block I/O path also was an entropy
>> source.
>
> In theory, the new one should be as well since the rotational
> entropy collector is on the SCSI completion path.   I'd seen
> the same problem but had assumed it was something someone had
> done to our internal entropy pool and thus hadn't bisected it.

 The difference is that the old stack included ADD_RANDOM by
 default, so this check:

if (blk_queue_add_random(q))
add_disk_randomness(req->rq_disk);

 in scsi_end_request() would be true, and we'd add the randomness.
 For sd, it seems to set it just fine for non-rotational drives.
 Could this be because other devices don't? Maybe the below makes
 a difference.
>>>
>>> No, in both we set it per the rotational parameters of the disk in 
>>>
>>> sd.c:sd_read_block_characteristics()
>>>
>>> rot = get_unaligned_be16([4]);
>>>
>>> if (rot == 1) {
>>> 
>>> blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>>> 
>>> blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
>>> } else {
>>> 
>>> blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
>>> 
>>> blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
>>> }
>>>
>>>
>>> That check wasn't changed by the code removal.
>>
>> As I said above, for sd. This isn't true for non-disks.
> 
> Yes, but the behaviour above doesn't change across a switch to MQ, so I
> don't quite understand how it bisects back to that change.  If we're
> not gathering entropy for the device now, we wouldn't have been before
> the switch, so the entropy characteristics shouldn't have changed.

But it does, as I also wrote in that first email. The legacy queue
flags had QUEUE_FLAG_ADD_RANDOM set by default, the MQ ones do not.
Hence any non-sd device would previously ALWAYS have ADD_RANDOM
set, now none of them do. Also see the patch I sent.

>>> Although I suspect it should be unconditional: even SSDs have what
>>> would appear as seek latencies at least during writes depending on
>>> the time taken to find an erased block or even trigger garbage
>>> collection.  The entropy collector is good at taking something
>>> completely regular and spotting the inconsistencies, so it won't
>>> matter that loads of "seeks" are deterministic.
>>
>> The reason it isn't is that it's of limited use for SSDs where it's a
>> lot more predictable. And they are also a lot faster, which means the
>> adding randomness is more problematic from an efficiency pov.
> 
> But that's my point: our entropy extractor is good at weeding out
> predictable signals.  Fine, it won't extract any entropy if the disk
> seek time is entirely regular, but it won't contaminate the entropy
> pool.  The computational delay, I grant ... it takes a while to
> determine if any entropy is present in the signal.

But you are missing my point - if we're mostly weeding out predictable
signals, then it's pointless to take the overhead of the randomness.
This is why the MQ flag don't include it by default.

> What about feeding it with something like discard timings, which should
> be much less predictable.

That's not true, lots of devices have VERY predictable discard timings.
Most of them will have a fixed discards-per-second rate, even.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-11 Thread James Bottomley
On Sun, 2019-02-10 at 09:35 -0700, Jens Axboe wrote:
> On 2/10/19 9:25 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
> > > On 2/10/19 8:44 AM, James Bottomley wrote:
> > > > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> > > > > On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
> > > > >  wrote:
> > > > 
> > > > [...]
> > > > > > I think the reason for this is that the block mq path
> > > > > > doesn't feed the kernel entropy pool correctly, hence the
> > > > > > need to install an entropy gatherer for systems that don't
> > > > > > have other good random number sources.
> > > > > 
> > > > > That does sound plausible, I admit I didn't even consider the
> > > > > possibility that the old block I/O path also was an entropy
> > > > > source.
> > > > 
> > > > In theory, the new one should be as well since the rotational
> > > > entropy collector is on the SCSI completion path.   I'd seen
> > > > the same problem but had assumed it was something someone had
> > > > done to our internal entropy pool and thus hadn't bisected it.
> > > 
> > > The difference is that the old stack included ADD_RANDOM by
> > > default, so this check:
> > > 
> > >   if (blk_queue_add_random(q))
> > >   add_disk_randomness(req->rq_disk);
> > > 
> > > in scsi_end_request() would be true, and we'd add the randomness.
> > > For sd, it seems to set it just fine for non-rotational drives.
> > > Could this be because other devices don't? Maybe the below makes
> > > a difference.
> > 
> > No, in both we set it per the rotational parameters of the disk in 
> > 
> > sd.c:sd_read_block_characteristics()
> > 
> > rot = get_unaligned_be16([4]);
> > 
> > if (rot == 1) {
> > 
> > blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
> > 
> > blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
> > } else {
> > 
> > blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
> > 
> > blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
> > }
> > 
> > 
> > That check wasn't changed by the code removal.
> 
> As I said above, for sd. This isn't true for non-disks.

Yes, but the behaviour above doesn't change across a switch to MQ, so I
don't quite understand how it bisects back to that change.  If we're
not gathering entropy for the device now, we wouldn't have been before
the switch, so the entropy characteristics shouldn't have changed.

> > Although I suspect it should be unconditional: even SSDs have what
> > would appear as seek latencies at least during writes depending on
> > the time taken to find an erased block or even trigger garbage
> > collection.  The entropy collector is good at taking something
> > completely regular and spotting the inconsistencies, so it won't
> > matter that loads of "seeks" are deterministic.
> 
> The reason it isn't is that it's of limited use for SSDs where it's a
> lot more predictable. And they are also a lot faster, which means the
> adding randomness is more problematic from an efficiency pov.

But that's my point: our entropy extractor is good at weeding out
predictable signals.  Fine, it won't extract any entropy if the disk
seek time is entirely regular, but it won't contaminate the entropy
pool.  The computational delay, I grant ... it takes a while to
determine if any entropy is present in the signal.

What about feeding it with something like discard timings, which should
be much less predictable.

James



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread Jens Axboe
On 2/10/19 9:25 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
>> On 2/10/19 8:44 AM, James Bottomley wrote:
>>> On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
 On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
  wrote:
>>>
>>> [...]
> I think the reason for this is that the block mq path doesn't
> feed
> the kernel entropy pool correctly, hence the need to install an
> entropy gatherer for systems that don't have other good random
> number sources.

 That does sound plausible, I admit I didn't even consider the
 possibility that the old block I/O path also was an entropy
 source.
>>>
>>> In theory, the new one should be as well since the rotational
>>> entropy
>>> collector is on the SCSI completion path.   I'd seen the same
>>> problem
>>> but had assumed it was something someone had done to our internal
>>> entropy pool and thus hadn't bisected it.
>>
>> The difference is that the old stack included ADD_RANDOM by default,
>> so this check:
>>
>>  if (blk_queue_add_random(q))
>>  add_disk_randomness(req->rq_disk);
>>
>> in scsi_end_request() would be true, and we'd add the randomness. For
>> sd, it seems to set it just fine for non-rotational drives. Could
>> this be because other devices don't? Maybe the below makes a
>> difference.
> 
> No, in both we set it per the rotational parameters of the disk in 
> 
> sd.c:sd_read_block_characteristics()
> 
>   rot = get_unaligned_be16([4]);
> 
>   if (rot == 1) {
>   
>   blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
>   
>   blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
>   } else {
>   
>   blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
>   
>   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
>   }
> 
> 
> That check wasn't changed by the code removal.

As I said above, for sd. This isn't true for non-disks.

> Although I suspect it should be unconditional: even SSDs have what
> would appear as seek latencies at least during writes depending on the
> time taken to find an erased block or even trigger garbage collection. 
> The entropy collector is good at taking something completely regular
> and spotting the inconsistencies, so it won't matter that loads of
> "seeks" are deterministic.

The reason it isn't is that it's of limited use for SSDs where it's a
lot more predictable. And they are also a lot faster, which means the
adding randomness is more problematic from an efficiency pov.

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread James Bottomley
On Sun, 2019-02-10 at 09:05 -0700, Jens Axboe wrote:
> On 2/10/19 8:44 AM, James Bottomley wrote:
> > On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> > > On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
> > >  wrote:
> > 
> > [...]
> > > > I think the reason for this is that the block mq path doesn't
> > > > feed
> > > > the kernel entropy pool correctly, hence the need to install an
> > > > entropy gatherer for systems that don't have other good random
> > > > number sources.
> > > 
> > > That does sound plausible, I admit I didn't even consider the
> > > possibility that the old block I/O path also was an entropy
> > > source.
> > 
> > In theory, the new one should be as well since the rotational
> > entropy
> > collector is on the SCSI completion path.   I'd seen the same
> > problem
> > but had assumed it was something someone had done to our internal
> > entropy pool and thus hadn't bisected it.
> 
> The difference is that the old stack included ADD_RANDOM by default,
> so this check:
> 
>   if (blk_queue_add_random(q))
>   add_disk_randomness(req->rq_disk);
> 
> in scsi_end_request() would be true, and we'd add the randomness. For
> sd, it seems to set it just fine for non-rotational drives. Could
> this be because other devices don't? Maybe the below makes a
> difference.

No, in both we set it per the rotational parameters of the disk in 

sd.c:sd_read_block_characteristics()

rot = get_unaligned_be16([4]);

if (rot == 1) {

blk_queue_flag_set(QUEUE_FLAG_NONROT, q);

blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
} else {

blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);

blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
}


That check wasn't changed by the code removal.

Although I suspect it should be unconditional: even SSDs have what
would appear as seek latencies at least during writes depending on the
time taken to find an erased block or even trigger garbage collection. 
The entropy collector is good at taking something completely regular
and spotting the inconsistencies, so it won't matter that loads of
"seeks" are deterministic.

James



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread Jens Axboe
On 2/10/19 8:44 AM, James Bottomley wrote:
> On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
>> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
>>  wrote:
> [...]
>>> I think the reason for this is that the block mq path doesn't feed
>>> the kernel entropy pool correctly, hence the need to install an
>>> entropy gatherer for systems that don't have other good random
>>> number sources.
>>
>> That does sound plausible, I admit I didn't even consider the
>> possibility that the old block I/O path also was an entropy source.
> 
> In theory, the new one should be as well since the rotational entropy
> collector is on the SCSI completion path.   I'd seen the same problem
> but had assumed it was something someone had done to our internal
> entropy pool and thus hadn't bisected it.

The difference is that the old stack included ADD_RANDOM by default,
so this check:

if (blk_queue_add_random(q))
add_disk_randomness(req->rq_disk);

in scsi_end_request() would be true, and we'd add the randomness. For
sd, it seems to set it just fine for non-rotational drives. Could this
be because other devices don't? Maybe the below makes a difference.


diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 6d65ac584eba..60e029911755 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1881,6 +1881,7 @@ struct request_queue *scsi_mq_alloc_queue(struct 
scsi_device *sdev)
sdev->request_queue->queuedata = sdev;
__scsi_init_queue(sdev->host, sdev->request_queue);
blk_queue_flag_set(QUEUE_FLAG_SCSI_PASSTHROUGH, sdev->request_queue);
+   blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, sdev->request_queue);
return sdev->request_queue;
 }
 

-- 
Jens Axboe



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread James Bottomley
On Sun, 2019-02-10 at 10:17 +0100, Mikael Pettersson wrote:
> On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
>  wrote:
[...]
> > I think the reason for this is that the block mq path doesn't feed
> > the kernel entropy pool correctly, hence the need to install an
> > entropy gatherer for systems that don't have other good random
> > number sources.
> 
> That does sound plausible, I admit I didn't even consider the
> possibility that the old block I/O path also was an entropy source.

In theory, the new one should be as well since the rotational entropy
collector is on the SCSI completion path.   I'd seen the same problem
but had assumed it was something someone had done to our internal
entropy pool and thus hadn't bisected it.

James



Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-10 Thread Mikael Pettersson
On Sat, Feb 9, 2019 at 7:19 PM James Bottomley
 wrote:
>
> On Sat, 2019-02-09 at 18:04 +0100, Mikael Pettersson wrote:
> > 4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
> > IIIi), but the 5.0-rc kernels consistently experience a 5 minute
> > delay
> > late during boot, after enabling networking but before allowing user
> > logins.  E.g. 5.0-rc5 dmesg has:
> >
> > [Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom
> > read (12 bytes read)
> > [Fri Feb  8 17:18:14 2019] random: crng init done
>
> I've had the same problem on several of my test systems.  Are you sure
> it's not this bug report:
>
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087
>
> ?
>
> The solution for me was to install the haveged package which does
> active entropy gathering during boot and can make /dev/urandom
> available much earlier.

Thanks for the hint, I'll look into using haveged on this machine.

>
> > During this interval the machine answers pings but won't allow user
> > logins either on the console or over the network.
> >
> > A git bisect identified commit
> > f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
> > Author: Jens Axboe 
> > Date:   Thu Nov 1 16:36:27 2018 -0600
> >
> > scsi: kill off the legacy IO path
> >
> > as the point where this 5m delay was introduced.
>
> I think the reason for this is that the block mq path doesn't feed the
> kernel entropy pool correctly, hence the need to install an entropy
> gatherer for systems that don't have other good random number sources.

That does sound plausible, I admit I didn't even consider the possibility that
the old block I/O path also was an entropy source.

/Mikael


Re: [5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-09 Thread James Bottomley
On Sat, 2019-02-09 at 18:04 +0100, Mikael Pettersson wrote:
> 4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
> IIIi), but the 5.0-rc kernels consistently experience a 5 minute
> delay
> late during boot, after enabling networking but before allowing user
> logins.  E.g. 5.0-rc5 dmesg has:
> 
> [Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom
> read (12 bytes read)
> [Fri Feb  8 17:18:14 2019] random: crng init done

I've had the same problem on several of my test systems.  Are you sure
it's not this bug report:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=912087

?

The solution for me was to install the haveged package which does
active entropy gathering during boot and can make /dev/urandom
available much earlier.

> During this interval the machine answers pings but won't allow user
> logins either on the console or over the network.
> 
> A git bisect identified commit
> f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
> Author: Jens Axboe 
> Date:   Thu Nov 1 16:36:27 2018 -0600
> 
> scsi: kill off the legacy IO path
> 
> as the point where this 5m delay was introduced.

I think the reason for this is that the block mq path doesn't feed the
kernel entropy pool correctly, hence the need to install an entropy
gatherer for systems that don't have other good random number sources.

James



[5.0-rc5 regression] "scsi: kill off the legacy IO path" causes 5 minute delay during boot on Sun Blade 2500

2019-02-09 Thread Mikael Pettersson
4.20 and earlier kernels boot fine on my Sun Blade 2500 (UltraSPARC
IIIi), but the 5.0-rc kernels consistently experience a 5 minute delay
late during boot, after enabling networking but before allowing user
logins.  E.g. 5.0-rc5 dmesg has:

[Fri Feb  8 17:13:17 2019] random: dbus-daemon: uninitialized urandom
read (12 bytes read)
[Fri Feb  8 17:18:14 2019] random: crng init done

During this interval the machine answers pings but won't allow user
logins either on the console or over the network.

A git bisect identified commit f664a3cc17b7d0a2bc3b3ab96181e1029b0ec0e6
Author: Jens Axboe 
Date:   Thu Nov 1 16:36:27 2018 -0600

scsi: kill off the legacy IO path

as the point where this 5m delay was introduced.

My older kernels all have CONFIG_SCSI_MQ_DEFAULT=N, which the above
commit effectively forces to Y.
Rebuilding 4.20 with CONFIG_SCSI_MQ_DEFAULT=Y also triggers the 5m
delay behaviour.

I haven't seen this behaviour on my x86-64 boxes, so presumably it's
related to the sparc64 kernel or this machine's SCSI adapter.

.config and dmesg below.

/Mikael

#
# Automatically generated file; DO NOT EDIT.
# Linux/sparc64 4.20.0 Kernel Configuration
#

#
# Compiler: gcc (GCC) 7.4.1 20181227
#
CONFIG_CC_IS_GCC=y
CONFIG_GCC_VERSION=70401
CONFIG_CLANG_VERSION=0
CONFIG_IRQ_WORK=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION="-blkmq"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_BUILD_SALT=""
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
# CONFIG_USELIB is not set
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_IRQ_PREFLOW_FASTEOI=y
CONFIG_IRQ_DOMAIN=y
CONFIG_GENERIC_MSI_IRQ=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_PSI is not set
# CONFIG_CPU_ISOLATION is not set

#
# RCU Subsystem
#
CONFIG_TREE_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_RDMA is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_NAMESPACES is not set
# CONFIG_CHECKPOINT_RESTORE is not set
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
# CONFIG_RELAY is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_RD_XZ is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_FHANDLE=y
CONFIG_POSIX_TIMERS=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_FUTEX_PI=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
# CONFIG_AIO is not set
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_MEMBARRIER is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
# CONFIG_BPF_SYSCALL is not set
# CONFIG_USERFAULTFD is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y
# CONFIG_PC104 is not set

#
# Kernel Performance Events And Counters
#
# CONFIG_PERF_EVENTS is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SLAB_FREELIST_HARDENED is not set
CONFIG_SLUB_CPU_PARTIAL=y
# CONFIG_PROFILING is not set
CONFIG_64BIT=y
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_ARCH_DEFCONFIG="arch/sparc/configs/sparc64_defconfig"
CONFIG_ARCH_PROC_KCORE_TEXT=y
CONFIG_CPU_BIG_ENDIAN=y
CONFIG_ARCH_ATU=y