Re: What belongs in the Debian cloud kernel?

2020-04-02 Thread Noah Meyerhans
On Thu, Apr 02, 2020 at 10:55:16AM -0700, Ross Vandegrift wrote:
> I don't think just saying "yes" automatically is the best approach.  But
> I'm not sure we can come up with a clear set of rules.  Evaluating the
> use cases will involve judgment calls about size vs functionality.  I
> guess I think that's okay.

You certainly may be right.  I wasn't able to convince myself either
way, which is why I posted for additional opinions.

> The first two bugs are about nested virtualization.  I like the idea of
> deciding to support that or not.  I don't know much about nested virt,
> so I don't have a strong opinion.  It seems pretty widely supported on
> our platforms.  I don't know if it raises performance or security
> concerns.  So these seem okay to me, as long as we decide to support
> nested virt, and there aren't major cons that I'm unaware of.

IMO nested virtualization is not something I'd want to see in a
"production" environment.  Hardware-assisted isolation between VMs is
critical for hosting mixed-trust workloads (e.g. VMs owned and
controlled by unrelated parties without a mutual trust relationship).
Current hardware virtualization extensions, e.g. Intel VT-x, only have a
concept of a single level of virtualization.  Nested virtualization is
implemented by trapping and emulating the CPU extensions, and by doing a
bunch of mapping of nested guest state to allow it to effectively run as
a peer VM of the parent guest in hardware.  Some details at [1].  So not
only is it painfully complex, but it's also quite slow.

This is not to say that there aren't any legitimate use cases for nested
virtualization.  Only that I'm not sure it's something we want to be
optimizing for.
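For anyone wanting to check their own hosts: whether KVM has nested support
switched on is visible in the module parameters. A rough sketch (the paths are
the standard x86 KVM ones; the "unknown" fallback covers machines where KVM
isn't loaded at all):

```shell
# Host-side check: is nested virtualization enabled in KVM?
# The parameter file only exists while the respective module is loaded.
nested=unknown
for f in /sys/module/kvm_intel/parameters/nested \
         /sys/module/kvm_amd/parameters/nested; do
    if [ -r "$f" ]; then
        nested=$(cat "$f")   # Y/N on older kernels, 1/0 on newer ones
    fi
done
echo "nested: $nested"
```

Inside a guest, nested capability shows up as the vmx (Intel) or svm (AMD)
flag in /proc/cpuinfo.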

> Can you share more about the KSM use case?  I'm worried about raising
> security concerns for this one.  KSM has had a history of enabling
> attacks that are sorta serious, but also sorta theoretical.  This might
> cause upset from infosec folks that freak out about any vulnerability -
> even when they don't really understand the magnitude of the risk.

I don't have any direct experience with KSM.  I can certainly see how it
could help with certain classes of workload, though, if it's known that
multiple processes with mostly identical state are running.

I'm not sure I'd focus too much on the security implications of KSM,
though, since it's widely enabled in Debian's generic kernel and kernels
distributed by other distros.  I don't want to cargo-cult it, but
neither do I want to ignore prior art.  In any case, I don't see a reason
to drop support for applications making use of KSM in our cloud kernels.
I can't think of any reason why the feature would be less useful in a
cloud environment, and it could certainly save money by allowing the use
of smaller instances.
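Worth noting that KSM is opt-in on both sides: the kernel only scans pages an
application has marked with madvise(MADV_MERGEABLE), and scanning only happens
at all when the sysfs knob is switched on. A quick sketch of the control
interface (paths per Documentation/admin-guide/mm/ksm.rst; writing them needs
root, reading is safe):

```shell
# Inspect the KSM run state via sysfs.  Other files in the same
# directory (pages_shared, pages_sharing, ...) report merge statistics.
KSM=/sys/kernel/mm/ksm
if [ -r "$KSM/run" ]; then
    status=$(cat "$KSM/run")   # 0 = stopped, 1 = scanning, 2 = unmerge all
else
    status=absent              # kernel built without CONFIG_KSM
fi
echo "ksm run state: $status"
# To start scanning (as root):  echo 1 > /sys/kernel/mm/ksm/run
```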

noah

1. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/virt/kvm/nested-vmx.rst



Re: What belongs in the Debian cloud kernel?

2020-04-02 Thread Ross Vandegrift
On Wed, Apr 01, 2020 at 03:15:37PM -0400, Noah Meyerhans wrote:
> Should we simply say "yes" to any request to add functionality to the
> cloud kernel?  None of the drivers will add *that* much to the size of
> the image, and if people are asking for them, then they've obviously got
> a use case for them.  Or is this a slippery slope that diminishes the
> value of the cloud kernel?  I can see both sides of the argument, so I'd
> like to hear what others have to say.

I don't think just saying "yes" automatically is the best approach.  But
I'm not sure we can come up with a clear set of rules.  Evaluating the
use cases will involve judgment calls about size vs functionality.  I
guess I think that's okay.


The first two bugs are about nested virtualization.  I like the idea of
deciding to support that or not.  I don't know much about nested virt,
so I don't have a strong opinion.  It seems pretty widely supported on
our platforms.  I don't know if it raises performance or security
concerns.  So these seem okay to me, as long as we decide to support
nested virt, and there aren't major cons that I'm unaware of.


Can you share more about the KSM use case?  I'm worried about raising
security concerns for this one.  KSM has had a history of enabling
attacks that are sorta serious, but also sorta theoretical.  This might
cause upset from infosec folks that freak out about any vulnerability -
even when they don't really understand the magnitude of the risk.

I tried to understand the current state of KSM security.  But I couldn't
easily find a recent summary, and I'm not an expert on the issues.  Here
are the older links I looked at:
- https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-2877
- https://access.redhat.com/blogs/766093/posts/1976303
- https://staff.aist.go.jp/k.suzaki/EuroSec2011-suzaki.pdf
- https://www.usenix.org/system/files/conference/woot15/woot15-paper-barresi.pdf

These sound mostly impractical to me, but they do enable scary sounding
threats (read/write across vmm and hypervisor boundaries).  That makes
me nervous, but someone who understands the issues could convince me
that these aren't worth worrying about.

Ross



Re: What belongs in the Debian cloud kernel?

2020-04-02 Thread Tom Ladd
Hi!

I'd be happy to work on creating documentation for the Debian cloud kernel,
especially for OpenStack.

I"ve worked with Debian as my primary OS for several years now, and have
also been exploring OpenStack.

I'm also honing my technical writing skills through an online course.
Writing this documentation would give me an opportunity to combine all my
interests in a single project.

How/when do I start?

Thank you,

Tom Ladd

On Wed, Apr 1, 2020, 12:15 PM Noah Meyerhans  wrote:

> For buster, we generate a cloud kernel for amd64.  For sid/bullseye,
> we'll also support a cloud kernel for arm64.  At the moment, the cloud
> kernel is the only one used in the images we generate for Microsoft Azure
> and Amazon EC2.  It's used in the GCE images we generate as well, but
> I'm not sure anybody actually uses those.  We generate two OpenStack
> images, one that uses the cloud kernel and another that uses the generic
> kernel.
>
> There are open bugs against the cloud kernel requesting that
> configuration options be turned on there. [1][2][3]  These, IMO,
> highlight a need for some documentation around what is in scope for the
> cloud kernel, and what is not.  This will help us answer requests such
> as these more consistently, and it will also help our users better
> understand whether they can expect the cloud kernel to meet their needs
> or not.
>
> At the moment, the primary optimization applied to the cloud kernel
> focuses on disk space consumed.  We disable compilation of drivers that
> we feel are unlikely to ever appear in a cloud environment.  By doing
> so, we reduce the installed size of the kernel package by roughly 70%.
> There are other optimizations we may apply (see [4] for examples), but we
> don't yet.
>
> Should we simply say "yes" to any request to add functionality to the
> cloud kernel?  None of the drivers will add *that* much to the size of
> the image, and if people are asking for them, then they've obviously got
> a use case for them.  Or is this a slippery slope that diminishes the
> value of the cloud kernel?  I can see both sides of the argument, so I'd
> like to hear what others have to say.
>
> If we're not going to say "yes" to all requests, what criteria should we
> use to determine whether or not to enable a feature?  I'd rather not
> leave it as a judgment call.
>
> noah
>
> 1. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=952108
> 2. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955366
> 3. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=955232
> 4. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=947759
>
>


Bug#955480: Acknowledgement (Kernel 5.4 does not like megaraid_sas controller)

2020-04-02 Thread Robert Sander
Hi,

I think the bug can be closed.

I switched the BIOS to UEFI and now kernel 5.4 is able to use the
megaraid_sas controller without any issue.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin





Re: Recurrent alerts "Package temperature above threshold, cpu clock throttled"

2020-04-02 Thread l0f4r0
Hi again,

Apr 2, 2020 at 00:30 from l0f...@tuta.io:

> => From now on, it seems I have 2 workarounds: turbo boost deactivation and 
> Debian thermald.
> I guess I should do some A/B testing...
>
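As a reference for the first workaround mentioned (turbo boost deactivation),
with the intel_pstate driver the switch lives in sysfs. A sketch, assuming
intel_pstate is the active driver; the setting resets at reboot and writing it
needs root:

```shell
# Check (and optionally change) the intel_pstate turbo setting.
NT=/sys/devices/system/cpu/intel_pstate/no_turbo
if [ -r "$NT" ]; then
    turbo=$(cat "$NT")   # 0 = turbo enabled, 1 = turbo disabled
else
    turbo=absent         # intel_pstate not active on this machine
fi
echo "no_turbo: $turbo"
# To disable turbo boost (as root):  echo 1 > "$NT"
```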
I've just uninstalled thermald manually built from 01.org, and installed the 
official Debian package as suggested instead.
I've adopted a zero-configuration approach.
Here is my situation by default:

systemctl status thermald.service
  thermald.service - Thermal Daemon Service
   Loaded: loaded (/lib/systemd/system/thermald.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-04-01 23:59:38 CEST; 15h ago
 Main PID: 30158 (thermald)
    Tasks: 2 (limit: 4915)
   Memory: 2.3M
   CGroup: /system.slice/thermald.service
           └─30158 /usr/sbin/thermald --no-daemon --dbus-enable

Apr 01 23:59:38 thermald[30158]: sensor id 10 : No temp sysfs for reading raw temp
Apr 01 23:59:38 thermald[30158]: sensor id 10 : No temp sysfs for reading raw temp
Apr 01 23:59:38 thermald[30158]: sensor id 10 : No temp sysfs for reading raw temp
Apr 01 23:59:38 thermald[30158]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 01 23:59:38 thermald[30158]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 01 23:59:38 thermald[30158]: sysfs open failed
Apr 01 23:59:38 thermald[30158]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 01 23:59:38 thermald[30158]: error: could not parse file /etc/thermald/thermal-conf.xml
Apr 01 23:59:38 thermald[30158]: I/O warning : failed to load external entity "/etc/thermald/thermal-conf.xml"
Apr 01 23:59:38 thermald[30158]: error: could not parse file /etc/thermald/thermal-conf.xml

NB: indeed I don't have /etc/thermald/thermal-conf.xml, but it seems that
thermald can work without it
(https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/1811788).
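Since the "could not parse file" errors come from the missing config file, a
near-empty placeholder might at least quiet them. A hedged sketch: the element
names follow the example in thermald's man page as I understand it, and the
platform name and Preference value are placeholders, not tuned recommendations.
It's written to the current directory here; copy to /etc/thermald/ as root if
it looks right:

```shell
# Write a minimal thermal-conf.xml (placeholder values, see above).
cat > thermal-conf.xml <<'EOF'
<?xml version="1.0"?>
<ThermalConfiguration>
  <Platform>
    <Name>Placeholder platform</Name>
    <ProductName>*</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
EOF
echo "wrote thermal-conf.xml"
```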

grep -i pstate /boot/config-$(uname -r)
CONFIG_X86_INTEL_PSTATE=y

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate
intel_pstate

cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 400 MHz - 4.60 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 400 MHz and 4.60 GHz.
  The governor "powersave" may decide which speed to use
  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 887 MHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes

Logs "CPUX: Package temperature above threshold, cpu clock throttled (total 
events = XXX)" are still present in journalctl :(
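Those throttle events can also be counted straight from sysfs rather than by
grepping the journal, which makes before/after comparisons easier. A sketch
(the thermal_throttle directory is x86-specific and may be absent on some
kernels):

```shell
# Dump per-CPU throttle counters (core_throttle_count,
# package_throttle_count) for cpu0, if the kernel exposes them.
found=0
for f in /sys/devices/system/cpu/cpu0/thermal_throttle/*throttle_count; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
        found=1
    fi
done
echo "counters present: $found"
```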

Do you still think thermald can help me? If so, what would you try/configure?
Maybe there is simply something required by thermald I haven't installed yet?

Thank you for your appreciated help :)
Best regards,
l0f4r0



Bug#949369: Info received (Bug#949369: i915: kernel crash in i915_active_acquire())

2020-04-02 Thread Guy Baconniere


@John try to install Linux Kernel 5.5.13 (aka 5.5.0-1) from sid

https://packages.debian.org/sid/kernel/linux-image-5.5.0-1-amd64-unsigned

https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5.12/CHANGES

Chris Wilson (1):
  drm/i915/execlists: Track active elements during dequeue

Matt Roper (1):
  drm/i915: Handle all MCR ranges

Caz Yokoyama (1):
  Revert "drm/i915/tgl: Add extra hdc flush workaround"

Check the comment on Debian Bug #954817
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=954817#17

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1868551/comments/29