Re: [PATCH 2/4] PKCS#7: Check codeSigning EKU for kernel module and kexec pe verification

2021-04-19 Thread joeyli
Hi Varad,

Thanks for your review!

On Thu, Apr 15, 2021 at 02:08:32PM +0200, Varad Gautam wrote:
> Hi Joey,
> 
> On 4/9/21 4:46 AM, Lee, Chun-Yi wrote:
> > This patch adds the logic for checking the CodeSigning extended
> > key usage when verifying signature of kernel module or
> > kexec PE binary in PKCS#7.
> > 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  certs/system_keyring.c   |  2 +-
> >  crypto/asymmetric_keys/Kconfig   |  9 +
> >  crypto/asymmetric_keys/pkcs7_trust.c | 37 
> > +---
> >  include/crypto/pkcs7.h   |  3 ++-
> >  4 files changed, 46 insertions(+), 5 deletions(-)
> > 
> > diff --git a/certs/system_keyring.c b/certs/system_keyring.c
> > index 4b693da488f1..c9f8bca0b0d3 100644
> > --- a/certs/system_keyring.c
> > +++ b/certs/system_keyring.c
> > @@ -243,7 +243,7 @@ int verify_pkcs7_message_sig(const void *data, size_t 
> > len,
> > goto error;
> > }
> > }
> > -   ret = pkcs7_validate_trust(pkcs7, trusted_keys);
> > +   ret = pkcs7_validate_trust(pkcs7, trusted_keys, usage);
> > if (ret < 0) {
> > if (ret == -ENOKEY)
> > pr_devel("PKCS#7 signature not signed with a trusted 
> > key\n");
> > diff --git a/crypto/asymmetric_keys/Kconfig b/crypto/asymmetric_keys/Kconfig
> > index 1f1f004dc757..1754812df989 100644
> > --- a/crypto/asymmetric_keys/Kconfig
> > +++ b/crypto/asymmetric_keys/Kconfig
> > @@ -96,4 +96,13 @@ config SIGNED_PE_FILE_VERIFICATION
> >   This option provides support for verifying the signature(s) on a
> >   signed PE binary.
> >  
> > +config CHECK_CODESIGN_EKU
> > +   bool "Check codeSigning extended key usage"
> > +   depends on PKCS7_MESSAGE_PARSER=y
> > +   depends on SYSTEM_DATA_VERIFICATION
> > +   help
> > + This option provides support for checking the codeSigning extended
> > + key usage when verifying the signature in PKCS#7. It affects kernel
> > + module verification and kexec PE binary verification.
> > +
> >  endif # ASYMMETRIC_KEY_TYPE
> > diff --git a/crypto/asymmetric_keys/pkcs7_trust.c 
> > b/crypto/asymmetric_keys/pkcs7_trust.c
> > index b531df2013c4..077bfef928b6 100644
> > --- a/crypto/asymmetric_keys/pkcs7_trust.c
> > +++ b/crypto/asymmetric_keys/pkcs7_trust.c
> > @@ -16,12 +16,36 @@
> >  #include 
> >  #include "pkcs7_parser.h"
> >  
> > +#ifdef CONFIG_CHECK_CODESIGN_EKU
> > +static bool check_codesign_eku(struct key *key,
> > +enum key_being_used_for usage)
> > +{
> > +   struct public_key *public_key = key->payload.data[asym_crypto];
> > +
> > +   switch (usage) {
> > +   case VERIFYING_MODULE_SIGNATURE:
> > +   case VERIFYING_KEXEC_PE_SIGNATURE:
> > +   return !!(public_key->eku & EKU_codeSigning);
> > +   default:
> > +   break;
> > +   }
> > +   return true;
> > +}
> > +#else
> > +static bool check_codesign_eku(struct key *key,
> > +enum key_being_used_for usage)
> > +{
> > +   return true;
> > +}
> > +#endif
> > +
> >  /*
> >   * Check the trust on one PKCS#7 SignedInfo block.
> >   */
> >  static int pkcs7_validate_trust_one(struct pkcs7_message *pkcs7,
> > struct pkcs7_signed_info *sinfo,
> > -   struct key *trust_keyring)
> > +   struct key *trust_keyring,
> > +   enum key_being_used_for usage)
> >  {
> > struct public_key_signature *sig = sinfo->sig;
> > struct x509_certificate *x509, *last = NULL, *p;
> > @@ -112,6 +136,12 @@ static int pkcs7_validate_trust_one(struct 
> > pkcs7_message *pkcs7,
> > return -ENOKEY;
> >  
> >  matched:
> > +   if (!check_codesign_eku(key, usage)) {
> 
> Perhaps this can be a generic check_eku_usage() call, with codesigning as one 
> of the
> things it can check for.
>

Because only codesign EKU be checked now. So I prefer to keep it
as my current implementation until there have other EKU requirement. 
 
> > +   pr_warn("sinfo %u: The signer %x key is not CodeSigning\n",
> > +   sinfo->index, key_serial(key));
> > +   key_put(key);
> > +   return -ENOKEY;
> > +   }
> > ret = verify_signature(key, sig);
> > key_put(key);
> > if (ret < 0) {
> > @@ -156,7 +186,8 @@ static int pkcs7_validate_trust_one(struct 
> > pkcs7_message *pkcs7,
> >   * May also return -ENOMEM.
> >   */
> >  int pkcs7_validate_trust(struct pkcs7_message *pkcs7,
> > -struct key *trust_keyring)
> > +struct key *trust_keyring,
> > +enum key_being_used_for usage)
> 
> Please also update the comment above to mention the `usage` parameter.
> 
> Regards,
> Varad

I will update it to the comment. Thanks!

Joey Lee

> 
> >  {
> > struct pkcs7_signed_info *sinfo;
> > struct x509_certificate *p;
> > @@ -167,7 +198,7 @@ int pkcs7_validate_trust(struct pkcs7_message *pkcs7,
> >

Re: [PATCH 1/4] X.509: Add CodeSigning extended key usage parsing

2021-04-14 Thread joeyli
Hi Varad, 

Thanks for your review!

On Tue, Apr 13, 2021 at 04:28:11PM +0200, Varad Gautam wrote:
> Hi,
> 
> On 3/9/21 10:10 AM, Lee, Chun-Yi wrote:
> > This patch adds the logic for parsing the CodeSign extended key usage
> > extension in X.509. The parsing result will be set to the eku flag
> > which is carried by public key. It can be used in the PKCS#7
> > verification.
> > 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  crypto/asymmetric_keys/x509_cert_parser.c | 24 
> >  include/crypto/public_key.h   |  1 +
> >  include/linux/oid_registry.h  |  5 +
> >  3 files changed, 30 insertions(+)
> > 
> > diff --git a/crypto/asymmetric_keys/x509_cert_parser.c 
> > b/crypto/asymmetric_keys/x509_cert_parser.c
> > index 52c9b455fc7d..65721313b265 100644
> > --- a/crypto/asymmetric_keys/x509_cert_parser.c
> > +++ b/crypto/asymmetric_keys/x509_cert_parser.c
> > @@ -497,6 +497,8 @@ int x509_process_extension(void *context, size_t hdrlen,
> > struct x509_parse_context *ctx = context;
> > struct asymmetric_key_id *kid;
> > const unsigned char *v = value;
> > +   int i = 0;
> > +   enum OID oid;
> >  
> > pr_debug("Extension: %u\n", ctx->last_oid);
> >  
> > @@ -526,6 +528,28 @@ int x509_process_extension(void *context, size_t 
> > hdrlen,
> > return 0;
> > }
> >  
> > +   if (ctx->last_oid == OID_extKeyUsage) {
> > +   if (v[0] != ((ASN1_UNIV << 6) | ASN1_CONS_BIT | ASN1_SEQ) ||
> > +   v[1] != vlen - 2)
> 
> A bad cert might get here with vlen < 2, which would cause indexing into v to 
> break.
> Please add a check for vlen >= 2 before this.
>

I will add the check, thanks for your suggestion!
 
> > +   return -EBADMSG;
> > +   i += 2;
> > +
> > +   while (i < vlen) {
> > +   /* A 10 bytes EKU OID Octet blob =
> > +* ASN1_OID + size byte + 8 bytes OID */
> > +   if (v[i] != ASN1_OID || v[i + 1] != 8 || (i + 10) > 
> > vlen)
> 
> Same here, for i == (vlen - 1), v[i + 1] would fetch outside of v. Or, does 
> the
> ASN.1 layout protect against this?
>

I will move the "(i + 10) > vlen" to the front of "v[i + 1] != 8". It can avoid
that the last octet blob is less than 10 bytes.

Thanks!
Joey Lee
 
> > +   return -EBADMSG;
> > +
> > +   oid = look_up_OID(v + i + 2, v[i + 1]);
> > +   if (oid == OID_codeSigning) {
> > +   ctx->cert->pub->eku |= EKU_codeSigning;
> > +   }
> > +   i += 10;
> > +   }
> > +   pr_debug("extKeyUsage: %d\n", ctx->cert->pub->eku);
> > +   return 0;
> > +   }
> > +
> > return 0;
> >  }
> >  
> > diff --git a/include/crypto/public_key.h b/include/crypto/public_key.h
> > index 47accec68cb0..1ccaebe2a28b 100644
> > --- a/include/crypto/public_key.h
> > +++ b/include/crypto/public_key.h
> > @@ -28,6 +28,7 @@ struct public_key {
> > bool key_is_private;
> > const char *id_type;
> > const char *pkey_algo;
> > +   unsigned int eku : 9;  /* Extended Key Usage (9-bit) */
> >  };
> >  
> >  extern void public_key_free(struct public_key *key);
> > diff --git a/include/linux/oid_registry.h b/include/linux/oid_registry.h
> > index 4462ed2c18cd..e20e8eb53b21 100644
> > --- a/include/linux/oid_registry.h
> > +++ b/include/linux/oid_registry.h
> > @@ -113,9 +113,14 @@ enum OID {
> > OID_SM2_with_SM3,   /* 1.2.156.10197.1.501 */
> > OID_sm3WithRSAEncryption,   /* 1.2.156.10197.1.504 */
> >  
> > +   /* Extended key purpose OIDs [RFC 5280] */
> > +   OID_codeSigning,/* 1.3.6.1.5.5.7.3.3 */
> > +
> > OID__NR
> >  };
> >  
> > +#define EKU_codeSigning(1 << 2)
> > +
> >  extern enum OID look_up_OID(const void *data, size_t datasize);
> >  extern int sprint_oid(const void *, size_t, char *, size_t);
> >  extern int sprint_OID(enum OID, char *, size_t);
> > 
> 
> -- 
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5
> 90409 Nürnberg
> Germany
> 
> HRB 36809, AG Nürnberg
> Geschäftsführer: Felix Imendörffer



Re: [PATCH 4/4] Documentation/admin-guide/module-signing.rst: add openssl command option example for CodeSign EKU

2021-02-24 Thread joeyli
Hi Randy,

On Mon, Feb 22, 2021 at 08:48:42AM -0800, Randy Dunlap wrote:
> Hi,
> 
> On 2/21/21 10:42 PM, Lee, Chun-Yi wrote:
> > Add an openssl command option example for generating CodeSign extended
> > key usage in X.509 when CONFIG_CHECK_CODESIGN_EKU is enabled.
> > 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  Documentation/admin-guide/module-signing.rst | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/module-signing.rst 
> > b/Documentation/admin-guide/module-signing.rst
> > index 7d7c7c8a545c..b57b30c7125f 100644
> > --- a/Documentation/admin-guide/module-signing.rst
> > +++ b/Documentation/admin-guide/module-signing.rst
> > @@ -170,6 +170,12 @@ generate the public/private key files::
> >-config x509.genkey -outform PEM -out kernel_key.pem \
> >-keyout kernel_key.pem
> >  
> > +When ``CONFIG_CHECK_CODESIGN_EKU`` option be enabled, the following openssl
> 
>  is enabled,
> 
> > +command option should be added for generating CodeSign extended key usage 
> > in
> 
>   should be added ?
> 
> > +X.509::

Thanks for your review! I will update to next version.

Joey Lee



Re: [PATCH 1/4] X.509: Add CodeSigning extended key usage parsing

2021-01-21 Thread joeyli
On Thu, Jan 21, 2021 at 04:32:26PM +0200, Jarkko Sakkinen wrote:
> On Thu, Jan 21, 2021 at 12:23:53PM +0800, joeyli wrote:
> > Hi Jarkko,
> > 
> > On Thu, Jan 21, 2021 at 01:40:48AM +0200, Jarkko Sakkinen wrote:
> > > On Wed, Jan 20, 2021 at 05:05:14PM +0800, Lee, Chun-Yi wrote:
> > > > This patch adds the logic for parsing the CodeSign extended key usage
> > > > extension in X.509. The parsing result will be set to the eku flag
> > > > which is carried by public key. It can be used in the PKCS#7
> > > > verification.
> > > > 
> > > > Signed-off-by: "Lee, Chun-Yi" 
> > > > ---
> > > >  crypto/asymmetric_keys/x509_cert_parser.c | 24 
> > > >  include/crypto/public_key.h   |  1 +
> > > >  include/linux/oid_registry.h  |  5 +
> > > >  3 files changed, 30 insertions(+)
> > > > 
> > > > diff --git a/crypto/asymmetric_keys/x509_cert_parser.c 
> > > > b/crypto/asymmetric_keys/x509_cert_parser.c
> > > > index 52c9b455fc7d..65721313b265 100644
> > > > --- a/crypto/asymmetric_keys/x509_cert_parser.c
> > > > +++ b/crypto/asymmetric_keys/x509_cert_parser.c
> > > > @@ -497,6 +497,8 @@ int x509_process_extension(void *context, size_t 
> > > > hdrlen,
> > > > struct x509_parse_context *ctx = context;
> > > > struct asymmetric_key_id *kid;
> > > > const unsigned char *v = value;
> > > > +   int i = 0;
> > > > +   enum OID oid;
> > > >  
> > > > pr_debug("Extension: %u\n", ctx->last_oid);
> > > >  
> > > > @@ -526,6 +528,28 @@ int x509_process_extension(void *context, size_t 
> > > > hdrlen,
> > > > return 0;
> > > > }
> > > >  
> > > > +   if (ctx->last_oid == OID_extKeyUsage) {
> > > > +   if (v[0] != ((ASN1_UNIV << 6) | ASN1_CONS_BIT | 
> > > > ASN1_SEQ) ||
> > > > +   v[1] != vlen - 2)
> > > > +   return -EBADMSG;
> > > > +   i += 2;
> > > > +
> > > > +   while (i < vlen) {
> > > > +   /* A 10 bytes EKU OID Octet blob =
> > > > +* ASN1_OID + size byte + 8 bytes OID */
> > > > +   if (v[i] != ASN1_OID || v[i + 1] != 8 || (i + 
> > > > 10) > vlen)
> > > > +   return -EBADMSG;
> > > > +
> > > > +   oid = look_up_OID(v + i + 2, v[i + 1]);
> > > > +   if (oid == OID_codeSigning) {
> > > > +   ctx->cert->pub->eku |= EKU_codeSigning;
> > > > +   }
> > > > +   i += 10;
> > > > +   }
> > > > +   pr_debug("extKeyUsage: %d\n", ctx->cert->pub->eku);
> > > 
> > > With eBPF around, does this make any sense?
> > >
> > 
> > I think that the dynamic debug log is still easier for checking the EKU
> > setting.
> 
> Why?

There have some certificates may loaded when system boot. In booting stage,
using dynamic debug log for checking EKU is easier than eBPF. 

I am not good on eBPF. Correct me if I missed anything, please!

Thanks a lot!
Joey Lee 



Re: [PATCH 1/4] X.509: Add CodeSigning extended key usage parsing

2021-01-20 Thread joeyli
Hi Jarkko,

On Thu, Jan 21, 2021 at 01:40:48AM +0200, Jarkko Sakkinen wrote:
> On Wed, Jan 20, 2021 at 05:05:14PM +0800, Lee, Chun-Yi wrote:
> > This patch adds the logic for parsing the CodeSign extended key usage
> > extension in X.509. The parsing result will be set to the eku flag
> > which is carried by public key. It can be used in the PKCS#7
> > verification.
> > 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  crypto/asymmetric_keys/x509_cert_parser.c | 24 
> >  include/crypto/public_key.h   |  1 +
> >  include/linux/oid_registry.h  |  5 +
> >  3 files changed, 30 insertions(+)
> > 
> > diff --git a/crypto/asymmetric_keys/x509_cert_parser.c 
> > b/crypto/asymmetric_keys/x509_cert_parser.c
> > index 52c9b455fc7d..65721313b265 100644
> > --- a/crypto/asymmetric_keys/x509_cert_parser.c
> > +++ b/crypto/asymmetric_keys/x509_cert_parser.c
> > @@ -497,6 +497,8 @@ int x509_process_extension(void *context, size_t hdrlen,
> > struct x509_parse_context *ctx = context;
> > struct asymmetric_key_id *kid;
> > const unsigned char *v = value;
> > +   int i = 0;
> > +   enum OID oid;
> >  
> > pr_debug("Extension: %u\n", ctx->last_oid);
> >  
> > @@ -526,6 +528,28 @@ int x509_process_extension(void *context, size_t 
> > hdrlen,
> > return 0;
> > }
> >  
> > +   if (ctx->last_oid == OID_extKeyUsage) {
> > +   if (v[0] != ((ASN1_UNIV << 6) | ASN1_CONS_BIT | ASN1_SEQ) ||
> > +   v[1] != vlen - 2)
> > +   return -EBADMSG;
> > +   i += 2;
> > +
> > +   while (i < vlen) {
> > +   /* A 10 bytes EKU OID Octet blob =
> > +* ASN1_OID + size byte + 8 bytes OID */
> > +   if (v[i] != ASN1_OID || v[i + 1] != 8 || (i + 10) > 
> > vlen)
> > +   return -EBADMSG;
> > +
> > +   oid = look_up_OID(v + i + 2, v[i + 1]);
> > +   if (oid == OID_codeSigning) {
> > +   ctx->cert->pub->eku |= EKU_codeSigning;
> > +   }
> > +   i += 10;
> > +   }
> > +   pr_debug("extKeyUsage: %d\n", ctx->cert->pub->eku);
> 
> With eBPF around, does this make any sense?
>

I think that the dynamic debug log is still easier for checking the EKU
setting.

Thanks
Joey Lee



Re: [PATCH 4/4] Documentation/admin-guide/module-signing.rst: add openssl command option example for CodeSign EKU

2020-11-25 Thread joeyli
Hi Randy,

Thanks for your review! I will update it in next version.

Joey Lee

On Wed, Nov 25, 2020 at 09:25:51AM -0800, Randy Dunlap wrote:
> Hi--
> 
> On 11/24/20 11:26 PM, Lee, Chun-Yi wrote:
> > Add an openssl command option example for generating CodeSign extended
> > key usage in X.509 when CONFIG_CHECK_CODESIGN_EKU be enabled.
> 
> is enabled.
> 
> > 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  Documentation/admin-guide/module-signing.rst | 6 ++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/Documentation/admin-guide/module-signing.rst 
> > b/Documentation/admin-guide/module-signing.rst
> > index f8b584179cff..bc184124d646 100644
> > --- a/Documentation/admin-guide/module-signing.rst
> > +++ b/Documentation/admin-guide/module-signing.rst
> > @@ -170,6 +170,12 @@ generate the public/private key files::
> >-config x509.genkey -outform PEM -out kernel_key.pem \
> >-keyout kernel_key.pem
> >  
> > +When ``CONFIG_CHECK_CODESIGN_EKU`` option be enabled, the following openssl
> 
>  is enabled,
> 
> > +command option should be added for generating CodeSign extended key usage 
> > in
> > +X.509::
> > +
> > +-addext "extendedKeyUsage=codeSigning"
> > +
> >  The full pathname for the resulting kernel_key.pem file can then be 
> > specified
> >  in the ``CONFIG_MODULE_SIG_KEY`` option, and the certificate and key 
> > therein will
> >  be used instead of an autogenerated keypair.
> > 
> 
> 
> -- 
> ~Randy



Re: [RFC PATCH 2/2] PKCS#7: Check codeSigning EKU for kernel module and kexec pe verification

2020-10-21 Thread joeyli
Hi Randy,

On Tue, Oct 20, 2020 at 07:56:29AM -0700, Randy Dunlap wrote:
> On 10/20/20 6:42 AM, Ben Boeckel wrote:
> > On Tue, Oct 20, 2020 at 14:50:01 +0800, Lee, Chun-Yi wrote:
> >> +config CHECK_CODESIGN_EKU
> >> +  bool "Check codeSigning extended key usage"
> >> +  depends on PKCS7_MESSAGE_PARSER=y
> >> +  depends on SYSTEM_DATA_VERIFICATION
> >> +  help
> >> +This option provides support for checking the codeSigning extended
> >> +key usage extension when verifying the signature in PKCS#7. It
> 
> extended ... extension.
> Can we drop one of those or reword it?

I will remove the _extension_ word in next version. Thanks for your review!

Joey Lee



Re: [RFC PATCH 2/2] PKCS#7: Check codeSigning EKU for kernel module and kexec pe verification

2020-10-21 Thread joeyli
Hi Ben,

On Tue, Oct 20, 2020 at 09:42:08AM -0400, Ben Boeckel wrote:
> On Tue, Oct 20, 2020 at 14:50:01 +0800, Lee, Chun-Yi wrote:
> > +config CHECK_CODESIGN_EKU
> > +   bool "Check codeSigning extended key usage"
> > +   depends on PKCS7_MESSAGE_PARSER=y
> > +   depends on SYSTEM_DATA_VERIFICATION
> > +   help
> > + This option provides support for checking the codeSigning extended
> > + key usage extension when verifying the signature in PKCS#7. It
> > + affects kernel module verification and kexec PE binary verification
> > + now.
> 
> Is the "now" necessary? Isn't it implied by the option's existence?

Thanks for your review. I will remove the "now" in next version.

Joey Lee



Re: [PATCH] efi/efivars: Create efivars mount point in the registration of efivars abstraction

2020-09-24 Thread joeyli
Hi Ard,

On Thu, Sep 24, 2020 at 12:47:46PM +0200, Ard Biesheuvel wrote:
> On Thu, 24 Sep 2020 at 10:28, Lee, Chun-Yi  wrote:
> >
> > This patch moved the logic of creating efivars mount point to the
> > registration of efivars abstraction. It's useful for userland to
> > determine the availability of efivars filesystem by checking the
> > existence of mount point.
> >
> > The 'efivars' platform device be created on generic EFI runtime services
> > platform, so it can be used to determine the availability of efivarfs.
> > But this approach is not available for google gsmi efivars abstraction.
> >
> > This patch be tested on Here on qemu-OVMF and qemu-uboot.
> >
> > Cc: Ard Biesheuvel 
> > Cc: Matthias Brugger 
> > Cc: Fabian Vogt 
> > Cc: Ilias Apalodimas 
> > Cc: Greg Kroah-Hartman 
> > Cc: Arthur Heymans 
> > Cc: Patrick Rudolph 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> 
> I take it this is v3 of [0]? If so, please explain how it deviates
> from v2. If it doesn't deviate from v2, it is better to continue the
> discussion in the other thread.
> 
> For the sake of discussion, it helps to clarify the confusing nomenclature:
> 
> a) 'efivars abstraction' - an internal kernel API that exposes EFI
> variables, and can potentially be backed by an implementation that is
> not EFI based (i.e., Google gsmi)
> 
> b) efivars.ko module, built on top of the efivars abstraction, which
> exposes EFI variables (real ones or gsmi ones) via the deprecated
> sysfs interface
> 
> c) efivarfs filesystem, also built on top of the efivars abstraction,
> which exposes EFI variables (real ones or gsmi ones) via a special
> filesystem independently of sysfs.
> 
> Of course, the sysfs mount point we create for efivarfs is not called
> 'efivarfs' but 'efivars'. The sysfs subdirectory we create for
> efivars.ko is called 'vars'. Sigh.
>

Thanks for your clarification. It's useful to me!
 
> 
> In this patch, you create the mount point for c) based on whether a)
> gets registered (which occurs on systems with EFI Get/SetVariable
> support or GSMI), right? So, to Greg's point, wouldn't it be easier to
> simply check whether efivarfs is listed in /proc/filesystems?
>

Yes, I think that Greg's suggestion is good enough for a userland tool
to detect the availability of efivarfs. You can ignore my patch. 

Thanks for your help!
Joey Lee
 
> It also helps if you could clarify what the actual use case is, rather
> than saying that it is generally useful.
> 
> 
> 
> 
> 
> [0] https://lore.kernel.org/linux-efi/20200825160719.7188-1-j...@suse.com/
> 
> >  drivers/firmware/efi/efi.c  |  7 ---
> >  drivers/firmware/efi/vars.c | 17 +
> >  2 files changed, 17 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index 3aa07c3b5136..23c11a2a3f4d 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -405,13 +405,6 @@ static int __init efisubsys_init(void)
> > if (error)
> > goto err_remove_group;
> >
> > -   /* and the standard mountpoint for efivarfs */
> > -   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > -   if (error) {
> > -   pr_err("efivars: Subsystem registration failed.\n");
> > -   goto err_remove_group;
> > -   }
> > -
> > if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
> > efi_debugfs_init();
> >
> > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
> > index 973eef234b36..6fa7f288d635 100644
> > --- a/drivers/firmware/efi/vars.c
> > +++ b/drivers/firmware/efi/vars.c
> > @@ -1179,6 +1179,8 @@ int efivars_register(struct efivars *efivars,
> >  const struct efivar_operations *ops,
> >  struct kobject *kobject)
> >  {
> > +   int error;
> > +
> > if (down_interruptible(&efivars_lock))
> > return -EINTR;
> >
> > @@ -1191,6 +1193,19 @@ int efivars_register(struct efivars *efivars,
> >
> > up(&efivars_lock);
> >
> > +   /* and the standard mountpoint for efivarfs */
> > +   if (efi_kobj) {
> > +   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > +   if (error) {
> > +   if (down_interruptible(&efivars_lock))
> > +   return -EINTR;
> > +   __efivars = NULL;
> > +   up(&efivars_lock);
> > +   pr_err("efivars: Subsystem registration failed.\n");
> > +   return error;
> > +   }
> > +   }
> > +
> > return 0;
> >  }
> >  EXPORT_SYMBOL_GPL(efivars_register);
> > @@ -1222,6 +1237,8 @@ int efivars_unregister(struct efivars *efivars)
> >
> > pr_info("Unregistered efivars operations\n");
> > __efivars = NULL;
> > +   if (efi_kobj)
> > +   sysfs_remove_mount_point(efi_kobj, "efivars");
> >
> > rv = 0;

Re: [PATCH] efi/efivars: Create efivars mount point in the registration of efivars abstraction

2020-09-24 Thread joeyli
Hi Greg,

On Thu, Sep 24, 2020 at 11:51:57AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 24, 2020 at 04:28:33PM +0800, Lee, Chun-Yi wrote:
> > This patch moved the logic of creating efivars mount point to the
> > registration of efivars abstraction. It's useful for userland to
> > determine the availability of efivars filesystem by checking the
> > existence of mount point.
> 
> Why not do what all other tools do, and look in /proc/filesystems?
> 
> Why is efivars "special" in this way?  What tool isn't properly looking
> for the filesystem in that way today?
> 

Thanks for your idea. I think that this is good enough for userland
tool to check the availability of efivarfs. Ignore my patch please.

Regards
Joey Lee

> > The 'efivars' platform device be created on generic EFI runtime services
> > platform, so it can be used to determine the availability of efivarfs.
> > But this approach is not available for google gsmi efivars abstraction.
> 
> I do not understand this last sentence, can you try to explain it
> better?
> 
> > This patch be tested on Here on qemu-OVMF and qemu-uboot.
> 
> How about real hardware?  :)
> 
> > 
> > Cc: Ard Biesheuvel 
> > Cc: Matthias Brugger 
> > Cc: Fabian Vogt 
> > Cc: Ilias Apalodimas 
> > Cc: Greg Kroah-Hartman 
> > Cc: Arthur Heymans 
> > Cc: Patrick Rudolph 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  drivers/firmware/efi/efi.c  |  7 ---
> >  drivers/firmware/efi/vars.c | 17 +
> >  2 files changed, 17 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index 3aa07c3b5136..23c11a2a3f4d 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -405,13 +405,6 @@ static int __init efisubsys_init(void)
> > if (error)
> > goto err_remove_group;
> >  
> > -   /* and the standard mountpoint for efivarfs */
> > -   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > -   if (error) {
> > -   pr_err("efivars: Subsystem registration failed.\n");
> > -   goto err_remove_group;
> > -   }
> > -
> > if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
> > efi_debugfs_init();
> >  
> > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
> > index 973eef234b36..6fa7f288d635 100644
> > --- a/drivers/firmware/efi/vars.c
> > +++ b/drivers/firmware/efi/vars.c
> > @@ -1179,6 +1179,8 @@ int efivars_register(struct efivars *efivars,
> >  const struct efivar_operations *ops,
> >  struct kobject *kobject)
> >  {
> > +   int error;
> > +
> > if (down_interruptible(&efivars_lock))
> > return -EINTR;
> >  
> > @@ -1191,6 +1193,19 @@ int efivars_register(struct efivars *efivars,
> >  
> > up(&efivars_lock);
> >  
> > +   /* and the standard mountpoint for efivarfs */
> > +   if (efi_kobj) {
> 
> Why test for this?  Can it race?
> 
> thanks,
> 
> greg k-h



Re: [PATCH] platform/x86: acer-wmi: add automatic keyboard background light toggle key as KEY_LIGHTS_TOGGLE

2020-08-28 Thread joeyli
Hi Timo,

On Tue, Aug 04, 2020 at 02:14:23AM +0200, Timo Witte wrote:
> Got a dmesg message on my AMD Renoir based Acer laptop:
> "acer_wmi: Unknown key number - 0x84" when toggling keyboard
> background light
> 
> Signed-off-by: Timo Witte 

Reviewed-by: "Lee, Chun-Yi" 

Thanks for your help!
Joey Lee

> ---
>  drivers/platform/x86/acer-wmi.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/platform/x86/acer-wmi.c b/drivers/platform/x86/acer-wmi.c
> index 60c18f21588d..87797f785d6a 100644
> --- a/drivers/platform/x86/acer-wmi.c
> +++ b/drivers/platform/x86/acer-wmi.c
> @@ -111,6 +111,7 @@ static const struct key_entry acer_wmi_keymap[] 
> __initconst = {
>   {KE_KEY, 0x64, {KEY_SWITCHVIDEOMODE} }, /* Display Switch */
>   {KE_IGNORE, 0x81, {KEY_SLEEP} },
>   {KE_KEY, 0x82, {KEY_TOUCHPAD_TOGGLE} }, /* Touch Pad Toggle */
> + {KE_IGNORE, 0x84, {KEY_LIGHTS_TOGGLE} }, /* Automatic Keyboard 
> background light toggle */
>   {KE_KEY, KEY_TOUCHPAD_ON, {KEY_TOUCHPAD_ON} },
>   {KE_KEY, KEY_TOUCHPAD_OFF, {KEY_TOUCHPAD_OFF} },
>   {KE_IGNORE, 0x83, {KEY_TOUCHPAD_TOGGLE} },
> -- 
> 2.27.0



Re: [PATCH v2] efi/efivars: Create efivars mount point via efivars abstraction

2020-08-27 Thread joeyli
On Wed, Aug 26, 2020 at 11:56:33PM +0800, Joey Lee wrote:
> Hi Ard,
> 
> On Wed, Aug 26, 2020 at 02:08:18PM +0200, Ard Biesheuvel wrote:
> > On Wed, 26 Aug 2020 at 02:46, Lee, Chun-Yi  wrote:
> > >
> > > This patch creates efivars mount point when active efivars abstraction
> > > be set. It is useful for userland to determine the availability of efivars
> > > filesystem.
> > >
> > > Cc: Matthias Brugger 
> > > Cc: Fabian Vogt 
> > > Cc: Ilias Apalodimas 
> > > Cc: Ard Biesheuvel 
> > > Signed-off-by: "Lee, Chun-Yi" 
> > 
> > Apologies for not bringing this up before: while the patch seems fine,
> > I wonder if we really need this if the purpose is to decide whether
> > efivars is available or not. We already have the 'efivars' platform
> > device for this, and so userland can simply check for the existence of
> > 
> > /sys/devices/platform/efivars.0
> > 
> > and so we don't need to make any changes for this.
> > 
> 
> The platform device only be registered on generic EFI runtime services
> platform. If the efivars abstraction is google gsmi, then userland can
> not use efivars.0 to detectmine the availability of efivars filesystem.
> 
> If we only consider generic EFI platform, then efivars.0 is good enough.
> But if the gsmi implementation joins this game, then my patch should not
> blocks the creation of efivars mount point because gsmi_kobj is not
> initialized yet.
> 
> Maybe the creation of efivars mount point can be moved to
> efivars_register(), after the kobject be set. Then gsmi can also create
> mount point. How do you think?
> 

Here is the patch for my concept. I simply tested on qemu-OVMF and
qemu-uboot.

>From 7c81d3f058a177bb17c4589fe1863b79940f94c3 Mon Sep 17 00:00:00 2001
From: "Lee, Chun-Yi" 
Date: Fri, 28 Aug 2020 00:05:22 +0800
Subject: [PATCH] efi/efivars: Create efivars mount point in the registration
 of efivars abstraction

This patch moved the logic of creating efivars mount point to the registration
of efivars abstraction. It is useful for userland to determine the availability
of efivars filesystem by checking the existence of mount point.

Signed-off-by: Lee, Chun-Yi 
---
 drivers/firmware/efi/efi.c  |  7 ---
 drivers/firmware/efi/vars.c | 17 +
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3aa07c3b5136..23c11a2a3f4d 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -405,13 +405,6 @@ static int __init efisubsys_init(void)
if (error)
goto err_remove_group;
 
-   /* and the standard mountpoint for efivarfs */
-   error = sysfs_create_mount_point(efi_kobj, "efivars");
-   if (error) {
-   pr_err("efivars: Subsystem registration failed.\n");
-   goto err_remove_group;
-   }
-
if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
efi_debugfs_init();
 
diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c
index 973eef234b36..6fa7f288d635 100644
--- a/drivers/firmware/efi/vars.c
+++ b/drivers/firmware/efi/vars.c
@@ -1179,6 +1179,8 @@ int efivars_register(struct efivars *efivars,
 const struct efivar_operations *ops,
 struct kobject *kobject)
 {
+   int error;
+
if (down_interruptible(&efivars_lock))
return -EINTR;
 
@@ -1191,6 +1193,19 @@ int efivars_register(struct efivars *efivars,
 
up(&efivars_lock);
 
+   /* and the standard mountpoint for efivarfs */
+   if (efi_kobj) {
+   error = sysfs_create_mount_point(efi_kobj, "efivars");
+   if (error) {
+   if (down_interruptible(&efivars_lock))
+   return -EINTR;
+   __efivars = NULL;
+   up(&efivars_lock);
+   pr_err("efivars: Subsystem registration failed.\n");
+   return error;
+   }
+   }
+
return 0;
 }
 EXPORT_SYMBOL_GPL(efivars_register);
@@ -1222,6 +1237,8 @@ int efivars_unregister(struct efivars *efivars)
 
pr_info("Unregistered efivars operations\n");
__efivars = NULL;
+   if (efi_kobj)
+   sysfs_remove_mount_point(efi_kobj, "efivars");
 
rv = 0;
 out:
-- 
2.16.4

Thanks a lot!
Joey Lee

> On the other hand, the efisubsys_init() does not unregister efivars.0
> platform device in err_unregister block. It causes that efivars and
> efi-pstore be loaded when no generic_ops.
> 
> Let me know if I miss understood any thing, please.
> 
> 
> > 
> > 
> > > ---
> > >
> > > v2:
> > > Using efivars_kobject() helper instead of checking GetVariable or
> > > GetNextVariable EFI runtime services. Because the efivarfs code could be
> > > instantiated using a different efivars abstraction.
> > >
> > >  drivers/firmware/efi/efi.c | 12 +++-
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > >
> > > d

Re: [PATCH v2] efi/efivars: Create efivars mount point via efivars abstraction

2020-08-26 Thread joeyli
Hi Ard,

On Wed, Aug 26, 2020 at 02:08:18PM +0200, Ard Biesheuvel wrote:
> On Wed, 26 Aug 2020 at 02:46, Lee, Chun-Yi  wrote:
> >
> > This patch creates efivars mount point when active efivars abstraction
> > be set. It is useful for userland to determine the availability of efivars
> > filesystem.
> >
> > Cc: Matthias Brugger 
> > Cc: Fabian Vogt 
> > Cc: Ilias Apalodimas 
> > Cc: Ard Biesheuvel 
> > Signed-off-by: "Lee, Chun-Yi" 
> 
> Apologies for not bringing this up before: while the patch seems fine,
> I wonder if we really need this if the purpose is to decide whether
> efivars is available or not. We already have the 'efivars' platform
> device for this, and so userland can simply check for the existence of
> 
> /sys/devices/platform/efivars.0
> 
> and so we don't need to make any changes for this.
> 

The platform device only be registered on generic EFI runtime services
platform. If the efivars abstraction is google gsmi, then userland can
not use efivars.0 to detectmine the availability of efivars filesystem.

If we only consider generic EFI platform, then efivars.0 is good enough.
But if the gsmi implementation joins this game, then my patch should not
blocks the creation of efivars mount point because gsmi_kobj is not
initialized yet.

Maybe the creation of efivars mount point can be moved to
efivars_register(), after the kobject be set. Then gsmi can also create
mount point. How do you think?

On the other hand, the efisubsys_init() does not unregister efivars.0
platform device in err_unregister block. It causes that efivars and
efi-pstore be loaded when no generic_ops.

Let me know if I miss understood any thing, please.

Thanks a lot!
Joey Lee

> 
> 
> > ---
> >
> > v2:
> > Using efivars_kobject() helper instead of checking GetVariable or
> > GetNextVariable EFI runtime services. Because the efivarfs code could be
> > instantiated using a different efivars abstraction.
> >
> >  drivers/firmware/efi/efi.c | 12 +++-
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index 3aa07c3b5136..db483fc68501 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -405,11 +405,13 @@ static int __init efisubsys_init(void)
> > if (error)
> > goto err_remove_group;
> >
> > -   /* and the standard mountpoint for efivarfs */
> > -   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > -   if (error) {
> > -   pr_err("efivars: Subsystem registration failed.\n");
> > -   goto err_remove_group;
> > +   if (efivars_kobject()) {
> > +   /* and the standard mountpoint for efivarfs */
> > +   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > +   if (error) {
> > +   pr_err("efivars: Subsystem registration failed.\n");
> > +   goto err_remove_group;
> > +   }
> > }
> >
> > if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
> > --
> > 2.16.4
> >



Re: [PATCH] efi/efivars: create efivars mount point when get variable services are available

2020-08-24 Thread joeyli
Hi Ard,

On Thu, Aug 20, 2020 at 11:30:27AM +0200, Ard Biesheuvel wrote:
> On Wed, 19 Aug 2020 at 11:28, Lee, Chun-Yi  wrote:
> >
> > The efivars filesystem depends on GetVariable or GetNextVariable EFI
> > runtime services. So the /sys/firmware/efi/efivars does not need to be
> > created when GetVariable and GetNextVariable are not available.
> >
> > It is useful for userland to determine the availability of efivars
> > filesystem.
> >
> > Cc: Ilias Apalodimas 
> > Cc: Ard Biesheuvel 
> > Signed-off-by: "Lee, Chun-Yi" 
> 
> Hello Joey,
> 
> This is not the right check to perform here: the efivarfs code could
> be instantiated using a different efivars abstraction, so whether the
> RT services are implemented is not entirely relevant.
> 
> Please look at commit f88814cc2578c121e6edef686365036db72af0ed
> ("efi/efivars: Expose RT service availability via efivars
> abstraction") for an explanation of the underlying issue, and update
> your patch accordingly.
>

Thanks for youre review! I will look at f88814cc257 and change my
patch for v2.

Joey Lee
 
> 
> > ---
> >  drivers/firmware/efi/efi.c | 13 -
> >  1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index fdd1db025dbf..929fbf4dfd5d 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -404,11 +404,14 @@ static int __init efisubsys_init(void)
> > if (error)
> > goto err_remove_group;
> >
> > -   /* and the standard mountpoint for efivarfs */
> > -   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > -   if (error) {
> > -   pr_err("efivars: Subsystem registration failed.\n");
> > -   goto err_remove_group;
> > +   if (efi_rt_services_supported(EFI_RT_SUPPORTED_GET_VARIABLE |
> > + 
> > EFI_RT_SUPPORTED_GET_NEXT_VARIABLE_NAME)) {
> > +   /* and the standard mountpoint for efivarfs */
> > +   error = sysfs_create_mount_point(efi_kobj, "efivars");
> > +   if (error) {
> > +   pr_err("efivars: Subsystem registration failed.\n");
> > +   goto err_remove_group;
> > +   }
> > }
> >
> > if (efi_enabled(EFI_DBG) && efi_enabled(EFI_PRESERVE_BS_REGIONS))
> > --
> > 2.16.4
> >



Re: [PATCH V34 10/29] hibernate: Disable when the kernel is locked down

2019-07-10 Thread joeyli
Hi experts,

On Mon, Jun 24, 2019 at 03:21:23PM +0200, Jiri Kosina wrote:
> On Sat, 22 Jun 2019, Pavel Machek wrote:
> 
> > > There is currently no way to verify the resume image when returning
> > > from hibernate.  This might compromise the signed modules trust model,
> > > so until we can work with signed hibernate images we disable it when the
> > > kernel is locked down.
> > 
> > I keep getting these...
> > 
> > IIRC suse has patches to verify the images.
> 
> Yeah, Joey Lee is taking care of those. CCing.
>

The last time that I sent for hibernation encryption and authentication is
here:
https://lkml.org/lkml/2019/1/3/281

It needs some big changes after review:
 - Simplify the design: remove keyring dependency and trampoline.
 - Encrypted whole snapshot image instead of only data pages.
 - Using TPM:
- Direct use TPM API in hibernation instead of keyring
- Localities (suggested by James Bottomley)

I am still finding enough time to implement those changes, especial TPM
parts.

Thanks
Joey Lee


Re: [PATCH 2/2 v3] efi: print appropriate status message when loading certificates

2019-05-07 Thread joeyli
On Fri, May 03, 2019 at 04:23:51PM +0200, Ard Biesheuvel wrote:
> On Fri, 3 May 2019 at 10:59, joeyli  wrote:
> >
> > On Fri, May 03, 2019 at 10:07:59AM +0200, Ard Biesheuvel wrote:
> > > On Fri, 3 May 2019 at 09:18, joeyli  wrote:
> > > >
> > > > Hi Ard,
> > > >
> > > > On Thu, May 02, 2019 at 11:04:34AM +0200, Ard Biesheuvel wrote:
> > > > > On Thu, 2 May 2019 at 06:04, Lee, Chun-Yi  
> > > > > wrote:
> > > > > >
> > > > > > When loading certificates list from UEFI variable, the original 
> > > > > > error
> > > > > > message direct shows the efi status code from UEFI firmware. It 
> > > > > > looks
> > > > > > ugly:
> > > > > >
> > > > > > [2.335031] Couldn't get size: 0x800e
> > > > > > [2.335032] Couldn't get UEFI MokListRT
> > > > > > [2.339985] Couldn't get size: 0x800e
> > > > > > [2.339987] Couldn't get UEFI dbx list
> > > > > >
> > > > > > So, this patch shows the status string instead of status code.
> > > > > >
> > > > > > On the other hand, the "Couldn't get UEFI" message doesn't need
> > > > > > to be exposed when db/dbx/mok variable do not exist. So, this
> > > > > > patch set the message level to debug.
> > > > > >
> > > > > > v3.
> > > > > > - Print messages similar to db/mok when loading dbx hash to 
> > > > > > blacklist:
> > > > > > [1.500952] EFI: Blacklisting hash of an executable: UEFI:dbx
> > > > > > [1.501773] blacklist: Loaded blacklisting hash
> > > > > > 'bin:80b4d96931bf0d02fd91a61e19d14f1da452e66db2408ca8604d411f92659f0a'
> > > > > >
> > > > > > - Setting messages for the existence of db/mok/dbx lists to debug 
> > > > > > level.
> > > > > >
> > > > > > v2.
> > > > > > Setting the MODSIGN messages level to debug.
> > > > > >
> > > > > > Link:
> > > > > > https://forums.opensuse.org/showthread.php/535324-MODSIGN-Couldn-t-get-UEFI-db-list?p=2897516#post2897516
> > > > > > Cc: James Morris 
> > > > > > Cc: Serge E. Hallyn" 
> > > > > > Cc: David Howells 
> > > > > > Cc: Nayna Jain 
> > > > > > Cc: Josh Boyer 
> > > > > > Cc: Mimi Zohar 
> > > > > > Signed-off-by: "Lee, Chun-Yi" 
> > > > > > ---
> > > > > >  certs/blacklist.c |  3 +-
> > > > > >  security/integrity/platform_certs/load_uefi.c | 40 
> > > > > > +++
> > > > > >  2 files changed, 31 insertions(+), 12 deletions(-)
> > > > > >
> > > > > > diff --git a/certs/blacklist.c b/certs/blacklist.c
> > > > > > index 3a507b9e2568..f91437e39e44 100644
> > > > > > --- a/certs/blacklist.c
> > > > > > +++ b/certs/blacklist.c
> > > > > > @@ -100,7 +100,8 @@ int mark_hash_blacklisted(const char *hash)
> > > > > > if (IS_ERR(key)) {
> > > > > > pr_err("Problem blacklisting hash (%ld)\n", 
> > > > > > PTR_ERR(key));
> > > > > > return PTR_ERR(key);
> > > > > > -   }
> > > > > > +   } else
> > > > > > +   pr_notice("Loaded blacklisting hash '%s'\n", hash);
> > > > > > return 0;
> > > > > >  }
> > > > > >
> > > > > > diff --git a/security/integrity/platform_certs/load_uefi.c 
> > > > > > b/security/integrity/platform_certs/load_uefi.c
> > > > > > index 81b19c52832b..6b6996e5bc27 100644
> > > > > > --- a/security/integrity/platform_certs/load_uefi.c
> > > > > > +++ b/security/integrity/platform_certs/load_uefi.c
> > > > > > @@ -1,5 +1,7 @@
> > > > > >  // SPDX-License-Identifier: GPL-2.0
> > > > > >
> > > > > > +#define pr_fmt(fmt) "EFI: "fmt
> > > > > > +
> > > > > >  #include 
> > > > >

Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-11 Thread joeyli
On Wed, Jan 09, 2019 at 05:47:53PM +0100, Stephan Mueller wrote:
> Am Mittwoch, 9. Januar 2019, 17:39:58 CET schrieb joeyli:
> 
> Hi joeyli,
> 
> > 
> > I am doing encrypt-then-MAC.
> 
> Note, this is what the authenc() AEAD cipher does.
>

OK.. Thanks for your reminding. I will look at it.

Joey Lee 


Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-11 Thread joeyli
On Thu, Jan 10, 2019 at 05:09:46PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 10, 2019 at 7:13 AM joeyli  wrote:
> >
> > On Wed, Jan 09, 2019 at 10:47:42AM -0800, Andy Lutomirski wrote:
> > > On Wed, Jan 9, 2019 at 8:40 AM joeyli  wrote:
> > > >
> > > > Hi Andy,
> > > >
> > > > Thanks for your review!
> > > >
> > > > On Tue, Jan 08, 2019 at 01:41:48PM -0800, Andy Lutomirski wrote:
> > > > > > On Jan 7, 2019, at 9:37 AM, joeyli  wrote:
> > > > > >
> > > > > > Hi Pavel,
> > > > > >
> > > > > > Thanks for your review!
> > > > > >
> > > > > >> On Sun, Jan 06, 2019 at 07:10:27PM +0100, Pavel Machek wrote:
> > > > > >> Hi!
> > > > > >>
> > > > > >>> This patchset is the implementation of encryption and 
> > > > > >>> authentication
> > > > > >>> for hibernate snapshot image. The image will be encrypted by AES 
> > > > > >>> and
> > > > > >>> authenticated by HMAC.
> > > > > >>
> > > > > >> Ok, so you encrypt.
> > > > > >
> > > > > > Yes, encryption and authentication.
> > > > > >
> > > > > >>> The hibernate function can be used to snapshot memory pages to an 
> > > > > >>> image,
> > > > > >>> then kernel restores the image to memory space in a appropriate 
> > > > > >>> time.
> > > > > >>> There have secrets in snapshot image and cracker may modifies it 
> > > > > >>> for
> > > > > >>> hacking system. Encryption and authentication of snapshot image 
> > > > > >>> can protect
> > > > > >>> the system.
> > > > > >>>
> > > > > >>> Hibernate function requests the master key through key retention 
> > > > > >>> service.
> > > > > >>> The snapshot master key can be a trusted key or a user defined 
> > > > > >>> key. The
> > > > > >>> name of snapshot master key is fixed to "swsusp-kmk". User should 
> > > > > >>> loads
> > > > > >>> swsusp-kmk to kernel by keyctl tool before the hibernation resume.
> > > > > >>> e.g. The swsusp-kmk must be loaded before systemd-hibernate-resume
> > > > > >>
> > > > > >> But if userspace has a key, encryption is useless against root.
> > > > > >
> > > > > > Yes, but this concern is not only for hibernation encryption. This 
> > > > > > patch
> > > > > > set does not provide solution against this concern.
> > > > > >
> > > > > > The purpose of this patch set is to encrypt and authenticate 
> > > > > > hibernate
> > > > > > snapshot image in kernel space. It also requests key through keyring
> > > > > > mechanism. Which means that we can easy to adapt to new key type 
> > > > > > from
> > > > > > keyring in the future.
> > > > > >
> > > > > > Currently TPM trusted key or user defined key types are not against
> > > > > > root. Even using the TPM trusted key, it still can be unsealed by 
> > > > > > root
> > > > > > before the PCRs be capped (unless we capped PCRs in kernel).
> > > > > >
> > > > > > My solution for keeping the secret by kernel is the EFI secure key 
> > > > > > type:
> > > > > >https://lkml.org/lkml/2018/8/5/31
> > > > > >
> > > > > > But the EFI boot variable doesn't design for keeping secret, so 
> > > > > > Windows
> > > > > > and OEM/ODM do not use boot variable to keep secret. So this idea 
> > > > > > can
> > > > > > not be accepted. We must think other key type against root.
> > > > > >
> > > > > >>> The TPM trusted key type is preferred to be the master key. But 
> > > > > >>> user
> > > > > >>> defined key can also be used for testing or when the platform 
> > > > > >>> doesn't
> > > > > >>> have TPM

Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-10 Thread joeyli
On Wed, Jan 09, 2019 at 10:47:42AM -0800, Andy Lutomirski wrote:
> On Wed, Jan 9, 2019 at 8:40 AM joeyli  wrote:
> >
> > Hi Andy,
> >
> > Thanks for your review!
> >
> > On Tue, Jan 08, 2019 at 01:41:48PM -0800, Andy Lutomirski wrote:
> > > > On Jan 7, 2019, at 9:37 AM, joeyli  wrote:
> > > >
> > > > Hi Pavel,
> > > >
> > > > Thanks for your review!
> > > >
> > > >> On Sun, Jan 06, 2019 at 07:10:27PM +0100, Pavel Machek wrote:
> > > >> Hi!
> > > >>
> > > >>> This patchset is the implementation of encryption and authentication
> > > >>> for hibernate snapshot image. The image will be encrypted by AES and
> > > >>> authenticated by HMAC.
> > > >>
> > > >> Ok, so you encrypt.
> > > >
> > > > Yes, encryption and authentication.
> > > >
> > > >>> The hibernate function can be used to snapshot memory pages to an 
> > > >>> image,
> > > >>> then kernel restores the image to memory space in a appropriate time.
> > > >>> There have secrets in snapshot image and cracker may modifies it for
> > > >>> hacking system. Encryption and authentication of snapshot image can 
> > > >>> protect
> > > >>> the system.
> > > >>>
> > > >>> Hibernate function requests the master key through key retention 
> > > >>> service.
> > > >>> The snapshot master key can be a trusted key or a user defined key. 
> > > >>> The
> > > >>> name of snapshot master key is fixed to "swsusp-kmk". User should 
> > > >>> loads
> > > >>> swsusp-kmk to kernel by keyctl tool before the hibernation resume.
> > > >>> e.g. The swsusp-kmk must be loaded before systemd-hibernate-resume
> > > >>
> > > >> But if userspace has a key, encryption is useless against root.
> > > >
> > > > Yes, but this concern is not only for hibernation encryption. This patch
> > > > set does not provide solution against this concern.
> > > >
> > > > The purpose of this patch set is to encrypt and authenticate hibernate
> > > > snapshot image in kernel space. It also requests key through keyring
> > > > mechanism. Which means that we can easy to adapt to new key type from
> > > > keyring in the future.
> > > >
> > > > Currently TPM trusted key or user defined key types are not against
> > > > root. Even using the TPM trusted key, it still can be unsealed by root
> > > > before the PCRs be capped (unless we capped PCRs in kernel).
> > > >
> > > > My solution for keeping the secret by kernel is the EFI secure key type:
> > > >https://lkml.org/lkml/2018/8/5/31
> > > >
> > > > But the EFI boot variable doesn't design for keeping secret, so Windows
> > > > and OEM/ODM do not use boot variable to keep secret. So this idea can
> > > > not be accepted. We must think other key type against root.
> > > >
> > > >>> The TPM trusted key type is preferred to be the master key. But user
> > > >>> defined key can also be used for testing or when the platform doesn't
> > > >>> have TPM. User must be aware that the security of user key relies on
> > > >>> user space. If the root account be compromised, then the user key will
> > > >>> easy to be grabbed.
> > > >>
> > > >> In the TPM case, does userland have access to the key?
> > > >
> > > > In the TPM case, userland can only touch the sealed key blob. So 
> > > > userland
> > > > doesn't know the real secret. But it has risk that root unseals the key
> > > > before PCRs be capped.
> > > >
> > > >> Please explain your security goals.
> > > >
> > > > My security goals:
> > > >
> > > > - Encrypt and authicate hibernate snapshot image in kernel space. 
> > > > Userspace
> > > >  can only touch an encrypted and signed snapshot image.
> > >
> > > Signed?
> > >
> > > I’m not entirely convinced that the keyring mechanism is what you
> > > want. ISTM that there are two goals here:
> > >
> > > a) Encryption: it should be as hard as can reasonably be arranged to
> > &

Re: [PATCH 1/5 v2] PM / hibernate: Create snapshot keys handler

2019-01-09 Thread joeyli
Hi James

On Tue, Jan 08, 2019 at 10:49:39PM -0800, James Bottomley wrote:
> On Tue, 2019-01-08 at 17:43 -0800, Andy Lutomirski wrote:
> > [Adding Jarkko because this stuff relates to the TPM.]
> > 
> > On Tue, Jan 8, 2019 at 4:44 PM James Bottomley
> >  wrote:
> > > 
> > > On Tue, 2019-01-08 at 15:54 -0800, Andy Lutomirski wrote:
> > > > > On Jan 7, 2019, at 11:09 PM, Stephan Mueller  > > > > de>
> > > > > wrote:
> > > > > 
> > > > > Am Dienstag, 8. Januar 2019, 06:03:58 CET schrieb Herbert Xu:
> > > > > 
> > > > > Hi Herbert,
> > > > > 
> > > > > > Are we going to have multiple implementations for the same
> > > > > > KDF? If not then the crypto API is not a good fit.  To
> > > > > > consolidate multiple implementations of the same KDF, simply
> > > > > > provide helpers for them.
> > > > > 
> > > > > It is unlikely to have multiple implementations of a KDF.
> > > > > However, KDFs relate to hashes like block chaining modes to raw
> > > > > block ciphers. Thus a KDF can be applied with different hashes.
> > > > > 
> > > > > My idea was to add template support to RNGs (because KDFs are
> > > > > effectively a type of RNG since they produce an arbitrary
> > > > > output from a fixed input). The KDFs would be a template
> > > > > wrapping hashes.  For example, the CTR-KDF from SP800-108 could
> > > > > be instantiated like kdf-ctr(sha256).
> > > > > 
> > > > > 
> > > > 
> > > > I think that, if the crypto API is going to grow a KDF facility,
> > > > it should be done right. Have a key type or flag or whatever that
> > > > says “this key may *only* be used to derive keys using such-and-
> > > > such algorithm”, and have a helper to derive a key.  That helper
> > > > should take some useful parameters and mix them in:
> > > > 
> > > > - What type of key is being derived?  ECDSA signing key?  HMAC
> > > > key?  AES key?
> > > > 
> > > > - Can user code access the derived key?
> > > > 
> > > > - What is the key’s purpose?  “Encrypt and authenticate a
> > > > hibernation image” would be a purpose.
> > > > 
> > > > - Number of bytes.
> > > > 
> > > > All of these parameters should be mixed in to the key derivation.
> > > > 
> > > > Also, an AE key, even for AES+HMAC, should be just one derived
> > > > key.  If you need 512 bits, ask for a 512-bit key, not two 256-
> > > > bit keys.
> > > 
> > > Actually, it would be enormously helpful if we could reuse these
> > > pieces for the TPM as well.  It has two KDFs: KDFa, which is the
> > > CTR-KDF from SP800-108 and KDFe which is the SP800-56A KDF for
> > > elliptic curve one pass Diffie Hellman, so if we're going to do the
> > > former, I'd really like the latter as well.
> > > 
> > > The way the TPM parametrises input to both KDFs is
> > > 
> > > (hashAlg, key, label, contextU, contextV, bits)
> > > 
> > > Where
> > > 
> > > hashAlg  = the hash algorithm used as the PRF
> > > key  = the input parameter of variable bit size or
> > >the x co-ordinate of the shared point
> > > label= An ASCII string representing the use
> > > contextU = public input U
> > > contextV = public input V
> > > bits = number of output bits
> > > 
> > > Is that a good enough parametrisation (not the only way you
> > > distinguish uses is with the label, which is not
> > > recoverable)?  ContextU and ContextV are simply concatenated to
> > > form the full Context of SP800-108, but we tend to need two
> > > separate inputs (for KDFe they're the public x co-ordinates of the
> > > points of the two inputs to ECDH for instance; in KDFa they're
> > > usually the local and remote nonces).
> > > 
> > > The labels for TPM usage are things like "INTEGRITY" for HMAC keys
> > > or "CFB" when generating an aes128-cfb session key. For KDFe, the
> > > tpm seems to like the label "SECRET".  Although the TPM specifies
> > > fixed short strings for the label, nothing prevents them being
> > > longer like the purpose you state above (essentially we could mix
> > > purpose, use and key type into the label and the contexts).
> > > 
> > 
> > That really ought to cover anything the kernel needs.
> > 
> > But can you explain what's up with with KDFe?  The obvious searches
> > end up with just warnings that the US currently has no government :(
> 
> You mean you can't find SP100-56A because NIST is a government entity
> and it's discontinued its website because of the government shutdown? 
> No idea, I only live here, you'll have to ask a real American.
> 
> ACM does have a copy:
> 
> http://delivery.acm.org/10.1145/221/2206270/SP800-56A_Revision1_Mar08-2007.pdf?ip=50.35.68.20&id=2206270&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1546993111_ed9c8bd24b2f838c829d428aac7f5d71
> 
> > Anyway, if we're talking about the TPM, it seems like the entire
> > "trusted key" mechanism in the kernel is missing the point.  If I
> > want to encrypt something like a hibernation image on a machine with
> > a TPM, it makes essentially no sense to me that we w

Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-09 Thread joeyli
On Thu, Jan 10, 2019 at 12:39:58AM +0800, joeyli wrote:
> Hi Andy,
>
[...snip] 
> 
> Let's why I encrypt/decrypt data pages one by one, then I copy the
^^^ That's why

> encrypt/decrypt data from buffer page (only one buffer page reserved
> for encrypt/decrypt) to original page. I encreypt pages one by one, but
> I HMAC and verify the whole snapshot image by update mode.
> 
[...snip]
>  
> > Why are you manually supporting three different key types?  Can’t you
> > just somehow support all key types?  And shouldn’t you be verifying
> 
> I only supported two key typs in my patch set, user defined key and
> TPM trusted key. The EFI secure boot did not accept by EFI subsystem.
   ^^^ EFI secure key
   https://lkml.org/lkml/2018/8/5/10

Sorry for I produced too many typo when feeling sleepy...

Thanks a lot!
Joey Lee


Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-09 Thread joeyli
Hi Andy,

Thanks for your review!

On Tue, Jan 08, 2019 at 01:41:48PM -0800, Andy Lutomirski wrote:
> > On Jan 7, 2019, at 9:37 AM, joeyli  wrote:
> >
> > Hi Pavel,
> >
> > Thanks for your review!
> >
> >> On Sun, Jan 06, 2019 at 07:10:27PM +0100, Pavel Machek wrote:
> >> Hi!
> >>
> >>> This patchset is the implementation of encryption and authentication
> >>> for hibernate snapshot image. The image will be encrypted by AES and
> >>> authenticated by HMAC.
> >>
> >> Ok, so you encrypt.
> >
> > Yes, encryption and authentication.
> >
> >>> The hibernate function can be used to snapshot memory pages to an image,
> >>> then kernel restores the image to memory space in a appropriate time.
> >>> There have secrets in snapshot image and cracker may modifies it for
> >>> hacking system. Encryption and authentication of snapshot image can 
> >>> protect
> >>> the system.
> >>>
> >>> Hibernate function requests the master key through key retention service.
> >>> The snapshot master key can be a trusted key or a user defined key. The
> >>> name of snapshot master key is fixed to "swsusp-kmk". User should loads
> >>> swsusp-kmk to kernel by keyctl tool before the hibernation resume.
> >>> e.g. The swsusp-kmk must be loaded before systemd-hibernate-resume
> >>
> >> But if userspace has a key, encryption is useless against root.
> >
> > Yes, but this concern is not only for hibernation encryption. This patch
> > set does not provide solution against this concern.
> >
> > The purpose of this patch set is to encrypt and authenticate hibernate
> > snapshot image in kernel space. It also requests key through keyring
> > mechanism. Which means that we can easy to adapt to new key type from
> > keyring in the future.
> >
> > Currently TPM trusted key or user defined key types are not against
> > root. Even using the TPM trusted key, it still can be unsealed by root
> > before the PCRs be capped (unless we capped PCRs in kernel).
> >
> > My solution for keeping the secret by kernel is the EFI secure key type:
> >https://lkml.org/lkml/2018/8/5/31
> >
> > But the EFI boot variable doesn't design for keeping secret, so Windows
> > and OEM/ODM do not use boot variable to keep secret. So this idea can
> > not be accepted. We must think other key type against root.
> >
> >>> The TPM trusted key type is preferred to be the master key. But user
> >>> defined key can also be used for testing or when the platform doesn't
> >>> have TPM. User must be aware that the security of user key relies on
> >>> user space. If the root account be compromised, then the user key will
> >>> easy to be grabbed.
> >>
> >> In the TPM case, does userland have access to the key?
> >
> > In the TPM case, userland can only touch the sealed key blob. So userland
> > doesn't know the real secret. But it has risk that root unseals the key
> > before PCRs be capped.
> >
> >> Please explain your security goals.
> >
> > My security goals:
> >
> > - Encrypt and authicate hibernate snapshot image in kernel space. Userspace
> >  can only touch an encrypted and signed snapshot image.
> 
> Signed?
> 
> I’m not entirely convinced that the keyring mechanism is what you
> want. ISTM that there are two goals here:
> 
> a) Encryption: it should be as hard as can reasonably be arranged to
> extract secrets from a hibernation image.
>
> b) Authentication part 1: it should not be possible for someone in
> possession of a turned-off machine to tamper with the hibernation
> image such that the image, when booted, will leak its secrets. This
> should protect against attackers who don’t know the encryption key.
> 
> c) Authentication part 2: it should be to verify, to the extent
> practical, that the image came from the same machine and was really
> created using hibernation.  Or maybe by the same user.
> 
> For (a) and (b), using an AE mode where the key is protected in some
> reasonable way.  Joey, why are you using HMAC?  Please tell me you’re
> at least doing encrypt-then-MAC.  But why not use a real AE mode like
> AES-GCM?

The reason for using HMAC is the history for development. My first patch
set is only for hibernate authentication. Then I added encryption code on
top of my authentication patches in last version.

I am doing encrypt-then-MAC. My code ecrypts each page by AES then HMAC
whole snapshot image. I feed encry

Re: [PATCH 0/5 v2][RFC] Encryption and authentication for hibernate snapshot image

2019-01-07 Thread joeyli
Hi Pavel, 

Thanks for your review!

On Sun, Jan 06, 2019 at 07:10:27PM +0100, Pavel Machek wrote:
> Hi!
> 
> > This patchset is the implementation of encryption and authentication
> > for hibernate snapshot image. The image will be encrypted by AES and
> > authenticated by HMAC.
> 
> Ok, so you encrypt. 
>

Yes, encryption and authentication. 
 
> > The hibernate function can be used to snapshot memory pages to an image,
> > then kernel restores the image to memory space in a appropriate time.
> > There have secrets in snapshot image and cracker may modifies it for
> > hacking system. Encryption and authentication of snapshot image can protect
> > the system.
> > 
> > Hibernate function requests the master key through key retention service.
> > The snapshot master key can be a trusted key or a user defined key. The
> > name of snapshot master key is fixed to "swsusp-kmk". User should loads
> > swsusp-kmk to kernel by keyctl tool before the hibernation resume.
> > e.g. The swsusp-kmk must be loaded before systemd-hibernate-resume
> 
> But if userspace has a key, encryption is useless against root.
>

Yes, but this concern is not only for hibernation encryption. This patch
set does not provide solution against this concern.

The purpose of this patch set is to encrypt and authenticate hibernate
snapshot image in kernel space. It also requests key through keyring
mechanism. Which means that we can easy to adapt to new key type from
keyring in the future. 

Currently TPM trusted key or user defined key types are not against
root. Even using the TPM trusted key, it still can be unsealed by root
before the PCRs be capped (unless we capped PCRs in kernel). 

My solution for keeping the secret by kernel is the EFI secure key type:
https://lkml.org/lkml/2018/8/5/31

But the EFI boot variable doesn't design for keeping secret, so Windows
and OEM/ODM do not use boot variable to keep secret. So this idea can
not be accepted. We must think other key type against root. 

> > The TPM trusted key type is preferred to be the master key. But user
> > defined key can also be used for testing or when the platform doesn't
> > have TPM. User must be aware that the security of user key relies on
> > user space. If the root account be compromised, then the user key will
> > easy to be grabbed.
> 
> In the TPM case, does userland have access to the key? 
>

In the TPM case, userland can only touch the sealed key blob. So userland
doesn't know the real secret. But it has risk that root unseals the key
before PCRs be capped.

> Please explain your security goals.
> 

My security goals:

- Encrypt and authicate hibernate snapshot image in kernel space. Userspace
  can only touch an encrypted and signed snapshot image.

- The code of encryption are in kernel. They will be signed and verify with
  kernel binary when secure boot enabled. It's better than using
  unauthenticated userspace code at runtime.

- Using TPM trusted key, at least the security of hibernation aligns with
  other subsystem. e.g. EVM. 

- Using keyring as the key source of hibernation. Then we can easy to
  adapt to new key type against root be compromised. 

> 
> > Lee, Chun-Yi (5):
> >   PM / hibernate: Create snapshot keys handler
> >   PM / hibernate: Generate and verify signature for snapshot image
> >   PM / hibernate: Encrypt snapshot image
> >   PM / hibernate: Erase the snapshot master key in snapshot pages
> >   PM / hibernate: An option to request that snapshot image must be
> > authenticated
> > 

Thanks a lot!
Joey Lee


Re: [PATCH 1/5 v2] PM / hibernate: Create snapshot keys handler

2019-01-07 Thread joeyli
Hi Stephan, 

First, thanks for your review!

On Sun, Jan 06, 2019 at 09:01:27AM +0100, Stephan Mueller wrote:
> Am Donnerstag, 3. Januar 2019, 15:32:23 CET schrieb Lee, Chun-Yi:
> 
> Hi Chun,
> 
> > This patch adds a snapshot keys handler for using the key retention
> > service api to create keys for snapshot image encryption and
> > authentication.
> > 
> > This handler uses TPM trusted key as the snapshot master key, and the
> > encryption key and authentication key are derived from the snapshot
> > key. The user defined key can also be used as the snapshot master key
> > , but user must be aware that the security of user key relies on user
> > space.
> > 
[...snip]
> > +static int calc_hash(u8 *digest, const u8 *buf, unsigned int buflen,
> > +bool may_sleep)
> > +{
> > +   struct shash_desc *desc;
> > +   int err;
> > +
> > +   desc = kzalloc(sizeof(struct shash_desc) +
> > +  crypto_shash_descsize(hash_tfm),
> > +  may_sleep ? GFP_KERNEL : GFP_ATOMIC);
> 
> Why not using SHASH_DESC_ON_STACK?
>

Because security concern and bad runtime performance. Please looking at
c2cd0b08e1e patch for hibernation. And reference:

https://lore.kernel.org/lkml/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com/T/#u
https://lwn.net/Articles/749064/
 
> > +   if (!desc)
> > +   return -ENOMEM;
> > +
> > +   desc->tfm = hash_tfm;
> > +   desc->flags = may_sleep ? CRYPTO_TFM_REQ_MAY_SLEEP : 0;
> > +   err = crypto_shash_digest(desc, buf, buflen, digest);
> > +   shash_desc_zero(desc);
> > +   kzfree(desc);
> > +
> > +   return err;
> > +}
> > +
> > +static int calc_key_hash(u8 *key, unsigned int key_len, const char *salt,
> > +u8 *hash, bool may_sleep)
> > +{
> > +   unsigned int salted_buf_len;
> > +   u8 *salted_buf;
> > +   int ret;
> > +
> > +   if (!key || !hash_tfm || !hash)
> > +   return -EINVAL;
> > +
> > +   salted_buf_len = strlen(salt) + 1 + SNAPSHOT_KEY_SIZE;
> 
> strlen on binary data? I guess that will not work. May I suggest to hand down 
> the length of salt to this function?
>

hm... The salt is actually a "salt string" that's gave from
snapshot_get_auth_key() or snapshot_get_enc_key(). So I use
strlen() here. I will change the name to salt_string to avoid
confusion. 
 
> > +   salted_buf = kzalloc(salted_buf_len,
> > +   may_sleep ? GFP_KERNEL : GFP_ATOMIC);
> > +   if (!salted_buf)
> > +   return -ENOMEM;
> > +
> > +   strcpy(salted_buf, salt);
> > +   memcpy(salted_buf + strlen(salted_buf) + 1, key, key_len);
> > +
> > +   ret = calc_hash(hash, salted_buf, salted_buf_len, may_sleep);
> > +   memzero_explicit(salted_buf, salted_buf_len);
> > +   kzfree(salted_buf);
> > +
> > +   return ret;
> > +}
> 
> This function looks very much like a key derivation. May I strongly propose 
> to 

Actually key derivation function is modified from the get_derived_key() from
the encrypted.c file in encrypted key.

> use an official KDF type like SP800-108 or HKDF?
> 
> You find the counter-KDF according to SP800-108 in security/keys/dh.c (search 
> for functions *kdf*).
> 
> Or we may start pulling in KDF support into the kernel crypto API via the 
> patches along the line of [1].
> 
> [1] http://www.chronox.de/kdf.html
>

Thanks for your suggestion. I didn't touch any key derivation standard
before. I will study it.

But I still want to use my original function currently. Because the same
logic is also used in trusted key. I will happy to move to SP800-108 or
HKDF when it's available in kernel.  

> > +
> > +/* Derive authentication/encryption key */
> > +static int get_derived_key(u8 *derived_key, const char *derived_type_str,
> > +  bool may_sleep)
[...snip]
> > +static int trusted_key_init(void)
> > +{
> > +   struct trusted_key_payload *tkp;
> > +   struct key *key;
> > +   int err = 0;
> > +
> > +   pr_debug("%s\n", __func__);
> > +
> > +   /* find out swsusp-key */
> > +   key = request_key(&key_type_trusted, skey.key_name, NULL);
> > +   if (IS_ERR(key)) {
> > +   pr_err("Request key error: %ld\n", PTR_ERR(key));
> > +   err = PTR_ERR(key);
> > +   return err;
> > +   }
> > +
> > +   down_write(&key->sem);
> > +   tkp = key->payload.data[0];
> > +   if (invalid_key(tkp->key, tkp->key_len)) {
> > +   err = -EINVAL;
> > +   goto key_invalid;
> > +   }
> > +   skey.key_len = tkp->key_len;
> > +   memcpy(skey.key, tkp->key, tkp->key_len);
> > +   /* burn the original key contents */
> > +   memzero_explicit(tkp->key, tkp->key_len);
> > +
> > +key_invalid:
> > +   up_write(&key->sem);
> > +   key_put(key);
> > +
> > +   return err;
> > +}
> > +
> > +static int user_key_init(void)
> 
> This function and trusted_key_init look very similar - could they be 
> collapsed 
> into one function?
>

The data structure is different between trusted key with user key. I will try to
extract the duplicate part but may not collapse into one.
 
> > +

Re: [PATCH 0/2] [RFC] sysfs: Add hook for checking the file capability of opener

2018-12-31 Thread joeyli
Hi Greg, 

Thanks for your review!

On Sun, Dec 30, 2018 at 03:45:06PM +0100, Greg Kroah-Hartman wrote:
> On Sun, Dec 30, 2018 at 09:28:54PM +0800, Lee, Chun-Yi wrote:
> > There have some discussion in the following mail loop about checking
> > capability in sysfs write handler:
> > https://lkml.org/lkml/2018/9/13/978
> 
> A sysfs callback should not care about stuff like this.
> 
> Worst case, do a simple:
>   if (!capable(CAP_FOO))
>   return -EPERM
> 
> you don't care or need to worry about the file handle for that at all,
> right?
>

The capable() can be bypassed. Unprivileged process may reads or writes
those sysfs if file permission be relaxed by root for non-root user.
 
> > Sometimes we check the capability in sysfs implementation by using
> > capable function.
> 
> Which should be fine, right?
>

If file permission is enough to restrict sysfs that can only be used
by root. Why do some sysfs interfaces use capable()? It's not
redundancy? 
 
> > But the checking can be bypassed by opening sysfs
> > file within an unprivileged process then writing the file within a
> > privileged process. The tricking way has been exposed by Andy Lutomirski
> > for CVE-2013-1959.
> 
> And who does this for a sysfs file?  And why?
>

Just want to bypass the capable() checking. 
 
> > Because the sysfs_ops does not forward the file descriptor to the
> > show/store callback, there doesn't have chance to check the capability
> > of file's opener.
> 
> Which is by design.  If you care about open, you are using sysfs wrong.
>

OK~ So the sysfs doesn't care opener's capability.
 
> > This patch adds the hook to sysfs_ops that allows
> > different implementation in object and attribute levels for checking
> > file capable before accessing sysfs interfaces.
> 
> No, please no.
>
> > The callback function of kobject sysfs_ops is the first implementation
> > of new hook. It casts attribute to kobj_attribute then calls the file
> > capability callback function of attribute level. The same logic can
> > be implemented in other sysfs file types, like: device, driver and
> > bus type. 
> > 
> > The capability checking logic in wake_lock/wake_unlock sysfs interface
> > is the first example for kobject. It will check the opener's capability.
> 
> Why doesn't the file permission of that sysfs file determine who can or
> can not write to that file?
>

I agree that the file permission can restrict the writer of sysfs. But,
I still confused for why do some sysfs interface use capable()?

Thanks a lot!
Joey Lee 


Re: [PATCH 2/2] PM / Sleep: Check the file capability when writing wake lock interface

2018-12-31 Thread joeyli
Hi Greg,

On Sun, Dec 30, 2018 at 03:48:35PM +0100, Greg Kroah-Hartman wrote:
> On Sun, Dec 30, 2018 at 09:28:56PM +0800, Lee, Chun-Yi wrote:
> > The wake lock/unlock sysfs interfaces check that the writer must has
> > CAP_BLOCK_SUSPEND capability. But the checking logic can be bypassed
> > by opening sysfs file within an unprivileged process and then writing
> > the file within a privileged process. The tricking way has been exposed
> > by Andy Lutomirski in CVE-2013-1959.
> 
> Don't you mean "open by privileged and then written by unprivileged?"
> Or if not, exactly how is this a problem?  You check the capabilities
> when you do the write and if that is not allowed then, well
>

Sorry for I didn't provide clear explanation.

The privileged means CAP_BLOCK_SUSPEND but not file permission. The file 
permission
has already relaxed for non-root user. Then the expected behavior is that 
non-root
process must has CAP_BLOCK_SUSPEND capability for writing wake_lock sysfs.

But, the CAP_BLOCK_SUSPEND restrict can be bypassed:

int main(int argc, char* argv[])
{
int fd, ret = 0;

fd = open("/sys/power/wake_lock", O_RDWR);
if (fd < 0)
err(1, "open wake_lock");

if (dup2(fd, 1) != 1)   // overwrite the stdout with wake_lock
err(1, "dup2");
sleep(1);
execl("./string", "string");//string has capability

return ret;
}

This program is an unpriviledged process (has no CAP_BLOCK_SUSPEND), it opened
wake_lock sysfs and overwrited stdout. Then it executes the "string" program
that has CAP_BLOCK_SUSPEND. The string program writes to stdout, which means
that it writes to wake_lock. So an unpriviledged opener can trick an priviledged
writer for writing sysfs.  
 
> And you are checking the namespace of the person trying to do the write
> when the write happens, which is correct here, right?
> 
> If you really want to mess with wake locks in a namespaced environment,
> then put it in a real namespaced environment, which is {HUGE HINT} not
> sysfs.
>

I don't want to mess with wake locks in namespace.
 
> So no, this patch isn't ok...
>

Thanks a lot!
Joey Lee 


Re: [PATCH][RFC v2] ACPI: acpi_pad: Do not launch acpi_pad threads on idle cpus

2018-12-11 Thread joeyli
Hi Yu Chen,

Thanks for your response!

On Tue, Dec 11, 2018 at 11:12:21AM +0800, Yu Chen wrote:
> Hi Joey,
> On Mon, Dec 10, 2018 at 02:31:53PM +0800, joeyli wrote:
> > Hi Chen Yu and ACPI experts,
> > 
> > On Sat, May 05, 2018 at 07:53:22PM +0800, Chen Yu wrote:
> > > According to current implementation of acpi_pad driver,
> > > it does not make sense to spawn any power saving threads
> > > on the cpus which are already idle - it might bring
> > > unnecessary overhead on these idle cpus and causes power
> > > waste. So verify the condition that if the number of 'busy'
> > > cpus exceeds the amount of the 'forced idle' cpus is met.
> > > This is applicable due to round-robin attribute of the
> > > power saving threads, otherwise ignore the setting/ACPI
> > > notification.
> > > 
> > > Suggested-by: Lenny Szubowicz 
> > > Suggested-by: Len Brown 
> > > Cc: "Rafael J. Wysocki" 
> > > Cc: Lenny Szubowicz 
> > > Cc: Len Brown 
> > > Cc: Jacob Pan 
> > > Cc: Rui Zhang 
> > > Cc: linux-a...@vger.kernel.org
> > > Signed-off-by: Chen Yu 
> > 
> > Do you have any news for this patch? Why it did not merged by kernel
> > maineline?
> > 
> We are evaluating if this could be integrated into idle injection framework.
> May I know if there's any requirement/background from SUSE on this?
> 

I am also looking at your patch and idle injection framework. Currently I do not
have good suggestion for your patches yet. But I will try to ready my knowledge 
when
you send out new version.

Thanks a lot!
Joey Lee 

> > 
> > > ---
> > >  drivers/acpi/acpi_pad.c | 52 
> > > -
> > >  1 file changed, 51 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> > > index 552c1f7..515e60e 100644
> > > --- a/drivers/acpi/acpi_pad.c
> > > +++ b/drivers/acpi/acpi_pad.c
> > > @@ -254,12 +254,62 @@ static void set_power_saving_task_num(unsigned int 
> > > num)
> > >   }
> > >  }
> > >  
> > > +/*
> > > + * Extra acpi_pad threads should not be created until
> > > + * the requested idle count is less than/equals to the
> > > + * number of the busy cpus - it does not make sense to
> > > + * throttle the idle cpus.
> > > + */
> > > +#define SAMPLE_INTERVAL_JIF  20
> > > +
> > > +static u64 get_idle_time(int cpu)
> > > +{
> > > + u64 idle, idle_usecs = -1ULL;
> > > +
> > > + idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> > > +
> > > + if (idle_usecs == -1ULL)
> > > + idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
> > > + else
> > > + idle = idle_usecs * NSEC_PER_USEC;
> > > +
> > > + return idle;
> > > +}
> > > +
> > > +static bool idle_nr_valid(unsigned int num_cpus)
> > > +{
> > > + int busy_nr = 0, i = 0, load_thresh = 100 - idle_pct;
> > > +
> > > + if (!num_cpus)
> > > + return true;
> > > +
> > > + for_each_online_cpu(i) {
> > > + u64 wall_time, idle_time;
> > > + unsigned int elapsed_delta, idle_delta, load;
> > > +
> > > + wall_time = jiffies64_to_nsecs(get_jiffies_64());
> > > + idle_time = get_idle_time(i);
> > > + /* Wait and see... */
> > > + schedule_timeout_uninterruptible(SAMPLE_INTERVAL_JIF);
> > > +
> > > + idle_delta = get_idle_time(i) - idle_time;
> > > + elapsed_delta = jiffies64_to_nsecs(get_jiffies_64()) - 
> > > wall_time;
> > > + idle_delta = (idle_delta > elapsed_delta) ? elapsed_delta : 
> > > idle_delta;
> > > + load = 100 * (elapsed_delta - idle_delta) / elapsed_delta;
> > > + if (load >= load_thresh)
> > > + busy_nr++;
> > > + }
> > > +
> > > + return (busy_nr >= num_cpus) ? true : false;
> > > +}
> > > +
> > >  static void acpi_pad_idle_cpus(unsigned int num_cpus)
> > >  {
> > >   get_online_cpus();
> > >  
> > >   num_cpus = min_t(unsigned int, num_cpus, num_online_cpus());
> > > - set_power_saving_task_num(num_cpus);
> > > + if (idle_nr_valid(num_cpus))
> > > + set_power_saving_task_num(num_cpus);
> > >  
> > >   put_online_cpus();
> > >  }
> > > -- 
> > > 2.7.4
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> > > the body of a message to majord...@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH][RFC v2] ACPI: acpi_pad: Do not launch acpi_pad threads on idle cpus

2018-12-09 Thread joeyli
Hi Chen Yu and ACPI experts,

On Sat, May 05, 2018 at 07:53:22PM +0800, Chen Yu wrote:
> According to current implementation of acpi_pad driver,
> it does not make sense to spawn any power saving threads
> on the cpus which are already idle - it might bring
> unnecessary overhead on these idle cpus and causes power
> waste. So verify the condition that if the number of 'busy'
> cpus exceeds the amount of the 'forced idle' cpus is met.
> This is applicable due to round-robin attribute of the
> power saving threads, otherwise ignore the setting/ACPI
> notification.
> 
> Suggested-by: Lenny Szubowicz 
> Suggested-by: Len Brown 
> Cc: "Rafael J. Wysocki" 
> Cc: Lenny Szubowicz 
> Cc: Len Brown 
> Cc: Jacob Pan 
> Cc: Rui Zhang 
> Cc: linux-a...@vger.kernel.org
> Signed-off-by: Chen Yu 

Do you have any news for this patch? Why it did not merged by kernel
maineline?

Thanks a lot!
Joey Lee

> ---
>  drivers/acpi/acpi_pad.c | 52 
> -
>  1 file changed, 51 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/acpi_pad.c b/drivers/acpi/acpi_pad.c
> index 552c1f7..515e60e 100644
> --- a/drivers/acpi/acpi_pad.c
> +++ b/drivers/acpi/acpi_pad.c
> @@ -254,12 +254,62 @@ static void set_power_saving_task_num(unsigned int num)
>   }
>  }
>  
> +/*
> + * Extra acpi_pad threads should not be created until
> + * the requested idle count is less than/equals to the
> + * number of the busy cpus - it does not make sense to
> + * throttle the idle cpus.
> + */
> +#define SAMPLE_INTERVAL_JIF  20
> +
> +static u64 get_idle_time(int cpu)
> +{
> + u64 idle, idle_usecs = -1ULL;
> +
> + idle_usecs = get_cpu_idle_time_us(cpu, NULL);
> +
> + if (idle_usecs == -1ULL)
> + idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
> + else
> + idle = idle_usecs * NSEC_PER_USEC;
> +
> + return idle;
> +}
> +
> +static bool idle_nr_valid(unsigned int num_cpus)
> +{
> + int busy_nr = 0, i = 0, load_thresh = 100 - idle_pct;
> +
> + if (!num_cpus)
> + return true;
> +
> + for_each_online_cpu(i) {
> + u64 wall_time, idle_time;
> + unsigned int elapsed_delta, idle_delta, load;
> +
> + wall_time = jiffies64_to_nsecs(get_jiffies_64());
> + idle_time = get_idle_time(i);
> + /* Wait and see... */
> + schedule_timeout_uninterruptible(SAMPLE_INTERVAL_JIF);
> +
> + idle_delta = get_idle_time(i) - idle_time;
> + elapsed_delta = jiffies64_to_nsecs(get_jiffies_64()) - 
> wall_time;
> + idle_delta = (idle_delta > elapsed_delta) ? elapsed_delta : 
> idle_delta;
> + load = 100 * (elapsed_delta - idle_delta) / elapsed_delta;
> + if (load >= load_thresh)
> + busy_nr++;
> + }
> +
> + return (busy_nr >= num_cpus) ? true : false;
> +}
> +
>  static void acpi_pad_idle_cpus(unsigned int num_cpus)
>  {
>   get_online_cpus();
>  
>   num_cpus = min_t(unsigned int, num_cpus, num_online_cpus());
> - set_power_saving_task_num(num_cpus);
> + if (idle_nr_valid(num_cpus))
> + set_power_saving_task_num(num_cpus);
>  
>   put_online_cpus();
>  }
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] PM / hibernate: Create snapshot keys handler

2018-10-08 Thread joeyli
Hi Any, Jann,

On Wed, Oct 03, 2018 at 03:08:12PM -0700, Andy Lutomirski wrote:
> On Tue, Oct 2, 2018 at 12:36 PM Jann Horn  wrote:
> >
> > +Andy for opinions on things in write handlers
> > +Mimi Zohar as EVM maintainer
> >
> > On Tue, Oct 2, 2018 at 9:55 AM joeyli  wrote:
> > > On Thu, Sep 13, 2018 at 04:31:18PM +0200, Jann Horn wrote:
> > > > On Thu, Sep 13, 2018 at 4:08 PM Lee, Chun-Yi  
> > > > wrote:
> > > > > This patch adds a snapshot keys handler for using the key retention
> > > > > service api to create keys for snapshot image encryption and
> > > > > authentication.
> > > [...snip]
> > > > > +static ssize_t disk_kmk_store(struct kobject *kobj, struct 
> > > > > kobj_attribute *attr,
> > > > > + const char *buf, size_t n)
> > > > > +{
> > > > > +   int error = 0;
> > > > > +   char *p;
> > > > > +   int len;
> > > > > +
> > > > > +   if (!capable(CAP_SYS_ADMIN))
> > > > > +   return -EPERM;
> > > >
> > > > This is wrong, you can't use capable() in a write handler. You'd have
> > > > to use file_ns_capable(), and I think sysfs currently doesn't give you
> > > > a pointer to the struct file.
> > > > If you want to do this in a write handler, you'll have to either get
> > > > rid of this check or plumb through the cred struct pointer.
> > > > Alternatively, you could use some interface that doesn't go through a
> > > > write handler.
> > > >
> > >
> > > Thank you for point out this problem.
> > >
> > > Actually the evm_write_key() is the example for my code. The
> > > difference is that evm creates interface file on securityfs, but my
> > > implementation is on sysfs:
> > >
> > > security/integrity/evm/evm_secfs.c
> > >
> > > static ssize_t evm_write_key(struct file *file, const char __user *buf,
> > >  size_t count, loff_t *ppos)
> > > {
> > > int i, ret;
> > >
> > > if (!capable(CAP_SYS_ADMIN) || (evm_initialized & EVM_SETUP))
> > > return -EPERM;
> 
> Yeah, that's a bug.
> 
> > > ...
> > >
> > > On the other hand, the writing handler of /sys/power/wake_lock also
> > > uses capable() to check the CAP_BLOCK_SUSPEND capability:
> > >
> > > kernel/power/main.c
> > > static ssize_t wake_lock_store(struct kobject *kobj,
> > >struct kobj_attribute *attr,
> > >const char *buf, size_t n)
> > > {
> > > int error = pm_wake_lock(buf);
> > > return error ? error : n;
> > > }
> > > power_attr(wake_lock);
> > >
> > > kernel/power/wakelock.c
> > > int pm_wake_lock(const char *buf)
> > > {
> > > ...
> > > if (!capable(CAP_BLOCK_SUSPEND))
> > > return -EPERM;
> > > ...
> 
> Also a bug.
> 
> > >
> > >
> > > So I confused for when can capable() be used in sysfs interface? Is
> > > capable() only allowed in reading handler? Why the writing handler
> > > of securityfs can use capable()?
> >
> > They can't, they're all wrong. :P All of these capable() checks in
> > write handlers have to be assumed to be ineffective. But it's not a
> > big deal because you still need UID 0 to access these files.
> 
> Why are there capability checks at all?
> 
> >
> > > > > +static int user_key_init(void)
> > > > > +{
> > > > > +   struct user_key_payload *ukp;
> > > > > +   struct key *key;
> > > > > +   int err = 0;
> > > > > +
> > > > > +   pr_debug("%s\n", __func__);
> > > > > +
> > > > > +   /* find out swsusp-key */
> > > > > +   key = request_key(&key_type_user, skey.key_name, NULL);
> > > >
> > > > request_key() looks at current's keyring. That shouldn't happen in a
> > > > write handler.
> > > >
> > >
> > > The evm_write_key() also uses request_key() but it's on securityfs. Should
> > > I move my sysfs interface to securityfs?
> >
> > I don't think you should be doing this in the context of any
> > filesystem. If EVM does that, EVM is doing it wrong.
> 
> EVM sounds buggy.
> 
> In general if you look at current *at all* in an implementation of
> write() *in any filesystem*, you are doing it wrong.

I have read CVE-2013-1959... Thanks for Jann and Andy's review.

I will create the sysfs interface through other way, then using 
file_ns_capable() for capability checking in next version.

Thanks a lot!
Joey Lee


Re: [PATCH 1/5] PM / hibernate: Create snapshot keys handler

2018-10-02 Thread joeyli
Hi Jann,

Thanks for your review and very sorry for my delay!

On Thu, Sep 13, 2018 at 04:31:18PM +0200, Jann Horn wrote:
> +cc keyrings list
> 
> On Thu, Sep 13, 2018 at 4:08 PM Lee, Chun-Yi  wrote:
> > This patch adds a snapshot keys handler for using the key retention
> > service api to create keys for snapshot image encryption and
> > authentication.
[...snip]
> > +static ssize_t disk_kmk_store(struct kobject *kobj, struct kobj_attribute 
> > *attr,
> > + const char *buf, size_t n)
> > +{
> > +   int error = 0;
> > +   char *p;
> > +   int len;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +   return -EPERM;
> 
> This is wrong, you can't use capable() in a write handler. You'd have
> to use file_ns_capable(), and I think sysfs currently doesn't give you
> a pointer to the struct file.
> If you want to do this in a write handler, you'll have to either get
> rid of this check or plumb through the cred struct pointer.
> Alternatively, you could use some interface that doesn't go through a
> write handler.
>

Thank you for point out this problem.

Actually the evm_write_key() is the example for my code. The
difference is that evm creates interface file on securityfs, but my
implementation is on sysfs:

security/integrity/evm/evm_secfs.c

static ssize_t evm_write_key(struct file *file, const char __user *buf,
 size_t count, loff_t *ppos)
{
int i, ret;

if (!capable(CAP_SYS_ADMIN) || (evm_initialized & EVM_SETUP)) 
return -EPERM;
...

On the other hand, the writing handler of /sys/power/wake_lock also
uses capable() to check the CAP_BLOCK_SUSPEND capability: 

kernel/power/main.c 
static ssize_t wake_lock_store(struct kobject *kobj,
   struct kobj_attribute *attr,
   const char *buf, size_t n)
{
int error = pm_wake_lock(buf);
return error ? error : n;
}
power_attr(wake_lock);

kernel/power/wakelock.c
int pm_wake_lock(const char *buf)
{
...
if (!capable(CAP_BLOCK_SUSPEND))
return -EPERM;
...


So I confused for when can capable() be used in sysfs interface? Is
capable() only allowed in reading handler? Why the writing handler
of securityfs can use capable()?

> > +
> > +static int user_key_init(void)
> > +{
> > +   struct user_key_payload *ukp;
> > +   struct key *key;
> > +   int err = 0;
> > +
> > +   pr_debug("%s\n", __func__);
> > +
> > +   /* find out swsusp-key */
> > +   key = request_key(&key_type_user, skey.key_name, NULL);
> 
> request_key() looks at current's keyring. That shouldn't happen in a
> write handler.
>

The evm_write_key() also uses request_key() but it's on securityfs. Should
I move my sysfs interface to securityfs?

> > +   if (IS_ERR(key)) {
> > +   pr_err("Request key error: %ld\n", PTR_ERR(key));
> > +   err = PTR_ERR(key);
> > +   return err;
> > +   }
> > +
> > +   down_write(&key->sem);
> > +   ukp = user_key_payload_locked(key);
> > +   if (!ukp) {
> > +   /* key was revoked before we acquired its semaphore */
> > +   err = -EKEYREVOKED;
> > +   goto key_invalid;
> > +   }
> > +   if (invalid_key(ukp->data, ukp->datalen)) {
> > +   err = -EINVAL;
> > +   goto key_invalid;
> > +   }
> > +   skey.key_len = ukp->datalen;
> > +   memcpy(skey.key, ukp->data, ukp->datalen);
> > +   /* burn the original key contents */
> > +   memzero_explicit(ukp->data, ukp->datalen);
> 
> You just zero out the contents of the supplied key? That seems very
> unidiomatic for the keys subsystem, and makes me wonder why you're
> using the keys subsystem for this in the first place. It doesn't look
> like normal use of the keys subsystem.
> 

Because I want that only one decrypted key in kernel memory. Then hibernation
can handle the key more easy. In evm_init_key(), it also burned the key
contents after evm key be initialled: 

security/integrity/evm/evm_crypto.c
int evm_init_key(void)
{
[...snip]
/* burn the original key contents */
memset(ekp->decrypted_data, 0, ekp->decrypted_datalen);
up_read(&evm_key->sem);
key_put(evm_key);
return rc;
}

The keys subsystem already handles the interactive with userland and TPM.
That's the reason for using keys subsystem in hibernation. 

> > +key_invalid:
> > +   up_write(&key->sem);
> > +   key_put(key);
> > +
> > +   return err;
> > +}
> > +
> > +/* this function may sleeps */
> > +int snapshot_key_init(void)
> > +{
> > +   int err;
> > +
> > +   pr_debug("%s\n", __func__);
> > +
> > +   if (skey.initialized)
> > +   return 0;
> > +
> > +   hash_tfm = crypto_alloc_shash(hash_alg, 0, CRYPTO_ALG_ASYNC);
> > +   if (IS_ERR(hash_tfm)) {
> > +   pr_err("Can't allocate %s transform: %ld\n",
> > +  

Re: [PATCH 1/5] PM / hibernate: Create snapshot keys handler

2018-10-01 Thread joeyli
Hi Yu Chen, 

Thanks for your review and very sorry for my delay!

On Thu, Sep 13, 2018 at 09:58:32PM +0800, Yu Chen wrote:
> On Wed, Sep 12, 2018 at 10:23:33PM +0800, Lee, Chun-Yi wrote:
> > This patch adds a snapshot keys handler for using the key retention
> > service api to create keys for snapshot image encryption and
> > authentication.
> > 
> > This handler uses TPM trusted key as the snapshot master key, and the
> > encryption key and authentication key are derived from the snapshot
> > key. The user defined key can also be used as the snapshot master key
> > , but user must be aware that the security of user key relies on user
> > space.
> > 
> In case the kernel provides mechanism to generate key from passphase,
> the master key could also be generated in kernel space other than TPM.
> It seems than snapshot_key_init() is easy to add the interface for that,
> right?

The user defined key can be used to carry the blob from user space. User
sapce can use keyctl tool to enroll the blob. We can design a structure of
blob that it carries the hash of passphase, salt... so on. Then kernel
parses the blob to generate the key for encryption/authentication.

> > The name of snapshot key is fixed to "swsusp-kmk". User should use
> > the keyctl tool to load the key blob to root's user keyring. e.g.
> > 
> >  # /bin/keyctl add trusted swsusp-kmk "load `cat swsusp-kmk.blob`" @u
> > 
> > or create a new user key. e.g.
> > 
> >  # /bin/keyctl add user swsusp-kmk password @u

The above password can be a blob. The user defined key can carries the
blob to kernel. We can design the blob with userland tool later.

> > 
> > Then the disk_kmk sysfs file can be used to trigger the initialization
> > of snapshot key:
> > 
> >  # echo 1 > /sys/power/disk_kmk
> > 
> > After the initialization be triggered, the secret in the payload of
> > swsusp-key will be copied by hibernation and be erased. Then user can
> > use keyctl to remove swsusp-kmk key from root's keyring.
> > 
> > If user does not trigger the initialization by disk_kmk file after
> > swsusp-kmk be loaded to kernel. Then the snapshot key will be
> > initialled when hibernation be triggered.
> > 
> > Cc: "Rafael J. Wysocki" 
> > Cc: Pavel Machek 
> > Cc: Chen Yu 
> > Cc: Oliver Neukum 
> > Cc: Ryan Chen 
> > Cc: David Howells 
> > Cc: Giovanni Gherdovich 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
[...snip]
> > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> > index abef759de7c8..18d13cbf0591 100644
> > --- a/kernel/power/hibernate.c
> > +++ b/kernel/power/hibernate.c
> > @@ -1034,6 +1034,39 @@ static ssize_t disk_store(struct kobject *kobj, 
> > struct kobj_attribute *attr,
> >  
> >  power_attr(disk);
> >  
> > +#ifdef CONFIG_HIBERNATION_ENC_AUTH
> > +static ssize_t disk_kmk_show(struct kobject *kobj, struct kobj_attribute 
> > *attr,
> > +char *buf)
> > +{
> > +   if (snapshot_key_initialized())
> > +   return sprintf(buf, "initialized\n");
> > +   else
> > +   return sprintf(buf, "uninitialized\n");
> > +}
> > +
> > +static ssize_t disk_kmk_store(struct kobject *kobj, struct kobj_attribute 
> > *attr,
> > + const char *buf, size_t n)
> > +{
> Does kmk mean kernel master key? It might looks unclear from first glance,
> how about disk_genkey_store()?

Yes, it's the kernel maskter key for disk (hibernate).

The sysfs interfaces is used to load the master key from keyring, then
kernel parses the encrypted key or user defined key to grab the plaintext
key. So sysfs triggered the initial process of the master key.

Even user space didn't trigger the process through sysfs, kernel will
still runs the initial process when hibernation be triggered. Then
kernel uses the master key to derive encrypte key and authenticate key. 

I prefer to keep the name of sysfs and the function name. 

> > +   int error = 0;
> > +   char *p;
> > +   int len;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +   return -EPERM;
> > +
> > +   p = memchr(buf, '\n', n);
> > +   len = p ? p - buf : n;
> > +   if (strncmp(buf, "1", len))
> > +   return -EINVAL;
> Why user is not allowed to disable(remove) it by
> echo 0 ?

Similar with evm, this sysfs interface gives user space a chance
to ask kernel to initial master key. Once the master key be
initialled and loaded, kernel doesn't want the key to be removed
because the key is unsealed by TPM. The PCRs may already capped
then there have no other master key can be unsealed and enrolled.

So, I prefer that do not give user space the function to
remove an enrolled master key. Once the master key be loaded,
kernel will use the key to encrypt snapshot image.   

> > +
> > +   error = snapshot_key_init();
> > +
> > +   return error ? error : n;
> > +}
> > +
> > +power_attr(disk_kmk);
> > +#endif /* !CONFIG_HIBERNATION_ENC_AUTH */
[...snip]
> > diff --git a/kernel/power/snapshot_key.c b/kernel/power/snapshot_key.c
> > new file mode 100644
> > index 00

Re: [PATCH 1/5] PM / hibernate: Create snapshot keys handler

2018-09-13 Thread joeyli
Hi Randy,

On Wed, Sep 12, 2018 at 09:27:27AM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 9/12/18 7:23 AM, Lee, Chun-Yi wrote:
> > diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> > index 3a6c2f87699e..7c5c30149dbc 100644
> > --- a/kernel/power/Kconfig
> > +++ b/kernel/power/Kconfig
> > @@ -76,6 +76,20 @@ config HIBERNATION
> >  
> >   For more information take a look at 
> > .
> >  
> > +config HIBERNATION_ENC_AUTH
> > +   bool "Hibernation encryption and authentication"
> > +   depends on HIBERNATION
> > +   depends on TRUSTED_KEYS
> > +   select CRYPTO_AES
> > +   select CRYPTO_HMAC
> > +   select CRYPTO_SHA512
> > +   help
> > + This option will encrypt and authenticate the memory snapshot image
> > + of hibernation. It prevents that the snapshot image be arbitrary
> 
>arbitrarily
> 
> > + modified. User can use TPMs trusted key or user defined key as the
> 
>   The user
> orA user can use the TPM's trusted key
>

Thanks for your review! I will update it in next version.

Joey Lee 


Re: [PATCH 5/5] PM / hibernate: An option to request that snapshot image must be authenticated

2018-09-13 Thread joeyli
Hi Randy, 

On Wed, Sep 12, 2018 at 09:24:38AM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 9/12/18 7:23 AM, Lee, Chun-Yi wrote:
> > diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> > index 7c5c30149dbc..3c998fd6dc4c 100644
> > --- a/kernel/power/Kconfig
> > +++ b/kernel/power/Kconfig
> > @@ -90,6 +90,17 @@ config HIBERNATION_ENC_AUTH
> >   master key of hibernation. The TPM trusted key depends on TPM. The
> >   security of user defined key relies on user space.
> >  
> > +config HIBERNATION_ENC_AUTH_FORCE
> > +   bool "Require hibernate snapshot image to be validly signed"
> > +   depends on HIBERNATION_ENC_AUTH
> > +   help
> > + This option will prevent that a snapshot image without (valid)
> > + signature be restored. Without this option, a unauthenticated
> 
> an
> 
> > + snapshot image can be restored but the restored kernel will be
> > + tainted. Which also means that the hibernation can be triggered
> 
> s/Which/This/
> 
> or like this:
> tainted, which also
>

Thanks for your review and suggestion! I will update the description in
next version.

Regards
Joey Lee


Re: [PATCH] x86/PCI: Claim the resources of firmware enabled IOAPIC before children bus

2018-08-11 Thread joeyli
On Fri, Aug 10, 2018 at 08:58:37AM -0500, Bjorn Helgaas wrote:
> On Fri, Aug 10, 2018 at 05:25:01PM +0800, joeyli wrote:
> > On Wed, Aug 08, 2018 at 04:23:22PM -0500, Bjorn Helgaas wrote:
> > ...
[...snip]
> > hm... I have another question that it may not relates to this issue. I
> > was tracing the code path of PCI hot-remove/hotplug. Base on spec, looks
> > that the RST# should be asserted when hot-remove. And the memory decode
> > bit must be set to zero after RST# be asserted. But I didn't see that
> > any kernel PCI/ACPI code set RST#. The only possible code to set RST# is
> > in POWER architecture. Do you know who assert the RST# when hot-remove?
> 
> RST# is a conventional PCI signal (not a PCIe signal).  In any case, I
> would expect signals like that to be handled by hardware, not by
> software.  What section of the spec are you looking at?  I wouldn't

In PCI Hot-Plug Spec v1.1

2.2.1 Hot Removal
The Hot-Plug System Driver uses the Hot-Plug Controller to do the following:
a) Assert RST# to the slot and isolate the slot from the rest of the bus, in
   either order.
b) Power down the slot.
c) Change the optional slot-state indicator, as defined in Section 3.1.1, to 
show
   that the slot is off.

In the above description, it said that "Hot-Plug System Driver" should done
the job. So I was think that kernel driver must asserts RST#, but I didn't
find that in kernel code.

Then, in PCI Local Bus spec v2.2, it mentions:

Table 6-1: Command Register Bits
Bit LocationDescription
0   ...State after RST# is 0.
1   ...State after RST# is 0.

So, after hot-remove the RST# must be asserted and the IO/memory
decode bit should also be set to zero.

I was tracing the kerenl hot-remove code for RST# because I
want to make sure that kernel didn't change the RST# state from
firmware.

> expect any requirements for doing things to a device when the device
> is being hot-removed, since the device may already be inaccessible,
> e.g., physically unreachable.

I see! It makes sense.

But I still confused about the "Hot-Plug System Driver" wording in
PCI Hot-Plug Spec. The "Hot-Plug System Driver " means a kernel
driver?

> 
> On a hot-*add*, there would of course be requirements about how the
> device powers up and comes out of reset.  For native drivers like
> pciehp/shpcpd/etc, there are often ways for software to control power
> to the slot, e.g., the "Power Controller Control" bit in the PCIe Slot
> Control register.
> 
> For ACPI-mediated hotplug (as in your situation), the actual hardware
> details are handled by the firmware and all the OS sees are things
> like ACPI Notify events and it uses methods like _STA and other things
> mentioned in ACPI v6.2, sec 6.3.
> 
> > > What are the chances of getting a firmware fix?  Has this firmware
> > > already shipped to customers?
> > 
> > The good news is that the machine has not shipped yet. As I know
> > that manufacturer is also finding the root cause for why firmware
> > enabled memory decode bit and also set the wrong addresses.
> 
> I don't think it's necessarily a problem that firmware enables the
> IOAPIC.  This is ACPI-mediated hotplug and it looks like it adds CPUs,
> memory, and I/O.  I wouldn't be surprised if the firmware has to make
> the IOAPIC operational to make some parts of the hot-add work.
> 
> The address conflict is the real problem.

Thanks for your explanation. It's really useful to me.

Thanks a lot!
Joey Lee


Re: [PATCH] x86/PCI: Claim the resources of firmware enabled IOAPIC before children bus

2018-08-10 Thread joeyli
On Wed, Aug 08, 2018 at 04:23:22PM -0500, Bjorn Helgaas wrote:
> On Wed, Aug 08, 2018 at 11:53:18PM +0800, joeyli wrote:
> > Hi Bjorn,
> > 
> > First, thanks for your review!
> > 
> > On Mon, Aug 06, 2018 at 04:48:07PM -0500, Bjorn Helgaas wrote:
> > > On Tue, Jul 24, 2018 at 07:01:44PM +0800, Lee, Chun-Yi wrote:
> > > > I got a machine that the resource of firmware enabled IOAPIC conflicts
> > > > with the resource of a children bus when the PCI host bus be hotplug.
> > > > 
> > > > [ 3182.243325] PCI host bridge to bus 0001:40
> > > > [ 3182.243328] pci_bus 0001:40: root bus resource [io  0xc000-0xdfff 
> > > > window]
> > > > [ 3182.243330] pci_bus 0001:40: root bus resource [mem 
> > > > 0xdc00-0xebff window]
> > > > [ 3182.243331] pci_bus 0001:40: root bus resource [mem 
> > > > 0x2124-0x2125 window]
> > > > [ 3182.243334] pci_bus 0001:40: root bus resource [bus 40-7e]
> > > > ...
> > > > [ 3182.244737] pci 0001:40:05.4: [8086:6f2c] type 00 class 0x080020
> > > > [ 3182.244746] pci 0001:40:05.4: reg 0x10: [mem 0xdc00-0xdc000fff]
> > > > ...
> > > > [ 3182.246697] pci 0001:40:02.0: PCI bridge to [bus 41]
> > > > [ 3182.246702] pci 0001:40:02.0:   bridge window [mem 
> > > > 0xdc00-0xdc7f]
> > > > ...
> > > > pci 0001:40:05.4: can't claim BAR 0 [mem 0xdc00-0xdc000fff]: 
> > > > address conflict with PCI Bus 0001:41 [mem 0xdc00-0xdc7f]
> > > > 
> > > > The bus topology:
> > > > 
> > > >  +-[0001:40]-+-02.0-[41]--
> > > >  |   +-03.0-[41]--
> > > >  |   +-03.2-[41]--+-00.0  Intel Corporation I350 Gigabit 
> > > > Network Connection
> > > >  |   |+-00.1  Intel Corporation I350 Gigabit 
> > > > Network Connection
> > > >  |   |+-00.2  Intel Corporation I350 Gigabit 
> > > > Network Connection
> > > >  |   |\-00.3  Intel Corporation I350 Gigabit 
> > > > Network Connection
> > > >  |   +-05.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > > > v4/Xeon D Map/VTd_Misc/System Management
> > > >  |   +-05.1  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > > > v4/Xeon D IIO Hot Plug
> > > >  |   +-05.2  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > > > v4/Xeon D IIO RAS/Control Status/Global Errors
> > > >  |   \-05.4  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > > > v4/Xeon D I/O APIC
> > > > 
> > > > This problem causes that the NIC behine the child bus was not available
> > > > after PCI host bridge hotpluged.
> > > > 
> > > > Kernel does not want to change resource of firmware enabled IOAPIC, but
> > > > the priority of children bus's resources are higher than any other 
> > > > devices.
> > > > So this conflict can not be handled by the reassigning logic of kernel.
> > 
> > Sorry for my description is not clear. The "priority" is for resources
> > clamining, not for the address decoding.
> > 
> > > I don't understand this paragraph.  AFAIK, if two devices on a bus
> > > both decode the same address, as the IOAPIC and the bridge do here,
> > > the only real "priority" in PCI would be subtractive decode.  But I
> > > don't think that applies here since the bridge is using a positive
> > > decode window.
> > 
> > Sorry for I didn't understand... How could you know the bridge doesn't
> > apply subtractive decode?
> 
> A subtractive decode bridge forwards anything appearing on its primary
> bus to its secondary bus.  In conventional PCI, it only does this if
> no other agent on the primary bus asserts DEVSEL# (PCI r3.0, sec
> 3.6.1).  In PCIe, the primary "bus" is a link and there's only one
> device on it, so a subtractive decode bridge could forward anything it
> sees.  If the subtractive decode bridge is part of a multi-function
> device, I assume that multi-function device would have to determine
> internally whether the subtractive decode bridge or another function
> should claim the transaction.
> 
> Either way, a subtractive decode bridge can forward anything that
> appears on its primary bus, so a subtractive decode window effectively
> contains everything the upstream bridge passes down.  In this cas

Re: [PATCH v2] platform/x86: acer-wmi: refactor function has_cap

2018-08-08 Thread joeyli
Hi Gustavo, 

Sorry for my delay!

On Mon, Aug 06, 2018 at 03:38:32PM -0500, Gustavo A. R. Silva wrote:
> Refactor function has_cap in order to avoid returning integer
> values, when instead it should return booleans.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva 

Thanks for your patch and also thank Andy and Joe's help. 

Please feel free to add:

Reviewed-by: "Lee, Chun-Yi"  

Joey Lee

> ---
> Changes in v2:
>   Remove parentheses and unnecessary code. Thank you all for
>   the feedback!
> 
>  drivers/platform/x86/acer-wmi.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/platform/x86/acer-wmi.c b/drivers/platform/x86/acer-wmi.c
> index 8952173..3294bb2 100644
> --- a/drivers/platform/x86/acer-wmi.c
> +++ b/drivers/platform/x86/acer-wmi.c
> @@ -672,10 +672,7 @@ static void __init find_quirks(void)
>  
>  static bool has_cap(u32 cap)
>  {
> - if ((interface->capability & cap) != 0)
> - return 1;
> -
> - return 0;
> + return interface->capability & cap;
>  }
>  
>  /*
> -- 
> 2.7.4
> 


Re: [PATCH] x86/PCI: Claim the resources of firmware enabled IOAPIC before children bus

2018-08-08 Thread joeyli
Hi Bjorn,

First, thanks for your review!

On Mon, Aug 06, 2018 at 04:48:07PM -0500, Bjorn Helgaas wrote:
> On Tue, Jul 24, 2018 at 07:01:44PM +0800, Lee, Chun-Yi wrote:
> > I got a machine that the resource of firmware enabled IOAPIC conflicts
> > with the resource of a children bus when the PCI host bus be hotplug.
> > 
> > [ 3182.243325] PCI host bridge to bus 0001:40
> > [ 3182.243328] pci_bus 0001:40: root bus resource [io  0xc000-0xdfff window]
> > [ 3182.243330] pci_bus 0001:40: root bus resource [mem 
> > 0xdc00-0xebff window]
> > [ 3182.243331] pci_bus 0001:40: root bus resource [mem 
> > 0x2124-0x2125 window]
> > [ 3182.243334] pci_bus 0001:40: root bus resource [bus 40-7e]
> > ...
> > [ 3182.244737] pci 0001:40:05.4: [8086:6f2c] type 00 class 0x080020
> > [ 3182.244746] pci 0001:40:05.4: reg 0x10: [mem 0xdc00-0xdc000fff]
> > ...
> > [ 3182.246697] pci 0001:40:02.0: PCI bridge to [bus 41]
> > [ 3182.246702] pci 0001:40:02.0:   bridge window [mem 0xdc00-0xdc7f]
> > ...
> > pci 0001:40:05.4: can't claim BAR 0 [mem 0xdc00-0xdc000fff]: address 
> > conflict with PCI Bus 0001:41 [mem 0xdc00-0xdc7f]
> > 
> > The bus topology:
> > 
> >  +-[0001:40]-+-02.0-[41]--
> >  |   +-03.0-[41]--
> >  |   +-03.2-[41]--+-00.0  Intel Corporation I350 Gigabit Network 
> > Connection
> >  |   |+-00.1  Intel Corporation I350 Gigabit Network 
> > Connection
> >  |   |+-00.2  Intel Corporation I350 Gigabit Network 
> > Connection
> >  |   |\-00.3  Intel Corporation I350 Gigabit Network 
> > Connection
> >  |   +-05.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > v4/Xeon D Map/VTd_Misc/System Management
> >  |   +-05.1  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > v4/Xeon D IIO Hot Plug
> >  |   +-05.2  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > v4/Xeon D IIO RAS/Control Status/Global Errors
> >  |   \-05.4  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 
> > v4/Xeon D I/O APIC
> > 
> > This problem causes that the NIC behine the child bus was not available
> > after PCI host bridge hotpluged.
> > 
> > Kernel does not want to change resource of firmware enabled IOAPIC, but
> > the priority of children bus's resources are higher than any other devices.
> > So this conflict can not be handled by the reassigning logic of kernel.

Sorry for my description is not clear. The "priority" is for resources
clamining, not for the address decoding.

> 
> I don't understand this paragraph.  AFAIK, if two devices on a bus
> both decode the same address, as the IOAPIC and the bridge do here,
> the only real "priority" in PCI would be subtractive decode.  But I
> don't think that applies here since the bridge is using a positive
> decode window.
>

Sorry for I didn't understand... How could you know the bridge doesn't
apply subtractive decode?

> I would expect that a read to the [mem 0xdc00-0xdc000fff] range
> would potentially receive two read completions, which would cause an
> Unexpected Completion error.
>

Thanks for your information. The I350 NIC doesn't work after hotplug.
So it may causes by Unexpected Completion error as you mentioned.
 
> Maybe you mean that we don't want to change the IOAPIC resources, but
> it's OK if we move the bridge window, so in that sense, the IOAPIC is
> "higher priority" than the bridge window?  This is the reverse of what
> your paragraph seems to say.
>

Yes, this is what I mean. Sorry for my paragraph causes confusing. 

The memory decoder bit in the command register of 0001:40:05.4 IOAPIC
is raised by firmware after hotplug. So kernel treats it as a
_firmware enabled_ IOAPIC. Because kernel don't want to change the IOAPIC
resources, so my patch try to claim the IOAPIC resources before all
bridges. It can workaround the hotplug problem on issue machine.

But, actually I am not sure that this hacking makes sense. And, I also
don't know why the memory decoder bit is raised by firmware when hotplug.
Maybe this is a firmware problem.
 
> > This patch claims the resources of firmware enabled IOAPIC before
> > children bus. Then kernel gets a chance to reassign the resources of
> > children bus to avoid the conflict.
> 
> Can you open a report at https://bugzilla.kernel.org, attach the
> before and after dmesg logs, and include the URL in the commit log?
>

OK, I have filed a bug on kernel bugzilla and also attached dmesg
log:
https://bugzilla.kernel.org/show_bug.cgi?id=200765

Thanks a lot!
Joey Lee


Re: [PATCH 2/3][RFC] PM / Hibernate: Encrypt the snapshot pages before submitted to the block device

2018-07-18 Thread joeyli
On Fri, Jul 13, 2018 at 03:34:25PM +0800, Yu Chen wrote:
> Hi,
> On Thu, Jul 12, 2018 at 06:10:37PM +0800, joeyli wrote:
> > Hi Yu Chen, 
> > 
> > Sorry for my delay...
> > 
> > On Fri, Jul 06, 2018 at 11:28:56PM +0800, Yu Chen wrote:
[...snip]
> > > > Why the cryption code must be indepent of snapshot code?
> > > >
> > > Modules can be easier to be maintained and customized/improved in the 
> > > future IMO..
> > 
> > hm... currently I didn't see apparent benefit on this...
> > 
> > About modularization, is it possible to separate the key handler code
> > from crypto_hibernation.c? Otherwise the snapshot signature needs
> > to depend on encrypt function.
> > 
> I understand.
> It seems that we can reuse crypto_data() for the signature logic.
> For example, add one parameter for crypto_data(..., crypto_mode)
> the crypto_mode is a combination of 
> ENCRYPT/DECRYPT/SIGNATURE_START/SIGNATURE_END,
> and can be controled by sysfs. If SIGNATURE_START is enabled, the 
> crypto_data()
> invokes crypto_shash_update() to get "hmac(sha512)" hash, and if SIGNATURE_END
> is enabled, crypto_shash_final() stores the final result in global buffer

I agree that the swsusp_prepare and hmac-hash codes can be extract to
crypto_hibernation.  

> struct hibernation_crypto.keys.digest[SNAPSHOT_DIGEST_SIZE] which can be
> passed to the restore kernel, the same as the salt. Besides,

The salt is stored in the swsusp_header and put in swap. What I want
is that the signature of snapshot should be keep in the snapshot header
as my original design in patch. The benefit is that the snapshot with
signature can be stored to any place but not just swap. The user space
can take snapshot image with signature to anywhere.

> the swsusp_prepare_hash() in your code could be moved into
> init_crypto_helper(), thus the signature key could also reuse
> the same key passed from user space or via get_efi_secret_key().
> In this way, the change in snapshot.c is minimal and the crypto
> facilities could be maintained in one file.

I agree. But as I mentioned in that mail, the key handler codes
in hibernation_crypto() should be extract to another C file to avoid
the logic is mixed with the crypto code. The benefit is that we can
add more key sources like encrypted key or EFI key in the future. 

> > > > > 2. There's no need to traverse the snapshot image twice, if the
> > > > >image is large(there's requirement on servers now) we can
> > > > >simplyly do the encryption before the disk IO, and this is
> > > > >why PATCH[2/3] looks like this.
> > > > 
> > > > If the encryption solution is only for block device, then the uswsusp
> > > > interface must be locked-down when kernel is in locked mode:
> > > > 
> > > > uswsusp: Disable when the kernel is locked down
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=lockdown-20180410&id=8732c1663d7c0305ae01ba5a1ee4d2299b7b4612
> > > > 
> > > > I still suggest to keep the solution to direct encript the snapshot
> > > > image for uswsusp because the snapshot can be encrypted by kernel
> > > > before user space get it.
> > > > 
> > > > I mean that if the uswsusp be used, then kernel direct encrypts the
> > > > snapshot image, otherwise kernel encrypts pages before block io.
> > > > 
> > > I did not quite get the point, if the kernel has been locked down,
> > > then the uswsusp is disabled, why the kernel encrypts the snapshot
> > > for uswsusp?
> > 
> > There have some functions be locked-down because there have no
> > appropriate mechanisms to check the integrity of writing data. If
> > the snapshot image can be encrypted and authentication, then the
> > uswsusp interface is avaiable for userspace. We do not need to lock
> > it down.
> Ok, I got your point. To be honest, I like your implementation to treat
> the snapshot data seperately except that it might cause small overhead
> when traversing large number of snapshot and make snapshot logic a little
> coupling with crypto logic. Let me send our v2 using the crypto-before-blockio
> for review and maybe change to your solution using the encapsulated APIs in
> crypto_hibernate.c.

Because sometimes I stick on other topics...

If you are urgent for pushing your encryption solution. Please do not
consider too much on signature check. Just complete your patches and push
to kernel mainline. I will follow your result to respin my signature work.

Thanks
Joey Lee 


Re: [PATCH 2/3][RFC] PM / Hibernate: Encrypt the snapshot pages before submitted to the block device

2018-07-12 Thread joeyli
Hi Yu Chen, 

Sorry for my delay...

On Fri, Jul 06, 2018 at 11:28:56PM +0800, Yu Chen wrote:
> Hi Joey Lee,
> On Fri, Jun 29, 2018 at 08:59:43PM +0800, joeyli wrote:
> > On Thu, Jun 28, 2018 at 10:52:07PM +0800, Yu Chen wrote:
> > > Hi,
> > > On Thu, Jun 28, 2018 at 10:28:56PM +0800, joeyli wrote:
> > > > On Thu, Jun 28, 2018 at 09:50:17PM +0800, Yu Chen wrote:
> > > > > Hi,
> > > > > On Thu, Jun 28, 2018 at 09:07:20PM +0800, joeyli wrote:
> > > > > > Hi Chen Yu,
> > > > > > 
> > > > > > On Wed, Jun 20, 2018 at 05:40:32PM +0800, Chen Yu wrote:
> > > > > > > Use the helper functions introduced previously to encrypt
> > > > > > > the page data before they are submitted to the block device.
> > > > > > > Besides, for the case of hibernation compression, the data
> > > > > > > are firstly compressed and then encrypted, and vice versa
> > > > > > > for the resume process.
> > > > > > >
> > > > > > 
> > > > > > I want to suggest my solution that it direct signs/encrypts the
> > > > > > memory snapshot image. This solution is already shipped with
> > > > > > SLE12 a couple of years:
> > > > > > 
> > > > > > https://github.com/joeyli/linux-s4sign/commits/s4sign-hmac-encrypted-key-v0.2-v4.17-rc3
> > > > > > 
> > > > > I did not see image page encryption in above link, if I understand
> > > > 
> > > > PM / hibernate: Generate and verify signature for snapshot image
> > > > https://github.com/joeyli/linux-s4sign/commit/bae39460393ada4c0226dd07cd5e3afcef86b71f
> > > >
> > > > PM / hibernate: snapshot image encryption
> > > > https://github.com/joeyli/linux-s4sign/commit/6a9a0113bb221c036ebd0f6321b7191283fe4929
> > > >
> > > > The above patches sign and encrypt the data pages in snapshot image.
> > > > It puts the signature to header.
> > > >
> > > It looks like your signature code can be simplyly added on top of the
> > > solution we proposed here, how about we collaborating on this task?
> > 
> > OK, I will base on your user key solution to respin my signature patches.
> >  
> > > just my 2 cents, 
> > > 1. The cryption code should be indepent of the snapshot code, and
> > >this is why we implement it as a kernel module for that in PATCH[1/3].
> > 
> > Why the cryption code must be indepent of snapshot code?
> >
> Modules can be easier to be maintained and customized/improved in the future 
> IMO..

hm... currently I didn't see apparent benefit on this...

About modularization, is it possible to separate the key handler code
from crypto_hibernation.c? Otherwise the snapshot signature needs
to depend on encrypt function.

> > > 2. There's no need to traverse the snapshot image twice, if the
> > >image is large(there's requirement on servers now) we can
> > >simplyly do the encryption before the disk IO, and this is
> > >why PATCH[2/3] looks like this.
> > 
> > If the encryption solution is only for block device, then the uswsusp
> > interface must be locked-down when kernel is in locked mode:
> > 
> > uswsusp: Disable when the kernel is locked down
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=lockdown-20180410&id=8732c1663d7c0305ae01ba5a1ee4d2299b7b4612
> > 
> > I still suggest to keep the solution to direct encript the snapshot
> > image for uswsusp because the snapshot can be encrypted by kernel
> > before user space get it.
> > 
> > I mean that if the uswsusp be used, then kernel direct encrypts the
> > snapshot image, otherwise kernel encrypts pages before block io.
> > 
> I did not quite get the point, if the kernel has been locked down,
> then the uswsusp is disabled, why the kernel encrypts the snapshot
> for uswsusp?

There have some functions be locked-down because there have no
appropriate mechanisms to check the integrity of writing data. If
the snapshot image can be encrypted and authentication, then the
uswsusp interface is avaiable for userspace. We do not need to lock
it down.

> > On the other hand, I have a question about asynchronous block io.
> > Please see below...
> > 
> Okay.
> > > > > > > Suggested-by: Rafael J. Wysocki 
> > > > > > > Cc: Rafael J. Wysocki 
> > > > > > > Cc: Pavel Machek 
> > &

Re: [PATCH 0/3][RFC] Introduce the in-kernel hibernation encryption

2018-07-05 Thread joeyli
Hi Chen Yu, 

On Wed, Jun 20, 2018 at 05:39:37PM +0800, Chen Yu wrote:
> Hi,
> As security becomes more and more important, we add the in-kernel
> encryption support for hibernation.
> 
> This prototype is a trial version to implement the hibernation
> encryption in the kernel, so that the users do not have to rely
> on third-party tools to encrypt the hibernation image. The only
> dependency on user space is that, the user space should provide
> a valid key derived from passphrase to the kernel for image encryption.
> 
> There was a discussion on the mailing list on whether this key should
> be derived in kernel or in user space. And it turns out to be generating
> the key by user space is more acceptable[1]. So this patch set is divided
> into two parts:
> 1. The hibernation snapshot encryption in kernel space,
> 2. the key derivation implementation in user space.
> 
> Please refer to each patch for detail, and feel free to comment on
> this, thanks.
> 
> [1] https://www.spinics.net/lists/linux-crypto/msg33145.html
> 
> Chen Yu (3):
>   PM / Hibernate: Add helper functions for hibernation encryption
>   PM / Hibernate: Encrypt the snapshot pages before submitted to the
> block device
>   tools: create power/crypto utility
>

I am trying this patch set.

Could you please tell me how to test the user space crypto utility with
systemd's hibernation module? 

I have a question about the salt. If the salt is saved in image header,
does that mean that kernel needs to read the image header before user
space crypto utility be launched? Otherwise user space can not get
the salt to produce key? I a bit confused about the resume process.

Thanks
Joey Lee 


Re: [PATCH 2/3][RFC] PM / Hibernate: Encrypt the snapshot pages before submitted to the block device

2018-06-29 Thread joeyli
On Thu, Jun 28, 2018 at 10:52:07PM +0800, Yu Chen wrote:
> Hi,
> On Thu, Jun 28, 2018 at 10:28:56PM +0800, joeyli wrote:
> > On Thu, Jun 28, 2018 at 09:50:17PM +0800, Yu Chen wrote:
> > > Hi,
> > > On Thu, Jun 28, 2018 at 09:07:20PM +0800, joeyli wrote:
> > > > Hi Chen Yu,
> > > > 
> > > > On Wed, Jun 20, 2018 at 05:40:32PM +0800, Chen Yu wrote:
> > > > > Use the helper functions introduced previously to encrypt
> > > > > the page data before they are submitted to the block device.
> > > > > Besides, for the case of hibernation compression, the data
> > > > > are firstly compressed and then encrypted, and vice versa
> > > > > for the resume process.
> > > > >
> > > > 
> > > > I want to suggest my solution that it direct signs/encrypts the
> > > > memory snapshot image. This solution is already shipped with
> > > > SLE12 a couple of years:
> > > > 
> > > > https://github.com/joeyli/linux-s4sign/commits/s4sign-hmac-encrypted-key-v0.2-v4.17-rc3
> > > > 
> > > I did not see image page encryption in above link, if I understand
> > 
> > PM / hibernate: Generate and verify signature for snapshot image
> > https://github.com/joeyli/linux-s4sign/commit/bae39460393ada4c0226dd07cd5e3afcef86b71f
> >
> > PM / hibernate: snapshot image encryption
> > https://github.com/joeyli/linux-s4sign/commit/6a9a0113bb221c036ebd0f6321b7191283fe4929
> >
> > The above patches sign and encrypt the data pages in snapshot image.
> > It puts the signature to header.
> >
> It looks like your signature code can be simplyly added on top of the
> solution we proposed here, how about we collaborating on this task?

OK, I will base on your user key solution to respin my signature patches.
 
> just my 2 cents, 
> 1. The cryption code should be indepent of the snapshot code, and
>this is why we implement it as a kernel module for that in PATCH[1/3].

Why the cryption code must be indepent of snapshot code?

> 2. There's no need to traverse the snapshot image twice, if the
>image is large(there's requirement on servers now) we can
>simplyly do the encryption before the disk IO, and this is
>why PATCH[2/3] looks like this.

If the encryption solution is only for block device, then the uswsusp
interface must be locked-down when kernel is in locked mode:

uswsusp: Disable when the kernel is locked down
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=lockdown-20180410&id=8732c1663d7c0305ae01ba5a1ee4d2299b7b4612

I still suggest to keep the solution to direct encript the snapshot
image for uswsusp because the snapshot can be encrypted by kernel
before user space get it.

I mean that if the uswsusp be used, then kernel direct encrypts the
snapshot image, otherwise kernel encrypts pages before block io.

On the other hand, I have a question about asynchronous block io.
Please see below...

> > > > > Suggested-by: Rafael J. Wysocki 
> > > > > Cc: Rafael J. Wysocki 
> > > > > Cc: Pavel Machek 
> > > > > Cc: Len Brown 
> > > > > Cc: Borislav Petkov 
> > > > > Cc: "Lee, Chun-Yi" 
> > > > > Cc: linux...@vger.kernel.org
> > > > > Cc: linux-kernel@vger.kernel.org
> > > > > Signed-off-by: Chen Yu 
> > > > > ---
> > > > >  kernel/power/power.h |   1 +
> > > > >  kernel/power/swap.c  | 215 
> > > > > ---
> > > > >  2 files changed, 205 insertions(+), 11 deletions(-)
[...snip]
> > > > >  /* kernel/power/hibernate.c */
> > > > >  extern int swsusp_check(void);
> > > > > diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> > > > > index c2bcf97..2b6b3d0 100644
> > > > > --- a/kernel/power/swap.c
> > > > > +++ b/kernel/power/swap.c
[...snip]
> > > > > @@ -1069,18 +1171,42 @@ static int load_image(struct swap_map_handle 
> > > > > *handle,
> > > > >   if (!m)
> > > > >   m = 1;
> > > > >   nr_pages = 0;
> > > > > + crypto_page_idx = 0;
> > > > > + if (handle->crypto) {
> > > > > + crypt_buf = (void *)get_zeroed_page(GFP_KERNEL);
> > > > > + if (!crypt_buf)
> > > > > + return -ENOMEM;
> > > > > + }
> > > > >   start = ktime_get();
> > > > >   f

Re: [PATCH 2/3][RFC] PM / Hibernate: Encrypt the snapshot pages before submitted to the block device

2018-06-28 Thread joeyli
On Thu, Jun 28, 2018 at 09:50:17PM +0800, Yu Chen wrote:
> Hi,
> On Thu, Jun 28, 2018 at 09:07:20PM +0800, joeyli wrote:
> > Hi Chen Yu,
> > 
> > On Wed, Jun 20, 2018 at 05:40:32PM +0800, Chen Yu wrote:
> > > Use the helper functions introduced previously to encrypt
> > > the page data before they are submitted to the block device.
> > > Besides, for the case of hibernation compression, the data
> > > are firstly compressed and then encrypted, and vice versa
> > > for the resume process.
> > >
> > 
> > I want to suggest my solution that it direct signs/encrypts the
> > memory snapshot image. This solution is already shipped with
> > SLE12 a couple of years:
> > 
> > https://github.com/joeyli/linux-s4sign/commits/s4sign-hmac-encrypted-key-v0.2-v4.17-rc3
> > 
> I did not see image page encryption in above link, if I understand

PM / hibernate: Generate and verify signature for snapshot image
https://github.com/joeyli/linux-s4sign/commit/bae39460393ada4c0226dd07cd5e3afcef86b71f

PM / hibernate: snapshot image encryption
https://github.com/joeyli/linux-s4sign/commit/6a9a0113bb221c036ebd0f6321b7191283fe4929

The above patches sign and encrypt the data pages in snapshot image.
It puts the signature to header.

> correctly, your code focus on signation and encrypt the hidden data,
> but not target for the whole snapshot data.

Please ignore the hidden area because we already encrypted snapshot image.
Those data doesn't need to erase from snapshot anymore. I will remove
the hidden area patches in next version. 

> > The above patches still need to clean up. I am working on some
> > other bugs, but I can clean up and send out it ASAP.
> > 
> > The advantage of this solution is that it produces a signed and
> > encrypted image. Not just for writing to block device by kernel,
> > it also can provide a signed/encrypted image to user space. User
> > space can store the encrypted image to anywhere.
> > 
> > I am OK for your user space key generator because I didn't have
> > similar solution yet. I am working on the EFI master key and also
> > want to adapt hibernation to keyring. I will continue the works.
> > 
> The user space tool can easily add the keyring support besides
> ioctl if needed.
>

Understood.

Thanks
Joey Lee
 
> Best,
> Yu
> > Thanks a lot!
> > Joey Lee
> >  
> > > Suggested-by: Rafael J. Wysocki 
> > > Cc: Rafael J. Wysocki 
> > > Cc: Pavel Machek 
> > > Cc: Len Brown 
> > > Cc: Borislav Petkov 
> > > Cc: "Lee, Chun-Yi" 
> > > Cc: linux...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Signed-off-by: Chen Yu 
> > > ---
> > >  kernel/power/power.h |   1 +
> > >  kernel/power/swap.c  | 215 
> > > ---
> > >  2 files changed, 205 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/kernel/power/power.h b/kernel/power/power.h
> > > index 660aac3..637695c 100644
> > > --- a/kernel/power/power.h
> > > +++ b/kernel/power/power.h
> > > @@ -207,6 +207,7 @@ extern int swsusp_swap_in_use(void);
> > >  #define SF_PLATFORM_MODE 1
> > >  #define SF_NOCOMPRESS_MODE   2
> > >  #define SF_CRC32_MODE4
> > > +#define SF_ENCRYPT_MODE  8
> > >  
> > >  /* kernel/power/hibernate.c */
> > >  extern int swsusp_check(void);
> > > diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> > > index c2bcf97..2b6b3d0 100644
> > > --- a/kernel/power/swap.c
> > > +++ b/kernel/power/swap.c
> > > @@ -102,14 +102,16 @@ struct swap_map_handle {
> > >   unsigned int k;
> > >   unsigned long reqd_free_pages;
> > >   u32 crc32;
> > > + bool crypto;
> > >  };
> > >  
> > >  struct swsusp_header {
> > >   char reserved[PAGE_SIZE - 20 - sizeof(sector_t) - sizeof(int) -
> > > -   sizeof(u32)];
> > > + sizeof(u32) - HIBERNATE_SALT_BYTES];
> > >   u32 crc32;
> > >   sector_t image;
> > >   unsigned int flags; /* Flags to pass to the "boot" kernel */
> > > + char salt[HIBERNATE_SALT_BYTES];
> > >   charorig_sig[10];
> > >   charsig[10];
> > >  } __packed;
> > > @@ -127,6 +129,53 @@ struct swsusp_extent {
> > >   unsigned long end;
> > >  };
> > >  
> > > +/* For encryption/decryption. */
> > > +static struct hibernation_crypto *hibernation_crypto_ops;
> > &g

Re: [PATCH 2/3][RFC] PM / Hibernate: Encrypt the snapshot pages before submitted to the block device

2018-06-28 Thread joeyli
Hi Chen Yu,

On Wed, Jun 20, 2018 at 05:40:32PM +0800, Chen Yu wrote:
> Use the helper functions introduced previously to encrypt
> the page data before they are submitted to the block device.
> Besides, for the case of hibernation compression, the data
> are firstly compressed and then encrypted, and vice versa
> for the resume process.
>

I want to suggest my solution that it direct signs/encrypts the
memory snapshot image. This solution is already shipped with
SLE12 a couple of years:

https://github.com/joeyli/linux-s4sign/commits/s4sign-hmac-encrypted-key-v0.2-v4.17-rc3

The above patches still need to clean up. I am working on some
other bugs, but I can clean up and send out it ASAP.

The advantage of this solution is that it produces a signed and
encrypted image. Not just for writing to block device by kernel,
it also can provide a signed/encrypted image to user space. User
space can store the encrypted image to anywhere.

I am OK for your user space key generator because I didn't have
similar solution yet. I am working on the EFI master key and also
want to adapt hibernation to keyring. I will continue the works.

Thanks a lot!
Joey Lee
 
> Suggested-by: Rafael J. Wysocki 
> Cc: Rafael J. Wysocki 
> Cc: Pavel Machek 
> Cc: Len Brown 
> Cc: Borislav Petkov 
> Cc: "Lee, Chun-Yi" 
> Cc: linux...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Chen Yu 
> ---
>  kernel/power/power.h |   1 +
>  kernel/power/swap.c  | 215 
> ---
>  2 files changed, 205 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/power/power.h b/kernel/power/power.h
> index 660aac3..637695c 100644
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -207,6 +207,7 @@ extern int swsusp_swap_in_use(void);
>  #define SF_PLATFORM_MODE 1
>  #define SF_NOCOMPRESS_MODE   2
>  #define SF_CRC32_MODE4
> +#define SF_ENCRYPT_MODE  8
>  
>  /* kernel/power/hibernate.c */
>  extern int swsusp_check(void);
> diff --git a/kernel/power/swap.c b/kernel/power/swap.c
> index c2bcf97..2b6b3d0 100644
> --- a/kernel/power/swap.c
> +++ b/kernel/power/swap.c
> @@ -102,14 +102,16 @@ struct swap_map_handle {
>   unsigned int k;
>   unsigned long reqd_free_pages;
>   u32 crc32;
> + bool crypto;
>  };
>  
>  struct swsusp_header {
>   char reserved[PAGE_SIZE - 20 - sizeof(sector_t) - sizeof(int) -
> -   sizeof(u32)];
> + sizeof(u32) - HIBERNATE_SALT_BYTES];
>   u32 crc32;
>   sector_t image;
>   unsigned int flags; /* Flags to pass to the "boot" kernel */
> + char salt[HIBERNATE_SALT_BYTES];
>   charorig_sig[10];
>   charsig[10];
>  } __packed;
> @@ -127,6 +129,53 @@ struct swsusp_extent {
>   unsigned long end;
>  };
>  
> +/* For encryption/decryption. */
> +static struct hibernation_crypto *hibernation_crypto_ops;
> +
> +void set_hibernation_ops(struct hibernation_crypto *ops)
> +{
> + hibernation_crypto_ops = ops;
> +}
> +EXPORT_SYMBOL_GPL(set_hibernation_ops);
> +
> +static int crypto_data(const char *inbuf,
> + int inlen,
> + char *outbuf,
> + int outlen,
> + bool encrypt,
> + int page_idx)
> +{
> + if (hibernation_crypto_ops &&
> + hibernation_crypto_ops->crypto_data)
> + return hibernation_crypto_ops->crypto_data(inbuf,
> + inlen, outbuf, outlen, encrypt, page_idx);
> + else
> + return -EINVAL;
> +}
> +
> +static void crypto_save(void *outbuf)
> +{
> + if (hibernation_crypto_ops &&
> + hibernation_crypto_ops->save)
> + hibernation_crypto_ops->save(outbuf);
> +}
> +
> +static void crypto_restore(void *inbuf)
> +{
> + if (hibernation_crypto_ops &&
> + hibernation_crypto_ops->restore)
> + hibernation_crypto_ops->restore(inbuf);
> +}
> +
> +static int crypto_init(bool suspend)
> +{
> + if (hibernation_crypto_ops &&
> + hibernation_crypto_ops->init)
> + return hibernation_crypto_ops->init(suspend);
> + else
> + return -EINVAL;
> +}
> +
>  static struct rb_root swsusp_extents = RB_ROOT;
>  
>  static int swsusp_extents_insert(unsigned long swap_offset)
> @@ -318,6 +367,10 @@ static int mark_swapfiles(struct swap_map_handle 
> *handle, unsigned int flags)
>   swsusp_header->flags = flags;
>   if (flags & SF_CRC32_MODE)
>

Re: [PATCH 07/24] hibernate: Disable when the kernel is locked down

2018-05-23 Thread joeyli
Hi experts, 

Sorry for I missed this discussion...

On Thu, Apr 26, 2018 at 10:20:29AM +0200, Jiri Kosina wrote:
> On Thu, 26 Apr 2018, Pavel Machek wrote:
> 
> > That's not how the crypto needs to work. Talk to Jiri Kosina, ok?
> 
> Yeah, Joey Lee (adding to CC) implemented it here:
> 
>   https://lkml.org/lkml/2015/8/11/47
> 
> I think there have been more respins, Joey definitely knows more details 
> and status quo.
> 
> The design is specifically tailored for secure-boot environments though.
>

I am working on the next version of hibernation encryption and authentication:
https://github.com/joeyli/linux-s4sign/wiki 

My plan is:

- Hibernation encryption:
  There is a draft patch to encrypt image by ctr(aes). This patch works
  with the first version of hibernation verification:
  
https://github.com/joeyli/linux-s4sign/commit/6a9a0113bb221c036ebd0f6321b7191283fe4929

- Adapt hibernation to key retention service:
- Using the encrypted key to derive encrypt key and auth key to
  encrypt and hmac snapshot image. Put the encrypted key in the image
  header of snapshot.
- The encrypted key will be encrypted by KMK (kernel master key). Either
  trusted key(sealed by TPM) or EFI key (explain in later) can be the KMK.
  If there have appropriate UI support in initrd, user key can also be
  the KMK.
- Similar with the enrolling EVM key, but more earler:
  The systemd and dracut must be changed for enrolling kernel master key
  before the swap partition be mounted.

- EFI key:
- A new master key type to key retention service.
- It can be a new option beyond trusted key(TPM) and user key.
- EFI stub generates a random key and stores in EFI boot service
  variable:
- This random key in boot variable can be called ERK (EFI Root Key)
- The ERK is secure when secure boot enabled.
- User must aware and enable secure boot by themself if they want.
- ERK can be a secret to encrypt a random number for generate a EFI key
   - The EFI key can be used by hibernation encryption/authentication.
   - The EFI key can be a master key to generate a encrypted key for 
EVM.
- Rescue mechanism for ERK:
- The ERK may be regenerated after the old ERK be erased by firmware 
update
  or firmware recovery.
- Current idea is using the public key in first/second trusted keyring
  to encrypt the ERK for backup. User can enroll the EFI key with old 
ERK to
  request kernel to re-encrypt the EFI key with new ERK.


Thanks a lot!
Joey Lee


Re: [PATCH] efi: Fix the size not consistent issue when unmapping memory map

2018-05-04 Thread joeyli
Hi Ard,

On Thu, May 03, 2018 at 02:05:51PM +0200, Ard Biesheuvel wrote:
> On 2 May 2018 at 08:17, Lee, Chun-Yi  wrote:
> > When using kdump, SOMETIMES the "size not consistent" warning message
> > shows up when the crash kernel boots with early_ioremap_debug parameter:
> >
> > WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 
> > early_iounmap+0x4f/0x12c()
> > early_iounmap(ff200180, 0118) [0] size not consistent 0120
> >
> > The root cause is that the unmapping size of memory map doesn't
> > match with the original size when mapping:
> >
> > in __efi_memmap_init()
> > map.map = early_memremap(phys_map, data->size);
> >
> > in efi_memmap_unmap()
> > size = efi.memmap.desc_size * efi.memmap.nr_map;
> > early_memunmap(efi.memmap.map, size);
> >
> > But the efi.memmap.nr_map is from __efi_memmap_init(). The remainder
> > of size was discarded when calculating the nr_map:
> > map.nr_map = data->size / data->desc_size;
> >
> > When the original size of memory map region does not equal to the
> > result of multiplication. The "size not consistent" warning
> > will be triggered.
> >
> > This issue sometimes was hit by kdump because kexec set the efi map
> > size to align with 16 when loading crash kernel image:
> >
> > in bzImage64_load()
> > efi_map_sz = efi_get_runtime_map_size();
> > efi_map_sz = ALIGN(efi_map_sz, 16);
> >
> > Dave Young's a841aa83d patch fixed kexec issue. On UEFI side, this
> > patch changes the logic in the unmapping function. Using the end
> > address of map to calcuate original size.
> >
> 
> Why do we still need this patch? I.e., in which circumstances will
> efi_memory_map_data::size assume a value that is rounded up or
> otherwise incorrect?
>

There have no other case except kexec. But I think that it's better
to sync mapping/unmapping size between __efi_memmap_init() and 
efi_memmap_unmap(). 


Thanks
Joey Lee
 
> > Thank Randy Wright for his report and testing. And also thank
> > Takashi Iwai for his help to trace issue.
> >
> > Cc: Ard Biesheuvel 
> > Cc: Takashi Iwai 
> > Cc: Vivek Goyal 
> > Cc: Ingo Molnar 
> > Tested-by: Randy Wright 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  drivers/firmware/efi/memmap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/firmware/efi/memmap.c b/drivers/firmware/efi/memmap.c
> > index 5fc7052..1f592d8 100644
> > --- a/drivers/firmware/efi/memmap.c
> > +++ b/drivers/firmware/efi/memmap.c
> > @@ -121,7 +121,7 @@ void __init efi_memmap_unmap(void)
> > if (!efi.memmap.late) {
> > unsigned long size;
> >
> > -   size = efi.memmap.desc_size * efi.memmap.nr_map;
> > +   size = efi.memmap.map_end - efi.memmap.map;
> > early_memunmap(efi.memmap.map, size);
> > } else {
> > memunmap(efi.memmap.map);
> > --
> > 2.10.2
> >


Re: [PATCH] efi: Fix the size not consistent issue when unmapping memory map

2018-04-16 Thread joeyli
On Mon, Apr 16, 2018 at 06:35:22PM -0600, Randy Wright wrote:
> On Mon, Apr 16, 2018 at 02:37:38PM +0800, joeyli wrote:
> > Hi Randy,
> > ...
> > Randy, do you want to try Dave's kexec patch on your environment? Please 
> > remove
> > my patch first.  
> > 
> > Thanks a lot!
> > Joey Lee
> 
> Hi Joey, 
> 
> I tried Dave's patch to kexec-bzimage64.c on my build of the SuSE
> 4.12.14-15 kernel.   I ran the same test as I did with your patch: I
> verified the early_ioremap.c warnings occurred with a crash triggered
> from a kexec boot of the unmodified kernel. Then I applied the patch to
> kexec-bzimage64.c, rebuilt, re-ran the test to crash from the kexec'ed
> kernel, and verified the warnings are no longer seen.
>

Thanks for your help! Looks that Dave's kexec patch works to prevent the
warning message. 
 
> I'm out of time today, but I will plan to run the same  test tomorrow on
> a build of the SuSE 4.4.120-94.17 kernel, on which I had also reported
> the original bug.
>

Both kexec and efi sides will be applied patches.

Thanks a lot!
Joey Lee 


Re: [PATCH] efi: Fix the size not consistent issue when unmapping memory map

2018-04-15 Thread joeyli
Hi Randy,

On Mon, Apr 16, 2018 at 11:09:04AM +0800, Dave Young wrote:
> On 04/16/18 at 10:57am, Dave Young wrote:
> > On 04/13/18 at 02:27pm, Lee, Chun-Yi wrote:
> > > When using kdump, SOMETIMES the "size not consistent" warning message
> > > shows up when the crash kernel boots with early_ioremap_debug parameter:
> > > 
> > > WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 
> > > early_iounmap+0x4f/0x12c()
> > > early_iounmap(ff200180, 0118) [0] size not consistent 0120
> > >
[...snip] 
> > 
> > Good catch.  The kexec code need to be fixed to use a separate buffer so
> > avoid the alignment like what kexec-tools did.  I can submit a fix for
> > that.
> 
> Can you try below code, see if it works?
>

Randy, do you want to try Dave's kexec patch on your environment? Please remove
my patch first.  

Thanks a lot!
Joey Lee
 
> 
> diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> b/arch/x86/kernel/kexec-bzimage64.c
> index 3182908b7e6c..eaee37c54b7b 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -398,11 +398,10 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>* little bit simple
>*/
>   efi_map_sz = efi_get_runtime_map_size();
> - efi_map_sz = ALIGN(efi_map_sz, 16);
>   params_cmdline_sz = sizeof(struct boot_params) + cmdline_len +
>   MAX_ELFCOREHDR_STR_LEN;
>   params_cmdline_sz = ALIGN(params_cmdline_sz, 16);
> - kbuf.bufsz = params_cmdline_sz + efi_map_sz +
> + kbuf.bufsz = params_cmdline_sz + ALIGN(efi_map_sz, 16)+
>   sizeof(struct setup_data) +
>   sizeof(struct efi_setup_data);
>  
> @@ -410,7 +409,7 @@ static void *bzImage64_load(struct kimage *image, char 
> *kernel,
>   if (!params)
>   return ERR_PTR(-ENOMEM);
>   efi_map_offset = params_cmdline_sz;
> - efi_setup_data_offset = efi_map_offset + efi_map_sz;
> + efi_setup_data_offset = efi_map_offset + ALIGN(efi_map_sz, 16);
>  
>   /* Copy setup header onto bootparams. Documentation/x86/boot.txt */
>   setup_header_size = 0x0202 + kernel[0x0201] - setup_hdr_offset;


Re: [PATCH] efi: Fix the size not consistent issue when unmapping memory map

2018-04-15 Thread joeyli
On Mon, Apr 16, 2018 at 10:57:34AM +0800, Dave Young wrote:
> On 04/13/18 at 02:27pm, Lee, Chun-Yi wrote:
> > When using kdump, SOMETIMES the "size not consistent" warning message
> > shows up when the crash kernel boots with early_ioremap_debug parameter:
> > 
> > WARNING: CPU: 0 PID: 0 at ../mm/early_ioremap.c:182 
> > early_iounmap+0x4f/0x12c()
> > early_iounmap(ff200180, 0118) [0] size not consistent 0120
> > 
> > The root cause is that the unmapping size of memory map doesn't
> > match with the original size when mapping:
> > 
> > in __efi_memmap_init()
> > map.map = early_memremap(phys_map, data->size);
> > 
> > in efi_memmap_unmap()
> > size = efi.memmap.desc_size * efi.memmap.nr_map;
> > early_memunmap(efi.memmap.map, size);
> > 
> > But the efi.memmap.nr_map is from __efi_memmap_init(). The remainder
> > of size was discarded when calculating the nr_map:
> > map.nr_map = data->size / data->desc_size;
> > 
> > When the original size of memory map region does not equal to the
> > result of multiplication. The "size not consistent" warning
> > will be triggered.
> > 
> > This issue sometimes was hit by kdump because kexec set the efi map
> > size to align with 16 when loading crash kernel image:
> > 
> > in bzImage64_load()
> > efi_map_sz = efi_get_runtime_map_size();
> > efi_map_sz = ALIGN(efi_map_sz, 16);
> > 
> > This patch changes the logic in the unmapping function. Using the
> > end address of map to calcuate original size.
> > 
> > Thank Randy Wright for his report and testing. And also thank
> > Takashi Iwai for his help to trace issue.
> 
> Good catch.  The kexec code need to be fixed to use a separate buffer so
> avoid the alignment like what kexec-tools did.  I can submit a fix for
> that.
>

Thanks!
 
> But this issue could be a potential issue even if kexec get fixed so it
> looks worth a fix in efi code as well.  How about mapping only nr_maps
> *desc_size in __efi_memmap_init?  It looks easier to understand.
> 

Takashi has another patch as you said. Finally I sent this patch because
it's smaller.   

Thanks a lot!
Joey Lee


Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-09 Thread joeyli
On Sun, Apr 08, 2018 at 08:40:10PM -0700, Alexei Starovoitov wrote:
> On Sun, Apr 08, 2018 at 04:07:42PM +0800, joeyli wrote:
> > 
> > > If the only thing that folks are paranoid about is reading
> > > arbitrary kernel memory with bpf_probe_read() helper
> > > then preferred patch would be to disable it during verification
> > > when in lockdown mode
> > 
> > Sorry for I didn't fully understand your idea...
> > Do you mean that using bpf verifier to filter out bpf program that
> > uses bpf_probe_read()?
> 
> Take a look bpf_get_trace_printk_proto().
> Similarly we can add bpf_get_probe_read_proto() that
> will return NULL if lockdown is on.
> Then programs with bpf_probe_read() will be rejected by the verifier.
> 

OK, I saw check_helper_call(). Thank you for point it out.
it's good idea!

Joey Lee


Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-08 Thread joeyli
On Tue, Apr 03, 2018 at 07:34:25PM -0700, Alexei Starovoitov wrote:
> On Tue, Apr 3, 2018 at 9:26 AM, Andy Lutomirski  wrote:
> > On Tue, Apr 3, 2018 at 8:41 AM, Alexei Starovoitov
> >  wrote:
> >> On Tue, Apr 03, 2018 at 08:11:07AM -0700, Andy Lutomirski wrote:
> >>> >
> >>> >> "bpf: Restrict kernel image access functions when the kernel is locked 
> >>> >> down":
> >>> >> This patch just sucks in general.
> >>> >
> >>> > Yes - but that's what Alexei Starovoitov specified.  bpf kind of sucks 
> >>> > since
> >>> > it gives you unrestricted access to the kernel.
> >>>
> >>> bpf, in certain contexts, gives you unrestricted access to *reading*
> >>> kernel memory.  bpf should, under no circumstances, let you write to
> >>> the kernel unless you're using fault injection or similar.
> >>>
> >>> I'm surprised that Alexei acked this patch.  If something like XDP or
> >>> bpfilter starts becoming widely used, this patch will require a lot of
> >>> reworking to avoid breaking standard distros.
> >>
> >> my understanding was that this lockdown set attemps to disallow _reads_
> >> of kernel memory from anything, so first version of patch was adding
> >> run-time checks for bpf_probe_read() which is no-go
> >> and without this helper the bpf for tracing is losing a lot of its power,
> >> so the easiest is to disable it all.
> >
> > Fair enough.
> 
> Actually looking at the patch again:
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=efi-lock-down&id=78bb0059c3b8304a8d124b55feebc780fb3e0500
> 

The bpf is totally disabled in this patch.

> If the only thing that folks are paranoid about is reading
> arbitrary kernel memory with bpf_probe_read() helper
> then preferred patch would be to disable it during verification
> when in lockdown mode

Sorry for I didn't fully understand your idea...
Do you mean that using bpf verifier to filter out bpf program that
uses bpf_probe_read()?
.
> No run-time overhead and android folks will be happy
> that lockdown doesn't break their work.
> They converted out-of-tree networking accounting
> module and corresponding user daemon to use bpf:
> https://www.linuxplumbersconf.org/2017/ocw/system/presentations/4791/original/eBPF%20cgroup%20filters%20for%20data%20usage%20accounting%20on%20Android.pdf

Thanks
Joey Lee


Re: [GIT PULL] Kernel lockdown for secure boot

2018-04-08 Thread joeyli
On Wed, Apr 04, 2018 at 04:31:46AM +, Matthew Garrett wrote:
> On Tue, Apr 3, 2018 at 7:34 PM Alexei Starovoitov <
> alexei.starovoi...@gmail.com> wrote:
> > If the only thing that folks are paranoid about is reading
> > arbitrary kernel memory with bpf_probe_read() helper
> > then preferred patch would be to disable it during verification
> > when in lockdown mode.
> > No run-time overhead and android folks will be happy
> > that lockdown doesn't break their work.
> > They converted out-of-tree networking accounting
> > module and corresponding user daemon to use bpf:
> 
> https://www.linuxplumbersconf.org/2017/ocw/system/presentations/4791/original/eBPF%20cgroup%20filters%20for%20data%20usage%20accounting%20on%20Android.pdf
> 
> An alternative would be to only disable kernel reads if the kernel contains
> secrets that aren't supposed to be readable by root. If the keyring is
> configured such that root can read everything, it seems like less of a
> concern?

Currently the KMK (kernel master key) can be a trusted key (TPM sealed) or a
user key (plaintext). The EVM keeps a key (plaintext) in memory that
it is decrypted from a KMK encrypted key. Those kernel reads functions should
be disabled when the KMK be loaded to keyring.

You idea is good that kernel can keep those reads functions enabled until KMK
be loaded. Which means those functions are still available before user enables
KMK and EVM.

There have another idea is using a tree to register all sensitive data
then blanking them when reading. Here is a very early developing version:

https://github.com/joeyli/linux-sensitive_data/commits/sensitive-data-tree-v0.1-v4.15

But this approach causes runtime overhead and all sensitive data address must
be found and registered (e.g. plaintext in encryption module)

Thanks a lot!
Joey Lee 


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
Hi David, 

On Wed, Apr 04, 2018 at 05:17:24PM +0100, David Howells wrote:
> Andy Lutomirski  wrote:
> 
> > Since this thread has devolved horribly, I'm going to propose a solution.
> > 
> > 1. Split the "lockdown" state into three levels:  (please don't
> > bikeshed about the names right now.)
> > 
> > LOCKDOWN_NONE: normal behavior
> > 
> > LOCKDOWN_PROTECT_INTEGREITY: kernel tries to keep root from writing to
> > kernel memory
> > 
> > LOCKDOWN_PROTECT_INTEGRITY_AND_SECRECY: kernel tries to keep root from
> > reading or writing kernel memory.
> 
> In theory, it's good idea, but in practice it's not as easy to implement as I
> think you think.
> 
> Let me list here the things that currently get restricted by lockdown:
> 
[...snip]
>  (5) Kexec.
>

About IMA with kernel module signing and kexec(not on x86_64 yet)...

Because IMA can be used to verify the integrity of kernel module or even
the image for kexec. I think that the
IMA_KEYRINGS_PERMIT_SIGNED_BY_BUILTIN_OR_SECONDARY must be enabled at runtime
when kernel is locked-down.

Because the root can enroll master key to keyring then IMA trusts the ima key
derived from master key. It causes that the arbitrary signed module can be 
loaded
when the root compromised.

Thanks
Joey Lee


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
On Wed, Apr 04, 2018 at 11:19:27PM +0100, David Howells wrote:
> Jann Horn  wrote:
> 
> > > Uh, no.  bpf, for example, can be used to modify kernel memory.
> > 
> > I'm pretty sure bpf isn't supposed to be able to modify arbitrary
> > kernel memory. AFAIU if you can use BPF to write to arbitrary kernel
> > memory, that's a bug; with CAP_SYS_ADMIN, you can read from userspace,
> > write to userspace, and read from kernelspace, but you shouldn't be
> > able to write to kernelspace.
> 
> Ah - you may be right.  I seem to have misremembered what Joey Lee wrote in
> his patch description.
>

Sorry for it's my fault to misunderstood the behavoir of bpf with
CAP_SYS_ADMIN.

Joey Lee


Re: An actual suggestion (Re: [GIT PULL] Kernel lockdown for secure boot)

2018-04-04 Thread joeyli
Hi Andy,

On Wed, Apr 04, 2018 at 07:49:12AM -0700, Andy Lutomirski wrote:
> Since this thread has devolved horribly, I'm going to propose a solution.
...
> 6. There's a way to *decrease* the lockdown level below the configured
> value.  (This ability itself may be gated by a config option.)
> Choices include a UEFI protected variable, an authenticated flag
> passed by the bootloader, and even just some special flag in the boot
> handoff protocol.  It would be really quite useful for a user to be
> able to ask their bootloader to reduce the lockdown level for the
> purpose of a particular boot for debugging.  I read the docs on

The "mokutil --disable-validation" done a similar bahvior as above.
Just it lets kernel to ignore the secure boot. 

> mokutil --disable-validation, and it's quite messy.  Let's have a way
> to do this that is mostly independent of the particular firmware in
> use.
>

Why the disabl-validation is messy?   
The mokutil is shim specific but not dependent on particular firmware. 
 
> I can imagine a grub option that decreases lockdown level along with a
> rule that grub will *not* load that option from its config, for
> example.
>

The root can modify the grub config to decrease lockdown level in next
boot without physcial accessing. The mokutil's interactive UI is used
to deal with user to confirm the physcial accessing.

Thanks
Joey Lee


Re: [PATCH 0/9] KEYS: Blacklisting & UEFI database load

2018-03-27 Thread joeyli
Hi Mimi,

On Mon, Mar 19, 2018 at 10:12:03AM -0400, Mimi Zohar wrote:
> On Sun, 2018-03-11 at 11:20 +0800, joeyli wrote:
> > On Wed, Mar 07, 2018 at 07:28:37AM -0800, James Bottomley wrote:
> > > On Wed, 2018-03-07 at 08:18 -0500, Mimi Zohar wrote:
> > > > On Tue, 2018-03-06 at 15:05 +0100, Jiri Slaby wrote:
> > > > > what's the status of this please? Distributors (I checked SUSE,
> > > > > RedHat and Ubuntu) have to carry these patches and every of them
> > > > > have to forward-port the patches to new kernels. So are you going
> > > > > to resend the PR to have this merged?
> > > [...]
> > > > Just because I trust the platform keys prior to booting the kernel,
> > > > doesn't mean that I *want* to trust those keys once booted.  There
> > > > are, however, places where we need access to those keys to verify a
> > > > signature (eg. kexec kernel image).
> > > 
> > > Which is essentially the reason I always give when these patches come
> > > back
> > >
> > 
> > Josh Boyer's "MODSIGN: Allow the "db" UEFI variable to be suppressed"
> > patch checks MokIgnoreDB variable to ignore db:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=keys-uefi&id=7c395b30a33a617c5cc2cdd419300af71277b79a
> > 
> > I think that we can consider to use MokAllowDB. Which means that kernel
> > ignores DB by default.
> 
> Not all systems have a shim layer.  This design is really x86
> specific.  Allowing shim keys, but ignoring DB, does not address those
> systems.
>

Actually shim is EFI specific but not x86 specific. I agree with you that
not all system has shim. But at least shim provides a way to interact with
user to detect the physical accessing.

For the system doesn't have shim, kernel can provide a boot option for user
to trust the keys in DB. But it also means that the boot option can be
enabled without physical accessing.
 
> > > > Nayna Jain's "certs: define a trusted platform keyring" patch set
> > > > introduces a new, separate keyring for these platform keys.
> > > 
> > > Perhaps, to break the deadlock, we should ask Jiří what the reason is
> > > the distros want these keys to be trusted.  Apart from the Microsoft
> > > key, it will also give you an OEM key in your trusted keyring.  Is it
> > > something to do with OEM supplied modules?
> > >
> > 
> > As I remember that some manufacturers uses certificate in db to
> > sign their kernel module. We need to discuss with them for switching
> > to mok. Currently I do not know all use cases for using db.
> > 
> > There have some benefits for using db:
> > 
> >  - User does not need to deal with shim-mokmanager to enroll mok.
> >Target machine doesn't need to reboot and user doesn't need to
> >face to mokmanager UI.  
> 
> The reason for trusting enrolled shim keys is because it requires
> physical presence.  (I kind of remember hearing that this changed.
>  There is some method of accepting enrolled keys that does not require
> physical presence.)
>

Could you please provide more detail for those methods? Thanks!
 
> >  - The db is a authenticated variable, it's still secure when secure
> >boot is disabled.
> >The db is a authenticated variable that it can only be modified
> >by manufacturer's key. Kernel can trust it when secure boot
> >is disabled. It's useful for we do not need to taint kernel
> >for loading a manufacturer's kernel module even secure boot is
> >disabled.
> > 
> >  - Do not need to worry about the space of NVRAM and the EFI firmware
> >implementation for writing a boot time variable.
> >   
> > But I also agree that we should not trust all keys (like Microsoft key)
> > in db by default.
> 
> Between requiring a shim layer and relying on physical presence, I'm
> not convinced this is the best solution.  Do we really want to support
> different methods for different architectures?
>

It's not the best solution because it relies on other layers. But it's
currently the only solution for general EFI firmware. Or you have other
solution can be used for all architectures?

Thanks a lot!
Joey Lee 


Re: [PATCH] ACPI / scan: Send the change uevent with offine environmental data

2018-03-19 Thread joeyli
Hi Rafael, 

On Mon, Mar 19, 2018 at 11:02:32AM +0100, Rafael J. Wysocki wrote:
> On Friday, March 2, 2018 7:35:08 AM CET Lee, Chun-Yi wrote:
> > In current design of ACPI container offline, Kernel emits
> > KOBJ_CHANGE uevent to user space to indidate that the ejection of
> > the container was triggered by platform. (caa73ea15 patch)
> > 
> > A pure KOBJ_CHANGE uevent is not enough for user space to identify
> > the purpose. For example, a "udevadm trigger" command can also
> > be used to emit change event to all udev rules. A udev rule can not
> > identify that the event is from kernel for offline or from udevadm
> > for other purpose. Then the offline action in udev rule may also be
> > triggered by udevadm tool.
> > 
> > So, similar to the change uevent of dock, kernel sends the
> > KOBJ_CHANGE uevent with a offline environmental data to indicate
> > purpose. It's useful by udev rule for using ENV{EVENT} filter.
> > 
> > Cc: Michal Hocko  
> > Cc: "Rafael J. Wysocki"  
> > Cc: Len Brown  
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  drivers/acpi/scan.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8e63d93..f6dca9b 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -116,6 +116,7 @@ bool acpi_scan_is_offline(struct acpi_device *adev, 
> > bool uevent)
> >  {
> > struct acpi_device_physical_node *pn;
> > bool offline = true;
> > +   static const char *envp[] = { "EVENT=offline", NULL };
> >  
> > /*
> >  * acpi_container_offline() calls this for all of the container's
> > @@ -126,7 +127,7 @@ bool acpi_scan_is_offline(struct acpi_device *adev, 
> > bool uevent)
> > list_for_each_entry(pn, &adev->physical_node_list, node)
> > if (device_supports_offline(pn->dev) && !pn->dev->offline) {
> > if (uevent)
> > -   kobject_uevent(&pn->dev->kobj, KOBJ_CHANGE);
> > +   kobject_uevent_env(&pn->dev->kobj, KOBJ_CHANGE, 
> > envp);
> >  
> > offline = false;
> > break;
> > 
> 
> This causes build issues when applied, please fix.
> 

I just sent the v2 patch. Thanks for your review!

Joey Lee 


Re: [PATCH 4/5] MODSIGN: checking the blacklisted hash before loading a kernel module

2018-03-16 Thread joeyli
On Thu, Mar 15, 2018 at 07:30:26AM -0700, James Bottomley wrote:
> On Thu, 2018-03-15 at 14:16 +0800, joeyli wrote:
> > On Wed, Mar 14, 2018 at 07:19:25AM -0700, James Bottomley wrote:
> > > 
> > > On Wed, 2018-03-14 at 14:08 +0800, joeyli wrote:
> > > > 
> > > > On Tue, Mar 13, 2018 at 10:18:35AM -0700, James Bottomley wrote:
> > > > > 
> > > > > 
> > > > > On Tue, 2018-03-13 at 18:38 +0800, Lee, Chun-Yi wrote:
> > > > > > 
> > > > > > 
> > > > > > This patch adds the logic for checking the kernel module's
> > > > > > hash base on blacklist. The hash must be generated by sha256
> > > > > > and enrolled to dbx/mokx.
> > > > > > 
> > > > > > For example:
> > > > > > sha256sum sample.ko
> > > > > > mokutil --mokx --import-hash $HASH_RESULT
> > > > > > 
> > > > > > Whether the signature on ko file is stripped or not, the hash
> > > > > > can be compared by kernel.
> > > > > 
> > > > > What's the use case for this?  We're already in trouble from
> > > > > the ODMs for the size of dbx and its consumption of the
> > > > > extremely limited variable space, so do we really have a use
> > > > > case for adding module blacklist hashes to the UEFI variables
> > > > > given the space constraints (as in one we can't do any other
> > > > > way)?
> > > > > 
> > > > 
> > > > The dbx is a authenticated variable that it can only be updated
> > > > by manufacturer. The mokx gives a flexible way for distro to
> > > > revoke a key or a signed module. Then we don't need to touch shim
> > > > or bother manufacturer to deliver new db. Currently it doesn't
> > > > have real use case yet. 
> > > > 
> > > > I knew that the NVRAM has limited space. But distro needs a
> > > > backup solution for emergency.
> > > 
> > > I wasn't asking why the variable, I was asking why the mechanism.
> > > 
> > > OK, let me try to ask the question in a different way:
> > > 
> > > Why would the distribution need to blacklist a module in this way?
> > >  For
> > 
> > This way is a new option for user to blacklist a module but not the
> > only way.
> 
> So this is for the *user* not the distribution?
> 
> >  MOK has this ability because shim implements the mokx by signature
> > database format (EFI_SIGNATURE_DATA in UEFI spec). This format
> > supports both hash signature and x.509 certificate.
> > 
> > > 
> > > the distro to execute the script to add this blacklist, means the
> > > system is getting automated or manual updates ... can't those
> > > updates just remove the module?
> > > 
> > Yes, we can just remove or update the module in kernel rpm or kmp.
> > But user may re-install distro with old kernel or install a old kmp.
> > If the blacklist hash was stored in variable, then kernel can prevent
> > to load the module.
> > 
> > On the other hand, for enrolling mokx, user must reboots system and
> > deals with shim-mokmanager UI. It's more secure because user should
> > really know what he does. And user can choice not to enroll the hash
> > if they still want to use the module.
> 
> OK, so now the use case is the user needs to roll back but doesn't want
> a module to load ... I've got to say that in that case I'd just remove
> it before reload.
> 
> > > The point is that module sha sums are pretty ephemeral in our model
> > > (they change with every kernel), so it seems to be a mismatch to
> > > place them in a permanent blacklist, particularly when we have very
> > > limited space for that list.
> > > 
> > Normally we run a serious process for signing a kernel module before
> > shipping it to customer. The SUSE's "Partner Linux Driver Program”
> > (PLDP) is an example. So the module sha sums are not too ephemeral.
> 
> Ephemeral isn't about the signing process it means that the sum is
> short lived because every time you create a module for a specific
> kernel its sum changes (because of the interface versioning) so your
> blacklist only applies to one module and specific kernel combination.
>  Once you compile it for a different kernel you need a different
> blacklist sum for it.
>

I agree with you that the sum is ephemeral. I will remove this patch
from the mokx series. The certificate in mokx will be loaded to
blacklist keyring. Which means that we still can use mokx to revoke
public key. But kernel will not check the blacklisted hash before
loading kernel module.

Thanks a lot!
Joey Lee


Re: [PATCH 4/5] MODSIGN: checking the blacklisted hash before loading a kernel module

2018-03-14 Thread joeyli
On Wed, Mar 14, 2018 at 07:19:25AM -0700, James Bottomley wrote:
> On Wed, 2018-03-14 at 14:08 +0800, joeyli wrote:
> > On Tue, Mar 13, 2018 at 10:18:35AM -0700, James Bottomley wrote:
> > > 
> > > On Tue, 2018-03-13 at 18:38 +0800, Lee, Chun-Yi wrote:
> > > > 
> > > > This patch adds the logic for checking the kernel module's hash
> > > > base on blacklist. The hash must be generated by sha256 and
> > > > enrolled
> > > > to dbx/mokx.
> > > > 
> > > > For example:
> > > > sha256sum sample.ko
> > > > mokutil --mokx --import-hash $HASH_RESULT
> > > > 
> > > > Whether the signature on ko file is stripped or not, the hash can
> > > > be
> > > > compared by kernel.
> > > 
> > > What's the use case for this?  We're already in trouble from the
> > > ODMs for the size of dbx and its consumption of the extremely
> > > limited variable space, so do we really have a use case for adding
> > > module blacklist hashes to the UEFI variables given the space
> > > constraints (as in one we can't do any other way)?
> > > 
> > 
> > The dbx is a authenticated variable that it can only be updated by
> > manufacturer. The mokx gives a flexible way for distro to revoke a
> > key or a signed module. Then we don't need to touch shim or bother
> > manufacturer to deliver new db. Currently it doesn't have real use
> > case yet. 
> > 
> > I knew that the NVRAM has limited space. But distro needs a backup
> > solution for emergency.
> 
> I wasn't asking why the variable, I was asking why the mechanism.
> 
> OK, let me try to ask the question in a different way:
> 
> Why would the distribution need to blacklist a module in this way?  For

This way is a new option for user to blacklist a module but not the only
way. MOK has this ability because shim implements the mokx by signature
database format (EFI_SIGNATURE_DATA in UEFI spec). This format supports
both hash signature and x.509 certificate.

> the distro to execute the script to add this blacklist, means the
> system is getting automated or manual updates ... can't those updates
> just remove the module?
>
Yes, we can just remove or update the module in kernel rpm or kmp. But
user may re-install distro with old kernel or install a old kmp. If
the blacklist hash was stored in variable, then kernel can prevent
to load the module.

On the other hand, for enrolling mokx, user must reboots system and
deals with shim-mokmanager UI. It's more secure because user should
really know what he does. And user can choice not to enroll the hash
if they still want to use the module.

> The point is that module sha sums are pretty ephemeral in our model
> (they change with every kernel), so it seems to be a mismatch to place
> them in a permanent blacklist, particularly when we have very limited
> space for that list.
>
Normally we run a serious process for signing a kernel module before
shipping it to customer. The SUSE's "Partner Linux Driver Program” (PLDP)
is an example. So the module sha sums are not too ephemeral.

I agree with you for the space is limit. But the mokx gives a option to
distro or user to blacklist kernel module. They can choice to use this
mechanism or not. 

Thanks a lot!
Joey Lee


Re: [PATCH 1/5] MODSIGN: do not load mok when secure boot disabled

2018-03-14 Thread joeyli
Hi Ard, 

First! Thanks for your review!

On Tue, Mar 13, 2018 at 05:25:30PM +, Ard Biesheuvel wrote:
> On 13 March 2018 at 10:37, Lee, Chun-Yi  wrote:
> > The mok can not be trusted when the secure boot is disabled. Which
> > means that the kernel embedded certificate is the only trusted key.
> >
> > Due to db/dbx are authenticated variables, they needs manufacturer's
> > KEK for update. So db/dbx are secure when secureboot disabled.
> >
> 
> Did you consider the case where secure boot is not implemented? I
> don't think db/dbx are secure in that case, although perhaps it may
> not matter (a bit more information on the purpose of these patches and
> all the shim lingo etc would be appreciated)
> 

The patch 5 in this series checks that the db/dbx must have
EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS attribute. But I agree
with you that kernel should checks the SecureBoot variable must
exist in system. I will add patch to detect it.

> > Cc: David Howells 
> > Cc: Josh Boyer 
> > Cc: James Bottomley 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  certs/load_uefi.c | 26 +++---
> >  1 file changed, 15 insertions(+), 11 deletions(-)
> >
> > diff --git a/certs/load_uefi.c b/certs/load_uefi.c
> > index 3d88459..d6de4d0 100644
> > --- a/certs/load_uefi.c
> > +++ b/certs/load_uefi.c
> > @@ -164,17 +164,6 @@ static int __init load_uefi_certs(void)
> > }
> > }
> >
> > -   mok = get_cert_list(L"MokListRT", &mok_var, &moksize);
> 
> Which tree does this apply to? My tree doesn't have get_cert_list()
>

This patch set is base on the efi-lock-down and keys-uefi branchs in
David Howells's linux-fs git tree.

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=keys-uefi

Thanks a lot!
Joey Lee 


Re: [PATCH 4/5] MODSIGN: checking the blacklisted hash before loading a kernel module

2018-03-13 Thread joeyli
On Tue, Mar 13, 2018 at 10:18:35AM -0700, James Bottomley wrote:
> On Tue, 2018-03-13 at 18:38 +0800, Lee, Chun-Yi wrote:
> > This patch adds the logic for checking the kernel module's hash
> > base on blacklist. The hash must be generated by sha256 and enrolled
> > to dbx/mokx.
> > 
> > For example:
> > sha256sum sample.ko
> > mokutil --mokx --import-hash $HASH_RESULT
> > 
> > Whether the signature on ko file is stripped or not, the hash can be
> > compared by kernel.
> 
> What's the use case for this?  We're already in trouble from the ODMs
> for the size of dbx and its consumption of the extremely limited
> variable space, so do we really have a use case for adding module
> blacklist hashes to the UEFI variables given the space constraints (as
> in one we can't do any other way)?
>

The dbx is a authenticated variable that it can only be updated by
manufacturer. The mokx gives a flexible way for distro to revoke a key
or a signed module. Then we don't need to touch shim or bother
manufacturer to deliver new db. Currently it doesn't have real use
case yet. 

I knew that the NVRAM has limited space. But distro needs a backup
solution for emergency.

Thanks a lot!
Joey Lee 


Re: [PATCH 2/5] MODSIGN: print appropriate status message when getting UEFI certificates list

2018-03-13 Thread joeyli
Hi James,

Thanks for your review.

On Tue, Mar 13, 2018 at 10:17:50AM -0700, James Bottomley wrote:
> On Tue, 2018-03-13 at 18:35 +0800, Lee, Chun-Yi wrote:
> > When getting certificates list from UEFI variable, the original error
> > message shows the state number from UEFI firmware. It's hard to be
> > read by human. This patch changed the error message to show the
> > appropriate string.
> > 
> > The message will be showed as:
> > 
> > [0.788529] MODSIGN: Couldn't get UEFI MokListRT: EFI_NOT_FOUND
> > [0.788537] MODSIGN: Couldn't get UEFI MokListXRT: EFI_NOT_FOUND
> 
> I keep saying this, but these error messages need to be gated on the
> presence of shim for the non-shim secure boot case.  You can't assume
> the shim variables are there because they won't be in the case of a
> fully owned system.

I will change the message to pr_debug.

Thanks a lot!
Joey Lee 


Re: [PATCH 0/9] KEYS: Blacklisting & UEFI database load

2018-03-10 Thread joeyli
On Wed, Mar 07, 2018 at 07:28:37AM -0800, James Bottomley wrote:
> On Wed, 2018-03-07 at 08:18 -0500, Mimi Zohar wrote:
> > On Tue, 2018-03-06 at 15:05 +0100, Jiri Slaby wrote:
> > > what's the status of this please? Distributors (I checked SUSE,
> > > RedHat and Ubuntu) have to carry these patches and every of them
> > > have to forward-port the patches to new kernels. So are you going
> > > to resend the PR to have this merged?
> [...]
> > Just because I trust the platform keys prior to booting the kernel,
> > doesn't mean that I *want* to trust those keys once booted.  There
> > are, however, places where we need access to those keys to verify a
> > signature (eg. kexec kernel image).
> 
> Which is essentially the reason I always give when these patches come
> back
>

Josh Boyer's "MODSIGN: Allow the "db" UEFI variable to be suppressed"
patch checks MokIgnoreDB variable to ignore db:

https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/commit/?h=keys-uefi&id=7c395b30a33a617c5cc2cdd419300af71277b79a

I think that we can consider to use MokAllowDB. Which means that kernel
ignores DB by default.

> > Nayna Jain's "certs: define a trusted platform keyring" patch set
> > introduces a new, separate keyring for these platform keys.
> 
> Perhaps, to break the deadlock, we should ask Jiří what the reason is
> the distros want these keys to be trusted.  Apart from the Microsoft
> key, it will also give you an OEM key in your trusted keyring.  Is it
> something to do with OEM supplied modules?
>

As I remember that some manufacturers uses certificate in db to
sign their kernel module. We need to discuss with them for switching
to mok. Currently I do not know all use cases for using db.

There have some benefits for using db:

 - User does not need to deal with shim-mokmanager to enroll mok.
   Target machine doesn't need to reboot and user doesn't need to
   face to mokmanager UI.  

 - The db is a authenticated variable, it's still secure when secure
   boot is disabled.
   The db is a authenticated variable that it can only be modified
   by manufacturer's key. Kernel can trust it when secure boot
   is disabled. It's useful for we do not need to taint kernel
   for loading a manufacturer's kernel module even secure boot is
   disabled.

 - Do not need to worry about the space of NVRAM and the EFI firmware
   implementation for writing a boot time variable.
  
But I also agree that we should not trust all keys (like Microsoft key)
in db by default.

Thanks a lot!
Joey Lee


Re: [PATCH] ACPI / scan: Send the change uevent with offine environmental data

2018-03-02 Thread joeyli
On Fri, Mar 02, 2018 at 03:00:59PM +0100, Michal Hocko wrote:
> On Fri 02-03-18 14:35:08, Lee, Chun-Yi wrote:
> > In current design of ACPI container offline, Kernel emits
> > KOBJ_CHANGE uevent to user space to indidate that the ejection of
> > the container was triggered by platform. (caa73ea15 patch)
> > 
> > A pure KOBJ_CHANGE uevent is not enough for user space to identify
> > the purpose. For example, a "udevadm trigger" command can also
> > be used to emit change event to all udev rules. A udev rule can not
> > identify that the event is from kernel for offline or from udevadm
> > for other purpose. Then the offline action in udev rule may also be
> > triggered by udevadm tool.
> > 
> > So, similar to the change uevent of dock, kernel sends the
> > KOBJ_CHANGE uevent with a offline environmental data to indicate
> > purpose. It's useful by udev rule for using ENV{EVENT} filter.
> 
> Looks reasonable to me. I have also tested this on Huawei Kunlun server
> which hits the offline & online storm as a result of udevadm triggered
> and a container udev rule which hooks into change event and offlines
> all devices attached to the container. This patch allowed the udev rule
> to be more targeted at the offline event.
> 
> > Cc: Michal Hocko  
> > Cc: "Rafael J. Wysocki"  
> > Cc: Len Brown  
> > Signed-off-by: "Lee, Chun-Yi" 
> 
> Acked-by: Michal Hocko 
> Tested-by: Michal Hocko 
>

Thank you for review and testing.

Joey Lee
 
> > ---
> >  drivers/acpi/scan.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8e63d93..f6dca9b 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -116,6 +116,7 @@ bool acpi_scan_is_offline(struct acpi_device *adev, 
> > bool uevent)
> >  {
> > struct acpi_device_physical_node *pn;
> > bool offline = true;
> > +   static const char *envp[] = { "EVENT=offline", NULL };
> >  
> > /*
> >  * acpi_container_offline() calls this for all of the container's
> > @@ -126,7 +127,7 @@ bool acpi_scan_is_offline(struct acpi_device *adev, 
> > bool uevent)
> > list_for_each_entry(pn, &adev->physical_node_list, node)
> > if (device_supports_offline(pn->dev) && !pn->dev->offline) {
> > if (uevent)
> > -   kobject_uevent(&pn->dev->kobj, KOBJ_CHANGE);
> > +   kobject_uevent_env(&pn->dev->kobj, KOBJ_CHANGE, 
> > envp);
> >  
> > offline = false;
> > break;
> > -- 
> > 2.10.2
> 
> -- 
> Michal Hocko
> SUSE Labs


Re: [PATCH 1/4] MODSIGN: do not load mok when secure boot disabled

2017-11-30 Thread joeyli
Hi James, 

First, thank you for reviewing and comment!

On Thu, Nov 30, 2017 at 07:51:03AM -0800, James Bottomley wrote:
> On Wed, 2017-11-29 at 22:11 +0800, Lee, Chun-Yi wrote:
> > The mok can not be trusted when the secure boot is disabled. Which
> > means that the kernel embedded certificate is the only trusted key.
> > 
> > Due to db/dbx are authenticated variables, they needs manufacturer's
> > KEK for update. So db/dbx are secure when secureboot disabled.
> > 
> > Cc: David Howells  
> > Cc: Josh Boyer 
> > Signed-off-by: "Lee, Chun-Yi" 
> > ---
> >  certs/load_uefi.c | 26 +++---
> >  1 file changed, 15 insertions(+), 11 deletions(-)
> > 
> > diff --git a/certs/load_uefi.c b/certs/load_uefi.c
> > index 3d88459..d6de4d0 100644
> > --- a/certs/load_uefi.c
> > +++ b/certs/load_uefi.c
> > @@ -164,17 +164,6 @@ static int __init load_uefi_certs(void)
> >     }
> >     }
> >  
> > -   mok = get_cert_list(L"MokListRT", &mok_var, &moksize);
> > -   if (!mok) {
> > -   pr_info("MODSIGN: Couldn't get UEFI MokListRT\n");
> > -   } else {
> > -   rc = parse_efi_signature_list("UEFI:MokListRT",
> > -     mok, moksize,
> > get_handler_for_db);
> > -   if (rc)
> > -   pr_err("Couldn't parse MokListRT signatures:
> > %d\n", rc);
> > -   kfree(mok);
> > -   }
> > -
> >     dbx = get_cert_list(L"dbx", &secure_var, &dbxsize);
> >     if (!dbx) {
> >     pr_info("MODSIGN: Couldn't get UEFI dbx list\n");
> > @@ -187,6 +176,21 @@ static int __init load_uefi_certs(void)
> >     kfree(dbx);
> >     }
> >  
> > +   /* the MOK can not be trusted when secure boot is disabled
> > */
> > +   if (!efi_enabled(EFI_SECURE_BOOT))
> > +   return 0;
> > +
> > +   mok = get_cert_list(L"MokListRT", &mok_var, &moksize);
> 
> This isn't really a criticism of your patch but the underlying: all of
> the *RT variables, like MokListRT are insecure runtime variables and
> can be tampered with.  I can agree that I can't see a tamper attack
> between exit boot services and pulling this in to the key list, but I'd
> feel a lot happier if the values were obtained directly from the BS
> variable before exit boot services because that's means the path for
> getting the values is directly within the secure envelope and doesn't
> rely on passing via an insecure element.
>

The shim creates MokListRT as a Runtime-Volatile variable then copies
MokList to MokListRT at boot time. According to spec, the Runtime-Volatile
variable is read only after ExitBootService:

UEFI Sepc V2.7, P.281
...
Once ExitBootServices() is performed, only variables that have
EFI_VARIABLE_RUNTIME_ACCESS and EFI_VARIABLE_NON_VOLATILE set can be
set with SetVariable().
Variables that have runtime access but that are not nonvolatile
are read-only data variables once ExitBootServices() is performed.
...

On the other hand, the kernel code is signed and verified by shim. So
all codes in initial kernel path are secure. 

But, your suggestion reminds me that the code in get_cert_list() must
checks MOK/MOKx's attribute to make sure they are Runtime-Volatile
variables. I will update this patch.

Thanks a lot!
Joey Lee


Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove

2017-11-28 Thread joeyli
On Wed, Nov 29, 2017 at 08:49:13AM +0800, joeyli wrote:
> Hi Andrea, 
> 
> On Fri, Nov 24, 2017 at 10:22:35AM +, Andrea Reale wrote:
> > Resending the patch adding linux-acpi in CC, as suggested by Rafael.
> > Everyone else: apologies for the noise.
> > 
> > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal")
> > introduced an assumption whereas when control
> > reaches remove_memory the corresponding memory has been already
> > offlined. In that case, the acpi_memhotplug was making sure that
> > the assumption held.
> > This assumption, however, is not necessarily true if offlining
> > and removal are not done by the same "controller" (for example,
> > when first offlining via sysfs).
> > 
> > Removing this assumption for the generic remove_memory code
> > and moving it in the specific acpi_memhotplug code. This is
> > a dependency for the software-aided arm64 offlining and removal
> > process.
> > 
> > Signed-off-by: Andrea Reale 
> > Signed-off-by: Maciej Bielski 
> > ---
> >  drivers/acpi/acpi_memhotplug.c |  2 +-
> >  include/linux/memory_hotplug.h |  9 ++---
> >  mm/memory_hotplug.c| 13 +
> >  3 files changed, 16 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> > index 6b0d3ef..b0126a0 100644
> > --- a/drivers/acpi/acpi_memhotplug.c
> > +++ b/drivers/acpi/acpi_memhotplug.c
> > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct 
> > acpi_memory_device *mem_device)
> > nid = memory_add_physaddr_to_nid(info->start_addr);
> >  
> > acpi_unbind_memory_blocks(info);
> > -   remove_memory(nid, info->start_addr, info->length);
> > +   BUG_ON(remove_memory(nid, info->start_addr, info->length));
> > list_del(&info->list);
> > kfree(info);
> > }
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index 58e110a..1a9c7b2 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void)
> >  extern bool is_mem_section_removable(unsigned long pfn, unsigned long 
> > nr_pages);
> >  extern void try_offline_node(int nid);
> >  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
> > -extern void remove_memory(int nid, u64 start, u64 size);
> > +extern int remove_memory(int nid, u64 start, u64 size);
> >  
> >  #else
> >  static inline bool is_mem_section_removable(unsigned long pfn,
> > @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long 
> > start_pfn, unsigned long nr_pages)
> > return -EINVAL;
> >  }
> >  
> > -static inline void remove_memory(int nid, u64 start, u64 size) {}
> > +static inline int remove_memory(int nid, u64 start, u64 size)
> > +{
> > +   return -EINVAL;
> > +}
> >  #endif /* CONFIG_MEMORY_HOTREMOVE */
> >  
> >  extern int walk_memory_range(unsigned long start_pfn, unsigned long 
> > end_pfn,
> > @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, 
> > unsigned long start_pfn,
> > unsigned long nr_pages);
> >  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
> >  extern bool is_memblock_offlined(struct memory_block *mem);
> > -extern void remove_memory(int nid, u64 start, u64 size);
> > +extern int remove_memory(int nid, u64 start, u64 size);
> >  extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long 
> > start_pfn);
> >  extern void sparse_remove_one_section(struct zone *zone, struct 
> > mem_section *ms,
> > unsigned long map_offset);
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index d4b5f29..d5f15af 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node);
> >   * and online/offline operations before this call, as required by
> >   * try_offline_node().
> >   */
> > -void __ref remove_memory(int nid, u64 start, u64 size)
> > +int __ref remove_memory(int nid, u64 start, u64 size)
> >  {
> > int ret;
> >  
> > @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 
> > size)
> > ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > check_memblock_offlin

Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove

2017-11-28 Thread joeyli
On Fri, Nov 24, 2017 at 07:17:41PM +0100, Michal Hocko wrote:
> On Fri 24-11-17 15:54:59, Andrea Reale wrote:
> > On Fri 24 Nov 2017, 16:43, Michal Hocko wrote:
> > > On Fri 24-11-17 14:49:17, Andrea Reale wrote:
> > > > Hi Rafael,
> > > > 
> > > > On Fri 24 Nov 2017, 15:39, Rafael J. Wysocki wrote:
> > > > > On Fri, Nov 24, 2017 at 11:22 AM, Andrea Reale 
> > > > >  wrote:
> > > > > > Resending the patch adding linux-acpi in CC, as suggested by Rafael.
> > > > > > Everyone else: apologies for the noise.
> > > > > >
> > > > > > Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory 
> > > > > > removal")
> > > > > > introduced an assumption whereas when control
> > > > > > reaches remove_memory the corresponding memory has been already
> > > > > > offlined. In that case, the acpi_memhotplug was making sure that
> > > > > > the assumption held.
> > > > > > This assumption, however, is not necessarily true if offlining
> > > > > > and removal are not done by the same "controller" (for example,
> > > > > > when first offlining via sysfs).
> > > > > >
> > > > > > Removing this assumption for the generic remove_memory code
> > > > > > and moving it in the specific acpi_memhotplug code. This is
> > > > > > a dependency for the software-aided arm64 offlining and removal
> > > > > > process.
> > > > > >
> > > > > > Signed-off-by: Andrea Reale 
> > > > > > Signed-off-by: Maciej Bielski 
> > > > > > ---
> > > > > >  drivers/acpi/acpi_memhotplug.c |  2 +-
> > > > > >  include/linux/memory_hotplug.h |  9 ++---
> > > > > >  mm/memory_hotplug.c| 13 +
> > > > > >  3 files changed, 16 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/acpi/acpi_memhotplug.c 
> > > > > > b/drivers/acpi/acpi_memhotplug.c
> > > > > > index 6b0d3ef..b0126a0 100644
> > > > > > --- a/drivers/acpi/acpi_memhotplug.c
> > > > > > +++ b/drivers/acpi/acpi_memhotplug.c
> > > > > > @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct 
> > > > > > acpi_memory_device *mem_device)
> > > > > > nid = 
> > > > > > memory_add_physaddr_to_nid(info->start_addr);
> > > > > >
> > > > > > acpi_unbind_memory_blocks(info);
> > > > > > -   remove_memory(nid, info->start_addr, info->length);
> > > > > > +   BUG_ON(remove_memory(nid, info->start_addr, 
> > > > > > info->length));
> > > > > 
> > > > > Why does this have to be BUG_ON()?  Is it really necessary to kill the
> > > > > system here?
> > > > 
> > > > Actually, I hoped you would help me understand that: that BUG() call 
> > > > was introduced
> > > > by yourself in Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify 
> > > > memory removal")
> > > > in memory_hoptlug.c:remove_memory()). 
> > > > 
> > > > Just reading at that commit my understanding was that you were assuming
> > > > that acpi_memory_remove_memory() have already done the job of offlining
> > > > the target memory, so there would be a bug if that wasn't the case.
> > > > 
> > > > In my case, that assumption did not hold and I found that it might not
> > > > hold for other platforms that do not use ACPI. In fact, the purpose of
> > > > this patch is to move this assumption out of the generic hotplug code
> > > > and move it to ACPI code where it originated. 
> > > 
> > > remove_memory failure is basically impossible to handle AFAIR. The
> > > original code to BUG in remove_memory is ugly as hell and we do not want
> > > to spread that out of that function. Instead we really want to get rid
> > > of it.
> > 
> > Today, BUG() is called even in the simple case where remove fails
> > because the section we are removing is not offline.
> 
> You cannot hotremove memory which is still online. This is what caller
> should enforce. This is too late to handle the failure. At least for
> ACPI.
>

The logic in acpi_scan_hot_remove() calls memory_subsys_offline(). If
there doesn't have any error returns by memory_subsys_offline, then ACPI
assumes all devices are offlined by subsystem (memory subsystem in this case).

Then system moves to remove stage, ACPI calls acpi_memory_device_remove().
Here
 
> > I cannot see any need to
> > BUG() in such a case: an error code seems more than sufficient to me.
> 
> I do not rememeber details but AFAIR ACPI is in a deferred (kworker)
> context here and cannot simply communicate error code down the road.
> I agree that we should be able to simply return an error but what is the
> actual error condition that might happen here?
>

Currently acpi_bus_trim() didn't handle any return error. If subsystem
returns error, then ACPI can only interrupt hot-remove process.

> > This is why this patch removes the BUG() call when the "offline" check
> > fails from the generic code. 
> 
> As I've said we should simply get rid of BUG rather than move it around.
>

As I remember that the original BUG() helped us to find out a bug about the
offline state doesn't sync between memblock device with memory state

Re: [PATCH v2 2/5] mm: memory_hotplug: Remove assumption on memory state before hotremove

2017-11-28 Thread joeyli
Hi Andrea, 

On Fri, Nov 24, 2017 at 10:22:35AM +, Andrea Reale wrote:
> Resending the patch adding linux-acpi in CC, as suggested by Rafael.
> Everyone else: apologies for the noise.
> 
> Commit 242831eb15a0 ("Memory hotplug / ACPI: Simplify memory removal")
> introduced an assumption whereas when control
> reaches remove_memory the corresponding memory has been already
> offlined. In that case, the acpi_memhotplug was making sure that
> the assumption held.
> This assumption, however, is not necessarily true if offlining
> and removal are not done by the same "controller" (for example,
> when first offlining via sysfs).
> 
> Removing this assumption for the generic remove_memory code
> and moving it in the specific acpi_memhotplug code. This is
> a dependency for the software-aided arm64 offlining and removal
> process.
> 
> Signed-off-by: Andrea Reale 
> Signed-off-by: Maciej Bielski 
> ---
>  drivers/acpi/acpi_memhotplug.c |  2 +-
>  include/linux/memory_hotplug.h |  9 ++---
>  mm/memory_hotplug.c| 13 +
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 6b0d3ef..b0126a0 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -282,7 +282,7 @@ static void acpi_memory_remove_memory(struct 
> acpi_memory_device *mem_device)
>   nid = memory_add_physaddr_to_nid(info->start_addr);
>  
>   acpi_unbind_memory_blocks(info);
> - remove_memory(nid, info->start_addr, info->length);
> + BUG_ON(remove_memory(nid, info->start_addr, info->length));
>   list_del(&info->list);
>   kfree(info);
>   }
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 58e110a..1a9c7b2 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -295,7 +295,7 @@ static inline bool movable_node_is_enabled(void)
>  extern bool is_mem_section_removable(unsigned long pfn, unsigned long 
> nr_pages);
>  extern void try_offline_node(int nid);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
> -extern void remove_memory(int nid, u64 start, u64 size);
> +extern int remove_memory(int nid, u64 start, u64 size);
>  
>  #else
>  static inline bool is_mem_section_removable(unsigned long pfn,
> @@ -311,7 +311,10 @@ static inline int offline_pages(unsigned long start_pfn, 
> unsigned long nr_pages)
>   return -EINVAL;
>  }
>  
> -static inline void remove_memory(int nid, u64 start, u64 size) {}
> +static inline int remove_memory(int nid, u64 start, u64 size)
> +{
> + return -EINVAL;
> +}
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
>  extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
> @@ -323,7 +326,7 @@ extern void move_pfn_range_to_zone(struct zone *zone, 
> unsigned long start_pfn,
>   unsigned long nr_pages);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
>  extern bool is_memblock_offlined(struct memory_block *mem);
> -extern void remove_memory(int nid, u64 start, u64 size);
> +extern int remove_memory(int nid, u64 start, u64 size);
>  extern int sparse_add_one_section(struct pglist_data *pgdat, unsigned long 
> start_pfn);
>  extern void sparse_remove_one_section(struct zone *zone, struct mem_section 
> *ms,
>   unsigned long map_offset);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index d4b5f29..d5f15af 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1892,7 +1892,7 @@ EXPORT_SYMBOL(try_offline_node);
>   * and online/offline operations before this call, as required by
>   * try_offline_node().
>   */
> -void __ref remove_memory(int nid, u64 start, u64 size)
> +int __ref remove_memory(int nid, u64 start, u64 size)
>  {
>   int ret;
>  
> @@ -1908,18 +1908,23 @@ void __ref remove_memory(int nid, u64 start, u64 size)
>   ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
>   check_memblock_offlined_cb);
>   if (ret)
> - BUG();
> + goto end_remove;
> +
> + ret = arch_remove_memory(start, size);
> +
> + if (ret)
> + goto end_remove;

The original code triggers BUG() when any memblock is not offlined. Why
the new logic includes the result of arch_remove_memory()?

But I agreed the we don't need BUG(). Returning a error is better.

>  
>   /* remove memmap entry */
>   firmware_map_remove(start, start + size, "System RAM");
>   memblock_free(start, size);
>   memblock_remove(start, size);
>  
> - arch_remove_memory(start, size);
> -
>   try_offline_node(nid);
>  
> +end_remove:
>   mem_hotplug_done();
> + return ret;
>  }
>  EXPORT_SYMBOL_GPL(remove_memory);
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
> -- 
> 2.7.4
> 
> --
> To unsubscribe from this list: send the line 

Re: [PATCH 07/27] kexec_file: Disable at runtime if securelevel has been set

2017-10-28 Thread joeyli
On Fri, Oct 27, 2017 at 03:32:26PM -0400, Mimi Zohar wrote:
> On Thu, 2017-10-26 at 10:17 -0400, Mimi Zohar wrote:
> > On Thu, 2017-10-26 at 15:42 +0800, joeyli wrote:
> > > Hi Mimi,
> > > 
> > > Thank you for reviewing.
> > > 
> > > On Mon, Oct 23, 2017 at 11:54:43AM -0400, Mimi Zohar wrote:
> > > > On Thu, 2017-10-19 at 15:51 +0100, David Howells wrote:
> > > > > From: Chun-Yi Lee 
> > > > > 
> > > > > When KEXEC_VERIFY_SIG is not enabled, kernel should not loads image
> > > > > through kexec_file systemcall if securelevel has been set.
> > > > 
> > > > The patch title and description needs to be updated to refer to
> > > > lockdown, not securelevel.
> > > > 
> > > > As previously mentioned the last time these patches were posted, this
> > > > leaves out testing to see if the integrity subsystem is enabled.
> > > > 
> > > > Commit 503ceaef8e2e "ima: define a set of appraisal rules requiring
> > > > file signatures" was upstreamed.  An additional patch could force
> > > > these rules to be added to the custom policy, if lockdown is enabled.
> > > >  This and other patches in this series could then check to see if
> > > > is_ima_appraise_enabled() is true.
> > > > 
> > > > Mimi
> > > >
> > > 
> > > I have updated the patch title and description, and I also added
> > > is_ima_appraise_enabled() as the following. Is it good to you?
> > 
> > Yes, that works.  Thanks!  Remember is_ima_appraise_enabled() is
> > dependent on the "ima: require secure_boot rules in lockdown mode"
> > patch - http://kernsec.org/pipermail/linux-security-module-archive/201
> > 7-October/003910.html.
> > 
> > The IMA "secure_boot" policy can be specified on the boot command line
> > as ima_policy="secure_boot".  It requires kernel modules, firmware,
> > kexec kernel image and the IMA custom policy to be signed.  In
> > lockdown mode, these rules are enabled by default and added to the
> > custom policy.
> > 
> > > On the other hand, I am not good on IMA. I have traced the code path
> > > in kimage_file_prepare_segments(). Looks that the READING_KEXEC_IMAGE
> > > doesn't show in selinux_kernel_read_file(). Where is the exact code
> > > in IMA for checking the signature when loading crash kernel file?
> > 
> > kernel_read_file_from_fd() calls the security_kernel_read_file() and
> > security_kernel_post_read_file() hooks, which call ima_read_file() and
> > ima_post_read_file() respectively.
> 
> Hm, with "lockdown" enabled on the boot command line, I'm now able to
> do the kexec load, but not the unload.  :/   After the kexec load with

I have tried on Qemu with OVMF, I can load and unload second kernel by
kexec tool (on openSUSE is in kexec-tools RPM):  

# kexec -u -s

I add -s for using kexec-load-file, and I signed kernel by pesign.

> the "--reuse-cmdline" option, the system reboots, but isn't in
> "lockdown" mode.
>

Either enabling secure boot in EFI firmware or using _lockdown_ kernel
parameter, the second kernel can be locked down on my OVMF VM.

I used following commands:

# kexec -s -l /boot/vmlinuz-4.14.0-rc2-default+ --append="$(cat /proc/cmdline)" 
--initrd=/boot/initrd-4.14.0-rc2-default+
# umount -a; mount -o remount,ro /
# kexec -e

The kernel source is from David's linux-fs git with lockdown-20171026 tag.
The kernel is also signed by pesign.

Regards
Joey Lee 


Re: [PATCH 07/27] kexec_file: Disable at runtime if securelevel has been set

2017-10-26 Thread joeyli
Hi Mimi,

Thank you for reviewing.

On Mon, Oct 23, 2017 at 11:54:43AM -0400, Mimi Zohar wrote:
> On Thu, 2017-10-19 at 15:51 +0100, David Howells wrote:
> > From: Chun-Yi Lee 
> > 
> > When KEXEC_VERIFY_SIG is not enabled, kernel should not loads image
> > through kexec_file systemcall if securelevel has been set.
> 
> The patch title and description needs to be updated to refer to
> lockdown, not securelevel.
> 
> As previously mentioned the last time these patches were posted, this
> leaves out testing to see if the integrity subsystem is enabled.
> 
> Commit 503ceaef8e2e "ima: define a set of appraisal rules requiring
> file signatures" was upstreamed.  An additional patch could force
> these rules to be added to the custom policy, if lockdown is enabled.
>  This and other patches in this series could then check to see if
> is_ima_appraise_enabled() is true.
> 
> Mimi
>

I have updated the patch title and description, and I also added
is_ima_appraise_enabled() as the following. Is it good to you?

On the other hand, I am not good on IMA. I have traced the code path
in kimage_file_prepare_segments(). Looks that the READING_KEXEC_IMAGE
doesn't show in selinux_kernel_read_file(). Where is the exact code
in IMA for checking the signature when loading crash kernel file? 

Thanks a lot!
Joey Lee
---

>From 274a2125132ba5aff49e4ccd167f52982732361f Mon Sep 17 00:00:00 2001
From: "Lee, Chun-Yi" 
Date: Thu, 26 Oct 2017 15:24:50 +0800
Subject: [PATCH] kexec_file: The integrity must be checked when the kernel is
 locked down

When KEXEC_VERIFY_SIG and IMA appraise are not enabled, kernel should
not allow that the image to be loaded by kexec_file systemcall when the
kernel is locked down.

The original code was showed in Matthew's patch but not in the later
patch set:
https://lkml.org/lkml/2015/3/13/778

Signed-off-by: "Lee, Chun-Yi" 
---
 kernel/kexec_file.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 9f48f44..b6dc218 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -255,6 +255,14 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, 
initrd_fd,
if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
return -EPERM;
 
+   /* Don't permit images to be loaded into trusted kernels if we're not
+* going to check the integrity on them
+*/
+   if (!IS_ENABLED(CONFIG_KEXEC_VERIFY_SIG) &&
+   !is_ima_appraise_enabled() &&
+   kernel_is_locked_down("kexec of unsigned images"))
+   return -EPERM;
+
/* Make sure we have a legal set of flags */
if (flags != (flags & KEXEC_FILE_FLAGS))
return -EINVAL;
-- 
2.6.2


Re: [PATCH 12/27] x86/msr: Restrict MSR access when the kernel is locked down

2017-10-25 Thread joeyli
Hi David, 

On Mon, Oct 23, 2017 at 03:49:44PM +0100, David Howells wrote:
> Alan Cox  wrote:
> 
> > There are a load of standard tools that use this so I think you are going
> > to need a whitelist. Can you at least log *which* MSR in the failing case
> > so a whitelist can be built over time ?
> 
> Will the attached change work for you?
> 

It's good to me.

Joey Lee

> ---
> diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
> index a05a97863286..f18cadbc31ce 100644
> --- a/arch/x86/kernel/msr.c
> +++ b/arch/x86/kernel/msr.c
> @@ -84,8 +84,10 @@ static ssize_t msr_write(struct file *file, const char 
> __user *buf,
>   int err = 0;
>   ssize_t bytes = 0;
>  
> - if (kernel_is_locked_down("Direct MSR access"))
> + if (kernel_is_locked_down("Direct MSR access")) {
> + pr_info("Direct access to MSR %x\n", reg);
>   return -EPERM;
> + }
>  
>   if (count % 8)
>   return -EINVAL; /* Invalid chunk size */
> @@ -135,6 +137,7 @@ static long msr_ioctl(struct file *file, unsigned int 
> ioc, unsigned long arg)
>   break;
>   }
>   if (kernel_is_locked_down("Direct MSR access")) {
> + pr_info("Direct access to MSR %x\n", reg[1]); /* 
> Display %ecx */
>   err = -EPERM;
>   break;
>   }


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-25 Thread joeyli
On Mon, Oct 23, 2017 at 03:53:00PM +0100, David Howells wrote:
> j...@suse.com wrote:
> 
> > hm... patch 4 only prevents write_mem() but not read_mem().
> > Or I missed anything?
> 
> Actually, yes, as it happens, patch 11 prevents you from even opening /dev/mem
> and /dev/kmem by locking down open of /dev/port.  So I've moved this bit to
> patch 4, simplified and posted a new variant for patch 4.
>

Thank you for pointing out!

Joey Lee


Re: [PATCH 12/27] x86/msr: Restrict MSR access when the kernel is locked down

2017-10-20 Thread joeyli
On Fri, Oct 20, 2017 at 09:48:16PM +0100, David Howells wrote:
> Alan Cox  wrote:
> 
> > There are a load of standard tools that use this so I think you are going
> > to need a whitelist. Can you at least log *which* MSR in the failing case
> > so a whitelist can be built over time ?
> 
[...snip]
> 
> And do you know where wrmsr_safe_regs() might be found?  I can see things
> using it and exporting it, but no implementation, so I'm guessing it's
> macroised somewhere.

Looks the definition is in 

arch/x86/lib/msr-reg.S

#ifdef CONFIG_X86_64
/*
 * int {rdmsr,wrmsr}_safe_regs(u32 gprs[8]);
 *
 * reg layout: u32 gprs[eax, ecx, edx, ebx, esp, ebp, esi, edi]
 *
 */
.macro op_safe_regs op
ENTRY(\op\()_safe_regs)
pushq %rbx
pushq %r12
...

Regards
Joey Lee


Re: [PATCH 17/27] acpi: Disable APEI error injection if the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:41PM +0100, David Howells wrote:
> From: Linn Crosetto 
> 
> ACPI provides an error injection mechanism, EINJ, for debugging and testing
> the ACPI Platform Error Interface (APEI) and other RAS features.  If
> supported by the firmware, ACPI specification 5.0 and later provide for a
> way to specify a physical memory address to which to inject the error.
> 
> Injecting errors through EINJ can produce errors which to the platform are
> indistinguishable from real hardware errors.  This can have undesirable
> side-effects, such as causing the platform to mark hardware as needing
> replacement.
> 
> While it does not provide a method to load unauthenticated privileged code,
> the effect of these errors may persist across reboots and affect trust in
> the underlying hardware, so disable error injection through EINJ if
> the kernel is locked down.
> 
> Signed-off-by: Linn Crosetto 
> Signed-off-by: David Howells 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: linux-a...@vger.kernel.org
> ---
> 
>  drivers/acpi/apei/einj.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c
> index b38737c83a24..6d71e1e97b20 100644
> --- a/drivers/acpi/apei/einj.c
> +++ b/drivers/acpi/apei/einj.c
> @@ -518,6 +518,9 @@ static int einj_error_inject(u32 type, u32 flags, u64 
> param1, u64 param2,
>   int rc;
>   u64 base_addr, size;
>  
> + if (kernel_is_locked_down("ACPI error injection"))
> + return -EPERM;
> +
>   /* If user manually set "flags", make sure it is legal */
>   if (flags && (flags &
>   ~(SETWA_FLAGS_APICID|SETWA_FLAGS_MEM|SETWA_FLAGS_PCIE_SBDF)))
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 16/27] acpi: Disable ACPI table override if the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:34PM +0100, David Howells wrote:
> From: Linn Crosetto 
> 
> >From the kernel documentation (initrd_table_override.txt):
> 
>   If the ACPI_INITRD_TABLE_OVERRIDE compile option is true, it is possible
>   to override nearly any ACPI table provided by the BIOS with an
>   instrumented, modified one.
> 
> When securelevel is set, the kernel should disallow any unauthenticated
> changes to kernel space.  ACPI tables contain code invoked by the kernel,
> so do not allow ACPI tables to be overridden if the kernel is locked down.
> 
> Signed-off-by: Linn Crosetto 
> Signed-off-by: David Howells 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: linux-a...@vger.kernel.org
> ---
> 
>  drivers/acpi/tables.c |5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index 80ce2a7d224b..5cc13c42daf9 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -526,6 +526,11 @@ void __init acpi_table_upgrade(void)
>   if (table_nr == 0)
>   return;
>  
> + if (kernel_is_locked_down("ACPI table override")) {
> + pr_notice("kernel is locked down, ignoring table override\n");
> + return;
> + }
> +
>   acpi_tables_addr =
>   memblock_find_in_range(0, ACPI_TABLE_UPGRADE_MAX_PHYS,
>  all_tables_size, PAGE_SIZE);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 15/27] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:27PM +0100, David Howells wrote:
> From: Josh Boyer 
> 
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware .  Reject
> the option when the kernel is locked down.
> 
> Signed-off-by: Josh Boyer 
> Signed-off-by: David Howells 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: Dave Young 
> cc: linux-a...@vger.kernel.org
> ---
> 
>  drivers/acpi/osl.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index db78d353bab1..36c6527c1b0a 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -192,7 +192,7 @@ acpi_physical_address __init 
> acpi_os_get_root_pointer(void)
>   acpi_physical_address pa = 0;
>  
>  #ifdef CONFIG_KEXEC
> - if (acpi_rsdp)
> + if (acpi_rsdp && !kernel_is_locked_down("ACPI RSDP specification"))
>   return acpi_rsdp;
>  #endif
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/27] ACPI: Limit access to custom_method when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:19PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> custom_method effectively allows arbitrary access to system memory, making
> it possible for an attacker to circumvent restrictions on module loading.
> Disable it if the kernel is locked down.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: linux-a...@vger.kernel.org
> ---
> 
>  drivers/acpi/custom_method.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/custom_method.c b/drivers/acpi/custom_method.c
> index c68e72414a67..b33fba70ec51 100644
> --- a/drivers/acpi/custom_method.c
> +++ b/drivers/acpi/custom_method.c
> @@ -29,6 +29,9 @@ static ssize_t cm_write(struct file *file, const char 
> __user * user_buf,
>   struct acpi_table_header table;
>   acpi_status status;
>  
> + if (kernel_is_locked_down("ACPI custom methods"))
> + return -EPERM;
> +
>   if (!(*ppos)) {
>   /* parse the table header to get the table length */
>   if (count <= sizeof(struct acpi_table_header))
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 13/27] asus-wmi: Restrict debugfs interface when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:11PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> We have no way of validating what all of the Asus WMI methods do on a given
> machine - and there's a risk that some will allow hardware state to be
> manipulated in such a way that arbitrary code can be executed in the
> kernel, circumventing module loading restrictions.  Prevent that if the
> kernel is locked down.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: acpi4asus-u...@lists.sourceforge.net
> cc: platform-driver-...@vger.kernel.org
> ---
> 
>  drivers/platform/x86/asus-wmi.c |9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c
> index 48e1541dc8d4..ef5587469337 100644
> --- a/drivers/platform/x86/asus-wmi.c
> +++ b/drivers/platform/x86/asus-wmi.c
> @@ -1905,6 +1905,9 @@ static int show_dsts(struct seq_file *m, void *data)
>   int err;
>   u32 retval = -1;
>  
> + if (kernel_is_locked_down("Asus WMI"))
> + return -EPERM;
> +
>   err = asus_wmi_get_devstate(asus, asus->debug.dev_id, &retval);
>  
>   if (err < 0)
> @@ -1921,6 +1924,9 @@ static int show_devs(struct seq_file *m, void *data)
>   int err;
>   u32 retval = -1;
>  
> + if (kernel_is_locked_down("Asus WMI"))
> + return -EPERM;
> +
>   err = asus_wmi_set_devstate(asus->debug.dev_id, asus->debug.ctrl_param,
>   &retval);
>  
> @@ -1945,6 +1951,9 @@ static int show_call(struct seq_file *m, void *data)
>   union acpi_object *obj;
>   acpi_status status;
>  
> + if (kernel_is_locked_down("Asus WMI"))
> + return -EPERM;
> +
>   status = wmi_evaluate_method(ASUS_WMI_MGMT_GUID,
>0, asus->debug.method_id,
>&input, &output);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 12/27] x86/msr: Restrict MSR access when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:52:04PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> Writing to MSRs should not be allowed if the kernel is locked down, since
> it could lead to execution of arbitrary code in kernel mode.  Based on a
> patch by Kees Cook.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 
> Acked-by: Kees Cook 
> Reviewed-by: Thomas Gleixner 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: x...@kernel.org
> ---
> 
>  arch/x86/kernel/msr.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/msr.c b/arch/x86/kernel/msr.c
> index ef688804f80d..a05a97863286 100644
> --- a/arch/x86/kernel/msr.c
> +++ b/arch/x86/kernel/msr.c
> @@ -84,6 +84,9 @@ static ssize_t msr_write(struct file *file, const char 
> __user *buf,
>   int err = 0;
>   ssize_t bytes = 0;
>  
> + if (kernel_is_locked_down("Direct MSR access"))
> + return -EPERM;
> +
>   if (count % 8)
>   return -EINVAL; /* Invalid chunk size */
>  
> @@ -131,6 +134,10 @@ static long msr_ioctl(struct file *file, unsigned int 
> ioc, unsigned long arg)
>   err = -EBADF;
>   break;
>   }
> + if (kernel_is_locked_down("Direct MSR access")) {
> + err = -EPERM;
> + break;
> + }
>   if (copy_from_user(®s, uregs, sizeof regs)) {
>   err = -EFAULT;
>   break;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 11/27] x86: Lock down IO port access when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:56PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> IO port access would permit users to gain access to PCI configuration
> registers, which in turn (on a lot of hardware) give access to MMIO
> register space. This would potentially permit root to trigger arbitrary
> DMA, so lock it down by default.
> 
> This also implicitly locks down the KDADDIO, KDDELIO, KDENABIO and
> KDDISABIO console ioctls.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 
> Reviewed-by: Thomas Gleixner 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks!
Joey Lee

> cc: x...@kernel.org
> ---
> 
>  arch/x86/kernel/ioport.c |6 --
>  drivers/char/mem.c   |2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c
> index 9c3cf0944bce..2c0f058651c5 100644
> --- a/arch/x86/kernel/ioport.c
> +++ b/arch/x86/kernel/ioport.c
> @@ -30,7 +30,8 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned 
> long num, int turn_on)
>  
>   if ((from + num <= from) || (from + num > IO_BITMAP_BITS))
>   return -EINVAL;
> - if (turn_on && !capable(CAP_SYS_RAWIO))
> + if (turn_on && (!capable(CAP_SYS_RAWIO) ||
> + kernel_is_locked_down("ioperm")))
>   return -EPERM;
>  
>   /*
> @@ -120,7 +121,8 @@ SYSCALL_DEFINE1(iopl, unsigned int, level)
>   return -EINVAL;
>   /* Trying to gain more privileges? */
>   if (level > old) {
> - if (!capable(CAP_SYS_RAWIO))
> + if (!capable(CAP_SYS_RAWIO) ||
> + kernel_is_locked_down("iopl"))
>   return -EPERM;
>   }
>   regs->flags = (regs->flags & ~X86_EFLAGS_IOPL) |
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index b7c36898b689..0875b3d47773 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -768,6 +768,8 @@ static loff_t memory_lseek(struct file *file, loff_t 
> offset, int orig)
>  
>  static int open_port(struct inode *inode, struct file *filp)
>  {
> + if (kernel_is_locked_down("Direct ioport access"))
> + return -EPERM;
>   return capable(CAP_SYS_RAWIO) ? 0 : -EPERM;
>  }
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 10/27] PCI: Lock down BAR access when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:49PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> Any hardware that can potentially generate DMA has to be locked down in
> order to avoid it being possible for an attacker to modify kernel code,
> allowing them to circumvent disabled module loading or module signing.
> Default to paranoid - in future we can potentially relax this for
> sufficiently IOMMU-isolated devices.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 
> Acked-by: Bjorn Helgaas 

I have reviewed this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> cc: linux-...@vger.kernel.org
> ---
> 
>  drivers/pci/pci-sysfs.c |9 +
>  drivers/pci/proc.c  |9 -
>  drivers/pci/syscall.c   |3 ++-
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 1eecfa301f7f..e1a3b0e765c2 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -881,6 +881,9 @@ static ssize_t pci_write_config(struct file *filp, struct 
> kobject *kobj,
>   loff_t init_off = off;
>   u8 *data = (u8 *) buf;
>  
> + if (kernel_is_locked_down("Direct PCI access"))
> + return -EPERM;
> +
>   if (off > dev->cfg_size)
>   return 0;
>   if (off + count > dev->cfg_size) {
> @@ -1175,6 +1178,9 @@ static int pci_mmap_resource(struct kobject *kobj, 
> struct bin_attribute *attr,
>   enum pci_mmap_state mmap_type;
>   struct resource *res = &pdev->resource[bar];
>  
> + if (kernel_is_locked_down("Direct PCI access"))
> + return -EPERM;
> +
>   if (res->flags & IORESOURCE_MEM && iomem_is_exclusive(res->start))
>   return -EINVAL;
>  
> @@ -1255,6 +1261,9 @@ static ssize_t pci_write_resource_io(struct file *filp, 
> struct kobject *kobj,
>struct bin_attribute *attr, char *buf,
>loff_t off, size_t count)
>  {
> + if (kernel_is_locked_down("Direct PCI access"))
> + return -EPERM;
> +
>   return pci_resource_io(filp, kobj, attr, buf, off, count, true);
>  }
>  
> diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
> index 098360d7ff81..a6c53d855daa 100644
> --- a/drivers/pci/proc.c
> +++ b/drivers/pci/proc.c
> @@ -116,6 +116,9 @@ static ssize_t proc_bus_pci_write(struct file *file, 
> const char __user *buf,
>   int size = dev->cfg_size;
>   int cnt;
>  
> + if (kernel_is_locked_down("Direct PCI access"))
> + return -EPERM;
> +
>   if (pos >= size)
>   return 0;
>   if (nbytes >= size)
> @@ -195,6 +198,9 @@ static long proc_bus_pci_ioctl(struct file *file, 
> unsigned int cmd,
>  #endif /* HAVE_PCI_MMAP */
>   int ret = 0;
>  
> + if (kernel_is_locked_down("Direct PCI access"))
> + return -EPERM;
> +
>   switch (cmd) {
>   case PCIIOC_CONTROLLER:
>   ret = pci_domain_nr(dev->bus);
> @@ -236,7 +242,8 @@ static int proc_bus_pci_mmap(struct file *file, struct 
> vm_area_struct *vma)
>   struct pci_filp_private *fpriv = file->private_data;
>   int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM;
>  
> - if (!capable(CAP_SYS_RAWIO))
> + if (!capable(CAP_SYS_RAWIO) ||
> + kernel_is_locked_down("Direct PCI access"))
>   return -EPERM;
>  
>   if (fpriv->mmap_state == pci_mmap_io) {
> diff --git a/drivers/pci/syscall.c b/drivers/pci/syscall.c
> index 9bf993e1f71e..afa01cc3ceec 100644
> --- a/drivers/pci/syscall.c
> +++ b/drivers/pci/syscall.c
> @@ -92,7 +92,8 @@ SYSCALL_DEFINE5(pciconfig_write, unsigned long, bus, 
> unsigned long, dfn,
>   u32 dword;
>   int err = 0;
>  
> - if (!capable(CAP_SYS_ADMIN))
> + if (!capable(CAP_SYS_ADMIN) ||
> + kernel_is_locked_down("Direct PCI access"))
>   return -EPERM;
>  
>   dev = pci_get_bus_and_slot(bus, dfn);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/27] uswsusp: Disable when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:42PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> uswsusp allows a user process to dump and then restore kernel state, which
> makes it possible to modify the running kernel.  Disable this if the kernel
> is locked down.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 
> cc: linux...@vger.kernel.org

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> ---
> 
>  kernel/power/user.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 22df9f7ff672..678ade9decfe 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -52,6 +52,9 @@ static int snapshot_open(struct inode *inode, struct file 
> *filp)
>   if (!hibernation_available())
>   return -EPERM;
>  
> + if (kernel_is_locked_down("/dev/snapshot"))
> + return -EPERM;
> +
>   lock_system_sleep();
>  
>   if (!atomic_add_unless(&snapshot_device_available, -1, 0)) {
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/27] hibernate: Disable when the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:34PM +0100, David Howells wrote:
> From: Josh Boyer 
> 
> There is currently no way to verify the resume image when returning
> from hibernate.  This might compromise the signed modules trust model,
> so until we can work with signed hibernate images we disable it when the
> kernel is locked down.
> 
> Signed-off-by: Josh Boyer 
> Signed-off-by: David Howells 
> cc: linux...@vger.kernel.org

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> ---
> 
>  kernel/power/hibernate.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index a5c36e9c56a6..f2eafefeec50 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -70,7 +70,7 @@ static const struct platform_hibernation_ops 
> *hibernation_ops;
>  
>  bool hibernation_available(void)
>  {
> - return (nohibernate == 0);
> + return nohibernate == 0 && !kernel_is_locked_down("Hibernation");
>  }
>  
>  /**
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 06/27] Copy secure_boot flag in boot params across kexec reboot

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:20PM +0100, David Howells wrote:
> From: Dave Young 
> 
> Kexec reboot in case secure boot being enabled does not keep the secure
> boot mode in new kernel, so later one can load unsigned kernel via legacy
> kexec_load.  In this state, the system is missing the protections provided
> by secure boot.
> 
> Adding a patch to fix this by retain the secure_boot flag in original
> kernel.
> 
> secure_boot flag in boot_params is set in EFI stub, but kexec bypasses the
> stub.  Fixing this issue by copying secure_boot flag across kexec reboot.
> 
> Signed-off-by: Dave Young 
> Signed-off-by: David Howells 
> cc: ke...@lists.infradead.org

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> ---
> 
>  arch/x86/kernel/kexec-bzimage64.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> b/arch/x86/kernel/kexec-bzimage64.c
> index fb095ba0c02f..7d0fac5bcbbe 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -179,6 +179,7 @@ setup_efi_state(struct boot_params *params, unsigned long 
> params_load_addr,
>   if (efi_enabled(EFI_OLD_MEMMAP))
>   return 0;
>  
> + params->secure_boot = boot_params.secure_boot;
>   ei->efi_loader_signature = current_ei->efi_loader_signature;
>   ei->efi_systab = current_ei->efi_systab;
>   ei->efi_systab_hi = current_ei->efi_systab_hi;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 05/27] kexec: Disable at runtime if the kernel is locked down

2017-10-19 Thread joeyli
On Thu, Oct 19, 2017 at 03:51:09PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> kexec permits the loading and execution of arbitrary code in ring 0, which
> is something that lock-down is meant to prevent. It makes sense to disable
> kexec in this situation.
> 
> This does not affect kexec_file_load() which can check for a signature on the
> image to be booted.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 
> Acked-by: Dave Young 

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> cc: ke...@lists.infradead.org
> ---
> 
>  kernel/kexec.c |7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index e62ec4dc6620..7dadfed9b676 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -202,6 +202,13 @@ SYSCALL_DEFINE4(kexec_load, unsigned long, entry, 
> unsigned long, nr_segments,
>   return -EPERM;
>  
>   /*
> +  * kexec can be used to circumvent module loading restrictions, so
> +  * prevent loading in that case
> +  */
> + if (kernel_is_locked_down("kexec of unsigned images"))
> + return -EPERM;
> +
> + /*
>* Verify we have a legal set of flags
>* This leaves us room for future extensions.
>*/
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 04/27] Restrict /dev/mem and /dev/kmem when the kernel is locked down

2017-10-19 Thread joeyli
Hi David,

Thanks for you send out this series.

On Thu, Oct 19, 2017 at 03:51:02PM +0100, David Howells wrote:
> From: Matthew Garrett 
> 
> Allowing users to write to address space makes it possible for the kernel to
> be subverted, avoiding module loading restrictions.  Prevent this when the
> kernel has been locked down.
> 
> Signed-off-by: Matthew Garrett 
> Signed-off-by: David Howells 

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> ---
> 
>  drivers/char/mem.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 593a8818aca9..b7c36898b689 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -179,6 +179,9 @@ static ssize_t write_mem(struct file *file, const char 
> __user *buf,
>   if (p != *ppos)
>   return -EFBIG;
>  
> + if (kernel_is_locked_down("/dev/mem"))
> + return -EPERM;
> +
>   if (!valid_phys_addr_range(p, count))
>   return -EFAULT;
>  
> @@ -540,6 +543,9 @@ static ssize_t write_kmem(struct file *file, const char 
> __user *buf,
>   char *kbuf; /* k-addr because vwrite() takes vmlist_lock rwlock */
>   int err = 0;
>  
> + if (kernel_is_locked_down("/dev/kmem"))
> + return -EPERM;
> +
>   if (p < (unsigned long) high_memory) {
>   unsigned long to_write = min_t(unsigned long, count,
>  (unsigned long)high_memory - p);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 03/27] Enforce module signatures if the kernel is locked down

2017-10-19 Thread joeyli
Hi David,

Thanks for you send our this series.

On Thu, Oct 19, 2017 at 03:50:55PM +0100, David Howells wrote:
> If the kernel is locked down, require that all modules have valid
> signatures that we can verify.
> 
> Signed-off-by: David Howells 

I have reviewed and tested this patch. Please feel free to add:

Reviewed-by: "Lee, Chun-Yi" 

Thanks a lot!
Joey Lee

> ---
> 
>  kernel/module.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index de66ec825992..3d9a3270c179 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2781,7 +2781,8 @@ static int module_sig_check(struct load_info *info, int 
> flags)
>   }
>  
>   /* Not having a signature is only an error if we're strict. */
> - if (err == -ENOKEY && !sig_enforce)
> + if (err == -ENOKEY && !sig_enforce &&
> + !kernel_is_locked_down("Loading of unsigned modules"))
>   err = 0;
>  
>   return err;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-efi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 18/27] bpf: Restrict kernel image access functions when the kernel is locked down

2017-10-19 Thread joeyli
Hi Alexei,

Thanks for your review! 

On Thu, Oct 19, 2017 at 03:18:30PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 19, 2017 at 03:52:49PM +0100, David Howells wrote:
> > From: Chun-Yi Lee 
> > 
> > There are some bpf functions can be used to read kernel memory:
> > bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
> > private keys in kernel memory (e.g. the hibernation image signing key) to
> > be read by an eBPF program.  Prohibit those functions when the kernel is
> > locked down.
> > 
> > Signed-off-by: Chun-Yi Lee 
> > Signed-off-by: David Howells 
> > cc: net...@vger.kernel.org
> > ---
> > 
> >  kernel/trace/bpf_trace.c |   11 +++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index dc498b605d5d..35e85a3fdb37 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -65,6 +65,11 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const 
> > void *, unsafe_ptr)
> >  {
> > int ret;
> >  
> > +   if (kernel_is_locked_down("BPF")) {
> > +   memset(dst, 0, size);
> > +   return -EPERM;
> > +   }
> 
> That doesn't help the lockdown purpose.
> If you don't trust the root the only way to prevent bpf read
> memory is to disable the whole thing.

Not totally untrust root, I don't want that root reads
arbitrary memory address through bpf.

Is it not enough to lock down bpf_probe_read, bpf_probe_write_user
and bpf_trace_printk?

> Have a single check in sys_bpf() to disallow everything if 
> kernel_is_locked_down()
> and don't add overhead to critical path like bpf_probe_read().
>

Yes, it give overhead to bpf_probe_read but it prevents arbitrary
memory read.

Another idea is signing bpf bytecode then verifying signture when
loading to kernel. 

Thanks a lot!
Joey Lee 


Re: Draft manpage explaining kernel lockdown

2017-10-06 Thread joeyli
Hi David,

On Thu, Oct 05, 2017 at 12:00:24PM +0100, David Howells wrote:
> Hi Ard, Michael,
> 
> Attached is a draft for a manual page (kernel_lockdown.7) that I intend to
> point at from messages emitted when the kernel prohibits something because the
> kernel is in 'lockdown' mode, typically triggered by EFI secure boot.
> 
> Let me know what you think.
> 
> David
> ---
[...snip]
> When lockdown is in effect, a number of things are disabled or restricted in
> use.  This includes special device files and kernel services that allow direct
> access of the kernel image:
> .P
> .RS
> /dev/mem
> .br
> /dev/kmem
> .br
> /dev/kcore
> .br
> /dev/ioports
> .br
> BPF memory access functions

Some information for note...

The BPF functions bpf_probe_read(), bpf_trace_printk() and
bpf_probe_write_user() need to be lockdown to avoid accessing
arbitrary addess.

Our original idea is trying to filter out senstivie data address
at runtime by eBPF verifier. But it can not be success. Gary Lin has
investigated and comment:

  Although eBPF verifier can stop the reading from the hard-coded address, 
  it's not able to stop reading arguments in the probed functions. So if 
  the malicious user attaches a eBPF program to a function that is used to 
  process the sensitive data, the eBPF program can print those arguments 
  easily and this might leak the passwords or private keys.

If we readlly want to allow eBPF code to access memory, then I think that the
bpf bytecode should be signed by trused key in system keyring.

> .RE
> .P

Another function may needs to be restrictred:

- The perf_event_open() with PERF_SAMPLE_REGS_INTR

  Jann Horn raised this concern. The tool can be used to grab register
  to peep sensitive data. We may need to block this tracing function.

Regards
Joey Lee


Re: [RFC v2 PATCH] x86/boot: Add the secdata section to the setup header

2017-08-19 Thread joeyli
Hi,

On Mon, Jul 10, 2017 at 11:24:44AM +0800, Gary Lin wrote:
> A new section, secdata, in the setup header is introduced to store the
> distro-specific security version which is designed to help the
> bootloader to warn the user when loading a less secure or vulnerable
> kernel. The secdata section can be presented as the following:
> 
> struct sec_hdr {
>   __u16 header_length;
>   __u32 distro_version;
>   __u16 security_version;
> } __attribute__((packed));
> char *signer;
> 
> It consists of a fixed size structure and a null-terminated string.
> "header_length" is the size of "struct sec_hdr" and can be used as the
> offset to "signer". It also can be a kind of the "header version" to
> detect if any new member is introduced.
> 
> The kernel packager of the distribution can put the distro name in
> "signer" and the distro version in "distro_version". When a severe
> vulnerability is fixed, the packager increases "security_version" in
> the kernel build afterward. The bootloader can maintain a list of the
> security versions of the current kernels and only allows the kernel with
> a higher or equal security version to boot. If the user is going to boot
> a kernel with a lower security version, a warning should show to prevent
> the user from loading a vulnerable kernel accidentally.
> 
> Enabling UEFI Secure Boot is recommended when using the security version
> or the attacker may alter the security version stealthily.
> 
> (For more details: https://github.com/lcp/shim/wiki/Security-Version)
> 
> v2:
> - Decrease the size of secdata_offset to 2 bytes since the setup header
>   is limited to around 32KB.
> - Restructure the secdata section. The signer is now a null-terminated
>   string. The type of distro_version changes to u32 in case the distro
>   uses a long version.
> - Modify the Kconfig names and add help.
> - Remove the signer name hack in build.c.
> 
> Cc: Ard Biesheuvel 
> Cc: "H. Peter Anvin" 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Joey Lee 
> Signed-off-by: Gary Lin 

I have reviewed and tested this patch. Please feel free to add:

Signed-off-by: Joey Lee 

Thanks
Joey Lee

> ---
>  arch/x86/Kconfig  | 28 
>  arch/x86/boot/header.S| 14 +-
>  arch/x86/boot/setup.ld|  1 +
>  arch/x86/boot/tools/build.c   |  1 -
>  arch/x86/include/uapi/asm/bootparam.h |  1 +
>  5 files changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 316152f72bb9..043ff86828a6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1828,6 +1828,34 @@ config EFI_MIXED
>  
>  If unsure, say N.
>  
> +config SIGNER_NAME
> + string "Signer name"
> + default ""
> + ---help---
> +This option specifies who signs or releases this kernel.
> +
> +config DISTRO_VERSION
> + int "Distribution version"
> + default 0
> + range 0 4294967295
> + ---help---
> +   This option specifies the distribution version which this
> +   kernel belongs to.
> +
> +config SECURITY_VERSION
> + int "Security version"
> + default 0
> + range 0 65535
> + ---help---
> +The security version is the version defined by the distribution
> +to indicate the severe security fixes. The bootloader can maintain
> +a list of the security versions of the current kernels. After
> +fixing a severe vulnerability in the kernel, the distribution can
> +increase the security version to notify the bootloader to update
> +the list. When booting a kernel with a lower security version,
> +the bootloader warns the user to avoid loading a vulnerable kernel
> +accidentally.
> +
>  config SECCOMP
>   def_bool y
>   prompt "Enable seccomp to safely compute untrusted bytecode"
> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 2ed8f0c25def..c62e0baf2d89 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -300,7 +300,7 @@ _start:
>   # Part 2 of the header, from the old setup.S
>  
>   .ascii  "HdrS"  # header signature
> - .word   0x020d  # header version number (>= 0x0105)
> + .word   0x020e  # header version number (>= 0x0105)
>   # or else old loadlin-1.5 will fail)
>   .globl realmode_swtch
>  realmode_swtch:  .word   0, 0# default_switch, SETUPSEG
> @@ -551,6 +551,7 @@ pref_address: .quad LOAD_PHYSICAL_ADDR
> # preferred load addr
>  
>  init_size:   .long INIT_SIZE # kernel initialization size
>  handover_offset: .long 0 # Filled in by build.c
> +secdata_offset:  .word secdata_start
>  
>  # End of setup header #
>  
> @@ -628,3 +629,14 @@ die:
>  setup_corrupt:
>   .byte   7
>

Re: A udev rule to serve the change event of ACPI container?

2017-08-15 Thread joeyli
On Fri, Aug 04, 2017 at 05:06:19PM +0200, Michal Hocko wrote:
> On Thu 03-08-17 11:37:37, YASUAKI ISHIMATSU wrote:
> > 
> > 
> > On 08/02/2017 01:49 AM, joeyli wrote:
> > > Hi YASUAKI,
> > > 
> > > On Tue, Aug 01, 2017 at 03:21:38PM -0400, YASUAKI ISHIMATSU wrote:
> > >> Hi Joey,
> > >>
> > >> On 07/23/2017 05:18 AM, joeyli wrote:
> > > [...snip]
> > >>>>>
> > >>>>
> > >>>> At least Yasuaki raised similar behavior for container in 2013.
> > >>>> It's similar to the DVD player case, user space application needs
> > >>>> to do something then trigger children offline and ejection of
> > >>>> container.
> > >>>>
> > >>>> Base on Yasuaki's explanation, the reason of that he requested the
> > >>>> userland ejection approach is that he got memory hot-remove problem
> > >>>> in 2013. Maybe his problem is already fixed by your patches in current
> > >>>> mainline.
> > >>>>
> > >>>> Hi Yasuaki, could you please check that your memory hot-remove problem
> > >>>> is fixed on mainline kernel?  
> > >>
> > >> I cannot remember what I mentioned in 2013. Could you tell me url of 
> > >> lkml archive.
> > >>
> > > 
> > 
> > > Here: https://lkml.org/lkml/2013/11/28/520
> > 
> > Thank you for specifying the URL. In the URL, I wrote the following 
> > problems:
> > 
> > >   1. easily fail
> > >  My container device has CPU device and Memory device, and maximum 
> > > size of
> > >  memory is 3Tbyte. In my environment, hot removing container device 
> > > fails
> > >  on offlining memory if memory is used by application.
> > >  I think if offlininig memory, we must retly to offline memory several
> > >  times.
> > 
> > I think the issue still remains. If process keeps accessing memory, 
> > offlining
> > the memory easily fails with EBUSY.
> 
> Yes and the only way how to deal with it is to retry the offline. In
> order to do that userspace has to be notified to try again.

Looks that kernel still needs to rely on userspace to trigger offline
for each child device. 

My plan is to impelement a _EJECTING_ flag on acpi_device struct.
It indicates that the acpi device under ejection process. The flag
will be changed in the following situations:

- EJECTING flag 0 -> 1
- After acpi_scan_hot_remove() sends KOBJ_CHANGE for all children

- EJECTING flag 1 -> 0 
- After acpi_scan_hot_remove() calls _EJ0
- Any child device was set online again

And, according to the order, any EJECTING flag changing work must be
pushed to kacpi_hotplug_wq work queue.

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-08-03 Thread joeyli
On Thu, Aug 03, 2017 at 11:31:53AM +0200, Michal Hocko wrote:
> On Thu 03-08-17 17:22:37, Joey Lee wrote:
> > On Wed, Aug 02, 2017 at 11:01:43AM +0200, Michal Hocko wrote:
> > > On Mon 31-07-17 15:38:45, Joey Lee wrote:
> [...]
> > > > So, the behavior is:
> > > > 
> > > > Kernel received ejection event, set _Eject_ flag on container object
> > > >   -> Kernel sends offline events to all children devices
> > > > -> User space performs cleaning jobs and offlines each child device
> > > >   -> Kernel detects all children offlined
> > > > -> Kernel removes objects and calls power off(_EJ0)
> > > 
> > > Yes this is what I've had in mind. It is the "kernel detects..." part
> > > which is not implemented now and that requires us to do the explicit
> > > eject from userspace, correct?
> > >
> > 
> > Yes, the _Eject_ flag and _detects_ part are not implemented now. 
> > 
> > In this approach, kernel still relies on user space to trigger the
> > offline. The ejection process is still not transparent to user space.
> > Is it what you want?
> 
> But as long as there is no auto-offlining then there is no other choice
> no? Besides that userspace even shouldn't care about the fact that the

If Yasuaki's problem is already fixed in mainline, then the auto-offlining
will be possible.  

> eject is in progress. That is a BIOS->OS deal AFAIU. All the userspace
> cares about is the proper cleanup of the resources and that happens at
> the offline time.
>

I agree! User space doesn't need to know the detail of kobject cleaning
and ejection stages.
 
> > > > If anyone onlined one of the children devices in the term of waiting
> > > > userland offlines all children, then the _Eject_ flag will be clean
> > > > and ejection process will be interrupted. In this situation, 
> > > > administrator
> > > > needs to trigger ejection event again.
> > > 
> > > yes
> > > 
> > > > Do you think that the race hurts anything?
> > > 
> > > What kind of race?
> > 
> > User space set a child online before all childreen offlined, then
> > the _Eject_ flag is cleaned and the ejection process is interrupted.
> 
> Is this really a race though? Kernel will always have a full picture and
> if userspace wants to online some part then the eject cannot succeed.
> This is something that a userspace driver eject cannot possibly handle.

Then I agree.

I am waiting Yasuaki's response and want to know Rafael's and
Yasuaki's opinions about the _Eject_ flag approach.

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-08-03 Thread joeyli
On Wed, Aug 02, 2017 at 11:01:43AM +0200, Michal Hocko wrote:
> On Mon 31-07-17 15:38:45, Joey Lee wrote:
> > Hi Michal,
> > 
> > Sorry for my delay...
> > 
> > On Tue, Jul 25, 2017 at 02:48:37PM +0200, Michal Hocko wrote:
> > > On Mon 24-07-17 17:29:21, Joey Lee wrote:
> [...]
> > > > For the success case, yes, we can clear the flag when the _EJ0 of 
> > > > container
> > > > is success. But for the fail case, we don't know when the operation is
> > > > terminated.
> > > 
> > > Hmm, this is rather strange. What is the BIOS state in the meantime?
> > > Let's say it doesn't retry. Does it wait for the OS for ever?
> > > 
> > 
> > Unfortunately ACPI spec doesn't mention the detail of BIOS behavior for
> > container hot-removing.
> > 
> > IMHO, if the BIOS doesn't retry, at least it should maintains a timer
> > to handle the OS layer time out then BIOS resets hardware(turns off
> > progress light or something else...).
> > 
> > The old BIOS just treats the ejection event as a button event. BIOS
> > emits 0x103 ejection event to OS after user presses a button or UI.
> > Then BIOS hopes that OS(either kernel or userland) finishs all jobs,
> > calls _EJ0 to turn off power, and calls _OST to return state to BIOS.
> > 
> > If the ejection event from BIOS doesn't trigger anything in upper OS
> > layer, old BIOS can not against this situation unless it has a timer.
> 
> Right but I would consider that a BIOS problem. It is simply not
> feasible to expect that OS will react in instance. Especially when we
> are talking about resources like memory which takes time proportional to
> the size to tear down properly.
> 

I agree with you that old BIOS implementation is not enough
to handle the situation from OS layer. But those old BIOS has
been shipped. We still need to consider to work with them.

> > > > > [...]
> > > > > > Base on the above figure, if userspace didn't do anything or it
> > > > > > just performs part of offline jobs. Then the container's [eject]
> > > > > > state will be always _SET_ there, and kernel will always check
> > > > > > the the latest child offline state when any child be offlined
> > > > > > by userspace.
> > > > > 
> > > > > What is a problem about that? The eject is simply in progress until 
> > > > > all
> > > > > is set. Or maybe I just misunderstood.
> > > > >
> > > > 
> > > > I agree, but it's only for success case. For fail case, kernel can not
> > > > wait forever. Can we?
> > > 
> > > Well, this won't consume any additional resources so I wouldn't be all
> > > that worried. Maybe we can reset the flag as soon as somebody tries to
> > > online some part of the container?
> > >
> > 
> > So, the behavior is:
> > 
> > Kernel received ejection event, set _Eject_ flag on container object
> >   -> Kernel sends offline events to all children devices
> > -> User space performs cleaning jobs and offlines each child device
> >   -> Kernel detects all children offlined
> > -> Kernel removes objects and calls power off(_EJ0)
> 
> Yes this is what I've had in mind. It is the "kernel detects..." part
> which is not implemented now and that requires us to do the explicit
> eject from userspace, correct?
>

Yes, the _Eject_ flag and _detects_ part are not implemented now. 

In this approach, kernel still relies on user space to trigger the
offline. The ejection process is still not transparent to user space.
Is it what you want?

> > If anyone onlined one of the children devices in the term of waiting
> > userland offlines all children, then the _Eject_ flag will be clean
> > and ejection process will be interrupted. In this situation, administrator
> > needs to trigger ejection event again.
> 
> yes
> 
> > Do you think that the race hurts anything?
> 
> What kind of race?

User space set a child online before all childreen offlined, then
the _Eject_ flag is cleaned and the ejection process is interrupted.


Thanks
Joey Lee 


Re: A udev rule to serve the change event of ACPI container?

2017-08-01 Thread joeyli
Hi YASUAKI,

On Tue, Aug 01, 2017 at 03:21:38PM -0400, YASUAKI ISHIMATSU wrote:
> Hi Joey,
> 
> On 07/23/2017 05:18 AM, joeyli wrote:
[...snip]
> >>>
> >>
> >> At least Yasuaki raised similar behavior for container in 2013.
> >> It's similar to the DVD player case, user space application needs
> >> to do something then trigger children offline and ejection of
> >> container.
> >>
> >> Base on Yasuaki's explanation, the reason of that he requested the
> >> userland ejection approach is that he got memory hot-remove problem
> >> in 2013. Maybe his problem is already fixed by your patches in current
> >> mainline.
> >>
> >> Hi Yasuaki, could you please check that your memory hot-remove problem
> >> is fixed on mainline kernel?  
> 
> I cannot remember what I mentioned in 2013. Could you tell me url of lkml 
> archive.
>

Here: https://lkml.org/lkml/2013/11/28/520

In the mail, you mentioned two problems. But there have no detail about
those issues, and no root causes. We don't know why kernel should relies
on user space to complete the hot removing process for container.

Thank a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-07-31 Thread joeyli
Hi Michal,

Sorry for my delay...

On Tue, Jul 25, 2017 at 02:48:37PM +0200, Michal Hocko wrote:
> On Mon 24-07-17 17:29:21, Joey Lee wrote:
> > On Mon, Jul 24, 2017 at 10:57:02AM +0200, Michal Hocko wrote:
> > > On Wed 19-07-17 17:09:10, Joey Lee wrote:
> > > > On Mon, Jul 17, 2017 at 11:05:25AM +0200, Michal Hocko wrote:
> > > [...]
> > > > > The problem I have with this expectation is that userspace will never
> > > > > have a good atomic view of the whole container. So it can only try to
> > > > 
> > > > I agreed!
> > > > 
> > > > Even a userspace application can handle part of offline jobs. It's
> > > > still possible that other kernel/userland compenents are using the
> > > > resource in container.
> > > > 
> > > > > eject and then hope that nobody has onlined part of the container.
> > > > > If you emit offline event to the userspace the cleanup can be done and
> > > > > after the last component goes offline then the eject can be done
> > > > > atomically.
> > > > 
> > > > The thing that we didn't align is how does kernel maintains the flag
> > > > of ejection state on container.
> > > 
> > > Why it cannot be an attribute of the container? The flag would be set
> > > when the eject operation is requested and cleared when either the
> > > operation is successful (all parts offline and eject operation acked
> > > by the BIOS) or it is terminated.
> > >
> > 
> > For the success case, yes, we can clear the flag when the _EJ0 of container
> > is success. But for the fail case, we don't know when the operation is
> > terminated.
> 
> Hmm, this is rather strange. What is the BIOS state in the meantime?
> Let's say it doesn't retry. Does it wait for the OS for ever?
> 

Unfortunately ACPI spec doesn't mention the detail of BIOS behavior for
container hot-removing.

IMHO, if the BIOS doesn't retry, at least it should maintains a timer
to handle the OS layer time out then BIOS resets hardware(turns off
progress light or something else...).

The old BIOS just treats the ejection event as a button event. BIOS
emits 0x103 ejection event to OS after user presses a button or UI.
Then BIOS hopes that OS(either kernel or userland) finishs all jobs,
calls _EJ0 to turn off power, and calls _OST to return state to BIOS.

If the ejection event from BIOS doesn't trigger anything in upper OS
layer, old BIOS can not against this situation unless it has a timer.

> > > [...]
> > > > Base on the above figure, if userspace didn't do anything or it
> > > > just performs part of offline jobs. Then the container's [eject]
> > > > state will be always _SET_ there, and kernel will always check
> > > > the the latest child offline state when any child be offlined
> > > > by userspace.
> > > 
> > > What is a problem about that? The eject is simply in progress until all
> > > is set. Or maybe I just misunderstood.
> > >
> > 
> > I agree, but it's only for success case. For fail case, kernel can not
> > wait forever. Can we?
> 
> Well, this won't consume any additional resources so I wouldn't be all
> that worried. Maybe we can reset the flag as soon as somebody tries to
> online some part of the container?
>

So, the behavior is:

Kernel received ejection event, set _Eject_ flag on container object
  -> Kernel sends offline events to all children devices
-> User space performs cleaning jobs and offlines each child device
  -> Kernel detects all children offlined
-> Kernel removes objects and calls power off(_EJ0)

If anyone onlined one of the children devices in the term of waiting
userland offlines all children, then the _Eject_ flag will be clean
and ejection process will be interrupted. In this situation, administrator
needs to trigger ejection event again. Do you think that the race hurts
anything?

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-07-24 Thread joeyli
On Mon, Jul 24, 2017 at 10:57:02AM +0200, Michal Hocko wrote:
> On Wed 19-07-17 17:09:10, Joey Lee wrote:
> > On Mon, Jul 17, 2017 at 11:05:25AM +0200, Michal Hocko wrote:
> [...]
> > > The problem I have with this expectation is that userspace will never
> > > have a good atomic view of the whole container. So it can only try to
> > 
> > I agreed!
> > 
> > Even a userspace application can handle part of offline jobs. It's
> > still possible that other kernel/userland compenents are using the
> > resource in container.
> > 
> > > eject and then hope that nobody has onlined part of the container.
> > > If you emit offline event to the userspace the cleanup can be done and
> > > after the last component goes offline then the eject can be done
> > > atomically.
> > 
> > The thing that we didn't align is how does kernel maintains the flag
> > of ejection state on container.
> 
> Why it cannot be an attribute of the container? The flag would be set
> when the eject operation is requested and cleared when either the
> operation is successful (all parts offline and eject operation acked
> by the BIOS) or it is terminated.
>

For the success case, yes, we can clear the flag when the _EJ0 of container
is success. But for the fail case, we don't know when the operation is
terminated.
 
> [...]
> > Base on the above figure, if userspace didn't do anything or it
> > just performs part of offline jobs. Then the container's [eject]
> > state will be always _SET_ there, and kernel will always check
> > the the latest child offline state when any child be offlined
> > by userspace.
> 
> What is a problem about that? The eject is simply in progress until all
> is set. Or maybe I just misunderstood.
>

I agree, but it's only for success case. For fail case, kernel can not
wait forever. Can we?
 
> > 
> > On the other hand, for retry BIOS, we will apply the same
> > _eject_ flag approach on retry BIOS. If the OS performs
> > offline/ejection jobs too long then the retry BIOS is finally
> > time out. There doesn't have way for OS to aware the timeout.
> 
> Doesn't BIOS notify the OS that the eject has timed out?
> 

No, there doesn't have interface to notify OS for BIOS time out. 

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-07-24 Thread joeyli
Hi Yasuaki,  

On Fri, Jul 14, 2017 at 10:44:14PM +0800, joeyli wrote:
> On Fri, Jul 14, 2017 at 10:37:13AM +0200, Michal Hocko wrote:
> > On Thu 13-07-17 20:45:21, Joey Lee wrote:
> > > On Thu, Jul 13, 2017 at 09:06:19AM +0200, Michal Hocko wrote:
> > > > On Thu 13-07-17 14:58:06, Joey Lee wrote:
> > [...]
> > > > > If BIOS emits ejection event for a ACPI0004 container, someone needs
> > > > > to handle the offline/eject jobs of container. Either kernel or user
> > > > > space.
> > > > > 
> > > > > Only sending uevent to individual child device can simplify udev rule,
> > > > > but it also means that the kernel needs to offline/eject container
> > > > > after all children devices are offlined.
> > > > 
> > > > Why cannot kernel send this eject command to the BIOS if the whole
> > > > container is offline? If it is not then the kernel would send EBUSY to
> > > 
> > > Current kernel container hot-remove process:
> > > 
> > >   BIOS -> SCI event -> Kernel ACPI -> uevent -> userland
> > >   
> > > Then, kernel just calls _OST to expose state to BIOS, then process is
> > > stopped. Kernel doesn't wait there for userland to offline each child
> > > devices. Either BIOS or userland needs to trigger the container
> > > ejection.
> > > 
> > > > container is offline? If it is not then the kernel would send EBUSY to
> > > > the BIOS and BIOS would have to retry after some timeout. Or is it a
> > > 
> > > The d429e5c122 patch is merged to mainline. So kernel will send
> > > DEVICE_BUSY to BIOS after it emits uevent to userland. BIOS can choice
> > > to apply the retry approach until OS returns process failure exactly or
> > > BIOS timeout.
> > > 
> > > > problem that currently implemented BIOS firmwares do not implement this
> > > > retry?
> > > 
> > > Yes, we should consider the behavior of old BIOS. Old BIOS doesn't
> > > retry/resend the ejection event. So kernel or userland need to take the
> > > retry job. Obviously userland runs the retry since the caa73ea15 patch
> > > is merged.
> > > 
> > > IMHO there have two different expectation from user space application.
> > > 
> > > Applications like DVD player or Burner expect that kernel should
> > > info userspace for the ejection, then application can do their cleaning
> > > job and re-trigger ejection from userland.
> > 
> > I am not sure I understand the DVD example because I do not see how it
> > fits into the container and online/offline scenario.
> >
> 
> At least Yasuaki raised similar behavior for container in 2013.
> It's similar to the DVD player case, user space application needs
> to do something then trigger children offline and ejection of
> container.
> 
> Base on Yasuaki's explanation, the reason of that he requested the
> userland ejection approach is that he got memory hot-remove problem
> in 2013. Maybe his problem is already fixed by your patches in current
> mainline.
> 
> Hi Yasuaki, could you please check that your memory hot-remove problem
> is fixed on mainline kernel?  
> 
> If Yasuaki's issue is already fixed, then we should consider to let
> kernel does the container hot-remove transparently. 

Could you please help to check that your memory hot-remove problem in 2013
is fixed on mainline kernel?  

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-07-19 Thread joeyli
On Mon, Jul 17, 2017 at 11:05:25AM +0200, Michal Hocko wrote:
> On Fri 14-07-17 22:44:14, Joey Lee wrote:
> > On Fri, Jul 14, 2017 at 10:37:13AM +0200, Michal Hocko wrote:
> > > On Thu 13-07-17 20:45:21, Joey Lee wrote:
> > > > On Thu, Jul 13, 2017 at 09:06:19AM +0200, Michal Hocko wrote:
> > > > > On Thu 13-07-17 14:58:06, Joey Lee wrote:
> > > [...]
> > > > > > If BIOS emits ejection event for a ACPI0004 container, someone needs
> > > > > > to handle the offline/eject jobs of container. Either kernel or user
> > > > > > space.
> > > > > >
> > > > > > Only sending uevent to individual child device can simplify udev 
> > > > > > rule,
> > > > > > but it also means that the kernel needs to offline/eject container
> > > > > > after all children devices are offlined.
> > > > >
> > > > > Why cannot kernel send this eject command to the BIOS if the whole
> > > > > container is offline? If it is not then the kernel would send EBUSY to
> > > >
> > > > Current kernel container hot-remove process:
> > > >
> > > >   BIOS -> SCI event -> Kernel ACPI -> uevent -> userland
> > > >
> > > > Then, kernel just calls _OST to expose state to BIOS, then process is
> > > > stopped. Kernel doesn't wait there for userland to offline each child
> > > > devices. Either BIOS or userland needs to trigger the container
> > > > ejection.
> > > >
> > > > > container is offline? If it is not then the kernel would send EBUSY to
> > > > > the BIOS and BIOS would have to retry after some timeout. Or is it a
> > > >
> > > > The d429e5c122 patch is merged to mainline. So kernel will send
> > > > DEVICE_BUSY to BIOS after it emits uevent to userland. BIOS can choice
> > > > to apply the retry approach until OS returns process failure exactly or
> > > > BIOS timeout.
> > > >
> > > > > problem that currently implemented BIOS firmwares do not implement 
> > > > > this
> > > > > retry?
> > > >
> > > > Yes, we should consider the behavior of old BIOS. Old BIOS doesn't
> > > > retry/resend the ejection event. So kernel or userland need to take the
> > > > retry job. Obviously userland runs the retry since the caa73ea15 patch
> > > > is merged.
> > > >
> > > > IMHO there have two different expectation from user space application.
> > > >
> > > > Applications like DVD player or Burner expect that kernel should
> > > > info userspace for the ejection, then application can do their cleaning
> > > > job and re-trigger ejection from userland.
> > >
> > > I am not sure I understand the DVD example because I do not see how it
> > > fits into the container and online/offline scenario.
> > >
> >
> > At least Yasuaki raised similar behavior for container in 2013.
> > It's similar to the DVD player case, user space application needs
> > to do something then trigger children offline and ejection of
> > container.
>
> The problem I have with this expectation is that userspace will never
> have a good atomic view of the whole container. So it can only try to

I agreed!

Even a userspace application can handle part of offline jobs. It's
still possible that other kernel/userland compenents are using the
resource in container.

> eject and then hope that nobody has onlined part of the container.
> If you emit offline event to the userspace the cleanup can be done and
> after the last component goes offline then the eject can be done
> atomically.

The thing that we didn't align is how does kernel maintains the flag
of ejection state on container.

>
> [...]
> > > Hmm, so can we trigger the eject from the _kernel_ when the last child
> > > is offlined?
> >
> > Kernel needs to remember that the container is under a _EJECTION_ state
> > that it should waits all children be offlined. Then kernel checks the
> > container offline state when each individual device is offlined. If
> > kernel found a container offlined (means that all children are offlined),
> > and the container is under ejection state, then kernel runs ejection
> > jobs (removing objects and calls _EJ0).
>
> yes, that is what I meant.
>
> > To achieve this, I think that the container object needs a _EJECTION_
> > flag. It helps kernel to remember the state that it set by BIOS's
> > ejection event.
>
> yes something like that.
>

Good!

> > This approach has some problems: If userland doesn't finish his offline
> > jobs or userland doesn't do anything, when should kernel clears the
> > ejection flag and responses failure by _OST to BIOS?
>
> I do not see how is that any different from the current approach. You
> still cannot do the eject if some component is online and we rely on the
> userspace to do the offline.

I see. I agree with you that it's no different from the current approach.
But I still concern how to maintain the ejection state flag in kernel.

>
> > And, for new BIOS that it has time out mechanism. Currently there have
> > no way for BIOS to tell kernel that it gives up. It's hard to sync the
> > kernel container's ejection flag with BIOS.
>
> I am not sure I understand. The kernel/BIOS synch

Re: A udev rule to serve the change event of ACPI container?

2017-07-14 Thread joeyli
On Fri, Jul 14, 2017 at 10:37:13AM +0200, Michal Hocko wrote:
> On Thu 13-07-17 20:45:21, Joey Lee wrote:
> > On Thu, Jul 13, 2017 at 09:06:19AM +0200, Michal Hocko wrote:
> > > On Thu 13-07-17 14:58:06, Joey Lee wrote:
> [...]
> > > > If BIOS emits ejection event for a ACPI0004 container, someone needs
> > > > to handle the offline/eject jobs of container. Either kernel or user
> > > > space.
> > > > 
> > > > Only sending uevent to individual child device can simplify udev rule,
> > > > but it also means that the kernel needs to offline/eject container
> > > > after all children devices are offlined.
> > > 
> > > Why cannot kernel send this eject command to the BIOS if the whole
> > > container is offline? If it is not then the kernel would send EBUSY to
> > 
> > Current kernel container hot-remove process:
> > 
> >   BIOS -> SCI event -> Kernel ACPI -> uevent -> userland
> >   
> > Then, kernel just calls _OST to expose state to BIOS, then process is
> > stopped. Kernel doesn't wait there for userland to offline each child
> > devices. Either BIOS or userland needs to trigger the container
> > ejection.
> > 
> > > container is offline? If it is not then the kernel would send EBUSY to
> > > the BIOS and BIOS would have to retry after some timeout. Or is it a
> > 
> > The d429e5c122 patch is merged to mainline. So kernel will send
> > DEVICE_BUSY to BIOS after it emits uevent to userland. BIOS can choice
> > to apply the retry approach until OS returns process failure exactly or
> > BIOS timeout.
> > 
> > > problem that currently implemented BIOS firmwares do not implement this
> > > retry?
> > 
> > Yes, we should consider the behavior of old BIOS. Old BIOS doesn't
> > retry/resend the ejection event. So kernel or userland need to take the
> > retry job. Obviously userland runs the retry since the caa73ea15 patch
> > is merged.
> > 
> > IMHO there have two different expectation from user space application.
> > 
> > Applications like DVD player or Burner expect that kernel should
> > info userspace for the ejection, then application can do their cleaning
> > job and re-trigger ejection from userland.
> 
> I am not sure I understand the DVD example because I do not see how it
> fits into the container and online/offline scenario.
>

At least Yasuaki raised similar behavior for container in 2013.
It's similar to the DVD player case, user space application needs
to do something then trigger children offline and ejection of
container.

Base on Yasuaki's explanation, the reason of that he requested the
userland ejection approach is that he got memory hot-remove problem
in 2013. Maybe his problem is already fixed by your patches in current
mainline.

Hi Yasuaki, could you please check that your memory hot-remove problem
is fixed on mainline kernel?  

If Yasuaki's issue is already fixed, then we should consider to let
kernel does the container hot-remove transparently. 

> > But, some other applications like database don't want that their service
> > be stopped when the devices offline/eject. The hot-remove sholud be done by
> > kernel transparently.
> > 
> > We need a way for fill two situations.
> 
> Hmm, so can we trigger the eject from the _kernel_ when the last child
> is offlined?

Kernel needs to remember that the container is under a _EJECTION_ state
that it should waits all children be offlined. Then kernel checks the
container offline state when each individual device is offlined. If
kernel found a container offlined (means that all children are offlined),
and the container is under ejection state, then kernel runs ejection
jobs (removing objects and calls _EJ0). 

To achieve this, I think that the container object needs a _EJECTION_
flag. It helps kernel to remember the state that it set by BIOS's
ejection event.

This approach has some problems: If userland doesn't finish his offline
jobs or userland doesn't do anything, when should kernel clears the 
ejection flag and responses failure by _OST to BIOS?

And, for new BIOS that it has time out mechanism. Currently there have
no way for BIOS to tell kernel that it gives up. It's hard to sync the
kernel container's ejection flag with BIOS. 

Of course the better is that Yasuaki's problem got fixed. Kernel does
the hot-removes container transparently (again). Then we don't need
to worry how to maintain a ejection state in kernel.  

Thanks a lot!
Joey Lee 


Re: A udev rule to serve the change event of ACPI container?

2017-07-13 Thread joeyli
On Thu, Jul 13, 2017 at 09:06:19AM +0200, Michal Hocko wrote:
> On Thu 13-07-17 14:58:06, Joey Lee wrote:
> > Hi Michal, 
> > 
> > Sorry for my delay.
> > 
> > On Tue, Jul 11, 2017 at 10:25:32AM +0200, Michal Hocko wrote:
> > > On Mon 26-06-17 10:59:07, Michal Hocko wrote:
> > > > On Mon 26-06-17 14:26:57, Joey Lee wrote:
> > > > > Hi all,
> > > > > 
> > > > > If ACPI received ejection request for a ACPI container, kernel
> > > > > emits KOBJ_CHANGE uevent when it found online children devices
> > > > > below the acpi container.
> > > > > 
> > > > > Base on the description of caa73ea15 kernel patch, user space
> > > > > is expected to offline all devices below the container and the
> > > > > container itself. Then, user space can finalize the removal of
> > > > > the container with the help of its ACPI device object's eject
> > > > > attribute in sysfs.
> > > > > 
> > > > > That means that kernel relies on users space to peform the offline
> > > > > and ejection jobs to acpi container and children devices. The
> > > > > discussion is here:
> > > > >   https://lkml.org/lkml/2013/11/28/520
> > > > > 
> > > > > The mail loop didn't explain why the userspace is responsible for
> > > > > the whole container offlining. Is it possible to do that transparently
> > > > > from the kernel? What's the difference between offlining memory and
> > > > > processors which happends without any cleanup and container which
> > > > > does essentially the same except it happens at once? 
> > > > >  
> > > > >  - After a couple of years, can we let the container hot-remove
> > > > >process transparently?
> > > > >  - Except udev rule, does there have any other mechanism to trigger
> > > > >auto offline/ejection?
> > > > 
> > > > I would be also interested whether the kernel can simply send an udev 
> > > > event
> > > > to all devices in the container.
> > > 
> > > Any opinion on this?
> > 
> > If BIOS emits ejection event for a ACPI0004 container, someone needs
> > to handle the offline/eject jobs of container. Either kernel or user
> > space.
> > 
> > Only sending uevent to individual child device can simplify udev rule,
> > but it also means that the kernel needs to offline/eject container
> > after all children devices are offlined.
> 
> Why cannot kernel send this eject command to the BIOS if the whole
> container is offline? If it is not then the kernel would send EBUSY to

Current kernel container hot-remove process:

  BIOS -> SCI event -> Kernel ACPI -> uevent -> userland
  
Then, kernel just calls _OST to expose state to BIOS, then process is
stopped. Kernel doesn't wait there for userland to offline each child
devices. Either BIOS or userland needs to trigger the container
ejection.

> container is offline? If it is not then the kernel would send EBUSY to
> the BIOS and BIOS would have to retry after some timeout. Or is it a

The d429e5c122 patch is merged to mainline. So kernel will send
DEVICE_BUSY to BIOS after it emits uevent to userland. BIOS can choice
to apply the retry approach until OS returns process failure exactly or
BIOS timeout.

> problem that currently implemented BIOS firmwares do not implement this
> retry?

Yes, we should consider the behavior of old BIOS. Old BIOS doesn't
retry/resend the ejection event. So kernel or userland need to take the
retry job. Obviously userland runs the retry since the caa73ea15 patch
is merged.

IMHO there have two different expectation from user space application.

Applications like DVD player or Burner expect that kernel should
info userspace for the ejection, then application can do their cleaning
job and re-trigger ejection from userland.

But, some other applications like database don't want that their service
be stopped when the devices offline/eject. The hot-remove sholud be done by
kernel transparently.

We need a way for fill two situations.

Thanks a lot!
Joey Lee


Re: A udev rule to serve the change event of ACPI container?

2017-07-12 Thread joeyli
Hi Michal, 

Sorry for my delay.

On Tue, Jul 11, 2017 at 10:25:32AM +0200, Michal Hocko wrote:
> On Mon 26-06-17 10:59:07, Michal Hocko wrote:
> > On Mon 26-06-17 14:26:57, Joey Lee wrote:
> > > Hi all,
> > > 
> > > If ACPI received ejection request for a ACPI container, kernel
> > > emits KOBJ_CHANGE uevent when it found online children devices
> > > below the acpi container.
> > > 
> > > Base on the description of caa73ea15 kernel patch, user space
> > > is expected to offline all devices below the container and the
> > > container itself. Then, user space can finalize the removal of
> > > the container with the help of its ACPI device object's eject
> > > attribute in sysfs.
> > > 
> > > That means that kernel relies on users space to peform the offline
> > > and ejection jobs to acpi container and children devices. The
> > > discussion is here:
> > >   https://lkml.org/lkml/2013/11/28/520
> > > 
> > > The mail loop didn't explain why the userspace is responsible for
> > > the whole container offlining. Is it possible to do that transparently
> > > from the kernel? What's the difference between offlining memory and
> > > processors which happends without any cleanup and container which
> > > does essentially the same except it happens at once? 
> > >  
> > >  - After a couple of years, can we let the container hot-remove
> > >process transparently?
> > >  - Except udev rule, does there have any other mechanism to trigger
> > >auto offline/ejection?
> > 
> > I would be also interested whether the kernel can simply send an udev event
> > to all devices in the container.
> 
> Any opinion on this?

If BIOS emits ejection event for a ACPI0004 container, someone needs
to handle the offline/eject jobs of container. Either kernel or user
space.

Only sending uevent to individual child device can simplify udev rule,
but it also means that the kernel needs to offline/eject container
after all children devices are offlined. Maybe adding a ejection flag
on the ACPI0004 object to indicate the container state. But, if userland
doesn't do his job, then the timing to reset the flag will be the problem. 

Thanks a lot!
Joey Lee 


  1   2   3   4   >