date:20200521

Re: [PATCH v2 0/8] iio: core: wrap IIO device into an iio_dev_opaque object

2020-05-21 Thread Ardelean, Alexandru

On Thu, 2020-05-14 at 16:17 +0300, Alexandru Ardelean wrote:
> This change starts to hide some internal fields of the IIO device into
> the framework.
> 
> Because the iio_priv_to_dev() will be hidden some pre-work is done to
> try to remove it from some interrupt handlers.
> iio_priv_to_dev() will become a function call and won't be expandable
> into place (as-is now as an inline function).
> 

I'll defer this series.
A cleanup of iio_priv_to_dev() doesn't look like a bit detour.


> Changelog v1 -> v2:
> - add pre-work patches that remove some calls to iio_priv_to_dev() from
>   interrupt handlers
> - renamed iio_dev_priv -> iio_dev_opaque
> - moved the iio_dev_opaque to 'include/linux/iio/iio-opaque.h' this way
>   it should be usable for debugging
> - the iio_priv() call, is still an inline function that returns an
>   'indio_dev->priv' reference; this field is added to 'struct iio_dev';
>   the reference is computed in iio_device_alloc() and should be
>   cacheline aligned
> - the to_iio_dev_opaque() container is in the
>   'include/linux/iio/iio-opaque.h' header; it's still implemented with
>   some pointer arithmetic; one idea was to do it via an
>   'indio_dev->opaque' field; that may still be an optionif there are
>   some opinions in that direction
> 
> Alexandru Ardelean (8):
>   iio: proximity: ping: pass reference to IIO device via call-stack
>   iio: at91-sama5d2_adc: pass ref to IIO device via param for int
> function
>   iio: at91_adc: pass ref to IIO device via param for int function
>   iio: stm32-dfsdm-adc: pass iio device as arg for the interrupt handler
>   iio: stm32-adc: pass iio device as arg for the interrupt handler
>   iio: core: wrap IIO device into an iio_dev_opaque object
>   iio: core: simplify alloc alignment code
>   iio: core: move debugfs data on the private iio dev info
> 
>  drivers/iio/adc/at91-sama5d2_adc.c |  7 ++-
>  drivers/iio/adc/at91_adc.c |  5 +-
>  drivers/iio/adc/stm32-adc.c| 10 ++--
>  drivers/iio/adc/stm32-dfsdm-adc.c  |  6 +--
>  drivers/iio/industrialio-core.c| 75 --
>  drivers/iio/proximity/ping.c   |  5 +-
>  include/linux/iio/iio-opaque.h | 27 +++
>  include/linux/iio/iio.h| 24 +++---
>  8 files changed, 99 insertions(+), 60 deletions(-)
>  create mode 100644 include/linux/iio/iio-opaque.h
>

Re: [PATCH v2 2/3] extcon: max14577: Add proper dt-compatible strings

2020-05-21 Thread Chanwoo Choi

On 5/22/20 3:48 PM, Marek Szyprowski wrote:
> Add device tree compatible strings and create proper modalias structures
> to let this driver load automatically if compiled as module, because
> max14577 MFD driver creates MFD cells with such compatible strings.
> 
> Signed-off-by: Marek Szyprowski 
> ---
> v2:
> - added .of_match_table pointer
> ---
>  drivers/extcon/extcon-max14577.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/extcon/extcon-max14577.c 
> b/drivers/extcon/extcon-max14577.c
> index 32f663436e6e..03af678ddeba 100644
> --- a/drivers/extcon/extcon-max14577.c
> +++ b/drivers/extcon/extcon-max14577.c
> @@ -782,9 +782,19 @@ static const struct platform_device_id 
> max14577_muic_id[] = {
>  };
>  MODULE_DEVICE_TABLE(platform, max14577_muic_id);
>  
> +static const struct of_device_id of_max14577_muic_dt_match[] = {
> + { .compatible = "maxim,max77836-muic",
> +   .data = (void *)MAXIM_DEVICE_TYPE_MAX77836, },
> + { .compatible = "maxim,max14577-muic",
> +   .data = (void *)MAXIM_DEVICE_TYPE_MAX14577, },
> + { },

How about changing the order between max77836 and max14577 as already added 
structure
like platform_device_id if there are no specific reason as following:?


static const struct of_device_id of_max14577_muic_dt_match[] = {
{ .compatible = "maxim,max14577-muic",
  .data = (void *)MAXIM_DEVICE_TYPE_MAX14577, },
{ .compatible = "maxim,max77836-muic",
  .data = (void *)MAXIM_DEVICE_TYPE_MAX77836, },
{ },



> +};
> +MODULE_DEVICE_TABLE(of, of_max14577_muic_dt_match);
> +
>  static struct platform_driver max14577_muic_driver = {
>   .driver = {
>   .name   = "max14577-muic",
> + .of_match_table = of_max14577_muic_dt_match,
>   },
>   .probe  = max14577_muic_probe,
>   .remove = max14577_muic_remove,
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

Re: [PATCH v30 12/20] x86/sgx: Add a page reclaimer

2020-05-21 Thread Sean Christopherson

On Fri, May 15, 2020 at 03:44:02AM +0300, Jarkko Sakkinen wrote:
> +/**
> + * sgx_reclaim_pages() - Reclaim EPC pages from the consumers
> + *
> + * Take a fixed number of pages from the head of the active page pool and
> + * reclaim them to the enclave's private shmem files. Skip the pages, which
> + * have been accessed since the last scan. Move those pages to the tail of
> + * active page pool so that the pages get scanned in LRU like fashion.
> + */
> +void sgx_reclaim_pages(void)
> +{
> + struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
> + struct sgx_backing backing[SGX_NR_TO_SCAN];
> + struct sgx_epc_section *section;
> + struct sgx_encl_page *encl_page;
> + struct sgx_epc_page *epc_page;
> + int cnt = 0;
> + int ret;
> + int i;
> +
> + spin_lock(&sgx_active_page_list_lock);
> + for (i = 0; i < SGX_NR_TO_SCAN; i++) {
> + if (list_empty(&sgx_active_page_list))
> + break;
> +
> + epc_page = list_first_entry(&sgx_active_page_list,
> + struct sgx_epc_page, list);
> + list_del_init(&epc_page->list);
> + encl_page = epc_page->owner;
> +
> + if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
> + chunk[cnt++] = epc_page;
> + else
> + /* The owner is freeing the page. No need to add the
> +  * page back to the list of reclaimable pages.
> +  */
> + epc_page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
> + }
> + spin_unlock(&sgx_active_page_list_lock);
> +
> + for (i = 0; i < cnt; i++) {
> + epc_page = chunk[i];
> + encl_page = epc_page->owner;
> +
> + if (!sgx_reclaimer_age(epc_page))
> + goto skip;
> +
> + ret = sgx_encl_get_backing(encl_page->encl,
> +SGX_ENCL_PAGE_INDEX(encl_page),
> +&backing[i]);
> + if (ret)
> + goto skip;
> +
> + mutex_lock(&encl_page->encl->lock);
> + encl_page->desc |= SGX_ENCL_PAGE_RECLAIMED;
> + mutex_unlock(&encl_page->encl->lock);
> + continue;
> +
> +skip:
> + kref_put(&encl_page->encl->refcount, sgx_encl_release);
> +
> + spin_lock(&sgx_active_page_list_lock);
> + list_add_tail(&epc_page->list, &sgx_active_page_list);
> + spin_unlock(&sgx_active_page_list_lock);

Ugh, this is wrong.  If the above kref_put() drops the last reference and
releases the enclave, adding the page to the active page list will result
in a use-after-free as the enclave will have been freed.  It also leaks the
EPC page because sgx_encl_destroy() skips pages that are in the process of
being reclaimed (as detected by list_empty()).

The "original" code did the put() after list_add_tail(), but was moved in
v15 to fix a bug where the put() could drop a reference to the wrong enclave
if the page was freed and reallocated by a different CPU between
list_add_tail() and put().  But, that particular bug only occurred because
the code at the time was:

sgx_encl_page_put(epc_page);

I.e. the backpointer in epc_page was consumed after dropping the spin lock.
So long as epc_page->owner (well, epc_page in general) isn't dereferenced,
I'm 99% certain this can be fixed simply by doing kref_put() after moving
the page back to the active page list.

> +
> + chunk[i] = NULL;
> + }
> +
> + for (i = 0; i < cnt; i++) {
> + epc_page = chunk[i];
> + if (epc_page)
> + sgx_reclaimer_block(epc_page);
> + }
> +
> + for (i = 0; i < cnt; i++) {
> + epc_page = chunk[i];
> + if (!epc_page)
> + continue;
> +
> + encl_page = epc_page->owner;
> + sgx_reclaimer_write(epc_page, &backing[i]);
> + sgx_encl_put_backing(&backing[i], true);
> +
> + kref_put(&encl_page->encl->refcount, sgx_encl_release);
> + epc_page->desc &= ~SGX_EPC_PAGE_RECLAIMABLE;
> +
> + section = sgx_epc_section(epc_page);
> + spin_lock(§ion->lock);
> + list_add_tail(&epc_page->list, §ion->page_list);
> + section->free_cnt++;
> + spin_unlock(§ion->lock);
> + }
> +}

[PATCH] iio: humidity: hts221: remove usage of iio_priv_to_dev()

2020-05-21 Thread Alexandru Ardelean

We may want to get rid of the iio_priv_to_dev() helper. That's a bit
uncertain at this point. The reason is that we will hide some of the
members of the iio_dev structure (to prevent drivers from accessing them
directly), and that will also mean hiding the implementation of the
iio_priv_to_dev() helper inside the IIO core.

Hiding the implementation of iio_priv_to_dev() implies that some fast-paths
may not be fast anymore, so a general idea is to try to get rid of the
iio_priv_to_dev() altogether.

For this driver, removing the iio_priv_to_dev() helper means passing the
iio_dev object on hts221_allocate_buffers() & hts221_allocate_trigger().

Signed-off-by: Alexandru Ardelean 
---
 drivers/iio/humidity/hts221.h| 4 ++--
 drivers/iio/humidity/hts221_buffer.c | 7 +++
 drivers/iio/humidity/hts221_core.c   | 4 ++--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/iio/humidity/hts221.h b/drivers/iio/humidity/hts221.h
index 7d6771f7cf47..569146910885 100644
--- a/drivers/iio/humidity/hts221.h
+++ b/drivers/iio/humidity/hts221.h
@@ -46,7 +46,7 @@ extern const struct dev_pm_ops hts221_pm_ops;
 int hts221_probe(struct device *dev, int irq, const char *name,
 struct regmap *regmap);
 int hts221_set_enable(struct hts221_hw *hw, bool enable);
-int hts221_allocate_buffers(struct hts221_hw *hw);
-int hts221_allocate_trigger(struct hts221_hw *hw);
+int hts221_allocate_buffers(struct hts221_hw *hw, struct iio_dev *iio_dev);
+int hts221_allocate_trigger(struct hts221_hw *hw, struct iio_dev *iio_dev);
 
 #endif /* HTS221_H */
diff --git a/drivers/iio/humidity/hts221_buffer.c 
b/drivers/iio/humidity/hts221_buffer.c
index 9fb3f33614d4..48d469eeb0e6 100644
--- a/drivers/iio/humidity/hts221_buffer.c
+++ b/drivers/iio/humidity/hts221_buffer.c
@@ -72,10 +72,9 @@ static irqreturn_t hts221_trigger_handler_thread(int irq, 
void *private)
return IRQ_HANDLED;
 }
 
-int hts221_allocate_trigger(struct hts221_hw *hw)
+int hts221_allocate_trigger(struct hts221_hw *hw, struct iio_dev *iio_dev)
 {
struct st_sensors_platform_data *pdata = dev_get_platdata(hw->dev);
-   struct iio_dev *iio_dev = iio_priv_to_dev(hw);
bool irq_active_low = false, open_drain = false;
unsigned long irq_type;
int err;
@@ -190,9 +189,9 @@ static irqreturn_t hts221_buffer_handler_thread(int irq, 
void *p)
return IRQ_HANDLED;
 }
 
-int hts221_allocate_buffers(struct hts221_hw *hw)
+int hts221_allocate_buffers(struct hts221_hw *hw, struct iio_dev *iio_dev)
 {
-   return devm_iio_triggered_buffer_setup(hw->dev, iio_priv_to_dev(hw),
+   return devm_iio_triggered_buffer_setup(hw->dev, iio_dev,
NULL, hts221_buffer_handler_thread,
&hts221_buffer_ops);
 }
diff --git a/drivers/iio/humidity/hts221_core.c 
b/drivers/iio/humidity/hts221_core.c
index 9003671f14fb..77dfa65df841 100644
--- a/drivers/iio/humidity/hts221_core.c
+++ b/drivers/iio/humidity/hts221_core.c
@@ -621,11 +621,11 @@ int hts221_probe(struct device *dev, int irq, const char 
*name,
}
 
if (hw->irq > 0) {
-   err = hts221_allocate_buffers(hw);
+   err = hts221_allocate_buffers(hw, iio_dev);
if (err < 0)
return err;
 
-   err = hts221_allocate_trigger(hw);
+   err = hts221_allocate_trigger(hw, iio_dev);
if (err)
return err;
}
-- 
2.25.1

[PATCH] iio: light: iqs621: remove usage of iio_priv_to_dev()

2020-05-21 Thread Alexandru Ardelean

We may want to get rid of the iio_priv_to_dev() helper. That's a bit
uncertain at this point. The reason is that we will hide some of the
members of the iio_dev structure (to prevent drivers from accessing them
directly), and that will also mean hiding the implementation of the
iio_priv_to_dev() helper inside the IIO core.

Hiding the implementation of iio_priv_to_dev() implies that some fast-paths
may not be fast anymore, so a general idea is to try to get rid of the
iio_priv_to_dev() altogether.

For this driver, removing iio_priv_to_dev() means keeping a reference
on the state struct.

Signed-off-by: Alexandru Ardelean 
---
 drivers/iio/light/iqs621-als.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/light/iqs621-als.c b/drivers/iio/light/iqs621-als.c
index b2988a782bd0..1a056e2446ab 100644
--- a/drivers/iio/light/iqs621-als.c
+++ b/drivers/iio/light/iqs621-als.c
@@ -36,6 +36,7 @@
 
 struct iqs621_als_private {
struct iqs62x_core *iqs62x;
+   struct iio_dev *indio_dev;
struct notifier_block notifier;
struct mutex lock;
bool light_en;
@@ -103,7 +104,7 @@ static int iqs621_als_notifier(struct notifier_block 
*notifier,
 
iqs621_als = container_of(notifier, struct iqs621_als_private,
  notifier);
-   indio_dev = iio_priv_to_dev(iqs621_als);
+   indio_dev = iqs621_als->indio_dev;
timestamp = iio_get_time_ns(indio_dev);
 
mutex_lock(&iqs621_als->lock);
@@ -191,7 +192,7 @@ static int iqs621_als_notifier(struct notifier_block 
*notifier,
 static void iqs621_als_notifier_unregister(void *context)
 {
struct iqs621_als_private *iqs621_als = context;
-   struct iio_dev *indio_dev = iio_priv_to_dev(iqs621_als);
+   struct iio_dev *indio_dev = iqs621_als->indio_dev;
int ret;
 
ret = blocking_notifier_chain_unregister(&iqs621_als->iqs62x->nh,
@@ -551,6 +552,7 @@ static int iqs621_als_probe(struct platform_device *pdev)
 
iqs621_als = iio_priv(indio_dev);
iqs621_als->iqs62x = iqs62x;
+   iqs621_als->indio_dev = indio_dev;
 
if (iqs62x->dev_desc->prod_num == IQS622_PROD_NUM) {
ret = regmap_read(iqs62x->regmap, IQS622_IR_THRESH_TOUCH,
-- 
2.25.1

[PATCH] perf jvmti: remove redundant jitdump line table entries

2020-05-21 Thread Nick Gasson

For each PC/BCI pair in the JVMTI compiler inlining record table, the
jitdump plugin emits debug line table entries for every source line in
the method preceding that BCI. Instead only emit one source line per
PC/BCI pair. Reported by Ian Rogers. This reduces the .dump size for
SPECjbb from ~230MB to ~40MB.

Also fix an error in the DWARF line table state machine where addresses
are incorrectly offset by -0x40 (GEN_ELF_TEXT_OFFSET). This can be seen
with `objdump -S` on the ELF files after perf inject.

Signed-off-by: Nick Gasson 
---
 tools/perf/jvmti/libjvmti.c| 73 +-
 tools/perf/util/genelf_debug.c |  4 +-
 2 files changed, 30 insertions(+), 47 deletions(-)

diff --git a/tools/perf/jvmti/libjvmti.c b/tools/perf/jvmti/libjvmti.c
index a9a056d68416..398e4ba6498d 100644
--- a/tools/perf/jvmti/libjvmti.c
+++ b/tools/perf/jvmti/libjvmti.c
@@ -32,38 +32,41 @@ static void print_error(jvmtiEnv *jvmti, const char *msg, 
jvmtiError ret)
 
 #ifdef HAVE_JVMTI_CMLR
 static jvmtiError
-do_get_line_numbers(jvmtiEnv *jvmti, void *pc, jmethodID m, jint bci,
-   jvmti_line_info_t *tab, jint *nr)
+do_get_line_number(jvmtiEnv *jvmti, void *pc, jmethodID m, jint bci,
+  jvmti_line_info_t *tab)
 {
-   jint i, lines = 0;
-   jint nr_lines = 0;
+   jint i, nr_lines = 0;
jvmtiLineNumberEntry *loc_tab = NULL;
jvmtiError ret;
+   jint src_line = -1;
 
ret = (*jvmti)->GetLineNumberTable(jvmti, m, &nr_lines, &loc_tab);
if (ret == JVMTI_ERROR_ABSENT_INFORMATION || ret == 
JVMTI_ERROR_NATIVE_METHOD) {
/* No debug information for this method */
-   *nr = 0;
-   return JVMTI_ERROR_NONE;
+   return ret;
} else if (ret != JVMTI_ERROR_NONE) {
print_error(jvmti, "GetLineNumberTable", ret);
return ret;
}
 
-   for (i = 0; i < nr_lines; i++) {
-   if (loc_tab[i].start_location < bci) {
-   tab[lines].pc = (unsigned long)pc;
-   tab[lines].line_number = loc_tab[i].line_number;
-   tab[lines].discrim = 0; /* not yet used */
-   tab[lines].methodID = m;
-   lines++;
-   } else {
-   break;
-   }
+   for (i = 0; i < nr_lines && loc_tab[i].start_location <= bci; i++) {
+   src_line = i;
+   }
+
+   if (src_line != -1) {
+   tab->pc = (unsigned long)pc;
+   tab->line_number = loc_tab[src_line].line_number;
+   tab->discrim = 0; /* not yet used */
+   tab->methodID = m;
+
+   ret = JVMTI_ERROR_NONE;
+   } else {
+   ret = JVMTI_ERROR_ABSENT_INFORMATION;
}
+
(*jvmti)->Deallocate(jvmti, (unsigned char *)loc_tab);
-   *nr = lines;
-   return JVMTI_ERROR_NONE;
+
+   return ret;
 }
 
 static jvmtiError
@@ -71,9 +74,8 @@ get_line_numbers(jvmtiEnv *jvmti, const void *compile_info, 
jvmti_line_info_t **
 {
const jvmtiCompiledMethodLoadRecordHeader *hdr;
jvmtiCompiledMethodLoadInlineRecord *rec;
-   jvmtiLineNumberEntry *lne = NULL;
PCStackInfo *c;
-   jint nr, ret;
+   jint ret;
int nr_total = 0;
int i, lines_total = 0;
 
@@ -86,24 +88,7 @@ get_line_numbers(jvmtiEnv *jvmti, const void *compile_info, 
jvmti_line_info_t **
for (hdr = compile_info; hdr != NULL; hdr = hdr->next) {
if (hdr->kind == JVMTI_CMLR_INLINE_INFO) {
rec = (jvmtiCompiledMethodLoadInlineRecord *)hdr;
-   for (i = 0; i < rec->numpcs; i++) {
-   c = rec->pcinfo + i;
-   nr = 0;
-   /*
-* unfortunately, need a tab to get the number 
of lines!
-*/
-   ret = (*jvmti)->GetLineNumberTable(jvmti, 
c->methods[0], &nr, &lne);
-   if (ret == JVMTI_ERROR_NONE) {
-   /* free what was allocated for nothing 
*/
-   (*jvmti)->Deallocate(jvmti, (unsigned 
char *)lne);
-   nr_total += (int)nr;
-   } else if (ret == JVMTI_ERROR_ABSENT_INFORMATION
-  || ret == JVMTI_ERROR_NATIVE_METHOD) 
{
-   /* No debug information for this method 
*/
-   } else {
-   print_error(jvmti, 
"GetLineNumberTable", ret);
-   }
-   }
+   nr_total += rec->numpcs;
}
}
 
@@ -122,14 +107,12 @@ get_line_numbers(jvmtiEnv *jvmti, const void 
*compile_info, jvmti_l

[PATCH] iio: position: iqs624: remove usage of iio_priv_to_dev()

2020-05-21 Thread Alexandru Ardelean

We may want to get rid of the iio_priv_to_dev() helper. That's a bit
uncertain at this point. The reason is that we will hide some of the
members of the iio_dev structure (to prevent drivers from accessing them
directly), and that will also mean hiding the implementation of the
iio_priv_to_dev() helper inside the IIO core.

Hiding the implementation of iio_priv_to_dev() implies that some fast-paths
may not be fast anymore, so a general idea is to try to get rid of the
iio_priv_to_dev() altogether.

For this driver, removing iio_priv_to_dev() also means keeping a reference
on the state struct.

Signed-off-by: Alexandru Ardelean 
---
 drivers/iio/position/iqs624-pos.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/position/iqs624-pos.c 
b/drivers/iio/position/iqs624-pos.c
index 77096c31c2ba..520dafbdc48f 100644
--- a/drivers/iio/position/iqs624-pos.c
+++ b/drivers/iio/position/iqs624-pos.c
@@ -23,6 +23,7 @@
 
 struct iqs624_pos_private {
struct iqs62x_core *iqs62x;
+   struct iio_dev *indio_dev;
struct notifier_block notifier;
struct mutex lock;
bool angle_en;
@@ -59,7 +60,7 @@ static int iqs624_pos_notifier(struct notifier_block 
*notifier,
 
iqs624_pos = container_of(notifier, struct iqs624_pos_private,
  notifier);
-   indio_dev = iio_priv_to_dev(iqs624_pos);
+   indio_dev = iqs624_pos->indio_dev;
timestamp = iio_get_time_ns(indio_dev);
 
iqs62x = iqs624_pos->iqs62x;
@@ -98,7 +99,7 @@ static int iqs624_pos_notifier(struct notifier_block 
*notifier,
 static void iqs624_pos_notifier_unregister(void *context)
 {
struct iqs624_pos_private *iqs624_pos = context;
-   struct iio_dev *indio_dev = iio_priv_to_dev(iqs624_pos);
+   struct iio_dev *indio_dev = iqs624_pos->indio_dev;
int ret;
 
ret = blocking_notifier_chain_unregister(&iqs624_pos->iqs62x->nh,
@@ -243,6 +244,7 @@ static int iqs624_pos_probe(struct platform_device *pdev)
 
iqs624_pos = iio_priv(indio_dev);
iqs624_pos->iqs62x = iqs62x;
+   iqs624_pos->indio_dev = indio_dev;
 
indio_dev->modes = INDIO_DIRECT_MODE;
indio_dev->dev.parent = &pdev->dev;
-- 
2.25.1

[PATCH] dmaengine: sprd: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu

pm_runtime_get_sync() increments the runtime PM usage counter even
when it returns an error code. Thus a pairing decrement is needed on
the error handling path to keep the counter balanced.

Also, call pm_runtime_disable() when pm_runtime_get_sync() returns
an error code.

Signed-off-by: Dinghao Liu 
---
 drivers/dma/sprd-dma.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index 0ef5ca81ba4d..275d83768d0d 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -1210,7 +1210,7 @@ static int sprd_dma_probe(struct platform_device *pdev)
ret = dma_async_device_register(&sdev->dma_dev);
if (ret < 0) {
dev_err(&pdev->dev, "register dma device failed:%d\n", ret);
-   goto err_register;
+   goto err_rpm;
}
 
sprd_dma_info.dma_cap = sdev->dma_dev.cap_mask;
@@ -1224,10 +1224,9 @@ static int sprd_dma_probe(struct platform_device *pdev)
 
 err_of_register:
dma_async_device_unregister(&sdev->dma_dev);
-err_register:
+err_rpm:
pm_runtime_put_noidle(&pdev->dev);
pm_runtime_disable(&pdev->dev);
-err_rpm:
sprd_dma_disable(sdev);
return ret;
 }
-- 
2.17.1

[PATCH v2 1/3] regulator: max14577: Add proper dt-compatible strings

2020-05-21 Thread Marek Szyprowski

Add device tree compatible strings and create proper modalias structures
to let this driver load automatically if compiled as module, because
max14577 MFD driver creates MFD cells with such compatible strings.

Signed-off-by: Marek Szyprowski 
---
v2:
- added .of_match_table pointer
---
 drivers/regulator/max14577-regulator.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/max14577-regulator.c 
b/drivers/regulator/max14577-regulator.c
index 07a150c9bbf2..4c9ae52b9e87 100644
--- a/drivers/regulator/max14577-regulator.c
+++ b/drivers/regulator/max14577-regulator.c
@@ -238,10 +238,20 @@ static const struct platform_device_id 
max14577_regulator_id[] = {
 };
 MODULE_DEVICE_TABLE(platform, max14577_regulator_id);
 
+static const struct of_device_id of_max14577_regulator_dt_match[] = {
+   { .compatible = "maxim,max77836-regulator",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX77836, },
+   { .compatible = "maxim,max14577-regulator",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX14577, },
+   { },
+};
+MODULE_DEVICE_TABLE(of, of_max14577_regulator_dt_match);
+
 static struct platform_driver max14577_regulator_driver = {
.driver = {
-  .name = "max14577-regulator",
-  },
+   .name = "max14577-regulator",
+   .of_match_table = of_max14577_regulator_dt_match,
+   },
.probe  = max14577_regulator_probe,
.id_table   = max14577_regulator_id,
 };
-- 
2.17.1

[PATCH v2 2/3] extcon: max14577: Add proper dt-compatible strings

2020-05-21 Thread Marek Szyprowski

Add device tree compatible strings and create proper modalias structures
to let this driver load automatically if compiled as module, because
max14577 MFD driver creates MFD cells with such compatible strings.

Signed-off-by: Marek Szyprowski 
---
v2:
- added .of_match_table pointer
---
 drivers/extcon/extcon-max14577.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/extcon/extcon-max14577.c b/drivers/extcon/extcon-max14577.c
index 32f663436e6e..03af678ddeba 100644
--- a/drivers/extcon/extcon-max14577.c
+++ b/drivers/extcon/extcon-max14577.c
@@ -782,9 +782,19 @@ static const struct platform_device_id max14577_muic_id[] 
= {
 };
 MODULE_DEVICE_TABLE(platform, max14577_muic_id);
 
+static const struct of_device_id of_max14577_muic_dt_match[] = {
+   { .compatible = "maxim,max77836-muic",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX77836, },
+   { .compatible = "maxim,max14577-muic",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX14577, },
+   { },
+};
+MODULE_DEVICE_TABLE(of, of_max14577_muic_dt_match);
+
 static struct platform_driver max14577_muic_driver = {
.driver = {
.name   = "max14577-muic",
+   .of_match_table = of_max14577_muic_dt_match,
},
.probe  = max14577_muic_probe,
.remove = max14577_muic_remove,
-- 
2.17.1

[PATCH v2 3/3] power: charger: max14577: Add proper dt-compatible strings

2020-05-21 Thread Marek Szyprowski

Add device tree compatible strings and create proper modalias structures
to let this driver load automatically if compiled as module, because
max14577 MFD driver creates MFD cells with such compatible strings.

Signed-off-by: Marek Szyprowski 
---
v2:
- added .of_match_table pointer
---
 drivers/power/supply/max14577_charger.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/power/supply/max14577_charger.c 
b/drivers/power/supply/max14577_charger.c
index 8a59feac6468..96f4cd1941b2 100644
--- a/drivers/power/supply/max14577_charger.c
+++ b/drivers/power/supply/max14577_charger.c
@@ -623,9 +623,19 @@ static const struct platform_device_id 
max14577_charger_id[] = {
 };
 MODULE_DEVICE_TABLE(platform, max14577_charger_id);
 
+static const struct of_device_id of_max14577_charger_dt_match[] = {
+   { .compatible = "maxim,max77836-charger",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX77836, },
+   { .compatible = "maxim,max14577-charger",
+ .data = (void *)MAXIM_DEVICE_TYPE_MAX14577, },
+   { },
+};
+MODULE_DEVICE_TABLE(of, of_max14577_charger_dt_match);
+
 static struct platform_driver max14577_charger_driver = {
.driver = {
.name   = "max14577-charger",
+   .of_match_table = of_max14577_charger_dt_match,
},
.probe  = max14577_charger_probe,
.remove = max14577_charger_remove,
-- 
2.17.1

Re: [PATCH v2 2/3] irqchip/sifive-plic: Setup cpuhp once after boot CPU handler is present

2020-05-21 Thread Anup Patel

On Fri, May 22, 2020 at 3:36 AM Palmer Dabbelt  wrote:
>
> On Mon, 18 May 2020 02:14:40 PDT (-0700), Anup Patel wrote:
> > For multiple PLIC instances, the plic_init() is called once for each
> > PLIC instance. Due to this we have two issues:
> > 1. cpuhp_setup_state() is called multiple times
> > 2. plic_starting_cpu() can crash for boot CPU if cpuhp_setup_state()
> >is called before boot CPU PLIC handler is available.
> >
> > This patch fixes both above issues.
> >
> > Fixes: f1ad1133b18f ("irqchip/sifive-plic: Add support for multiple PLICs")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Anup Patel 
> > ---
> >  drivers/irqchip/irq-sifive-plic.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-sifive-plic.c 
> > b/drivers/irqchip/irq-sifive-plic.c
> > index 9f7f8ce88c00..6c54abf5cc5e 100644
> > --- a/drivers/irqchip/irq-sifive-plic.c
> > +++ b/drivers/irqchip/irq-sifive-plic.c
> > @@ -76,6 +76,7 @@ struct plic_handler {
> >   void __iomem*enable_base;
> >   struct plic_priv*priv;
> >  };
> > +static bool plic_cpuhp_setup_done;
> >  static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
> >
> >  static inline void plic_toggle(struct plic_handler *handler,
> > @@ -285,6 +286,7 @@ static int __init plic_init(struct device_node *node,
> >   int error = 0, nr_contexts, nr_handlers = 0, i;
> >   u32 nr_irqs;
> >   struct plic_priv *priv;
> > + struct plic_handler *handler;
> >
> >   priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> >   if (!priv)
> > @@ -313,7 +315,6 @@ static int __init plic_init(struct device_node *node,
> >
> >   for (i = 0; i < nr_contexts; i++) {
> >   struct of_phandle_args parent;
> > - struct plic_handler *handler;
> >   irq_hw_number_t hwirq;
> >   int cpu, hartid;
> >
> > @@ -367,9 +368,18 @@ static int __init plic_init(struct device_node *node,
> >   nr_handlers++;
> >   }
> >
> > - cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> > + /*
> > +  * We can have multiple PLIC instances so setup cpuhp state only
> > +  * when context handler for current/boot CPU is present.
> > +  */
> > + handler = this_cpu_ptr(&plic_handlers);
> > + if (handler->present && !plic_cpuhp_setup_done) {
> > + cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING,
> > "irqchip/sifive/plic:starting",
> > plic_starting_cpu, plic_dying_cpu);
> > + plic_cpuhp_setup_done = true;
>
> So presumably something else is preventing multiple plic_init() calls from
> executing at the same time?  Assuming that's the case

AFAIK, interrupt controller and timer probing happens sequentially on
boot CPU before all secondary CPUs are brought-up.

>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 
>
> > + }
> > +
> >   pr_info("mapped %d interrupts with %d handlers for %d contexts.\n",
> >   nr_irqs, nr_handlers, nr_contexts);
> >   set_handle_irq(plic_handle_irq);

Thanks,
Anup

[PATCH] perf list: Add metrics to command line usage

2020-05-21 Thread Ian Rogers

Before:
 Usage: perf list [] [hw|sw|cache|tracepoint|pmu|sdt|event_glob]
After:
 Usage: perf list [] 
[hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]
Signed-off-by: Ian Rogers 
---
 tools/perf/builtin-list.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 965ef017496f..0a7fe4cb 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -42,7 +42,7 @@ int cmd_list(int argc, const char **argv)
OPT_END()
};
const char * const list_usage[] = {
-   "perf list [] 
[hw|sw|cache|tracepoint|pmu|sdt|event_glob]",
+   "perf list [] 
[hw|sw|cache|tracepoint|pmu|sdt|metric|metricgroup|event_glob]",
NULL
};
 
-- 
2.27.0.rc0.183.gde8f92d652-goog

Re: [PATCH v2 1/3] irqchip/sifive-plic: Set default irq affinity in plic_irqdomain_map()

2020-05-21 Thread Anup Patel

On Fri, May 22, 2020 at 3:36 AM Palmer Dabbelt  wrote:
>
> On Mon, 18 May 2020 02:14:39 PDT (-0700), Anup Patel wrote:
> > For multiple PLIC instances, each PLIC can only target a subset of
> > CPUs which is represented by "lmask" in the "struct plic_priv".
> >
> > Currently, the default irq affinity for each PLIC interrupt is all
> > online CPUs which is illegal value for default irq affinity when we
> > have multiple PLIC instances. To fix this, we now set "lmask" as the
> > default irq affinity in for each interrupt in plic_irqdomain_map().
> >
> > Fixes: f1ad1133b18f ("irqchip/sifive-plic: Add support for multiple PLICs")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Anup Patel 
> > ---
> >  drivers/irqchip/irq-sifive-plic.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/irqchip/irq-sifive-plic.c 
> > b/drivers/irqchip/irq-sifive-plic.c
> > index 822e074c0600..9f7f8ce88c00 100644
> > --- a/drivers/irqchip/irq-sifive-plic.c
> > +++ b/drivers/irqchip/irq-sifive-plic.c
> > @@ -176,9 +176,12 @@ static struct irq_chip plic_chip = {
> >  static int plic_irqdomain_map(struct irq_domain *d, unsigned int irq,
> > irq_hw_number_t hwirq)
> >  {
> > + struct plic_priv *priv = d->host_data;
> > +
> >   irq_domain_set_info(d, irq, hwirq, &plic_chip, d->host_data,
> >   handle_fasteoi_irq, NULL, NULL);
>
> If you're going to re-spin this, d->host_data could be priv here.

The controller's private data is named "host_data" for "struct irq_domain"
in Linux irq subsystem hence the usage.

>
> >   irq_set_noprobe(irq);
> > + irq_set_affinity(irq, &priv->lmask);
> >   return 0;
> >  }
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 

Thanks,
Anup

Re: [PATCH v2 00/18] Add support for Nitro Enclaves

2020-05-21 Thread Paraschiv, Andra-Irina




On 22/05/2020 09:29, Andra Paraschiv wrote:

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPU, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest  that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

How does all gets to an enclave VM running on the host?

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the current patch series.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
an add memory PCI command. The PCI device commands are then translated into
actions taken on the hypervisor side; that's the Nitro hypervisor running on the
host where the primary VM is running. The Nitro hypervisor is based on core KVM
technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [2]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.


Adding here that we've also considered FIT image format [1] as an option.

Andra

[1] https://github.com/u-boot/u-boot/tree/master/doc/uImage.FIT



Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

The following patch series covers the NE driver for enclave lifetime management.
It provides an ioctl interface to the user space and includes the NE PCI device
driver that is the means of communication with the hypervisor running on the
host where the primary VM and the enclave are launched.

The proposed solution is following the KVM model and uses KVM ioctls to be able
to create and set resources for enclaves. Additional NE ioctl commands, besides
the ones provided by KVM, are used to start an enclave and get memory offset for
in-memory enclave image loading.

Thank you.

Andra

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] http://man7.org/linux/man-pages/man7/vsock.7.html

---

Patch Series Changelog

The patch series is built on top of v5.7-rc6.

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creat

Re: [PATCH] rxrpc: Fix a memory leak in rxkad_verify_response()

2020-05-21 Thread Markus Elfring

> How do you think about a wording variant like the following?
>
>A ticket was not released after a call of the function “platform_get_irq” 
> failed.

I should have specified an other function name here.

   A ticket was not released after a call of the function 
“rxkad_decrypt_ticket” failed.


Regards,
Markus

Re: [PATCH 5/5] dt-bindings: timer: Add CLINT bindings

2020-05-21 Thread Anup Patel

On Fri, May 22, 2020 at 11:59 AM Sean Anderson  wrote:
>
> On 5/22/20 1:54 AM, Anup Patel wrote:
> > On Fri, May 22, 2020 at 1:35 AM Sean Anderson  wrote:
> >>
> >> On 5/21/20 9:45 AM, Anup Patel wrote:
> >>> +Required properties:
> >>> +- compatible : "sifive,clint-1.0.0" and a string identifying the actual
> >>> +  detailed implementation in case that specific bugs need to be worked 
> >>> around.
> >>
> >> Should the "riscv,clint0" compatible string be documented here? This
> >
> > Yes, I forgot to add this compatible string. I will add in v2.
> >
> >> peripheral is not really specific to sifive, as it is present in most
> >> rocket-chip cores.
> >
> > I agree that CLINT is present in a lot of non-SiFive RISC-V SOCs and
> > FPGAs but this IP is only documented as part of SiFive FU540 SOC.
> > (Refer, https://static.dev.sifive.com/FU540-C000-v1.0.pdf)
> >
> > The RISC-V foundation should host the CLINT spec independently
> > under https://github.com/riscv and make CLINT spec totally open.
> >
> > For now, I have documented it just like PLIC DT bindings found at:
> > Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.txt
>
> The PLIC seems to have its own RISC-V-sponsored documentation [1] which
> was split off from the older privileged specs. By your logic above,
> should it be renamed to riscv,plic0.txt (with a corresponding change in
> the documented compatible strings)?
>
> [1] https://github.com/riscv/riscv-plic-spec

For PLIC bindings, we can certainly do the renaming because now
we have PLIC v1 specification hosted on RISC-V Foundation Github.

Regards,
Anup

[PATCH v2 16/18] nitro_enclaves: Add sample for ioctl interface usage

2020-05-21 Thread Andra Paraschiv

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 samples/nitro_enclaves/.gitignore |   2 +
 samples/nitro_enclaves/Makefile   |  28 +
 .../include/linux/nitro_enclaves.h|  23 +
 .../include/uapi/linux/nitro_enclaves.h   |  77 +++
 samples/nitro_enclaves/ne_ioctl_sample.c  | 502 ++
 5 files changed, 632 insertions(+)
 create mode 100644 samples/nitro_enclaves/.gitignore
 create mode 100644 samples/nitro_enclaves/Makefile
 create mode 100644 samples/nitro_enclaves/include/linux/nitro_enclaves.h
 create mode 100644 samples/nitro_enclaves/include/uapi/linux/nitro_enclaves.h
 create mode 100644 samples/nitro_enclaves/ne_ioctl_sample.c

diff --git a/samples/nitro_enclaves/.gitignore 
b/samples/nitro_enclaves/.gitignore
new file mode 100644
index ..827934129c90
--- /dev/null
+++ b/samples/nitro_enclaves/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+ne_ioctl_sample
diff --git a/samples/nitro_enclaves/Makefile b/samples/nitro_enclaves/Makefile
new file mode 100644
index ..2acdc6e0c492
--- /dev/null
+++ b/samples/nitro_enclaves/Makefile
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see .
+
+# Enclave lifetime management support for Nitro Enclaves (NE) - ioctl sample
+# usage.
+
+.PHONY: all clean
+
+CFLAGS += -Wall -Iinclude/
+
+all:
+   $(CC) $(CFLAGS) -o ne_ioctl_sample ne_ioctl_sample.c -lpthread
+
+clean:
+   rm -f ne_ioctl_sample
diff --git a/samples/nitro_enclaves/include/linux/nitro_enclaves.h 
b/samples/nitro_enclaves/include/linux/nitro_enclaves.h
new file mode 100644
index ..7e593a9fbf8c
--- /dev/null
+++ b/samples/nitro_enclaves/include/linux/nitro_enclaves.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/samples/nitro_enclaves/include/uapi/linux/nitro_enclaves.h 
b/samples/nitro_enclaves/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..98ba59f15b52
--- /dev/null
+++ b/samples/nitro_enclaves/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include 
+#include 
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+/**
+ * The command is used to get information needed for in-memory enclave image
+ * loading e.g. offset in enclave memory to start placing the enclave image.
+ *
+ * The image load metadata is an in / out data structure. It includes info
+ * provided by the caller - flags - and returns the offset in enclave memory
+ * where to start placing the enclave image.
+ */
+#define NE_GET_IMAGE_LOAD_METADATA _IOWR(0xAE, 0x20, struct 
image_load_metadata)
+
+/**
+ * The command is used to trigger enclave start af

[PATCH v2 18/18] MAINTAINERS: Add entry for the Nitro Enclaves driver

2020-05-21 Thread Andra Paraschiv

Signed-off-by: Andra Paraschiv 
---
 MAINTAINERS | 13 +
 1 file changed, 13 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ecc0749810b0..69fe37999a9e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11956,6 +11956,19 @@ S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2.git
 F: arch/nios2/
 
+NITRO ENCLAVES (NE)
+M: Andra Paraschiv 
+M: Alexandru Vasile 
+M: Alexandru Ciobotaru 
+L: linux-kernel@vger.kernel.org
+S: Supported
+W: https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+F: include/linux/nitro_enclaves.h
+F: include/uapi/linux/nitro_enclaves.h
+F: drivers/virt/nitro_enclaves/
+F: samples/nitro_enclaves/
+F: Documentation/nitro_enclaves/
+
 NOHZ, DYNTICKS SUPPORT
 M: Frederic Weisbecker 
 M: Thomas Gleixner 
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.

[PATCH v2 17/18] nitro_enclaves: Add overview documentation

2020-05-21 Thread Andra Paraschiv

Signed-off-by: Andra Paraschiv 
---
 Documentation/nitro_enclaves/ne_overview.txt | 86 
 1 file changed, 86 insertions(+)
 create mode 100644 Documentation/nitro_enclaves/ne_overview.txt

diff --git a/Documentation/nitro_enclaves/ne_overview.txt 
b/Documentation/nitro_enclaves/ne_overview.txt
new file mode 100644
index ..be8bb3d84132
--- /dev/null
+++ b/Documentation/nitro_enclaves/ne_overview.txt
@@ -0,0 +1,86 @@
+Nitro Enclaves
+==
+
+Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
+that allows customers to carve out isolated compute environments within EC2
+instances [1].
+
+For example, an application that processes sensitive data and runs in a VM,
+can be separated from other applications running in the same VM. This
+application then runs in a separate VM than the primary VM, namely an enclave.
+
+An enclave runs alongside the VM that spawned it. This setup matches low 
latency
+applications needs. The resources that are allocated for the enclave, such as
+memory and CPU, are carved out of the primary VM. Each enclave is mapped to a
+process running in the primary VM, that communicates with the NE driver via an
+ioctl interface.
+
+In this sense, there are two components:
+
+1. An enclave abstraction process - a user space process running in the primary
+VM guest  that uses the provided ioctl interface of the NE driver to spawn an
+enclave VM (that's 2 below).
+
+How does all gets to an enclave VM running on the host?
+
+There is a NE emulated PCI device exposed to the primary VM. The driver for 
this
+new PCI device is included in the NE driver.
+
+The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE 
ioctl
+maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
+an add memory PCI command. The PCI device commands are then translated into
+actions taken on the hypervisor side; that's the Nitro hypervisor running on 
the
+host where the primary VM is running. The Nitro hypervisor is based on core KVM
+technology.
+
+2. The enclave itself - a VM running on the same host as the primary VM that
+spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
+for the enclave VM. An enclave does not have persistent storage attached.
+
+An enclave communicates with the primary VM via a local communication channel,
+using virtio-vsock [2]. The primary VM has virtio-pci vsock emulated device,
+while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
+uses eventfd for signaling. The enclave VM sees the usual interfaces - local
+APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
+device is placed in memory below the typical 4 GiB.
+
+The application that runs in the enclave needs to be packaged in an enclave
+image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
+enclave VM. The enclave VM has its own kernel and follows the standard Linux
+boot protocol.
+
+The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
+Enclave Image Format (EIF); plus an EIF header including metadata such as magic
+number, eif version, image size and CRC.
+
+Hash values are computed for the entire enclave image (EIF), the kernel and
+ramdisk(s). That's used, for example, to check that the enclave image that is
+loaded in the enclave VM is the one that was intended to be run.
+
+These crypto measurements are included in a signed attestation document
+generated by the Nitro Hypervisor and further used to prove the identity of the
+enclave; KMS is an example of service that NE is integrated with and that 
checks
+the attestation doc.
+
+The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
+init process in the enclave connects to the vsock CID of the primary VM and a
+predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
+used to check in the primary VM that the enclave has booted.
+
+If the enclave VM crashes or gracefully exits, an interrupt event is received 
by
+the NE driver. This event is sent further to the user space enclave process
+running in the primary VM via a poll notification mechanism. Then the user 
space
+enclave process can exit.
+
+The NE driver for enclave lifetime management provides an ioctl interface to 
the
+user space. It includes the NE PCI device driver that is the means of
+communication with the hypervisor running on the host where the primary VM and
+the enclave are launched.
+
+The proposed solution is following the KVM model and uses KVM ioctls to be able
+to create and set resources for enclaves. Additional NE ioctl commands, besides
+the ones provided by KVM, are used to start an enclave and get memory offset 
for
+in-memory enclave image loading.
+
+[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
+[2] http://man7.org/linux/man-pages/man7/vsock.7.html
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. regis

[PATCH v2 15/18] nitro_enclaves: Add Makefile for the Nitro Enclaves driver

2020-05-21 Thread Andra Paraschiv

Signed-off-by: Andra Paraschiv 
---
 drivers/virt/Makefile|  2 ++
 drivers/virt/nitro_enclaves/Makefile | 23 +++
 2 files changed, 25 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Makefile

diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index fd331247c27a..f28425ce4b39 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -5,3 +5,5 @@
 
 obj-$(CONFIG_FSL_HV_MANAGER)   += fsl_hypervisor.o
 obj-y  += vboxguest/
+
+obj-$(CONFIG_NITRO_ENCLAVES)   += nitro_enclaves/
diff --git a/drivers/virt/nitro_enclaves/Makefile 
b/drivers/virt/nitro_enclaves/Makefile
new file mode 100644
index ..9109aed41070
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Makefile
@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see .
+
+# Enclave lifetime management support for Nitro Enclaves (NE).
+
+obj-$(CONFIG_NITRO_ENCLAVES) += nitro_enclaves.o
+
+nitro_enclaves-y := ne_pci_dev.o ne_misc_dev.o
+
+ccflags-y += -Wall
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.

[PATCH v2 12/18] nitro_enclaves: Add logic for enclave start

2020-05-21 Thread Andra Paraschiv

After all the enclave resources are set, the enclave is ready for
beginning to run.

Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.

The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 107 ++
 1 file changed, 107 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index a5c6185613b9..2109ed3f2f53 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -468,6 +468,53 @@ static int ne_set_user_memory_region_ioctl(struct 
ne_enclave *ne_enclave,
return rc;
 }
 
+/**
+ * ne_enclave_start_ioctl - Trigger enclave start after the enclave resources,
+ * such as memory and CPU, have been set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @enclave_start_metadata: enclave metadata that includes enclave cid and
+ * flags and the slot uid.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_enclave_start_ioctl(struct ne_enclave *ne_enclave,
+   struct enclave_start_metadata *enclave_start_metadata)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct enclave_start_req enclave_start_req = {};
+   int rc = -EINVAL;
+
+   if (WARN_ON(!ne_enclave) || WARN_ON(!ne_enclave->pdev))
+   return -EINVAL;
+
+   if (!enclave_start_metadata)
+   return -EINVAL;
+
+   enclave_start_metadata->slot_uid = ne_enclave->slot_uid;
+
+   enclave_start_req.enclave_cid = enclave_start_metadata->enclave_cid;
+   enclave_start_req.flags = enclave_start_metadata->flags;
+   enclave_start_req.slot_uid = enclave_start_metadata->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, ENCLAVE_START, &enclave_start_req,
+  sizeof(enclave_start_req), &cmd_reply,
+  sizeof(cmd_reply));
+   if (rc < 0) {
+   pr_err_ratelimited(NE "Error in enclave start [rc=%d]\n", rc);
+
+   return rc;
+   }
+
+   ne_enclave->state = NE_STATE_RUNNING;
+
+   enclave_start_metadata->enclave_cid = cmd_reply.enclave_cid;
+
+   return 0;
+}
+
 static int ne_enclave_open(struct inode *node, struct file *file)
 {
return 0;
@@ -575,6 +622,66 @@ static long ne_enclave_ioctl(struct file *file, unsigned 
int cmd,
return rc;
}
 
+   case NE_START_ENCLAVE: {
+   struct enclave_start_metadata enclave_start_metadata = {};
+   int rc = -EINVAL;
+
+   if (copy_from_user(&enclave_start_metadata, (void *)arg,
+  sizeof(enclave_start_metadata))) {
+   pr_err_ratelimited(NE "Error in copy from user\n");
+
+   return -EFAULT;
+   }
+
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   if (ne_enclave->state != NE_STATE_INIT) {
+   pr_err_ratelimited(NE "Enclave isn't in init state\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -EINVAL;
+   }
+
+   if (!ne_enclave->nr_mem_regions) {
+   pr_err_ratelimited(NE "Enclave has no mem regions\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -EINVAL;
+   }
+
+   if (!ne_enclave->nr_vcpus) {
+   pr_err_ratelimited(NE "Enclave has no vcpus\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -EINVAL;
+   }
+
+   if (!cpumask_empty(ne_enclave->cpu_siblings)) {
+   pr_err_ratelimited(NE "CPU siblings not used\n");
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   return -EINVAL;
+   }
+
+   rc = ne_enclave_start_ioctl(ne_enclave,
+   &enclave_start_metadata);
+
+   mutex_unlock(&ne_enclave->enclave_info_mutex);
+
+   if (copy_to_user((void *)arg, &enclave_start_metadata,
+sizeof(enclave_start_metadata))) {
+   pr_err_ratelimited(NE "Error in copy to user\n");
+
+   return -EFAULT;
+   }
+
+   return rc;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi Count

[PATCH v2 11/18] nitro_enclaves: Add logic for enclave memory region set

2020-05-21 Thread Andra Paraschiv

Another resource that is being set for an enclave is memory. User space
memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.

One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.

Add ioctl command logic for setting user space memory region for an
enclave.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 256 ++
 1 file changed, 256 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 936c9ce19213..a5c6185613b9 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -240,6 +240,234 @@ static int ne_create_vcpu_ioctl(struct ne_enclave 
*ne_enclave, u32 vcpu_id)
return rc;
 }
 
+/**
+ * ne_sanity_check_user_mem_region - Sanity check the userspace memory
+ * region received during the set user memory region ioctl call.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be sanity checked.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_sanity_check_user_mem_region(struct ne_enclave *ne_enclave,
+   struct kvm_userspace_memory_region *mem_region)
+{
+   if (WARN_ON(!ne_enclave))
+   return -EINVAL;
+
+   if (!mem_region)
+   return -EINVAL;
+
+   if ((mem_region->memory_size % MIN_MEM_REGION_SIZE) != 0) {
+   pr_err_ratelimited(NE "Mem size not multiple of 2 MiB\n");
+
+   return -EINVAL;
+   }
+
+   if ((mem_region->userspace_addr & (MIN_MEM_REGION_SIZE - 1)) ||
+   !access_ok((void __user *)(unsigned long)mem_region->userspace_addr,
+  mem_region->memory_size)) {
+   pr_err_ratelimited(NE "Invalid user space addr range\n");
+
+   return -EINVAL;
+   }
+
+   if ((mem_region->guest_phys_addr + mem_region->memory_size) <
+   mem_region->guest_phys_addr) {
+   pr_err_ratelimited(NE "Invalid guest phys addr range\n");
+
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_set_user_memory_region_ioctl - Add user space memory region to the slot
+ * associated with the current enclave.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @mem_region: user space memory region to be associated with the given slot.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_set_user_memory_region_ioctl(struct ne_enclave *ne_enclave,
+   struct kvm_userspace_memory_region *mem_region)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   long gup_rc = 0;
+   unsigned long i = 0;
+   struct ne_mem_region *ne_mem_region = NULL;
+   unsigned long nr_phys_contig_mem_regions = 0;
+   unsigned long nr_pinned_pages = 0;
+   struct page **phys_contig_mem_regions = NULL;
+   int rc = -EINVAL;
+   struct slot_add_mem_req slot_add_mem_req = {};
+
+   if (WARN_ON(!ne_enclave) || WARN_ON(!ne_enclave->pdev))
+   return -EINVAL;
+
+   if (!mem_region)
+   return -EINVAL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   rc = ne_sanity_check_user_mem_region(ne_enclave, mem_region);
+   if (rc < 0)
+   return rc;
+
+   ne_mem_region = kzalloc(sizeof(*ne_mem_region), GFP_KERNEL);
+   if (!ne_mem_region)
+   return -ENOMEM;
+
+   /*
+* TODO: Update nr_pages value to handle contiguous virtual address
+* ranges mapped to non-contiguous physical regions. Hugetlbfs can give
+* 2 MiB / 1 GiB contiguous physical regions.
+*/
+   ne_mem_region->nr_pages = mem_region->memory_size / MIN_MEM_REGION_SIZE;
+
+   ne_mem_region->pages = kcalloc(ne_mem_region->nr_pages,
+  sizeof(*ne_mem_region->pages),
+  GFP_KERNEL);
+   if (!ne_mem_region->pages) {
+   kzfree(ne_mem_region);
+
+   return -ENOMEM;
+   }
+
+   phys_contig_mem_regions = kcalloc(ne_mem_region->nr_pages,
+ sizeof(*phys_contig_mem_regions),
+ GFP_KERNEL);
+   if (!phys_contig_mem_regions) {
+   kzfree(ne_mem_region->pages);
+   kzfree(ne_mem_region);
+
+   return -ENOMEM;
+   }
+
+   /*
+* TODO: Handle non-contiguous memory regions received from user space.
+* Hugetlbfs can give 2 MiB / 1 GiB con

[PATCH v2 10/18] nitro_enclaves: Add logic for enclave image load metadata

2020-05-21 Thread Andra Paraschiv

Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.

Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the giveni memory offset.

Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 24 +++
 1 file changed, 24 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 8cf262ac1bbc..936c9ce19213 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -295,6 +295,30 @@ static long ne_enclave_ioctl(struct file *file, unsigned 
int cmd,
return rc;
}
 
+   case NE_GET_IMAGE_LOAD_METADATA: {
+   struct image_load_metadata image_load_metadata = {};
+
+   if (copy_from_user(&image_load_metadata, (void *)arg,
+  sizeof(image_load_metadata))) {
+   pr_err_ratelimited(NE "Error in copy from user\n");
+
+   return -EFAULT;
+   }
+
+   /* TODO: Check flags before setting the memory offset. */
+
+   image_load_metadata.memory_offset = NE_IMAGE_LOAD_OFFSET;
+
+   if (copy_to_user((void *)arg, &image_load_metadata,
+sizeof(image_load_metadata))) {
+   pr_err_ratelimited(NE "Error in copy to user\n");
+
+   return -EFAULT;
+   }
+
+   return 0;
+   }
+
default:
return -ENOTTY;
}
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.

[PATCH v2 13/18] nitro_enclaves: Add logic for enclave termination

2020-05-21 Thread Andra Paraschiv

An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.

Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 177 ++
 1 file changed, 177 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 2109ed3f2f53..a6f8d2e98e67 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -689,9 +689,186 @@ static long ne_enclave_ioctl(struct file *file, unsigned 
int cmd,
return 0;
 }
 
+/**
+ * ne_enclave_remove_all_mem_region_entries - Remove all memory region
+ * entries from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_mem_region_entries(
+   struct ne_enclave *ne_enclave)
+{
+   struct ne_mem_region *ne_mem_region = NULL;
+   struct ne_mem_region *ne_mem_region_tmp = NULL;
+
+   if (WARN_ON(!ne_enclave))
+   return;
+
+   list_for_each_entry_safe(ne_mem_region, ne_mem_region_tmp,
+&ne_enclave->mem_regions_list,
+mem_region_list_entry) {
+   list_del(&ne_mem_region->mem_region_list_entry);
+
+   unpin_user_pages(ne_mem_region->pages,
+ne_mem_region->nr_pages);
+
+   kzfree(ne_mem_region->pages);
+
+   kzfree(ne_mem_region);
+   }
+}
+
+/**
+ * ne_enclave_remove_all_vcpu_id_entries - Remove all vCPU id entries
+ * from the enclave data structure.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ */
+static void ne_enclave_remove_all_vcpu_id_entries(struct ne_enclave 
*ne_enclave)
+{
+   unsigned int cpu = 0;
+   struct ne_vcpu_id *ne_vcpu_id = NULL;
+   struct ne_vcpu_id *ne_vcpu_id_tmp = NULL;
+
+   if (WARN_ON(!ne_enclave))
+   return;
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   list_for_each_entry_safe(ne_vcpu_id, ne_vcpu_id_tmp,
+&ne_enclave->vcpu_ids_list,
+vcpu_id_list_entry) {
+   list_del(&ne_vcpu_id->vcpu_id_list_entry);
+
+   /* Update the available CPU pool. */
+   cpumask_set_cpu(ne_vcpu_id->vcpu_id, ne_cpu_pool.avail);
+
+   kzfree(ne_vcpu_id);
+   }
+
+   /* If any siblings left in the enclave CPU pool, move to available. */
+   for_each_cpu(cpu, ne_enclave->cpu_siblings) {
+   cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+   cpumask_set_cpu(cpu, ne_cpu_pool.avail);
+   }
+
+   free_cpumask_var(ne_enclave->cpu_siblings);
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+}
+
+/**
+ * ne_pci_dev_remove_enclave_entry - Remove enclave entry from the data
+ * structure that is part of the PCI device private data.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @ne_pci_dev: private data associated with the PCI device.
+ */
+static void ne_pci_dev_remove_enclave_entry(struct ne_enclave *ne_enclave,
+   struct ne_pci_dev *ne_pci_dev)
+{
+   struct ne_enclave *ne_enclave_entry = NULL;
+   struct ne_enclave *ne_enclave_entry_tmp = NULL;
+
+   if (WARN_ON(!ne_enclave) || WARN_ON(!ne_pci_dev))
+   return;
+
+   list_for_each_entry_safe(ne_enclave_entry, ne_enclave_entry_tmp,
+&ne_pci_dev->enclaves_list,
+enclave_list_entry) {
+   if (ne_enclave_entry->slot_uid == ne_enclave->slot_uid) {
+   list_del(&ne_enclave_entry->enclave_list_entry);
+
+   break;
+   }
+   }
+}
+
 static int ne_enclave_release(struct inode *inode, struct file *file)
 {
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct enclave_stop_req enclave_stop_request = {};
+   struct ne_enclave *ne_enclave = file->private_data;
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   int rc = -EINVAL;
+   struct slot_free_req slot_free_req = {};
+
+   if (WARN_ON(!ne_enclave))
+   return 0;
+
+   /*
+* Early exit in case there is an error in the enclave creation logic
+* and fput() is called on the cleanup path.
+*/
+   if (!ne_enclave->slot_uid)
+   return 0;
+

[PATCH v2 14/18] nitro_enclaves: Add Kconfig for the Nitro Enclaves driver

2020-05-21 Thread Andra Paraschiv

Signed-off-by: Andra Paraschiv 
---
 drivers/virt/Kconfig|  2 ++
 drivers/virt/nitro_enclaves/Kconfig | 28 
 2 files changed, 30 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/Kconfig

diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index 363af2eaf2ba..ae82460cdec2 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -32,4 +32,6 @@ config FSL_HV_MANAGER
 partition shuts down.
 
 source "drivers/virt/vboxguest/Kconfig"
+
+source "drivers/virt/nitro_enclaves/Kconfig"
 endif
diff --git a/drivers/virt/nitro_enclaves/Kconfig 
b/drivers/virt/nitro_enclaves/Kconfig
new file mode 100644
index ..2298a4bf609b
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/Kconfig
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, see .
+
+# Amazon Nitro Enclaves (NE) support.
+# Nitro is a hypervisor that has been developed by Amazon.
+
+config NITRO_ENCLAVES
+   tristate "Nitro Enclaves Support"
+   depends on HOTPLUG_CPU
+   help
+ This driver consists of support for enclave lifetime management
+ for Nitro Enclaves (NE).
+
+ To compile this driver as a module, choose M here.
+ The module will be called nitro_enclaves.
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.

Re: [PATCH] rxrpc: Fix a memory leak in rxkad_verify_response()

2020-05-21 Thread Markus Elfring

> In function rxkad_verify_response(), pointer "ticket" is not released,
> when function rxkad_decrypt_ticket() returns an error, causing a
> memory leak bug.

I suggest to improve also this change description.
How do you think about a wording variant like the following?

   A ticket was not released after a call of the function “platform_get_irq” 
failed.
   Thus replace the jump target “temporary_error_free_resp”
   by “temporary_error_free_ticket”.


Regards,
Markus

[PATCH v2 05/18] nitro_enclaves: Handle PCI device command requests

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.

Add logic for handling PCI device command requests based on the given
command type.

Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication events.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 

Fix issue reported in:
https://lore.kernel.org/lkml/202004231644.xtmn4z1z%25...@intel.com/

Reported-by: kbuild test robot 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 284 +++
 1 file changed, 284 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 8e39e30c882f..7b2d8e1b49a5 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -42,6 +42,268 @@ static const struct pci_device_id ne_pci_ids[] = {
 
 MODULE_DEVICE_TABLE(pci, ne_pci_ids);
 
+/**
+ * ne_submit_request - Submit command request to the PCI device based on the
+ * command type.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to send the command to.
+ * @cmd_type: command type of the request sent to the PCI device.
+ * @cmd_request: command request payload.
+ * @cmd_request_size: size of the command request payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_submit_request(struct pci_dev *pdev,
+enum ne_pci_dev_cmd_type cmd_type,
+void *cmd_request, size_t cmd_request_size)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev) || WARN_ON(!ne_pci_dev->iomem_base))
+   return -EINVAL;
+
+   if (cmd_type <= INVALID_CMD || cmd_type >= MAX_CMD) {
+   dev_err_ratelimited(&pdev->dev, NE "Invalid cmd type=%u\n",
+   cmd_type);
+
+   return -EINVAL;
+   }
+
+   if (!cmd_request) {
+   dev_err_ratelimited(&pdev->dev, NE "Null cmd request\n");
+
+   return -EINVAL;
+   }
+
+   if (cmd_request_size > NE_SEND_DATA_SIZE) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Invalid req size=%zu for cmd type=%u\n",
+   cmd_request_size, cmd_type);
+
+   return -EINVAL;
+   }
+
+   memcpy_toio(ne_pci_dev->iomem_base + NE_SEND_DATA, cmd_request,
+   cmd_request_size);
+
+   iowrite32(cmd_type, ne_pci_dev->iomem_base + NE_COMMAND);
+
+   return 0;
+}
+
+/**
+ * ne_retrieve_reply - Retrieve reply from the PCI device.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device to receive the reply from.
+ * @cmd_reply: command reply payload.
+ * @cmd_reply_size: size of the command reply payload.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_retrieve_reply(struct pci_dev *pdev,
+struct ne_pci_dev_cmd_reply *cmd_reply,
+size_t cmd_reply_size)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev) || WARN_ON(!ne_pci_dev->iomem_base))
+   return -EINVAL;
+
+   if (!cmd_reply) {
+   dev_err_ratelimited(&pdev->dev, NE "Null cmd reply\n");
+
+   return -EINVAL;
+   }
+
+   if (cmd_reply_size > NE_RECV_DATA_SIZE) {
+   dev_err_ratelimited(&pdev->dev, NE "Invalid reply size=%zu\n",
+   cmd_reply_size);
+
+   return -EINVAL;
+   }
+
+   memcpy_fromio(cmd_reply, ne_pci_dev->iomem_base + NE_RECV_DATA,
+ cmd_reply_size);
+
+   return 0;
+}
+
+/**
+ * ne_wait_for_reply - Wait for a reply of a PCI command.
+ *
+ * This function gets called with the ne_pci_dev mutex held.
+ *
+ * @pdev: PCI device for which a reply is waited.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_wait_for_reply(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   int rc = -EINVAL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev))
+   return -EINVAL;
+
+   /*
+* TODO: Update to _interruptible and handle interrupted wait event
+* e.g. -ERESTARTSYS, incoming signals + add / update timeout.
+*/
+   rc = wait_event_timeout(ne_pci_dev->cmd_reply_wait_q,
+   atomic_read(&

[PATCH v2 09/18] nitro_enclaves: Add logic for enclave vcpu creation

2020-05-21 Thread Andra Paraschiv

An enclave, before being started, has its resources set. One of its
resources is CPU.

Add ioctl command logic for enclave vCPU creation. Return as result a
file descriptor that is associated with the enclave vCPU.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 223 ++
 1 file changed, 223 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index 1036221238f4..8cf262ac1bbc 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -63,6 +63,183 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+static int ne_enclave_vcpu_open(struct inode *node, struct file *file)
+{
+   return 0;
+}
+
+static long ne_enclave_vcpu_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+   switch (cmd) {
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
+}
+
+static int ne_enclave_vcpu_release(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static const struct file_operations ne_enclave_vcpu_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = ne_enclave_vcpu_ioctl,
+   .open   = ne_enclave_vcpu_open,
+   .release= ne_enclave_vcpu_release,
+};
+
+/**
+ * ne_get_cpu_from_cpu_pool - Get a CPU from the CPU pool, if it is set.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on 
x86.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_get_cpu_from_cpu_pool(struct ne_enclave *ne_enclave, u32 
*vcpu_id)
+{
+   unsigned int cpu = 0;
+   unsigned int cpu_sibling = 0;
+
+   if (WARN_ON(!ne_enclave))
+   return -EINVAL;
+
+   if (!vcpu_id)
+   return -EINVAL;
+
+   /* There are CPU siblings available to choose from. */
+   cpu = cpumask_any(ne_enclave->cpu_siblings);
+   if (cpu < nr_cpu_ids) {
+   cpumask_clear_cpu(cpu, ne_enclave->cpu_siblings);
+
+   *vcpu_id = cpu;
+
+   return 0;
+   }
+
+   mutex_lock(&ne_cpu_pool.mutex);
+
+   /* Choose any CPU from the available CPU pool. */
+   cpu = cpumask_any(ne_cpu_pool.avail);
+   if (cpu >= nr_cpu_ids) {
+   pr_err_ratelimited(NE "No CPUs available in CPU pool\n");
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   return -EINVAL;
+   }
+
+   cpumask_clear_cpu(cpu, ne_cpu_pool.avail);
+
+   /*
+* Make sure the CPU siblings are not marked as
+* available anymore.
+*/
+   for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
+   if (cpu_sibling != cpu) {
+   cpumask_clear_cpu(cpu_sibling, ne_cpu_pool.avail);
+
+   cpumask_set_cpu(cpu_sibling, ne_enclave->cpu_siblings);
+   }
+   }
+
+   mutex_unlock(&ne_cpu_pool.mutex);
+
+   *vcpu_id = cpu;
+
+   return 0;
+}
+
+/**
+ * ne_create_vcpu_ioctl - Add vCPU to the slot associated with the current
+ * enclave. Create vCPU file descriptor to be further used for CPU handling.
+ *
+ * This function gets called with the ne_enclave mutex held.
+ *
+ * @ne_enclave: private data associated with the current enclave.
+ * @vcpu_id: id of the CPU to be associated with the given slot, apic id on 
x86.
+ *
+ * @returns: vCPU fd on success, negative return value on failure.
+ */
+static int ne_create_vcpu_ioctl(struct ne_enclave *ne_enclave, u32 vcpu_id)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int fd = 0;
+   struct file *file = NULL;
+   struct ne_vcpu_id *ne_vcpu_id = NULL;
+   int rc = -EINVAL;
+   struct slot_add_vcpu_req slot_add_vcpu_req = {};
+
+   if (WARN_ON(!ne_enclave) || WARN_ON(!ne_enclave->pdev))
+   return -EINVAL;
+
+   if (ne_enclave->mm != current->mm)
+   return -EIO;
+
+   ne_vcpu_id = kzalloc(sizeof(*ne_vcpu_id), GFP_KERNEL);
+   if (!ne_vcpu_id)
+   return -ENOMEM;
+
+   fd = get_unused_fd_flags(O_CLOEXEC);
+   if (fd < 0) {
+   rc = fd;
+
+   pr_err_ratelimited(NE "Error in getting unused fd [rc=%d]\n",
+  rc);
+
+   goto free_ne_vcpu_id;
+   }
+
+   /* TODO: Include (vcpu) id in the ne-vm-vcpu naming. */
+   file = anon_inode_getfile("ne-vm-vcpu", &ne_enclave_vcpu_fops,
+ ne_enclave, O_RDWR);
+   if (IS_ERR(file)) {
+   rc = PTR_ERR(file);
+
+   pr_err_ratelimited(NE "Error in anon inode get file [rc=%d]\n",
+  rc);
+
+

[PATCH v2 03/18] nitro_enclaves: Define enclave info for internal bookkeeping

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves driver keeps an internal info per each enclave.

This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.h | 121 ++
 1 file changed, 121 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.h 
b/drivers/virt/nitro_enclaves/ne_misc_dev.h
new file mode 100644
index ..9d683607502f
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.h
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _NE_MISC_DEV_H_
+#define _NE_MISC_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Entry in vCPU IDs list. */
+struct ne_vcpu_id {
+   /* CPU id associated with a given slot, apic id on x86. */
+   u32 vcpu_id;
+
+   struct list_head vcpu_id_list_entry;
+};
+
+/* Entry in memory regions list. */
+struct ne_mem_region {
+   struct list_head mem_region_list_entry;
+
+   /* Number of pages that make up the memory region. */
+   unsigned long nr_pages;
+
+   /* Pages that make up the user space memory region. */
+   struct page **pages;
+};
+
+/* Per-enclave data used for enclave lifetime management. */
+struct ne_enclave {
+   /**
+* CPU pool with siblings of already allocated CPUs to an enclave.
+* This is used when a CPU pool is set, to be able to know the CPU
+* siblings for the hyperthreading (HT) setup.
+*/
+   cpumask_var_t cpu_siblings;
+
+   struct list_head enclave_list_entry;
+
+   /* Mutex for accessing this internal state. */
+   struct mutex enclave_info_mutex;
+
+   /**
+* Wait queue used for out-of-band event notifications
+* triggered from the PCI device event handler to the enclave
+* process via the poll function.
+*/
+   wait_queue_head_t eventq;
+
+   /* Variable used to determine if the out-of-band event was triggered. */
+   bool has_event;
+
+   /**
+* The maximum number of memory regions that can be handled by the
+* lower levels.
+*/
+   u64 max_mem_regions;
+
+   /* Enclave memory regions list. */
+   struct list_head mem_regions_list;
+
+   /* Enclave process abstraction mm data struct. */
+   struct mm_struct *mm;
+
+   /* Number of memory regions associated with the enclave. */
+   u64 nr_mem_regions;
+
+   /* Number of vcpus associated with the enclave. */
+   u64 nr_vcpus;
+
+   /* PCI device used for enclave lifetime management. */
+   struct pci_dev *pdev;
+
+   /* Slot unique id mapped to the enclave. */
+   u64 slot_uid;
+
+   /* Enclave state, updated during enclave lifetime. */
+   u16 state;
+
+   /* Enclave vCPUs list. */
+   struct list_head vcpu_ids_list;
+};
+
+/* States available for an enclave. */
+enum ne_state {
+   /* NE_START_ENCLAVE ioctl was never issued for the enclave. */
+   NE_STATE_INIT = 0,
+
+   /**
+* NE_START_ENCLAVE ioctl was issued and the enclave is running
+* as expected.
+*/
+   NE_STATE_RUNNING = 2,
+
+   /* Enclave exited without userspace interaction. */
+   NE_STATE_STOPPED = U16_MAX,
+};
+
+/* Nitro Enclaves (NE) misc device */
+extern struct miscdevice ne_miscdevice;
+
+#endif /* _NE_MISC_DEV_H_ */
-- 
2.20.1 (Apple Git-117)




Amazon Development Center (Romania) S.R.L. registered office: 27A Sf. Lazar 
Street, UBC5, floor 2, Iasi, Iasi County, 700045, Romania. Registered in 
Romania. Registration number J22/2621/2005.

[PATCH v2 08/18] nitro_enclaves: Add logic for enclave vm creation

2020-05-21 Thread Andra Paraschiv

Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.

Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.

The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 169 ++
 1 file changed, 169 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
index e1866fac8220..1036221238f4 100644
--- a/drivers/virt/nitro_enclaves/ne_misc_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -63,6 +63,146 @@ struct ne_cpu_pool {
 
 static struct ne_cpu_pool ne_cpu_pool;
 
+static int ne_enclave_open(struct inode *node, struct file *file)
+{
+   return 0;
+}
+
+static long ne_enclave_ioctl(struct file *file, unsigned int cmd,
+unsigned long arg)
+{
+   switch (cmd) {
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
+}
+
+static int ne_enclave_release(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static __poll_t ne_enclave_poll(struct file *file, poll_table *wait)
+{
+   __poll_t mask = 0;
+   struct ne_enclave *ne_enclave = file->private_data;
+
+   poll_wait(file, &ne_enclave->eventq, wait);
+
+   if (!ne_enclave->has_event)
+   return mask;
+
+   mask = POLLHUP;
+
+   return mask;
+}
+
+static const struct file_operations ne_enclave_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .poll   = ne_enclave_poll,
+   .unlocked_ioctl = ne_enclave_ioctl,
+   .open   = ne_enclave_open,
+   .release= ne_enclave_release,
+};
+
+/**
+ * ne_create_vm_ioctl - Alloc slot to be associated with an enclave. Create
+ * enclave file descriptor to be further used for enclave resources handling
+ * e.g. memory regions and CPUs.
+ *
+ * This function gets called with the ne_pci_dev enclave mutex held.
+ *
+ * @pdev: PCI device used for enclave lifetime management.
+ * @ne_pci_dev: private data associated with the PCI device.
+ * @type: type of the virtual machine to be created.
+ *
+ * @returns: enclave fd on success, negative return value on failure.
+ */
+static int ne_create_vm_ioctl(struct pci_dev *pdev,
+ struct ne_pci_dev *ne_pci_dev, unsigned long type)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   int fd = 0;
+   struct file *file = NULL;
+   struct ne_enclave *ne_enclave = NULL;
+   int rc = -EINVAL;
+   struct slot_alloc_req slot_alloc_req = {};
+
+   if (WARN_ON(!pdev) || WARN_ON(!ne_pci_dev))
+   return -EINVAL;
+
+   ne_enclave = kzalloc(sizeof(*ne_enclave), GFP_KERNEL);
+   if (!ne_enclave)
+   return -ENOMEM;
+
+   if (!zalloc_cpumask_var(&ne_enclave->cpu_siblings, GFP_KERNEL)) {
+   kzfree(ne_enclave);
+
+   return -ENOMEM;
+   }
+
+   fd = get_unused_fd_flags(O_CLOEXEC);
+   if (fd < 0) {
+   rc = fd;
+
+   pr_err_ratelimited(NE "Error in getting unused fd [rc=%d]\n",
+  rc);
+
+   goto free_cpumask;
+   }
+
+   file = anon_inode_getfile("ne-vm", &ne_enclave_fops, ne_enclave,
+ O_RDWR);
+   if (IS_ERR(file)) {
+   rc = PTR_ERR(file);
+
+   pr_err_ratelimited(NE "Error in anon inode get file [rc=%d]\n",
+  rc);
+
+   goto put_fd;
+   }
+
+   ne_enclave->pdev = pdev;
+
+   rc = ne_do_request(ne_enclave->pdev, SLOT_ALLOC, &slot_alloc_req,
+  sizeof(slot_alloc_req), &cmd_reply,
+  sizeof(cmd_reply));
+   if (rc < 0) {
+   pr_err_ratelimited(NE "Error in slot alloc [rc=%d]\n", rc);
+
+   goto put_file;
+   }
+
+   init_waitqueue_head(&ne_enclave->eventq);
+   ne_enclave->has_event = false;
+   mutex_init(&ne_enclave->enclave_info_mutex);
+   ne_enclave->max_mem_regions = cmd_reply.mem_regions;
+   INIT_LIST_HEAD(&ne_enclave->mem_regions_list);
+   ne_enclave->mm = current->mm;
+   ne_enclave->slot_uid = cmd_reply.slot_uid;
+   ne_enclave->state = NE_STATE_INIT;
+   INIT_LIST_HEAD(&ne_enclave->vcpu_ids_list);
+
+   list_add(&ne_enclave->enclave_list_entry, &ne_pci_dev->enclaves_list);
+
+   fd_install(fd, file);
+
+   return fd;
+
+put_file:
+   fput(file);
+put_fd:
+   put_unused_fd(fd);
+free_cpu

[PATCH v2 07/18] nitro_enclaves: Init misc device providing the ioctl interface

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.

This ioctl interface is mapped to a Nitro Enclaves misc device.

Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_misc_dev.c | 196 ++
 drivers/virt/nitro_enclaves/ne_pci_dev.c  |  13 ++
 2 files changed, 209 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_misc_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_misc_dev.c 
b/drivers/virt/nitro_enclaves/ne_misc_dev.c
new file mode 100644
index ..e1866fac8220
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_misc_dev.c
@@ -0,0 +1,196 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+/**
+ * Enclave lifetime management driver for Nitro Enclaves (NE).
+ * Nitro is a hypervisor that has been developed by Amazon.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define MIN_MEM_REGION_SIZE (2 * 1024UL * 1024UL)
+
+#define NE "nitro_enclaves: "
+
+#define NE_DEV_NAME "nitro_enclaves"
+
+#define NE_IMAGE_LOAD_OFFSET (8 * 1024UL * 1024UL)
+
+static char *ne_cpus;
+module_param(ne_cpus, charp, 0644);
+MODULE_PARM_DESC(ne_cpus, " - CPU pool used for Nitro Enclaves");
+
+/* CPU pool used for Nitro Enclaves. */
+struct ne_cpu_pool {
+   /* Available CPUs in the pool. */
+   cpumask_var_t avail;
+   struct mutex mutex;
+};
+
+static struct ne_cpu_pool ne_cpu_pool;
+
+static int ne_open(struct inode *node, struct file *file)
+{
+   return 0;
+}
+
+static long ne_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+   switch (cmd) {
+
+   default:
+   return -ENOTTY;
+   }
+
+   return 0;
+}
+
+static int ne_release(struct inode *inode, struct file *file)
+{
+   return 0;
+}
+
+static const struct file_operations ne_fops = {
+   .owner  = THIS_MODULE,
+   .llseek = noop_llseek,
+   .unlocked_ioctl = ne_ioctl,
+   .open   = ne_open,
+   .release= ne_release,
+};
+
+struct miscdevice ne_miscdevice = {
+   .minor  = MISC_DYNAMIC_MINOR,
+   .name   = NE_DEV_NAME,
+   .fops   = &ne_fops,
+   .mode   = 0660,
+};
+
+static int __init ne_init(void)
+{
+   unsigned int cpu = 0;
+   unsigned int cpu_sibling = 0;
+   int rc = -EINVAL;
+
+   memset(&ne_cpu_pool, 0, sizeof(ne_cpu_pool));
+
+   if (!zalloc_cpumask_var(&ne_cpu_pool.avail, GFP_KERNEL))
+   return -ENOMEM;
+
+   mutex_init(&ne_cpu_pool.mutex);
+
+   rc = cpulist_parse(ne_cpus, ne_cpu_pool.avail);
+   if (rc < 0) {
+   pr_err_ratelimited(NE "Error in cpulist parse [rc=%d]\n", rc);
+
+   goto free_cpumask;
+   }
+
+   /*
+* Check if CPU siblings are included in the provided CPU pool. The
+* expectation is that CPU cores are made available in the CPU pool for
+* enclaves.
+*/
+   for_each_cpu(cpu, ne_cpu_pool.avail) {
+   for_each_cpu(cpu_sibling, topology_sibling_cpumask(cpu)) {
+   if (!cpumask_test_cpu(cpu_sibling, ne_cpu_pool.avail)) {
+   pr_err_ratelimited(NE "CPU %d is not in pool\n",
+  cpu_sibling);
+
+   rc = -EINVAL;
+
+   goto free_cpumask;
+   }
+   }
+   }
+
+   for_each_cpu(cpu, ne_cpu_pool.avail) {
+   rc = remove_cpu(cpu);
+   if (rc != 0) {
+   pr_err_ratelimited(NE "CPU %d not offlined [rc=%d]\n",
+  cpu, rc);
+
+   goto online_cpus;
+   }
+   }
+
+   rc = pci_register_driver(&ne_pci_driver);
+   if (rc < 0) {
+   pr_err_ratelimited(NE "Error in pci register driver [rc=%d]\n",
+  rc);
+
+   goto online_cpus;
+   }
+
+   return 0;
+
+online_c

[PATCH v2 01/18] nitro_enclaves: Add ioctl interface definition

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.

An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.

Include part of the KVM ioctls in the provided ioctl interface, with
additional NE ioctl commands that e.g. triggers the enclave run.

Signed-off-by: Alexandru Vasile 
Signed-off-by: Andra Paraschiv 
---
 .../userspace-api/ioctl/ioctl-number.rst  |  5 +-
 include/linux/nitro_enclaves.h| 23 ++
 include/uapi/linux/nitro_enclaves.h   | 77 +++
 3 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/nitro_enclaves.h
 create mode 100644 include/uapi/linux/nitro_enclaves.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index f759edafd938..8a19b5e871d3 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -325,8 +325,11 @@ Code  Seq#Include File 
  Comments
 0xAC  00-1F  linux/raw.h
 0xAD  00 Netfilter 
device in development:
  

-0xAE  alllinux/kvm.h 
Kernel-based Virtual Machine
+0xAE  00-1F  linux/kvm.h 
Kernel-based Virtual Machine
  

+0xAE  40-FF  linux/kvm.h 
Kernel-based Virtual Machine
+ 

+0xAE  20-3F  linux/nitro_enclaves.h  Nitro 
Enclaves
 0xAF  00-1F  linux/fsl_hypervisor.h  Freescale 
hypervisor
 0xB0  allRATIO 
devices in development:
  

diff --git a/include/linux/nitro_enclaves.h b/include/linux/nitro_enclaves.h
new file mode 100644
index ..7e593a9fbf8c
--- /dev/null
+++ b/include/linux/nitro_enclaves.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _LINUX_NITRO_ENCLAVES_H_
+#define _LINUX_NITRO_ENCLAVES_H_
+
+#include 
+
+#endif /* _LINUX_NITRO_ENCLAVES_H_ */
diff --git a/include/uapi/linux/nitro_enclaves.h 
b/include/uapi/linux/nitro_enclaves.h
new file mode 100644
index ..98ba59f15b52
--- /dev/null
+++ b/include/uapi/linux/nitro_enclaves.h
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _UAPI_LINUX_NITRO_ENCLAVES_H_
+#define _UAPI_LINUX_NITRO_ENCLAVES_H_
+
+#include 
+#include 
+
+/* Nitro Enclaves (NE) Kernel Driver Interface */
+
+/**
+ * The command is used to get information needed for in-memory enclave image
+ * loading e.g. offset in enclave memory to start placing the enclave image.
+ *
+ * The image load metadata is an in / out data structure. It includes info
+ * provided by the caller - flags - and returns the offset in enclave memory
+ * where to start placing the enclave image.
+ *

[PATCH v2 02/18] nitro_enclaves: Define the PCI device interface

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.

This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself run, e.g. to launch a VM that is used
for the enclave.

Define the MMIO space of the PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the functions for the PCI device init
/ uninit and command requests handling.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.h | 266 +++
 1 file changed, 266 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.h

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.h 
b/drivers/virt/nitro_enclaves/ne_pci_dev.h
new file mode 100644
index ..1d5d5f4872d6
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.h
@@ -0,0 +1,266 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef _NE_PCI_DEV_H_
+#define _NE_PCI_DEV_H_
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* Nitro Enclaves (NE) PCI device identifier */
+
+#define PCI_DEVICE_ID_NE (0xe4c1)
+#define PCI_BAR_NE (0x03)
+
+/* Device registers */
+
+/**
+ * (1 byte) Register to notify the device that the driver is using it
+ * (Read/Write).
+ */
+#define NE_ENABLE (0x)
+#define NE_ENABLE_OFF (0x00)
+#define NE_ENABLE_ON (0x01)
+
+/* (2 bytes) Register to select the device run-time version (Read/Write). */
+#define NE_VERSION (0x0002)
+#define NE_VERSION_MAX (0x0001)
+
+/**
+ * (4 bytes) Register to notify the device what command was requested
+ * (Write-Only).
+ */
+#define NE_COMMAND (0x0004)
+
+/**
+ * (4 bytes) Register to notify the driver that a reply or a device event
+ * is available (Read-Only):
+ * - Lower half  - command reply counter
+ * - Higher half - out-of-band device event counter
+ */
+#define NE_EVTCNT (0x000c)
+#define NE_EVTCNT_REPLY_SHIFT (0)
+#define NE_EVTCNT_REPLY_MASK (0x)
+#define NE_EVTCNT_REPLY(cnt) (((cnt) & NE_EVTCNT_REPLY_MASK) >> \
+ NE_EVTCNT_REPLY_SHIFT)
+#define NE_EVTCNT_EVENT_SHIFT (16)
+#define NE_EVTCNT_EVENT_MASK (0x)
+#define NE_EVTCNT_EVENT(cnt) (((cnt) & NE_EVTCNT_EVENT_MASK) >> \
+ NE_EVTCNT_EVENT_SHIFT)
+
+/* (240 bytes) Buffer for sending the command request payload (Read/Write). */
+#define NE_SEND_DATA (0x0010)
+
+/* (240 bytes) Buffer for receiving the command reply payload (Read-Only). */
+#define NE_RECV_DATA (0x0100)
+
+/* Device MMIO buffer sizes */
+
+/* 240 bytes for send / recv buffer. */
+#define NE_SEND_DATA_SIZE (240)
+#define NE_RECV_DATA_SIZE (240)
+
+/* MSI-X interrupt vectors */
+
+/* MSI-X vector used for command reply notification. */
+#define NE_VEC_REPLY (0)
+
+/* MSI-X vector used for out-of-band events e.g. enclave crash. */
+#define NE_VEC_EVENT (1)
+
+/* Device command types. */
+enum ne_pci_dev_cmd_type {
+   INVALID_CMD = 0,
+   ENCLAVE_START = 1,
+   ENCLAVE_GET_SLOT = 2,
+   ENCLAVE_STOP = 3,
+   SLOT_ALLOC = 4,
+   SLOT_FREE = 5,
+   SLOT_ADD_MEM = 6,
+   SLOT_ADD_VCPU = 7,
+   SLOT_COUNT = 8,
+   NEXT_SLOT = 9,
+   SLOT_INFO = 10,
+   SLOT_ADD_BULK_VCPUS = 11,
+   MAX_CMD,
+};
+
+/* Device commands - payload structure for requests and replies. */
+
+struct enclave_start_req {
+   /* Slot unique id mapped to the enclave to start. */
+   u64 slot_uid;
+
+   /**
+* Context ID (CID) for the enclave vsock device.
+* If 0, CID is autogenerated.
+*/
+   u64 enclave_cid;
+
+   /* Flags for the enclave to start with (e.g. debug mode). */
+   u64 flags;
+} __attribute__ ((__packed__));
+
+struct enclave_get_slot_req {
+   /* Context ID (CID) for the enclave vsock device. */
+   u64 enclave_cid;
+} __attribute__ ((__packed__));
+
+struct enclave_stop_req {
+   /* Slot unique id mapped to the enclave to stop. */
+

[PATCH v2 06/18] nitro_enclaves: Handle out-of-band PCI device events

2020-05-21 Thread Andra Paraschiv

In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.

Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify for which this notification is intended.

Create an workqueue to handle the out-of-band events. Notify user space
enclave process that is using a polling mechanism on the enclave fd. The
enclave fd is returned as a result of KVM_CREATE_VM ioctl call.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 117 +++
 1 file changed, 117 insertions(+)

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
index 7b2d8e1b49a5..cd2012d9b365 100644
--- a/drivers/virt/nitro_enclaves/ne_pci_dev.c
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -304,6 +304,85 @@ static irqreturn_t ne_reply_handler(int irq, void *args)
return IRQ_HANDLED;
 }
 
+/**
+ * ne_event_work_handler - Work queue handler for notifying enclaves on
+ * a state change received by the event interrupt handler.
+ *
+ * An out-of-band event is being issued by the Nitro Hypervisor when at least
+ * one enclave is changing state without client interaction.
+ *
+ * @work: item containing the Nitro Enclaves PCI device for which a
+ *   out-of-band event was issued.
+ */
+static void ne_event_work_handler(struct work_struct *work)
+{
+   struct ne_pci_dev_cmd_reply cmd_reply = {};
+   struct ne_enclave *ne_enclave = NULL;
+   struct ne_pci_dev *ne_pci_dev =
+   container_of(work, struct ne_pci_dev, notify_work);
+   int rc = -EINVAL;
+   struct slot_info_req slot_info_req = {};
+
+   mutex_lock(&ne_pci_dev->enclaves_list_mutex);
+
+   /*
+* Iterate over all enclaves registered for the Nitro Enclaves
+* PCI device and determine for which enclave(s) the out-of-band event
+* is corresponding to.
+*/
+   list_for_each_entry(ne_enclave, &ne_pci_dev->enclaves_list,
+   enclave_list_entry) {
+   mutex_lock(&ne_enclave->enclave_info_mutex);
+
+   /*
+* Enclaves that were never started cannot receive out-of-band
+* events.
+*/
+   if (ne_enclave->state != NE_STATE_RUNNING)
+   goto unlock;
+
+   slot_info_req.slot_uid = ne_enclave->slot_uid;
+
+   rc = ne_do_request(ne_enclave->pdev, SLOT_INFO, &slot_info_req,
+  sizeof(slot_info_req), &cmd_reply,
+  sizeof(cmd_reply));
+   WARN_ON(rc < 0);
+
+   /* Notify enclave process that the enclave state changed. */
+   if (ne_enclave->state != cmd_reply.state) {
+   ne_enclave->state = cmd_reply.state;
+
+   ne_enclave->has_event = true;
+
+   wake_up_interruptible(&ne_enclave->eventq);
+   }
+
+unlock:
+mutex_unlock(&ne_enclave->enclave_info_mutex);
+   }
+
+   mutex_unlock(&ne_pci_dev->enclaves_list_mutex);
+}
+
+/**
+ * ne_event_handler - Interrupt handler for PCI device out-of-band
+ * events. This interrupt does not supply any data in the MMIO region.
+ * It notifies a change in the state of any of the launched enclaves.
+ *
+ * @irq: received interrupt for an out-of-band event.
+ * @args: PCI device private data structure.
+ *
+ * @returns: IRQ_HANDLED on handled interrupt, IRQ_NONE otherwise.
+ */
+static irqreturn_t ne_event_handler(int irq, void *args)
+{
+   struct ne_pci_dev *ne_pci_dev = (struct ne_pci_dev *)args;
+
+   queue_work(ne_pci_dev->event_wq, &ne_pci_dev->notify_work);
+
+   return IRQ_HANDLED;
+}
+
 /**
  * ne_setup_msix - Setup MSI-X vectors for the PCI device.
  *
@@ -359,8 +438,40 @@ static int ne_setup_msix(struct pci_dev *pdev)
goto free_irq_vectors;
}
 
+   ne_pci_dev->event_wq = create_singlethread_workqueue("ne_pci_dev_wq");
+   if (!ne_pci_dev->event_wq) {
+   rc = -ENOMEM;
+
+   dev_err_ratelimited(&pdev->dev,
+   NE "Cannot get wq for dev events [rc=%d]\n",
+   rc);
+
+   goto free_reply_irq_vec;
+   }
+
+   INIT_WORK(&ne_pci_dev->notify_work, ne_event_work_handler);
+
+   /*
+* This IRQ gets triggered every time any enclave's state changes. Its
+* handler then scans for the changes and propagates them to the user
+* space.
+*/
+

Re: [PATCH] drm/vblank: Fix -Wformat compile warnings on some arches

2020-05-21 Thread Thomas Zimmermann



Am 21.05.20 um 22:46 schrieb Lyude Paul:
> On some architectures like ppc64le and aarch64, compiling with
> -Wformat=1 will throw the following warnings:
> 
>   In file included from drivers/gpu/drm/drm_vblank.c:33:
>   drivers/gpu/drm/drm_vblank.c: In function 'drm_update_vblank_count':
>   drivers/gpu/drm/drm_vblank.c:273:16: warning: format '%llu' expects
>   argument of type 'long long unsigned int', but argument 4 has type
>   'long int' [-Wformat=]
> DRM_DEBUG_VBL("updating vblank count on crtc %u:"
>   ^~~
>   ./include/drm/drm_print.h:407:22: note: in definition of macro
>   'DRM_DEBUG_VBL'
> drm_dbg(DRM_UT_VBL, fmt, ##__VA_ARGS__)
> ^~~
>   drivers/gpu/drm/drm_vblank.c:274:22: note: format string is defined here
>" current=%llu, diff=%u, hw=%u hw_last=%u\n",
>  ~~~^
>  %lu
> 
> So, fix that with a typecast.
> 
> Signed-off-by: Lyude Paul 
> Co-developed-by: Dave Airlie 

Reviewed-by: Thomas Zimmermann 

Thanks!

> ---
>  drivers/gpu/drm/drm_vblank.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index acb3c3f65b579..1a96db2dd16fa 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -342,7 +342,7 @@ static void drm_update_vblank_count(struct drm_device 
> *dev, unsigned int pipe,
>  
>   DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> " current=%llu, diff=%u, hw=%u hw_last=%u\n",
> -   pipe, atomic64_read(&vblank->count), diff,
> +   pipe, (unsigned long long)atomic64_read(&vblank->count), 
> diff,
> cur_vblank, vblank->last);
>  
>   if (diff == 0) {
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



signature.asc
Description: OpenPGP digital signature

Re: [PATCH v5 05/19] mtd: spi-nor: add support for DTR protocol

2020-05-21 Thread masonccyang



Hi Pratyush,


> +/**
> + * spi_nor_spimem_setup_op() - Set up common properties of a spi-mem 
op.
> + * @nor:  pointer to a 'struct spi_nor'
> + * @op: pointer to the 'struct spi_mem_op' whose properties
> + * need to be initialized.
> + * @proto:  the protocol from which the properties need to be set.
> + */
> +void spi_nor_spimem_setup_op(const struct spi_nor *nor,
> +  struct spi_mem_op *op,
> +  const enum spi_nor_protocol proto)
> +{
> +   u8 ext;
> +
> +   op->cmd.buswidth = spi_nor_get_protocol_inst_nbits(proto);
> +
> +   if (op->addr.nbytes)
> +  op->addr.buswidth = spi_nor_get_protocol_addr_nbits(proto);
> +
> +   if (op->dummy.nbytes)
> +  op->dummy.buswidth = spi_nor_get_protocol_addr_nbits(proto);
> +
> +   if (op->data.nbytes)
> +  op->data.buswidth = spi_nor_get_protocol_data_nbits(proto);
> +
> +   if (spi_nor_protocol_is_dtr(proto)) {

As mentioned before that I am also patching mx25* which supports 8S-8S-8S 
and 
8D-8D-8D mode.

please patch to spi_nor_protocol_is_8_8_8(proto) for 8S-8S-8S mode 
support.

> +  /*
> +   * spi-mem supports mixed DTR modes, but right now we can only
> +   * have all phases either DTR or STR. IOW, spi-mem can have
> +   * something like 4S-4D-4D, but spi-nor can't. So, set all 4
> +   * phases to either DTR or STR.
> +   */

if (spi_nor_protocol_is_8D_8D_8D(proto) {

> +  op->cmd.dtr = op->addr.dtr = op->dummy.dtr
> += op->data.dtr = true;
> +
> +  /* 2 bytes per clock cycle in DTR mode. */
> +  op->dummy.nbytes *= 2;

}

> +
> +  ext = spi_nor_get_cmd_ext(nor, op);
> +  op->cmd.opcode = (op->cmd.opcode << 8) | ext;
> +  op->cmd.nbytes = 2;
> +   }
> +}
> +

thanks & best regards,
Mason


CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information 
and/or personal data, which is protected by applicable laws. Please be 
reminded that duplication, disclosure, distribution, or use of this e-mail 
(and/or its attachments) or any part thereof is prohibited. If you receive 
this e-mail in error, please notify us immediately and delete this mail as 
well as its attachment(s) from your system. In addition, please be 
informed that collection, processing, and/or use of personal data is 
prohibited unless expressly permitted by personal data protection laws. 
Thank you for your attention and cooperation.

Macronix International Co., Ltd.

=





CONFIDENTIALITY NOTE:

This e-mail and any attachments may contain confidential information and/or 
personal data, which is protected by applicable laws. Please be reminded that 
duplication, disclosure, distribution, or use of this e-mail (and/or its 
attachments) or any part thereof is prohibited. If you receive this e-mail in 
error, please notify us immediately and delete this mail as well as its 
attachment(s) from your system. In addition, please be informed that 
collection, processing, and/or use of personal data is prohibited unless 
expressly permitted by personal data protection laws. Thank you for your 
attention and cooperation.

Macronix International Co., Ltd.

=

[PATCH v2 04/18] nitro_enclaves: Init PCI device driver

2020-05-21 Thread Andra Paraschiv

The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.

Setup the PCI device driver and add support for MSI-X interrupts.

Signed-off-by: Alexandru-Catalin Vasile 
Signed-off-by: Alexandru Ciobotaru 
Signed-off-by: Andra Paraschiv 
---
 drivers/virt/nitro_enclaves/ne_pci_dev.c | 303 +++
 1 file changed, 303 insertions(+)
 create mode 100644 drivers/virt/nitro_enclaves/ne_pci_dev.c

diff --git a/drivers/virt/nitro_enclaves/ne_pci_dev.c 
b/drivers/virt/nitro_enclaves/ne_pci_dev.c
new file mode 100644
index ..8e39e30c882f
--- /dev/null
+++ b/drivers/virt/nitro_enclaves/ne_pci_dev.c
@@ -0,0 +1,303 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+/* Nitro Enclaves (NE) PCI device driver. */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ne_misc_dev.h"
+#include "ne_pci_dev.h"
+
+#define DEFAULT_TIMEOUT_MSECS (12) // 120 sec
+
+#define NE "nitro_enclaves: "
+
+static const struct pci_device_id ne_pci_ids[] = {
+   { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, PCI_DEVICE_ID_NE) },
+   { 0, }
+};
+
+MODULE_DEVICE_TABLE(pci, ne_pci_ids);
+
+/**
+ * ne_setup_msix - Setup MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to setup the MSI-X for.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_setup_msix(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+   int nr_vecs = 0;
+   int rc = -EINVAL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev))
+   return -EINVAL;
+
+   nr_vecs = pci_msix_vec_count(pdev);
+   if (nr_vecs < 0) {
+   rc = nr_vecs;
+
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in getting vec count [rc=%d]\n",
+   rc);
+
+   return rc;
+   }
+
+   rc = pci_alloc_irq_vectors(pdev, nr_vecs, nr_vecs, PCI_IRQ_MSIX);
+   if (rc < 0) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in alloc MSI-X vecs [rc=%d]\n",
+   rc);
+
+   return rc;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_teardown_msix - Teardown MSI-X vectors for the PCI device.
+ *
+ * @pdev: PCI device to teardown the MSI-X for.
+ */
+static void ne_teardown_msix(struct pci_dev *pdev)
+{
+   struct ne_pci_dev *ne_pci_dev = NULL;
+
+   if (WARN_ON(!pdev))
+   return;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev))
+   return;
+
+   pci_free_irq_vectors(pdev);
+}
+
+/**
+ * ne_pci_dev_enable - Select PCI device version and enable it.
+ *
+ * @pdev: PCI device to select version for and then enable.
+ *
+ * @returns: 0 on success, negative return value on failure.
+ */
+static int ne_pci_dev_enable(struct pci_dev *pdev)
+{
+   u8 dev_enable_reply = 0;
+   u16 dev_version_reply = 0;
+   struct ne_pci_dev *ne_pci_dev = NULL;
+
+   if (WARN_ON(!pdev))
+   return -EINVAL;
+
+   ne_pci_dev = pci_get_drvdata(pdev);
+   if (WARN_ON(!ne_pci_dev) || WARN_ON(!ne_pci_dev->iomem_base))
+   return -EINVAL;
+
+   iowrite16(NE_VERSION_MAX, ne_pci_dev->iomem_base + NE_VERSION);
+
+   dev_version_reply = ioread16(ne_pci_dev->iomem_base + NE_VERSION);
+   if (dev_version_reply != NE_VERSION_MAX) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in pci dev version cmd\n");
+
+   return -EIO;
+   }
+
+   iowrite8(NE_ENABLE_ON, ne_pci_dev->iomem_base + NE_ENABLE);
+
+   dev_enable_reply = ioread8(ne_pci_dev->iomem_base + NE_ENABLE);
+   if (dev_enable_reply != NE_ENABLE_ON) {
+   dev_err_ratelimited(&pdev->dev,
+   NE "Error in pci dev enable cmd\n");
+
+   return -EIO;
+   }
+
+   return 0;
+}
+
+/**
+ * ne_pci_dev_disable - Disable PCI device.
+ *
+ * @pdev: PCI device to dis

[PATCH v2 00/18] Add support for Nitro Enclaves

2020-05-21 Thread Andra Paraschiv

Nitro Enclaves (NE) is a new Amazon Elastic Compute Cloud (EC2) capability
that allows customers to carve out isolated compute environments within EC2
instances [1].

For example, an application that processes sensitive data and runs in a VM,
can be separated from other applications running in the same VM. This
application then runs in a separate VM than the primary VM, namely an enclave.

An enclave runs alongside the VM that spawned it. This setup matches low latency
applications needs. The resources that are allocated for the enclave, such as
memory and CPU, are carved out of the primary VM. Each enclave is mapped to a
process running in the primary VM, that communicates with the NE driver via an
ioctl interface.

In this sense, there are two components:

1. An enclave abstraction process - a user space process running in the primary
VM guest  that uses the provided ioctl interface of the NE driver to spawn an
enclave VM (that's 2 below).

How does all gets to an enclave VM running on the host?

There is a NE emulated PCI device exposed to the primary VM. The driver for this
new PCI device is included in the current patch series.

The ioctl logic is mapped to PCI device commands e.g. the NE_START_ENCLAVE ioctl
maps to an enclave start PCI command or the KVM_SET_USER_MEMORY_REGION maps to
an add memory PCI command. The PCI device commands are then translated into
actions taken on the hypervisor side; that's the Nitro hypervisor running on the
host where the primary VM is running. The Nitro hypervisor is based on core KVM
technology.

2. The enclave itself - a VM running on the same host as the primary VM that
spawned it. Memory and CPUs are carved out of the primary VM and are dedicated
for the enclave VM. An enclave does not have persistent storage attached.

An enclave communicates with the primary VM via a local communication channel,
using virtio-vsock [2]. The primary VM has virtio-pci vsock emulated device,
while the enclave VM has a virtio-mmio vsock emulated device. The vsock device
uses eventfd for signaling. The enclave VM sees the usual interfaces - local
APIC and IOAPIC - to get interrupts from virtio-vsock device. The virtio-mmio
device is placed in memory below the typical 4 GiB.

The application that runs in the enclave needs to be packaged in an enclave
image together with the OS ( e.g. kernel, ramdisk, init ) that will run in the
enclave VM. The enclave VM has its own kernel and follows the standard Linux
boot protocol.

The kernel bzImage, the kernel command line, the ramdisk(s) are part of the
Enclave Image Format (EIF); plus an EIF header including metadata such as magic
number, eif version, image size and CRC.

Hash values are computed for the entire enclave image (EIF), the kernel and
ramdisk(s). That's used, for example, to check that the enclave image that is
loaded in the enclave VM is the one that was intended to be run.

These crypto measurements are included in a signed attestation document
generated by the Nitro Hypervisor and further used to prove the identity of the
enclave; KMS is an example of service that NE is integrated with and that checks
the attestation doc.

The enclave image (EIF) is loaded in the enclave memory at offset 8 MiB. The
init process in the enclave connects to the vsock CID of the primary VM and a
predefined port - 9000 - to send a heartbeat value - 0xb7. This mechanism is
used to check in the primary VM that the enclave has booted.

If the enclave VM crashes or gracefully exits, an interrupt event is received by
the NE driver. This event is sent further to the user space enclave process
running in the primary VM via a poll notification mechanism. Then the user space
enclave process can exit.

The following patch series covers the NE driver for enclave lifetime management.
It provides an ioctl interface to the user space and includes the NE PCI device
driver that is the means of communication with the hypervisor running on the
host where the primary VM and the enclave are launched.

The proposed solution is following the KVM model and uses KVM ioctls to be able
to create and set resources for enclaves. Additional NE ioctl commands, besides
the ones provided by KVM, are used to start an enclave and get memory offset for
in-memory enclave image loading.

Thank you.

Andra

[1] https://aws.amazon.com/ec2/nitro/nitro-enclaves/
[2] http://man7.org/linux/man-pages/man7/vsock.7.html

---

Patch Series Changelog

The patch series is built on top of v5.7-rc6.

v1 -> v2

* Rebase on top of v5.7-rc6.
* Adapt codebase based on feedback from v1.
* Update ioctl number definition - major and minor.
* Add sample / documentation for the ioctl interface basic flow usage.
* Update cover letter to include more context on the NE overall.
* Add fix for the enclave / vcpu fd creation error cleanup path.
* Add fix reported by kbuild test robot .
* v1: https://lore.kernel.org/lkml/20200421184150.68011-1-andra...@amazon.com/

---

Andra Paraschiv (18):
  nitro_enclaves: Add ioc

Re: [PATCH 5/5] dt-bindings: timer: Add CLINT bindings

2020-05-21 Thread Sean Anderson

On 5/22/20 1:54 AM, Anup Patel wrote:
> On Fri, May 22, 2020 at 1:35 AM Sean Anderson  wrote:
>>
>> On 5/21/20 9:45 AM, Anup Patel wrote:
>>> +Required properties:
>>> +- compatible : "sifive,clint-1.0.0" and a string identifying the actual
>>> +  detailed implementation in case that specific bugs need to be worked 
>>> around.
>>
>> Should the "riscv,clint0" compatible string be documented here? This
> 
> Yes, I forgot to add this compatible string. I will add in v2.
> 
>> peripheral is not really specific to sifive, as it is present in most
>> rocket-chip cores.
> 
> I agree that CLINT is present in a lot of non-SiFive RISC-V SOCs and
> FPGAs but this IP is only documented as part of SiFive FU540 SOC.
> (Refer, https://static.dev.sifive.com/FU540-C000-v1.0.pdf)
> 
> The RISC-V foundation should host the CLINT spec independently
> under https://github.com/riscv and make CLINT spec totally open.
> 
> For now, I have documented it just like PLIC DT bindings found at:
> Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.txt

The PLIC seems to have its own RISC-V-sponsored documentation [1] which
was split off from the older privileged specs. By your logic above,
should it be renamed to riscv,plic0.txt (with a corresponding change in
the documented compatible strings)?

[1] https://github.com/riscv/riscv-plic-spec

> 
> If RISC-V maintainers agree then I will document it as "RISC-V CLINT".
> 
> @Palmer ?? @Paul ??
> 
> Regards,
> Anup
> 

--Sean

Re: [PATCH] iommu/dma: limit iova free size to unmmaped iova

2020-05-21 Thread guptap


On 2020-05-22 01:46, Robin Murphy wrote:

On 2020-05-21 12:30, Prakash Gupta wrote:
Limit the iova size while freeing based on unmapped size. In absence 
of
this even with unmap failure, invalid iova is pushed to iova rcache 
and

subsequently can cause panic while rcache magazine is freed.


Can you elaborate on that panic?


We have seen couple of stability issues around this.
Below is one such example:

kernel BUG at kernel/msm-4.19/drivers/iommu/iova.c:904!
iova_magazine_free_pfns
iova_rcache_insert
free_iova_fast
__iommu_unmap_page
iommu_dma_unmap_page


It turned out an iova pfn 0 got into iova_rcache. One possibility I see 
is
where client unmap with invalid dma_addr. The unmap call will fail and 
warn on
and still try to free iova. This will cause invalid pfn to be inserted 
into

rcache. As and when the magazine with invalid pfn will be freed
private_find_iova() will return NULL for invalid iova and meet bug 
condition.



Signed-off-by: Prakash Gupta 

:100644 100644 4959f5df21bd 098f7d377e04 M  drivers/iommu/dma-iommu.c

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 4959f5df21bd..098f7d377e04 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -472,7 +472,8 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,

if (!cookie->fq_domain)
iommu_tlb_sync(domain, &iotlb_gather);
-   iommu_dma_free_iova(cookie, dma_addr, size);
+   if (unmapped)
+   iommu_dma_free_iova(cookie, dma_addr, unmapped);


Frankly, if any part of the unmap fails then things have gone
catastrophically wrong already, but either way this isn't right. The
IOVA API doesn't support partial freeing - an IOVA *must* be freed
with its original size, or not freed at all, otherwise it will corrupt
the state of the rcaches and risk a cascade of further misbehaviour
for future callers.


I agree, we shouldn't be freeing the partial iova. Instead just making
sure if unmap was successful should be sufficient before freeing iova. 
So change

can instead be something like this:

-   iommu_dma_free_iova(cookie, dma_addr, size);
+   if (unmapped)
+   iommu_dma_free_iova(cookie, dma_addr, size);


TBH my gut feeling here is that you're really just trying to treat a
symptom of another bug elsewhere, namely some driver calling
dma_unmap_* or dma_free_* with the wrong address or size in the first
place.

This condition would arise only if driver calling dma_unmap/free_* with 
0
iova_pfn. This will be flagged with a warning during unmap but will 
trigger
panic later on while doing unrelated dma_map/unmap_*. If unmapped has 
already
failed for invalid iova, there is no reason we should consider this as 
valid

iova and free. This part should be fixed.

On 2020-05-22 00:19, Andrew Morton wrote:

I think we need a cc:stable here?


Added now.

Thanks,
Prakash

[PATCH] mm: swapfile: fix /proc/swaps heading and Size/Used/Priority alignment

2020-05-21 Thread Randy Dunlap

From: Randy Dunlap 

Fix the heading and Size/Used/Priority field alignments in
/proc/swaps. If the Size and/or Used value is >= 1000 (8 bytes),
then the alignment by using tab characters is broken.

This patch maintains the use of tabs for alignment. If spaces
are preferred, we can just use a Field Width specifier for the
bytes and inuse fields. That way those fields don't have to be
a multiple of 8 bytes in width. E.g., with a field width of 12,
both Size and Used would always fit on the first line of an
80-column wide terminal (only Priority would be on the second line).

There are actually 2 problems: heading alignment and field width.
On an xterm, if Used is 7 bytes in length, the tab does nothing, and
the display is like this, with no space/tab between the Used and
Priority fields. (ugh)

FilenameTypeSizeUsedPriority
/dev/sda8   partition   16779260
2023012-1

To be clear, if one does 'cat /proc/swaps >/tmp/proc.swaps', it does
look different, like so:

FilenameTypeSizeUsedPriority
/dev/sda8   partition   167792602086988 
-1

Signed-off-by: Randy Dunlap 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: Alexander Viro 
Cc: Andrew Morton 
---
 mm/swapfile.c |   12 
 1 file changed, 8 insertions(+), 4 deletions(-)

--- linux-next-20200521.orig/mm/swapfile.c
+++ linux-next-20200521/mm/swapfile.c
@@ -2753,20 +2753,24 @@ static int swap_show(struct seq_file *sw
struct swap_info_struct *si = v;
struct file *file;
int len;
+   unsigned int bytes, inuse;
 
if (si == SEQ_START_TOKEN) {
-   seq_puts(swap,"Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n");
+   
seq_puts(swap,"Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n");
return 0;
}
 
+   bytes = si->pages << (PAGE_SHIFT - 10);
+   inuse = si->inuse_pages << (PAGE_SHIFT - 10);
+
file = si->swap_file;
len = seq_file_path(swap, file, " \t\n\\");
-   seq_printf(swap, "%*s%s\t%u\t%u\t%d\n",
+   seq_printf(swap, "%*s%s\t%u\t%s%u\t%s%d\n",
len < 40 ? 40 - len : 1, " ",
S_ISBLK(file_inode(file)->i_mode) ?
"partition" : "file\t",
-   si->pages << (PAGE_SHIFT - 10),
-   si->inuse_pages << (PAGE_SHIFT - 10),
+   bytes, bytes < 1000 ? "\t" : "",
+   inuse, inuse < 1000 ? "\t" : "",
si->prio);
return 0;
 }

Re: [PATCH v1] usb: musb: dsps: set MUSB_DA8XX quirk for AM335x

2020-05-21 Thread Oleksij Rempel

On Wed, May 20, 2020 at 09:55:05AM -0500, Bin Liu wrote:
> On Wed, May 20, 2020 at 06:49:34AM +0200, Oleksij Rempel wrote:
> > On Tue, May 19, 2020 at 05:18:51PM -0500, Bin Liu wrote:
> > > Hi,
> > > 
> > > On Fri, Mar 27, 2020 at 06:38:49AM +0100, Oleksij Rempel wrote:
> > > > Beagle Bone Black has different memory corruptions if kernel is
> > > > configured with USB_TI_CPPI41_DMA=y. This issue is reproducible with
> > > > ath9k-htc driver (ar9271 based wifi usb controller):
> > > > 
> > > > root@AccessBox:~ iw dev wlan0 set monitor  fcsfail otherbss
> > > > root@AccessBox:~ ip l s dev wlan0 up
> > > > kmemleak: Cannot insert 0xda577e40 into the object search tree 
> > > > (overlaps existing)
> > > > CPU: 0 PID: 176 Comm: ip Not tainted 5.5.0 #7
> > > > Hardware name: Generic AM33XX (Flattened Device Tree)
> > > > [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
> > > > [] (show_stack) from [] (dump_stack+0x84/0x98)
> > > > [] (dump_stack) from [] (create_object+0x2f8/0x324)
> > > > [] (create_object) from [] 
> > > > (kmem_cache_alloc+0x1a8/0x39c)
> > > > [] (kmem_cache_alloc) from [] 
> > > > (__alloc_skb+0x60/0x174)
> > > > [] (__alloc_skb) from [] (ath9k_wmi_cmd+0x50/0x184 
> > > > [ath9k_htc])
> > > > [] (ath9k_wmi_cmd [ath9k_htc]) from [] 
> > > > (ath9k_regwrite_multi+0x54/0x84 [ath9k_htc])
> > > > [] (ath9k_regwrite_multi [ath9k_htc]) from [] 
> > > > (ath9k_regwrite+0xf0/0xfc [ath9k_htc])
> > > > [] (ath9k_regwrite [ath9k_htc]) from [] 
> > > > (ar5008_hw_process_ini+0x280/0x6c0 [ath9k_hw])
> > > > [] (ar5008_hw_process_ini [ath9k_hw]) from [] 
> > > > (ath9k_hw_reset+0x270/0x1458 [ath9k_hw])
> > > > [] (ath9k_hw_reset [ath9k_hw]) from [] 
> > > > (ath9k_htc_start+0xb0/0x22c [ath9k_htc])
> > > > [] (ath9k_htc_start [ath9k_htc]) from [] 
> > > > (drv_start+0x4c/0x1e8 [mac80211])
> > > > [] (drv_start [mac80211]) from [] 
> > > > (ieee80211_do_open+0x480/0x954 [mac80211])
> > > > [] (ieee80211_do_open [mac80211]) from [] 
> > > > (__dev_open+0xdc/0x160)
> > > > [] (__dev_open) from [] 
> > > > (__dev_change_flags+0x1a4/0x204)
> > > > [] (__dev_change_flags) from [] 
> > > > (dev_change_flags+0x20/0x50)
> > > > [] (dev_change_flags) from [] 
> > > > (do_setlink+0x2ac/0x978)
> > > > 
> > > > After applying this patch, the system is running in monitor mode without
> > > > noticeable issues.
> > > > 
> > > > Suggested-by: Michael Grzeschik 
> > > > Signed-off-by: Oleksij Rempel 
> > > > ---
> > > >  drivers/usb/musb/musb_dsps.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/usb/musb/musb_dsps.c b/drivers/usb/musb/musb_dsps.c
> > > > index 88923175f71e..c01f9e9e69f5 100644
> > > > --- a/drivers/usb/musb/musb_dsps.c
> > > > +++ b/drivers/usb/musb/musb_dsps.c
> > > > @@ -690,7 +690,7 @@ static void dsps_dma_controller_resume(struct 
> > > > dsps_glue *glue) {}
> > > >  #endif /* CONFIG_USB_TI_CPPI41_DMA */
> > > >  
> > > >  static struct musb_platform_ops dsps_ops = {
> > > > -   .quirks = MUSB_DMA_CPPI41 | MUSB_INDEXED_EP,
> > > > +   .quirks = MUSB_DMA_CPPI41 | MUSB_INDEXED_EP | 
> > > > MUSB_DA8XX,
> > > 
> > > The MUSB_DA8XX flag cannot be simply applied to MUSB_DSPS, at least the
> > > teardown and autoreq register offsets are different as show in
> > > cppi41_dma_controller_create().
> > 
> > ok
> > 
> > > Do you understand what exactly caused the issue?
> > 
> > No.
> > 
> > Disabling DMA support "solve" this issue as well.
> > 
> > Beside, with DMA support, there remains one more crash with different 
> > symptoms.
> > I can workaround it by disabling CPU Freq governor, or setting it to 
> > performance.
> > 
> > > The kernel trace above doesn't provide enuough information.
> > 
> > Do you have any suggestions how to instrument the kernel to get needed
> > information? Or, should I try to capture USB traffic before the crash? 
> 
> First can you please try the following patch instead?
> 
> diff --git a/drivers/usb/musb/musb_cppi41.c b/drivers/usb/musb/musb_cppi41.c
> index 7fbb8a307145..26c996f8b2bd 100644
> --- a/drivers/usb/musb/musb_cppi41.c
> +++ b/drivers/usb/musb/musb_cppi41.c
> @@ -614,7 +614,6 @@ static int cppi41_dma_channel_abort(struct dma_channel 
> *channel)
> }
> 
> /* DA8xx Advisory 2.3.27: wait 250 ms before to start the teardown */
> -   if (musb->ops->quirks & MUSB_DA8XX)
> mdelay(250);
> 
> tdbit = 1 << cppi41_channel->port_num;

The setup is running, it may take some time until it is reproduced.
Reproduce ability seems to depend on the traffic in the air. Since
currently most people are on vocation, there is not many things happen
on WiFi. 

> > 
> > If it helps, ath9k_htc is a usb wifi adapter. It generates a lot of
> > USB traffic on multiple endpoints. Bulk with data packets and Interrupt
> > with register accesses, LED blinking... etc.
> 
> Do you have a link to the picture or description of the adapter? I'd like
> to see if I can buy the same to tak

linux-next: manual merge of the tip tree with the arm64 tree

2020-05-21 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the tip tree got a conflict in:

  kernel/Makefile

between commit:

  d08b9f0ca660 ("scs: Add support for Clang's Shadow Call Stack (SCS)")

from the arm64 tree and commit:

  dfd402a4c4ba ("kcsan: Add Kernel Concurrency Sanitizer infrastructure")

from the tip tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc kernel/Makefile
index c332eb9d4841,5d935b63f812..
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@@ -103,7 -107,7 +107,8 @@@ obj-$(CONFIG_TRACEPOINTS) += trace
  obj-$(CONFIG_IRQ_WORK) += irq_work.o
  obj-$(CONFIG_CPU_PM) += cpu_pm.o
  obj-$(CONFIG_BPF) += bpf/
 +obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
+ obj-$(CONFIG_KCSAN) += kcsan/
  
  obj-$(CONFIG_PERF_EVENTS) += events/
  


pgpOHFCgl4aac.pgp
Description: OpenPGP digital signature

Re: Re: [PATCH] [v2] PCI: tegra194: Fix runtime PM imbalance on error

2020-05-21 Thread dinghao . liu


> On Thu, May 21, 2020 at 5:16 PM Bjorn Helgaas  wrote:
> >
> > [+cc Rafael, linux-pm]
> >
> > On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> > > pm_runtime_get_sync() increments the runtime PM usage counter even
> > > when it returns an error code. Thus a pairing decrement is needed on
> > > the error handling path to keep the counter balanced.
> >
> > I didn't realize there were so many drivers with the exact same issue.
> > Can we just squash these all into a single patch so we can see them
> > all together?
> >
> > Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
> > looks like many of them have similar issues, i.e., they have a pattern
> > like this
> >
> >   ret = pm_runtime_get_sync(dev);
> >   if (ret < 0)
> > return;
> >
> >   pm_runtime_put(dev);
> >
> > where there is not a pm_runtime_put() to match every
> > pm_runtime_get_sync().  Random sample:
> >
> >   nds32_pmu_reserve_hardware
> >   sata_rcar_probe
> >   exynos_trng_probe
> >   ks_sa_rng_probe
> >   omap_aes_probe
> >   sun8i_ss_probe
> >   omap_aes_probe
> >   zynq_gpio_probe
> >   amdgpu_hwmon_show_power_avg
> >   mtk_crtc_ddp_hw_init
> >   ...
> >
> > Surely I'm missing something and these aren't all broken, right?
> 
> If they do what you've said, they are all broken I'm afraid.
> 
> They should all be doing something like
> 
> ret = pm_runtime_get_sync(dev);
> if (ret < 0)
> goto out;
> 
> ...
> 
> out:
> pm_runtime_put(dev);
> 
> > Maybe we could put together a coccinelle script to scan the tree for
> > this issue?
> >
> > > Signed-off-by: Dinghao Liu 
> > > ---
> > >  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
> > > b/drivers/pci/controller/dwc/pcie-tegra194.c
> > > index ae30a2fd3716..2c0d2ce16b47 100644
> > > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > > @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct 
> > > tegra_pcie_dw *pcie)
> > >   ret = pinctrl_pm_select_default_state(dev);
> > >   if (ret < 0) {
> > >   dev_err(dev, "Failed to configure sideband pins: %d\n", 
> > > ret);
> > > - goto fail_pinctrl;
> > > + goto fail_pm_get_sync;
> > >   }
> > >
> > >   tegra_pcie_init_controller(pcie);
> > > @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct 
> > > tegra_pcie_dw *pcie)
> > >
> > >  fail_host_init:
> > >   tegra_pcie_deinit_controller(pcie);
> > > -fail_pinctrl:
> > > - pm_runtime_put_sync(dev);
> > >  fail_pm_get_sync:
> > > + pm_runtime_put_sync(dev);
> 
> Why not pm_runtime_put()?

Good question. For functions with PM decrement API somewhere, I 
will adopt it. If this API is not suitable here, please tell me.

> 
> > >   pm_runtime_disable(dev);
> > >   return ret;
> > >  }
> > > --
> > > 2.17.1
> > >

Re: [PATCH v8 5/5] dt-bindings: arm: fsl: add different Protonic boards

2020-05-21 Thread Oleksij Rempel

On Thu, May 21, 2020 at 02:00:02PM -0600, r...@kernel.org wrote:
> On Wed, 20 May 2020 17:41:16 +0200, Oleksij Rempel wrote:
> > Add Protonic PRTI6Q, WD2, RVT, VT7 boards.
> > 
> > Signed-off-by: Oleksij Rempel 
> > ---
> >  Documentation/devicetree/bindings/arm/fsl.yaml | 4 
> >  1 file changed, 4 insertions(+)
> > 
> 
> 
> Please add Acked-by/Reviewed-by tags when posting new versions. However,
> there's no need to repost patches *only* to add the tags. The upstream
> maintainer will do that for acks received on the version they apply.
> 
> If a tag was not added on purpose, please state why and what changed.

Sorry, there is no special reason. I just missed it.

Regards,
Oleksij
-- 
Pengutronix e.K.   | |
Steuerwalder Str. 21   | http://www.pengutronix.de/  |
31137 Hildesheim, Germany  | Phone: +49-5121-206917-0|
Amtsgericht Hildesheim, HRA 2686   | Fax:   +49-5121-206917- |


signature.asc
Description: PGP signature

linux-next: manual merge of the devicetree tree with the arm-soc tree

2020-05-21 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/arm/socionext/uniphier.yaml

between commit:

  82ab9b6705bd ("dt-bindings: arm: Add Akebi96 board support")

from the arm-soc tree and commit:

  9f60a65bc5e6 ("dt-bindings: Clean-up schema indentation formatting")

from the devicetree tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/devicetree/bindings/arm/socionext/uniphier.yaml
index 10a7f0752281,113f93b9ae55..
--- a/Documentation/devicetree/bindings/arm/socionext/uniphier.yaml
+++ b/Documentation/devicetree/bindings/arm/socionext/uniphier.yaml
@@@ -51,9 -51,8 +51,9 @@@ properties
- description: LD20 SoC boards
  items:
- enum:
- - socionext,uniphier-ld20-akebi96
- - socionext,uniphier-ld20-global
- - socionext,uniphier-ld20-ref
++  - socionext,uniphier-ld20-akebi96
+   - socionext,uniphier-ld20-global
+   - socionext,uniphier-ld20-ref
- const: socionext,uniphier-ld20
- description: PXs3 SoC boards
  items:


pgpHbLi0OKGn9.pgp
Description: OpenPGP digital signature

Re: [PATCH v2] 9p/xen: increase XEN_9PFS_RING_ORDER

2020-05-21 Thread Dominique Martinet

Stefano Stabellini wrote on Thu, May 21, 2020:
> From: Stefano Stabellini 
> 
> Increase XEN_9PFS_RING_ORDER to 9 for performance reason. Order 9 is the
> max allowed by the protocol.
> 
> We can't assume that all backends will support order 9. The xenstore
> property max-ring-page-order specifies the max order supported by the
> backend. We'll use max-ring-page-order for the size of the ring.
> 
> This means that the size of the ring is not static
> (XEN_FLEX_RING_SIZE(9)) anymore. Change XEN_9PFS_RING_SIZE to take an
> argument and base the calculation on the order chosen at setup time.
> 
> Finally, modify p9_xen_trans.maxsize to be divided by 4 compared to the
> original value. We need to divide it by 2 because we have two rings
> coming off the same order allocation: the in and out rings. This was a
> mistake in the original code. Also divide it further by 2 because we
> don't want a single request/reply to fill up the entire ring. There can
> be multiple requests/replies outstanding at any given time and if we use
> the full ring with one, we risk forcing the backend to wait for the
> client to read back more replies before continuing, which is not
> performant.
> 
> Signed-off-by: Stefano Stabellini 

LGTM, I'll try to find some time to test this by the end of next week or
will trust you if I can't make it -- ping me around June 1st if I don't
reply again until then...

Cheers,
-- 
Dominique

linux-next: manual merge of the devicetree tree with the pci tree

2020-05-21 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/pci/cdns-pcie.yaml

between commit:

  fb5f8f3ca5f8 ("dt-bindings: PCI: cadence: Deprecate inbound/outbound specific 
bindings")

from the pci tree and commit:

  3d21a4609335 ("dt-bindings: Remove cases of 'allOf' containing a '$ref'")

from the devicetree tree.

I fixed it up (the former removed the section modified by the latter,
so I just did that) and can carry the fix as necessary. This is now fixed
as far as linux-next is concerned, but any non trivial conflicts should
be mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpt696c8IvXQ.pgp
Description: OpenPGP digital signature

Re: [PATCH -V2] swap: Reduce lock contention on swap cache from swap slots allocation

2020-05-21 Thread Huang, Ying

Daniel Jordan  writes:

> On Wed, May 20, 2020 at 11:15:02AM +0800, Huang Ying wrote:
>> @@ -2827,6 +2865,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  p = kvzalloc(struct_size(p, avail_lists, nr_node_ids), GFP_KERNEL);
>>  if (!p)
>>  return ERR_PTR(-ENOMEM);
>> +p->cluster_next_cpu = alloc_percpu(unsigned int);
>> +if (!p->cluster_next_cpu) {
>> +kvfree(p);
>> +return ERR_PTR(-ENOMEM);
>> +}
>
> There should be free_percpu()s at two places after this, but I think the
> allocation really belongs right...
>
>> @@ -3202,7 +3245,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, 
>> specialfile, int, swap_flags)
>>   * select a random position to start with to help wear leveling
>>   * SSD
>>   */
>> -p->cluster_next = 1 + prandom_u32_max(p->highest_bit);
>
> ...here because then it's only allocated when it's actually used.

Good catch!  And yes, this is the better place to allocate memory.  I
will fix this in the new version!  Thanks a lot!

Best Regards,
Huang, Ying

>> +for_each_possible_cpu(cpu) {
>> +per_cpu(*p->cluster_next_cpu, cpu) =
>> +1 + prandom_u32_max(p->highest_bit);
>> +}
>>  nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
>>  
>>  cluster_info = kvcalloc(nr_cluster, sizeof(*cluster_info),
>> -- 
>> 2.26.2
>> 
>>

[PATCH mmotm] mm/swap: fix livelock in __read_swap_cache_async()

2020-05-21 Thread Hugh Dickins

I've only seen this livelock on one machine (repeatably, but not to
order), and not fully analyzed it - two processes seen looping around
getting -EEXIST from swapcache_prepare(), I guess a third (at lower
priority? but wanting the same cpu as one of the loopers? preemption
or cond_resched() not enough to let it back in?) set SWAP_HAS_CACHE,
then went off into direct reclaim, scheduled away, and somehow could
not get back to add the page to swap cache and let them all complete.

Restore the page allocation in __read_swap_cache_async() to before
the swapcache_prepare() call: "mm: memcontrol: charge swapin pages
on instantiation" moved it outside the loop, which indeed looks much
nicer, but exposed this weakness.  We used to allocate new_page once
and then keep it across all iterations of the loop: but I think that
just optimizes for a rare case, and complicates the flow, so go with
the new simpler structure, with allocate+free each time around (which
is more considerate use of the memory too).

Fix the comment on the looping case, which has long been inaccurate:
it's not a racing get_swap_page() that's the problem here.

Fix the add_to_swap_cache() and mem_cgroup_charge() error recovery:
not swap_free(), but put_swap_page() to undo SWAP_HAS_CACHE, as was
done before; but delete_from_swap_cache() already includes it.

And one more nit: I don't think it makes any difference in practice,
but remove the "& GFP_KERNEL" mask from the mem_cgroup_charge() call:
add_to_swap_cache() needs that, to convert gfp_mask from user and page
cache allocation (e.g. highmem) to radix node allocation (lowmem), but
we don't need or usually apply that mask when charging mem_cgroup.

Signed-off-by: Hugh Dickins 
---
Mostly fixing mm-memcontrol-charge-swapin-pages-on-instantiation.patch
but now I see that mm-memcontrol-delete-unused-lrucare-handling.patch
made a further change here (took an arg off the mem_cgroup_charge call):
as is, this patch is diffed to go on top of both of them, and better
that I get it out now for Johannes look at; but could be rediffed for
folding into blah-instantiation.patch later.

Earlier in the day I promised two patches to __read_swap_cache_async(),
but find now that I cannot quite justify the second patch: it makes a
slight adjustment in swapcache_prepare(), then removes the redundant
__swp_swapcount() && swap_slot_cache_enabled business from blah_async().

I'd still like to do that, but this patch here brings back the
alloc_page_vma() in between them, and I don't have any evidence to
reassure us that I'm not then pessimizing a readahead case by doing
unnecessary allocation and free. Leave it for some other time perhaps.

 mm/swap_state.c |   52 +-
 1 file changed, 29 insertions(+), 23 deletions(-)

--- 5.7-rc6-mm1/mm/swap_state.c 2020-05-20 12:21:56.149694170 -0700
+++ linux/mm/swap_state.c   2020-05-21 20:17:50.188773901 -0700
@@ -392,56 +392,62 @@ struct page *__read_swap_cache_async(swp
return NULL;
 
/*
+* Get a new page to read into from swap.  Allocate it now,
+* before marking swap_map SWAP_HAS_CACHE, when -EEXIST will
+* cause any racers to loop around until we add it to cache.
+*/
+   page = alloc_page_vma(gfp_mask, vma, addr);
+   if (!page)
+   return NULL;
+
+   /*
 * Swap entry may have been freed since our caller observed it.
 */
err = swapcache_prepare(entry);
if (!err)
break;
 
-   if (err == -EEXIST) {
-   /*
-* We might race against get_swap_page() and stumble
-* across a SWAP_HAS_CACHE swap_map entry whose page
-* has not been brought into the swapcache yet.
-*/
-   cond_resched();
-   continue;
-   }
+   put_page(page);
+   if (err != -EEXIST)
+   return NULL;
 
-   return NULL;
+   /*
+* We might race against __delete_from_swap_cache(), and
+* stumble across a swap_map entry whose SWAP_HAS_CACHE
+* has not yet been cleared.  Or race against another
+* __read_swap_cache_async(), which has set SWAP_HAS_CACHE
+* in swap_map, but not yet added its page to swap cache.
+*/
+   cond_resched();
}
 
/*
-* The swap entry is ours to swap in. Prepare a new page.
+* The swap entry is ours to swap in. Prepare the new page.
 */
 
-   page = alloc_page_vma(gfp_mask, vma, addr);
-   if (!page)
-   goto fail_free;
-
__SetPageLocked(page);
__SetPageSwapBacked(page);
 
/* May fa

[PATCH] capabilities: Introduce CAP_RESTORE

2020-05-21 Thread Adrian Reber

This enables CRIU to checkpoint and restore a process as non-root.

Over the last years CRIU upstream has been asked a couple of time if it
is possible to checkpoint and restore a process as non-root. The answer
usually was: 'almost'.

The main blocker to restore a process was that selecting the PID of the
restored process, which is necessary for CRIU, is guarded by CAP_SYS_ADMIN.

In the last two years the questions about checkpoint/restore as non-root
have increased and especially in the last few months we have seen
multiple people inventing workarounds.

The use-cases so far and their workarounds:

 * Checkpoint/Restore in an HPC environment in combination with
   a resource manager distributing jobs. Users are always running
   as non root, but there was the desire to provide a way to
   checkpoint and restore long running jobs.
   Workaround: setuid wrapper to start CRIU as root as non-root
   
https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
 * Another use case to checkpoint/restore processes as non-root
   uses as workaround a non privileged process which cycles through
   PIDs by calling fork() as fast as possible with a rate of
   100,000 pids/s instead of writing to ns_last_pid
   https://github.com/twosigma/set_ns_last_pid
 * Fast Java startup using checkpoint/restore.
   We have been in contact with JVM developers who are integrating
   CRIU into a JVM to decrease the startup time.
   Workaround so far: patch out CAP_SYS_ADMIN checks in the kernel
 * Container migration as non root. There are people already
   using CRIU to migrate containers as non-root. The solution
   there is to run it in a user namespace. So if you are able
   to carefully setup your environment with the namespaces
   it is already possible to restore a container/process as non-root.
   Unfortunately it is not always possible to setup an environment
   in such a way and for easier access to non-root based container
   migration this patch is also required.

There are probably a few more things guarded by CAP_SYS_ADMIN required
to run checkpoint/restore as non-root, but by applying this patch I can
already checkpoint and restore processes as non-root. As there are
already multiple workarounds I would prefer to do it correctly in the
kernel to avoid that CRIU users are starting to invent more workarounds.

I have used the following tests to verify that this change works as
expected by setting the new capability CAP_RESTORE on the two resulting
test binaries:

$ cat ns_last_pid.c
 // http://efiop-notes.blogspot.com/2014/06/how-to-set-pid-using-nslastpid.html
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 

int main(int argc, char *argv[])
{
pid_t pid, new_pid;
char buf[32];
int fd;

if (argc != 2)
return 1;

printf("Opening ns_last_pid...\n");
fd = open("/proc/sys/kernel/ns_last_pid", O_RDWR | O_CREAT, 0644);
if (fd < 0) {
perror("Cannot open ns_last_pid");
return 1;
}

printf("Locking ns_last_pid...\n");
if (flock(fd, LOCK_EX)) {
close(fd);
printf("Cannot lock ns_last_pid\n");
return 1;
}

pid = atoi(argv[1]);
snprintf(buf, sizeof(buf), "%d", pid - 1);
printf("Writing pid-1 to ns_last_pid...\n");
if (write(fd, buf, strlen(buf)) != strlen(buf)) {
printf("Cannot write to buf\n");
return 1;
}

printf("Forking...\n");
new_pid = fork();
if (new_pid == 0) {
printf("I am the child!\n");
exit(0);
} else if (new_pid == pid)
printf("I am the parent. My child got the pid %d!\n", new_pid);
else
printf("pid (%d) does not match expected pid (%d)\n", new_pid, 
pid);

printf("Cleaning up...\n");
if (flock(fd, LOCK_UN))
printf("Cannot unlock\n");
close(fd);
return 0;
}
$ id -u; /home/libcap/ns_last_pid 30
1001
Opening ns_last_pid...
Locking ns_last_pid...
Writing pid-1 to ns_last_pid...
Forking...
I am the parent. My child got the pid 30!
I am the child!
Cleaning up...

For the clone3() based approach:
$ cat clone3_set_tid.c
 #define _GNU_SOURCE
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 

 #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))

int main(int argc, char *argv[])
{
struct clone_args c_args = { };
pid_t pid, new_pid;

if (argc != 2)
return 1;

pid = atoi(argv[1]);
c_args.set_tid = ptr_to_u64(&pid);
c_args.set_tid_size = 1;

printf("Forking...\n");
new_pid = syscall(__NR_clone3, &c_args, sizeof(c_args));
if (new_pid == 0) {
printf("I am the child!\n");
exit(0);
} else if

Re: [PATCH 5/5] dt-bindings: timer: Add CLINT bindings

2020-05-21 Thread Anup Patel

On Fri, May 22, 2020 at 1:35 AM Sean Anderson  wrote:
>
> On 5/21/20 9:45 AM, Anup Patel wrote:
> > We add DT bindings documentation for CLINT device.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  .../bindings/timer/sifive,clint.txt   | 33 +++
> >  1 file changed, 33 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/timer/sifive,clint.txt
> >
> > diff --git a/Documentation/devicetree/bindings/timer/sifive,clint.txt 
> > b/Documentation/devicetree/bindings/timer/sifive,clint.txt
> > new file mode 100644
> > index ..cae2dad1223a
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/timer/sifive,clint.txt
> > @@ -0,0 +1,33 @@
> > +SiFive Core Local Interruptor (CLINT)
> > +-
> > +
> > +SiFive (and other RISC-V) SOCs include an implementation of the SiFive Core
> > +Local Interruptor (CLINT) for M-mode timer and inter-processor interrupts.
> > +
> > +It directly connects to the timer and inter-processor interrupt lines of
> > +various HARTs (or CPUs) so RISC-V per-HART (or per-CPU) local interrupt
> > +controller is the parent interrupt controller for CLINT device.
> > +
> > +The clock frequency of CLINT is specified via "timebase-frequency" DT
> > +property of "/cpus" DT node. The "timebase-frequency" DT property is
> > +described in: Documentation/devicetree/bindings/riscv/cpus.yaml
> > +
> > +Required properties:
> > +- compatible : "sifive,clint-1.0.0" and a string identifying the actual
> > +  detailed implementation in case that specific bugs need to be worked 
> > around.
>
> Should the "riscv,clint0" compatible string be documented here? This

Yes, I forgot to add this compatible string. I will add in v2.

> peripheral is not really specific to sifive, as it is present in most
> rocket-chip cores.

I agree that CLINT is present in a lot of non-SiFive RISC-V SOCs and
FPGAs but this IP is only documented as part of SiFive FU540 SOC.
(Refer, https://static.dev.sifive.com/FU540-C000-v1.0.pdf)

The RISC-V foundation should host the CLINT spec independently
under https://github.com/riscv and make CLINT spec totally open.

For now, I have documented it just like PLIC DT bindings found at:
Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.txt

If RISC-V maintainers agree then I will document it as "RISC-V CLINT".

@Palmer ?? @Paul ??

>
> > +- reg : Should contain 1 register range (address and length).
> > +- interrupts-extended : Specifies which HARTs (or CPUs) are connected to
> > +  the CLINT.  Each node pointed to should be a riscv,cpu-intc node, which
> > +  has a riscv node as parent.
> > +
> > +Example:
> > +
> > + clint@200 {
> > + compatible = "sifive,clint-1.0.0", "sifive,fu540-c000-clint";
> > + interrupts-extended = <
> > + &cpu1-intc 3 &cpu1-intc 7
> > + &cpu2-intc 3 &cpu2-intc 7
> > + &cpu3-intc 3 &cpu3-intc 7
> > + &cpu4-intc 3 &cpu4-intc 7>;
> > + reg = <0x200 0x400>;
> > + };
> >
>
> --Sean

Regards,
Anup

Re: [PATCH 01/19] dt-bindings: PCI: Endpoint: Add DT bindings for PCI EPF NTB Device

2020-05-21 Thread Kishon Vijay Abraham I

Hi RobH,

On 5/14/2020 8:29 PM, Kishon Vijay Abraham I wrote:
> Add device tree schema for PCI endpoint function bus to which
> endpoint function devices should be attached. Then add device tree
> schema for PCI endpoint function device to include bindings thats
> generic to all endpoint functions. Finally add device tree schema
> for PCI endpoint NTB function device by including the generic
> device tree schema for PCIe endpoint function.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  .../bindings/pci/endpoint/pci-epf-bus.yaml| 42 +++
>  .../bindings/pci/endpoint/pci-epf-device.yaml | 69 +++
>  .../bindings/pci/endpoint/pci-epf-ntb.yaml| 68 ++
>  3 files changed, 179 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
>  create mode 100644 
> Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
>  create mode 100644 
> Documentation/devicetree/bindings/pci/endpoint/pci-epf-ntb.yaml
> 
> diff --git a/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml 
> b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
> new file mode 100644
> index ..1c504f2e85e4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
> @@ -0,0 +1,42 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/pci/endpoint/pci-epf-bus.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PCI Endpoint Function Bus
> +
> +maintainers:
> +  - Kishon Vijay Abraham I 
> +
> +properties:
> +  compatible:
> +const: pci-epf-bus
> +
> +patternProperties:
> +  "^func@[0-9a-f]+$":
> +type: object
> +description: |
> +  PCI Endpoint Function Bus node should have subnodes for each of
> +  the implemented endpoint function. It should follow the bindings
> +  specified for endpoint function in
> +  Documentation/devicetree/bindings/pci/endpoint/
> +
> +examples:
> +  - |
> +epf_bus {
> +  compatible = "pci-epf-bus";
> +
> +  func@0 {
> +compatible = "pci-epf-ntb";
> +epcs = <&pcie0_ep>, <&pcie1_ep>;
> +epc-names = "primary", "secondary";
> +reg = <0>;

I'm not sure how to represent "reg" property properly for cases like this where
it represents ID and not a memory resource. I seem to get warning for
"reg_format" even after adding address-cells and size-cells property in
epf_bus. Can you give some hints here please?

> +epf,vendor-id = /bits/ 16 <0x104c>;

I want to make vendor-id and device-id as 16 bits from the beginning at-least
for PCIe endpoint. So I'm prefixing these properties with "epf,". However I get
this "do not match any of the regexes:". Can we add "epf" as a standard prefix?

Thanks
Kishon
> +epf,device-id = /bits/ 16 <0xb00d>;
> +num-mws = <4>;
> +mws-size = <0x0 0x10>, <0x0 0x10>, <0x0 0x10>, <0x0 
> 0x10>;
> +  };
> +};
> +...
> diff --git 
> a/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml 
> b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
> new file mode 100644
> index ..cee72864c8ca
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
> @@ -0,0 +1,69 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/pci/endpoint/pci-epf-device.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PCI Endpoint Function Device
> +
> +maintainers:
> +  - Kishon Vijay Abraham I 
> +
> +properties:
> +  compatible:
> +const: pci-epf-bus
> +
> +properties:
> +  $nodename:
> +pattern: "^func@"
> +
> +  epcs:
> +description:
> +  Phandle to the endpoint controller device. Should have "2" entries for
> +  NTB endpoint function and "1" entry for others.
> +minItems: 1
> +maxItems: 2
> +
> +  epc-names:
> +description:
> +  Must contain an entry for each entry in "epcs" when "epcs" have more 
> than
> +  one entry.
> +
> +  reg:
> +maxItems: 0
> +description: Must contain the index number of the function.
> +
> +  epf,vendor-id:
> +description:
> +  The PCI vendor ID
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint16
> +
> +  epf,device-id:
> +description:
> +  The PCI device ID
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint16
> +
> +  epf,baseclass-code:
> +description: Code to classify the type of operation the function performs
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint8
> +
> +  epf,subclass-code:
> +description:
> +  Specifies a base class sub-class, which identifies more specifically

Re: [PATCH 03/11] mm/hugetlb: introduce alloc_control structure to simplify migration target allocation APIs

2020-05-21 Thread Joonsoo Kim

2020년 5월 22일 (금) 오전 3:57, Mike Kravetz 님이 작성:
>
> On 5/17/20 6:20 PM, js1...@gmail.com wrote:
> > From: Joonsoo Kim 
> >
> > Currently, page allocation functions for migration requires some arguments.
> > More worse, in the following patch, more argument will be needed to unify
> > the similar functions. To simplify them, in this patch, unified data
> > structure that controls allocation behaviour is introduced.
>
> As a followup to Roman's question and your answer about adding a suffix/prefix
> to the new structure.  It 'may' be a bit confusing as alloc_context is already
> defined and *ac is passsed around for page allocations.  Perhaps, this new
> structure could somehow have migrate in the name as it is all about allocating
> migrate targets?

I have considered that but I cannot find appropriate prefix. In hugetlb code,
struct alloc_control is passed to the internal function which is not
fully dedicated
to the migration so 'migrate' would not be appropriate prefix.

alloc_context is used by page allocation core and alloc_control would be used by
outside of it so I think that we can endure it. If there is a good
suggestion, I will change
the name happily.

> >
> > For clean-up, function declarations are re-ordered.
> >
> > Note that, gfp_mask handling on alloc_huge_page_(node|nodemask) is
> > slightly changed, from ASSIGN to OR. It's safe since caller of these
> > functions doesn't pass extra gfp_mask except htlb_alloc_mask().
> >
> > Signed-off-by: Joonsoo Kim 
>
> Patch makes sense.

Thanks!

> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index a298a8c..94d2386 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1526,10 +1526,15 @@ struct page *new_page_nodemask(struct page *page,
> >   unsigned int order = 0;
> >   struct page *new_page = NULL;
> >
> > - if (PageHuge(page))
> > - return alloc_huge_page_nodemask(
> > - page_hstate(compound_head(page)),
> > - preferred_nid, nodemask);
> > + if (PageHuge(page)) {
> > + struct hstate *h = page_hstate(page);
>
> I assume the removal of compound_head(page) was intentional?  Just asking
> because PageHuge will look at head page while page_hstate will not.  So,
> if passed a non-head page things could go bad.

I was thinking that page_hstate() can handle the tail page but it seems that
it's not. Thanks for correction. I will change it on next version.

Thanks.

[PATCH] [v3] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu

When copy_from_user() returns an error code, there
is a runtime PM usage counter imbalance.

Fix this by moving copy_from_user() to the beginning
of this function.

Signed-off-by: Dinghao Liu 
---

Changelog:

v2: - Move copy_from_user() to the beginning rather
  than adding pm_runtime_put_autosuspend().

v3: - Add missing changelog information.
---
 drivers/usb/musb/musb_debugfs.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/musb/musb_debugfs.c b/drivers/usb/musb/musb_debugfs.c
index 7b6281ab62ed..30a89aa8a3e7 100644
--- a/drivers/usb/musb/musb_debugfs.c
+++ b/drivers/usb/musb/musb_debugfs.c
@@ -168,6 +168,11 @@ static ssize_t musb_test_mode_write(struct file *file,
u8  test;
charbuf[24];
 
+   memset(buf, 0x00, sizeof(buf));
+
+   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+   return -EFAULT;
+
pm_runtime_get_sync(musb->controller);
test = musb_readb(musb->mregs, MUSB_TESTMODE);
if (test) {
@@ -176,11 +181,6 @@ static ssize_t musb_test_mode_write(struct file *file,
goto ret;
}
 
-   memset(buf, 0x00, sizeof(buf));
-
-   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
-   return -EFAULT;
-
if (strstarts(buf, "force host full-speed"))
test = MUSB_TEST_FORCE_HOST | MUSB_TEST_FORCE_FS;
 
-- 
2.17.1

Re: [PATCH v2] xfrm: policy: Fix xfrm policy match

2020-05-21 Thread Xin Long

On Fri, May 22, 2020 at 9:45 AM Yuehaibing  wrote:
>
> On 2020/5/21 14:49, Xin Long wrote:
> > On Tue, May 19, 2020 at 4:53 PM Steffen Klassert
> >  wrote:
> >>
> >> On Fri, May 15, 2020 at 04:39:57PM +0800, Yuehaibing wrote:
> >>>
> >>> Friendly ping...
> >>>
> >>> Any plan for this issue?
> >>
> >> There was still no consensus between you and Xin on how
> >> to fix this issue. Once this happens, I consider applying
> >> a fix.
> >>
> > Sorry, Yuehaibing, I can't really accept to do: (A->mark.m & A->mark.v)
> > I'm thinking to change to:
> >
> >  static bool xfrm_policy_mark_match(struct xfrm_policy *policy,
> >struct xfrm_policy *pol)
> >  {
> > -   u32 mark = policy->mark.v & policy->mark.m;
> > -
> > -   if (policy->mark.v == pol->mark.v && policy->mark.m == pol->mark.m)
> > -   return true;
> > -
> > -   if ((mark & pol->mark.m) == pol->mark.v &&
> > -   policy->priority == pol->priority)
> > +   if (policy->mark.v == pol->mark.v &&
> > +   (policy->mark.m == pol->mark.m ||
> > +policy->priority == pol->priority))
> > return true;
> >
> > return false;
> >
> > which means we consider (the same value and mask) or
> > (the same value and priority) as the same one. This will
> > cover both problems.
>
>   policy A (mark.v = 0x1011, mark.m = 0x1011, priority = 1)
>   policy B (mark.v = 0x1001, mark.m = 0x1001, priority = 1)
I'd think these are 2 different policies.

>
>   when fl->flowi_mark == 0x12341011, in xfrm_policy_match() do check like 
> this:
>
> (fl->flowi_mark & pol->mark.m) != pol->mark.v
>
> 0x12341011 & 0x1011 == 0x1011
> 0x12341011 & 0x1001 == 0x1001
>
>  This also match different policy depends on the order of policy inserting.
Yes, this may happen when a user adds 2  policies like that.
But I think this's a problem that the user doesn't configure it well,
'priority' should be set.
and this can not be avoided, also such as:

   policy A (mark.v = 0xff00, mark.m = 0x1000, priority = 1)
   policy B (mark.v = 0x00ff, mark.m = 0x0011, priority = 1)

   try with 0x12341011

So just be it, let users decide.

[PATCH] drm/i915: fix a memory leak bug.

2020-05-21 Thread wu000273

From: Qiushi Wu 

In intel_gtt_setup_scratch_page(), pointer "page" is not released if
pci_dma_mapping_error() return an error, leading to a memory leak bug.
Fix this issue by freeing "page" before return.

Fixes: 0e87d2b06cb46 ("intel-gtt: initialize our own scratch page")
Signed-off-by: Qiushi Wu 
---
 drivers/char/agp/intel-gtt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 66a62d17a3f5..a56a50f2740f 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -304,8 +304,10 @@ static int intel_gtt_setup_scratch_page(void)
if (intel_private.needs_dmar) {
dma_addr = pci_map_page(intel_private.pcidev, page, 0,
PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(intel_private.pcidev, dma_addr))
+   if (pci_dma_mapping_error(intel_private.pcidev, dma_addr)) {
+   __free_page(page);
return -EINVAL;
+   }
 
intel_private.scratch_page_dma = dma_addr;
} else
-- 
2.17.1

Re: Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Greg Kroah-Hartman

On Fri, May 22, 2020 at 01:22:24PM +0800, dinghao@zju.edu.cn wrote:
> Sorry, it's my carelessness. In v1 I added pm_runtime_put_autosuspend()
> after copy_from_user() to fix this problem. Since copy_from_user() is
> moved to the beginning now, we need not to add PM decrement. 

THat's fine, please put that information in a v3 and resend it.

thanks,

greg k-h

[PATCH] dmaengine: dw-axi-dmac: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu

When axi_dma_resume() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep the
counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c 
b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
index 14c1ac26f866..a368d01170f1 100644
--- a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
+++ b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
@@ -924,8 +924,10 @@ static int dw_probe(struct platform_device *pdev)
 */
pm_runtime_get_noresume(chip->dev);
ret = axi_dma_resume(chip);
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put(chip->dev);
goto err_pm_disable;
+   }
 
axi_dma_hw_init(chip);
 
-- 
2.17.1

Offer

2020-05-21 Thread Mazer

I hope you are doing great?
 
This is Felix from Toronto-Canada. I have a lucrative business 
offer that will benefit us both immensely within a very short 
period of time. However, I need your initial approval of interest 
prior to further and complete details regarding the deal.
 
Thanks,
 
Felix.

Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-21 Thread Luis Chamberlain

On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
> On 5/15/20 4:28 PM, Luis Chamberlain wrote:
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> > 
> > Using a taint flag allows us to annotate when this happens clearly.
> 
> I don't fully understand what this is meant to do, so I can't
> fully assess whether it's the right thing to do.

It is meant to taint the kernel to ensure it is clear that something
critically bad has happened with the device firmware, it crashed, and
recovery may or may not happen, we are not 100% certain.
> 
> But in this particular place in the IPA code, the *modem* has
> crashed.  And the IPA driver is not responsible for modem
> firmware, remoteproc is.

Oi vei. So the device it depends on has crashed.

> The IPA driver *can* be responsible for loading some other
> firmware, but even in that case, it only happens on initial
> boot, and it's basically assumed to never crash.

OK is this an issue which we can recover from? If for the slightest bit
this can affect users it is something we should inform them over.

This patch set is missing uevents for these issues, but I just added
support for this.

> So regardless of whether this module_firmware_crashed() call is
> appropriate in some places, I believe it should not be used here.

OK thanks. Can the user be affected by this crash? If so how? Can
we recover ? Is that always guaranteed?

  Luis

[PATCH] net/mlx4_core: fix a memory leak bug.

2020-05-21 Thread wu000273

From: Qiushi Wu 

In function mlx4_opreq_action(), pointer "mailbox" is not released,
when mlx4_cmd_box() return and error, causing a memory leak bug.
Fix this issue by going to "out" label, mlx4_free_cmd_mailbox() can
free this pointer.

Fixes: fe6f700d6cbb7 ("Respond to operation request by firmware")
Signed-off-by: Qiushi Wu 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 6e501af0e532..f6ff9620a137 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2734,7 +2734,7 @@ void mlx4_opreq_action(struct work_struct *work)
if (err) {
mlx4_err(dev, "Failed to retrieve required operation: 
%d\n",
 err);
-   return;
+   goto out;
}
MLX4_GET(modifier, outbox, GET_OP_REQ_MODIFIER_OFFSET);
MLX4_GET(token, outbox, GET_OP_REQ_TOKEN_OFFSET);
-- 
2.17.1

Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-21 Thread Luis Chamberlain

On Fri, May 22, 2020 at 08:12:59AM +0300, Emmanuel Grumbach wrote:
> >
> > On Tue, May 19, 2020 at 10:37 PM Emmanuel Grumbach  
> > wrote:
> > > So I believe we already have this uevent, it is the devcoredump. All
> > > we need is to add the unique id.
> >
> > I think there are a few reasons that devcoredump doesn't satisfy what
> > either Luis or I want.
> >
> > 1) it can be disabled entirely [1], for good reasons (e.g., think of
> > non-${CHIP_VENDOR} folks, who can't (and don't want to) do anything
> > with the opaque dumps provided by closed-source firmware)
> 
> Ok, if all you're interested into is the information that this event
> happen (as opposed to report a bug and providing the data), then I
> agree. 

I've now hit again a firmware crash with ath10k with the latest firwmare
and kernel and the *only* thing that helped recovery was a full reboot,
so that is a crystal clear case that this needs to taint the kernel, and
yes we do want to inform users too, so I've just added uevent support
for a few panic / taint events in the kernel now and rolled into my
series. I'll run some final tests and then post this as a follow up.

devlink didn't cut it, its networking specific.

  Luis

Re: piix4-poweroff.c I/O BAR usage

2020-05-21 Thread Paul Burton

Hello,

On Thu, May 21, 2020 at 6:04 PM Maciej W. Rozycki  wrote:
>  Paul may or may not be reachable anymore, so I'll step in.

I'm reachable but lacking free time & with no access to Malta hardware
I can't claim to be too useful here, so thanks for responding :)

Before being moved to a driver (which was mostly driven by a desire to
migrate Malta to a multi-platform/generic kernel using DT) this code
was part of arch/mips/mti-malta/ where I added it in commit
b6911bba598f ("MIPS: Malta: add suspend state entry code"). My main
motivation at the time was to make QEMU exit after running poweroff,
but I did ensure it worked on real Malta boards too (at least Malta-R
with CoreFPGA6). Over the years since then it shocked a couple of
hardware people to see software power off a Malta - if the original
hardware designers had intended that to work then the knowledge had
been lost over time :)

I suspect the code was based on visws_machine_power_off():

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/platform/visws/visws_quirks.c?h=v3.10#n125

> > pci_request_region() takes a BAR number (0-5), but here we're passing
> > PCI_BRIDGE_RESOURCES (13 if CONFIG_PCI_IOV, or 7 otherwise), which is
> > the bridge I/O window.
> >
> > I don't think this device ([8086:7113]) is a bridge, so that resource
> > should be empty.
>
>  Hmm, isn't the resource actually set up by `quirk_piix4_acpi' though?

I agree that the region used is meant to match that set up by
quirk_piix4_acpi(), which also refers to it using the
PCI_BRIDGE_RESOURCES macro.

Thanks,
Paul

Re: Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread dinghao . liu

Sorry, it's my carelessness. In v1 I added pm_runtime_put_autosuspend()
after copy_from_user() to fix this problem. Since copy_from_user() is
moved to the beginning now, we need not to add PM decrement. 

Regards,
Dinghao

> On Fri, May 22, 2020 at 10:59:02AM +0800, Dinghao Liu wrote:
> > When copy_from_user() returns an error code, there
> > is a runtime PM usage counter imbalance.
> > 
> > Fix this by moving copy_from_user() to the beginning
> > of this function.
> > 
> > Signed-off-by: Dinghao Liu 
> > ---
> >  drivers/usb/musb/musb_debugfs.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> What changed from v1?  Always show that below the --- line as the
> documentation says to.
> 
> thanks,
> 
> greg k-h

Re: Panic related to perf (bisected)

2020-05-21 Thread Ian Rogers

On Thu, May 21, 2020 at 10:14 PM Tibor Billes  wrote:
>
> Hi,
>
> On Mon, 18 May 2020, Ian Rogers wrote:
>
> > On Sat, May 16, 2020 at 6:36 AM Billes Tibor  wrote:
> > >
> > > Hi,
> > >
> > > I've been hitting a freeze on my laptop since 5.3, but haven't got the
> > > time to finish bisecting it. Now
> > > I had, and here is what I found:
> > >
> > > - 5.2 series works correctly (tested 5.2.9 and 5.2.15)
> > > - 5.3 series and newer kernels freeze. The newest I tested is 5.6.10
> > > (which also freezes). There will
> > >be a complete bisect log at the end of the mail.
> > >
> > > There are several circumstances to reproduce the freeze. At least this
> > > is what I found relevant:
> > > - The freeze does not occur after a fresh boot, it needs a sleep-wakeup
> > > cycle.
> > > - Run `perf stat -a --topdown -- sleep 8s` every minute (It is part of
> > > some metrics I collect using Zabbix)
> > > - Some workload (building the kernel or gaming) increases the chance of
> > > freezing, but it can occur
> > >without user interaction too.
> > >
> > > The freeze usually comes within a few minutes after wakeup. The longest
> > > was about an hour. (For
> > > comparison, if I don't do a sleep-wakeup, the machine works fine for 8+
> > > hours).
> > >
> > > First, there is a warning in syslog, then I took pictures of the actual
> > > panic.
> > >
> > > The warning:
> > > May 16 13:28:46 serpens kernel: [33128.086217] [ cut here
> > > ]
> > > May 16 13:28:46 serpens kernel: [33128.086222] WARNING: CPU: 0 PID: 0 at
> > > arch/x86/events/core.c:1506 x86_pmu_del+0x140/0x160
> > > May 16 13:28:46 serpens kernel: [33128.086223] Modules linked in:
> > > nouveau mxm_wmi ttm ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc
> > > iptable_filter binfmt_misc essiv authenc uvcvideo iwlmvm
> > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev mac80211
> > > videobuf2_common snd_hda_codec_realtek snd_hda_codec_generic libarc4
> > > snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep intel_rapl_msr
> > > iwlwifi snd_hda_core intel_rapl_common snd_pcm x86_pkg_temp_thermal
> > > snd_seq mei_me cfg80211 snd_seq_device snd_timer mei intel_powerclamp
> > > snd soundcore ideapad_laptop coretemp sparse_keymap ip_tables x_tables
> > > dm_crypt hid_generic usbhid hid i915 intel_gtt i2c_algo_bit
> > > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> > > aesni_intel glue_helper crypto_simd r8169 ahci cryptd psmouse libahci
> > > realtek drm_panel_orientation_quirks video wmi
> > > May 16 13:28:46 serpens kernel: [33128.086247] CPU: 0 PID: 0 Comm:
> > > swapper/0 Not tainted 5.6.10 #52
> > > May 16 13:28:46 serpens kernel: [33128.086247] Hardware name: LENOVO
> > > 20378/Lenovo Y50-70, BIOS 9ECN36WW(V2.00) 01/12/2015
> > > May 16 13:28:46 serpens kernel: [33128.086249] RIP:
> > > 0010:x86_pmu_del+0x140/0x160
> > > May 16 13:28:46 serpens kernel: [33128.086250] Code: 63 d8 48 c7 84 dd
> > > 20 07 00 00 00 00 00 00 4c 89 e7 89 85 14 02 00 00 e8 fe 27 16 00 e9 ef
> > > fe ff ff 44 8d 6b 01 e9 5d ff ff ff <0f> 0b 5b 5d 41 5c 41 5d c3 31 db
> > > e9 41 ff ff ff 41 bd 01 00 00 00
> > > May 16 13:28:46 serpens kernel: [33128.086251] RSP:
> > > 0018:a2a380003e40 EFLAGS: 00010046
> > > May 16 13:28:46 serpens kernel: [33128.086252] RAX: 0005
> > > RBX: 0005 RCX: 0010
> > > May 16 13:28:46 serpens kernel: [33128.086253] RDX: 0005
> > > RSI: 0005 RDI: 907c4d4b1000
> > > May 16 13:28:46 serpens kernel: [33128.086254] RBP: 907d932125a0
> > > R08: 0002 R09: 00029f80
> > > May 16 13:28:46 serpens kernel: [33128.086254] R10: a2a380003eb0
> > > R11:  R12: 907c4d4b1000
> > > May 16 13:28:46 serpens kernel: [33128.086255] R13: 0006
> > > R14: 907d9326fc0c R15: 907d9326fb00
> > > May 16 13:28:46 serpens kernel: [33128.086256] FS:
> > > () GS:907d9320() knlGS:
> > > May 16 13:28:46 serpens kernel: [33128.086256] CS:  0010 DS:  ES:
> > >  CR0: 80050033
> > > May 16 13:28:46 serpens kernel: [33128.086257] CR2: 7f713c5b44b0
> > > CR3: 8300a001 CR4: 001606f0
> > > May 16 13:28:46 serpens kernel: [33128.086257] Call Trace:
> > > May 16 13:28:46 serpens kernel: [33128.086259]  
> > > May 16 13:28:46 serpens kernel: [33128.086263]
> > > event_sched_out.isra.116+0x89/0x1f0
> > > May 16 13:28:46 serpens kernel: [33128.086264]
> > > group_sched_out.part.118+0x55/0xd0
> > > May 16 13:28:46 serpens kernel: [33128.086265] ctx_sched_out+0x207/0x240
> > > May 16 13:28:46 serpens kernel: [33128.086267]
> > > perf_mux_hrtimer_handler+0x267/0x310
> > > May 16 13:28:46 serpens kernel: [33128.086269]  ?
> > > __perf_install_in_context+0x220/0x220
> > > May 16 13:28:46 serpens kernel: [33128.086270]
> > > __hrtimer_run_queues+0xfa/0x260
> > > May 16 13:28:46 serpens kernel: [33128.086272] 
> > > hrtimer_in

Re: [RFC 1/2] devlink: add simple fw crash helpers

2020-05-21 Thread Luis Chamberlain

On Tue, May 19, 2020 at 02:15:30PM -0700, Jakub Kicinski wrote:
> Add infra for creating devlink instances for a device to report

Thanks for doing this series as a PoC, counter to the module_firmware_crash()
which I proposed to taint the kernel with a firmware crash flag to the kernel
and module.

For those not famliar about devlink:

https://lwn.net/Articles/677967/
https://www.kernel.org/doc/html/latest/networking/devlink/index.html

The github page also is now 404 as Jiri merged that stuff into iproute2:

git://git.kernel.org/pub/scm/network/iproute2/iproute2.git

> fw crashes. This patch expects the devlink instance to be registered
> at probe time. I belive to be the cleanest. We can also add a devm
> version of the helpers, so that we don't have to do the clean up.
> Or we can go even further and register the devlink instance only
> once error has happened (for the first time, then we can just
> find out if already registered by traversing the list like we
> do here).
> 
> With the patch applied and a sample driver converted we get:
> 
> $ devlink dev
> pci/:07:00.0
> 
> Then monitor for errors:
> 
> $ devlink mon health
> [health,status] pci/:07:00.0:
>   reporter fw
> state error error 1 recover 0
> [health,status] pci/:07:00.0:
>   reporter fw
> state error error 2 recover 0
> 
> These are the events I triggered on purpose. One can also inspect
> the health of all devices capable of reporting fw errors:
> 
> $ devlink health
> pci/:07:00.0:
>   reporter fw
> state error error 7 recover 0
> 
> Obviously drivers may upgrade to the full devlink health API
> which includes state dump, state dump auto-collect and automatic
> error recovery control.
> 
> Signed-off-by: Jakub Kicinski 
> ---
>  include/linux/devlink.h   |  11 +++
>  net/core/Makefile |   2 +-
>  net/core/devlink_simple_fw_reporter.c | 101 ++
>  3 files changed, 113 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/devlink.h
>  create mode 100644 net/core/devlink_simple_fw_reporter.c
> 
> diff --git a/include/linux/devlink.h b/include/linux/devlink.h
> new file mode 100644
> index ..2b73987eefca
> --- /dev/null
> +++ b/include/linux/devlink.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _LINUX_DEVLINK_H_
> +#define _LINUX_DEVLINK_H_
> +
> +struct device;
> +
> +void devlink_simple_fw_reporter_prepare(struct device *dev);
> +void devlink_simple_fw_reporter_cleanup(struct device *dev);
> +void devlink_simple_fw_reporter_report_crash(struct device *dev);
> +
> +#endif
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 3e2c378e5f31..6f1513781c17 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
>  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
>  obj-$(CONFIG_DST_CACHE) += dst_cache.o
>  obj-$(CONFIG_HWBM) += hwbm.o
> -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o

This was looking super sexy up to here. This is networking specific.
We want something generic for *anything* that requests firmware.

I'm afraid this won't work for something generic. I don't think its
throw-away work though, the idea to provide a generic interface to
dump firmware through netlink might be nice for networking, or other
things.

But I have a feeling we'll want something still more generic than this.

So networking may want to be aware that a firmware crash happened as
part of this network device health thing, but firmware crashing is a
generic thing.

I have now extended my patch set to include uvents and I am more set on
that we need the taint now more than ever.

  Luis

>  obj-$(CONFIG_GRO_CELLS) += gro_cells.o
>  obj-$(CONFIG_FAILOVER) += failover.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
> diff --git a/net/core/devlink_simple_fw_reporter.c 
> b/net/core/devlink_simple_fw_reporter.c
> new file mode 100644
> index ..48dde9123c3c
> --- /dev/null
> +++ b/net/core/devlink_simple_fw_reporter.c
> @@ -0,0 +1,101 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct devlink_simple_fw_reporter {
> + struct list_head list;
> + struct devlink_health_reporter *reporter;
> +};
> +
> +
> +static LIST_HEAD(devlink_simple_fw_reporters);
> +static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex);
> +
> +static const struct devlink_health_reporter_ops simple_devlink_health = {
> + .name = "fw",
> +};
> +
> +static const struct devlink_ops simple_devlink_ops = {
> +};
> +
> +static struct devlink_simple_fw_reporter *
> +devlink_simple_fw_reporter_find_for_dev(struct device *dev)
> +{
> + struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL;
> + struct devlink *devlink;
> +
> + mutex_lock(&devlink_simple_fw_reporters_mutex);
> + list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters,
> + lis

[PATCH v2 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()

2020-05-21 Thread John Hubbard

The purpose of posting this series is to launch a test in the
intel-gfx-ci tree. (The patches have already been merged into Andrew's
linux-mm tree.)

This applies to today's linux.git (note the base-commit tag at the
bottom).

Changes since V1:

* Fixed a bug in the refactoring patch: added FOLL_FAST_ONLY to the
  list of gup_flags *not* to WARN() on. This lead to a failure in the
  first intel-gfx-ci test run [1].

[1] 
https://lore.kernel.org/r/159008745422.32320.5724805750977048...@build.alporthouse.com

Original cover letter:

This needs to go through Andrew's -mm tree, due to adding a new gup.c
routine. However, I would really love to have some testing from the
drm/i915 folks, because I haven't been able to run-time test that part
of it.

Otherwise, though, the series has passed my basic run time testing:
some LTP tests, some xfs and etx4 non-destructive xfstests, and an
assortment of other smaller ones: vm selftests, io_uring_register, a
few more. But that's only on one particular machine. Also, cross-compile
tests for half a dozen arches all pass.

Details:

In order to convert the drm/i915 driver from get_user_pages() to
pin_user_pages(), a FOLL_PIN equivalent of __get_user_pages_fast() was
required. That led to refactoring __get_user_pages_fast(), with the
following goals:

1) As above: provide a pin_user_pages*() routine for drm/i915 to call,
   in place of __get_user_pages_fast(),

2) Get rid of the gup.c duplicate code for walking page tables with
   interrupts disabled. This duplicate code is a minor maintenance
   problem anyway.

3) Make it easy for an upcoming patch from Souptick, which aims to
   convert __get_user_pages_fast() to use a gup_flags argument, instead
   of a bool writeable arg.  Also, if this series looks good, we can
   ask Souptick to change the name as well, to whatever the consensus
   is. My initial recommendation is: get_user_pages_fast_only(), to
   match the new pin_user_pages_only().

John Hubbard (4):
  mm/gup: move __get_user_pages_fast() down a few lines in gup.c
  mm/gup: refactor and de-duplicate gup_fast() code
  mm/gup: introduce pin_user_pages_fast_only()
  drm/i915: convert get_user_pages() --> pin_user_pages()

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  22 +--
 include/linux/mm.h  |   3 +
 mm/gup.c| 153 
 3 files changed, 109 insertions(+), 69 deletions(-)


base-commit: 051143e1602d90ea71887d92363edd539d411de5
-- 
2.26.2

[PATCH v2 1/4] mm/gup: move __get_user_pages_fast() down a few lines in gup.c

2020-05-21 Thread John Hubbard

This is in order to avoid a forward declaration of
internal_get_user_pages_fast(), in the next patch.

This is code movement only--all generated code should
be identical.

Signed-off-by: John Hubbard 
---
 mm/gup.c | 112 +++
 1 file changed, 56 insertions(+), 56 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 50cd9323efff..4502846d57f9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2666,62 +2666,6 @@ static bool gup_fast_permitted(unsigned long start, 
unsigned long end)
 }
 #endif
 
-/*
- * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back 
to
- * the regular GUP.
- * Note a difference with get_user_pages_fast: this always returns the
- * number of pages pinned, 0 if no pages were pinned.
- *
- * If the architecture does not support this function, simply return with no
- * pages pinned.
- */
-int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
- struct page **pages)
-{
-   unsigned long len, end;
-   unsigned long flags;
-   int nr_pinned = 0;
-   /*
-* Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
-* because gup fast is always a "pin with a +1 page refcount" request.
-*/
-   unsigned int gup_flags = FOLL_GET;
-
-   if (write)
-   gup_flags |= FOLL_WRITE;
-
-   start = untagged_addr(start) & PAGE_MASK;
-   len = (unsigned long) nr_pages << PAGE_SHIFT;
-   end = start + len;
-
-   if (end <= start)
-   return 0;
-   if (unlikely(!access_ok((void __user *)start, len)))
-   return 0;
-
-   /*
-* Disable interrupts.  We use the nested form as we can already have
-* interrupts disabled by get_futex_key.
-*
-* With interrupts disabled, we block page table pages from being
-* freed from under us. See struct mmu_table_batch comments in
-* include/asm-generic/tlb.h for more details.
-*
-* We do not adopt an rcu_read_lock(.) here as we also want to
-* block IPIs that come from THPs splitting.
-*/
-
-   if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
-   gup_fast_permitted(start, end)) {
-   local_irq_save(flags);
-   gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
-   local_irq_restore(flags);
-   }
-
-   return nr_pinned;
-}
-EXPORT_SYMBOL_GPL(__get_user_pages_fast);
-
 static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
   unsigned int gup_flags, struct page **pages)
 {
@@ -2794,6 +2738,62 @@ static int internal_get_user_pages_fast(unsigned long 
start, int nr_pages,
return ret;
 }
 
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back 
to
+ * the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
+ *
+ * If the architecture does not support this function, simply return with no
+ * pages pinned.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+ struct page **pages)
+{
+   unsigned long len, end;
+   unsigned long flags;
+   int nr_pinned = 0;
+   /*
+* Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
+* because gup fast is always a "pin with a +1 page refcount" request.
+*/
+   unsigned int gup_flags = FOLL_GET;
+
+   if (write)
+   gup_flags |= FOLL_WRITE;
+
+   start = untagged_addr(start) & PAGE_MASK;
+   len = (unsigned long) nr_pages << PAGE_SHIFT;
+   end = start + len;
+
+   if (end <= start)
+   return 0;
+   if (unlikely(!access_ok((void __user *)start, len)))
+   return 0;
+
+   /*
+* Disable interrupts.  We use the nested form as we can already have
+* interrupts disabled by get_futex_key.
+*
+* With interrupts disabled, we block page table pages from being
+* freed from under us. See struct mmu_table_batch comments in
+* include/asm-generic/tlb.h for more details.
+*
+* We do not adopt an rcu_read_lock(.) here as we also want to
+* block IPIs that come from THPs splitting.
+*/
+
+   if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
+   gup_fast_permitted(start, end)) {
+   local_irq_save(flags);
+   gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
+   local_irq_restore(flags);
+   }
+
+   return nr_pinned;
+}
+EXPORT_SYMBOL_GPL(__get_user_pages_fast);
+
 /**
  * get_user_pages_fast() - pin user pages in memory
  * @start:  starting user address
-- 
2.26.2

[PATCH v2 2/4] mm/gup: refactor and de-duplicate gup_fast() code

2020-05-21 Thread John Hubbard

There were two nearly identical sets of code for gup_fast()
style of walking the page tables with interrupts disabled.
This has lead to the usual maintenance problems that arise from
having duplicated code.

There is already a core internal routine in gup.c for gup_fast(),
so just enhance it very slightly: allow skipping the fall-back
to "slow" (regular) get_user_pages(), via the new FOLL_FAST_ONLY
flag. Then, just call internal_get_user_pages_fast() from
__get_user_pages_fast(), and adjust the API to match pre-existing
API behavior.

There is a change in behavior from this refactoring: the nested
form of interrupt disabling is used in all gup_fast() variants
now. That's because there is only one place that interrupt disabling
for page walking is done, and so the safer form is required. This
should, if anything, eliminate possible (rare) bugs, because the
non-nested form of enabling interrupts was fragile at best.

Signed-off-by: John Hubbard 
---
 include/linux/mm.h |  1 +
 mm/gup.c   | 63 ++
 2 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5594ac9ebe3..84b601cab699 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2782,6 +2782,7 @@ struct page *follow_page(struct vm_area_struct *vma, 
unsigned long address,
 #define FOLL_LONGTERM  0x1 /* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
 #define FOLL_PIN   0x4 /* pages must be released via unpin_user_page */
+#define FOLL_FAST_ONLY 0x8 /* gup_fast: prevent fall-back to slow gup */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 4502846d57f9..4564b0dc7d0b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2694,10 +2694,12 @@ static int internal_get_user_pages_fast(unsigned long 
start, int nr_pages,
struct page **pages)
 {
unsigned long addr, len, end;
+   unsigned long flags;
int nr_pinned = 0, ret = 0;
 
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
-  FOLL_FORCE | FOLL_PIN | FOLL_GET)))
+  FOLL_FORCE | FOLL_PIN | FOLL_GET |
+  FOLL_FAST_ONLY)))
return -EINVAL;
 
start = untagged_addr(start) & PAGE_MASK;
@@ -2710,15 +2712,26 @@ static int internal_get_user_pages_fast(unsigned long 
start, int nr_pages,
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;
 
+   /*
+* Disable interrupts. The nested form is used, in order to allow full,
+* general purpose use of this routine.
+*
+* With interrupts disabled, we block page table pages from being
+* freed from under us. See struct mmu_table_batch comments in
+* include/asm-generic/tlb.h for more details.
+*
+* We do not adopt an rcu_read_lock(.) here as we also want to
+* block IPIs that come from THPs splitting.
+*/
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
gup_fast_permitted(start, end)) {
-   local_irq_disable();
+   local_irq_save(flags);
gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
-   local_irq_enable();
+   local_irq_restore(flags);
ret = nr_pinned;
}
 
-   if (nr_pinned < nr_pages) {
+   if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
/* Try to get the remaining pages with get_user_pages */
start += nr_pinned << PAGE_SHIFT;
pages += nr_pinned;
@@ -2750,45 +2763,29 @@ static int internal_get_user_pages_fast(unsigned long 
start, int nr_pages,
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  struct page **pages)
 {
-   unsigned long len, end;
-   unsigned long flags;
-   int nr_pinned = 0;
+   int nr_pinned;
/*
 * Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
 * because gup fast is always a "pin with a +1 page refcount" request.
+*
+* FOLL_FAST_ONLY is required in order to match the API description of
+* this routine: no fall back to regular ("slow") GUP.
 */
-   unsigned int gup_flags = FOLL_GET;
+   unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY;
 
if (write)
gup_flags |= FOLL_WRITE;
 
-   start = untagged_addr(start) & PAGE_MASK;
-   len = (unsigned long) nr_pages << PAGE_SHIFT;
-   end = start + len;
-
-   if (end <= start)
-   return 0;
-   if (unlikely(!access_ok((void __user *)start, len)))
-   return 0;
-
+   nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags,
+

[PATCH v2 4/4] drm/i915: convert get_user_pages() --> pin_user_pages()

2020-05-21 Thread John Hubbard

This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 22 -
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7ffd7afeb7a5..b55ac7563189 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -471,7 +471,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
*_work)
down_read(&mm->mmap_sem);
locked = 1;
}
-   ret = get_user_pages_remote
+   ret = pin_user_pages_remote
(work->task, mm,
 obj->userptr.ptr + pinned * PAGE_SIZE,
 npages - pinned,
@@ -507,7 +507,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct 
*_work)
}
mutex_unlock(&obj->mm.lock);
 
-   release_pages(pvec, pinned);
+   unpin_user_pages(pvec, pinned);
kvfree(pvec);
 
i915_gem_object_put(obj);
@@ -564,6 +564,7 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
struct sg_table *pages;
bool active;
int pinned;
+   unsigned int gup_flags = 0;
 
/* If userspace should engineer that these pages are replaced in
 * the vma between us binding this page into the GTT and completion
@@ -598,11 +599,14 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
  GFP_KERNEL |
  __GFP_NORETRY |
  __GFP_NOWARN);
-   if (pvec) /* defer to worker if malloc fails */
-   pinned = __get_user_pages_fast(obj->userptr.ptr,
-  num_pages,
-  
!i915_gem_object_is_readonly(obj),
-  pvec);
+   /* defer to worker if malloc fails */
+   if (pvec) {
+   if (!i915_gem_object_is_readonly(obj))
+   gup_flags |= FOLL_WRITE;
+   pinned = pin_user_pages_fast_only(obj->userptr.ptr,
+ num_pages, gup_flags,
+ pvec);
+   }
}
 
active = false;
@@ -620,7 +624,7 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
__i915_gem_userptr_set_active(obj, true);
 
if (IS_ERR(pages))
-   release_pages(pvec, pinned);
+   unpin_user_pages(pvec, pinned);
kvfree(pvec);
 
return PTR_ERR_OR_ZERO(pages);
@@ -675,7 +679,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
}
 
mark_page_accessed(page);
-   put_page(page);
+   unpin_user_page(page);
}
obj->mm.dirty = false;
 
-- 
2.26.2

Re: [PATCH] PM: runtime: clk: Fix clk_pm_runtime_get() error path

2020-05-21 Thread Marek Szyprowski

Hi Rafael,

On 21.05.2020 19:08, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
>
> clk_pm_runtime_get() assumes that the PM-runtime usage counter will
> be dropped by pm_runtime_get_sync() on errors, which is not the case,
> so PM-runtime references to devices acquired by the former are leaked
> on errors returned by the latter.
>
> Fix this by modifying clk_pm_runtime_get() to drop the reference if
> pm_runtime_get_sync() returns an error.
>
> Fixes: 9a34b45397e5 clk: Add support for runtime PM
> Cc: 4.15+  # 4.15+
> Signed-off-by: Rafael J. Wysocki 

Frankly, I would rather fix the runtime_get_sync() instead of fixing the 
return path everywhere in the kernel. The current behavior of the 
pm_runtime_get_sync() is completely counter-intuitive then. I bet that 
in the 99% of the places where it is being called assume that no special 
fixup is needed in case of failure. This is one of the most common 
runtime PM related function and it is really a common pattern in the 
drivers to call:

pm_runtime_get_sync()

do something with the hardware

pm_runtime_put()

Do you really want to fix the error paths of the all such calls?


> ---
>   drivers/clk/clk.c |6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/clk/clk.c
> ===
> --- linux-pm.orig/drivers/clk/clk.c
> +++ linux-pm/drivers/clk/clk.c
> @@ -114,7 +114,11 @@ static int clk_pm_runtime_get(struct clk
>   return 0;
>   
>   ret = pm_runtime_get_sync(core->dev);
> - return ret < 0 ? ret : 0;
> + if (ret < 0) {
> + pm_runtime_put_noidle(core->dev);
> + return ret;
> + }
> + return 0;
>   }
>   
>   static void clk_pm_runtime_put(struct clk_core *core)
>
>
>
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

[PATCH v2 3/4] mm/gup: introduce pin_user_pages_fast_only()

2020-05-21 Thread John Hubbard

This is the FOLL_PIN equivalent of __get_user_pages_fast(),
except with a more descriptive name, and gup_flags instead of
a boolean "write" in the argument list.

Signed-off-by: John Hubbard 
---
 include/linux/mm.h |  2 ++
 mm/gup.c   | 36 
 2 files changed, 38 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 84b601cab699..98be7289d7e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1820,6 +1820,8 @@ extern int mprotect_fixup(struct vm_area_struct *vma,
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  struct page **pages);
+int pin_user_pages_fast_only(unsigned long start, int nr_pages,
+unsigned int gup_flags, struct page **pages);
 /*
  * per-process(per-mm_struct) statistics.
  */
diff --git a/mm/gup.c b/mm/gup.c
index 4564b0dc7d0b..6fa9b2016a53 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2859,6 +2859,42 @@ int pin_user_pages_fast(unsigned long start, int 
nr_pages,
 }
 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
 
+/*
+ * This is the FOLL_PIN equivalent of __get_user_pages_fast(). Behavior is the
+ * same, except that this one sets FOLL_PIN instead of FOLL_GET.
+ *
+ * The API rules are the same, too: no negative values may be returned.
+ */
+int pin_user_pages_fast_only(unsigned long start, int nr_pages,
+unsigned int gup_flags, struct page **pages)
+{
+   int nr_pinned;
+
+   /*
+* FOLL_GET and FOLL_PIN are mutually exclusive. Note that the API
+* rules require returning 0, rather than -errno:
+*/
+   if (WARN_ON_ONCE(gup_flags & FOLL_GET))
+   return 0;
+   /*
+* FOLL_FAST_ONLY is required in order to match the API description of
+* this routine: no fall back to regular ("slow") GUP.
+*/
+   gup_flags |= (FOLL_PIN | FOLL_FAST_ONLY);
+   nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags,
+pages);
+   /*
+* This routine is not allowed to return negative values. However,
+* internal_get_user_pages_fast() *can* return -errno. Therefore,
+* correct for that here:
+*/
+   if (nr_pinned < 0)
+   nr_pinned = 0;
+
+   return nr_pinned;
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast_only);
+
 /**
  * pin_user_pages_remote() - pin pages of a remote process (task != current)
  *
-- 
2.26.2

linux-next: manual merge of the device-mapper tree with the block tree

2020-05-21 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the device-mapper tree got a conflict in:

  drivers/md/dm-zoned-metadata.c

between commit:

  c64644ce363b ("block: remove the error_sector argument to blkdev_issue_flush")

from the block tree and commit:

  bf28a3ba0986 ("dm zoned: store device in struct dmz_sb")

from the device-mapper tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/md/dm-zoned-metadata.c
index bf2245370305,db0dc2b5d44d..
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@@ -659,9 -816,10 +816,10 @@@ static int dmz_write_sb(struct dmz_meta
sb->crc = 0;
sb->crc = cpu_to_le32(crc32_le(sb_gen, (unsigned char *)sb, 
DMZ_BLOCK_SIZE));
  
-   ret = dmz_rdwr_block(zmd, REQ_OP_WRITE, block, mblk->page);
+   ret = dmz_rdwr_block(dev, REQ_OP_WRITE, zmd->sb[set].block,
+mblk->page);
if (ret == 0)
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
  
return ret;
  }
@@@ -703,7 -862,7 +862,7 @@@ static int dmz_write_dirty_mblocks(stru
  
/* Flush drive cache (this will also sync data) */
if (ret == 0)
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
  
return ret;
  }
@@@ -772,7 -933,7 +933,7 @@@ int dmz_flush_metadata(struct dmz_metad
  
/* If there are no dirty metadata blocks, just flush the device cache */
if (list_empty(&write_list)) {
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
goto err;
}
  


pgpSZHCh8TIRK.pgp
Description: OpenPGP digital signature

Re: [PATCH v2 01/15] taint: add module firmware crash taint support

2020-05-21 Thread Luis Chamberlain

On Tue, May 19, 2020 at 06:42:31PM +0200, Jessica Yu wrote:
> +++ Luis Chamberlain [15/05/20 21:28 +]:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead provide a helper which lets drivers
> > annotate this.
> > 
> > Once this happens, scrapers can easily look for modules taint flags
> > for a firmware crash. This will taint both the kernel and respective
> > calling module.
> > 
> > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
> > fact should in no way shape or form affect lockdep. This taint is device
> > driver specific.
> > 
> > Signed-off-by: Luis Chamberlain 
> > ---
> > Documentation/admin-guide/tainted-kernels.rst |  6 ++
> > include/linux/kernel.h|  3 ++-
> > include/linux/module.h| 13 +
> > include/trace/events/module.h |  3 ++-
> > kernel/module.c   |  5 +++--
> > kernel/panic.c|  1 +
> > tools/debugging/kernel-chktaint   |  7 +++
> > 7 files changed, 34 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/tainted-kernels.rst 
> > b/Documentation/admin-guide/tainted-kernels.rst
> > index 71e9184a9079..92530f1d60ae 100644
> > --- a/Documentation/admin-guide/tainted-kernels.rst
> > +++ b/Documentation/admin-guide/tainted-kernels.rst
> > @@ -100,6 +100,7 @@ Bit  Log  Number  Reason that got the kernel tainted
> >  15  _/K   32768  kernel has been live patched
> >  16  _/X   65536  auxiliary taint, defined for and used by distros
> >  17  _/T  131072  kernel was built with the struct randomization plugin
> > + 18  _/Q  262144  driver firmware crash annotation
> > ===  ===  ==  
> > 
> > Note: The character ``_`` is representing a blank in this table to make 
> > reading
> > @@ -162,3 +163,8 @@ More detailed explanation for tainting
> >  produce extremely unusual kernel structure layouts (even performance
> >  pathological ones), which is important to know when debugging. Set at
> >  build time.
> > +
> > + 18) ``Q`` used by device drivers to annotate that the device driver's 
> > firmware
> > + has crashed and the device's operation has been severely affected. The
> > + device may be left in a crippled state, requiring full driver removal 
> > /
> > + addition, system reboot, or it is unclear how long recovery will take.
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 04a5885cec1b..19e1541c82c7 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -601,7 +601,8 @@ extern enum system_states {
> > #define TAINT_LIVEPATCH 15
> > #define TAINT_AUX   16
> > #define TAINT_RANDSTRUCT17
> > -#define TAINT_FLAGS_COUNT  18
> > +#define TAINT_FIRMWARE_CRASH   18
> > +#define TAINT_FLAGS_COUNT  19
> > 
> > struct taint_flag {
> > char c_true;/* character printed when tainted */
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 2c2e988bcf10..221200078180 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module 
> > *mod)
> > bool is_module_sig_enforced(void);
> > void set_module_sig_enforced(void);
> > 
> > +void add_taint_module(struct module *mod, unsigned flag,
> > + enum lockdep_ok lockdep_ok);
> > +
> > +static inline void module_firmware_crashed(void)
> > +{
> > +   add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
> > +}
> 
> Just a nit: I think module_firmware_crashed() is a confusing name - it
> doesn't really tell me what it's doing, and it's not really related to
> the rest of the module_* symbols, which mostly have to do with module
> loader/module specifics. Especially since a driver can be built-in, too.
> How about taint_firmware_crashed() or something similar?

Sure.

> Also, I think we might crash in add_taint_module() if a driver is
> built into the kernel, because THIS_MODULE will be null and there is
> no null pointer check in add_taint_module(). We could unify the
> CONFIG_MODULES and !CONFIG_MODULES stubs and either add an `if (mod)`
> check in add_taint_module() or add an #ifdef MODULE check in the stub
> itself to call add_taint() or add_taint_module() as appropriate. Hope
> that makes sense.

I had to do something a bit different but I think you'll agree with it.
Will include it in my next iteration.

  Luis

Re: Panic related to perf (bisected)

2020-05-21 Thread Tibor Billes

Hi,

On Mon, 18 May 2020, Ian Rogers wrote:

> On Sat, May 16, 2020 at 6:36 AM Billes Tibor  wrote:
> >
> > Hi,
> >
> > I've been hitting a freeze on my laptop since 5.3, but haven't got the
> > time to finish bisecting it. Now
> > I had, and here is what I found:
> >
> > - 5.2 series works correctly (tested 5.2.9 and 5.2.15)
> > - 5.3 series and newer kernels freeze. The newest I tested is 5.6.10
> > (which also freezes). There will
> >be a complete bisect log at the end of the mail.
> >
> > There are several circumstances to reproduce the freeze. At least this
> > is what I found relevant:
> > - The freeze does not occur after a fresh boot, it needs a sleep-wakeup
> > cycle.
> > - Run `perf stat -a --topdown -- sleep 8s` every minute (It is part of
> > some metrics I collect using Zabbix)
> > - Some workload (building the kernel or gaming) increases the chance of
> > freezing, but it can occur
> >without user interaction too.
> >
> > The freeze usually comes within a few minutes after wakeup. The longest
> > was about an hour. (For
> > comparison, if I don't do a sleep-wakeup, the machine works fine for 8+
> > hours).
> >
> > First, there is a warning in syslog, then I took pictures of the actual
> > panic.
> >
> > The warning:
> > May 16 13:28:46 serpens kernel: [33128.086217] [ cut here
> > ]
> > May 16 13:28:46 serpens kernel: [33128.086222] WARNING: CPU: 0 PID: 0 at
> > arch/x86/events/core.c:1506 x86_pmu_del+0x140/0x160
> > May 16 13:28:46 serpens kernel: [33128.086223] Modules linked in:
> > nouveau mxm_wmi ttm ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc
> > iptable_filter binfmt_misc essiv authenc uvcvideo iwlmvm
> > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev mac80211
> > videobuf2_common snd_hda_codec_realtek snd_hda_codec_generic libarc4
> > snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep intel_rapl_msr
> > iwlwifi snd_hda_core intel_rapl_common snd_pcm x86_pkg_temp_thermal
> > snd_seq mei_me cfg80211 snd_seq_device snd_timer mei intel_powerclamp
> > snd soundcore ideapad_laptop coretemp sparse_keymap ip_tables x_tables
> > dm_crypt hid_generic usbhid hid i915 intel_gtt i2c_algo_bit
> > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> > aesni_intel glue_helper crypto_simd r8169 ahci cryptd psmouse libahci
> > realtek drm_panel_orientation_quirks video wmi
> > May 16 13:28:46 serpens kernel: [33128.086247] CPU: 0 PID: 0 Comm:
> > swapper/0 Not tainted 5.6.10 #52
> > May 16 13:28:46 serpens kernel: [33128.086247] Hardware name: LENOVO
> > 20378/Lenovo Y50-70, BIOS 9ECN36WW(V2.00) 01/12/2015
> > May 16 13:28:46 serpens kernel: [33128.086249] RIP:
> > 0010:x86_pmu_del+0x140/0x160
> > May 16 13:28:46 serpens kernel: [33128.086250] Code: 63 d8 48 c7 84 dd
> > 20 07 00 00 00 00 00 00 4c 89 e7 89 85 14 02 00 00 e8 fe 27 16 00 e9 ef
> > fe ff ff 44 8d 6b 01 e9 5d ff ff ff <0f> 0b 5b 5d 41 5c 41 5d c3 31 db
> > e9 41 ff ff ff 41 bd 01 00 00 00
> > May 16 13:28:46 serpens kernel: [33128.086251] RSP:
> > 0018:a2a380003e40 EFLAGS: 00010046
> > May 16 13:28:46 serpens kernel: [33128.086252] RAX: 0005
> > RBX: 0005 RCX: 0010
> > May 16 13:28:46 serpens kernel: [33128.086253] RDX: 0005
> > RSI: 0005 RDI: 907c4d4b1000
> > May 16 13:28:46 serpens kernel: [33128.086254] RBP: 907d932125a0
> > R08: 0002 R09: 00029f80
> > May 16 13:28:46 serpens kernel: [33128.086254] R10: a2a380003eb0
> > R11:  R12: 907c4d4b1000
> > May 16 13:28:46 serpens kernel: [33128.086255] R13: 0006
> > R14: 907d9326fc0c R15: 907d9326fb00
> > May 16 13:28:46 serpens kernel: [33128.086256] FS:
> > () GS:907d9320() knlGS:
> > May 16 13:28:46 serpens kernel: [33128.086256] CS:  0010 DS:  ES:
> >  CR0: 80050033
> > May 16 13:28:46 serpens kernel: [33128.086257] CR2: 7f713c5b44b0
> > CR3: 8300a001 CR4: 001606f0
> > May 16 13:28:46 serpens kernel: [33128.086257] Call Trace:
> > May 16 13:28:46 serpens kernel: [33128.086259]  
> > May 16 13:28:46 serpens kernel: [33128.086263]
> > event_sched_out.isra.116+0x89/0x1f0
> > May 16 13:28:46 serpens kernel: [33128.086264]
> > group_sched_out.part.118+0x55/0xd0
> > May 16 13:28:46 serpens kernel: [33128.086265] ctx_sched_out+0x207/0x240
> > May 16 13:28:46 serpens kernel: [33128.086267]
> > perf_mux_hrtimer_handler+0x267/0x310
> > May 16 13:28:46 serpens kernel: [33128.086269]  ?
> > __perf_install_in_context+0x220/0x220
> > May 16 13:28:46 serpens kernel: [33128.086270]
> > __hrtimer_run_queues+0xfa/0x260
> > May 16 13:28:46 serpens kernel: [33128.086272] hrtimer_interrupt+0xe5/0x240
> > May 16 13:28:46 serpens kernel: [33128.086275]  ?
> > recalibrate_cpu_khz+0x10/0x10
> > May 16 13:28:46 serpens kernel: [33128.086278]
> > smp_apic_timer_interrupt+0x62/0x120
> > May 16 13:28:46 serpens kernel: [33128.086280] ap

Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-21 Thread Emmanuel Grumbach

>
> On Tue, May 19, 2020 at 10:37 PM Emmanuel Grumbach  
> wrote:
> > So I believe we already have this uevent, it is the devcoredump. All
> > we need is to add the unique id.
>
> I think there are a few reasons that devcoredump doesn't satisfy what
> either Luis or I want.
>
> 1) it can be disabled entirely [1], for good reasons (e.g., think of
> non-${CHIP_VENDOR} folks, who can't (and don't want to) do anything
> with the opaque dumps provided by closed-source firmware)

Ok, if all you're interested into is the information that this event
happen (as opposed to report a bug and providing the data), then I
agree. True, not everybody want or can enable devcoredump. I am just a
bit concerned that we may end up with two interface that notify the
same event basically. The ideal maybe would be to be able to
optionally reduce the content of the devoredump to nothing more that
is already in the dmesg output. But then, it is not what it is meant
to be: namely, a core dump..

> 2) not all drivers necessarily have a useful dump to provide when
> there's a crash; look at the rest of Luis's series to see the kinds of
> drivers-with-firmware that are crashing, some of which aren't dumping
> anything

Fair enouh.

> 3) for those that do support devcoredump, it may be used for purposes
> that are not "crashes" -- e.g., some provide debugfs or other knobs to
> initiate dumps, for diagnostic or debugging purposes

Not sure I really think we need to care about those cases, but you
already have 2 good arguments :)

>
> Brian
>
> [1] devcd_disabled
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/devcoredump.c?h=v5.6#n22

Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Greg Kroah-Hartman

On Fri, May 22, 2020 at 10:59:02AM +0800, Dinghao Liu wrote:
> When copy_from_user() returns an error code, there
> is a runtime PM usage counter imbalance.
> 
> Fix this by moving copy_from_user() to the beginning
> of this function.
> 
> Signed-off-by: Dinghao Liu 
> ---
>  drivers/usb/musb/musb_debugfs.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

What changed from v1?  Always show that below the --- line as the
documentation says to.

thanks,

greg k-h

[PATCH] rxrpc: fix a memory leak bug.

2020-05-21 Thread wu000273

From: Qiushi Wu 

In function rxkad_verify_response(), pointer "ticket" is not released,
when function rxkad_decrypt_ticket() returns an error, causing a
memory leak bug.

Fixes: 8c2f826dc3631 ("rxrpc: Don't put crypto buffers on the stack")
Signed-off-by: Qiushi Wu 
---
 net/rxrpc/rxkad.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 098f1f9ec53b..52a24d4ef5d8 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1148,7 +1148,7 @@ static int rxkad_verify_response(struct rxrpc_connection 
*conn,
ret = rxkad_decrypt_ticket(conn, skb, ticket, ticket_len, &session_key,
   &expiry, _abort_code);
if (ret < 0)
-   goto temporary_error_free_resp;
+   goto temporary_error_free_ticket;
 
/* use the session key from inside the ticket to decrypt the
 * response */
@@ -1230,7 +1230,6 @@ static int rxkad_verify_response(struct rxrpc_connection 
*conn,
 
 temporary_error_free_ticket:
kfree(ticket);
-temporary_error_free_resp:
kfree(response);
 temporary_error:
/* Ignore the response packet if we got a temporary error such as
-- 
2.17.1

[PATCH] scsi: ufs-bsg: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu

When ufs_bsg_alloc_desc_buffer() returns an error code,
a pairing runtime PM usage counter decrement is needed
to keep the counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/scsi/ufs/ufs_bsg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufs_bsg.c b/drivers/scsi/ufs/ufs_bsg.c
index 53dd87628cbe..516a7f573942 100644
--- a/drivers/scsi/ufs/ufs_bsg.c
+++ b/drivers/scsi/ufs/ufs_bsg.c
@@ -106,8 +106,10 @@ static int ufs_bsg_request(struct bsg_job *job)
desc_op = bsg_request->upiu_req.qr.opcode;
ret = ufs_bsg_alloc_desc_buffer(hba, job, &desc_buff,
&desc_len, desc_op);
-   if (ret)
+   if (ret) {
+   pm_runtime_put_sync(hba->dev);
goto out;
+   }
 
/* fall through */
case UPIU_TRANSACTION_NOP_OUT:
-- 
2.17.1

linux-next: manual merge of the block tree with the djw-vfs tree

2020-05-21 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the block tree got a conflict in:

  drivers/block/loop.c

between commit:

  efbe3c2493d2 ("fs: Remove unneeded IS_DAX() check in io_is_direct()")

from the djw-vfs tree and commit:

  3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl")

from the block tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/block/loop.c
index 14372df0f354,a565c5aafa52..
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@@ -1022,21 -1146,20 +1146,21 @@@ static int loop_configure(struct loop_d
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
  
-   if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
+   if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);
  
-   if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
+   if (config->block_size)
+   bsize = config->block_size;
 -  else if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev)
++  else if ((lo->lo_backing_file->f_flags & O_DIRECT) &&
++   inode->i_sb->s_bdev)
/* In case of direct I/O, match underlying block size */
-   unsigned short bsize = bdev_logical_block_size(
-   inode->i_sb->s_bdev);
+   bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
+   else
+   bsize = 512;
  
-   blk_queue_logical_block_size(lo->lo_queue, bsize);
-   blk_queue_physical_block_size(lo->lo_queue, bsize);
-   blk_queue_io_min(lo->lo_queue, bsize);
-   }
+   blk_queue_logical_block_size(lo->lo_queue, bsize);
+   blk_queue_physical_block_size(lo->lo_queue, bsize);
+   blk_queue_io_min(lo->lo_queue, bsize);
  
loop_update_rotational(lo);
loop_update_dio(lo);


pgpt_3jZZBldb.pgp
Description: OpenPGP digital signature

[PATCH] scsi: ufs: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu

When devm_clk_get() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep
the counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/scsi/ufs/ti-j721e-ufs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ti-j721e-ufs.c b/drivers/scsi/ufs/ti-j721e-ufs.c
index 5216d228cdd9..f3f212f6f9a9 100644
--- a/drivers/scsi/ufs/ti-j721e-ufs.c
+++ b/drivers/scsi/ufs/ti-j721e-ufs.c
@@ -39,6 +39,7 @@ static int ti_j721e_ufs_probe(struct platform_device *pdev)
clk = devm_clk_get(dev, NULL);
if (IS_ERR(clk)) {
dev_err(dev, "Cannot claim MPHY clock.\n");
+   pm_runtime_put_sync(dev);
return PTR_ERR(clk);
}
clk_rate = clk_get_rate(clk);
-- 
2.17.1

Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-21 Thread Rob Landley

On 5/21/20 10:28 PM, Eric W. Biederman wrote:
> 
> Rob Landley  writes:
> 
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> 
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context 
>> data
>> that just got discarded through the pipe from the parent. ("Wheee." And you 
>> can
>> quote me on that.)
> 
> Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you.  Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an AI
complete problem.

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table so close() and
open() don't affect the parent, but if you change a global that's visible to the
parent when it resumes (ant often local variables too: don't return from the
function that called vfork() because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT does it). If the child calls
malloc() the parent needs to free it because it's same heap (because same
mapping of the same physical memory).

Then when the child is ready to discard all those mappings (due to calling
either execve() or _exit(), those are the only two options), the parent resumes
from where it left off with the PID of the child as the system call return 
value.

The reason the child pauses the parent is so only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threadded process you can cause a multi-milisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
> 
> Yes I don't think it would be a lot of code.
> 
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob

[PATCH] wlcore: fix runtime pm imbalance in wlcore_irq_locked

2020-05-21 Thread Dinghao Liu

When wlcore_fw_status() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep the
counter balanced. It's the same for all error paths after
wlcore_fw_status().

Signed-off-by: Dinghao Liu 
---
 drivers/net/wireless/ti/wlcore/main.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c 
b/drivers/net/wireless/ti/wlcore/main.c
index f140f7d7f553..fd3608223f64 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -548,7 +548,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 
ret = wlcore_fw_status(wl, wl->fw_status);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
wlcore_hw_tx_immediate_compl(wl);
 
@@ -565,7 +565,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
ret = -EIO;
 
/* restarting the chip. ignore any other interrupt. */
-   goto out;
+   goto err_ret;
}
 
if (unlikely(intr & WL1271_ACX_SW_INTR_WATCHDOG)) {
@@ -575,7 +575,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
ret = -EIO;
 
/* restarting the chip. ignore any other interrupt. */
-   goto out;
+   goto err_ret;
}
 
if (likely(intr & WL1271_ACX_INTR_DATA)) {
@@ -583,7 +583,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 
ret = wlcore_rx(wl, wl->fw_status);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
/* Check if any tx blocks were freed */
spin_lock_irqsave(&wl->wl_lock, flags);
@@ -596,7 +596,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 */
ret = wlcore_tx_work_locked(wl);
if (ret < 0)
-   goto out;
+   goto err_ret;
} else {
spin_unlock_irqrestore(&wl->wl_lock, flags);
}
@@ -604,7 +604,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
/* check for tx results */
ret = wlcore_hw_tx_delayed_compl(wl);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
/* Make sure the deferred queues don't get too long */
defer_count = skb_queue_len(&wl->deferred_tx_queue) +
@@ -617,14 +617,14 @@ static int wlcore_irq_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_EVENT_A");
ret = wl1271_event_handle(wl, 0);
if (ret < 0)
-   goto out;
+   goto err_ret;
}
 
if (intr & WL1271_ACX_INTR_EVENT_B) {
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_EVENT_B");
ret = wl1271_event_handle(wl, 1);
if (ret < 0)
-   goto out;
+   goto err_ret;
}
 
if (intr & WL1271_ACX_INTR_INIT_COMPLETE)
@@ -635,6 +635,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_HW_AVAILABLE");
}
 
+err_ret:
pm_runtime_mark_last_busy(wl->dev);
pm_runtime_put_autosuspend(wl->dev);
 
-- 
2.17.1

Re: [PATCH 10/14] docs: move locking-specific documenta to locking/ directory

2020-05-21 Thread Mauro Carvalho Chehab

Em Fri, 15 May 2020 12:06:07 -0600
Jonathan Corbet  escreveu:

> On Fri,  1 May 2020 17:37:54 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > Several files under Documentation/*.txt describe some type of
> > locking API. Move them to locking/ subdir and add to the
> > locking/index.rst index file.
> > 
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> I've applied this, but it really seems like this belongs in the core-api
> manual someday.

Makes sense.

Well, right now, it is at the same level as core-api, just below it:

Kernel API documentation

These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer.  Much of the information 
here
is taken directly from the kernel source, with supplemental material 
added
as needed (or at least as we managed to add it — probably *not* all 
that is
needed).

.. toctree::
   :maxdepth: 2

   driver-api/index
   core-api/index
   locking/index

Not too bad.

Btw, there are other doc sets that could also fit into the core-api, like:

...
   accounting/index
...
   security/index
...
   bpf/index
...
   scheduler/index

while most of the rest should likely be inside driver-api.

Some care should be taken when moving stuff, though: there is a
reason why they weren't moved to driver-api in the first place:
they may contain stuff for the admin guide mixed there.

Thanks,
Mauro

Re: Re: [PATCH] [v2] PCI: tegra194: Fix runtime PM imbalance on error

2020-05-21 Thread dinghao . liu

Hi Bjorn,

In fact, most usage of pm_runtime_get_sync() is correct. I made 
a static analysis tool to check this imbalance in kernel and 
found about 80 bugs in dirvers. Some of my patches have been 
accepted and I'm trying to patch the rest as soon as possible.

Regards,
Dinghao 

> [+cc Rafael, linux-pm]
> 
> On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
> 
> I didn't realize there were so many drivers with the exact same issue.
> Can we just squash these all into a single patch so we can see them
> all together?
> 
> Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
> looks like many of them have similar issues, i.e., they have a pattern
> like this
> 
>   ret = pm_runtime_get_sync(dev);
>   if (ret < 0)
> return;
> 
>   pm_runtime_put(dev);
> 
> where there is not a pm_runtime_put() to match every
> pm_runtime_get_sync().  Random sample:
> 
>   nds32_pmu_reserve_hardware
>   sata_rcar_probe
>   exynos_trng_probe
>   ks_sa_rng_probe
>   omap_aes_probe
>   sun8i_ss_probe
>   omap_aes_probe
>   zynq_gpio_probe
>   amdgpu_hwmon_show_power_avg
>   mtk_crtc_ddp_hw_init
>   ...
> 
> Surely I'm missing something and these aren't all broken, right?
> 
> Maybe we could put together a coccinelle script to scan the tree for
> this issue?
> 
> > Signed-off-by: Dinghao Liu 
> > ---
> >  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
> > b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index ae30a2fd3716..2c0d2ce16b47 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw 
> > *pcie)
> > ret = pinctrl_pm_select_default_state(dev);
> > if (ret < 0) {
> > dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
> > -   goto fail_pinctrl;
> > +   goto fail_pm_get_sync;
> > }
> >  
> > tegra_pcie_init_controller(pcie);
> > @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw 
> > *pcie)
> >  
> >  fail_host_init:
> > tegra_pcie_deinit_controller(pcie);
> > -fail_pinctrl:
> > -   pm_runtime_put_sync(dev);
> >  fail_pm_get_sync:
> > +   pm_runtime_put_sync(dev);
> > pm_runtime_disable(dev);
> > return ret;
> >  }
> > -- 
> > 2.17.1
> >

Re: [PATCH 06/14] docs: debugging-via-ohci1394.txt: add it to the core-api book

2020-05-21 Thread Mauro Carvalho Chehab

Em Fri, 15 May 2020 12:00:16 -0600
Jonathan Corbet  escreveu:

> On Fri,  1 May 2020 17:37:50 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > There is an special chapter inside the core-api book about
> > some debug infrastructure like tracepoints and debug objects.
> > 
> > It sounded to me that this is the best place to add a chapter
> > explaining how to use a FireWire controller to do remote
> > kernel debugging, as explained on this document.
> > 
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> I've applied this, but core-api really seems like the wrong place for
> this.  It would be good to rethink our layout a bit at some point in the
> near future...

Yeah, agreed. Debug functionality should likely deserve a separate
chapter outside core-api.

Now that we'll have all docs converted, it should be easier to view
the hole picture and re-design the doc organization.


Thanks,
Mauro

Re: [PATCH 02/14] docs: add bus-virt-phys-mapping.txt to core-api

2020-05-21 Thread Mauro Carvalho Chehab

Em Fri, 15 May 2020 11:53:21 -0600
Jonathan Corbet  escreveu:

> On Fri,  1 May 2020 17:37:46 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > This describes an old interface used prior the new DMA-API
> > interfaces. Add it to the core-api guide, just after the
> > DMA stuff.
> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > ---
> >  .../bus-virt-phys-mapping.rst}   | 0
> >  Documentation/core-api/index.rst | 1 +
> >  2 files changed, 1 insertion(+)
> >  rename Documentation/{bus-virt-phys-mapping.txt => 
> > core-api/bus-virt-phys-mapping.rst} (100%)  
> 
> For this one, I think we should maybe just delete the file.  It contains a
> warning from *20 years ago* saying not to use it, and talks about
> functions like isa_readl() that haven't existed i the kernel for some
> time.  Is there any reason to keep dragging it around?

Except for "keeping it for historical reasons" (as mentioned at the
file), I don't see any reason why to keep it.

It might be useful if someone wants to port some OOT code based on
a legacy kernel.

Yet, if you prefer to just trash it, I'm ok with that.

Thanks,
Mauro

Re: [PATCH v2 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

2020-05-21 Thread Alex Williamson

On Thu, 21 May 2020 22:39:06 -0400
Qian Cai  wrote:

> On Tue, May 05, 2020 at 03:55:02PM -0600, Alex Williamson wrote:
> []
> vfio_pci_mmap_fault(struct vm_fault *vmf)
> >  {
> > struct vm_area_struct *vma = vmf->vma;
> > struct vfio_pci_device *vdev = vma->vm_private_data;
> > +   vm_fault_t ret = VM_FAULT_NOPAGE;
> >  
> > -   if (vfio_pci_add_vma(vdev, vma))
> > -   return VM_FAULT_OOM;
> > +   mutex_lock(&vdev->vma_lock);
> > +   down_read(&vdev->memory_lock);  
> 
> This lock here will trigger,
> 
> [17368.321363][T3614103] 
> ==
> [17368.321375][T3614103] WARNING: possible circular locking dependency 
> detected
> [17368.321399][T3614103] 5.7.0-rc6-next-20200521+ #116 Tainted: GW
> 
> [17368.321410][T3614103] 
> --
> [17368.321433][T3614103] qemu-kvm/3614103 is trying to acquire lock:
> [17368.321443][T3614103] c000200fb2328968 (&kvm->lock){+.+.}-{3:3}, at: 
> kvmppc_irq_bypass_add_producer_hv+0xd4/0x3b0 [kvm_hv]
> [17368.321488][T3614103] 
> [17368.321488][T3614103] but task is already holding lock:
> [17368.321533][T3614103] c16f4dc8 (lock#7){+.+.}-{3:3}, at: 
> irq_bypass_register_producer+0x80/0x1d0
> [17368.321564][T3614103] 
> [17368.321564][T3614103] which lock already depends on the new lock.
> [17368.321564][T3614103] 
> [17368.321590][T3614103] 
> [17368.321590][T3614103] the existing dependency chain (in reverse order) is:
> [17368.321625][T3614103] 
> [17368.321625][T3614103] -> #4 (lock#7){+.+.}-{3:3}:
> [17368.321662][T3614103]__mutex_lock+0xdc/0xb80
> [17368.321683][T3614103]irq_bypass_register_producer+0x80/0x1d0
> [17368.321706][T3614103]vfio_msi_set_vector_signal+0x1d8/0x350 
> [vfio_pci]
> [17368.321719][T3614103]vfio_msi_set_block+0xb0/0x1e0 [vfio_pci]
> [17368.321752][T3614103]vfio_pci_set_msi_trigger+0x13c/0x3e0 
> [vfio_pci]
> [17368.321787][T3614103]vfio_pci_set_irqs_ioctl+0x134/0x2c0 [vfio_pci]
> [17368.321821][T3614103]vfio_pci_ioctl+0xe10/0x1460 [vfio_pci]
> [17368.321855][T3614103]vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> [17368.321879][T3614103]ksys_ioctl+0xd8/0x130
> [17368.321888][T3614103]sys_ioctl+0x28/0x40
> [17368.321910][T3614103]system_call_exception+0x108/0x1d0
> [17368.321932][T3614103]system_call_common+0xf0/0x278
> [17368.321951][T3614103] 
> [17368.321951][T3614103] -> #3 (&vdev->memory_lock){}-{3:3}:
> [17368.321988][T3614103]lock_release+0x190/0x5e0
> [17368.322009][T3614103]__mutex_unlock_slowpath+0x68/0x410
> [17368.322042][T3614103]vfio_pci_mmap_fault+0xe8/0x1f0 [vfio_pci]
> vfio_pci_mmap_fault at drivers/vfio/pci/vfio_pci.c:1534
> [17368.322066][T3614103]__do_fault+0x64/0x220
> [17368.322086][T3614103]handle_mm_fault+0x12f0/0x19e0
> [17368.322107][T3614103]__do_page_fault+0x284/0xf70
> [17368.322116][T3614103]handle_page_fault+0x10/0x2c
> [17368.322136][T3614103] 
> [17368.322136][T3614103] -> #2 (&mm->mmap_sem){}-{3:3}:
> [17368.322160][T3614103]__might_fault+0x84/0xe0
> [17368.322182][T3614103]_copy_to_user+0x3c/0x120
> [17368.322206][T3614103]kvm_vcpu_ioctl+0x1ec/0xac0 [kvm]
> [17368.322239][T3614103]ksys_ioctl+0xd8/0x130
> [17368.322270][T3614103]sys_ioctl+0x28/0x40
> [17368.322301][T3614103]system_call_exception+0x108/0x1d0
> [17368.322334][T3614103]system_call_common+0xf0/0x278
> [17368.322375][T3614103] 
> [17368.322375][T3614103] -> #1 (&vcpu->mutex){+.+.}-{3:3}:
> [17368.322411][T3614103]__mutex_lock+0xdc/0xb80
> [17368.322446][T3614103]kvmppc_xive_release+0xd8/0x260 [kvm]
> [17368.322484][T3614103]kvm_device_release+0xc4/0x110 [kvm]
> [17368.322518][T3614103]__fput+0x154/0x3b0
> [17368.322562][T3614103]task_work_run+0xd8/0x170
> [17368.322583][T3614103]do_exit+0x4f8/0xeb0
> [17368.322604][T3614103]do_group_exit+0x78/0x160
> [17368.322625][T3614103]get_signal+0x230/0x1440
> [17368.322657][T3614103]do_notify_resume+0x130/0x3e0
> [17368.322677][T3614103]syscall_exit_prepare+0x1a4/0x280
> [17368.322687][T3614103]system_call_common+0xf8/0x278
> [17368.322718][T3614103] 
> [17368.322718][T3614103] -> #0 (&kvm->lock){+.+.}-{3:3}:
> [17368.322753][T3614103]__lock_acquire+0x1fe4/0x3190
> [17368.322774][T3614103]lock_acquire+0x140/0x9a0
> [17368.322805][T3614103]__mutex_lock+0xdc/0xb80
> [17368.322838][T3614103]

[PATCH 1/2] video: fbdev: fix error handling for get_user_pages_fast()

2020-05-21 Thread John Hubbard

Dealing with the return value of get_user_pages*() variants has a few
classic pitfalls, and this driver found one of them: the return value
might be zero, positive, or -errno. And if positive, it might be fewer
pages than were requested. And if fewer pages than requested, then
the caller should return (via put_page()) the pages that *were*
pinned.

This driver was doing that *except* that it had a problem with the
-errno case, which was being stored in an unsigned int, and which
would case an interesting mess if it ever happened: nr_pages would be
interpreted as a spectacularly huge unsigned value, rather than a
small negative value. Also, it was unnecessarily overriding a
potentially informative -errno, with -EINVAL, in some cases.

Instead: clamp the nr_pages to zero or positive, so that the error
handling works. And return the -errno value from get_user_pages*(),
unchanged, if we get one. And explain this with comments, seeing as
how it is error-prone.

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/video/fbdev/pvr2fb.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index f18d457175d9..ceb6ef590597 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -654,8 +654,22 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
 
ret = get_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
if (ret < nr_pages) {
-   nr_pages = ret;
-   ret = -EINVAL;
+   if (ret < 0) {
+   /*
+*  Clamp the unsigned nr_pages to zero so that the
+*  error handling works. And leave ret at whatever
+*  -errno value was returned from GUP.
+*/
+   nr_pages = 0;
+   } else {
+   nr_pages = ret;
+   /*
+* Use -EINVAL to represent a mildly desperate guess at
+* why we got fewer pages (maybe even zero pages) than
+* requested.
+*/
+   ret = -EINVAL;
+   }
goto out_unmap;
}
 
-- 
2.26.2

[PATCH 0/2] video: fbdev: fix error handling, convert to pin_user_pages*()

2020-05-21 Thread John Hubbard

Hi,

Note that I have only compile-tested this series, although that does
also include cross-compiling for a few other arches. I'm hoping that
this posting will lead to some run-time testing.

Also: the proposed fix does not have a "Fixes:" tag, nor does it
Cc stable. That's because the issue has been there since the dawn of
git history for the kernel. If it's gone unnoticed this long, then
there is clearly no need for the relatively fast track of putting it
into stable, IMHO. But please correct me if that's wrong.

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org

John Hubbard (2):
  video: fbdev: fix error handling for get_user_pages_fast()
  video: fbdev: convert get_user_pages() --> pin_user_pages()

 drivers/video/fbdev/pvr2fb.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)


base-commit: 051143e1602d90ea71887d92363edd539d411de5
-- 
2.26.2

[PATCH 2/2] video: fbdev: convert get_user_pages() --> pin_user_pages()

2020-05-21 Thread John Hubbard

This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/video/fbdev/pvr2fb.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index ceb6ef590597..2d9f69b93392 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -652,7 +652,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
if (!pages)
return -ENOMEM;
 
-   ret = get_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
+   ret = pin_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
if (ret < nr_pages) {
if (ret < 0) {
/*
@@ -712,9 +712,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
ret = count;
 
 out_unmap:
-   for (i = 0; i < nr_pages; i++)
-   put_page(pages[i]);
-
+   unpin_user_pages(pages, nr_pages);
kfree(pages);
 
return ret;
-- 
2.26.2

[PATCH 2/3] gpio: pxa: Fix return value of pxa_gpio_probe()

2020-05-21 Thread Tiezhu Yang

When call function devm_platform_ioremap_resource(), we should use IS_ERR()
to check the return value and return PTR_ERR() if failed.

Fixes: 542c25b7a209 ("drivers: gpio: pxa: use devm_platform_ioremap_resource()")
Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/gpio-pxa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-pxa.c b/drivers/gpio/gpio-pxa.c
index 1361270..0cb6600 100644
--- a/drivers/gpio/gpio-pxa.c
+++ b/drivers/gpio/gpio-pxa.c
@@ -660,8 +660,8 @@ static int pxa_gpio_probe(struct platform_device *pdev)
pchip->irq1 = irq1;
 
gpio_reg_base = devm_platform_ioremap_resource(pdev, 0);
-   if (!gpio_reg_base)
-   return -EINVAL;
+   if (IS_ERR(gpio_reg_base))
+   return PTR_ERR(gpio_reg_base);
 
clk = clk_get(&pdev->dev, NULL);
if (IS_ERR(clk)) {
-- 
2.1.0

[PATCH 3/3] gpio: pxa: Add COMPILE_TEST support

2020-05-21 Thread Tiezhu Yang

Add COMPILE_TEST support to the PXA GPIO driver for better compile
testing coverage.

Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 03c01f4..5e90aad 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -439,7 +439,7 @@ config GPIO_PMIC_EIC_SPRD
 
 config GPIO_PXA
bool "PXA GPIO support"
-   depends on ARCH_PXA || ARCH_MMP
+   depends on ARCH_PXA || ARCH_MMP || COMPILE_TEST
help
  Say yes here to support the PXA GPIO device
 
-- 
2.1.0

[PATCH 1/3] gpio: bcm-kona: Fix return value of bcm_kona_gpio_probe()

2020-05-21 Thread Tiezhu Yang

When call function devm_platform_ioremap_resource(), we should use IS_ERR()
to check the return value and return PTR_ERR() if failed.

Fixes: 72d8cb715477 ("drivers: gpio: bcm-kona: use 
devm_platform_ioremap_resource()")
Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/gpio-bcm-kona.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-bcm-kona.c b/drivers/gpio/gpio-bcm-kona.c
index baee8c3..cf3687a 100644
--- a/drivers/gpio/gpio-bcm-kona.c
+++ b/drivers/gpio/gpio-bcm-kona.c
@@ -625,7 +625,7 @@ static int bcm_kona_gpio_probe(struct platform_device *pdev)
 
kona_gpio->reg_base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(kona_gpio->reg_base)) {
-   ret = -ENXIO;
+   ret = PTR_ERR(kona_gpio->reg_base);
goto err_irq_domain;
}
 
-- 
2.1.0

Re: [v4,0/7] Add Mediatek thermal dirver and dtsi

2020-05-21 Thread Michael Kao

On Thu, 2020-05-21 at 14:51 +0200, Matthias Brugger wrote:
> Hi Michael,
> 
> On 23/03/2020 13:15, Michael Kao wrote:
> > This patchset supports for MT8183 chip to mtk_thermal.c.
> > Add thermal zone of all the thermal sensor in SoC for
> > another get temperatrue. They don't need to thermal throttle.
> > And we bind coolers for thermal zone nodes of cpu_thermal.
> > 
> > Rebase to kernel-5.6-rc1.
> > 
> > Update content:
> > 
> > [1/7]
> > - Squash thermal zone settings in the dtsi from [v3,5/8]
> >   arm64: dts: mt8183: Increase polling frequency for CPU thermal zone
> > 
> > - Remove the property of interrupts and mediatek,hw-reset-temp
> > 
> > [2/7]
> > - Correct commit message
> > 
> > [4/7]
> > - Change the target temperature to the 80C and change the commit message
> > 
> > [6/7]
> > - Adjust newline alignment
> > 
> > - Fix the judgement on the return value of registering thermal zone
> > 
> > This patch series base on these patches [1].
> > 
> > [v7,3/3] PM / AVS: SVS: Introduce SVS engine 
> > (https://patchwork.kernel.org/patch/11439829/)
> > 
> > Matthias Kaehlcke (1):
> >   arm64: dts: mt8183: Configure CPU cooling
> > 
> > Michael Kao (6):
> >   arm64: dts: mt8183: add thermal zone node
> >   arm64: dts: mt8183: add dynamic power coefficients
> >   arm64: dts: mt8183: Add #cooling-cells to CPU nodes
> >   thermal: mediatek: mt8183: fix bank number settings
> 
> Do I understand correctly that we need to fix the bank number before we can 
> add
> the device tree changes. And that the last two patches are enhancements for 
> the
> driver but needed to get a working version?
> 
> Regards,
> Matthias
> 
Hi Matthias,

There is one bank setting of mt8183 config.
If the device tree merged first. I worry that it will crash when the
thermal zone read temperature.
It will access the invalid index of bank.
So please wait the patch "fix bank number settings " merged first.
Thanks!

/* MT8183 thermal sensor data */
static const int mt8183_bank_data[MT8183_NUM_SENSORS] = {
MT8183_TS1, MT8183_TS2, MT8183_TS3, MT8183_TS4, MT8183_TS5,
MT8183_TSABB
}; 

Best Regards,
Michael


> >   thermal: mediatek: add another get_temp ops for thermal sensors
> >   thermal: mediatek: use spinlock to protect PTPCORESEL
> > 
> >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 156 +++
> >  drivers/thermal/mtk_thermal.c|  88 +++--
> >  2 files changed, 231 insertions(+), 13 deletions(-)
> >

Re: [v4,0/7] Add Mediatek thermal dirver and dtsi

2020-05-21 Thread Michael Kao

On Thu, 2020-05-21 at 14:51 +0200, Matthias Brugger wrote:
> Hi Michael,
> 
> On 23/03/2020 13:15, Michael Kao wrote:
> > This patchset supports for MT8183 chip to mtk_thermal.c.
> > Add thermal zone of all the thermal sensor in SoC for
> > another get temperatrue. They don't need to thermal throttle.
> > And we bind coolers for thermal zone nodes of cpu_thermal.
> > 
> > Rebase to kernel-5.6-rc1.
> > 
> > Update content:
> > 
> > [1/7]
> > - Squash thermal zone settings in the dtsi from [v3,5/8]
> >   arm64: dts: mt8183: Increase polling frequency for CPU thermal zone
> > 
> > - Remove the property of interrupts and mediatek,hw-reset-temp
> > 
> > [2/7]
> > - Correct commit message
> > 
> > [4/7]
> > - Change the target temperature to the 80C and change the commit message
> > 
> > [6/7]
> > - Adjust newline alignment
> > 
> > - Fix the judgement on the return value of registering thermal zone
> > 
> > This patch series base on these patches [1].
> > 
> > [v7,3/3] PM / AVS: SVS: Introduce SVS engine 
> > (https://patchwork.kernel.org/patch/11439829/)
> > 
> > Matthias Kaehlcke (1):
> >   arm64: dts: mt8183: Configure CPU cooling
> > 
> > Michael Kao (6):
> >   arm64: dts: mt8183: add thermal zone node
> >   arm64: dts: mt8183: add dynamic power coefficients
> >   arm64: dts: mt8183: Add #cooling-cells to CPU nodes
> >   thermal: mediatek: mt8183: fix bank number settings
> 
> Do I understand correctly that we need to fix the bank number before we can 
> add
> the device tree changes. And that the last two patches are enhancements for 
> the
> driver but needed to get a working version?
> 
> Regards,
> Matthias

Hi Matthias,

There is one bank setting of mt8183 config.
If the device tree merged first. I worry that it will crash when the
thermal zone read temperature.
It will access the invalid index of bank.
So please add the patch "fix bank number settings "
first.2

> 
> >   thermal: mediatek: add another get_temp ops for thermal sensors
> >   thermal: mediatek: use spinlock to protect PTPCORESEL
> > 
> >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 156 +++
> >  drivers/thermal/mtk_thermal.c|  88 +++--
> >  2 files changed, 231 insertions(+), 13 deletions(-)
> >

Re: [PATCH V3 0/3] arm64: Enable vmemmap mapping from device memory

2020-05-21 Thread Jia He


Hi

On 2020/3/31 13:09, Anshuman Khandual wrote:

This series enables vmemmap backing memory allocation from device memory
ranges on arm64. But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
alocation requests.


I verified no obvious regression after this patch series.

Host: ThunderX2(armv8a server), kernel v5.4

qemu:v3.1, -M virt \

-object 
memory-backend-file,id=mem1,share=on,mem-path=/tmp2/nvdimm.img,size=4G,align=2M \


-device nvdimm,id=nvdimm1,memdev=mem1,label-size=2M

Guest: kernel v5.7.0-rc5 with this patch series.

Tested case:

- 4K PAGESIZE, boot, mount w/ -o dax, mount w/o -o dax, basic io

- 64K PAGESIZE,boot, mount w/ -o dax, mount w/o -o dax, basic io

Not tested:

- 16K pagesize due to my hardware limiation(can't run 16K pgsz kernel)

- hot-add/remove nvdimm device from qemu due to no fully support on arm64 qemu 
yet

- Host nvdimm device hotplug

Hence from above result,

Tested-by: Jia He 


This series applies after latest (v14) arm64 memory hot remove series
(https://lkml.org/lkml/2020/3/3/1746) on Linux 5.6.

Pending Question:

altmap_alloc_block_buf() does not have any other remaining users in the
tree after this change. Should it be converted into a static function and
it's declaration be dropped from the header (include/linux/mm.h). Avoided
doing so because I was not sure if there are any off-tree users or not.

Changes in V3:

- Dropped comment from free_hotplug_page_range() per Robin
- Modified comment in unmap_hotplug_range() per Robin
- Enabled altmap support in vmemmap_alloc_block_buf() per Robin

Changes in V2: (https://lkml.org/lkml/2020/3/4/475)

- Rebased on latest hot-remove series (v14) adding P4D page table support

Changes in V1: (https://lkml.org/lkml/2020/1/23/12)

- Added an WARN_ON() in unmap_hotplug_range() when altmap is
   provided without the page table backing memory being freed

Changes in RFC V2: (https://lkml.org/lkml/2019/10/21/11)

- Changed the commit message on 1/2 patch per Will
- Changed the commit message on 2/2 patch as well
- Rebased on arm64 memory hot remove series (v10)

RFC V1: (https://lkml.org/lkml/2019/6/28/32)

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: David Hildenbrand
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: "Matthew Wilcox (Oracle)"
Cc: "Kirill A. Shutemov"
Cc: Andrew Morton
Cc: Dan Williams
Cc: Pavel Tatashin
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc:linux-arm-ker...@lists.infradead.org
Cc:linux-i...@vger.kernel.org
Cc:linux-ri...@lists.infradead.org
Cc:x...@kernel.org
Cc:linuxppc-...@lists.ozlabs.org
Cc:linux...@kvack.org
Cc:linux-kernel@vger.kernel.org

Anshuman Khandual (3):
   mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
   mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()
   arm64/mm: Enable vmem_altmap support for vmemmap mappings

  arch/arm64/mm/mmu.c   | 59 ++-
  arch/ia64/mm/discontig.c  |  2 +-
  arch/powerpc/mm/init_64.c | 10 +++
  arch/riscv/mm/init.c  |  2 +-
  arch/x86/mm/init_64.c | 12 
  include/linux/mm.h|  8 --
  mm/sparse-vmemmap.c   | 38 -
  7 files changed, 87 insertions(+), 44 deletions(-)


--

---
Cheers,
Justin (Jia He)

Re: [PATCH 01/10] swiotlb-xen: use vmalloc_to_page on vmalloc virt addresses

2020-05-21 Thread Stefano Stabellini

On Thu, 21 May 2020, Julien Grall wrote:
> Hi,
> 
> On 21/05/2020 00:45, Stefano Stabellini wrote:
> > From: Boris Ostrovsky 
> > 
> > Don't just assume that virt_to_page works on all virtual addresses.
> > Instead add a is_vmalloc_addr check and use vmalloc_to_page on vmalloc
> > virt addresses.
> 
> Can you provide an example where swiotlb is used with vmalloc()?

The issue was reported here happening on the Rasperry Pi 4:
https://marc.info/?l=xen-devel&m=158862573216800

If you are asking where in the Linux codebase the vmalloc is happening
specifically, I don't know for sure, my information is limited to the
stack trace that you see in the link (I don't have a Rasperry Pi 4 yet
but I shall have one soon.)


> > Signed-off-by: Boris Ostrovsky 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >   drivers/xen/swiotlb-xen.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..a42129cba36e 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -335,6 +335,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t
> > size, void *vaddr,
> > int order = get_order(size);
> > phys_addr_t phys;
> > u64 dma_mask = DMA_BIT_MASK(32);
> > +   struct page *pg;
> > if (hwdev && hwdev->coherent_dma_mask)
> > dma_mask = hwdev->coherent_dma_mask;
> > @@ -346,9 +347,11 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t
> > size, void *vaddr,
> > /* Convert the size to actually allocated. */
> > size = 1UL << (order + XEN_PAGE_SHIFT);
> >   + pg = is_vmalloc_addr(vaddr) ? vmalloc_to_page(vaddr) :
> > + virt_to_page(vaddr);
> 
> Common DMA code seems to protect this check with CONFIG_DMA_REMAP. Is it
> something we want to do it here as well? Or is there any other condition where
> vmalloc can happen?

I can see it in dma_direct_free_pages:

if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
vunmap(cpu_addr);

I wonder why the common DMA code does that. is_vmalloc_addr should work
regardless of CONFIG_DMA_REMAP. Maybe just for efficiency?

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1291 matches

Mail list logo