Re: [PATCH] ARM: Use WFI when possible when halting a core

2017-07-27 Thread Chen-Yu Tsai
On Thu, Jul 27, 2017 at 8:45 PM, Chen-Yu Tsai  wrote:
> On Thu, Jul 27, 2017 at 8:17 PM, Russell King - ARM Linux
>  wrote:
>> On Thu, Jul 27, 2017 at 08:08:03PM +0800, Chen-Yu Tsai wrote:
>>> On ARM, the kernel busy loops after cleaning up when a core is to be
>>> halted. This may have the undesired effect of increasing the core
>>> temperature, at a time when no power or thermal management is
>>> available, such as a kernel panic or shutdown to halt.
>>>
>>> x86 uses the more efficient HLT (halt) instruction. The equivalent
>>> instruction on ARM is WFI (wait for interrupt).
>>>
>>> This patch makes the busy loops in the ARM halt/reboot/panic paths
>>> use WFI when available (as defined in arch/arm/include/asm/barrier.h).
>>>
>>> A touch test indicates that this lowers the surface temperature of the
>>> Allwinner H3 SoC, with all 4 cores brought up with SMP, from painfully
>>> hot to just warm to the touch after shutdown to halt.
>>>
>>> Signed-off-by: Chen-Yu Tsai 
>>> ---
>>>
>>> I asked about this some time ago, and the answer was no one had done
>>> it. So here it is. It is somewhat similar to the patch "arm64:
>>> Improve parking of stopped CPUs", for arm64 obviously.
>>
>> There are various errata around the "wfi" instruction on various CPUs
>> which means that this simple implementation isn't going to work
>> reliably.
>>
>> This is where cpu_do_idle() comes in - please use that instead.
>
> Thanks for the tip. Looks like this one is implemented for every
> platform. No need for #ifdefs anymore.

So I've done the modifications and re-tested. Unfortunately it no
longer works, either using wfi() directly or cpu_do_idle(). It worked
back in January this year.

I'm not sure what's happening. The chip is still hot. (Hotter than
when it is idle in userspace.) I added some printk calls around the
wfi() / cpu_do_idle() calls and found it continuously looping, instead
of being parked. But of course printk might add to the problem.

AFAIK all interrupts should be disabled by the time the call is
reached. What could possibly bring the CPU out of WFI?

Regards
ChenYu


Re: [PATCH] mm: memcontrol: Use int for event/state parameter in several functions

2017-07-27 Thread Michal Hocko
On Thu 27-07-17 14:10:04, Matthias Kaehlcke wrote:
> Several functions use an enum type as parameter for an event/state,
> but are called in some locations with an argument of a different enum
> type. Adjust the interface of these functions to reality by changing the
> parameter to int.

enums are mostly for documentation purposes here. Using defines would be
possible but less elegant. If the enums are useful for anything else, it
is for things like MEMCG_NR_STAT.

> This fixes a ton of enum-conversion warnings that are generated when
> building the kernel with clang.
> 
> Signed-off-by: Matthias Kaehlcke 

Acked-by: Michal Hocko 

Thanks!
> ---
>  include/linux/memcontrol.h | 20 
>  mm/memcontrol.c|  4 +++-
>  2 files changed, 15 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 3914e3dd6168..80edbc04361e 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -487,8 +487,9 @@ extern int do_swap_account;
>  void lock_page_memcg(struct page *page);
>  void unlock_page_memcg(struct page *page);
>  
> +/* idx can be of type enum memcg_stat_item or node_stat_item */
>  static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
> -  enum memcg_stat_item idx)
> +  int idx)
>  {
>   long val = 0;
>   int cpu;
> @@ -502,15 +503,17 @@ static inline unsigned long memcg_page_state(struct 
> mem_cgroup *memcg,
>   return val;
>  }
>  
> +/* idx can be of type enum memcg_stat_item or node_stat_item */
>  static inline void __mod_memcg_state(struct mem_cgroup *memcg,
> -  enum memcg_stat_item idx, int val)
> +  int idx, int val)
>  {
>   if (!mem_cgroup_disabled())
>   __this_cpu_add(memcg->stat->count[idx], val);
>  }
>  
> +/* idx can be of type enum memcg_stat_item or node_stat_item */
>  static inline void mod_memcg_state(struct mem_cgroup *memcg,
> -enum memcg_stat_item idx, int val)
> +int idx, int val)
>  {
>   if (!mem_cgroup_disabled())
>   this_cpu_add(memcg->stat->count[idx], val);
> @@ -631,8 +634,9 @@ static inline void count_memcg_events(struct mem_cgroup 
> *memcg,
>   this_cpu_add(memcg->stat->events[idx], count);
>  }
>  
> +/* idx can be of type enum memcg_stat_item or node_stat_item */
>  static inline void count_memcg_page_event(struct page *page,
> -   enum memcg_stat_item idx)
> +   int idx)
>  {
>   if (page->mem_cgroup)
>   count_memcg_events(page->mem_cgroup, idx, 1);
> @@ -840,19 +844,19 @@ static inline bool mem_cgroup_oom_synchronize(bool wait)
>  }
>  
>  static inline unsigned long memcg_page_state(struct mem_cgroup *memcg,
> -  enum memcg_stat_item idx)
> +  int idx)
>  {
>   return 0;
>  }
>  
>  static inline void __mod_memcg_state(struct mem_cgroup *memcg,
> -  enum memcg_stat_item idx,
> +  int idx,
>int nr)
>  {
>  }
>  
>  static inline void mod_memcg_state(struct mem_cgroup *memcg,
> -enum memcg_stat_item idx,
> +int idx,
>  int nr)
>  {
>  }
> @@ -918,7 +922,7 @@ static inline void count_memcg_events(struct mem_cgroup 
> *memcg,
>  }
>  
>  static inline void count_memcg_page_event(struct page *page,
> -   enum memcg_stat_item idx)
> +   int idx)
>  {
>  }
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3df3c04d73ab..460130d2a796 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -550,10 +550,12 @@ mem_cgroup_largest_soft_limit_node(struct 
> mem_cgroup_tree_per_node *mctz)
>   * value, and reading all cpu value can be performance bottleneck in some
>   * common workload, threshold and synchronization as vmstat[] should be
>   * implemented.
> + *
> + * The parameter idx can be of type enum memcg_event_item or vm_event_item.
>   */
>  
>  static unsigned long memcg_sum_events(struct mem_cgroup *memcg,
> -   enum memcg_event_item event)
> +   int event)
>  {
>   unsigned long val = 0;
>   int cpu;
> -- 
> 2.14.0.rc0.400.g1c36432dff-goog

-- 
Michal Hocko
SUSE Labs


Re: [PATCH] KVM: nVMX: do not pin the VMCS12

2017-07-27 Thread Paolo Bonzini
On 27/07/2017 19:20, David Matlack wrote:
>> +   kvm_vcpu_write_guest_page(&vmx->vcpu,
>> + vmx->nested.current_vmptr >> PAGE_SHIFT,
>> + vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
> Have you hit any "suspicious RCU usage" error messages during VM
> teardown with this patch? We did when we replaced memcpy with
> kvm_write_guest a while back. IIRC it was due to kvm->srcu not being
> held in one of the teardown paths. kvm_write_guest() expects it to be
> held in order to access memslots.
> 
> We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send
> that patch along with a few other nVMX dirty tracking related patches
> I've been meaning to get upstreamed.

Oh, right.  I had this other (untested) patch in the queue after
Christian recently annotated everything with RCU checks:

Paolo

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 890b706d1943..07e3b02a1be3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -477,7 +477,8 @@ struct kvm {
 static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx)
 {
return srcu_dereference_check(kvm->buses[idx], &kvm->srcu,
- lockdep_is_held(&kvm->slots_lock));
+ lockdep_is_held(&kvm->slots_lock) ||
+ !refcount_read(&kvm->users_count));
 }
 
 static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
@@ -570,7 +571,8 @@ void kvm_put_kvm(struct kvm *kvm);
 static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id)
 {
return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu,
-   lockdep_is_held(&kvm->slots_lock));
+   lockdep_is_held(&kvm->slots_lock) ||
+   !refcount_read(&kvm->users_count));
 }
 
 static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f3f74271f1a9..6a21c98b22bf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -655,7 +655,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->lock);
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
-   refcount_set(&kvm->users_count, 1);
INIT_LIST_HEAD(&kvm->devices);
 
r = kvm_arch_init_vm(kvm, type);
@@ -701,6 +700,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
if (r)
goto out_err;
 
+   refcount_set(&kvm->users_count, 1);
spin_lock(&kvm_lock);
list_add(&kvm->vm_list, &vm_list);
spin_unlock(&kvm_lock);
@@ -717,10 +717,9 @@ static struct kvm *kvm_create_vm(unsigned long type)
hardware_disable_all();
 out_err_no_disable:
for (i = 0; i < KVM_NR_BUSES; i++)
-   kfree(rcu_access_pointer(kvm->buses[i]));
+   kfree(kvm_get_bus(kvm, i));
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-   kvm_free_memslots(kvm,
-   rcu_dereference_protected(kvm->memslots[i], 1));
+   kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
kvm_arch_free_vm(kvm);
mmdrop(current->mm);
return ERR_PTR(r);
@@ -754,9 +754,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
spin_unlock(&kvm_lock);
kvm_free_irq_routing(kvm);
for (i = 0; i < KVM_NR_BUSES; i++) {
-   struct kvm_io_bus *bus;
+   struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
 
-   bus = rcu_dereference_protected(kvm->buses[i], 1);
if (bus)
kvm_io_bus_destroy(bus);
kvm->buses[i] = NULL;
@@ -770,8 +769,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
kvm_arch_destroy_vm(kvm);
kvm_destroy_devices(kvm);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
-   kvm_free_memslots(kvm,
-   rcu_dereference_protected(kvm->memslots[i], 1));
+   kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
cleanup_srcu_struct(&kvm->irq_srcu);
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);



[PATCH v6 02/14] spi: qup: Setup DMA mode correctly

2017-07-27 Thread Varadarajan Narayanan
To operate in DMA mode, the buffer should be aligned and the size of
the transfer should be a multiple of the block size (for v1). The
number of words being transferred must be programmed in the count
registers appropriately.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 118 +++---
 1 file changed, 55 insertions(+), 63 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index c0d4def..abe799b 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -149,11 +149,18 @@ struct spi_qup {
int rx_bytes;
int qup_v1;
 
-   int use_dma;
+   int mode;
struct dma_slave_config rx_conf;
struct dma_slave_config tx_conf;
 };
 
+static inline bool spi_qup_is_dma_xfer(int mode)
+{
+   if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM)
+   return true;
+
+   return false;
+}
 
 static inline bool spi_qup_is_valid_state(struct spi_qup *controller)
 {
@@ -424,7 +431,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
error = -EIO;
}
 
-   if (!controller->use_dma) {
+   if (!spi_qup_is_dma_xfer(controller->mode)) {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
spi_qup_fifo_read(controller, xfer);
 
@@ -443,34 +450,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-static u32
-spi_qup_get_mode(struct spi_master *master, struct spi_transfer *xfer)
-{
-   struct spi_qup *qup = spi_master_get_devdata(master);
-   u32 mode;
-
-   qup->w_size = 4;
-
-   if (xfer->bits_per_word <= 8)
-   qup->w_size = 1;
-   else if (xfer->bits_per_word <= 16)
-   qup->w_size = 2;
-
-   qup->n_words = xfer->len / qup->w_size;
-
-   if (qup->n_words <= (qup->in_fifo_sz / sizeof(u32)))
-   mode = QUP_IO_M_MODE_FIFO;
-   else
-   mode = QUP_IO_M_MODE_BLOCK;
-
-   return mode;
-}
-
 /* set clock freq ... bits per word */
 static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
 {
struct spi_qup *controller = spi_master_get_devdata(spi->master);
-   u32 config, iomode, mode, control;
+   u32 config, iomode, control;
int ret, n_words;
 
if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) {
@@ -491,25 +475,30 @@ static int spi_qup_io_config(struct spi_device *spi, 
struct spi_transfer *xfer)
return -EIO;
}
 
-   mode = spi_qup_get_mode(spi->master, xfer);
+   controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8);
+   controller->n_words = xfer->len / controller->w_size;
n_words = controller->n_words;
 
-   if (mode == QUP_IO_M_MODE_FIFO) {
+   if (n_words <= (controller->in_fifo_sz / sizeof(u32))) {
+
+   controller->mode = QUP_IO_M_MODE_FIFO;
+
writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT);
writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT);
/* must be zero for FIFO */
writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
-   } else if (!controller->use_dma) {
+   } else if (spi->master->can_dma &&
+  spi->master->can_dma(spi->master, spi, xfer) &&
+  spi->master->cur_msg_mapped) {
+
+   controller->mode = QUP_IO_M_MODE_BAM;
+
writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
/* must be zero for BLOCK and BAM */
writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
-   } else {
-   mode = QUP_IO_M_MODE_BAM;
-   writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
-   writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
 
if (!controller->qup_v1) {
void __iomem *input_cnt;
@@ -528,19 +517,28 @@ static int spi_qup_io_config(struct spi_device *spi, 
struct spi_transfer *xfer)
 
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
}
+   } else {
+
+   controller->mode = QUP_IO_M_MODE_BLOCK;
+
+   writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
+   writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
+   /* must be zero for BLOCK and BAM */
+   writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
+   writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
}
 
iomode = readl_relaxed(controller->bas

[PATCH v6 04/14] spi: qup: Place the QUP in run mode before DMA

2017-07-27 Thread Varadarajan Narayanan
Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index fdd34c3..f1aa5c1 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -343,6 +343,14 @@ static int spi_qup_do_dma(struct spi_master *master, 
struct spi_transfer *xfer,
else if (xfer->tx_buf)
tx_done = spi_qup_dma_done;
 
+   /* before issuing the descriptors, set the QUP to run */
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
+   __func__, __LINE__);
+   return ret;
+   }
+
if (xfer->rx_buf) {
ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM, rx_done);
if (ret)
@@ -385,6 +393,13 @@ static int spi_qup_do_pio(struct spi_master *master, 
struct spi_transfer *xfer,
 
spi_qup_fifo_write(qup, xfer);
 
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
+   __func__, __LINE__);
+   return ret;
+   }
+
if (!wait_for_completion_timeout(&qup->done, timeout))
return -ETIMEDOUT;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 06/14] spi: qup: Fix transaction done signaling

2017-07-27 Thread Varadarajan Narayanan
Wait to signal done until we get all of the interrupts we are expecting
to get for a transaction.  If we don't wait for the input done flag, we
can be in between transactions when the done flag comes in, and this can
mess up the next transaction.

While here, clean up the code which sets controller->xfer = NULL and
restores it in the ISR. This looks to be debug code which is not
required.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 27 +--
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index ef95294..a7c630c 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -409,29 +409,16 @@ static int spi_qup_do_pio(struct spi_master *master, 
struct spi_transfer *xfer,
 static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 {
struct spi_qup *controller = dev_id;
-   struct spi_transfer *xfer;
+   struct spi_transfer *xfer = controller->xfer;
u32 opflags, qup_err, spi_err;
-   unsigned long flags;
int error = 0;
 
-   spin_lock_irqsave(&controller->lock, flags);
-   xfer = controller->xfer;
-   controller->xfer = NULL;
-   spin_unlock_irqrestore(&controller->lock, flags);
-
qup_err = readl_relaxed(controller->base + QUP_ERROR_FLAGS);
spi_err = readl_relaxed(controller->base + SPI_ERROR_FLAGS);
opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
 
writel_relaxed(qup_err, controller->base + QUP_ERROR_FLAGS);
writel_relaxed(spi_err, controller->base + SPI_ERROR_FLAGS);
-   writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
-
-   if (!xfer) {
-   dev_err_ratelimited(controller->dev, "unexpected irq %08x %08x 
%08x\n",
-   qup_err, spi_err, opflags);
-   return IRQ_HANDLED;
-   }
 
if (qup_err) {
if (qup_err & QUP_ERROR_OUTPUT_OVER_RUN)
@@ -455,7 +442,9 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
error = -EIO;
}
 
-   if (!spi_qup_is_dma_xfer(controller->mode)) {
+   if (spi_qup_is_dma_xfer(controller->mode)) {
+   writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
+   } else {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
spi_qup_fifo_read(controller, xfer);
 
@@ -463,12 +452,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
spi_qup_fifo_write(controller, xfer);
}
 
-   spin_lock_irqsave(&controller->lock, flags);
-   controller->error = error;
-   controller->xfer = xfer;
-   spin_unlock_irqrestore(&controller->lock, flags);
-
-   if (controller->rx_bytes == xfer->len || error)
+   if ((opflags & QUP_OP_MAX_INPUT_DONE_FLAG) || error)
complete(&controller->done);
 
return IRQ_HANDLED;
@@ -666,7 +650,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
 exit:
spi_qup_set_state(controller, QUP_STATE_RESET);
spin_lock_irqsave(&controller->lock, flags);
-   controller->xfer = NULL;
if (!ret)
ret = controller->error;
spin_unlock_irqrestore(&controller->lock, flags);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 05/14] spi: qup: Fix error handling in spi_qup_prep_sg

2017-07-27 Thread Varadarajan Narayanan
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index f1aa5c1..ef95294 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -311,8 +311,8 @@ static int spi_qup_prep_sg(struct spi_master *master, 
struct spi_transfer *xfer,
}
 
desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags);
-   if (!desc)
-   return -EINVAL;
+   if (IS_ERR_OR_NULL(desc))
+   return desc ? PTR_ERR(desc) : -EINVAL;
 
desc->callback = callback;
desc->callback_param = qup;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread David.Wu

Hi Florian,

On 2017/7/28 0:54, Florian Fainelli wrote:

- if you need knowledge about this PHY connection type prior to binding
the PHY device and its driver (that is, before of_phy_connect()) we
could add a boolean property e.g: "phy-is-internal" that allows us to
know that, or we can have a new phy-mode value, e.g: "internal-rmii"
which describes that, either way would probably be fine, but the former
scales better



Using "phy-is-internal" is very helpful, but it is easy to confuse with 
a real internal PHY. Could we use other wording, like "phy-is-inlined"?🙂



Then again, using phy-mode = "internal" even though this is Reduced MII
is not that big of a deal IMHO, as long as there is no loss of information
and "internal" de facto means internal reduced MII, for instance.
--




[PATCH v6 09/14] spi: qup: call io_config in mode specific function

2017-07-27 Thread Varadarajan Narayanan
DMA transactions should only need to call io_config once, but block
mode might call it several times to set up several transactions, so it
can handle reads/writes larger than the maximum size per transaction.
Hence, move the call into the do_ functions.

This is just refactoring; there should be no functional change.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 26 +-
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index ff5aa08..1aa6078 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -156,6 +156,8 @@ struct spi_qup {
struct dma_slave_config tx_conf;
 };
 
+static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer 
*xfer);
+
 static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag)
 {
u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL);
@@ -417,11 +419,12 @@ static void spi_qup_dma_terminate(struct spi_master 
*master,
dmaengine_terminate_all(master->dma_rx);
 }
 
-static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
-   struct spi_qup *qup = spi_master_get_devdata(master);
dma_async_tx_callback rx_done = NULL, tx_done = NULL;
+   struct spi_master *master = spi->master;
+   struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
 
if (xfer->rx_buf)
@@ -429,6 +432,10 @@ static int spi_qup_do_dma(struct spi_master *master, 
struct spi_transfer *xfer,
else if (xfer->tx_buf)
tx_done = spi_qup_dma_done;
 
+   ret = spi_qup_io_config(spi, xfer);
+   if (ret)
+   return ret;
+
/* before issuing the descriptors, set the QUP to run */
ret = spi_qup_set_state(qup, QUP_STATE_RUN);
if (ret) {
@@ -459,12 +466,17 @@ static int spi_qup_do_dma(struct spi_master *master, 
struct spi_transfer *xfer,
return 0;
 }
 
-static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_pio(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
+   struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
 
+   ret = spi_qup_io_config(spi, xfer);
+   if (ret)
+   return ret;
+
ret = spi_qup_set_state(qup, QUP_STATE_RUN);
if (ret) {
dev_warn(qup->dev, "cannot set RUN state\n");
@@ -742,10 +754,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
if (ret)
return ret;
 
-   ret = spi_qup_io_config(spi, xfer);
-   if (ret)
-   return ret;
-
timeout = DIV_ROUND_UP(xfer->speed_hz, MSEC_PER_SEC);
timeout = DIV_ROUND_UP(xfer->len * 8, timeout);
timeout = 100 * msecs_to_jiffies(timeout);
@@ -760,9 +768,9 @@ static int spi_qup_transfer_one(struct spi_master *master,
spin_unlock_irqrestore(&controller->lock, flags);
 
if (spi_qup_is_dma_xfer(controller->mode))
-   ret = spi_qup_do_dma(master, xfer, timeout);
+   ret = spi_qup_do_dma(spi, xfer, timeout);
else
-   ret = spi_qup_do_pio(master, xfer, timeout);
+   ret = spi_qup_do_pio(spi, xfer, timeout);
 
if (ret)
goto exit;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 12/14] spi: qup: allow multiple DMA transactions per spi xfer

2017-07-27 Thread Varadarajan Narayanan
Much like the block mode changes, we are breaking up DMA transactions
into 64K chunks so we can reset the QUP engine.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 92 ---
 1 file changed, 66 insertions(+), 26 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 1af3b41..3c2c2c0 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -418,12 +418,35 @@ static void spi_qup_dma_terminate(struct spi_master 
*master,
dmaengine_terminate_all(master->dma_rx);
 }
 
+static u32 spi_qup_sgl_get_nents_len(struct scatterlist *sgl, u32 max,
+u32 *nents)
+{
+   struct scatterlist *sg;
+   u32 total = 0;
+
+   *nents = 0;
+
+   for (sg = sgl; sg; sg = sg_next(sg)) {
+   unsigned int len = sg_dma_len(sg);
+
+   /* check for overflow as well as limit */
+   if (((total + len) < total) || ((total + len) > max))
+   break;
+
+   total += len;
+   (*nents)++;
+   }
+
+   return total;
+}
+
 static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
  unsigned long timeout)
 {
dma_async_tx_callback rx_done = NULL, tx_done = NULL;
struct spi_master *master = spi->master;
struct spi_qup *qup = spi_master_get_devdata(master);
+   struct scatterlist *tx_sgl, *rx_sgl;
int ret;
 
if (xfer->rx_buf)
@@ -431,40 +454,57 @@ static int spi_qup_do_dma(struct spi_device *spi, struct 
spi_transfer *xfer,
else if (xfer->tx_buf)
tx_done = spi_qup_dma_done;
 
-   ret = spi_qup_io_config(spi, xfer);
-   if (ret)
-   return ret;
+   rx_sgl = xfer->rx_sg.sgl;
+   tx_sgl = xfer->tx_sg.sgl;
 
-   /* before issuing the descriptors, set the QUP to run */
-   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
-   if (ret) {
-   dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
-   __func__, __LINE__);
-   return ret;
-   }
+   do {
+   u32 rx_nents, tx_nents;
+
+   if (rx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(rx_sgl,
+   SPI_MAX_XFER, &rx_nents) / qup->w_size;
+   if (tx_sgl)
+   qup->n_words = spi_qup_sgl_get_nents_len(tx_sgl,
+   SPI_MAX_XFER, &tx_nents) / qup->w_size;
+   if (!qup->n_words)
+   return -EIO;
 
-   if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl,
- xfer->rx_sg.nents, DMA_DEV_TO_MEM,
- rx_done);
+   ret = spi_qup_io_config(spi, xfer);
if (ret)
return ret;
 
-   dma_async_issue_pending(master->dma_rx);
-   }
-
-   if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl,
- xfer->tx_sg.nents, DMA_MEM_TO_DEV,
- tx_done);
-   if (ret)
+   /* before issuing the descriptors, set the QUP to run */
+   ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+   if (ret) {
+   dev_warn(qup->dev, "cannot set RUN state\n");
return ret;
+   }
+   if (rx_sgl) {
+   ret = spi_qup_prep_sg(master, rx_sgl, rx_nents,
+ DMA_DEV_TO_MEM, rx_done);
+   if (ret)
+   return ret;
+   dma_async_issue_pending(master->dma_rx);
+   }
 
-   dma_async_issue_pending(master->dma_tx);
-   }
+   if (tx_sgl) {
+   ret = spi_qup_prep_sg(master, tx_sgl, tx_nents,
+ DMA_MEM_TO_DEV, tx_done);
+   if (ret)
+   return ret;
+
+   dma_async_issue_pending(master->dma_tx);
+   }
+
+   if (!wait_for_completion_timeout(&qup->done, timeout))
+   return -ETIMEDOUT;
+
+   for (; rx_sgl && rx_nents--; rx_sgl = sg_next(rx_sgl))
+   ;
+   for (; tx_sgl && tx_nents--; tx_sgl = sg_next(tx_sgl))
+   ;
 
-   if (!wait_for_completion_timeout(&qup->done, timeout))
-   return -ETIMEDOUT;
+   } while (rx_sgl || tx_sgl);
 
return 0;
 }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 14/14] spi: qup: Fix QUP version identify method

2017-07-27 Thread Varadarajan Narayanan
Use of_device_get_match_data to identify QUP version instead
of of_device_is_compatible.

Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 4c3c938..1364516 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1058,9 +1059,7 @@ static int spi_qup_probe(struct platform_device *pdev)
else if (!ret)
master->can_dma = spi_qup_can_dma;
 
-   /* set v1 flag if device is version 1 */
-   if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1"))
-   controller->qup_v1 = 1;
+   controller->qup_v1 = (int)of_device_get_match_data(dev);
 
if (!controller->qup_v1)
master->set_cs = spi_qup_set_cs;
@@ -1256,7 +1255,7 @@ static int spi_qup_remove(struct platform_device *pdev)
 }
 
 static const struct of_device_id spi_qup_dt_match[] = {
-   { .compatible = "qcom,spi-qup-v1.1.1", },
+   { .compatible = "qcom,spi-qup-v1.1.1", .data = (void *)1, },
{ .compatible = "qcom,spi-qup-v2.1.1", },
{ .compatible = "qcom,spi-qup-v2.2.1", },
{ }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 10/14] spi: qup: allow block mode to generate multiple transactions

2017-07-27 Thread Varadarajan Narayanan
This lets you write more than 64K-1 bytes to the SPI bus, which is
important if the block size of a SPI device is >= 64K or some other
device wants to do something larger.

This has the benefit of completely removing spi_message from the
spi-qup transactions.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 128 +++---
 1 file changed, 80 insertions(+), 48 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 1aa6078..707b1ec 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -120,7 +120,7 @@
 
 #define SPI_NUM_CHIPSELECTS4
 
-#define SPI_MAX_DMA_XFER   (SZ_64K - 64)
+#define SPI_MAX_XFER   (SZ_64K - 64)
 
 /* high speed mode is when bus rate is greater then 26MHz */
 #define SPI_HS_MIN_RATE2600
@@ -149,6 +149,8 @@ struct spi_qup {
int n_words;
int tx_bytes;
int rx_bytes;
+   const u8*tx_buf;
+   u8  *rx_buf;
int qup_v1;
 
int mode;
@@ -173,6 +175,12 @@ static inline bool spi_qup_is_dma_xfer(int mode)
return false;
 }
 
+/* get's the transaction size length */
+static inline unsigned int spi_qup_len(struct spi_qup *controller)
+{
+   return controller->n_words * controller->w_size;
+}
+
 static inline bool spi_qup_is_valid_state(struct spi_qup *controller)
 {
u32 opstate = readl_relaxed(controller->base + QUP_STATE);
@@ -225,10 +233,9 @@ static int spi_qup_set_state(struct spi_qup *controller, 
u32 state)
return 0;
 }
 
-static void spi_qup_read_from_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words)
 {
-   u8 *rx_buf = xfer->rx_buf;
+   u8 *rx_buf = controller->rx_buf;
int i, shift, num_bytes;
u32 word;
 
@@ -236,8 +243,9 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
 
word = readl_relaxed(controller->base + QUP_INPUT_FIFO);
 
-   num_bytes = min_t(int, xfer->len - controller->rx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->rx_bytes,
+  controller->w_size);
 
if (!rx_buf) {
controller->rx_bytes += num_bytes;
@@ -258,13 +266,12 @@ static void spi_qup_read_from_fifo(struct spi_qup 
*controller,
}
 }
 
-static void spi_qup_read(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_read(struct spi_qup *controller)
 {
u32 remainder, words_per_block, num_words;
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
 
-   remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes,
+   remainder = DIV_ROUND_UP(spi_qup_len(controller) - controller->rx_bytes,
 controller->w_size);
words_per_block = controller->in_blk_sz >> 2;
 
@@ -285,7 +292,7 @@ static void spi_qup_read(struct spi_qup *controller,
}
 
/* read up to the maximum transfer size available */
-   spi_qup_read_from_fifo(controller, xfer, num_words);
+   spi_qup_read_from_fifo(controller, num_words);
 
remainder -= num_words;
 
@@ -307,18 +314,18 @@ static void spi_qup_read(struct spi_qup *controller,
 
 }
 
-static void spi_qup_write_to_fifo(struct spi_qup *controller,
-   struct spi_transfer *xfer, u32 num_words)
+static void spi_qup_write_to_fifo(struct spi_qup *controller, u32 num_words)
 {
-   const u8 *tx_buf = xfer->tx_buf;
+   const u8 *tx_buf = controller->tx_buf;
int i, num_bytes;
u32 word, data;
 
for (; num_words; num_words--) {
word = 0;
 
-   num_bytes = min_t(int, xfer->len - controller->tx_bytes,
-   controller->w_size);
+   num_bytes = min_t(int, spi_qup_len(controller) -
+  controller->tx_bytes,
+  controller->w_size);
if (tx_buf)
for (i = 0; i < num_bytes; i++) {
data = tx_buf[controller->tx_bytes + i];
@@ -338,13 +345,12 @@ static void spi_qup_dma_done(void *data)
complete(&qup->done);
 }
 
-static void spi_qup_write(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_write(struct spi_qup *controller)
 {
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
u32 remainder, words_per_block, num_words;
 
-   remain

[PATCH v6 13/14] spi: qup: Ensure done detection

2017-07-27 Thread Varadarajan Narayanan
This patch fixes an issue where an SPI transaction has completed, but the
done condition is missed.  This occurs because, at the time of interrupt,
the MAX_INPUT_DONE_FLAG is not asserted.  However, in the process of reading
blocks of data from the FIFO, the last portion of data comes in.

The opflags read at the beginning of the irq handler no longer matches the
current opflag state.  To get around this condition, the block read
function should update the opflags so that done detection is correct after
the return.

Signed-off-by: Andy Gross 
Signed-off-by: Abhishek Sahu 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 3c2c2c0..4c3c938 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -266,7 +266,7 @@ static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words)
}
 }
 
-static void spi_qup_read(struct spi_qup *controller)
+static void spi_qup_read(struct spi_qup *controller, u32 *opflags)
 {
u32 remainder, words_per_block, num_words;
bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
@@ -305,10 +305,12 @@ static void spi_qup_read(struct spi_qup *controller)
 
/*
 * Due to extra stickiness of the QUP_OP_IN_SERVICE_FLAG during block
-* mode reads, it has to be cleared again at the very end
+* reads, it has to be cleared again at the very end.  However, be sure
+* to refresh opflags value because MAX_INPUT_DONE_FLAG may now be
+* present and this is used to determine if transaction is complete
 */
-   if (is_block_mode && spi_qup_is_flag_set(controller,
-   QUP_OP_MAX_INPUT_DONE_FLAG))
+   *opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
+   if (is_block_mode && *opflags & QUP_OP_MAX_INPUT_DONE_FLAG)
writel_relaxed(QUP_OP_IN_SERVICE_FLAG,
   controller->base + QUP_OPERATIONAL);
 
@@ -613,7 +615,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
} else {
if (opflags & QUP_OP_IN_SERVICE_FLAG)
-   spi_qup_read(controller);
+   spi_qup_read(controller, &opflags);
 
if (opflags & QUP_OP_OUT_SERVICE_FLAG)
spi_qup_write(controller);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 11/14] spi: qup: refactor spi_qup_prep_sg

2017-07-27 Thread Varadarajan Narayanan
Take the specific sgl and nents to be prepared.  This is in
preparation for splitting DMA into multiple transactions; it
contains no code changes, just refactoring.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 707b1ec..1af3b41 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -382,27 +382,20 @@ static void spi_qup_write(struct spi_qup *controller)
} while (remainder);
 }
 
-static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer 
*xfer,
-  enum dma_transfer_direction dir,
+static int spi_qup_prep_sg(struct spi_master *master, struct scatterlist *sgl,
+  unsigned int nents, enum dma_transfer_direction dir,
   dma_async_tx_callback callback)
 {
struct spi_qup *qup = spi_master_get_devdata(master);
unsigned long flags = DMA_PREP_INTERRUPT | DMA_PREP_FENCE;
struct dma_async_tx_descriptor *desc;
-   struct scatterlist *sgl;
struct dma_chan *chan;
dma_cookie_t cookie;
-   unsigned int nents;
 
-   if (dir == DMA_MEM_TO_DEV) {
+   if (dir == DMA_MEM_TO_DEV)
chan = master->dma_tx;
-   nents = xfer->tx_sg.nents;
-   sgl = xfer->tx_sg.sgl;
-   } else {
+   else
chan = master->dma_rx;
-   nents = xfer->rx_sg.nents;
-   sgl = xfer->rx_sg.sgl;
-   }
 
desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags);
if (IS_ERR_OR_NULL(desc))
@@ -451,7 +444,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
}
 
if (xfer->rx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM, rx_done);
+   ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl,
+ xfer->rx_sg.nents, DMA_DEV_TO_MEM,
+ rx_done);
if (ret)
return ret;
 
@@ -459,7 +454,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
}
 
if (xfer->tx_buf) {
-   ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV, tx_done);
+   ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl,
+ xfer->tx_sg.nents, DMA_MEM_TO_DEV,
+ tx_done);
if (ret)
return ret;
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 07/14] spi: qup: Do block sized read/write in block mode

2017-07-27 Thread Varadarajan Narayanan
This patch corrects the behavior of BLOCK
transactions.  During block transactions, the controller
must be read and written in block-sized chunks.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 151 +++---
 1 file changed, 119 insertions(+), 32 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index a7c630c..8cfa112 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -82,6 +82,8 @@
 #define QUP_IO_M_MODE_BAM  3
 
 /* QUP_OPERATIONAL fields */
+#define QUP_OP_IN_BLOCK_READ_REQ   BIT(13)
+#define QUP_OP_OUT_BLOCK_WRITE_REQ BIT(12)
 #define QUP_OP_MAX_INPUT_DONE_FLAG BIT(11)
 #define QUP_OP_MAX_OUTPUT_DONE_FLAGBIT(10)
 #define QUP_OP_IN_SERVICE_FLAG BIT(9)
@@ -154,6 +156,13 @@ struct spi_qup {
struct dma_slave_config tx_conf;
 };
 
+static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag)
+{
+   u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL);
+
+   return (opflag & flag) != 0;
+}
+
 static inline bool spi_qup_is_dma_xfer(int mode)
 {
if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM)
@@ -214,29 +223,26 @@ static int spi_qup_set_state(struct spi_qup *controller, u32 state)
return 0;
 }
 
-static void spi_qup_fifo_read(struct spi_qup *controller,
-   struct spi_transfer *xfer)
+static void spi_qup_read_from_fifo(struct spi_qup *controller,
+   struct spi_transfer *xfer, u32 num_words)
 {
u8 *rx_buf = xfer->rx_buf;
-   u32 word, state;
-   int idx, shift, w_size;
+   int i, shift, num_bytes;
+   u32 word;
 
-   w_size = controller->w_size;
-
-   while (controller->rx_bytes < xfer->len) {
-
-   state = readl_relaxed(controller->base + QUP_OPERATIONAL);
-   if (0 == (state & QUP_OP_IN_FIFO_NOT_EMPTY))
-   break;
+   for (; num_words; num_words--) {
 
word = readl_relaxed(controller->base + QUP_INPUT_FIFO);
 
+   num_bytes = min_t(int, xfer->len - controller->rx_bytes,
+   controller->w_size);
+
if (!rx_buf) {
-   controller->rx_bytes += w_size;
+   controller->rx_bytes += num_bytes;
continue;
}
 
-   for (idx = 0; idx < w_size; idx++, controller->rx_bytes++) {
+   for (i = 0; i < num_bytes; i++, controller->rx_bytes++) {
/*
 * The data format depends on bytes per SPI word:
 *  4 bytes: 0x12345678
@@ -244,38 +250,80 @@ static void spi_qup_fifo_read(struct spi_qup *controller,
 *  1 byte : 0x0012
 */
shift = BITS_PER_BYTE;
-   shift *= (w_size - idx - 1);
+   shift *= (controller->w_size - i - 1);
rx_buf[controller->rx_bytes] = word >> shift;
}
}
 }
 
-static void spi_qup_fifo_write(struct spi_qup *controller,
+static void spi_qup_read(struct spi_qup *controller,
struct spi_transfer *xfer)
 {
-   const u8 *tx_buf = xfer->tx_buf;
-   u32 word, state, data;
-   int idx, w_size;
+   u32 remainder, words_per_block, num_words;
+   bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
+
+   remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes,
+controller->w_size);
+   words_per_block = controller->in_blk_sz >> 2;
+
+   do {
+   /* ACK by clearing service flag */
+   writel_relaxed(QUP_OP_IN_SERVICE_FLAG,
+  controller->base + QUP_OPERATIONAL);
+
+   if (is_block_mode) {
+   num_words = (remainder > words_per_block) ?
+   words_per_block : remainder;
+   } else {
+   if (!spi_qup_is_flag_set(controller,
+QUP_OP_IN_FIFO_NOT_EMPTY))
+   break;
 
-   w_size = controller->w_size;
+   num_words = 1;
+   }
 
-   while (controller->tx_bytes < xfer->len) {
+   /* read up to the maximum transfer size available */
+   spi_qup_read_from_fifo(controller, xfer, num_words);
 
-   state = readl_relaxed(controller->base + QUP_OPERATIONAL);
-   if (state & QUP_OP_OUT_FIFO_FULL)
+   remainder -= num_words;
+
+   /* if block mode, check to see if next block is available */
+   if (is_block_mode && !spi_qup_is_flag_set(controller,
+   QUP_OP_IN_BLOCK_READ_REQ))
break;
 

[PATCH v6 08/14] spi: qup: refactor spi_qup_io_config into two functions

2017-07-27 Thread Varadarajan Narayanan
This is in preparation for handling transactions larger than
64K-1 bytes in block mode, which is currently unsupported and
quietly fails.

We need to break this into two functions: 1) prep, which is
called once per spi_message, and 2) io_config, which is called
once per spi-qup bus transaction.

This is just refactoring; there should be no functional
change.

Signed-off-by: Matthew McClintock 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 91 +++
 1 file changed, 62 insertions(+), 29 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 8cfa112..ff5aa08 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -545,12 +545,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-/* set clock freq ... bits per word */
-static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
+/* set clock freq ... bits per word, determine mode */
+static int spi_qup_io_prep(struct spi_device *spi, struct spi_transfer *xfer)
 {
struct spi_qup *controller = spi_master_get_devdata(spi->master);
-   u32 config, iomode, control;
-   int ret, n_words;
+   int ret;
 
if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) {
dev_err(controller->dev, "too big size for loopback %d > %d\n",
@@ -565,32 +564,56 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
return -EIO;
}
 
-   if (spi_qup_set_state(controller, QUP_STATE_RESET)) {
-   dev_err(controller->dev, "cannot set RESET state\n");
-   return -EIO;
-   }
-
controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8);
controller->n_words = xfer->len / controller->w_size;
-   n_words = controller->n_words;
-
-   if (n_words <= (controller->in_fifo_sz / sizeof(u32))) {
 
+   if (controller->n_words <= (controller->in_fifo_sz / sizeof(u32)))
controller->mode = QUP_IO_M_MODE_FIFO;
+   else if (spi->master->can_dma &&
+spi->master->can_dma(spi->master, spi, xfer) &&
+spi->master->cur_msg_mapped)
+   controller->mode = QUP_IO_M_MODE_BAM;
+   else
+   controller->mode = QUP_IO_M_MODE_BLOCK;
+
+   return 0;
+}
+
+/* prep qup for another spi transaction of specific type */
+static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
+{
+   struct spi_qup *controller = spi_master_get_devdata(spi->master);
+   u32 config, iomode, control;
+   unsigned long flags;
+
+   spin_lock_irqsave(&controller->lock, flags);
+   controller->xfer = xfer;
+   controller->error= 0;
+   controller->rx_bytes = 0;
+   controller->tx_bytes = 0;
+   spin_unlock_irqrestore(&controller->lock, flags);
+
+
+   if (spi_qup_set_state(controller, QUP_STATE_RESET)) {
+   dev_err(controller->dev, "cannot set RESET state\n");
+   return -EIO;
+   }
 
-   writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT);
-   writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT);
+   switch (controller->mode) {
+   case QUP_IO_M_MODE_FIFO:
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_READ_CNT);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_WRITE_CNT);
/* must be zero for FIFO */
writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT);
writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT);
-   } else if (spi->master->can_dma &&
-  spi->master->can_dma(spi->master, spi, xfer) &&
-  spi->master->cur_msg_mapped) {
-
-   controller->mode = QUP_IO_M_MODE_BAM;
-
-   writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT);
-   writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT);
+   break;
+   case QUP_IO_M_MODE_BAM:
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_INPUT_CNT);
+   writel_relaxed(controller->n_words,
+  controller->base + QUP_MX_OUTPUT_CNT);
/* must be zero for BLOCK and BAM */
writel_relaxed(0, controller->base + QUP_MX_READ_CNT);
writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT);
@@ -608,19 +631,25 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer)
if (xfer->tx_buf)
writel_relaxed(0, input_cnt);
else
-   writel_relaxed(n_words, input_cnt);
+   writel_relaxed(controller->n_words, input_cnt);
 
   

[PATCH v6 03/14] spi: qup: Add completion timeout

2017-07-27 Thread Varadarajan Narayanan
Add i/o completion timeout for DMA and PIO modes.

Signed-off-by: Andy Gross 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index abe799b..fdd34c3 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -331,8 +331,10 @@ static void spi_qup_dma_terminate(struct spi_master *master,
dmaengine_terminate_all(master->dma_rx);
 }
 
-static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
+static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
+ unsigned long timeout)
 {
+   struct spi_qup *qup = spi_master_get_devdata(master);
dma_async_tx_callback rx_done = NULL, tx_done = NULL;
int ret;
 
@@ -357,10 +359,14 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer)
dma_async_issue_pending(master->dma_tx);
}
 
+   if (!wait_for_completion_timeout(&qup->done, timeout))
+   return -ETIMEDOUT;
+
return 0;
 }
 
-static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer)
+static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
+ unsigned long timeout)
 {
struct spi_qup *qup = spi_master_get_devdata(master);
int ret;
@@ -379,6 +385,9 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer)
 
spi_qup_fifo_write(qup, xfer);
 
+   if (!wait_for_completion_timeout(&qup->done, timeout))
+   return -ETIMEDOUT;
+
return 0;
 }
 
@@ -632,9 +641,9 @@ static int spi_qup_transfer_one(struct spi_master *master,
spin_unlock_irqrestore(&controller->lock, flags);
 
if (spi_qup_is_dma_xfer(controller->mode))
-   ret = spi_qup_do_dma(master, xfer);
+   ret = spi_qup_do_dma(master, xfer, timeout);
else
-   ret = spi_qup_do_pio(master, xfer);
+   ret = spi_qup_do_pio(master, xfer, timeout);
 
if (ret)
goto exit;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 01/14] spi: qup: Enable chip select support

2017-07-27 Thread Varadarajan Narayanan
Enable chip select support for QUP versions later than v1.  Chip
select support was broken in QUP version 1 and was therefore
removed in an earlier commit (4a8573abe "spi: qup: Remove chip
select function").  Since chip select is functional in recent
versions of QUP, re-enable it for QUP versions later than v1.

Signed-off-by: Sham Muthayyan 
Signed-off-by: Varadarajan Narayanan 
---
 drivers/spi/spi-qup.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 1bfa889..c0d4def 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -750,6 +750,24 @@ static int spi_qup_init_dma(struct spi_master *master, resource_size_t base)
return ret;
 }
 
+static void spi_qup_set_cs(struct spi_device *spi, bool val)
+{
+   struct spi_qup *controller;
+   u32 spi_ioc;
+   u32 spi_ioc_orig;
+
+   controller = spi_master_get_devdata(spi->master);
+   spi_ioc = readl_relaxed(controller->base + SPI_IO_CONTROL);
+   spi_ioc_orig = spi_ioc;
+   if (!val)
+   spi_ioc |= SPI_IO_C_FORCE_CS;
+   else
+   spi_ioc &= ~SPI_IO_C_FORCE_CS;
+
+   if (spi_ioc != spi_ioc_orig)
+   writel_relaxed(spi_ioc, controller->base + SPI_IO_CONTROL);
+}
+
 static int spi_qup_probe(struct platform_device *pdev)
 {
struct spi_master *master;
@@ -846,6 +864,9 @@ static int spi_qup_probe(struct platform_device *pdev)
if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1"))
controller->qup_v1 = 1;
 
+   if (!controller->qup_v1)
+   master->set_cs = spi_qup_set_cs;
+
spin_lock_init(&controller->lock);
init_completion(&controller->done);
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



[PATCH v6 00/14] spi: qup: Fixes and add support for >64k transfers

2017-07-27 Thread Varadarajan Narayanan
v6:
Fix git bisect-ability issues in
spi: qup: Add completion timeout
spi: qup: call io_config in mode specific function
spi: qup: allow multiple DMA transactions per spi xfer

v5:
Incorporate feedback from Mark Brown and Geert Uytterhoeven

spi: qup: Enable chip select support
Update commit log. Include the commit's description

Removed "spi: qup: Add completion structures for DMA".  This
introduced separate completion structures for DMA-based rx/tx
and FIFO-based I/O.  They were not needed.

Added "spi: qup: Fix QUP version identify method" to identify
qup version using of_device_get_match_data instead of
of_device_is_compatible.

v4:
Discard patch #15, 'spi: qup: support for qup v1 dma'.
This depends on ADM driver, which is not upstreamed yet.

v3:
Fix git bisect-ability issues in
spi: qup: Add completion structures for DMA
spi: qup: Add completion timeout
spi: qup: Place the QUP in run mode before DMA
spi: qup: Fix transaction done signaling

v2:
Incorporate feedback from Andy Gross, Sricharan, Stanimir Varbanov

Modified the QUP-v1 dma completion sequence to QUP-v2 as per feedback.

Removed code that used controller->xfer to identify extra interrupts,
since with the fixes done to handle i/o completion we don't see
extra interrupts.

v1:
This series fixes some existing issues in the code for both
interrupt and dma mode. Patches 1 - 11 are the fixes.
Random failures/timeout are observed without these fixes.
Also, the current driver does not support block transfers > 64K
and the driver quietly fails. Patches 12 - 18 add support for this
in both interrupt and dma mode.

The entire series has been tested on ipq4019 with
SPI-NOR flash for block sizes > 64k.

Varadarajan Narayanan (14):
  spi: qup: Enable chip select support
  spi: qup: Setup DMA mode correctly
  spi: qup: Add completion timeout
  spi: qup: Place the QUP in run mode before DMA
  spi: qup: Fix error handling in spi_qup_prep_sg
  spi: qup: Fix transaction done signaling
  spi: qup: Do block sized read/write in block mode
  spi: qup: refactor spi_qup_io_config into two functions
  spi: qup: call io_config in mode specific function
  spi: qup: allow block mode to generate multiple transactions
  spi: qup: refactor spi_qup_prep_sg
  spi: qup: allow multiple DMA transactions per spi xfer
  spi: qup: Ensure done detection
  spi: qup: Fix QUP version identify method

 drivers/spi/spi-qup.c | 566 ++
 1 file changed, 392 insertions(+), 174 deletions(-)

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of 
Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH 3.10] pstore: Make spinlock per zone instead of global

2017-07-27 Thread Leo Yan
On Fri, Jul 28, 2017 at 06:25:55AM +0200, Willy Tarreau wrote:
> Hi Leo,
> 
> There was no upstream commit ID here but I found it in mainline here :
> 
>   commit 109704492ef637956265ec2eb72ae7b3b39eb6f4
>   Author: Joel Fernandes 
>   Date:   Thu Oct 20 00:34:00 2016 -0700
> 
> pstore: Make spinlock per zone instead of global
> 
> What worries me is that some later fixes were issued, apparently to fix
> an oops and a warning after this patch :

Yes, I also noticed the two patches below.  But at least I have not
reproduced those issues on the Android common kernel 4.4.  I only hit
the hang issue, and the first patch fixes it.

BTW, I tried to port the second and third patches, but the second
patch seems to depend on one extra patch; so to avoid introducing
extra complexity, I ported only the first one to fix the issue.

commit 663deb47880f2283809669563c5a52ac7c6aef1a
Author: Joel Fernandes 
Date:   Thu Oct 20 00:34:01 2016 -0700

pstore: Allow prz to control need for locking

In preparation of not locking at all for certain buffers depending on if
there's contention, make locking optional depending on the initialization
of the prz.

>   commit 76d5692a58031696e282384cbd893832bc92bd76
>   Author: Kees Cook 
>   Date:   Thu Feb 9 15:43:44 2017 -0800
> 
> pstore: Correctly initialize spinlock and flags
> 
> The ram backend wasn't always initializing its spinlock correctly. Since
> it was coming from kzalloc memory, though, it was harmless on
> architectures that initialize unlocked spinlocks to 0 (at least x86 and
> ARM). This also fixes a possibly ignored flag setting too.
> 
> and :
> 
>   commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b
>   Author: Kees Cook 
>   Date:   Sun Mar 5 22:08:58 2017 -0800
> 
> pstore: Use dynamic spinlock initializer
> 
> The per-prz spinlock should be using the dynamic initializer so that
> lockdep can correctly track it. Without this, under lockdep, we get a
> warning at boot that the lock is in non-static memory.
> 
> So I'm fine with merging this patch as long as Kees is OK with this and
> we know what exact patch series needs to be merged.
> 
> Also, the information you added to the commit message references a trace
> on a 4.4 kernel. Do you confirm that you got the same issue on 3.10 ?

No, I can only confirm this on kernel 4.4.  Right now only kernel 4.4 is
available on the board, and I verified that the mainline kernel works
well; this is why I could check the difference between them and find
that the first patch is critical.

> I just prefer to avoid blindly backporting sensitive patches if they're not
> absolutely needed.
> 
> > [   65.103905] hrtimer: interrupt took 2759375 ns
> > [   65.108721] BUG: spinlock recursion on CPU#0, kschedfreq:0/1246
> > [   65.108760]  lock: buffer_lock+0x0/0x38, .magic: dead4ead, .owner: 
> > kschedfreq:0/1246, .owner_cpu: 0
> > [   65.108779] CPU: 0 PID: 1246 Comm: kschedfreq:0 Not tainted 
> > 4.4.74-07294-g5c996a9-dirty #130
> 
> Thanks!
> willy


Re: [PATCH v8 06/13] iommu/amd: copy old trans table from old kernel

2017-07-27 Thread Baoquan He
On 07/27/17 at 05:38pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:04PM +0800, Baoquan He wrote:
> > @@ -2128,9 +2131,43 @@ static void early_enable_iommu(struct amd_iommu *iommu)
> >  static void early_enable_iommus(void)
> >  {
> > struct amd_iommu *iommu;
> > +   bool is_pre_enabled = false;
> >  
> > -   for_each_iommu(iommu)
> > -   early_enable_iommu(iommu);
> > +   for_each_iommu(iommu) {
> > +   if (translation_pre_enabled(iommu)) {
> > +   is_pre_enabled = true;
> > +   break;
> > +   }
> > +   }
> 
> is_pre_enabled should only be true when _all_ iommus are pre-enabled. If
> only one is found disabled just disable the others and continue without
> copying the device table.

OK, will change as you suggested.

> 


Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses

2017-07-27 Thread Borislav Petkov
On Thu, Jul 27, 2017 at 07:04:52PM -0700, Ricardo Neri wrote:
> However using the union could be less readable than having two almost
> identical functions.

So having some small duplication for the sake of clarity and readability
is much better, if you ask me. And it's not like you're duplicating a
lot of code - it is only a handful of functions.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


[PATCH] x86/boot: check overlap between kernel and EFI_BOOT_SERVICES_*

2017-07-27 Thread Naoya Horiguchi
On Wed, Jul 26, 2017 at 09:34:32AM +0800, Baoquan He wrote:
> On 07/26/17 at 09:13am, Baoquan He wrote:
> > On 07/26/17 at 12:12am, Naoya Horiguchi wrote:
> > > On Mon, Jul 24, 2017 at 02:20:44PM +0100, Matt Fleming wrote:
> > > > On Mon, 10 Jul, at 02:51:36PM, Naoya Horiguchi wrote:
> > > > > EFI_BOOT_SERVICES_{CODE|DATA} regions never overlap the kernel now,
> > > > > so we can clean up the check in efi_reserve_boot_services().
> > > > > 
> > > > > Signed-off-by: Naoya Horiguchi 
> > > > > ---
> > > > >  arch/x86/platform/efi/quirks.c | 23 +--
> > > > >  1 file changed, 1 insertion(+), 22 deletions(-)
> > > > 
> > > > Is this true for kernels not using KASLR? 
> > > 
> > > Thank you for pointing this out.  It's not always true; it depends on the
> > > memmap layout.  If the firmware does not define the memory around the kernel
> > > address (0x100 or CONFIG_PHYSICAL_START) as EFI_BOOT_SERVICES_*, no overlap
> > > happens.  That's the case on my test server, but I don't think we can
> > > expect it generally.
> > > 
> > > So I think of adding some assertion in the patch 1/2 to detect this 
> > > overlap
> > > in extract_kernel() even for no KASLR case.
> > 
> > EFI_BOOT_SERVICES_* memory are collected as e820 region of
> > E820_TYPE_RAM, how can we guarantee kernel won't use them after jumping
> > into the running kernel whether KASLR enabled or not? We can only wish

Current kernels have no such check, so it's not guaranteed, I think.
And I guess that most firmwares (luckily?) avoid the overlap, and/or
if the overlap happens, it might not cause any detectable error.

> > that EFI firmware engineer don't put EFI_BOOT_SERVICES_* far from
>   sorry, typo.  I meant EFI boot
> service region needs to be put far from 0x100. Otherwise a normal kernel could
> allocate memory bottom-up and stomp on them. It's an embarrassment caused by
> the hardware flaw of the x86 platform.
> > 0x100 where normal kernel is loaded.


> > > So I think of adding some assertion in the patch 1/2 to detect this 
> > > overlap
> > > in extract_kernel() even for no KASLR case.

Anyway I wrote the following patch adding the assertion as a separate one.
I'm glad if I can get your feedback or advise. 

Thanks,
Naoya Horiguchi
---
From: Naoya Horiguchi 
Date: Thu, 27 Jul 2017 13:19:35 +0900
Subject: [PATCH] x86/boot: check overlap between kernel and
 EFI_BOOT_SERVICES_*

Even when KASLR is turned off, the kernel could still overlap with an
EFI_BOOT_SERVICES_* region, for example because of the firmware memmap
layout and/or CONFIG_PHYSICAL_START.

So this patch adds an assertion, called in the non-KASLR case, that we
are free from such an overlap.

Signed-off-by: Naoya Horiguchi 
---
 arch/x86/boot/compressed/kaslr.c |  3 ---
 arch/x86/boot/compressed/misc.c  | 56 
 2 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index a6604717d4fe..5549c80b45c2 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -721,9 +721,6 @@ void choose_random_location(unsigned long input,
 
boot_params->hdr.loadflags |= KASLR_FLAG;
 
-   /* Prepare to add new identity pagetables on demand. */
-   initialize_identity_maps();
-
/* Record the various known unsafe memory ranges. */
mem_avoid_init(input, input_size, *output);
 
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index a0838ab929f2..b23159fa159c 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -15,6 +15,8 @@
 #include "error.h"
 #include "../string.h"
 #include "../voffset.h"
+#include 
+#include 
 
 /*
  * WARNING!!
@@ -169,6 +171,55 @@ void __puthex(unsigned long value)
}
 }
 
+#ifdef CONFIG_EFI
+bool __init
+efi_kernel_boot_services_overlap(unsigned long start, unsigned long size)
+{
+   int i;
+   u32 nr_desc;
+   struct efi_info *e = &boot_params->efi_info;
+   efi_memory_desc_t *md;
+   char *signature;
+   unsigned long pmap;
+
+   signature = (char *)&boot_params->efi_info.efi_loader_signature;
+   if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
+   strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
+   return false;
+
+#ifdef CONFIG_X86  /* Can't handle data above 4GB at this time */
+   if (e->efi_memmap_hi) {
+   warn("Memory map is above 4GB, EFI should be disabled.\n");
+   return false;
+   }
+   pmap =  e->efi_memmap;
+#else
+   pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
+#endif
+
+   add_identity_map(pmap, e->efi_memmap_size);
+
+   nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
+   for (i = 0; i < nr_desc; i++) {
+   md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size));
+   if (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) <= start)
+

[PATCH V5 2/2] cpufreq: Process remote callbacks from any CPU if the platform permits

2017-07-27 Thread Viresh Kumar
On many platforms, CPUs can do DVFS across cpufreq policies, i.e. a CPU
from policy-A can change the frequency of CPUs belonging to policy-B.

This is quite common on ARM platforms, where we don't
configure any per-CPU register.

Add a flag to identify such platforms and update
cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is
set.

Also enable the flag for cpufreq-dt driver which is used only on ARM
platforms currently.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq-dt.c |  1 +
 include/linux/cpufreq.h  | 18 --
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index fef3c2160691..d83ab94d041a 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -274,6 +274,7 @@ static int cpufreq_init(struct cpufreq_policy *policy)
transition_latency = CPUFREQ_ETERNAL;
 
policy->cpuinfo.transition_latency = transition_latency;
+   policy->dvfs_possible_from_any_cpu = true;
 
return 0;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index b3b6e8203e82..227cd0f13300 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -127,6 +127,15 @@ struct cpufreq_policy {
 */
unsigned inttransition_delay_us;
 
+   /*
+* Remote DVFS flag (Not added to the driver structure as we don't want
+* to access another structure from scheduler hotpath).
+*
+* Should be set if CPUs can do DVFS on behalf of other CPUs from
+* different cpufreq policies.
+*/
+   booldvfs_possible_from_any_cpu;
+
 /* Cached frequency lookup from cpufreq_driver_resolve_freq. */
unsigned int cached_target_freq;
int cached_resolved_idx;
@@ -564,8 +573,13 @@ struct governor_attr {
 
 static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy)
 {
-   /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
-   if (cpumask_test_cpu(smp_processor_id(), policy->cpus))
+   /*
+* Allow remote callbacks if:
+* - dvfs_possible_from_any_cpu flag is set
+* - the local and remote CPUs share cpufreq policy
+*/
+   if (policy->dvfs_possible_from_any_cpu ||
+   cpumask_test_cpu(smp_processor_id(), policy->cpus))
return true;
 
return false;
-- 
2.13.0.71.gd7076ec9c9cb



[PATCH V5 0/2] sched: cpufreq: Allow remote callbacks

2017-07-27 Thread Viresh Kumar
With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.

One testcase [1] to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. Because of the above-mentioned limitation, though, this
does not occur.

This series updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well and updates the registered hooks to handle that.

This was tested with a couple of use cases (Android: hackbench,
recentfling, galleryfling, vellamo; Ubuntu: hackbench) on the ARM
HiKey board (64-bit octa-core, single policy). Only galleryfling
showed minor improvements, while the others didn't show much deviation.

The reason is that this patch only targets a corner case; all of the
following must be true for performance to improve, and that doesn't
happen too often with these tests:

- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
  OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
  next tick.

Rebased over: pm/linux-next

V4->V5:
- Drop cpu field from "struct update_util_data" and add it in "struct
  sugov_cpu" instead.
- Can't have separate patches now because of the above change and so
  merged all the patches from V4 into a single patch.
- Add a comment suggested by PeterZ.
- Commit log of 1/2 is improved to contain more details.
- A new patch (which was posted during V1) is also added to take care of
  platforms where any CPU can do DVFS on behalf of any other CPU, even
  if they are part of different cpufreq policies. This has been
  requested by Saravana several times already, and as the series is
  quite straightforward now, I decided to include it.

V3->V4:
- Respect iowait boost flag and util updates for the all remote
  callbacks.
- Minor updates in commit log of 2/3.

V2->V3:
- Rearranged/merged patches as suggested by Rafael (looks much better
  now)
- Also handle new hook added to intel-pstate driver.
- The final code remains the same as V2, except for the above hook.

V1->V2:
- Don't support remote callbacks for unshared cpufreq policies.
- Don't support remote callbacks where local CPU isn't part of the
  target CPU's cpufreq policy.
- Dropped dvfs_possible_from_any_cpu flag.

--
viresh

[1] http://pastebin.com/7LkMSRxE

Viresh Kumar (2):
  sched: cpufreq: Allow remote cpufreq callbacks
  cpufreq: Process remote callbacks from any CPU if the platform permits

 drivers/cpufreq/cpufreq-dt.c   |  1 +
 drivers/cpufreq/cpufreq_governor.c |  3 +++
 drivers/cpufreq/intel_pstate.c |  8 
 include/linux/cpufreq.h| 23 +++
 kernel/sched/cpufreq_schedutil.c   | 31 ++-
 kernel/sched/deadline.c|  2 +-
 kernel/sched/fair.c|  8 +---
 kernel/sched/rt.c  |  2 +-
 kernel/sched/sched.h   | 10 ++
 9 files changed, 70 insertions(+), 18 deletions(-)

-- 
2.13.0.71.gd7076ec9c9cb



[PATCH V5 1/2] sched: cpufreq: Allow remote cpufreq callbacks

2017-07-27 Thread Viresh Kumar
With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into cpufreq governors are only made from the scheduler if the target
CPU of the event is the same as the current CPU. This means there are
certain situations where a target CPU may not run the cpufreq governor
for some time.

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that the new tasks should receive maximum
demand initially, this should result in CPU0 increasing frequency
immediately. Because of the above-mentioned limitation, though, this
does not occur.

This patch updates the scheduler core to call the cpufreq callbacks for
remote CPUs as well.

The schedutil, ondemand and conservative governors are updated to
process cpufreq utilization update hooks called for remote CPUs where
the remote CPU is managed by the cpufreq policy of the local CPU.

The intel_pstate driver is updated to always reject remote callbacks.

This was tested with a couple of use cases (Android: hackbench,
recentfling, galleryfling, vellamo; Ubuntu: hackbench) on the ARM
HiKey board (64-bit octa-core, single policy). Only galleryfling
showed minor improvements, while the others didn't show much deviation.

The reason is that this patch only targets a corner case; all of the
following must be true for performance to improve, and that doesn't
happen too often with these tests:

- Task is migrated to another CPU.
- The task has high demand, and should take the target CPU to higher
  OPPs.
- And the target CPU doesn't call into the cpufreq governor until the
  next tick.

Based on initial work from Steve Muckle.

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_governor.c |  3 +++
 drivers/cpufreq/intel_pstate.c |  8 
 include/linux/cpufreq.h|  9 +
 kernel/sched/cpufreq_schedutil.c   | 31 ++-
 kernel/sched/deadline.c|  2 +-
 kernel/sched/fair.c|  8 +---
 kernel/sched/rt.c  |  2 +-
 kernel/sched/sched.h   | 10 ++
 8 files changed, 55 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index eed069ecfd5e..58d4f4e1ad6a 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -272,6 +272,9 @@ static void dbs_update_util_handler(struct update_util_data 
*data, u64 time,
struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
u64 delta_ns, lst;
 
+   if (!cpufreq_can_do_remote_dvfs(policy_dbs->policy))
+   return;
+
/*
 * The work may not be allowed to be queued up right now.
 * Possible reasons:
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8bc252512dbe..d9de01399dbb 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1747,6 +1747,10 @@ static void intel_pstate_update_util_pid(struct 
update_util_data *data,
struct cpudata *cpu = container_of(data, struct cpudata, update_util);
u64 delta_ns = time - cpu->sample.time;
 
+   /* Don't allow remote callbacks */
+   if (smp_processor_id() != cpu->cpu)
+   return;
+
if ((s64)delta_ns < pid_params.sample_rate_ns)
return;
 
@@ -1764,6 +1768,10 @@ static void intel_pstate_update_util(struct 
update_util_data *data, u64 time,
struct cpudata *cpu = container_of(data, struct cpudata, update_util);
u64 delta_ns;
 
+   /* Don't allow remote callbacks */
+   if (smp_processor_id() != cpu->cpu)
+   return;
+
if (flags & SCHED_CPUFREQ_IOWAIT) {
cpu->iowait_boost = int_tofp(1);
} else if (cpu->iowait_boost) {
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5f40522ec98c..b3b6e8203e82 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -562,6 +562,15 @@ struct governor_attr {
 size_t count);
 };
 
+static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy)
+{
+   /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
+   if (cpumask_test_cpu(smp_processor_id(), policy->cpus))
+   return true;
+
+   return false;
+}
+
 /*
  * FREQUENCY TABLE HELPERS   *
  */
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 9deedd5f16a5..5465bf221e8f 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -52,6 +52,7 @@ struct sugov_policy {
 struct sugov_cpu {
struct update_util_data update_util;
struct sugov_policy *s

Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support

2017-07-27 Thread David.Wu

Hi Andrew,

在 2017/7/27 21:48, Andrew Lunn 写道:

I think we need to discuss this. This PHY appears to be on an MDIO
bus, it uses a standard PHY driver, and it appears to be using an RMII
interface. So it is just an ordinary PHY.

Internal is supposed to be something which is not ordinary, does not
use one of the standard phy modes, needs something special to make it
work.


Yes, it is an ordinary PHY in fact, using the MDIO bus, but it is an
internal PHY inside the SoC, so "internal" here does not mean internal
in the sense Florian described.




Re: [PATCH v7] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions

2017-07-27 Thread Ingo Molnar

* Baoquan He  wrote:

> Currently KASLR parses all e820 entries of RAM type and adds all
> candidate positions into the slots array. Then one slot is chosen
> randomly as the new position which the kernel will be decompressed
> into and run at.
> 
> On systems with EFI enabled, e820 memory regions come from EFI
> memory regions by combining adjacent regions. These EFI memory
> regions carry more attributes to mark their different uses; the
> mirror attribute is one such kind. Physical memory regions whose
> descriptors in the EFI memory map have the EFI_MEMORY_MORE_RELIABLE
> attribute (bit 16) are mirrored. The address range mirroring feature
> of the kernel arranges such mirror regions into the normal zone and
> other regions into the movable zone. And with the mirroring feature
> enabled, the code and data of the kernel can only be located in the
> more reliable mirror regions. However, the current KASLR code
> doesn't check EFI memory entries, and could choose a new position in
> a non-mirrored region. This would break the functionality of the
> address range mirroring feature.
> 
> So if EFI is detected, iterate over the EFI memory map and pick the
> mirror regions as candidates for randomization slots. If EFI is
> disabled or no mirror region is found, fall back to the e820 memory map.
> 
> Signed-off-by: Baoquan He 
> ---
> v6->v7:
>   Ingo pointed out several incorrect line break issues and unclear 
>   description of patch log. Correct them and rewrite patch log.
> 
>   And also rewrite the EFI warning message that if EFI memmap is above
>   4G in 32bit system since 32bit system can not handle data above 4G at
>   kernel decompression stage. This is suggested by Ingo too.
> 
> v5->v6:
>   Code style issue fix according to Kees's comment.
> 
>   This is based on tip/x86/boot, patch 1,2,3/4 in v5 post has
>   been put into tip/x86/boot now.
> 
>  arch/x86/boot/compressed/kaslr.c | 68 ++--
>  1 file changed, 66 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index 99c7194f7ea6..e1dd59ea8a8f 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -37,7 +37,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> +#include 
>  
>  /* Macros used by the included decompressor code below. */
>  #define STATIC
> @@ -558,6 +560,65 @@ static void process_mem_region(struct mem_vector *entry,
>   }
>  }
>  
> +#ifdef CONFIG_EFI
> +/*
> + * Returns true if mirror region found (and must have been processed
> + * for slots adding)
> + */
> +static bool
> +process_efi_entries(unsigned long minimum, unsigned long image_size)
> +{
> + struct efi_info *e = &boot_params->efi_info;
> + bool efi_mirror_found = false;
> + struct mem_vector region;
> + efi_memory_desc_t *md;
> + unsigned long pmap;
> + char *signature;
> + u32 nr_desc;
> + int i;
> +
> + signature = (char *)&e->efi_loader_signature;
> + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
> + strncmp(signature, EFI64_LOADER_SIGNATURE, 4))
> + return false;
> +
> +#ifdef CONFIG_X86_32
> + /* Can't handle data above 4GB at this time */
> + if (e->efi_memmap_hi) {
> + warn("EFI memmap is above 4GB, can't be handled now on x86_32. 
> EFI should be disabled.\n");
> + return false;
> + }
> + pmap =  e->efi_memmap;
> +#else
> + pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32));
> +#endif
> +
> + nr_desc = e->efi_memmap_size / e->efi_memdesc_size;
> + for (i = 0; i < nr_desc; i++) {
> + md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size));
> + if (md->attribute & EFI_MEMORY_MORE_RELIABLE) {
> + region.start = md->phys_addr;
> + region.size = md->num_pages << EFI_PAGE_SHIFT;
> + process_mem_region(®ion, minimum, image_size);
> + efi_mirror_found = true;
> +
> + if (slot_area_index == MAX_SLOT_AREA) {
> + debug_putstr("Aborted EFI scan (slot_areas 
> full)!\n");
> + break;
> + }
> + }
> + }

So I suggested this before: if you treat 'md' as an array of elements (which it 
is), then the type cast can be moved to a more natural point, where we do 
address 
calculations anyway:

md = (efi_memory_desc_t *)(e->efi_memmap | ((__u64)e->efi_memmap_hi << 
32)));

The 'pmap' variable can be eliminated and all places in the loop that use 
'md->' 
can use 'md[i].'.

Thanks,

Ingo


Re: [PATCH v3] cpu_pm: replace raw_notifier to atomic_notifier

2017-07-27 Thread Tony Lindgren
* Alex Shi  [170727 18:42]:
> Tony Lezcano found a miss use that rcu_read_lock used after rcu_idle_enter
> Paul E. McKenney suggested trying RCU_NONIDLE.

Hmm I think you have a hybrid name typo above in case you mean me :)

Tony


[PATCH 0/3] fix several TLB batch races

2017-07-27 Thread Minchan Kim
Nadav and Mel found several subtle races caused by TLB batching.
This patchset aims to solve those problems by embedding
[set|clear]_tlb_flush_pending into the TLB batching API.
With that, places that need to know about pending TLB flushes can
catch them by using mm_tlb_flush_pending.

Each patch includes detailed description.

This patchset is based on v4.13-rc2-mmots-2017-07-26-16-16 +
revert: "mm: prevent racy access to tlb_flush_pending" +
adding: "[PATCH v3 0/2] mm: fixes of tlb_flush_pending races".

Minchan Kim (3):
  mm: make tlb_flush_pending global
  mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
  mm: fix KSM data corruption

 arch/arm/include/asm/tlb.h  | 15 ++-
 arch/ia64/include/asm/tlb.h | 12 
 arch/s390/include/asm/tlb.h | 15 +++
 arch/sh/include/asm/tlb.h   |  4 +++-
 arch/um/include/asm/tlb.h   |  8 
 fs/proc/task_mmu.c  |  4 +++-
 include/linux/mm_types.h| 22 +-
 kernel/fork.c   |  2 --
 mm/debug.c  |  2 --
 mm/ksm.c|  3 ++-
 mm/memory.c | 24 
 11 files changed, 74 insertions(+), 37 deletions(-)

-- 
2.7.4



[PATCH 3/3] mm: fix KSM data corruption

2017-07-27 Thread Minchan Kim
Nadav reported that KSM can corrupt user data via the TLB batching
race [1]. That means data the user has written can be lost.

Quote from Nadav Amit
"
For this race we need 4 CPUs:

CPU0: Caches a writable and dirty PTE entry, and uses the stale value for
write later.

CPU1: Runs madvise_free on the range that includes the PTE. It would clear
the dirty-bit. It batches TLB flushes.

CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty. We
care about the fact that it clears the PTE write-bit, and of course, batches
TLB flushes.

CPU3: Runs KSM. Our purpose is to pass the following test in
write_protect_page():

if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
(pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

Since it will avoid TLB flush. And we want to do it while the PTE is stale.
Later, and before replacing the page, we would be able to change the page.

Note that all the operations CPU1-3 perform can happen in parallel since
they only acquire mmap_sem for read.

We start with two identical pages. Everything below regards the same
page/PTE.

CPU0CPU1CPU2CPU3

Write the same
value on page

[cache PTE as
 dirty in TLB]

MADV_FREE
pte_mkclean()

4 > clear_refs
pte_wrprotect()

write_protect_page()
[ success, no flush ]

pages_identical()
[ ok ]

Write to page
different value

[Ok, using stale
 PTE]

replace_page()

Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0
already wrote on the page, but KSM ignored this write, and it got lost.
"

In the above scenario, the MADV_FREE side is fixed by changing the TLB
batching API to include [set|clear]_tlb_flush_pending. What remains is
the soft-dirty part.

This patch changes soft-dirty to use the TLB batching API instead of
flush_tlb_mm, and makes KSM check for a pending TLB flush via
mm_tlb_flush_pending so that it flushes the TLB to avoid data loss when
other parallel threads have TLB flushes pending.

[1] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com

Note:
I failed to reproduce this problem through Nadav's test program which
need to tune timing in my system speed so didn't confirm it work.
Nadav, Could you test this patch on your test machine?

Thanks!

Cc: Nadav Amit 
Cc: Mel Gorman 
Cc: Hugh Dickins 
Cc: Andrea Arcangeli 
Signed-off-by: Minchan Kim 
---
 fs/proc/task_mmu.c | 4 +++-
 mm/ksm.c   | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 35be35e05153..583fc50eb36d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1019,6 +1019,7 @@ static ssize_t clear_refs_write(struct file *file, const 
char __user *buf,
enum clear_refs_types type;
int itype;
int rv;
+   struct mmu_gather tlb;
 
memset(buffer, 0, sizeof(buffer));
if (count > sizeof(buffer) - 1)
@@ -1063,6 +1064,7 @@ static ssize_t clear_refs_write(struct file *file, const 
char __user *buf,
}
 
down_read(&mm->mmap_sem);
+   tlb_gather_mmu(&tlb, mm, 0, -1);
if (type == CLEAR_REFS_SOFT_DIRTY) {
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
@@ -1084,7 +1086,7 @@ static ssize_t clear_refs_write(struct file *file, const 
char __user *buf,
walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
if (type == CLEAR_REFS_SOFT_DIRTY)
mmu_notifier_invalidate_range_end(mm, 0, -1);
-   flush_tlb_mm(mm);
+   tlb_finish_mmu(&tlb, 0, -1);
up_read(&mm->mmap_sem);
 out_mm:
mmput(mm);
diff --git a/mm/ksm.c b/mm/ksm.c
index 4dc92f138786..d3b1c70aac18 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1038,7 +1038,8 @@ static int write_protect_page(struct vm_area_struct *vma, 
struct page *page,
goto out_unlock;
 
if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
-   (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) {
+   (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)) ||
+   mm_tlb_flush_pending(mm, true)) {
pte_t entry;
 
swapped = PageSwapCache(page);
-- 
2.7.4



[PATCH 2/3] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem

2017-07-27 Thread Minchan Kim
Nadav reported that parallel MADV_DONTNEED on the same range has a stale
TLB problem; Mel fixed it [1] and found the same problem in MADV_FREE [2].

Quote from Mel Gorman

"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to flush.
Hence, when madv_free on CPU 1 returns, there are still potentially writable
TLB entries and the underlying PTE is still present so that a subsequent write
does not necessarily propagate the dirty bit to the underlying PTE any more.
Reclaim at some unknown time at the future may then see that the PTE is still
clean and discard the page even though a write has happened in the meantime.
I think this is possible but I could have missed some protection in madv_free
that prevents it happening."

This patch aims to solve both problems at once and also prepares for
the other problem in the KSM, MADV_FREE and soft-dirty story [3].

The TLB batch API (tlb_[gather|finish]_mmu) uses
[set|clear]_tlb_flush_pending and mm_tlb_flush_pending so that when
tlb_finish_mmu is called, we can detect that parallel threads are in
flight. In that case, flush the TLB to prevent the user from accessing
memory via a stale TLB entry even though we failed to gather the pte
entry.

I confirmed this patch works with the test program Nadav gave [4], so
this patch supersedes "mm: Always flush VMA ranges affected by
zap_page_range v2" in the current mmotm.

NOTE:
This patch modifies the arch-specific TLB gathering interface (arm,
ia64, s390, sh, um). Most architectures seem straightforward, but s390
needs care because tlb_flush_mmu works only if mm->context.flush_mm is
set to non-zero, which happens only when a pte entry is really cleared
by ptep_get_and_clear and friends. This problem, however, never changes
the pte entries but still needs a flush to prevent memory accesses
through stale TLB entries.

Any thoughts?

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzz...@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrf...@suse.de
[3] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

Cc: Ingo Molnar 
Cc: x...@kernel.org
Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org
Cc: Tony Luck 
Cc: linux-i...@vger.kernel.org
Cc: Martin Schwidefsky 
Cc: "David S. Miller" 
Cc: Heiko Carstens 
Cc: linux-s...@vger.kernel.org
Cc: Yoshinori Sato 
Cc: linux...@vger.kernel.org
Cc: Jeff Dike 
Cc: user-mode-linux-de...@lists.sourceforge.net
Cc: linux-a...@vger.kernel.org
Cc: Nadav Amit 
Reported-by: Mel Gorman 
Signed-off-by: Minchan Kim 
---
 arch/arm/include/asm/tlb.h  | 15 ++-
 arch/ia64/include/asm/tlb.h | 12 
 arch/s390/include/asm/tlb.h | 15 +++
 arch/sh/include/asm/tlb.h   |  4 +++-
 arch/um/include/asm/tlb.h   |  8 
 include/linux/mm_types.h|  7 +--
 mm/memory.c | 24 
 7 files changed, 69 insertions(+), 16 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 3f2eb76243e3..8c26961f0503 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -163,13 +163,26 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct 
*mm, unsigned long start
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
tlb->batch = NULL;
 #endif
+   set_tlb_flush_pending(tlb->mm);
 }
 
 static inline void
 tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
-   tlb_flush_mmu(tlb);
+   /*
+* If parallel threads are doing PTE changes on the same range under a
+* non-exclusive lock (e.g., mmap_sem read-side) but defer the TLB
+* flush by batching, a thread with a stale TLB entry can fail to
+* flush it by observing pte_none|!pte_dirty, for example. So flush
+* the TLB if we detect parallel PTE batching threads.
+*/
+   if (mm_tlb_flush_pending(tlb->mm, false) > 1) {
+   tlb->range_start = start;
+   tlb->range_end = end;
+   }
 
+   tlb_flush_mmu(tlb);
+   clear_tlb_flush_pending(tlb->mm);
/* keep the page table cache within bounds */
check_pgt_cache();
 
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index fced197b9626..22fe976a4693 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -178,6 +178,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct 
*mm, unsigned long start
tlb->start = start;
tlb->end = end;
tlb->start_addr = ~0UL;
+   set_tlb_flush_pending(tlb->mm);
 }
 
 /*
@@ -188,10 +189,21 @@ static inline void
 tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
/*
+* If there are parallel threads are doing PTE changes on same range
+* under non-exclusive lock(e.g., m

[PATCH 1/3] mm: make tlb_flush_pending global

2017-07-27 Thread Minchan Kim
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|
COMPACTION], but upcoming patches to solve a subtle TLB flush batching
problem will use it regardless of compaction/NUMA, so this patch
removes the dependency.

Cc: Nadav Amit 
Cc: Mel Gorman 
Signed-off-by: Minchan Kim 
---
 include/linux/mm_types.h | 15 ---
 kernel/fork.c|  2 --
 mm/debug.c   |  2 --
 3 files changed, 19 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4b9a625c370c..6953d2c706fe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -487,14 +487,12 @@ struct mm_struct {
/* numa_scan_seq prevents two threads setting pte_numa */
int numa_scan_seq;
 #endif
-#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
/*
 * An operation with batched TLB flushing is going on. Anything that
 * can move process memory needs to flush the TLB when moving a
 * PROT_NONE or PROT_NUMA mapped page.
 */
atomic_t tlb_flush_pending;
-#endif
 #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
/* See flush_tlb_batched_pending() */
bool tlb_flush_batched;
@@ -522,7 +520,6 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
return mm->cpu_vm_mask_var;
 }
 
-#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
 /*
  * Memory barriers to keep this state in sync are graciously provided by
  * the page table locks, outside of which no page table modifications happen.
@@ -565,18 +562,6 @@ static inline void clear_tlb_flush_pending(struct 
mm_struct *mm)
smp_mb__before_atomic();
atomic_dec(&mm->tlb_flush_pending);
 }
-#else
-static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked)
-{
-   return false;
-}
-static inline void set_tlb_flush_pending(struct mm_struct *mm)
-{
-}
-static inline void clear_tlb_flush_pending(struct mm_struct *mm)
-{
-}
-#endif
 
 struct vm_fault;
 
diff --git a/kernel/fork.c b/kernel/fork.c
index aaf4d70afd8b..7e9f42060976 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -807,9 +807,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, 
struct task_struct *p,
mm_init_aio(mm);
mm_init_owner(mm, p);
mmu_notifier_mm_init(mm);
-#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
atomic_set(&mm->tlb_flush_pending, 0);
-#endif
 #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
 #endif
diff --git a/mm/debug.c b/mm/debug.c
index d70103bb4731..18a9b15b1e37 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -158,9 +158,7 @@ void dump_mm(const struct mm_struct *mm)
 #ifdef CONFIG_NUMA_BALANCING
mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq,
 #endif
-#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION)
atomic_read(&mm->tlb_flush_pending),
-#endif
mm->def_flags, &mm->def_flags
);
 }
-- 
2.7.4



[PATCH] ARM: rockchip: enable ZONE_DMA for non 64-bit capable peripherals

2017-07-27 Thread Tao Huang
Most IP cores on ARM Rockchip platforms can only address 32 bits of
physical memory for DMA. Thus ZONE_DMA should be enabled when LPAE
is activated.

Signed-off-by: Tao Huang 
---
 arch/arm/mach-rockchip/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/mach-rockchip/Kconfig b/arch/arm/mach-rockchip/Kconfig
index 9ad84cd01ba0..fafd3d7f9f8c 100644
--- a/arch/arm/mach-rockchip/Kconfig
+++ b/arch/arm/mach-rockchip/Kconfig
@@ -16,6 +16,7 @@ config ARCH_ROCKCHIP
select ROCKCHIP_TIMER
select ARM_GLOBAL_TIMER
select CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
+   select ZONE_DMA if ARM_LPAE
help
  Support for Rockchip's Cortex-A9 Single-to-Quad-Core-SoCs
  containing the RK2928, RK30xx and RK31xx series.
-- 
2.7.4




Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error

2017-07-27 Thread Ingo Molnar

* Borislav Petkov  wrote:

> BUT(!), I just realized, I *think* I can address this much more elegantly: 
> extend trace_mce_record() by adding the decoded string as its last argument. 
> And 
> that's fine, I'm being told, because adding arguments to the tracepoints is 
> not 
> a big deal, removing them is hard. And actually, we have added args before, 
> come 
> to think of it:

Yeah, structured, append-only ABIs are elegant - that's what perf uses too.

> I agree with the rest but you're obviously preaching to the choir.
> 
> :-)

Had to vent my (non kernel tree integrated) Linux tooling frustration!! ;-)

Thanks,

Ingo


Re: [PATCH -tip v5 1/2] irq: Introduce CONFIG_IRQENTRY kconfig

2017-07-27 Thread Ingo Molnar

* Masami Hiramatsu  wrote:

> Introduce CONFIG_IRQENTRY to simplify generating the
> irqentry and softirqentry text sections.
> Currently, generating those sections depends on
> CONFIG_FUNCTION_GRAPH_TRACER and/or CONFIG_KASAN in
> each source file. This moves those #ifdef dependencies
> into Kconfig, instead of the actual code. This makes
> it scalable for other users of the irqentry section.

Please just make it unconditional. That would remove a number of messy #ifdefs, 
and extra sections/symbols don't have much if any cost.

Thanks,

Ingo


[PATCH 2/2] init/main.c: Fixes quoted string split across lines & missing blank line after declaration

2017-07-27 Thread janani-sankarababu
Signed-off-by: Janani S 
---
 init/main.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/init/main.c b/init/main.c
index f8eb4966..920b829 100644
--- a/init/main.c
+++ b/init/main.c
@@ -176,6 +176,7 @@ static bool __init obsolete_checksetup(char *line)
p = __setup_start;
do {
int n = strlen(p->str);
+
if (parameqn(line, p->str, n)) {
if (p->early) {
/* Already done in parse_early_param?
@@ -302,6 +303,7 @@ static int __init unknown_bootoption(char *param, char *val,
if (val) {
/* Environment option */
unsigned int i;
+
for (i = 0; envp_init[i]; i++) {
if (i == MAX_INIT_ENVS) {
panic_later = "env";
@@ -314,6 +316,7 @@ static int __init unknown_bootoption(char *param, char *val,
} else {
/* Command line option */
unsigned int i;
+
for (i = 0; argv_init[i]; i++) {
if (i == MAX_INIT_ARGS) {
panic_later = "init";
@@ -1020,7 +1023,7 @@ static int __ref kernel_init(void *unused)
!try_to_run_init_process("/bin/sh"))
return 0;
 
-   panic("No working init found.  Try passing init= option to kernel. "
+   panic("No working init found.  Try passing init= option to kernel.\n"
  "See Linux Documentation/admin-guide/init.rst for guidance.");
 }
 
-- 
1.9.1



Re: [RFC PATCH 3/3] mm: shm: Use new hugetlb size encoding definitions

2017-07-27 Thread Michal Hocko
On Thu 27-07-17 14:18:11, Mike Kravetz wrote:
> On 07/27/2017 12:50 AM, Michal Hocko wrote:
> > On Wed 26-07-17 10:39:30, Mike Kravetz wrote:
> >> On 07/26/2017 03:07 AM, Michal Hocko wrote:
> >>> On Wed 26-07-17 11:53:38, Michal Hocko wrote:
>  On Mon 17-07-17 15:28:01, Mike Kravetz wrote:
> > Use the common definitions from hugetlb_encode.h header file for
> > encoding hugetlb size definitions in shmget system call flags.  In
> > addition, move these definitions to the from the internal to user
> > (uapi) header file.
> 
>  s@to the from@from@
> 
> >
> > Suggested-by: Matthew Wilcox 
> > Signed-off-by: Mike Kravetz 
> 
>  with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@
> 
>  Acked-by: Michal Hocko 
> >>>
> >>> Btw. man page mentions only 2MB and 1GB, we should document others and
> >>> note that each arch might support only subset of them
> >>
> >> Thanks for looking at these Michal.
> >> BTW, those definitions below are wrong.  They should be SHM_HUGE_*. :(
> > 
> > Ups, and I completely missed that.
> > 
> >> In the overview of this RFC, I mentioned still needing to address the
> >> comment from Aneesh about splitting SHM_HUGE_* definitions into arch
> >> specific header files.  This is how it is done for mmap.  If an arch
> >> supports multiple huge page sizes, the 'asm/mman.h' contains definitions
> >> for those sizes.  There will be a bit of churn (such as header file
> >> renaming) to do this for shm as well.  So, I keep going back and forth
> >> asking myself 'is it worth it'?
> > 
> > Why cannot we use a generic header? Btw. I think it would be better for
> > MMAP definitions as well.
> 
> I assume you are asking about a uapi asm-generic header file?  Currently
> mmap has two such files:  mman.h and mman-common.h.  In order to get the
> definitions in such files, arch specific header files must #include the
> asm-generic headers.  There are arch specific mmap headers today that do
> not include either of the asm-generic headers.  And, they have their own
> definitions for MAP_HUGE_SHIFT.  So, it seems we can not use one of the
> existing mmap asm-generic header files.  Rather, we would need to create
> a new one and have that included by all arch specific files.

yes, add a new one like you did in your first patch

> However, ALL the MAP_HUGE_* definitions in all the arch specific and
> asm-generic header files are the same.  It would be possible to just put
> all those MAP_HUGE_* definitions in the primary uapi header file
> (include/uapi/linux/mman.h).  If there was ever a need for arch specific
> values in the future, we could split them out at that time.

agreed

[...]

> >> - Another alternative is to make all known huge page sizes available
> >>   to all users.  This is 'easier' as the definitions can likely reside
> >>   in a common header file.  The user will need to determine what
> >>   huge page sizes are supported by the running kernel as mentioned in
> >>   the man page.
> > 
> > yes I think this makes more sense.
> 
> Ok, thanks.
> 
> The only remaining question is what kind of common header to use:
> 1) An asm-generic header file in case there may be arch specific differences
>in the future.
> 2) Use the primary uapi header file in include/uapi/linux/mman|shm.h.

I would use the primary one and only go the arch-specific route if we
ever need to do an arch-specific thing.
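For reference, a minimal sketch of what such common definitions boil down to: each huge-page flag just encodes log2(page size) in the upper bits of the flags word. The shift and mask below follow the existing MAP_HUGE_* convention; the macro and function names are illustrative, not the actual uapi names.

```c
#include <assert.h>

/* log2(page size) lives in the upper flag bits; the shift/mask values
 * follow the existing MAP_HUGE_* convention.  Names are illustrative. */
#define HUGE_FLAG_SHIFT 26
#define HUGE_FLAG_MASK  0x3fUL

#define HUGE_FLAG_ENCODE(log2size) \
	((unsigned long)(log2size) << HUGE_FLAG_SHIFT)

#define HUGE_2MB  HUGE_FLAG_ENCODE(21)  /* 2^21 bytes */
#define HUGE_1GB  HUGE_FLAG_ENCODE(30)  /* 2^30 bytes */
#define HUGE_16GB HUGE_FLAG_ENCODE(34)  /* 2^34 bytes, POWER gigantic pages */

/* Decode the page-size exponent back out of a flags word. */
static inline int huge_flag_log2(unsigned long flags)
{
	return (int)((flags >> HUGE_FLAG_SHIFT) & HUGE_FLAG_MASK);
}
```

Since every architecture uses the same encoding, these definitions can live once in the primary uapi header, which is the conclusion reached above.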
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts

2017-07-27 Thread Paolo Bonzini
On 28/07/2017 04:31, Longpeng (Mike) wrote:
> Hi Paolo,
> 
> On 2017/6/6 18:57, Paolo Bonzini wrote:
> 
>> In some cases, for example involving hot-unplug of assigned
>> devices, pi_post_block can forget to remove the vCPU from the
>> blocked_vcpu_list.  When this happens, the next call to
>> pi_pre_block corrupts the list.
>>
>> Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
>> and WARN instead of adding the element twice in the list.  Second,
>> always do the list removal in pi_post_block if vcpu->pre_pcpu is
>> set (not -1).
>>
>> The new code keeps interrupts disabled for the whole duration of
>> pi_pre_block/pi_post_block.  This is not strictly necessary, but
>> easier to follow.  For the same reason, PI.ON is checked only
>> after the cmpxchg, and to handle it we just call the post-block
>> code.  This removes duplication of the list removal code.
>>
>> Cc: Longpeng (Mike) 
>> Cc: Huangweidong 
>> Cc: Gonglei 
>> Cc: wangxin 
>> Cc: Radim Krčmář 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  arch/x86/kvm/vmx.c | 62 
>> ++
>>  1 file changed, 25 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 747d16525b45..0f4714fe4908 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>>  struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>>  struct pi_desc old, new;
>>  unsigned int dest;
>> -unsigned long flags;
>>  
>>  do {
>>  old.control = new.control = pi_desc->control;
>> +WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
>> + "Wakeup handler not enabled while the VCPU is blocked\n");
>>  
>>  dest = cpu_physical_id(vcpu->cpu);
>>  
>> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>>  } while (cmpxchg(&pi_desc->control, old.control,
>>  new.control) != old.control);
>>  
>> -if(vcpu->pre_pcpu != -1) {
>> -spin_lock_irqsave(
>> -&per_cpu(blocked_vcpu_on_cpu_lock,
>> -vcpu->pre_pcpu), flags);
>> +if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> 
> 
> __pi_post_block is only called by pi_post_block/pi_pre_block now, it seems
> that both of them would make sure "vcpu->pre_pcpu != -1" before
> __pi_post_block is
> called, so maybe the above check is useless, right?

It's because a WARN is better than a double-add.  And even if the caller
broke the invariant you'd have to do the cmpxchg loop above to make
things not break too much.
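The pattern being defended here — tolerate a broken invariant with a WARN rather than corrupt the list — can be sketched in a self-contained way. The mock below only mirrors the shape of the vmx.c code; the types and the `warns` counter are simplified stand-ins, not the real KVM structures:

```c
#include <assert.h>

/* Simplified stand-in for the vCPU blocked-list handshake:
 * pre_pcpu == -1 means "not on any blocked list". */
struct vcpu {
	int pre_pcpu;
	int on_list;		/* stands in for the list_head linkage */
};

static int warns;		/* counts would-be WARN hits */

static void mock_pi_pre_block(struct vcpu *v, int cpu)
{
	if (v->pre_pcpu != -1) {	/* guard: WARN instead of double-add */
		warns++;
		return;
	}
	v->pre_pcpu = cpu;
	v->on_list = 1;			/* list_add(&v->blocked_vcpu_list, ...) */
}

static void mock_pi_post_block(struct vcpu *v)
{
	if (v->pre_pcpu == -1) {	/* WARN_ON_ONCE(v->pre_pcpu == -1) */
		warns++;
		return;
	}
	v->on_list = 0;			/* list_del(...) */
	v->pre_pcpu = -1;
}
```

Calling the pre-block path twice only trips the warning counter; the "list" state stays consistent, which is the point of warning instead of double-adding.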

Paolo


Re: [PATCH v1 2/2] acpi, x86: Remove encryption mask from ACPI page protection type

2017-07-27 Thread Ingo Molnar

* Tom Lendacky  wrote:

> > > +  * in memory in an encrypted state so return a protection attribute
> > > +  * that does not have the encryption bit set.
> > >*/
> > > -  return PAGE_KERNEL;
> > > +  return sme_active() ? PAGE_KERNEL_IO : PAGE_KERNEL;
> > 
> > Why isn't there a PAGE_KERNEL_NOENC define which you can simply return
> > instead of testing?
> 
> Sounds like something I should add to pgtable_types.h (which has a
> #define for PAGE_KERNEL_EXEC_NOENC, but not PAGE_KERNEL_NOENC). I'll
> create that #define.
> 
> As for the sme_active() check I was getting ahead of myself since
> under SEV the encryption mask is needed.  I'll change it to just
> return PAGE_KERNEL_NOENC and then worry about the SEV change in
> the SEV patches.

Ok, that works for me too, as we at least don't sprinkle the code with repeated 
sme_active() toggles.

Thanks,

Ingo


Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-27 Thread Michal Hocko
On Thu 27-07-17 16:55:59, Andrea Arcangeli wrote:
> On Thu, Jul 27, 2017 at 08:50:24AM +0200, Michal Hocko wrote:
> > Yes this will work and it won't depend on the oom_lock. But isn't it
> > just more ugly than simply doing
> > 
> > if (tsk_is_oom_victim) {
> > down_write(&mm->mmap_sem);
> > locked = true;
> > }
> > free_pgtables(...)
> > [...]
> > if (locked)
> > up_write(&mm->mmap_sem);
> 
> To me not doing if (tsk_is_oom...) { down_write; up_write } is by
> default a confusing implementation, because it's not strict and not
> strict code is not self documenting and you've to think twice of why
> you're doing something the way you're doing it.

I disagree. But this is a matter of taste, I guess. I prefer when
locking is explicit and clear about the scope. An empty locked region
used for synchronization is just obscure and you have to think much
harder about what it means than when you can see what the lock actually
protects. It is also less error prone, I believe.

> The doubt about the point of holding the mmap_sem during
> free_pgtables is precisely why I started digging into this issue,
> because it didn't look possible that you could truly benefit from
> holding the mmap_sem during free_pgtables.

The point is that you cannot remove page tables while somebody is
walking them. This is enforced in all other paths except for exit which
was kind of special because nobody could race there.

> I also don't like having a new invariant that your solution relies on,
> that is mm->mmap = NULL, when we can just set MMF_OOM_SKIP a
> bit earlier than it gets set anyway and use that to control the other
> side of the race.

Well, again a matter of taste. I prefer making it clear that mm->mmap is
no longer valid because it points to a freed pointer. Relying on
MMF_OOM_SKIP for the handshake is just abusing the flag for a different
purpose than it was intended.
 
> I like strict code that uses as fewer invariants as possible and that
> never holds a lock for any instruction more than it is required (again
> purely for self documenting reasons, the CPU won't notice much one
> instruction more or less).
> 
> Even with your patch the two branches are unnecessary, that may not be
> measurable, but it's still wasted CPU. It's all about setting mm->mmap
> before the up_write. In fact my patch should at least put an incremental
> unlikely around my single branch added to exit_mmap.
>
> I see the ksm_exit-like {down_write; up_write} Hugh suggested as a strict
> solution to this issue, and I wrote it specifically while trying to
> find a way to be more strict, because from the start it didn't look
> like holding the mmap_sem during free_pgtables was necessary.

When we discussed this with Hugh, he showed an interest in actually
getting rid of this (peculiar) construct, which would be possible if
the down_write were implicit in exit_mmap [1].

> I'm also fine to drop the oom_lock but I think it can be done
> incrementally as it's a separate issue, my second patch should allow
> for it with no adverse side effects.
>
> All I care about is the exit_mmap path because it runs too many times
> not to pay deep attention to every bit of it ;).

Well, all I care about is to fix this over-eager oom killing due to oom
reaper setting MMF_OOM_SKIP too early. That is a regression. I think we
can agree that both proposed solutions are functionally equivalent. I do
not want to push mine too hard so if you _believe_ that yours is really
better feel free to post it for inclusion.

[1] http://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils
-- 
Michal Hocko
SUSE Labs


Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-27 Thread Michael Ellerman
Jens Axboe  writes:
> On 07/27/2017 08:47 AM, Bart Van Assche wrote:
>> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote:
>>> The bug looks like SCSI running the queue inline from IRQ
>>> context, that's not a good idea.
...
>> 
>> scsi_run_queue() works fine if no scheduler is configured. Additionally, that
>> code predates the introduction of blk-mq I/O schedulers. I think it is
>> nontrivial for block driver authors to figure out that a queue has to be run
>> from process context if a scheduler has been configured that does not
>> support being run from interrupt context.
>
> No it doesn't, you could never run the queue from interrupt context with
> async == false. So I don't think that's confusing at all, you should
> always be aware of the context.
>
>> How about adding WARN_ON_ONCE(in_interrupt()) to
>> blk_mq_start_hw_queue() or replacing the above patch by the following:
>
> No, I hate having dependencies like that, because they always just catch
> one of them. Looks like the IPR path that hits this should just offload
> to a workqueue or similar, you don't have to make any scsi_run_queue()
> async.

OK, so the resolution is "fix it in IPR" ?

cheers


Re: [PATCH v2] qe: fix compile issue for arm64

2017-07-27 Thread Michael Ellerman
Zhao Qiang  writes:

> Signed-off-by: Zhao Qiang 
> ---
> Changes for v2:
>   - include all Errata QE_General4 in #ifdef
>
>  drivers/soc/fsl/qe/qe.c | 2 ++
>  1 file changed, 2 insertions(+)

AFAICS this driver can only be built on PPC, what am I missing?

config QUICC_ENGINE
bool "Freescale QUICC Engine (QE) Support"
depends on FSL_SOC && PPC32

cheers


[PATCH] init/main.c: Fix block comment style and remove braces from single-statement if block

2017-07-27 Thread janani-sankarababu
Signed-off-by: Janani S 

---
 init/main.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index 052481f..f8eb4966 100644
--- a/init/main.c
+++ b/init/main.c
@@ -181,7 +181,8 @@ static bool __init obsolete_checksetup(char *line)
/* Already done in parse_early_param?
 * (Needs exact match on param part).
 * Keep iterating, as we can have early
-* params and __setups of same names 8( */
+* params and __setups of same names
+*/
if (line[n] == '\0' || line[n] == '=')
had_early_param = true;
} else if (!p->setup_func) {
@@ -693,9 +694,9 @@ asmlinkage __visible void __init start_kernel(void)
arch_post_acpi_subsys_init();
sfi_init_late();
 
-   if (efi_enabled(EFI_RUNTIME_SERVICES)) {
+   if (efi_enabled(EFI_RUNTIME_SERVICES))
efi_free_boot_services();
-   }
+
 
/* Do the rest non-__init'ed, we're now alive */
rest_init();
-- 
1.9.1



[PATCH v6.1 4/7] drm/rockchip: vop: group vop registers

2017-07-27 Thread Mark Yao
Grouping the vop registers makes the register definitions clearer, and
also lets different vop versions reuse the same register groups.

Signed-off-by: Mark Yao 
Reviewed-by: Jeffy Chen 
---
Changes in v6.1
- fix Null pointer crash on vop_reg_set

 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 101 -
 drivers/gpu/drm/rockchip/rockchip_drm_vop.h |  60 ---
 drivers/gpu/drm/rockchip/rockchip_vop_reg.c | 112 +++-
 3 files changed, 146 insertions(+), 127 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index fd47da5..39912f2 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -42,30 +42,19 @@
 #include "rockchip_drm_psr.h"
 #include "rockchip_drm_vop.h"
 
-#define REG_SET(x, base, reg, v) \
-   vop_mask_write(x, base + reg.offset, reg.mask, reg.shift, \
-  v, reg.write_mask, reg.relaxed)
-#define REG_SET_MASK(x, base, reg, mask, v) \
-   vop_mask_write(x, base + reg.offset, \
-  mask, reg.shift, v, reg.write_mask, reg.relaxed)
-
 #define VOP_WIN_SET(x, win, name, v) \
-   REG_SET(x, win->base, win->phy->name, v)
+   vop_reg_set(vop, &win->phy->name, win->base, ~0, v, #name)
 #define VOP_SCL_SET(x, win, name, v) \
-   REG_SET(x, win->base, win->phy->scl->name, v)
+   vop_reg_set(vop, &win->phy->scl->name, win->base, ~0, v, #name)
 #define VOP_SCL_SET_EXT(x, win, name, v) \
-   REG_SET(x, win->base, win->phy->scl->ext->name, v)
-#define VOP_CTRL_SET(x, name, v) \
-   REG_SET(x, 0, (x)->data->ctrl->name, v)
-
-#define VOP_INTR_GET(vop, name) \
-   vop_read_reg(vop, 0, &vop->data->ctrl->name)
-
-#define VOP_INTR_SET(vop, name, v) \
-   REG_SET(vop, 0, vop->data->intr->name, v)
+   vop_reg_set(vop, &win->phy->scl->ext->name, \
+   win->base, ~0, v, #name)
 
 #define VOP_INTR_SET_MASK(vop, name, mask, v) \
-   REG_SET_MASK(vop, 0, vop->data->intr->name, mask, v)
+   vop_reg_set(vop, &vop->data->intr->name, 0, mask, v, #name)
+
+#define VOP_REG_SET(vop, group, name, v) \
+   vop_reg_set(vop, &vop->data->group->name, 0, ~0, v, #name)
 
 #define VOP_INTR_SET_TYPE(vop, name, type, v) \
do { \
@@ -82,7 +71,7 @@
vop_get_intr_type(vop, &vop->data->intr->name, type)
 
 #define VOP_WIN_GET(x, win, name) \
-   vop_read_reg(x, win->base, &win->phy->name)
+   vop_read_reg(x, win->offset, &win->phy->name)
 
 #define VOP_WIN_GET_YRGBADDR(vop, win) \
vop_readl(vop, win->base + win->phy->yrgb_mst.offset)
@@ -164,14 +153,22 @@ static inline uint32_t vop_read_reg(struct vop *vop, 
uint32_t base,
return (vop_readl(vop, base + reg->offset) >> reg->shift) & reg->mask;
 }
 
-static inline void vop_mask_write(struct vop *vop, uint32_t offset,
- uint32_t mask, uint32_t shift, uint32_t v,
- bool write_mask, bool relaxed)
+static void vop_reg_set(struct vop *vop, const struct vop_reg *reg,
+   uint32_t _offset, uint32_t _mask, uint32_t v,
+   const char *reg_name)
 {
-   if (!mask)
+   int offset, mask, shift;
+
+   if (!reg || !reg->mask) {
+   dev_dbg(vop->dev, "Warning: not support %s\n", reg_name);
return;
+   }
+
+   offset = reg->offset + _offset;
+   mask = reg->mask & _mask;
+   shift = reg->shift;
 
-   if (write_mask) {
+   if (reg->write_mask) {
v = ((v << shift) & 0xffff) | (mask << (shift + 16));
} else {
uint32_t cached_val = vop->regsbak[offset >> 2];
@@ -180,7 +177,7 @@ static inline void vop_mask_write(struct vop *vop, uint32_t 
offset,
vop->regsbak[offset >> 2] = v;
}
 
-   if (relaxed)
+   if (reg->relaxed)
writel_relaxed(v, vop->regs + offset);
else
writel(v, vop->regs + offset);
@@ -202,7 +199,7 @@ static inline uint32_t vop_get_intr_type(struct vop *vop,
 
 static inline void vop_cfg_done(struct vop *vop)
 {
-   VOP_CTRL_SET(vop, cfg_done, 1);
+   VOP_REG_SET(vop, common, cfg_done, 1);
 }
 
 static bool has_rb_swapped(uint32_t format)
@@ -540,7 +537,7 @@ static int vop_enable(struct drm_crtc *crtc)
 
spin_lock(&vop->reg_lock);
 
-   VOP_CTRL_SET(vop, standby, 0);
+   VOP_REG_SET(vop, common, standby, 0);
 
spin_unlock(&vop->reg_lock);
 
@@ -600,7 +597,7 @@ static void vop_crtc_disable(struct drm_crtc *crtc)
 
spin_lock(&vop->reg_lock);
 
-   VOP_CTRL_SET(vop, standby, 1);
+   VOP_REG_SET(vop, common, standby, 1);
 
spin_unlock(&vop->reg_lock);
 
@@ -923,7 +920,7 @@ static void vop_crtc_e

RE: [PATCH] cpufreq: x86: Make scaling_cur_freq behave more as expected

2017-07-27 Thread Doug Smythies
On 2017.07.27 17:13 Rafael J. Wysocki wrote:

> From: Rafael J. Wysocki 
>
> After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to
> calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
> in sysfs only behaves as expected on x86 with APERF/MPERF registers
> available when it is read from at least twice in a row.
>
> The value returned by the first read may not be meaningful, because
> the computations in there use cached values from the previous
> aperfmperf_snapshot_khz() call which may be stale.  However, the
> interface is expected to return meaningful values on every read,
> including the first one.
>
> To address this problem modify arch_freq_get_on_cpu() to call
> aperfmperf_snapshot_khz() twice, with a short delay between
> these calls, if the previous invocation of aperfmperf_snapshot_khz()
> was too far back in the past (specifically, more than 1s ago) and
> adjust aperfmperf_snapshot_khz() for that.
>
> Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz 
> using APERF/MPERF"
> Reported-by: Doug Smythies 
> Signed-off-by: Rafael J. Wysocki 
> ---
> arch/x86/kernel/cpu/aperfmperf.c |   36 +---
> 1 file changed, 29 insertions(+), 7 deletions(-)
>
> Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c

...[deleted the rest]...

This proposed patch would be good. However, I can only try it maybe by Sunday.
I think the maximum time span means that this code:

/*
 * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then
 *  khz = (cpu_khz * aperf_delta) / mperf_delta
 */
if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta)
s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);
else/* khz = aperf_delta / (mperf_delta / cpu_khz) */
s->khz = div64_u64(aperf_delta,
div64_u64(mperf_delta, cpu_khz));

Could be reduced to this:

s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta);

Because it could never overflow anymore.
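The reasoning can be checked with concrete numbers: with at most ~1 s between snapshots, aperf_delta is bounded by roughly the core clock in Hz (~4e9 on current parts), while the fallback branch only matters once aperf_delta exceeds ULLONG_MAX / cpu_khz (~6e12 for a 3 GHz part). A sketch of the computation, using plain 64-bit division in place of div64_u64() and illustrative values:

```c
#include <assert.h>
#include <limits.h>

/* Sketch of the two-branch khz computation quoted above.  The fallback
 * branch only runs when aperf_delta is so large that
 * cpu_khz * aperf_delta would overflow 64 bits (and mperf_delta is then
 * comparably large, so mperf_delta / cpu_khz stays nonzero). */
static unsigned long long calc_khz(unsigned long long cpu_khz,
				   unsigned long long aperf_delta,
				   unsigned long long mperf_delta)
{
	if (ULLONG_MAX / cpu_khz > aperf_delta)		/* no overflow */
		return cpu_khz * aperf_delta / mperf_delta;
	/* precision-losing fallback */
	return aperf_delta / (mperf_delta / cpu_khz);
}
```

With cpu_khz = 3,000,000 and deltas bounded by ~4e9, the overflow guard always passes, which is the point being made.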

... Doug




Re: [PATCH 2/2] arm64: dts: Add device node for pmi8994 gpios

2017-07-27 Thread Vivek Gautam
On Fri, Jul 28, 2017 at 2:30 AM, Stephen Boyd  wrote:
> On 07/26/2017 11:30 PM, Vivek Gautam wrote:
>> Signed-off-by: Vivek Gautam 
>> ---
>>  arch/arm64/boot/dts/qcom/pmi8994.dtsi | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/qcom/pmi8994.dtsi 
>> b/arch/arm64/boot/dts/qcom/pmi8994.dtsi
>> index d3879a4e8076..3b04ca63c31e 100644
>> --- a/arch/arm64/boot/dts/qcom/pmi8994.dtsi
>> +++ b/arch/arm64/boot/dts/qcom/pmi8994.dtsi
>> @@ -8,6 +8,23 @@
>>   reg = <0x2 SPMI_USID>;
>>   #address-cells = <1>;
>>   #size-cells = <0>;
>> +
>> + pmi8994_gpios: gpios@c000 {
>> + compatible = "qcom,pmi8994-gpio";
>
>
> Please also include the generic "qcom,spmi-gpio" compatible. We really
> don't need to add the "qcom,pmi8994-gpio" one to the driver right now so
> I would leave that out of the driver but leave it here in the dtsi file
> of course.

Sure, will add the "qcom,spmi-gpio" compatible as well, and will drop the
"qcom,pmi8994-gpio" from the driver.
Thanks.

regards
Vivek

>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH] device property: Fix usecount for of_graph_get_port_parent()

2017-07-27 Thread Tony Lindgren
* Rob Herring  [170727 15:19]:
> On Thu, Jul 27, 2017 at 4:44 AM, Tony Lindgren  wrote:
> > --- a/drivers/of/property.c
> > +++ b/drivers/of/property.c
> > @@ -708,6 +708,15 @@ struct device_node *of_graph_get_port_parent(struct 
> > device_node *node)
> >  {
> > unsigned int depth;
> >
> > +   if (!node)
> > +   return NULL;
> > +
> > +   /*
> > +* Preserve usecount for passed in node as of_get_next_parent()
> > +* will do of_node_put() on it.
> > +*/
> > +   of_node_get(node);
> 
> I think this messes up of_graph_get_remote_port_parent(). First it
> calls of_graph_get_remote_endpoint which returns the endpoint node
> with ref count incremented. Then you are incrementing it again here.

Hmm OK looks like I missed that one. If we want to have
of_graph_get_port_parent not trash the node passed to it, we should
just change things there too:

struct device_node *of_graph_get_remote_port_parent(
const struct device_node *node)
{
struct device_node *np, *pp;

/* Get remote endpoint node. */
np = of_graph_get_remote_endpoint(node);

pp = of_graph_get_port_parent(np);
of_node_put(np);

return pp;
}
EXPORT_SYMBOL(of_graph_get_remote_port_parent);

Does that make sense to you?

> > diff --git a/sound/soc/generic/audio-graph-card.c 
> > b/sound/soc/generic/audio-graph-card.c
> > --- a/sound/soc/generic/audio-graph-card.c
> > +++ b/sound/soc/generic/audio-graph-card.c
> > @@ -224,7 +224,6 @@ static int asoc_graph_card_parse_of(struct 
> > graph_card_data *priv)
> >
> > of_for_each_phandle(&it, rc, node, "dais", NULL, 0) {
> > ret = asoc_graph_card_dai_link_of(it.node, priv, idx++);
> > -   of_node_put(it.node);
> > if (ret < 0)
> > return ret;
> 
> I think you need a put here.

Do you mean on error it should be as below, right?

ret = asoc_graph_card_dai_link_of(it.node, priv, idx++);
if (ret < 0) {
of_node_put(it.node);
return ret;
}

Regards,

Tony


Re: [Eas-dev] [PATCH V3 1/3] sched: cpufreq: Allow remote cpufreq callbacks

2017-07-27 Thread Viresh Kumar
On 27-07-17, 12:55, Saravana Kannan wrote:
> Yes. Simplifying isn't always about number of lines of code. It's also about
> abstraction. Having generic scheduler code care about HW details doesn't
> seem nice.

I can argue that even the policy->cpus field is hardware
specific, isn't it? And we are using that in the schedutil governor
anyway. What's wrong with having another field (in a generic way) in
the same structure that tells us more about hardware ?

And then schedutil isn't really the scheduler, but a cpufreq governor.
Just like ondemand/conservative, which are also called from the same
scheduler path.

> It'll literally be one simple check (cpu == smp_processor_id()) or (cpu "in"
> policy->cpus).
> 
> Also, this is only for drivers that currently support fast switching. How
> many of those do you have?

Why? Why shouldn't we do that for the other drivers? I think it should
be done for all of them.

> >The core already has most of the data required and I believe that we
> >need to handle it in the governor's code as is handled in this series.
> 
> Clearly, it doesn't. You are just making assumptions about HW.

So assuming that any CPU from a policy can change freq on behalf of
all the CPUs of the same policy is wrong? That is the basis of how the
cpufreq core is designed.
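The one-line check being debated could be sketched like this, with policy->cpus mocked as a plain bitmask (function and parameter names are hypothetical; in the kernel this would use smp_processor_id() and cpumask_test_cpu()):

```c
#include <assert.h>

/* A CPU may apply a frequency update for target_cpu if it is updating
 * itself, or if target_cpu belongs to the policy it shares — any CPU in
 * a policy may change the frequency on behalf of its siblings.
 * policy_cpus mocks the policy->cpus cpumask as a bitmask. */
static int can_update_freq(int target_cpu, int this_cpu,
			   unsigned long policy_cpus)
{
	if (target_cpu == this_cpu)
		return 1;
	return (int)((policy_cpus >> target_cpu) & 1UL);
}
```

The disagreement above is only about where this check lives (scheduler core vs. governor), not about its shape.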

-- 
viresh


Re: linux-next: Signed-off-by missing for commit in the renesas tree

2017-07-27 Thread Stephen Rothwell
Hi,

On Fri, 28 Jul 2017 12:09:12 +0700 jmondi  wrote:
>
> On Fri, Jul 28, 2017 at 09:21:08AM +1000, Stephen Rothwell wrote:
> >
> > Commit
> >
> >   bb003371172f ("arm: dts: genmai: Add RIIC2 pin group")
> >
> > is missing a Signed-off-by from its committer.  
> 
> I do see my SOB in the patch I've sent. Any chance it has been dropped
> while applying it? Should I resend?

The commit is missing Simon's Signed-off-by ...

-- 
Cheers,
Stephen Rothwell


Re: [PATCH V3] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level

2017-07-27 Thread Anshuman Khandual
On 07/28/2017 06:19 AM, Mike Kravetz wrote:
> On 05/16/2017 03:05 AM, Anshuman Khandual wrote:
>> Though migrating gigantic HugeTLB pages does not sound much like real
>> world use case, they can be affected by memory errors. Hence migration
>> at the PGD level HugeTLB pages should be supported just to enable soft
>> and hard offline use cases.
> 
> Hi Anshuman,
> 
> Sorry for the late question, but I just stumbled on this code when
> looking at something else.
> 
> It appears the primary motivation for these changes is to handle
> memory errors in gigantic pages.  In this case, you migrate to

Right.

> another gigantic page.  However, doesn't this assume that there is

Right.

> a pre-allocated gigantic page sitting unused that will be the target
> of the migration?  alloc_huge_page_node will not allocate a gigantic
> page.  Or, am I missing something?

Yes, it's in the context of 16GB pages on POWER8 systems, where all the
gigantic pages are pre-allocated by the platform and passed on to
the kernel through the device tree. We don't allocate these gigantic
pages at runtime.



Re: [PATCH V1 02/12] spmi: pmic-arb: rename pa_xx to pmic_arb_xx and other cleanup

2017-07-27 Thread kgunda

On 2017-07-28 05:30, Stephen Boyd wrote:

On 07/20, Kiran Gunda wrote:

This patch cleans up the following.

- Rename the "pa" to "pmic_arb".
- Rename the spmi_pmic_arb *dev to spmi_pmic_arb *pmic_arb.
- Rename the pa_{read,write}_data() functions to
  pmic_arb_{read,write}_data().
- Rename channel to APID.
- Rename the HWIRQ_*() macros to hwirq_to_*().

Signed-off-by: Kiran Gunda 
---


Reviewed-by: Stephen Boyd 

Thanks. objdiff doesn't show any diff anymore so I'm much more
confident this doesn't change any functionality.

Thanks for introducing me to a nice tool!


linux-next: Tree for Jul 28

2017-07-27 Thread Stephen Rothwell
Hi all,

Changes since 20170727:

Non-merge commits (relative to Linus' tree): 2843
 2887 files changed, 101051 insertions(+), 48048 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 267 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (0ce2f3851193 Merge tag 'acpi-4.13-rc3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (b4b8cbf679c4 Cavium CNN55XX: fix broken default Kconfig 
entry)
Merging kbuild-current/fixes (ad8181060788 kconfig: fix sparse warnings in 
nconfig)
Merging arc-current/for-curr (37f1db0e85ff ARC: [plat-axs10x]: prepare dts 
files for enabling PAE40 on axs103)
Merging arm-current/fixes (ce184a0dee92 ARM: 8687/1: signal: Fix unparseable 
iwmmxt_sigframe in uc_regspace[])
Merging m68k-current/for-linus (204a2be30a7a m68k: Remove ptrace_signal_deliver)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (0da12a7a81f1 powerpc/mm/hash: Free the 
subpage_prot_table correctly)
Merging sparc/master (8cd3ec51c0c3 sbus: Convert to using %pOF instead of 
full_name)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (6b84202c946c sctp: fix the check for _sctp_walk_params and 
_sctp_walk_errors)
Merging ipsec/master (e6194923237f esp: Fix memleaks on error paths.)
Merging netfilter/master (f7fb77fc1235 netfilter: nft_compat: check extension 
hook mask only if set)
Merging ipvs/master (3c5ab3f395d6 ipvs: SNAT packet replies only for NATed 
connections)
Merging wireless-drivers/master (5f5d03143de5 brcmfmac: fix memleak due to 
calling brcmf_sdiod_sgtable_alloc() twice)
Merging mac80211/master (d7f13f745036 cfg80211: Validate frequencies nested in 
NL80211_ATTR_SCAN_FREQUENCIES)
Merging sound-current/for-linus (ba92b1142879 ALSA: hda - Add mute led support 
for HP ProBook 440 G4)
Merging pci-current/for-linus (34d5ac2af644 PCI: rockchip: Check for 
pci_scan_root_bus_bridge() failure correctly)
Merging driver-core.current/driver-core-linus (5771a8c08880 Linux v4.13-rc1)
Merging tty.current/tty-linus (520eccdfe187 Linux 4.13-rc2)
Merging usb.current/usb-linus (520eccdfe187 Linux 4.13-rc2)
Merging usb-gadget-fixes/fixes (520eccdfe187 Linux 4.13-rc2)
Merging usb-serial-fixes/usb-linus (9585e340db9f USB: serial: cp210x: add 
support for Qivicon USB ZigBee dongle)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (5771a8c08880 Linux v4.13-rc1)
Merging staging.current/staging-linus (055655a9f0fe Merge tag 
'iio-fixes-for-4.13a' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus)
Merging char-misc.current/char-misc-linus (520eccdfe187 Linux 4.13-rc2)
Merging input-current/for-linus (293b915fd9be Input: trackpoint - assume 3 
buttons when buttons detection fails)
Merging crypto-current/master (41cdf7a45389 crypto: authencesn - Fix 
digest_null crash)
Merging ide/master (921edf312a6a ide: avoid warning for timings calculation)
Merging vfio-fixes/for-linus (bb67b496c338 include/linux/vfio.h: Guard 
powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH)

Re: [RFT v2 2/3] ASoC: samsung: Add missing prepare for iis clock of s3c24xx

2017-07-27 Thread Krzysztof Kozlowski
On Fri, Jul 28, 2017 at 10:11:48AM +0530, Arvind Yadav wrote:
> Hi,
> 
> 
> On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote:
> >The s3c_i2sv2_probe() only enabled iis clock.  Missing prepare isn't
> >probably fatal, because for SoC clocks this is usually no-op, but for
> >correctness this clock should be prepared.
> >
> >Signed-off-by: Krzysztof Kozlowski 
> >
> >---
> >
> >Changes since v1:
> >1. New patch
> >---
> >  sound/soc/samsung/s3c-i2s-v2.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c
> >index 9b28046eea8e..3894bda06ebb 100644
> >--- a/sound/soc/samsung/s3c-i2s-v2.c
> >+++ b/sound/soc/samsung/s3c-i2s-v2.c
> >@@ -637,7 +637,7 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai,
> > return -ENOENT;
> > }
> >-clk_enable(i2s->iis_pclk);
> >+clk_prepare_enable(i2s->iis_pclk);
> Please handle the return value of clk_prepare_enable.

Which is a different issue, different bug. We can fix it but it should
not be mixed with this fix here.
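For the record, the follow-up fix being deferred would look roughly like this — a sketch against a mocked clk API (the real function is clk_prepare_enable() from linux/clk.h; the struct layout and error value here are arbitrary stand-ins):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mocked clk API, standing in for the kernel's clk_prepare_enable(). */
struct clk { int broken; int enabled; };

static int mock_clk_prepare_enable(struct clk *clk)
{
	if (clk == NULL || clk->broken)
		return -ENOENT;
	clk->enabled = 1;
	return 0;
}

/* Probe-style caller: propagate the error instead of ignoring it. */
static int i2s_probe(struct clk *iis_pclk)
{
	int ret = mock_clk_prepare_enable(iis_pclk);

	if (ret) {
		/* dev_err(dai->dev, "failed to enable iis clock\n"); */
		return ret;
	}
	return 0;
}
```

As noted above, this is a separate bug from the missing prepare and belongs in its own patch.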

Best regards,
Krzysztof



Re: [PATCH V1 06/12] spmi: pmic-arb: replace the writel_relaxed with __raw_writel

2017-07-27 Thread kgunda

On 2017-07-28 05:31, Stephen Boyd wrote:

On 07/20, Kiran Gunda wrote:

Replace the writel_relaxed with __raw_writel to avoid byte swapping
in pmic_arb_write_data() function. That way the code is independent
of the CPU endianness.

Signed-off-by: Kiran Gunda 
Reviewed-by: Stephen Boyd 


This also needs a Fixes tag.

Fixes: 111a10bf3e53 ("spmi: pmic-arb: rename spmi_pmic_arb_dev to
spmi_pmic_arb")

Sure. will add the tag and send the patch.


Re: [PATCH V1 03/12] spmi: pmic-arb: clean up pmic_arb_find_apid function

2017-07-27 Thread kgunda

On 2017-07-28 06:48, Stephen Boyd wrote:

On 07/20, Kiran Gunda wrote:

Clean up pmic_arb_find_apid() by using local
variables to improve code readability.

Signed-off-by: Kiran Gunda 
---


Reviewed-by: Stephen Boyd 

One nit below:


break;

regval = readl_relaxed(pmic_arb->cnfg +
- SPMI_OWNERSHIP_TABLE_REG(apid));
-   pmic_arb->apid_data[apid].owner =
-   SPMI_OWNERSHIP_PERIPH2OWNER(regval);
+   SPMI_OWNERSHIP_TABLE_REG(apid));


This should be 7 spaces and not a tab? Originally looks like it
was 6 spaces

Will fix it and send the next patch.


Re: [PATCH V1 05/12] spmi: pmic-arb: fix memory allocation for mapping_table

2017-07-27 Thread kgunda

On 2017-07-28 05:19, Stephen Boyd wrote:

On 07/20, Kiran Gunda wrote:

Allocate the correct memory size (max_pmic_peripherals) for the
mapping_table that holds the apid to ppid mapping. Also use a local
variable for mapping_table for better alignment of the code.

Signed-off-by: Kiran Gunda 


This needs a

Fixes: 987a9f128b8a ("spmi: pmic-arb: Support more than 128 
peripherals")


right?

Yes. I will keep the tag and send it again.


Re: [PATCH V3 3/9] cpufreq: Cap the default transition delay value to 10 ms

2017-07-27 Thread Viresh Kumar
+ IMX maintainers.

On 27-07-17, 19:54, Leonard Crestez wrote:
> On Wed, 2017-07-26 at 11:36 +0530, Viresh Kumar wrote:

> > - Find how much time does it really take to change the frequency of
> >   the CPU. I don't really think 109 us is the right transition
> >   latency. Use attached patch for that and look for the print message.
> 
> Your patch measures latencies of around 2.5ms, but it can vary between
> 1.6 ms to 3ms from boot-to-boot. This is a lot more than what the
> driver reports. Most transitions seem to be faster.

Wow !!

I was pretty sure all these figures are just made up by some coder :)

> I did a little digging and it seems that a majority of time is always
> spent inside clk_pllv3_wait_lock which spins on a HW bit while doing
> usleep_range(50, 500). I originally thought it was because of
> regulators but the delays involved in that are smaller.
> 
> Measuring wall time on a process that can sleep seems dubious, isn't
> this vulnerable to random delays because of other tasks?

I am not sure I understood that, sorry.

> > Without this patch the sampling rate of ondemand governor will be 109
> > ms. And after this patch it would be capped at 10 ms. Why would that
> > screw up anyone's setup ? I don't have an answer to that right now.
> 
> On a closer look it seems that most of the time is actually spent at
> low cpufreq though (90%+).
> 
> Your change makes it so that even something like "sleep 1; cat
> scaling_cur_freq" raises the frequency to the maximum.

Why?

> This happens
> enough that even if you do it in a loop you will never see the minimum
> frequency. It seems there is enough internal bookkeeping on such a
> wakeup that it takes more than 10ms and enough for a reevaluation of
> cpufreq until cat returns the value?!

At this point I really feel that this is a hardware specific problem
and it was working by chance until now. And I am not sure whether this
should stop the patch from getting merged.

At least you can teach your distribution to go increase the sampling
rate from userspace to make it all work.

Can you try one more thing? Try using the schedutil governor and see
how it behaves?

> I found this by enabling the power:cpu_frequency tracepoint event and
> checking for deltas with a script. Enabling CPU_FREQ_STAT show this:
> 
> time_in_state:
> 
> 396000 1609

So we still stay at the lowest frequency most of the time.

> 792000 71
> 996000 54
> 
> trans_table:
> 
>      From :      To
>           :  396000  792000  996000
>    396000 :       0      10       7
>    792000 :      16       0      12
>    996000 :       1      18       0

What is it that you are trying to point out here? I still see that we
are coming back to 396 MHz quite often.

Maybe can you compare these values with and without this patch to let
us know?

> This is very unexpected but not necessarily wrong.

I haven't understood the problem completely yet :(

-- 
viresh


Re: [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load

2017-07-27 Thread Longpeng (Mike)


On 2017/7/28 12:22, Longpeng (Mike) wrote:

> 
> 
> On 2017/6/6 18:57, Paolo Bonzini wrote:
> 
>> The simplify part: do not touch pi_desc.nv, we can set it when the
>> VCPU is first created.  Likewise, pi_desc.sn is only handled by
>> vmx_vcpu_pi_load, do not touch it in __pi_post_block.
>>
>> The fix part: do not check kvm_arch_has_assigned_device, instead
>> check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
>> This matches what the previous patch did in pi_post_block.
>>
>> Cc: Longpeng (Mike) 
>> Cc: Huangweidong 
>> Cc: Gonglei 
>> Cc: wangxin 
>> Cc: Radim Krčmář 
>> Signed-off-by: Paolo Bonzini 
>> ---
>>  arch/x86/kvm/vmx.c | 68 
>> --
>>  1 file changed, 35 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 0f4714fe4908..81047f373747 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, 
>> int cpu)
>>  struct pi_desc old, new;
>>  unsigned int dest;
>>  
>> -if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> -!irq_remapping_cap(IRQ_POSTING_CAP)  ||
>> -!kvm_vcpu_apicv_active(vcpu))
>> +/*
>> + * In case of hot-plug or hot-unplug, we may have to undo
>> + * vmx_vcpu_pi_put even if there is no assigned device.  And we
>> + * always keep PI.NDST up to date for simplicity: it makes the
>> + * code easier, and CPU migration is not a fast path.
>> + */
>> +if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
>> +return;
> 
> 
> Hi Paolo,
> 
> I'm confused with the following scenario:
> 
> (suppose the VM has a assigned devices)
> step 1. the running vcpu is be preempted
>   --> vmx_vcpu_pi_put [ SET pi.sn ]
> step 2. hot-unplug the assigned devices
> step 3. the vcpu is scheduled in
>   --> vmx_vcpu_pi_load [ CLEAR pi.sn ]
> step 4. the running vcpu is be preempted again
>   --> vmx_vcpu_pi_put [ direct return ]
> step 5. the vcpu is migrated to another pcpu
> step 6. the vcpu is scheduled in
>   --> vmx_vcpu_pi_load [ above check fails and
>   continue to execute the follow parts ]
> 
> I think vmx_vcpu_pi_load should return direct in step6, because
> vmx_vcpu_pi_put in step4 did nothing.
> So maybe the above check has a potential problem.


Oh! Sorry, please just ignore the above. I made a mistake.

> 
> Please kindly figure out if I misunderstand anything important :)
> 
> --
> Regards,
> Longpeng(Mike)
> 
>> +
>> +/*
>> + * First handle the simple case where no cmpxchg is necessary; just
>> + * allow posting non-urgent interrupts.
>> + *
>> + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
>> + * PI.NDST: pi_post_block will do it for us and the wakeup_handler
>> + * expects the VCPU to be on the blocked_vcpu_list that matches
>> + * PI.NDST.
>> + */
>> +if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR ||
>> +vcpu->cpu == cpu) {
>> +pi_clear_sn(pi_desc);
>>  return;
>> +}
>>  
>> +/* The full case.  */
>>  do {
>>  old.control = new.control = pi_desc->control;
>>  
>> -/*
>> - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
>> - * are two possible cases:
>> - * 1. After running 'pre_block', context switch
>> - *happened. For this case, 'sn' was set in
>> - *vmx_vcpu_put(), so we need to clear it here.
>> - * 2. After running 'pre_block', we were blocked,
>> - *and woken up by some other guy. For this case,
>> - *we don't need to do anything, 'pi_post_block'
>> - *will do everything for us. However, we cannot
>> - *check whether it is case #1 or case #2 here
>> - *(maybe, not needed), so we also clear sn here,
>> - *I think it is not a big deal.
>> - */
>> -if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
>> -if (vcpu->cpu != cpu) {
>> -dest = cpu_physical_id(cpu);
>> -
>> -if (x2apic_enabled())
>> -new.ndst = dest;
>> -else
>> -new.ndst = (dest << 8) & 0xFF00;
>> -}
>> +dest = cpu_physical_id(cpu);
>>  
>> -/* set 'NV' to 'notification vector' */
>> -new.nv = POSTED_INTR_VECTOR;
>> -}
>> +if (x2apic_enabled())
>> +new.ndst = dest;
>> +else
>> +new.ndst = (dest << 8) & 0xFF00;
>>  
>> -/* Allow posting non-urgent interrupts */
>>  new.sn = 0;
>>  } while (cmpxchg(&p

Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels

2017-07-27 Thread David Miller
From: Mikael Pettersson 
Date: Thu, 27 Jul 2017 21:45:25 +0200

> Attempting to build strace-4.18 as sparcv9 code and run its test suite
> on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
> reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
> because two test binaries (sched_xetattr and poll) OOPS the kernel and
> get killed.  Sample dmesg from 4.13-rc2:
> 
> [42912.270398] Unable to handle kernel NULL pointer dereference
> [42912.327717] tsk->{mm,active_mm}->context = 136a
> [42912.383789] tsk->{mm,active_mm}->pgd = fff227db4000
> [42912.435247]   \|/  \|/
>  "@'/ .. \`@"
>  /_| \__/ |_\
> \__U_/
> [42912.559982] sched_xetattr(21866): Oops [#1]
> [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1
> [42912.672138] task: fff229a5c380 task.stack: fff227dec000
> [42912.732876] TSTATE: 004411001603 TPC: 007570fc TNPC: 
> 00757110 Y: Not tainted
> [42912.845079] TPC: <__bzero+0x20/0xc0>
> [42912.874870] g0:  g1:  g2: 0030 
> g3: 008ca100
> [42912.972120] g4: fff229a5c380 g5: fff23ef44000 g6: fff227dec000 
> g7: 0030
> [42913.069446] o0: 0030 o1: fff227defe70 o2:  
> o3: 0030
> [42913.166765] o4: fff227defe70 o5:  sp: fff227def5c1 
> ret_pc: 00474fa4
> [42913.268664] RPC: 

This looks really strange.  It is a memset() call with the buffer pointer
and length arguments reversed.

What exact command did you give to configure and build strace-4.18 so that
I can try to reproduce this?

Thanks.


Re: linux-next: Signed-off-by missing for commit in the renesas tree

2017-07-27 Thread jmondi
Hi Simon,

On Fri, Jul 28, 2017 at 09:21:08AM +1000, Stephen Rothwell wrote:
> Hi Simon,
>
> Commit
>
>   bb003371172f ("arm: dts: genmai: Add RIIC2 pin group")
>
> is missing a Signed-off-by from its committer.

I do see my SOB in the patch I've sent. Any chance it has been dropped
while applying it? Should I resend?

Thanks for reproposing the patches now that we have RZ/A1 pinctrl
merged.

>
> --
> Cheers,
> Stephen Rothwell


Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-07-27 Thread Kurt Van Dijck

> 
> On 07/27/2017 01:47 PM, Oliver Hartkopp wrote:
> > On 07/26/2017 08:29 PM, Franklin S Cooper Jr wrote:
> >>
> > 
> >> I'm fine with switching to using bitrate instead of speed. Kurt was
> >> originally the one that suggested using the terms arbitration and data,
> >> since that's how the spec refers to them, which I do agree with. But you're
> >> right that in the drivers (struct can_priv) we just use bittiming and
> >> data_bittiming (CAN-FD timings). I don't think adding "fd" into the
> >> property name makes sense unless we are calling it something like
> >> "max-canfd-bitrate" which I would agree is the easiest to understand.
> >>
> >> So what is the preference if we end up sticking with two properties?
> >> Option 1 or 2?
> >>
> >> 1)
> >> max-bitrate
> >> max-data-bitrate
> >>
> >> 2)
> >> max-bitrate
> >> max-canfd-bitrate
> >>
> >>
> > 
> > 1
> > 
> >>> A CAN transceiver is limited in bandwidth. But you only have one RX and
> >>> one TX line between the CAN controller and the CAN transceiver. The
> >>> transceiver does not know about CAN FD - it has just a physical(!) layer
> >>> with a limited bandwidth. This is ONE limitation.
> >>>
> >>> So I tend to specify only ONE 'max-bitrate' property for the
> >>> fixed-transceiver binding.
> >>>
> >>> The fact whether the CAN controller is CAN FD capable or not is provided
> >>> by the netlink configuration interface for CAN controllers.
> >>
> >> Part of the reasoning to have two properties is to indicate that you
> >> don't support CAN FD while limiting the "arbitration" bit rate.
> > 
> > ??
> > 
> > It's a physical layer device which only has a bandwidth limitation.
> > The transceiver does not know about CAN FD.
> > 
> >> With one
> >> property you can not determine this and end up having to make some
> >> assumptions that can quickly end up biting people.
> > 
> > Despite the fact that the transceiver does not know anything about ISO
> > layer 2 (CAN/CAN FD) the properties should look like
> > 
> > max-bitrate
> > canfd-capable
> > 
> > then.
> > 
> > But when the tranceiver is 'canfd-capable' agnostic, why provide a
> > property for it?
> > 
> > Maybe I'm wrong but I still can't follow your argumentation ideas.
> 

The transceiver does not know about CAN FD, but CAN FD exploits the
different restrictions of the arbitration and data phases of the CAN
frame: during arbitration, RX must reflect the wire state
(dominant/recessive) within 1 bit time; during the data phase of CAN FD
this is not necessary.

So while _a_ transceiver may be spec'd to 1MBit during arbitration,
CAN FD packets may IMHO exceed that speed during data phase.
That was the whole point of CAN FD: exceed the limits required for
correct arbitration on transceiver & wire.

So I do not agree on the single bandwidth limitation.

The word 'max-arbitration-bitrate' makes the difference very clear.

> You're right. I spoke to our CAN transceiver team and I finally get your
> points.
> 
> So yes using "max-bitrate" alone is all we need. Sorry for the confusion
> and I'll create a new rev using this approach.
> > 
> > Regards,
> > Oliver

Kind regards,
Kurt
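Concretely, the single "max-bitrate" property that the thread converges on might look like this in a device tree (controller compatible and node names are hypothetical, shown only to illustrate where the property lives):

```
can@0 {
	compatible = "vendor,some-can-controller";	/* hypothetical */

	fixed-transceiver {
		/* physical-layer limit of the transceiver, in bit/s;
		 * applies to arbitration and data phase alike */
		max-bitrate = <5000000>;
	};
};
```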


Re: [PATCH v3/resubmit 1/3] staging: gs_fpgaboot: add buffer overflow checks

2017-07-27 Thread Greg Kroah-Hartman
On Wed, Jul 26, 2017 at 09:13:57PM -0400, Jacob von Chorus wrote:
> Four fields in struct fpgaimage are char arrays of length MAX_STR (256).
> The amount of data read into these buffers is controlled by a length
> field in the bitstream file read from userspace. If a corrupt or
> malicious firmware file was supplied, kernel data beyond these buffers
> can be overwritten arbitrarily.
> 
> This patch adds a check of the bitstream's length value to ensure it
> fits within the bounds of the allocated buffers. An error condition is
> returned from gs_read_bitstream if any of the reads fail.
> 
> Signed-off-by: Jacob von Chorus 
> 
> v3:
> - use >= to prevent an integer overflow in the comparison
> - use get_unaligned_be functions to interpret length fields
> - fix remainder of file to use valid error codes
> 
> v2:
> - char arrays converted to u8 arrays
> - replace error return value with proper error code in
>   gs_read_bitstream
> ---

All of the v2: and such needs to go below the --- line, as
Documentation/SubmittingPatches says to do.

Please fix that up and resend the series.

thanks,

greg k-h
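For reference, the layout Documentation/SubmittingPatches asks for keeps the per-version notes below the "---" separator so git-am drops them automatically (a schematic example, not the actual patch):

```
staging: gs_fpgaboot: add buffer overflow checks

<commit message body>

Signed-off-by: Jacob von Chorus <...>
---
v3:
 - use >= to prevent an integer overflow in the comparison
v2:
 - char arrays converted to u8 arrays

 drivers/staging/gs_fpgaboot/... | NN ++--
```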


Re: [PATCH 1/3] staging: ccree: Replace kzalloc with devm_kzalloc

2017-07-27 Thread Greg Kroah-Hartman
On Thu, Jul 27, 2017 at 05:27:32PM +0300, Gilad Ben-Yossef wrote:
> From: Suniel Mahesh 
> 
> It is recommended to use managed function devm_kzalloc, which
> simplifies driver cleanup paths and driver code.
> This patch does the following:
> (a) replace kzalloc with devm_kzalloc.
> (b) drop kfree(), because memory allocated with devm_kzalloc() is
> automatically freed on driver detach, otherwise it leads to a double
> free.
> (c) remove unnecessary blank lines.
> 
> Signed-off-by: Suniel Mahesh 
> [gby: rebase on top of latest coding style fixes changes]
> Acked-by: Gilad Ben-Yossef 

None of these 3 applied to my tree :(


Re: [PATCH 0/4] staging: ccree: coding style clean ups

2017-07-27 Thread Greg Kroah-Hartman
On Thu, Jul 27, 2017 at 01:43:14PM +0300, Gilad Ben-Yossef wrote:
> Misc. coding style fixes for ccree driver.
> 
> These are the missing patches that failed to apply two weeks ago,
> rebased onto latest staging-next.
> 
> By the way, I still do not understand why they failed to apply,
> as they applied just fine being cherry picked on top of latest
> staging-next, apart from fixes for later changes. I hope these will
> be good.

Only one applied here :(

I did just take some more coding style fixes from someone else who sent
them before you did as they looked correct, maybe that's why?  Can you
rebase and resend?

thanks,

greg k-h


Re: [PATCH] Staging: greybus: Match alignment with open parenthesis.

2017-07-27 Thread Greg KH
On Mon, Jul 24, 2017 at 10:08:37AM +0300, Dan Carpenter wrote:
> I don't understand why greybus has to be special instead of the same as
> everything else.  Who cares about this stuff really?  Just do whatever
> is easiest and most common.

It's not special, and --strict should apply here as long as it is in
staging.  So I'll apply these types of things to prevent people from
feeling like they wasted their time fixing something up and getting it
rejected for no good reason.

thanks,

greg k-h


Re: [PATCH] Staging: greybus: Match alignment with open parenthesis.

2017-07-27 Thread Greg KH
On Sun, Jul 23, 2017 at 02:09:57PM +0530, Shreeya Patel wrote:
> Alignment should match with open parenthesis.
> This fixes the coding style issue.
> 
> Signed-off-by: Shreeya Patel 
> ---
>  drivers/staging/greybus/tools/loopback_test.c | 35 
> ---
>  1 file changed, 16 insertions(+), 19 deletions(-)

Does not apply cleanly to my tree, can you please rebase it and resend
if you want to have this applied?

thanks,

greg k-h


Re: [PATCH V3 2/9] cpufreq: Use transition_delay_us for legacy governors as well

2017-07-27 Thread Viresh Kumar
On 24-07-17, 18:17, Peter Zijlstra wrote:
> On Wed, Jul 19, 2017 at 03:42:42PM +0530, Viresh Kumar wrote:
> > The policy->transition_delay_us field is used only by the schedutil
> > governor currently, and this field describes how fast the driver wants
> > the cpufreq governor to change CPUs frequency. It should rather be a
> > common thing across all governors, as it doesn't have any schedutil
> > dependency here.
> > 
> > Create a new helper cpufreq_policy_transition_delay_us() to get the
> > transition delay across all governors.
> > 
> > Signed-off-by: Viresh Kumar 
> > ---
> >  drivers/cpufreq/cpufreq.c  | 15 +++
> >  drivers/cpufreq/cpufreq_governor.c |  9 +
> >  include/linux/cpufreq.h|  1 +
> >  kernel/sched/cpufreq_schedutil.c   | 11 +--
> >  4 files changed, 18 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 9bf97a366029..c426d21822f7 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -524,6 +524,21 @@ unsigned int cpufreq_driver_resolve_freq(struct 
> > cpufreq_policy *policy,
> >  }
> >  EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> >  
> > +unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy 
> > *policy)
> > +{
> > +   unsigned int latency;
> > +
> > +   if (policy->transition_delay_us)
> > +   return policy->transition_delay_us;
> > +
> > +   latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
> > +   if (latency)
> > +   return latency * LATENCY_MULTIPLIER;
> > +
> > +   return LATENCY_MULTIPLIER;
> > +}
> > +EXPORT_SYMBOL_GPL(cpufreq_policy_transition_delay_us);
> 
> I realize you're just moving code about, but _why_ are we doing that
> division?

We are doing division by NSEC_PER_USEC because the values of
sampling_rate and rate_limit_us are in us, while transition_latency is
in ns.

We shouldn't be breaking the userspace ABI and so if we want, we can
actually make transition_latency be stored in us instead.

Though I am not sure why we are multiplying by LATENCY_MULTIPLIER
(1000), as that seems to be a huge value.

-- 
viresh


Re: Possible race in hysdn.ko

2017-07-27 Thread isdn
Hello Anton,

first of all, this code was developed by other people and I
never managed to get one of these cards - so I do not know so much about
this driver at all.
Unfortunately the firm behind hysdn no longer exists; it
was taken over by Hermstedt AG years ago, and even Hermstedt AG is no
longer active in this business I think (ISDN is an obsolete technology).

Am 27.07.2017 um 18:19 schrieb Anton Volkov:
> Hello.
> 
> While searching for races in the Linux kernel I've come across
> "drivers/isdn/hysdn/hysdn.ko" module. Here is a question that I came up
> with while analysing results. Lines are given using the info from Linux
> v4.12.
> 
> In hysdn_proclog.c file in put_log_buffer function a non-standard type
> of synchronization is employed. It uses pd->del_lock as some kind of
> semaphore (hysdn_proclog.c: lines 129 and 143). Consider the following
> case:
> 
> Thread 1:Thread 2:
> hysdn_log_write
> -> hysdn_add_log
> -> put_log_buffer
>  spin_lock()  hysdn_conf_open
>  i = pd->del_lock++   -> hysdn_add_log
>  spin_unlock()   -> put_log_buffer
>  if (!i) spin_lock()
>  pd->del_lock--   i = pd->del_lock++
>   spin_unlock()
>   if (!i) 
>   pd->del_lock--
> 
> The deletion loop is the loop that deletes unused buffer entries
> (hysdn_proclog.c: lines 134-142).
> pd->del_lock-- is not an atomic operation and is executed without any
> locks. Thus it may interfere in the increment process of pd->del_lock in
> another thread. There may be cases that lead to the inability of any
> thread going through the deletion loop.

Good catch.

> 
> I see several possible solutions to this problem:
> 1) move the deletion loop under the spin_lock and delete
> pd->del_lock synchronization;
> 2) wrap pd->del_lock-- with spin_lock protection.
> 
> What do you think should be done about it?

I think the intention of this construct was to avoid holding the card
lock for long times during /proc/ access to log data, since that may
disrupt the normal function. This is only a guess - I did not really
analyze the code deeply enough, but I fear there are other critical
problems with this code, since without extra protection the list could
be damaged during the deletion loop, I think.
So maybe having the complete loop under the lock is a good idea.


Best regards
Karsten




Re: [RFT v2 2/3] ASoC: samsung: Add missing prepare for iis clock of s3c24xx

2017-07-27 Thread Arvind Yadav

Hi,


On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote:

The s3c_i2sv2_probe() only enabled iis clock.  Missing prepare isn't
probably fatal, because for SoC clocks this is usually no-op, but for
correctness this clock should be prepared.

Signed-off-by: Krzysztof Kozlowski 

---

Changes since v1:
1. New patch
---
  sound/soc/samsung/s3c-i2s-v2.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c
index 9b28046eea8e..3894bda06ebb 100644
--- a/sound/soc/samsung/s3c-i2s-v2.c
+++ b/sound/soc/samsung/s3c-i2s-v2.c
@@ -637,7 +637,7 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai,
return -ENOENT;
}
  
-	clk_enable(i2s->iis_pclk);

+   clk_prepare_enable(i2s->iis_pclk);

Please handle the return value of clk_prepare_enable.
  
  	/* Mark ourselves as in TXRX mode so we can run through our cleanup

 * process without warnings. */

~arvind


Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the HNAE3 framework

2017-07-27 Thread Leon Romanovsky
On Thu, Jul 27, 2017 at 11:44:32PM +, Salil Mehta wrote:
> Hi Leon
>
> > -Original Message-
> > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma-
> > ow...@vger.kernel.org] On Behalf Of Leon Romanovsky
> > Sent: Sunday, July 23, 2017 2:16 PM
> > To: Salil Mehta
> > Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y);
> > mehta.salil@gmail.com; net...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm
> > Subject: Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the
> > HNAE3 framework
> >
> > On Sat, Jul 22, 2017 at 11:09:36PM +0100, Salil Mehta wrote:
> > > This patch adds the support of the HNAE3 (Hisilicon Network
> > > Acceleration Engine 3) framework support to the HNS3 driver.
> > >
> > > Framework facilitates clients like ENET(HNS3 Ethernet Driver), RoCE
> > > and user-space Ethernet drivers (like ODP etc.) to register with
> > HNAE3
> > > devices and their associated operations.
> > >
> > > Signed-off-by: Daode Huang 
> > > Signed-off-by: lipeng 
> > > Signed-off-by: Salil Mehta 
> > > Signed-off-by: Yisen Zhuang 
> > > ---
> > > Patch V4: Addressed following comments
> > >   1. Andrew Lunn:
> > >  https://lkml.org/lkml/2017/6/17/233
> > >  https://lkml.org/lkml/2017/6/18/105
> > >   2. Bo Yu:
> > >  https://lkml.org/lkml/2017/6/18/112
> > >   3. Stephen Hamminger:
> > >  https://lkml.org/lkml/2017/6/19/778
> > > Patch V3: Addressed below comments
> > >   1. Andrew Lunn:
> > >  https://lkml.org/lkml/2017/6/13/1025
> > > Patch V2: No change
> > > Patch V1: Initial Submit
> > > ---
> > >  drivers/net/ethernet/hisilicon/hns3/hnae3.c | 319
> > 
> > >  drivers/net/ethernet/hisilicon/hns3/hnae3.h | 449
> > 
> > >  2 files changed, 768 insertions(+)
> > >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > >  create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h
> > >
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > > new file mode 100644
> > > index ..7a11aaff0a23
> > > --- /dev/null
> > > +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
> > > @@ -0,0 +1,319 @@
> > > +/*
> > > + * Copyright (c) 2016-2017 Hisilicon Limited.
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > modify
> > > + * it under the terms of the GNU General Public License as published
> > by
> > > + * the Free Software Foundation; either version 2 of the License, or
> > > + * (at your option) any later version.
> > > + */
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +#include "hnae3.h"
> > > +
> > > +static LIST_HEAD(hnae3_ae_algo_list);
> > > +static LIST_HEAD(hnae3_client_list);
> > > +static LIST_HEAD(hnae3_ae_dev_list);
> > > +
> > > +/* we are keeping things simple and using single lock for all the
> > > + * list. This is a non-critical code so other updations, if happen
> > > + * in parallel, can wait.
> > > + */
> > > +static DEFINE_MUTEX(hnae3_common_lock);
> > > +
> > > +static bool hnae3_client_match(enum hnae3_client_type client_type,
> > > +enum hnae3_dev_type dev_type)
> > > +{
> > > + if (dev_type == HNAE3_DEV_KNIC) {
> > > + switch (client_type) {
> > > + case HNAE3_CLIENT_KNIC:
> > > + case HNAE3_CLIENT_ROCE:
> > > + return true;
> > > + default:
> > > + return false;
> > > + }
> > > + } else if (dev_type == HNAE3_DEV_UNIC) {
> > > + switch (client_type) {
> > > + case HNAE3_CLIENT_UNIC:
> > > + return true;
> > > + default:
> > > + return false;
> > > + }
> > > + } else {
> > > + return false;
> > > + }
> > > +}
> > > +
> > > +static int hnae3_match_n_instantiate(struct hnae3_client *client,
> > > +  struct hnae3_ae_dev *ae_dev,
> > > +  bool is_reg, bool *matched)
> > > +{
> > > + int ret;
> > > +
> > > + *matched = false;
> > > +
> > > + /* check if this client matches the type of ae_dev */
> > > + if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
> > > +   hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) {
> > > + return 0;
> > > + }
> > > + /* there is a match of client and dev */
> > > + *matched = true;
> > > +
> > > + if (!(ae_dev->ops && ae_dev->ops->init_client_instance &&
> > > +   ae_dev->ops->uninit_client_instance)) {
> > > + dev_err(&ae_dev->pdev->dev,
> > > + "ae_dev or client init/uninit ops are null\n");
> > > + return -EOPNOTSUPP;
> > > + }
> > > +
> > > + /* now, (un-)instantiate client by calling lower layer */
> > > + if (is_reg) {
> > > + ret = ae_dev->ops->init_client_instance(client, ae_dev);
> > > + if (ret)
> > > + dev_err(&ae_dev->pdev->dev,
> > > +  

Re: [RFT v2 1/3] ASoC: samsung: Fix possible double iounmap on s3c24xx driver probe failure

2017-07-27 Thread Arvind Yadav



On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote:

Commit 87b132bc0315 ("ASoC: samsung: s3c24{xx,12}-i2s: port to use
generic dmaengine API") moved ioremap() call from
s3c-i2s-v2.c:s3c_i2sv2_probe() to s3c2412-i2s.c:s3c2412_iis_dev_probe()
and converted it to devm- resource managed interface.

However the error path in first of them - s3c_i2sv2_probe() - was not
updated.  If getting a iis clock in s3c_i2sv2_probe() failed, the
address space would be unmapped there.  This could lead to:
1. double iounmap() later from devm-interface of s3c2412_iis_dev_probe()),
2. accessing the memory by other functions in s3c2412-i2s.c unit.

Anyway, the owner of this mapped region should be s3c2412-i2s.c because
it starts the mapping.

Affected are drivers for S3C24xx family although issue was not reproduced.

Fixes: 87b132bc0315 ("ASoC: samsung: s3c24{xx,12}-i2s: port to use generic dmaengine 
API")
Signed-off-by: Krzysztof Kozlowski 

Acked-by: Arvind Yadav

---

Not marking as Cc-stable because this is theoretical problem, not
reproduced and also not tested.

Please, kindly test on S3C24xx hardware.

Changes since v1:
1. None
---
  sound/soc/samsung/s3c-i2s-v2.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c
index ca522a95160b..9b28046eea8e 100644
--- a/sound/soc/samsung/s3c-i2s-v2.c
+++ b/sound/soc/samsung/s3c-i2s-v2.c
@@ -634,7 +634,6 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai,
i2s->iis_pclk = clk_get(dev, "iis");
if (IS_ERR(i2s->iis_pclk)) {
dev_err(dev, "failed to get iis_clock\n");
-   iounmap(i2s->regs);
return -ENOENT;
}
  




Re: [PATCH V8 3/3] powernv: Add support to clear sensor groups data

2017-07-27 Thread Cyril Bur
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote:
> Adds support for clearing different sensor groups. OCC inband sensor
> groups like CSM, Profiler, Job Scheduler can be cleared using this
> driver. The min/max of all sensors belonging to these sensor groups
> will be cleared.
> 

Hi Shilpasri,

I think some comments from v1 also apply here.

Other comments inline

Thanks,

Cyril

> Signed-off-by: Shilpasri G Bhat 
> ---
> Changes from V7:
> - s/send_occ_command/opal_sensor_groups_clear_history
> 
>  arch/powerpc/include/asm/opal-api.h|   3 +-
>  arch/powerpc/include/asm/opal.h|   2 +
>  arch/powerpc/include/uapi/asm/opal-occ.h   |  23 ++
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-occ.c  | 109 
> +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   1 +
>  arch/powerpc/platforms/powernv/opal.c  |   3 +
>  7 files changed, 141 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h
>  create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index 0d37315..342738a 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -195,7 +195,8 @@
>  #define OPAL_SET_POWERCAP153
>  #define OPAL_GET_PSR 154
>  #define OPAL_SET_PSR 155
> -#define OPAL_LAST155
> +#define OPAL_SENSOR_GROUPS_CLEAR 156
> +#define OPAL_LAST156
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 58b30a4..92db6af 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int opal_set_powercap(u32 handle, int token, u32 pcap);
>  int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr);
>  int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr);
> +int opal_sensor_groups_clear(u32 group_hndl, int token);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  
>  void opal_powercap_init(void);
>  void opal_psr_init(void);
> +int opal_sensor_groups_clear_history(u32 handle);
>  
>  #endif /* __ASSEMBLY__ */
>  
> diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h 
> b/arch/powerpc/include/uapi/asm/opal-occ.h
> new file mode 100644
> index 000..97c45e2
> --- /dev/null
> +++ b/arch/powerpc/include/uapi/asm/opal-occ.h
> @@ -0,0 +1,23 @@
> +/*
> + * OPAL OCC command interface
> + * Supported on POWERNV platform
> + *
> + * (C) Copyright IBM 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2, or (at your option)
> + * any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_
> +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_
> +
> +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS   _IOR('o', 1, u32)
> +
> +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index 9ed7d33..f193b33 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o 
> opal-async.o idle.o
>  obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o 
> opal-flash.o
>  obj-y+= rng.o opal-elog.o opal-dump.o 
> opal-sysparam.o opal-sensor.o
>  obj-y+= opal-msglog.o opal-hmi.o opal-power.o 
> opal-irqchip.o
> -obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o
> +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o 
> opal-occ.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-occ.c 
> b/arch/powerpc/platforms/powernv/opal-occ.c
> new file mode 100644
> index 000..d1d4b28
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-occ.c
> @@ -0,0 +1,109 @@
> +/*
> + * Copyright IBM Corporation 2017
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free

Re: [PATCH] staging: unisys: visorchipset: constify attribute_group structure

2017-07-27 Thread Greg KH
On Wed, Jul 26, 2017 at 09:51:32PM -0400, Amitoj Kaur Chawla wrote:
> Functions working with attribute_groups provided by 
> work with const attribute_group. These attribute_group structures do not
> change at runtime so mark them as const.
> 
> File size before:
>  text  data bss dec hex filename
>  24124 6216 448   30788   7844
> drivers/staging/unisys/visorbus/visorchipset.o
> 
> File size after:
>  text data bss dec hex filename
>  24220 6120 448   30788   7844
> drivers/staging/unisys/visorbus/visorchipset.o
> 
> This change was made with the help of Coccinelle.
> 
> Signed-off-by: Amitoj Kaur Chawla 
> ---
>  drivers/staging/unisys/visorbus/visorchipset.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/unisys/visorbus/visorchipset.c 
> b/drivers/staging/unisys/visorbus/visorchipset.c
> index 2215056..798a92e 100644
> --- a/drivers/staging/unisys/visorbus/visorchipset.c
> +++ b/drivers/staging/unisys/visorbus/visorchipset.c
> @@ -1146,7 +1146,7 @@ static struct attribute 
> *visorchipset_parahotplug_attrs[] = {
>   NULL
>  };
>  
> -static struct attribute_group visorchipset_parahotplug_group = {
> +static const struct attribute_group visorchipset_parahotplug_group = {
>   .name = "parahotplug",
>   .attrs = visorchipset_parahotplug_attrs
>  };

Someone else sent this change right before you did :(


Re: [PATCH] staging: unisys: visorbus_main: constify attribute_group structures

2017-07-27 Thread Greg KH
On Tue, Jul 25, 2017 at 06:45:52PM -0400, Amitoj Kaur Chawla wrote:
> Functions working with attribute_groups provided by 
> work with const attribute_group. These attribute_group structures do not
> change at runtime so mark them as const.
> 
> File size before:
>  text data bss dec hex filename
>  14216 7304 832   22352   5750
> drivers/staging/unisys/visorbus/visorbus_main.o
> 
> File size after:
>  text data bss dec hex filename
>  14408 7112 832   22352   5750
> drivers/staging/unisys/visorbus/visorbus_main.o
> 
> This change was made with the help of Coccinelle.
> 
> Signed-off-by: Amitoj Kaur Chawla 
> ---
>  drivers/staging/unisys/visorbus/visorbus_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/unisys/visorbus/visorbus_main.c 
> b/drivers/staging/unisys/visorbus/visorbus_main.c
> index 1c785dd..bba10dc 100644
> --- a/drivers/staging/unisys/visorbus/visorbus_main.c
> +++ b/drivers/staging/unisys/visorbus/visorbus_main.c
> @@ -249,7 +249,7 @@ static struct attribute *channel_attrs[] = {
>   NULL
>  };
>  
> -static struct attribute_group channel_attr_grp = {
> +static const struct attribute_group channel_attr_grp = {
>   .name = "channel",
>   .attrs = channel_attrs,
>  };
> @@ -340,7 +340,7 @@ static struct attribute *dev_attrs[] = {
>   NULL
>  };
>  
> -static struct attribute_group dev_attr_grp = {
> +static const struct attribute_group dev_attr_grp = {
>   .attrs = dev_attrs,
>  };
>  

Does not apply to my tree :(

sorry,

greg k-h


Re: [Eas-dev] [PATCH V3 1/3] sched: cpufreq: Allow remote cpufreq callbacks

2017-07-27 Thread Joel Fernandes (Google)
On Thu, Jul 27, 2017 at 12:55 PM, Saravana Kannan
 wrote:
> On 07/26/2017 08:30 PM, Viresh Kumar wrote:
>>
>> On 26-07-17, 14:00, Saravana Kannan wrote:
>>>
>>> No, the alternative is to pass it on to the CPU freq driver and let it
>>> decide what it wants to do. That's the whole point if having a CPU freq
>>> driver -- so that the generic code doesn't need to care about HW specific
>>> details. Which is the point I was making in an earlier email to Viresh's
>>> patch -- we shouldn't be doing any CPU check for the call backs at the
>>> scheduler or ever governor level.
>>>
>>> That would simplify this whole thing by deleting a bunch of code. And
>>> having
>>> much simpler checks in those drivers that actually have to deal with
>>> their
>>> HW specific details.
>>
>>
>> So what you are saying is that we go and update (almost) every cpufreq
>> driver we have today and make their ->target() callbacks return early
>> if they don't support switching frequency remotely ? Is that really
>> simplifying anything?
>
>
> Yes. Simplifying isn't always about number of lines of code. It's also about
> abstraction. Having generic scheduler code care about HW details doesn't
> seem nice.
>
> It'll literally be one simple check: (cpu == smp_processor_id()) or (cpu "in"
> policy->cpus).
>

I think we can have both approaches? So we query the driver somewhere
around sugov_should_update_freq (with a new driver callback?) and ask
it if it has any say over the default behavior of "can't update a remote
CPU if I'm not a part of its policy", using the driver's answer instead
of the default when the driver has defined it in its struct cpufreq_driver.

I think this will also not have the concern of "updating every
driver", then we can just stick to the sane default of "no" for
drivers that haven't defined it. Probably Viresh has already thought
about this, but I just thought of bringing it up anyway. I also think
its fine to handle this case after this series gets in, but that's
just my opinion.

thanks!

-Joel


Re: [PATCH 2/3] staging: ccree: Convert to devm_ioremap_resource for map, unmap

2017-07-27 Thread Suniel Mahesh
On Friday 28 July 2017 01:18 AM, Dan Carpenter wrote:
> On Thu, Jul 27, 2017 at 05:27:33PM +0300, Gilad Ben-Yossef wrote:
>> +new_drvdata->cc_base = devm_ioremap_resource(&plat_dev->dev,
>> + req_mem_cc_regs);
>> +if (IS_ERR(new_drvdata->cc_base)) {
>> +rc = PTR_ERR(new_drvdata->cc_base);
>>  goto init_cc_res_err;
> 
> (This code was in the original and not introduced by the patch.)

Hi Dan, the above lines of code were not in the original, except for
"goto init_cc_res_err;", which was doing the error handling at different
places.

This patch has added those first three lines. I was constantly checking the latest
linux-next and staging-testing/next git trees.
> 
> Ideally, the goto name should say what the goto does.  In this case it
> does everything.  Unfortunately trying to do everything is very
> complicated so obviously the error handling is going to be full of bugs.
> 
> The first thing the error handling does is:
> 
>   ssi_aead_free(new_drvdata);
> 
> But this function assumes that if new_drvdata->aead_handle is non-NULL
> then that means we have called:
> 
>   INIT_LIST_HEAD(&aead_handle->aead_list);
> 
> That assumption is false if the aead_handle->sram_workspace_addr
> allocation fails.  It can't actually fail in the current code...  So
> that's good, I suppose.  Reviewing this code is really hard, because I
> have to jump back and forth through several functions in different
> files.
> 
> Moving on to the second error handling function:
> 
>   ssi_hash_free(new_drvdata);
> 
> This one has basically the same assumption that if ->hash_handle is
> allocated that means we called:
> 
>   INIT_LIST_HEAD(&hash_handle->hash_list);
> 
> That assumption is not true if ssi_hash_init_sram_digest_consts(drvdata);
> fails.  That function can fail in real life.  Except the the error
> handling in ssi_hash_alloc() sets ->hash_handle to NULL.  So the bug is
> just a leak and not a crashing bug.
> 
> I've reviewed the first two lines of the error handling just to give a
> feel for how complicated "do everything" style error handling is to
> review.
> 
> The better way to do error handling is:
> 1) Only free things which have been allocated.
> 2) The unwind code should mirror the wind up code.
> 3) Every allocation function should have a free function.
> 4) Label names should tell you what the goto does.
> 5) Use direct returns and literals where possible.
> 6) Generally it's better to keep the error path and the success path
>separate.
> 7) Do error handling as opposed to success handling.
> 
>   one = alloc();
>   if (!one)
>   return -ENOMEM;
>   if (foo) {
>   two = alloc();
>   if (!two) {
>   ret = -ENOMEM;
>   goto free_one;
>   }
>   }
>   three = alloc();
>   if (!three) {
>   ret = -ENOMEM;
>   goto free_two;
>   }
>   ...
> 
>   return 0;
> 
> free_two:
>   if (foo)
>   free(two);
> free_one:
>   free(one);
> 
>   return ret;
> 
> This style of error handling is easier to review.  You only need to
> remember the most recent thing that you have allocated.  You can tell
> from the goto that it frees it so you don't have to scroll to the
> bottom of the function or jump to a different file.

I understand, it makes sense; maybe we should come up with something
better and simpler.

Thanks
Suniel

> 
> regards,
> dan carpenter
> 
> 



Re: [PATCH 3.10] pstore: Make spinlock per zone instead of global

2017-07-27 Thread Willy Tarreau
Hi Leo,

There was no upstream commit ID here but I found it in mainline here :

  commit 109704492ef637956265ec2eb72ae7b3b39eb6f4
  Author: Joel Fernandes 
  Date:   Thu Oct 20 00:34:00 2016 -0700

pstore: Make spinlock per zone instead of global

What worries me is that some later fixes were issued, apparently to fix
an oops and a warning after this patch :

  commit 76d5692a58031696e282384cbd893832bc92bd76
  Author: Kees Cook 
  Date:   Thu Feb 9 15:43:44 2017 -0800

pstore: Correctly initialize spinlock and flags

The ram backend wasn't always initializing its spinlock correctly. Since
it was coming from kzalloc memory, though, it was harmless on
architectures that initialize unlocked spinlocks to 0 (at least x86 and
ARM). This also fixes a possibly ignored flag setting too.

and :

  commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b
  Author: Kees Cook 
  Date:   Sun Mar 5 22:08:58 2017 -0800

pstore: Use dynamic spinlock initializer

The per-prz spinlock should be using the dynamic initializer so that
lockdep can correctly track it. Without this, under lockdep, we get a
warning at boot that the lock is in non-static memory.

So I'm fine with merging this patch as long as Kees is OK with this and
we know what exact patch series needs to be merged.

Also, the information you added to the commit message references a trace
on a 4.4 kernel. Do you confirm that you got the same issue on 3.10? I
just prefer to avoid blindly backporting sensitive patches if they're not
absolutely needed.

> [   65.103905] hrtimer: interrupt took 2759375 ns
> [   65.108721] BUG: spinlock recursion on CPU#0, kschedfreq:0/1246
> [   65.108760]  lock: buffer_lock+0x0/0x38, .magic: dead4ead, .owner: 
> kschedfreq:0/1246, .owner_cpu: 0
> [   65.108779] CPU: 0 PID: 1246 Comm: kschedfreq:0 Not tainted 
> 4.4.74-07294-g5c996a9-dirty #130

Thanks!
willy


Re: [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load

2017-07-27 Thread Longpeng (Mike)


On 2017/6/6 18:57, Paolo Bonzini wrote:

> The simplify part: do not touch pi_desc.nv, we can set it when the
> VCPU is first created.  Likewise, pi_desc.sn is only handled by
> vmx_vcpu_pi_load, do not touch it in __pi_post_block.
> 
> The fix part: do not check kvm_arch_has_assigned_device, instead
> check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
> This matches what the previous patch did in pi_post_block.
> 
> Cc: Longpeng (Mike) 
> Cc: Huangweidong 
> Cc: Gonglei 
> Cc: wangxin 
> Cc: Radim Krčmář 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/vmx.c | 68 --
>  1 file changed, 35 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0f4714fe4908..81047f373747 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, 
> int cpu)
>   struct pi_desc old, new;
>   unsigned int dest;
>  
> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> - !irq_remapping_cap(IRQ_POSTING_CAP)  ||
> - !kvm_vcpu_apicv_active(vcpu))
> + /*
> +  * In case of hot-plug or hot-unplug, we may have to undo
> +  * vmx_vcpu_pi_put even if there is no assigned device.  And we
> +  * always keep PI.NDST up to date for simplicity: it makes the
> +  * code easier, and CPU migration is not a fast path.
> +  */
> + if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu)
> + return;


Hi Paolo,

I'm confused with the following scenario:

(suppose the VM has a assigned devices)
step 1. the running vcpu is preempted
--> vmx_vcpu_pi_put [ SET pi.sn ]
step 2. hot-unplug the assigned devices
step 3. the vcpu is scheduled in
--> vmx_vcpu_pi_load [ CLEAR pi.sn ]
step 4. the running vcpu is preempted again
--> vmx_vcpu_pi_put [ direct return ]
step 5. the vcpu is migrated to another pcpu
step 6. the vcpu is scheduled in
--> vmx_vcpu_pi_load [ above check fails and
continue to execute the follow parts ]

I think vmx_vcpu_pi_load should return directly in step 6, because
vmx_vcpu_pi_put in step 4 did nothing.
So maybe the above check has a potential problem.

Please kindly figure out if I misunderstand anything important :)

--
Regards,
Longpeng(Mike)

> +
> + /*
> +  * First handle the simple case where no cmpxchg is necessary; just
> +  * allow posting non-urgent interrupts.
> +  *
> +  * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change
> +  * PI.NDST: pi_post_block will do it for us and the wakeup_handler
> +  * expects the VCPU to be on the blocked_vcpu_list that matches
> +  * PI.NDST.
> +  */
> + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR ||
> + vcpu->cpu == cpu) {
> + pi_clear_sn(pi_desc);
>   return;
> + }
>  
> + /* The full case.  */
>   do {
>   old.control = new.control = pi_desc->control;
>  
> - /*
> -  * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there
> -  * are two possible cases:
> -  * 1. After running 'pre_block', context switch
> -  *happened. For this case, 'sn' was set in
> -  *vmx_vcpu_put(), so we need to clear it here.
> -  * 2. After running 'pre_block', we were blocked,
> -  *and woken up by some other guy. For this case,
> -  *we don't need to do anything, 'pi_post_block'
> -  *will do everything for us. However, we cannot
> -  *check whether it is case #1 or case #2 here
> -  *(maybe, not needed), so we also clear sn here,
> -  *I think it is not a big deal.
> -  */
> - if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) {
> - if (vcpu->cpu != cpu) {
> - dest = cpu_physical_id(cpu);
> -
> - if (x2apic_enabled())
> - new.ndst = dest;
> - else
> - new.ndst = (dest << 8) & 0xFF00;
> - }
> + dest = cpu_physical_id(cpu);
>  
> - /* set 'NV' to 'notification vector' */
> - new.nv = POSTED_INTR_VECTOR;
> - }
> + if (x2apic_enabled())
> + new.ndst = dest;
> + else
> + new.ndst = (dest << 8) & 0xFF00;
>  
> - /* Allow posting non-urgent interrupts */
>   new.sn = 0;
>   } while (cmpxchg(&pi_desc->control, old.control,
>   new.control) != old.control);
> @@ -9259,6 +9257,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm 
> *kvm, unsigned int id)
>  
>   

Re: [PATCH V8 2/3] powernv: Add support to set power-shifting-ratio

2017-07-27 Thread Cyril Bur
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote:
> This patch adds support to set power-shifting-ratio for CPU-GPU which
> is used by OCC power capping algorithm.
> 
> Signed-off-by: Shilpasri G Bhat 


Hi Shilpasri,

I started looking through this - a lot of the comments to patch 1/3 apply
here, so I'll stop repeating myself :).


Thanks,

Cyril
> ---
> Changes from V7:
> - Replaced sscanf with kstrtoint
> 
>  arch/powerpc/include/asm/opal-api.h|   4 +-
>  arch/powerpc/include/asm/opal.h|   3 +
>  arch/powerpc/platforms/powernv/Makefile|   2 +-
>  arch/powerpc/platforms/powernv/opal-psr.c  | 169 +
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   2 +
>  arch/powerpc/platforms/powernv/opal.c  |   3 +
>  6 files changed, 181 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index c3e0c4a..0d37315 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -193,7 +193,9 @@
>  #define OPAL_NPU_MAP_LPAR148
>  #define OPAL_GET_POWERCAP152
>  #define OPAL_SET_POWERCAP153
> -#define OPAL_LAST153
> +#define OPAL_GET_PSR 154
> +#define OPAL_SET_PSR 155
> +#define OPAL_LAST155
>  
>  /* Device tree flags */
>  
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index ec2087c..58b30a4 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp,
>  int64_t opal_xive_dump(uint32_t type, uint32_t id);
>  int opal_get_powercap(u32 handle, int token, u32 *pcap);
>  int opal_set_powercap(u32 handle, int token, u32 pcap);
> +int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr);
> +int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr);
>  
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
> @@ -348,6 +350,7 @@ static inline int opal_get_async_rc(struct opal_msg msg)
>  void opal_wake_poller(void);
>  
>  void opal_powercap_init(void);
> +void opal_psr_init(void);
>  
>  #endif /* __ASSEMBLY__ */
>  
> diff --git a/arch/powerpc/platforms/powernv/Makefile 
> b/arch/powerpc/platforms/powernv/Makefile
> index e79f806..9ed7d33 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o 
> opal-async.o idle.o
>  obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o 
> opal-flash.o
>  obj-y+= rng.o opal-elog.o opal-dump.o 
> opal-sysparam.o opal-sensor.o
>  obj-y+= opal-msglog.o opal-hmi.o opal-power.o 
> opal-irqchip.o
> -obj-y+= opal-kmsg.o opal-powercap.o
> +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o
>  
>  obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o
>  obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o
> diff --git a/arch/powerpc/platforms/powernv/opal-psr.c 
> b/arch/powerpc/platforms/powernv/opal-psr.c
> new file mode 100644
> index 000..07e3f78
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-psr.c
> @@ -0,0 +1,169 @@
> +/*
> + * PowerNV OPAL Power-Shifting-Ratio interface
> + *
> + * Copyright 2017 IBM Corp.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +
> +#define pr_fmt(fmt) "opal-psr: " fmt
> +
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +DEFINE_MUTEX(psr_mutex);
> +
> +static struct kobject *psr_kobj;
> +
> +struct psr_attr {
> + u32 handle;
> + struct kobj_attribute attr;
> +};
> +
> +static struct psr_attr *psr_attrs;
> +static struct kobject *psr_kobj;
> +
> +static ssize_t psr_show(struct kobject *kobj, struct kobj_attribute *attr,
> + char *buf)
> +{
> + struct psr_attr *psr_attr = container_of(attr, struct psr_attr, attr);
> + struct opal_msg msg;
> + int psr, ret, token;
> +
> + token = opal_async_get_token_interruptible();
> + if (token < 0) {
> + pr_devel("Failed to get token\n");
> + return token;
> + }
> +
> + mutex_lock(&psr_mutex);
> + ret = opal_get_power_shifting_ratio(psr_attr->handle, token, &psr);

__pa()

> + switch (ret) {
> + case OPAL_ASYNC_COMPLETION:
> + ret = opal_async_wait_response(token, &msg);
> + if (ret) {
> + pr_devel("Failed to wait

Re: [PATCH V8 1/3] powernv: powercap: Add support for powercap framework

2017-07-27 Thread Shilpasri G Bhat
Hi Cyril,

On 07/28/2017 07:09 AM, Cyril Bur wrote:
> Is there any reason that pcap_attrs needs to be contiguous? If not, I
> feel like you could eliminate the entire loop below (and the last one
> as well maybe) and just do the assignment of pattr_groups[].attrs[] up
> there.
> 
> In fact do you even need to store pcap_attrs? If you kcalloc them as
> you need them (in the loop above), you can always free them again on
> error by freeing pattr_groups[].attrs[] right?
> 
> I'll admit I've become quite confused as to the layout of the sysfs dir
>  that you're creating here - would you mind showing what the expected
> layout will be?
> 
> I'll take more of a look once thats more clear in my head
> 
> Thanks,
> 
> Cyril

The sysfs layout looks as below:
# ls /sys/firmware/opal/powercap/
system-powercap

# ls /sys/firmware/opal/powercap/system-powercap/
powercap-current  powercap-max  powercap-min

# grep .  /sys/firmware/opal/powercap/system-powercap/*
/sys/firmware/opal/powercap/system-powercap/powercap-current:2375
/sys/firmware/opal/powercap/system-powercap/powercap-max:2375
/sys/firmware/opal/powercap/system-powercap/powercap-min:1945

Thanks and Regards,
Shilpa



Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Michael S. Tsirkin
On Fri, Jul 28, 2017 at 11:50:45AM +0800, Jason Wang wrote:
> 
> 
> On 2017-07-28 11:46, Michael S. Tsirkin wrote:
> > On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > > > + old_prog = rtnl_dereference(tun->xdp_prog);
> > > > > + if (old_prog)
> > > > > + bpf_prog_put(old_prog);
> > > > > + rcu_assign_pointer(tun->xdp_prog, prog);
> > > > Is this OK?  Could this lead to the program getting freed and then
> > > > datapath accessing a stale pointer?  I mean in the scenario where the
> > > > process gets pre-empted between the bpf_prog_put() and
> > > > rcu_assign_pointer()?
> > > Will call bpf_prog_put() after rcu_assign_pointer().
> > I suspect you need to sync RCU or something before that.
> 
> __bpf_prog_put() will do call_rcu(), so looks like it was ok.
> 
> Thanks

True - I missed that.

-- 
MST


Re: [PATCH v8 09/13] iommu/amd: Use is_attach_deferred call-back

2017-07-27 Thread Baoquan He
On 07/27/17 at 05:53pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:07PM +0800, Baoquan He wrote:
> > +static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain,
> > +struct device *dev)
> > +{
> > +   struct iommu_dev_data *dev_data = dev->archdata.iommu;
> > +   return dev_data->defer_attach;
> > +}
> 
> If the redundant newline from patch 1 needs a new home, put it here :-)

Got it, will add a newline at this place. Thanks.


Re: [PATCH v8 05/13] iommu/amd: Add function copy_dev_tables()

2017-07-27 Thread Baoquan He
On 07/27/17 at 05:29pm, Joerg Roedel wrote:
> On Fri, Jul 21, 2017 at 04:59:03PM +0800, Baoquan He wrote:
> > Add function copy_dev_tables to copy the old DEV table entries of the 
> > panicked
> 
> Since there is only one (for now), you can name the function in
> singular: copy_dev_table() or copy_device_table().

Make sense, will change it to copy_device_table(). Thanks.

> 
> > kernel to the new allocated DEV table. Since all iommus share the same DTE 
> > table
> > the copy only need be done one time. Besides, we also need to:
> > 
> >   - Check whether all IOMMUs actually use the same device table with the 
> > same size
> > 
> >   - Verify that the size of the old device table is the expected size.
> > 
> >   - Reserve the old domain id occupied in 1st kernel to avoid touching the 
> > old
> > io-page tables. Then on-flight DMA can continue looking it up.
> > 
> > And also define MACRO DEV_DOMID_MASK to replace magic number 0xULL, it 
> > can be
> > reused in copy_dev_tables().
> > 
> > Signed-off-by: Baoquan He 
> > ---
> >  drivers/iommu/amd_iommu.c   |  2 +-
> >  drivers/iommu/amd_iommu_init.c  | 55 +
> >  drivers/iommu/amd_iommu_types.h |  1 +
> >  3 files changed, 57 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> > index e5a03f259986..4d00f1bda900 100644
> > --- a/drivers/iommu/amd_iommu.c
> > +++ b/drivers/iommu/amd_iommu.c
> > @@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct 
> > protection_domain *domain, bool ats)
> > flags|= tmp;
> > }
> >  
> > -   flags &= ~(0xUL);
> > +   flags &= ~DEV_DOMID_MASK;
> > flags |= domain->id;
> >  
> > amd_iommu_dev_table[devid].data[1]  = flags;
> > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> > index f6da5fe03b31..c58f091ce232 100644
> > --- a/drivers/iommu/amd_iommu_init.c
> > +++ b/drivers/iommu/amd_iommu_init.c
> > @@ -842,6 +842,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
> >  }
> >  
> >  
> > +static int copy_dev_tables(void)
> > +{
> > +   struct dev_table_entry *old_devtb = NULL;
> > +   u32 lo, hi, devid, old_devtb_size;
> > +   phys_addr_t old_devtb_phys;
> > +   u64 entry, last_entry = 0;
> > +   struct amd_iommu *iommu;
> > +   u16 dom_id, dte_v;
> > +   static int copied;
> > +
> > +   for_each_iommu(iommu) {
> > +   if (!translation_pre_enabled(iommu)) {
> > +   pr_err("IOMMU:%d is not pre-enabled!/n",
> > +   iommu->index);
> > +   return -1;
> > +   }
> > +
> > +   /* All IOMMUs should use the same device table with the same 
> > size */
> > +   lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
> > +   hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
> > +   entry = (((u64) hi) << 32) + lo;
> > +   if (last_entry && last_entry != entry) {
> > +   pr_err("IOMMU:%d should use the same dev table as 
> > others!/n",
> > +   iommu->index);
> > +   return -1;
> > +   }
> > +   last_entry = entry;
> > +
> > +   old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
> > +   if (old_devtb_size != dev_table_size) {
> > +   pr_err("The device table size of IOMMU:%d is not 
> > expected!/n",
> > +   iommu->index);
> > +   return -1;
> > +   }
> 
> I think the loop can end here. There is only one table to copy, so you
> don't need to copy it for every iommu. Just do the checks in the loop
> and the copy once after the loop.

OK, will change. I worried that the device table address might not be
stored in the IOMMU register correctly. I am fine with doing the check
only the first time.

> 
> > +
> > +   old_devtb_phys = entry & PAGE_MASK;
> > +   old_devtb = memremap(old_devtb_phys, dev_table_size, 
> > MEMREMAP_WB);
> > +   if (!old_devtb)
> > +   return -1;
> > +
> > +   if (copied)
> > +   continue;
> > +   for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
> > +   amd_iommu_dev_table[devid] = old_devtb[devid];
> > +   dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
> > +   dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
> > +   if (dte_v && dom_id)
> > +   __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
> > +   }
> 
> Please only copy the DTE when DTE.V is set in the old table. Also

Sorry, the sanity check about DTE.V has been done in patch 7/13. I
should rewrite the subject and git log for patch 7/13.

> don't copy any of the IOMMUv2 enablement bits from the old DTE. PRI
> faults are recoverable by design and a device shouldn't fail on a
> negative answer from the IOMMU.

Yes, this is done in patch 11/13. Saw y

Fwd: trinity test fanotify cause hungtasks on kernel 4.13

2017-07-27 Thread Gu Zheng


Hi all:
When we used trinity to test the fanotify interfaces, it caused many hung tasks.
CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y
the shell script is simple:
  1 #!/bin/bash
  2
  3 while true
  4 do
  5 ./trinity -c fanotify_init -l off -C 2 -X > /dev/null 2>&1 &
  6 sleep 1
  7 ./trinity -c fanotify_mark -l off -C 2 -X > /dev/null 2>&1 &
  8 sleep 10
  9 done
We found that trinity enters the D state quickly.
We checked the PIDs' stacks:
[root@localhost ~]# ps -aux | grep D
USER   PID %CPU %MEM VSZ   RSS TTY  STAT START   TIME COMMAND
root   977  0.0  0.0 207992  7904 ?Ss   15:23   0:00 
/usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible 
recursive locking detected ernel BUG at list_del corruption list_add corruption 
do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable 
to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) 
went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody 
cared IRQ handler type mismatch Machine Check Exception: Machine check events 
logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment 
not present: invalid opcode: alignment check: stack segment: fpu exception: 
simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops 
-xtD
root   997  0.0  0.0 203360  3188 ?Ssl  15:23   0:00 
/usr/sbin/gssproxy -D
root  1549  0.0  0.0  82552  6012 ?Ss   15:23   0:00 /usr/sbin/sshd 
-D
root  2807  3.5  0.2  59740 35416 pts/0DL   15:24   0:00 ./trinity -c 
fanotify_init -l off -C 2 -X
root  2809  3.1  0.2  53712 35332 pts/0DL   15:24   0:00 ./trinity -c 
fanotify_mark -l off -C 2 -X
root  2915  0.0  0.0 136948  1776 pts/0D15:24   0:00 ps ax
root  2919  0.0  0.0 112656  2100 pts/1S+   15:24   0:00 grep 
--color=auto D
[root@localhost ~]# cat /proc/2807/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@localhost ~]# cat /proc/2915/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x
[root@localhost ~]# cat /proc/2809/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

All processes are waiting for a response in
fanotify_handle_event->fanotify_get_response.
Because the monitor did not respond or was killed, the waitqueue stays in
a blocked state, and all other tasks that use fanotify_get_response get stuck.



If we use wait_event_timeout, the response time cannot be guaranteed.

do you have any ideas?
thanks.


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jakub Kicinski
On Fri, 28 Jul 2017 06:46:40 +0300, Michael S. Tsirkin wrote:
> On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > > +   old_prog = rtnl_dereference(tun->xdp_prog);
> > > > +   if (old_prog)
> > > > +   bpf_prog_put(old_prog);
> > > > +   rcu_assign_pointer(tun->xdp_prog, prog);  
> > > Is this OK?  Could this lead to the program getting freed and then
> > > datapath accessing a stale pointer?  I mean in the scenario where the
> > > process gets pre-empted between the bpf_prog_put() and
> > > rcu_assign_pointer()?  
> > 
> > Will call bpf_prog_put() after rcu_assign_pointer().  
> 
> I suspect you need to sync RCU or something before that.

I think the bpf_prog_put() will use call_rcu() to do the actual free:

static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
{
if (atomic_dec_and_test(&prog->aux->refcnt)) {
trace_bpf_prog_put_rcu(prog);
/* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog, do_idr_lock);
bpf_prog_kallsyms_del(prog);
call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
}

It's just that we are only under the rtnl here; the RCU read lock is not
held, so a grace period may elapse between bpf_prog_put() and
rcu_assign_pointer().


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jason Wang



On 2017-07-28 11:46, Michael S. Tsirkin wrote:

On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:

+   old_prog = rtnl_dereference(tun->xdp_prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   rcu_assign_pointer(tun->xdp_prog, prog);

Is this OK?  Could this lead to the program getting freed and then
datapath accessing a stale pointer?  I mean in the scenario where the
process gets pre-empted between the bpf_prog_put() and
rcu_assign_pointer()?

Will call bpf_prog_put() after rcu_assign_pointer().

I suspect you need to sync RCU or something before that.


__bpf_prog_put() will do call_rcu(), so it looks like this is OK.

Thanks


Re: [PATCH 1/6] dma: bcm-sba-raid: Improve memory allocation in SBA RAID driver

2017-07-27 Thread Anup Patel
On Fri, Jul 28, 2017 at 8:43 AM, Vinod Koul  wrote:
> On Thu, Jul 27, 2017 at 09:42:33AM +0530, Anup Patel wrote:
>> On Wed, Jul 26, 2017 at 10:39 PM, Vinod Koul  wrote:
>> > On Wed, Jul 26, 2017 at 11:06:39AM +0530, Anup Patel wrote:
>
>> >>  drivers/dma/bcm-sba-raid.c | 439 
>> >> +++--
>> >>  1 file changed, 226 insertions(+), 213 deletions(-)
>> >>
>> >> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
>> >> index e41bbc7..6d15fed 100644
>> >> --- a/drivers/dma/bcm-sba-raid.c
>> >> +++ b/drivers/dma/bcm-sba-raid.c
>> >> @@ -48,7 +48,8 @@
>> >>
>> >>  #include "dmaengine.h"
>> >>
>> >> -/* SBA command related defines */
>> >> +/* == Driver macros and defines = */
>> >
>> > why this noise, seems unrelated to the change!
>>
>> This is just minor beautification. Again, I will put this
>> in separate patch.
>
> Well you can't shove garlands under an unrelated change. By all means throw
> the whole garden out there, but please as a separate patch

Sure, I will have separate patch for this beautification.

>
>>
>> >
>> >> +
>> >>  #define SBA_TYPE_SHIFT   48
>> >>  #define SBA_TYPE_MASK		GENMASK(1, 0)
>> >>  #define SBA_TYPE_A   0x0
>> >> @@ -82,39 +83,41 @@
>> >>  #define SBA_CMD_WRITE_BUFFER 0xc
>> >>  #define SBA_CMD_GALOIS   0xe
>> >>
>> >> -/* Driver helper macros */
>> >> +#define SBA_MAX_REQ_PER_MBOX_CHANNEL 8192
>> >> +
>> >>  #define to_sba_request(tx)   \
>> >>   container_of(tx, struct sba_request, tx)
>> >>  #define to_sba_device(dchan) \
>> >>   container_of(dchan, struct sba_device, dma_chan)
>> >>
>> >> -enum sba_request_state {
>> >> - SBA_REQUEST_STATE_FREE = 1,
>> >> - SBA_REQUEST_STATE_ALLOCED = 2,
>> >> - SBA_REQUEST_STATE_PENDING = 3,
>> >> - SBA_REQUEST_STATE_ACTIVE = 4,
>> >> - SBA_REQUEST_STATE_RECEIVED = 5,
>> >> - SBA_REQUEST_STATE_COMPLETED = 6,
>> >> - SBA_REQUEST_STATE_ABORTED = 7,
>> >> +/* = Driver data structures = */
>> >> +
>> >> +enum sba_request_flags {
>> >> + SBA_REQUEST_STATE_FREE  = 0x001,
>> >> + SBA_REQUEST_STATE_ALLOCED   = 0x002,
>> >> + SBA_REQUEST_STATE_PENDING   = 0x004,
>> >> + SBA_REQUEST_STATE_ACTIVE= 0x008,
>> >> + SBA_REQUEST_STATE_RECEIVED  = 0x010,
>> >> + SBA_REQUEST_STATE_COMPLETED = 0x020,
>> >> + SBA_REQUEST_STATE_ABORTED   = 0x040,
>> >> + SBA_REQUEST_STATE_MASK  = 0x0ff,
>> >> + SBA_REQUEST_FENCE   = 0x100,
>> >
>> > how does this help in mem alloctn?
>
> ??

Ahh, I missed addressing this comment.

Currently, we have a separate "bool" flag for fenced
sba_requests and a separate "state" variable in
sba_request. We have merged these two things
into a common "u32 flags" in sba_request. In the
future, we can use more bits of "u32 flags" as
required without disturbing sba_request.

I will make this separate patch.

I agree, I have covered many changes in PATCH1
which makes it hard for you to review.

Thanks,
Anup


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Michael S. Tsirkin
On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote:
> > > + old_prog = rtnl_dereference(tun->xdp_prog);
> > > + if (old_prog)
> > > + bpf_prog_put(old_prog);
> > > + rcu_assign_pointer(tun->xdp_prog, prog);
> > Is this OK?  Could this lead to the program getting freed and then
> > datapath accessing a stale pointer?  I mean in the scenario where the
> > process gets pre-empted between the bpf_prog_put() and
> > rcu_assign_pointer()?
> 
> Will call bpf_prog_put() after rcu_assign_pointer().

I suspect you need to sync RCU or something before that.


Re: [Eas-dev] [PATCH V4 0/3] sched: cpufreq: Allow remote callbacks

2017-07-27 Thread Joel Fernandes (Google)
On Thu, Jul 27, 2017 at 12:21 AM, Juri Lelli  wrote:
[..]
>> >
>> > But even without that, if you see the routine
>> > init_entity_runnable_average() in fair.c, the new tasks are
>> > initialized in a way that they are seen as heavy tasks. And so even
>> > for the first time they run, freq should normally increase on the
>> > target CPU (at least with the above application).
>>
>> Ok, but the "heavy" in init_entity_runnable_average means for load,
>> not the util_avg. The util_avg is what's used for frequency scaling
>> IIUC and is set to 0 in that function no?
>>
>
> True for init_entity_runnable_average(), but for a new task
> post_init_entity_util_avg() is then also called (from wake_up_new_task()),
> which modifies the initial util_avg value (depending on the current rq
> {util, load}_avg).

Got it. That makes sense, thank you!

-Joel


Re: [PATCH 2/6] dma: bcm-sba-raid: Peek mbox when we are left with no free requests

2017-07-27 Thread Anup Patel
On Fri, Jul 28, 2017 at 8:45 AM, Vinod Koul  wrote:
> On Thu, Jul 27, 2017 at 10:25:25AM +0530, Anup Patel wrote:
>> On Wed, Jul 26, 2017 at 10:40 PM, Vinod Koul  wrote:
>> > On Wed, Jul 26, 2017 at 11:06:40AM +0530, Anup Patel wrote:
>> >> We should peek mbox channels when we are left with no free
>> >> sba_requests in sba_alloc_request()
>> >
>> > and why in the world should we do that, how does that help??
>>
>> When setting up RAID array on several NVMe disk we observed
>> that sba_alloc_request() start failing (due to no free requests left)
>> and RAID array setup becomes very slow.
>>
>> Doing mbox channel peek when we have no free requests left,
>> improves performance of RAID array setup.
>
> How about documenting this tribal knowledge in the changelog. Changelogs are
> very useful, 6 months down the line, you will struggle to remember why this
> was changed..

Sure, I will have detailed commit description for this.

Regards,
Anup


Re: [Eas-dev] [PATCH V4 1/3] sched: cpufreq: Allow remote cpufreq callbacks

2017-07-27 Thread Joel Fernandes (Google)
On Thu, Jul 27, 2017 at 12:14 AM, Viresh Kumar  wrote:
> On 26-07-17, 23:13, Joel Fernandes (Google) wrote:
>> On Wed, Jul 26, 2017 at 10:50 PM, Viresh Kumar  
>> wrote:
>> > On 26-07-17, 22:34, Joel Fernandes (Google) wrote:
>> >> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar  
>> >> wrote:
>> >> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct 
>> >> > update_util_data *hook, u64 time,
>> >> > sugov_set_iowait_boost(sg_cpu, time, flags);
>> >> > sg_cpu->last_update = time;
>> >> >
>> >> > -   if (!sugov_should_update_freq(sg_policy, time))
>> >> > +   if (!sugov_should_update_freq(sg_policy, time, hook->cpu))
>> >> > return;
>> >>
>> >> Since with the remote callbacks now possible, isn't it unsafe to
>> >> modify sg_cpu and sg_policy structures without a lock in
>> >> sugov_update_single?
>> >>
>> >> Unlike sugov_update_shared, we don't acquire any lock in
>> >> sugov_update_single before updating these structures. Did I miss
>> >> something?
>> >
>> > As Peter already mentioned it earlier, the callbacks are called with
>> > rq locks held and so sugov_update_single() wouldn't get called in
>> > parallel for a target CPU.
>>
>> Ah ok, I have to catch up with that discussion since I missed the
>> whole thing. Now that you will have me on CC, that shouldn't happen,
>> thanks and sorry about the noise.
>>
>> > That's the only race you were worried about ?
>>
>> Yes. So then in that case, makes sense to move raw_spin_lock in
>> sugov_update_shared further down? (Just discussing, this point is
>> independent of your patch), Something like:
>
> Even that was discussed earlier with Peter :)
>
> No it wouldn't work, because the sg_cpu->util we are updating here may be
> read from some other CPU that shares a policy with sg_cpu.
>

Ok. yes you are right :) thank you Viresh and Peter for the clarification.

thanks,

-Joel


Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jason Wang



On 2017-07-28 11:13, Jakub Kicinski wrote:

On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote:

This patch tries to implement XDP for tun. The implementation is
split into two parts:

- fast path: small, non-GSO packets. We try to do XDP at the page level
   before build_skb(). For XDP_TX, since creating/destroying queues
   is completely under the control of userspace, it is implemented
   through the generic XDP helper after the skb has been built. This
   could be optimized in the future.
- slow path: big or GSO packets. We try to do XDP after the skb is
   created, through the generic XDP helpers.

XDP_REDIRECT is not implemented; it could be done on top.

xdp1 test shows 47.6% improvement:

Before: ~2.1Mpps
After:  ~3.1Mpps

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
@@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct 
rtnl_link_stats64 *stats)
stats->tx_dropped = tx_dropped;
  }
  
+static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,

+  struct netlink_ext_ack *extack)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+   struct bpf_prog *old_prog;
+
+   /* We will shift the packet that can't be handled to generic
+* XDP layer.
+*/
+
+   old_prog = rtnl_dereference(tun->xdp_prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   rcu_assign_pointer(tun->xdp_prog, prog);

Is this OK?  Could this lead to the program getting freed and then
datapath accessing a stale pointer?  I mean in the scenario where the
process gets pre-empted between the bpf_prog_put() and
rcu_assign_pointer()?


Will call bpf_prog_put() after rcu_assign_pointer().




+   if (prog) {
+   prog = bpf_prog_add(prog, 1);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+   }

I don't think you need this extra reference here.  dev_change_xdp_fd()
will call bpf_prog_get_type(), which means the driver gets the program with
a reference already taken; the driver does have to free that reference when
the program is removed (or the device is freed, as you correctly do).


I see, will drop this in next version.

Thanks.




+   return 0;
+}
+




Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 18:11:32 +, Levin, Alexander (Sasha Levin)
wrote:
> This is useful for directly looking up a task based on class id rather than
> having to scan through all open file descriptors.
> 
> Signed-off-by: Sasha Levin 
> ---
> 
> Changes in V2:
>  - Addressed comments from Cong Wang (use nla_put_u32())
> 
>  include/uapi/linux/inet_diag.h |  1 +
>  net/ipv4/inet_diag.c   | 10 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
> index bbe201047df6..678496897a68 100644
> --- a/include/uapi/linux/inet_diag.h
> +++ b/include/uapi/linux/inet_diag.h
> @@ -142,6 +142,7 @@ enum {
>   INET_DIAG_PAD,
>   INET_DIAG_MARK,
>   INET_DIAG_BBRINFO,
> + INET_DIAG_CLASS_ID,
>   __INET_DIAG_MAX,
>  };
>  
> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index 3828b3a805cd..2c2445d4bb58 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct 
> inet_connection_sock *icsk,
>   goto errout;
>   }
>  
> + if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) {
> + u32 classid = 0;
> +
> +#ifdef CONFIG_SOCK_CGROUP_DATA
> + classid = sock_cgroup_classid(&sk->sk_cgrp_data);
> +#endif
> +
> + nla_put_u32(skb, INET_DIAG_CLASS_ID, classid);

You need to check the return value from nla_put_u32() and goto errout
if it's set.  

Perhaps adding __must_check to the nla_put_*() helpers would be a good
idea.

> + }
> +
>  out:
>   nlmsg_end(skb, nlh);
>   return 0;



Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start

2017-07-27 Thread Paul E. McKenney
On Thu, Jul 27, 2017 at 09:38:10PM -0400, Steven Rostedt wrote:
> On Mon, 24 Jul 2017 14:44:36 -0700
> "Paul E. McKenney"  wrote:
> 
> > There is currently event tracing to track when a task is preempted
> > within a preemptible RCU read-side critical section, and also when that
> > task subsequently reaches its outermost rcu_read_unlock(), but none
> > indicating when a new grace period starts when that grace period must
> > wait on pre-existing readers that have been preempted at least once
> > since the beginning of their current RCU read-side critical sections.
> > 
> > This commit therefore adds an event trace at grace-period start in
> > the case where there are such readers.  Note that only the first
> > reader in the list is traced.
> > 
> > Signed-off-by: Paul E. McKenney 
> > ---
> >  kernel/rcu/tree_plugin.h | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > index 14ba496a13cd..3e3f92e981a1 100644
> > --- a/kernel/rcu/tree_plugin.h
> > +++ b/kernel/rcu/tree_plugin.h
> > @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node 
> > *rnp)
> >   */
> >  static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
> >  {
> > +   struct task_struct *t;
> > +
> > RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() 
> > invoked with preemption enabled!!!\n");
> > WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp));
> > -   if (rcu_preempt_has_tasks(rnp))
> > +   if (rcu_preempt_has_tasks(rnp)) {
> 
> The only function of this if block is to fill the content of the
> trace event, correct?
> 
> What about doing:
> 
>   if (trace_rcu_unlock_preempted_task_enabled() &&
>   rcu_preempt_has_tasks(rnp)) {
> 
> instead? The trace_rcu_unlock_preempted_task_enabled() is a static
> branch (aka jump_label), which would make the above a constant branch
> when tracing is not enabled, and would keep this from adding any extra
> overhead.
> 
> -- Steve
> 
> > rnp->gp_tasks = rnp->blkd_tasks.next;

The trace_rcu_unlock_preempted_task_enabled() call is a new one on me,
thank you!

Unfortunately, the above assignment to rnp->gp_tasks is required even
if tracing is disabled.  The reason is that the newly started grace
period needs to wait on all tasks that have been preempted within their
current RCU read-side critical section, and rnp->gp_tasks records the
point in the rnp->blkd_tasks list beyond which all preempted tasks block
this new grace period.

If this assignment is omitted, we get too-short grace periods, and the
tasks on this list might still be holding references to stuff that gets
freed at the end of this new grace period.

I applied your two acks, thank you!

Thanx, Paul

> > +   t = container_of(rnp->gp_tasks, struct task_struct,
> > +rcu_node_entry);
> > +   trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"),
> > +   rnp->gpnum, t->pid);
> > +   }
> > WARN_ON_ONCE(rnp->qsmask);
> >  }
> >  
> 



Re: [PATCH net-next 3/3] tap: XDP support

2017-07-27 Thread Jakub Kicinski
On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote:
> This patch tries to implement XDP for tun. The implementation is
> split into two parts:
> 
> - fast path: small, non-GSO packets. We try to do XDP at the page level
>   before build_skb(). For XDP_TX, since creating/destroying queues
>   is completely under the control of userspace, it is implemented
>   through the generic XDP helper after the skb has been built. This
>   could be optimized in the future.
> - slow path: big or GSO packets. We try to do XDP after the skb is
>   created, through the generic XDP helpers.
> 
> XDP_REDIRECT is not implemented; it could be done on top.
> 
> xdp1 test shows 47.6% improvement:
> 
> Before: ~2.1Mpps
> After:  ~3.1Mpps
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Jason Wang 

> @@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct 
> rtnl_link_stats64 *stats)
>   stats->tx_dropped = tx_dropped;
>  }
>  
> +static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> +struct netlink_ext_ack *extack)
> +{
> + struct tun_struct *tun = netdev_priv(dev);
> + struct bpf_prog *old_prog;
> +
> + /* We will shift the packet that can't be handled to generic
> +  * XDP layer.
> +  */
> +
> + old_prog = rtnl_dereference(tun->xdp_prog);
> + if (old_prog)
> + bpf_prog_put(old_prog);
> + rcu_assign_pointer(tun->xdp_prog, prog);

Is this OK?  Could this lead to the program getting freed and then
datapath accessing a stale pointer?  I mean in the scenario where the
process gets pre-empted between the bpf_prog_put() and
rcu_assign_pointer()?

> + if (prog) {
> + prog = bpf_prog_add(prog, 1);
> + if (IS_ERR(prog))
> + return PTR_ERR(prog);
> + }

I don't think you need this extra reference here.  dev_change_xdp_fd()
will call bpf_prog_get_type(), which means the driver gets the program with
a reference already taken; the driver does have to free that reference when
the program is removed (or the device is freed, as you correctly do).

> + return 0;
> +}
> +


Re: [PATCH 2/6] dma: bcm-sba-raid: Peek mbox when we are left with no free requests

2017-07-27 Thread Vinod Koul
On Thu, Jul 27, 2017 at 10:25:25AM +0530, Anup Patel wrote:
> On Wed, Jul 26, 2017 at 10:40 PM, Vinod Koul  wrote:
> > On Wed, Jul 26, 2017 at 11:06:40AM +0530, Anup Patel wrote:
> >> We should peek mbox channels when we are left with no free
> >> sba_requests in sba_alloc_request()
> >
> > and why in the world should we do that, how does that help??
> 
> When setting up RAID array on several NVMe disk we observed
> that sba_alloc_request() start failing (due to no free requests left)
> and RAID array setup becomes very slow.
> 
> Doing mbox channel peek when we have no free requests left,
> improves performance of RAID array setup.

How about documenting this tribal knowledge in the changelog. Changelogs are
very useful, 6 months down the line, you will struggle to remember why this
was changed..

> 
> This change is inspired from mv_chan_alloc_slot() implemented
> in drivers/dma/mv_xor.c
> 
> Regards,
> Anup

-- 
~Vinod


Re: [PATCH Y.A. RESEND] MAINTAINERS: fix alpha. ordering

2017-07-27 Thread Joe Perches
On Thu, 2017-07-27 at 19:43 -0700, Linus Torvalds wrote:
> On Thu, Jul 27, 2017 at 5:30 PM, Joe Perches  wrote:
> > 
> > Maybe add a reordering of the patterns so that each pattern list
> > is in a specific order too
> 
> I don't think this is wrong per se, but I'm not sure I want to get
> into the merge hell any more than we are already.
> 
> Maybe when/if that file is actually split up?

Fine by me.

The get_maintainer patch is a prereq to any split-up.

There are a bunch of little niggly patches that
remove/update bad F: patterns which should also
go in one day.

Given the differences between -next and your tree,
I think only Andrew and quilt would do a decent
job of getting individual patches merged.

Unless you want to take them.

I think it's better to centralize the MAINTAINERS
files in /MAINTAINERS/ than to spread
them all over the tree, given how many subsystems and
maintainerships are also spread around the tree.

But the get_maintainers patch I sent allows both
styles.






Re: [PATCH 1/6] dma: bcm-sba-raid: Improve memory allocation in SBA RAID driver

2017-07-27 Thread Vinod Koul
On Thu, Jul 27, 2017 at 09:42:33AM +0530, Anup Patel wrote:
> On Wed, Jul 26, 2017 at 10:39 PM, Vinod Koul  wrote:
> > On Wed, Jul 26, 2017 at 11:06:39AM +0530, Anup Patel wrote:

> >>  drivers/dma/bcm-sba-raid.c | 439 
> >> +++--
> >>  1 file changed, 226 insertions(+), 213 deletions(-)
> >>
> >> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c
> >> index e41bbc7..6d15fed 100644
> >> --- a/drivers/dma/bcm-sba-raid.c
> >> +++ b/drivers/dma/bcm-sba-raid.c
> >> @@ -48,7 +48,8 @@
> >>
> >>  #include "dmaengine.h"
> >>
> >> -/* SBA command related defines */
> >> +/* == Driver macros and defines = */
> >
> > why this noise, seems unrelated to the change!
> 
> This is just minor beautification. Again, I will put this
> in separate patch.

Well you can't shove garlands under an unrelated change. By all means throw
the whole garden out there, but please as a separate patch

> 
> >
> >> +
> >>  #define SBA_TYPE_SHIFT   48
> >> #define SBA_TYPE_MASK		GENMASK(1, 0)
> >>  #define SBA_TYPE_A   0x0
> >> @@ -82,39 +83,41 @@
> >>  #define SBA_CMD_WRITE_BUFFER 0xc
> >>  #define SBA_CMD_GALOIS   0xe
> >>
> >> -/* Driver helper macros */
> >> +#define SBA_MAX_REQ_PER_MBOX_CHANNEL 8192
> >> +
> >>  #define to_sba_request(tx)   \
> >>   container_of(tx, struct sba_request, tx)
> >>  #define to_sba_device(dchan) \
> >>   container_of(dchan, struct sba_device, dma_chan)
> >>
> >> -enum sba_request_state {
> >> - SBA_REQUEST_STATE_FREE = 1,
> >> - SBA_REQUEST_STATE_ALLOCED = 2,
> >> - SBA_REQUEST_STATE_PENDING = 3,
> >> - SBA_REQUEST_STATE_ACTIVE = 4,
> >> - SBA_REQUEST_STATE_RECEIVED = 5,
> >> - SBA_REQUEST_STATE_COMPLETED = 6,
> >> - SBA_REQUEST_STATE_ABORTED = 7,
> >> +/* = Driver data structures = */
> >> +
> >> +enum sba_request_flags {
> >> + SBA_REQUEST_STATE_FREE  = 0x001,
> >> + SBA_REQUEST_STATE_ALLOCED   = 0x002,
> >> + SBA_REQUEST_STATE_PENDING   = 0x004,
> >> + SBA_REQUEST_STATE_ACTIVE= 0x008,
> >> + SBA_REQUEST_STATE_RECEIVED  = 0x010,
> >> + SBA_REQUEST_STATE_COMPLETED = 0x020,
> >> + SBA_REQUEST_STATE_ABORTED   = 0x040,
> >> + SBA_REQUEST_STATE_MASK  = 0x0ff,
> >> + SBA_REQUEST_FENCE   = 0x100,
> >
> > how does this help in mem alloctn?

??

-- 
~Vinod


Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported

2017-07-27 Thread Ding Tianhong


On 2017/7/28 1:49, Alexander Duyck wrote:
> On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong  
> wrote:
>>
>>
>> On 2017/7/27 2:26, Casey Leedom wrote:
>>>   By the way Ding, two issues:
>>>
>>>  1. Did we ever get any acknowledgement from either Intel or AMD
>>> on this patch?  I know that we can't ensure that, but it sure would
>>> be nice since the PCI Quirks that we're putting in affect their
>>> products.
>>>
>>
>> Still no Intel or AMD folks have acked this, which is what I am worried
>> about. Should I ping someone again?
>>
>> Thanks
>> Ding
> 
> 
> I probably wouldn't worry about it too much. If anything all this
> patch is doing is disabling relaxed ordering on the platforms we know
> have issues based on what Casey originally had. If nothing else we can
> follow up once the patches are in the kernel and if somebody has an
> issue then.
> 
> You can include my acked-by, but it is mostly related to how this
> interacts with NICs, and not so much about the PCI chipsets
> themselves.
> 
> Acked-by: Alexander Duyck 
> 

Thanks, Alex. :)



