Re: [PATCH] ARM: Use WFI when possible when halting a core
On Thu, Jul 27, 2017 at 8:45 PM, Chen-Yu Tsai wrote:
> On Thu, Jul 27, 2017 at 8:17 PM, Russell King - ARM Linux wrote:
>> On Thu, Jul 27, 2017 at 08:08:03PM +0800, Chen-Yu Tsai wrote:
>>> On ARM, the kernel busy loops after cleaning up when a core is to be
>>> halted. This may have the undesired effect of increasing the core
>>> temperature, at a time when no power or thermal management is
>>> available, such as a kernel panic or shutdown to halt.
>>>
>>> x86 uses the more efficient HLT (halt) instruction. The equivalent
>>> instruction on ARM is WFI (wait for interrupt).
>>>
>>> This patch makes the busy loops in the ARM halt/reboot/panic paths
>>> use WFI when available (as defined in arch/arm/include/asm/barrier.h).
>>>
>>> A touch test indicates that this lowers the surface temperature of the
>>> Allwinner H3 SoC, with all 4 cores brought up with SMP, from painfully
>>> hot to just warm to the touch after shutdown to halt.
>>>
>>> Signed-off-by: Chen-Yu Tsai
>>> ---
>>>
>>> I asked about this some time ago, and the answer was no one had done
>>> it. So here it is. It is somewhat similar to the patch "arm64:
>>> Improve parking of stopped CPUs", for arm64 obviously.
>>
>> There are various errata around the "wfi" instruction on various CPUs
>> which means that this simple implementation isn't going to work
>> reliably.
>>
>> This is where cpu_do_idle() comes in - please use that instead.
>
> Thanks for the tip. Looks like this one is implemented for every
> platform. No need for #ifdefs anymore.

So I've done the modifications and re-tested. Unfortunately it no longer
works, either using wfi() directly or cpu_do_idle(). It worked back in
January this year. I'm not sure what's happening. The chip is still hot.
(Hotter than when it is idle in userspace.)

I added some printk calls around the wfi() / cpu_do_idle() calls and
found it continuously looping, instead of being parked. But of course
printk might add to the problem.

AFAIK all interrupts should be disabled by the time the call is reached.
What could possibly bring the CPU out of WFI?

Regards,
ChenYu
Re: [PATCH] mm: memcontrol: Use int for event/state parameter in several functions
On Thu 27-07-17 14:10:04, Matthias Kaehlcke wrote: > Several functions use an enum type as parameter for an event/state, > but are called in some locations with an argument of a different enum > type. Adjust the interface of these functions to reality by changing the > parameter to int. enums are mostly for documentation purposes. Using defines would be possible but less elegant. If for anything else then things like MEMCG_NR_STAT. > This fixes a ton of enum-conversion warnings that are generated when > building the kernel with clang. > > Signed-off-by: Matthias Kaehlcke Acked-by: Michal Hocko Thanks! > --- > include/linux/memcontrol.h | 20 > mm/memcontrol.c| 4 +++- > 2 files changed, 15 insertions(+), 9 deletions(-) > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h > index 3914e3dd6168..80edbc04361e 100644 > --- a/include/linux/memcontrol.h > +++ b/include/linux/memcontrol.h > @@ -487,8 +487,9 @@ extern int do_swap_account; > void lock_page_memcg(struct page *page); > void unlock_page_memcg(struct page *page); > > +/* idx can be of type enum memcg_stat_item or node_stat_item */ > static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, > - enum memcg_stat_item idx) > + int idx) > { > long val = 0; > int cpu; > @@ -502,15 +503,17 @@ static inline unsigned long memcg_page_state(struct > mem_cgroup *memcg, > return val; > } > > +/* idx can be of type enum memcg_stat_item or node_stat_item */ > static inline void __mod_memcg_state(struct mem_cgroup *memcg, > - enum memcg_stat_item idx, int val) > + int idx, int val) > { > if (!mem_cgroup_disabled()) > __this_cpu_add(memcg->stat->count[idx], val); > } > > +/* idx can be of type enum memcg_stat_item or node_stat_item */ > static inline void mod_memcg_state(struct mem_cgroup *memcg, > -enum memcg_stat_item idx, int val) > +int idx, int val) > { > if (!mem_cgroup_disabled()) > this_cpu_add(memcg->stat->count[idx], val); > @@ -631,8 +634,9 @@ static inline void count_memcg_events(struct 
mem_cgroup > *memcg, > this_cpu_add(memcg->stat->events[idx], count); > } > > +/* idx can be of type enum memcg_stat_item or node_stat_item */ > static inline void count_memcg_page_event(struct page *page, > - enum memcg_stat_item idx) > + int idx) > { > if (page->mem_cgroup) > count_memcg_events(page->mem_cgroup, idx, 1); > @@ -840,19 +844,19 @@ static inline bool mem_cgroup_oom_synchronize(bool wait) > } > > static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, > - enum memcg_stat_item idx) > + int idx) > { > return 0; > } > > static inline void __mod_memcg_state(struct mem_cgroup *memcg, > - enum memcg_stat_item idx, > + int idx, >int nr) > { > } > > static inline void mod_memcg_state(struct mem_cgroup *memcg, > -enum memcg_stat_item idx, > +int idx, > int nr) > { > } > @@ -918,7 +922,7 @@ static inline void count_memcg_events(struct mem_cgroup > *memcg, > } > > static inline void count_memcg_page_event(struct page *page, > - enum memcg_stat_item idx) > + int idx) > { > } > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 3df3c04d73ab..460130d2a796 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -550,10 +550,12 @@ mem_cgroup_largest_soft_limit_node(struct > mem_cgroup_tree_per_node *mctz) > * value, and reading all cpu value can be performance bottleneck in some > * common workload, threshold and synchronization as vmstat[] should be > * implemented. > + * > + * The parameter idx can be of type enum memcg_event_item or vm_event_item. > */ > > static unsigned long memcg_sum_events(struct mem_cgroup *memcg, > - enum memcg_event_item event) > + int event) > { > unsigned long val = 0; > int cpu; > -- > 2.14.0.rc0.400.g1c36432dff-goog -- Michal Hocko SUSE Labs
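The warning the patch above silences comes from clang's -Wenum-conversion: passing an `enum node_stat_item` where an `enum memcg_stat_item` is expected is an implicit enum-to-enum conversion. Widening the parameter to `int` accepts members of either enum, since both convert to `int` without complaint. A minimal user-space illustration (the enum names and values here are invented, not the kernel's):

```c
#include <assert.h>

/* Two unrelated enums that both index one counter array, mirroring
 * memcg_stat_item vs. node_stat_item in the patch. Illustrative only. */
enum stat_a { STAT_A_FIRST = 0, STAT_A_SECOND };
enum stat_b { STAT_B_FIRST = 2, STAT_B_SECOND };

static long counters[4];

/* Taking a plain int accepts members of either enum; a parameter typed
 * as one of the enums would trigger clang's -Wenum-conversion when a
 * member of the other enum is passed. */
static void mod_state(int idx, long val)
{
	counters[idx] += val;
}
```

The comment convention adopted by the patch ("idx can be of type enum memcg_stat_item or node_stat_item") keeps the documentation value the enum type used to provide.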
Re: [PATCH] KVM: nVMX: do not pin the VMCS12
On 27/07/2017 19:20, David Matlack wrote: >> + kvm_vcpu_write_guest_page(&vmx->vcpu, >> + vmx->nested.current_vmptr >> PAGE_SHIFT, >> + vmx->nested.cached_vmcs12, 0, VMCS12_SIZE); > Have you hit any "suspicious RCU usage" error messages during VM > teardown with this patch? We did when we replaced memcpy with > kvm_write_guest a while back. IIRC it was due to kvm->srcu not being > held in one of the teardown paths. kvm_write_guest() expects it to be > held in order to access memslots. > > We fixed this by skipping the VMCS12 flush during VMXOFF. I'll send > that patch along with a few other nVMX dirty tracking related patches > I've been meaning to get upstreamed. Oh, right. I had this other (untested) patch in the queue after Christian recently annotated everything with RCU checks: Paolo diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 890b706d1943..07e3b02a1be3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -477,7 +477,8 @@ struct kvm { static inline struct kvm_io_bus *kvm_get_bus(struct kvm *kvm, enum kvm_bus idx) { return srcu_dereference_check(kvm->buses[idx], &kvm->srcu, - lockdep_is_held(&kvm->slots_lock)); + lockdep_is_held(&kvm->slots_lock) || + !refcount_read(&kvm->users_count)); } static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i) @@ -570,7 +571,8 @@ void kvm_put_kvm(struct kvm *kvm); static inline struct kvm_memslots *__kvm_memslots(struct kvm *kvm, int as_id) { return srcu_dereference_check(kvm->memslots[as_id], &kvm->srcu, - lockdep_is_held(&kvm->slots_lock)); + lockdep_is_held(&kvm->slots_lock) || + !refcount_read(&kvm->users_count)); } static inline struct kvm_memslots *kvm_memslots(struct kvm *kvm) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index f3f74271f1a9..6a21c98b22bf 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -655,7 +655,6 @@ static struct kvm *kvm_create_vm(unsigned long type) mutex_init(&kvm->lock); mutex_init(&kvm->irq_lock); 
mutex_init(&kvm->slots_lock); - refcount_set(&kvm->users_count, 1); INIT_LIST_HEAD(&kvm->devices); r = kvm_arch_init_vm(kvm, type); @@ -701,6 +700,7 @@ static struct kvm *kvm_create_vm(unsigned long type) if (r) goto out_err; + refcount_set(&kvm->users_count, 1); spin_lock(&kvm_lock); list_add(&kvm->vm_list, &vm_list); spin_unlock(&kvm_lock); @@ -717,10 +717,9 @@ static struct kvm *kvm_create_vm(unsigned long type) hardware_disable_all(); out_err_no_disable: for (i = 0; i < KVM_NR_BUSES; i++) - kfree(rcu_access_pointer(kvm->buses[i])); + kfree(kvm_get_bus(kvm, i)); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) - kvm_free_memslots(kvm, - rcu_dereference_protected(kvm->memslots[i], 1)); + kvm_free_memslots(kvm, __kvm_memslots(kvm, i)); kvm_arch_free_vm(kvm); mmdrop(current->mm); return ERR_PTR(r); @@ -754,9 +754,8 @@ static void kvm_destroy_vm(struct kvm *kvm) spin_unlock(&kvm_lock); kvm_free_irq_routing(kvm); for (i = 0; i < KVM_NR_BUSES; i++) { - struct kvm_io_bus *bus; + struct kvm_io_bus *bus = kvm_get_bus(kvm, i); - bus = rcu_dereference_protected(kvm->buses[i], 1); if (bus) kvm_io_bus_destroy(bus); kvm->buses[i] = NULL; @@ -770,8 +769,7 @@ static void kvm_destroy_vm(struct kvm *kvm) kvm_arch_destroy_vm(kvm); kvm_destroy_devices(kvm); for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) - kvm_free_memslots(kvm, - rcu_dereference_protected(kvm->memslots[i], 1)); + kvm_free_memslots(kvm, __kvm_memslots(kvm, i)); cleanup_srcu_struct(&kvm->irq_srcu); cleanup_srcu_struct(&kvm->srcu); kvm_arch_free_vm(kvm);
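The relaxed `srcu_dereference_check()` condition in the hunk above can be read as a simple predicate: dereferencing `kvm->buses` / `kvm->memslots` without holding SRCU is legal either when `slots_lock` is held or when the VM's user count has dropped to zero, i.e. we are in teardown and nothing can race with us. A user-space model of just that predicate (a sketch under stated assumptions; the real check lives in include/linux/kvm_host.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the lockdep side-condition added by the patch:
 * access is permitted when slots_lock is held, or when users_count has
 * reached zero (create/destroy paths, where no concurrent user exists). */
static bool deref_allowed(bool slots_lock_held, unsigned int users_count)
{
	return slots_lock_held || users_count == 0;
}
```

This is also why the patch moves `refcount_set(&kvm->users_count, 1)` later in `kvm_create_vm()`: the error-unwind path then sees a zero user count and satisfies the same condition.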
[PATCH v6 02/14] spi: qup: Setup DMA mode correctly
To operate in DMA mode, the buffer should be aligned and the size of the transfer should be a multiple of block size (for v1). And the no. of words being transferred should be programmed in the count registers appropriately. Signed-off-by: Andy Gross Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 118 +++--- 1 file changed, 55 insertions(+), 63 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index c0d4def..abe799b 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -149,11 +149,18 @@ struct spi_qup { int rx_bytes; int qup_v1; - int use_dma; + int mode; struct dma_slave_config rx_conf; struct dma_slave_config tx_conf; }; +static inline bool spi_qup_is_dma_xfer(int mode) +{ + if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM) + return true; + + return false; +} static inline bool spi_qup_is_valid_state(struct spi_qup *controller) { @@ -424,7 +431,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id) error = -EIO; } - if (!controller->use_dma) { + if (!spi_qup_is_dma_xfer(controller->mode)) { if (opflags & QUP_OP_IN_SERVICE_FLAG) spi_qup_fifo_read(controller, xfer); @@ -443,34 +450,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id) return IRQ_HANDLED; } -static u32 -spi_qup_get_mode(struct spi_master *master, struct spi_transfer *xfer) -{ - struct spi_qup *qup = spi_master_get_devdata(master); - u32 mode; - - qup->w_size = 4; - - if (xfer->bits_per_word <= 8) - qup->w_size = 1; - else if (xfer->bits_per_word <= 16) - qup->w_size = 2; - - qup->n_words = xfer->len / qup->w_size; - - if (qup->n_words <= (qup->in_fifo_sz / sizeof(u32))) - mode = QUP_IO_M_MODE_FIFO; - else - mode = QUP_IO_M_MODE_BLOCK; - - return mode; -} - /* set clock freq ... 
bits per word */ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) { struct spi_qup *controller = spi_master_get_devdata(spi->master); - u32 config, iomode, mode, control; + u32 config, iomode, control; int ret, n_words; if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) { @@ -491,25 +475,30 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) return -EIO; } - mode = spi_qup_get_mode(spi->master, xfer); + controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8); + controller->n_words = xfer->len / controller->w_size; n_words = controller->n_words; - if (mode == QUP_IO_M_MODE_FIFO) { + if (n_words <= (controller->in_fifo_sz / sizeof(u32))) { + + controller->mode = QUP_IO_M_MODE_FIFO; + writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT); writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT); /* must be zero for FIFO */ writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT); writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT); - } else if (!controller->use_dma) { + } else if (spi->master->can_dma && + spi->master->can_dma(spi->master, spi, xfer) && + spi->master->cur_msg_mapped) { + + controller->mode = QUP_IO_M_MODE_BAM; + writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT); writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT); /* must be zero for BLOCK and BAM */ writel_relaxed(0, controller->base + QUP_MX_READ_CNT); writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT); - } else { - mode = QUP_IO_M_MODE_BAM; - writel_relaxed(0, controller->base + QUP_MX_READ_CNT); - writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT); if (!controller->qup_v1) { void __iomem *input_cnt; @@ -528,19 +517,28 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT); } + } else { + + controller->mode = QUP_IO_M_MODE_BLOCK; + + writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT); + 
writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT); + /* must be zero for BLOCK and BAM */ + writel_relaxed(0, controller->base + QUP_MX_READ_CNT); + writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT); } iomode = readl_relaxed(controller->bas
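The word-size and mode-selection arithmetic that this patch inlines into spi_qup_io_config() can be sketched in isolation: `w_size = DIV_ROUND_UP(bits_per_word, 8)` replaces the old if/else ladder in spi_qup_get_mode(), and FIFO mode is chosen when all words fit in the input FIFO (which holds `in_fifo_sz / 4` 32-bit words). A standalone sketch, leaving out the DMA/BAM branch:

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

enum { MODE_FIFO, MODE_BLOCK };

/* Mirrors the patch's math: compute the per-word byte size, derive the
 * word count for the transfer, then pick FIFO mode iff everything fits
 * in the input FIFO. in_fifo_sz is in bytes. */
static int pick_mode(unsigned int len, unsigned int bits_per_word,
		     unsigned int in_fifo_sz, unsigned int *n_words)
{
	unsigned int w_size = DIV_ROUND_UP(bits_per_word, 8);

	*n_words = len / w_size;
	return (*n_words <= in_fifo_sz / 4) ? MODE_FIFO : MODE_BLOCK;
}
```

For example, a 16-byte transfer at 8 bits per word into a 64-byte FIFO fits exactly (16 words into 16 slots) and stays in FIFO mode, while a 100-byte transfer at 16 bits per word (50 words) falls back to block mode.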
[PATCH v6 04/14] spi: qup: Place the QUP in run mode before DMA
Signed-off-by: Andy Gross
Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index fdd34c3..f1aa5c1 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -343,6 +343,14 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
 	else if (xfer->tx_buf)
 		tx_done = spi_qup_dma_done;
 
+	/* before issuing the descriptors, set the QUP to run */
+	ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+	if (ret) {
+		dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
+			 __func__, __LINE__);
+		return ret;
+	}
+
 	if (xfer->rx_buf) {
 		ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM, rx_done);
 		if (ret)
@@ -385,6 +393,13 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
 
 	spi_qup_fifo_write(qup, xfer);
 
+	ret = spi_qup_set_state(qup, QUP_STATE_RUN);
+	if (ret) {
+		dev_warn(qup->dev, "%s(%d): cannot set RUN state\n",
+			 __func__, __LINE__);
+		return ret;
+	}
+
 	if (!wait_for_completion_timeout(&qup->done, timeout))
 		return -ETIMEDOUT;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 06/14] spi: qup: Fix transaction done signaling
Wait to signal done until we get all of the interrupts we are expecting
to get for a transaction. If we don't wait for the input done flag, we
can be in between transactions when the done flag comes in and this can
mess up the next transaction.

While here, clean up the code which sets controller->xfer = NULL and
restores it in the ISR. This looks to be some debug code which is not
required.

Signed-off-by: Andy Gross
Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index ef95294..a7c630c 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -409,29 +409,16 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
 static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 {
 	struct spi_qup *controller = dev_id;
-	struct spi_transfer *xfer;
+	struct spi_transfer *xfer = controller->xfer;
 	u32 opflags, qup_err, spi_err;
-	unsigned long flags;
 	int error = 0;
 
-	spin_lock_irqsave(&controller->lock, flags);
-	xfer = controller->xfer;
-	controller->xfer = NULL;
-	spin_unlock_irqrestore(&controller->lock, flags);
-
 	qup_err = readl_relaxed(controller->base + QUP_ERROR_FLAGS);
 	spi_err = readl_relaxed(controller->base + SPI_ERROR_FLAGS);
 	opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
 
 	writel_relaxed(qup_err, controller->base + QUP_ERROR_FLAGS);
 	writel_relaxed(spi_err, controller->base + SPI_ERROR_FLAGS);
-	writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
-
-	if (!xfer) {
-		dev_err_ratelimited(controller->dev, "unexpected irq %08x %08x %08x\n",
-				    qup_err, spi_err, opflags);
-		return IRQ_HANDLED;
-	}
 
 	if (qup_err) {
 		if (qup_err & QUP_ERROR_OUTPUT_OVER_RUN)
@@ -455,7 +442,9 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 			error = -EIO;
 	}
 
-	if (!spi_qup_is_dma_xfer(controller->mode)) {
+	if (spi_qup_is_dma_xfer(controller->mode)) {
+		writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
+	} else {
 		if (opflags & QUP_OP_IN_SERVICE_FLAG)
 			spi_qup_fifo_read(controller, xfer);
 
@@ -463,12 +452,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 			spi_qup_fifo_write(controller, xfer);
 	}
 
-	spin_lock_irqsave(&controller->lock, flags);
-	controller->error = error;
-	controller->xfer = xfer;
-	spin_unlock_irqrestore(&controller->lock, flags);
-
-	if (controller->rx_bytes == xfer->len || error)
+	if ((opflags & QUP_OP_MAX_INPUT_DONE_FLAG) || error)
 		complete(&controller->done);
 
 	return IRQ_HANDLED;
@@ -666,7 +650,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
 exit:
 	spi_qup_set_state(controller, QUP_STATE_RESET);
 	spin_lock_irqsave(&controller->lock, flags);
-	controller->xfer = NULL;
 	if (!ret)
 		ret = controller->error;
 	spin_unlock_irqrestore(&controller->lock, flags);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 05/14] spi: qup: Fix error handling in spi_qup_prep_sg
Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index f1aa5c1..ef95294 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -311,8 +311,8 @@ static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer *xfer,
 	}
 
 	desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags);
-	if (!desc)
-		return -EINVAL;
+	if (IS_ERR_OR_NULL(desc))
+		return desc ? PTR_ERR(desc) : -EINVAL;
 
 	desc->callback = callback;
 	desc->callback_param = qup;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
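The fix above relies on the kernel's ERR_PTR convention: a returned pointer can be NULL, a valid descriptor, or an errno encoded into the top page of the address space. A user-space re-creation of those helpers (simplified; the real definitions live in include/linux/err.h) shows why the patched return propagates the real error code instead of collapsing everything to -EINVAL:

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL 22
#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel helpers: errors are encoded as
 * pointer values in the range [-MAX_ERRNO, -1]. */
static void *ERR_PTR(long err) { return (void *)err; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Mirrors the patched return statement: propagate the dmaengine
 * driver's errno when it hands back ERR_PTR(...), keep -EINVAL for a
 * plain NULL return. */
static long prep_result(void *desc)
{
	if (IS_ERR_OR_NULL(desc))
		return desc ? PTR_ERR(desc) : -EINVAL;
	return 0;
}
```

So a dmaengine driver returning ERR_PTR(-ENOMEM), for instance, now surfaces -ENOMEM to the SPI core rather than a misleading -EINVAL.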
Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support
Hi Florian,

On 2017/7/28 0:54, Florian Fainelli wrote:
> - if you need knowledge about this PHY connection type prior to binding
> the PHY device and its driver (that is, before of_phy_connect()) we
> could add a boolean property e.g: "phy-is-internal" that allows us to
> know that, or we can have a new phy-mode value, e.g: "internal-rmii"
> which describes that, either way would probably be fine, but the former
> scales better

Using "phy-is-internal" is very helpful, but it is easy to confuse with
a true internal PHY; maybe we could use another wording, such as saying
the PHY is inlined 🙂.

Then again, using phy-mode = "internal" even though this is Reduced MII
is not a big deal IMHO, as long as there is no loss of information and
"internal" de-facto means internal reduced MII, for instance.
--
[PATCH v6 09/14] spi: qup: call io_config in mode specific function
DMA transactions should only need to call io_config once, but block mode
might call it several times to set up several transactions so it can
handle reads/writes larger than the max size per transaction, so we move
the call into the do_ functions.

This is just refactoring; there should be no functional change.

Signed-off-by: Matthew McClintock
Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index ff5aa08..1aa6078 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -156,6 +156,8 @@ struct spi_qup {
 	struct dma_slave_config	tx_conf;
 };
 
+static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer);
+
 static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag)
 {
 	u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL);
@@ -417,11 +419,12 @@ static void spi_qup_dma_terminate(struct spi_master *master,
 		dmaengine_terminate_all(master->dma_rx);
 }
 
-static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer,
 			  unsigned long timeout)
 {
-	struct spi_qup *qup = spi_master_get_devdata(master);
 	dma_async_tx_callback rx_done = NULL, tx_done = NULL;
+	struct spi_master *master = spi->master;
+	struct spi_qup *qup = spi_master_get_devdata(master);
 	int ret;
 
 	if (xfer->rx_buf)
@@ -429,6 +432,10 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
 	else if (xfer->tx_buf)
 		tx_done = spi_qup_dma_done;
 
+	ret = spi_qup_io_config(spi, xfer);
+	if (ret)
+		return ret;
+
 	/* before issuing the descriptors, set the QUP to run */
 	ret = spi_qup_set_state(qup, QUP_STATE_RUN);
 	if (ret) {
@@ -459,12 +466,17 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer,
 	return 0;
 }
 
-static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer,
+static int spi_qup_do_pio(struct spi_device *spi, struct spi_transfer *xfer,
 			  unsigned long timeout)
 {
+	struct spi_master *master = spi->master;
 	struct spi_qup *qup = spi_master_get_devdata(master);
 	int ret;
 
+	ret = spi_qup_io_config(spi, xfer);
+	if (ret)
+		return ret;
+
 	ret = spi_qup_set_state(qup, QUP_STATE_RUN);
 	if (ret) {
 		dev_warn(qup->dev, "cannot set RUN state\n");
@@ -742,10 +754,6 @@ static int spi_qup_transfer_one(struct spi_master *master,
 	if (ret)
 		return ret;
 
-	ret = spi_qup_io_config(spi, xfer);
-	if (ret)
-		return ret;
-
 	timeout = DIV_ROUND_UP(xfer->speed_hz, MSEC_PER_SEC);
 	timeout = DIV_ROUND_UP(xfer->len * 8, timeout);
 	timeout = 100 * msecs_to_jiffies(timeout);
@@ -760,9 +768,9 @@ static int spi_qup_transfer_one(struct spi_master *master,
 	spin_unlock_irqrestore(&controller->lock, flags);
 
 	if (spi_qup_is_dma_xfer(controller->mode))
-		ret = spi_qup_do_dma(master, xfer, timeout);
+		ret = spi_qup_do_dma(spi, xfer, timeout);
 	else
-		ret = spi_qup_do_pio(master, xfer, timeout);
+		ret = spi_qup_do_pio(spi, xfer, timeout);
 
 	if (ret)
 		goto exit;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
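The transfer_one() timeout left untouched by this refactor is worth unpacking: it divides the bits on the wire by the bus's bits-per-millisecond, rounding up. A sketch of just that arithmetic (omitting the final `100 * msecs_to_jiffies()` safety padding the driver applies):

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define MSEC_PER_SEC 1000

/* The two DIV_ROUND_UP steps from spi_qup_transfer_one(), reduced to a
 * helper: len is in bytes, speed_hz is the bus clock, and the result is
 * the minimum transfer time in milliseconds. */
static unsigned long xfer_timeout_ms(unsigned int len, unsigned int speed_hz)
{
	unsigned long t = DIV_ROUND_UP(speed_hz, MSEC_PER_SEC); /* bits per ms */

	return DIV_ROUND_UP((unsigned long)len * 8, t);
}
```

For example, 1000 bytes at 8 MHz is 8000 bits over 8000 bits/ms, i.e. 1 ms before padding; the same transfer at 1 MHz takes 8 ms.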
[PATCH v6 12/14] spi: qup: allow multiple DMA transactions per spi xfer
Much like the block mode changes, we are breaking up DMA transactions into 64K chunks so we can reset the QUP engine. Signed-off-by: Matthew McClintock Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 92 --- 1 file changed, 66 insertions(+), 26 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index 1af3b41..3c2c2c0 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -418,12 +418,35 @@ static void spi_qup_dma_terminate(struct spi_master *master, dmaengine_terminate_all(master->dma_rx); } +static u32 spi_qup_sgl_get_nents_len(struct scatterlist *sgl, u32 max, +u32 *nents) +{ + struct scatterlist *sg; + u32 total = 0; + + *nents = 0; + + for (sg = sgl; sg; sg = sg_next(sg)) { + unsigned int len = sg_dma_len(sg); + + /* check for overflow as well as limit */ + if (((total + len) < total) || ((total + len) > max)) + break; + + total += len; + (*nents)++; + } + + return total; +} + static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer, unsigned long timeout) { dma_async_tx_callback rx_done = NULL, tx_done = NULL; struct spi_master *master = spi->master; struct spi_qup *qup = spi_master_get_devdata(master); + struct scatterlist *tx_sgl, *rx_sgl; int ret; if (xfer->rx_buf) @@ -431,40 +454,57 @@ static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer, else if (xfer->tx_buf) tx_done = spi_qup_dma_done; - ret = spi_qup_io_config(spi, xfer); - if (ret) - return ret; + rx_sgl = xfer->rx_sg.sgl; + tx_sgl = xfer->tx_sg.sgl; - /* before issuing the descriptors, set the QUP to run */ - ret = spi_qup_set_state(qup, QUP_STATE_RUN); - if (ret) { - dev_warn(qup->dev, "%s(%d): cannot set RUN state\n", - __func__, __LINE__); - return ret; - } + do { + u32 rx_nents, tx_nents; + + if (rx_sgl) + qup->n_words = spi_qup_sgl_get_nents_len(rx_sgl, + SPI_MAX_XFER, &rx_nents) / qup->w_size; + if (tx_sgl) + qup->n_words = spi_qup_sgl_get_nents_len(tx_sgl, + SPI_MAX_XFER, &tx_nents) / qup->w_size; + if 
(!qup->n_words) + return -EIO; - if (xfer->rx_buf) { - ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl, - xfer->rx_sg.nents, DMA_DEV_TO_MEM, - rx_done); + ret = spi_qup_io_config(spi, xfer); if (ret) return ret; - dma_async_issue_pending(master->dma_rx); - } - - if (xfer->tx_buf) { - ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl, - xfer->tx_sg.nents, DMA_MEM_TO_DEV, - tx_done); - if (ret) + /* before issuing the descriptors, set the QUP to run */ + ret = spi_qup_set_state(qup, QUP_STATE_RUN); + if (ret) { + dev_warn(qup->dev, "cannot set RUN state\n"); return ret; + } + if (rx_sgl) { + ret = spi_qup_prep_sg(master, rx_sgl, rx_nents, + DMA_DEV_TO_MEM, rx_done); + if (ret) + return ret; + dma_async_issue_pending(master->dma_rx); + } - dma_async_issue_pending(master->dma_tx); - } + if (tx_sgl) { + ret = spi_qup_prep_sg(master, tx_sgl, tx_nents, + DMA_MEM_TO_DEV, tx_done); + if (ret) + return ret; + + dma_async_issue_pending(master->dma_tx); + } + + if (!wait_for_completion_timeout(&qup->done, timeout)) + return -ETIMEDOUT; + + for (; rx_sgl && rx_nents--; rx_sgl = sg_next(rx_sgl)) + ; + for (; tx_sgl && tx_nents--; tx_sgl = sg_next(tx_sgl)) + ; - if (!wait_for_completion_timeout(&qup->done, timeout)) - return -ETIMEDOUT; + } while (rx_sgl || tx_sgl); return 0; } -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
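The core of spi_qup_sgl_get_nents_len() above is an accumulation loop that stops either at the per-transaction cap or when the running total would wrap around. An array-based model of that loop (a plain array of segment lengths stands in for the kernel scatterlist; the helper name is kept for clarity):

```c
#include <assert.h>
#include <stdint.h>

/* Walk a list of segment lengths, accumulating until adding the next
 * segment would exceed `max` or overflow u32, and report how many
 * segments were consumed — the same contract as the patch's
 * spi_qup_sgl_get_nents_len(), minus the scatterlist plumbing. */
static uint32_t get_nents_len(const uint32_t *lens, int n, uint32_t max,
			      uint32_t *nents)
{
	uint32_t total = 0;

	*nents = 0;
	for (int i = 0; i < n; i++) {
		uint32_t len = lens[i];

		/* check for wrap-around as well as the transaction limit */
		if (total + len < total || total + len > max)
			break;

		total += len;
		(*nents)++;
	}
	return total;
}
```

The caller then advances the scatterlist by the reported `nents` and loops, which is exactly how the patch splits one spi_transfer into multiple sub-64K DMA transactions.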
[PATCH v6 14/14] spi: qup: Fix QUP version identify method
Use of_device_get_match_data to identify QUP version instead of
of_device_is_compatible.

Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 4c3c938..1364516 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include <linux/of_device.h>
 #include
 #include
 #include
@@ -1058,9 +1059,7 @@ static int spi_qup_probe(struct platform_device *pdev)
 	else if (!ret)
 		master->can_dma = spi_qup_can_dma;
 
-	/* set v1 flag if device is version 1 */
-	if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1"))
-		controller->qup_v1 = 1;
+	controller->qup_v1 = (int)of_device_get_match_data(dev);
 
 	if (!controller->qup_v1)
 		master->set_cs = spi_qup_set_cs;
@@ -1256,7 +1255,7 @@ static int spi_qup_remove(struct platform_device *pdev)
 }
 
 static const struct of_device_id spi_qup_dt_match[] = {
-	{ .compatible = "qcom,spi-qup-v1.1.1", },
+	{ .compatible = "qcom,spi-qup-v1.1.1", .data = (void *)1, },
	{ .compatible = "qcom,spi-qup-v2.1.1", },
 	{ .compatible = "qcom,spi-qup-v2.2.1", },
 	{ }
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
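The pattern this patch adopts — attaching a per-compatible `.data` pointer to the match table and reading it back with `of_device_get_match_data()` — means adding a new QUP revision only requires a new table entry, not another string comparison in probe(). A user-space model of the lookup (struct and function names here are simplified stand-ins, not the kernel's OF API):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id: a compatible string
 * paired with driver-private match data. */
struct of_match { const char *compatible; void *data; };

static const struct of_match qup_match[] = {
	{ "qcom,spi-qup-v1.1.1", (void *)1 },	/* v1 flag carried as data */
	{ "qcom,spi-qup-v2.1.1", NULL },
	{ "qcom,spi-qup-v2.2.1", NULL },
	{ NULL, NULL },
};

/* Models of_device_get_match_data(): return the .data of the matching
 * entry, or 0 when nothing matches. */
static long get_match_data(const char *compatible)
{
	for (const struct of_match *m = qup_match; m->compatible; m++)
		if (!strcmp(m->compatible, compatible))
			return (long)m->data;
	return 0;
}
```

With this shape, `controller->qup_v1 = (int)of_device_get_match_data(dev);` collapses the old compatible-string test into a single table-driven assignment.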
[PATCH v6 10/14] spi: qup: allow block mode to generate multiple transactions
This let's you write more to the SPI bus than 64K-1 which is important if the block size of a SPI device is >= 64K or some other device wants to do something larger. This has the benefit of completely removing spi_message from the spi-qup transactions Signed-off-by: Matthew McClintock Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 128 +++--- 1 file changed, 80 insertions(+), 48 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index 1aa6078..707b1ec 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -120,7 +120,7 @@ #define SPI_NUM_CHIPSELECTS4 -#define SPI_MAX_DMA_XFER (SZ_64K - 64) +#define SPI_MAX_XFER (SZ_64K - 64) /* high speed mode is when bus rate is greater then 26MHz */ #define SPI_HS_MIN_RATE2600 @@ -149,6 +149,8 @@ struct spi_qup { int n_words; int tx_bytes; int rx_bytes; + const u8*tx_buf; + u8 *rx_buf; int qup_v1; int mode; @@ -173,6 +175,12 @@ static inline bool spi_qup_is_dma_xfer(int mode) return false; } +/* get's the transaction size length */ +static inline unsigned int spi_qup_len(struct spi_qup *controller) +{ + return controller->n_words * controller->w_size; +} + static inline bool spi_qup_is_valid_state(struct spi_qup *controller) { u32 opstate = readl_relaxed(controller->base + QUP_STATE); @@ -225,10 +233,9 @@ static int spi_qup_set_state(struct spi_qup *controller, u32 state) return 0; } -static void spi_qup_read_from_fifo(struct spi_qup *controller, - struct spi_transfer *xfer, u32 num_words) +static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words) { - u8 *rx_buf = xfer->rx_buf; + u8 *rx_buf = controller->rx_buf; int i, shift, num_bytes; u32 word; @@ -236,8 +243,9 @@ static void spi_qup_read_from_fifo(struct spi_qup *controller, word = readl_relaxed(controller->base + QUP_INPUT_FIFO); - num_bytes = min_t(int, xfer->len - controller->rx_bytes, - controller->w_size); + num_bytes = min_t(int, spi_qup_len(controller) - + controller->rx_bytes, + controller->w_size); 
if (!rx_buf) { controller->rx_bytes += num_bytes; @@ -258,13 +266,12 @@ static void spi_qup_read_from_fifo(struct spi_qup *controller, } } -static void spi_qup_read(struct spi_qup *controller, - struct spi_transfer *xfer) +static void spi_qup_read(struct spi_qup *controller) { u32 remainder, words_per_block, num_words; bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK; - remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes, + remainder = DIV_ROUND_UP(spi_qup_len(controller) - controller->rx_bytes, controller->w_size); words_per_block = controller->in_blk_sz >> 2; @@ -285,7 +292,7 @@ static void spi_qup_read(struct spi_qup *controller, } /* read up to the maximum transfer size available */ - spi_qup_read_from_fifo(controller, xfer, num_words); + spi_qup_read_from_fifo(controller, num_words); remainder -= num_words; @@ -307,18 +314,18 @@ static void spi_qup_read(struct spi_qup *controller, } -static void spi_qup_write_to_fifo(struct spi_qup *controller, - struct spi_transfer *xfer, u32 num_words) +static void spi_qup_write_to_fifo(struct spi_qup *controller, u32 num_words) { - const u8 *tx_buf = xfer->tx_buf; + const u8 *tx_buf = controller->tx_buf; int i, num_bytes; u32 word, data; for (; num_words; num_words--) { word = 0; - num_bytes = min_t(int, xfer->len - controller->tx_bytes, - controller->w_size); + num_bytes = min_t(int, spi_qup_len(controller) - + controller->tx_bytes, + controller->w_size); if (tx_buf) for (i = 0; i < num_bytes; i++) { data = tx_buf[controller->tx_bytes + i]; @@ -338,13 +345,12 @@ static void spi_qup_dma_done(void *data) complete(&qup->done); } -static void spi_qup_write(struct spi_qup *controller, - struct spi_transfer *xfer) +static void spi_qup_write(struct spi_qup *controller) { bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK; u32 remainder, words_per_block, num_words; - remain
[PATCH v6 13/14] spi: qup: Ensure done detection
This patch fixes an issue where a SPI transaction has completed, but the
done condition is missed. This occurs because at the time of interrupt
the MAX_INPUT_DONE_FLAG is not asserted. However, in the process of
reading blocks of data from the FIFO, the last portion of data comes in.
The opflags read at the beginning of the irq handler no longer matches
the current opflag state. To get around this condition, the block read
function should update the opflags so that done detection is correct
after the return.

Signed-off-by: Andy Gross
Signed-off-by: Abhishek Sahu
Signed-off-by: Varadarajan Narayanan
---
 drivers/spi/spi-qup.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c
index 3c2c2c0..4c3c938 100644
--- a/drivers/spi/spi-qup.c
+++ b/drivers/spi/spi-qup.c
@@ -266,7 +266,7 @@ static void spi_qup_read_from_fifo(struct spi_qup *controller, u32 num_words)
 	}
 }
 
-static void spi_qup_read(struct spi_qup *controller)
+static void spi_qup_read(struct spi_qup *controller, u32 *opflags)
 {
 	u32 remainder, words_per_block, num_words;
 	bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK;
@@ -305,10 +305,12 @@ static void spi_qup_read(struct spi_qup *controller)
 
 	/*
 	 * Due to extra stickiness of the QUP_OP_IN_SERVICE_FLAG during block
-	 * mode reads, it has to be cleared again at the very end
+	 * mode reads, it has to be cleared again at the very end. However, be
+	 * sure to refresh opflags value because MAX_INPUT_DONE_FLAG may now be
+	 * present and this is used to determine if transaction is complete
 	 */
-	if (is_block_mode && spi_qup_is_flag_set(controller,
-				QUP_OP_MAX_INPUT_DONE_FLAG))
+	*opflags = readl_relaxed(controller->base + QUP_OPERATIONAL);
+	if (is_block_mode && *opflags & QUP_OP_MAX_INPUT_DONE_FLAG)
 		writel_relaxed(QUP_OP_IN_SERVICE_FLAG,
 			       controller->base + QUP_OPERATIONAL);
 
@@ -613,7 +615,7 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id)
 		writel_relaxed(opflags, controller->base + QUP_OPERATIONAL);
 	} else {
 		if (opflags & QUP_OP_IN_SERVICE_FLAG)
-			spi_qup_read(controller);
+			spi_qup_read(controller, &opflags);
 
 		if (opflags & QUP_OP_OUT_SERVICE_FLAG)
 			spi_qup_write(controller);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 11/14] spi: qup: refactor spi_qup_prep_sg
Take the specific sgl and nents to be prepared. This is in preparation for splitting DMA into multiple transactions; it contains no code changes, just refactoring. Signed-off-by: Matthew McClintock Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 23 ++- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index 707b1ec..1af3b41 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -382,27 +382,20 @@ static void spi_qup_write(struct spi_qup *controller) } while (remainder); } -static int spi_qup_prep_sg(struct spi_master *master, struct spi_transfer *xfer, - enum dma_transfer_direction dir, +static int spi_qup_prep_sg(struct spi_master *master, struct scatterlist *sgl, + unsigned int nents, enum dma_transfer_direction dir, dma_async_tx_callback callback) { struct spi_qup *qup = spi_master_get_devdata(master); unsigned long flags = DMA_PREP_INTERRUPT | DMA_PREP_FENCE; struct dma_async_tx_descriptor *desc; - struct scatterlist *sgl; struct dma_chan *chan; dma_cookie_t cookie; - unsigned int nents; - if (dir == DMA_MEM_TO_DEV) { + if (dir == DMA_MEM_TO_DEV) chan = master->dma_tx; - nents = xfer->tx_sg.nents; - sgl = xfer->tx_sg.sgl; - } else { + else chan = master->dma_rx; - nents = xfer->rx_sg.nents; - sgl = xfer->rx_sg.sgl; - } desc = dmaengine_prep_slave_sg(chan, sgl, nents, dir, flags); if (IS_ERR_OR_NULL(desc)) @@ -451,7 +444,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer, } if (xfer->rx_buf) { - ret = spi_qup_prep_sg(master, xfer, DMA_DEV_TO_MEM, rx_done); + ret = spi_qup_prep_sg(master, xfer->rx_sg.sgl, + xfer->rx_sg.nents, DMA_DEV_TO_MEM, + rx_done); if (ret) return ret; @@ -459,7 +454,9 @@ static int spi_qup_do_dma(struct spi_device *spi, struct spi_transfer *xfer, } if (xfer->tx_buf) { - ret = spi_qup_prep_sg(master, xfer, DMA_MEM_TO_DEV, tx_done); + ret = spi_qup_prep_sg(master, xfer->tx_sg.sgl, + xfer->tx_sg.nents, DMA_MEM_TO_DEV, + tx_done); 
if (ret) return ret; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 07/14] spi: qup: Do block sized read/write in block mode
This patch corrects the behavior of the BLOCK transactions. During block transactions, the controller must be read/written to in block size transactions. Signed-off-by: Andy Gross Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 151 +++--- 1 file changed, 119 insertions(+), 32 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index a7c630c..8cfa112 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -82,6 +82,8 @@ #define QUP_IO_M_MODE_BAM 3 /* QUP_OPERATIONAL fields */ +#define QUP_OP_IN_BLOCK_READ_REQ BIT(13) +#define QUP_OP_OUT_BLOCK_WRITE_REQ BIT(12) #define QUP_OP_MAX_INPUT_DONE_FLAG BIT(11) #define QUP_OP_MAX_OUTPUT_DONE_FLAGBIT(10) #define QUP_OP_IN_SERVICE_FLAG BIT(9) @@ -154,6 +156,13 @@ struct spi_qup { struct dma_slave_config tx_conf; }; +static inline bool spi_qup_is_flag_set(struct spi_qup *controller, u32 flag) +{ + u32 opflag = readl_relaxed(controller->base + QUP_OPERATIONAL); + + return (opflag & flag) != 0; +} + static inline bool spi_qup_is_dma_xfer(int mode) { if (mode == QUP_IO_M_MODE_DMOV || mode == QUP_IO_M_MODE_BAM) @@ -214,29 +223,26 @@ static int spi_qup_set_state(struct spi_qup *controller, u32 state) return 0; } -static void spi_qup_fifo_read(struct spi_qup *controller, - struct spi_transfer *xfer) +static void spi_qup_read_from_fifo(struct spi_qup *controller, + struct spi_transfer *xfer, u32 num_words) { u8 *rx_buf = xfer->rx_buf; - u32 word, state; - int idx, shift, w_size; + int i, shift, num_bytes; + u32 word; - w_size = controller->w_size; - - while (controller->rx_bytes < xfer->len) { - - state = readl_relaxed(controller->base + QUP_OPERATIONAL); - if (0 == (state & QUP_OP_IN_FIFO_NOT_EMPTY)) - break; + for (; num_words; num_words--) { word = readl_relaxed(controller->base + QUP_INPUT_FIFO); + num_bytes = min_t(int, xfer->len - controller->rx_bytes, + controller->w_size); + if (!rx_buf) { - controller->rx_bytes += w_size; + controller->rx_bytes += num_bytes; continue; } - for 
(idx = 0; idx < w_size; idx++, controller->rx_bytes++) { + for (i = 0; i < num_bytes; i++, controller->rx_bytes++) { /* * The data format depends on bytes per SPI word: * 4 bytes: 0x12345678 @@ -244,38 +250,80 @@ static void spi_qup_fifo_read(struct spi_qup *controller, * 1 byte : 0x0012 */ shift = BITS_PER_BYTE; - shift *= (w_size - idx - 1); + shift *= (controller->w_size - i - 1); rx_buf[controller->rx_bytes] = word >> shift; } } } -static void spi_qup_fifo_write(struct spi_qup *controller, +static void spi_qup_read(struct spi_qup *controller, struct spi_transfer *xfer) { - const u8 *tx_buf = xfer->tx_buf; - u32 word, state, data; - int idx, w_size; + u32 remainder, words_per_block, num_words; + bool is_block_mode = controller->mode == QUP_IO_M_MODE_BLOCK; + + remainder = DIV_ROUND_UP(xfer->len - controller->rx_bytes, +controller->w_size); + words_per_block = controller->in_blk_sz >> 2; + + do { + /* ACK by clearing service flag */ + writel_relaxed(QUP_OP_IN_SERVICE_FLAG, + controller->base + QUP_OPERATIONAL); + + if (is_block_mode) { + num_words = (remainder > words_per_block) ? + words_per_block : remainder; + } else { + if (!spi_qup_is_flag_set(controller, +QUP_OP_IN_FIFO_NOT_EMPTY)) + break; - w_size = controller->w_size; + num_words = 1; + } - while (controller->tx_bytes < xfer->len) { + /* read up to the maximum transfer size available */ + spi_qup_read_from_fifo(controller, xfer, num_words); - state = readl_relaxed(controller->base + QUP_OPERATIONAL); - if (state & QUP_OP_OUT_FIFO_FULL) + remainder -= num_words; + + /* if block mode, check to see if next block is available */ + if (is_block_mode && !spi_qup_is_flag_set(controller, + QUP_OP_IN_BLOCK_READ_REQ)) break;
[PATCH v6 08/14] spi: qup: refactor spi_qup_io_config into two functions
This is in preparation for handling transactions larger than 64K-1 bytes in block mode, which is currently unsupported and quietly fails. We need to break this into two functions: 1) prep, which is called once per spi_message, and 2) io_config, which is called once per spi-qup bus transaction. This is just refactoring; there should be no functional change. Signed-off-by: Matthew McClintock Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 91 +++ 1 file changed, 62 insertions(+), 29 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index 8cfa112..ff5aa08 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -545,12 +545,11 @@ static irqreturn_t spi_qup_qup_irq(int irq, void *dev_id) return IRQ_HANDLED; } -/* set clock freq ... bits per word */ -static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) +/* set clock freq ... bits per word, determine mode */ +static int spi_qup_io_prep(struct spi_device *spi, struct spi_transfer *xfer) { struct spi_qup *controller = spi_master_get_devdata(spi->master); - u32 config, iomode, control; - int ret, n_words; + int ret; if (spi->mode & SPI_LOOP && xfer->len > controller->in_fifo_sz) { dev_err(controller->dev, "too big size for loopback %d > %d\n", @@ -565,32 +564,56 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) return -EIO; } - if (spi_qup_set_state(controller, QUP_STATE_RESET)) { - dev_err(controller->dev, "cannot set RESET state\n"); - return -EIO; - } - controller->w_size = DIV_ROUND_UP(xfer->bits_per_word, 8); controller->n_words = xfer->len / controller->w_size; - n_words = controller->n_words; - - if (n_words <= (controller->in_fifo_sz / sizeof(u32))) { + if (controller->n_words <= (controller->in_fifo_sz / sizeof(u32))) controller->mode = QUP_IO_M_MODE_FIFO; + else if (spi->master->can_dma && +spi->master->can_dma(spi->master, spi, xfer) && +spi->master->cur_msg_mapped) + controller->mode = QUP_IO_M_MODE_BAM; + else + 
controller->mode = QUP_IO_M_MODE_BLOCK; + + return 0; +} + +/* prep qup for another spi transaction of specific type */ +static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) +{ + struct spi_qup *controller = spi_master_get_devdata(spi->master); + u32 config, iomode, control; + unsigned long flags; + + spin_lock_irqsave(&controller->lock, flags); + controller->xfer = xfer; + controller->error= 0; + controller->rx_bytes = 0; + controller->tx_bytes = 0; + spin_unlock_irqrestore(&controller->lock, flags); + + + if (spi_qup_set_state(controller, QUP_STATE_RESET)) { + dev_err(controller->dev, "cannot set RESET state\n"); + return -EIO; + } - writel_relaxed(n_words, controller->base + QUP_MX_READ_CNT); - writel_relaxed(n_words, controller->base + QUP_MX_WRITE_CNT); + switch (controller->mode) { + case QUP_IO_M_MODE_FIFO: + writel_relaxed(controller->n_words, + controller->base + QUP_MX_READ_CNT); + writel_relaxed(controller->n_words, + controller->base + QUP_MX_WRITE_CNT); /* must be zero for FIFO */ writel_relaxed(0, controller->base + QUP_MX_INPUT_CNT); writel_relaxed(0, controller->base + QUP_MX_OUTPUT_CNT); - } else if (spi->master->can_dma && - spi->master->can_dma(spi->master, spi, xfer) && - spi->master->cur_msg_mapped) { - - controller->mode = QUP_IO_M_MODE_BAM; - - writel_relaxed(n_words, controller->base + QUP_MX_INPUT_CNT); - writel_relaxed(n_words, controller->base + QUP_MX_OUTPUT_CNT); + break; + case QUP_IO_M_MODE_BAM: + writel_relaxed(controller->n_words, + controller->base + QUP_MX_INPUT_CNT); + writel_relaxed(controller->n_words, + controller->base + QUP_MX_OUTPUT_CNT); /* must be zero for BLOCK and BAM */ writel_relaxed(0, controller->base + QUP_MX_READ_CNT); writel_relaxed(0, controller->base + QUP_MX_WRITE_CNT); @@ -608,19 +631,25 @@ static int spi_qup_io_config(struct spi_device *spi, struct spi_transfer *xfer) if (xfer->tx_buf) writel_relaxed(0, input_cnt); else - writel_relaxed(n_words, input_cnt); + 
writel_relaxed(controller->n_words, input_cnt);
[PATCH v6 03/14] spi: qup: Add completion timeout
Add i/o completion timeout for DMA and PIO modes. Signed-off-by: Andy Gross Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index abe799b..fdd34c3 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -331,8 +331,10 @@ static void spi_qup_dma_terminate(struct spi_master *master, dmaengine_terminate_all(master->dma_rx); } -static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer) +static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer, + unsigned long timeout) { + struct spi_qup *qup = spi_master_get_devdata(master); dma_async_tx_callback rx_done = NULL, tx_done = NULL; int ret; @@ -357,10 +359,14 @@ static int spi_qup_do_dma(struct spi_master *master, struct spi_transfer *xfer) dma_async_issue_pending(master->dma_tx); } + if (!wait_for_completion_timeout(&qup->done, timeout)) + return -ETIMEDOUT; + return 0; } -static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer) +static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer, + unsigned long timeout) { struct spi_qup *qup = spi_master_get_devdata(master); int ret; @@ -379,6 +385,9 @@ static int spi_qup_do_pio(struct spi_master *master, struct spi_transfer *xfer) spi_qup_fifo_write(qup, xfer); + if (!wait_for_completion_timeout(&qup->done, timeout)) + return -ETIMEDOUT; + return 0; } @@ -632,9 +641,9 @@ static int spi_qup_transfer_one(struct spi_master *master, spin_unlock_irqrestore(&controller->lock, flags); if (spi_qup_is_dma_xfer(controller->mode)) - ret = spi_qup_do_dma(master, xfer); + ret = spi_qup_do_dma(master, xfer, timeout); else - ret = spi_qup_do_pio(master, xfer); + ret = spi_qup_do_pio(master, xfer, timeout); if (ret) goto exit; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 01/14] spi: qup: Enable chip select support
Enable chip select support for QUP versions later than v1. The chip select support was broken in QUP version 1, hence it was removed in an earlier commit (4a8573abe "spi: qup: Remove chip select function"). Since the chip select support is functional in recent versions of QUP, re-enable it for QUP versions later than v1. Signed-off-by: Sham Muthayyan Signed-off-by: Varadarajan Narayanan --- drivers/spi/spi-qup.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/spi/spi-qup.c b/drivers/spi/spi-qup.c index 1bfa889..c0d4def 100644 --- a/drivers/spi/spi-qup.c +++ b/drivers/spi/spi-qup.c @@ -750,6 +750,24 @@ static int spi_qup_init_dma(struct spi_master *master, resource_size_t base) return ret; } +static void spi_qup_set_cs(struct spi_device *spi, bool val) +{ + struct spi_qup *controller; + u32 spi_ioc; + u32 spi_ioc_orig; + + controller = spi_master_get_devdata(spi->master); + spi_ioc = readl_relaxed(controller->base + SPI_IO_CONTROL); + spi_ioc_orig = spi_ioc; + if (!val) + spi_ioc |= SPI_IO_C_FORCE_CS; + else + spi_ioc &= ~SPI_IO_C_FORCE_CS; + + if (spi_ioc != spi_ioc_orig) + writel_relaxed(spi_ioc, controller->base + SPI_IO_CONTROL); +} + static int spi_qup_probe(struct platform_device *pdev) { struct spi_master *master; @@ -846,6 +864,9 @@ static int spi_qup_probe(struct platform_device *pdev) if (of_device_is_compatible(dev->of_node, "qcom,spi-qup-v1.1.1")) controller->qup_v1 = 1; + if (!controller->qup_v1) + master->set_cs = spi_qup_set_cs; + spin_lock_init(&controller->lock); init_completion(&controller->done); -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v6 00/14] spi: qup: Fixes and add support for >64k transfers
v6: Fix git bisect-ability issues in spi: qup: Add completion timeout spi: qup: call io_config in mode specific function spi: qup: allow multiple DMA transactions per spi xfer v5: Incorporate feedback from Mark Brown and Geert Uytterhoeven spi: qup: Enable chip select support Update commit log. Include the commit's description Removed "spi: qup: Add completion structures for DMA". This introduced separate completion structures for DMA based rx/tx and FIFO based i/os. This was not needed. Added "spi: qup: Fix QUP version identify method" to identify qup version using of_device_get_match_data instead of of_device_is_compatible. v4: Discard patch #15, 'spi: qup: support for qup v1 dma'. This depends on ADM driver, which is not upstreamed yet. v3: Fix git bisect-ability issues in spi: qup: Add completion structures for DMA spi: qup: Add completion timeout spi: qup: Place the QUP in run mode before DMA spi: qup: Fix transaction done signaling v2: Incorporate feedback from Andy Gross, Sricharan, Stanimir Varbanov Modified the QUP-v1 dma completion sequence to QUP-v2 as per feedback. Removed code that used controller->xfer to identify extra interrupts, since with the fixes done to handle i/o completion we don't see extra interrupts. v1: This series fixes some existing issues in the code for both interrupt and dma mode. Patches 1 - 11 are the fixes. Random failures/timeout are observed without these fixes. Also, the current driver does not support block transfers > 64K and the driver quietly fails. Patches 12 - 18 add support for this in both interrupt and dma mode. The entire series has been tested on ipq4019 with SPI-NOR flash for block sizes > 64k. 
Varadarajan Narayanan (14): spi: qup: Enable chip select support spi: qup: Setup DMA mode correctly spi: qup: Add completion timeout spi: qup: Place the QUP in run mode before DMA spi: qup: Fix error handling in spi_qup_prep_sg spi: qup: Fix transaction done signaling spi: qup: Do block sized read/write in block mode spi: qup: refactor spi_qup_io_config into two functions spi: qup: call io_config in mode specific function spi: qup: allow block mode to generate multiple transactions spi: qup: refactor spi_qup_prep_sg spi: qup: allow multiple DMA transactions per spi xfer spi: qup: Ensure done detection spi: qup: Fix QUP version identify method drivers/spi/spi-qup.c | 566 ++ 1 file changed, 392 insertions(+), 174 deletions(-) -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH 3.10] pstore: Make spinlock per zone instead of global
On Fri, Jul 28, 2017 at 06:25:55AM +0200, Willy Tarreau wrote: > Hi Leo, > > There was no upstream commit ID here but I found it in mainline here : > > commit 109704492ef637956265ec2eb72ae7b3b39eb6f4 > Author: Joel Fernandes > Date: Thu Oct 20 00:34:00 2016 -0700 > > pstore: Make spinlock per zone instead of global > > What worries me is that some later fixes were issued, apparently to fix > an oops and a warning after this patch : Yes, I also noticed the two patches below. But at least I have not reproduced them on the Android common kernel 4.4. I only faced the hang issue, and the first patch fixes exactly that. BTW, I tried to port the second and third patches, but it seems the second patch depends on one extra patch; so, to avoid introducing extra complexity, I only ported the first one. commit 663deb47880f2283809669563c5a52ac7c6aef1a Author: Joel Fernandes Date: Thu Oct 20 00:34:01 2016 -0700 pstore: Allow prz to control need for locking In preparation of not locking at all for certain buffers depending on if there's contention, make locking optional depending on the initialization of the prz. > commit 76d5692a58031696e282384cbd893832bc92bd76 > Author: Kees Cook > Date: Thu Feb 9 15:43:44 2017 -0800 > > pstore: Correctly initialize spinlock and flags > > The ram backend wasn't always initializing its spinlock correctly. Since > it was coming from kzalloc memory, though, it was harmless on > architectures that initialize unlocked spinlocks to 0 (at least x86 and > ARM). This also fixes a possibly ignored flag setting too. > > and : > > commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b > Author: Kees Cook > Date: Sun Mar 5 22:08:58 2017 -0800 > > pstore: Use dynamic spinlock initializer > > The per-prz spinlock should be using the dynamic initializer so that > lockdep can correctly track it. Without this, under lockdep, we get a > warning at boot that the lock is in non-static memory. 
> > So I'm fine with merging this patch as long as Kees is OK with this and > we know what exact patch series needs to be merged. > > Also, the information you added to the commit message references a trace > on a 4.4 kernel. Do you confirm that you got the same issue on 3.10 ? No, I can only confirm this on kernel 4.4. Currently only kernel 4.4 is available on the board, and I verified that the mainline kernel works well; this is why I could check the difference between them and find that the first patch is critical. > I just prefer to avoid blindly backporting sensitive patches if they're not > absolutely needed. > > > [ 65.103905] hrtimer: interrupt took 2759375 ns > > [ 65.108721] BUG: spinlock recursion on CPU#0, kschedfreq:0/1246 > > [ 65.108760] lock: buffer_lock+0x0/0x38, .magic: dead4ead, .owner: > > kschedfreq:0/1246, .owner_cpu: 0 > > [ 65.108779] CPU: 0 PID: 1246 Comm: kschedfreq:0 Not tainted > > 4.4.74-07294-g5c996a9-dirty #130 > > Thanks! > willy
Re: [PATCH v8 06/13] iommu/amd: copy old trans table from old kernel
On 07/27/17 at 05:38pm, Joerg Roedel wrote: > On Fri, Jul 21, 2017 at 04:59:04PM +0800, Baoquan He wrote: > > @@ -2128,9 +2131,43 @@ static void early_enable_iommu(struct amd_iommu > > *iommu) > > static void early_enable_iommus(void) > > { > > struct amd_iommu *iommu; > > + bool is_pre_enabled = false; > > > > - for_each_iommu(iommu) > > - early_enable_iommu(iommu); > > + for_each_iommu(iommu) { > > + if (translation_pre_enabled(iommu)) { > > + is_pre_enabled = true; > > + break; > > + } > > + } > > is_pre_enabled should only be true when _all_ iommus are pre-enabled. If > only one is found disabled just disable the others and continue without > copying the device table. OK, will change as you suggested. >
Re: [PATCH v7 16/26] x86/insn-eval: Support both signed 32-bit and 64-bit effective addresses
On Thu, Jul 27, 2017 at 07:04:52PM -0700, Ricardo Neri wrote: > However using the union could be less readable than having two almost > identical functions. So having some small duplication for the sake of clarity and readability is much better, if you ask me. And it's not like you're duplicating a lot of code - it is only a handful of functions. -- Regards/Gruss, Boris. SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) --
[PATCH] x86/boot: check overlap between kernel and EFI_BOOT_SERVICES_*
On Wed, Jul 26, 2017 at 09:34:32AM +0800, Baoquan He wrote: > On 07/26/17 at 09:13am, Baoquan He wrote: > > On 07/26/17 at 12:12am, Naoya Horiguchi wrote: > > > On Mon, Jul 24, 2017 at 02:20:44PM +0100, Matt Fleming wrote: > > > > On Mon, 10 Jul, at 02:51:36PM, Naoya Horiguchi wrote: > > > > > EFI_BOOT_SERVICES_{CODE|DATA} regions never overlap the kernel now, > > > > > so we can clean up the check in efi_reserve_boot_services(). > > > > > > > > > > Signed-off-by: Naoya Horiguchi > > > > > --- > > > > > arch/x86/platform/efi/quirks.c | 23 +-- > > > > > 1 file changed, 1 insertion(+), 22 deletions(-) > > > > > > > > Is this true for kernels not using KASLR? > > > > > > Thank you for pointing out this. It's not true depending on memmap layout. > > > If a firmware does not define the memory around the kernel address > > > (0x100 or CONFIG_PHYSICAL_START) as EFI_BOOT_SERVICES_*, no overlap > > > happens. That's true in my testing server, but I don't think that we can > > > expect it generally. > > > > > > So I think of adding some assertion in the patch 1/2 to detect this > > > overlap > > > in extract_kernel() even for no KASLR case. > > > > EFI_BOOT_SERVICES_* memory are collected as e820 region of > > E820_TYPE_RAM, how can we guarantee kernel won't use them after jumping > > into the running kernel whether KASLR enabled or not? We can only wish Current kernels have no such check, so it's not guaranteed, I think. And I guess that most firmwares (luckily?) avoid the overlap, and/or if the overlap happens, that might not cause any detectable error. > > that EFI firmware engineer don't put EFI_BOOT_SERVICES_* far from > sorry, typo. I meant EFI boot > service region need be put far from 0x100. Otherwise normal kernel could > allocate memory bottom up and stomp on them. It's embarassment caused by > the hardware flaw of x86 platfrom. > > 0x100 where normal kernel is loaded. 
> > > So I think of adding some assertion in the patch 1/2 to detect this > > > overlap > > > in extract_kernel() even for no KASLR case. Anyway I wrote the following patch adding the assertion as a separate one. I'd be glad to get your feedback or advice. Thanks, Naoya Horiguchi --- From: Naoya Horiguchi Date: Thu, 27 Jul 2017 13:19:35 +0900 Subject: [PATCH] x86/boot: check overlap between kernel and EFI_BOOT_SERVICES_* Even when KASLR is turned off, the kernel could still overlap with an EFI_BOOT_SERVICES_* region, for example because of the firmware memmap layout and/or CONFIG_PHYSICAL_START. So this patch adds an assertion, called in the non-KASLR case as well, that we are free from the overlap. Signed-off-by: Naoya Horiguchi --- arch/x86/boot/compressed/kaslr.c | 3 --- arch/x86/boot/compressed/misc.c | 56 2 files changed, 56 insertions(+), 3 deletions(-) diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c index a6604717d4fe..5549c80b45c2 100644 --- a/arch/x86/boot/compressed/kaslr.c +++ b/arch/x86/boot/compressed/kaslr.c @@ -721,9 +721,6 @@ void choose_random_location(unsigned long input, boot_params->hdr.loadflags |= KASLR_FLAG; - /* Prepare to add new identity pagetables on demand. */ - initialize_identity_maps(); - /* Record the various known unsafe memory ranges. */ mem_avoid_init(input, input_size, *output); diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index a0838ab929f2..b23159fa159c 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -15,6 +15,8 @@ #include "error.h" #include "../string.h" #include "../voffset.h" +#include +#include /* * WARNING!! 
@@ -169,6 +171,55 @@ void __puthex(unsigned long value) } } +#ifdef CONFIG_EFI +bool __init +efi_kernel_boot_services_overlap(unsigned long start, unsigned long size) +{ + int i; + u32 nr_desc; + struct efi_info *e = &boot_params->efi_info; + efi_memory_desc_t *md; + char *signature; + unsigned long pmap; + + signature = (char *)&boot_params->efi_info.efi_loader_signature; + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) && + strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) + return false; + +#ifdef CONFIG_X86 /* Can't handle data above 4GB at this time */ + if (e->efi_memmap_hi) { + warn("Memory map is above 4GB, EFI should be disabled.\n"); + return false; + } + pmap = e->efi_memmap; +#else + pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32)); +#endif + + add_identity_map(pmap, e->efi_memmap_size); + + nr_desc = e->efi_memmap_size / e->efi_memdesc_size; + for (i = 0; i < nr_desc; i++) { + md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size)); + if (md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT) <= start) +
[PATCH V5 2/2] cpufreq: Process remote callbacks from any CPU if the platform permits
On many platforms, CPUs can do DVFS across cpufreq policies, i.e. a CPU from policy-A can change the frequency of CPUs belonging to policy-B. This is quite common on ARM platforms where we don't configure any per-cpu register. Add a flag to identify such platforms and update cpufreq_can_do_remote_dvfs() to allow remote callbacks if this flag is set. Also enable the flag for the cpufreq-dt driver, which is currently used only on ARM platforms. Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq-dt.c | 1 + include/linux/cpufreq.h | 18 -- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c index fef3c2160691..d83ab94d041a 100644 --- a/drivers/cpufreq/cpufreq-dt.c +++ b/drivers/cpufreq/cpufreq-dt.c @@ -274,6 +274,7 @@ static int cpufreq_init(struct cpufreq_policy *policy) transition_latency = CPUFREQ_ETERNAL; policy->cpuinfo.transition_latency = transition_latency; + policy->dvfs_possible_from_any_cpu = true; return 0; diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index b3b6e8203e82..227cd0f13300 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -127,6 +127,15 @@ struct cpufreq_policy { */ unsigned inttransition_delay_us; + /* +* Remote DVFS flag (Not added to the driver structure as we don't want +* to access another structure from scheduler hotpath). +* +* Should be set if CPUs can do DVFS on behalf of other CPUs from +* different cpufreq policies. +*/ + booldvfs_possible_from_any_cpu; + /* Cached frequency lookup from cpufreq_driver_resolve_freq. 
*/ unsigned int cached_target_freq; int cached_resolved_idx; @@ -564,8 +573,13 @@ struct governor_attr { static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy) { - /* Allow remote callbacks only on the CPUs sharing cpufreq policy */ - if (cpumask_test_cpu(smp_processor_id(), policy->cpus)) + /* +* Allow remote callbacks if: +* - dvfs_possible_from_any_cpu flag is set +* - the local and remote CPUs share cpufreq policy +*/ + if (policy->dvfs_possible_from_any_cpu || + cpumask_test_cpu(smp_processor_id(), policy->cpus)) return true; return false; -- 2.13.0.71.gd7076ec9c9cb
[PATCH V5 0/2] sched: cpufreq: Allow remote callbacks
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase [1] to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above-mentioned limitation, this does not occur. This series updates the scheduler core to call the cpufreq callbacks for remote CPUs as well and updates the registered hooks to handle that. This was tested with a couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on an ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while the others didn't show much deviation. The reason is that this patch only targets a corner case, where the following must all be true to improve performance, and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Rebased over: pm/linux-next V4->V5: - Drop cpu field from "struct update_util_data" and add it in "struct sugov_cpu" instead. - Can't have separate patches now because of the above change and so merged all the patches from V4 into a single patch. - Add a comment suggested by PeterZ. - Commit log of 1/2 is improved to contain more details. 
- A new patch (which was posted during V1) is also added to take care of platforms where any CPU can do DVFS on behalf of any other CPU, even if they are part of different cpufreq policies. This has been requested by Saravana several times already, and as the series is quite straightforward now, I decided to include it. V3->V4: - Respect iowait boost flag and util updates for all remote callbacks. - Minor updates in commit log of 2/3. V2->V3: - Rearranged/merged patches as suggested by Rafael (looks much better now) - Also handle new hook added to intel-pstate driver. - The final code remains the same as V2, except for the above hook. V1->V2: - Don't support remote callbacks for unshared cpufreq policies. - Don't support remote callbacks where local CPU isn't part of the target CPU's cpufreq policy. - Dropped dvfs_possible_from_any_cpu flag. -- viresh [1] http://pastebin.com/7LkMSRxE Viresh Kumar (2): sched: cpufreq: Allow remote cpufreq callbacks cpufreq: Process remote callbacks from any CPU if the platform permits drivers/cpufreq/cpufreq-dt.c | 1 + drivers/cpufreq/cpufreq_governor.c | 3 +++ drivers/cpufreq/intel_pstate.c | 8 include/linux/cpufreq.h| 23 +++ kernel/sched/cpufreq_schedutil.c | 31 ++- kernel/sched/deadline.c| 2 +- kernel/sched/fair.c| 8 +--- kernel/sched/rt.c | 2 +- kernel/sched/sched.h | 10 ++ 9 files changed, 70 insertions(+), 18 deletions(-) -- 2.13.0.71.gd7076ec9c9cb
[PATCH V5 1/2] sched: cpufreq: Allow remote cpufreq callbacks
With Android UI and benchmarks the latency of cpufreq response to certain scheduling events can become very critical. Currently, callbacks into cpufreq governors are only made from the scheduler if the target CPU of the event is the same as the current CPU. This means there are certain situations where a target CPU may not run the cpufreq governor for some time. One testcase to show this behavior is where a task starts running on CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the system is configured such that the new tasks should receive maximum demand initially, this should result in CPU0 increasing frequency immediately. But because of the above-mentioned limitation, this does not occur. This patch updates the scheduler core to call the cpufreq callbacks for remote CPUs as well. The schedutil, ondemand and conservative governors are updated to process cpufreq utilization update hooks called for remote CPUs where the remote CPU is managed by the cpufreq policy of the local CPU. The intel_pstate driver is updated to always reject remote callbacks. This was tested with a couple of usecases (Android: hackbench, recentfling, galleryfling, vellamo, Ubuntu: hackbench) on an ARM hikey board (64 bit octa-core, single policy). Only galleryfling showed minor improvements, while the others didn't show much deviation. The reason is that this patch only targets a corner case, where the following must all be true to improve performance, and that doesn't happen too often with these tests: - Task is migrated to another CPU. - The task has high demand, and should take the target CPU to higher OPPs. - And the target CPU doesn't call into the cpufreq governor until the next tick. Based on initial work from Steve Muckle. 
Signed-off-by: Viresh Kumar --- drivers/cpufreq/cpufreq_governor.c | 3 +++ drivers/cpufreq/intel_pstate.c | 8 include/linux/cpufreq.h| 9 + kernel/sched/cpufreq_schedutil.c | 31 ++- kernel/sched/deadline.c| 2 +- kernel/sched/fair.c| 8 +--- kernel/sched/rt.c | 2 +- kernel/sched/sched.h | 10 ++ 8 files changed, 55 insertions(+), 18 deletions(-) diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index eed069ecfd5e..58d4f4e1ad6a 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -272,6 +272,9 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time, struct policy_dbs_info *policy_dbs = cdbs->policy_dbs; u64 delta_ns, lst; + if (!cpufreq_can_do_remote_dvfs(policy_dbs->policy)) + return; + /* * The work may not be allowed to be queued up right now. * Possible reasons: diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 8bc252512dbe..d9de01399dbb 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1747,6 +1747,10 @@ static void intel_pstate_update_util_pid(struct update_util_data *data, struct cpudata *cpu = container_of(data, struct cpudata, update_util); u64 delta_ns = time - cpu->sample.time; + /* Don't allow remote callbacks */ + if (smp_processor_id() != cpu->cpu) + return; + if ((s64)delta_ns < pid_params.sample_rate_ns) return; @@ -1764,6 +1768,10 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time, struct cpudata *cpu = container_of(data, struct cpudata, update_util); u64 delta_ns; + /* Don't allow remote callbacks */ + if (smp_processor_id() != cpu->cpu) + return; + if (flags & SCHED_CPUFREQ_IOWAIT) { cpu->iowait_boost = int_tofp(1); } else if (cpu->iowait_boost) { diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 5f40522ec98c..b3b6e8203e82 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -562,6 +562,15 @@ struct governor_attr { size_t 
count); }; +static inline bool cpufreq_can_do_remote_dvfs(struct cpufreq_policy *policy) +{ + /* Allow remote callbacks only on the CPUs sharing cpufreq policy */ + if (cpumask_test_cpu(smp_processor_id(), policy->cpus)) + return true; + + return false; +} + /* * FREQUENCY TABLE HELPERS * */ diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 9deedd5f16a5..5465bf221e8f 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -52,6 +52,7 @@ struct sugov_policy { struct sugov_cpu { struct update_util_data update_util; struct sugov_policy *s
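The policy-sharing test that cpufreq_can_do_remote_dvfs() introduces above can be sketched in plain userspace C. This is only an illustrative model, not kernel code: the flat bitmask below stands in for the kernel's struct cpufreq_policy and its ->cpus cpumask.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for struct cpufreq_policy: bit N of 'cpus' is set
 * when CPU N shares this frequency domain. */
struct policy_model {
	unsigned long cpus;
};

/* Mirrors the patch's cpufreq_can_do_remote_dvfs(): a scheduler hook that
 * fires on 'current_cpu' may update the governor only when that CPU shares
 * the cpufreq policy of the target CPU. */
static bool can_do_remote_dvfs(const struct policy_model *policy,
			       int current_cpu)
{
	return (policy->cpus >> current_cpu) & 1UL;
}
```

With a quad-core single-policy system like the Hikey above, any of the four CPUs may update the governor for any other, while a CPU outside the policy is rejected; intel_pstate goes further and rejects every remote caller via its smp_processor_id() != cpu->cpu check.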
Re: [PATCH v2 05/11] net: stmmac: dwmac-rk: Add internal phy support
Hi Andrew,

On 2017/7/27 21:48, Andrew Lunn wrote: I think we need to discuss this. This PHY appears to be on an MDIO bus, it uses a standard PHY driver, and it appears to be using an RMII interface. So it is just an ordinary PHY. Internal is supposed to be something which is not ordinary, does not use one of the standard phy modes, needs something special to make it work. Yes, it is an ordinary PHY in fact, using the MDIO bus, but it is an internal PHY inside the SoC, so this "internal" is not the "internal" Florian was referring to.
Re: [PATCH v7] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions
* Baoquan He wrote: > Currently KASLR will parse all e820 entries of RAM type and add all > candidate positions into the slots array. Then we will choose one slot > randomly as the new position into which the kernel will be decompressed > and run. > > On systems with EFI enabled, the e820 memory regions come from EFI > memory regions by combining adjacent regions. These EFI memory > regions carry extra attributes to mark their different uses, and the mirror > attribute is one such kind. Physical memory regions whose descriptors > in the EFI memory map have the EFI_MEMORY_MORE_RELIABLE attribute (bit 16) are > mirrored. The kernel's address range mirroring feature arranges such > mirror regions into the normal zone and other regions into the movable zone. And > with the mirroring feature enabled, the code and data of the kernel can only be > located in the more reliable mirror regions. However, the current KASLR code > doesn't check EFI memory entries, and could choose a new position in a > non-mirrored region. This would break the functionality of the address > range mirroring feature. > > So if EFI is detected, iterate the EFI memory map and pick only mirror regions > when adding randomization slot candidates. If EFI is disabled > or no mirror region is found, process the e820 memory map as before. > > Signed-off-by: Baoquan He > --- > v6->v7: > Ingo pointed out several incorrect line break issues and unclear > descriptions in the patch log. Correct them and rewrite the patch log. > > Also rewrite the EFI warning message emitted when the EFI memmap is above > 4G on a 32-bit system, since a 32-bit system cannot handle data above 4G at > the kernel decompression stage. This was suggested by Ingo too. > > v5->v6: > Code style fix according to Kees's comment. > > This is based on tip/x86/boot; patches 1,2,3/4 of the v5 post have > been put into tip/x86/boot now.
> > arch/x86/boot/compressed/kaslr.c | 68 > ++-- > 1 file changed, 66 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/boot/compressed/kaslr.c > b/arch/x86/boot/compressed/kaslr.c > index 99c7194f7ea6..e1dd59ea8a8f 100644 > --- a/arch/x86/boot/compressed/kaslr.c > +++ b/arch/x86/boot/compressed/kaslr.c > @@ -37,7 +37,9 @@ > #include > #include > #include > +#include > #include > +#include > > /* Macros used by the included decompressor code below. */ > #define STATIC > @@ -558,6 +560,65 @@ static void process_mem_region(struct mem_vector *entry, > } > } > > +#ifdef CONFIG_EFI > +/* > + * Returns true if mirror region found (and must have been processed > + * for slots adding) > + */ > +static bool > +process_efi_entries(unsigned long minimum, unsigned long image_size) > +{ > + struct efi_info *e = &boot_params->efi_info; > + bool efi_mirror_found = false; > + struct mem_vector region; > + efi_memory_desc_t *md; > + unsigned long pmap; > + char *signature; > + u32 nr_desc; > + int i; > + > + signature = (char *)&e->efi_loader_signature; > + if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) && > + strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) > + return false; > + > +#ifdef CONFIG_X86_32 > + /* Can't handle data above 4GB at this time */ > + if (e->efi_memmap_hi) { > + warn("EFI memmap is above 4GB, can't be handled now on x86_32. 
> EFI should be disabled.\n"); > + return false; > + } > + pmap = e->efi_memmap; > +#else > + pmap = (e->efi_memmap | ((__u64)e->efi_memmap_hi << 32)); > +#endif > + > + nr_desc = e->efi_memmap_size / e->efi_memdesc_size; > + for (i = 0; i < nr_desc; i++) { > + md = (efi_memory_desc_t *)(pmap + (i * e->efi_memdesc_size)); > + if (md->attribute & EFI_MEMORY_MORE_RELIABLE) { > + region.start = md->phys_addr; > + region.size = md->num_pages << EFI_PAGE_SHIFT; > + process_mem_region(®ion, minimum, image_size); > + efi_mirror_found = true; > + > + if (slot_area_index == MAX_SLOT_AREA) { > + debug_putstr("Aborted EFI scan (slot_areas > full)!\n"); > + break; > + } > + } > + } So I suggested this before: if you treat 'md' as an array of elements (which it is), then the type cast can be moved to a more natural point, where we do address calculations anyway: md = (efi_memory_desc_t *)(e->efi_memmap | ((__u64)e->efi_memmap_hi << 32))); The 'pmap' variable can be eliminated and all places in the loop that use 'md->' can use 'md[i].'. Thanks, Ingo
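Ingo's suggestion can be illustrated with a small userspace model. This is not the kernel code: struct md_model is a simplified stand-in for efi_memory_desc_t, and only the EFI_MEMORY_MORE_RELIABLE bit is taken from the patch. Note the byte-stride walk: the firmware's efi_memdesc_size may be larger than the C struct, so plain md[i] indexing is only valid when the stride equals sizeof(*md).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_MEMORY_MORE_RELIABLE (1ULL << 16)

/* Simplified stand-in for efi_memory_desc_t. */
struct md_model {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Walk a memory map with a runtime descriptor stride and count the
 * mirrored regions, mimicking the loop in process_efi_entries(). */
static int count_mirrored(const void *map, size_t map_size, size_t desc_size)
{
	size_t i, nr_desc = map_size / desc_size;
	int found = 0;

	for (i = 0; i < nr_desc; i++) {
		const struct md_model *md = (const struct md_model *)
			((const char *)map + i * desc_size);

		if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
			found++;
	}
	return found;
}
```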
Re: [PATCH v3] cpu_pm: replace raw_notifier to atomic_notifier
* Alex Shi [170727 18:42]: > Tony Lezcano found a misuse where rcu_read_lock is used after rcu_idle_enter. > Paul E. McKenney suggested trying RCU_NONIDLE. Hmm, I think you have a hybrid name typo above, in case you mean me :) Tony
[PATCH 0/3] fix several TLB batch races
Nadav and Mel found several subtle races caused by TLB batching. This patchset aims to solve those problems by embedding [set|clear]_tlb_flush_pending in the TLB batching API. With that, places that need to know about pending TLB flushes can catch them by using mm_tlb_flush_pending. Each patch includes a detailed description.

This patchset is based on v4.13-rc2-mmots-2017-07-26-16-16 + revert: "mm: prevent racy access to tlb_flush_pending" + adding: "[PATCH v3 0/2] mm: fixes of tlb_flush_pending races".

Minchan Kim (3): mm: make tlb_flush_pending global mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem mm: fix KSM data corruption arch/arm/include/asm/tlb.h | 15 ++- arch/ia64/include/asm/tlb.h | 12 arch/s390/include/asm/tlb.h | 15 +++ arch/sh/include/asm/tlb.h | 4 +++- arch/um/include/asm/tlb.h | 8 fs/proc/task_mmu.c | 4 +++- include/linux/mm_types.h | 22 +- kernel/fork.c | 2 -- mm/debug.c | 2 -- mm/ksm.c | 3 ++- mm/memory.c | 24 11 files changed, 74 insertions(+), 37 deletions(-) -- 2.7.4
[PATCH 3/3] mm: fix KSM data corruption
Nadav reported that KSM can corrupt user data via the TLB batching race[1]. That means data the user has written can be lost.

Quote from Nadav Amit:
"
For this race we need 4 CPUs:

CPU0: Caches a writable and dirty PTE entry, and uses the stale value for a write later.
CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes.
CPU2: Writes 4 to /proc/PID/clear_refs, clearing the PTEs' soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes.
CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page():

if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)))

since it will then avoid the TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page.

Note that all the operations CPU1-3 perform can happen in parallel since they only acquire mmap_sem for read.

We start with two identical pages. Everything below regards the same page/PTE:

CPU0: writes the same value to the page [caches the PTE as dirty in the TLB]
CPU1: MADV_FREE -> pte_mkclean()
CPU2: 4 > clear_refs -> pte_wrprotect()
CPU3: write_protect_page() [success, no flush], pages_identical() [ok]
CPU0: writes a different value to the page [OK, using the stale PTE]
CPU3: replace_page()

Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost.
"

In the above scenario, MADV_FREE is fixed by changing the TLB batching API to include [set|clear]_tlb_flush_pending. What remains is the soft-dirty part. This patch changes the soft-dirty code to use the TLB batching API instead of flush_tlb_mm, and makes KSM check for pending TLB flushes by using mm_tlb_flush_pending, so that it flushes the TLB and avoids data loss when other parallel threads have TLB flushes pending.
[1] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com

Note: I failed to reproduce this problem with Nadav's test program, which needs its timing tuned to my system's speed, so I could not confirm that this works. Nadav, could you test this patch on your test machine? Thanks!

Cc: Nadav Amit Cc: Mel Gorman Cc: Hugh Dickins Cc: Andrea Arcangeli Signed-off-by: Minchan Kim --- fs/proc/task_mmu.c | 4 +++- mm/ksm.c | 3 ++- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 35be35e05153..583fc50eb36d 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -1019,6 +1019,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, enum clear_refs_types type; int itype; int rv; + struct mmu_gather tlb; memset(buffer, 0, sizeof(buffer)); if (count > sizeof(buffer) - 1) @@ -1063,6 +1064,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, } down_read(&mm->mmap_sem); + tlb_gather_mmu(&tlb, mm, 0, -1); if (type == CLEAR_REFS_SOFT_DIRTY) { for (vma = mm->mmap; vma; vma = vma->vm_next) { if (!(vma->vm_flags & VM_SOFTDIRTY)) @@ -1084,7 +1086,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf, walk_page_range(0, mm->highest_vm_end, &clear_refs_walk); if (type == CLEAR_REFS_SOFT_DIRTY) mmu_notifier_invalidate_range_end(mm, 0, -1); - flush_tlb_mm(mm); + tlb_finish_mmu(&tlb, 0, -1); up_read(&mm->mmap_sem); out_mm: mmput(mm); diff --git a/mm/ksm.c b/mm/ksm.c index 4dc92f138786..d3b1c70aac18 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -1038,7 +1038,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page, goto out_unlock; if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || - (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) { + (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)) || + mm_tlb_flush_pending(mm, true)) { pte_t entry; swapped = PageSwapCache(page); -- 2.7.4
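The decision the mm/ksm.c hunk changes can be reduced to a small predicate. The sketch below is a userspace model under invented names — struct pte_model is not the kernel's PTE — and only shows the shape of the check: KSM may skip the flush only when the PTE is neither writable nor dirty and no batched flush is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-in for the interesting PTE bits. */
struct pte_model {
	bool writable;
	bool dirty;
};

/* Models the post-patch condition in write_protect_page(): flush before
 * sharing the page unless the PTE is clean, read-only, and no other
 * thread has a TLB flush pending (the mm_tlb_flush_pending() addition). */
static bool must_flush_before_sharing(struct pte_model pte,
				      int tlb_flush_pending)
{
	return pte.writable || pte.dirty || tlb_flush_pending > 0;
}
```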
[PATCH 2/3] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Nadav reported that parallel MADV_DONTNEED on the same range has a stale TLB problem; Mel fixed it[1] and found the same problem in MADV_FREE[2].

Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time in the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening."

This patch aims to solve both problems at once and also prepares for the related KSM, MADV_FREE and soft-dirty story[3]. The TLB batch API (tlb_[gather|finish]_mmu) uses [set|clear]_tlb_flush_pending and mm_tlb_flush_pending so that when tlb_finish_mmu is called, we can detect that parallel threads are operating on the same range. In that case, flush the TLB to prevent the user from accessing memory via a stale TLB entry even though we failed to gather any PTE entries.

I confirmed this patch works with the test program Nadav gave[4], so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in current mmotm.

NOTE: This patch modifies the arch-specific TLB gathering interface (x86, ia64, s390, sh, um). Most architectures seem straightforward, but s390 needs care because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero, which happens only when a PTE entry is really cleared by ptep_get_and_clear and friends. However, this problem does not change the PTE entries but still needs the flush to prevent memory access through a stale TLB. Any thoughts?
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzz...@techsingularity.net [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrf...@suse.de [3] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com [4] https://patchwork.kernel.org/patch/9861621/ Cc: Ingo Molnar Cc: x...@kernel.org Cc: Russell King Cc: linux-arm-ker...@lists.infradead.org Cc: Tony Luck Cc: linux-i...@vger.kernel.org Cc: Martin Schwidefsky Cc: "David S. Miller" Cc: Heiko Carstens Cc: linux-s...@vger.kernel.org Cc: Yoshinori Sato Cc: linux...@vger.kernel.org Cc: Jeff Dike Cc: user-mode-linux-de...@lists.sourceforge.net Cc: linux-a...@vger.kernel.org Cc: Nadav Amit Reported-by: Mel Gorman Signed-off-by: Minchan Kim --- arch/arm/include/asm/tlb.h | 15 ++- arch/ia64/include/asm/tlb.h | 12 arch/s390/include/asm/tlb.h | 15 +++ arch/sh/include/asm/tlb.h | 4 +++- arch/um/include/asm/tlb.h | 8 include/linux/mm_types.h| 7 +-- mm/memory.c | 24 7 files changed, 69 insertions(+), 16 deletions(-) diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h index 3f2eb76243e3..8c26961f0503 100644 --- a/arch/arm/include/asm/tlb.h +++ b/arch/arm/include/asm/tlb.h @@ -163,13 +163,26 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start #ifdef CONFIG_HAVE_RCU_TABLE_FREE tlb->batch = NULL; #endif + set_tlb_flush_pending(tlb->mm); } static inline void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end) { - tlb_flush_mmu(tlb); + /* +* If there are parallel threads are doing PTE changes on same range +* under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB +* flush by batching, a thread has stable TLB entry can fail to flush +* the TLB by observing pte_none|!pte_dirty, for example so flush TLB +* if we detect parallel PTE batching threads. 
+*/ + if (mm_tlb_flush_pending(tlb->mm, false) > 1) { + tlb->range_start = start; + tlb->range_end = end; + } + tlb_flush_mmu(tlb); + clear_tlb_flush_pending(tlb->mm); /* keep the page table cache within bounds */ check_pgt_cache(); diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h index fced197b9626..22fe976a4693 100644 --- a/arch/ia64/include/asm/tlb.h +++ b/arch/ia64/include/asm/tlb.h @@ -178,6 +178,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start tlb->start = start; tlb->end = end; tlb->start_addr = ~0UL; + set_tlb_flush_pending(tlb->mm); } /* @@ -188,10 +189,21 @@ static inline void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end) { /* +* If there are parallel threads are doing PTE changes on same range +* under non-exclusive lock(e.g., m
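The tlb_gather_mmu()/tlb_finish_mmu() handshake added by this patch can be modeled in userspace with a C11 atomic counter. This is a sketch, not the kernel implementation: struct mm_model stands in for mm_struct, and the memory barriers the real code relies on are glossed over.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for mm_struct's tlb_flush_pending counter. */
struct mm_model {
	atomic_int tlb_flush_pending;
};

/* tlb_gather_mmu(): announce that this thread batches TLB flushes. */
static void model_tlb_gather(struct mm_model *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* tlb_finish_mmu(): a count above 1 means another thread is batching on
 * the same mm, so a flush must be forced even if this thread gathered no
 * PTEs. Returns whether the flush was forced. */
static bool model_tlb_finish(struct mm_model *mm)
{
	bool force_flush = atomic_load(&mm->tlb_flush_pending) > 1;

	atomic_fetch_sub(&mm->tlb_flush_pending, 1);
	return force_flush;
}
```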
[PATCH 1/3] mm: make tlb_flush_pending global
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING|COMPACTION], but upcoming patches that solve a subtle TLB flush batching problem will use it regardless of compaction/NUMA, so this patch removes that dependency.

Cc: Nadav Amit Cc: Mel Gorman Signed-off-by: Minchan Kim --- include/linux/mm_types.h | 15 --- kernel/fork.c | 2 -- mm/debug.c | 2 -- 3 files changed, 19 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 4b9a625c370c..6953d2c706fe 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -487,14 +487,12 @@ struct mm_struct { /* numa_scan_seq prevents two threads setting pte_numa */ int numa_scan_seq; #endif -#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) /* * An operation with batched TLB flushing is going on. Anything that * can move process memory needs to flush the TLB when moving a * PROT_NONE or PROT_NUMA mapped page. */ atomic_t tlb_flush_pending; -#endif #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH /* See flush_tlb_batched_pending() */ bool tlb_flush_batched; @@ -522,7 +520,6 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm) return mm->cpu_vm_mask_var; } -#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) /* * Memory barriers to keep this state in sync are graciously provided by * the page table locks, outside of which no page table modifications happen.
@@ -565,18 +562,6 @@ static inline void clear_tlb_flush_pending(struct mm_struct *mm) smp_mb__before_atomic(); atomic_dec(&mm->tlb_flush_pending); } -#else -static inline bool mm_tlb_flush_pending(struct mm_struct *mm, bool pt_locked) -{ - return false; -} -static inline void set_tlb_flush_pending(struct mm_struct *mm) -{ -} -static inline void clear_tlb_flush_pending(struct mm_struct *mm) -{ -} -#endif struct vm_fault; diff --git a/kernel/fork.c b/kernel/fork.c index aaf4d70afd8b..7e9f42060976 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -807,9 +807,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm_init_aio(mm); mm_init_owner(mm, p); mmu_notifier_mm_init(mm); -#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) atomic_set(&mm->tlb_flush_pending, 0); -#endif #if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS mm->pmd_huge_pte = NULL; #endif diff --git a/mm/debug.c b/mm/debug.c index d70103bb4731..18a9b15b1e37 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -158,9 +158,7 @@ void dump_mm(const struct mm_struct *mm) #ifdef CONFIG_NUMA_BALANCING mm->numa_next_scan, mm->numa_scan_offset, mm->numa_scan_seq, #endif -#if defined(CONFIG_NUMA_BALANCING) || defined(CONFIG_COMPACTION) atomic_read(&mm->tlb_flush_pending), -#endif mm->def_flags, &mm->def_flags ); } -- 2.7.4
[PATCH] ARM: rockchip: enable ZONE_DMA for non 64-bit capable peripherals
Most IP cores on ARM Rockchip platforms can only address 32 bits of physical memory for DMA. Thus ZONE_DMA should be enabled when LPAE is activated. Signed-off-by: Tao Huang --- arch/arm/mach-rockchip/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm/mach-rockchip/Kconfig b/arch/arm/mach-rockchip/Kconfig index 9ad84cd01ba0..fafd3d7f9f8c 100644 --- a/arch/arm/mach-rockchip/Kconfig +++ b/arch/arm/mach-rockchip/Kconfig @@ -16,6 +16,7 @@ config ARCH_ROCKCHIP select ROCKCHIP_TIMER select ARM_GLOBAL_TIMER select CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK + select ZONE_DMA if ARM_LPAE help Support for Rockchip's Cortex-A9 Single-to-Quad-Core-SoCs containing the RK2928, RK30xx and RK31xx series. -- 2.7.4
Re: [RFC PATCH 0/8] EDAC, mce_amd: Add a tracepoint for the decoded error
* Borislav Petkov wrote: > BUT(!), I just realized, I *think* I can address this much more elegantly: > extend trace_mce_record() by adding the decoded string as its last argument. > And that's fine, I'm being told, because adding arguments to tracepoints is not > a big deal, removing them is hard. And actually, we have added args before, come > to think of it: Yeah, structured, append-only ABIs are elegant - that's what perf uses too. > I agree with the rest but you're obviously preaching to the choir. > > :-) Had to vent my (non kernel tree integrated) Linux tooling frustration!! ;-) Thanks, Ingo
Re: [PATCH -tip v5 1/2] irq: Introduce CONFIG_IRQENTRY kconfig
* Masami Hiramatsu wrote: > Introduce CONFIG_IRQENTRY to simplify generating the > irqentry and softirqentry text sections. > Currently, generating those sections depends on > CONFIG_FUNCTION_GRAPH_TRACER and/or CONFIG_KASAN in > each source file. This moves those #ifdef dependencies > into Kconfig, instead of the actual code. This makes > it scalable for other users of the irqentry section. Please just make it unconditional. That would remove a number of messy #ifdefs, and extra sections/symbols don't have much if any cost. Thanks, Ingo
[PATCH 2/2] init/main.c: Fixes quoted string split across lines & missing blank line after declaration
Signed-off-by: Janani S --- init/main.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/init/main.c b/init/main.c index f8eb4966..920b829 100644 --- a/init/main.c +++ b/init/main.c @@ -176,6 +176,7 @@ static bool __init obsolete_checksetup(char *line) p = __setup_start; do { int n = strlen(p->str); + if (parameqn(line, p->str, n)) { if (p->early) { /* Already done in parse_early_param? @@ -302,6 +303,7 @@ static int __init unknown_bootoption(char *param, char *val, if (val) { /* Environment option */ unsigned int i; + for (i = 0; envp_init[i]; i++) { if (i == MAX_INIT_ENVS) { panic_later = "env"; @@ -314,6 +316,7 @@ static int __init unknown_bootoption(char *param, char *val, } else { /* Command line option */ unsigned int i; + for (i = 0; argv_init[i]; i++) { if (i == MAX_INIT_ARGS) { panic_later = "init"; @@ -1020,7 +1023,7 @@ static int __ref kernel_init(void *unused) !try_to_run_init_process("/bin/sh")) return 0; - panic("No working init found. Try passing init= option to kernel. " + panic("No working init found. Try passing init= option to kernel.\n" "See Linux Documentation/admin-guide/init.rst for guidance."); } -- 1.9.1
Re: [RFC PATCH 3/3] mm: shm: Use new hugetlb size encoding definitions
On Thu 27-07-17 14:18:11, Mike Kravetz wrote: > On 07/27/2017 12:50 AM, Michal Hocko wrote: > > On Wed 26-07-17 10:39:30, Mike Kravetz wrote: > >> On 07/26/2017 03:07 AM, Michal Hocko wrote: > >>> On Wed 26-07-17 11:53:38, Michal Hocko wrote: > On Mon 17-07-17 15:28:01, Mike Kravetz wrote: > > Use the common definitions from hugetlb_encode.h header file for > > encoding hugetlb size definitions in shmget system call flags. In > > addition, move these definitions to the from the internal to user > > (uapi) header file. > > s@to the from@from@ > > > > > Suggested-by: Matthew Wilcox > > Signed-off-by: Mike Kravetz > > with s@HUGETLB_FLAG_ENCODE__16GB@HUGETLB_FLAG_ENCODE_16GB@ > > Acked-by: Michal Hocko > >>> > >>> Btw. man page mentions only 2MB and 1GB, we should document others and > >>> note that each arch might support only subset of them > >> > >> Thanks for looking at these Michal. > >> BTW, those definitions below are wrong. They should be SHM_HUGE_*. :( > > > > Ups, and I completely missed that. > > > >> In the overview of this RFC, I mentioned still needing to address the > >> comment from Aneesh about splitting SHM_HUGE_* definitions into arch > >> specific header files. This is how it is done for mmap. If an arch > >> supports multiple huge page sizes, the 'asm/mman.h' contains definitions > >> for those sizes. There will be a bit of churn (such as header file > >> renaming) to do this for shm as well. So, I keep going back and forth > >> asking myself 'is it worth it'? > > > > Why cannot we use a generic header? Btw. I think it would be better for > > MMAP definitions as well. > > I assume you are asking about a uapi asm-generic header file? Currently > mmap has two such files: mman.h and mman-common.h. In order to get the > definitions in such files, arch specific header files must #include the > asm-generic headers. There are arch specific mmap headers today that do > not include either of the asm-generic headers. 
And, they have their own > definitions for MAP_HUGE_SHIFT. So, it seems we cannot use one of the > existing mmap asm-generic header files. Rather, we would need to create > a new one and have that included by all arch specific files. yes, add a new one like you did in your first patch > However, ALL the MAP_HUGE_* definitions in all the arch specific and > asm-generic header files are the same. It would be possible to just put > all those MAP_HUGE_* definitions in the primary uapi header file > (include/uapi/linux/mman.h). If there was ever a need for arch specific > values in the future, we could split them out at that time. agreed [...] > >> - Another alternative is to make all known huge page sizes available > >> to all users. This is 'easier' as the definitions can likely reside > >> in a common header file. The user will need to determine what > >> huge page sizes are supported by the running kernel as mentioned in > >> the man page. > > > > yes I think this makes more sense. > > Ok, thanks. > > The only remaining question is what kind of common header to use: > 1) An asm-generic header file in case there may be arch specific differences > in the future. > 2) Use the primary uapi header file in include/uapi/linux/mman|shm.h. I would use the primary one and only go the arch-specific route if we ever need to do arch-specific things. -- Michal Hocko SUSE Labs
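The encoding this thread is unifying stores log2 of the huge page size in a 6-bit field of the flags word, shifted by 26 — the same scheme as mmap's MAP_HUGE_* constants that the SHM_HUGE_* flags mirror. A minimal userspace sketch of the scheme (the helper names are invented for illustration):

```c
#include <assert.h>

#define HUGE_SHIFT 26    /* matches MAP_HUGE_SHIFT / SHM_HUGE_SHIFT */
#define HUGE_MASK  0x3fU /* 6-bit log2(size) field */

/* Encode a huge page size, given as log2(bytes), into the flags field. */
static unsigned int encode_huge_size(unsigned int log2_size)
{
	return (log2_size & HUGE_MASK) << HUGE_SHIFT;
}

/* Recover log2(bytes) from a flags word. */
static unsigned int decode_huge_log2(unsigned int flags)
{
	return (flags >> HUGE_SHIFT) & HUGE_MASK;
}
```

A 2MB page thus encodes as 21 << 26 and a 1GB page as 30 << 26, which is why a single generic header can describe every architecture's sizes: only which log2 values the running kernel supports varies.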
Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
On 28/07/2017 04:31, Longpeng (Mike) wrote: > Hi Paolo, > > On 2017/6/6 18:57, Paolo Bonzini wrote: > >> In some cases, for example involving hot-unplug of assigned >> devices, pi_post_block can forget to remove the vCPU from the >> blocked_vcpu_list. When this happens, the next call to >> pi_pre_block corrupts the list. >> >> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block >> and WARN instead of adding the element twice in the list. Second, >> always do the list removal in pi_post_block if vcpu->pre_pcpu is >> set (not -1). >> >> The new code keeps interrupts disabled for the whole duration of >> pi_pre_block/pi_post_block. This is not strictly necessary, but >> easier to follow. For the same reason, PI.ON is checked only >> after the cmpxchg, and to handle it we just call the post-block >> code. This removes duplication of the list removal code. >> >> Cc: Longpeng (Mike) >> Cc: Huangweidong >> Cc: Gonglei >> Cc: wangxin >> Cc: Radim Krčmář >> Signed-off-by: Paolo Bonzini >> --- >> arch/x86/kvm/vmx.c | 62 >> ++ >> 1 file changed, 25 insertions(+), 37 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 747d16525b45..0f4714fe4908 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu) >> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu); >> struct pi_desc old, new; >> unsigned int dest; >> -unsigned long flags; >> >> do { >> old.control = new.control = pi_desc->control; >> +WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR, >> + "Wakeup handler not enabled while the VCPU is blocked\n"); >> >> dest = cpu_physical_id(vcpu->cpu); >> >> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu) >> } while (cmpxchg(&pi_desc->control, old.control, >> new.control) != old.control); >> >> -if(vcpu->pre_pcpu != -1) { >> -spin_lock_irqsave( >> -&per_cpu(blocked_vcpu_on_cpu_lock, >> -vcpu->pre_pcpu), flags); >> +if 
(!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) { > > > __pi_post_block is only called by pi_post_block/pi_pre_block now, it seems > that > both of them would make sure "vcpu->pre_pcpu != -1" before __pi_post_block is > called, so maybe the above check is useless, right? It's because a WARN is better than a double-add. And even if the caller broke the invariant you'd have to do the cmpxchg loop above to make things not break too much. Paolo
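The cmpxchg loop in __pi_post_block() follows the classic read/modify/compare-exchange pattern: rebuild the new control word from a snapshot, then retry until no concurrent writer raced in between. Below is a hedged userspace sketch using C11 atomics — the 8-bit notification-vector field is invented for illustration and is not the real posted-interrupt descriptor layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NV_MASK 0xffULL /* invented: low byte holds the vector */

/* Atomically replace the notification vector field, preserving the rest
 * of the control word, in the style of the loop in __pi_post_block(). */
static void set_notification_vector(_Atomic uint64_t *control,
				    uint8_t vector)
{
	uint64_t old = atomic_load(control);
	uint64_t new;

	do {
		new = (old & ~NV_MASK) | vector;
		/* On failure, 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(control, &old, new));
}
```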
Re: [PATCH v1 2/2] acpi, x86: Remove encryption mask from ACPI page protection type
* Tom Lendacky wrote: > > > + * in memory in an encrypted state so return a protection attribute > > > + * that does not have the encryption bit set. > > >*/ > > > - return PAGE_KERNEL; > > > + return sme_active() ? PAGE_KERNEL_IO : PAGE_KERNEL; > > > > Why isn't there a PAGE_KERNEL_NOENC define which you can simply return > > instead of testing? > > Sounds like something I should add to pgtable_types.h (which has a > #define for PAGE_KERNEL_EXEC_NOENC, but not PAGE_KERNEL_NOENC). I'll > create that #define. > > As for the sme_active() check I was getting ahead of myself since > under SEV the encryption mask is needed. I'll change it to just > return PAGE_KERNEL_NOENC and then worry about the SEV change in > the SEV patches. Ok, that works for me too, as we at least don't sprinke the code with repeated sme_active() toggles. Thanks, Ingo
Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap
On Thu 27-07-17 16:55:59, Andrea Arcangeli wrote: > On Thu, Jul 27, 2017 at 08:50:24AM +0200, Michal Hocko wrote: > > Yes this will work and it won't depend on the oom_lock. But isn't it > > just more ugly than simply doing > > > > if (tsk_is_oom_victim) { > > down_write(&mm->mmap_sem); > > locked = true; > > } > > free_pgtables(...) > > [...] > > if (locked) > > up_write(&mm->mmap_sem); > > To me not doing if (tsk_is_oom...) { down_write; up_write } is by > default a confusing implementation, because it's not strict and not > strict code is not self documenting and you've to think twice of why > you're doing something the way you're doing it. I disagree. But this is a matter of taste, I guess. I prefer when locking is explicit and clear about the scope. An empty locked region used for synchronization is just obscure and you have to think much more about what it means when you can see what the lock actually protects. It is also less error prone, I believe. > The doubt on what was the point to hold the mmap_sem during > free_pgtables is precisely why I started digging into this issue > because it didn't look possible you could truly benefit from holding > the mmap_sem during free_pgtables. The point is that you cannot remove page tables while somebody is walking them. This is enforced in all other paths except for exit, which was kind of special because nobody could race there. > I also don't like having a new invariant that your solution relies on, > that is mm->mmap = NULL, when we can just set MMF_OOM_SKIP, which gets > set anyway, a bit earlier and use that to control the other > side of the race. Well, again a matter of taste. I prefer making it clear that mm->mmap is no longer valid because it points to a freed pointer. Relying on MMF_OOM_SKIP for the handshake is just abusing the flag for a different purpose than it was intended.
> I like strict code that uses as fewer invariants as possible and that > never holds a lock for any instruction more than it is required (again > purely for self documenting reasons, the CPU won't notice much one > instruction more or less). > > Even with your patch the two branches are unnecessary, that may not be > measurable, but it's still wasted CPU. It's all about setting mm->mmap > before the up_write. In fact my patch should at least put an incremental > unlikely around my single branch added to exit_mmap. > > I see the {down_write;up_write} Hugh's ksm_exit-like as a strict > solution to this issue and I wrote it specifically while trying to > research a way to be more strict because from the start it didn't look > the holding of the mmap_sem during free_pgtables was necessary. While we have discussed that with Hugh he has shown an interest to actually get rid of this (peculiar) construct which would be possible if the down_write was implicit in exit_mmap [1]. > I'm also fine to drop the oom_lock but I think it can be done > incrementally as it's a separate issue, my second patch should allow > for it with no adverse side effects. > > All I care about is the exit_mmap path because it runs too many times > not to pay deep attention to every bit of it ;). Well, all I care about is to fix this over-eager oom killing due to oom reaper setting MMF_OOM_SKIP too early. That is a regression. I think we can agree that both proposed solutions are functionally equivalent. I do not want to push mine too hard so if you _believe_ that yours is really better feel free to post it for inclusion. [1] http://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils -- Michal Hocko SUSE Labs
Re: blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage
Jens Axboe writes: > On 07/27/2017 08:47 AM, Bart Van Assche wrote: >> On Thu, 2017-07-27 at 08:02 -0600, Jens Axboe wrote: >>> The bug looks like SCSI running the queue inline from IRQ >>> context, that's not a good idea. ... >> >> scsi_run_queue() works fine if no scheduler is configured. Additionally, that >> code predates the introduction of blk-mq I/O schedulers. I think it is >> nontrivial for block driver authors to figure out that a queue has to be run >> from process context if a scheduler has been configured that does not support >> to be run from interrupt context. > > No it doesn't, you could never run the queue from interrupt context with > async == false. So I don't think that's confusing at all, you should > always be aware of the context. > >> How about adding WARN_ON_ONCE(in_interrupt()) to >> blk_mq_start_hw_queue() or replacing the above patch by the following: > > No, I hate having dependencies like that, because they always just catch > one of them. Looks like the IPR path that hits this should just offload > to a workqueue or similar, you don't have to make any scsi_run_queue() > async. OK, so the resolution is "fix it in IPR" ? cheers
Re: [PATCH v2] qe: fix compile issue for arm64
Zhao Qiang writes: > Signed-off-by: Zhao Qiang > --- > Changes for v2: > - include all Errata QE_General4 in #ifdef > > drivers/soc/fsl/qe/qe.c | 2 ++ > 1 file changed, 2 insertions(+) AFAICS this driver can only be built on PPC, what am I missing? config QUICC_ENGINE bool "Freescale QUICC Engine (QE) Support" depends on FSL_SOC && PPC32 cheers
[PATCH] init:main.c: Fixed issues in Block comments and removed braces in single statement if block
Signed-off-by: Janani S --- init/main.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/init/main.c b/init/main.c index 052481f..f8eb4966 100644 --- a/init/main.c +++ b/init/main.c @@ -181,7 +181,8 @@ static bool __init obsolete_checksetup(char *line) /* Already done in parse_early_param? * (Needs exact match on param part). * Keep iterating, as we can have early -* params and __setups of same names 8( */ +* params and __setups of same names +*/ if (line[n] == '\0' || line[n] == '=') had_early_param = true; } else if (!p->setup_func) { @@ -693,9 +694,9 @@ asmlinkage __visible void __init start_kernel(void) arch_post_acpi_subsys_init(); sfi_init_late(); - if (efi_enabled(EFI_RUNTIME_SERVICES)) { + if (efi_enabled(EFI_RUNTIME_SERVICES)) efi_free_boot_services(); - } + /* Do the rest non-__init'ed, we're now alive */ rest_init(); -- 1.9.1
[PATCH v6.1 4/7] drm/rockchip: vop: group vop registers
Grouping the vop registers makes the register definitions clearer, and also lets different VOPs reuse the same register groups. Signed-off-by: Mark Yao Reviewed-by: Jeffy Chen --- Changes in v6.1 - fix Null pointer crash on vop_reg_set drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 101 - drivers/gpu/drm/rockchip/rockchip_drm_vop.h | 60 --- drivers/gpu/drm/rockchip/rockchip_vop_reg.c | 112 +++- 3 files changed, 146 insertions(+), 127 deletions(-) diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c index fd47da5..39912f2 100644 --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c @@ -42,30 +42,19 @@ #include "rockchip_drm_psr.h" #include "rockchip_drm_vop.h" -#define REG_SET(x, base, reg, v) \ - vop_mask_write(x, base + reg.offset, reg.mask, reg.shift, \ - v, reg.write_mask, reg.relaxed) -#define REG_SET_MASK(x, base, reg, mask, v) \ - vop_mask_write(x, base + reg.offset, \ - mask, reg.shift, v, reg.write_mask, reg.relaxed) - #define VOP_WIN_SET(x, win, name, v) \ - REG_SET(x, win->base, win->phy->name, v) + vop_reg_set(vop, &win->phy->name, win->base, ~0, v, #name) #define VOP_SCL_SET(x, win, name, v) \ - REG_SET(x, win->base, win->phy->scl->name, v) + vop_reg_set(vop, &win->phy->scl->name, win->base, ~0, v, #name) #define VOP_SCL_SET_EXT(x, win, name, v) \ - REG_SET(x, win->base, win->phy->scl->ext->name, v) -#define VOP_CTRL_SET(x, name, v) \ - REG_SET(x, 0, (x)->data->ctrl->name, v) - -#define VOP_INTR_GET(vop, name) \ - vop_read_reg(vop, 0, &vop->data->ctrl->name) - -#define VOP_INTR_SET(vop, name, v) \ - REG_SET(vop, 0, vop->data->intr->name, v) + vop_reg_set(vop, &win->phy->scl->ext->name, \ + win->base, ~0, v, #name) #define VOP_INTR_SET_MASK(vop, name, mask, v) \ - REG_SET_MASK(vop, 0, vop->data->intr->name, mask, v) + vop_reg_set(vop, &vop->data->intr->name, 0, mask, v, #name) + +#define VOP_REG_SET(vop, group, name, v) \ + vop_reg_set(vop, 
&vop->data->group->name, 0, ~0, v, #name) #define VOP_INTR_SET_TYPE(vop, name, type, v) \ do { \ @@ -82,7 +71,7 @@ vop_get_intr_type(vop, &vop->data->intr->name, type) #define VOP_WIN_GET(x, win, name) \ - vop_read_reg(x, win->base, &win->phy->name) + vop_read_reg(x, win->offset, win->phy->name) #define VOP_WIN_GET_YRGBADDR(vop, win) \ vop_readl(vop, win->base + win->phy->yrgb_mst.offset) @@ -164,14 +153,22 @@ static inline uint32_t vop_read_reg(struct vop *vop, uint32_t base, return (vop_readl(vop, base + reg->offset) >> reg->shift) & reg->mask; } -static inline void vop_mask_write(struct vop *vop, uint32_t offset, - uint32_t mask, uint32_t shift, uint32_t v, - bool write_mask, bool relaxed) +static void vop_reg_set(struct vop *vop, const struct vop_reg *reg, + uint32_t _offset, uint32_t _mask, uint32_t v, + const char *reg_name) { - if (!mask) + int offset, mask, shift; + + if (!reg || !reg->mask) { + dev_dbg(vop->dev, "Warning: not support %s\n", reg_name); return; + } + + offset = reg->offset + _offset; + mask = reg->mask & _mask; + shift = reg->shift; - if (write_mask) { + if (reg->write_mask) { v = ((v << shift) & 0xffff) | (mask << (shift + 16)); } else { uint32_t cached_val = vop->regsbak[offset >> 2]; @@ -180,7 +177,7 @@ static inline void vop_mask_write(struct vop *vop, uint32_t offset, vop->regsbak[offset >> 2] = v; } - if (relaxed) + if (reg->relaxed) writel_relaxed(v, vop->regs + offset); else writel(v, vop->regs + offset); @@ -202,7 +199,7 @@ static inline uint32_t vop_get_intr_type(struct vop *vop, static inline void vop_cfg_done(struct vop *vop) { - VOP_CTRL_SET(vop, cfg_done, 1); + VOP_REG_SET(vop, common, cfg_done, 1); } static bool has_rb_swapped(uint32_t format) @@ -540,7 +537,7 @@ static int vop_enable(struct drm_crtc *crtc) spin_lock(&vop->reg_lock); - VOP_CTRL_SET(vop, standby, 0); + VOP_REG_SET(vop, common, standby, 1); spin_unlock(&vop->reg_lock); @@ -600,7 +597,7 @@ static void vop_crtc_disable(struct drm_crtc *crtc) spin_lock(&vop->reg_lock); 
- VOP_CTRL_SET(vop, standby, 1); + VOP_REG_SET(vop, common, standby, 1); spin_unlock(&vop->reg_lock); @@ -923,7 +920,7 @@ static void vop_crtc_e
RE: [PATCH] cpufreq: x86: Make scaling_cur_freq behave more as expected
On 2017.07.27 17:13 Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to > calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute > in sysfs only behaves as expected on x86 with APERF/MPERF registers > available when it is read from at least twice in a row. > > The value returned by the first read may not be meaningful, because > the computations in there use cached values from the previous > aperfmperf_snapshot_khz() call which may be stale. However, the > interface is expected to return meaningful values on every read, > including the first one. > > To address this problem modify arch_freq_get_on_cpu() to call > aperfmperf_snapshot_khz() twice, with a short delay between > these calls, if the previous invocation of aperfmperf_snapshot_khz() > was too far back in the past (specifically, more than 1s ago) and > adjust aperfmperf_snapshot_khz() for that. > > Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz > using APERF/MPERF" > Reported-by: Doug Smythies > Signed-off-by: Rafael J. Wysocki > --- > arch/x86/kernel/cpu/aperfmperf.c | 36 +--- > 1 file changed, 29 insertions(+), 7 deletions(-) > > Index: linux-pm/arch/x86/kernel/cpu/aperfmperf.c ...[deleted the rest]... This proposed patch would be good. However, I can only try it maybe by Sunday. I think the maximum time span means that this code: /* * if (cpu_khz * aperf_delta) fits into ULLONG_MAX, then * khz = (cpu_khz * aperf_delta) / mperf_delta */ if (div64_u64(ULLONG_MAX, cpu_khz) > aperf_delta) s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); else /* khz = aperf_delta / (mperf_delta / cpu_khz) */ s->khz = div64_u64(aperf_delta, div64_u64(mperf_delta, cpu_khz)); Could be reduced to this: s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); Because it could never overflow anymore. ... Doug
Re: [PATCH 2/2] arm64: dts: Add device node for pmi8994 gpios
On Fri, Jul 28, 2017 at 2:30 AM, Stephen Boyd wrote: > On 07/26/2017 11:30 PM, Vivek Gautam wrote: >> Signed-off-by: Vivek Gautam >> --- >> arch/arm64/boot/dts/qcom/pmi8994.dtsi | 17 + >> 1 file changed, 17 insertions(+) >> >> diff --git a/arch/arm64/boot/dts/qcom/pmi8994.dtsi >> b/arch/arm64/boot/dts/qcom/pmi8994.dtsi >> index d3879a4e8076..3b04ca63c31e 100644 >> --- a/arch/arm64/boot/dts/qcom/pmi8994.dtsi >> +++ b/arch/arm64/boot/dts/qcom/pmi8994.dtsi >> @@ -8,6 +8,23 @@ >> reg = <0x2 SPMI_USID>; >> #address-cells = <1>; >> #size-cells = <0>; >> + >> + pmi8994_gpios: gpios@c000 { >> + compatible = "qcom,pmi8994-gpio"; > > > Please also include the generic "qcom,spmi-gpio" compatible. We really > don't need to add the "qcom,pmi8994-gpio" one to the driver right now so > I would leave that out of the driver but leave it here in the dtsi file > of course. Sure, will add the "qcom,spmi-gpio" compatible as well, and will drop the "qcom,pmi8994-gpio" from the driver. Thanks. regards Vivek > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, > a Linux Foundation Collaborative Project > > -- > To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Re: [PATCH] device property: Fix usecount for of_graph_get_port_parent()
* Rob Herring [170727 15:19]: > On Thu, Jul 27, 2017 at 4:44 AM, Tony Lindgren wrote: > > --- a/drivers/of/property.c > > +++ b/drivers/of/property.c > > @@ -708,6 +708,15 @@ struct device_node *of_graph_get_port_parent(struct > > device_node *node) > > { > > unsigned int depth; > > > > + if (!node) > > + return NULL; > > + > > + /* > > +* Preserve usecount for passed in node as of_get_next_parent() > > +* will do of_node_put() on it. > > +*/ > > + of_node_get(node); > > I think this messes up of_graph_get_remote_port_parent(). First it > calls of_graph_get_remote_endpoint which returns the endpoint node > with ref count incremented. Then you are incrementing it again here. Hmm OK looks like I missed that one. If we want to have of_graph_get_port_parent not trash the node passed to it, we should just change things there too: struct device_node *of_graph_get_remote_port_parent( const struct device_node *node) { struct device_node *np, *pp; /* Get remote endpoint node. */ np = of_graph_get_remote_endpoint(node); pp = of_graph_get_port_parent(np); of_node_put(np); return pp; } EXPORT_SYMBOL(of_graph_get_remote_port_parent); Does that make sense to you? > > diff --git a/sound/soc/generic/audio-graph-card.c > > b/sound/soc/generic/audio-graph-card.c > > --- a/sound/soc/generic/audio-graph-card.c > > +++ b/sound/soc/generic/audio-graph-card.c > > @@ -224,7 +224,6 @@ static int asoc_graph_card_parse_of(struct > > graph_card_data *priv) > > > > of_for_each_phandle(&it, rc, node, "dais", NULL, 0) { > > ret = asoc_graph_card_dai_link_of(it.node, priv, idx++); > > - of_node_put(it.node); > > if (ret < 0) > > return ret; > > I think you need a put here. Do you mean on error it should be as below, right? ret = asoc_graph_card_dai_link_of(it.node, priv, idx++); if (ret < 0) { of_node_put(it.node); return ret; } Regards, Tony
Re: [Eas-dev] [PATCH V3 1/3] sched: cpufreq: Allow remote cpufreq callbacks
On 27-07-17, 12:55, Saravana Kannan wrote: > Yes. Simplifying isn't always about number of lines of code. It's also about > abstraction. Having generic scheduler code care about HW details doesn't > seem nice. I can argue that even the policy->cpus field is also hardware specific, isn't it ? And we are using that in the schedutil governor anyway. What's wrong with having another field (in a generic way) in the same structure that tells us more about hardware ? And then schedutil isn't really scheduler, but a cpufreq governor. Just like ondemand/conservative, which are also called from the same scheduler path. > It'll literally be one simple check (cpu == smp_processor_id()) or (cpu "in" > policy->cpus). > > Also, this is only for drivers that currently support fast switching. How > many of those do you have? Why? Why shouldn't we do that for the other drivers? I think it should be done across everyone. > >The core already has most of the data required and I believe that we > >need to handle it in the governor's code as is handled in this series. > > Clearly, it doesn't. You are just making assumptions about HW. So assuming that any CPU from a policy can change freq on behalf of all the CPUs of the same policy is wrong? That is the basis of how the cpufreq core is designed. -- viresh
Re: linux-next: Signed-off-by missing for commit in the renesas tree
Hi, On Fri, 28 Jul 2017 12:09:12 +0700 jmondi wrote: > > On Fri, Jul 28, 2017 at 09:21:08AM +1000, Stephen Rothwell wrote: > > > > Commit > > > > bb003371172f ("arm: dts: genmai: Add RIIC2 pin group") > > > > is missing a Signed-off-by from its committer. > > I do see my SOB in the patch I've sent. Any chance it has been dropped > while applying it? Should I resend? The commit is missing Simon's Signed-off-by ... -- Cheers, Stephen Rothwell
Re: [PATCH V3] mm/madvise: Enable (soft|hard) offline of HugeTLB pages at PGD level
On 07/28/2017 06:19 AM, Mike Kravetz wrote: > On 05/16/2017 03:05 AM, Anshuman Khandual wrote: >> Though migrating gigantic HugeTLB pages does not sound much like real >> world use case, they can be affected by memory errors. Hence migration >> at the PGD level HugeTLB pages should be supported just to enable soft >> and hard offline use cases. > > Hi Anshuman, > > Sorry for the late question, but I just stumbled on this code when > looking at something else. > > It appears the primary motivation for these changes is to handle > memory errors in gigantic pages. In this case, you migrate to Right. > another gigantic page. However, doesn't this assume that there is Right. > a pre-allocated gigantic page sitting unused that will be the target > of the migration? alloc_huge_page_node will not allocate a gigantic > page. Or, am I missing something? Yes, it's in the context of 16GB pages on POWER8 system where all the gigantic pages are pre allocated from the platform and passed on to the kernel through the device tree. We don't allocate these gigantic pages at runtime.
Re: [PATCH V1 02/12] spmi: pmic-arb: rename pa_xx to pmic_arb_xx and other cleanup
On 2017-07-28 05:30, Stephen Boyd wrote: On 07/20, Kiran Gunda wrote: This patch cleans up the following. - Rename the "pa" to "pmic_arb". - Rename the spmi_pmic_arb *dev to spmi_pmic_arb *pmic_arb. - Rename the pa_{read,write}_data() functions to pmic_arb_{read,write}_data(). - Rename channel to APID. - Rename the HWIRQ_*() macros to hwirq_to_*(). Signed-off-by: Kiran Gunda --- Reviewed-by: Stephen Boyd Thanks. objdiff doesn't show any diff anymore so I'm much more confident this doesn't change any functionality. Thanks for introducing me to a nice tool!
linux-next: Tree for Jul 28
Hi all, Changes since 20170727: Non-merge commits (relative to Linus' tree): 2843 2887 files changed, 101051 insertions(+), 48048 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig (with CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu. Below is a summary of the state of the merge. I am currently merging 267 trees (counting Linus' and 41 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. 
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (0ce2f3851193 Merge tag 'acpi-4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm) Merging fixes/master (b4b8cbf679c4 Cavium CNN55XX: fix broken default Kconfig entry) Merging kbuild-current/fixes (ad8181060788 kconfig: fix sparse warnings in nconfig) Merging arc-current/for-curr (37f1db0e85ff ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103) Merging arm-current/fixes (ce184a0dee92 ARM: 8687/1: signal: Fix unparseable iwmmxt_sigframe in uc_regspace[]) Merging m68k-current/for-linus (204a2be30a7a m68k: Remove ptrace_signal_deliver) Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups) Merging powerpc-fixes/fixes (0da12a7a81f1 powerpc/mm/hash: Free the subpage_prot_table correctly) Merging sparc/master (8cd3ec51c0c3 sbus: Convert to using %pOF instead of full_name) Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and linking special files) Merging net/master (6b84202c946c sctp: fix the check for _sctp_walk_params and _sctp_walk_errors) Merging ipsec/master (e6194923237f esp: Fix memleaks on error paths.) 
Merging netfilter/master (f7fb77fc1235 netfilter: nft_compat: check extension hook mask only if set) Merging ipvs/master (3c5ab3f395d6 ipvs: SNAT packet replies only for NATed connections) Merging wireless-drivers/master (5f5d03143de5 brcmfmac: fix memleak due to calling brcmf_sdiod_sgtable_alloc() twice) Merging mac80211/master (d7f13f745036 cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES) Merging sound-current/for-linus (ba92b1142879 ALSA: hda - Add mute led support for HP ProBook 440 G4) Merging pci-current/for-linus (34d5ac2af644 PCI: rockchip: Check for pci_scan_root_bus_bridge() failure correctly) Merging driver-core.current/driver-core-linus (5771a8c08880 Linux v4.13-rc1) Merging tty.current/tty-linus (520eccdfe187 Linux 4.13-rc2) Merging usb.current/usb-linus (520eccdfe187 Linux 4.13-rc2) Merging usb-gadget-fixes/fixes (520eccdfe187 Linux 4.13-rc2) Merging usb-serial-fixes/usb-linus (9585e340db9f USB: serial: cp210x: add support for Qivicon USB ZigBee dongle) Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: check before accessing ci_role in ci_role_show) Merging phy/fixes (5771a8c08880 Linux v4.13-rc1) Merging staging.current/staging-linus (055655a9f0fe Merge tag 'iio-fixes-for-4.13a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus) Merging char-misc.current/char-misc-linus (520eccdfe187 Linux 4.13-rc2) Merging input-current/for-linus (293b915fd9be Input: trackpoint - assume 3 buttons when buttons detection fails) Merging crypto-current/master (41cdf7a45389 crypto: authencesn - Fix digest_null crash) Merging ide/master (921edf312a6a ide: avoid warning for timings calculation) Merging vfio-fixes/for-linus (bb67b496c338 include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH)
Re: [RFT v2 2/3] ASoC: samsung: Add missing prepare for iis clock of s3c24xx
On Fri, Jul 28, 2017 at 10:11:48AM +0530, Arvind Yadav wrote: > Hi, > > > On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote: > >The s3c_i2sv2_probe() only enabled iis clock. Missing prepare isn't > >probably fatal, because for SoC clocks this is usually no-op, but for > >correctness this clock should be prepared. > > > >Signed-off-by: Krzysztof Kozlowski > > > >--- > > > >Changes since v1: > >1. New patch > >--- > > sound/soc/samsung/s3c-i2s-v2.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > >diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c > >index 9b28046eea8e..3894bda06ebb 100644 > >--- a/sound/soc/samsung/s3c-i2s-v2.c > >+++ b/sound/soc/samsung/s3c-i2s-v2.c > >@@ -637,7 +637,7 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai, > > return -ENOENT; > > } > >-clk_enable(i2s->iis_pclk); > >+clk_prepare_enable(i2s->iis_pclk); > Please handle the return value of clk_prepare_enable. Which is a different issue, different bug. We can fix it but it should not be mixed with this fix here. Best regards, Krzysztof
Re: [PATCH V1 06/12] spmi: pmic-arb: replace the writel_relaxed with __raw_writel
On 2017-07-28 05:31, Stephen Boyd wrote: On 07/20, Kiran Gunda wrote: Replace the writel_relaxed with __raw_writel to avoid byte swapping in pmic_arb_write_data() function. That way the code is independent of the CPU endianness. Signed-off-by: Kiran Gunda Reviewed-by: Stephen Boyd This also needs a Fixes tag. Fixes: 111a10bf3e53 ("spmi: pmic-arb: rename spmi_pmic_arb_dev to spmi_pmic_arb") Sure. will add the tag and send the patch.
Re: [PATCH V1 03/12] spmi: pmic-arb: clean up pmic_arb_find_apid function
On 2017-07-28 06:48, Stephen Boyd wrote: On 07/20, Kiran Gunda wrote: Clean up the pmic_arb_find_apid() by using the local variables to improve the code readability. Signed-off-by: Kiran Gunda --- Reviewed-by: Stephen Boyd One nit below: break; regval = readl_relaxed(pmic_arb->cnfg + - SPMI_OWNERSHIP_TABLE_REG(apid)); - pmic_arb->apid_data[apid].owner = - SPMI_OWNERSHIP_PERIPH2OWNER(regval); + SPMI_OWNERSHIP_TABLE_REG(apid)); This should be 7 spaces and not a tab? Originally it looks like it was 6 spaces. Will fix it and send the next patch.
Re: [PATCH V1 05/12] spmi: pmic-arb: fix memory allocation for mapping_table
On 2017-07-28 05:19, Stephen Boyd wrote: On 07/20, Kiran Gunda wrote: Allocate the correct memory size (max_pmic_peripherals) for the mapping_table that holds the apid to ppid mapping. Also use a local variable for mapping_table for better alignment of the code. Signed-off-by: Kiran Gunda This needs a Fixes: 987a9f128b8a ("spmi: pmic-arb: Support more than 128 peripherals") right? Yes. I will keep the tag and send it again.
Re: [PATCH V3 3/9] cpufreq: Cap the default transition delay value to 10 ms
+ IMX maintainers. On 27-07-17, 19:54, Leonard Crestez wrote: > On Wed, 2017-07-26 at 11:36 +0530, Viresh Kumar wrote: > > - Find how much time does it really take to change the frequency of > > the CPU. I don't really think 109 us is the right transition > > latency. Use attached patch for that and look for the print message. > > Your patch measures latencies of around 2.5ms, but it can vary between > 1.6 ms to 3ms from boot-to-boot. This is a lot more than what the > driver reports. Most transitions seem to be faster. Wow !! I was pretty sure all these figures are just made up by some coder :) > I did a little digging and it seems that a majority of time is always > spent inside clk_pllv3_wait_lock which spins on a HW bit while doing > usleep_range(50, 500). I originally thought it was because of > regulators but the delays involved in that are smaller. > > Measuring wall time on a process that can sleep seems dubious, isn't > this vulnerable to random delays because of other tasks? I am not sure I understood that, sorry. > > Without this patch the sampling rate of ondemand governor will be 109 > > ms. And after this patch it would be capped at 10 ms. Why would that > > screw up anyone's setup ? I don't have an answer to that right now. > > On a closer look it seems that most of the time is actually spent at > low cpufreq though (90%+). > > Your change makes it so that even something like "sleep 1; cat > scaling_cur_freq" raises the frequency to the maximum. Why? > This happens > enough that even if you do it in a loop you will never see the minimum > frequency. It seems there is enough internal bookkeeping on such a > wakeup that it takes more than 10ms and enough for a reevaluation of > cpufreq until cat returns the value?! At this point I really feel that this is a hardware specific problem and it was working by chance until now. And I am not sure if we shouldn't be stopping this patch from getting merged just because of that. 
At least you can teach your distribution to go increase the sampling rate from userspace to make it all work. Can you try one more thing? Try using schedutil governor and see how it behaves ? > I found this by enabling the power:cpu_frequency tracepoint event and > checking for deltas with a script. Enabling CPU_FREQ_STAT show this: > > time_in_state: > > 396000 1609 So we still stay at the lowest frequency most of the time. > 792000 71 > 996000 54 > > trans_table: > > From :To > :396000792000996000 > 396000: 010 7 > 792000:16 012 > 996000: 118 0 What is it that you are trying to point out here? I still see that we are coming back to 396 MHz quite often. Maybe can you compare these values with and without this patch to let us know? > This is very unexpected but not necessarily wrong. I haven't understood the problem completely yet :( -- viresh
Re: [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load
On 2017/7/28 12:22, Longpeng (Mike) wrote: > > > On 2017/6/6 18:57, Paolo Bonzini wrote: > >> The simplify part: do not touch pi_desc.nv, we can set it when the >> VCPU is first created. Likewise, pi_desc.sn is only handled by >> vmx_vcpu_pi_load, do not touch it in __pi_post_block. >> >> The fix part: do not check kvm_arch_has_assigned_device, instead >> check the SN bit to figure out whether vmx_vcpu_pi_put ran before. >> This matches what the previous patch did in pi_post_block. >> >> Cc: Longpeng (Mike) >> Cc: Huangweidong >> Cc: Gonglei >> Cc: wangxin >> Cc: Radim Krčmář >> Signed-off-by: Paolo Bonzini >> --- >> arch/x86/kvm/vmx.c | 68 >> -- >> 1 file changed, 35 insertions(+), 33 deletions(-) >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 0f4714fe4908..81047f373747 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, >> int cpu) >> struct pi_desc old, new; >> unsigned int dest; >> >> -if (!kvm_arch_has_assigned_device(vcpu->kvm) || >> -!irq_remapping_cap(IRQ_POSTING_CAP) || >> -!kvm_vcpu_apicv_active(vcpu)) >> +/* >> + * In case of hot-plug or hot-unplug, we may have to undo >> + * vmx_vcpu_pi_put even if there is no assigned device. And we >> + * always keep PI.NDST up to date for simplicity: it makes the >> + * code easier, and CPU migration is not a fast path. >> + */ >> +if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) >> +return; > > > Hi Paolo, > > I'm confused with the following scenario: > > (suppose the VM has a assigned devices) > step 1. the running vcpu is be preempted > --> vmx_vcpu_pi_put [ SET pi.sn ] > step 2. hot-unplug the assigned devices > step 3. the vcpu is scheduled in > --> vmx_vcpu_pi_load [ CLEAR pi.sn ] > step 4. the running vcpu is be preempted again > --> vmx_vcpu_pi_put [ direct return ] > step 5. the vcpu is migrated to another pcpu > step 6. 
the vcpu is scheduled in > --> vmx_vcpu_pi_load [ above check fails and > continue to execute the follow parts ] > > I think vmx_vcpu_pi_load should return directly in step 6, because > vmx_vcpu_pi_put in step4 did nothing. > So maybe the above check has a potential problem. Oh! Sorry, please just ignore the above. I made a mistake. > > Please kindly figure out if I misunderstand anything important :) > > -- > Regards, > Longpeng(Mike) > >> + >> +/* >> + * First handle the simple case where no cmpxchg is necessary; just >> + * allow posting non-urgent interrupts. >> + * >> + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change >> + * PI.NDST: pi_post_block will do it for us and the wakeup_handler >> + * expects the VCPU to be on the blocked_vcpu_list that matches >> + * PI.NDST. >> + */ >> +if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || >> +vcpu->cpu == cpu) { >> +pi_clear_sn(pi_desc); >> return; >> +} >> >> +/* The full case. */ >> do { >> old.control = new.control = pi_desc->control; >> >> -/* >> - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there >> - * are two possible cases: >> - * 1. After running 'pre_block', context switch >> - *happened. For this case, 'sn' was set in >> - *vmx_vcpu_put(), so we need to clear it here. >> - * 2. After running 'pre_block', we were blocked, >> - *and woken up by some other guy. For this case, >> - *we don't need to do anything, 'pi_post_block' >> - *will do everything for us. However, we cannot >> - *check whether it is case #1 or case #2 here >> - *(maybe, not needed), so we also clear sn here, >> - *I think it is not a big deal. 
>> - */ >> -if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) { >> -if (vcpu->cpu != cpu) { >> -dest = cpu_physical_id(cpu); >> - >> -if (x2apic_enabled()) >> -new.ndst = dest; >> -else >> -new.ndst = (dest << 8) & 0xFF00; >> -} >> +dest = cpu_physical_id(cpu); >> >> -/* set 'NV' to 'notification vector' */ >> -new.nv = POSTED_INTR_VECTOR; >> -} >> +if (x2apic_enabled()) >> +new.ndst = dest; >> +else >> +new.ndst = (dest << 8) & 0xFF00; >> >> -/* Allow posting non-urgent interrupts */ >> new.sn = 0; >> } while (cmpxchg(&p
Re: strace-4.18 test suite oopses sparc64 4.12 and 4.13-rc kernels
From: Mikael Pettersson
Date: Thu, 27 Jul 2017 21:45:25 +0200

> Attempting to build strace-4.18 as sparcv9 code and run its test suite
> on a sparc64 machine (Sun Blade 2500 w/ 2 x USIIIi in my case) fails
> reliably in three test cases (sched.gen, sched_xetattr.gen, and poll)
> because two test binaries (sched_xetattr and poll) OOPS the kernel and
> get killed. Sample dmesg from 4.13-rc2:
>
> [42912.270398] Unable to handle kernel NULL pointer dereference
> [42912.327717] tsk->{mm,active_mm}->context = 136a
> [42912.383789] tsk->{mm,active_mm}->pgd = fff227db4000
> [42912.435247]               \|/ \|/
>                              "@'/ .. \`@"
>                              /_| \__/ |_\
>                                 \__U_/
> [42912.559982] sched_xetattr(21866): Oops [#1]
> [42912.597773] CPU: 0 PID: 21866 Comm: sched_xetattr Not tainted 4.13.0-rc2 #1
> [42912.672138] task: fff229a5c380 task.stack: fff227dec000
> [42912.732876] TSTATE: 004411001603 TPC: 007570fc TNPC: 00757110 Y: Not tainted
> [42912.845079] TPC: <__bzero+0x20/0xc0>
> [42912.874870] g0:  g1:  g2: 0030 g3: 008ca100
> [42912.972120] g4: fff229a5c380 g5: fff23ef44000 g6: fff227dec000 g7: 0030
> [42913.069446] o0: 0030 o1: fff227defe70 o2:  o3: 0030
> [42913.166765] o4: fff227defe70 o5:  sp: fff227def5c1 ret_pc: 00474fa4
> [42913.268664] RPC:

This looks really strange. It is a memset() call with the buffer
pointer and length arguments reversed.

What exact command did you give to configure and build strace-4.18
so that I can try to reproduce this?

Thanks.
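Dave's diagnosis fits the register dump: %o0, the destination argument to __bzero, holds 0x30 (the length), while the buffer pointer sits in %o1. A small userspace sketch of the swap (clear_buf() and all_zero() are illustrative helpers, not strace or kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Correct order: clear `len` bytes starting at `buf`. */
static void clear_buf(void *buf, size_t len)
{
	memset(buf, 0, len);
}

/* Helper: returns 1 if all `n` bytes at `p` are zero. */
static int all_zero(const unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (p[i])
			return 0;
	return 1;
}

/* With the arguments reversed -- effectively memset((void *)0x30, 0,
 * (long)buf) -- the small integer 0x30 is used as the destination
 * address.  Writing through it faults, matching the reported oops in
 * __bzero with o0 = 0x30. */
```

The interesting question, as Dave asks, is why the compiled sched_xetattr binary ends up emitting the call with swapped operands at all.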
Re: linux-next: Signed-off-by missing for commit in the renesas tree
Hi Simon, On Fri, Jul 28, 2017 at 09:21:08AM +1000, Stephen Rothwell wrote: > Hi Simon, > > Commit > > bb003371172f ("arm: dts: genmai: Add RIIC2 pin group") > > is missing a Signed-off-by from its committer. I do see my SOB in the patch I've sent. Any chance it has been dropped while applying it? Should I resend? Thanks for reproposing the patches now that we have RZ/A1 pinctrl merged. > > -- > Cheers, > Stephen Rothwell
Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings
> > On 07/27/2017 01:47 PM, Oliver Hartkopp wrote: > > On 07/26/2017 08:29 PM, Franklin S Cooper Jr wrote: > >> > > > >> I'm fine with switching to using bitrate instead of speed. Kurk was > >> originally the one that suggested to use the term arbitration and data > >> since thats how the spec refers to it. Which I do agree with. But your > >> right that in the drivers (struct can_priv) we just use bittiming and > >> data_bittiming (CAN-FD timings). I don't think adding "fd" into the > >> property name makes sense unless we are calling it something like > >> "max-canfd-bitrate" which I would agree is the easiest to understand. > >> > >> So what is the preference if we end up sticking with two properties? > >> Option 1 or 2? > >> > >> 1) > >> max-bitrate > >> max-data-bitrate > >> > >> 2) > >> max-bitrate > >> max-canfd-bitrate > >> > >> > > > > 1 > > > >>> A CAN transceiver is limited in bandwidth. But you only have one RX and > >>> one TX line between the CAN controller and the CAN transceiver. The > >>> transceiver does not know about CAN FD - it has just a physical(!) layer > >>> with a limited bandwidth. This is ONE limitation. > >>> > >>> So I tend to specify only ONE 'max-bitrate' property for the > >>> fixed-transceiver binding. > >>> > >>> The fact whether the CAN controller is CAN FD capable or not is provided > >>> by the netlink configuration interface for CAN controllers. > >> > >> Part of the reasoning to have two properties is to indicate that you > >> don't support CAN FD while limiting the "arbitration" bit rate. > > > > ?? > > > > It's a physical layer device which only has a bandwidth limitation. > > The transceiver does not know about CAN FD. > > > >> With one > >> property you can not determine this and end up having to make some > >> assumptions that can quickly end up biting people. 
> > > > Despite the fact that the transceiver does not know anything about ISO > > layer 2 (CAN/CAN FD) the properties should look like > > > > max-bitrate > > canfd-capable > > > > then. > > > > But when the tranceiver is 'canfd-capable' agnostic, why provide a > > property for it? > > > > Maybe I'm wrong but I still can't follow your argumentation ideas. > The transceiver does not know about CAN FD, but CAN FD uses the different restrictions of the arbitration & data phase in the CAN frame, i.e. during arbitration, the RX must indicate the wire (dominant/recessive) within 1 bit time, during data in CAN FD, this is not necessary. So while _a_ transceiver may be spec'd to 1MBit during arbitration, CAN FD packets may IMHO exceed that speed during data phase. That was the whole point of CAN FD: exceed the limits required for correct arbitration on transceiver & wire. So I do not agree on the single bandwidth limitation. The word 'max-arbitration-bitrate' makes the difference very clear. > Your right. I spoke to our CAN transceiver team and I finally get your > points. > > So yes using "max-bitrate" alone is all we need. Sorry for the confusion > and I'll create a new rev using this approach. > > > > Regards, > > Oliver Kind regards, Kurt
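The conclusion above — a single "max-bitrate" property describing the physical-layer limit — implies that both the arbitration bitrate and any CAN FD data bitrate must fit under the same ceiling. A hypothetical sketch of the check a controller driver could apply (the struct and function names are illustrative, not the actual binding code):

```c
#include <assert.h>
#include <errno.h>

/* Illustrative model: one physical-layer limit taken from the
 * fixed transceiver's "max-bitrate" DT property. */
struct fixed_transceiver {
	unsigned int max_bitrate;	/* 0 = property absent, no known limit */
};

static int can_check_bitrates(const struct fixed_transceiver *tcvr,
			      unsigned int arb_bitrate,
			      unsigned int data_bitrate)
{
	if (!tcvr->max_bitrate)
		return 0;
	if (arb_bitrate > tcvr->max_bitrate)
		return -EINVAL;
	/* data_bitrate == 0 means classic CAN, nothing more to check */
	if (data_bitrate > tcvr->max_bitrate)
		return -EINVAL;
	return 0;
}
```

Whether a real transceiver's data-phase limit should be allowed to exceed the arbitration limit is exactly the point Kurt raises above; this sketch encodes the single-limit view the thread settled on.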
Re: [PATCH v3/resubmit 1/3] staging: gs_fpgaboot: add buffer overflow checks
On Wed, Jul 26, 2017 at 09:13:57PM -0400, Jacob von Chorus wrote: > Four fields in struct fpgaimage are char arrays of length MAX_STR (256). > The amount of data read into these buffers is controlled by a length > field in the bitstream file read from userspace. If a corrupt or > malicious firmware file was supplied, kernel data beyond these buffers > can be overwritten arbitrarily. > > This patch adds a check of the bitstream's length value to ensure it > fits within the bounds of the allocated buffers. An error condition is > returned from gs_read_bitstream if any of the reads fail. > > Signed-off-by: Jacob von Chorus > > v3: > - use >= to prevent an integer overflow in the comparison > - use get_unaligned_be functions to interpret length fields > - fix remainder of file to use valid error codes > > v2: > - char arrays converted to u8 arrays > - replace error return value with proper error code in > gs_read_bitstream > --- All of the v2: and such needs to go below the --- line, as Documentation/SubmittingPatches says to do. Please fix that up and resend the series. thanks, greg k-h
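For reference, the kind of check the changelog describes — a big-endian length field validated with `>=` against MAX_STR before copying — can be sketched in plain C. The field layout and read_field() are illustrative, not the gs_fpgaboot code; read_be32() stands in for the kernel's get_unaligned_be32():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_STR 256

/* Portable stand-in for the kernel's get_unaligned_be32(). */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Copy one length-prefixed field out of a bitstream image into a
 * MAX_STR-sized buffer.  Using >= (rather than >) keeps room for the
 * NUL terminator, so a corrupt length can no longer write past the
 * buffer.  Returns 0 on success, -EINVAL on a bad length. */
static int read_field(uint8_t dst[MAX_STR], const uint8_t *src, size_t avail)
{
	uint32_t len;

	if (avail < 4)
		return -EINVAL;
	len = read_be32(src);
	if (len >= MAX_STR || len > avail - 4)
		return -EINVAL;
	for (uint32_t i = 0; i < len; i++)
		dst[i] = src[4 + i];
	dst[len] = 0;
	return 0;
}
```

The second comparison (`len > avail - 4`) also keeps the read inside the firmware image itself, which is the other half of trusting a userspace-supplied length.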
Re: [PATCH 1/3] staging: ccree: Replace kzalloc with devm_kzalloc
On Thu, Jul 27, 2017 at 05:27:32PM +0300, Gilad Ben-Yossef wrote: > From: Suniel Mahesh > > It is recommended to use managed function devm_kzalloc, which > simplifies driver cleanup paths and driver code. > This patch does the following: > (a) replace kzalloc with devm_kzalloc. > (b) drop kfree(), because memory allocated with devm_kzalloc() is > automatically freed on driver detach, otherwise it leads to a double > free. > (c) remove unnecessary blank lines. > > Signed-off-by: Suniel Mahesh > [gby: rebase on top of latest coding style fixes changes] > Acked-by: Gilad Ben-Yossef None of these 3 applied to my tree :(
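For readers wondering why point (b) in the changelog matters: devm_kzalloc() ties each allocation to the device, and everything is released in one sweep on driver detach, so a leftover kfree() becomes a double free. A userspace model of the mechanism (this mimics the idea, not the kernel's actual devres implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal model of device-managed allocation: each allocation is
 * linked to a "device" and freed in bulk on detach. */
struct devres {
	struct devres *next;
	/* payload follows the bookkeeping node */
};

struct device {
	struct devres *res;
};

static void *devm_zalloc(struct device *dev, size_t size)
{
	struct devres *dr = calloc(1, sizeof(*dr) + size);

	if (!dr)
		return NULL;
	dr->next = dev->res;
	dev->res = dr;
	return dr + 1;		/* memory just past the bookkeeping node */
}

static void device_detach(struct device *dev)
{
	while (dev->res) {
		struct devres *dr = dev->res;

		dev->res = dr->next;
		free(dr);	/* every managed allocation freed here */
	}
}
```

An explicit free(dr + 1)-style release in driver code would hit the same memory that device_detach() later frees — the double free the patch removes.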
Re: [PATCH 0/4] staging: ccree: coding style clean ups
On Thu, Jul 27, 2017 at 01:43:14PM +0300, Gilad Ben-Yossef wrote: > Misc. coding style fixes for ccree driver. > > These are the missing patches that failed to apply two weeks ago, > rebased onto latest staging-next. > > By the way, I still do not understand why they failed to apply, > as they applied just fine being cherry picked on top of latest > staging-next apart for fixes for later changes. I hope these will > be good. Only one applied here :( I did just take some more coding style fixes from someone else who sent them before you did as they looked correct, maybe that's why? Can you rebase and resend? thanks, greg k-h
Re: [PATCH] Staging: greybus: Match alignment with open parenthesis.
On Mon, Jul 24, 2017 at 10:08:37AM +0300, Dan Carpenter wrote: > I don't understand why greybus has to be special instead of the same as > everything else. Who cares about this stuff really? Just do whatever > is easiest and most common. It's not special, and --strict should apply here as long as it is in staging. So I'll apply these types of things to prevent people from feeling like they wasted their time fixing something up and getting it rejected for no good reason. thanks, greg k-h
Re: [PATCH] Staging: greybus: Match alignment with open parenthesis.
On Sun, Jul 23, 2017 at 02:09:57PM +0530, Shreeya Patel wrote: > Alignment should match with open parenthesis. > This fixes the coding style issue. > > Signed-off-by: Shreeya Patel > --- > drivers/staging/greybus/tools/loopback_test.c | 35 > --- > 1 file changed, 16 insertions(+), 19 deletions(-) Does not apply cleanly to my tree, can you please rebase it and resend if you want to have this applied? thanks, greg k-h
Re: [PATCH V3 2/9] cpufreq: Use transition_delay_us for legacy governors as well
On 24-07-17, 18:17, Peter Zijlstra wrote:
> On Wed, Jul 19, 2017 at 03:42:42PM +0530, Viresh Kumar wrote:
> > The policy->transition_delay_us field is used only by the schedutil
> > governor currently, and this field describes how fast the driver wants
> > the cpufreq governor to change CPUs frequency. It should rather be a
> > common thing across all governors, as it doesn't have any schedutil
> > dependency here.
> >
> > Create a new helper cpufreq_policy_transition_delay_us() to get the
> > transition delay across all governors.
> >
> > Signed-off-by: Viresh Kumar
> > ---
> >  drivers/cpufreq/cpufreq.c          | 15 +++
> >  drivers/cpufreq/cpufreq_governor.c |  9 +
> >  include/linux/cpufreq.h            |  1 +
> >  kernel/sched/cpufreq_schedutil.c   | 11 +--
> >  4 files changed, 18 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 9bf97a366029..c426d21822f7 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -524,6 +524,21 @@ unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> >  }
> >  EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> >
> > +unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy)
> > +{
> > +	unsigned int latency;
> > +
> > +	if (policy->transition_delay_us)
> > +		return policy->transition_delay_us;
> > +
> > +	latency = policy->cpuinfo.transition_latency / NSEC_PER_USEC;
> > +	if (latency)
> > +		return latency * LATENCY_MULTIPLIER;
> > +
> > +	return LATENCY_MULTIPLIER;
> > +}
> > +EXPORT_SYMBOL_GPL(cpufreq_policy_transition_delay_us);
>
> I realize you're just moving code about, but _why_ are we doing that
> division?

We are doing division by NSEC_PER_USEC because the values of
sampling_rate and rate_limit_us are in us, while transition_latency is
in ns. We shouldn't be breaking the userspace ABI and so if we want, we
can actually make transition_latency be stored in us instead.
Though I am not sure why are we multiplying with LATENCY_MULTIPLIER (1000) as that seems to be a huge value. -- viresh
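Viresh's explanation can be checked with a plain-C model of the helper. The constants mirror the kernel's (NSEC_PER_USEC = 1000, and LATENCY_MULTIPLIER really is 1000, which is the value being questioned); the function itself is a userspace sketch of cpufreq_policy_transition_delay_us(), not the kernel code:

```c
#include <assert.h>

#define NSEC_PER_USEC      1000U
#define LATENCY_MULTIPLIER 1000U	/* the multiplier questioned above */

/* Model: the driver reports transition_latency in nanoseconds, while
 * governors rate-limit in microseconds -- hence the division.  An
 * explicit transition_delay_us from the driver wins outright. */
static unsigned int transition_delay_us(unsigned int delay_us_set,
					unsigned int transition_latency_ns)
{
	unsigned int latency;

	if (delay_us_set)
		return delay_us_set;

	latency = transition_latency_ns / NSEC_PER_USEC;
	if (latency)
		return latency * LATENCY_MULTIPLIER;

	return LATENCY_MULTIPLIER;
}
```

The numbers make the LATENCY_MULTIPLIER concern concrete: a 20 us hardware latency turns into a 20 ms governor delay.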
Re: Possible race in hysdn.ko
Hello Anton,

first of all, this code was developed by other people and I never
managed to get one of these cards - so I do not know much about this
driver at all. Unfortunately the firm behind hysdn no longer exists; it
was taken over by Hermstedt AG years ago, and even Hermstedt AG is no
longer active in this business, I think (ISDN is an obsolete
technology).

Am 27.07.2017 um 18:19 schrieb Anton Volkov:
> Hello.
>
> While searching for races in the Linux kernel I've come across
> "drivers/isdn/hysdn/hysdn.ko" module. Here is a question that I came up
> with while analysing results. Lines are given using the info from Linux
> v4.12.
>
> In hysdn_proclog.c file in put_log_buffer function a non-standard type
> of synchronization is employed. It uses pd->del_lock as some kind of
> semaphore (hysdn_proclog.c: lines 129 and 143). Consider the following
> case:
>
> Thread 1:                    Thread 2:
> hysdn_log_write
> -> hysdn_add_log
> -> put_log_buffer
> spin_lock()                  hysdn_conf_open
> i = pd->del_lock++           -> hysdn_add_log
> spin_unlock()                -> put_log_buffer
> if (!i)                      spin_lock()
>     pd->del_lock--           i = pd->del_lock++
>                              spin_unlock()
>                              if (!i)
>                                  pd->del_lock--
>
> - the loop that deletes unused buffer entries
> (hysdn_proclog.c: lines 134-142).
> pd->del_lock-- is not an atomic operation and is executed without any
> locks. Thus it may interfere in the increment process of pd->del_lock in
> another thread. There may be cases that lead to the inability of any
> thread going through the .

Good catch.

> I see several possible solutions to this problem:
> 1) move the under the spin_lock and delete
> pd->del_lock synchronization;
> 2) wrap pd->del_lock-- with spin_lock protection.
>
> What do you think should be done about it?

I think the intention to have this construct was to not hold the card
lock for long times from /proc/ access to log data, since that may
disrupt the normal function.

This is only a guess - I did not really analyze the code deeply enough,
but I fear there are other critical problems with this code, since
without extra protection the list could be damaged during the deletion
loop, I think. So maybe having the complete loop under the lock is a
good idea.

Best regards
Karsten
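Anton's second proposed fix — wrapping the pd->del_lock-- in the same spinlock that protects the increment — can be sketched in userspace C. This is a model of the proposal, not the hysdn code: pthread_mutex_t stands in for the kernel spinlock, and the deletion loop is elided.

```c
#include <assert.h>
#include <pthread.h>

/* Model of hysdn_proclog.c's procdata with option 2 applied: both the
 * increment and the decrement of del_lock happen under the lock, so a
 * plain read-modify-write can no longer interleave with a locked one. */
struct procdata {
	pthread_mutex_t lock;	/* stands in for the kernel spinlock */
	int del_lock;
};

/* Returns the pre-increment value, as put_log_buffer() uses it. */
static int put_log_buffer(struct procdata *pd)
{
	int i;

	pthread_mutex_lock(&pd->lock);
	i = pd->del_lock++;
	pthread_mutex_unlock(&pd->lock);

	if (!i) {
		/* ... delete unused buffer entries (lines 134-142) ... */
		pthread_mutex_lock(&pd->lock);
		pd->del_lock--;		/* now protected as well */
		pthread_mutex_unlock(&pd->lock);
	}
	return i;
}
```

This keeps the original intent Karsten describes — the deletion loop still runs outside the lock — while making the counter bookkeeping itself race-free; moving the whole loop under the lock (his suggestion) would additionally protect the list.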
Re: [RFT v2 2/3] ASoC: samsung: Add missing prepare for iis clock of s3c24xx
Hi,

On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote:

The s3c_i2sv2_probe() only enabled iis clock. Missing prepare isn't
probably fatal, because for SoC clocks this is usually no-op, but for
correctness this clock should be prepared.

Signed-off-by: Krzysztof Kozlowski
---
Changes since v1:
1. New patch
---
 sound/soc/samsung/s3c-i2s-v2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c
index 9b28046eea8e..3894bda06ebb 100644
--- a/sound/soc/samsung/s3c-i2s-v2.c
+++ b/sound/soc/samsung/s3c-i2s-v2.c
@@ -637,7 +637,7 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai,
 		return -ENOENT;
 	}
 
-	clk_enable(i2s->iis_pclk);
+	clk_prepare_enable(i2s->iis_pclk);

Please handle the return value of clk_prepare_enable.

 	/* Mark ourselves as in TXRX mode so we can run through our cleanup
 	 * process without warnings. */

~arvind
Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the HNAE3 framework
On Thu, Jul 27, 2017 at 11:44:32PM +, Salil Mehta wrote: > Hi Leon > > > -Original Message- > > From: linux-rdma-ow...@vger.kernel.org [mailto:linux-rdma- > > ow...@vger.kernel.org] On Behalf Of Leon Romanovsky > > Sent: Sunday, July 23, 2017 2:16 PM > > To: Salil Mehta > > Cc: da...@davemloft.net; Zhuangyuzeng (Yisen); huangdaode; lipeng (Y); > > mehta.salil@gmail.com; net...@vger.kernel.org; linux- > > ker...@vger.kernel.org; linux-r...@vger.kernel.org; Linuxarm > > Subject: Re: [PATCH V4 net-next 2/8] net: hns3: Add support of the > > HNAE3 framework > > > > On Sat, Jul 22, 2017 at 11:09:36PM +0100, Salil Mehta wrote: > > > This patch adds the support of the HNAE3 (Hisilicon Network > > > Acceleration Engine 3) framework support to the HNS3 driver. > > > > > > Framework facilitates clients like ENET(HNS3 Ethernet Driver), RoCE > > > and user-space Ethernet drivers (like ODP etc.) to register with > > HNAE3 > > > devices and their associated operations. > > > > > > Signed-off-by: Daode Huang > > > Signed-off-by: lipeng > > > Signed-off-by: Salil Mehta > > > Signed-off-by: Yisen Zhuang > > > --- > > > Patch V4: Addressed following comments > > > 1. Andrew Lunn: > > > https://lkml.org/lkml/2017/6/17/233 > > > https://lkml.org/lkml/2017/6/18/105 > > > 2. Bo Yu: > > > https://lkml.org/lkml/2017/6/18/112 > > > 3. Stephen Hamminger: > > > https://lkml.org/lkml/2017/6/19/778 > > > Patch V3: Addressed below comments > > > 1. 
Andrew Lunn: > > > https://lkml.org/lkml/2017/6/13/1025 > > > Patch V2: No change > > > Patch V1: Initial Submit > > > --- > > > drivers/net/ethernet/hisilicon/hns3/hnae3.c | 319 > > > > > drivers/net/ethernet/hisilicon/hns3/hnae3.h | 449 > > > > > 2 files changed, 768 insertions(+) > > > create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.c > > > create mode 100644 drivers/net/ethernet/hisilicon/hns3/hnae3.h > > > > > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c > > b/drivers/net/ethernet/hisilicon/hns3/hnae3.c > > > new file mode 100644 > > > index ..7a11aaff0a23 > > > --- /dev/null > > > +++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c > > > @@ -0,0 +1,319 @@ > > > +/* > > > + * Copyright (c) 2016-2017 Hisilicon Limited. > > > + * > > > + * This program is free software; you can redistribute it and/or > > modify > > > + * it under the terms of the GNU General Public License as published > > by > > > + * the Free Software Foundation; either version 2 of the License, or > > > + * (at your option) any later version. > > > + */ > > > + > > > +#include > > > +#include > > > +#include > > > + > > > +#include "hnae3.h" > > > + > > > +static LIST_HEAD(hnae3_ae_algo_list); > > > +static LIST_HEAD(hnae3_client_list); > > > +static LIST_HEAD(hnae3_ae_dev_list); > > > + > > > +/* we are keeping things simple and using single lock for all the > > > + * list. This is a non-critical code so other updations, if happen > > > + * in parallel, can wait. 
> > > + */ > > > +static DEFINE_MUTEX(hnae3_common_lock); > > > + > > > +static bool hnae3_client_match(enum hnae3_client_type client_type, > > > +enum hnae3_dev_type dev_type) > > > +{ > > > + if (dev_type == HNAE3_DEV_KNIC) { > > > + switch (client_type) { > > > + case HNAE3_CLIENT_KNIC: > > > + case HNAE3_CLIENT_ROCE: > > > + return true; > > > + default: > > > + return false; > > > + } > > > + } else if (dev_type == HNAE3_DEV_UNIC) { > > > + switch (client_type) { > > > + case HNAE3_CLIENT_UNIC: > > > + return true; > > > + default: > > > + return false; > > > + } > > > + } else { > > > + return false; > > > + } > > > +} > > > + > > > +static int hnae3_match_n_instantiate(struct hnae3_client *client, > > > + struct hnae3_ae_dev *ae_dev, > > > + bool is_reg, bool *matched) > > > +{ > > > + int ret; > > > + > > > + *matched = false; > > > + > > > + /* check if this client matches the type of ae_dev */ > > > + if (!(hnae3_client_match(client->type, ae_dev->dev_type) && > > > + hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) { > > > + return 0; > > > + } > > > + /* there is a match of client and dev */ > > > + *matched = true; > > > + > > > + if (!(ae_dev->ops && ae_dev->ops->init_client_instance && > > > + ae_dev->ops->uninit_client_instance)) { > > > + dev_err(&ae_dev->pdev->dev, > > > + "ae_dev or client init/uninit ops are null\n"); > > > + return -EOPNOTSUPP; > > > + } > > > + > > > + /* now, (un-)instantiate client by calling lower layer */ > > > + if (is_reg) { > > > + ret = ae_dev->ops->init_client_instance(client, ae_dev); > > > + if (ret) > > > + dev_err(&ae_dev->pdev->dev, > > > +
Re: [RFT v2 1/3] ASoC: samsung: Fix possible double iounmap on s3c24xx driver probe failure
On Thursday 27 July 2017 10:43 PM, Krzysztof Kozlowski wrote: Commit 87b132bc0315 ("ASoC: samsung: s3c24{xx,12}-i2s: port to use generic dmaengine API") moved ioremap() call from s3c-i2s-v2.c:s3c_i2sv2_probe() to s3c2412-i2s.c:s3c2412_iis_dev_probe() and converted it to devm- resource managed interface. However the error path in first of them - s3c_i2sv2_probe() - was not updated. If getting a iis clock in s3c_i2sv2_probe() failed, the address space would be unmapped there. This could lead to: 1. double iounmap() later from devm-interface of s3c2412_iis_dev_probe()), 2. accessing the memory by other functions in s3c2412-i2s.c unit. Anyway, the owner of this mapped region should be s3c2412-i2s.c because it starts the mapping. Affected are drivers for S3C24xx family although issue was not reproduced. Fixes: 87b132bc0315 ("ASoC: samsung: s3c24{xx,12}-i2s: port to use generic dmaengine API") Signed-off-by: Krzysztof Kozlowski Acked-by: Arvind Yadav --- Not marking as Cc-stable because this is theoretical problem, not reproduced and also not tested. Please, kindly test on S3C24xx hardware. Changes since v1: 1. None --- sound/soc/samsung/s3c-i2s-v2.c | 1 - 1 file changed, 1 deletion(-) diff --git a/sound/soc/samsung/s3c-i2s-v2.c b/sound/soc/samsung/s3c-i2s-v2.c index ca522a95160b..9b28046eea8e 100644 --- a/sound/soc/samsung/s3c-i2s-v2.c +++ b/sound/soc/samsung/s3c-i2s-v2.c @@ -634,7 +634,6 @@ int s3c_i2sv2_probe(struct snd_soc_dai *dai, i2s->iis_pclk = clk_get(dev, "iis"); if (IS_ERR(i2s->iis_pclk)) { dev_err(dev, "failed to get iis_clock\n"); - iounmap(i2s->regs); return -ENOENT; }
Re: [PATCH V8 3/3] powernv: Add support to clear sensor groups data
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote: > Adds support for clearing different sensor groups. OCC inband sensor > groups like CSM, Profiler, Job Scheduler can be cleared using this > driver. The min/max of all sensors belonging to these sensor groups > will be cleared. > Hi Shilpasri, I think also some comments from v1 also apply here. Other comments inline Thanks, Cyril > Signed-off-by: Shilpasri G Bhat > --- > Changes from V7: > - s/send_occ_command/opal_sensor_groups_clear_history > > arch/powerpc/include/asm/opal-api.h| 3 +- > arch/powerpc/include/asm/opal.h| 2 + > arch/powerpc/include/uapi/asm/opal-occ.h | 23 ++ > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-occ.c | 109 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 1 + > arch/powerpc/platforms/powernv/opal.c | 3 + > 7 files changed, 141 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/include/uapi/asm/opal-occ.h > create mode 100644 arch/powerpc/platforms/powernv/opal-occ.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index 0d37315..342738a 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -195,7 +195,8 @@ > #define OPAL_SET_POWERCAP153 > #define OPAL_GET_PSR 154 > #define OPAL_SET_PSR 155 > -#define OPAL_LAST155 > +#define OPAL_SENSOR_GROUPS_CLEAR 156 > +#define OPAL_LAST156 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index 58b30a4..92db6af 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -271,6 +271,7 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int opal_set_powercap(u32 handle, int token, u32 pcap); > int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr); > int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr); > +int opal_sensor_groups_clear(u32 group_hndl, int token); > > /* Internal 
functions */ > extern int early_init_dt_scan_opal(unsigned long node, const char *uname, > @@ -351,6 +352,7 @@ static inline int opal_get_async_rc(struct opal_msg msg) > > void opal_powercap_init(void); > void opal_psr_init(void); > +int opal_sensor_groups_clear_history(u32 handle); > > #endif /* __ASSEMBLY__ */ > > diff --git a/arch/powerpc/include/uapi/asm/opal-occ.h > b/arch/powerpc/include/uapi/asm/opal-occ.h > new file mode 100644 > index 000..97c45e2 > --- /dev/null > +++ b/arch/powerpc/include/uapi/asm/opal-occ.h > @@ -0,0 +1,23 @@ > +/* > + * OPAL OCC command interface > + * Supported on POWERNV platform > + * > + * (C) Copyright IBM 2017 > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2, or (at your option) > + * any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. 
> + */ > + > +#ifndef _UAPI_ASM_POWERPC_OPAL_OCC_H_ > +#define _UAPI_ASM_POWERPC_OPAL_OCC_H_ > + > +#define OPAL_OCC_IOCTL_CLEAR_SENSOR_GROUPS _IOR('o', 1, u32) > + > +#endif /* _UAPI_ASM_POWERPC_OPAL_OCC_H */ > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index 9ed7d33..f193b33 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > opal-occ.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-occ.c > b/arch/powerpc/platforms/powernv/opal-occ.c > new file mode 100644 > index 000..d1d4b28 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-occ.c > @@ -0,0 +1,109 @@ > +/* > + * Copyright IBM Corporation 2017 > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free
Re: [PATCH] staging: unisys: visorchipset: constify attribute_group structure
On Wed, Jul 26, 2017 at 09:51:32PM -0400, Amitoj Kaur Chawla wrote: > Functions working with attribute_groups provided by > work with const attribute_group. These attribute_group structures do not > change at runtime so mark them as const. > > File size before: > text data bss dec hex filename > 24124 6216 448 307887844 > drivers/staging/unisys/visorbus/visorchipset.o > > File size after: > text data bss dec hex filename > 242206120 448 307887844 > drivers/staging/unisys/visorbus/visorchipset.o > > This change was made with the help of Coccinelle. > > Signed-off-by: Amitoj Kaur Chawla > --- > drivers/staging/unisys/visorbus/visorchipset.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/unisys/visorbus/visorchipset.c > b/drivers/staging/unisys/visorbus/visorchipset.c > index 2215056..798a92e 100644 > --- a/drivers/staging/unisys/visorbus/visorchipset.c > +++ b/drivers/staging/unisys/visorbus/visorchipset.c > @@ -1146,7 +1146,7 @@ static struct attribute > *visorchipset_parahotplug_attrs[] = { > NULL > }; > > -static struct attribute_group visorchipset_parahotplug_group = { > +static const struct attribute_group visorchipset_parahotplug_group = { > .name = "parahotplug", > .attrs = visorchipset_parahotplug_attrs > }; Someone else sent this change right before you did :(
Re: [PATCH] staging: unisys: visorbus_main: constify attribute_group structures
On Tue, Jul 25, 2017 at 06:45:52PM -0400, Amitoj Kaur Chawla wrote: > Functions working with attribute_groups provided by > work with const attribute_group. These attribute_group structures do not > change at runtime so mark them as const. > > File size before: > text data bss dec hex filename > 142167304 832 223525750 > drivers/staging/unisys/visorbus/visorbus_main.o > > File size after: > text data bss dec hex filename > 144087112 832 223525750 > drivers/staging/unisys/visorbus/visorbus_main.o > > This change was made with the help of Coccinelle. > > Signed-off-by: Amitoj Kaur Chawla > --- > drivers/staging/unisys/visorbus/visorbus_main.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/staging/unisys/visorbus/visorbus_main.c > b/drivers/staging/unisys/visorbus/visorbus_main.c > index 1c785dd..bba10dc 100644 > --- a/drivers/staging/unisys/visorbus/visorbus_main.c > +++ b/drivers/staging/unisys/visorbus/visorbus_main.c > @@ -249,7 +249,7 @@ static struct attribute *channel_attrs[] = { > NULL > }; > > -static struct attribute_group channel_attr_grp = { > +static const struct attribute_group channel_attr_grp = { > .name = "channel", > .attrs = channel_attrs, > }; > @@ -340,7 +340,7 @@ static struct attribute *dev_attrs[] = { > NULL > }; > > -static struct attribute_group dev_attr_grp = { > +static const struct attribute_group dev_attr_grp = { > .attrs = dev_attrs, > }; > Does not apply to my tree :( sorry, greg k-h
Re: [Eas-dev] [PATCH V3 1/3] sched: cpufreq: Allow remote cpufreq callbacks
On Thu, Jul 27, 2017 at 12:55 PM, Saravana Kannan wrote: > On 07/26/2017 08:30 PM, Viresh Kumar wrote: >> >> On 26-07-17, 14:00, Saravana Kannan wrote: >>> >>> No, the alternative is to pass it on to the CPU freq driver and let it >>> decide what it wants to do. That's the whole point if having a CPU freq >>> driver -- so that the generic code doesn't need to care about HW specific >>> details. Which is the point I was making in an earlier email to Viresh's >>> patch -- we shouldn't be doing any CPU check for the call backs at the >>> scheduler or ever governor level. >>> >>> That would simplify this whole thing by deleting a bunch of code. And >>> having >>> much simpler checks in those drivers that actually have to deal with >>> their >>> HW specific details. >> >> >> So what you are saying is that we go and update (almost) every cpufreq >> driver we have today and make their ->target() callbacks return early >> if they don't support switching frequency remotely ? Is that really >> simplifying anything? > > > Yes. Simplifying isn't always about number of lines of code. It's also about > abstraction. Having generic scheduler code care about HW details doesn't > seem nice. > > It'll literally one simple check (cpu == smp_processor_id()) or (cpu "in" > policy->cpus). > I think we can have both approaches? So we query the driver some time around sugov_should_update_freq (with a new driver callback?) and ask it if it has any say over the default behavior of "can't update remote CPU if I'm not a part of its policy" and use that over the default if it hasn't defined it in their struct cpufreq_driver. I think this will also not have the concern of "updating every driver", then we can just stick to the sane default of "no" for drivers that haven't defined it. Probably Viresh has already thought about this, but I just thought of bringing it up anyway. I also think its fine to handle this case after this series gets in, but that's just my opinion. thanks! -Joel
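The per-driver check Saravana describes is indeed small. A userspace model (a 64-bit mask standing in for policy->cpus, this_cpu passed in explicitly in place of smp_processor_id(), and a boolean standing in for the per-driver capability flag Joel suggests) might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the early-return a ->target() callback could do: a driver
 * that cannot program a remote CPU's frequency bails out unless the
 * CPU running the callback belongs to the target policy. */
static bool can_update(uint64_t policy_cpus, int this_cpu,
		       bool driver_allows_remote)
{
	if (driver_allows_remote)
		return true;		/* e.g. firmware-driven hardware */
	return policy_cpus & (UINT64_C(1) << this_cpu);
}
```

With a default of "no remote updates" in the core, only drivers that can actually handle remote requests would need to set the flag — which is the compromise sketched in the paragraph above.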
Re: [PATCH 2/3] staging: ccree: Convert to devm_ioremap_resource for map, unmap
On Friday 28 July 2017 01:18 AM, Dan Carpenter wrote: > On Thu, Jul 27, 2017 at 05:27:33PM +0300, Gilad Ben-Yossef wrote: >> +new_drvdata->cc_base = devm_ioremap_resource(&plat_dev->dev, >> + req_mem_cc_regs); >> +if (IS_ERR(new_drvdata->cc_base)) { >> +rc = PTR_ERR(new_drvdata->cc_base); >> goto init_cc_res_err; > > (This code was in the original and not introduced by the patch.) Hi Dan, the above lines of code were not in the original except "goto init_cc_res_err;" which was doing the error handling at different places. This patch has added those first three lines. I was constantly checking the latest linux-next and staging-testing / next git trees. > > Ideally, the goto name should say what the goto does. In this case it > does everything. Unfortunately trying to do everything is very > complicated so obviously the error handling is going to be full of bugs. > > The first thing the error handling does is: > > ssi_aead_free(new_drvdata); > > But this function assumes that if new_drvdata->aead_handle is non-NULL > then that means we have called: > > INIT_LIST_HEAD(&aead_handle->aead_list); > > That assumption is false if the aead_handle->sram_workspace_addr > allocation fails. It can't actually fail in the current code... So > that's good, I suppose. Reviewing this code is really hard, because I > have to jump back and forth through several functions in different > files. > > Moving on two the second error handling function: > > ssi_hash_free(new_drvdata); > > This one has basically the same assumption that if ->hash_handle is > allocated that means we called: > > INIT_LIST_HEAD(&hash_handle->hash_list); > > That assumption is not true if ssi_hash_init_sram_digest_consts(drvdata); > fails. That function can fail in real life. Except the the error > handling in ssi_hash_alloc() sets ->hash_handle to NULL. So the bug is > just a leak and not a crashing bug. 
> I've reviewed the first two lines of the error handling just to give a feel for how complicated "do everything" style error handling is to review. > The better way to do error handling is: > 1) Only free things which have been allocated. > 2) The unwind code should mirror the wind up code. > 3) Every allocation function should have a free function. > 4) Label names should tell you what the goto does. > 5) Use direct returns and literals where possible. > 6) Generally it's better to keep the error path and the success path separate. > 7) Do error handling as opposed to success handling. > > one = alloc(); > if (!one) > return -ENOMEM; > if (foo) { > two = alloc(); > if (!two) { > ret = -ENOMEM; > goto free_one; > } > } > three = alloc(); > if (!three) { > ret = -ENOMEM; > goto free_two; > } > ... > > return 0; > > free_two: > if (foo) > free(two); > free_one: > free(one); > > return ret; > > This style of error handling is easier to review. You only need to remember the most recent thing that you have allocated. You can tell from the goto that it frees it, so you don't have to scroll to the bottom of the function or jump to a different file. I understand, it makes sense; maybe we should come up with something better and simpler. Thanks, Suniel > regards, > dan carpenter
Re: [PATCH 3.10] pstore: Make spinlock per zone instead of global
Hi Leo, There was no upstream commit ID here but I found it in mainline here : commit 109704492ef637956265ec2eb72ae7b3b39eb6f4 Author: Joel Fernandes Date: Thu Oct 20 00:34:00 2016 -0700 pstore: Make spinlock per zone instead of global What worries me is that some later fixes were issued, apparently to fix an oops and a warning after this patch : commit 76d5692a58031696e282384cbd893832bc92bd76 Author: Kees Cook Date: Thu Feb 9 15:43:44 2017 -0800 pstore: Correctly initialize spinlock and flags The ram backend wasn't always initializing its spinlock correctly. Since it was coming from kzalloc memory, though, it was harmless on architectures that initialize unlocked spinlocks to 0 (at least x86 and ARM). This also fixes a possibly ignored flag setting too. and : commit e9a330c4289f2ba1ca4bf98c2b430ab165a8931b Author: Kees Cook Date: Sun Mar 5 22:08:58 2017 -0800 pstore: Use dynamic spinlock initializer The per-prz spinlock should be using the dynamic initializer so that lockdep can correctly track it. Without this, under lockdep, we get a warning at boot that the lock is in non-static memory. So I'm fine with merging this patch as long as Kees is OK with this and we know what exact patch series needs to be merged. Also, the information you added to the commit message references a trace on a 4.4 kernel. Do you confirm that you got the same issue on 3.10 ? I just prefer to avoid blindly backporting sensitive patches if they're not absolutely needed. > [ 65.103905] hrtimer: interrupt took 2759375 ns > [ 65.108721] BUG: spinlock recursion on CPU#0, kschedfreq:0/1246 > [ 65.108760] lock: buffer_lock+0x0/0x38, .magic: dead4ead, .owner: > kschedfreq:0/1246, .owner_cpu: 0 > [ 65.108779] CPU: 0 PID: 1246 Comm: kschedfreq:0 Not tainted > 4.4.74-07294-g5c996a9-dirty #130 Thanks! willy
Re: [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load
On 2017/6/6 18:57, Paolo Bonzini wrote: > The simplify part: do not touch pi_desc.nv, we can set it when the > VCPU is first created. Likewise, pi_desc.sn is only handled by > vmx_vcpu_pi_load, do not touch it in __pi_post_block. > > The fix part: do not check kvm_arch_has_assigned_device, instead > check the SN bit to figure out whether vmx_vcpu_pi_put ran before. > This matches what the previous patch did in pi_post_block. > > Cc: Longpeng (Mike) > Cc: Huangweidong > Cc: Gonglei > Cc: wangxin > Cc: Radim Krčmář > Signed-off-by: Paolo Bonzini > --- > arch/x86/kvm/vmx.c | 68 > -- > 1 file changed, 35 insertions(+), 33 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 0f4714fe4908..81047f373747 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -2184,43 +2184,41 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, > int cpu) > struct pi_desc old, new; > unsigned int dest; > > - if (!kvm_arch_has_assigned_device(vcpu->kvm) || > - !irq_remapping_cap(IRQ_POSTING_CAP) || > - !kvm_vcpu_apicv_active(vcpu)) > + /* > + * In case of hot-plug or hot-unplug, we may have to undo > + * vmx_vcpu_pi_put even if there is no assigned device. And we > + * always keep PI.NDST up to date for simplicity: it makes the > + * code easier, and CPU migration is not a fast path. > + */ > + if (!pi_test_sn(pi_desc) && vcpu->cpu == cpu) > + return; Hi Paolo, I'm confused with the following scenario: (suppose the VM has a assigned devices) step 1. the running vcpu is be preempted --> vmx_vcpu_pi_put [ SET pi.sn ] step 2. hot-unplug the assigned devices step 3. the vcpu is scheduled in --> vmx_vcpu_pi_load [ CLEAR pi.sn ] step 4. the running vcpu is be preempted again --> vmx_vcpu_pi_put [ direct return ] step 5. the vcpu is migrated to another pcpu step 6. 
the vcpu is scheduled in --> vmx_vcpu_pi_load [ above check fails and continue to execute the follow parts ] I think vmx_vcpu_pi_load should return direct in step6, because vmx_vcpu_pi_put in step4 did nothing. So maybe the above check has a potential problem. Please kindly figure out if I misunderstand anything important :) -- Regards, Longpeng(Mike) > + > + /* > + * First handle the simple case where no cmpxchg is necessary; just > + * allow posting non-urgent interrupts. > + * > + * If the 'nv' field is POSTED_INTR_WAKEUP_VECTOR, do not change > + * PI.NDST: pi_post_block will do it for us and the wakeup_handler > + * expects the VCPU to be on the blocked_vcpu_list that matches > + * PI.NDST. > + */ > + if (pi_desc->nv == POSTED_INTR_WAKEUP_VECTOR || > + vcpu->cpu == cpu) { > + pi_clear_sn(pi_desc); > return; > + } > > + /* The full case. */ > do { > old.control = new.control = pi_desc->control; > > - /* > - * If 'nv' field is POSTED_INTR_WAKEUP_VECTOR, there > - * are two possible cases: > - * 1. After running 'pre_block', context switch > - *happened. For this case, 'sn' was set in > - *vmx_vcpu_put(), so we need to clear it here. > - * 2. After running 'pre_block', we were blocked, > - *and woken up by some other guy. For this case, > - *we don't need to do anything, 'pi_post_block' > - *will do everything for us. However, we cannot > - *check whether it is case #1 or case #2 here > - *(maybe, not needed), so we also clear sn here, > - *I think it is not a big deal. 
> - */ > - if (pi_desc->nv != POSTED_INTR_WAKEUP_VECTOR) { > - if (vcpu->cpu != cpu) { > - dest = cpu_physical_id(cpu); > - > - if (x2apic_enabled()) > - new.ndst = dest; > - else > - new.ndst = (dest << 8) & 0xFF00; > - } > + dest = cpu_physical_id(cpu); > > - /* set 'NV' to 'notification vector' */ > - new.nv = POSTED_INTR_VECTOR; > - } > + if (x2apic_enabled()) > + new.ndst = dest; > + else > + new.ndst = (dest << 8) & 0xFF00; > > - /* Allow posting non-urgent interrupts */ > new.sn = 0; > } while (cmpxchg(&pi_desc->control, old.control, > new.control) != old.control); > @@ -9259,6 +9257,13 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm > *kvm, unsigned int id) > >
Re: [PATCH V8 2/3] powernv: Add support to set power-shifting-ratio
On Wed, 2017-07-26 at 10:35 +0530, Shilpasri G Bhat wrote: > This patch adds support to set power-shifting-ratio for CPU-GPU which > is used by OCC power capping algorithm. > > Signed-off-by: Shilpasri G Bhat Hi Shilpasri, I started looking though this - a lot the comments to patch 1/3 apply here so I'll stop repeating myself :). Thanks, Cyril > --- > Changes from V7: > - Replaced sscanf with kstrtoint > > arch/powerpc/include/asm/opal-api.h| 4 +- > arch/powerpc/include/asm/opal.h| 3 + > arch/powerpc/platforms/powernv/Makefile| 2 +- > arch/powerpc/platforms/powernv/opal-psr.c | 169 > + > arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + > arch/powerpc/platforms/powernv/opal.c | 3 + > 6 files changed, 181 insertions(+), 2 deletions(-) > create mode 100644 arch/powerpc/platforms/powernv/opal-psr.c > > diff --git a/arch/powerpc/include/asm/opal-api.h > b/arch/powerpc/include/asm/opal-api.h > index c3e0c4a..0d37315 100644 > --- a/arch/powerpc/include/asm/opal-api.h > +++ b/arch/powerpc/include/asm/opal-api.h > @@ -193,7 +193,9 @@ > #define OPAL_NPU_MAP_LPAR148 > #define OPAL_GET_POWERCAP152 > #define OPAL_SET_POWERCAP153 > -#define OPAL_LAST153 > +#define OPAL_GET_PSR 154 > +#define OPAL_SET_PSR 155 > +#define OPAL_LAST155 > > /* Device tree flags */ > > diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h > index ec2087c..58b30a4 100644 > --- a/arch/powerpc/include/asm/opal.h > +++ b/arch/powerpc/include/asm/opal.h > @@ -269,6 +269,8 @@ int64_t opal_xive_set_vp_info(uint64_t vp, > int64_t opal_xive_dump(uint32_t type, uint32_t id); > int opal_get_powercap(u32 handle, int token, u32 *pcap); > int opal_set_powercap(u32 handle, int token, u32 pcap); > +int opal_get_power_shifting_ratio(u32 handle, int token, u32 *psr); > +int opal_set_power_shifting_ratio(u32 handle, int token, u32 psr); > > /* Internal functions */ > extern int early_init_dt_scan_opal(unsigned long node, const char *uname, > @@ -348,6 +350,7 @@ static inline int 
opal_get_async_rc(struct opal_msg msg) > void opal_wake_poller(void); > > void opal_powercap_init(void); > +void opal_psr_init(void); > > #endif /* __ASSEMBLY__ */ > > diff --git a/arch/powerpc/platforms/powernv/Makefile > b/arch/powerpc/platforms/powernv/Makefile > index e79f806..9ed7d33 100644 > --- a/arch/powerpc/platforms/powernv/Makefile > +++ b/arch/powerpc/platforms/powernv/Makefile > @@ -2,7 +2,7 @@ obj-y += setup.o opal-wrappers.o opal.o > opal-async.o idle.o > obj-y+= opal-rtc.o opal-nvram.o opal-lpc.o > opal-flash.o > obj-y+= rng.o opal-elog.o opal-dump.o > opal-sysparam.o opal-sensor.o > obj-y+= opal-msglog.o opal-hmi.o opal-power.o > opal-irqchip.o > -obj-y+= opal-kmsg.o opal-powercap.o > +obj-y+= opal-kmsg.o opal-powercap.o opal-psr.o > > obj-$(CONFIG_SMP)+= smp.o subcore.o subcore-asm.o > obj-$(CONFIG_PCI)+= pci.o pci-ioda.o npu-dma.o > diff --git a/arch/powerpc/platforms/powernv/opal-psr.c > b/arch/powerpc/platforms/powernv/opal-psr.c > new file mode 100644 > index 000..07e3f78 > --- /dev/null > +++ b/arch/powerpc/platforms/powernv/opal-psr.c > @@ -0,0 +1,169 @@ > +/* > + * PowerNV OPAL Power-Shifting-Ratio interface > + * > + * Copyright 2017 IBM Corp. > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public License > + * as published by the Free Software Foundation; either version > + * 2 of the License, or (at your option) any later version. 
> + */ > + > +#define pr_fmt(fmt) "opal-psr: " fmt > + > +#include > +#include > +#include > + > +#include > + > +DEFINE_MUTEX(psr_mutex); > + > +static struct kobject *psr_kobj; > + > +struct psr_attr { > + u32 handle; > + struct kobj_attribute attr; > +}; > + > +static struct psr_attr *psr_attrs; > +static struct kobject *psr_kobj; > + > +static ssize_t psr_show(struct kobject *kobj, struct kobj_attribute *attr, > + char *buf) > +{ > + struct psr_attr *psr_attr = container_of(attr, struct psr_attr, attr); > + struct opal_msg msg; > + int psr, ret, token; > + > + token = opal_async_get_token_interruptible(); > + if (token < 0) { > + pr_devel("Failed to get token\n"); > + return token; > + } > + > + mutex_lock(&psr_mutex); > + ret = opal_get_power_shifting_ratio(psr_attr->handle, token, &psr); __pa() > + switch (ret) { > + case OPAL_ASYNC_COMPLETION: > + ret = opal_async_wait_response(token, &msg); > + if (ret) { > + pr_devel("Failed to wait
Re: [PATCH V8 1/3] powernv: powercap: Add support for powercap framework
Hi Cyril, On 07/28/2017 07:09 AM, Cyril Bur wrote: > Is there any reason that pcap_attrs needs to be contiguous? If not, I > feel like you could eliminate the entire loop below (and the last one > as well maybe) and just do the assignment of pattr_groups[].attrs[] up > there. > > In fact do you even need to store pcap_attrs? If you kcalloc them as > you need them (in the loop above), you can always free them again on > error by freeing pattr_groups[].attrs[] right? > > I'll admit I've become quite confused as to the layout of the sysfs dir > that you're creating here - would you mind showing what the expected > layout will be? > > I'll take more of a look once thats more clear in my head > > Thanks, > > Cyril The sysfs layout looks as below: # ls /sys/firmware/opal/powercap/ system-powercap # ls /sys/firmware/opal/powercap/system-powercap/ powercap-current powercap-max powercap-min # grep . /sys/firmware/opal/powercap/system-powercap/* /sys/firmware/opal/powercap/system-powercap/powercap-current:2375 /sys/firmware/opal/powercap/system-powercap/powercap-max:2375 /sys/firmware/opal/powercap/system-powercap/powercap-min:1945 Thanks and Regards, Shilpa
Re: [PATCH net-next 3/3] tap: XDP support
On Fri, Jul 28, 2017 at 11:50:45AM +0800, Jason Wang wrote: > > > On 2017年07月28日 11:46, Michael S. Tsirkin wrote: > > On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote: > > > > > + old_prog = rtnl_dereference(tun->xdp_prog); > > > > > + if (old_prog) > > > > > + bpf_prog_put(old_prog); > > > > > + rcu_assign_pointer(tun->xdp_prog, prog); > > > > Is this OK? Could this lead to the program getting freed and then > > > > datapath accessing a stale pointer? I mean in the scenario where the > > > > process gets pre-empted between the bpf_prog_put() and > > > > rcu_assign_pointer()? > > > Will call bpf_prog_put() after rcu_assign_pointer(). > > I suspect you need to sync RCU or something before that. > > __bpf_prog_put() will do call_rcu(), so looks like it was ok. > > Thanks True - I missed that. -- MST
Re: [PATCH v8 09/13] iommu/amd: Use is_attach_deferred call-back
On 07/27/17 at 05:53pm, Joerg Roedel wrote: > On Fri, Jul 21, 2017 at 04:59:07PM +0800, Baoquan He wrote: > > +static bool amd_iommu_is_attach_deferred(struct iommu_domain *domain, > > +struct device *dev) > > +{ > > + struct iommu_dev_data *dev_data = dev->archdata.iommu; > > + return dev_data->defer_attach; > > +} > > If the redundant newline from patch 1 needs a new home, put it here :-) Got it, will add a newline at this place. Thanks.
Re: [PATCH v8 05/13] iommu/amd: Add function copy_dev_tables()
On 07/27/17 at 05:29pm, Joerg Roedel wrote: > On Fri, Jul 21, 2017 at 04:59:03PM +0800, Baoquan He wrote: > > Add function copy_dev_tables to copy the old DEV table entries of the > > panicked > > Since there is only one (for now), you can name the function in > singular: copy_dev_table() or copy_device_table(). Make sense, will change it to copy_device_table(). Thanks. > > > kernel to the new allocated DEV table. Since all iommus share the same DTE > > table > > the copy only need be done one time. Besides, we also need to: > > > > - Check whether all IOMMUs actually use the same device table with the > > same size > > > > - Verify that the size of the old device table is the expected size. > > > > - Reserve the old domain id occupied in 1st kernel to avoid touching the > > old > > io-page tables. Then on-flight DMA can continue looking it up. > > > > And also define MACRO DEV_DOMID_MASK to replace magic number 0xULL, it > > can be > > reused in copy_dev_tables(). > > > > Signed-off-by: Baoquan He > > --- > > drivers/iommu/amd_iommu.c | 2 +- > > drivers/iommu/amd_iommu_init.c | 55 > > + > > drivers/iommu/amd_iommu_types.h | 1 + > > 3 files changed, 57 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c > > index e5a03f259986..4d00f1bda900 100644 > > --- a/drivers/iommu/amd_iommu.c > > +++ b/drivers/iommu/amd_iommu.c > > @@ -2086,7 +2086,7 @@ static void set_dte_entry(u16 devid, struct > > protection_domain *domain, bool ats) > > flags|= tmp; > > } > > > > - flags &= ~(0xUL); > > + flags &= ~DEV_DOMID_MASK; > > flags |= domain->id; > > > > amd_iommu_dev_table[devid].data[1] = flags; > > diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c > > index f6da5fe03b31..c58f091ce232 100644 > > --- a/drivers/iommu/amd_iommu_init.c > > +++ b/drivers/iommu/amd_iommu_init.c > > @@ -842,6 +842,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit) > > } > > > > > > +static int copy_dev_tables(void) > > 
+{ > > + struct dev_table_entry *old_devtb = NULL; > > + u32 lo, hi, devid, old_devtb_size; > > + phys_addr_t old_devtb_phys; > > + u64 entry, last_entry = 0; > > + struct amd_iommu *iommu; > > + u16 dom_id, dte_v; > > + static int copied; > > + > > + for_each_iommu(iommu) { > > + if (!translation_pre_enabled(iommu)) { > > + pr_err("IOMMU:%d is not pre-enabled!/n", > > + iommu->index); > > + return -1; > > + } > > + > > + /* All IOMMUs should use the same device table with the same > > size */ > > + lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET); > > + hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4); > > + entry = (((u64) hi) << 32) + lo; > > + if (last_entry && last_entry != entry) { > > + pr_err("IOMMU:%d should use the same dev table as > > others!/n", > > + iommu->index); > > + return -1; > > + } > > + last_entry = entry; > > + > > + old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12; > > + if (old_devtb_size != dev_table_size) { > > + pr_err("The device table size of IOMMU:%d is not > > expected!/n", > > + iommu->index); > > + return -1; > > + } > > I think the loop can end here. There is only one table to copy, so you > don't need to copy it for every iommu. Just do the checks in the loop > and the copy once after the loop. Ok, will change. I worried that device table addr might not be stored into iommu register correctly. I am fine we only do the check only at the first time. 
> > > + > > + old_devtb_phys = entry & PAGE_MASK; > > + old_devtb = memremap(old_devtb_phys, dev_table_size, > > MEMREMAP_WB); > > + if (!old_devtb) > > + return -1; > > + > > + if (copied) > > + continue; > > + for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) { > > + amd_iommu_dev_table[devid] = old_devtb[devid]; > > + dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK; > > + dte_v = old_devtb[devid].data[0] & DTE_FLAG_V; > > + if (dte_v && dom_id) > > + __set_bit(dom_id, amd_iommu_pd_alloc_bitmap); > > + } > > Please only copy the DTE when DTE.V is set in the old table. Also Sorry, the sanity check about DTE.V has been done in patch 7/13. I should rewrite the subject and git log for patch 7/13. > don't copy any of the IOMMUv2 enablement bits from the old DTE. PRI > faults are recoverable by design and a device shouldn't fail on a > negative answer from the IOMMU. Yes, this is done in patch 11/13. Saw y
Fwd: trinity test fanotify cause hungtasks on kernel 4.13
Hi all: when we used trinity to test the fanotify interfaces, it caused many hung tasks. CONFIG_FANOTIFY_ACCESS_PERMISSIONS=y. The shell script is simple:

#!/bin/bash
while true
do
	./trinity -c fanotify_init -l off -C 2 -X > /dev/null 2>&1 &
	sleep 1
	./trinity -c fanotify_mark -l off -C 2 -X > /dev/null 2>&1 &
	sleep 10
done

We found that trinity entered the D state quickly. We checked the pids' stacks:

[root@localhost ~]# ps -aux | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 977 0.0 0.0 207992 7904 ? Ss 15:23 0:00 /usr/bin/abrt-watch-log -F BUG: WARNING: at WARNING: CPU: INFO: possible recursive locking detected ernel BUG at list_del corruption list_add corruption do_IRQ: stack overflow: ear stack overflow (cur: eneral protection fault nable to handle kernel ouble fault: RTNL: assertion failed eek! page_mapcount(page) went negative! adness at NETDEV WATCHDOG ysctl table check failed : nobody cared IRQ handler type mismatch Machine Check Exception: Machine check events logged divide error: bounds: coprocessor segment overrun: invalid TSS: segment not present: invalid opcode: alignment check: stack segment: fpu exception: simd exception: iret exception: /var/log/messages -- /usr/bin/abrt-dump-oops -xtD
root 997 0.0 0.0 203360 3188 ? Ssl 15:23 0:00 /usr/sbin/gssproxy -D
root 1549 0.0 0.0 82552 6012 ? Ss 15:23 0:00 /usr/sbin/sshd -D
root 2807 3.5 0.2 59740 35416 pts/0 DL 15:24 0:00 ./trinity -c fanotify_init -l off -C 2 -X
root 2809 3.1 0.2 53712 35332 pts/0 DL 15:24 0:00 ./trinity -c fanotify_mark -l off -C 2 -X
root 2915 0.0 0.0 136948 1776 pts/0 D 15:24 0:00 ps ax
root 2919 0.0 0.0 112656 2100 pts/1 S+ 15:24 0:00 grep --color=auto D

[root@localhost ~]# cat /proc/2807/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@localhost ~]# cat /proc/2915/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

[root@localhost ~]# cat /proc/2809/stack
[] fanotify_handle_event+0x2a1/0x2f0
[] fsnotify+0x2d3/0x4f0
[] security_file_open+0x89/0x90
[] do_dentry_open+0x139/0x330
[] vfs_open+0x4f/0x70
[] path_openat+0x548/0x1350
[] do_filp_open+0x91/0x100
[] do_sys_open+0x124/0x210
[] SyS_open+0x1e/0x20
[] do_syscall_64+0x67/0x150
[] entry_SYSCALL64_slow_path+0x25/0x25
[] 0x

All of these processes are waiting for a response in fanotify_handle_event->fanotify_get_response. Because the monitoring process did not respond (or was killed), they stay blocked on the waitqueue, and every other task that then reaches fanotify_get_response gets stuck as well. If we use wait_event_timeout, the response time cannot be guaranteed. Do you have any ideas? Thanks.
Re: [PATCH net-next 3/3] tap: XDP support
On Fri, 28 Jul 2017 06:46:40 +0300, Michael S. Tsirkin wrote: > On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote: > > > > + old_prog = rtnl_dereference(tun->xdp_prog); > > > > + if (old_prog) > > > > + bpf_prog_put(old_prog); > > > > + rcu_assign_pointer(tun->xdp_prog, prog); > > > Is this OK? Could this lead to the program getting freed and then > > > datapath accessing a stale pointer? I mean in the scenario where the > > > process gets pre-empted between the bpf_prog_put() and > > > rcu_assign_pointer()? > > > > Will call bpf_prog_put() after rcu_assign_pointer(). > > I suspect you need to sync RCU or something before that. I think the bpf_prog_put() will use call_rcu() to do the actual free: static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock) { if (atomic_dec_and_test(&prog->aux->refcnt)) { trace_bpf_prog_put_rcu(prog); /* bpf_prog_free_id() must be called first */ bpf_prog_free_id(prog, do_idr_lock); bpf_prog_kallsyms_del(prog); call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); } } It's just that we are only under the rtnl here, RCU lock is not held, so grace period may elapse between bpf_prog_put() and rcu_assign_pointer().
Re: [PATCH net-next 3/3] tap: XDP support
On 2017年07月28日 11:46, Michael S. Tsirkin wrote: On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote: + old_prog = rtnl_dereference(tun->xdp_prog); + if (old_prog) + bpf_prog_put(old_prog); + rcu_assign_pointer(tun->xdp_prog, prog); Is this OK? Could this lead to the program getting freed and then datapath accessing a stale pointer? I mean in the scenario where the process gets pre-empted between the bpf_prog_put() and rcu_assign_pointer()? Will call bpf_prog_put() after rcu_assign_pointer(). I suspect you need to sync RCU or something before that. __bpf_prog_put() will do call_rcu(), so looks like it was ok. Thanks
Re: [PATCH 1/6] dma: bcm-sba-raid: Improve memory allocation in SBA RAID driver
On Fri, Jul 28, 2017 at 8:43 AM, Vinod Koul wrote: > On Thu, Jul 27, 2017 at 09:42:33AM +0530, Anup Patel wrote: >> On Wed, Jul 26, 2017 at 10:39 PM, Vinod Koul wrote: >> > On Wed, Jul 26, 2017 at 11:06:39AM +0530, Anup Patel wrote: > >> >> drivers/dma/bcm-sba-raid.c | 439 >> >> +++-- >> >> 1 file changed, 226 insertions(+), 213 deletions(-) >> >> >> >> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c >> >> index e41bbc7..6d15fed 100644 >> >> --- a/drivers/dma/bcm-sba-raid.c >> >> +++ b/drivers/dma/bcm-sba-raid.c >> >> @@ -48,7 +48,8 @@ >> >> >> >> #include "dmaengine.h" >> >> >> >> -/* SBA command related defines */ >> >> +/* == Driver macros and defines = */ >> > >> > why this noise, seems unrelated to the change! >> >> This is just minor beautification. Again, I will put this >> in separate patch. > > Well you can't shove garlands under an unrelated change. By all means throw > the whole garden out there, but please as a separate patch Sure, I will have separate patch for this beautification. 
> >> >> + >> >> #define SBA_TYPE_SHIFT 48 >> >> #define SBA_TYPE_MASK GENMASK(1, 0) >> >> #define SBA_TYPE_A 0x0 >> >> @@ -82,39 +83,41 @@ >> >> #define SBA_CMD_WRITE_BUFFER 0xc >> >> #define SBA_CMD_GALOIS 0xe >> >> -/* Driver helper macros */ >> >> +#define SBA_MAX_REQ_PER_MBOX_CHANNEL 8192 >> >> + >> >> #define to_sba_request(tx) \ >> >> container_of(tx, struct sba_request, tx) >> >> #define to_sba_device(dchan) \ >> >> container_of(dchan, struct sba_device, dma_chan) >> >> -enum sba_request_state { >> >> - SBA_REQUEST_STATE_FREE = 1, >> >> - SBA_REQUEST_STATE_ALLOCED = 2, >> >> - SBA_REQUEST_STATE_PENDING = 3, >> >> - SBA_REQUEST_STATE_ACTIVE = 4, >> >> - SBA_REQUEST_STATE_RECEIVED = 5, >> >> - SBA_REQUEST_STATE_COMPLETED = 6, >> >> - SBA_REQUEST_STATE_ABORTED = 7, >> >> +/* = Driver data structures = */ >> >> + >> >> +enum sba_request_flags { >> >> + SBA_REQUEST_STATE_FREE = 0x001, >> >> + SBA_REQUEST_STATE_ALLOCED = 0x002, >> >> + SBA_REQUEST_STATE_PENDING = 0x004, >> >> + SBA_REQUEST_STATE_ACTIVE = 0x008, >> >> + SBA_REQUEST_STATE_RECEIVED = 0x010, >> >> + SBA_REQUEST_STATE_COMPLETED = 0x020, >> >> + SBA_REQUEST_STATE_ABORTED = 0x040, >> >> + SBA_REQUEST_STATE_MASK = 0x0ff, >> >> + SBA_REQUEST_FENCE = 0x100, >> > >> > how does this help in mem allocation? Ahh, I missed addressing this comment. Currently we have a separate "bool" flag for fenced sba_requests and a separate "state" variable in sba_request. We have merged these two things into a common "u32 flags" in sba_request. In the future, we can use more bits in "u32 flags" as required, without disturbing sba_request. I will make this a separate patch. I agree that I have covered many changes in PATCH1, which makes it hard for you to review. Thanks, Anup
Re: [PATCH net-next 3/3] tap: XDP support
On Fri, Jul 28, 2017 at 11:28:54AM +0800, Jason Wang wrote: > > > + old_prog = rtnl_dereference(tun->xdp_prog); > > > + if (old_prog) > > > + bpf_prog_put(old_prog); > > > + rcu_assign_pointer(tun->xdp_prog, prog); > > Is this OK? Could this lead to the program getting freed and then > > datapath accessing a stale pointer? I mean in the scenario where the > > process gets pre-empted between the bpf_prog_put() and > > rcu_assign_pointer()? > > Will call bpf_prog_put() after rcu_assign_pointer(). I suspect you need to sync RCU or something before that.
Re: [Eas-dev] [PATCH V4 0/3] sched: cpufreq: Allow remote callbacks
On Thu, Jul 27, 2017 at 12:21 AM, Juri Lelli wrote: [..] >> > >> > But even without that, if you see the routine >> > init_entity_runnable_average() in fair.c, the new tasks are >> > initialized in a way that they are seen as heavy tasks. And so even >> > for the first time they run, freq should normally increase on the >> > target CPU (at least with above application).i >> >> Ok, but the "heavy" in init_entity_runnable_average means for load, >> not the util_avg. The util_avg is what's used for frequency scaling >> IIUC and is set to 0 in that function no? >> > > True for init_entity_runnable_average(), but for new task post_init_ > entity_util_avg() is then also called (from wake_up_new_task()), which > modifies the initial util_avg value (depending on current rq {util, > load}_avg. Got it. That makes sense, thank you! -Joel
Re: [PATCH 2/6] dma: bcm-sba-raid: Peek mbox when we are left with no free requests
On Fri, Jul 28, 2017 at 8:45 AM, Vinod Koul wrote: > On Thu, Jul 27, 2017 at 10:25:25AM +0530, Anup Patel wrote: >> On Wed, Jul 26, 2017 at 10:40 PM, Vinod Koul wrote: >> > On Wed, Jul 26, 2017 at 11:06:40AM +0530, Anup Patel wrote: >> >> We should peek mbox channels when we are left with no free >> >> sba_requests in sba_alloc_request() >> > >> > and why is the world should we do that, how does that help?? >> >> When setting up RAID array on several NVMe disk we observed >> that sba_alloc_request() start failing (due to no free requests left) >> and RAID array setup becomes very slow. >> >> Doing mbox channel peek when we have no free requests left, >> improves performance of RAID array setup. > > How about documenting this tribal knowledge in the changelog. Changelogs are > very useful, 6 months down the line, you will struggle to remember why this > was changed.. Sure, I will have detailed commit description for this. Regards, Anup
Re: [Eas-dev] [PATCH V4 1/3] sched: cpufreq: Allow remote cpufreq callbacks
On Thu, Jul 27, 2017 at 12:14 AM, Viresh Kumar wrote: > On 26-07-17, 23:13, Joel Fernandes (Google) wrote: >> On Wed, Jul 26, 2017 at 10:50 PM, Viresh Kumar >> wrote: >> > On 26-07-17, 22:34, Joel Fernandes (Google) wrote: >> >> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar >> >> wrote: >> >> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct >> >> > update_util_data *hook, u64 time, >> >> > sugov_set_iowait_boost(sg_cpu, time, flags); >> >> > sg_cpu->last_update = time; >> >> > >> >> > - if (!sugov_should_update_freq(sg_policy, time)) >> >> > + if (!sugov_should_update_freq(sg_policy, time, hook->cpu)) >> >> > return; >> >> >> >> Since with the remote callbacks now possible, isn't it unsafe to >> >> modify sg_cpu and sg_policy structures without a lock in >> >> sugov_update_single? >> >> >> >> Unlike sugov_update_shared, we don't acquire any lock in >> >> sugov_update_single before updating these structures. Did I miss >> >> something? >> > >> > As Peter already mentioned it earlier, the callbacks are called with >> > rq locks held and so sugov_update_single() wouldn't get called in >> > parallel for a target CPU. >> >> Ah ok, I have to catch up with that discussion since I missed the >> whole thing. Now that you will have me on CC, that shouldn't happen, >> thanks and sorry about the noise. >> >> > That's the only race you were worried about ? >> >> Yes. So then in that case, makes sense to move raw_spin_lock in >> sugov_update_shared further down? (Just discussing, this point is >> independent of your patch), Something like: > > Even that was discussed tomorrow with Peter :) > > No it wouldn't work because sg_cpu->util we are updating here may be > getting read from some other cpu that shares policy with sg_cpu. > Ok. yes you are right :) thank you Viresh and Peter for the clarification. thanks, -Joel
Re: [PATCH net-next 3/3] tap: XDP support
On 2017年07月28日 11:13, Jakub Kicinski wrote: On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote: This patch tries to implement XDP for tun. The implementation was split into two parts: - fast path: small and no gso packet. We try to do XDP at page level before build_skb(). For XDP_TX, since creating/destroying queues were completely under control of userspace, it was implemented through generic XDP helper after skb has been built. This could be optimized in the future. - slow path: big or gso packet. We try to do it after skb was created through generic XDP helpers. XDP_REDIRECT was not implemented, it could be done on top. xdp1 test shows 47.6% improvement: Before: ~2.1Mpps After: ~3.1Mpps Suggested-by: Michael S. Tsirkin Signed-off-by: Jason Wang @@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats) stats->tx_dropped = tx_dropped; } +static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog, + struct netlink_ext_ack *extack) +{ + struct tun_struct *tun = netdev_priv(dev); + struct bpf_prog *old_prog; + + /* We will shift the packet that can't be handled to generic +* XDP layer. +*/ + + old_prog = rtnl_dereference(tun->xdp_prog); + if (old_prog) + bpf_prog_put(old_prog); + rcu_assign_pointer(tun->xdp_prog, prog); Is this OK? Could this lead to the program getting freed and then datapath accessing a stale pointer? I mean in the scenario where the process gets pre-empted between the bpf_prog_put() and rcu_assign_pointer()? Will call bpf_prog_put() after rcu_assign_pointer(). + if (prog) { + prog = bpf_prog_add(prog, 1); + if (IS_ERR(prog)) + return PTR_ERR(prog); + } I don't think you need this extra reference here. dev_change_xdp_fd() will call bpf_prog_get_type() which means driver gets the program with a reference already taken, drivers does have to free that reference when program is removed (or device is freed, as you correctly do). I see, will drop this in next version. Thanks. + return 0; +} +
Re: [PATCH v2] net: inet: diag: expose sockets cgroup classid
On Thu, 27 Jul 2017 18:11:32 +, Levin, Alexander (Sasha Levin) wrote: > This is useful for directly looking up a task based on class id rather than > having to scan through all open file descriptors. > > Signed-off-by: Sasha Levin > --- > > Changes in V2: > - Addressed comments from Cong Wang (use nla_put_u32()) > > include/uapi/linux/inet_diag.h | 1 + > net/ipv4/inet_diag.c | 10 ++ > 2 files changed, 11 insertions(+) > > diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h > index bbe201047df6..678496897a68 100644 > --- a/include/uapi/linux/inet_diag.h > +++ b/include/uapi/linux/inet_diag.h > @@ -142,6 +142,7 @@ enum { > INET_DIAG_PAD, > INET_DIAG_MARK, > INET_DIAG_BBRINFO, > + INET_DIAG_CLASS_ID, > __INET_DIAG_MAX, > }; > > diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c > index 3828b3a805cd..2c2445d4bb58 100644 > --- a/net/ipv4/inet_diag.c > +++ b/net/ipv4/inet_diag.c > @@ -274,6 +274,16 @@ int inet_sk_diag_fill(struct sock *sk, struct > inet_connection_sock *icsk, > goto errout; > } > > + if (ext & (1 << (INET_DIAG_CLASS_ID - 1))) { > + u32 classid = 0; > + > +#ifdef CONFIG_SOCK_CGROUP_DATA > + classid = sock_cgroup_classid(&sk->sk_cgrp_data); > +#endif > + > + nla_put_u32(skb, INET_DIAG_CLASS_ID, classid); You need to check the return value from nla_put_u32() and goto errout if it's set. Perhaps adding __must_check to the nla_put_*() helpers would be a good idea. > + } > + > out: > nlmsg_end(skb, nlh); > return 0;
Re: [PATCH tip/core/rcu 07/15] rcu: Add event tracing to ->gp_tasks update at GP start
On Thu, Jul 27, 2017 at 09:38:10PM -0400, Steven Rostedt wrote: > On Mon, 24 Jul 2017 14:44:36 -0700 > "Paul E. McKenney" wrote: > > > There is currently event tracing to track when a task is preempted > > within a preemptible RCU read-side critical section, and also when that > > task subsequently reaches its outermost rcu_read_unlock(), but none > > indicating when a new grace period starts when that grace period must > > wait on pre-existing readers that have been preempted at least once > > since the beginning of their current RCU read-side critical sections. > > > > This commit therefore adds an event trace at grace-period start in > > the case where there are such readers. Note that only the first > > reader in the list is traced. > > > > Signed-off-by: Paul E. McKenney > > --- > > kernel/rcu/tree_plugin.h | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h > > index 14ba496a13cd..3e3f92e981a1 100644 > > --- a/kernel/rcu/tree_plugin.h > > +++ b/kernel/rcu/tree_plugin.h > > @@ -636,10 +636,17 @@ static int rcu_print_task_exp_stall(struct rcu_node > > *rnp) > > */ > > static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp) > > { > > + struct task_struct *t; > > + > > RCU_LOCKDEP_WARN(preemptible(), "rcu_preempt_check_blocked_tasks() > > invoked with preemption enabled!!!\n"); > > WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp(rnp)); > > - if (rcu_preempt_has_tasks(rnp)) > > + if (rcu_preempt_has_tasks(rnp)) { > > The only function of this if block is to fill the content of the > trace event, correct? > > What about doing: > > if (trace_rcu_unlock_preempted_task_enabled() && > rcu_preempt_has_tasks(rnp)) { > > instead? The trace_rcu_unlock_preempted_task_enabled() is a static > branch (aka jump_label), which would make the above a constant branch > when tracing is not enabled, and would keep this from adding any extra > overhead. 
> > -- Steve > > > rnp->gp_tasks = rnp->blkd_tasks.next; The trace_rcu_unlock_preempted_task_enabled() call is a new one on me, thank you! Unfortunately, the above assignment to rnp->gp_tasks is required even if tracing is disabled. The reason is that the newly started grace period needs to wait on all tasks that have been preempted within their current RCU read-side critical section, and rnp->gp_tasks records the point in the rnp->blkd_tasks list beyond which all preempted tasks block this new grace period. If this assignment is omitted, we get too-short grace periods, and the tasks on this list might still be holding references to stuff that gets freed at the end of this new grace period. I applied your two acks, thank you! Thanx, Paul > > + t = container_of(rnp->gp_tasks, struct task_struct, > > +rcu_node_entry); > > + trace_rcu_unlock_preempted_task(TPS("rcu_preempt-GPS"), > > + rnp->gpnum, t->pid); > > + } > > WARN_ON_ONCE(rnp->qsmask); > > } > > >
Re: [PATCH net-next 3/3] tap: XDP support
On Thu, 27 Jul 2017 17:25:33 +0800, Jason Wang wrote: > This patch tries to implement XDP for tun. The implementation was > split into two parts: > > - fast path: small and no gso packet. We try to do XDP at page level > before build_skb(). For XDP_TX, since creating/destroying queues > were completely under control of userspace, it was implemented > through generic XDP helper after skb has been built. This could be > optimized in the future. > - slow path: big or gso packet. We try to do it after skb was created > through generic XDP helpers. > > XDP_REDIRECT was not implemented, it could be done on top. > > xdp1 test shows 47.6% improvement: > > Before: ~2.1Mpps > After: ~3.1Mpps > > Suggested-by: Michael S. Tsirkin > Signed-off-by: Jason Wang > @@ -1008,6 +1016,56 @@ tun_net_get_stats64(struct net_device *dev, struct > rtnl_link_stats64 *stats) > stats->tx_dropped = tx_dropped; > } > > +static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog, > +struct netlink_ext_ack *extack) > +{ > + struct tun_struct *tun = netdev_priv(dev); > + struct bpf_prog *old_prog; > + > + /* We will shift the packet that can't be handled to generic > + * XDP layer. > + */ > + > + old_prog = rtnl_dereference(tun->xdp_prog); > + if (old_prog) > + bpf_prog_put(old_prog); > + rcu_assign_pointer(tun->xdp_prog, prog); Is this OK? Could this lead to the program getting freed and then datapath accessing a stale pointer? I mean in the scenario where the process gets pre-empted between the bpf_prog_put() and rcu_assign_pointer()? > + if (prog) { > + prog = bpf_prog_add(prog, 1); > + if (IS_ERR(prog)) > + return PTR_ERR(prog); > + } I don't think you need this extra reference here. dev_change_xdp_fd() will call bpf_prog_get_type() which means driver gets the program with a reference already taken, drivers does have to free that reference when program is removed (or device is freed, as you correctly do). > + return 0; > +} > +
Re: [PATCH 2/6] dma: bcm-sba-raid: Peek mbox when we are left with no free requests
On Thu, Jul 27, 2017 at 10:25:25AM +0530, Anup Patel wrote: > On Wed, Jul 26, 2017 at 10:40 PM, Vinod Koul wrote: > > On Wed, Jul 26, 2017 at 11:06:40AM +0530, Anup Patel wrote: > >> We should peek mbox channels when we are left with no free > >> sba_requests in sba_alloc_request() > > > > and why in the world should we do that, how does that help?? > > When setting up a RAID array on several NVMe disks we observed > that sba_alloc_request() starts failing (due to no free requests left) > and RAID array setup becomes very slow. > > Doing an mbox channel peek when we have no free requests left > improves performance of RAID array setup. How about documenting this tribal knowledge in the changelog? Changelogs are very useful; 6 months down the line, you will struggle to remember why this was changed. > > This change is inspired by mv_chan_alloc_slot() implemented > in drivers/dma/mv_xor.c > > Regards, > Anup -- ~Vinod
Re: [PATCH Y.A. RESEND] MAINTAINERS: fix alpha. ordering
On Thu, 2017-07-27 at 19:43 -0700, Linus Torvalds wrote: > On Thu, Jul 27, 2017 at 5:30 PM, Joe Perches wrote: > > > > Maybe add a reordering of the patterns so that each pattern list > > is in a specific order too > > I don't think this is wrong per se, but I'm not sure I want to get > into the merge hell any more than we are already. > > Maybe when/if that file is actually split up? Fine by me. The get_maintainer patch is a prerequisite to any split-up. There are also a bunch of little niggly patches that remove/update bad F: patterns which should go in one day. Given the differences between -next and your tree, I think only Andrew and quilt would do a decent job of getting individual patches merged. Unless you want to take them. I think it's better to centralize the MAINTAINERS location in /MAINTAINERS/ than to spread entries all over the tree, given how many subsystems and maintainerships are also spread around the tree. But the get_maintainers patch I sent allows both styles.
Re: [PATCH 1/6] dma: bcm-sba-raid: Improve memory allocation in SBA RAID driver
On Thu, Jul 27, 2017 at 09:42:33AM +0530, Anup Patel wrote: > On Wed, Jul 26, 2017 at 10:39 PM, Vinod Koul wrote: > > On Wed, Jul 26, 2017 at 11:06:39AM +0530, Anup Patel wrote: > >> drivers/dma/bcm-sba-raid.c | 439 > >> +++-- > >> 1 file changed, 226 insertions(+), 213 deletions(-) > >> > >> diff --git a/drivers/dma/bcm-sba-raid.c b/drivers/dma/bcm-sba-raid.c > >> index e41bbc7..6d15fed 100644 > >> --- a/drivers/dma/bcm-sba-raid.c > >> +++ b/drivers/dma/bcm-sba-raid.c > >> @@ -48,7 +48,8 @@ > >> > >> #include "dmaengine.h" > >> > >> -/* SBA command related defines */ > >> +/* == Driver macros and defines = */ > > > > why this noise, seems unrelated to the change! > > This is just minor beautification. Again, I will put this > in separate patch. Well you can't shove garlands under an unrelated change. By all means throw the whole garden out there, but please as a separate patch > > > > >> + > >> #define SBA_TYPE_SHIFT 48 > >> #define SBA_TYPE_MASKGENMASK(1, 0) > >> #define SBA_TYPE_A 0x0 > >> @@ -82,39 +83,41 @@ > >> #define SBA_CMD_WRITE_BUFFER 0xc > >> #define SBA_CMD_GALOIS 0xe > >> > >> -/* Driver helper macros */ > >> +#define SBA_MAX_REQ_PER_MBOX_CHANNEL 8192 > >> + > >> #define to_sba_request(tx) \ > >> container_of(tx, struct sba_request, tx) > >> #define to_sba_device(dchan) \ > >> container_of(dchan, struct sba_device, dma_chan) > >> > >> -enum sba_request_state { > >> - SBA_REQUEST_STATE_FREE = 1, > >> - SBA_REQUEST_STATE_ALLOCED = 2, > >> - SBA_REQUEST_STATE_PENDING = 3, > >> - SBA_REQUEST_STATE_ACTIVE = 4, > >> - SBA_REQUEST_STATE_RECEIVED = 5, > >> - SBA_REQUEST_STATE_COMPLETED = 6, > >> - SBA_REQUEST_STATE_ABORTED = 7, > >> +/* = Driver data structures = */ > >> + > >> +enum sba_request_flags { > >> + SBA_REQUEST_STATE_FREE = 0x001, > >> + SBA_REQUEST_STATE_ALLOCED = 0x002, > >> + SBA_REQUEST_STATE_PENDING = 0x004, > >> + SBA_REQUEST_STATE_ACTIVE= 0x008, > >> + SBA_REQUEST_STATE_RECEIVED = 0x010, > >> + SBA_REQUEST_STATE_COMPLETED = 0x020, > >> + 
SBA_REQUEST_STATE_ABORTED = 0x040, > >> + SBA_REQUEST_STATE_MASK = 0x0ff, > >> + SBA_REQUEST_FENCE = 0x100, > > > > how does this help in mem allocation?? -- ~Vinod
Re: [PATCH v7 2/3] PCI: Enable PCIe Relaxed Ordering if supported
On 2017/7/28 1:49, Alexander Duyck wrote: > On Wed, Jul 26, 2017 at 6:08 PM, Ding Tianhong > wrote: >> >> >> On 2017/7/27 2:26, Casey Leedom wrote: >>> By the way Ding, two issues: >>> >>> 1. Did we ever get any acknowledgement from either Intel or AMD >>> on this patch? I know that we can't ensure that, but it sure would >>> be nice since the PCI Quirks that we're putting in affect their >>> products. >>> >> >> Still no one from Intel or AMD has acked this, which is what I am worried about. Should I >> ping someone again? >> >> Thanks >> Ding > > > I probably wouldn't worry about it too much. If anything, all this > patch is doing is disabling relaxed ordering on the platforms we know > have issues, based on what Casey originally had. If nothing else we can > follow up once the patches are in the kernel and somebody hits an > issue. > > You can include my acked-by, but it is mostly related to how this > interacts with NICs, and not so much about the PCI chipsets > themselves. > > Acked-by: Alexander Duyck > Thanks, Alex. :)