[patch 13/14] x86/apic: Use default send single IPI wrapper

2015-11-04 Thread Thomas Gleixner
Wire up the default_send_IPI_single() wrapper to the last holdouts.

Signed-off-by: Thomas Gleixner 
---
 arch/x86/kernel/apic/apic_flat_64.c |1 +
 arch/x86/kernel/apic/probe_32.c |1 +
 2 files changed, 2 insertions(+)
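
For reference, default_send_IPI_single() (added earlier in this series) is
presumably just a thin wrapper that routes a single-CPU IPI through the
existing send_IPI_mask callback, along these lines (a sketch, not code
quoted from the series):

	void default_send_IPI_single(int cpu, int vector)
	{
		apic->send_IPI_mask(cpumask_of(cpu), vector);
	}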

Index: linux/arch/x86/kernel/apic/apic_flat_64.c
===
--- linux.orig/arch/x86/kernel/apic/apic_flat_64.c
+++ linux/arch/x86/kernel/apic/apic_flat_64.c
@@ -185,6 +185,7 @@ static struct apic apic_flat =  {
 
.cpu_mask_to_apicid_and = flat_cpu_mask_to_apicid_and,
 
+   .send_IPI   = default_send_IPI_single,
.send_IPI_mask  = flat_send_IPI_mask,
.send_IPI_mask_allbutself   = flat_send_IPI_mask_allbutself,
.send_IPI_allbutself= flat_send_IPI_allbutself,
Index: linux/arch/x86/kernel/apic/probe_32.c
===
--- linux.orig/arch/x86/kernel/apic/probe_32.c
+++ linux/arch/x86/kernel/apic/probe_32.c
@@ -105,6 +105,7 @@ static struct apic apic_default = {
 
.cpu_mask_to_apicid_and = flat_cpu_mask_to_apicid_and,
 
+   .send_IPI   = default_send_IPI_single,
.send_IPI_mask  = default_send_IPI_mask_logical,
.send_IPI_mask_allbutself   = default_send_IPI_mask_allbutself_logical,
.send_IPI_allbutself= default_send_IPI_allbutself,




Re: [PATCH] livepatch: Cleanup page permission changes

2015-11-04 Thread Jiri Kosina
On Tue, 3 Nov 2015, Josh Poimboeuf wrote:

> Subject: [PATCH] livepatch: Cleanup page permission changes
> 
> Calling set_memory_rw() and set_memory_ro() for every iteration of the
> loop in klp_write_object_relocations() is messy and inefficient.  Change
> all the RO pages to RW before the loop and convert them back to RO after
> the loop.
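
In other words, the permission flips are hoisted out of the per-relocation
loop, so the resulting shape is roughly the following (a sketch only; the
iterator name and loop body are placeholders, not the exact diff):

	set_module_ro_rw(pmod);		/* was: set_memory_rw() per iteration */
	for_each_klp_reloc(obj, reloc)	/* hypothetical iterator */
		ret = klp_write_module_reloc(pmod, reloc->type,
					     reloc->loc, val);
	set_module_ro_ro(pmod);		/* was: set_memory_ro() per iteration */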

Generally speaking, I like the patch and would like to have this in 4.4 
still (if worse comes to worst and we don't make it in time for the merge 
window, this still qualifies as an -rc bugfix).

> Suggested-by: Miroslav Benes 
> Signed-off-by: Josh Poimboeuf 
> ---
>  arch/x86/kernel/livepatch.c | 25 ++---
>  kernel/livepatch/core.c | 42 +-
>  2 files changed, 39 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/kernel/livepatch.c b/arch/x86/kernel/livepatch.c
> index d1d35cc..1062eff 100644
> --- a/arch/x86/kernel/livepatch.c
> +++ b/arch/x86/kernel/livepatch.c
> @@ -20,8 +20,6 @@
>  
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  #include 
>  
> @@ -38,8 +36,7 @@
>  int klp_write_module_reloc(struct module *mod, unsigned long type,
>  unsigned long loc, unsigned long value)
>  {
> - int ret, numpages, size = 4;
> - bool readonly;
> + int size = 4;

BTW I don't see a reason to have 'size' signed here.

[ ... snip ... ]
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -28,6 +28,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /**
>   * struct klp_ops - structure for tracking registered ftrace ops structs
> @@ -131,6 +132,33 @@ static bool klp_initialized(void)
>   return !!klp_root_kobj;
>  }
>  
> +#ifdef CONFIG_DEBUG_SET_MODULE_RONX
> +static void set_page_attributes(void *start, void *end,
> + int (*set)(unsigned long start, int num_pages))
> +{
> + unsigned long begin_pfn = PFN_DOWN((unsigned long)start);
> + unsigned long end_pfn = PFN_DOWN((unsigned long)end);
> +
> + if (end_pfn > begin_pfn)
> + set(begin_pfn << PAGE_SHIFT, end_pfn - begin_pfn);
> +}
> +static void set_module_ro_rw(struct module *mod)
> +{
> + set_page_attributes(mod->module_core,
> + mod->module_core + mod->core_ro_size,
> + set_memory_rw);
> +}
> +static void set_module_ro_ro(struct module *mod)

Honestly, I find both of the function names above horrible and not really 
self-explanatory (especially the _ro_ro variant). At least a comment 
explaining what they actually do, or a better name, would make the code 
much clearer in my eyes.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v3 1/6] Documentation: tps65086: Add DT bindings for the TPS65086 PMIC

2015-11-04 Thread Rob Herring
On Wed, Nov 04, 2015 at 11:12:10AM -0600, Andrew F. Davis wrote:
> The TPS65086 PMIC contains several regulators and a GPO controller.
> Add bindings for the TPS65086 PMIC.
> 
> Signed-off-by: Andrew F. Davis 

Acked-by: Rob Herring 

> ---
>  Documentation/devicetree/bindings/mfd/tps65086.txt | 46 
> ++
>  1 file changed, 46 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mfd/tps65086.txt
> 
> diff --git a/Documentation/devicetree/bindings/mfd/tps65086.txt 
> b/Documentation/devicetree/bindings/mfd/tps65086.txt
> new file mode 100644
> index 000..2fd5394
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mfd/tps65086.txt
> @@ -0,0 +1,46 @@
> +* TPS65086 Power Management Integrated Circuit bindings
> +
> +Required properties:
> + - compatible: Should be "ti,tps65086".
> + - reg   : I2C slave address.
> + - interrupt-parent  : Phandle to the parent interrupt controller.
> + - interrupts: The interrupt line the device is connected to.
> + - interrupt-controller  : Marks the device node as an interrupt 
> controller.
> + - #interrupt-cells  : The number of cells to describe an IRQ, this
> + should be 2. The first cell is the IRQ number,
> + the second cell is the flags, encoded as the trigger
> + masks from .
> +
> +Additional nodes defined in:
> + - Regulators: ../regulator/tps65086-regulator.txt
> + - GPIO  : ../gpio/gpio-tps65086.txt
> +
> +Example:
> +
> + pmic: tps65086@5e {
> + compatible = "ti,tps65086";
> + reg = <0x5e>;
> + interrupt-parent = <>;
> + interrupts = <28 IRQ_TYPE_LEVEL_LOW>;
> + interrupt-controller;
> + #interrupt-cells = <2>;
> +
> + regulators {
> + compatible = "ti,tps65086-regulator";
> +
> + buck1 {
> + regulator-name = "vcc1";
> + regulator-min-microvolt = <160>;
> + regulator-max-microvolt = <160>;
> + regulator-boot-on;
> + ti,regulator-decay;
> + ti,regulator-step-size-25mv;
> + };
> + };
> +
> + gpio4: gpio {
> + compatible = "ti,tps65086-gpio";
> + gpio-controller;
> + #gpio-cells = <2>;
> + };
> + };
> -- 
> 1.9.1
> 


Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

2015-11-04 Thread Johannes Weiner
On Wed, Nov 04, 2015 at 04:48:17PM -0500, Daniel Micay wrote:
> > Even if we're wrong about the aging of those MADV_FREE pages, their
> > contents are invalidated; they can be discarded freely, and restoring
> > them is a mere GFP_ZERO allocation. All other anonymous pages have to
> > be written to disk, and potentially be read back.
> > 
> > [ Arguably, MADV_FREE pages should even be reclaimed before inactive
> >   page cache. It's the same cost to discard both types of pages, but
> >   restoring page cache involves IO. ]
> 
> Keep in mind that this is memory the kernel wouldn't be getting back at
> all if the allocator wasn't going out of the way to purge it, and they
> aren't going to go out of their way to purge it if it means the kernel
> is going to steal the pages when there isn't actually memory pressure.

Well, obviously you'd still only reclaim them on memory pressure. I'm
only talking about where these pages should go on the LRU hierarchy.

> > It probably makes sense to stop thinking about them as anonymous pages
> > entirely at this point when it comes to aging. They're really not. The
> > LRU lists are split to differentiate access patterns and cost of page
> > stealing (and restoring). From that angle, MADV_FREE pages really have
> > nothing in common with in-use anonymous pages, and so they shouldn't
> > be on the same LRU list.
> > 
> > That would also fix the very unfortunate and unexpected consequence of
> > tying the lazy free optimization to the availability of swap space.
> > 
> > I would prefer to see this addressed before the code goes upstream.
> 
> I don't think it would be ideal for these potentially very hot pages to
> be dropped before very cold pages were swapped out. It's the kind of
> tuning that needs to be informed by lots of real world experience and
> lots of testing. It wouldn't impact the API.

What about them is hot? They contain garbage, you have to write to
them before you can use them. Granted, you might have to refetch
cachelines if you don't do cacheline-aligned populating writes, but
you can do a lot of them before it's more expensive than doing IO.

> Whether MADV_FREE is useful as an API vs. something like a pair of
> system calls for pinning and unpinning memory is what should be worried
> about right now. The internal implementation just needs to be correct
> and useful right now, not perfect. Simpler is probably better than it
> being more well tuned for an initial implementation too.

Yes, it wouldn't impact the API, but the dependency on swap is very
random from a user experience and severely limits the usefulness of
this. It should probably be addressed before this gets released. As
this involves getting the pages off the anon LRU, we need to figure
out where they should go instead.


[PATCH 0/3] dwc2: Speed up the interrupt handler quite a bit

2015-11-04 Thread Douglas Anderson
The dwc2 interrupt handler is quite slow.  On rk3288 with a few things
plugged into the ports and with cpufreq locked at 696MHz (to simulate a
real-world idle system), I can easily observe dwc2_handle_hcd_intr()
taking > 120 us, sometimes > 150 us.  Note that SOF interrupts come
every 125 us with high speed USB, so taking > 120 us in the interrupt
handler is a big deal.

The patches here speed up the interrupt handler significantly.
After this series, I have a hard time seeing the interrupt handler
taking > 20 us, and I never see it taking > 30 us in my tests
unless I bring the cpufreq back down.  With the cpufreq at 126 MHz I can
still see the interrupt handler take > 50 us, so I'm sure we could
improve this further.  ...but hey, it's a start.

In addition to the speedup, this series also has the advantage of
simplifying dwc2 and making it more like everyone else (introducing the
possibility of future simplifications).  Picking this series up will
help your diffstat and likely win you friends.  ;)

===

Steps for gathering data with ftrace:

cd /sys/devices/system/cpu/cpu0/cpufreq/
echo userspace > scaling_governor
echo 696000 > scaling_setspeed

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo "" > trace
echo nop > current_tracer
echo function_graph > current_tracer
echo dwc2_handle_hcd_intr > set_graph_function
echo dwc2_handle_common_intr >> set_graph_function
echo dwc2_handle_hcd_intr > set_ftrace_filter
echo dwc2_handle_common_intr >> set_ftrace_filter
echo funcgraph-abstime > trace_options
echo 70 > tracing_thresh
echo 1 > /sys/kernel/debug/tracing/tracing_on

sleep 2
cat trace

===

NOTE: This series doesn't replace any other patches I've submitted
recently, it merely adds another set of changes that upstream could
benefit from.


Douglas Anderson (3):
  usb: dwc2: rockchip: Make the max_transfer_size automatic
  usb: dwc2: host: Giveback URB in tasklet context
  usb: dwc2: host: Get aligned DMA in a more supported way

 drivers/usb/dwc2/core.c  |  21 +-
 drivers/usb/dwc2/hcd.c   | 170 ---
 drivers/usb/dwc2/hcd.h   |  10 ---
 drivers/usb/dwc2/hcd_intr.c  |  65 -
 drivers/usb/dwc2/hcd_queue.c |   7 +-
 drivers/usb/dwc2/platform.c  |   2 +-
 6 files changed, 85 insertions(+), 190 deletions(-)

-- 
2.6.0.rc2.230.g3dd15c0



[PATCH 2/3] usb: dwc2: host: Giveback URB in tasklet context

2015-11-04 Thread Douglas Anderson
In commit 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet
context") support was added to give back the URB in tasklet context.
Let's take advantage of this in dwc2.

This speeds up the dwc2 interrupt handler considerably.
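
The reason this helps: with HCD_BH set in the hc_driver flags, the USB core
defers usb_hcd_giveback_urb() to a tasklet instead of running URB completion
handlers directly in hard-IRQ context, which is also why the unlock/relock
around the giveback can be dropped below.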

Signed-off-by: Douglas Anderson 
---
 drivers/usb/dwc2/hcd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index e79baf73c234..9e7988950c7a 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -2273,9 +2273,7 @@ void dwc2_host_complete(struct dwc2_hsotg *hsotg, struct 
dwc2_qtd *qtd,
kfree(qtd->urb);
qtd->urb = NULL;
 
-   spin_unlock(&hsotg->lock);
usb_hcd_giveback_urb(dwc2_hsotg_to_hcd(hsotg), urb, status);
-   spin_lock(&hsotg->lock);
 }
 
 /*
@@ -2888,7 +2886,7 @@ static struct hc_driver dwc2_hc_driver = {
.hcd_priv_size = sizeof(struct wrapper_priv_data),
 
.irq = _dwc2_hcd_irq,
-   .flags = HCD_MEMORY | HCD_USB2,
+   .flags = HCD_MEMORY | HCD_USB2 | HCD_BH,
 
.start = _dwc2_hcd_start,
.stop = _dwc2_hcd_stop,
-- 
2.6.0.rc2.230.g3dd15c0



Re: [GIT PULL] arm64 updates for 4.4

2015-11-04 Thread Linus Torvalds
On Wed, Nov 4, 2015 at 10:25 AM, Catalin Marinas wrote:
>
> - Support for 16KB pages, with the additional bonus of a 36-bit VA
>   space, though the latter only depending on EXPERT

So I told the ppc people this many years ago, and I guess I'll tell
you guys too: 16kB pages are not actually useful, and anybody who
thinks they are has not actually done the math.

It ends up being a horrible waste of memory for things like the page
cache, to the point where all the arguments for it ("it allows us to
manage lots of memory more cheaply") are pure and utter BS, because
you effectively lose half of that memory to fragmentation in pretty
much all normal loads.
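
(Rough arithmetic to make that concrete: the tail of every cached file
wastes half a page on average, so going from 4kB to 16kB pages turns
roughly 2kB of expected waste per small cached file into roughly 8kB,
and across a page cache full of small files that alone eats a big chunk
of the supposed savings.)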

It's good for single-process loads - if you do a lot of big fortran
jobs, or a lot of big database loads, and nothing else, you're fine.
Or if you are an embedded OS and only have one particular load you
worry about.

But it is really really nasty for any general-purpose stuff, and when
your hardware people tell you that it's a great way to make your TLB's
more effective, tell them back that they are incompetent morons, and
that they should just make their TLB's better.

Because they are.

To make them understand the problem, compare it to having a 256-byte
cacheline. They might understand it then, because you're talking about
things that they almost certainly *also* wanted to do, but did the
numbers on, and realized it was bad.

And on the other hand, if they go "Hmm. 256-byte cache lines? We
should do that too", then you know they are not worth your time, and
you can quietly tell your bosses that they should up the medication in
the watercooler in the hw lab.

  Linus


[PATCH 3/3] usb: dwc2: host: Get aligned DMA in a more supported way

2015-11-04 Thread Douglas Anderson
All other host controllers who want aligned buffers for DMA do it a
certain way.  Let's do that too instead of working behind the USB core's
back.  This makes our interrupt handler not take forever and also rips
out a lot of code, simplifying things a bunch.

This also has the side effect of removing the 65535 max transfer size
limit.

NOTE: The actual code to allocate the aligned buffers is ripped almost
completely from the tegra EHCI driver.  At some point in the future we
may want to add this functionality to the USB core to share more code
everywhere.

Signed-off-by: Douglas Anderson 
---
 drivers/usb/dwc2/core.c  |  21 +-
 drivers/usb/dwc2/hcd.c   | 166 ---
 drivers/usb/dwc2/hcd.h   |  10 ---
 drivers/usb/dwc2/hcd_intr.c  |  65 -
 drivers/usb/dwc2/hcd_queue.c |   7 +-
 5 files changed, 83 insertions(+), 186 deletions(-)

diff --git a/drivers/usb/dwc2/core.c b/drivers/usb/dwc2/core.c
index ef73e498e98f..7e28cfafcfd8 100644
--- a/drivers/usb/dwc2/core.c
+++ b/drivers/usb/dwc2/core.c
@@ -1830,19 +1830,11 @@ void dwc2_hc_start_transfer(struct dwc2_hsotg *hsotg,
}
 
if (hsotg->core_params->dma_enable > 0) {
-   dma_addr_t dma_addr;
-
-   if (chan->align_buf) {
-   if (dbg_hc(chan))
-   dev_vdbg(hsotg->dev, "align_buf\n");
-   dma_addr = chan->align_buf;
-   } else {
-   dma_addr = chan->xfer_dma;
-   }
-   dwc2_writel((u32)dma_addr, hsotg->regs + HCDMA(chan->hc_num));
+   dwc2_writel((u32)chan->xfer_dma,
+   hsotg->regs + HCDMA(chan->hc_num));
if (dbg_hc(chan))
dev_vdbg(hsotg->dev, "Wrote %08lx to HCDMA(%d)\n",
-(unsigned long)dma_addr, chan->hc_num);
+(unsigned long)chan->xfer_dma, chan->hc_num);
}
 
/* Start the split */
@@ -3137,13 +3129,6 @@ int dwc2_get_hwparams(struct dwc2_hsotg *hsotg)
width = (hwcfg3 & GHWCFG3_XFER_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_XFER_SIZE_CNTR_WIDTH_SHIFT;
hw->max_transfer_size = (1 << (width + 11)) - 1;
-   /*
-* Clip max_transfer_size to 65535. dwc2_hc_setup_align_buf() allocates
-* coherent buffers with this size, and if it's too large we can
-* exhaust the coherent DMA pool.
-*/
-   if (hw->max_transfer_size > 65535)
-   hw->max_transfer_size = 65535;
width = (hwcfg3 & GHWCFG3_PACKET_SIZE_CNTR_WIDTH_MASK) >>
GHWCFG3_PACKET_SIZE_CNTR_WIDTH_SHIFT;
hw->max_packet_count = (1 << (width + 4)) - 1;
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 9e7988950c7a..4487f1b262b2 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -598,9 +598,9 @@ static void dwc2_hc_init_split(struct dwc2_hsotg *hsotg,
chan->hub_port = (u8)hub_port;
 }
 
-static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
-  struct dwc2_host_chan *chan,
-  struct dwc2_qtd *qtd, void *bufptr)
+static void dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
+ struct dwc2_host_chan *chan,
+ struct dwc2_qtd *qtd)
 {
struct dwc2_hcd_urb *urb = qtd->urb;
struct dwc2_hcd_iso_packet_desc *frame_desc;
@@ -620,7 +620,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
else
chan->xfer_buf = urb->setup_packet;
chan->xfer_len = 8;
-   bufptr = NULL;
break;
 
case DWC2_CONTROL_DATA:
@@ -647,7 +646,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
chan->xfer_dma = hsotg->status_buf_dma;
else
chan->xfer_buf = hsotg->status_buf;
-   bufptr = NULL;
break;
}
break;
@@ -680,14 +678,6 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
 
chan->xfer_len = frame_desc->length - qtd->isoc_split_offset;
 
-   /* For non-dword aligned buffers */
-   if (hsotg->core_params->dma_enable > 0 &&
-   (chan->xfer_dma & 0x3))
-   bufptr = (u8 *)urb->buf + frame_desc->offset +
-   qtd->isoc_split_offset;
-   else
-   bufptr = NULL;
-
if (chan->xact_pos == DWC2_HCSPLT_XACTPOS_ALL) {
if (chan->xfer_len <= 188)
chan->xact_pos = DWC2_HCSPLT_XACTPOS_ALL;
@@ -696,63 +686,89 @@ static void *dwc2_hc_init_xfer(struct dwc2_hsotg *hsotg,
}
  

[PATCH 1/3] usb: dwc2: rockchip: Make the max_transfer_size automatic

2015-11-04 Thread Douglas Anderson
Previously we needed to set the max_transfer_size to explicitly be 65535
because the old driver would detect that our hardware could support much
bigger transfers and then would try to do them.  This wouldn't work
since the DMA alignment code couldn't support it.

Later in commit e8f8c14d9da7 ("usb: dwc2: clip max_transfer_size to
65535") upstream added support for clipping this automatically.  Since
that commit it has been OK to just use "-1" (default), but nobody
bothered to change it.

Let's change it to default now for two reasons:
- It's nice to use autodetected params.
- If we can remove the 65535 limit, we can transfer more!

Signed-off-by: Douglas Anderson 
---
 drivers/usb/dwc2/platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc2/platform.c b/drivers/usb/dwc2/platform.c
index 5859b0fa19ee..f26e0c31c07e 100644
--- a/drivers/usb/dwc2/platform.c
+++ b/drivers/usb/dwc2/platform.c
@@ -95,7 +95,7 @@ static const struct dwc2_core_params params_rk3066 = {
.host_rx_fifo_size  = 520,  /* 520 DWORDs */
.host_nperio_tx_fifo_size   = 128,  /* 128 DWORDs */
.host_perio_tx_fifo_size= 256,  /* 256 DWORDs */
-   .max_transfer_size  = 65535,
+   .max_transfer_size  = -1,
.max_packet_count   = -1,
.host_channels  = -1,
.phy_type   = -1,
-- 
2.6.0.rc2.230.g3dd15c0



RE: [PATCH v5] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-04 Thread Nelson, Shannon
> From: Andy Shevchenko [mailto:andy.shevche...@gmail.com]
> Sent: Wednesday, November 04, 2015 11:59 AM
> 
> On Wed, Nov 4, 2015 at 9:39 PM, Sowmini Varadhan wrote:
> >
> > This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC
> > address in Open Firmware or IDPROM").

[...]

> > +   }
> > +
> > +   memset(, 0, sizeof(element));
> > +   ether_addr_copy(element.mac_addr, macaddr);
> > +   element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH);
> > +   ret = i40e_aq_add_macvlan(&vsi->back->hw, vsi->seid, &element, 1, NULL);
> > +   aq_err = vsi->back->hw.aq.asq_last_status;
> 
> Do you really need a separate variable (aq_err)?

These are two separate error values that we're tracking - one from the 
communication between the driver and the firmware (aq_err) and one from the 
driver activity.  Sometimes there may be an AQ error that we want to report, 
but it might not actually be a driver error.  Alternatively, there are times 
when the AQ error needs to get interpreted different ways depending on which 
task the driver is performing.  Lastly, the AQ error gives us more detail on 
whatever the transaction error may have been which gives us more useful debug 
info.

sln


Re: [PATCH v3 2/6] Documentation: tps65086: Add DT bindings for the TPS65086 regulators

2015-11-04 Thread Rob Herring
On Wed, Nov 04, 2015 at 11:12:11AM -0600, Andrew F. Davis wrote:
> The TPS65086 PMIC contains several regulators and a GPO controller.
> Add bindings for the TPS65086 regulators.
> 
> Signed-off-by: Andrew F. Davis 

Acked-by: Rob Herring 

> ---
>  .../bindings/regulator/tps65086-regulator.txt  | 35 
> ++
>  1 file changed, 35 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/regulator/tps65086-regulator.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/regulator/tps65086-regulator.txt 
> b/Documentation/devicetree/bindings/regulator/tps65086-regulator.txt
> new file mode 100644
> index 000..de7d2d6
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/regulator/tps65086-regulator.txt
> @@ -0,0 +1,35 @@
> +* TPS65086 regulator bindings
> +
> +Required properties:
> + - compatible: Should be "ti,tps65086-regulator".
> + - list of regulators provided by this controller, must be named after their
> + hardware counterparts: buck[1-6], ldoa[1-3], swa1, swb[1-2], and vtt.
> +
> +Optional properties:
> + - Per-regulator optional properties are defined in regulator.txt.
> + - ti,regulator-step-size-25mv   : This is applicable for buck[1,2,6], 
> set this
> +   if the regulator is factory set with a 25mv
> +   step voltage mapping.
> + - ti,regulator-decay: This is applicable for buck[1-6], set 
> this if
> +   the output needs to decay, default is for the
> +   output to slew down.
> +
> +Example:
> +
> + regulators {
> + compatible = "ti,tps65086-regulator";
> +
> + buck1 {
> + regulator-name = "vcc1";
> + regulator-min-microvolt = <160>;
> + regulator-max-microvolt = <160>;
> + regulator-boot-on;
> + ti,regulator-step-size-25mv;
> + ti,regulator-decay;
> + };
> +
> + swa1 {
> + regulator-name = "ls1";
> + regulator-boot-on;
> + };
> + };
> -- 
> 1.9.1
> 


Re: [dm-devel] [PATCH 19/32] block: add helper to get data dir from op

2015-11-04 Thread Bart Van Assche

On 11/04/2015 02:08 PM, mchri...@redhat.com wrote:

From: Mike Christie 

In later patches the op will no longer be a bitmap, so we will
not have REQ_WRITE set for all non-reads like discard, flush,
and write same. Drivers will still want to treat them as writes
for accounting reasons, so this patch adds a helper to translate
an op to a data direction.

Signed-off-by: Mike Christie 
---
  include/linux/blkdev.h | 12 
  1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 19c2e94..cf5f518 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -586,6 +586,18 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)

  #define list_entry_rq(ptr)list_entry((ptr), struct request, queuelist)

+/*
+ * Non REQ_OP_WRITE requests like discard, write same, etc, are
+ * considered WRITEs.
+ */
+static inline int op_to_data_dir(int op)
+{
+   if (op == REQ_OP_READ)
+   return READ;
+   else
+   return WRITE;
+}
+
  #define rq_data_dir(rq)   ((int)((rq)->cmd_flags & 1))

  /*



How about introducing two functions, op_is_write() and op_is_read()? I 
think that approach would result in shorter and easier-to-read code in 
the contexts where these functions are used.
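
Such helpers might look roughly like this (a sketch of the suggestion, not
code from this series; it mirrors the "everything that is not a read is a
write" rule of the helper above):

	static inline bool op_is_write(int op)
	{
		return op != REQ_OP_READ;
	}

	static inline bool op_is_read(int op)
	{
		return op == REQ_OP_READ;
	}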


Bart.


Re: [PATCH] bpf: add mod default A and X test cases

2015-11-04 Thread Xi Wang
On Wed, Nov 4, 2015 at 11:36 AM, Yang Shi  wrote:
> When running "mod X" operation, if X is 0 the filter has to be halt.
> Add new test cases to cover A = A mod X if X is 0, and A = A mod 1.
>
> CC: Xi Wang 
> CC: Zi Shen Lim 
> Signed-off-by: Yang Shi 

Acked-by: Xi Wang 


Re: [PATCH net] net: dsa: mv88e6xxx: isolate unbridged ports

2015-11-04 Thread Florian Fainelli
On 04/11/15 14:23, Vivien Didelot wrote:
> The DSA documentation specifies that each port must be capable of
> forwarding frames to the CPU port. The last changes on bridging support
> for the mv88e6xxx driver broke this requirement for non-bridged ports.
> 
> So as for the bridged ports, reserve a few VLANs (4000+) in the switch
> to isolate ports that have not been bridged yet.
> 
> By default, a port will be isolated with the CPU and DSA ports. When the
> port joins a bridge, it will leave its reserved VLAN. When it is removed
> from a bridge, it will join its reserved VLAN again.

This looks fine but the logic is a little hard to understand at first
glance.

> 
> Fixes: 5fe7f68016ff ("net: dsa: mv88e6xxx: fix hardware bridging")
> Reported-by: Andrew Lunn 
> Signed-off-by: Vivien Didelot 
> ---
>  drivers/net/dsa/mv88e6171.c |  2 ++
>  drivers/net/dsa/mv88e6352.c |  2 ++
>  drivers/net/dsa/mv88e6xxx.c | 42 ++
>  drivers/net/dsa/mv88e6xxx.h |  2 ++
>  4 files changed, 48 insertions(+)
> 
> diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
> index 54aa000..6e18213 100644
> --- a/drivers/net/dsa/mv88e6171.c
> +++ b/drivers/net/dsa/mv88e6171.c
> @@ -103,6 +103,8 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
>  #endif
>   .get_regs_len   = mv88e6xxx_get_regs_len,
>   .get_regs   = mv88e6xxx_get_regs,
> + .port_join_bridge   = mv88e6xxx_port_bridge_join,
> + .port_leave_bridge  = mv88e6xxx_port_bridge_leave,
>   .port_stp_update= mv88e6xxx_port_stp_update,
>   .port_pvid_get  = mv88e6xxx_port_pvid_get,
>   .port_vlan_prepare  = mv88e6xxx_port_vlan_prepare,
> diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
> index ff846d0..cc6c545 100644
> --- a/drivers/net/dsa/mv88e6352.c
> +++ b/drivers/net/dsa/mv88e6352.c
> @@ -323,6 +323,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
>   .set_eeprom = mv88e6352_set_eeprom,
>   .get_regs_len   = mv88e6xxx_get_regs_len,
>   .get_regs   = mv88e6xxx_get_regs,
> + .port_join_bridge   = mv88e6xxx_port_bridge_join,
> + .port_leave_bridge  = mv88e6xxx_port_bridge_leave,
>   .port_stp_update= mv88e6xxx_port_stp_update,
>   .port_pvid_get  = mv88e6xxx_port_pvid_get,
>   .port_vlan_prepare  = mv88e6xxx_port_vlan_prepare,
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index 04cff58..b06dba0 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -1462,6 +1462,10 @@ int mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, 
> int port,
>   const struct switchdev_obj_port_vlan *vlan,
>   struct switchdev_trans *trans)
>  {
> + /* We reserve a few VLANs to isolate unbridged ports */
> + if (vlan->vid_end >= 4000)
> + return -EOPNOTSUPP;

Since this constant is repeated 3 times, you might want to create a
local define for it and size it based on the number of ports present in
the switch rather than leaving 95 numbers?
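
As a sketch of that suggestion (the define name and the comment are
hypothetical, not taken from the patch):

	/* VIDs from ISOLATION_VID_BASE up to ISOLATION_VID_BASE +
	 * DSA_MAX_SWITCHES * DSA_MAX_PORTS are reserved for isolating
	 * unbridged ports.
	 */
	#define ISOLATION_VID_BASE	4000

	const u16 pvid = ISOLATION_VID_BASE + ds->index * DSA_MAX_PORTS + port;

	if (vlan->vid_end >= ISOLATION_VID_BASE)
		return -EOPNOTSUPP;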

> +
>   /* We don't need any dynamic resource from the kernel (yet),
>* so skip the prepare phase.
>*/
> @@ -1870,6 +1874,36 @@ unlock:
>   return err;
>  }
>  
> +int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port, u32 members)
> +{
> + struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> + const u16 pvid = 4000 + ds->index * DSA_MAX_PORTS + port;
> + int err;
> +
> + /* The port joined a bridge, so leave its reserved VLAN */
> + mutex_lock(&ps->smi_mutex);
> + err = _mv88e6xxx_port_vlan_del(ds, port, pvid);
> + if (!err)
> + err = _mv88e6xxx_port_pvid_set(ds, port, 0);

Does that mean that the following happens:

- bridge is created and port joins it
- port is configured to be in pvid 0 while joining
- port is then configured again by the bridge layer to be in whatever
pvid the user has decided

The other question is, does that break isolation between multiple
bridges on the same switch? Should we use the bridge ifindex here
somehow as a pvid indication?

> + mutex_unlock(&ps->smi_mutex);
> + return err;
> +}
> +
> +int mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port, u32 members)
> +{
> + struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> + const u16 pvid = 4000 + ds->index * DSA_MAX_PORTS + port;
> + int err;
> +
> + /* The port left the bridge, so join its reserved VLAN */
> + mutex_lock(&ps->smi_mutex);
> + err = _mv88e6xxx_port_vlan_add(ds, port, pvid, true);
> + if (!err)
> + err = _mv88e6xxx_port_pvid_set(ds, port, pvid);
> + mutex_unlock(&ps->smi_mutex);
> + return err;
> +}
> +
>  static void mv88e6xxx_bridge_work(struct work_struct *work)
>  {
>   struct mv88e6xxx_priv_state *ps;
> @@ -2140,6 +2174,14 @@ int mv88e6xxx_setup_ports(struct 

[PATCH] USB: serial: cp210x: Add tx_empty()

2015-11-04 Thread Konstantin Shkolnyy
Without this function, when the port is closed the data in the chip's
transmit FIFO are lost. If the actual byte count is reported the close
can be delayed until all data are sent.

Signed-off-by: Konstantin Shkolnyy 
---
 drivers/usb/serial/cp210x.c | 60 +
 1 file changed, 60 insertions(+)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index e91b654..756e432 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -38,6 +38,7 @@ static void cp210x_change_speed(struct tty_struct *, struct 
usb_serial_port *,
struct ktermios *);
 static void cp210x_set_termios(struct tty_struct *, struct usb_serial_port *,
struct ktermios*);
+static bool cp210x_tx_empty(struct usb_serial_port *port);
 static int cp210x_tiocmget(struct tty_struct *);
 static int cp210x_tiocmset(struct tty_struct *, unsigned int, unsigned int);
 static int cp210x_tiocmset_port(struct usb_serial_port *port,
@@ -215,6 +216,7 @@ static struct usb_serial_driver cp210x_device = {
.close  = cp210x_close,
.break_ctl  = cp210x_break_ctl,
.set_termios= cp210x_set_termios,
+   .tx_empty   = cp210x_tx_empty,
.tiocmget   = cp210x_tiocmget,
.tiocmset   = cp210x_tiocmset,
.port_probe = cp210x_port_probe,
@@ -301,6 +303,18 @@ static struct usb_serial_driver * const serial_drivers[] = 
{
 #define CONTROL_WRITE_DTR  0x0100
 #define CONTROL_WRITE_RTS  0x0200
 
+/* CP210X_GET_COMM_STATUS returns these 0x13 bytes */
+#define CP210X_COMM_STATUS_SIZE 0x13
+struct cp210x_comm_status {
+   __le32   ulErrors;
+   __le32   ulHoldReasons;
+   __le32   ulAmountInInQueue;
+   __le32   ulAmountInOutQueue;
+   u8   bEofReceived;
+   u8   bWaitForImmediate;
+   u8   bReserved;
+};
+
 /*
  * CP210X_PURGE - 16 bits passed in wValue of USB request.
  * SiLabs app note AN571 gives a strange description of the 4 bits:
@@ -551,6 +565,52 @@ static void cp210x_close(struct usb_serial_port *port)
 }
 
 /*
+ * Read how many bytes are waiting in the TX queue.
+ */
+static int cp210x_get_tx_queue_byte_count(struct usb_serial_port *port,
+   u32 *count)
+{
+   struct usb_serial *serial = port->serial;
+   struct cp210x_port_private *port_priv = usb_get_serial_port_data(port);
+   struct cp210x_comm_status *sts;
+   int result;
+
+   sts = kmalloc(CP210X_COMM_STATUS_SIZE, GFP_KERNEL);
+   if (!sts)
+   return -ENOMEM;
+
+   result = usb_control_msg(serial->dev, usb_rcvctrlpipe(serial->dev, 0),
+   CP210X_GET_COMM_STATUS, REQTYPE_INTERFACE_TO_HOST, 0x,
+   port_priv->bInterfaceNumber, sts, CP210X_COMM_STATUS_SIZE,
+   USB_CTRL_GET_TIMEOUT);
+   if (result == CP210X_COMM_STATUS_SIZE) {
+   *count = le32_to_cpu(sts->ulAmountInOutQueue);
+   result = 0;
+   } else {
+   dev_dbg(&port->dev, "%s error: size=%d result=%d\n",
+   __func__, CP210X_COMM_STATUS_SIZE, result);
+   if (result >= 0)
+   result = -EPROTO;
+   }
+
+   kfree(sts);
+
+   return result;
+}
+
+static bool cp210x_tx_empty(struct usb_serial_port *port)
+{
+   int err;
+   u32 count;
+
+   err = cp210x_get_tx_queue_byte_count(port, &count);
+   if (!err && count)
+   return false;
+
+   return true;
+}
+
+/*
  * cp210x_get_termios
  * Reads the baud rate, data bits, parity, stop bits and flow control mode
  * from the device, corrects any unsupported values, and configures the
-- 
1.8.4.5



Re: [PATCH v2 1/2] mm: mmap: Add new /proc tunable for mmap_base ASLR.

2015-11-04 Thread Kees Cook
On Wed, Nov 4, 2015 at 2:10 PM, Eric W. Biederman  wrote:
> Daniel Cashman  writes:
>
>> On 11/3/15 5:31 PM, Andrew Morton wrote:
>>> On Tue, 03 Nov 2015 18:40:31 -0600 ebied...@xmission.com (Eric W. 
>>> Biederman) wrote:
>>>
 Andrew Morton  writes:

> On Tue,  3 Nov 2015 10:10:03 -0800 Daniel Cashman  
> wrote:
>
>> ASLR currently only uses 8 bits to generate the random offset for the
>> mmap base address on 32 bit architectures. This value was chosen to
>> prevent a poorly chosen value from dividing the address space in such
>> a way as to prevent large allocations. This may not be an issue on all
>> platforms. Allow the specification of a minimum number of bits so that
>> platforms desiring greater ASLR protection may determine where to place
>> the trade-off.
>
> Can we please include a very good description of the motivation for this
> change?  What is inadequate about the current code, what value does the
> enhancement have to our users, what real-world problems are being solved,
> etc.
>
> Because all we have at present is "greater ASLR protection", which doesn't
> really tell anyone anything.

 The description seemed clear to me.

 More random bits, more entropy, more work needed to brute force.

 8 bits only requires 256 tries (or a 1 in 256) chance to brute force
 something.
>>>
>>> Of course, but that's not really very useful.
>>>
 We have seen in the last couple of months on Android how only having 8 bits
 doesn't help much.
>>>
>>> Now THAT is important.  What happened here and how well does the
>>> proposed fix improve things?  How much longer will a brute-force attack
>>> take to succeed, with a particular set of kernel parameters?  Is the
>>> new duration considered to be sufficiently long and if not, are there
>>> alternative fixes we should be looking at?
>>>
>>> Stuff like this.
>>>
 Each additional bit doubles the protection (and unfortunately also
 increases fragmentation of the userspace address space).
>>>
>>> OK, so the benefit comes with a cost and people who are configuring
>>> systems (and the people who are reviewing this patchset!) need to
>>> understand the tradeoffs.  Please.
>>
>> The direct motivation here was in response to the libstagefright
>> vulnerabilities that affected Android, specifically to information
>> provided by Google's project zero at:
>>
>> http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html
>>
>> The attack there specifically used the limited randomness used in
>> generating the mmap base address as part of a brute-force-based exploit.
>>  In this particular case, the attack was against the mediaserver process
>> on Android, which was limited to respawning every 5 seconds, giving the
>> attacker an average expected success rate of defeating the mmap ASLR
>> after over 10 minutes (128 tries at 5 seconds each).  With change to the
>> maximum proposed value of 16 bits, this would change to over 45 hours
>> (32768 tries), which would make the user of such a system much more
>> likely to notice such an attack.
>>
>> I understand the desire for this clarification, and will happily try to
>> improve the explanation for this change, especially so that those
>> considering use of this option understand the tradeoffs, but I also view
>> this as one particular hardening change which is a component of making
>> attacks such as these harder, rather than the only solution.  As for the
>> clarification itself, where would you like it?  I could include a cover
>> letter for this patch-set, elaborate more in the commit message itself,
>> add more to the Kconfig help description, or some combination of the above.
>
> Unless I am mistaken this there is no cross over between different
> processes of this randomization.  Would it make sense to have this as
> an rlimit so that if you have processes on the system that are affected
> by the tradeoff differently this setting can be changed per process?

I think that could be a good future bit of work, but I'd want to get
this in for all architectures first, so we have a more common base to
work from before introducing a new rlimit.

-Kees

-- 
Kees Cook
Chrome OS Security


[PATCH v2] tracefs: fix refcount imbalance in start_creating

2015-11-04 Thread Daniel Borkmann
In tracefs' start_creating(), we pin the file system to safely access
its root. When we failed to create a file, we unpin the file system via
failed_creating() to release the mount count and eventually the reference
of the singleton vfsmount.

However, when we run into an error from lookup_one_len() while still
in start_creating(), we only release the parent's mutex but not the
reference on the mount.

F.e., in securityfs_create_file(), when lookup_one_len() fails after
simple_pin_fs() has been done, we in fact do call simple_release_fs().
The same seems necessary here as well.
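
The securityfs pattern being referred to is roughly the following
(paraphrased from memory, not quoted verbatim):

	error = simple_pin_fs(&fs_type, &mount, &mount_count);
	if (error)
		return ERR_PTR(error);
	...
	dentry = lookup_one_len(name, parent, strlen(name));
	if (IS_ERR(dentry)) {
		mutex_unlock(&parent->d_inode->i_mutex);
		simple_release_fs(&mount, &mount_count);
	}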

The same issue was seen in debugfs due to 190afd81e4a5 ("debugfs: split the
beginning and the end of __create_file() off"), and it appears to have been
carried over into tracefs, too. Noticed during code review.

Fixes: 4282d60689d4 ("tracefs: Add new tracefs file system")
Signed-off-by: Daniel Borkmann 
Acked-by: Steven Rostedt 
---
 v1 -> v2:
 - Split into two patches (tracefs and debugfs, will send debugfs one 
separately)
 - Kept Steven's Acked-by

 fs/tracefs/inode.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index cbc8d5d..c66f242 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -340,8 +340,12 @@ static struct dentry *start_creating(const char *name, 
struct dentry *parent)
dput(dentry);
dentry = ERR_PTR(-EEXIST);
}
-   if (IS_ERR(dentry))
+
+   if (IS_ERR(dentry)) {
mutex_unlock(&parent->d_inode->i_mutex);
+   simple_release_fs(_mount, _mount_count);
+   }
+
return dentry;
 }
 
-- 
1.9.3



[PATCH 02/32] block/fs/mm: prepare submit_bio_wait users for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares submit_bio_wait callers for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers now pass them in separately.

Temp issue: when the fs.h read/write types, like WRITE_SYNC or
WRITE_FUA, are used, we still pass the operation in along with the
flags in the flags argument. When all the code has been converted,
that will be cleaned up. It is left in here for compat and git
bisect use, and to try to make the patches smaller.
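
For example (visible in the blk-flush.c hunk below), blkdev_issue_flush()
now does:

	ret = submit_bio_wait(REQ_OP_WRITE, WRITE_FLUSH, bio);

where WRITE_FLUSH still carries the write bit until the final cleanup
patches remove that overlap.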

Signed-off-by: Mike Christie 
---
 block/bio.c|  8 
 block/blk-flush.c  |  2 +-
 drivers/md/bcache/debug.c  |  4 ++--
 drivers/md/md.c|  2 +-
 drivers/md/raid1.c |  2 +-
 drivers/md/raid10.c|  2 +-
 fs/btrfs/check-integrity.c |  8 
 fs/btrfs/check-integrity.h |  2 +-
 fs/btrfs/extent_io.c   |  2 +-
 fs/btrfs/scrub.c   |  6 +++---
 fs/ext4/crypto.c   |  2 +-
 fs/f2fs/segment.c  |  4 ++--
 fs/hfsplus/hfsplus_fs.h|  2 +-
 fs/hfsplus/part_tbl.c  |  5 +++--
 fs/hfsplus/super.c |  6 --
 fs/hfsplus/wrapper.c   | 14 --
 fs/logfs/dev_bdev.c|  2 +-
 include/linux/bio.h|  2 +-
 kernel/power/swap.c| 30 ++
 19 files changed, 58 insertions(+), 47 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ad3f276..610c704 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -859,21 +859,21 @@ static void submit_bio_wait_endio(struct bio *bio)
 
 /**
  * submit_bio_wait - submit a bio, and wait until it completes
- * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
+ * @op: REQ_OP_*
+ * @flags: rq_flag_bits
  * @bio: The  bio which describes the I/O
  *
  * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
  * bio_endio() on failure.
  */
-int submit_bio_wait(int rw, struct bio *bio)
+int submit_bio_wait(int op, int flags, struct bio *bio)
 {
struct submit_bio_ret ret;
 
-   rw |= REQ_SYNC;
init_completion(&ret.event);
bio->bi_private = &ret;
bio->bi_end_io = submit_bio_wait_endio;
-   submit_bio(rw, bio);
+   submit_bio(op | flags | REQ_SYNC, bio);
wait_for_completion(&ret.event);
 
return ret.error;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 9c423e5..f707ba1 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -485,7 +485,7 @@ int blkdev_issue_flush(struct block_device *bdev, gfp_t 
gfp_mask,
bio = bio_alloc(gfp_mask, 0);
bio->bi_bdev = bdev;
 
-   ret = submit_bio_wait(WRITE_FLUSH, bio);
+   ret = submit_bio_wait(REQ_OP_WRITE, WRITE_FLUSH, bio);
 
/*
 * The driver must store the error location in ->bi_sector, if
diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
index 8b1f1d5..001f5f1 100644
--- a/drivers/md/bcache/debug.c
+++ b/drivers/md/bcache/debug.c
@@ -54,7 +54,7 @@ void bch_btree_verify(struct btree *b)
bio->bi_iter.bi_size= KEY_SIZE(>key) << 9;
bch_bio_map(bio, sorted);
 
-   submit_bio_wait(REQ_META|READ_SYNC, bio);
+   submit_bio_wait(REQ_OP_READ, READ_SYNC, bio);
bch_bbio_free(bio, b->c);
 
memcpy(ondisk, sorted, KEY_SIZE(>key) << 9);
@@ -117,7 +117,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
if (bio_alloc_pages(check, GFP_NOIO))
goto out_put;
 
-   submit_bio_wait(READ_SYNC, check);
+   submit_bio_wait(REQ_OP_READ, READ_SYNC, check);
 
bio_for_each_segment(bv, bio, iter) {
void *p1 = kmap_atomic(bv.bv_page);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c702de1..1ca5959 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -771,7 +771,7 @@ int sync_page_io(struct md_rdev *rdev, sector_t sector, int 
size,
else
bio->bi_iter.bi_sector = sector + rdev->data_offset;
bio_add_page(bio, page, size, 0);
-   submit_bio_wait(rw, bio);
+   submit_bio_wait(rw, 0, bio);
 
ret = !bio->bi_error;
bio_put(bio);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d9d031e..527fdf5 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2195,7 +2195,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
bio_trim(wbio, sector - r1_bio->sector, sectors);
wbio->bi_iter.bi_sector += rdev->data_offset;
wbio->bi_bdev = rdev->bdev;
-   if (submit_bio_wait(WRITE, wbio) < 0)
+   if (submit_bio_wait(REQ_OP_WRITE, 0, wbio) < 0)
/* failure! */
ok = rdev_set_badblocks(rdev, sector,
sectors, 0)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 96f3659..69352a6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2470,7 +2470,7 @@ static int narrow_write_error(struct r10bio 

[PATCH 07/32] dm: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares dm's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will
modify the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 drivers/md/dm-bufio.c   |  6 +++--
 drivers/md/dm-io.c  | 56 ++---
 drivers/md/dm-kcopyd.c  |  3 ++-
 drivers/md/dm-log.c |  5 ++--
 drivers/md/dm-raid1.c   | 11 +---
 drivers/md/dm-snap-persistent.c | 24 ++
 drivers/md/dm-thin.c|  6 ++---
 include/linux/dm-io.h   |  3 ++-
 8 files changed, 64 insertions(+), 50 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 83cc52e..9d5ef0c 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -554,7 +554,8 @@ static void use_dmio(struct dm_buffer *b, int rw, sector_t 
block,
 {
int r;
struct dm_io_request io_req = {
-   .bi_rw = rw,
+   .bi_op = rw,
+   .bi_op_flags = 0,
.notify.fn = dmio_complete,
.notify.context = b,
.client = b->c->dm_io,
@@ -1302,7 +1303,8 @@ EXPORT_SYMBOL_GPL(dm_bufio_write_dirty_buffers);
 int dm_bufio_issue_flush(struct dm_bufio_client *c)
 {
struct dm_io_request io_req = {
-   .bi_rw = WRITE_FLUSH,
+   .bi_op = REQ_OP_WRITE,
+   .bi_op_flags = WRITE_FLUSH,
.mem.type = DM_IO_KMEM,
.mem.ptr.addr = NULL,
.client = c->dm_io,
diff --git a/drivers/md/dm-io.c b/drivers/md/dm-io.c
index 6f8e83b2..6479096 100644
--- a/drivers/md/dm-io.c
+++ b/drivers/md/dm-io.c
@@ -279,8 +279,9 @@ static void km_dp_init(struct dpages *dp, void *data)
 /*-
  * IO routines that accept a list of pages.
  *---*/
-static void do_region(int rw, unsigned region, struct dm_io_region *where,
- struct dpages *dp, struct io *io)
+static void do_region(int op, int op_flags, unsigned region,
+ struct dm_io_region *where, struct dpages *dp,
+ struct io *io)
 {
struct bio *bio;
struct page *page;
@@ -296,24 +297,25 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
/*
 * Reject unsupported discard and write same requests.
 */
-   if (rw & REQ_DISCARD)
+   if (op == REQ_DISCARD)
special_cmd_max_sectors = q->limits.max_discard_sectors;
-   else if (rw & REQ_WRITE_SAME)
+   else if (op == REQ_WRITE_SAME)
special_cmd_max_sectors = q->limits.max_write_same_sectors;
-   if ((rw & (REQ_DISCARD | REQ_WRITE_SAME)) && special_cmd_max_sectors == 
0) {
+   if ((op == REQ_DISCARD || op == REQ_WRITE_SAME) &&
+   special_cmd_max_sectors == 0) {
dec_count(io, region, -EOPNOTSUPP);
return;
}
 
/*
-* where->count may be zero if rw holds a flush and we need to
+* where->count may be zero if op holds a flush and we need to
 * send a zero-sized flush.
 */
do {
/*
 * Allocate a suitably sized-bio.
 */
-   if ((rw & REQ_DISCARD) || (rw & REQ_WRITE_SAME))
+   if ((op == REQ_DISCARD) || (op == REQ_WRITE_SAME))
num_bvecs = 1;
else
num_bvecs = min_t(int, BIO_MAX_PAGES,
@@ -325,11 +327,11 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
bio->bi_end_io = endio;
store_io_and_region_in_bio(bio, io, region);
 
-   if (rw & REQ_DISCARD) {
+   if (op == REQ_DISCARD) {
num_sectors = min_t(sector_t, special_cmd_max_sectors, 
remaining);
bio->bi_iter.bi_size = num_sectors << SECTOR_SHIFT;
remaining -= num_sectors;
-   } else if (rw & REQ_WRITE_SAME) {
+   } else if (op == REQ_WRITE_SAME) {
/*
 * WRITE SAME only uses a single page.
 */
@@ -356,11 +358,11 @@ static void do_region(int rw, unsigned region, struct 
dm_io_region *where,
}
 
atomic_inc(&io->count);
-   submit_bio(rw, bio);
+   submit_bio(op | op_flags, bio);
} while (remaining);
 }
 
-static void dispatch_io(int rw, unsigned int num_regions,
+static 

[PATCH 01/32] block/fs: add REQ_OP definitions.

2015-11-04 Thread mchristi
From: Mike Christie 

This patch adds definitions for request/bio operations which
will be used in the next patches.

In the initial patches the REQ_OPs match the REQ ones for compat reasons,
while all the code is being converted over the course of this set. In the
last patches that compat mapping will be removed.

Signed-off-by: Mike Christie 
---
 include/linux/blk_types.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index e813013..d7b6009 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -244,4 +244,11 @@ enum rq_flag_bits {
 #define REQ_MQ_INFLIGHT(1ULL << __REQ_MQ_INFLIGHT)
 #define REQ_NO_TIMEOUT (1ULL << __REQ_NO_TIMEOUT)
 
+enum req_op {
+   REQ_OP_READ,
+   REQ_OP_WRITE= REQ_WRITE,
+   REQ_OP_DISCARD  = REQ_DISCARD,
+   REQ_OP_WRITE_SAME   = REQ_WRITE_SAME,
+};
+
 #endif /* __LINUX_BLK_TYPES_H */
-- 
1.8.3.1



[PATCH 06/32] xen blkback: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares xen blkback submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will
modify the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 drivers/block/xen-blkback/blkback.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 6a685ae..bfffab3 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -488,7 +488,7 @@ static int xen_vbd_translate(struct phys_req *req, struct 
xen_blkif *blkif,
struct xen_vbd *vbd = >vbd;
int rc = -EACCES;
 
-   if ((operation != READ) && vbd->readonly)
+   if ((operation != REQ_OP_READ) && vbd->readonly)
goto out;
 
if (likely(req->nr_sects)) {
@@ -990,7 +990,7 @@ static int dispatch_discard_io(struct xen_blkif *blkif,
preq.sector_number = req->u.discard.sector_number;
preq.nr_sects  = req->u.discard.nr_sectors;
 
-   err = xen_vbd_translate(, blkif, WRITE);
+   err = xen_vbd_translate(, blkif, REQ_OP_WRITE);
if (err) {
pr_warn("access denied: DISCARD [%llu->%llu] on dev=%04x\n",
preq.sector_number,
@@ -1203,6 +1203,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
struct bio **biolist = pending_req->biolist;
int i, nbio = 0;
int operation;
+   int operation_flags = 0;
struct blk_plug plug;
bool drain = false;
struct grant_page **pages = pending_req->segments;
@@ -1220,17 +1221,19 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
switch (req_operation) {
case BLKIF_OP_READ:
blkif->st_rd_req++;
-   operation = READ;
+   operation = REQ_OP_READ;
break;
case BLKIF_OP_WRITE:
blkif->st_wr_req++;
-   operation = WRITE_ODIRECT;
+   operation = REQ_OP_WRITE;
+   operation_flags = WRITE_ODIRECT;
break;
case BLKIF_OP_WRITE_BARRIER:
drain = true;
case BLKIF_OP_FLUSH_DISKCACHE:
blkif->st_f_req++;
-   operation = WRITE_FLUSH;
+   operation = REQ_OP_WRITE;
+   operation_flags = WRITE_FLUSH;
break;
default:
operation = 0; /* make gcc happy */
@@ -1242,7 +1245,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
nseg = req->operation == BLKIF_OP_INDIRECT ?
   req->u.indirect.nr_segments : req->u.rw.nr_segments;
 
-   if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
+   if (unlikely(nseg == 0 && operation_flags != WRITE_FLUSH) ||
unlikely((req->operation != BLKIF_OP_INDIRECT) &&
 (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
unlikely((req->operation == BLKIF_OP_INDIRECT) &&
@@ -1349,7 +1352,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
 
/* This will be hit if the operation was a flush or discard. */
if (!bio) {
-   BUG_ON(operation != WRITE_FLUSH);
+   BUG_ON(operation_flags != WRITE_FLUSH);
 
bio = bio_alloc(GFP_KERNEL, 0);
if (unlikely(bio == NULL))
@@ -1365,14 +1368,14 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
blk_start_plug(&plug);
 
for (i = 0; i < nbio; i++)
-   submit_bio(operation, biolist[i]);
+   submit_bio(operation | operation_flags, biolist[i]);
 
/* Let the I/Os go.. */
blk_finish_plug(&plug);
 
-   if (operation == READ)
+   if (operation == REQ_OP_READ)
blkif->st_rd_sect += preq.nr_sects;
-   else if (operation & WRITE)
+   else if (operation == REQ_OP_WRITE)
blkif->st_wr_sect += preq.nr_sects;
 
return 0;
-- 
1.8.3.1



[PATCH 05/32] drbd: prepare drbd for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares drbd's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will
modify the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 drivers/block/drbd/drbd_actlog.c | 30 --
 drivers/block/drbd/drbd_bitmap.c |  4 ++--
 drivers/block/drbd/drbd_int.h|  2 +-
 drivers/block/drbd/drbd_main.c   |  5 +++--
 4 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index b3868e7..c290e8b 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -137,19 +137,19 @@ void wait_until_done_or_force_detached(struct drbd_device 
*device, struct drbd_b
 
 static int _drbd_md_sync_page_io(struct drbd_device *device,
 struct drbd_backing_dev *bdev,
-sector_t sector, int rw)
+sector_t sector, int op)
 {
struct bio *bio;
/* we do all our meta data IO in aligned 4k blocks. */
const int size = 4096;
-   int err;
+   int err, op_flags = 0;
 
device->md_io.done = 0;
device->md_io.error = -ENODEV;
 
-   if ((rw & WRITE) && !test_bit(MD_NO_FUA, &device->flags))
-   rw |= REQ_FUA | REQ_FLUSH;
-   rw |= REQ_SYNC | REQ_NOIDLE;
+   if ((op == REQ_OP_WRITE) && !test_bit(MD_NO_FUA, &device->flags))
+   op_flags |= REQ_FUA | REQ_FLUSH;
+   op_flags |= REQ_SYNC | REQ_NOIDLE;
 
bio = bio_alloc_drbd(GFP_NOIO);
bio->bi_bdev = bdev->md_bdev;
@@ -159,9 +159,9 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
goto out;
bio->bi_private = device;
bio->bi_end_io = drbd_md_endio;
-   bio->bi_rw = rw;
+   bio->bi_rw = op | op_flags;
 
-   if (!(rw & WRITE) && device->state.disk == D_DISKLESS && device->ldev 
== NULL)
+   if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && 
device->ldev == NULL)
/* special case, drbd_md_read() during drbd_adm_attach(): no 
get_ldev */
;
else if (!get_ldev_if_state(device, D_ATTACHING)) {
@@ -174,10 +174,10 @@ static int _drbd_md_sync_page_io(struct drbd_device 
*device,
bio_get(bio); /* one bio_put() is in the completion handler */
atomic_inc(>md_io.in_use); /* drbd_md_put_buffer() is in the 
completion handler */
device->md_io.submit_jif = jiffies;
-   if (drbd_insert_fault(device, (rw & WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD))
+   if (drbd_insert_fault(device, (op == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD))
bio_io_error(bio);
else
-   submit_bio(rw, bio);
+   submit_bio(op | op_flags, bio);
wait_until_done_or_force_detached(device, bdev, >md_io.done);
if (!bio->bi_error)
err = device->md_io.error;
@@ -188,7 +188,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
 }
 
 int drbd_md_sync_page_io(struct drbd_device *device, struct drbd_backing_dev 
*bdev,
-sector_t sector, int rw)
+sector_t sector, int op)
 {
int err;
D_ASSERT(device, atomic_read(>md_io.in_use) == 1);
@@ -197,19 +197,21 @@ int drbd_md_sync_page_io(struct drbd_device *device, 
struct drbd_backing_dev *bd
 
dynamic_drbd_dbg(device, "meta_data io: %s [%d]:%s(,%llus,%s) %pS\n",
 current->comm, current->pid, __func__,
-(unsigned long long)sector, (rw & WRITE) ? "WRITE" : "READ",
+(unsigned long long)sector, (op == REQ_OP_WRITE) ? "WRITE" : 
"READ",
 (void*)_RET_IP_ );
 
if (sector < drbd_md_first_sector(bdev) ||
sector + 7 > drbd_md_last_sector(bdev))
drbd_alert(device, "%s [%d]:%s(,%llus,%s) out of range md 
access!\n",
 current->comm, current->pid, __func__,
-(unsigned long long)sector, (rw & WRITE) ? "WRITE" : 
"READ");
+(unsigned long long)sector,
+(op == REQ_OP_WRITE) ? "WRITE" : "READ");
 
-   err = _drbd_md_sync_page_io(device, bdev, sector, rw);
+   err = _drbd_md_sync_page_io(device, bdev, sector, op);
if (err) {
drbd_err(device, "drbd_md_sync_page_io(,%llus,%s) failed with 
error %d\n",
-   (unsigned long long)sector, (rw & WRITE) ? "WRITE" : 
"READ", err);
+   (unsigned long long)sector,
+   (op == REQ_OP_WRITE) ? "WRITE" : "READ", err);
}
return err;
 }

[PATCH 09/32] btrfs: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares btrfs's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will
modify the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 fs/btrfs/check-integrity.c |   6 +--
 fs/btrfs/check-integrity.h |   2 +-
 fs/btrfs/compression.c |   8 ++--
 fs/btrfs/ctree.h   |   3 +-
 fs/btrfs/disk-io.c |  48 +++--
 fs/btrfs/disk-io.h |   2 +-
 fs/btrfs/extent-tree.c |   2 +-
 fs/btrfs/extent_io.c   | 103 -
 fs/btrfs/extent_io.h   |   7 +--
 fs/btrfs/inode.c   |  65 ++--
 fs/btrfs/scrub.c   |   4 +-
 fs/btrfs/volumes.c |  88 --
 fs/btrfs/volumes.h |   4 +-
 13 files changed, 177 insertions(+), 165 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index fd50b2f..515a92e 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -3058,10 +3058,10 @@ leave:
mutex_unlock(_mutex);
 }
 
-void btrfsic_submit_bio(int rw, struct bio *bio)
+void btrfsic_submit_bio(int op, int op_flags, struct bio *bio)
 {
-   __btrfsic_submit_bio(rw, bio);
-   submit_bio(rw, bio);
+   __btrfsic_submit_bio(op | op_flags, bio);
+   submit_bio(op | op_flags, bio);
 }
 
 int btrfsic_submit_bio_wait(int op, int op_flags, struct bio *bio)
diff --git a/fs/btrfs/check-integrity.h b/fs/btrfs/check-integrity.h
index 13b0d54..a8edc424 100644
--- a/fs/btrfs/check-integrity.h
+++ b/fs/btrfs/check-integrity.h
@@ -21,7 +21,7 @@
 
 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 int btrfsic_submit_bh(int rw, struct buffer_head *bh);
-void btrfsic_submit_bio(int rw, struct bio *bio);
+void btrfsic_submit_bio(int op, int op_flags, struct bio *bio);
 int btrfsic_submit_bio_wait(int op, int op_flags, struct bio *bio);
 #else
 #define btrfsic_submit_bh submit_bh
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 57ee8ca..a7b245d 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -401,7 +401,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, WRITE, bio, 0, 1);
+   ret = btrfs_map_bio(root, REQ_OP_WRITE, 0, bio, 0, 1);
BUG_ON(ret); /* -ENOMEM */
 
bio_put(bio);
@@ -431,7 +431,7 @@ int btrfs_submit_compressed_write(struct inode *inode, u64 
start,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, WRITE, bio, 0, 1);
+   ret = btrfs_map_bio(root, REQ_OP_WRITE, 0, bio, 0, 1);
BUG_ON(ret); /* -ENOMEM */
 
bio_put(bio);
@@ -692,7 +692,7 @@ int btrfs_submit_compressed_read(struct inode *inode, 
struct bio *bio,
sums += DIV_ROUND_UP(comp_bio->bi_iter.bi_size,
 root->sectorsize);
 
-   ret = btrfs_map_bio(root, READ, comp_bio,
+   ret = btrfs_map_bio(root, REQ_OP_READ, 0, comp_bio,
mirror_num, 0);
if (ret) {
bio->bi_error = ret;
@@ -722,7 +722,7 @@ int btrfs_submit_compressed_read(struct inode *inode, 
struct bio *bio,
BUG_ON(ret); /* -ENOMEM */
}
 
-   ret = btrfs_map_bio(root, READ, comp_bio, mirror_num, 0);
+   ret = btrfs_map_bio(root, REQ_OP_READ, 0, comp_bio, mirror_num, 0);
if (ret) {
bio->bi_error = ret;
bio_endio(comp_bio);
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 938efe3..e4489dd1 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3910,8 +3910,7 @@ int btrfs_create_subvol_root(struct btrfs_trans_handle 
*trans,
 struct btrfs_root *parent_root,
 u64 new_dirid);
 int btrfs_merge_bio_hook(int rw, struct page *page, unsigned long offset,
-size_t size, struct bio *bio,
-unsigned long bio_flags);
+size_t size, struct bio *bio, unsigned long bio_flags);
 int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf);
 int btrfs_readpage(struct file *file, struct page *page);
 void btrfs_evict_inode(struct inode *inode);
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 1e60d00..6c17d5d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -118,7 +118,8 @@ struct 

[PATCH 03/32] dio/btrfs: prep dio->submit_bio users for bi_rw split.

2015-11-04 Thread mchristi
From: Mike Christie 

Instead of passing around a bitmap of ops and flags, the
next patches separate it into an op field and a flags field.
This patch prepares the dio code and dio->submit_bio users
for the split.

Note that the next patches will fix up the submit_bio() call
along with other users of that function.

Signed-off-by: Mike Christie 
---
 fs/btrfs/inode.c   |  9 -
 fs/direct-io.c | 34 +-
 include/linux/fs.h |  4 ++--
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 611b66d..0ad8bab 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8196,14 +8196,13 @@ out_err:
return 0;
 }
 
-static void btrfs_submit_direct(int rw, struct bio *dio_bio,
+static void btrfs_submit_direct(int op, int op_flags, struct bio *dio_bio,
struct inode *inode, loff_t file_offset)
 {
struct btrfs_dio_private *dip = NULL;
struct bio *io_bio = NULL;
struct btrfs_io_bio *btrfs_bio;
int skip_sum;
-   int write = rw & REQ_WRITE;
int ret = 0;
 
skip_sum = BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM;
@@ -8232,14 +8231,14 @@ static void btrfs_submit_direct(int rw, struct bio 
*dio_bio,
btrfs_bio = btrfs_io_bio(io_bio);
btrfs_bio->logical = file_offset;
 
-   if (write) {
+   if (op == REQ_OP_WRITE) {
io_bio->bi_end_io = btrfs_endio_direct_write;
} else {
io_bio->bi_end_io = btrfs_endio_direct_read;
dip->subio_endio = btrfs_subio_endio_read;
}
 
-   ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
+   ret = btrfs_submit_direct_hook(op | op_flags, dip, skip_sum);
if (!ret)
return;
 
@@ -8267,7 +8266,7 @@ free_ordered:
dip = NULL;
io_bio = NULL;
} else {
-   if (write) {
+   if (op == REQ_OP_WRITE) {
struct btrfs_ordered_extent *ordered;
 
ordered = btrfs_lookup_ordered_extent(inode,
diff --git a/fs/direct-io.c b/fs/direct-io.c
index 1125629..5e1b1a0 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -108,7 +108,8 @@ struct dio_submit {
 /* dio_state communicated between submission path and end_io */
 struct dio {
int flags;  /* doesn't change */
-   int rw;
+   int op;
+   int op_flags;
struct inode *inode;
loff_t i_size;  /* i_size when submitted */
dio_iodone_t *end_io;   /* IO completion function */
@@ -160,7 +161,7 @@ static inline int dio_refill_pages(struct dio *dio, struct 
dio_submit *sdio)
ret = iov_iter_get_pages(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
>from);
 
-   if (ret < 0 && sdio->blocks_available && (dio->rw & WRITE)) {
+   if (ret < 0 && sdio->blocks_available && (dio->op == REQ_OP_WRITE)) {
struct page *page = ZERO_PAGE(0);
/*
 * A memory fault, but the filesystem has some outstanding
@@ -239,7 +240,8 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, 
ssize_t ret,
transferred = dio->result;
 
/* Check for short read case */
-   if ((dio->rw == READ) && ((offset + transferred) > dio->i_size))
+   if ((dio->op == REQ_OP_READ) &&
+   ((offset + transferred) > dio->i_size))
transferred = dio->i_size - offset;
}
 
@@ -257,7 +259,7 @@ static ssize_t dio_complete(struct dio *dio, loff_t offset, 
ssize_t ret,
inode_dio_end(dio->inode);
 
if (is_async) {
-   if (dio->rw & WRITE) {
+   if (dio->op == REQ_OP_WRITE) {
int err;
 
err = generic_write_sync(dio->iocb->ki_filp, offset,
@@ -393,14 +395,14 @@ static inline void dio_bio_submit(struct dio *dio, struct 
dio_submit *sdio)
dio->refcount++;
spin_unlock_irqrestore(>bio_lock, flags);
 
-   if (dio->is_async && dio->rw == READ)
+   if (dio->is_async && dio->op == REQ_OP_READ)
bio_set_pages_dirty(bio);
 
if (sdio->submit_io)
-   sdio->submit_io(dio->rw, bio, dio->inode,
+   sdio->submit_io(dio->op, dio->op_flags, bio, dio->inode,
   sdio->logical_offset_in_bio);
else
-   submit_bio(dio->rw, bio);
+   submit_bio(dio->op | dio->op_flags, bio);
 
sdio->bio = NULL;
sdio->boundary = 0;
@@ -464,14 +466,14 @@ static int dio_bio_complete(struct dio *dio, struct bio 
*bio)
if (bio->bi_error)
dio->io_error = -EIO;
 
-   if (dio->is_async && dio->rw == READ) {
+   if (dio->is_async && dio->op == REQ_OP_READ) {
bio_check_pages_dirty(bio); /* transfers ownership */
  

[PATCH 0/8] mm: memcontrol: account socket memory in unified hierarchy v2

2015-11-04 Thread Johannes Weiner
Hi,

this is version 2 of the patches to add socket memory accounting to
the unified hierarchy memory controller. Changes from v1 include:

- No accounting overhead unless a dedicated cgroup is created and the
  memory controller instructed to track that group's memory footprint.
  Distribution kernels enable CONFIG_MEMCG, and users (incl. systemd)
  might create cgroups only for process control or resources other
  than memory. As noted by David and Michal, these setups shouldn't
  pay any overhead for this.

- Continue to enter the socket pressure state when hitting the memory
  controller's hard limit. Vladimir noted that there is at least some
  value in telling other sockets in the cgroup to not increase their
  transmit windows when one of them is already dropping packets.

- Drop the controversial vmpressure rework. Instead of changing the
  level where pressure is noted, keep noting pressure in its origin
  and then make the pressure check hierarchical. As noted by Michal
  and Vladimir, we shouldn't risk changing user-visible behavior.

---

Socket buffer memory can make up a significant share of a workload's
memory footprint that can be directly linked to userspace activity,
and so it needs to be part of the memory controller to provide proper
resource isolation/containment.

Historically, socket buffers were accounted in a separate counter,
without any pressure equalization between anonymous memory, page
cache, and the socket buffers. When the socket buffer pool was
exhausted, buffer allocations would fail hard and cause network
performance to tank, regardless of whether there was still memory
available to the group or not. Likewise, struggling anonymous or cache
workingsets could not dip into an idle socket memory pool. Because of
this, the feature was not usable for many real life applications.

To not repeat this mistake, the new memory controller will account all
types of memory pages it is tracking on behalf of a cgroup in a single
pool. Upon pressure, the VM reclaims and shrinks and puts pressure on
whatever memory consumer in that pool is within its reach.

For socket memory, pressure feedback is provided through vmpressure
events. When the VM has trouble freeing memory, the network code is
instructed to stop growing the cgroup's transmit windows.
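
A minimal sketch of that feedback path, using the helpers added later
in this series (sk->sk_memcg and mem_cgroup_under_socket_pressure());
the policy shown here is illustrative rather than the actual TCP code:

static bool example_may_grow_window(const struct sock *sk)
{
        /* Hold the transmit window while the socket's cgroup (or any
         * of its ancestors) signals reclaim pressure.
         */
        if (sk->sk_memcg && mem_cgroup_under_socket_pressure(sk->sk_memcg))
                return false;
        return true;
}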

---

This series begins with a rework of the existing tcp memory controller
that simplifies and cleans up the code while allowing us to have only
one set of networking hooks for both memory controller versions. The
original behavior of the existing tcp controller should be preserved.

It then adds socket accounting to the v2 memory controller, including
the use of the per-cpu charge cache and async memory.high enforcement
from socket memory charges.

Lastly, vmpressure is hooked up to the socket code so that it stops
growing transmit windows when the VM has trouble reclaiming memory.

 include/linux/memcontrol.h   |  98 ---
 include/linux/page_counter.h |   6 +-
 include/net/sock.h   | 137 ++---
 include/net/tcp.h|   5 +-
 include/net/tcp_memcontrol.h |   7 --
 mm/backing-dev.c |   2 +-
 mm/hugetlb_cgroup.c  |   3 +-
 mm/memcontrol.c  | 262 +
 mm/page_counter.c|  14 +--
 mm/vmpressure.c  |  25 +++-
 mm/vmscan.c  |  31 ++---
 net/core/sock.c  |  78 +++-
 net/ipv4/sysctl_net_ipv4.c   |   1 -
 net/ipv4/tcp.c   |   3 +-
 net/ipv4/tcp_ipv4.c  |   9 +-
 net/ipv4/tcp_memcontrol.c| 147 ---
 net/ipv4/tcp_output.c|   6 +-
 net/ipv6/tcp_ipv6.c  |   3 -
 18 files changed, 328 insertions(+), 509 deletions(-)



[PATCH 08/32] target: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares lio's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will
modify the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 drivers/target/target_core_iblock.c | 32 ++--
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index 0f19e11..25f75ab 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -354,14 +354,14 @@ iblock_get_bio(struct se_cmd *cmd, sector_t lba, u32 
sg_num)
return bio;
 }
 
-static void iblock_submit_bios(struct bio_list *list, int rw)
+static void iblock_submit_bios(struct bio_list *list, int op, int op_flags)
 {
struct blk_plug plug;
struct bio *bio;
 
blk_start_plug();
while ((bio = bio_list_pop(list)))
-   submit_bio(rw, bio);
+   submit_bio(op | op_flags, bio);
blk_finish_plug();
 }
 
@@ -480,7 +480,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
sectors -= 1;
}
 
-   iblock_submit_bios(, WRITE);
+   iblock_submit_bios(, REQ_OP_WRITE, 0);
return 0;
 
 fail_put_bios:
@@ -653,7 +653,8 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
u32 sg_num = sgl_nents;
sector_t block_lba;
unsigned bio_cnt;
-   int rw = 0;
+   int op_flags = 0;
+   int op = 0;
int i;
 
if (data_direction == DMA_TO_DEVICE) {
@@ -664,17 +665,20 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
 * is not enabled, or if initiator set the Force Unit Access 
bit.
 */
if (q->flush_flags & REQ_FUA) {
-   if (cmd->se_cmd_flags & SCF_FUA)
-   rw = WRITE_FUA;
-   else if (!(q->flush_flags & REQ_FLUSH))
-   rw = WRITE_FUA;
-   else
-   rw = WRITE;
+   if (cmd->se_cmd_flags & SCF_FUA) {
+   op = REQ_OP_WRITE;
+   op_flags = WRITE_FUA;
+   } else if (!(q->flush_flags & REQ_FLUSH)) {
+   op = REQ_OP_WRITE;
+   op_flags = WRITE_FUA;
+   } else {
+   op = REQ_OP_WRITE;
+   }
} else {
-   rw = WRITE;
+   op = REQ_OP_WRITE;
}
} else {
-   rw = READ;
+   op = REQ_OP_READ;
}
 
/*
@@ -726,7 +730,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
while (bio_add_page(bio, sg_page(sg), sg->length, sg->offset)
!= sg->length) {
if (bio_cnt >= IBLOCK_MAX_BIO_PER_TASK) {
-   iblock_submit_bios(, rw);
+   iblock_submit_bios(, op, op_flags);
bio_cnt = 0;
}
 
@@ -750,7 +754,7 @@ iblock_execute_rw(struct se_cmd *cmd, struct scatterlist 
*sgl, u32 sgl_nents,
goto fail_put_bios;
}
 
-   iblock_submit_bios(, rw);
+   iblock_submit_bios(, op, op_flags);
iblock_complete_cmd(cmd);
return 0;
 
-- 
1.8.3.1



[PATCH 4/8] net: tcp_memcontrol: remove bogus hierarchy pressure propagation

2015-11-04 Thread Johannes Weiner
When a cgroup currently breaches its socket memory limit, it enters
memory pressure mode for itself and its *parents*. This throttles
transmission in unrelated groups that have nothing to do with the
breached limit.

On the contrary, breaching a limit should make that group and its
*children* enter memory pressure mode. But this happens already,
albeit lazily: if a parent limit is breached, siblings will enter
memory pressure on their own once the next packet arrives for them.

So no additional hierarchy code is needed. Remove the bogus stuff.

Signed-off-by: Johannes Weiner 
---
 include/net/sock.h | 19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 59a7196..d541bed 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1152,14 +1152,8 @@ static inline void sk_leave_memory_pressure(struct sock 
*sk)
if (*memory_pressure)
*memory_pressure = 0;
 
-   if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-   struct cg_proto *cg_proto = sk->sk_cgrp;
-   struct proto *prot = sk->sk_prot;
-
-   for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-   cg_proto->memory_pressure = 0;
-   }
-
+   if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+   sk->sk_cgrp->memory_pressure = 0;
 }
 
 static inline void sk_enter_memory_pressure(struct sock *sk)
@@ -1167,13 +1161,8 @@ static inline void sk_enter_memory_pressure(struct sock 
*sk)
if (!sk->sk_prot->enter_memory_pressure)
return;
 
-   if (mem_cgroup_sockets_enabled && sk->sk_cgrp) {
-   struct cg_proto *cg_proto = sk->sk_cgrp;
-   struct proto *prot = sk->sk_prot;
-
-   for (; cg_proto; cg_proto = parent_cg_proto(prot, cg_proto))
-   cg_proto->memory_pressure = 1;
-   }
+   if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
+   sk->sk_cgrp->memory_pressure = 1;
 
sk->sk_prot->enter_memory_pressure(sk);
 }
-- 
2.6.2



[PATCH 1/8] mm: memcontrol: export root_mem_cgroup

2015-11-04 Thread Johannes Weiner
A later patch will need this symbol in files other than memcontrol.c,
so export it now and replace mem_cgroup_root_css at the same time.

Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
---
 include/linux/memcontrol.h | 3 ++-
 mm/backing-dev.c   | 2 +-
 mm/memcontrol.c| 5 ++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 805da1f..19ff87b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -275,7 +275,8 @@ struct mem_cgroup {
struct mem_cgroup_per_node *nodeinfo[0];
/* WARNING: nodeinfo must be the last member here */
 };
-extern struct cgroup_subsys_state *mem_cgroup_root_css;
+
+extern struct mem_cgroup *root_mem_cgroup;
 
 /**
  * mem_cgroup_events - count memory events against a cgroup
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 095b23b..73ab967 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -702,7 +702,7 @@ static int cgwb_bdi_init(struct backing_dev_info *bdi)
 
ret = wb_init(>wb, bdi, 1, GFP_KERNEL);
if (!ret) {
-   bdi->wb.memcg_css = mem_cgroup_root_css;
+   bdi->wb.memcg_css = _mem_cgroup->css;
bdi->wb.blkcg_css = blkcg_root_css;
}
return ret;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c71fe40..7049e55 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -76,9 +76,9 @@
 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
 
+struct mem_cgroup *root_mem_cgroup __read_mostly;
+
 #define MEM_CGROUP_RECLAIM_RETRIES 5
-static struct mem_cgroup *root_mem_cgroup __read_mostly;
-struct cgroup_subsys_state *mem_cgroup_root_css __read_mostly;
 
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
@@ -4214,7 +4214,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
/* root ? */
if (parent_css == NULL) {
root_mem_cgroup = memcg;
-   mem_cgroup_root_css = >css;
page_counter_init(>memory, NULL);
memcg->high = PAGE_COUNTER_MAX;
memcg->soft_limit = PAGE_COUNTER_MAX;
-- 
2.6.2



[PATCH 2/8] mm: vmscan: simplify memcg vs. global shrinker invocation

2015-11-04 Thread Johannes Weiner
Letting shrink_slab() handle the root_mem_cgroup, and implicitly the
!CONFIG_MEMCG case, allows shrink_zone() to invoke the shrinkers
unconditionally from within the memcg iteration loop.

Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
---
 include/linux/memcontrol.h |  2 ++
 mm/vmscan.c| 31 ---
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 19ff87b..8929685 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -502,6 +502,8 @@ void mem_cgroup_split_huge_fixup(struct page *head);
 #else /* CONFIG_MEMCG */
 struct mem_cgroup;
 
+#define root_mem_cgroup NULL
+
 static inline void mem_cgroup_events(struct mem_cgroup *memcg,
 enum mem_cgroup_events_index idx,
 unsigned int nr)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9b52ecf..ecc2125 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -411,6 +411,10 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
struct shrinker *shrinker;
unsigned long freed = 0;
 
+   /* Global shrinker mode */
+   if (memcg == root_mem_cgroup)
+   memcg = NULL;
+
if (memcg && !memcg_kmem_is_active(memcg))
return 0;
 
@@ -2417,11 +2421,22 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
shrink_lruvec(lruvec, swappiness, sc, _pages);
zone_lru_pages += lru_pages;
 
-   if (memcg && is_classzone)
+   /*
+* Shrink the slab caches in the same proportion that
+* the eligible LRU pages were scanned.
+*/
+   if (is_classzone) {
shrink_slab(sc->gfp_mask, zone_to_nid(zone),
memcg, sc->nr_scanned - scanned,
lru_pages);
 
+   if (reclaim_state) {
+   sc->nr_reclaimed +=
+   reclaim_state->reclaimed_slab;
+   reclaim_state->reclaimed_slab = 0;
+   }
+   }
+
/*
 * Direct reclaim and kswapd have to scan all memory
 * cgroups to fulfill the overall scan target for the
@@ -2439,20 +2454,6 @@ static bool shrink_zone(struct zone *zone, struct 
scan_control *sc,
}
} while ((memcg = mem_cgroup_iter(root, memcg, )));
 
-   /*
-* Shrink the slab caches in the same proportion that
-* the eligible LRU pages were scanned.
-*/
-   if (global_reclaim(sc) && is_classzone)
-   shrink_slab(sc->gfp_mask, zone_to_nid(zone), NULL,
-   sc->nr_scanned - nr_scanned,
-   zone_lru_pages);
-
-   if (reclaim_state) {
-   sc->nr_reclaimed += reclaim_state->reclaimed_slab;
-   reclaim_state->reclaimed_slab = 0;
-   }
-
vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
   sc->nr_scanned - nr_scanned,
   sc->nr_reclaimed - nr_reclaimed);
-- 
2.6.2



[PATCH 6/8] mm: memcontrol: prepare for unified hierarchy socket accounting

2015-11-04 Thread Johannes Weiner
The unified hierarchy memory controller will account socket
memory. Move the infrastructure functions accordingly.

Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
---
 mm/memcontrol.c | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d649b56..85f212e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -287,76 +287,6 @@ static inline struct mem_cgroup 
*mem_cgroup_from_id(unsigned short id)
return mem_cgroup_from_css(css);
 }
 
-/* Writing them here to avoid exposing memcg's inner layout */
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
-
-DEFINE_STATIC_KEY_FALSE(mem_cgroup_sockets);
-
-void sock_update_memcg(struct sock *sk)
-{
-   struct mem_cgroup *memcg;
-   /*
-* Socket cloning can throw us here with sk_cgrp already
-* filled. It won't however, necessarily happen from
-* process context. So the test for root memcg given
-* the current task's memcg won't help us in this case.
-*
-* Respecting the original socket's memcg is a better
-* decision in this case.
-*/
-   if (sk->sk_memcg) {
-   BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
-   css_get(>sk_memcg->css);
-   return;
-   }
-
-   rcu_read_lock();
-   memcg = mem_cgroup_from_task(current);
-   if (css_tryget_online(>css))
-   sk->sk_memcg = memcg;
-   rcu_read_unlock();
-}
-EXPORT_SYMBOL(sock_update_memcg);
-
-void sock_release_memcg(struct sock *sk)
-{
-   if (sk->sk_memcg)
-   css_put(>sk_memcg->css);
-}
-
-/**
- * mem_cgroup_charge_skmem - charge socket memory
- * @memcg: memcg to charge
- * @nr_pages: number of pages to charge
- *
- * Charges @nr_pages to @memcg. Returns %true if the charge fit within
- * the memcg's configured limit, %false if the charge had to be forced.
- */
-bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-   struct page_counter *counter;
-
-   if (page_counter_try_charge(>skmem, nr_pages, )) {
-   memcg->skmem_breached = false;
-   return true;
-   }
-   page_counter_charge(>skmem, nr_pages);
-   memcg->skmem_breached = true;
-   return false;
-}
-
-/**
- * mem_cgroup_uncharge_skmem - uncharge socket memory
- * @memcg: memcg to uncharge
- * @nr_pages: number of pages to uncharge
- */
-void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
-{
-   page_counter_uncharge(>skmem, nr_pages);
-}
-
-#endif
-
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * This will be the memcg's index in each cache's ->memcg_params.memcg_caches.
@@ -5523,6 +5453,76 @@ void mem_cgroup_replace_page(struct page *oldpage, 
struct page *newpage)
commit_charge(newpage, memcg, true);
 }
 
+/* Writing them here to avoid exposing memcg's inner layout */
+#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+
+DEFINE_STATIC_KEY_FALSE(mem_cgroup_sockets);
+
+void sock_update_memcg(struct sock *sk)
+{
+   struct mem_cgroup *memcg;
+   /*
+* Socket cloning can throw us here with sk_cgrp already
+* filled. It won't however, necessarily happen from
+* process context. So the test for root memcg given
+* the current task's memcg won't help us in this case.
+*
+* Respecting the original socket's memcg is a better
+* decision in this case.
+*/
+   if (sk->sk_memcg) {
+   BUG_ON(mem_cgroup_is_root(sk->sk_memcg));
+   css_get(>sk_memcg->css);
+   return;
+   }
+
+   rcu_read_lock();
+   memcg = mem_cgroup_from_task(current);
+   if (css_tryget_online(>css))
+   sk->sk_memcg = memcg;
+   rcu_read_unlock();
+}
+EXPORT_SYMBOL(sock_update_memcg);
+
+void sock_release_memcg(struct sock *sk)
+{
+   if (sk->sk_memcg)
+   css_put(>sk_memcg->css);
+}
+
+/**
+ * mem_cgroup_charge_skmem - charge socket memory
+ * @memcg: memcg to charge
+ * @nr_pages: number of pages to charge
+ *
+ * Charges @nr_pages to @memcg. Returns %true if the charge fit within
+ * the memcg's configured limit, %false if the charge had to be forced.
+ */
+bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+   struct page_counter *counter;
+
+   if (page_counter_try_charge(>skmem, nr_pages, )) {
+   memcg->skmem_breached = false;
+   return true;
+   }
+   page_counter_charge(>skmem, nr_pages);
+   memcg->skmem_breached = true;
+   return false;
+}
+
+/**
+ * mem_cgroup_uncharge_skmem - uncharge socket memory
+ * @memcg: memcg to uncharge
+ * @nr_pages: number of pages to uncharge
+ */
+void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
+{
+   page_counter_uncharge(>skmem, nr_pages);
+}
+
+#endif
+
 /*
  * subsys_initcall() for memory 

Re: [PATCH 2/4] perf tools: Pass LINUX_VERSION_CODE to BPF program when compiling

2015-11-04 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 04, 2015 at 11:20:05AM +, Wang Nan escreveu:
> Arnaldo suggested making LINUX_VERSION_CODE work like __func__ and
> __FILE__ so users don't need to care too much about setting the right
> Linux version. In this patch, perf's llvm support passes the
> LINUX_VERSION_CODE macro through the clang command line.
> 
> [1] http://lkml.kernel.org/r/20151029223744.gk2...@kernel.org

Tested, updated the comment, applied and pushed to my perf/core branch,
please continue from there, I'll try to push it tomorrow.

- Arnaldo
 
> Signed-off-by: Wang Nan 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Alexei Starovoitov 
> Cc: Namhyung Kim 
> Cc: Zefan Li 
> Cc: pi3or...@163.com
> ---
>  tools/perf/util/llvm-utils.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
> index 80eecef..8ee25be 100644
> --- a/tools/perf/util/llvm-utils.c
> +++ b/tools/perf/util/llvm-utils.c
> @@ -12,6 +12,7 @@
>  
>  #define CLANG_BPF_CMD_DEFAULT_TEMPLATE   \
>   "$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS "\
> + "-DLINUX_VERSION_CODE=$LINUX_VERSION_CODE " \
>   "$CLANG_OPTIONS $KERNEL_INC_OPTIONS "   \
>   "-Wno-unused-value -Wno-pointer-sign "  \
>   "-working-directory $WORKING_DIR "  \
> @@ -324,11 +325,33 @@ get_kbuild_opts(char **kbuild_dir, char 
> **kbuild_include_opts)
>   pr_debug("include option is set to %s\n", *kbuild_include_opts);
>  }
>  
> +static unsigned long
> +fetch_kernel_version(void)
> +{
> + struct utsname utsname;
> + int version, patchlevel, sublevel, err;
> +
> + if (uname())
> + return 0;
> +
> + err = sscanf(utsname.release, "%d.%d.%d",
> +  , , );
> +
> + if (err != 3) {
> + pr_debug("Unablt to get kernel version from uname '%s'\n",
> +  utsname.release);
> + return 0;
> + }
> +
> + return (version << 16) + (patchlevel << 8) + sublevel;
> +}
> +
>  int llvm__compile_bpf(const char *path, void **p_obj_buf,
> size_t *p_obj_buf_sz)
>  {
>   int err, nr_cpus_avail;
>   char clang_path[PATH_MAX], nr_cpus_avail_str[64];
> + char linux_version_code_str[64];
>   const char *clang_opt = llvm_param.clang_opt;
>   const char *template = llvm_param.clang_bpf_cmd_template;
>   char *kbuild_dir = NULL, *kbuild_include_opts = NULL;
> @@ -365,7 +388,11 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
>   snprintf(nr_cpus_avail_str, sizeof(nr_cpus_avail_str), "%d",
>nr_cpus_avail);
>  
> + snprintf(linux_version_code_str, sizeof(linux_version_code_str),
> +  "0x%lx", fetch_kernel_version());
> +
>   force_set_env("NR_CPUS", nr_cpus_avail_str);
> + force_set_env("LINUX_VERSION_CODE", linux_version_code_str);
>   force_set_env("CLANG_EXEC", clang_path);
>   force_set_env("CLANG_OPTIONS", clang_opt);
>   force_set_env("KERNEL_INC_OPTIONS", kbuild_include_opts);
> -- 
> 1.8.3.4


[PATCH 8/8] mm: memcontrol: hook up vmpressure to socket pressure

2015-11-04 Thread Johannes Weiner
Let the networking stack know when a memcg is under reclaim pressure
so that it can clamp its transmit windows accordingly.

Whenever the reclaim efficiency of a cgroup's LRU lists drops low
enough for a MEDIUM or HIGH vmpressure event to occur, assert a
pressure state in the socket and tcp memory code that tells it to curb
consumption growth from sockets associated with said control group.

vmpressure events are naturally edge triggered, so for hysteresis
assert socket pressure for a second to allow for subsequent vmpressure
events to occur before letting the socket code return to normal.

This will likely need finetuning for a wider variety of workloads, but
for now stick to the vmpressure presets and keep hysteresis simple.
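
Condensed, the hysteresis amounts to the sketch below (illustrative;
the authoritative code is in the hunks that follow): a MEDIUM or HIGH
vmpressure event stamps a deadline roughly one second ahead, and the
socket side treats the cgroup as under pressure until that deadline
passes, walking up the hierarchy.

static void example_note_socket_pressure(struct mem_cgroup *memcg)
{
        memcg->socket_pressure = jiffies + HZ;  /* ~1s of asserted pressure */
}

static bool example_under_socket_pressure(struct mem_cgroup *memcg)
{
        do {
                if (time_before(jiffies, memcg->socket_pressure))
                        return true;
        } while ((memcg = parent_mem_cgroup(memcg)));
        return false;
}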

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h | 27 +--
 mm/memcontrol.c| 15 +--
 mm/vmpressure.c| 25 -
 3 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 7adabb7..d45379a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -247,6 +247,7 @@ struct mem_cgroup {
 
 #ifdef CONFIG_INET
struct work_struct socket_work;
+   unsigned long socket_pressure;
 #endif
 
/* List of events which userspace want to receive */
@@ -292,18 +293,34 @@ struct lruvec *mem_cgroup_page_lruvec(struct page *, 
struct zone *);
 
 bool task_in_mem_cgroup(struct task_struct *task, struct mem_cgroup *memcg);
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg);
 
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
return css ? container_of(css, struct mem_cgroup, css) : NULL;
 }
 
+#define mem_cgroup_from_counter(counter, member)   \
+   container_of(counter, struct mem_cgroup, member)
+
 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
   struct mem_cgroup *,
   struct mem_cgroup_reclaim_cookie *);
 void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
 
+/**
+ * parent_mem_cgroup - find the accounting parent of a memcg
+ * @memcg: memcg whose parent to find
+ *
+ * Returns the parent memcg, or NULL if this is the root or the memory
+ * controller is in legacy no-hierarchy mode.
+ */
+static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
+{
+   if (!memcg->memory.parent)
+   return NULL;
+   return mem_cgroup_from_counter(memcg->memory.parent, memory);
+}
+
 static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg,
  struct mem_cgroup *root)
 {
@@ -695,7 +712,13 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, 
unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int 
nr_pages);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
-   return memcg->skmem_breached;
+   if (memcg->skmem_breached)
+   return true;
+   do {
+   if (time_before(jiffies, memcg->socket_pressure))
+   return true;
+   } while ((memcg = parent_mem_cgroup(memcg)));
+   return false;
 }
 #else
 static inline bool mem_cgroup_do_sockets(void)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2994c9d..e10637f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1084,9 +1084,6 @@ bool task_in_mem_cgroup(struct task_struct *task, struct 
mem_cgroup *memcg)
return ret;
 }
 
-#define mem_cgroup_from_counter(counter, member)   \
-   container_of(counter, struct mem_cgroup, member)
-
 /**
  * mem_cgroup_margin - calculate chargeable space of a memory cgroup
  * @memcg: the memory cgroup
@@ -4126,17 +4123,6 @@ static void __mem_cgroup_free(struct mem_cgroup *memcg)
kfree(memcg);
 }
 
-/*
- * Returns the parent mem_cgroup in memcgroup hierarchy with hierarchy enabled.
- */
-struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg)
-{
-   if (!memcg->memory.parent)
-   return NULL;
-   return mem_cgroup_from_counter(memcg->memory.parent, memory);
-}
-EXPORT_SYMBOL(parent_mem_cgroup);
-
 static void socket_work_func(struct work_struct *work);
 
 static struct cgroup_subsys_state * __ref
@@ -4181,6 +4167,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
 #endif
 #ifdef CONFIG_INET
INIT_WORK(>socket_work, socket_work_func);
+   memcg->socket_pressure = jiffies;
 #endif
return >css;
 
diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 4c25e62..07e8440 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -137,14 +137,11 @@ struct vmpressure_event {
 };
 
 static bool vmpressure_event(struct vmpressure *vmpr,
-unsigned long scanned, unsigned long reclaimed)
+

[PATCH net] net: dsa: mv88e6xxx: isolate unbridged ports

2015-11-04 Thread Vivien Didelot
The DSA documentation specifies that each port must be capable of
forwarding frames to the CPU port. The latest changes to bridging
support in the mv88e6xxx driver broke this requirement for non-bridged
ports.

So, as with the bridged ports, reserve a few VLANs (4000+) in the switch
to isolate ports that have not been bridged yet.

By default, a port will be isolated with the CPU and DSA ports. When the
port joins a bridge, it will leave its reserved VLAN. When it is removed
from a bridge, it will join its reserved VLAN again.
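
As a worked example of the numbering used below (pvid = 4000 +
ds->index * DSA_MAX_PORTS + port, assuming DSA_MAX_PORTS keeps its
usual value of 12): port 2 of switch 0 is isolated in VLAN 4002, while
port 2 of switch 1 uses VLAN 4014, so every unbridged port sits in a
private VLAN shared only with the CPU and DSA ports.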

Fixes: 5fe7f68016ff ("net: dsa: mv88e6xxx: fix hardware bridging")
Reported-by: Andrew Lunn 
Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6171.c |  2 ++
 drivers/net/dsa/mv88e6352.c |  2 ++
 drivers/net/dsa/mv88e6xxx.c | 42 ++
 drivers/net/dsa/mv88e6xxx.h |  2 ++
 4 files changed, 48 insertions(+)

diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 54aa000..6e18213 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -103,6 +103,8 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
 #endif
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
+   .port_join_bridge   = mv88e6xxx_port_bridge_join,
+   .port_leave_bridge  = mv88e6xxx_port_bridge_leave,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
.port_vlan_prepare  = mv88e6xxx_port_vlan_prepare,
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index ff846d0..cc6c545 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -323,6 +323,8 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.set_eeprom = mv88e6352_set_eeprom,
.get_regs_len   = mv88e6xxx_get_regs_len,
.get_regs   = mv88e6xxx_get_regs,
+   .port_join_bridge   = mv88e6xxx_port_bridge_join,
+   .port_leave_bridge  = mv88e6xxx_port_bridge_leave,
.port_stp_update= mv88e6xxx_port_stp_update,
.port_pvid_get  = mv88e6xxx_port_pvid_get,
.port_vlan_prepare  = mv88e6xxx_port_vlan_prepare,
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 04cff58..b06dba0 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -1462,6 +1462,10 @@ int mv88e6xxx_port_vlan_prepare(struct dsa_switch *ds, 
int port,
const struct switchdev_obj_port_vlan *vlan,
struct switchdev_trans *trans)
 {
+   /* We reserve a few VLANs to isolate unbridged ports */
+   if (vlan->vid_end >= 4000)
+   return -EOPNOTSUPP;
+
/* We don't need any dynamic resource from the kernel (yet),
 * so skip the prepare phase.
 */
@@ -1870,6 +1874,36 @@ unlock:
return err;
 }
 
+int mv88e6xxx_port_bridge_join(struct dsa_switch *ds, int port, u32 members)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   const u16 pvid = 4000 + ds->index * DSA_MAX_PORTS + port;
+   int err;
+
+   /* The port joined a bridge, so leave its reserved VLAN */
+   mutex_lock(>smi_mutex);
+   err = _mv88e6xxx_port_vlan_del(ds, port, pvid);
+   if (!err)
+   err = _mv88e6xxx_port_pvid_set(ds, port, 0);
+   mutex_unlock(>smi_mutex);
+   return err;
+}
+
+int mv88e6xxx_port_bridge_leave(struct dsa_switch *ds, int port, u32 members)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   const u16 pvid = 4000 + ds->index * DSA_MAX_PORTS + port;
+   int err;
+
+   /* The port left the bridge, so join its reserved VLAN */
+   mutex_lock(>smi_mutex);
+   err = _mv88e6xxx_port_vlan_add(ds, port, pvid, true);
+   if (!err)
+   err = _mv88e6xxx_port_pvid_set(ds, port, pvid);
+   mutex_unlock(>smi_mutex);
+   return err;
+}
+
 static void mv88e6xxx_bridge_work(struct work_struct *work)
 {
struct mv88e6xxx_priv_state *ps;
@@ -2140,6 +2174,14 @@ int mv88e6xxx_setup_ports(struct dsa_switch *ds)
ret = mv88e6xxx_setup_port(ds, i);
if (ret < 0)
return ret;
+
+   if (dsa_is_cpu_port(ds, i) || dsa_is_dsa_port(ds, i))
+   continue;
+
+   /* setup the unbridged state */
+   ret = mv88e6xxx_port_bridge_leave(ds, i, 0);
+   if (ret < 0)
+   return ret;
}
return 0;
 }
diff --git a/drivers/net/dsa/mv88e6xxx.h b/drivers/net/dsa/mv88e6xxx.h
index fb9a873..21c8daa 100644
--- a/drivers/net/dsa/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx.h
@@ -468,6 +468,8 @@ int mv88e6xxx_phy_write_indirect(struct dsa_switch *ds, int 
addr, int regnum,
 int mv88e6xxx_get_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
 int 

Re: [PATCH 3/4] perf test: Enforce LLVM test: update basic BPF test program

2015-11-04 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 04, 2015 at 11:20:06AM +, Wang Nan escreveu:
> This patch replaces the original toy BPF program with the previously
> introduced bpf-script-example.c, dynamically embedding it into
> 'llvm-src-base.c'.
> 
> The newly introduced BPF script attaches a BPF program to
> 'sys_epoll_pwait()'. perf itself never uses that syscall, so further
> tests can verify their results with it. The program generates one
> sample for every two calls to the epoll_pwait() system call.
> 
> Since the resulting BPF object is useful, test_llvm__fetch_bpf_obj() is
> introduced for creating BPF objects from source. The llvm test is
> rewritten to use it.


[acme@zoo linux]$ am /wb/1.patch 
Applying: perf test: Enforce LLVM test: update basic BPF test program
/home/acme/git/linux/.git/rebase-apply/patch:149: space before tab in indent.
*p_obj_buf = NULL;
/home/acme/git/linux/.git/rebase-apply/patch:150: space before tab in indent.
*p_obj_buf_sz = 0;
error: patch failed: tools/perf/tests/llvm.c:28
error: tools/perf/tests/llvm.c: patch does not apply
Patch failed at 0001 perf test: Enforce LLVM test: update basic BPF test program
The copy of the patch that failed is found in:
   /home/acme/git/linux/.git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[acme@zoo linux]$ 



Re: [PATCH] tracefs, debugfs: fix refcount imbalance in start_creating

2015-11-04 Thread Daniel Borkmann

On 11/02/2015 08:07 PM, Steven Rostedt wrote:

On Fri,  9 Oct 2015 20:30:10 +0200
Daniel Borkmann  wrote:


[...]


Fixes: 4282d60689d4 ("tracefs: Add new tracefs file system")
Fixes: 190afd81e4a5 ("debugfs: split the beginning and the end of __create_file() 
off")
Signed-off-by: Daniel Borkmann 
Cc: Steven Rostedt 

[...]

Fyi, I'll respin and split this one into two, so they can be routed through
their individual trees. (Keeping Steven's Acked-by on the tracefs one.)

Thanks & sorry for the noise,
Daniel


[PATCH 5/8] net: tcp_memcontrol: consolidate socket buffer tracking and accounting

2015-11-04 Thread Johannes Weiner
The tcp memory controller has extensive provisions for future memory
accounting interfaces that won't materialize after all. Cut the code
base down to what's actually used, now and in the likely future.

- There won't be any different protocol counters in the future, so a
  direct sock->sk_memcg linkage is enough. This eliminates a lot of
  callback maze and boilerplate code, and restores most of the socket
  allocation code to pre-tcp_memcontrol state.

- There won't be a tcp control soft limit, so integrating the memcg
  code into the global skmem limiting scheme complicates things
  unnecessarily. Replace all that with simple and clear charge and
  uncharge calls--hidden behind a jump label--to account skb memory
  (see the sketch after this list).

  Without a soft limit, the per-memcg pressure state is questionable
  as well, but for now we still enter it when the hard limit is hit,
  and packets are dropped, to let other sockets in the cgroup know
  that they shouldn't grow their transmit windows, either. However,
  because network performance will already be in the toilet at this
  point, keep it simple: leave memory pressure lazily when the next
  packet is accepted, and delete the code that checks synchronously
  when memory is released. This should be acceptable.

- The previous jump label code was an elaborate state machine that
  tracked the number of cgroups with an active socket limit in order
  to enable the skmem tracking and accounting code only when actively
  necessary. But this is overengineered: it was meant to protect the
  people who never use this feature in the first place. Simply enable
  the branches once when the first limit is set until the next reboot.
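
A minimal sketch of the resulting charge pattern (illustrative only; the
real call sites are in net/core/sock.c and the TCP code touched below,
and mem_cgroup_do_sockets(), sk->sk_memcg and mem_cgroup_charge_skmem()
are the helpers this series introduces):

static bool example_charge_skmem(struct sock *sk, unsigned int nr_pages)
{
        /* The static branch keeps this path a no-op until a limit is set. */
        if (!mem_cgroup_do_sockets() || !sk->sk_memcg)
                return true;
        /* A false return means the charge was forced past the limit. */
        return mem_cgroup_charge_skmem(sk->sk_memcg, nr_pages);
}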

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h   |  60 ++
 include/net/sock.h   | 126 +++--
 include/net/tcp.h|   5 +-
 include/net/tcp_memcontrol.h |   7 ---
 mm/memcontrol.c  | 103 --
 net/core/sock.c  |  78 ++-
 net/ipv4/sysctl_net_ipv4.c   |   1 -
 net/ipv4/tcp.c   |   3 +-
 net/ipv4/tcp_ipv4.c  |   9 +--
 net/ipv4/tcp_memcontrol.c| 147 +++
 net/ipv4/tcp_output.c|   6 +-
 net/ipv6/tcp_ipv6.c  |   3 -
 12 files changed, 137 insertions(+), 411 deletions(-)
 delete mode 100644 include/net/tcp_memcontrol.h

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8929685..f3caf84 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -85,34 +85,6 @@ enum mem_cgroup_events_target {
MEM_CGROUP_NTARGETS,
 };
 
-/*
- * Bits in struct cg_proto.flags
- */
-enum cg_proto_flags {
-   /* Currently active and new sockets should be assigned to cgroups */
-   MEMCG_SOCK_ACTIVE,
-   /* It was ever activated; we must disarm static keys on destruction */
-   MEMCG_SOCK_ACTIVATED,
-};
-
-struct cg_proto {
-   struct page_counter memory_allocated;   /* Current allocated 
memory. */
-   struct percpu_counter   sockets_allocated;  /* Current number of 
sockets. */
-   int memory_pressure;
-   longsysctl_mem[3];
-   unsigned long   flags;
-   /*
-* memcg field is used to find which memcg we belong directly
-* Each memcg struct can hold more than one cg_proto, so container_of
-* won't really cut.
-*
-* The elegant solution would be having an inverse function to
-* proto_cgroup in struct proto, but that means polluting the structure
-* for everybody, instead of just for memcg users.
-*/
-   struct mem_cgroup   *memcg;
-};
-
 #ifdef CONFIG_MEMCG
 struct mem_cgroup_stat_cpu {
long count[MEM_CGROUP_STAT_NSTATS];
@@ -185,8 +157,16 @@ struct mem_cgroup {
 
/* Accounted resources */
struct page_counter memory;
+
+   /*
+* Legacy non-resource counters. In unified hierarchy, all
+* memory is accounted and limited through memcg->memory.
+* Consumer breakdown happens in the statistics.
+*/
struct page_counter memsw;
struct page_counter kmem;
+   struct page_counter skmem;
+   bool skmem_breached;/* (ancestral) skmem.limit breached */
 
/* Normal memory consumption range */
unsigned long low;
@@ -246,9 +226,6 @@ struct mem_cgroup {
 */
struct mem_cgroup_stat_cpu __percpu *stat;
 
-#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_INET)
-   struct cg_proto tcp_mem;
-#endif
 #if defined(CONFIG_MEMCG_KMEM)
 /* Index in the kmem_cache->memcg_params.memcg_caches array */
int kmemcg_id;
@@ -678,12 +655,6 @@ void mem_cgroup_count_vm_event(struct mm_struct *mm, enum 
vm_event_item idx)
 }
 #endif /* CONFIG_MEMCG */
 
-enum {
-   UNDER_LIMIT,
-   SOFT_LIMIT,
-   OVER_LIMIT,
-};
-
 #ifdef 

[PATCH 7/8] mm: memcontrol: account socket memory in unified hierarchy memory controller

2015-11-04 Thread Johannes Weiner
Socket memory can be a significant share of overall memory consumed by
common workloads. In order to provide reasonable resource isolation in
the unified hierarchy, this type of memory needs to be included in the
tracking/accounting of a cgroup under active memory resource control.

Overhead is only incurred when a non-root control group is created AND
the memory controller is instructed to track and account the memory
footprint of that group. cgroup.memory=nosocket can be specified on
the boot commandline to override any runtime configuration and
forcibly exclude socket memory from active memory resource control.
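
The parsing of the boot option is not part of the hunks quoted below;
one plausible wiring of the cgroup_memory_nosocket flag (shown purely
as an assumption about how a "cgroup.memory=" setup handler could look)
is:

static int __init cgroup_memory(char *s)
{
        char *token;

        while ((token = strsep(&s, ",")) != NULL) {
                if (!*token)
                        continue;
                if (!strcmp(token, "nosocket"))
                        cgroup_memory_nosocket = 1;
        }
        return 0;
}
__setup("cgroup.memory=", cgroup_memory);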

Signed-off-by: Johannes Weiner 
---
 include/linux/memcontrol.h |   8 +++-
 mm/memcontrol.c| 110 +
 2 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index f3caf84..7adabb7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -245,6 +245,10 @@ struct mem_cgroup {
struct wb_domain cgwb_domain;
 #endif
 
+#ifdef CONFIG_INET
+   struct work_struct socket_work;
+#endif
+
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
@@ -679,7 +683,7 @@ static inline void mem_cgroup_wb_stats(struct bdi_writeback 
*wb,
 #endif /* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-#if defined(CONFIG_INET) && defined(CONFIG_MEMCG_KMEM)
+#ifdef CONFIG_INET
 extern struct static_key_false mem_cgroup_sockets;
 static inline bool mem_cgroup_do_sockets(void)
 {
@@ -698,7 +702,7 @@ static inline bool mem_cgroup_do_sockets(void)
 {
return false;
 }
-#endif /* CONFIG_INET && CONFIG_MEMCG_KMEM */
+#endif /* CONFIG_INET */
 
 #ifdef CONFIG_MEMCG_KMEM
 extern struct static_key memcg_kmem_enabled_key;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 85f212e..2994c9d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -79,6 +79,9 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #define MEM_CGROUP_RECLAIM_RETRIES 5
 
+/* Socket memory accounting disabled? */
+static int cgroup_memory_nosocket;
+
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
 int do_swap_account __read_mostly;
@@ -1916,6 +1919,18 @@ static int memcg_cpu_hotplug_callback(struct 
notifier_block *nb,
return NOTIFY_OK;
 }
 
+static void reclaim_high(struct mem_cgroup *memcg,
+unsigned int nr_pages,
+gfp_t gfp_mask)
+{
+   do {
+   if (page_counter_read(>memory) <= memcg->high)
+   continue;
+   mem_cgroup_events(memcg, MEMCG_HIGH, 1);
+   try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
+   } while ((memcg = parent_mem_cgroup(memcg)));
+}
+
 /*
  * Scheduled by try_charge() to be executed from the userland return path
  * and reclaims memory over the high limit.
@@ -1923,20 +1938,13 @@ static int memcg_cpu_hotplug_callback(struct 
notifier_block *nb,
 void mem_cgroup_handle_over_high(void)
 {
unsigned int nr_pages = current->memcg_nr_pages_over_high;
-   struct mem_cgroup *memcg, *pos;
+   struct mem_cgroup *memcg;
 
if (likely(!nr_pages))
return;
 
-   pos = memcg = get_mem_cgroup_from_mm(current->mm);
-
-   do {
-   if (page_counter_read(>memory) <= pos->high)
-   continue;
-   mem_cgroup_events(pos, MEMCG_HIGH, 1);
-   try_to_free_mem_cgroup_pages(pos, nr_pages, GFP_KERNEL, true);
-   } while ((pos = parent_mem_cgroup(pos)));
-
+   memcg = get_mem_cgroup_from_mm(current->mm);
+   reclaim_high(memcg, nr_pages, GFP_KERNEL);
css_put(>css);
current->memcg_nr_pages_over_high = 0;
 }
@@ -4129,6 +4137,8 @@ struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup 
*memcg)
 }
 EXPORT_SYMBOL(parent_mem_cgroup);
 
+static void socket_work_func(struct work_struct *work);
+
 static struct cgroup_subsys_state * __ref
 mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 {
@@ -4169,6 +4179,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state 
*parent_css)
 #ifdef CONFIG_CGROUP_WRITEBACK
INIT_LIST_HEAD(>cgwb_list);
 #endif
+#ifdef CONFIG_INET
+   INIT_WORK(>socket_work, socket_work_func);
+#endif
return >css;
 
 free_out:
@@ -4228,6 +4241,9 @@ mem_cgroup_css_online(struct cgroup_subsys_state *css)
if (ret)
return ret;
 
+   if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
+   static_branch_enable(_cgroup_sockets);
+
/*
 * Make sure the memcg is initialized: mem_cgroup_iter()
 * orders reading memcg->initialized against its callers
@@ -4266,6 +4282,8 @@ static void mem_cgroup_css_free(struct 
cgroup_subsys_state *css)
 {
struct mem_cgroup *memcg = mem_cgroup_from_css(css);
 
+   cancel_work_sync(>socket_work);
+

[PATCH 14/32] block/fs/mm: pass in op and flags to submit_bio

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares submit_bio callers for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers now pass them in separately.
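
From a caller's point of view the change looks like this sketch
(illustrative; the exact conversions are in the hunks that follow, and
WRITE_FLUSH is used as a flags argument just as the earlier prep
patches in this series do):

        /* before: one combined bitmap */
        submit_bio(WRITE_FLUSH, bio);

        /* after: the operation and the flags as separate arguments */
        submit_bio(REQ_OP_WRITE, WRITE_FLUSH, bio);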

Signed-off-by: Mike Christie 
---
 block/bio.c |  2 +-
 block/blk-core.c| 13 +++--
 block/blk-lib.c | 11 ---
 drivers/block/drbd/drbd_actlog.c|  2 +-
 drivers/block/drbd/drbd_bitmap.c|  2 +-
 drivers/block/floppy.c  |  2 +-
 drivers/block/xen-blkback/blkback.c |  2 +-
 drivers/block/xen-blkfront.c|  4 ++--
 drivers/md/bcache/journal.c |  2 +-
 drivers/md/bcache/super.c   |  2 +-
 drivers/md/dm-bufio.c   |  2 +-
 drivers/md/dm-io.c  |  2 +-
 drivers/md/dm-log-writes.c  |  6 +++---
 drivers/md/dm-thin.c|  2 +-
 drivers/md/md.c |  4 ++--
 drivers/target/target_core_iblock.c |  4 ++--
 fs/btrfs/check-integrity.c  |  2 +-
 fs/btrfs/raid56.c   | 10 +-
 fs/buffer.c |  7 ---
 fs/direct-io.c  |  2 +-
 fs/ext4/page-io.c   |  6 +++---
 fs/ext4/readpage.c  |  8 
 fs/f2fs/data.c  | 10 +-
 fs/gfs2/lops.c  |  2 +-
 fs/gfs2/ops_fstype.c|  2 +-
 fs/jfs/jfs_logmgr.c |  4 ++--
 fs/jfs/jfs_metapage.c   |  8 
 fs/logfs/dev_bdev.c |  8 
 fs/mpage.c  |  2 +-
 fs/nfs/blocklayout/blocklayout.c|  2 +-
 fs/nilfs2/segbuf.c  |  2 +-
 fs/ocfs2/cluster/heartbeat.c|  4 ++--
 fs/xfs/xfs_aops.c   |  3 ++-
 fs/xfs/xfs_buf.c|  2 +-
 include/linux/fs.h  |  2 +-
 kernel/power/swap.c |  2 +-
 mm/page_io.c|  4 ++--
 37 files changed, 81 insertions(+), 73 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 610c704..ae91ccb 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -873,7 +873,7 @@ int submit_bio_wait(int op, int flags, struct bio *bio)
init_completion();
bio->bi_private = 
bio->bi_end_io = submit_bio_wait_endio;
-   submit_bio(op | flags | REQ_SYNC, bio);
+   submit_bio(op, flags | REQ_SYNC, bio);
wait_for_completion();
 
return ret.error;
diff --git a/block/blk-core.c b/block/blk-core.c
index 18e92a6..d325ece 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1976,7 +1976,8 @@ EXPORT_SYMBOL(generic_make_request);
 
 /**
  * submit_bio - submit a bio to the block device layer for I/O
- * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
+ * @op: REQ_OP_*
+ * @flags: rq_flag_bits
  * @bio: The  bio which describes the I/O
  *
  * submit_bio() is very similar in purpose to generic_make_request(), and
@@ -1984,9 +1985,9 @@ EXPORT_SYMBOL(generic_make_request);
  * interfaces; @bio must be presetup and ready for I/O.
  *
  */
-void submit_bio(int rw, struct bio *bio)
+void submit_bio(int op, int flags, struct bio *bio)
 {
-   bio->bi_rw |= rw;
+   bio->bi_rw |= op | flags;
 
/*
 * If it's a regular read/write or a barrier with data attached,
@@ -1995,12 +1996,12 @@ void submit_bio(int rw, struct bio *bio)
if (bio_has_data(bio)) {
unsigned int count;
 
-   if (unlikely(rw & REQ_WRITE_SAME))
+   if (unlikely(op == REQ_WRITE_SAME))
count = bdev_logical_block_size(bio->bi_bdev) >> 9;
else
count = bio_sectors(bio);
 
-   if (rw & WRITE) {
+   if (op == REQ_OP_WRITE) {
count_vm_events(PGPGOUT, count);
} else {
task_io_account_read(bio->bi_iter.bi_size);
@@ -2011,7 +2012,7 @@ void submit_bio(int rw, struct bio *bio)
char b[BDEVNAME_SIZE];
printk(KERN_DEBUG "%s(%d): %s block %Lu on %s (%u 
sectors)\n",
current->comm, task_pid_nr(current),
-   (rw & WRITE) ? "WRITE" : "READ",
+   bio_rw(bio) ? "WRITE" : "READ",
(unsigned long long)bio->bi_iter.bi_sector,
bdevname(bio->bi_bdev, b),
count);
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 0861c7a..49786b0 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -109,7 +109,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
sector = end_sect;
 
atomic_inc();
-   submit_bio(op | op_flags, bio);
+   submit_bio(op, op_flags, bio);
 
/*
 * We can loop for a long time in 

[PATCH 3/8] mm: page_counter: let page_counter_try_charge() return bool

2015-11-04 Thread Johannes Weiner
page_counter_try_charge() currently returns 0 on success and -ENOMEM
on failure, which is surprising behavior given the function name.

Make it follow the expected pattern of try_stuff() functions that
return a boolean true to indicate success, or false for failure.
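
A minimal sketch of the resulting caller pattern (the counter and fail
variables here are illustrative, not taken from a specific hunk):

	struct page_counter *fail;

	if (!page_counter_try_charge(counter, nr_pages, &fail))
		return -ENOMEM;	/* @fail points at the counter that hit its limit */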

Signed-off-by: Johannes Weiner 
Acked-by: Michal Hocko 
---
 include/linux/page_counter.h |  6 +++---
 mm/hugetlb_cgroup.c  |  3 ++-
 mm/memcontrol.c  | 11 +--
 mm/page_counter.c| 14 +++---
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 17fa4f8..7e62920 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -36,9 +36,9 @@ static inline unsigned long page_counter_read(struct 
page_counter *counter)
 
 void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages);
 void page_counter_charge(struct page_counter *counter, unsigned long nr_pages);
-int page_counter_try_charge(struct page_counter *counter,
-   unsigned long nr_pages,
-   struct page_counter **fail);
+bool page_counter_try_charge(struct page_counter *counter,
+unsigned long nr_pages,
+struct page_counter **fail);
 void page_counter_uncharge(struct page_counter *counter, unsigned long 
nr_pages);
 int page_counter_limit(struct page_counter *counter, unsigned long limit);
 int page_counter_memparse(const char *buf, const char *max,
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 6a44263..d8fb10d 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -186,7 +186,8 @@ again:
}
rcu_read_unlock();
 
-   ret = page_counter_try_charge(&h_cg->hugepage[idx], nr_pages, &counter);
+   if (!page_counter_try_charge(&h_cg->hugepage[idx], nr_pages, &counter))
+   ret = -ENOMEM;
	css_put(&h_cg->css);
 done:
*ptr = h_cg;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7049e55..e54f434 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2018,8 +2018,8 @@ retry:
return 0;
 
if (!do_swap_account ||
-   !page_counter_try_charge(&memcg->memsw, batch, &counter)) {
-   if (!page_counter_try_charge(&memcg->memory, batch, &counter))
+   page_counter_try_charge(&memcg->memsw, batch, &counter)) {
+   if (page_counter_try_charge(&memcg->memory, batch, &counter))
goto done_restock;
if (do_swap_account)
			page_counter_uncharge(&memcg->memsw, batch);
@@ -2383,14 +2383,13 @@ int __memcg_kmem_charge_memcg(struct page *page, gfp_t 
gfp, int order,
 {
unsigned int nr_pages = 1 << order;
struct page_counter *counter;
-   int ret = 0;
+   int ret;
 
if (!memcg_kmem_is_active(memcg))
return 0;
 
-   ret = page_counter_try_charge(&memcg->kmem, nr_pages, &counter);
-   if (ret)
-   return ret;
+   if (!page_counter_try_charge(&memcg->kmem, nr_pages, &counter))
+   return -ENOMEM;
 
ret = try_charge(memcg, gfp, nr_pages);
if (ret) {
diff --git a/mm/page_counter.c b/mm/page_counter.c
index 11b4bed..7c6a63d 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c
@@ -56,12 +56,12 @@ void page_counter_charge(struct page_counter *counter, 
unsigned long nr_pages)
  * @nr_pages: number of pages to charge
  * @fail: points first counter to hit its limit, if any
  *
- * Returns 0 on success, or -ENOMEM and @fail if the counter or one of
- * its ancestors has hit its configured limit.
+ * Returns %true on success, or %false and @fail if the counter or one
+ * of its ancestors has hit its configured limit.
  */
-int page_counter_try_charge(struct page_counter *counter,
-   unsigned long nr_pages,
-   struct page_counter **fail)
+bool page_counter_try_charge(struct page_counter *counter,
+unsigned long nr_pages,
+struct page_counter **fail)
 {
struct page_counter *c;
 
@@ -99,13 +99,13 @@ int page_counter_try_charge(struct page_counter *counter,
if (new > c->watermark)
c->watermark = new;
}
-   return 0;
+   return true;
 
 failed:
for (c = counter; c != *fail; c = c->parent)
page_counter_cancel(c, nr_pages);
 
-   return -ENOMEM;
+   return false;
 }
 
 /**
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 15/32] btrfs: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares btrfs's submit_bh use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with the operation and
flags mixed together, the callers will now pass them in separately.

This patch modifies the code related to the submit_bh calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will convert
the actual submit_bh call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 fs/btrfs/check-integrity.c | 68 ++
 fs/btrfs/check-integrity.h |  2 +-
 fs/btrfs/disk-io.c |  4 +--
 3 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index a5ff6e4..17eba2d 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -168,7 +168,8 @@ struct btrfsic_block {
bio_end_io_t *bio;
bh_end_io_t *bh;
} orig_bio_bh_end_io;
-   int submit_bio_bh_rw;
+   int submit_bio_bh_op;
+   int submit_bio_bh_op_flags;
u64 flush_gen; /* only valid if !never_written */
 };
 
@@ -338,7 +339,8 @@ static void btrfsic_process_written_block(struct 
btrfsic_dev_state *dev_state,
  unsigned int num_pages,
  struct bio *bio, int *bio_is_patched,
  struct buffer_head *bh,
- int submit_bio_bh_rw);
+ int submit_bio_bh_op,
+ int submit_bio_bh_op_flags);
 static int btrfsic_process_written_superblock(
struct btrfsic_state *state,
struct btrfsic_block *const block,
@@ -418,7 +420,8 @@ static void btrfsic_block_init(struct btrfsic_block *b)
INIT_LIST_HEAD(>all_blocks_node);
INIT_LIST_HEAD(>ref_to_list);
INIT_LIST_HEAD(>ref_from_list);
-   b->submit_bio_bh_rw = 0;
+   b->submit_bio_bh_op = 0;
+   b->submit_bio_bh_op_flags = 0;
b->flush_gen = 0;
 }
 
@@ -1820,7 +1823,8 @@ static void btrfsic_process_written_block(struct 
btrfsic_dev_state *dev_state,
  unsigned int num_pages,
  struct bio *bio, int *bio_is_patched,
  struct buffer_head *bh,
- int submit_bio_bh_rw)
+ int submit_bio_bh_op,
+ int submit_bio_bh_op_flags)
 {
int is_metadata;
struct btrfsic_block *block;
@@ -2038,7 +2042,8 @@ again:
}
 
block->flush_gen = dev_state->last_flush_gen + 1;
-   block->submit_bio_bh_rw = submit_bio_bh_rw;
+   block->submit_bio_bh_op = submit_bio_bh_op;
+   block->submit_bio_bh_op_flags = submit_bio_bh_op_flags;
if (is_metadata) {
block->logical_bytenr = bytenr;
block->is_metadata = 1;
@@ -2141,7 +2146,8 @@ again:
block->iodone_w_error = 0;
block->mirror_num = 0;  /* unknown */
block->flush_gen = dev_state->last_flush_gen + 1;
-   block->submit_bio_bh_rw = submit_bio_bh_rw;
+   block->submit_bio_bh_op = submit_bio_bh_op;
+   block->submit_bio_bh_op_flags = submit_bio_bh_op_flags;
if (NULL != bio) {
block->is_iodone = 0;
BUG_ON(NULL == bio_is_patched);
@@ -2236,7 +2242,7 @@ static void btrfsic_bio_end_io(struct bio *bp)
   block->dev_bytenr, block->mirror_num);
next_block = block->next_in_same_bio;
block->iodone_w_error = iodone_w_error;
-   if (block->submit_bio_bh_rw & REQ_FLUSH) {
+   if (block->submit_bio_bh_op_flags & REQ_FLUSH) {
dev_state->last_flush_gen++;
if ((dev_state->state->print_mask &
 BTRFSIC_PRINT_MASK_END_IO_BIO_BH))
@@ -2245,7 +2251,7 @@ static void btrfsic_bio_end_io(struct bio *bp)
   dev_state->name,
   dev_state->last_flush_gen);
}
-   if (block->submit_bio_bh_rw & REQ_FUA)
+   if (block->submit_bio_bh_op_flags & REQ_FUA)
block->flush_gen = 0; /* FUA completed means block is
   * on disk */
block->is_iodone = 1; /* for FLUSH, this releases the block */
@@ -2272,7 +2278,7 @@ static void btrfsic_bh_end_io(struct buffer_head *bh, int 
uptodate)
   block->dev_bytenr, block->mirror_num);
 

[PATCH 10/32] f2fs: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares f2fs's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with the operation and
flags mixed together, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will convert
the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 fs/f2fs/checkpoint.c|  6 --
 fs/f2fs/data.c  | 30 ++
 fs/f2fs/f2fs.h  |  5 +++--
 fs/f2fs/gc.c| 11 +++
 fs/f2fs/inline.c|  3 ++-
 fs/f2fs/node.c  | 10 ++
 fs/f2fs/segment.c   |  6 --
 fs/f2fs/trace.c |  8 +---
 include/trace/events/f2fs.h | 34 +-
 9 files changed, 70 insertions(+), 43 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index c5a38e3..ebab316 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -54,7 +54,8 @@ struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t 
index)
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
-   .rw = READ_SYNC | REQ_META | REQ_PRIO,
+   .op = REQ_OP_READ,
+   .op_flags = READ_SYNC | REQ_META | REQ_PRIO,
.blk_addr = index,
.encrypted_page = NULL,
};
@@ -133,7 +134,8 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, 
int nrpages, int type
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
-   .rw = READ_SYNC | REQ_META | REQ_PRIO,
+   .op = REQ_OP_READ,
+   .op_flags = READ_SYNC | REQ_META | REQ_PRIO,
.encrypted_page = NULL,
};
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a82abe9..fb767e4f 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -107,12 +107,12 @@ static void __submit_merged_bio(struct f2fs_bio_info *io)
if (!io->bio)
return;
 
-   if (is_read_io(fio->rw))
+   if (is_read_io(fio->op))
trace_f2fs_submit_read_bio(io->sbi->sb, fio, io->bio);
else
trace_f2fs_submit_write_bio(io->sbi->sb, fio, io->bio);
 
-   submit_bio(fio->rw, io->bio);
+   submit_bio(fio->op | fio->op_flags, io->bio);
io->bio = NULL;
 }
 
@@ -129,10 +129,12 @@ void f2fs_submit_merged_bio(struct f2fs_sb_info *sbi,
/* change META to META_FLUSH in the checkpoint procedure */
if (type >= META_FLUSH) {
io->fio.type = META_FLUSH;
+   io->fio.op = REQ_OP_WRITE;
if (test_opt(sbi, NOBARRIER))
-   io->fio.rw = WRITE_FLUSH | REQ_META | REQ_PRIO;
+   io->fio.op_flags = WRITE_FLUSH | REQ_META | REQ_PRIO;
else
-   io->fio.rw = WRITE_FLUSH_FUA | REQ_META | REQ_PRIO;
+   io->fio.op_flags = WRITE_FLUSH_FUA | REQ_META |
+   REQ_PRIO;
}
__submit_merged_bio(io);
	up_write(&io->io_rwsem);
@@ -151,14 +153,14 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio)
f2fs_trace_ios(fio, 0);
 
/* Allocate a new bio */
-   bio = __bio_alloc(fio->sbi, fio->blk_addr, 1, is_read_io(fio->rw));
+   bio = __bio_alloc(fio->sbi, fio->blk_addr, 1, is_read_io(fio->op));
 
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
bio_put(bio);
return -EFAULT;
}
 
-   submit_bio(fio->rw, bio);
+   submit_bio(fio->op | fio->op_flags, bio);
return 0;
 }
 
@@ -167,7 +169,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
struct f2fs_sb_info *sbi = fio->sbi;
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io;
-   bool is_read = is_read_io(fio->rw);
+   bool is_read = is_read_io(fio->op);
struct page *bio_page;
 
	io = is_read ? &sbi->read_io : &sbi->write_io[btype];
@@ -180,7 +182,7 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
inc_page_count(sbi, F2FS_WRITEBACK);
 
if (io->bio && (io->last_block_in_bio != fio->blk_addr - 1 ||
-   io->fio.rw != fio->rw))
+   io->fio.op != fio->op))
__submit_merged_bio(io);
 alloc_new:
if (io->bio == NULL) {
@@ -275,7 +277,8 @@ int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
return f2fs_reserve_block(dn, index);
 }
 
-struct page *get_read_data_page(struct inode *inode, pgoff_t index, int rw)
+struct page *get_read_data_page(struct inode *inode, pgoff_t index,
+ 

[PATCH 16/32] block/fs/md: pass in op and flags to submit_bh

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares submit_bh callers for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with the operation and
flags mixed together, the callers now pass them in separately.
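
A before/after sketch of typical call sites (illustrative only; the
buffer_head setup around them is assumed):

	/* before */
	submit_bh(READ, bh);
	submit_bh(WRITE | REQ_SYNC, bh);

	/* after: the operation first, then the flags */
	submit_bh(REQ_OP_READ, 0, bh);
	submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);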

Signed-off-by: Mike Christie 
---
 drivers/md/bitmap.c |  4 ++--
 fs/btrfs/check-integrity.c  |  4 ++--
 fs/buffer.c | 52 ++---
 fs/ext4/balloc.c|  2 +-
 fs/ext4/ialloc.c|  2 +-
 fs/ext4/inode.c |  2 +-
 fs/ext4/mmp.c   |  4 ++--
 fs/ext4/super.c |  2 +-
 fs/fat/misc.c   |  2 +-
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/dir.c   |  2 +-
 fs/gfs2/meta_io.c   |  8 +++
 fs/jbd2/commit.c|  7 +++---
 fs/jbd2/journal.c   |  8 +++
 fs/nilfs2/btnode.c  |  6 +++---
 fs/nilfs2/btnode.h  |  2 +-
 fs/nilfs2/btree.c   |  6 --
 fs/nilfs2/gcinode.c |  5 +++--
 fs/nilfs2/mdt.c | 11 +-
 fs/nilfs2/super.c   |  4 ++--
 fs/ntfs/aops.c  |  6 +++---
 fs/ntfs/compress.c  |  2 +-
 fs/ntfs/file.c  |  2 +-
 fs/ntfs/logfile.c   |  2 +-
 fs/ntfs/mft.c   |  4 ++--
 fs/ocfs2/buffer_head_io.c   |  8 +++
 fs/reiserfs/inode.c |  4 ++--
 fs/reiserfs/journal.c   | 12 ++-
 fs/ufs/util.c   |  2 +-
 include/linux/buffer_head.h |  9 
 30 files changed, 97 insertions(+), 89 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 48b5890..9070ee8 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -299,7 +299,7 @@ static void write_page(struct bitmap *bitmap, struct page 
*page, int wait)
atomic_inc(>pending_writes);
set_buffer_locked(bh);
set_buffer_mapped(bh);
-   submit_bh(WRITE | REQ_SYNC, bh);
+   submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
bh = bh->b_this_page;
}
 
@@ -394,7 +394,7 @@ static int read_page(struct file *file, unsigned long index,
atomic_inc(>pending_writes);
set_buffer_locked(bh);
set_buffer_mapped(bh);
-   submit_bh(READ, bh);
+   submit_bh(REQ_OP_READ, 0, bh);
}
block++;
bh = bh->b_this_page;
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 17eba2d..9cb367f0 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2908,7 +2908,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct 
buffer_head *bh)
struct btrfsic_dev_state *dev_state;
 
if (!btrfsic_is_initialized)
-   return submit_bh(op | op_flags, bh);
+   return submit_bh(op, op_flags, bh);
 
mutex_lock(_mutex);
/* since btrfsic_submit_bh() might also be called before
@@ -2964,7 +2964,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct 
buffer_head *bh)
}
}
mutex_unlock(_mutex);
-   return submit_bh(op | op_flags, bh);
+   return submit_bh(op, op_flags, bh);
 }
 
 static void __btrfsic_submit_bio(int op, int op_flags, struct bio *bio)
diff --git a/fs/buffer.c b/fs/buffer.c
index a190c25..cd07d86 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -45,7 +45,7 @@
 #include 
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
-static int submit_bh_wbc(int rw, struct buffer_head *bh,
+static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 unsigned long bio_flags,
 struct writeback_control *wbc);
 
@@ -1236,7 +1236,7 @@ static struct buffer_head *__bread_slow(struct 
buffer_head *bh)
} else {
get_bh(bh);
bh->b_end_io = end_buffer_read_sync;
-   submit_bh(READ, bh);
+   submit_bh(REQ_OP_READ, 0, bh);
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
@@ -1708,7 +1708,7 @@ static int __block_write_full_page(struct inode *inode, 
struct page *page,
struct buffer_head *bh, *head;
unsigned int blocksize, bbits;
int nr_underway = 0;
-   int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+   int write_flags = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC: 0);
 
head = create_page_buffers(page, inode,
(1 << BH_Dirty)|(1 << BH_Uptodate));
@@ -1797,7 +1797,7 @@ static int __block_write_full_page(struct inode *inode, 
struct page *page,
do {
struct buffer_head *next = bh->b_this_page;
if (buffer_async_write(bh)) {
-   submit_bh_wbc(write_op, bh, 0, wbc);
+   

[PATCH 17/32] block: add operation field to bio struct

2015-11-04 Thread mchristi
From: Mike Christie 

This patch adds a field to the bio to store the REQ_OP, and it
has the block layer code set it.

The next patches will modify the other drivers and filesystems
to also set the bi_op. We are still ORing the op into the bi_rw.
When I am done with the conversion, that will be dropped.
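
A sketch of the transitional pattern for code that builds its own bios
(names as used in the hunks below; the bio allocation is assumed):

	bio->bi_op  = REQ_OP_WRITE;	/* new field: the operation on its own */
	bio->bi_rw |= REQ_WRITE;	/* still set for compat until all users are converted */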

Signed-off-by: Mike Christie 
---
 block/bio.c   | 11 +--
 block/blk-core.c  |  1 +
 block/blk-map.c   |  4 +++-
 include/linux/blk_types.h |  8 +++-
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index ae91ccb..1cf8428 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -582,6 +582,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
bio->bi_bdev = bio_src->bi_bdev;
bio_set_flag(bio, BIO_CLONED);
bio->bi_rw = bio_src->bi_rw;
+   bio->bi_op = bio_src->bi_op;
bio->bi_iter = bio_src->bi_iter;
bio->bi_io_vec = bio_src->bi_io_vec;
 }
@@ -664,6 +665,7 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
 
bio->bi_bdev= bio_src->bi_bdev;
bio->bi_rw  = bio_src->bi_rw;
+   bio->bi_op  = bio_src->bi_op;
bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
 
@@ -1168,8 +1170,10 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
if (!bio)
goto out_bmd;
 
-   if (iter->type & WRITE)
+   if (iter->type & WRITE) {
bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
+   }
 
ret = 0;
 
@@ -1338,8 +1342,10 @@ struct bio *bio_map_user_iov(struct request_queue *q,
/*
 * set data direction, and check if mapped pages need bouncing
 */
-   if (iter->type & WRITE)
+   if (iter->type & WRITE) {
bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
+   }
 
bio_set_flag(bio, BIO_USER_MAPPED);
 
@@ -1533,6 +1539,7 @@ struct bio *bio_copy_kern(struct request_queue *q, void 
*data, unsigned int len,
} else {
bio->bi_end_io = bio_copy_kern_endio;
bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
}
 
return bio;
diff --git a/block/blk-core.c b/block/blk-core.c
index d325ece..c8672f2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1988,6 +1988,7 @@ EXPORT_SYMBOL(generic_make_request);
 void submit_bio(int op, int flags, struct bio *bio)
 {
bio->bi_rw |= op | flags;
+   bio->bi_op = op;
 
/*
 * If it's a regular read/write or a barrier with data attached,
diff --git a/block/blk-map.c b/block/blk-map.c
index f565e11..4a91dc4 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -223,8 +223,10 @@ int blk_rq_map_kern(struct request_queue *q, struct 
request *rq, void *kbuf,
if (IS_ERR(bio))
return PTR_ERR(bio);
 
-   if (!reading)
+   if (!reading) {
bio->bi_rw |= REQ_WRITE;
+   bio->bi_op = REQ_OP_WRITE;
+   }
 
if (do_copy)
rq->cmd_flags |= REQ_COPY_USER;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index d7b6009..b974aea 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -48,9 +48,15 @@ struct bio {
struct block_device *bi_bdev;
unsigned intbi_flags;   /* status, command, etc */
int bi_error;
-   unsigned long   bi_rw;  /* bottom bits READ/WRITE,
+   unsigned long   bi_rw;  /* bottom bits rq_flags_bits
 * top bits priority
 */
+   /*
+* this will be a u8 in the next patches and bi_rw can be shrunk to
+* a u32. For compat in these transistional patches op is a int here.
+*/
+   int bi_op;  /* REQ_OP */
+
 
struct bvec_iterbi_iter;
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] mm: mmap: Add new /proc tunable for mmap_base ASLR.

2015-11-04 Thread Eric W. Biederman
Daniel Cashman  writes:

> On 11/3/15 5:31 PM, Andrew Morton wrote:
>> On Tue, 03 Nov 2015 18:40:31 -0600 ebied...@xmission.com (Eric W. Biederman) 
>> wrote:
>> 
>>> Andrew Morton  writes:
>>>
 On Tue,  3 Nov 2015 10:10:03 -0800 Daniel Cashman  
 wrote:

> ASLR currently only uses 8 bits to generate the random offset for the
> mmap base address on 32 bit architectures. This value was chosen to
> prevent a poorly chosen value from dividing the address space in such
> a way as to prevent large allocations. This may not be an issue on all
> platforms. Allow the specification of a minimum number of bits so that
> platforms desiring greater ASLR protection may determine where to place
> the trade-off.

 Can we please include a very good description of the motivation for this
 change?  What is inadequate about the current code, what value does the
 enhancement have to our users, what real-world problems are being solved,
 etc.

 Because all we have at present is "greater ASLR protection", which doesn't
 really tell anyone anything.
>>>
>>> The description seemed clear to me.
>>>
>>> More random bits, more entropy, more work needed to brute force.
>>>
>>> 8 bits only requires 256 tries (or a 1 in 256) chance to brute force
>>> something.
>> 
>> Of course, but that's not really very useful.
>> 
>>> We have seen in the last couple of months on Android how only having 8 bits
>>> doesn't help much.
>> 
>> Now THAT is important.  What happened here and how well does the
>> proposed fix improve things?  How much longer will a brute-force attack
>> take to succeed, with a particular set of kernel parameters?  Is the
>> new duration considered to be sufficiently long and if not, are there
>> alternative fixes we should be looking at?
>> 
>> Stuff like this.
>> 
>>> Each additional bit doubles the protection (and unfortunately also
>>> increases fragmentation of the userspace address space).
>> 
>> OK, so the benefit comes with a cost and people who are configuring
>> systems (and the people who are reviewing this patchset!) need to
>> understand the tradeoffs.  Please.
>
> The direct motivation here was in response to the libstagefright
> vulnerabilities that affected Android, specifically to information
> provided by Google's project zero at:
>
> http://googleprojectzero.blogspot.com/2015/09/stagefrightened.html
>
> The attack there specifically used the limited randomness used in
> generating the mmap base address as part of a brute-force-based exploit.
>  In this particular case, the attack was against the mediaserver process
> on Android, which was limited to respawning every 5 seconds, giving the
> attacker an average expected success rate of defeating the mmap ASLR
> after over 10 minutes (128 tries at 5 seconds each).  With change to the
> maximum proposed value of 16 bits, this would change to over 45 hours
> (32768 tries), which would make the user of such a system much more
> likely to notice such an attack.
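
(For reference, those figures work out as follows, assuming one attempt
per 5-second respawn and success, on average, after half the search
space: 2^8 / 2 = 128 tries * 5 s ~= 10.7 minutes; 2^16 / 2 = 32768
tries * 5 s ~= 45.5 hours.)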
>
> I understand the desire for this clarification, and will happily try to
> improve the explanation for this change, especially so that those
> considering use of this option understand the tradeoffs, but I also view
> this as one particular hardening change which is a component of making
> attacks such as these harder, rather than the only solution.  As for the
> clarification itself, where would you like it?  I could include a cover
> letter for this patch-set, elaborate more in the commit message itself,
> add more to the Kconfig help description, or some combination of the above.

Unless I am mistaken, there is no cross-over of this randomization
between different processes.  Would it make sense to have this as an
rlimit, so that the setting can be changed per process if you have
processes on the system that are affected differently by the tradeoff?

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] char/misc driver patches for 4.4-rc1

2015-11-04 Thread Greg KH
The following changes since commit 25cb62b76430a91cc6195f902e61c2cb84ade622:

  Linux 4.3-rc5 (2015-10-11 11:09:45 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ 
tags/char-misc-4.4-rc1

for you to fetch changes up to e2d8680741edec84f843f783a7f4a44418b818d7:

  fpga: socfpga: Fix check of return value of devm_request_irq (2015-10-29 
15:20:25 -0700)


char/misc drivers for 4.4-rc1

Here is the big char/misc driver update for 4.4-rc1.  Lots of different
driver and subsystem updates, hwtracing being the largest with the
addition of some new platforms that are now supported.  Full details in
the shortlog.

All of these have been in linux-next for a long time with no reported issues.

Signed-off-by: Greg Kroah-Hartman 


Alan Tull (7):
  usage documentation for FPGA manager core
  fpga manager: add sysfs interface document
  add FPGA manager core
  fpga manager: add driver for socfpga fpga manager
  MAINTAINERS: add fpga manager framework
  fpga manager: ensure lifetime with of_fpga_mgr_get
  fpga manager: remove unnecessary null pointer checks

Alexander Kapshuk (22):
  ver_linux: gcc -dumpversion, use regex to find version number
  ver_linux: make --version, use regex to find version number
  ver_linux: binutils, fix inaccurate output
  ver_linux: util-linux, 'fdformat' not ubiquitous any longer
  ver_linux: module-init-tools, look for numerical input, not field number
  ver_linux: e2fsprogs, look for numerical input, not field number
  ver_linux: jfsutils, look for numerical input, not field number
  ver_linux: reiserfsprogs, look for numerical input, not field number
  ver_linux: xfsprogs, look for numerical input, not field number
  ver_linux: pcmciautils, look for numerical input, not field number
  ver_linux: quota-tools, look for numerical input, not field number
  ver_linux: ppp, look for numerical input, not field number
  ver_linux: libc, input redirection to sed fails in some distros
  ver_linux: ldd, look for numerical input, not field number
  ver_linux: libcpp, fix missing output
  ver_linux: procps, look for numerical input, not field number
  ver_linux: net-tools, look for numerical input, not field number
  ver_linux: loadkeys, look for numerical input, not field number
  ver_linux: sh-utils, look for numerical input, not field number
  ver_linux: use 'udevadm', instead of 'udevinfo'
  ver_linux: wireless-tools, look for numerical input, not field number
  ver_linux: proc/modules, limit text processing to 'sed'

Alexander Kuleshov (1):
  mei: Fix debugfs filename in error output

Alexander Shishkin (14):
  stm class: Introduce an abstraction for System Trace Module devices
  MAINTAINERS: add an entry for System Trace Module device class
  stm class: dummy_stm: Add dummy driver for testing stm class
  stm class: stm_console: Add kernel-console-over-stm driver
  intel_th: Add driver infrastructure for Intel(R) Trace Hub devices
  intel_th: Add pci glue layer for Intel(R) Trace Hub
  intel_th: Add Global Trace Hub driver
  intel_th: Add Software Trace Hub driver
  intel_th: Add Memory Storage Unit driver
  intel_th: Add PTI output driver
  MAINTAINERS: add an entry for Intel(R) Trace Hub
  stm class: Mark src::link __rcu
  intel_th: Fix integer mismatch warnings
  stm class: Select configfs

Alexander Usyskin (4):
  mei: me: fix d0i3 register offset in tracing
  mei: keep the device awake during reads in chunks
  mei: fix the KDoc formating
  mei: amthif: Do not compare bool to 0/1

Alexey Khoroshilov (1):
  mcb: Do not return zero on error path in mcb_pci_probe()

Amitoj Kaur Chawla (1):
  char: ipmi: ipmi_ssif: Replace timeval with timespec64

Andrzej Hajda (3):
  misc/vmw_vmci: use kmemdup rather than duplicating its implementation
  extcon: rt8973a: fix handling regmap_irq_get_virq result
  extcon: sm5502: fix handling regmap_irq_get_virq result

Ashutosh Dixit (10):
  misc: mic: SCIF poll
  misc: mic: Add support for kernel mode SCIF clients
  misc: mic: MIC COSM bus
  misc: mic: Coprocessor State Management (COSM) driver
  misc: mic: COSM SCIF server
  misc: mic: COSM client driver
  misc: mic: Remove COSM functionality from the MIC host driver
  misc: mic: Remove COSM functionality from the MIC card driver
  misc: mic: Update MIC host daemon with COSM changes
  misc: mic: Fix randconfig build error

Chanwoo Choi (8):
  Merge branch 'ib-extcon-mfd-4.4' into extcon-next
  extcon: arizona: Reorder the default statement to remove unnecessary 
warning
  extcon: gpio: Use resource managed function for request_irq
  extcon: gpio: 

[GIT PULL] Driver core patches for 4.4-rc1

2015-11-04 Thread Greg KH
The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/ 
tags/driver-core-4.4-rc1

for you to fetch changes up to c23fe83138ed7b11ad763cbe8bf98e5378c04bd6:

  debugfs: Add debugfs_create_ulong() (2015-10-18 10:14:39 -0700)


driver core update for 4.4-rc1

Here's the "big" driver core updates for 4.4-rc1.  Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.

All have been in linux-next for a long time.

Signed-off-by: Greg Kroah-Hartman 


Dan Carpenter (1):
  devres: fix a for loop bounds check

Gabriel Somlo (1):
  kobject: move EXPORT_SYMBOL() macros next to corresponding definitions

Greg Kroah-Hartman (1):
  Revert "mm: Check if section present during memory block (un)registering"

Lee Duncan (1):
  base: soc: siplify ida usage

NeilBrown (1):
  sysfs: correctly handle short reads on PREALLOC attrs.

Stephen Boyd (4):
  debugfs: Consolidate file mode checks in debugfs_create_*()
  debugfs: Add read-only/write-only x64 file ops
  debugfs: Add read-only/write-only size_t file ops
  debugfs: Add read-only/write-only bool file ops

Tan Xiaojun (1):
  CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit

Thierry Reding (1):
  driver-core: platform: Provide helpers for multi-driver modules

Ulf Magnusson (2):
  debugfs: document that debugfs_remove*() accepts NULL and error values
  kobject: explain what kobject's sd field is

Uwe Kleine-König (1):
  base/platform: assert that dev_pm_domain callbacks are called 
unconditionally

Viresh Kumar (3):
  ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
  debugfs: Pass bool pointer to debugfs_create_bool()
  debugfs: Add debugfs_create_ulong()

Yinghai Lu (1):
  mm: Check if section present during memory block (un)registering

Zhen Lei (1):
  of: to support binding numa node to specified device in devicetree

 Documentation/driver-model/platform.txt|  14 +++
 Documentation/filesystems/debugfs.txt  |   2 +-
 arch/arm64/kernel/debug-monitors.c |   4 +-
 drivers/acpi/ec_sys.c  |   2 +-
 drivers/acpi/internal.h|   2 +-
 drivers/base/core.c|   2 +-
 drivers/base/dma-contiguous.c  |   2 +-
 drivers/base/platform.c|  80 +++--
 drivers/base/regmap/internal.h |   6 +-
 drivers/base/regmap/regcache-lzo.c |   4 +-
 drivers/base/regmap/regcache.c |  24 ++--
 drivers/base/soc.c |  21 +---
 drivers/bluetooth/hci_qca.c|   4 +-
 drivers/iommu/amd_iommu_init.c |   2 +-
 drivers/iommu/amd_iommu_types.h|   2 +-
 drivers/misc/mei/mei_dev.h |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |   4 +-
 drivers/net/wireless/ath/ath10k/core.h |   2 +-
 drivers/net/wireless/ath/ath5k/ath5k.h |   2 +-
 drivers/net/wireless/ath/ath9k/hw.c|   2 +-
 drivers/net/wireless/ath/ath9k/hw.h|   4 +-
 drivers/net/wireless/b43/debugfs.c |  18 +--
 drivers/net/wireless/b43/debugfs.h |   2 +-
 drivers/net/wireless/b43legacy/debugfs.c   |  10 +-
 drivers/net/wireless/b43legacy/debugfs.h   |   2 +-
 drivers/net/wireless/iwlegacy/common.h |   6 +-
 drivers/net/wireless/iwlwifi/mvm/mvm.h |   6 +-
 drivers/of/device.c|  11 +-
 drivers/scsi/snic/snic_trc.c   |   4 +-
 drivers/scsi/snic/snic_trc.h   |   2 +-
 drivers/uwb/uwb-debug.c|   2 +-
 fs/debugfs/file.c  | 177 +
 fs/debugfs/inode.c |   6 +-
 fs/sysfs/file.c|   4 +-
 include/linux/debugfs.h|   6 +-
 include/linux/edac.h   |   2 +-
 include/linux/fault-inject.h   |   2 +-
 include/linux/kobject.h|   2 +-
 include/linux/platform_device.h|   8 ++
 kernel/futex.c |   4 +-
 lib/devres.c   |   2 +-
 lib/dma-debug.c|   2 +-
 lib/kobject.c  |  12 +-
 mm/failslab.c  |   8 +-
 mm/page_alloc.c|   8 +-
 sound/soc/codecs/wm_adsp.h |   2 +-
 46 files changed, 302 insertions(+), 193 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

[GIT PULL] TTY/Serial patches for 4.4-rc1

2015-11-04 Thread Greg KH
The following changes since commit 25cb62b76430a91cc6195f902e61c2cb84ade622:

  Linux 4.3-rc5 (2015-10-11 11:09:45 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git/ tags/tty-4.4-rc1

for you to fetch changes up to e052c6d15c61cc4caff2f06cbca72b183da9f15e:

  tty: Use unbound workqueue for all input workers (2015-10-17 21:32:21 -0700)


TTY/Serial driver patches for 4.4-rc1

Here is the big tty and serial driver update for 4.4-rc1.

Lots of serial driver updates and a few small tty core changes.  Full
details in the shortlog.

All of these have been in linux-next for a while.

Signed-off-by: Greg Kroah-Hartman 


Abhimanyu Kapur (1):
  ARM64: TTY: hvc_dcc: Add support for ARM64 dcc

Aleksandar Mitev (1):
  serial: sh-sci: Remove timer on shutdown of port

Alexandre Belloni (1):
  tty/serial: at91: move ATMEL_MAX_UART

Andre Przywara (1):
  serial: atmel: fix compiler warning on address cast

Andreas Werner (1):
  tty: serial: men_z135_uart.c: use mcb memory region size instead of 
hardcoded one

Andrzej Hajda (1):
  tty: serial: lpc32xx_hs: fix handling platform_get_irq result

Andy Shevchenko (1):
  serial: 8250_dma: no need to sync RX buffer

Arnd Bergmann (1):
  serial: fix mctrl helper functions

Axel Lin (1):
  serial: mux: Convert to uart_console_device instead of open-coded

Christoph Hellwig (1):
  mpsc: use dma_set_mask insted of dma_supported

Ezequiel Garcia (1):
  serial: omap: remove warnings about unused functions

Fabio Estevam (1):
  serial: 68328serial: Use NULL for pointers

Geert Uytterhoeven (35):
  serial: sh-sci: Replace buggy big #ifdef by runtime logic
  serial: sh-sci: Prevent compiler warnings on 64-bit
  serial: sh-sci: Correct SCIF_ERROR_CLEAR for plain SCIF
  serial: sh-sci: Use SCIF_DR instead of hardcoded literal 1
  serial: sh-sci: Use SCSMR_CKS instead of hardcoded literal 3
  serial: sh-sci: Drop path in reference to serial_core.c
  serial: sh-sci: Improve readability of sampling rate configuration
  serial: sh-sci: Make sci_irq_desc[] const
  serial: sh-sci: Make sci_regmap[] const
  serial: sh-sci: Remove useless memory allocation failure printks
  serial: sh-sci: Remove bogus sci_handle_fifo_overrun() call on (H)SCIF
  serial: sh-sci: Improve DMA error messages
  serial: sh-sci: Improve comments for DMA timeout calculation
  serial: sh-sci: Handle DMA init failures inside sci_request_dma()
  serial: sh-sci: Use correct device for DMA mapping with IOMMU
  serial: sh-sci: Use min_t()/max_t() instead of casts
  serial: sh-sci: Switch to dma_map_single() for DMA transmission
  serial: sh-sci: Fix TX buffer mapping leak
  serial: sh-sci: Use DMA submission helpers instead of open-coding
  serial: sh-sci: Switch to generic DMA residue handling
  serial: sh-sci: Stop acknowledging DMA transmit completions
  serial: sh-sci: Simplify sci_submit_rx() error handling
  serial: sh-sci: Do not resubmit DMA descriptors
  serial: sh-sci: Fix race condition between RX worker and cleanup
  serial: sh-sci: Pass scatterlist to sci_dma_rx_push()
  serial: sh-sci: Use tty_insert_flip_string() for DMA receive
  serial: sh-sci: Use incrementing pointers instead of stack array
  serial: sh-sci: Don't call sci_rx_interrupt() on error when using DMA
  serial: sh-sci: Don't call sci_dma_rx_push() if no data has arrived
  serial: sh-sci: Shuffle functions around
  serial: sh-sci: Get rid of the workqueue to handle receive DMA requests
  serial: sh-sci: Submit RX DMA from RX interrupt on (H)SCIF
  serial: sh-sci: Stop calling sci_start_rx() from sci_request_dma()
  serial: sh-sci: Add DT support to DMA setup
  serial: pl011: Spelling s/clocks-names/clock-names/

Greg Kroah-Hartman (1):
  Merge 4.3-rc5 into tty-next

Guillaume Gomez (1):
  tty: remove unneeded return statement

Heikki Krogerus (15):
  serial: 8250_dw: add separate pointer for the uart_port to dw8250_probe
  serial: 8250_dw: adapt to unified device property interface
  serial: 8250_dw: hook the DMA in one place
  serial: 8250_dw: only setup the port from one place
  serial: 8250_dw: add dw8250_quirks function
  serial: 8250_dw: proper support for UARTs without busy functionality
  serial: 8250_dw: rename and comment the fallback dma filter
  serial: 8250_dw: cleanup dw8250_idma_filter
  serial: 8250_dw: cleanup dw8250_setup_port
  serial: 8250_dw: don't set UPF_BOOT_AUTOCONF flag
  serial: 8250_pci: Intel MID UART support to its own driver
  dmaengine: hsu: make the UART driver in control of selecting this driver
  dmaengine: hsu: introduce stubs for the exported functions
  dmaengine: hsu: remove 

Re: [linux-sunxi] [PATCH v4 2/6] clk: sunxi: Add H3 clocks support

2015-11-04 Thread Julian Calaby
Hi Maxime,

On Thu, Nov 5, 2015 at 3:23 AM, Maxime Ripard
 wrote:
> Hi Julian,
>
> On Wed, Oct 28, 2015 at 10:12:09AM +1100, Julian Calaby wrote:
>> > +   of_property_for_each_u32(node, "clock-indices", prop, p, index) {
>> > +   of_property_read_string_index(node, "clock-output-names",
>> > + i, _name);
>> > +
>> > +   if (index == 17 || (index >= 29 && index <= 31))
>> > +   clk_parent = AHB2;
>> > +   else if (index <= 63 || index >= 128)
>> > +   clk_parent = AHB1;
>> > +   else if (index >= 64 && index <= 95)
>> > +   clk_parent = APB1;
>> > +   else if (index >= 96 && index <= 127)
>> > +   clk_parent = APB2;
>>
>> A way to make this reusable in the future might be to encode it in a
>> structure like:
>>
>> static const struct bus_clock_paths sun8i_h3_bus_clock_paths __initdata = {
>> {.parent = 2, .min = 17, .max = 17}, /* index 17 is from AHB2 */
>> {.parent = 2, .min = 29, .max = 31}, /* AHB2 bank */
>> {.parent = 1, .min = 63, .max = 128}, /* AHB1 bank */
>> ...
>> {}
>> };
>>
>> Then the code here can be reused for other clocks like this in the
>> future without too much bloat. (And this would potentially could be
>> generic enough for other platforms.)
>
> We don't really need that at the moment. There's no point in writing
> more complicated code to support a use case we don't have yet.
>
> (However, something along these lines will definitely be needed if we
> ever have another SoC having the same bus gates madness)

This was a suggestion for the future to address Jens' comment about
having a bus clock driver instead of encoding it in devicetree.

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[GIT PULL] USB driver patches for 4.4-rc1

2015-11-04 Thread Greg KH
The following changes since commit 32b88194f71d6ae7768a29f87fbba454728273ee:

  Linux 4.3-rc7 (2015-10-25 10:39:47 +0900)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git/ tags/usb-4.4-rc1

for you to fetch changes up to 0bbc367e21bfeea33230d893be4fa3a3ff9bcb48:

  Merge 4.3-rc7 into usb-next (2015-10-26 06:39:46 +0900)


USB patches for 4.4-rc1

Here is the big USB patchset for 4.4-rc1.

As usual, most of the changes are in the gadget subsystem, and we
removed a host controller for a device that is no longer in existance,
and probably never was even made public.  There is also other minor
driver updates and new device ids, full details in the changelog.

All of these have been in linux-next for a while.

Signed-off-by: Greg Kroah-Hartman 


Alan Stern (1):
  usb: misc: usbtest: format the data pattern according to max packet size

Alexandre Belloni (1):
  usb: gadget: at91_udc: move at91_udc_data in at91_udc.h

Andrew F. Davis (1):
  usb: misc: usb3503: Use i2c_add_driver helper macro

Andrzej Hajda (1):
  usb: host: ehci-msm: fix handling platform_get_irq result

Andy Shevchenko (1):
  xhci: replace custom implementation of readq / writeq

Antti Seppälä (1):
  usb: dwc2: Use platform endianness when accessing registers

Axel Lin (1):
  phy: sun4i-usb: Use devm_gpiod_get_optional for optional GPIOs

Bin Liu (2):
  usb: musb: set the controller speed based on the config setting
  usb: musb: dsps: control musb speed based on dts setting

Bjørn Mork (1):
  USB: qcserial: add Sierra Wireless MC74xx/EM74xx

Chase Metzger (1):
  usb: core: hub: Removed some warnings generated by checkpatch.pl

Chunfeng Yun (3):
  dt-bindings: Add usb3.0 phy binding for MT65xx SoCs
  phy: add usb3.0 phy driver for mt65xx SoCs
  MAINTAINERS: add Mediatek usb3 phy driver

David Ward (6):
  USB: option: revert introduction of struct option_private
  USB: usb_wwan/option: generalize option_send_setup for other drivers
  USB: qcserial: make AT URCs work for Sierra Wireless devices
  Revert "USB: qcserial/option: make AT URCs work for Sierra Wireless 
MC7305/MC7355"
  Revert "USB: qcserial/option: make AT URCs work for Sierra Wireless 
MC73xx"
  USB: qcserial: update comment for Sierra Wireless MC7304/MC7354

Doug Anderson (1):
  usb: dwc2: host: Fix use after free w/ simultaneous irqs

Douglas Anderson (1):
  usb: dwc2: host: Protect PCGCTL with lock in dwc2_port_resume()

Duc Dang (3):
  usb: make xhci platform driver use 64 bit or 32 bit DMA
  usb: Add support for ACPI identification to xhci-platform
  usb: xhci: configure 32-bit DMA if the controller does not support 64-bit 
DMA

Eric Curtin (1):
  tools: usbip: detach: avoid calling strlen() at each iteration

Fabio Estevam (2):
  usb: chipidea: Add support for 'phy-clkgate-delay-us' property
  Doc: usb: ci-hdrc-usb2: Add phy-clkgate-delay-us entry

Felipe Balbi (12):
  usb: dwc3: gadget: move trace_dwc3_ep_queue()
  usb: dwc3: gadget: start requests as soon as they come
  usb: dwc3: gadget: clear DWC3_PENDING_REQUEST when request is queued
  usb: dwc3: gadget: improve ep_queue's error reporting
  usb: gadget: mass_storage: allow for deeper queue lengths
  usb: dwc2: rename all s3c_* to dwc2_*
  usb: gadget: pch-udc: fix lock
  usb: dwc3: gadget: start transfer on XFER_COMPLETE
  usb: dwc3: gadget: use update transfer command
  usb: dwc3: gadget: use Update Transfer from Xfer In Progress
  usb: dwc3: gadget: remove unnecessary _irqsave()
  Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"

Felipe F. Tonello (1):
  usb: gadget: f_midi: check for error on usb_ep_queue

Geliang Tang (1):
  usb: gadget: fix a trivial typo

Greg Kroah-Hartman (7):
  Merge 4.3-rc3 into usb-next
  Merge 4.3-rc5 into usb-next
  Merge tag 'phy-for-4.4' of git://git.kernel.org/.../kishon/linux-phy into 
usb-next
  Merge tag 'usb-for-v4.4' of git://git.kernel.org/.../balbi/usb into 
usb-next
  Merge tag 'usb-ci-v4.4-rc1' of git://git.kernel.org/.../peter.chen/usb 
into usb-next
  Merge tag 'usb-serial-4.4-rc1' of 
git://git.kernel.org/.../johan/usb-serial into usb-next
  Merge 4.3-rc7 into usb-next

Gregory Herrero (22):
  usb: dwc2: host: don't clear hprt0 status bits when exiting hibernation
  usb: dwc2: host: create a function to handle port_resume
  usb: dwc2: host: add flag to reflect bus state
  usb: dwc2: host: enter hibernation during bus suspend
  usb: dwc2: host: update hcd and lx_state during start/stop callbacks
  usb: dwc2: host: avoid resetting lx_state to L3 during disconnect
  usb: dwc2: host: ignore wakeup interrupt if hibernation supported
  usb: dwc2: host: resume only if bus 

[PATCH 13/32] mm: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares mm's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with the operation and
flags mixed together, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will convert
the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 mm/page_io.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index b995a5b..ec7ad22 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -253,7 +253,8 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
bio_end_io_t end_write_func)
 {
struct bio *bio;
-   int ret, rw = WRITE;
+   int ret;
+   u32 op_flags = 0;
struct swap_info_struct *sis = page_swap_info(page);
 
if (sis->flags & SWP_FILE) {
@@ -312,11 +313,11 @@ int __swap_writepage(struct page *page, struct 
writeback_control *wbc,
goto out;
}
if (wbc->sync_mode == WB_SYNC_ALL)
-   rw |= REQ_SYNC;
+   op_flags |= REQ_SYNC;
count_vm_event(PSWPOUT);
set_page_writeback(page);
unlock_page(page);
-   submit_bio(rw, bio);
+   submit_bio(REQ_OP_WRITE | op_flags, bio);
 out:
return ret;
 }
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 18/32] drbd: set bio bi_op to REQ_OP

2015-11-04 Thread mchristi
From: Mike Christie 

This patch has drbd set the bi_op.

For compat reasons, we are still ORing the op into bi_rw. This
will be dropped in later patches in this series when everyone
is updated.
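
The new wire_flags_to_bio_op() helper introduced below roughly takes
this shape (a sketch only; the exact DP_* mapping here is an assumption
based on the flag helper it sits next to):

	static unsigned long wire_flags_to_bio_op(u32 dpf)
	{
		if (dpf & DP_DISCARD)
			return REQ_OP_DISCARD;
		return REQ_OP_WRITE;
	}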

Signed-off-by: Mike Christie 
---
 drivers/block/drbd/drbd_actlog.c   |  1 +
 drivers/block/drbd/drbd_bitmap.c   |  1 +
 drivers/block/drbd/drbd_int.h  |  2 +-
 drivers/block/drbd/drbd_receiver.c | 39 --
 drivers/block/drbd/drbd_worker.c   |  3 ++-
 5 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 5ad6b09..ed2eafe 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -160,6 +160,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
bio->bi_private = device;
bio->bi_end_io = drbd_md_endio;
bio->bi_rw = op | op_flags;
+   bio->bi_op = op;
 
if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && 
device->ldev == NULL)
/* special case, drbd_md_read() during drbd_adm_attach(): no 
get_ldev */
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index e8c65a4..2ff407a 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -1021,6 +1021,7 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, 
int page_nr) __must_ho
bio->bi_end_io = drbd_bm_endio;
 
if (drbd_insert_fault(device, (rw == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD)) {
+   bio->bi_op = rw;
bio->bi_rw |= rw;
bio_io_error(bio);
} else {
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 2934481..05eaba8 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1542,7 +1542,7 @@ extern bool drbd_rs_should_slow_down(struct drbd_device 
*device, sector_t sector
bool throttle_if_app_is_waiting);
 extern int drbd_submit_peer_request(struct drbd_device *,
struct drbd_peer_request *, const unsigned,
-   const int);
+   const unsigned, const int);
 extern int drbd_free_peer_reqs(struct drbd_device *, struct list_head *);
 extern struct drbd_peer_request *drbd_alloc_peer_req(struct drbd_peer_device 
*, u64,
 sector_t, unsigned int,
diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index c097909..4e458bd 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1365,7 +1365,8 @@ void drbd_bump_write_ordering(struct drbd_resource 
*resource, struct drbd_backin
 /* TODO allocate from our own bio_set. */
 int drbd_submit_peer_request(struct drbd_device *device,
 struct drbd_peer_request *peer_req,
-const unsigned rw, const int fault_type)
+const unsigned op, const unsigned op_flags,
+const int fault_type)
 {
struct bio *bios = NULL;
struct bio *bio;
@@ -1417,7 +1418,8 @@ next_bio:
/* > peer_req->i.sector, unless this is the first bio */
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = device->ldev->backing_bdev;
-   bio->bi_rw = rw;
+   bio->bi_rw = op | op_flags;
+   bio->bi_op = op;
bio->bi_private = peer_req;
bio->bi_end_io = drbd_peer_request_endio;
 
@@ -1425,7 +1427,7 @@ next_bio:
bios = bio;
++n_bios;
 
-   if (rw & REQ_DISCARD) {
+   if (op & REQ_OP_DISCARD) {
bio->bi_iter.bi_size = data_size;
goto submit;
}
@@ -1803,7 +1805,8 @@ static int recv_resync_read(struct drbd_peer_device 
*peer_device, sector_t secto
	spin_unlock_irq(&device->resource->req_lock);
 
	atomic_add(pi->size >> 9, &device->rs_sect_ev);
-   if (drbd_submit_peer_request(device, peer_req, WRITE, DRBD_FAULT_RS_WR) 
== 0)
+   if (drbd_submit_peer_request(device, peer_req, REQ_OP_WRITE, 0,
+DRBD_FAULT_RS_WR) == 0)
return 0;
 
/* don't care for the reason here */
@@ -2125,7 +2128,7 @@ static int wait_for_and_update_peer_seq(struct 
drbd_peer_device *peer_device, co
 /* see also bio_flags_to_wire()
  * DRBD_REQ_*, because we need to semantically map the flags to data packet
  * flags and back. We may replicate to other kernel versions. */
-static unsigned long wire_flags_to_bio(u32 dpf)
+static unsigned long wire_flags_to_bio_flags(u32 dpf)
 {
return  (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
(dpf & DP_FUA ? REQ_FUA : 0) |
@@ -2133,6 +2136,14 @@ static unsigned long wire_flags_to_bio(u32 dpf)
(dpf & DP_DISCARD ? REQ_DISCARD : 0);
 }
 
+static unsigned long wire_flags_to_bio_op(u32 dpf)
+{
+   if (dpf 

[PATCH 19/32] block: add helper to get data dir from op

2015-11-04 Thread mchristi
From: Mike Christie 

In later patches the op will no longer be a bitmap, so we will
not have REQ_WRITE set for all non-reads like discard, flush,
and write same. Drivers will still want to treat them as writes
for accounting reasons, so this patch adds a helper to translate
an op to a data direction.
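
A sketch of the intended use in an accounting path (illustrative; the
surrounding bio handling and the count variable are assumed):

	/* discard, flush, write same, etc. all map to WRITE */
	int dir = op_to_data_dir(bio->bi_op);

	if (dir == WRITE)
		count_vm_events(PGPGOUT, count);
	else
		task_io_account_read(bio->bi_iter.bi_size);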

Signed-off-by: Mike Christie 
---
 include/linux/blkdev.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 19c2e94..cf5f518 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -586,6 +586,18 @@ static inline void queue_flag_clear(unsigned int flag, 
struct request_queue *q)
 
 #define list_entry_rq(ptr) list_entry((ptr), struct request, queuelist)
 
+/*
+ * Non REQ_OP_WRITE requests like discard, write same, etc, are
+ * considered WRITEs.
+ */
+static inline int op_to_data_dir(int op)
+{
+   if (op == REQ_OP_READ)
+   return READ;
+   else
+   return WRITE;
+}
+
 #define rq_data_dir(rq)((int)((rq)->cmd_flags & 1))
 
 /*
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 12/32] xfs: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares xfs's submit_bio use for the next
patches that split bi_rw into an operation and a flags field.
Instead of passing in a bitmap with the operation and
flags mixed together, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will convert
the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 fs/xfs/xfs_buf.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8ecffb3..0621d70 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1123,7 +1123,8 @@ xfs_buf_ioapply_map(
int map,
int *buf_offset,
int *count,
-   int rw)
+   int op,
+   int op_flags)
 {
int page_index;
int total_nr_pages = bp->b_page_count;
@@ -1186,7 +1187,7 @@ next_chunk:
flush_kernel_vmap_range(bp->b_addr,
xfs_buf_vmap_len(bp));
}
-   submit_bio(rw, bio);
+   submit_bio(op | op_flags, bio);
if (size)
goto next_chunk;
} else {
@@ -1206,7 +1207,8 @@ _xfs_buf_ioapply(
struct xfs_buf  *bp)
 {
struct blk_plug plug;
-   int rw;
+   int op;
+   int op_flags = 0;
int offset;
int size;
int i;
@@ -1225,14 +1227,13 @@ _xfs_buf_ioapply(
bp->b_ioend_wq = bp->b_target->bt_mount->m_buf_workqueue;
 
if (bp->b_flags & XBF_WRITE) {
+   op = REQ_OP_WRITE;
if (bp->b_flags & XBF_SYNCIO)
-   rw = WRITE_SYNC;
-   else
-   rw = WRITE;
+   op_flags = WRITE_SYNC;
if (bp->b_flags & XBF_FUA)
-   rw |= REQ_FUA;
+   op_flags |= REQ_FUA;
if (bp->b_flags & XBF_FLUSH)
-   rw |= REQ_FLUSH;
+   op_flags |= REQ_FLUSH;
 
/*
 * Run the write verifier callback function if it exists. If
@@ -1262,13 +1263,14 @@ _xfs_buf_ioapply(
}
}
} else if (bp->b_flags & XBF_READ_AHEAD) {
-   rw = READA;
+   op = REQ_OP_READ;
+   op_flags = REQ_RAHEAD;
} else {
-   rw = READ;
+   op = REQ_OP_READ;
}
 
/* we only use the buffer cache for meta-data */
-   rw |= REQ_META;
+   op_flags |= REQ_META;
 
/*
 * Walk all the vectors issuing IO on them. Set up the initial offset
@@ -1280,7 +1282,7 @@ _xfs_buf_ioapply(
size = BBTOB(bp->b_io_length);
	blk_start_plug(&plug);
for (i = 0; i < bp->b_map_count; i++) {
-   xfs_buf_ioapply_map(bp, i, &offset, &size, rw);
+   xfs_buf_ioapply_map(bp, i, &offset, &size, op, op_flags);
if (bp->b_error)
break;
if (size <= 0)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 20/32] md: set bi_op to REQ_OP

2015-11-04 Thread mchristi
From: Mike Christie 

This patch has md set the bi_op.

For compat reasons, we are still ORing the op into bi_rw. This
will be dropped in later patches in this series when everyone
is updated.

For discards, I am also still passing REQ_WRITE in with the
flags, so code that has not yet been converted will work like
before. This will be cleaned up in later patches when everyone
is converted.
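
A sketch of the transitional discard setup described above (illustrative
only, not a hunk from this patch):

	bio->bi_op  = REQ_OP_DISCARD;
	bio->bi_rw |= REQ_WRITE | REQ_DISCARD;	/* unconverted code still sees a write */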

Signed-off-by: Mike Christie 
---
 drivers/md/raid1.c  |  9 +
 drivers/md/raid10.c | 13 +
 drivers/md/raid5.c  | 50 +++---
 3 files changed, 53 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 527fdf5..94e5a63 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1051,6 +1051,7 @@ static void make_request(struct mddev *mddev, struct bio 
* bio)
int i, disks;
struct bitmap *bitmap;
unsigned long flags;
+   const int op = bio->bi_op;
const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
@@ -1164,6 +1165,7 @@ read_again:
mirror->rdev->data_offset;
read_bio->bi_bdev = mirror->rdev->bdev;
read_bio->bi_end_io = raid1_end_read_request;
+   read_bio->bi_op = REQ_OP_READ;
read_bio->bi_rw = READ | do_sync;
read_bio->bi_private = r1_bio;
 
@@ -1374,6 +1376,7 @@ read_again:
   conf->mirrors[i].rdev->data_offset);
mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
mbio->bi_end_io = raid1_end_write_request;
+   mbio->bi_op = op;
mbio->bi_rw =
WRITE | do_flush_fua | do_sync | do_discard | do_same;
mbio->bi_private = r1_bio;
@@ -2017,6 +2020,7 @@ static void sync_request_write(struct mddev *mddev, 
struct r1bio *r1_bio)
  !test_bit(MD_RECOVERY_SYNC, >recovery
continue;
 
+   wbio->bi_op = REQ_OP_WRITE;
wbio->bi_rw = WRITE;
wbio->bi_end_io = end_sync_write;
atomic_inc(&r1_bio->remaining);
@@ -2188,6 +2192,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
wbio = bio_clone_mddev(r1_bio->master_bio, GFP_NOIO, 
mddev);
}
 
+   wbio->bi_op = REQ_OP_WRITE;
wbio->bi_rw = WRITE;
wbio->bi_iter.bi_sector = r1_bio->sector;
wbio->bi_iter.bi_size = r1_bio->sectors << 9;
@@ -2329,6 +2334,7 @@ read_more:
bio->bi_iter.bi_sector = r1_bio->sector + rdev->data_offset;
bio->bi_bdev = rdev->bdev;
bio->bi_end_io = raid1_end_read_request;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw = READ | do_sync;
bio->bi_private = r1_bio;
if (max_sectors < r1_bio->sectors) {
@@ -2544,6 +2550,7 @@ static sector_t sync_request(struct mddev *mddev, 
sector_t sector_nr, int *skipp
if (i < conf->raid_disks)
still_degraded = 1;
} else if (!test_bit(In_sync, >flags)) {
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw = WRITE;
bio->bi_end_io = end_sync_write;
write_targets ++;
@@ -2571,6 +2578,7 @@ static sector_t sync_request(struct mddev *mddev, 
sector_t sector_nr, int *skipp
if (disk < 0)
disk = i;
}
+   bio->bi_op = REQ_OP_READ;
bio->bi_rw = READ;
bio->bi_end_io = end_sync_read;
read_targets++;
@@ -2583,6 +2591,7 @@ static sector_t sync_request(struct mddev *mddev, 
sector_t sector_nr, int *skipp
 * if we are doing resync or repair. Otherwise, 
leave
 * this device alone for this sync request.
 */
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw = WRITE;
bio->bi_end_io = end_sync_write;
write_targets++;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 69352a6..c7430f9 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1058,6 +1058,7 @@ static void __make_request(struct mddev *mddev, struct 
bio *bio)
struct r10bio *r10_bio;
struct bio *read_bio;
int i;
+   const int op = bio->bi_op;
const int rw = bio_data_dir(bio);
const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
const unsigned 

[PATCH 21/32] bcache: set bi_op to REQ_OP

2015-11-04 Thread mchristi
From: Mike Christie 

This patch has bcache set the bi_op.

For compat reasons, we are still ORing the op into bi_rw. This
will be dropped in later patches in this series when everyone
is updated.

Signed-off-by: Mike Christie 
---
 drivers/md/bcache/btree.c |  2 ++
 drivers/md/bcache/journal.c   |  3 +++
 drivers/md/bcache/movinggc.c  |  1 +
 drivers/md/bcache/request.c   |  2 ++
 drivers/md/bcache/super.c | 24 ++--
 drivers/md/bcache/writeback.c |  2 ++
 6 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 83392f8..3be5b05 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -295,6 +295,7 @@ static void bch_btree_node_read(struct btree *b)
closure_init_stack(&cl);

bio = bch_bbio_alloc(b->c);
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = REQ_META|READ_SYNC;
bio->bi_iter.bi_size = KEY_SIZE(&b->key) << 9;
bio->bi_end_io  = btree_node_read_endio;
@@ -397,6 +398,7 @@ static void do_btree_node_write(struct btree *b)
 
b->bio->bi_end_io   = btree_node_write_endio;
b->bio->bi_private  = cl;
+   b->bio->bi_op   = REQ_OP_WRITE;
b->bio->bi_rw   = REQ_META|WRITE_SYNC|REQ_FUA;
b->bio->bi_iter.bi_size = roundup(set_bytes(i), block_bytes(b->c));
bch_bio_map(b->bio, i);
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index ba9192b..d152e78 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -54,6 +54,7 @@ reread:   left = ca->sb.bucket_size - offset;
bio_reset(bio);
bio->bi_iter.bi_sector  = bucket + offset;
bio->bi_bdev= ca->bdev;
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = READ;
bio->bi_iter.bi_size= len << 9;
 
@@ -452,6 +453,7 @@ static void do_journal_discard(struct cache *ca)
bio->bi_iter.bi_sector  = bucket_to_sector(ca->set,
ca->sb.d[ja->discard_idx]);
bio->bi_bdev= ca->bdev;
+   bio->bi_op  = REQ_OP_DISCARD;
bio->bi_rw  = REQ_WRITE|REQ_DISCARD;
bio->bi_max_vecs= 1;
bio->bi_io_vec  = bio->bi_inline_vecs;
@@ -626,6 +628,7 @@ static void journal_write_unlocked(struct closure *cl)
bio_reset(bio);
bio->bi_iter.bi_sector  = PTR_OFFSET(k, i);
bio->bi_bdev= ca->bdev;
+   bio->bi_op  = REQ_OP_WRITE;
bio->bi_rw  = REQ_WRITE|REQ_SYNC|REQ_META|REQ_FLUSH|REQ_FUA;
bio->bi_iter.bi_size = sectors << 9;
 
diff --git a/drivers/md/bcache/movinggc.c b/drivers/md/bcache/movinggc.c
index b929fc9..1318f32 100644
--- a/drivers/md/bcache/movinggc.c
+++ b/drivers/md/bcache/movinggc.c
@@ -163,6 +163,7 @@ static void read_moving(struct cache_set *c)
moving_init(io);
bio = >bio.bio;
 
+   bio->bi_op  = REQ_OP_READ;
bio->bi_rw  = READ;
bio->bi_end_io  = read_moving_endio;
 
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 8e9877b..7a84f3b 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -253,6 +253,7 @@ static void bch_data_insert_start(struct closure *cl)
trace_bcache_cache_insert(k);
bch_keylist_push(&op->insert_keys);
 
+   n->bi_op = REQ_OP_WRITE;
n->bi_rw |= REQ_WRITE;
bch_submit_bbio(n, op->c, k, 0);
} while (n != bio);
@@ -925,6 +926,7 @@ static void cached_dev_write(struct cached_dev *dc, struct 
search *s)
struct bio *flush = bio_alloc_bioset(GFP_NOIO, 0,
 
dc->disk.bio_split);
 
+   flush->bi_op= REQ_OP_WRITE;
flush->bi_rw= WRITE_FLUSH;
flush->bi_bdev  = bio->bi_bdev;
flush->bi_end_io = request_endio;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index a987c90..ccc6266 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -212,6 +212,7 @@ static void __write_super(struct cache_sb *sb, struct bio 
*bio)
unsigned i;
 
bio->bi_iter.bi_sector  = SB_SECTOR;
+   bio->bi_op  = 0;
bio->bi_rw  = REQ_SYNC|REQ_META;
bio->bi_iter.bi_size= SB_SIZE;
bch_bio_map(bio, NULL);
@@ -333,7 +334,7 @@ static void uuid_io_unlock(struct closure *cl)
up(&c->uuid_write_mutex);
 }
 
-static void uuid_io(struct cache_set *c, unsigned long rw,
+static void uuid_io(struct cache_set *c, int op, unsigned long op_flags,
struct bkey *k, struct 

[PATCH 11/32] gfs2: prepare for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

This patch prepares gfs2's submit_bio use for the next
patches that split bi_rw into an operation field and a flags field.
Instead of passing in a bitmap with both the operation and
flags mixed in, the callers will now pass them in separately.

This patch modifies the code related to the submit_bio calls
so the flags and operation are separated. When this is done
for all code, one of the later patches in the series will update
the actual submit_bio call, so the patches are bisectable.

Signed-off-by: Mike Christie 
---
 fs/gfs2/log.c  |  8 
 fs/gfs2/lops.c | 11 ++-
 fs/gfs2/lops.h |  2 +-
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..8324af5 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -657,7 +657,7 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 
flags)
struct gfs2_log_header *lh;
unsigned int tail;
u32 hash;
-   int rw = WRITE_FLUSH_FUA | REQ_META;
+   int op_flags = WRITE_FLUSH_FUA | REQ_META;
struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
enum gfs2_freeze_state state = atomic_read(>sd_freeze_state);
lh = page_address(page);
@@ -682,12 +682,12 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 
flags)
if (test_bit(SDF_NOBARRIERS, >sd_flags)) {
gfs2_ordered_wait(sdp);
log_flush_wait(sdp);
-   rw = WRITE_SYNC | REQ_META | REQ_PRIO;
+   op_flags = WRITE_SYNC | REQ_META | REQ_PRIO;
}
 
sdp->sd_log_idle = (tail == sdp->sd_log_flush_head);
gfs2_log_write_page(sdp, page);
-   gfs2_log_flush_bio(sdp, rw);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, op_flags);
log_flush_wait(sdp);
 
if (sdp->sd_log_tail != tail)
@@ -735,7 +735,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock 
*gl,
 
gfs2_ordered_write(sdp);
lops_before_commit(sdp, tr);
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
 
if (sdp->sd_log_head != sdp->sd_log_flush_head) {
log_flush_wait(sdp);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index d5369a1..36b047a 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -230,17 +230,18 @@ static void gfs2_end_log_write(struct bio *bio)
 /**
  * gfs2_log_flush_bio - Submit any pending log bio
  * @sdp: The superblock
- * @rw: The rw flags
+ * @op: REQ_OP
+ * @op_flags: rq_flag_bits
  *
  * Submit any pending part-built or full bio to the block device. If
  * there is no pending bio, then this is a no-op.
  */
 
-void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw)
+void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags)
 {
if (sdp->sd_log_bio) {
atomic_inc(&sdp->sd_log_in_flight);
-   submit_bio(rw, sdp->sd_log_bio);
+   submit_bio(op | op_flags, sdp->sd_log_bio);
sdp->sd_log_bio = NULL;
}
 }
@@ -299,7 +300,7 @@ static struct bio *gfs2_log_get_bio(struct gfs2_sbd *sdp, 
u64 blkno)
nblk >>= sdp->sd_fsb2bb_shift;
if (blkno == nblk)
return bio;
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
}
 
return gfs2_log_alloc_bio(sdp, blkno);
@@ -328,7 +329,7 @@ static void gfs2_log_write(struct gfs2_sbd *sdp, struct 
page *page,
bio = gfs2_log_get_bio(sdp, blkno);
ret = bio_add_page(bio, page, size, offset);
if (ret == 0) {
-   gfs2_log_flush_bio(sdp, WRITE);
+   gfs2_log_flush_bio(sdp, REQ_OP_WRITE, 0);
bio = gfs2_log_alloc_bio(sdp, blkno);
ret = bio_add_page(bio, page, size, offset);
WARN_ON(ret == 0);
diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h
index a65a7ba..e529f53 100644
--- a/fs/gfs2/lops.h
+++ b/fs/gfs2/lops.h
@@ -27,7 +27,7 @@ extern const struct gfs2_log_operations gfs2_databuf_lops;
 
 extern const struct gfs2_log_operations *gfs2_log_ops[];
 extern void gfs2_log_write_page(struct gfs2_sbd *sdp, struct page *page);
-extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int rw);
+extern void gfs2_log_flush_bio(struct gfs2_sbd *sdp, int op, int op_flags);
 extern void gfs2_pin(struct gfs2_sbd *sdp, struct buffer_head *bh);
 
 static inline unsigned int buf_limit(struct gfs2_sbd *sdp)
-- 
1.8.3.1



[PATCH 26/32] ide cd: do not set REQ_WRITE on requests.

2015-11-04 Thread mchristi
From: Mike Christie 

The block layer will set the correct READ/WRITE operation flags/fields
when creating a request, so there is no need for drivers to set the
REQ_WRITE flag.

Signed-off-by: Mike Christie 
---
 drivers/ide/ide-cd_ioctl.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/ide/ide-cd_ioctl.c b/drivers/ide/ide-cd_ioctl.c
index 066e390..d2d0b38 100644
--- a/drivers/ide/ide-cd_ioctl.c
+++ b/drivers/ide/ide-cd_ioctl.c
@@ -459,9 +459,6 @@ int ide_cdrom_packet(struct cdrom_device_info *cdi,
   layer. the packet must be complete, as we do not
   touch it at all. */
 
-   if (cgc->data_direction == CGC_DATA_WRITE)
-   flags |= REQ_WRITE;
-
if (cgc->sense)
memset(cgc->sense, 0, sizeof(struct request_sense));
 
-- 
1.8.3.1



[PATCH 24/32] dm: pass dm stats data dir instead of bi_rw

2015-11-04 Thread mchristi
From: Mike Christie 

It looks like dm stats primarily cares about the data direction
(READ vs WRITE) and does not need the bio/request flags
or, in the future, the operation value. REQ_DISCARD is always set with
REQ_WRITE, so the check for either one in dm_stats_account_io
is not needed.

This patch has it use the bio and request data_dir helpers
instead of accessing the bi_rw/cmd_flags directly. This makes
the next patches that remove the operation from the cmd_flags
and bi_rw cleaner since we do not have to check for multiple
operations.
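
For reference, the data_dir helpers report only the direction, which is
all the stats code needs (a sketch using the existing block layer
helpers, not a hunk from this patch):

	/* both evaluate to READ (0) or WRITE (1), regardless of other flags */
	unsigned long idx = bio_data_dir(bio);	/* replaces bi_rw & REQ_WRITE */
	int dir = rq_data_dir(rq);		/* replaces cmd_flags & REQ_WRITE */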

Signed-off-by: Mike Christie 
---
 drivers/md/dm-stats.c | 6 +++---
 drivers/md/dm.c   | 8 
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 8289804..96b5c1b 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -518,7 +518,7 @@ static void dm_stat_for_entry(struct dm_stat *s, size_t 
entry,
  struct dm_stats_aux *stats_aux, bool end,
  unsigned long duration_jiffies)
 {
-   unsigned long idx = bi_rw & REQ_WRITE;
+   unsigned long idx = bi_rw;
struct dm_stat_shared *shared = &s->stat_shared[entry];
struct dm_stat_percpu *p;
 
@@ -645,8 +645,8 @@ void dm_stats_account_io(struct dm_stats *stats, unsigned 
long bi_rw,
last = raw_cpu_ptr(stats->last);
stats_aux->merged =
(bi_sector == (ACCESS_ONCE(last->last_sector) &&
-  ((bi_rw & (REQ_WRITE | REQ_DISCARD)) ==
-   (ACCESS_ONCE(last->last_rw) & 
(REQ_WRITE | REQ_DISCARD)))
+  ((bi_rw == WRITE) ==
+   (ACCESS_ONCE(last->last_rw) == WRITE))
   ));
ACCESS_ONCE(last->last_sector) = end_sector;
ACCESS_ONCE(last->last_rw) = bi_rw;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d2cf6d9..ea4bc70 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -655,7 +655,7 @@ static void start_io_acct(struct dm_io *io)
atomic_inc_return(&md->pending[rw]));

if (unlikely(dm_stats_used(&md->stats)))
-   dm_stats_account_io(&md->stats, bio->bi_rw, 
bio->bi_iter.bi_sector,
+   dm_stats_account_io(&md->stats, bio_data_dir(bio), 
bio->bi_iter.bi_sector,
bio_sectors(bio), false, 0, &io->stats_aux);
 }
 
@@ -670,7 +670,7 @@ static void end_io_acct(struct dm_io *io)
generic_end_io_acct(rw, &dm_disk(md)->part0, io->start_time);

if (unlikely(dm_stats_used(&md->stats)))
-   dm_stats_account_io(&md->stats, bio->bi_rw, 
bio->bi_iter.bi_sector,
+   dm_stats_account_io(&md->stats, bio_data_dir(bio), 
bio->bi_iter.bi_sector,
bio_sectors(bio), true, duration, 
&io->stats_aux);
 
/*
@@ -1053,7 +1053,7 @@ static void rq_end_stats(struct mapped_device *md, struct 
request *orig)
if (unlikely(dm_stats_used(&md->stats))) {
struct dm_rq_target_io *tio = tio_from_request(orig);
tio->duration_jiffies = jiffies - tio->duration_jiffies;
-   dm_stats_account_io(&md->stats, orig->cmd_flags, 
blk_rq_pos(orig),
+   dm_stats_account_io(&md->stats, rq_data_dir(orig), 
blk_rq_pos(orig),
tio->n_sectors, true, tio->duration_jiffies,
&tio->stats_aux);
}
@@ -1988,7 +1988,7 @@ static void dm_start_request(struct mapped_device *md, 
struct request *orig)
struct dm_rq_target_io *tio = tio_from_request(orig);
tio->duration_jiffies = jiffies;
tio->n_sectors = blk_rq_sectors(orig);
-   dm_stats_account_io(&md->stats, orig->cmd_flags, 
blk_rq_pos(orig),
+   dm_stats_account_io(&md->stats, rq_data_dir(orig), 
blk_rq_pos(orig),
tio->n_sectors, false, 0, &tio->stats_aux);
}
 
-- 
1.8.3.1



[PATCH 25/32] block: add operation field to request struct

2015-11-04 Thread mchristi
From: Mike Christie 

This patch adds a field to the request struct to store the REQ_OP, and
has the block layer code set it up.

The next patches will modify the other drivers to get/test the
request->op field. We are still ORing the op into the cmd_flags.
When I am done with the conversion, that will be dropped.
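
Once a driver is converted, the intent is that it tests the new field
directly instead of the flag bit; a hedged sketch of the two styles
(handle_discard() is just a placeholder name here, not a real helper):

	/* old style: operation encoded in cmd_flags */
	if (rq->cmd_flags & REQ_DISCARD)
		return handle_discard(rq);

	/* new style, once rq->op is filled in by the block layer */
	if (rq->op == REQ_OP_DISCARD)
		return handle_discard(rq);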

Signed-off-by: Mike Christie 
---
 block/blk-core.c   | 50 --
 block/blk-flush.c  |  1 +
 block/blk-mq.c | 31 +--
 include/linux/blkdev.h |  1 +
 4 files changed, 47 insertions(+), 36 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index c8672f2..e625516 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -984,7 +984,8 @@ static struct io_context *rq_ioc(struct bio *bio)
 /**
  * __get_request - get a free request
  * @rl: request list to allocate from
- * @rw_flags: RW and SYNC flags
+ * @op: REQ_OP
+ * @op_flags: rq_flag_bits
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -995,21 +996,22 @@ static struct io_context *rq_ioc(struct bio *bio)
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *__get_request(struct request_list *rl, int rw_flags,
-struct bio *bio, gfp_t gfp_mask)
+static struct request *__get_request(struct request_list *rl, int op,
+int op_flags, struct bio *bio,
+gfp_t gfp_mask)
 {
struct request_queue *q = rl->q;
struct request *rq;
struct elevator_type *et = q->elevator->type;
struct io_context *ioc = rq_ioc(bio);
struct io_cq *icq = NULL;
-   const bool is_sync = rw_is_sync(rw_flags) != 0;
+   const bool is_sync = rw_is_sync(op | op_flags) != 0;
int may_queue;
 
if (unlikely(blk_queue_dying(q)))
return ERR_PTR(-ENODEV);
 
-   may_queue = elv_may_queue(q, rw_flags);
+   may_queue = elv_may_queue(q, op | op_flags);
if (may_queue == ELV_MQUEUE_NO)
goto rq_starved;
 
@@ -1053,7 +1055,7 @@ static struct request *__get_request(struct request_list 
*rl, int rw_flags,
 
/*
 * Decide whether the new request will be managed by elevator.  If
-* so, mark @rw_flags and increment elvpriv.  Non-zero elvpriv will
+* so, mark @op_flags and increment elvpriv.  Non-zero elvpriv will
 * prevent the current elevator from being destroyed until the new
 * request is freed.  This guarantees icq's won't be destroyed and
 * makes creating new ones safe.
@@ -1062,14 +1064,14 @@ static struct request *__get_request(struct 
request_list *rl, int rw_flags,
 * it will be created after releasing queue_lock.
 */
if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) {
-   rw_flags |= REQ_ELVPRIV;
+   op_flags |= REQ_ELVPRIV;
q->nr_rqs_elvpriv++;
if (et->icq_cache && ioc)
icq = ioc_lookup_icq(ioc, q);
}
 
if (blk_queue_io_stat(q))
-   rw_flags |= REQ_IO_STAT;
+   op_flags |= REQ_IO_STAT;
spin_unlock_irq(q->queue_lock);
 
/* allocate and init request */
@@ -1079,10 +1081,11 @@ static struct request *__get_request(struct 
request_list *rl, int rw_flags,
 
blk_rq_init(q, rq);
blk_rq_set_rl(rq, rl);
-   rq->cmd_flags = rw_flags | REQ_ALLOCED;
+   rq->cmd_flags = op | op_flags | REQ_ALLOCED;
+   rq->op = op;
 
/* init elvpriv */
-   if (rw_flags & REQ_ELVPRIV) {
+   if (op_flags & REQ_ELVPRIV) {
if (unlikely(et->icq_cache && !icq)) {
if (ioc)
icq = ioc_create_icq(ioc, q, gfp_mask);
@@ -1108,7 +,7 @@ out:
if (ioc_batching(q, ioc))
ioc->nr_batch_requests--;
 
-   trace_block_getrq(q, bio, rw_flags & 1);
+   trace_block_getrq(q, bio, op);
return rq;
 
 fail_elvpriv:
@@ -1138,7 +1141,7 @@ fail_alloc:
 * queue, but this is pretty rare.
 */
spin_lock_irq(q->queue_lock);
-   freed_request(rl, rw_flags);
+   freed_request(rl, op | op_flags);
 
/*
 * in the very unlikely event that allocation failed and no
@@ -1156,7 +1159,8 @@ rq_starved:
 /**
  * get_request - get a free request
  * @q: request_queue to allocate request from
- * @rw_flags: RW and SYNC flags
+ * @op: REQ_OP
+ * @op_flags: rq_flag_bits
  * @bio: bio to allocate request for (can be %NULL)
  * @gfp_mask: allocation mask
  *
@@ -1167,17 +1171,18 @@ rq_starved:
  * Returns ERR_PTR on failure, with @q->queue_lock held.
  * Returns request pointer on success, with @q->queue_lock *not held*.
  */
-static struct request *get_request(struct request_queue *q, int rw_flags,
-  

[PATCH 27/32] cfq/cgroup: pass operation and flags separately

2015-11-04 Thread mchristi
From: Mike Christie 

The operation is about to be separated from the flags, so this
patch has users pass them in separately to the cgroup stats.
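
Roughly, this lets the rwstat code pick its buckets from the two values
instead of one mixed bitmap; a sketch of the idea (not the exact hunk
below), using the existing blkg_rwstat bucket names:

	/* direction comes from the op, sync-ness from the remaining flags */
	int dir = (op == REQ_OP_READ) ? BLKG_RWSTAT_READ : BLKG_RWSTAT_WRITE;
	int sync = (op_flags & REQ_SYNC) ? BLKG_RWSTAT_SYNC : BLKG_RWSTAT_ASYNC;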

Signed-off-by: Mike Christie 
---
 block/cfq-iosched.c| 49 +++---
 include/linux/blk-cgroup.h | 13 ++--
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 04de884..dbc3da4 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -660,9 +660,10 @@ static inline void cfqg_put(struct cfq_group *cfqg)
 } while (0)
 
 static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
-   struct cfq_group *curr_cfqg, int rw)
+   struct cfq_group *curr_cfqg, int op,
+   int op_flags)
 {
-   blkg_rwstat_add(&cfqg->stats.queued, rw, 1);
+   blkg_rwstat_add(&cfqg->stats.queued, op, op_flags, 1);
cfqg_stats_end_empty_time(&cfqg->stats);
cfqg_stats_set_start_group_wait_time(cfqg, curr_cfqg);
 }
@@ -676,26 +677,30 @@ static inline void 
cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
 #endif
 }
 
-static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw)
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int op,
+  int op_flags)
 {
-   blkg_rwstat_add(&cfqg->stats.queued, rw, -1);
+   blkg_rwstat_add(&cfqg->stats.queued, op, op_flags, -1);
 }
 
-static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw)
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int op,
+  int op_flags)
 {
-   blkg_rwstat_add(&cfqg->stats.merged, rw, 1);
+   blkg_rwstat_add(&cfqg->stats.merged, op, op_flags, 1);
 }
 
 static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
-   uint64_t start_time, uint64_t io_start_time, int rw)
+   uint64_t start_time, uint64_t io_start_time, int op,
+   int op_flags)
 {
struct cfqg_stats *stats = &cfqg->stats;
unsigned long long now = sched_clock();
 
if (time_after64(now, io_start_time))
-   blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
+   blkg_rwstat_add(&stats->service_time, op, op_flags,
+   now - io_start_time);
if (time_after64(io_start_time, start_time))
-   blkg_rwstat_add(&stats->wait_time, rw,
+   blkg_rwstat_add(&stats->wait_time, op, op_flags,
io_start_time - start_time);
 }
 
@@ -769,13 +774,16 @@ static inline void cfqg_put(struct cfq_group *cfqg) { }
 #define cfq_log_cfqg(cfqd, cfqg, fmt, args...) do {} while (0)
 
 static inline void cfqg_stats_update_io_add(struct cfq_group *cfqg,
-   struct cfq_group *curr_cfqg, int rw) { }
+   struct cfq_group *curr_cfqg, int op, int op_flags) { }
 static inline void cfqg_stats_update_timeslice_used(struct cfq_group *cfqg,
unsigned long time, unsigned long unaccounted_time) { }
-static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int rw) 
{ }
-static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int rw) 
{ }
+static inline void cfqg_stats_update_io_remove(struct cfq_group *cfqg, int op,
+   int op_flags) { }
+static inline void cfqg_stats_update_io_merged(struct cfq_group *cfqg, int op,
+   int op_flags) { }
 static inline void cfqg_stats_update_completion(struct cfq_group *cfqg,
-   uint64_t start_time, uint64_t io_start_time, int rw) { }
+   uint64_t start_time, uint64_t io_start_time, int op,
+   int op_flags) { }
 
 #endif /* CONFIG_CFQ_GROUP_IOSCHED */
 
@@ -2449,10 +2457,10 @@ static void cfq_reposition_rq_rb(struct cfq_queue 
*cfqq, struct request *rq)
 {
elv_rb_del(&cfqq->sort_list, rq);
cfqq->queued[rq_is_sync(rq)]--;
-   cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
+   cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->op, rq->cmd_flags);
cfq_add_rq_rb(rq);
cfqg_stats_update_io_add(RQ_CFQG(rq), cfqq->cfqd->serving_group,
-rq->cmd_flags);
+rq->op, rq->cmd_flags);
 }
 
 static struct request *
@@ -2505,7 +2513,7 @@ static void cfq_remove_request(struct request *rq)
cfq_del_rq_rb(rq);
 
cfqq->cfqd->rq_queued--;
-   cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->cmd_flags);
+   cfqg_stats_update_io_remove(RQ_CFQG(rq), rq->op, rq->cmd_flags);
if (rq->cmd_flags & REQ_PRIO) {
WARN_ON(!cfqq->prio_pending);
cfqq->prio_pending--;
@@ -2540,7 +2548,7 @@ static void cfq_merged_request(struct request_queue *q, 
struct request *req,
 

[PATCH 28/32] block/fs/drivers: use bio/rq_data_dir helpers

2015-11-04 Thread mchristi
From: Mike Christie 

This has the block layer, drivers and fs code use
the bio and rq data_dir helpers instead of accessing the
bi_rw/cmd_flags and checking for REQ_WRITE.

Signed-off-by: Mike Christie 
---
 block/blk-merge.c| 2 +-
 drivers/ata/libata-scsi.c| 2 +-
 drivers/block/loop.c | 6 +++---
 drivers/block/rbd.c  | 2 +-
 drivers/block/umem.c | 2 +-
 drivers/ide/ide-floppy.c | 2 +-
 drivers/md/bcache/io.c   | 2 +-
 drivers/md/bcache/request.c  | 6 +++---
 drivers/scsi/osd/osd_initiator.c | 4 ++--
 fs/btrfs/disk-io.c   | 2 +-
 fs/btrfs/extent_io.c | 2 +-
 fs/btrfs/inode.c | 2 +-
 include/linux/blkdev.h   | 2 +-
 include/linux/fs.h   | 2 +-
 14 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index c4e9c37..fe00d94 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -378,7 +378,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request 
*rq,
}
 
if (q->dma_drain_size && q->dma_drain_needed(rq)) {
-   if (rq->cmd_flags & REQ_WRITE)
+   if (rq_data_dir(rq) == WRITE)
memset(q->dma_drain_buffer, 0, q->dma_drain_size);
 
sg_unmark_end(sg);
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 0d7f0da..68c2b34 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1125,7 +1125,7 @@ static int atapi_drain_needed(struct request *rq)
if (likely(rq->cmd_type != REQ_TYPE_BLOCK_PC))
return 0;
 
-   if (!blk_rq_bytes(rq) || (rq->cmd_flags & REQ_WRITE))
+   if (!blk_rq_bytes(rq) || rq_data_dir(rq) == WRITE)
return 0;
 
return atapi_cmd_type(rq->cmd[0]) == ATAPI_MISC;
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 674f800..e214936 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -396,7 +396,7 @@ static int do_req_filebacked(struct loop_device *lo, struct 
request *rq)
 
pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset;
 
-   if (rq->cmd_flags & REQ_WRITE) {
+   if (rq_data_dir(rq) == WRITE) {
if (rq->cmd_flags & REQ_FLUSH)
ret = lo_req_flush(lo, rq);
else if (rq->cmd_flags & REQ_DISCARD)
@@ -1461,7 +1461,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
if (lo->lo_state != Lo_bound)
return -EIO;
 
-   if (cmd->rq->cmd_flags & REQ_WRITE) {
+   if (rq_data_dir(cmd->rq) == WRITE) {
struct loop_device *lo = cmd->rq->q->queuedata;
bool need_sched = true;
 
@@ -1484,7 +1484,7 @@ static int loop_queue_rq(struct blk_mq_hw_ctx *hctx,
 
 static void loop_handle_cmd(struct loop_cmd *cmd)
 {
-   const bool write = cmd->rq->cmd_flags & REQ_WRITE;
+   const bool write = rq_data_dir(cmd->rq);
struct loop_device *lo = cmd->rq->q->queuedata;
int ret = 0;
 
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 6f26cf3..39104ca 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3377,7 +3377,7 @@ static void rbd_queue_workfn(struct work_struct *work)
 
if (rq->cmd_flags & REQ_DISCARD)
op_type = OBJ_OP_DISCARD;
-   else if (rq->cmd_flags & REQ_WRITE)
+   else if (rq_data_dir(rq) == WRITE)
op_type = OBJ_OP_WRITE;
else
op_type = OBJ_OP_READ;
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 04d6579..2355754 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -462,7 +462,7 @@ static void process_page(unsigned long data)
le32_to_cpu(desc->local_addr)>>9,
le32_to_cpu(desc->transfer_size));
dump_dmastat(card, control);
-   } else if ((bio->bi_rw & REQ_WRITE) &&
+   } else if (bio_data_dir(bio) == WRITE &&
   le32_to_cpu(desc->local_addr) >> 9 ==
card->init_size) {
card->init_size += le32_to_cpu(desc->transfer_size) >> 
9;
diff --git a/drivers/ide/ide-floppy.c b/drivers/ide/ide-floppy.c
index 2fb5350..f079d8d 100644
--- a/drivers/ide/ide-floppy.c
+++ b/drivers/ide/ide-floppy.c
@@ -206,7 +206,7 @@ static void idefloppy_create_rw_cmd(ide_drive_t *drive,
memcpy(rq->cmd, pc->c, 12);
 
pc->rq = rq;
-   if (rq->cmd_flags & REQ_WRITE)
+   if (cmd == WRITE)
pc->flags |= PC_FLAG_WRITING;
 
pc->flags |= PC_FLAG_DMA_OK;
diff --git a/drivers/md/bcache/io.c b/drivers/md/bcache/io.c
index 86a0bb8..fbc8974 100644
--- a/drivers/md/bcache/io.c
+++ b/drivers/md/bcache/io.c
@@ -111,7 +111,7 @@ void bch_bbio_count_io_errors(struct cache_set *c, struct 
bio *bio,
struct bbio *b = container_of(bio, struct bbio, bio);
struct 

Re: Mobility Radeon HD 4530/4570/545v: flicker in 1920x1080

2015-11-04 Thread Alex Deucher
On Wed, Nov 4, 2015 at 5:10 PM, Pavel Machek  wrote:
> Hi!
>
>> index dac78ad..b86f06a 100644
>> --- a/drivers/gpu/drm/radeon/atombios_crtc.c
>> +++ b/drivers/gpu/drm/radeon/atombios_crtc.c
>> @@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
>>  radeon_crtc->pll_flags = 0;
>> 
>>  if (ASIC_IS_AVIVO(rdev)) {
>> +   radeon_crtc->pll_flags |= 
>> RADEON_PLL_PREFER_MINM_OVER_MAXP;
>> +
>>  if ((rdev->family == CHIP_RS600) ||
>>  (rdev->family == CHIP_RS690) ||
>>  (rdev->family == CHIP_RS740))
>> 
>> >>>Help.. maybe... it is tricky to tell. It definitely does _not_ fix the
>> >>>issue completely.
>> >>You could also try the old pll algorithm:
>> >I reverted the patch above, and switched to the old algorithm.
>> >
>> >The flicker is still there. (But maybe its less horrible, like with
>> >RADEON_PLL_PREFER_MINM_OVER_MAXP).
>>
>> The flickering would vanish completely if that's the reason for the issue
>> you are seeing.
>
>> Try setting ref_div_min and ref_div_max to 2 in
>>  radeon_compute_pll_avivo().
>
> Ok, I did this, but no luck, still flickers. But the flicker only
> happens when something changes on screen, like dragging a big
> window. Is that consistent with wrong PLL timings?

Does it go away with radeon.dpm=0?  Sounds more like either memory
reclocking happening outside of vblank, or underflow to the display
controllers.

Alex

>
> diff --git a/config.32 b/config.32
> index 00e5dd2..4734158 100644
> --- a/config.32
> +++ b/config.32
> @@ -1090,7 +1090,7 @@ CONFIG_DEVTMPFS_MOUNT=y
>  CONFIG_PREVENT_FIRMWARE_BUILD=y
>  CONFIG_FW_LOADER=y
>  CONFIG_FIRMWARE_IN_KERNEL=y
> -CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin"
> +CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin radeon/RV710_smc.bin 
> radeon/RV710_uvd.bin"
>  CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
>  # CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
>  CONFIG_ALLOW_DEV_COREDUMP=y
> diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c 
> b/drivers/gpu/drm/radeon/atombios_crtc.c
> index dac78ad..dcc4f4d 100644
> --- a/drivers/gpu/drm/radeon/atombios_crtc.c
> +++ b/drivers/gpu/drm/radeon/atombios_crtc.c
> @@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
> radeon_crtc->pll_flags = 0;
>
> if (ASIC_IS_AVIVO(rdev)) {
> +   //radeon_crtc->pll_flags |= RADEON_PLL_PREFER_MINM_OVER_MAXP;
> +
> if ((rdev->family == CHIP_RS600) ||
> (rdev->family == CHIP_RS690) ||
> (rdev->family == CHIP_RS740))
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
> b/drivers/gpu/drm/radeon/radeon_display.c
> index 6743174..bebaf4f 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -947,6 +947,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
> fb_div_max = pll->max_feedback_div;
>
> if (pll->flags & RADEON_PLL_USE_FRAC_FB_DIV) {
> +   printk("radeon: fractional divider\n");
> fb_div_min *= 10;
> fb_div_max *= 10;
> }
> @@ -966,6 +967,9 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
> else
> ref_div_max = pll->max_ref_div;
>
> +   ref_div_min = 2;
> +   ref_div_max = 2;
> +
> /* determine allowed post divider range */
> if (pll->flags & RADEON_PLL_USE_POST_DIV) {
> post_div_min = pll->post_div;
> @@ -1020,6 +1024,8 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
> diff = abs(target_clock - (pll->reference_freq * fb_div) /
> (ref_div * post_div));
>
> +   printk("post_div = %d, diff = %d\n", post_div, diff);
> +
> if (diff < diff_best || (diff == diff_best &&
> !(pll->flags & RADEON_PLL_PREFER_MINM_OVER_MAXP))) {
>
> @@ -1028,6 +1034,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
> }
> }
> post_div = post_div_best;
> +   printk("Selected post_div = %d\n", post_div);
>
> /* get the feedback and reference divider for the optimal value */
> avivo_get_fb_ref_div(nom, den, post_div, fb_div_max, ref_div_max,
> @@ -1062,7 +1069,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
> *ref_div_p = ref_div;
> *post_div_p = post_div;
>
> -   DRM_DEBUG_KMS("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
> +   printk("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
>   freq, *dot_clock_p * 10, *fb_div_p, *frac_fb_div_p,
>   ref_div, post_div);
>  }
>
>
>> But I'm not 100% convinced that this is actually a PLL problem, try to
>> compile the firmware it complains about into the kernel as well.
>
> Did that, too.
>
> Best regards,
> 

[PATCH 30/32] drbd: don't use bi_rw for operations

2015-11-04 Thread mchristi
From: Mike Christie 

This removes drbd's bi_rw use for operations read, write,
discard, write same, etc (REQ_OPs).

Signed-off-by: Mike Christie 
---
 drivers/block/drbd/drbd_actlog.c   |  2 +-
 drivers/block/drbd/drbd_bitmap.c   |  1 -
 drivers/block/drbd/drbd_main.c | 15 ---
 drivers/block/drbd/drbd_receiver.c | 22 --
 drivers/block/drbd/drbd_worker.c   |  4 ++--
 5 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index ed2eafe..fc96a3c 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -159,7 +159,7 @@ static int _drbd_md_sync_page_io(struct drbd_device *device,
goto out;
bio->bi_private = device;
bio->bi_end_io = drbd_md_endio;
-   bio->bi_rw = op | op_flags;
+   bio->bi_rw = op_flags;
bio->bi_op = op;
 
if (op != REQ_OP_WRITE && device->state.disk == D_DISKLESS && 
device->ldev == NULL)
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 2ff407a..173a3d6 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -1022,7 +1022,6 @@ static void bm_page_io_async(struct drbd_bm_aio_ctx *ctx, 
int page_nr) __must_ho
 
if (drbd_insert_fault(device, (rw == REQ_OP_WRITE) ? DRBD_FAULT_MD_WR : 
DRBD_FAULT_MD_RD)) {
bio->bi_op = rw;
-   bio->bi_rw |= rw;
bio_io_error(bio);
} else {
submit_bio(rw, 0, bio);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 9eb8039..d74178c 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1602,15 +1602,16 @@ static int _drbd_send_zc_ee(struct drbd_peer_device 
*peer_device,
return 0;
 }
 
-static u32 bio_flags_to_wire(struct drbd_connection *connection, unsigned long 
bi_rw)
+static u32 bio_flags_to_wire(struct drbd_connection *connection,
+struct bio *bio)
 {
if (connection->agreed_pro_version >= 95)
-   return  (bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) |
-   (bi_rw & REQ_FUA ? DP_FUA : 0) |
-   (bi_rw & REQ_FLUSH ? DP_FLUSH : 0) |
-   (bi_rw & REQ_DISCARD ? DP_DISCARD : 0);
+   return  (bio->bi_rw & REQ_SYNC ? DP_RW_SYNC : 0) |
+   (bio->bi_rw & REQ_FUA ? DP_FUA : 0) |
+   (bio->bi_rw & REQ_FLUSH ? DP_FLUSH : 0) |
+   (bio->bi_op == REQ_OP_DISCARD ? DP_DISCARD : 0);
else
-   return bi_rw & REQ_SYNC ? DP_RW_SYNC : 0;
+   return bio->bi_rw & REQ_SYNC ? DP_RW_SYNC : 0;
 }
 
 /* Used to send write or TRIM aka REQ_DISCARD requests
@@ -1635,7 +1636,7 @@ int drbd_send_dblock(struct drbd_peer_device 
*peer_device, struct drbd_request *
p->sector = cpu_to_be64(req->i.sector);
p->block_id = (unsigned long)req;
p->seq_num = cpu_to_be32(atomic_inc_return(>packet_seq));
-   dp_flags = bio_flags_to_wire(peer_device->connection, 
req->master_bio->bi_rw);
+   dp_flags = bio_flags_to_wire(peer_device->connection, req->master_bio);
if (device->state.conn >= C_SYNC_SOURCE &&
device->state.conn <= C_PAUSED_SYNC_T)
dp_flags |= DP_MAY_SET_IN_SYNC;
diff --git a/drivers/block/drbd/drbd_receiver.c 
b/drivers/block/drbd/drbd_receiver.c
index 4e458bd..44193da 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1418,7 +1418,7 @@ next_bio:
/* > peer_req->i.sector, unless this is the first bio */
bio->bi_iter.bi_sector = sector;
bio->bi_bdev = device->ldev->backing_bdev;
-   bio->bi_rw = op | op_flags;
+   bio->bi_rw = op_flags;
bio->bi_op = op;
bio->bi_private = peer_req;
bio->bi_end_io = drbd_peer_request_endio;
@@ -1427,7 +1427,7 @@ next_bio:
bios = bio;
++n_bios;
 
-   if (op & REQ_OP_DISCARD) {
+   if (op == REQ_OP_DISCARD) {
bio->bi_iter.bi_size = data_size;
goto submit;
}
@@ -2132,8 +2132,7 @@ static unsigned long wire_flags_to_bio_flags(u32 dpf)
 {
return  (dpf & DP_RW_SYNC ? REQ_SYNC : 0) |
(dpf & DP_FUA ? REQ_FUA : 0) |
-   (dpf & DP_FLUSH ? REQ_FLUSH : 0) |
-   (dpf & DP_DISCARD ? REQ_DISCARD : 0);
+   (dpf & DP_FLUSH ? REQ_FLUSH : 0);
 }
 
 static unsigned long wire_flags_to_bio_op(u32 dpf)
@@ -2141,7 +2140,7 @@ static unsigned long wire_flags_to_bio_op(u32 dpf)
if (dpf & DP_DISCARD)
return REQ_OP_DISCARD;
else
-   return 0;
return REQ_OP_WRITE;
 }
 
 static void fail_postponed_requests(struct drbd_device *device, sector_t 
sector,
@@ -2287,7 +2286,7 @@ static int receive_Data(struct drbd_connection 
*connection, struct 

[PATCH 29/32] block/drivers: rm request cmd_flags REQ_OP use

2015-11-04 Thread mchristi
From: Mike Christie 

With this patch the request struct code no longer uses the
cmd_flags field for REQ_OP operations.

---
 block/blk-core.c  | 17 +
 block/blk-merge.c | 10 ++
 block/blk-mq.c| 10 +-
 block/cfq-iosched.c   |  4 ++--
 block/elevator.c  |  8 
 drivers/block/loop.c  |  2 +-
 drivers/block/mtip32xx/mtip32xx.c |  2 +-
 drivers/block/nbd.c   |  2 +-
 drivers/block/nvme-core.c |  6 +++---
 drivers/block/rbd.c   |  2 +-
 drivers/block/skd_main.c  | 11 ---
 drivers/block/xen-blkfront.c  |  8 +---
 drivers/md/dm.c   |  2 +-
 drivers/mmc/card/block.c  |  7 +++
 drivers/mmc/card/queue.c  |  6 ++
 drivers/mmc/card/queue.h  |  5 -
 drivers/mtd/mtd_blkdevs.c |  2 +-
 drivers/scsi/sd.c | 22 ++
 include/linux/blkdev.h| 26 +-
 include/linux/elevator.h  |  4 ++--
 20 files changed, 82 insertions(+), 74 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index e625516..deb8bfd 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -889,10 +889,10 @@ static void __freed_request(struct request_list *rl, int 
sync)
  * A request has just been released.  Account for it, update the full and
  * congestion status, wake up any waiters.   Called under q->queue_lock.
  */
-static void freed_request(struct request_list *rl, unsigned int flags)
+static void freed_request(struct request_list *rl, int op, unsigned int flags)
 {
struct request_queue *q = rl->q;
-   int sync = rw_is_sync(flags);
+   int sync = rw_is_sync(op, flags);
 
q->nr_rqs[sync]--;
rl->count[sync]--;
@@ -1005,13 +1005,13 @@ static struct request *__get_request(struct 
request_list *rl, int op,
struct elevator_type *et = q->elevator->type;
struct io_context *ioc = rq_ioc(bio);
struct io_cq *icq = NULL;
-   const bool is_sync = rw_is_sync(op | op_flags) != 0;
+   const bool is_sync = rw_is_sync(op, op_flags) != 0;
int may_queue;
 
if (unlikely(blk_queue_dying(q)))
return ERR_PTR(-ENODEV);
 
-   may_queue = elv_may_queue(q, op | op_flags);
+   may_queue = elv_may_queue(q, op, op_flags);
if (may_queue == ELV_MQUEUE_NO)
goto rq_starved;
 
@@ -1141,7 +1141,7 @@ fail_alloc:
 * queue, but this is pretty rare.
 */
spin_lock_irq(q->queue_lock);
-   freed_request(rl, op | op_flags);
+   freed_request(rl, op, op_flags);
 
/*
 * in the very unlikely event that allocation failed and no
@@ -1175,7 +1175,7 @@ static struct request *get_request(struct request_queue 
*q, int op,
   int op_flags, struct bio *bio,
   gfp_t gfp_mask)
 {
-   const bool is_sync = rw_is_sync(op | op_flags) != 0;
+   const bool is_sync = rw_is_sync(op, op_flags) != 0;
DEFINE_WAIT(wait);
struct request_list *rl;
struct request *rq;
@@ -1424,13 +1424,14 @@ void __blk_put_request(struct request_queue *q, struct 
request *req)
 */
if (req->cmd_flags & REQ_ALLOCED) {
unsigned int flags = req->cmd_flags;
+   int op = req->op;
struct request_list *rl = blk_rq_rl(req);
 
BUG_ON(!list_empty(&req->queuelist));
BUG_ON(ELV_ON_HASH(req));
 
blk_free_request(rl, req);
-   freed_request(rl, flags);
+   freed_request(rl, op, flags);
blk_put_rl(rl);
}
 }
@@ -2054,7 +2055,7 @@ int blk_rq_check_limits(struct request_queue *q, struct 
request *rq)
if (!rq_mergeable(rq))
return 0;
 
-   if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
+   if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->op)) {
printk(KERN_ERR "%s: over max size limit.\n", __func__);
return -EIO;
}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index fe00d94..ec42c7e 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -582,7 +582,8 @@ static int attempt_merge(struct request_queue *q, struct 
request *req,
if (!rq_mergeable(req) || !rq_mergeable(next))
return 0;
 
-   if (!blk_check_merge_flags(req->cmd_flags, next->cmd_flags))
+   if (!blk_check_merge_flags(req->cmd_flags, req->op, next->cmd_flags,
+  next->op))
return 0;
 
/*
@@ -596,7 +597,7 @@ static int attempt_merge(struct request_queue *q, struct 
request *req,
|| req_no_special_merge(next))
return 0;
 
-   if (req->cmd_flags & REQ_WRITE_SAME &&
+   if (req->op == REQ_OP_WRITE_SAME &&

[GIT PULL] power supply changes for 4.4

2015-11-04 Thread Sebastian Reichel
Hi Linus,

The following changes since commit 1f93e4a96c9109378204c147b3eec0d0e8100fde:

  Linux 4.3-rc2 (2015-09-20 14:32:34 -0700)

are available in the git repositories at:

 git://git.infradead.org/battery-2.6.git tags/for-v4.4
 git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git 
tags/for-v4.4

for you to fetch changes up to 6bd03ce3c12a22d86f59070f1da15aaa2bde8a51:

  power: bq27xxx_battery: Remove unneeded dependency in Kconfig (2015-10-19 
10:38:50 +0200)


power supply and reset changes for the v4.4 series

 * new AXP20X USB Power driver
 * new Qualcomm SMBB driver
 * new TPS65217 Charger driver
 * BQ24257: add BQ24250/BQ24251 support
 * overhaul bq27x00 battery driver, rename to bq27xxx
 * misc. fixes and cleanups


Alexandre Belloni (5):
  power/reset: at91-reset: remove useless at91_reset_platform_probe()
  power/reset: at91-reset: allow compiling as a module
  power/reset: at91-reset: get and use slow clock
  power/reset: at91-poweroff: allow compiling as a module
  power/reset: at91-poweroff: get and use slow clock

Andreas Dannenberg (14):
  power: bq24257: Remove IRQ config through stat-gpios
  power: bq24257: Streamline input current limit setup
  power: bq24257: Use managed power supply register
  power: bq24257: Simplify bq24257_power_supply_init()
  dt: power: bq24257-charger: Cover additional devices
  power: bq24257: Add basic support for bq24250/bq24251
  power: bq24257: Add bit definition for temp sense enable
  power: bq24257: Allow manual setting of input current limit
  power: bq24257: Add SW-based approach for Power Good determination
  power: bq24257: Add over voltage protection setting support
  power: bq24257: Add input DPM voltage threshold setting support
  power: bq24257: Allow input current limit sysfs access
  power: bq24257: Add various device-specific sysfs properties
  Documentation: power: bq24257: Document exported sysfs entries

Andrew F. Davis (8):
  power: bq27x00_battery: Remove unneeded i2c MODULE_ALIAS
  power: bq27x00_battery: Renaming for consistency
  power: bq27xxx_battery: Platform initialization must declare a device
  power: bq27xxx_battery: Fix typos and change naming for state of charge 
functions
  power: bq27xxx_battery: Add support for additional bq27xxx family devices
  power: bq27xxx_battery: Cleanup health checking
  power: bq27xxx_battery: Add interrupt handling support
  power: bq27xxx_battery: Remove unneeded dependency in Kconfig

Andrzej Hajda (1):
  power: bq27xxx_battery: fix signedness bug in 
bq27xxx_battery_read_health()

Courtney Cavin (2):
  dt-binding: power: Add Qualcomm SMBB binding
  power: Add Qualcomm SMBB driver

Dan Carpenter (1):
  power: qcom_smbb: test the correct variable

Enric Balletbo i Serra (2):
  devicetree: Add TPS65217 charger binding.
  power_supply: Add support for tps65217-charger.

Hans de Goede (2):
  ARM: dts: Add binding documentation for AXP20x pmic usb power supply
  power: Add an axp20x-usb-power driver

Javier Martinez Canillas (1):
  power: Remove unnecessary MODULE_ALIAS() for I2C drivers

Julia Lawall (1):
  power_supply: charger-manager: add missing of_node_put

Luis de Bethencourt (1):
  tps65090-charger: Fix module autoload for OF platform driver

Marcel Ziswiler (1):
  power: charger-manager: comment spelling fixes

Marek Belisko (2):
  ARM: dts: twl4030: Add iio properties for bci subnode
  drivers: power: twl4030_charger: fix link problems when building as module

Mark Brown (1):
  power: wm831x_power: Convert to devm_kzalloc()

Milo Kim (2):
  power:lp8727_charger: use the private data instead of updating I2C device 
platform data
  power:lp8727_charger: parsing child node after getting debounce-ms

Nicolas Ferre (1):
  power: reset: at91-reset/trivial: driver applies to SAMA5 family as well

Pali Rohár (1):
  bq2415x_charger: Fix null pointer dereference

Sebastian Reichel (3):
  twl4030_charger: add missing iio dependency
  power: bq27xxx_battery: fix platform probe
  power: bq27xxx_battery: move irq handler to i2c section

Vaishali Thakkar (4):
  88pm860x_battery: Convert to using managed resources
  power: max17042_battery: Convert to using managed resources
  max8903_charger: Convert to using managed resources
  power_supply: max8998: Use devm_power_supply_register

Valentin Rothberg (1):
  wm831x_power: Use IRQF_ONESHOT to request threaded IRQs

 Documentation/ABI/testing/sysfs-class-power |   58 
 Documentation/devicetree/bindings/power/bq24257.txt |   53 +++-
 Documentation/devicetree/bindings/power_supply/axp20x_usb_power.txt |   34 +++
 

[GIT PULL] HSI changes for 4.4

2015-11-04 Thread Sebastian Reichel
Hi Linus,

The following changes since commit 6ff33f3902c3b1c5d0db6b1e2c70b6d76fba357f:

  Linux 4.3-rc1 (2015-09-12 16:35:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi.git 
tags/hsi-for-4.4

for you to fetch changes up to 16bd5865cdb3b190105db21bd22a0ca0501e7b20:

  hsi: controllers:remove redundant code (2015-10-30 16:10:40 +0100)


HSI changes for the v4.4 series

 * misc. fixes


Geliang Tang (1):
  hsi: fix double kfree

Insu Yun (1):
  hsi: correctly handle return value of kzalloc

Jakub Wilk (1):
  HSI: Fix a typo

Roger Quadros (1):
  hsi: omap_ssi_port: Prevent warning if cawake_gpio is not defined.

Sanjeev Sharma (1):
  hsi: controllers:remove redundant code

 drivers/hsi/clients/ssi_protocol.c  |  2 +-
 drivers/hsi/controllers/omap_ssi.c  | 21 -
 drivers/hsi/controllers/omap_ssi_port.c |  2 +-
 drivers/hsi/hsi.c   | 13 +++--
 4 files changed, 17 insertions(+), 21 deletions(-)

-- Sebastian




Re: [PATCH] PM / OPP: Protect updates to list_dev with mutex

2015-11-04 Thread Stephen Boyd
On 10/30, Viresh Kumar wrote:
> dev_opp_list_lock is used everywhere to protect device and OPP lists,
> but dev_pm_opp_set_sharing_cpus() is missed somehow. And instead we used
> rcu-lock, which wouldn't help here as we are adding a new list_dev.
> 
> This also fixes a problem where we have called kzalloc(..., GFP_KERNEL)
> from within rcu-lock, which isn't allowed as kzalloc can sleep when
> called with GFP_KERNEL.
> 
> With CONFIG_DEBUG_ATOMIC_SLEEP set, we will see the caller vomiting.
> 
> Fixes: 8d4d4e98acd6 ("PM / OPP: Add helpers for initializing CPU OPPs")
> Reported-by: Michael Turquette 
> Signed-off-by: Viresh Kumar 
> ---

Reviewed-by: Stephen Boyd 

I assume some other patch will come to fix the comment and/or add
the lockdep check.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v13 12/51] vfs: Cache richacl in struct inode

2015-11-04 Thread Andreas Gruenbacher
Andreas,

On Wed, Nov 4, 2015 at 3:03 AM, Andreas Dilger  wrote:
>> @@ -33,7 +33,7 @@ richacl_alloc(int count, gfp_t gfp)
>>   struct richacl *acl = kzalloc(size, gfp);
>>
>>   if (acl) {
>> - atomic_set(&acl->a_refcount, 1);
>> + atomic_set(&acl->a_base.ba_refcount, 1);
>>   acl->a_count = count;
>>   }
>>   return acl;
>> @@ -52,7 +52,7 @@ richacl_clone(const struct richacl *acl, gfp_t gfp)
>>
>>   if (dup) {
>>   memcpy(dup, acl, size);
>> - atomic_set(&dup->a_refcount, 1);
>> + atomic_set(&dup->a_base.ba_refcount, 1);
>
> These two calls should be base_acl_init().

Yes. This should all be fixed in the next snapshot.

Thanks,
Andreas


Re: Mobility Radeon HD 4530/4570/545v: flicker in 1920x1080

2015-11-04 Thread Pavel Machek
Hi!

> index dac78ad..b86f06a 100644
> --- a/drivers/gpu/drm/radeon/atombios_crtc.c
> +++ b/drivers/gpu/drm/radeon/atombios_crtc.c
> @@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
>  radeon_crtc->pll_flags = 0;
> 
>  if (ASIC_IS_AVIVO(rdev)) {
> +   radeon_crtc->pll_flags |= 
> RADEON_PLL_PREFER_MINM_OVER_MAXP;
> +
>  if ((rdev->family == CHIP_RS600) ||
>  (rdev->family == CHIP_RS690) ||
>  (rdev->family == CHIP_RS740))
> 
> >>>Help.. maybe... it is tricky to tell. It definitely does _not_ fix the
> >>>issue completely.
> >>You could also try the old pll algorithm:
> >I reverted the patch above, and switched to the old algorithm.
> >
> >The flicker is still there. (But maybe its less horrible, like with
> >RADEON_PLL_PREFER_MINM_OVER_MAXP).
> 
> The flickering would vanish completely if that's the reason for the issue
> you are seeing.

> Try setting ref_div_min and ref_div_max to 2 in
>  radeon_compute_pll_avivo().

Ok, I did this, but no luck, still flickers. But the flicker only
happens when something changes on screen, like dragging a big
window. Is that consistent with wrong PLL timings?

diff --git a/config.32 b/config.32
index 00e5dd2..4734158 100644
--- a/config.32
+++ b/config.32
@@ -1090,7 +1090,7 @@ CONFIG_DEVTMPFS_MOUNT=y
 CONFIG_PREVENT_FIRMWARE_BUILD=y
 CONFIG_FW_LOADER=y
 CONFIG_FIRMWARE_IN_KERNEL=y
-CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin"
+CONFIG_EXTRA_FIRMWARE="radeon/R700_rlc.bin radeon/RV710_smc.bin 
radeon/RV710_uvd.bin"
 CONFIG_EXTRA_FIRMWARE_DIR="/lib/firmware"
 # CONFIG_FW_LOADER_USER_HELPER_FALLBACK is not set
 CONFIG_ALLOW_DEV_COREDUMP=y
diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c 
b/drivers/gpu/drm/radeon/atombios_crtc.c
index dac78ad..dcc4f4d 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
radeon_crtc->pll_flags = 0;
 
if (ASIC_IS_AVIVO(rdev)) {
+   //radeon_crtc->pll_flags |= RADEON_PLL_PREFER_MINM_OVER_MAXP;
+
if ((rdev->family == CHIP_RS600) ||
(rdev->family == CHIP_RS690) ||
(rdev->family == CHIP_RS740))
diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
b/drivers/gpu/drm/radeon/radeon_display.c
index 6743174..bebaf4f 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -947,6 +947,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
fb_div_max = pll->max_feedback_div;
 
if (pll->flags & RADEON_PLL_USE_FRAC_FB_DIV) {
+   printk("radeon: fractional divider\n");
fb_div_min *= 10;
fb_div_max *= 10;
}
@@ -966,6 +967,9 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
else
ref_div_max = pll->max_ref_div;
 
+   ref_div_min = 2;
+   ref_div_max = 2;
+
/* determine allowed post divider range */
if (pll->flags & RADEON_PLL_USE_POST_DIV) {
post_div_min = pll->post_div;
@@ -1020,6 +1024,8 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
diff = abs(target_clock - (pll->reference_freq * fb_div) /
(ref_div * post_div));
 
+   printk("post_div = %d, diff = %d\n", post_div, diff);
+
if (diff < diff_best || (diff == diff_best &&
!(pll->flags & RADEON_PLL_PREFER_MINM_OVER_MAXP))) {
 
@@ -1028,6 +1034,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
}
}
post_div = post_div_best;
+   printk("Selected post_div = %d\n", post_div);
 
/* get the feedback and reference divider for the optimal value */
avivo_get_fb_ref_div(nom, den, post_div, fb_div_max, ref_div_max,
@@ -1062,7 +1069,7 @@ void radeon_compute_pll_avivo(struct radeon_pll *pll,
*ref_div_p = ref_div;
*post_div_p = post_div;
 
-   DRM_DEBUG_KMS("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
+   printk("%d - %d, pll dividers - fb: %d.%d ref: %d, post %d\n",
  freq, *dot_clock_p * 10, *fb_div_p, *frac_fb_div_p,
  ref_div, post_div);
 }


> But I'm not 100% convinced that this is actually a PLL problem, try to
> compile the firmware it complains about into the kernel as well.

Did that, too.

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: [PATCH 1/3] PM / OPP: Add "opp-supported-hw" binding

2015-11-04 Thread Stephen Boyd
On 11/03, Viresh Kumar wrote:
> On 02-11-15, 11:21, Stephen Boyd wrote:
> > Ah I see that after looking at the previous thread. Perhaps we
> > can add such information into the documentation so that people
> > aren't misled into thinking they're limited to 32 bits?
> 
> What about these changes:

Yep looks good. Assuming this is squashed into the original:

Reviewed-by: Stephen Boyd 

One typo below.

> 
> diff --git a/Documentation/devicetree/bindings/opp/opp.txt 
> b/Documentation/devicetree/bindings/opp/opp.txt
> index 96892057586a..b6ca2239838b 100644
> --- a/Documentation/devicetree/bindings/opp/opp.txt
> +++ b/Documentation/devicetree/bindings/opp/opp.txt
> @@ -123,11 +123,15 @@ properties.
>  - opp-suspend: Marks the OPP to be used during device suspend. Only one OPP 
> in
>the table should have this.
>  
> -- opp-supported-hw: User defined array containing a hierarchy of hardware
> -  version numbers, supported by the OPP. For example: a platform with 
> hierarchy
> -  of three levels of versions (A, B and C), this field should be like  Z>,
> -  where X corresponds to Version hierarchy A, Y corresponds to version 
> hierarchy
> -  B and Z corresponds to version hierarchy C.
> +- opp-supported-hw: This enables us to select only a subset of OPPs from the
> +  larger OPP table, based on what version of the hardware we are running on. 
> We
> +  still can't have multiple nodes with the same opp-hz value in OPP table.
> +
> +  Its an user defined array containing a hierarchy of hardware version 
> numbers,

s/Its/It's/

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 1/4] perf tools: Pass available CPU number to clang compiler

2015-11-04 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 04, 2015 at 11:20:04AM +, Wang Nan escreveu:
> This patch introduces a new macro "__NR_CPUS__" to perf's embedded
> clang compiler, which represent the available CPU counters in this

available "CPU counters"? ENOPARSE :-)

> system. BPF program can use this macro to create a map with same
> number of system CPUs. For exmaple:
 example
> 
>  struct bpf_map_def SEC("maps") pmu_map = {
>  .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
>  .key_size = sizeof(int),
>  .value_size = sizeof(u32),
>  .max_entries = __NR_CPUS__,
>  };

I wonder if we shouldn't use the getconf() parameter here, i.e. define
_SC_NPROCESSORS_CONF and also provide _SC_NPROCESSORS_ONLN.
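
For reference, the two values being discussed, as seen from userspace
(plain C, nothing perf-specific):

	#include <unistd.h>

	long nr_conf = sysconf(_SC_NPROCESSORS_CONF);	/* CPUs configured */
	long nr_onln = sysconf(_SC_NPROCESSORS_ONLN);	/* CPUs currently online */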

The kernel uses NR_CPUS, for accessing CONFIG_NR_CPUS, Alexei, what do
you think?

- Arnaldo
 
> Signed-off-by: Wang Nan 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Alexei Starovoitov 
> Cc: Namhyung Kim 
> Cc: Zefan Li 
> Cc: pi3or...@163.com
> ---
>  tools/perf/util/llvm-utils.c | 24 ++--
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/util/llvm-utils.c b/tools/perf/util/llvm-utils.c
> index 4f6a478..80eecef 100644
> --- a/tools/perf/util/llvm-utils.c
> +++ b/tools/perf/util/llvm-utils.c
> @@ -11,10 +11,11 @@
>  #include "cache.h"
>  
>  #define CLANG_BPF_CMD_DEFAULT_TEMPLATE   \
> - "$CLANG_EXEC -D__KERNEL__ $CLANG_OPTIONS "  \
> - "$KERNEL_INC_OPTIONS -Wno-unused-value "\
> - "-Wno-pointer-sign -working-directory " \
> - "$WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf -O2 -o -"
> + "$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS "\
> + "$CLANG_OPTIONS $KERNEL_INC_OPTIONS "   \
> + "-Wno-unused-value -Wno-pointer-sign "  \
> + "-working-directory $WORKING_DIR "  \
> + "-c \"$CLANG_SOURCE\" -target bpf -O2 -o -"
>  
>  struct llvm_param llvm_param = {
>   .clang_path = "clang",
> @@ -326,8 +327,8 @@ get_kbuild_opts(char **kbuild_dir, char 
> **kbuild_include_opts)
>  int llvm__compile_bpf(const char *path, void **p_obj_buf,
> size_t *p_obj_buf_sz)
>  {
> - int err;
> - char clang_path[PATH_MAX];
> + int err, nr_cpus_avail;
> + char clang_path[PATH_MAX], nr_cpus_avail_str[64];
>   const char *clang_opt = llvm_param.clang_opt;
>   const char *template = llvm_param.clang_bpf_cmd_template;
>   char *kbuild_dir = NULL, *kbuild_include_opts = NULL;
> @@ -354,6 +355,17 @@ int llvm__compile_bpf(const char *path, void **p_obj_buf,
>*/
>   get_kbuild_opts(_dir, _include_opts);
>  
> + nr_cpus_avail = sysconf(_SC_NPROCESSORS_CONF);
> + if (nr_cpus_avail <= 0) {
> + pr_err(
> +"WARNING:\tunable to get available CPUs in this system: %s\n"
> +"\tUse 128 instead.\n", strerror(errno));
> + nr_cpus_avail = 128;
> + }
> + snprintf(nr_cpus_avail_str, sizeof(nr_cpus_avail_str), "%d",
> +  nr_cpus_avail);
> +
> + force_set_env("NR_CPUS", nr_cpus_avail_str);
>   force_set_env("CLANG_EXEC", clang_path);
>   force_set_env("CLANG_OPTIONS", clang_opt);
>   force_set_env("KERNEL_INC_OPTIONS", kbuild_include_opts);
> -- 
> 1.8.3.4


[PATCH 23/32] block/fs: pass in op and flags to ll_rw_block

2015-11-04 Thread mchristi
From: Mike Christie 

This has ll_rw_block users pass in the request op and flags separately
instead of as a bitmap.
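
In short, callers change following this pattern (taken from the
conversions below):

  /* before */
  ll_rw_block(READ | REQ_META | REQ_PRIO, 1, &bh);
  /* after */
  ll_rw_block(REQ_OP_READ, REQ_META | REQ_PRIO, 1, &bh);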

Signed-off-by: Mike Christie 
---
 fs/buffer.c | 19 ++-
 fs/ext4/inode.c |  6 +++---
 fs/ext4/namei.c |  3 ++-
 fs/ext4/super.c |  2 +-
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/meta_io.c   |  4 ++--
 fs/gfs2/quota.c |  2 +-
 fs/isofs/compress.c |  2 +-
 fs/jbd2/journal.c   |  2 +-
 fs/jbd2/recovery.c  |  4 ++--
 fs/ocfs2/aops.c |  2 +-
 fs/ocfs2/super.c|  2 +-
 fs/reiserfs/journal.c   |  8 
 fs/reiserfs/stree.c |  4 ++--
 fs/reiserfs/super.c |  2 +-
 fs/squashfs/block.c |  4 ++--
 fs/udf/dir.c|  2 +-
 fs/udf/directory.c  |  2 +-
 fs/udf/inode.c  |  2 +-
 fs/ufs/balloc.c |  2 +-
 include/linux/buffer_head.h |  2 +-
 21 files changed, 40 insertions(+), 38 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index cd07d86..ba84126 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -595,7 +595,7 @@ void write_boundary_block(struct block_device *bdev,
struct buffer_head *bh = __find_get_block(bdev, bblock + 1, blocksize);
if (bh) {
if (buffer_dirty(bh))
-   ll_rw_block(WRITE, 1, );
+   ll_rw_block(REQ_OP_WRITE, 0, 1, );
put_bh(bh);
}
 }
@@ -1406,7 +1406,7 @@ void __breadahead(struct block_device *bdev, sector_t 
block, unsigned size)
 {
struct buffer_head *bh = __getblk(bdev, block, size);
if (likely(bh)) {
-   ll_rw_block(READA, 1, );
+   ll_rw_block(REQ_OP_READ, REQ_RAHEAD, 1, );
brelse(bh);
}
 }
@@ -1966,7 +1966,7 @@ int __block_write_begin(struct page *page, loff_t pos, 
unsigned len,
if (!buffer_uptodate(bh) && !buffer_delay(bh) &&
!buffer_unwritten(bh) &&
 (block_start < from || block_end > to)) {
-   ll_rw_block(READ, 1, );
+   ll_rw_block(REQ_OP_READ, 0, 1, );
*wait_bh++=bh;
}
}
@@ -2883,7 +2883,7 @@ int block_truncate_page(struct address_space *mapping,
 
if (!buffer_uptodate(bh) && !buffer_delay(bh) && !buffer_unwritten(bh)) 
{
err = -EIO;
-   ll_rw_block(READ, 1, );
+   ll_rw_block(REQ_OP_READ, 0, 1, );
wait_on_buffer(bh);
/* Uhhuh. Read error. Complain and punt. */
if (!buffer_uptodate(bh))
@@ -3081,7 +3081,8 @@ EXPORT_SYMBOL(submit_bh);
 
 /**
  * ll_rw_block: low-level access to block devices (DEPRECATED)
- * @rw: whether to %READ or %WRITE or maybe %READA (readahead)
+ * @op: REQ_OP_READ or REQ_OP_WRITE
+ * op_flags: rq_flag_bits
  * @nr: number of  buffer_heads in the array
  * @bhs: array of pointers to  buffer_head
  *
@@ -3104,7 +3105,7 @@ EXPORT_SYMBOL(submit_bh);
  * All of the buffers must be for the same device, and must also be a
  * multiple of the current approved size for the device.
  */
-void ll_rw_block(int rw, int nr, struct buffer_head *bhs[])
+void ll_rw_block(int op, int op_flags, int nr, struct buffer_head *bhs[])
 {
int i;
 
@@ -3113,18 +3114,18 @@ void ll_rw_block(int rw, int nr, struct buffer_head 
*bhs[])
 
if (!trylock_buffer(bh))
continue;
-   if (rw == WRITE) {
+   if (op == REQ_OP_WRITE) {
if (test_clear_buffer_dirty(bh)) {
bh->b_end_io = end_buffer_write_sync;
get_bh(bh);
-   submit_bh(REQ_OP_WRITE, 0, bh);
+   submit_bh(REQ_OP_WRITE, op_flags, bh);
continue;
}
} else {
if (!buffer_uptodate(bh)) {
bh->b_end_io = end_buffer_read_sync;
get_bh(bh);
-   submit_bh(rw, 0, bh);
+   submit_bh(REQ_OP_READ, op_flags, bh);
continue;
}
}
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f34ef29..dd2e197 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -791,7 +791,7 @@ struct buffer_head *ext4_bread(handle_t *handle, struct 
inode *inode,
return bh;
if (!bh || buffer_uptodate(bh))
return bh;
-   ll_rw_block(READ | REQ_META | REQ_PRIO, 1, );
+   ll_rw_block(REQ_OP_READ,  REQ_META | REQ_PRIO, 1, );
wait_on_buffer(bh);
if (buffer_uptodate(bh))
return bh;
@@ -948,7 +948,7 @@ static int ext4_block_write_begin(struct page *page, loff_t 
pos, unsigned len,
if 

[PATCH 22/32] block/fs/drivers: set bi_op to REQ_OP

2015-11-04 Thread mchristi
From: Mike Christie 

This patch sets the bi_op to a REQ_OP for users where it
was a simple one line change.

For compat reasons, we are still ORing the op into bi_rw. This
will be dropped in later patches in this series when everyone
is updated.

Signed-off-by: Mike Christie 
---
 drivers/block/pktcdvd.c| 2 ++
 drivers/md/dm-crypt.c  | 1 +
 drivers/md/dm.c| 1 +
 drivers/scsi/osd/osd_initiator.c   | 4 
 drivers/target/target_core_pscsi.c | 4 +++-
 fs/btrfs/volumes.c | 1 +
 fs/exofs/ore.c | 1 +
 7 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 7be2375..bbb7a45 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -1075,6 +1075,7 @@ static void pkt_gather_data(struct pktcdvd_device *pd, 
struct packet_data *pkt)
 
atomic_inc(>io_wait);
bio->bi_rw = READ;
+   bio->bi_op = REQ_OP_READ;
pkt_queue_bio(pd, bio);
frames_read++;
}
@@ -1337,6 +1338,7 @@ static void pkt_start_write(struct pktcdvd_device *pd, 
struct packet_data *pkt)
/* Start the write request */
atomic_set(>io_wait, 1);
pkt->w_bio->bi_rw = WRITE;
+   pkt->w_bio->bi_op = REQ_OP_WRITE;
pkt_queue_bio(pd, pkt->w_bio);
 }
 
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 4b3b6f8..92689e5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1132,6 +1132,7 @@ static void clone_init(struct dm_crypt_io *io, struct bio 
*clone)
clone->bi_private = io;
clone->bi_end_io  = crypt_endio;
clone->bi_bdev= cc->dev->bdev;
+   clone->bi_op  = io->base_bio->bi_op;
clone->bi_rw  = io->base_bio->bi_rw;
 }
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1b5c604..d2cf6d9 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2329,6 +2329,7 @@ static struct mapped_device *alloc_dev(int minor)
 
bio_init(>flush_bio);
md->flush_bio.bi_bdev = md->bdev;
+   md->flush_bio.bi_op = REQ_OP_WRITE;
md->flush_bio.bi_rw = WRITE_FLUSH;
 
dm_stats_init(>stats);
diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index 0cccd60..ca7b4b6 100644
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -729,6 +729,7 @@ static int _osd_req_list_objects(struct osd_request *or,
return PTR_ERR(bio);
}
 
+   bio->bi_op = REQ_OP_READ;
bio->bi_rw &= ~REQ_WRITE;
or->in.bio = bio;
or->in.total_bytes = bio->bi_iter.bi_size;
@@ -842,6 +843,7 @@ int osd_req_write_kern(struct osd_request *or,
if (IS_ERR(bio))
return PTR_ERR(bio);
 
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= REQ_WRITE; /* FIXME: bio_set_dir() */
osd_req_write(or, obj, offset, bio, len);
return 0;
@@ -959,6 +961,7 @@ static int _osd_req_finalize_cdb_cont(struct osd_request 
*or, const u8 *cap_key)
if (IS_ERR(bio))
return PTR_ERR(bio);
 
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= REQ_WRITE;
 
/* integrity check the continuation before the bio is linked
@@ -1080,6 +1083,7 @@ int osd_req_write_sg_kern(struct osd_request *or,
if (IS_ERR(bio))
return PTR_ERR(bio);
 
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= REQ_WRITE;
osd_req_write_sg(or, obj, bio, sglist, numentries);
 
diff --git a/drivers/target/target_core_pscsi.c 
b/drivers/target/target_core_pscsi.c
index de18790..00a7bda5 100644
--- a/drivers/target/target_core_pscsi.c
+++ b/drivers/target/target_core_pscsi.c
@@ -921,8 +921,10 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, 
u32 sgl_nents,
if (!bio)
goto fail;
 
-   if (rw)
+   if (rw) {
+   bio->bi_op = REQ_OP_WRITE;
bio->bi_rw |= REQ_WRITE;
+   }
 
pr_debug("PSCSI: Allocated bio: %p,"
" dir: %s nr_vecs: %d\n", bio,
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3dfac71..ef67c2f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5847,6 +5847,7 @@ static noinline void btrfs_schedule_bio(struct btrfs_root 
*root,
atomic_inc(>fs_info->nr_async_bios);
WARN_ON(bio->bi_next);
bio->bi_next = NULL;
+   bio->bi_op = op;
bio->bi_rw |= op | op_flags;
 
spin_lock(>io_lock);
diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 7bd8ac8..7339bef 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -878,6 +878,7 @@ static int _write_mirror(struct ore_io_state *ios, int 
cur_comp)
} else {
 

[PATCH 31/32] block/fs/driver: rm bio bi_rw REQ_OP use

2015-11-04 Thread mchristi
From: Mike Christie 

With this patch we no longer use the bio->bi_rw field
for REQ_WRITE, REQ_DISCARD, REQ_WRITE_SAME, (REQ_OPs). bi_rw should
only set REQ_XYZ values and bi_op is for REQ_OPs.

Signed-off-by: Mike Christie 
---
 block/bio.c | 15 +++--
 block/blk-core.c| 10 +++---
 block/blk-lib.c |  9 ++---
 block/blk-map.c |  4 +--
 block/blk-merge.c   | 12 +++
 drivers/block/brd.c |  2 +-
 drivers/block/pktcdvd.c |  2 --
 drivers/block/rsxx/dma.c|  2 +-
 drivers/block/xen-blkfront.c|  5 +--
 drivers/block/zram/zram_drv.c   |  2 +-
 drivers/md/bcache/journal.c |  6 ++--
 drivers/md/bcache/movinggc.c|  2 +-
 drivers/md/bcache/request.c | 11 +++---
 drivers/md/bcache/writeback.c   |  4 +--
 drivers/md/dm-cache-target.c| 10 +++---
 drivers/md/dm-crypt.c   |  2 +-
 drivers/md/dm-io.c  | 12 +++
 drivers/md/dm-kcopyd.c  |  2 +-
 drivers/md/dm-log-writes.c  |  2 +-
 drivers/md/dm-raid1.c   | 10 +++---
 drivers/md/dm-region-hash.c |  4 +--
 drivers/md/dm-stripe.c  |  4 +--
 drivers/md/dm-thin.c| 15 +
 drivers/md/dm.c |  6 ++--
 drivers/md/linear.c |  2 +-
 drivers/md/raid0.c  |  2 +-
 drivers/md/raid1.c  | 25 ++
 drivers/md/raid10.c | 34 +--
 drivers/md/raid5.c  | 20 ---
 drivers/scsi/osd/osd_initiator.c|  4 ---
 drivers/staging/lustre/lustre/llite/lloop.c |  8 ++---
 drivers/target/target_core_pscsi.c  |  4 +--
 fs/btrfs/volumes.c  |  6 ++--
 fs/exofs/ore.c  |  1 -
 include/linux/bio.h | 15 ++---
 include/linux/blk_types.h   | 13 +++-
 include/linux/blktrace_api.h|  2 +-
 include/linux/fs.h  | 29 ++--
 include/trace/events/bcache.h   | 12 ---
 include/trace/events/block.h| 31 +++--
 kernel/trace/blktrace.c | 52 -
 41 files changed, 204 insertions(+), 209 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 1cf8428..064a858 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -669,10 +669,10 @@ struct bio *bio_clone_bioset(struct bio *bio_src, gfp_t 
gfp_mask,
bio->bi_iter.bi_sector  = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size= bio_src->bi_iter.bi_size;
 
-   if (bio->bi_rw & REQ_DISCARD)
+   if (bio->bi_op == REQ_OP_DISCARD)
goto integrity_clone;
 
-   if (bio->bi_rw & REQ_WRITE_SAME) {
+   if (bio->bi_op == REQ_OP_WRITE_SAME) {
bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
goto integrity_clone;
}
@@ -1170,10 +1170,8 @@ struct bio *bio_copy_user_iov(struct request_queue *q,
if (!bio)
goto out_bmd;
 
-   if (iter->type & WRITE) {
-   bio->bi_rw |= REQ_WRITE;
+   if (iter->type & WRITE)
bio->bi_op = REQ_OP_WRITE;
-   }
 
ret = 0;
 
@@ -1342,10 +1340,8 @@ struct bio *bio_map_user_iov(struct request_queue *q,
/*
 * set data direction, and check if mapped pages need bouncing
 */
-   if (iter->type & WRITE) {
-   bio->bi_rw |= REQ_WRITE;
+   if (iter->type & WRITE)
bio->bi_op = REQ_OP_WRITE;
-   }
 
bio_set_flag(bio, BIO_USER_MAPPED);
 
@@ -1538,7 +1534,6 @@ struct bio *bio_copy_kern(struct request_queue *q, void 
*data, unsigned int len,
bio->bi_private = data;
} else {
bio->bi_end_io = bio_copy_kern_endio;
-   bio->bi_rw |= REQ_WRITE;
bio->bi_op = REQ_OP_WRITE;
}
 
@@ -1798,7 +1793,7 @@ struct bio *bio_split(struct bio *bio, int sectors,
 * Discards need a mutable bio_vec to accommodate the payload
 * required by the DSM TRIM and UNMAP commands.
 */
-   if (bio->bi_rw & REQ_DISCARD)
+   if (bio->bi_op == REQ_OP_DISCARD)
split = bio_clone_bioset(bio, gfp, bs);
else
split = bio_clone_fast(bio, gfp, bs);
diff --git a/block/blk-core.c b/block/blk-core.c
index deb8bfd..c270a4a 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1873,14 +1873,14 @@ generic_make_request_checks(struct bio *bio)
}
}
 
-   if ((bio->bi_rw & REQ_DISCARD) &&
+   if ((bio->bi_op == REQ_OP_DISCARD) 

[PATCH 32/32] block: remove __REQ op defs and reduce bi_op/bi_rw sizes

2015-11-04 Thread mchristi
From: Mike Christie 

This patch removes the __REQ/REQ definitions for operations
now defined by REQ_OPs.

There is now no need for bi_rw to be a long, so this makes it an
int. I also moved the priority to its own field, but I guess I could
have just kept it in bi_rw since there are only 16 bio-related
REQ_XYZ flags.

bi_op is also no longer a bitmap, so it only needs to be a u8/char,
so that is changed too.

This is more of an RFC patch, because I still need to update the rest
of the block layer code that was treating bi_rw as a long, and I can
also shrink request->cmd_flags.

I was not sure if, how much, or where people wanted to pack things.
There also appears to be room in the bi_flags field: if bi_flags only
uses 13 bits and there are only 16 bio-related REQ_XYZ bits, I could put
them all in one variable if we wanted to go wild with trying to shrink
the bio while I am at it.

Signed-off-by: Mike Christie 
---
 include/linux/bio.h | 13 ++---
 include/linux/blk_types.h   | 23 ++-
 include/trace/events/f2fs.h |  1 -
 3 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index 7cbad7a..34a20cf 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -44,18 +44,9 @@
 #define BIO_MAX_SIZE   (BIO_MAX_PAGES << PAGE_CACHE_SHIFT)
 #define BIO_MAX_SECTORS(BIO_MAX_SIZE >> 9)
 
-/*
- * upper 16 bits of bi_rw define the io priority of this bio
- */
-#define BIO_PRIO_SHIFT (8 * sizeof(unsigned long) - IOPRIO_BITS)
-#define bio_prio(bio)  ((bio)->bi_rw >> BIO_PRIO_SHIFT)
+#define bio_prio(bio)  (bio)->bi_ioprio
 #define bio_prio_valid(bio)ioprio_valid(bio_prio(bio))
-
-#define bio_set_prio(bio, prio)do {\
-   WARN_ON(prio >= (1 << IOPRIO_BITS));\
-   (bio)->bi_rw &= ((1UL << BIO_PRIO_SHIFT) - 1);  \
-   (bio)->bi_rw |= ((unsigned long) (prio) << BIO_PRIO_SHIFT); \
-} while (0)
+#define bio_set_prio(bio, prio)((bio)->bi_ioprio = prio)
 
 /*
  * various member access, note that bio_data should of course not be used
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 581d353..c32ae3c 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -48,14 +48,9 @@ struct bio {
struct block_device *bi_bdev;
unsigned intbi_flags;   /* status, command, etc */
int bi_error;
-   unsigned long   bi_rw;  /* bottom bits rq_flags_bits
-* top bits priority
-*/
-   /*
-* this will be a u8 in the next patches and bi_rw can be shrunk to
-* a u32. For compat in these transistional patches op is a int here.
-*/
-   int bi_op;  /* REQ_OP */
+   unsigned intbi_rw;  /* rq_flags_bits */
+   unsigned short  bi_ioprio;
+   u8  bi_op;  /* REQ_OP */
 
 
struct bvec_iterbi_iter;
@@ -151,7 +146,6 @@ struct bio {
  */
 enum rq_flag_bits {
/* common flags */
-   __REQ_WRITE,/* not set, read. set, write */
__REQ_FAILFAST_DEV, /* no driver retries of device errors */
__REQ_FAILFAST_TRANSPORT, /* no driver retries of transport errors */
__REQ_FAILFAST_DRIVER,  /* no driver retries of driver errors */
@@ -159,9 +153,7 @@ enum rq_flag_bits {
__REQ_SYNC, /* request is sync (sync write or read) */
__REQ_META, /* metadata io request */
__REQ_PRIO, /* boost priority in cfq */
-   __REQ_DISCARD,  /* request to discard sectors */
-   __REQ_SECURE,   /* secure discard (used with __REQ_DISCARD) */
-   __REQ_WRITE_SAME,   /* write same block many times */
+   __REQ_SECURE,   /* secure discard (used with REQ_OP_DISCARD) */
 
__REQ_NOIDLE,   /* don't anticipate more IO after this one */
__REQ_INTEGRITY,/* I/O includes block integrity payload */
@@ -198,15 +190,12 @@ enum rq_flag_bits {
__REQ_NR_BITS,  /* stops here */
 };
 
-#define REQ_WRITE  (1ULL << __REQ_WRITE)
 #define REQ_FAILFAST_DEV   (1ULL << __REQ_FAILFAST_DEV)
 #define REQ_FAILFAST_TRANSPORT (1ULL << __REQ_FAILFAST_TRANSPORT)
 #define REQ_FAILFAST_DRIVER(1ULL << __REQ_FAILFAST_DRIVER)
 #define REQ_SYNC   (1ULL << __REQ_SYNC)
 #define REQ_META   (1ULL << __REQ_META)
 #define REQ_PRIO   (1ULL << __REQ_PRIO)
-#define REQ_DISCARD(1ULL << __REQ_DISCARD)
-#define REQ_WRITE_SAME (1ULL << __REQ_WRITE_SAME)
 #define REQ_NOIDLE (1ULL << __REQ_NOIDLE)
 #define REQ_INTEGRITY  (1ULL << __REQ_INTEGRITY)
 
@@ -250,8 +239,8 @@ enum rq_flag_bits {

[PATCH 04/32] block: prepare blkdev_issue_discard for bi_rw split

2015-11-04 Thread mchristi
From: Mike Christie 

The next patches will prepare the submit_bio users
for the split. There were a lot more users than
there were for submit_bio_wait, so if the conversion
was not a one liner, I broke it out into its own
patch.

This patch prepares blkdev_issue_discard.

There is some compat code left which will be dropped
in later patches in the series.
1. REQ_WRITE is still being set. This is because a lot
of code assumes it will be set for discard, flushes and
write sames.
2. submit_bio is still taking a bitmap. This is to make
the series git bisectable.

Signed-off-by: Mike Christie 
---
 block/blk-lib.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 9ebf653..0861c7a 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -42,7 +42,8 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
 {
DECLARE_COMPLETION_ONSTACK(wait);
struct request_queue *q = bdev_get_queue(bdev);
-   int type = REQ_WRITE | REQ_DISCARD;
+   int op = REQ_OP_DISCARD;
+   int op_flags = REQ_WRITE;
unsigned int granularity;
int alignment;
struct bio_batch bb;
@@ -63,7 +64,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t 
sector,
if (flags & BLKDEV_DISCARD_SECURE) {
if (!blk_queue_secdiscard(q))
return -EOPNOTSUPP;
-   type |= REQ_SECURE;
+   op_flags |= REQ_SECURE;
}
 
atomic_set(, 1);
@@ -108,7 +109,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
sector = end_sect;
 
atomic_inc();
-   submit_bio(type, bio);
+   submit_bio(op | op_flags, bio);
 
/*
 * We can loop for a long time in here, if someone does
-- 
1.8.3.1



[RESEND RFC PATCH 00/32] separate operations from flags in the bio/request structs

2015-11-04 Thread mchristi
This is just a resend of the patchset from earlier today. There was
an error in the middle of sending the set, so it looks like 10 - 32 got
dropped.

There are a couple new block layer commands we are trying to add support
for in the near term:

compare and write
http://www.spinics.net/lists/target-devel/msg07826.html

copy offload/extended copy/xcopy
https://www.redhat.com/archives/dm-devel/2014-July/msg00070.html

The problem is that if we continue to add more commands, we will one day
have to extend the cmd_flags/bi_rw fields again. To prevent that, this
patchset separates the operation (REQ_WRITE, REQ_DISCARD, REQ_WRITE_SAME,
etc) from the flags (REQ_SYNC, REQ_QUIET, etc) in the bio and request
structs. At the end of this set, we will have two fields,
bio->bi_op/request->op and bio->bi_rw/request->cmd_flags.
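
As a rough sketch of what a submitter then looks like (using the compat
form kept around mid-series, not a final API):

  bio->bi_op = REQ_OP_WRITE;                 /* the operation */
  bio->bi_rw = REQ_SYNC;                     /* only modifier flags */
  submit_bio(bio->bi_op | bio->bi_rw, bio);  /* compat bitmap form */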

The patches were made against Jens's linux-block tree's for-linus branch:
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git/log/?h=for-linus
(last commit a22c4d7e34402ccdf3414f64c50365436eba7b93).

I have done some basic testing for a lot of the drivers and filesystems,
but I wanted to get comments before trying to track down more hardware/
systems for testing.


Known issues:
- REQ_FLUSH is still a flag, but should probably be an operation.
For lower level drivers like SCSI, where we only get a flush, it makes
more sense as an operation. However, upper layers like filesystems
can send down flushes with writes, so it is more of a flag for them.
I am still working on this.

- There is a regression with the dm flakey target. It currently
cannot corrupt the operation values.

- The patchset is a little awkward. It touches so much code,
but I wanted to maintain git bisectability, so there is lots of compat code
left around until the last patches, where everything is cleaned up.





Re: [PATCH 1/3] PM / OPP: Add "opp-supported-hw" binding

2015-11-04 Thread Stephen Boyd
On 11/03, Viresh Kumar wrote:
> On 30-10-15, 15:18, Stephen Boyd wrote:
> > A side-note. I wonder if it would be better style to have the
> > node name be:
> > 
> > opp@6 {
> > 
> > At least it seems that the assumption is we can store all the
> > possible combinations of OPP values for a particular frequency in
> > the same node. Following this style would make dt compilation
> > fail if two nodes have the same frequency.
> 
> From: Viresh Kumar 
> Date: Tue, 3 Nov 2015 07:51:09 +0530
> Subject: [PATCH] PM / OPP: Rename OPP nodes as opp@
> 
> It would be better to name OPP nodes as opp@ as that will ensure
> that multiple DT nodes don't contain the same frequency. Of course we
> expect the writer to name the node with its opp-hz frequency and not any
> other frequency.
> 
> And that will let the compile error out if multiple nodes are using the
> same opp-hz frequency.
> 
> Suggested-by: Stephen Boyd 
> Signed-off-by: Viresh Kumar 
> ---

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH 07/10] staging: lustre: Handle nodemask on UMP machines

2015-11-04 Thread kbuild test robot
Hi James,

[auto build test WARNING on: staging/staging-next]
[also build test WARNING on: next-20151104]
[cannot apply to: v4.3]

url:
https://github.com/0day-ci/linux/commits/James-Simmons/staging-lustre-wrong-parameter-to-cfs_hash_keycpy/20151105-024407
config: i386-randconfig-b0-11050505 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/staging/lustre/lustre/libcfs/libcfs_cpu.c: In function 
'cfs_cpt_table_alloc':
>> drivers/staging/lustre/lustre/libcfs/libcfs_cpu.c:61:14: warning: passing 
>> argument 2 of 'set_bit' from incompatible pointer type 
>> [-Wincompatible-pointer-types]
  set_bit(0, >ctb_nodemask);
 ^
   In file included from include/linux/bitops.h:36:0,
from 
drivers/staging/lustre/lustre/libcfs/../../include/linux/libcfs/linux/libcfs.h:44,
from 
drivers/staging/lustre/lustre/libcfs/../../include/linux/libcfs/libcfs.h:40,
from drivers/staging/lustre/lustre/libcfs/libcfs_cpu.c:38:
   arch/x86/include/asm/bitops.h:72:1: note: expected 'volatile long unsigned 
int *' but argument is of type 'nodemask_t * {aka struct  *}'
set_bit(long nr, volatile unsigned long *addr)
^

vim +/set_bit +61 drivers/staging/lustre/lustre/libcfs/libcfs_cpu.c

45  
46  #define CFS_CPU_VERSION_MAGIC  0xbabecafe
47  
48  struct cfs_cpt_table *
49  cfs_cpt_table_alloc(unsigned int ncpt)
50  {
51  struct cfs_cpt_table *cptab;
52  
53  if (ncpt != 1) {
54  CERROR("Can't support cpu partition number %d\n", ncpt);
55  return NULL;
56  }
57  
58  LIBCFS_ALLOC(cptab, sizeof(*cptab));
59  if (cptab != NULL) {
60  cptab->ctb_version = CFS_CPU_VERSION_MAGIC;
  > 61  set_bit(0, >ctb_nodemask);
62  cptab->ctb_nparts  = ncpt;
63  }
64  
65  return cptab;
66  }
67  EXPORT_SYMBOL(cfs_cpt_table_alloc);
68  
69  void
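
A possible fix, sketched here for illustration (not part of the robot
output), assuming ctb_nodemask is a nodemask_t as the warning suggests:

  /* use the nodemask helpers instead of raw set_bit() */
  node_set(0, cptab->ctb_nodemask);
  /* or, equivalently, pass the underlying bitmap explicitly */
  set_bit(0, cptab->ctb_nodemask.bits);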

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-04 Thread Daniel Micay
> With enough pages at once, though, munmap would be fine, too.

That implies lots of page faults and zeroing though. The zeroing alone
is a major performance issue.

There are separate issues with munmap since it ends up resulting in a
lot more virtual memory fragmentation. It would help if the kernel used
first-best-fit for mmap instead of the current naive algorithm (bonus:
O(log n) worst-case, not O(n)). Since allocators like jemalloc and
PartitionAlloc want 2M aligned spans, mixing them with other allocators
can also accelerate the VM fragmentation caused by the dumb mmap
algorithm (i.e. they make a 2M aligned mapping, some other mmap user
does 4k, now there's a nearly 2M gap when the next 2M region is made and
the kernel keeps going rather than reusing it). Anyway, that's a totally
separate issue from this. Just felt like complaining :).

> Maybe what's really needed is a MADV_FREE variant that takes an iovec.
> On an all-cores multithreaded mm, the TLB shootdown broadcast takes
> thousands of cycles on each core more or less regardless of how much
> of the TLB gets zapped.

That would work very well. The allocator ends up having a sequence of
dirty spans that it needs to purge in one go. As long as purging is
fairly spread out, the cost of a single TLB shootdown isn't that bad. It
is extremely bad if it needs to do it over and over to purge a bunch of
ranges, which can happen if the memory has ended up being very, very
fragmented despite the efforts to compact it (depends on what the
application ends up doing).
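
For context, a minimal sketch of such a purge path in an allocator,
assuming a MADV_FREE definition as proposed in this series:

  #include <stddef.h>
  #include <sys/mman.h>

  /* Lazily return a dirty span to the kernel: the pages stay mapped and
   * usable, the kernel may reclaim them under memory pressure. */
  static void purge_span(void *addr, size_t len)
  {
  #ifdef MADV_FREE
          if (madvise(addr, len, MADV_FREE) == 0)
                  return;
  #endif
          /* fall back to the eager (and more expensive) variant */
          madvise(addr, len, MADV_DONTNEED);
  }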



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] bpf: add mod default A and X test cases

2015-11-04 Thread Z Lim
On Wed, Nov 4, 2015 at 11:36 AM, Yang Shi  wrote:
> When running "mod X" operation, if X is 0 the filter has to be halt.
> Add new test cases to cover A = A mod X if X is 0, and A = A mod 1.
>
> CC: Xi Wang 
> CC: Zi Shen Lim 
> Signed-off-by: Yang Shi 
> ---

Acked-by: Zi Shen Lim 


Re: [PATCH v2 3/4] bpf tools: Improve libbpf error reporting

2015-11-04 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 04, 2015 at 02:25:58AM +, Wang Nan escreveu:
> In this patch, a series libbpf specific error numbers and
> libbpf_strerror() are created to help reporting error to caller.
> Functions are updated to pass correct error number through macro
> CHECK_ERR().
> 
> All users of bpf_object__open{_buffer}() and bpf_program__title()
> in perf are modified accordingly.

So, before I get:

  [root@zoo ~]# perf record -e /tmp/foo.o sleep 1
  event syntax error: '/tmp/foo.o'
   \___ Invalid argument: Are you root and runing a 
CONFIG_BPF_SYSCALL kernel?

  (add -v to see detail)
  Run 'perf list' for a list of valid events

   Usage: perf record [] []
  or: perf record [] --  []

  -e, --eventevent selector. use 'perf list' to list available 
events


And now:

  [root@zoo ~]# perf record -e /tmp/foo.o sleep 1
  event syntax error: '/tmp/foo.o'
   \___ Unknown error 4006

  (add -v to see detail)
  Run 'perf list' for a list of valid events

   Usage: perf record [] []
  or: perf record [] --  []

  -e, --eventevent selector. use 'perf list' to list available 
events
  [root@zoo ~]#

Can you please fix this? The relevant strerror() routine should know about the
errors it handles and produce an informative message.
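
I.e. something along these lines at the call site (just a sketch of the
intent, using the libbpf_strerror() added by this patch; buffer size and
message text are illustrative):

  char errbuf[128];

  libbpf_strerror(err, errbuf, sizeof(errbuf));
  pr_err("bpf: failed to load object: %s\n", errbuf);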

- Arnaldo
 
> Signed-off-by: Wang Nan 
> Cc: Arnaldo Carvalho de Melo 
> Cc: Namhyung Kim 
> ---
>  tools/lib/bpf/libbpf.c | 149 
> -
>  tools/lib/bpf/libbpf.h |  12 
>  tools/perf/tests/llvm.c|   2 +-
>  tools/perf/util/bpf-loader.c   |   8 +--
>  tools/perf/util/parse-events.c |   4 +-
>  5 files changed, 120 insertions(+), 55 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4252fc2..74c64b1 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -61,6 +61,62 @@ void libbpf_set_print(libbpf_print_fn_t warn,
>   __pr_debug = debug;
>  }
>  
> +#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
> +#define STRERR_BUFSIZE  128
> +
> +struct {
> + int code;
> + const char *msg;
> +} libbpf_strerror_table[] = {
> + {LIBBPF_ERRNO__ELIBELF, "Something wrong in libelf"},
> + {LIBBPF_ERRNO__EFORMAT, "BPF object format invalid"},
> + {LIBBPF_ERRNO__EKVERSION, "'version' section incorrect or lost"},
> + {LIBBPF_ERRNO__EENDIAN, "Endian missmatch"},
> + {LIBBPF_ERRNO__EINTERNAL, "Internal error in libbpf"},
> + {LIBBPF_ERRNO__ERELOC, "Relocation failed"},
> + {LIBBPF_ERRNO__ELOAD, "Failed to load program"},
> +};
> +
> +int libbpf_strerror(int err, char *buf, size_t size)
> +{
> + unsigned int i;
> +
> + if (!buf || !size)
> + return -1;
> +
> + err = err > 0 ? err : -err;
> +
> + if (err < LIBBPF_ERRNO__START) {
> + int ret;
> +
> + ret = strerror_r(err, buf, size);
> + buf[size - 1] = '\0';
> + return ret;
> + }
> +
> + for (i = 0; i < ARRAY_SIZE(libbpf_strerror_table); i++) {
> + if (libbpf_strerror_table[i].code == err) {
> + const char *msg;
> +
> + msg = libbpf_strerror_table[i].msg;
> + snprintf(buf, size, "%s", msg);
> + buf[size - 1] = '\0';
> + return 0;
> + }
> + }
> +
> + snprintf(buf, size, "Unknown libbpf error %d", err);
> + buf[size - 1] = '\0';
> + return -1;
> +}
> +
> +#define CHECK_ERR(action, err, out) do { \
> + err = action;   \
> + if (err)\
> + goto out;   \
> +} while(0)
> +
> +
>  /* Copied from tools/perf/util/util.h */
>  #ifndef zfree
>  # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
> @@ -258,7 +314,7 @@ static struct bpf_object *bpf_object__new(const char 
> *path,
>   obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
>   if (!obj) {
>   pr_warning("alloc memory failed for %s\n", path);
> - return NULL;
> + return ERR_PTR(-ENOMEM);
>   }
>  
>   strcpy(obj->path, path);
> @@ -305,7 +361,7 @@ static int bpf_object__elf_init(struct bpf_object *obj)
>  
>   if (obj_elf_valid(obj)) {
>   pr_warning("elf init: internal error\n");
> - return -EEXIST;
> + return -LIBBPF_ERRNO__ELIBELF;
>   }
>  
>   if (obj->efile.obj_buf_sz > 0) {
> @@ -331,14 +387,14 @@ static int bpf_object__elf_init(struct bpf_object *obj)
>   if (!obj->efile.elf) {
>   pr_warning("failed to open %s as ELF file\n",
>   obj->path);
> - err = -EINVAL;
> + err = -LIBBPF_ERRNO__ELIBELF;
>   goto errout;
>   }
>  
>   if (!gelf_getehdr(obj->efile.elf, >efile.ehdr)) {
>   pr_warning("failed to get EHDR from %s\n",
>   obj->path);
> - err = -EINVAL;

Re: [PATCH v2 1/2] mm: mmap: Add new /proc tunable for mmap_base ASLR.

2015-11-04 Thread Andrew Morton
On Wed, 4 Nov 2015 11:31:25 -0800 Daniel Cashman  wrote:

> As for the
> clarification itself, where would you like it?  I could include a cover
> letter for this patch-set, elaborate more in the commit message itself,
> add more to the Kconfig help description, or some combination of the above.

In either [0/n] or [x/x] changelog, please.  I routinely move the [0/n]
material into the [1/n] changelog anyway.


Re: [PATCH v13 10/51] vfs: Cache base_acl objects in inodes

2015-11-04 Thread Andreas Gruenbacher
Andreas,

On Tue, Nov 3, 2015 at 11:29 PM, Andreas Dilger  wrote:
> On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
>>
>> POSIX ACLs and richacls are both objects allocated by kmalloc() with a
>> reference count which are freed by kfree_rcu().  An inode can either
>> cache an access and a default POSIX ACL, or a richacl (richacls do not
>> have default acls).  To allow an inode to cache either of the two kinds
>> of acls, introduce a new base_acl type and convert i_acl and
>> i_default_acl to that type. In most cases, the vfs then doesn't have to
>> care which kind of acl an inode caches (if any).
>
> For new wrapper functions like this better to name them as "NOUN_VERB" so
> rather than "VERB_NOUN" so that related functions sort together, like
> base_acl_init(), base_acl_get(), base_acl_put(), base_acl_refcount(), etc.

That's better, yes. I agree with all your comments and I've changed
things accordingly.

>> @@ -270,7 +270,7 @@ static struct posix_acl *f2fs_acl_clone(const struct 
>> posix_acl *acl,
>>   sizeof(struct posix_acl_entry);
>>   clone = kmemdup(acl, size, flags);
>>   if (clone)
>> - atomic_set(>a_refcount, 1);
>> + atomic_set(>a_base.ba_refcount, 1);
>
> This should be base_acl_init() since this should also reset the RCU state
> if it was just copied from "acl" above.

Yes. The rcu_head doesn't need initializing or resetting though.

>  That wouldn't be quite correct if
> there are other fields added to struct base_acl that don't need to be
> initialized when it is copied, so possibly base_acl_reinit() would be better
> here and below if that will be the case in the near future (I haven't looked
> through the whole patch series yet).

We won't need a base_acl_reinit() function for now.

>> @@ -25,9 +25,9 @@ struct posix_acl **acl_by_type(struct inode *inode, int 
>> type)
>> {
>>   switch (type) {
>>   case ACL_TYPE_ACCESS:
>> - return >i_acl;
>> + return (struct posix_acl **)>i_acl;
>>   case ACL_TYPE_DEFAULT:
>> - return >i_default_acl;
>> + return (struct posix_acl **)>i_default_acl;
>
> This would be better to use container_of() to unwrap struct base_acl from
> struct posix_acl.  That avoids the hard requirement (which isn't documented
> anywhere) that base_acl needs to be the first member of struct posix_acl.
>
> I was originally going to write that you should add a comment that base_acl
> needs to be the first member of both richacl and posix_acl, but container_of()
> is both cleaner and safer.
>
> Looking further down, that IS actually needed due to the way kfree is used on
> the base_acl pointer, but using container_of() is still cleaner and safer
> than directly casting double pointers (which some compilers and static
> analysis tools will be unhappy with).

Well, we would end up with &container_of() here, which doesn't work and
doesn't make sense, either. Let me change acl_by_type to return a
base_acl ** to clean this up.
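
For the other users the unwrapping could then look like this sketch
(helper name is made up; it assumes base_acl is embedded as a_base in
struct posix_acl as in this series):

  static inline struct posix_acl *posix_acl_from_base(struct base_acl *base)
  {
          /* no reliance on base_acl being the first member */
          return container_of(base, struct posix_acl, a_base);
  }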

>> @@ -576,6 +576,12 @@ static inline void mapping_allow_writable(struct 
>> address_space *mapping)
>> #define i_size_ordered_init(inode) do { } while (0)
>> #endif
>>
>> +struct base_acl {
>> + union {
>> + atomic_t ba_refcount;
>> + struct rcu_head ba_rcu;
>> + };
>> +};
>> struct posix_acl;
>
> Is this forward declaration of struct posix_acl even needed anymore after
> the change below?  There shouldn't be references to the struct in the common
> code anymore (at least not by the end of the patch series.

The get_acl and set_acl inode operations expect struct posix_acl to be declared.

> Hmm, using the base_acl pointer as the pointer to kfree means that the
> base_acl structure DOES need to be the first one in both struct posix_acl
> and struct richacl, so that needs to be commented at each structure so
> it doesn't accidentally break in the future.

Yes. I've added comments; there are also BUILD_BUG_ON() asserts in
posix_acl_release and richacl_put.

>> @@ -57,7 +57,7 @@ static inline struct richacl *
>> richacl_get(struct richacl *acl)
>> {
>>   if (acl)
>> - atomic_inc(>a_refcount);
>> + atomic_inc(>a_base.ba_refcount);
>>   return acl;
>
> This should also use base_acl_get() for consistency. That said, where is
> the call to base_acl_put() in the richacl code?
> Also, where is the change to struct richacl?  It looks like this patch would
> not be able to compile by itself.

Ah, a little problem in how the patches are split. I've fixed it. This
code doesn't get pulled into the build because nothing requires
CONFIG_FS_RICHACL at that point; that's why I didn't notice.

Thanks,
Andreas


Re: [patch] vfio: make an array larger

2015-11-04 Thread walter harms


Am 04.11.2015 14:26, schrieb Dan Carpenter:
> Smatch complains about a possible out of bounds error:
> 
>   drivers/vfio/pci/vfio_pci_config.c:1241 vfio_cap_init()
>   error: buffer overflow 'pci_cap_length' 20 <= 20
> 
> Fix this by making the array larger.
> 
> Signed-off-by: Dan Carpenter 
> 
> diff --git a/drivers/vfio/pci/vfio_pci_config.c 
> b/drivers/vfio/pci/vfio_pci_config.c
> index ff75ca3..001d48a 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -46,7 +46,7 @@
>   *   0: Removed from the user visible capability list
>   *   FF: Variable length
>   */
> -static u8 pci_cap_length[] = {
> +static u8 pci_cap_length[PCI_CAP_ID_MAX + 1] = {
>   [PCI_CAP_ID_BASIC]  = PCI_STD_HEADER_SIZEOF, /* pci config header */
>   [PCI_CAP_ID_PM] = PCI_PM_SIZEOF,
>   [PCI_CAP_ID_AGP]= PCI_AGP_SIZEOF,


(i am sorry Dave)

I am not sure if that is the way to go.
This define makes me feel uneasy:
#define   PCI_CAP_ID_MAX PCI_CAP_ID_AF

Would it be possible to use ARRAY_SIZE(pci_cap_length) instead of PCI_CAP_ID_MAX?
Then that would grow automatically with the array. And it's more clear what
is actually happening.
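
Something like this sketch, for illustration (cap/len names made up):

  /* let the array define its own bound */
  if (cap >= ARRAY_SIZE(pci_cap_length))
          return 0;       /* unknown capability, hide it */
  len = pci_cap_length[cap];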

re,
 wh



> 


Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

2015-11-04 Thread Daniel Micay
> Even if we're wrong about the aging of those MADV_FREE pages, their
> contents are invalidated; they can be discarded freely, and restoring
> them is a mere GFP_ZERO allocation. All other anonymous pages have to
> be written to disk, and potentially be read back.
> 
> [ Arguably, MADV_FREE pages should even be reclaimed before inactive
>   page cache. It's the same cost to discard both types of pages, but
>   restoring page cache involves IO. ]

Keep in mind that this is memory the kernel wouldn't be getting back at
all if the allocator wasn't going out of the way to purge it, and they
aren't going to go out of their way to purge it if it means the kernel
is going to steal the pages when there isn't actually memory pressure.

An allocator would be using MADV_DONTNEED if it didn't expect that the
pages were going to be used again shortly. MADV_FREE indicates that it
has time to inform the kernel that they're unused but they could still
be very hot.

> It probably makes sense to stop thinking about them as anonymous pages
> entirely at this point when it comes to aging. They're really not. The
> LRU lists are split to differentiate access patterns and cost of page
> stealing (and restoring). From that angle, MADV_FREE pages really have
> nothing in common with in-use anonymous pages, and so they shouldn't
> be on the same LRU list.
> 
> That would also fix the very unfortunate and unexpected consequence of
> tying the lazy free optimization to the availability of swap space.
> 
> I would prefer to see this addressed before the code goes upstream.

I don't think it would be ideal for these potentially very hot pages to
be dropped before very cold pages were swapped out. It's the kind of
tuning that needs to be informed by lots of real world experience and
lots of testing. It wouldn't impact the API.

Whether MADV_FREE is useful as an API vs. something like a pair of
system calls for pinning and unpinning memory is what should be worried
about right now. The internal implementation just needs to be correct
and useful right now, not perfect. Simpler is probably better than it
being more well tuned for an initial implementation too.



signature.asc
Description: OpenPGP digital signature


[PATCH 6/8] ARM: dts: rockchip: add clock-cells for usb phy nodes

2015-11-04 Thread Heiko Stuebner
Add the #clock-cells properties for the usbphy nodes as they
provide the pll-clocks now.

Signed-off-by: Heiko Stuebner 
---
 arch/arm/boot/dts/rk3066a.dtsi | 2 ++
 arch/arm/boot/dts/rk3188.dtsi  | 2 ++
 arch/arm/boot/dts/rk3288.dtsi  | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/arch/arm/boot/dts/rk3066a.dtsi b/arch/arm/boot/dts/rk3066a.dtsi
index 946f187..3e4b41b 100644
--- a/arch/arm/boot/dts/rk3066a.dtsi
+++ b/arch/arm/boot/dts/rk3066a.dtsi
@@ -181,6 +181,7 @@
reg = <0x17c>;
clocks = < SCLK_OTGPHY0>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
 
usbphy1: usb-phy1 {
@@ -188,6 +189,7 @@
reg = <0x188>;
clocks = < SCLK_OTGPHY1>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
};
 
diff --git a/arch/arm/boot/dts/rk3188.dtsi b/arch/arm/boot/dts/rk3188.dtsi
index 6399942..48a287e 100644
--- a/arch/arm/boot/dts/rk3188.dtsi
+++ b/arch/arm/boot/dts/rk3188.dtsi
@@ -156,6 +156,7 @@
reg = <0x10c>;
clocks = < SCLK_OTGPHY0>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
 
usbphy1: usb-phy1 {
@@ -163,6 +164,7 @@
reg = <0x11c>;
clocks = < SCLK_OTGPHY1>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
};
 
diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
index c64a116..51a5d29 100644
--- a/arch/arm/boot/dts/rk3288.dtsi
+++ b/arch/arm/boot/dts/rk3288.dtsi
@@ -943,6 +943,7 @@
reg = <0x320>;
clocks = < SCLK_OTGPHY0>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
 
usbphy1: usb-phy1 {
@@ -950,6 +951,7 @@
reg = <0x334>;
clocks = < SCLK_OTGPHY1>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
 
usbphy2: usb-phy2 {
@@ -957,6 +959,7 @@
reg = <0x348>;
clocks = < SCLK_OTGPHY2>;
clock-names = "phyclk";
+   #clock-cells = <0>;
};
};
 
-- 
2.6.2



[PATCH 7/8] ARM: dts: rockchip: assign usbphy480m_src to the new usbphy pll on veyron

2015-11-04 Thread Heiko Stuebner
Veyron devices always try to set the source for usbphy480m to usbphy0,
the phy connected to the otg controller, because the firmware default is
usbphy1, which belongs to the ehci-controller of the internal camera and
is much more likely to get turned off to save power.

In the mainline kernel we currently don't use the usbphy480m_src at all,
as it mainly powers the uart0 source that is connected to the bluetooth
component of the wifi/bt combo.

So move that assignment over to the new real pll clock inside the usbphy.

Signed-off-by: Heiko Stuebner 
---
 arch/arm/boot/dts/rk3288-veyron.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/rk3288-veyron.dtsi 
b/arch/arm/boot/dts/rk3288-veyron.dtsi
index d4263ed..c8329b5 100644
--- a/arch/arm/boot/dts/rk3288-veyron.dtsi
+++ b/arch/arm/boot/dts/rk3288-veyron.dtsi
@@ -410,7 +410,7 @@
status = "okay";
 
assigned-clocks = < SCLK_USBPHY480M_SRC>;
-   assigned-clock-parents = < SCLK_OTGPHY0>;
+   assigned-clock-parents = <>;
dr_mode = "host";
 };
 
-- 
2.6.2



[PATCH 4/8] phy: rockchip-usb: expose the phy-internal PLLs

2015-11-04 Thread Heiko Stuebner
The USB phys on Rockchip SoCs contain their own internal PLLs to create
the 480MHz needed. Additionally this PLL output is also fed back into the
core clock-controller as possible source for clocks like the GPU or others.

Until now this was modelled incorrectly with a "virtual" factor clock in
the clock controller. The one big caveat is that if we turn off the usb phy
via the siddq signal, all analog components get turned off, including the
PLLs. It is therefore possible that a source clock gets disabled without
the clock driver ever knowing, possibly making the system hang.

Therefore register the phy-plls as real clocks that the clock driver can
then reference again normally, making the clock hierarchy finally reflect
the actual hardware.

The phy-ops get converted to simply turning that new clock on and off
which in turn controls the siddq signal of the phy.

Through this the driver gains handling for platform-specific data, to
handle the phy->clock name association.
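
The clock registration itself follows the usual clk framework pattern,
sketched below with illustrative details (the clk_ops callbacks are the
ones in the diff below):

  static const struct clk_ops rockchip_usb_phy480m_ops = {
          .enable      = rockchip_usb_phy480m_enable,
          .disable     = rockchip_usb_phy480m_disable,
          .is_enabled  = rockchip_usb_phy480m_is_enabled,
          .recalc_rate = rockchip_usb_phy480m_recalc_rate,
  };

  /* sketch: expose the phy-internal pll as a clock of its own */
  struct clk_init_data init = {
          .name        = "sclk_otgphy0_480m",     /* illustrative name */
          .ops         = &rockchip_usb_phy480m_ops,
          .num_parents = 0,
  };

  rk_phy->clk480m_hw.init = &init;
  rk_phy->clk480m = clk_register(rk_phy->base->dev, &rk_phy->clk480m_hw);
  if (IS_ERR(rk_phy->clk480m))
          return PTR_ERR(rk_phy->clk480m);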

Signed-off-by: Heiko Stuebner 
---
 .../devicetree/bindings/phy/rockchip-usb-phy.txt   |   6 +-
 drivers/phy/phy-rockchip-usb.c | 177 ++---
 2 files changed, 160 insertions(+), 23 deletions(-)

diff --git a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt 
b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
index 826454a..68498d5 100644
--- a/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
+++ b/Documentation/devicetree/bindings/phy/rockchip-usb-phy.txt
@@ -1,7 +1,10 @@
 ROCKCHIP USB2 PHY
 
 Required properties:
- - compatible: rockchip,rk3288-usb-phy
+ - compatible: matching the soc type, one of
+ "rockchip,rk3066a-usb-phy"
+ "rockchip,rk3188-usb-phy"
+ "rockchip,rk3288-usb-phy"
  - rockchip,grf : phandle to the syscon managing the "general
register files"
  - #address-cells: should be 1
@@ -21,6 +24,7 @@ required properties:
 Optional Properties:
 - clocks : phandle + clock specifier for the phy clocks
 - clock-names: string, clock name, must be "phyclk"
+- #clock-cells: for users of the phy-pll, should be 0
 
 Example:
 
diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index f10e130..509497b 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -15,12 +15,14 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -36,18 +38,35 @@
 #define SIDDQ_ON   BIT(13)
 #define SIDDQ_OFF  (0 << 13)
 
+struct rockchip_usb_phys {
+   int reg;
+   const char *pll_name;
+};
+
+struct rockchip_usb_phy_pdata {
+   struct rockchip_usb_phys *phys;
+};
+
 struct rockchip_usb_phy_base {
struct device *dev;
struct regmap *reg_base;
+   const struct rockchip_usb_phy_pdata *pdata;
 };
 
 struct rockchip_usb_phy {
struct rockchip_usb_phy_base *base;
+   struct device_node *np;
unsigned intreg_offset;
struct clk  *clk;
+   struct clk  *clk480m;
+   struct clk_hw   clk480m_hw;
struct phy  *phy;
 };
 
+/*
+ * Set siddq to 1 to power down usb phy analog blocks,
+ * set to 0 to enable.
+ */
 static int rockchip_usb_phy_power(struct rockchip_usb_phy *phy,
   bool siddq)
 {
@@ -55,17 +74,57 @@ static int rockchip_usb_phy_power(struct rockchip_usb_phy 
*phy,
SIDDQ_WRITE_ENA | (siddq ? SIDDQ_ON : SIDDQ_OFF));
 }
 
-static int rockchip_usb_phy_power_off(struct phy *_phy)
+static unsigned long rockchip_usb_phy480m_recalc_rate(struct clk_hw *hw,
+   unsigned long parent_rate)
 {
-   struct rockchip_usb_phy *phy = phy_get_drvdata(_phy);
-   int ret = 0;
+   return 48000;
+}
+
+static void rockchip_usb_phy480m_disable(struct clk_hw *hw)
+{
+   struct rockchip_usb_phy *phy = container_of(hw,
+   struct rockchip_usb_phy,
+   clk480m_hw);
+
+   rockchip_usb_phy_power(phy, 1);
+}
+
+static int rockchip_usb_phy480m_enable(struct clk_hw *hw)
+{
+   struct rockchip_usb_phy *phy = container_of(hw,
+   struct rockchip_usb_phy,
+   clk480m_hw);
 
-   /* Power down usb phy analog blocks by set siddq 1 */
-   ret = rockchip_usb_phy_power(phy, 1);
-   if (ret)
+   return rockchip_usb_phy_power(phy, 0);
+}
+
+static int rockchip_usb_phy480m_is_enabled(struct clk_hw *hw)
+{
+   struct rockchip_usb_phy *phy = container_of(hw,
+   struct rockchip_usb_phy,
+   clk480m_hw);
+   int ret;
+   u32 val;
+
+   ret = regmap_read(phy->base->reg_base, phy->reg_offset, );
+   if (ret < 0)

[PATCH 2/8] phy: rockchip-usb: introduce a common data-struct for the device

2015-11-04 Thread Heiko Stuebner
This introduces a common struct that holds data belonging to
the umbrella device that contains all the phys and that we
want to use later.

Signed-off-by: Heiko Stuebner 
---
 drivers/phy/phy-rockchip-usb.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index dfc056b..dda1994 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -36,9 +36,14 @@
 #define SIDDQ_ON   BIT(13)
 #define SIDDQ_OFF  (0 << 13)
 
+struct rockchip_usb_phy_base {
+   struct device *dev;
+   struct regmap *reg_base;
+};
+
 struct rockchip_usb_phy {
+   struct rockchip_usb_phy_base *base;
unsigned intreg_offset;
-   struct regmap   *reg_base;
struct clk  *clk;
struct phy  *phy;
 };
@@ -46,7 +51,7 @@ struct rockchip_usb_phy {
 static int rockchip_usb_phy_power(struct rockchip_usb_phy *phy,
   bool siddq)
 {
-   return regmap_write(phy->reg_base, phy->reg_offset,
+   return regmap_write(phy->base->reg_base, phy->reg_offset,
SIDDQ_WRITE_ENA | (siddq ? SIDDQ_ON : SIDDQ_OFF));
 }
 
@@ -101,17 +106,23 @@ static void rockchip_usb_phy_action(void *data)
 static int rockchip_usb_phy_probe(struct platform_device *pdev)
 {
struct device *dev = >dev;
+   struct rockchip_usb_phy_base *phy_base;
struct rockchip_usb_phy *rk_phy;
struct phy_provider *phy_provider;
struct device_node *child;
-   struct regmap *grf;
unsigned int reg_offset;
int err;
 
-   grf = syscon_regmap_lookup_by_phandle(dev->of_node, "rockchip,grf");
-   if (IS_ERR(grf)) {
+   phy_base = devm_kzalloc(dev, sizeof(*phy_base), GFP_KERNEL);
+   if (!phy_base)
+   return -ENOMEM;
+
+   phy_base->dev = dev;
+   phy_base->reg_base = syscon_regmap_lookup_by_phandle(dev->of_node,
+"rockchip,grf");
+   if (IS_ERR(phy_base->reg_base)) {
dev_err(>dev, "Missing rockchip,grf property\n");
-   return PTR_ERR(grf);
+   return PTR_ERR(phy_base->reg_base);
}
 
for_each_available_child_of_node(dev->of_node, child) {
@@ -126,7 +137,6 @@ static int rockchip_usb_phy_probe(struct platform_device 
*pdev)
}
 
rk_phy->reg_offset = reg_offset;
-   rk_phy->reg_base = grf;
 
rk_phy->clk = of_clk_get_by_name(child, "phyclk");
if (IS_ERR(rk_phy->clk))
-- 
2.6.2



[PATCH 5/8] clk: rockchip: fix usbphy-related clocks

2015-11-04 Thread Heiko Stuebner
The otgphy clocks really only drive the phy blocks. These in turn
contain plls that then generate the 480m clocks the clock controller
uses to supply some other clocks like uart0, gpu or the video-codec.

So fix this structure to actually respect that hierarchy and remove
the usb480m fixed-rate clock that worked as a placeholder till now, as
this wouldn't even work if the supplying phy gets turned off while
its pll-output gets used elsewhere.

Signed-off-by: Heiko Stuebner 
---
 drivers/clk/rockchip/clk-rk3188.c | 11 +++
 drivers/clk/rockchip/clk-rk3288.c | 16 +---
 2 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/clk/rockchip/clk-rk3188.c 
b/drivers/clk/rockchip/clk-rk3188.c
index abb4760..7836a97 100644
--- a/drivers/clk/rockchip/clk-rk3188.c
+++ b/drivers/clk/rockchip/clk-rk3188.c
@@ -319,9 +319,9 @@ static struct rockchip_clk_branch common_clk_branches[] 
__initdata = {
 * the 480m are generated inside the usb block from these clocks,
 * but they are also a source for the hsicphy clock.
 */
-   GATE(SCLK_OTGPHY0, "sclk_otgphy0", "usb480m", CLK_IGNORE_UNUSED,
+   GATE(SCLK_OTGPHY0, "sclk_otgphy0", "xin24m", CLK_IGNORE_UNUSED,
RK2928_CLKGATE_CON(1), 5, GFLAGS),
-   GATE(SCLK_OTGPHY1, "sclk_otgphy1", "usb480m", CLK_IGNORE_UNUSED,
+   GATE(SCLK_OTGPHY1, "sclk_otgphy1", "xin24m", CLK_IGNORE_UNUSED,
RK2928_CLKGATE_CON(1), 6, GFLAGS),
 
COMPOSITE(0, "mac_src", mux_mac_p, 0,
@@ -635,7 +635,7 @@ static struct clk_div_table div_rk3188_aclk_core_t[] = {
{ /* sentinel */ },
 };
 
-PNAME(mux_hsicphy_p)   = { "sclk_otgphy0", "sclk_otgphy1",
+PNAME(mux_hsicphy_p)   = { "sclk_otgphy0_480m", "sclk_otgphy1_480m",
"gpll", "cpll" };
 
 static struct rockchip_clk_branch rk3188_clk_branches[] __initdata = {
@@ -739,11 +739,6 @@ static void __init rk3188_common_clk_init(struct 
device_node *np)
pr_warn("%s: could not register clock xin12m: %ld\n",
__func__, PTR_ERR(clk));
 
-   clk = clk_register_fixed_factor(NULL, "usb480m", "xin24m", 0, 20, 1);
-   if (IS_ERR(clk))
-   pr_warn("%s: could not register clock usb480m: %ld\n",
-   __func__, PTR_ERR(clk));
-
rockchip_clk_register_branches(common_clk_branches,
  ARRAY_SIZE(common_clk_branches));
 
diff --git a/drivers/clk/rockchip/clk-rk3288.c 
b/drivers/clk/rockchip/clk-rk3288.c
index 9040878..7c8a3e9 100644
--- a/drivers/clk/rockchip/clk-rk3288.c
+++ b/drivers/clk/rockchip/clk-rk3288.c
@@ -195,8 +195,8 @@ PNAME(mux_hsadcout_p)   = { "hsadc_src", "ext_hsadc" };
 PNAME(mux_edp_24m_p)   = { "ext_edp_24m", "xin24m" };
 PNAME(mux_tspout_p)= { "cpll", "gpll", "npll", "xin27m" };
 
-PNAME(mux_usbphy480m_p)= { "sclk_otgphy1", "sclk_otgphy2",
-   "sclk_otgphy0" };
+PNAME(mux_usbphy480m_p)= { "sclk_otgphy1_480m", 
"sclk_otgphy2_480m",
+   "sclk_otgphy0_480m" };
 PNAME(mux_hsicphy480m_p)   = { "cpll", "gpll", "usbphy480m_src" };
 PNAME(mux_hsicphy12m_p)= { "hsicphy12m_xin12m", 
"hsicphy12m_usbphy" };
 
@@ -506,11 +506,11 @@ static struct rockchip_clk_branch rk3288_clk_branches[] 
__initdata = {
RK3288_CLKSEL_CON(35), 6, 2, MFLAGS, 0, 5, DFLAGS,
RK3288_CLKGATE_CON(4), 10, GFLAGS),
 
-   GATE(SCLK_OTGPHY0, "sclk_otgphy0", "usb480m", CLK_IGNORE_UNUSED,
+   GATE(SCLK_OTGPHY0, "sclk_otgphy0", "xin24m", CLK_IGNORE_UNUSED,
RK3288_CLKGATE_CON(13), 4, GFLAGS),
-   GATE(SCLK_OTGPHY1, "sclk_otgphy1", "usb480m", CLK_IGNORE_UNUSED,
+   GATE(SCLK_OTGPHY1, "sclk_otgphy1", "xin24m", CLK_IGNORE_UNUSED,
RK3288_CLKGATE_CON(13), 5, GFLAGS),
-   GATE(SCLK_OTGPHY2, "sclk_otgphy2", "usb480m", CLK_IGNORE_UNUSED,
+   GATE(SCLK_OTGPHY2, "sclk_otgphy2", "xin24m", CLK_IGNORE_UNUSED,
RK3288_CLKGATE_CON(13), 6, GFLAGS),
GATE(SCLK_OTG_ADP, "sclk_otg_adp", "xin32k", CLK_IGNORE_UNUSED,
RK3288_CLKGATE_CON(13), 7, GFLAGS),
@@ -874,12 +874,6 @@ static void __init rk3288_clk_init(struct device_node *np)
pr_warn("%s: could not register clock xin12m: %ld\n",
__func__, PTR_ERR(clk));
 
-
-   clk = clk_register_fixed_factor(NULL, "usb480m", "xin24m", 0, 20, 1);
-   if (IS_ERR(clk))
-   pr_warn("%s: could not register clock usb480m: %ld\n",
-   __func__, PTR_ERR(clk));
-
clk = clk_register_fixed_factor(NULL, "hclk_vcodec_pre",
"hclk_vcodec_pre_v", 0, 1, 4);
if (IS_ERR(clk))
-- 
2.6.2


[PATCH 0/8] phy: rockchip-usb: correct pll handling and usb-uart

2015-11-04 Thread Heiko Stuebner
Patches 1-7 fix a long-standing issue with the clock tree of Rockchip SoCs,
namely our ignorance of the usbphy-internal PLL that generates the needed
480 MHz clock but is also a supply clock back to the core clock controller
of the SoC.

Until now this was worked around with a virtual clock in the CRU itself, but
that approach of course does not know when other parts disable the phy behind
the CRU's back, thus breaking potential users of these clocks.


Patch 8, while not tied to the new PLL handling, also builds on the
groundwork introduced there and adds support for repurposing one of the phys
as a passthrough for uart data. This allows attaching a TTL converter to the
D+ and D- pins of a USB cable and receiving uart output that way on boards
where attaching a regular serial console is not really possible.

One point of critique on my first iteration [0] was that, due to when the
reconfiguration happens, we may miss parts of the log when earlycon is
enabled. So far an early_initcall is used, as the unflattened devicetree is
necessary to set this up. Doing it directly in the early_param handler, for
example, would require parsing the flattened devicetree to get the needed
nodes and properties.

I still maintain that if you're working on anything before smp bringup you
should use a real dev board instead, or try to solder uart cables onto
hopefully available test points :-) .


In any case, if patch 8 causes too much headache, it can be dropped so as
not to hold up the earlier 7 patches.

[0] http://comments.gmane.org/gmane.linux.ports.arm.rockchip/715
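
As a rough illustration of what the PLL handling in patches 4 and 5 amounts to
on the phy side, the sketch below registers the phy-internal 480 MHz output as
a clock and hands it to the clock framework so the CRU can consume it. The
function and clock names are illustrative and error handling is trimmed; this
is not the actual patch content.

/* sketch only; depends on <linux/clk-provider.h>, <linux/of.h>, <linux/err.h> */

static unsigned long sketch_usbphy480m_recalc_rate(struct clk_hw *hw,
                                                   unsigned long parent_rate)
{
        return 480 * 1000 * 1000;       /* the phy PLL always runs at 480 MHz */
}

static const struct clk_ops sketch_usbphy480m_ops = {
        .recalc_rate    = sketch_usbphy480m_recalc_rate,
        /* .enable/.disable/.is_enabled would toggle the SIDDQ bit via the GRF */
};

static int sketch_expose_480m(struct rockchip_usb_phy *rk_phy,
                              struct device_node *child)
{
        struct clk_init_data init = {
                .name           = "sclk_otgphy0_480m",  /* illustrative name */
                .ops            = &sketch_usbphy480m_ops,
                .num_parents    = 0,
        };

        rk_phy->clk480m_hw.init = &init;
        rk_phy->clk480m = clk_register(NULL, &rk_phy->clk480m_hw);
        if (IS_ERR(rk_phy->clk480m))
                return PTR_ERR(rk_phy->clk480m);

        /* the CRU then consumes this clock via phandle + #clock-cells = <0> */
        return of_clk_add_provider(child, of_clk_src_simple_get,
                                   rk_phy->clk480m);
}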


Heiko Stuebner (8):
  phy: rockchip-usb: fix clock get-put mismatch
  phy: rockchip-usb: introduce a common data-struct for the device
  phy: rockchip-usb: move per-phy init into a separate function
  phy: rockchip-usb: expose the phy-internal PLLs
  clk: rockchip: fix usbphy-related clocks
  ARM: dts: rockchip: add clock-cells for usb phy nodes
  ARM: dts: rockchip: assign usbphy480m_src to the new usbphy pll on
veyron
  phy: rockchip-usb: add handler for usb-uart functionality

 .../devicetree/bindings/phy/rockchip-usb-phy.txt   |   6 +-
 Documentation/kernel-parameters.txt|   6 +
 arch/arm/boot/dts/rk3066a.dtsi |   2 +
 arch/arm/boot/dts/rk3188.dtsi  |   2 +
 arch/arm/boot/dts/rk3288-veyron.dtsi   |   2 +-
 arch/arm/boot/dts/rk3288.dtsi  |   3 +
 drivers/clk/rockchip/clk-rk3188.c  |  11 +-
 drivers/clk/rockchip/clk-rk3288.c  |  16 +-
 drivers/phy/phy-rockchip-usb.c | 451 ++---
 9 files changed, 417 insertions(+), 82 deletions(-)

-- 
2.6.2



[PATCH 8/8] phy: rockchip-usb: add handler for usb-uart functionality

2015-11-04 Thread Heiko Stuebner
Most newer Rockchip SoCs provide the possibility to use a usb phy as a
passthrough for the debug uart (uart2), making it possible, for example, to
get console output without needing to open the device.

This patch adds an early_initcall to enable this functionality
conditionally via the commandline and also disables the corresponding
usb controller in the devicetree.

Currently only data for the rk3288 is provided, but at least the
rk3188 and arm64 rk3368 also provide this functionality and will be
enabled later.

On a spliced usb cable the signals are tx on the white wire (D+) and
rx on the green wire (D-).

The one caveat is that the reconfiguration of the phy currently happens as
an early_initcall, as the code depends on the unflattened devicetree being
available. Everything is fine if only a regular console is active, as the
console replay happens after the reconfiguration. But with earlycon active,
output up to smp init will currently get lost.

The phy is an optional property for the connected dwc2 controller,
so we still provide the phy device but fail all phy-ops with -EBUSY
to make sure the dwc2 does not try to transmit anything on the
repurposed phy.
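
To make the sequencing easier to follow, below is a minimal sketch of how the
commandline wiring could look. It is not the code from this patch (not all
relevant hunks are shown here), and the per-SoC hook name is hypothetical.

/* sketch only; would need <linux/init.h>, <linux/err.h>, <linux/mfd/syscon.h> */

static int rk3288_init_usb_uart(struct regmap *grf); /* hypothetical per-SoC hook */

static int __init sketch_usb_uart_setup(char *str)
{
        /* the devicetree is not usable this early, so only record the request */
        enable_usb_uart = 1;
        return 0;
}
early_param("rockchip.usb_uart", sketch_usb_uart_setup);

static int __init sketch_usb_uart_init(void)
{
        struct regmap *grf;

        if (!enable_usb_uart)
                return 0;

        /* by early_initcall time the unflattened devicetree and syscon exist */
        grf = syscon_regmap_lookup_by_compatible("rockchip,rk3288-grf");
        if (IS_ERR(grf))
                return PTR_ERR(grf);

        /* mux uart2 onto the phy pins and keep the phy powered */
        return rk3288_init_usb_uart(grf);
}
early_initcall(sketch_usb_uart_init);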

Signed-off-by: Heiko Stuebner 
---
 Documentation/kernel-parameters.txt |   6 +
 drivers/phy/phy-rockchip-usb.c  | 231 ++--
 2 files changed, 201 insertions(+), 36 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c6dd5f3..8d9a86e 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3370,6 +3370,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
ro  [KNL] Mount root device read-only on boot
 
+   rockchip.usb_uart
+   Enable the uart passthrough on the designated usb port
+   on Rockchip SoCs. When active, the signals of the
+   debug-uart get routed to the D+ and D- pins of the usb
+   port and the regular usb controller gets disabled.
+
root=   [KNL] Root filesystem
See name_to_dev_t comment in init/do_mounts.c.
 
diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index 509497b..4c2f5ee 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -30,21 +30,23 @@
 #include 
 #include 
 
-/*
- * The higher 16-bit of this register is used for write protection
- * only if BIT(13 + 16) set to 1 the BIT(13) can be written.
- */
-#define SIDDQ_WRITE_ENABIT(29)
-#define SIDDQ_ON   BIT(13)
-#define SIDDQ_OFF  (0 << 13)
+static int enable_usb_uart;
+
+#define HIWORD_UPDATE(val, mask) \
+   ((val) | (mask) << 16)
+
+#define UOC_CON0_SIDDQ BIT(13)
 
 struct rockchip_usb_phys {
int reg;
const char *pll_name;
 };
 
+struct rockchip_usb_phy_base;
 struct rockchip_usb_phy_pdata {
struct rockchip_usb_phys *phys;
+   int (*init_usb_uart)(struct regmap *grf);
+   int usb_uart_phy;
 };
 
 struct rockchip_usb_phy_base {
@@ -61,6 +63,7 @@ struct rockchip_usb_phy {
struct clk  *clk480m;
struct clk_hw   clk480m_hw;
struct phy  *phy;
+   booluart_enabled;
 };
 
 /*
@@ -70,8 +73,9 @@ struct rockchip_usb_phy {
 static int rockchip_usb_phy_power(struct rockchip_usb_phy *phy,
   bool siddq)
 {
-   return regmap_write(phy->base->reg_base, phy->reg_offset,
-   SIDDQ_WRITE_ENA | (siddq ? SIDDQ_ON : SIDDQ_OFF));
+   u32 val = HIWORD_UPDATE(siddq ? UOC_CON0_SIDDQ : 0, UOC_CON0_SIDDQ);
+
+   return regmap_write(phy->base->reg_base, phy->reg_offset, val);
 }
 
 static unsigned long rockchip_usb_phy480m_recalc_rate(struct clk_hw *hw,
@@ -110,7 +114,7 @@ static int rockchip_usb_phy480m_is_enabled(struct clk_hw *hw)
if (ret < 0)
return ret;
 
-   return (val & SIDDQ_ON) ? 0 : 1;
+   return (val & UOC_CON0_SIDDQ) ? 0 : 1;
 }
 
 static const struct clk_ops rockchip_usb_phy480m_ops = {
@@ -124,6 +128,9 @@ static int rockchip_usb_phy_power_off(struct phy *_phy)
 {
struct rockchip_usb_phy *phy = phy_get_drvdata(_phy);
 
+   if (phy->uart_enabled)
+   return -EBUSY;
+
clk_disable_unprepare(phy->clk480m);
 
return 0;
@@ -133,6 +140,9 @@ static int rockchip_usb_phy_power_on(struct phy *_phy)
 {
struct rockchip_usb_phy *phy = phy_get_drvdata(_phy);
 
+   if (phy->uart_enabled)
+   return -EBUSY;
+
return clk_prepare_enable(phy->clk480m);
 }
 
@@ -146,8 +156,10 @@ static void rockchip_usb_phy_action(void *data)
 {
struct rockchip_usb_phy *rk_phy = data;
 
-   of_clk_del_provider(rk_phy->np);
-   clk_unregister(rk_phy->clk480m);
+   if (!rk_phy->uart_enabled) {
+   of_clk_del_provider(rk_phy->np);
+   

[PATCH 1/8] phy: rockchip-usb: fix clock get-put mismatch

2015-11-04 Thread Heiko Stuebner
Currently the phy driver only gets the optional clock reference but never
puts it again, neither during error handling nor on remove. Fix that by
doing the clk_put from a devm action that gets called at the right time,
after all other devm actions are done.
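
As an aside on the design choice: later kernels gained
devm_add_action_or_reset(), which runs the action itself when registration
fails and would make the hand-rolled error path below unnecessary. A short
sketch, not part of this patch and assuming that helper is available:

static void sketch_phy_clk_put(void *data)
{
        struct rockchip_usb_phy *rk_phy = data;

        if (rk_phy->clk)
                clk_put(rk_phy->clk);
}

static int sketch_claim_phyclk(struct device *dev,
                               struct rockchip_usb_phy *rk_phy,
                               struct device_node *child)
{
        rk_phy->clk = of_clk_get_by_name(child, "phyclk");
        if (IS_ERR(rk_phy->clk))
                rk_phy->clk = NULL;

        /* on failure the helper runs sketch_phy_clk_put() itself */
        return devm_add_action_or_reset(dev, sketch_phy_clk_put, rk_phy);
}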

Signed-off-by: Heiko Stuebner 
---
 drivers/phy/phy-rockchip-usb.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index 91d6f34..dfc056b 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -90,6 +90,14 @@ static const struct phy_ops ops = {
.owner  = THIS_MODULE,
 };
 
+static void rockchip_usb_phy_action(void *data)
+{
+   struct rockchip_usb_phy *rk_phy = data;
+
+   if (rk_phy->clk)
+   clk_put(rk_phy->clk);
+}
+
 static int rockchip_usb_phy_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
@@ -124,6 +132,13 @@ static int rockchip_usb_phy_probe(struct platform_device *pdev)
if (IS_ERR(rk_phy->clk))
rk_phy->clk = NULL;
 
+   err = devm_add_action(dev, rockchip_usb_phy_action, rk_phy);
+   if (err) {
+   if (rk_phy->clk)
+   clk_put(rk_phy->clk);
+   return err;
+   }
+
rk_phy->phy = devm_phy_create(dev, child, &ops);
if (IS_ERR(rk_phy->phy)) {
dev_err(dev, "failed to create PHY\n");
-- 
2.6.2



[PATCH 3/8] phy: rockchip-usb: move per-phy init into a separate function

2015-11-04 Thread Heiko Stuebner
This unclutters the loop in probe a lot and makes current (and future)
error handling easier to read.

Signed-off-by: Heiko Stuebner 
---
 drivers/phy/phy-rockchip-usb.c | 82 --
 1 file changed, 47 insertions(+), 35 deletions(-)

diff --git a/drivers/phy/phy-rockchip-usb.c b/drivers/phy/phy-rockchip-usb.c
index dda1994..f10e130 100644
--- a/drivers/phy/phy-rockchip-usb.c
+++ b/drivers/phy/phy-rockchip-usb.c
@@ -103,14 +103,57 @@ static void rockchip_usb_phy_action(void *data)
clk_put(rk_phy->clk);
 }
 
+static int rockchip_usb_phy_init(struct rockchip_usb_phy_base *base,
+struct device_node *child)
+{
+   struct rockchip_usb_phy *rk_phy;
+   unsigned int reg_offset;
+   int err;
+
+   rk_phy = devm_kzalloc(base->dev, sizeof(*rk_phy), GFP_KERNEL);
+   if (!rk_phy)
+   return -ENOMEM;
+
+   rk_phy->base = base;
+
+   if (of_property_read_u32(child, "reg", &reg_offset)) {
+   dev_err(base->dev, "missing reg property in node %s\n",
+   child->name);
+   return -EINVAL;
+   }
+
+   rk_phy->reg_offset = reg_offset;
+
+   rk_phy->clk = of_clk_get_by_name(child, "phyclk");
+   if (IS_ERR(rk_phy->clk))
+   rk_phy->clk = NULL;
+
+   err = devm_add_action(base->dev, rockchip_usb_phy_action, rk_phy);
+   if (err)
+   goto err_devm_action;
+
+   rk_phy->phy = devm_phy_create(base->dev, child, &ops);
+   if (IS_ERR(rk_phy->phy)) {
+   dev_err(base->dev, "failed to create PHY\n");
+   return PTR_ERR(rk_phy->phy);
+   }
+   phy_set_drvdata(rk_phy->phy, rk_phy);
+
+   /* only power up usb phy when it use, so disable it when init*/
+   return rockchip_usb_phy_power(rk_phy, 1);
+
+err_devm_action:
+   if (rk_phy->clk)
+   clk_put(rk_phy->clk);
+   return err;
+}
+
 static int rockchip_usb_phy_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
struct rockchip_usb_phy_base *phy_base;
-   struct rockchip_usb_phy *rk_phy;
struct phy_provider *phy_provider;
struct device_node *child;
-   unsigned int reg_offset;
int err;
 
phy_base = devm_kzalloc(dev, sizeof(*phy_base), GFP_KERNEL);
@@ -126,39 +169,8 @@ static int rockchip_usb_phy_probe(struct platform_device *pdev)
}
 
for_each_available_child_of_node(dev->of_node, child) {
-   rk_phy = devm_kzalloc(dev, sizeof(*rk_phy), GFP_KERNEL);
-   if (!rk_phy)
-   return -ENOMEM;
-
-   if (of_property_read_u32(child, "reg", &reg_offset)) {
-   dev_err(dev, "missing reg property in node %s\n",
-   child->name);
-   return -EINVAL;
-   }
-
-   rk_phy->reg_offset = reg_offset;
-
-   rk_phy->clk = of_clk_get_by_name(child, "phyclk");
-   if (IS_ERR(rk_phy->clk))
-   rk_phy->clk = NULL;
-
-   err = devm_add_action(dev, rockchip_usb_phy_action, rk_phy);
-   if (err) {
-   if (rk_phy->clk)
-   clk_put(rk_phy->clk);
-   return err;
-   }
-
-   rk_phy->phy = devm_phy_create(dev, child, &ops);
-   if (IS_ERR(rk_phy->phy)) {
-   dev_err(dev, "failed to create PHY\n");
-   return PTR_ERR(rk_phy->phy);
-   }
-   phy_set_drvdata(rk_phy->phy, rk_phy);
-
-   /* only power up usb phy when it use, so disable it when init*/
-   err = rockchip_usb_phy_power(rk_phy, 1);
-   if (err)
+   err = rockchip_usb_phy_init(phy_base, child);
+   if (err < 0)
return err;
}
 
-- 
2.6.2



Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-04 Thread Andy Lutomirski
On Wed, Nov 4, 2015 at 12:00 PM, Shaohua Li  wrote:
>
> The new proposal tries to fix the TLB issue. We introduce two madvise verbs:
>
> MARK_FREE. Userspace notifies the kernel that the memory range can be discarded.
> For now the kernel just records the range. Should memory pressure happen, page
> reclaim can free the memory directly regardless of the pte state.
>
> MARK_NOFREE. Userspace notifies the kernel that the memory range will be reused
> soon. The kernel deletes the record and prevents page reclaim from discarding
> the memory. If the memory isn't reclaimed, userspace will access the old memory,
> otherwise normal page fault handling applies.
>
> The point is to let userspace notify the kernel whether memory can be discarded,
> instead of depending on the pte dirty bit used by MADV_FREE. With these, no TLB
> flush is required till page reclaim actually frees the memory (page reclaim needs
> to do the TLB flush for MADV_FREE too). It still preserves the lazy memory free
> merit of MADV_FREE.
>
> Compared to MADV_FREE, reusing memory with the new proposal isn't transparent,
> e.g. userspace must call MARK_NOFREE. But it's easy to utilize the new API in
> jemalloc.
>

I can't speak to the usefulness of this or to other arches, but on x86
(unless you have nohz_full or similar enabled), a pair of syscalls
should be *much* faster than an IPI or a page fault.

I don't know how expensive it is to write to a clean page or to access
an unaccessed page on x86.  I'm sure it's not free (there's memory
bandwidth if nothing else), but it could be very cheap.
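
For concreteness, a hypothetical userspace flow for the verbs proposed above
could look like the sketch below. The MADV_MARK_* names and values are made up
purely for illustration; they do not exist in any kernel.

#include <stddef.h>
#include <sys/mman.h>

/* made-up values, only to show the proposed two-call pattern */
#define MADV_MARK_FREE          100
#define MADV_MARK_NOFREE        101

static void sketch_retire(void *buf, size_t len)
{
        /* allocator is done with buf; reclaim may drop it later, no flush yet */
        madvise(buf, len, MADV_MARK_FREE);
}

static void sketch_reuse(void *buf, size_t len)
{
        /* must be called before touching buf again under this proposal */
        madvise(buf, len, MADV_MARK_NOFREE);
        /* buf now holds either the old contents or freshly faulted zero pages */
}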

--Andy


Re: [PATCH 2/5] do_div(): generic optimization for constant divisor on 32-bit machines

2015-11-04 Thread Måns Rullgård
Nicolas Pitre  writes:

> On Tue, 3 Nov 2015, Arnd Bergmann wrote:
>
>> On Tuesday 03 November 2015 13:32:17 kbuild test robot wrote:
>> > 
>> > net/can/bcm.c: In function 'bcm_proc_show':
>> > >> net/can/bcm.c:223:1: warning: the frame size of 1156 bytes is larger than 1024 bytes [-Wframe-larger-than=]
>> > }
>> 
>> Interesting, that is a lot of stack for a function that only has a couple
>> of local variables:
>> 
>> #define IFNAMSIZ16
>> char ifname[IFNAMSIZ];
>> struct sock *sk = (struct sock *)m->private;
>> struct bcm_sock *bo = bcm_sk(sk);
>> struct bcm_op *op;
>> 
>> 
>> This is a parisc-allyesconfig kernel, so I assume that CONFIG_PROFILE_ALL_BRANCHES
>> is on, which instruments every 'if' in the kernel. If that causes problems,
>> we could decide to disable the do_div optimization whenever
>> CONFIG_PROFILE_ALL_BRANCHES is enabled.
>
> I have an ARM allyesconfig build here where that function needs a frame 
> of 88 bytes only. And that is with my do_div optimization applied.
>
> With the do_div optimization turned off, the stack frame is still 88 
> bytes.
>
> Turning on CONFIG_PROFILE_ALL_BRANCHES makes the frame size to grow to 
> 96 bytes.
>
> Keeping CONFIG_PROFILE_ALL_BRANCHES=y and activating the do_div
> optimization again, the function frame size goes back to 88 bytes.
>
> So I wonder what parisc gcc could be doing with this code.

I've seen parisc gcc do many strange things.

-- 
Måns Rullgård
m...@mansr.com


Re: [RESEND][PATCH] ARM: debug: add support for Palmchip 16550-like UART

2015-11-04 Thread Måns Rullgård
Arnd Bergmann  writes:

> On Tuesday 27 October 2015 12:57:57 Mans Rullgard wrote:
>> --- a/arch/arm/include/debug/8250.S
>> +++ b/arch/arm/include/debug/8250.S
>> @@ -9,6 +9,18 @@
>>   */
>>  #include <linux/serial_reg.h>
>>  
>> +#ifdef CONFIG_DEBUG_UART_8250_PALMCHIP
>> +
>> +#undef UART_TX
>> +#undef UART_LSR
>> +#undef UART_MSR
>> +
>> +#define UART_TX 1
>> +#define UART_LSR 7
>> +#define UART_MSR 8
>> +
>> +#endif
>> 
>
> Maybe use a separate file instead of an #ifdef?
>
> Something like
>
> arch/arm/include/debug/8250-palmchip.S:
>
> #include <linux/serial_reg.h>
>  
> #undef UART_TX
> #undef UART_LSR
> #undef UART_MSR
>
> #define UART_TX 1
> #define UART_LSR 7
> #define UART_MSR 8
>
> #include "8250.S"

Good idea.  I'll make a new patch.

-- 
Måns Rullgård
m...@mansr.com


Re: [PATCH v5] i40e: Look up MAC address in Open Firmware or IDPROM

2015-11-04 Thread Andy Shevchenko
On Wed, Nov 4, 2015 at 10:06 PM, Sowmini Varadhan wrote:
> On (11/04/15 21:59), Andy Shevchenko wrote:
>>
> See earlier response.

So, if the maintainer is okay with those, I'm also okay, and you may take my tag.


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-04 Thread Daniel Micay
> That's comparable to Android's pinning / unpinning API for ashmem and I
> think it makes sense if it's faster. It's different than the MADV_FREE
> API though, because the new allocations that are handed out won't have
> the usual lazy commit which MADV_FREE provides. Pages in an allocation
> that's handed out can still be dropped until they are actually written
> to. It's considered active by jemalloc either way, but only a subset of
> the active pages are actually committed. There's probably a use case for
> both of these systems.

Also, consider that MADV_FREE would allow jemalloc to be extremely
aggressive with purging when it actually has to do it. It can start with
the largest span of memory and it can mark more than strictly necessary
to drop below the ratio as there's no cost to using the memory again
(not even a system call).

Since the main cost is issuing the system call at all, there's going to be
pressure to mark the largest possible spans in one go. That means
concentrating on memory compaction will improve performance. I think
that's the right direction for the kernel to be guiding userspace, and it
will play better with THP than the allocator trying to be very precise
with purging based on aging.
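
To anchor that, the allocator-side pattern is just one call per retired span.
A minimal sketch, assuming a libc that exposes the MADV_FREE value added by
this series:

#include <stddef.h>
#include <sys/mman.h>

static void sketch_purge_span(void *span, size_t len)
{
        /* pages stay mapped and usable; the kernel may reclaim them lazily */
        madvise(span, len, MADV_FREE);
}

/*
 * Reuse is transparent: the first write to a page either finds the old data
 * still present or faults in a fresh zero page, with no second syscall needed.
 */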




