date:20170412

[tip:x86/mm] x86/mm: Fix dump pagetables for 4 levels of page tables

2017-04-12 Thread tip-bot for Juergen Gross

Commit-ID:  84bbabc3a452e8085cfbd745ff0bff2b89074417
Gitweb: http://git.kernel.org/tip/84bbabc3a452e8085cfbd745ff0bff2b89074417
Author: Juergen Gross 
AuthorDate: Wed, 12 Apr 2017 16:36:34 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 13 Apr 2017 00:26:30 +0200

x86/mm: Fix dump pagetables for 4 levels of page tables

Commit fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level
paging") introduced an error for dumping with only 4 levels by setting
PGD_LEVEL_MULT to a wrong value.

This is leading to e.g. addresses printed as "(null)" for ranges:

  x86/mm: Found insecure W+X mapping at address (null)/(null)

Make PGD_LEVEL_MULT a multiple of PTRS_PER_P4D instead of PTRS_PER_PUD

Fixes: fdd3d8ce0ea62 ("x86/dump_pagetables: Add support for 5-level paging")
Signed-off-by: Juergen Gross 
Reviewed-by: Kirill A. Shutemov 
Link: http://lkml.kernel.org/r/20170412143634.6846-1-jgr...@suse.com
Signed-off-by: Thomas Gleixner 

---
 arch/x86/mm/dump_pagetables.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 9f305be..bce6990 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -111,7 +111,7 @@ static struct addr_marker address_markers[] = {
 #define PMD_LEVEL_MULT (PTRS_PER_PTE * PTE_LEVEL_MULT)
 #define PUD_LEVEL_MULT (PTRS_PER_PMD * PMD_LEVEL_MULT)
 #define P4D_LEVEL_MULT (PTRS_PER_PUD * PUD_LEVEL_MULT)
-#define PGD_LEVEL_MULT (PTRS_PER_PUD * P4D_LEVEL_MULT)
+#define PGD_LEVEL_MULT (PTRS_PER_P4D * P4D_LEVEL_MULT)
 
 #define pt_dump_seq_printf(m, to_dmesg, fmt, args...)  \
 ({ \

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> We can futz with that and have them specify which chain (or both)
> that they want to be added to.

Well, I didn't want the atomic chain to be a notifier because we can
keep it simple and non-blocking. Only the process context one will be.

So the question is, do we even have a use case for outside consumers
hanging on the atomic chain? Because if not, we're good to go.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 03:26:19PM -0700, Luck, Tony wrote:
> We can futz with that and have them specify which chain (or both)
> that they want to be added to.

Well, I didn't want the atomic chain to be a notifier because we can
keep it simple and non-blocking. Only the process context one will be.

So the question is, do we even have a use case for outside consumers
hanging on the atomic chain? Because if not, we're good to go.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

[PATCH] kbuild: drop -Wno-unknown-warning-option from clang options

2017-04-12 Thread Masahiro Yamada

Since commit c3f0d0bc5b01 ("kbuild, LLVMLinux: Add -Werror to
cc-option to support clang"), cc-option and friends work correctly
for clang.

However, the combination of -Werror and -Wno-unknown-warning-option
makes clang happy with any unknown warning options.

Once -Wno-unknown-warning-option is added, any succeeding call of
cc-disable-warning is evaluated positive, then unknown warning
options are accepted.  This should be dropped.

Signed-off-by: Masahiro Yamada 
---

 Makefile   | 1 -
 scripts/Makefile.extrawarn | 1 -
 2 files changed, 2 deletions(-)

diff --git a/Makefile b/Makefile
index 57a3695..03a9f9b 100644
--- a/Makefile
+++ b/Makefile
@@ -696,7 +696,6 @@ KBUILD_CFLAGS += $(stackp-flag)
 
 ifeq ($(cc-name),clang)
 KBUILD_CPPFLAGS += $(call cc-option,-Qunused-arguments,)
-KBUILD_CPPFLAGS += $(call cc-option,-Wno-unknown-warning-option,)
 KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 7c321a6..fb3522f 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -64,7 +64,6 @@ ifeq ($(cc-name),clang)
 KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
 KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
 KBUILD_CFLAGS += $(call cc-disable-warning, format)
-KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
 KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
 KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
-- 
2.7.4

[PATCH] kbuild: drop -Wno-unknown-warning-option from clang options

2017-04-12 Thread Masahiro Yamada

Since commit c3f0d0bc5b01 ("kbuild, LLVMLinux: Add -Werror to
cc-option to support clang"), cc-option and friends work correctly
for clang.

However, the combination of -Werror and -Wno-unknown-warning-option
makes clang happy with any unknown warning options.

Once -Wno-unknown-warning-option is added, any succeeding call of
cc-disable-warning is evaluated positive, then unknown warning
options are accepted.  This should be dropped.

Signed-off-by: Masahiro Yamada 
---

 Makefile   | 1 -
 scripts/Makefile.extrawarn | 1 -
 2 files changed, 2 deletions(-)

diff --git a/Makefile b/Makefile
index 57a3695..03a9f9b 100644
--- a/Makefile
+++ b/Makefile
@@ -696,7 +696,6 @@ KBUILD_CFLAGS += $(stackp-flag)
 
 ifeq ($(cc-name),clang)
 KBUILD_CPPFLAGS += $(call cc-option,-Qunused-arguments,)
-KBUILD_CPPFLAGS += $(call cc-option,-Wno-unknown-warning-option,)
 KBUILD_CFLAGS += $(call cc-disable-warning, unused-variable)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier)
 KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index 7c321a6..fb3522f 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -64,7 +64,6 @@ ifeq ($(cc-name),clang)
 KBUILD_CFLAGS += $(call cc-disable-warning, initializer-overrides)
 KBUILD_CFLAGS += $(call cc-disable-warning, unused-value)
 KBUILD_CFLAGS += $(call cc-disable-warning, format)
-KBUILD_CFLAGS += $(call cc-disable-warning, unknown-warning-option)
 KBUILD_CFLAGS += $(call cc-disable-warning, sign-compare)
 KBUILD_CFLAGS += $(call cc-disable-warning, format-zero-length)
 KBUILD_CFLAGS += $(call cc-disable-warning, uninitialized)
-- 
2.7.4

Re: net/ipv4: use-after-free in ipv4_datagram_support_cmsg

2017-04-12 Thread Willem de Bruijn

On Wed, Apr 12, 2017 at 4:47 PM, Eric Dumazet  wrote:
> On Wed, 2017-04-12 at 13:07 -0700, Cong Wang wrote:
>> On Wed, Apr 12, 2017 at 8:39 AM, Willem de Bruijn
>>  wrote:
>> > ===
>> >> BUG: KASAN: use-after-free in ipv4_datagram_support_cmsg
>> >> net/ipv4/ip_sockglue.c:500 [inline] at addr 880059be0128
>> >
>> > Thanks for the report. This is accessing skb->dev from within recvmsg() at 
>> > line
>> >
>> > info->ipi_ifindex = skb->dev->ifindex;
>> >
>> > Introduced in 829ae9d61165 ("net-timestamp: allow reading recv cmsg on
>> > errqueue with origin tstamp"). At this time the device may indeed have
>> > gone away. I'm having a look at a way to read this in the receive BH
>> > and store the ifindex.
>>
>> Why not use skb_iif?

This code is called from the error path for transmit timestamps.

We can make use of the fact that SKB_EXT_ERR used on enqueue has iif as
the first field in its control block. This also holds for the PKTINFO_SKB_CB
struct to which skb->cb is cast on dequeue when it copies pktinfo to userspace.
So if set on enqueue in __skb_complete_tx_timestamp, no conversion operation
is even needed on dequeue, let alone the currently buggy line that touches
skb->dev.

This iif cast was added for this purpose in the receive path in 0b922b7a829c
("net: original ingress device index in PKTINFO").

The device pointer is valid on enqueue for all paths called from device drivers,
as well as from dev_queue_xmit for SCM_TSTAMP_SCHED generation in
__dev_queue_xmit. The exception is SCM_TSTAMP_ACK generation, but
there skb->dev is NULL.

The v6 path does need a conversion, but already does this in
ip6_datagram_recv_common_ctl. There, too, we can remove the buggy
logic to set it from skb->dev->ifindex in ip6_datagram_support_cmsg.

I will send a patch.

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Thu, Apr 13, 2017 at 12:16:39AM +0200, Borislav Petkov wrote:
> ... and it is midnight here so I could be talking crap but we probably
> should really split the reporting "chain" into two:

This shouldn't be too painful. Users ask to be put on the chain via
the wrapper:

void mce_register_decode_chain(struct notifier_block *nb)
{
atomic_inc(_notifiers);

WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

atomic_notifier_chain_register(_mce_decoder_chain, nb);
}

We can futz with that and have them specify which chain (or both)
that they want to be added to.

-Tony

Re: net/ipv4: use-after-free in ipv4_datagram_support_cmsg

2017-04-12 Thread Willem de Bruijn

On Wed, Apr 12, 2017 at 4:47 PM, Eric Dumazet  wrote:
> On Wed, 2017-04-12 at 13:07 -0700, Cong Wang wrote:
>> On Wed, Apr 12, 2017 at 8:39 AM, Willem de Bruijn
>>  wrote:
>> > ===
>> >> BUG: KASAN: use-after-free in ipv4_datagram_support_cmsg
>> >> net/ipv4/ip_sockglue.c:500 [inline] at addr 880059be0128
>> >
>> > Thanks for the report. This is accessing skb->dev from within recvmsg() at 
>> > line
>> >
>> > info->ipi_ifindex = skb->dev->ifindex;
>> >
>> > Introduced in 829ae9d61165 ("net-timestamp: allow reading recv cmsg on
>> > errqueue with origin tstamp"). At this time the device may indeed have
>> > gone away. I'm having a look at a way to read this in the receive BH
>> > and store the ifindex.
>>
>> Why not use skb_iif?

This code is called from the error path for transmit timestamps.

We can make use of the fact that SKB_EXT_ERR used on enqueue has iif as
the first field in its control block. This also holds for the PKTINFO_SKB_CB
struct to which skb->cb is cast on dequeue when it copies pktinfo to userspace.
So if set on enqueue in __skb_complete_tx_timestamp, no conversion operation
is even needed on dequeue, let alone the currently buggy line that touches
skb->dev.

This iif cast was added for this purpose in the receive path in 0b922b7a829c
("net: original ingress device index in PKTINFO").

The device pointer is valid on enqueue for all paths called from device drivers,
as well as from dev_queue_xmit for SCM_TSTAMP_SCHED generation in
__dev_queue_xmit. The exception is SCM_TSTAMP_ACK generation, but
there skb->dev is NULL.

The v6 path does need a conversion, but already does this in
ip6_datagram_recv_common_ctl. There, too, we can remove the buggy
logic to set it from skb->dev->ifindex in ip6_datagram_support_cmsg.

I will send a patch.

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Thu, Apr 13, 2017 at 12:16:39AM +0200, Borislav Petkov wrote:
> ... and it is midnight here so I could be talking crap but we probably
> should really split the reporting "chain" into two:

This shouldn't be too painful. Users ask to be put on the chain via
the wrapper:

void mce_register_decode_chain(struct notifier_block *nb)
{
atomic_inc(_notifiers);

WARN_ON(nb->priority > MCE_PRIO_LOWEST && nb->priority < MCE_PRIO_EDAC);

atomic_notifier_chain_register(_mce_decoder_chain, nb);
}

We can futz with that and have them specify which chain (or both)
that they want to be added to.

-Tony

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 11:47:49PM +0200, Borislav Petkov wrote:
> so we need to think about something better to handle events down the
> whole chain. Maybe route events from the atomic path to the blocking
> path where the sleeping notifier callbacks can sleep as much as they
> want to...

... and it is midnight here so I could be talking crap but we probably
should really split the reporting "chain" into two:

1. atomic context which doesn't do any notifier wankery. We simply call
non-sleeping functions one after the other. And those should be fast and
not taking any locks.

For example, I'm thinking you want to decode the error and that's it. So
we can replace the notifier call in print_mce() with our atomic context
function.

Then:

2. move the notifier completely in process context, convert it to a
blocking one and connect there all the noodling stuff like EDAC, NFIT,
mcelog, EXTLOG, ... I.e., all the logging machinery.

Problem solved.

And here's where you come in and say, "Boris, you're talking crap.." :-)

Lemme sleep on it.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 11:47:49PM +0200, Borislav Petkov wrote:
> so we need to think about something better to handle events down the
> whole chain. Maybe route events from the atomic path to the blocking
> path where the sleeping notifier callbacks can sleep as much as they
> want to...

... and it is midnight here so I could be talking crap but we probably
should really split the reporting "chain" into two:

1. atomic context which doesn't do any notifier wankery. We simply call
non-sleeping functions one after the other. And those should be fast and
not taking any locks.

For example, I'm thinking you want to decode the error and that's it. So
we can replace the notifier call in print_mce() with our atomic context
function.

Then:

2. move the notifier completely in process context, convert it to a
blocking one and connect there all the noodling stuff like EDAC, NFIT,
mcelog, EXTLOG, ... I.e., all the logging machinery.

Problem solved.

And here's where you come in and say, "Boris, you're talking crap.." :-)

Lemme sleep on it.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [PATCH v2 08/17] fs: retrofit old error reporting API onto new infrastructure

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:

> Now that we have a better way to store and report errors that occur
> during writeback, we need to convert the existing codebase to use it. We
> could just adapt all of the filesystem code and related infrastructure
> to the new API, but that's a lot of churn.
>
> When it comes to setting errors in the mapping, filemap_set_wb_error is
> a drop-in replacement for mapping_set_error. Turn that function into a
> simple wrapper around the new one.
>
> Because we want to ensure that writeback errors are always reported at
> fsync time, inject filemap_report_wb_error calls much closer to the
> syscall boundary. For fsync calls (and things like the nfsd equivalent),
> we either return the error that the fsync operation returns, or the one
> returned by filemap_report_wb_error. In both cases, we advance the
> file->f_wb_err to the latest value.
>
> The final piece of the puzzle is what to do about filemap_check_errors
> calls that are being called directly or via filemap_* functions. Here,
> we must take a little "creative license".
>
> Since we now handle advancing the file->f_wb_err value at the generic
> filesystem layer, we no longer need those callers to do anything like
> that, or clear errors out of the mapping. A lot of the existing codebase
> relies on being getting an error back from those functions when there is
> a writeback problem, so we do still want to have them report writeback
> errors somehow.
>
> When reporting writeback errors, we will always report errors that have
> occurred since a particular point in time. With the old writeback error
> reporting, the time we used was "since it was last tested/cleared" which
> is entirely arbitrary and potentially racy. Now, we can at least report
> the latest error that has occurred since an arbitrary point in time
> (represented as a sampled wb_err_t value).
>
> In the case where we don't have a struct file to work with, this patch
> just has the wrappers sample the mapping->wb_err value, and use that as
> an arbitrary point from which to check for errors.
>
> That's probably not ideal, particularly in the case of something like
> filemap_fdatawait, but I'm not sure it's any worse than what we
> already have, and this gives us a basis from which to work.

This aspect of the patch looked rather odd -- sampling a wb_err_t at
some arbitrary time, and then comparing it later.  So that for going to
the trouble of explaining it.

I suspect that the filemap_check_wb_error() will need to be moved
into some parent of the current call site, which is essentially what you
suggest below.  It would be nice if we could do that first, rather than
having the current rather odd code.  But maybe this way is an easier
transition.  It isn't obviously wrong, it just isn't obviously right
either.

>  int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int 
> datasync)
>  {
> + int ret, ret2;
>   struct inode *inode = file->f_mapping->host;
>  
>   if (!file->f_op->fsync)
> @@ -192,7 +193,9 @@ int vfs_fsync_range(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   spin_unlock(>i_lock);
>   mark_inode_dirty_sync(inode);
>   }
> - return call_fsync(file, start, end, datasync);
> + ret = call_fsync(file, start, end, datasync);
> + ret2 = filemap_report_wb_error(file);
> + return ret ? : ret2;
>  }
>  EXPORT_SYMBOL(vfs_fsync_range);
>  


>  static int shm_fsync(struct file *file, loff_t start, loff_t end, int 
> datasync)
>  {
> + int ret, ret2;
>   struct shm_file_data *sfd = shm_file_data(file);
>  
>   if (!sfd->file->f_op->fsync)
>   return -EINVAL;
> - return call_fsync(sfd->file, start, end, datasync);
> + ret = call_fsync(sfd->file, start, end, datasync);
> + ret2 = filemap_report_wb_error(file);
> + return ret ? : ret2;
>  }

These are the only two places that call_fsync() is called.
Did you consider moving the filemap_report_wb_error() call into
call_fsync() ??

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: [PATCH v2 08/17] fs: retrofit old error reporting API onto new infrastructure

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:

> Now that we have a better way to store and report errors that occur
> during writeback, we need to convert the existing codebase to use it. We
> could just adapt all of the filesystem code and related infrastructure
> to the new API, but that's a lot of churn.
>
> When it comes to setting errors in the mapping, filemap_set_wb_error is
> a drop-in replacement for mapping_set_error. Turn that function into a
> simple wrapper around the new one.
>
> Because we want to ensure that writeback errors are always reported at
> fsync time, inject filemap_report_wb_error calls much closer to the
> syscall boundary. For fsync calls (and things like the nfsd equivalent),
> we either return the error that the fsync operation returns, or the one
> returned by filemap_report_wb_error. In both cases, we advance the
> file->f_wb_err to the latest value.
>
> The final piece of the puzzle is what to do about filemap_check_errors
> calls that are being called directly or via filemap_* functions. Here,
> we must take a little "creative license".
>
> Since we now handle advancing the file->f_wb_err value at the generic
> filesystem layer, we no longer need those callers to do anything like
> that, or clear errors out of the mapping. A lot of the existing codebase
> relies on being getting an error back from those functions when there is
> a writeback problem, so we do still want to have them report writeback
> errors somehow.
>
> When reporting writeback errors, we will always report errors that have
> occurred since a particular point in time. With the old writeback error
> reporting, the time we used was "since it was last tested/cleared" which
> is entirely arbitrary and potentially racy. Now, we can at least report
> the latest error that has occurred since an arbitrary point in time
> (represented as a sampled wb_err_t value).
>
> In the case where we don't have a struct file to work with, this patch
> just has the wrappers sample the mapping->wb_err value, and use that as
> an arbitrary point from which to check for errors.
>
> That's probably not ideal, particularly in the case of something like
> filemap_fdatawait, but I'm not sure it's any worse than what we
> already have, and this gives us a basis from which to work.

This aspect of the patch looked rather odd -- sampling a wb_err_t at
some arbitrary time, and then comparing it later.  So that for going to
the trouble of explaining it.

I suspect that the filemap_check_wb_error() will need to be moved
into some parent of the current call site, which is essentially what you
suggest below.  It would be nice if we could do that first, rather than
having the current rather odd code.  But maybe this way is an easier
transition.  It isn't obviously wrong, it just isn't obviously right
either.

>  int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int 
> datasync)
>  {
> + int ret, ret2;
>   struct inode *inode = file->f_mapping->host;
>  
>   if (!file->f_op->fsync)
> @@ -192,7 +193,9 @@ int vfs_fsync_range(struct file *file, loff_t start, 
> loff_t end, int datasync)
>   spin_unlock(>i_lock);
>   mark_inode_dirty_sync(inode);
>   }
> - return call_fsync(file, start, end, datasync);
> + ret = call_fsync(file, start, end, datasync);
> + ret2 = filemap_report_wb_error(file);
> + return ret ? : ret2;
>  }
>  EXPORT_SYMBOL(vfs_fsync_range);
>  


>  static int shm_fsync(struct file *file, loff_t start, loff_t end, int 
> datasync)
>  {
> + int ret, ret2;
>   struct shm_file_data *sfd = shm_file_data(file);
>  
>   if (!sfd->file->f_op->fsync)
>   return -EINVAL;
> - return call_fsync(sfd->file, start, end, datasync);
> + ret = call_fsync(sfd->file, start, end, datasync);
> + ret2 = filemap_report_wb_error(file);
> + return ret ? : ret2;
>  }

These are the only two places that call_fsync() is called.
Did you consider moving the filemap_report_wb_error() call into
call_fsync() ??

Thanks,
NeilBrown


signature.asc
Description: PGP signature

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-12 Thread Andrew Lunn

> >>> To give some more background and rational for this change.
> >>>
> >>> On a platform where we have a parent MDIO bus, backed by the
> >>> mdio-bcm-unimac.c driver, we also register a slave MII bus (through
> >>> net/dsa/dsa2.c) which is parented to this UniMAC MDIO bus through an
> >>> assignment of of_node. This slave MII bus is created in order to
> >>> intercept reads/writes to problematic addresses (e.g: that clashes with
> >>> another piece of hardware).
> >>>
> >>> This means that the slave DSA MII bus inherits all child nodes from the
> >>> originating master MII bus. This also means that when the slave MII bus
> >>> is probed via of_mdiobus_register(), we probe the same devices twice:
> >>> once through the master, another time through the slave.
> >>
> >> Ah, O.K. This makes more sense. On the hardware i have, we get three
> >> deep in MDIO busses. We have the FEC mdio bus. On top of that we have
> >> a gpio-mux-mdio, and on top of that we have the mv88e6xxx mdio
> >> bus. And i've never seen issues.
> >>
> >> So your real problem here is you have two mdio busses using the same
> >> device tree properties. I would actually say that is just plain
> >> broken.
> > 
> > From a Device Tree/HW representation perspective, we do have the
> > external BCM53125 switch physically attached to the 7445/7278
> > SWITCH_MDIO bus (backed by mdio-bcm-unimac) so in that regard the
> > representation is correct. There is also an integrated Gigabit PHY
> > (bcm7xxx) which is attached to that bus.

This is made harder by you talking about a board which does not appear
to have its DT file in mainline. So i'm having to guess what it looks
like.

So what i think we are talking about is this bit of code:

static int bcm_sf2_mdio_register(struct dsa_switch *ds)
{
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
struct device_node *dn;
static int index;
int err;

/* Find our integrated MDIO bus node */
dn = of_find_compatible_node(NULL, NULL, "brcm,unimac-mdio");
priv->master_mii_bus = of_mdio_find_bus(dn);
if (!priv->master_mii_bus)
return -EPROBE_DEFER;

get_device(>master_mii_bus->dev);
priv->master_mii_dn = dn;

priv->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!priv->slave_mii_bus)
return -ENOMEM;

priv->slave_mii_bus->priv = priv;
priv->slave_mii_bus->name = "sf2 slave mii";
priv->slave_mii_bus->read = bcm_sf2_sw_mdio_read;
priv->slave_mii_bus->write = bcm_sf2_sw_mdio_write;
snprintf(priv->slave_mii_bus->id, MII_BUS_ID_SIZE, "sf2-%d",
 index++);
priv->slave_mii_bus->dev.of_node = dn;

If i get you right, your switch is hanging off the MDIO bus
"brcm,unimac-mdio" you find the dn for. You then register another MDIO
bus using the exact same node? How does that make any sense? Isn't it
a physical separate MDIO bus? So it should have its own set of nodes
in the device tree. This is how we do it for the Marvell switches. See
Documentation/devicetree/binding/net/dsa/marvell.txt and
arch/arm/boot/dts/vf610-zii-dev-rev-b.dts. That DT blob uses
phy-handle to link the switch ports to the phys on the mdio bus.

   Andrew

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-12 Thread Andrew Lunn

> >>> To give some more background and rational for this change.
> >>>
> >>> On a platform where we have a parent MDIO bus, backed by the
> >>> mdio-bcm-unimac.c driver, we also register a slave MII bus (through
> >>> net/dsa/dsa2.c) which is parented to this UniMAC MDIO bus through an
> >>> assignment of of_node. This slave MII bus is created in order to
> >>> intercept reads/writes to problematic addresses (e.g: that clashes with
> >>> another piece of hardware).
> >>>
> >>> This means that the slave DSA MII bus inherits all child nodes from the
> >>> originating master MII bus. This also means that when the slave MII bus
> >>> is probed via of_mdiobus_register(), we probe the same devices twice:
> >>> once through the master, another time through the slave.
> >>
> >> Ah, O.K. This makes more sense. On the hardware i have, we get three
> >> deep in MDIO busses. We have the FEC mdio bus. On top of that we have
> >> a gpio-mux-mdio, and on top of that we have the mv88e6xxx mdio
> >> bus. And i've never seen issues.
> >>
> >> So your real problem here is you have two mdio busses using the same
> >> device tree properties. I would actually say that is just plain
> >> broken.
> > 
> > From a Device Tree/HW representation perspective, we do have the
> > external BCM53125 switch physically attached to the 7445/7278
> > SWITCH_MDIO bus (backed by mdio-bcm-unimac) so in that regard the
> > representation is correct. There is also an integrated Gigabit PHY
> > (bcm7xxx) which is attached to that bus.

This is made harder by you talking about a board which does not appear
to have its DT file in mainline. So i'm having to guess what it looks
like.

So what i think we are talking about is this bit of code:

static int bcm_sf2_mdio_register(struct dsa_switch *ds)
{
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
struct device_node *dn;
static int index;
int err;

/* Find our integrated MDIO bus node */
dn = of_find_compatible_node(NULL, NULL, "brcm,unimac-mdio");
priv->master_mii_bus = of_mdio_find_bus(dn);
if (!priv->master_mii_bus)
return -EPROBE_DEFER;

get_device(>master_mii_bus->dev);
priv->master_mii_dn = dn;

priv->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
if (!priv->slave_mii_bus)
return -ENOMEM;

priv->slave_mii_bus->priv = priv;
priv->slave_mii_bus->name = "sf2 slave mii";
priv->slave_mii_bus->read = bcm_sf2_sw_mdio_read;
priv->slave_mii_bus->write = bcm_sf2_sw_mdio_write;
snprintf(priv->slave_mii_bus->id, MII_BUS_ID_SIZE, "sf2-%d",
 index++);
priv->slave_mii_bus->dev.of_node = dn;

If i get you right, your switch is hanging off the MDIO bus
"brcm,unimac-mdio" you find the dn for. You then register another MDIO
bus using the exact same node? How does that make any sense? Isn't it
a physical separate MDIO bus? So it should have its own set of nodes
in the device tree. This is how we do it for the Marvell switches. See
Documentation/devicetree/binding/net/dsa/marvell.txt and
arch/arm/boot/dts/vf610-zii-dev-rev-b.dts. That DT blob uses
phy-handle to link the switch ports to the phys on the mdio bus.

   Andrew

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-12 Thread Benjamin Herrenschmidt

On Wed, 2017-04-12 at 11:09 -0600, Logan Gunthorpe wrote:
> 
> > Do you handle funky address translation too ? IE. the fact that the PCI
> > addresses aren't the same as the CPU physical addresses for a BAR ?
> 
> No, we use the CPU physical address of the BAR. If it's not mapped that
> way we can't use it.

Ok, you need to fix that or a bunch of architectures won't work. 

Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They
will perform the conversion between the struct resource content (CPU
physical address) and the actual PCI bus side address.

When behind the same switch you need to use PCI addresses. If one tries
later to do P2P between host bridges (via the CPU fabric) things get
more complex and one will have to use either CPU addresses or something
else alltogether (probably would have to teach the arch DMA mapping
routines to work with those struct pages you create and return the
right thing).

> > > This will mean many setups that could likely
> > > work well will not be supported so that we can be more confident it
> > > will work and not place any responsibility on the user to understand
> > > their topology. (We've chosen to go this route based on feedback we
> > > received at LSF).
> > > 
> > > In order to enable this functionality we introduce a new p2pmem device
> > > which can be instantiated by PCI drivers. The device will register some
> > > PCI memory as ZONE_DEVICE and provide an genalloc based allocator for
> > > users of these devices to get buffers.
> > 
> > I don't completely understand this. This is actual memory on the PCI
> > bus ? Where does it come from ? Or are you just trying to create struct
> > pages that cover your PCIe DMA target ?
> 
> Yes, the memory is on the PCI bus in a BAR. For now we have a special
> PCI card for this, but in the future it would likely be the CMB in an
> NVMe card. These patches create struct pages to map these BAR addresses
> using ZONE_DEVICE.

Ok.

So ideally we'd want things like dma_map_* to be able to be fed those
struct pages and do the right thing which is ... tricky, especially
with the address translation I mentioned since the address will be
different whether the initiator is on the same host bridge as the
target or not.

> > So correct me if I'm wrong, you are trying to create struct page's that
> > map a PCIe BAR right ? I'm trying to understand how that interacts with
> > what Jerome is doing for HMM.
> 
> Yes, well we are using ZONE_DEVICE in the exact same way as the dax code
> is. These patches use the existing API with no modifications. As I
> understand it, HMM was using ZONE_DEVICE in a way that was quite
> different to how it was originally designed.

Sort-of. I don't see why there would be a conflict with the struct
pages use though. Jerome can you chime in ? Jerome: It looks like they
are just laying out struct page over a BAR which is the same thing I
think you should do when the BAR is "large enough" for the GPU memory.

The case where HMM uses "holes" in the address space for its struct
page is somewhat orthogonal but I also see no conflict here.

> > The reason is that the HMM currently creates the struct pages with
> > "fake" PFNs pointing to a hole in the address space rather than
> > covering the actual PCIe memory of the GPU. He does that to deal with
> > the fact that some GPUs have a smaller aperture on PCIe than their
> > total memory.
> 
> I'm aware of what HMM is trying to do and although I'm not familiar with
> the intimate details, I saw it as fairly orthogonal to what we are
> attempting to do.

Right.

> > However, I have asked him to only apply that policy if the aperture is
> > indeed smaller, and if not, create struct pages that directly cover the
> > PCIe BAR of the GPU instead, which will work better on systems or
> > architecture that don't have a "pinhole" window limitation.
> > However he was under the impression that this was going to collide with
> > what you guys are doing, so I'm trying to understand how. 
> 
> I'm not sure I understand how either. However, I suspect if you collide
> with these patches then you'd also be breaking dax too.

Possibly but as I said, I don't see why so I'll let Jerome chime in
since he was under the impression that there was a conflict here :-)

Cheers,
Ben.

Re: [RFC 0/8] Copy Offload with Peer-to-Peer PCI Memory

2017-04-12 Thread Benjamin Herrenschmidt

On Wed, 2017-04-12 at 11:09 -0600, Logan Gunthorpe wrote:
> 
> > Do you handle funky address translation too ? IE. the fact that the PCI
> > addresses aren't the same as the CPU physical addresses for a BAR ?
> 
> No, we use the CPU physical address of the BAR. If it's not mapped that
> way we can't use it.

Ok, you need to fix that or a bunch of architectures won't work. 

Look at pcibios_resource_to_bus() and pcibios_bus_to_resource(). They
will perform the conversion between the struct resource content (CPU
physical address) and the actual PCI bus side address.

When behind the same switch you need to use PCI addresses. If one tries
later to do P2P between host bridges (via the CPU fabric) things get
more complex and one will have to use either CPU addresses or something
else alltogether (probably would have to teach the arch DMA mapping
routines to work with those struct pages you create and return the
right thing).

> > > This will mean many setups that could likely
> > > work well will not be supported so that we can be more confident it
> > > will work and not place any responsibility on the user to understand
> > > their topology. (We've chosen to go this route based on feedback we
> > > received at LSF).
> > > 
> > > In order to enable this functionality we introduce a new p2pmem device
> > > which can be instantiated by PCI drivers. The device will register some
> > > PCI memory as ZONE_DEVICE and provide an genalloc based allocator for
> > > users of these devices to get buffers.
> > 
> > I don't completely understand this. This is actual memory on the PCI
> > bus ? Where does it come from ? Or are you just trying to create struct
> > pages that cover your PCIe DMA target ?
> 
> Yes, the memory is on the PCI bus in a BAR. For now we have a special
> PCI card for this, but in the future it would likely be the CMB in an
> NVMe card. These patches create struct pages to map these BAR addresses
> using ZONE_DEVICE.

Ok.

So ideally we'd want things like dma_map_* to be able to be fed those
struct pages and do the right thing which is ... tricky, especially
with the address translation I mentioned since the address will be
different whether the initiator is on the same host bridge as the
target or not.

> > So correct me if I'm wrong, you are trying to create struct page's that
> > map a PCIe BAR right ? I'm trying to understand how that interacts with
> > what Jerome is doing for HMM.
> 
> Yes, well we are using ZONE_DEVICE in the exact same way as the dax code
> is. These patches use the existing API with no modifications. As I
> understand it, HMM was using ZONE_DEVICE in a way that was quite
> different to how it was originally designed.

Sort-of. I don't see why there would be a conflict with the struct
pages use though. Jerome can you chime in ? Jerome: It looks like they
are just laying out struct page over a BAR which is the same thing I
think you should do when the BAR is "large enough" for the GPU memory.

The case where HMM uses "holes" in the address space for its struct
page is somewhat orthogonal but I also see no conflict here.

> > The reason is that the HMM currently creates the struct pages with
> > "fake" PFNs pointing to a hole in the address space rather than
> > covering the actual PCIe memory of the GPU. He does that to deal with
> > the fact that some GPUs have a smaller aperture on PCIe than their
> > total memory.
> 
> I'm aware of what HMM is trying to do and although I'm not familiar with
> the intimate details, I saw it as fairly orthogonal to what we are
> attempting to do.

Right.

> > However, I have asked him to only apply that policy if the aperture is
> > indeed smaller, and if not, create struct pages that directly cover the
> > PCIe BAR of the GPU instead, which will work better on systems or
> > architecture that don't have a "pinhole" window limitation.
> > However he was under the impression that this was going to collide with
> > what you guys are doing, so I'm trying to understand how. 
> 
> I'm not sure I understand how either. However, I suspect if you collide
> with these patches then you'd also be breaking dax too.

Possibly but as I said, I don't see why so I'll let Jerome chime in
since he was under the impression that there was a conflict here :-)

Cheers,
Ben.

RE: [PATCH] ACPICA: Export mutex functions

2017-04-12 Thread Moore, Robert



> -Original Message-
> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Sent: Wednesday, April 12, 2017 2:23 PM
> To: Moore, Robert 
> Cc: Zheng, Lv ; Wysocki, Rafael J
> ; Len Brown ; linux-
> a...@vger.kernel.org; de...@acpica.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ACPICA: Export mutex functions
> 
> On Wed, Apr 12, 2017 at 03:29:55PM +, Moore, Robert wrote:
> > The ACPICA mutex functions are based on the host OS functions, so they
> don't really buy you anything. You should just use the native Linux
> functions.
> >
> 
> You mean they don't really acquire the requested ACPI mutex, and the
> underlying DSDT which declares and uses the mutex just ignores if the
> mutex was acquired by acpi_acquire_mutex() ?
> 
[Moore, Robert] 

OK, I understand now. Yes, these mutex interfaces are in fact intended for the 
purpose you mention:

* FUNCTION:AcpiAcquireMutex
 *
 * PARAMETERS:  Handle  - Mutex or prefix handle (optional)
 *  Pathname- Mutex pathname (optional)
 *  Timeout - Max time to wait for the lock (millisec)
 *
 * RETURN:  Status
 *
 * DESCRIPTION: Acquire an AML mutex. This is a device driver interface to
 *  AML mutex objects, and allows for transaction locking between
 *  drivers and AML code. The mutex node is pointed to by
 *  Handle:Pathname. Either Handle or Pathname can be NULL, but
 *  not both.


And yes, both the acquire and release interfaces should be exported.





> To clarify: You are saying that code such as
> 
>   acpi_status status;
> 
>   status = acpi_acquire_mutex(NULL, "\\_SB.PCI0.SBRG.SIO1.MUT0",
> 0x10);
>   if (ACPI_FAILURE(status)) {
>   pr_err("Failed to acquire ACPI mutex\n");
>   return -EBUSY;
>   }
>   ...
> 
> when used in conjunction with
> 
>   ...
>   Mutex (MUT0, 0x00)
>   Method (ENFG, 1, NotSerialized)
>   {
>   Acquire (MUT0, 0x0FFF)
>   ...
>   }
> 
> doesn't really provide exclusive access to the resource(s) protected by
> MUT0, even if acpi_acquire_mutex() returns ACPI_SUCCESS ?
> 
> Outch. Really ?
> 
> Thanks,
> Guenter
> 
> >
> > > -Original Message-
> > > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > > Sent: Wednesday, April 12, 2017 8:13 AM
> > > To: Moore, Robert ; Zheng, Lv
> > > ; Wysocki, Rafael J
> > > ; Len Brown 
> > > Cc: linux-a...@vger.kernel.org; de...@acpica.org; linux-
> > > ker...@vger.kernel.org; Guenter Roeck 
> > > Subject: [PATCH] ACPICA: Export mutex functions
> > >
> > > Mutex functions may be needed by drivers. Examples are accesses to
> > > Super-IO SIO registers (0x2e/0x2f or 0x4e/0x4f) or Super-IO
> > > environmental monitor registers, both which may also be accessed
> > > through DSDT.
> > >
> > > Signed-off-by: Guenter Roeck 
> > > ---
> > >  drivers/acpi/acpica/utxfmutex.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpica/utxfmutex.c
> > > b/drivers/acpi/acpica/utxfmutex.c index c016211c35ae..5d20581f4b2f
> > > 100644
> > > --- a/drivers/acpi/acpica/utxfmutex.c
> > > +++ b/drivers/acpi/acpica/utxfmutex.c
> > > @@ -150,6 +150,7 @@ acpi_acquire_mutex(acpi_handle handle,
> > > acpi_string pathname, u16 timeout)
> > >   status = acpi_os_acquire_mutex(mutex_obj->mutex.os_mutex, timeout);
> > >   return (status);
> > >  }
> > > +ACPI_EXPORT_SYMBOL(acpi_acquire_mutex)
> > >
> > >
> > > /***
> > > 
> > > 
> > >   *
> > > @@ -185,3 +186,4 @@ acpi_status acpi_release_mutex(acpi_handle
> > > handle, acpi_string pathname)
> > >   acpi_os_release_mutex(mutex_obj->mutex.os_mutex);
> > >   return (AE_OK);
> > >  }
> > > +ACPI_EXPORT_SYMBOL(acpi_release_mutex)
> > > --
> > > 2.7.4
> >

RE: [PATCH] ACPICA: Export mutex functions

2017-04-12 Thread Moore, Robert



> -Original Message-
> From: Guenter Roeck [mailto:li...@roeck-us.net]
> Sent: Wednesday, April 12, 2017 2:23 PM
> To: Moore, Robert 
> Cc: Zheng, Lv ; Wysocki, Rafael J
> ; Len Brown ; linux-
> a...@vger.kernel.org; de...@acpica.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] ACPICA: Export mutex functions
> 
> On Wed, Apr 12, 2017 at 03:29:55PM +, Moore, Robert wrote:
> > The ACPICA mutex functions are based on the host OS functions, so they
> don't really buy you anything. You should just use the native Linux
> functions.
> >
> 
> You mean they don't really acquire the requested ACPI mutex, and the
> underlying DSDT which declares and uses the mutex just ignores if the
> mutex was acquired by acpi_acquire_mutex() ?
> 
[Moore, Robert] 

OK, I understand now. Yes, these mutex interfaces are in fact intended for the 
purpose you mention:

* FUNCTION:AcpiAcquireMutex
 *
 * PARAMETERS:  Handle  - Mutex or prefix handle (optional)
 *  Pathname- Mutex pathname (optional)
 *  Timeout - Max time to wait for the lock (millisec)
 *
 * RETURN:  Status
 *
 * DESCRIPTION: Acquire an AML mutex. This is a device driver interface to
 *  AML mutex objects, and allows for transaction locking between
 *  drivers and AML code. The mutex node is pointed to by
 *  Handle:Pathname. Either Handle or Pathname can be NULL, but
 *  not both.


And yes, both the acquire and release interfaces should be exported.





> To clarify: You are saying that code such as
> 
>   acpi_status status;
> 
>   status = acpi_acquire_mutex(NULL, "\\_SB.PCI0.SBRG.SIO1.MUT0",
> 0x10);
>   if (ACPI_FAILURE(status)) {
>   pr_err("Failed to acquire ACPI mutex\n");
>   return -EBUSY;
>   }
>   ...
> 
> when used in conjunction with
> 
>   ...
>   Mutex (MUT0, 0x00)
>   Method (ENFG, 1, NotSerialized)
>   {
>   Acquire (MUT0, 0x0FFF)
>   ...
>   }
> 
> doesn't really provide exclusive access to the resource(s) protected by
> MUT0, even if acpi_acquire_mutex() returns ACPI_SUCCESS ?
> 
> Outch. Really ?
> 
> Thanks,
> Guenter
> 
> >
> > > -Original Message-
> > > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > > Sent: Wednesday, April 12, 2017 8:13 AM
> > > To: Moore, Robert ; Zheng, Lv
> > > ; Wysocki, Rafael J
> > > ; Len Brown 
> > > Cc: linux-a...@vger.kernel.org; de...@acpica.org; linux-
> > > ker...@vger.kernel.org; Guenter Roeck 
> > > Subject: [PATCH] ACPICA: Export mutex functions
> > >
> > > Mutex functions may be needed by drivers. Examples are accesses to
> > > Super-IO SIO registers (0x2e/0x2f or 0x4e/0x4f) or Super-IO
> > > environmental monitor registers, both which may also be accessed
> > > through DSDT.
> > >
> > > Signed-off-by: Guenter Roeck 
> > > ---
> > >  drivers/acpi/acpica/utxfmutex.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpica/utxfmutex.c
> > > b/drivers/acpi/acpica/utxfmutex.c index c016211c35ae..5d20581f4b2f
> > > 100644
> > > --- a/drivers/acpi/acpica/utxfmutex.c
> > > +++ b/drivers/acpi/acpica/utxfmutex.c
> > > @@ -150,6 +150,7 @@ acpi_acquire_mutex(acpi_handle handle,
> > > acpi_string pathname, u16 timeout)
> > >   status = acpi_os_acquire_mutex(mutex_obj->mutex.os_mutex, timeout);
> > >   return (status);
> > >  }
> > > +ACPI_EXPORT_SYMBOL(acpi_acquire_mutex)
> > >
> > >
> > > /***
> > > 
> > > 
> > >   *
> > > @@ -185,3 +186,4 @@ acpi_status acpi_release_mutex(acpi_handle
> > > handle, acpi_string pathname)
> > >   acpi_os_release_mutex(mutex_obj->mutex.os_mutex);
> > >   return (AE_OK);
> > >  }
> > > +ACPI_EXPORT_SYMBOL(acpi_release_mutex)
> > > --
> > > 2.7.4
> >

Re: [PATCH v2 07/17] fs: new infrastructure for writeback error handling and reporting

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:


> +void __filemap_set_wb_error(struct address_space *mapping, int err)

I was really hoping that this would be

  void __set_wb_error(wb_err_t *wb_err, int err)

so

Then nfs_context_set_write_error could become

static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
{
__set_wb_error(>wb_err, error);
}

and filemap_set_sb_error() would be:

static inline void filemap_set_wb_error(struct address_space *mapping, int err)
{
/* Optimize for the common case of no error */
if (unlikely(err))
__set_wb_error(>f_wb_err, err);
}

Similarly we would have
  wb_err_t sample_wb_error(wb_err_t *wb_err)
  {
   ...
  }

and

wb_err_t filemap_sample_wb_error(struct address_space *mapping)
{
  return sample_wb_error(>f_wb_err);
}

so nfs_file_fsync_commit() could have
  ret = sample_wb_error(>wb_err);
in place of
ret = xchg(>error, 0);

int filemap_report_wb_error(struct file *file)

would become

int filemap_report_wb_error(struct file *file, wb_err_t *err)

or something.

The address space is just one (obvious) place where the wb error can be
stored.  The filesystem might have a different place with finer
granularity (nfs already does).


> +wb_err_t filemap_sample_wb_error(struct address_space *mapping)
> +{
> + wb_err_t old = READ_ONCE(mapping->wb_err);
> + wb_err_t new = old;
> +
> + /*
> +  * For the common case of no errors ever having been set, we can skip
> +  * marking the SEEN bit. Once an error has been set, the value will
> +  * never go back to zero.
> +  */
> + if (old != 0) {
> + new |= WB_ERR_SEEN;
> + if (old != new)
> + cmpxchg(>wb_err, old, new);
> + }
> + return new;
> +}

I do like how the use of cmpxchg work out here - no looping!

Thanks
NeilBrown


signature.asc
Description: PGP signature

Re: [PATCH v2 07/17] fs: new infrastructure for writeback error handling and reporting

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:


> +void __filemap_set_wb_error(struct address_space *mapping, int err)

I was really hoping that this would be

  void __set_wb_error(wb_err_t *wb_err, int err)

so

Then nfs_context_set_write_error could become

static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
{
__set_wb_error(>wb_err, error);
}

and filemap_set_sb_error() would be:

static inline void filemap_set_wb_error(struct address_space *mapping, int err)
{
/* Optimize for the common case of no error */
if (unlikely(err))
__set_wb_error(>f_wb_err, err);
}

Similarly we would have
  wb_err_t sample_wb_error(wb_err_t *wb_err)
  {
   ...
  }

and

wb_err_t filemap_sample_wb_error(struct address_space *mapping)
{
  return sample_wb_error(>f_wb_err);
}

so nfs_file_fsync_commit() could have
  ret = sample_wb_error(>wb_err);
in place of
ret = xchg(>error, 0);

int filemap_report_wb_error(struct file *file)

would become

int filemap_report_wb_error(struct file *file, wb_err_t *err)

or something.

The address space is just one (obvious) place where the wb error can be
stored.  The filesystem might have a different place with finer
granularity (nfs already does).


> +wb_err_t filemap_sample_wb_error(struct address_space *mapping)
> +{
> + wb_err_t old = READ_ONCE(mapping->wb_err);
> + wb_err_t new = old;
> +
> + /*
> +  * For the common case of no errors ever having been set, we can skip
> +  * marking the SEEN bit. Once an error has been set, the value will
> +  * never go back to zero.
> +  */
> + if (old != 0) {
> + new |= WB_ERR_SEEN;
> + if (old != new)
> + cmpxchg(>wb_err, old, new);
> + }
> + return new;
> +}

I do like how the use of cmpxchg work out here - no looping!

Thanks
NeilBrown


signature.asc
Description: PGP signature

Re: [PATCH v2 25/27] ia64: Use generic pci_mmap_resource_range()

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Tony Luck wrote:

> On Wed, Apr 12, 2017 at 5:26 AM, David Woodhouse  wrote:
> > From: David Woodhouse 
> >
> > Now that we eliminated the different behaviour in separately-reviewable
> > commits, we can switch IA64 to the generic implementation.
> >
> > Signed-off-by: David Woodhouse 
> 
> Well it builds and boots on my last remaining ia64 machine.

Does that mean ia64 reached the 'NCR Voyager' stage and can be scheduled
for removal?

Thanks,

tglx

Re: [PATCH v2 25/27] ia64: Use generic pci_mmap_resource_range()

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Tony Luck wrote:

> On Wed, Apr 12, 2017 at 5:26 AM, David Woodhouse  wrote:
> > From: David Woodhouse 
> >
> > Now that we eliminated the different behaviour in separately-reviewable
> > commits, we can switch IA64 to the generic implementation.
> >
> > Signed-off-by: David Woodhouse 
> 
> Well it builds and boots on my last remaining ia64 machine.

Does that mean ia64 reached the 'NCR Voyager' stage and can be scheduled
for removal?

Thanks,

tglx

Re: [PATCH] iommu/vt-d: Make sure IOMMUs are off when intel_iommu=off

2017-04-12 Thread Joerg Roedel

Hi Baoquan,

On Wed, Apr 12, 2017 at 09:40:56AM +0800, Baoquan He wrote:
> Do you plan to merge this one as urgent?
> 
> There's bug created about this issue on rhel, it would be great if it
> can be put in next or merged so that we can back port it.

No, I am not sending this for v4.11, because this issue existed forever
and is no regression. I queued it for v4.12 and the commit-id in the
iommu-tree will be the same as upstream, if you need that for your
backport.

Joerg

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Borislav Petkov wrote:

> On Wed, Apr 12, 2017 at 08:27:05PM +, Verma, Vishal L wrote:
> > But isn't the atomic notifier call chain always called in atomic
> > context?
> 
> No, it isn't. We're calling it in normal process context in
> mce_gen_pool_process() too.
> 
> So this early exit will avoid any sleeping in atomic context. And since
> there's nothing you can do about the errors reported in atomic context,
> we can actually use that fact.

No, you can't.

CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you wont
ever enter the handler.

The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
obviouly not disable preemption, but it should still trigger the
might_sleep() check when a blocking function is called from within a rcu
read side critical section.

Thanks,

tglx

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Borislav Petkov wrote:

> On Wed, Apr 12, 2017 at 08:27:05PM +, Verma, Vishal L wrote:
> > But isn't the atomic notifier call chain always called in atomic
> > context?
> 
> No, it isn't. We're calling it in normal process context in
> mce_gen_pool_process() too.
> 
> So this early exit will avoid any sleeping in atomic context. And since
> there's nothing you can do about the errors reported in atomic context,
> we can actually use that fact.

No, you can't.

CONFIG_RCU_PREEMPT=n + CONFIG_PREEMPT_COUNT will disable preemption from
within __atomic_notifier_call_chain() via rcu_read_lock(). Ergo you wont
ever enter the handler.

The behaviour in the RCU code is inconsistent. CONFIG_RCU_PREEMPT=y does
obviouly not disable preemption, but it should still trigger the
might_sleep() check when a blocking function is called from within a rcu
read side critical section.

Thanks,

tglx

Re: [PATCH V3 01/16] block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler

2017-04-12 Thread kbuild test robot

Hi Paolo,

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.11-rc6 next-20170412]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Paolo-Valente/Introduce-the-BFQ-I-O-scheduler/20170412-021320
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git 
for-next
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

Note: the 
linux-review/Paolo-Valente/Introduce-the-BFQ-I-O-scheduler/20170412-021320 HEAD 
36eb6533f8b6705991185201f75e98880cd223f7 builds fine.
  It only hurts bisectibility.

All error/warnings (new ones prefixed by >>):

 ^~~~
   block/bfq-iosched.c:4095:40: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4105:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(timeout_sync),
 ^~~~
   block/bfq-iosched.c:4095:40: note: (near initialization for 
'bfq_attrs[7].store')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4105:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(timeout_sync),
 ^~~~
   block/bfq-iosched.c:4095:21: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
^
   include/linux/sysfs.h:103:10: note: in definition of macro '__ATTR'
 .show = _show,  \
 ^
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:21: note: (near initialization for 
'bfq_attrs[8].show')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
^
   include/linux/sysfs.h:103:10: note: in definition of macro '__ATTR'
 .show = _show,  \
 ^
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:40: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:40: note: (near initialization for 
'bfq_attrs[8].store')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4112:19: error: initializer element is not constant
  .get_rq_priv  = bfq_get_rq_private,
  ^~
   block/bfq-iosched.c:4112:19: note: (near initialization for 
'iosched_bfq_mq.ops.mq.get_rq_priv')
   block/bfq-iosched.c:4113:19: error: initializer element is not constant
  .put_rq_priv  = bfq_put_rq_private,
  ^~
   block/bfq-iosched.c:4113:19: note: (near initialization for 
'iosched_bfq_mq.ops.mq.put_rq_priv')
   block/bfq-iosched.c:4114:16: error: initializer element is not constant
  .exit_icq  = bfq_exit_icq,
   ^~~~
   block/bfq-iosched.c:4114:16: note: (near initialization for 
'iosched_bfq_mq.ops.mq.exit_icq')
   block/bfq-iosched.c:4115:22: error: initializer element is not constant
  .insert_requests = bfq_insert_requests,
 ^~~
   block/bfq-iosched.c:4115:22: note: (near initialization for 
'iosched_bfq_mq.ops.mq.insert_requests')
   block/bfq-iosched.c:4116:23: error: initializer element is not constant
  .dispatch_request = bfq_dispatch_request,
  ^~~~
   block/bfq-iosched.c:4116:23: note: (near initialization for 
'iosched_bfq_mq.ops.mq.dispatch_request')
   block/bfq-iosched.c:4124:16: error: initializer element is not constant
  .has_work  = bfq_has_work,
   ^~~~
   block/bfq-iosched.c:4124:16: note:

Re: [PATCH V3 01/16] block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler

2017-04-12 Thread kbuild test robot

Hi Paolo,

[auto build test ERROR on block/for-next]
[also build test ERROR on v4.11-rc6 next-20170412]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Paolo-Valente/Introduce-the-BFQ-I-O-scheduler/20170412-021320
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git 
for-next
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

Note: the 
linux-review/Paolo-Valente/Introduce-the-BFQ-I-O-scheduler/20170412-021320 HEAD 
36eb6533f8b6705991185201f75e98880cd223f7 builds fine.
  It only hurts bisectibility.

All error/warnings (new ones prefixed by >>):

 ^~~~
   block/bfq-iosched.c:4095:40: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4105:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(timeout_sync),
 ^~~~
   block/bfq-iosched.c:4095:40: note: (near initialization for 
'bfq_attrs[7].store')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4105:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(timeout_sync),
 ^~~~
   block/bfq-iosched.c:4095:21: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
^
   include/linux/sysfs.h:103:10: note: in definition of macro '__ATTR'
 .show = _show,  \
 ^
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:21: note: (near initialization for 
'bfq_attrs[8].show')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
^
   include/linux/sysfs.h:103:10: note: in definition of macro '__ATTR'
 .show = _show,  \
 ^
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:40: error: initializer element is not constant
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4095:40: note: (near initialization for 
'bfq_attrs[8].store')
 __ATTR(name, 0644, bfq_##name##_show, bfq_##name##_store)
   ^
   include/linux/sysfs.h:104:11: note: in definition of macro '__ATTR'
 .store = _store,  \
  ^~
   block/bfq-iosched.c:4106:2: note: in expansion of macro 'BFQ_ATTR'
 BFQ_ATTR(strict_guarantees),
 ^~~~
   block/bfq-iosched.c:4112:19: error: initializer element is not constant
  .get_rq_priv  = bfq_get_rq_private,
  ^~
   block/bfq-iosched.c:4112:19: note: (near initialization for 
'iosched_bfq_mq.ops.mq.get_rq_priv')
   block/bfq-iosched.c:4113:19: error: initializer element is not constant
  .put_rq_priv  = bfq_put_rq_private,
  ^~
   block/bfq-iosched.c:4113:19: note: (near initialization for 
'iosched_bfq_mq.ops.mq.put_rq_priv')
   block/bfq-iosched.c:4114:16: error: initializer element is not constant
  .exit_icq  = bfq_exit_icq,
   ^~~~
   block/bfq-iosched.c:4114:16: note: (near initialization for 
'iosched_bfq_mq.ops.mq.exit_icq')
   block/bfq-iosched.c:4115:22: error: initializer element is not constant
  .insert_requests = bfq_insert_requests,
 ^~~
   block/bfq-iosched.c:4115:22: note: (near initialization for 
'iosched_bfq_mq.ops.mq.insert_requests')
   block/bfq-iosched.c:4116:23: error: initializer element is not constant
  .dispatch_request = bfq_dispatch_request,
  ^~~~
   block/bfq-iosched.c:4116:23: note: (near initialization for 
'iosched_bfq_mq.ops.mq.dispatch_request')
   block/bfq-iosched.c:4124:16: error: initializer element is not constant
  .has_work  = bfq_has_work,
   ^~~~
   block/bfq-iosched.c:4124:16: note:

RE: [Regression Linux 4.11] TPM module not loaded anymore

2017-04-12 Thread Moore, Robert



> -Original Message-
> From: Paul Menzel [mailto:pmen...@molgen.mpg.de]
> Sent: Wednesday, April 12, 2017 2:27 PM
> To: Moore, Robert 
> Cc: Jarkko Sakkinen ; Maciej S.
> Szmigiero ; linux-kernel@vger.kernel.org;
> Arthur Heymans ; tpmdd-de...@lists.sourceforge.net;
> gnu...@no-log.org; Zheng, Lv ; Wysocki, Rafael J
> ; linux-a...@vger.linux.org
> Subject: RE: [Regression Linux 4.11] TPM module not loaded anymore
> 
> Dear Robert,
> 
> 
> Thank you for looking into this.
> 
> 
> On 2017-04-12 17:54, Moore, Robert wrote:
> > And probably the dmesg if error messages appear in there.
> 
> Linux doesn’t log any messages, as the `tpm` module doesn’t load. Please
> find the output of `sudo acpidump` attached.
> 
> […]
> 
[Moore, Robert] 

Do you have any idea what control method(s) are executing? The DSDT and SSDT in 
the acpidump load fine here, and all predefined control methods in these tables 
execute OK.

We need to root-cause this problem, as a simple revert will break the customers 
that the fix was intended for in the first place.

Bob


> 
> Kind regards,
> 
> Paul
> 
> 
> PS: Doesn’t your mail client support to easily reply in interleaved
> style?

RE: [Regression Linux 4.11] TPM module not loaded anymore

2017-04-12 Thread Moore, Robert



> -Original Message-
> From: Paul Menzel [mailto:pmen...@molgen.mpg.de]
> Sent: Wednesday, April 12, 2017 2:27 PM
> To: Moore, Robert 
> Cc: Jarkko Sakkinen ; Maciej S.
> Szmigiero ; linux-kernel@vger.kernel.org;
> Arthur Heymans ; tpmdd-de...@lists.sourceforge.net;
> gnu...@no-log.org; Zheng, Lv ; Wysocki, Rafael J
> ; linux-a...@vger.linux.org
> Subject: RE: [Regression Linux 4.11] TPM module not loaded anymore
> 
> Dear Robert,
> 
> 
> Thank you for looking into this.
> 
> 
> On 2017-04-12 17:54, Moore, Robert wrote:
> > And probably the dmesg if error messages appear in there.
> 
> Linux doesn’t log any messages, as the `tpm` module doesn’t load. Please
> find the output of `sudo acpidump` attached.
> 
> […]
> 
[Moore, Robert] 

Do you have any idea what control method(s) are executing? The DSDT and SSDT in 
the acpidump load fine here, and all predefined control methods in these tables 
execute OK.

We need to root-cause this problem, as a simple revert will break the customers 
that the fix was intended for in the first place.

Bob


> 
> Kind regards,
> 
> Paul
> 
> 
> PS: Doesn’t your mail client support to easily reply in interleaved
> style?

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-12 Thread Florian Fainelli

On 04/11/2017 04:23 PM, Florian Fainelli wrote:
> On 04/11/2017 04:14 PM, Andrew Lunn wrote:
>>> To give some more background and rational for this change.
>>>
>>> On a platform where we have a parent MDIO bus, backed by the
>>> mdio-bcm-unimac.c driver, we also register a slave MII bus (through
>>> net/dsa/dsa2.c) which is parented to this UniMAC MDIO bus through an
>>> assignment of of_node. This slave MII bus is created in order to
>>> intercept reads/writes to problematic addresses (e.g: that clashes with
>>> another piece of hardware).
>>>
>>> This means that the slave DSA MII bus inherits all child nodes from the
>>> originating master MII bus. This also means that when the slave MII bus
>>> is probed via of_mdiobus_register(), we probe the same devices twice:
>>> once through the master, another time through the slave.
>>
>> Ah, O.K. This makes more sense. On the hardware i have, we get three
>> deep in MDIO busses. We have the FEC mdio bus. On top of that we have
>> a gpio-mux-mdio, and on top of that we have the mv88e6xxx mdio
>> bus. And i've never seen issues.
>>
>> So your real problem here is you have two mdio busses using the same
>> device tree properties. I would actually say that is just plain
>> broken.
> 
> From a Device Tree/HW representation perspective, we do have the
> external BCM53125 switch physically attached to the 7445/7278
> SWITCH_MDIO bus (backed by mdio-bcm-unimac) so in that regard the
> representation is correct. There is also an integrated Gigabit PHY
> (bcm7xxx) which is attached to that bus.
> 
> From a SW perspective though, we want to talk to the integrated Gigabit
> PHY using mdio-bcm-unimac but talk to the external BCM53125 switch using
> the slave MII bus created by the bcm_sf2 driver in order to create an
> isolation. We need to inherit some of the parent (mdio-bcm-unimac) child
> DT nodes (such as the BCM53125), but not the GPHY. The easiest solution
> I found was to use this patch.
> 
> Using mdiobus_register() instead of of_mdiobus_register() was
> considered, but then, the child BCM53125 has no more "visbility" into
> the OF world at all, and it matters, because this switch is also driven
> via a DSA switch driver and its Ethernet data-path is connected to one
> port of the bcm_sf2 switch..
> 
> Thankfully the HW bug was fixed eventually ;)

In fact, all I need is to flag the internal Gigabit PHY for the slave
MII bus node with something that makes it appear as "disabled" which I
can presumably do with of_update_property() and putting a status =
"disabled" property in there. Let me do something like that and see how
big of a hack this becomes.
-- 
Florian

Re: [RFC net-next] of: mdio: Honor hints from MDIO bus drivers

2017-04-12 Thread Florian Fainelli

On 04/11/2017 04:23 PM, Florian Fainelli wrote:
> On 04/11/2017 04:14 PM, Andrew Lunn wrote:
>>> To give some more background and rational for this change.
>>>
>>> On a platform where we have a parent MDIO bus, backed by the
>>> mdio-bcm-unimac.c driver, we also register a slave MII bus (through
>>> net/dsa/dsa2.c) which is parented to this UniMAC MDIO bus through an
>>> assignment of of_node. This slave MII bus is created in order to
>>> intercept reads/writes to problematic addresses (e.g: that clashes with
>>> another piece of hardware).
>>>
>>> This means that the slave DSA MII bus inherits all child nodes from the
>>> originating master MII bus. This also means that when the slave MII bus
>>> is probed via of_mdiobus_register(), we probe the same devices twice:
>>> once through the master, another time through the slave.
>>
>> Ah, O.K. This makes more sense. On the hardware i have, we get three
>> deep in MDIO busses. We have the FEC mdio bus. On top of that we have
>> a gpio-mux-mdio, and on top of that we have the mv88e6xxx mdio
>> bus. And i've never seen issues.
>>
>> So your real problem here is you have two mdio busses using the same
>> device tree properties. I would actually say that is just plain
>> broken.
> 
> From a Device Tree/HW representation perspective, we do have the
> external BCM53125 switch physically attached to the 7445/7278
> SWITCH_MDIO bus (backed by mdio-bcm-unimac) so in that regard the
> representation is correct. There is also an integrated Gigabit PHY
> (bcm7xxx) which is attached to that bus.
> 
> From a SW perspective though, we want to talk to the integrated Gigabit
> PHY using mdio-bcm-unimac but talk to the external BCM53125 switch using
> the slave MII bus created by the bcm_sf2 driver in order to create an
> isolation. We need to inherit some of the parent (mdio-bcm-unimac) child
> DT nodes (such as the BCM53125), but not the GPHY. The easiest solution
> I found was to use this patch.
> 
> Using mdiobus_register() instead of of_mdiobus_register() was
> considered, but then, the child BCM53125 has no more "visbility" into
> the OF world at all, and it matters, because this switch is also driven
> via a DSA switch driver and its Ethernet data-path is connected to one
> port of the bcm_sf2 switch..
> 
> Thankfully the HW bug was fixed eventually ;)

In fact, all I need is to flag the internal Gigabit PHY for the slave
MII bus node with something that makes it appear as "disabled" which I
can presumably do with of_update_property() and putting a status =
"disabled" property in there. Let me do something like that and see how
big of a hack this becomes.
-- 
Florian

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 02:19:32PM -0700, Luck, Tony wrote:
> On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> > There is another solution:
> > 
> > Convert the notifier to a blocking notifier and in the panic case, ignore
> > the locking and invoke the notifier chain directly. That needs some minimal
> > surgery in the notifier code to allow that, but that's certainly less ugly
> > than splitting stuff up into two chains.
> 
> But I wonder whether we actually want two chains.  We've been adding a bunch
> of general run-time logging and recovery stuff to this chain. So now we have
> things there that aren't needed or useful in the panic case. E.g.
> srao_decode_notifier() (which tries to offline a page that reported an
> uncorrected error out of the execution path) and Boris's new CEC code.

I guess we'll have to. The CEC thing does mutex_lock() too and the
atomic notifier disables preemption:

__atomic_notifier_call_chain()
 rcu_read_lock()
  __rcu_read_lock()
if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
preempt_disable();

so we need to think about something better to handle events down the
whole chain. Maybe route events from the atomic path to the blocking
path where the sleeping notifier callbacks can sleep as much as they
want to...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 02:19:32PM -0700, Luck, Tony wrote:
> On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> > There is another solution:
> > 
> > Convert the notifier to a blocking notifier and in the panic case, ignore
> > the locking and invoke the notifier chain directly. That needs some minimal
> > surgery in the notifier code to allow that, but that's certainly less ugly
> > than splitting stuff up into two chains.
> 
> But I wonder whether we actually want two chains.  We've been adding a bunch
> of general run-time logging and recovery stuff to this chain. So now we have
> things there that aren't needed or useful in the panic case. E.g.
> srao_decode_notifier() (which tries to offline a page that reported an
> uncorrected error out of the execution path) and Boris's new CEC code.

I guess we'll have to. The CEC thing does mutex_lock() too and the
atomic notifier disables preemption:

__atomic_notifier_call_chain()
 rcu_read_lock()
  __rcu_read_lock()
if (IS_ENABLED(CONFIG_PREEMPT_COUNT))
preempt_disable();

so we need to think about something better to handle events down the
whole chain. Maybe route events from the atomic path to the blocking
path where the sleeping notifier callbacks can sleep as much as they
want to...

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [PATCH v2 25/27] ia64: Use generic pci_mmap_resource_range()

2017-04-12 Thread Tony Luck

On Wed, Apr 12, 2017 at 5:26 AM, David Woodhouse  wrote:
> From: David Woodhouse 
>
> Now that we eliminated the different behaviour in separately-reviewable
> commits, we can switch IA64 to the generic implementation.
>
> Signed-off-by: David Woodhouse 

Well it builds and boots on my last remaining ia64 machine. No warnings
or weird stuff in the console log.  So you can mark the three ia64 patches

Tested-by: Tony Luck 

and bundle them with the others rather than through the ia64 tree.

-Tony

Re: [PATCH v2 25/27] ia64: Use generic pci_mmap_resource_range()

2017-04-12 Thread Tony Luck

On Wed, Apr 12, 2017 at 5:26 AM, David Woodhouse  wrote:
> From: David Woodhouse 
>
> Now that we eliminated the different behaviour in separately-reviewable
> commits, we can switch IA64 to the generic implementation.
>
> Signed-off-by: David Woodhouse 

Well it builds and boots on my last remaining ia64 machine. No warnings
or weird stuff in the console log.  So you can mark the three ia64 patches

Tested-by: Tony Luck 

and bundle them with the others rather than through the ia64 tree.

-Tony

Re: [PATCH v2 06/17] mm: doc comment for scary spot in write_one_page

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:

> On Wed, 2017-04-12 at 07:38 -0700, Matthew Wilcox wrote:
>> On Wed, Apr 12, 2017 at 09:01:34AM -0400, Jeff Layton wrote:
>> > On Wed, 2017-04-12 at 08:06 -0400, Jeff Layton wrote:
>> > > Not sure what to do here just yet.
>> > > 
>> > > Signed-off-by: Jeff Layton 
>> > > ---
>> > >  mm/page-writeback.c | 6 ++
>> > >  1 file changed, 6 insertions(+)
>> > > 
>> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> > > index de0dbf12e2c1..3ac8399dc984 100644
>> > > --- a/mm/page-writeback.c
>> > > +++ b/mm/page-writeback.c
>> > > @@ -2388,6 +2388,12 @@ int write_one_page(struct page *page)
>> > >  ret = mapping->a_ops->writepage(page, );
>> > >  if (ret == 0) {
>> > >  wait_on_page_writeback(page);
>> > > +/*
>> > > + * FIXME: is this racy? What guarantees that 
>> > > PG_error
>> > > + * will still be set once we get around to 
>> > > checking it?
>> > > + * What if writeback fails, but then a read is 
>> > > issued
>> > > + * before we check this, and that calls 
>> > > ClearPageError?
>> > > + */
>> > >  if (PageError(page))
>> > >  ret = -EIO;
>> > >  }
>> > 
>> > Ahh, we are always under the page lock here, and this is generally used
>> > for writing out directory pages anyway. I'm fine with dropping this
>> > patch unless someone else sees a problem here.
>> 
>> ->writepage drops the page lock.  We're still holding a refcount on this
>> page, but that's not going to prevent read being called.  But maybe the
>> filesystem won't call read on a page that's marked as PageError?
>
> Hard to be sure there. I really wonder if that check is needed at all,
> the more I look at it. After all, we are calling writepage with
> WB_SYNC_ALL so we should get an error there.

WB_SYNC_ALL doesn't cause writepage to wait.  It might case it to ask
for REQ_SYNC, so the write requests gets priority in the block layer.
WB_SYNC_ALL does cause writepages (with an 's') to wait.
(At least, that is how I read the code).

>
> Is it also possible these pages could be written back before that point
> (due to memory pressure or something) and that fail?

Probably, in which case clear_page_dirty_for_io() will fail and
write_one_page() will just unlock the page.

>
> Maybe we should just have a call to filemap_check_errors on exiting
> this function?

I'm leaning in that direction.

>
> With the the wb_err_t based stuff, we could change it to sample the
> wb_err early, and then use that to see if an error has occurred since
> then. Maybe we should even allow callers to pass a wb_err_t in here, so
> we can report errors that have occurred since a known point?

That feels to me like over-engineering.  We would need to
unconditionally call writepage() for that to work.

We seem to be agreed that write errors for buffered writes are reported
per-address-space.  To get per-page errors you have to use direct IO.
Let's focus on that policy and make it work.

Thanks,
NeilBrown



signature.asc
Description: PGP signature

Re: [PATCH v2 06/17] mm: doc comment for scary spot in write_one_page

2017-04-12 Thread NeilBrown

On Wed, Apr 12 2017, Jeff Layton wrote:

> On Wed, 2017-04-12 at 07:38 -0700, Matthew Wilcox wrote:
>> On Wed, Apr 12, 2017 at 09:01:34AM -0400, Jeff Layton wrote:
>> > On Wed, 2017-04-12 at 08:06 -0400, Jeff Layton wrote:
>> > > Not sure what to do here just yet.
>> > > 
>> > > Signed-off-by: Jeff Layton 
>> > > ---
>> > >  mm/page-writeback.c | 6 ++
>> > >  1 file changed, 6 insertions(+)
>> > > 
>> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> > > index de0dbf12e2c1..3ac8399dc984 100644
>> > > --- a/mm/page-writeback.c
>> > > +++ b/mm/page-writeback.c
>> > > @@ -2388,6 +2388,12 @@ int write_one_page(struct page *page)
>> > >  ret = mapping->a_ops->writepage(page, );
>> > >  if (ret == 0) {
>> > >  wait_on_page_writeback(page);
>> > > +/*
>> > > + * FIXME: is this racy? What guarantees that 
>> > > PG_error
>> > > + * will still be set once we get around to 
>> > > checking it?
>> > > + * What if writeback fails, but then a read is 
>> > > issued
>> > > + * before we check this, and that calls 
>> > > ClearPageError?
>> > > + */
>> > >  if (PageError(page))
>> > >  ret = -EIO;
>> > >  }
>> > 
>> > Ahh, we are always under the page lock here, and this is generally used
>> > for writing out directory pages anyway. I'm fine with dropping this
>> > patch unless someone else sees a problem here.
>> 
>> ->writepage drops the page lock.  We're still holding a refcount on this
>> page, but that's not going to prevent read being called.  But maybe the
>> filesystem won't call read on a page that's marked as PageError?
>
> Hard to be sure there. I really wonder if that check is needed at all,
> the more I look at it. After all, we are calling writepage with
> WB_SYNC_ALL so we should get an error there.

WB_SYNC_ALL doesn't cause writepage to wait.  It might case it to ask
for REQ_SYNC, so the write requests gets priority in the block layer.
WB_SYNC_ALL does cause writepages (with an 's') to wait.
(At least, that is how I read the code).

>
> Is it also possible these pages could be written back before that point
> (due to memory pressure or something) and that fail?

Probably, in which case clear_page_dirty_for_io() will fail and
write_one_page() will just unlock the page.

>
> Maybe we should just have a call to filemap_check_errors on exiting
> this function?

I'm leaning in that direction.

>
> With the the wb_err_t based stuff, we could change it to sample the
> wb_err early, and then use that to see if an error has occurred since
> then. Maybe we should even allow callers to pass a wb_err_t in here, so
> we can report errors that have occurred since a known point?

That feels to me like over-engineering.  We would need to
unconditionally call writepage() for that to work.

We seem to be agreed that write errors for buffered writes are reported
per-address-space.  To get per-page errors you have to use direct IO.
Let's focus on that policy and make it work.

Thanks,
NeilBrown



signature.asc
Description: PGP signature

RE: [Regression Linux 4.11] TPM module not loaded anymore

2017-04-12 Thread Paul Menzel


Dear Robert,


Thank you for looking into this.


On 2017-04-12 17:54, Moore, Robert wrote:

And probably the dmesg if error messages appear in there.


Linux doesn’t log any messages, as the `tpm` module doesn’t load. Please 
find the output of `sudo acpidump` attached.


[…]


Kind regards,

Paul


PS: Doesn’t your mail client support to easily reply in interleaved 
style?RSD  @ 0x000F0800
  : 52 53 44 20 50 54 52 20 05 43 4F 52 45 20 20 02  RSD PTR .CORE  .
  0010: 30 50 72 7F 24 00 00 00 E0 50 72 7F 00 00 00 00  0Pr.$Pr.
  0020: BB 00 00 00  

RSDT @ 0x7F725030
  : 52 53 44 54 3C 00 00 00 01 23 43 4F 52 45 20 20  RSDT<#CORE  
  0010: 43 4F 52 45 42 4F 4F 54 00 00 00 00 43 4F 52 45  COREBOOTCORE
  0020: 00 00 00 00 E0 80 72 7F E0 81 72 7F 10 87 72 7F  ..r...r...r.
  0030: 50 87 72 7F 90 87 72 7F 00 88 72 7F  P.r...r...r.

XSDT @ 0x7F7250E0
  : 58 53 44 54 54 00 00 00 01 05 43 4F 52 45 20 20  XSDTT.CORE  
  0010: 43 4F 52 45 42 4F 4F 54 00 00 00 00 43 4F 52 45  COREBOOTCORE
  0020: 00 00 00 00 E0 80 72 7F 00 00 00 00 E0 81 72 7F  ..r...r.
  0030: 00 00 00 00 10 87 72 7F 00 00 00 00 50 87 72 7F  ..r.P.r.
  0040: 00 00 00 00 90 87 72 7F 00 00 00 00 00 88 72 7F  ..r...r.
  0050: 00 00 00 00  

DSDT @ 0x7F725280
  : 44 53 44 54 58 2E 00 00 03 18 43 4F 52 45 76 34  DSDTX.COREv4
  0010: 43 4F 52 45 42 4F 4F 54 19 04 09 20 49 4E 54 4C  COREBOOT... INTL
  0020: 26 09 14 20 10 8F 00 00 5C 00 08 4E 56 53 41 0C  &.. \..NVSA.
  0030: 80 E9 7F 7F 14 4F 04 5F 50 54 53 01 5C 2F 05 5F  .O._PTS.\/._
  0040: 53 42 5F 50 43 49 30 4C 50 43 42 45 43 5F 5F 4D  SB_PCI0LPCBEC__M
  0050: 55 54 45 01 5C 2F 05 5F 53 42 5F 50 43 49 30 4C  UTE.\/._SB_PCI0L
  0060: 50 43 42 45 43 5F 5F 55 53 42 50 00 5C 2F 05 5F  PCBEC__USBP.\/._
  0070: 53 42 5F 50 43 49 30 4C 50 43 42 45 43 5F 5F 52  SB_PCI0LPCBEC__R
  0080: 41 44 49 00 14 18 5F 57 41 4B 01 A0 05 93 68 0A  ADI..._WAKh.
  0090: 03 A0 05 93 68 0A 04 A4 12 04 02 00 00 10 1F 5F  h.._
  00A0: 53 42 5F 14 19 5F 49 4E 49 00 47 4F 53 5F A0 0E  SB_.._INI.GOS_..
  00B0: 90 93 4F 53 59 53 0B D1 07 4D 50 45 4E 08 50 49  ..OSYS...MPEN.PI
  00C0: 43 4D 00 08 44 53 45 4E 01 5B 80 47 4E 56 53 00  CM..DSEN.[.GNVS.
  00D0: 4E 56 53 41 0B 00 01 5B 81 4E 22 47 4E 56 53 01  NVSA...[.N"GNVS.
  00E0: 4F 53 59 53 10 53 4D 49 46 08 50 52 4D 30 08 50  OSYS.SMIF.PRM0.P
  00F0: 52 4D 31 08 53 43 49 46 08 50 52 4D 32 08 50 52  RM1.SCIF.PRM2.PR
  0100: 4D 33 08 4C 43 4B 46 08 50 52 4D 34 08 50 52 4D  M3.LCKF.PRM4.PRM
  0110: 35 08 50 38 30 44 20 4C 49 44 53 08 50 57 52 53  5.P80D LIDS.PWRS
  0120: 08 44 42 47 53 08 4C 49 4E 58 08 44 43 4B 4E 08  .DBGS.LINX.DCKN.
  0130: 41 43 54 54 08 50 53 56 54 08 54 43 31 56 08 54  ACTT.PSVT.TC1V.T
  0140: 43 32 56 08 54 53 50 56 08 43 52 54 54 08 44 54  C2V.TSPV.CRTT.DT
  0150: 53 45 08 44 54 53 31 08 44 54 53 32 08 00 08 42  SE.DTS1.DTS2...B
  0160: 4E 55 4D 08 42 30 53 43 08 42 31 53 43 08 42 32  NUM.B0SC.B1SC.B2
  0170: 53 43 08 42 30 53 53 08 42 31 53 53 08 42 32 53  SC.B0SS.B1SS.B2S
  0180: 53 08 00 18 41 50 49 43 08 4D 50 45 4E 08 50 43  S...APIC.MPEN.PC
  0190: 50 30 08 50 43 50 31 08 50 50 43 4D 08 00 28 4E  P0.PCP1.PPCM..(N
  01A0: 41 54 50 08 43 4D 41 50 08 43 4D 42 50 08 4C 50  ATP.CMAP.CMBP.LP
  01B0: 54 50 08 46 44 43 50 08 52 46 44 56 08 48 4F 54  TP.FDCP.RFDV.HOT
  01C0: 4B 08 52 54 43 46 08 55 54 49 4C 08 41 43 49 4E  K.RTCF.UTIL.ACIN
  01D0: 08 49 47 44 53 08 54 4C 53 54 08 43 41 44 4C 08  .IGDS.TLST.CADL.
  01E0: 50 41 44 4C 08 43 53 54 45 10 4E 53 54 45 10 53  PADL.CSTE.NSTE.S
  01F0: 53 54 45 10 4E 44 49 44 08 44 49 44 31 20 44 49  STE.NDID.DID1 DI
  0200: 44 32 20 44 49 44 33 20 44 49 44 34 20 44 49 44  D2 DID3 DID4 DID
  0210: 35 20 00 48 04 42 4C 43 53 08 42 52 54 4C 08 4F  5 .H.BLCS.BRTL.O
  0220: 44 44 53 08 00 38 41 4C 53 45 08 41 4C 41 46 08  DDS..8ALSE.ALAF.
  0230: 4C 4C 4F 57 08 4C 48 49 48 08 00 30 45 4D 41 45  LLOW.LHIH..0EMAE
  0240: 08 45 4D 41 50 10 45 4D 41 4C 10 00 28 4D 45 46  .EMAP.EMAL..(MEF
  0250: 45 08 00 48 04 54 50 4D 50 08 54 50 4D 45 08 00  E..H.TPMP.TPME..
  0260: 40 04 47 54 46 30 38 47 54 46 31 38 47 54 46 32  @.GTF08GTF18GTF2
  0270: 38 49 44 45 4D 08 49 44 45 54 08 00 38 41 53 4C  8IDEM.IDET..8ASL
  0280: 42 20 49 42 54 54 08 49 50 41 54 08 49 54 56 46  B IBTT.IPAT.ITVF
  0290: 08 49 54 56 4D 08 49 50 53 43 08 49 42 4C 43 08  .ITVM.IPSC.IBLC.
  02A0: 49 42 49 41 08 49 53 53 43 08 49 34 30 39 08 49  IBIA.ISSC.I409.I
  02B0: 35 30 39 08 49 36 30 39 08 49 37 30 39 08 49 44  509.I609.I709.ID
  02C0: 4D 4D 08 49 44 4D 53 08 49 46 31 45 08 48 56 43  MM.IDMS.IF1E.HVC
  02D0: 4F 08 4E 58 44 31 20 4E 58 44 32 20 4E 58 44 33  O.NXD1 NXD2 NXD3
  02E0: 20 4E 58 44 34 20 4E 58 44 35 20 4E 58 44 36 20   NXD4 NXD5 NXD6 
  02F0: 4E 58 44 37 20 4E 58 44 38 20 00 40 04 44 4F 43  NXD7 NXD8 .@.DOC
  0300: 4B 08 42 54 45 4E

RE: [Regression Linux 4.11] TPM module not loaded anymore

2017-04-12 Thread Paul Menzel


Dear Robert,


Thank you for looking into this.


On 2017-04-12 17:54, Moore, Robert wrote:

And probably the dmesg if error messages appear in there.


Linux doesn’t log any messages, as the `tpm` module doesn’t load. Please 
find the output of `sudo acpidump` attached.


[…]


Kind regards,

Paul


PS: Doesn’t your mail client support to easily reply in interleaved 
style?RSD  @ 0x000F0800
  : 52 53 44 20 50 54 52 20 05 43 4F 52 45 20 20 02  RSD PTR .CORE  .
  0010: 30 50 72 7F 24 00 00 00 E0 50 72 7F 00 00 00 00  0Pr.$Pr.
  0020: BB 00 00 00  

RSDT @ 0x7F725030
  : 52 53 44 54 3C 00 00 00 01 23 43 4F 52 45 20 20  RSDT<#CORE  
  0010: 43 4F 52 45 42 4F 4F 54 00 00 00 00 43 4F 52 45  COREBOOTCORE
  0020: 00 00 00 00 E0 80 72 7F E0 81 72 7F 10 87 72 7F  ..r...r...r.
  0030: 50 87 72 7F 90 87 72 7F 00 88 72 7F  P.r...r...r.

XSDT @ 0x7F7250E0
  : 58 53 44 54 54 00 00 00 01 05 43 4F 52 45 20 20  XSDTT.CORE  
  0010: 43 4F 52 45 42 4F 4F 54 00 00 00 00 43 4F 52 45  COREBOOTCORE
  0020: 00 00 00 00 E0 80 72 7F 00 00 00 00 E0 81 72 7F  ..r...r.
  0030: 00 00 00 00 10 87 72 7F 00 00 00 00 50 87 72 7F  ..r.P.r.
  0040: 00 00 00 00 90 87 72 7F 00 00 00 00 00 88 72 7F  ..r...r.
  0050: 00 00 00 00  

DSDT @ 0x7F725280
  : 44 53 44 54 58 2E 00 00 03 18 43 4F 52 45 76 34  DSDTX.COREv4
  0010: 43 4F 52 45 42 4F 4F 54 19 04 09 20 49 4E 54 4C  COREBOOT... INTL
  0020: 26 09 14 20 10 8F 00 00 5C 00 08 4E 56 53 41 0C  &.. \..NVSA.
  0030: 80 E9 7F 7F 14 4F 04 5F 50 54 53 01 5C 2F 05 5F  .O._PTS.\/._
  0040: 53 42 5F 50 43 49 30 4C 50 43 42 45 43 5F 5F 4D  SB_PCI0LPCBEC__M
  0050: 55 54 45 01 5C 2F 05 5F 53 42 5F 50 43 49 30 4C  UTE.\/._SB_PCI0L
  0060: 50 43 42 45 43 5F 5F 55 53 42 50 00 5C 2F 05 5F  PCBEC__USBP.\/._
  0070: 53 42 5F 50 43 49 30 4C 50 43 42 45 43 5F 5F 52  SB_PCI0LPCBEC__R
  0080: 41 44 49 00 14 18 5F 57 41 4B 01 A0 05 93 68 0A  ADI..._WAKh.
  0090: 03 A0 05 93 68 0A 04 A4 12 04 02 00 00 10 1F 5F  h.._
  00A0: 53 42 5F 14 19 5F 49 4E 49 00 47 4F 53 5F A0 0E  SB_.._INI.GOS_..
  00B0: 90 93 4F 53 59 53 0B D1 07 4D 50 45 4E 08 50 49  ..OSYS...MPEN.PI
  00C0: 43 4D 00 08 44 53 45 4E 01 5B 80 47 4E 56 53 00  CM..DSEN.[.GNVS.
  00D0: 4E 56 53 41 0B 00 01 5B 81 4E 22 47 4E 56 53 01  NVSA...[.N"GNVS.
  00E0: 4F 53 59 53 10 53 4D 49 46 08 50 52 4D 30 08 50  OSYS.SMIF.PRM0.P
  00F0: 52 4D 31 08 53 43 49 46 08 50 52 4D 32 08 50 52  RM1.SCIF.PRM2.PR
  0100: 4D 33 08 4C 43 4B 46 08 50 52 4D 34 08 50 52 4D  M3.LCKF.PRM4.PRM
  0110: 35 08 50 38 30 44 20 4C 49 44 53 08 50 57 52 53  5.P80D LIDS.PWRS
  0120: 08 44 42 47 53 08 4C 49 4E 58 08 44 43 4B 4E 08  .DBGS.LINX.DCKN.
  0130: 41 43 54 54 08 50 53 56 54 08 54 43 31 56 08 54  ACTT.PSVT.TC1V.T
  0140: 43 32 56 08 54 53 50 56 08 43 52 54 54 08 44 54  C2V.TSPV.CRTT.DT
  0150: 53 45 08 44 54 53 31 08 44 54 53 32 08 00 08 42  SE.DTS1.DTS2...B
  0160: 4E 55 4D 08 42 30 53 43 08 42 31 53 43 08 42 32  NUM.B0SC.B1SC.B2
  0170: 53 43 08 42 30 53 53 08 42 31 53 53 08 42 32 53  SC.B0SS.B1SS.B2S
  0180: 53 08 00 18 41 50 49 43 08 4D 50 45 4E 08 50 43  S...APIC.MPEN.PC
  0190: 50 30 08 50 43 50 31 08 50 50 43 4D 08 00 28 4E  P0.PCP1.PPCM..(N
  01A0: 41 54 50 08 43 4D 41 50 08 43 4D 42 50 08 4C 50  ATP.CMAP.CMBP.LP
  01B0: 54 50 08 46 44 43 50 08 52 46 44 56 08 48 4F 54  TP.FDCP.RFDV.HOT
  01C0: 4B 08 52 54 43 46 08 55 54 49 4C 08 41 43 49 4E  K.RTCF.UTIL.ACIN
  01D0: 08 49 47 44 53 08 54 4C 53 54 08 43 41 44 4C 08  .IGDS.TLST.CADL.
  01E0: 50 41 44 4C 08 43 53 54 45 10 4E 53 54 45 10 53  PADL.CSTE.NSTE.S
  01F0: 53 54 45 10 4E 44 49 44 08 44 49 44 31 20 44 49  STE.NDID.DID1 DI
  0200: 44 32 20 44 49 44 33 20 44 49 44 34 20 44 49 44  D2 DID3 DID4 DID
  0210: 35 20 00 48 04 42 4C 43 53 08 42 52 54 4C 08 4F  5 .H.BLCS.BRTL.O
  0220: 44 44 53 08 00 38 41 4C 53 45 08 41 4C 41 46 08  DDS..8ALSE.ALAF.
  0230: 4C 4C 4F 57 08 4C 48 49 48 08 00 30 45 4D 41 45  LLOW.LHIH..0EMAE
  0240: 08 45 4D 41 50 10 45 4D 41 4C 10 00 28 4D 45 46  .EMAP.EMAL..(MEF
  0250: 45 08 00 48 04 54 50 4D 50 08 54 50 4D 45 08 00  E..H.TPMP.TPME..
  0260: 40 04 47 54 46 30 38 47 54 46 31 38 47 54 46 32  @.GTF08GTF18GTF2
  0270: 38 49 44 45 4D 08 49 44 45 54 08 00 38 41 53 4C  8IDEM.IDET..8ASL
  0280: 42 20 49 42 54 54 08 49 50 41 54 08 49 54 56 46  B IBTT.IPAT.ITVF
  0290: 08 49 54 56 4D 08 49 50 53 43 08 49 42 4C 43 08  .ITVM.IPSC.IBLC.
  02A0: 49 42 49 41 08 49 53 53 43 08 49 34 30 39 08 49  IBIA.ISSC.I409.I
  02B0: 35 30 39 08 49 36 30 39 08 49 37 30 39 08 49 44  509.I609.I709.ID
  02C0: 4D 4D 08 49 44 4D 53 08 49 46 31 45 08 48 56 43  MM.IDMS.IF1E.HVC
  02D0: 4F 08 4E 58 44 31 20 4E 58 44 32 20 4E 58 44 33  O.NXD1 NXD2 NXD3
  02E0: 20 4E 58 44 34 20 4E 58 44 35 20 4E 58 44 36 20   NXD4 NXD5 NXD6 
  02F0: 4E 58 44 37 20 4E 58 44 38 20 00 40 04 44 4F 43  NXD7 NXD8 .@.DOC
  0300: 4B 08 42 54 45 4E

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-12 Thread Christoph Lameter

On Tue, 11 Apr 2017, Vlastimil Babka wrote:

> > The fallback was only intended for a cpuset on which boundaries are not 
> > enforced
> > in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> > should fail the allocation.
>
> Hmm just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
> the basis of cpuset having higher priority, while you seem to be talking about
> ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
> required nodemask contains no nodes that are allowed by the process's current
> cpuset context, the memory  policy reverts to local allocation" which does 
> come
> down to ignoring mempolicy's nodemask.

I am talking of allocating outside of the current allowed nodes
(determined by mempolicy -- MPOL_BIND is the only concern as far as I can
tell -- as well as the current cpuset). One can violate the cpuset if its not
a hardwall but  the MPOL_MBIND node restriction cannot be violated.

Those allocations are also not allowed if the allocation was for a user
space page even if this is a softwall cpuset.

> >> This patch fixes the issue by having __alloc_pages_slowpath() check for 
> >> empty
> >> intersection of cpuset and ac->nodemask before OOM or allocation failure. 
> >> If
> >> it's indeed empty, the nodemask is ignored and allocation retried, which 
> >> mimics
> >> node_zonelist(). This works fine, because almost all callers of
> >
> > Well that would need to be subject to the hardwall flag. Allocation needs
> > to fail for a hardwall cpuset.
>
> They still do, if no hardwall cpuset node can satisfy the allocation with
> mempolicy ignored.

If the memory policy is MPOL_MBIND then allocations outside of the given
nodes should fail. They can violate the cpuset boundaries only if they are
kernel allocations and we are not in a hardwall cpuset.

That was at least my understand when working on this code years ago.

Re: [RFC 1/6] mm, page_alloc: fix more premature OOM due to race with cpuset update

2017-04-12 Thread Christoph Lameter

On Tue, 11 Apr 2017, Vlastimil Babka wrote:

> > The fallback was only intended for a cpuset on which boundaries are not 
> > enforced
> > in critical conditions (softwall). A hardwall cpuset (CS_MEM_HARDWALL)
> > should fail the allocation.
>
> Hmm just to clarify - I'm talking about ignoring the *mempolicy's* nodemask on
> the basis of cpuset having higher priority, while you seem to be talking about
> ignoring a (softwall) cpuset nodemask, right? man set_mempolicy says "... if
> required nodemask contains no nodes that are allowed by the process's current
> cpuset context, the memory  policy reverts to local allocation" which does 
> come
> down to ignoring mempolicy's nodemask.

I am talking of allocating outside of the current allowed nodes
(determined by mempolicy -- MPOL_BIND is the only concern as far as I can
tell -- as well as the current cpuset). One can violate the cpuset if its not
a hardwall but  the MPOL_MBIND node restriction cannot be violated.

Those allocations are also not allowed if the allocation was for a user
space page even if this is a softwall cpuset.

> >> This patch fixes the issue by having __alloc_pages_slowpath() check for 
> >> empty
> >> intersection of cpuset and ac->nodemask before OOM or allocation failure. 
> >> If
> >> it's indeed empty, the nodemask is ignored and allocation retried, which 
> >> mimics
> >> node_zonelist(). This works fine, because almost all callers of
> >
> > Well that would need to be subject to the hardwall flag. Allocation needs
> > to fail for a hardwall cpuset.
>
> They still do, if no hardwall cpuset node can satisfy the allocation with
> mempolicy ignored.

If the memory policy is MPOL_MBIND then allocations outside of the given
nodes should fail. They can violate the cpuset boundaries only if they are
kernel allocations and we are not in a hardwall cpuset.

That was at least my understand when working on this code years ago.

Re: [PATCH] ACPICA: Export mutex functions

2017-04-12 Thread Guenter Roeck

On Wed, Apr 12, 2017 at 03:29:55PM +, Moore, Robert wrote:
> The ACPICA mutex functions are based on the host OS functions, so they don't 
> really buy you anything. You should just use the native Linux functions.
> 

You mean they don't really acquire the requested ACPI mutex,
and the underlying DSDT which declares and uses the mutex
just ignores if the mutex was acquired by acpi_acquire_mutex() ?

To clarify: You are saying that code such as

acpi_status status;

status = acpi_acquire_mutex(NULL, "\\_SB.PCI0.SBRG.SIO1.MUT0", 0x10);
if (ACPI_FAILURE(status)) {
pr_err("Failed to acquire ACPI mutex\n");
return -EBUSY;
}
...

when used in conjunction with

...
Mutex (MUT0, 0x00)
Method (ENFG, 1, NotSerialized)
{
Acquire (MUT0, 0x0FFF)
...
}

doesn't really provide exclusive access to the resource(s) protected
by MUT0, even if acpi_acquire_mutex() returns ACPI_SUCCESS ?

Outch. Really ?

Thanks,
Guenter

> 
> > -Original Message-
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Sent: Wednesday, April 12, 2017 8:13 AM
> > To: Moore, Robert ; Zheng, Lv
> > ; Wysocki, Rafael J ;
> > Len Brown 
> > Cc: linux-a...@vger.kernel.org; de...@acpica.org; linux-
> > ker...@vger.kernel.org; Guenter Roeck 
> > Subject: [PATCH] ACPICA: Export mutex functions
> > 
> > Mutex functions may be needed by drivers. Examples are accesses to
> > Super-IO SIO registers (0x2e/0x2f or 0x4e/0x4f) or Super-IO
> > environmental monitor registers, both which may also be accessed through
> > DSDT.
> > 
> > Signed-off-by: Guenter Roeck 
> > ---
> >  drivers/acpi/acpica/utxfmutex.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/acpi/acpica/utxfmutex.c
> > b/drivers/acpi/acpica/utxfmutex.c index c016211c35ae..5d20581f4b2f
> > 100644
> > --- a/drivers/acpi/acpica/utxfmutex.c
> > +++ b/drivers/acpi/acpica/utxfmutex.c
> > @@ -150,6 +150,7 @@ acpi_acquire_mutex(acpi_handle handle, acpi_string
> > pathname, u16 timeout)
> > status = acpi_os_acquire_mutex(mutex_obj->mutex.os_mutex, timeout);
> > return (status);
> >  }
> > +ACPI_EXPORT_SYMBOL(acpi_acquire_mutex)
> > 
> > 
> > /***
> > 
> >   *
> > @@ -185,3 +186,4 @@ acpi_status acpi_release_mutex(acpi_handle handle,
> > acpi_string pathname)
> > acpi_os_release_mutex(mutex_obj->mutex.os_mutex);
> > return (AE_OK);
> >  }
> > +ACPI_EXPORT_SYMBOL(acpi_release_mutex)
> > --
> > 2.7.4
>

Re: [PATCH] ACPICA: Export mutex functions

2017-04-12 Thread Guenter Roeck

On Wed, Apr 12, 2017 at 03:29:55PM +, Moore, Robert wrote:
> The ACPICA mutex functions are based on the host OS functions, so they don't 
> really buy you anything. You should just use the native Linux functions.
> 

You mean they don't really acquire the requested ACPI mutex,
and the underlying DSDT which declares and uses the mutex
just ignores if the mutex was acquired by acpi_acquire_mutex() ?

To clarify: You are saying that code such as

acpi_status status;

status = acpi_acquire_mutex(NULL, "\\_SB.PCI0.SBRG.SIO1.MUT0", 0x10);
if (ACPI_FAILURE(status)) {
pr_err("Failed to acquire ACPI mutex\n");
return -EBUSY;
}
...

when used in conjunction with

...
Mutex (MUT0, 0x00)
Method (ENFG, 1, NotSerialized)
{
Acquire (MUT0, 0x0FFF)
...
}

doesn't really provide exclusive access to the resource(s) protected
by MUT0, even if acpi_acquire_mutex() returns ACPI_SUCCESS ?

Outch. Really ?

Thanks,
Guenter

> 
> > -Original Message-
> > From: Guenter Roeck [mailto:li...@roeck-us.net]
> > Sent: Wednesday, April 12, 2017 8:13 AM
> > To: Moore, Robert ; Zheng, Lv
> > ; Wysocki, Rafael J ;
> > Len Brown 
> > Cc: linux-a...@vger.kernel.org; de...@acpica.org; linux-
> > ker...@vger.kernel.org; Guenter Roeck 
> > Subject: [PATCH] ACPICA: Export mutex functions
> > 
> > Mutex functions may be needed by drivers. Examples are accesses to
> > Super-IO SIO registers (0x2e/0x2f or 0x4e/0x4f) or Super-IO
> > environmental monitor registers, both which may also be accessed through
> > DSDT.
> > 
> > Signed-off-by: Guenter Roeck 
> > ---
> >  drivers/acpi/acpica/utxfmutex.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/acpi/acpica/utxfmutex.c
> > b/drivers/acpi/acpica/utxfmutex.c index c016211c35ae..5d20581f4b2f
> > 100644
> > --- a/drivers/acpi/acpica/utxfmutex.c
> > +++ b/drivers/acpi/acpica/utxfmutex.c
> > @@ -150,6 +150,7 @@ acpi_acquire_mutex(acpi_handle handle, acpi_string
> > pathname, u16 timeout)
> > status = acpi_os_acquire_mutex(mutex_obj->mutex.os_mutex, timeout);
> > return (status);
> >  }
> > +ACPI_EXPORT_SYMBOL(acpi_acquire_mutex)
> > 
> > 
> > /***
> > 
> >   *
> > @@ -185,3 +186,4 @@ acpi_status acpi_release_mutex(acpi_handle handle,
> > acpi_string pathname)
> > acpi_os_release_mutex(mutex_obj->mutex.os_mutex);
> > return (AE_OK);
> >  }
> > +ACPI_EXPORT_SYMBOL(acpi_release_mutex)
> > --
> > 2.7.4
>

[PATCH 3/3] f2fs: fix not to set fsync/dentry mark

2017-04-12 Thread Jaegeuk Kim

Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting
in giving up roll-forward recovery due to wrong dentry mark.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/node.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 9422dd252813..ad54e907b97b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1458,6 +1458,9 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct 
inode *inode,
f2fs_wait_on_page_writeback(page, NODE, true);
BUG_ON(PageWriteback(page));
 
+   set_fsync_mark(page, 0);
+   set_dentry_mark(page, 0);
+
if (!atomic || page == last_page) {
set_fsync_mark(page, 1);
if (IS_INODE(page)) {
-- 
2.11.0

[PATCH 3/3] f2fs: fix not to set fsync/dentry mark

2017-04-12 Thread Jaegeuk Kim

Otherwise, we can see stale fsync/dentry mark given by previous calls, resulting
in giving up roll-forward recovery due to wrong dentry mark.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/node.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 9422dd252813..ad54e907b97b 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1458,6 +1458,9 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct 
inode *inode,
f2fs_wait_on_page_writeback(page, NODE, true);
BUG_ON(PageWriteback(page));
 
+   set_fsync_mark(page, 0);
+   set_dentry_mark(page, 0);
+
if (!atomic || page == last_page) {
set_fsync_mark(page, 1);
if (IS_INODE(page)) {
-- 
2.11.0

[PATCH 2/3] f2fs: allocate hot_data for atomic writes

2017-04-12 Thread Jaegeuk Kim

We'd better allocate atomic writes to hot_data zone.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4731eb587e06..0ac833dd2634 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1531,6 +1531,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
goto out;
 
set_inode_flag(inode, FI_ATOMIC_FILE);
+   set_inode_flag(inode, FI_HOT_DATA);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
 
if (!get_dirty_pages(inode))
-- 
2.11.0

[PATCH 2/3] f2fs: allocate hot_data for atomic writes

2017-04-12 Thread Jaegeuk Kim

We'd better allocate atomic writes to hot_data zone.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 4731eb587e06..0ac833dd2634 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1531,6 +1531,7 @@ static int f2fs_ioc_start_atomic_write(struct file *filp)
goto out;
 
set_inode_flag(inode, FI_ATOMIC_FILE);
+   set_inode_flag(inode, FI_HOT_DATA);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
 
if (!get_dirty_pages(inode))
-- 
2.11.0

[PATCH 1/3] f2fs: give time to flush dirty pages for checkpoint

2017-04-12 Thread Jaegeuk Kim

If all the threads are waiting for checkpoint, we have no chance to flush
required dirty pages.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 9db92990f193..800be94f8cb3 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -980,6 +980,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
err = sync_dirty_inodes(sbi, DIR_INODE);
if (err)
goto out;
+   cond_resched();
goto retry_flush_dents;
}
 
@@ -995,6 +996,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
err = f2fs_sync_inode_meta(sbi);
if (err)
goto out;
+   cond_resched();
goto retry_flush_dents;
}
 
@@ -1009,6 +1011,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
f2fs_unlock_all(sbi);
goto out;
}
+   cond_resched();
goto retry_flush_nodes;
}
 
-- 
2.11.0

[PATCH 1/3] f2fs: give time to flush dirty pages for checkpoint

2017-04-12 Thread Jaegeuk Kim

If all the threads are waiting for checkpoint, we have no chance to flush
required dirty pages.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/checkpoint.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 9db92990f193..800be94f8cb3 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -980,6 +980,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
err = sync_dirty_inodes(sbi, DIR_INODE);
if (err)
goto out;
+   cond_resched();
goto retry_flush_dents;
}
 
@@ -995,6 +996,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
err = f2fs_sync_inode_meta(sbi);
if (err)
goto out;
+   cond_resched();
goto retry_flush_dents;
}
 
@@ -1009,6 +1011,7 @@ static int block_operations(struct f2fs_sb_info *sbi)
f2fs_unlock_all(sbi);
goto out;
}
+   cond_resched();
goto retry_flush_nodes;
}
 
-- 
2.11.0

Re: [PATCH v2] f2fs: fix fs corruption due to zero inode page

2017-04-12 Thread Jaegeuk Kim

Change log from v1:
 o fix some bugs

>From 9bb02c3627f46e50246bf7ab957b56ffbef623cb Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Tue, 11 Apr 2017 19:01:26 -0700
Subject: [PATCH] f2fs: fix fs corruption due to zero inode page

This patch fixes the following scenario.

- f2fs_create/f2fs_mkdir - write_checkpoint
 - f2fs_mark_inode_dirty_sync - block_operations
   - f2fs_lock_all
   - f2fs_sync_inode_meta
- f2fs_unlock_all
- sync_inode_metadata
 - f2fs_lock_op
 - f2fs_write_inode
  - update_inode_page
   - get_node_page
 return -ENOENT
 - new_inode_page
  - fill_node_footer
 - f2fs_mark_inode_dirty_sync
 - ...
 - f2fs_unlock_op
  - f2fs_inode_synced
   - f2fs_lock_all
   - do_checkpoint

In this checkpoint, we can get an inode page which contains zeros having valid
node footer only.

Cc: 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/inode.c |  2 +-
 fs/f2fs/namei.c | 20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2520fa72b23f..0900814485c7 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -316,7 +316,6 @@ int update_inode_page(struct inode *inode)
} else if (err != -ENOENT) {
f2fs_stop_checkpoint(sbi, false);
}
-   f2fs_inode_synced(inode);
return 0;
}
ret = update_inode(inode, node_page);
@@ -450,6 +449,7 @@ void handle_failed_inode(struct inode *inode)
 * in a panic when flushing dirty inodes in gdirty_list.
 */
update_inode_page(inode);
+   f2fs_inode_synced(inode);
 
/* don't make bad inode, since it becomes a regular file. */
unlock_new_inode(inode);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 8906c9f6cce4..8b5f596ed738 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -148,8 +148,6 @@ static int f2fs_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
inode->i_mapping->a_ops = _dblock_aops;
ino = inode->i_ino;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -163,6 +161,8 @@ static int f2fs_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 out:
handle_failed_inode(inode);
@@ -423,8 +423,6 @@ static int f2fs_symlink(struct inode *dir, struct dentry 
*dentry,
inode_nohighmem(inode);
inode->i_mapping->a_ops = _dblock_aops;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -487,6 +485,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry 
*dentry,
}
 
kfree(sd);
+
+   f2fs_balance_fs(sbi, true);
return err;
 out:
handle_failed_inode(inode);
@@ -508,8 +508,6 @@ static int f2fs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
inode->i_mapping->a_ops = _dblock_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO);
 
-   f2fs_balance_fs(sbi, true);
-
set_inode_flag(inode, FI_INC_LINK);
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
@@ -524,6 +522,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 
 out_fail:
@@ -554,8 +554,6 @@ static int f2fs_mknod(struct inode *dir, struct dentry 
*dentry,
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = _special_inode_operations;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -569,6 +567,8 @@ static int f2fs_mknod(struct inode *dir, struct dentry 
*dentry,
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 out:
handle_failed_inode(inode);
@@ -595,8 +595,6 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry 
*dentry,
inode->i_mapping->a_ops = _dblock_aops;
}
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = acquire_orphan_inode(sbi);
if (err)
@@ -622,6 +620,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry 
*dentry,

Re: [PATCH v2] f2fs: fix fs corruption due to zero inode page

2017-04-12 Thread Jaegeuk Kim

Change log from v1:
 o fix some bugs

>From 9bb02c3627f46e50246bf7ab957b56ffbef623cb Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Tue, 11 Apr 2017 19:01:26 -0700
Subject: [PATCH] f2fs: fix fs corruption due to zero inode page

This patch fixes the following scenario.

- f2fs_create/f2fs_mkdir - write_checkpoint
 - f2fs_mark_inode_dirty_sync - block_operations
   - f2fs_lock_all
   - f2fs_sync_inode_meta
- f2fs_unlock_all
- sync_inode_metadata
 - f2fs_lock_op
 - f2fs_write_inode
  - update_inode_page
   - get_node_page
 return -ENOENT
 - new_inode_page
  - fill_node_footer
 - f2fs_mark_inode_dirty_sync
 - ...
 - f2fs_unlock_op
  - f2fs_inode_synced
   - f2fs_lock_all
   - do_checkpoint

In this checkpoint, we can get an inode page which contains zeros having valid
node footer only.

Cc: 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/inode.c |  2 +-
 fs/f2fs/namei.c | 20 ++--
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 2520fa72b23f..0900814485c7 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -316,7 +316,6 @@ int update_inode_page(struct inode *inode)
} else if (err != -ENOENT) {
f2fs_stop_checkpoint(sbi, false);
}
-   f2fs_inode_synced(inode);
return 0;
}
ret = update_inode(inode, node_page);
@@ -450,6 +449,7 @@ void handle_failed_inode(struct inode *inode)
 * in a panic when flushing dirty inodes in gdirty_list.
 */
update_inode_page(inode);
+   f2fs_inode_synced(inode);
 
/* don't make bad inode, since it becomes a regular file. */
unlock_new_inode(inode);
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 8906c9f6cce4..8b5f596ed738 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -148,8 +148,6 @@ static int f2fs_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
inode->i_mapping->a_ops = _dblock_aops;
ino = inode->i_ino;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -163,6 +161,8 @@ static int f2fs_create(struct inode *dir, struct dentry 
*dentry, umode_t mode,
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 out:
handle_failed_inode(inode);
@@ -423,8 +423,6 @@ static int f2fs_symlink(struct inode *dir, struct dentry 
*dentry,
inode_nohighmem(inode);
inode->i_mapping->a_ops = _dblock_aops;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -487,6 +485,8 @@ static int f2fs_symlink(struct inode *dir, struct dentry 
*dentry,
}
 
kfree(sd);
+
+   f2fs_balance_fs(sbi, true);
return err;
 out:
handle_failed_inode(inode);
@@ -508,8 +508,6 @@ static int f2fs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
inode->i_mapping->a_ops = _dblock_aops;
mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO);
 
-   f2fs_balance_fs(sbi, true);
-
set_inode_flag(inode, FI_INC_LINK);
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
@@ -524,6 +522,8 @@ static int f2fs_mkdir(struct inode *dir, struct dentry 
*dentry, umode_t mode)
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 
 out_fail:
@@ -554,8 +554,6 @@ static int f2fs_mknod(struct inode *dir, struct dentry 
*dentry,
init_special_inode(inode, inode->i_mode, rdev);
inode->i_op = _special_inode_operations;
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
@@ -569,6 +567,8 @@ static int f2fs_mknod(struct inode *dir, struct dentry 
*dentry,
 
if (IS_DIRSYNC(dir))
f2fs_sync_fs(sbi->sb, 1);
+
+   f2fs_balance_fs(sbi, true);
return 0;
 out:
handle_failed_inode(inode);
@@ -595,8 +595,6 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry 
*dentry,
inode->i_mapping->a_ops = _dblock_aops;
}
 
-   f2fs_balance_fs(sbi, true);
-
f2fs_lock_op(sbi);
err = acquire_orphan_inode(sbi);
if (err)
@@ -622,6 +620,8 @@ static int __f2fs_tmpfile(struct inode *dir, struct dentry 
*dentry,
/* link_count was changed by d_tmpfile as well. */

Re: [RFC 2/6] mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()

2017-04-12 Thread Vlastimil Babka

On 12.4.2017 23:16, Christoph Lameter wrote:
> On Wed, 12 Apr 2017, Vlastimil Babka wrote:
> 
 Well, interleave_nodes() will then potentially return a node outside of
 the allowed memory policy when its called for the first time after
 mpol_rebind_.. . But thenn it will find the next node within the
 nodemask and work correctly for the next invocations.
>>>
>>> Hmm, you're right. But that could be easily fixed if il_next became 
>>> il_prev, so
>>> we would return the result of next_node_in(il_prev) and also store it as 
>>> the new
>>> il_prev, right? I somehow assumed it already worked that way.
> 
> Yup that makes sense and I thought about that when I saw the problem too.
> 
>> @@ -863,6 +856,18 @@ static int lookup_node(unsigned long addr)
>>  return err;
>>  }
>>
>> +/* Do dynamic interleaving for a process */
>> +static unsigned interleave_nodes(struct mempolicy *policy, bool update_prev)
> 
> Why do you need an additional flag? Would it not be better to always
> update and switch the update_prev=false case to simply use
> next_node_in()?

Looked to me as better wrapping, but probably overengineered, ok. Will change
for v2.

Re: [RFC 2/6] mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()

2017-04-12 Thread Vlastimil Babka

On 12.4.2017 23:16, Christoph Lameter wrote:
> On Wed, 12 Apr 2017, Vlastimil Babka wrote:
> 
 Well, interleave_nodes() will then potentially return a node outside of
 the allowed memory policy when its called for the first time after
 mpol_rebind_.. . But thenn it will find the next node within the
 nodemask and work correctly for the next invocations.
>>>
>>> Hmm, you're right. But that could be easily fixed if il_next became 
>>> il_prev, so
>>> we would return the result of next_node_in(il_prev) and also store it as 
>>> the new
>>> il_prev, right? I somehow assumed it already worked that way.
> 
> Yup that makes sense and I thought about that when I saw the problem too.
> 
>> @@ -863,6 +856,18 @@ static int lookup_node(unsigned long addr)
>>  return err;
>>  }
>>
>> +/* Do dynamic interleaving for a process */
>> +static unsigned interleave_nodes(struct mempolicy *policy, bool update_prev)
> 
> Why do you need an additional flag? Would it not be better to always
> update and switch the update_prev=false case to simply use
> next_node_in()?

Looked to me as better wrapping, but probably overengineered, ok. Will change
for v2.

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> There is another solution:
> 
> Convert the notifier to a blocking notifier and in the panic case, ignore
> the locking and invoke the notifier chain directly. That needs some minimal
> surgery in the notifier code to allow that, but that's certainly less ugly
> than splitting stuff up into two chains.

But I wonder whether we actually want two chains.  We've been adding a bunch
of general run-time logging and recovery stuff to this chain. So now we have
things there that aren't needed or useful in the panic case. E.g.
srao_decode_notifier() (which tries to offline a page that reported an
uncorrected error out of the execution path) and Boris's new CEC code.

-Tony

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Wed, Apr 12, 2017 at 11:12:21PM +0200, Thomas Gleixner wrote:
> There is another solution:
> 
> Convert the notifier to a blocking notifier and in the panic case, ignore
> the locking and invoke the notifier chain directly. That needs some minimal
> surgery in the notifier code to allow that, but that's certainly less ugly
> than splitting stuff up into two chains.

But I wonder whether we actually want two chains.  We've been adding a bunch
of general run-time logging and recovery stuff to this chain. So now we have
things there that aren't needed or useful in the panic case. E.g.
srao_decode_notifier() (which tries to offline a page that reported an
uncorrected error out of the execution path) and Boris's new CEC code.

-Tony

Re: [RFC 2/6] mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()

2017-04-12 Thread Christoph Lameter

On Wed, 12 Apr 2017, Vlastimil Babka wrote:

> >> Well, interleave_nodes() will then potentially return a node outside of
> >> the allowed memory policy when its called for the first time after
> >> mpol_rebind_.. . But thenn it will find the next node within the
> >> nodemask and work correctly for the next invocations.
> >
> > Hmm, you're right. But that could be easily fixed if il_next became 
> > il_prev, so
> > we would return the result of next_node_in(il_prev) and also store it as 
> > the new
> > il_prev, right? I somehow assumed it already worked that way.

Yup that makes sense and I thought about that when I saw the problem too.

> @@ -863,6 +856,18 @@ static int lookup_node(unsigned long addr)
>   return err;
>  }
>
> +/* Do dynamic interleaving for a process */
> +static unsigned interleave_nodes(struct mempolicy *policy, bool update_prev)

Why do you need an additional flag? Would it not be better to always
update and switch the update_prev=false case to simply use
next_node_in()?

> +{
> + unsigned next;
> + struct task_struct *me = current;
> +
> + next = next_node_in(me->il_prev, policy->v.nodes);
> + if (next < MAX_NUMNODES && update_prev)
> + me->il_prev = next;
> + return next;
> +}
> +
>  /* Retrieve NUMA policy */
>  static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>unsigned long addr, unsigned long flags)
> @@ -916,7 +921,7 @@ static long do_get_mempolicy(int *policy, nodemask_t 
> *nmask,
>   *policy = err;
>   } else if (pol == current->mempolicy &&
>   pol->mode == MPOL_INTERLEAVE) {
> - *policy = current->il_next;
> + *policy = interleave_nodes(current->mempolicy, false);

Here

Re: [RFC 2/6] mm, mempolicy: stop adjusting current->il_next in mpol_rebind_nodemask()

2017-04-12 Thread Christoph Lameter

On Wed, 12 Apr 2017, Vlastimil Babka wrote:

> >> Well, interleave_nodes() will then potentially return a node outside of
> >> the allowed memory policy when its called for the first time after
> >> mpol_rebind_.. . But thenn it will find the next node within the
> >> nodemask and work correctly for the next invocations.
> >
> > Hmm, you're right. But that could be easily fixed if il_next became 
> > il_prev, so
> > we would return the result of next_node_in(il_prev) and also store it as 
> > the new
> > il_prev, right? I somehow assumed it already worked that way.

Yup that makes sense and I thought about that when I saw the problem too.

> @@ -863,6 +856,18 @@ static int lookup_node(unsigned long addr)
>   return err;
>  }
>
> +/* Do dynamic interleaving for a process */
> +static unsigned interleave_nodes(struct mempolicy *policy, bool update_prev)

Why do you need an additional flag? Would it not be better to always
update and switch the update_prev=false case to simply use
next_node_in()?

> +{
> + unsigned next;
> + struct task_struct *me = current;
> +
> + next = next_node_in(me->il_prev, policy->v.nodes);
> + if (next < MAX_NUMNODES && update_prev)
> + me->il_prev = next;
> + return next;
> +}
> +
>  /* Retrieve NUMA policy */
>  static long do_get_mempolicy(int *policy, nodemask_t *nmask,
>unsigned long addr, unsigned long flags)
> @@ -916,7 +921,7 @@ static long do_get_mempolicy(int *policy, nodemask_t 
> *nmask,
>   *policy = err;
>   } else if (pol == current->mempolicy &&
>   pol->mode == MPOL_INTERLEAVE) {
> - *policy = current->il_next;
> + *policy = interleave_nodes(current->mempolicy, false);

Here

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 08:27:05PM +, Verma, Vishal L wrote:
> But isn't the atomic notifier call chain always called in atomic
> context?

No, it isn't. We're calling it in normal process context in
mce_gen_pool_process() too.

So this early exit will avoid any sleeping in atomic context. And since
there's nothing you can do about the errors reported in atomic context,
we can actually use that fact.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Borislav Petkov

On Wed, Apr 12, 2017 at 08:27:05PM +, Verma, Vishal L wrote:
> But isn't the atomic notifier call chain always called in atomic
> context?

No, it isn't. We're calling it in normal process context in
mce_gen_pool_process() too.

So this early exit will avoid any sleeping in atomic context. And since
there's nothing you can do about the errors reported in atomic context,
we can actually use that fact.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
--

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Dan Williams wrote:

> On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony  wrote:
> > On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >> > /* We only care about memory errors */
> >> > if (!(mce->status & MCACOD))
> >> > return NOTIFY_DONE;
> >
> > N.B. that isn't a valid test that this is a memory error. You need
> >
> >
> > if (!(m->status & 0xef80) == BIT(7))
> > return NOTIFY_DONE;
> >
> > See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> 
> But Vishal's point is that even if we get this check correct the
> notifier still requires no sleeping operations. So we would need to
> move recoverable notifications to a separate blocking notifier chain.

There is another solution:

Convert the notifier to a blocking notifier and in the panic case, ignore
the locking and invoke the notifier chain directly. That needs some minimal
surgery in the notifier code to allow that, but that's certainly less ugly
than splitting stuff up into two chains.

Thanks,

tglx

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Thomas Gleixner

On Wed, 12 Apr 2017, Dan Williams wrote:

> On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony  wrote:
> > On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >> > /* We only care about memory errors */
> >> > if (!(mce->status & MCACOD))
> >> > return NOTIFY_DONE;
> >
> > N.B. that isn't a valid test that this is a memory error. You need
> >
> >
> > if (!(m->status & 0xef80) == BIT(7))
> > return NOTIFY_DONE;
> >
> > See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes
> 
> But Vishal's point is that even if we get this check correct the
> notifier still requires no sleeping operations. So we would need to
> move recoverable notifications to a separate blocking notifier chain.

There is another solution:

Convert the notifier to a blocking notifier and in the panic case, ignore
the locking and invoke the notifier chain directly. That needs some minimal
surgery in the notifier code to allow that, but that's certainly less ugly
than splitting stuff up into two chains.

Thanks,

tglx

et8ek8 camera on Nokia N900: trying to understand what is going on with modes

2017-04-12 Thread Pavel Machek

Hi!

5Mpix mode does not work on N900, which is something I'd like to
understand. et8ek8_mode contains huge tables of register settings and
parameter values, but it seems that they are not really independend.

To test that theory, I started with checking values against each
other.

This is the work so far, it is neither complete nor completely working
at the moment. Perhaps someone wants to play...

Pavel

diff --git a/drivers/media/i2c/et8ek8/et8ek8_driver.c 
b/drivers/media/i2c/et8ek8/et8ek8_driver.c
index 6296f6f..ca2f648 100644
--- a/drivers/media/i2c/et8ek8/et8ek8_driver.c
+++ b/drivers/media/i2c/et8ek8/et8ek8_driver.c
@@ -798,6 +798,8 @@ static void et8ek8_update_controls(struct et8ek8_sensor 
*sensor)
u32 min, max, pixel_rate;
static const int S = 8;
 
+   printk("Updating controls for %d x %d @ %d mode -- %s\n", mode->width, 
mode->height, mode->pixel_clock, mode->name);
+
ctrl = sensor->exposure;
 
 #ifdef COMPATIBLE
@@ -820,6 +822,127 @@ static void et8ek8_update_controls(struct et8ek8_sensor 
*sensor)
__v4l2_ctrl_s_ctrl_int64(sensor->pixel_rate, pixel_rate << S);
 }
 
+static int read_8(struct i2c_client *client, unsigned long addr)
+{
+   int val;
+   et8ek8_i2c_read_reg(client, ET8EK8_REG_8BIT, addr, );
+   return val;
+}
+
+static int read_16(struct i2c_client *client, unsigned long addr)
+{
+   return read_8(client, addr);
+}
+
+static void assert_value(struct i2c_client *client, unsigned long addr, 
unsigned long val)
+{
+   int val2 = read_8(client, addr);
+   if (val != val2)
+   printk("et8ek8: assertion check %lx / should be %lx is %lx\n", 
addr, val, val2);
+}
+
+static void assert(struct i2c_client *client, int v, char *msg)
+{
+   if (!v)
+   printk("et8ek8: assertion: %s\n", msg);
+}
+
+static void assert_eq(struct i2c_client *client, int v1, int v2, char *msg)
+{
+   if (v1 != v2)
+   printk("et8ek8: assertion: %d == %d %s\n", v1, v2, msg);
+}
+
+static void et8ek8_check(struct et8ek8_sensor *sensor)
+{
+   /*
+ 1239 4F   # CKVAR_DIV
+ 1238 02   # CKVAR_DIV[8] CKREF_DIV
+ 123B 70   # MRCK_DIV LVDSCK_DIV
+ 123A 05   # VCO_DIV SPCK_DIV
+ 121B 63   # PIC_SIZE MONI_MODE
+ 1220 85   # H_COUNT
+ 1221 00   # H_COUNT[10:8]
+ 1222 58   # V_COUNT
+ 1223 00   # V_COUNT[12:8]
+ 121D 63   # H_SIZE H_INTERMIT
+ 125D 83   # CCP_LVDS_MODE/ _/ _/ _/ _/ CCP_COMP_MODE[2-0]
+   */
+   struct et8ek8_reglist *r = sensor->current_reglist;
+   struct v4l2_subdev *subdev = >subdev;
+   struct i2c_client *client = v4l2_get_subdevdata(subdev);
+   int vco;
+   
+   printk("Mode validation:\n");
+
+   assert_value(client, 0x1220, (r->mode.width / 24) & 0xff);
+   assert_value(client, 0x1221, (r->mode.width / 24) >> 8);
+   
+   assert_value(client, 0x1222, (r->mode.height / 24) & 0xff);
+   assert_value(client, 0x1223, (r->mode.height / 24) >> 8);
+
+   {
+   int ckref_div = read_16(client, 0x1238) & 0xf;
+   int ckvar_div = ((read_16(client, 0x1238) & 0x80) >> 7) | 
(read_16(client, 0x1239) << 1);
+   int vco_div = read_16(client, 0x123A) >> 4;
+   int spck_div = read_16(client, 0x123A) & 0xf;
+   int mrck_div = read_16(client, 0x123B) >> 4;
+   int lvdsck_div = read_16(client, 0x123B) & 0xf;
+
+   vco = (r->mode.ext_clock * ckvar_div) / (ckref_div + 1);
+   printk("Vco is %d, %d %d %d\n", vco, r->mode.ext_clock, 
ckvar_div, ckref_div);
+   int ccp2 = vco / ((lvdsck_div + 1) * (vco_div + 1));
+   int spck = vco / ((spck_div + 1) * (vco_div + 1));
+
+   assert_eq(client, r->mode.pixel_clock, spck, "spck");
+   }
+
+   assert_eq(client, r->mode.max_exp, r->mode.height - 4, "max_exp");
+
+   assert(client, !(r->mode.sensor_window_width % r->mode.window_width), 
"window_width");
+   switch(r->mode.sensor_window_width / r->mode.window_width) {
+   case 1: assert_value(client, 0x121d, 0x64);
+   break;
+   case 2: assert_value(client, 0x121d, 0x63);
+   break;
+   case 3: assert_value(client, 0x121d, 0x62);
+   break;
+   default:
+   assert(client, 0, "bad window_width");
+   }
+
+   assert(client, !(r->mode.sensor_window_height % r->mode.window_height), 
"window_width");
+   switch(r->mode.sensor_window_height / r->mode.window_height) {
+   case 1: assert_value(client, 0x121b, 0x64);
+   break;
+   case 2: assert_value(client, 0x121b, 0x63);
+   break;
+   case 3: assert_value(client, 0x121b, 0x62);
+   break;
+   default:
+   assert(client, 0, "bad window_height");
+

et8ek8 camera on Nokia N900: trying to understand what is going on with modes

2017-04-12 Thread Pavel Machek

Hi!

5Mpix mode does not work on N900, which is something I'd like to
understand. et8ek8_mode contains huge tables of register settings and
parameter values, but it seems that they are not really independend.

To test that theory, I started with checking values against each
other.

This is the work so far, it is neither complete nor completely working
at the moment. Perhaps someone wants to play...

Pavel

diff --git a/drivers/media/i2c/et8ek8/et8ek8_driver.c 
b/drivers/media/i2c/et8ek8/et8ek8_driver.c
index 6296f6f..ca2f648 100644
--- a/drivers/media/i2c/et8ek8/et8ek8_driver.c
+++ b/drivers/media/i2c/et8ek8/et8ek8_driver.c
@@ -798,6 +798,8 @@ static void et8ek8_update_controls(struct et8ek8_sensor 
*sensor)
u32 min, max, pixel_rate;
static const int S = 8;
 
+   printk("Updating controls for %d x %d @ %d mode -- %s\n", mode->width, 
mode->height, mode->pixel_clock, mode->name);
+
ctrl = sensor->exposure;
 
 #ifdef COMPATIBLE
@@ -820,6 +822,127 @@ static void et8ek8_update_controls(struct et8ek8_sensor 
*sensor)
__v4l2_ctrl_s_ctrl_int64(sensor->pixel_rate, pixel_rate << S);
 }
 
+static int read_8(struct i2c_client *client, unsigned long addr)
+{
+   int val;
+   et8ek8_i2c_read_reg(client, ET8EK8_REG_8BIT, addr, );
+   return val;
+}
+
+static int read_16(struct i2c_client *client, unsigned long addr)
+{
+   return read_8(client, addr);
+}
+
+static void assert_value(struct i2c_client *client, unsigned long addr, 
unsigned long val)
+{
+   int val2 = read_8(client, addr);
+   if (val != val2)
+   printk("et8ek8: assertion check %lx / should be %lx is %lx\n", 
addr, val, val2);
+}
+
+static void assert(struct i2c_client *client, int v, char *msg)
+{
+   if (!v)
+   printk("et8ek8: assertion: %s\n", msg);
+}
+
+static void assert_eq(struct i2c_client *client, int v1, int v2, char *msg)
+{
+   if (v1 != v2)
+   printk("et8ek8: assertion: %d == %d %s\n", v1, v2, msg);
+}
+
+static void et8ek8_check(struct et8ek8_sensor *sensor)
+{
+   /*
+ 1239 4F   # CKVAR_DIV
+ 1238 02   # CKVAR_DIV[8] CKREF_DIV
+ 123B 70   # MRCK_DIV LVDSCK_DIV
+ 123A 05   # VCO_DIV SPCK_DIV
+ 121B 63   # PIC_SIZE MONI_MODE
+ 1220 85   # H_COUNT
+ 1221 00   # H_COUNT[10:8]
+ 1222 58   # V_COUNT
+ 1223 00   # V_COUNT[12:8]
+ 121D 63   # H_SIZE H_INTERMIT
+ 125D 83   # CCP_LVDS_MODE/ _/ _/ _/ _/ CCP_COMP_MODE[2-0]
+   */
+   struct et8ek8_reglist *r = sensor->current_reglist;
+   struct v4l2_subdev *subdev = >subdev;
+   struct i2c_client *client = v4l2_get_subdevdata(subdev);
+   int vco;
+   
+   printk("Mode validation:\n");
+
+   assert_value(client, 0x1220, (r->mode.width / 24) & 0xff);
+   assert_value(client, 0x1221, (r->mode.width / 24) >> 8);
+   
+   assert_value(client, 0x1222, (r->mode.height / 24) & 0xff);
+   assert_value(client, 0x1223, (r->mode.height / 24) >> 8);
+
+   {
+   int ckref_div = read_16(client, 0x1238) & 0xf;
+   int ckvar_div = ((read_16(client, 0x1238) & 0x80) >> 7) | 
(read_16(client, 0x1239) << 1);
+   int vco_div = read_16(client, 0x123A) >> 4;
+   int spck_div = read_16(client, 0x123A) & 0xf;
+   int mrck_div = read_16(client, 0x123B) >> 4;
+   int lvdsck_div = read_16(client, 0x123B) & 0xf;
+
+   vco = (r->mode.ext_clock * ckvar_div) / (ckref_div + 1);
+   printk("Vco is %d, %d %d %d\n", vco, r->mode.ext_clock, 
ckvar_div, ckref_div);
+   int ccp2 = vco / ((lvdsck_div + 1) * (vco_div + 1));
+   int spck = vco / ((spck_div + 1) * (vco_div + 1));
+
+   assert_eq(client, r->mode.pixel_clock, spck, "spck");
+   }
+
+   assert_eq(client, r->mode.max_exp, r->mode.height - 4, "max_exp");
+
+   assert(client, !(r->mode.sensor_window_width % r->mode.window_width), 
"window_width");
+   switch(r->mode.sensor_window_width / r->mode.window_width) {
+   case 1: assert_value(client, 0x121d, 0x64);
+   break;
+   case 2: assert_value(client, 0x121d, 0x63);
+   break;
+   case 3: assert_value(client, 0x121d, 0x62);
+   break;
+   default:
+   assert(client, 0, "bad window_width");
+   }
+
+   assert(client, !(r->mode.sensor_window_height % r->mode.window_height), 
"window_width");
+   switch(r->mode.sensor_window_height / r->mode.window_height) {
+   case 1: assert_value(client, 0x121b, 0x64);
+   break;
+   case 2: assert_value(client, 0x121b, 0x63);
+   break;
+   case 3: assert_value(client, 0x121b, 0x62);
+   break;
+   default:
+   assert(client, 0, "bad window_height");
+

Re: [PATCH v1 1/7] mfd: intel_soc_pmic_bxtwc: fix TMU interrupt index

2017-04-12 Thread Lee Jones

On Wed, 12 Apr 2017, Sathyanarayanan Kuppuswamy Natarajan wrote:

> Hi Lee,
> 
> Thanks. Will remove the code segment in next version.

Please always reply inline.  Top posting is frowned upon.

> On Wed, Apr 12, 2017 at 3:45 AM, Lee Jones  wrote:
> > On Mon, 10 Apr 2017, sathyanarayanan.kuppusw...@linux.intel.com wrote:
> >
> >> From: Kuppuswamy Sathyanarayanan 
> >> 
> >>
> >> TMU interrupts are registered as a separate interrupt chip, and
> >> hence it should start its interrupt index(BXTWC_TMU_IRQ) number
> >> from 0. But currently, BXTWC_TMU_IRQ is defined as part of enum
> >> bxtwc_irqs_level2 and its index value is 11. Since this index
> >> value is used when calculating .num_irqs of regmap_irq_chip_tmu,
> >> it incorrectly reports number of irqs as 12 instead of actual
> >> value of 1.
> >>
> >> static const struct regmap_irq bxtwc_regmap_irqs_tmu[] = {
> >>   REGMAP_IRQ_REG(BXTWC_TMU_IRQ, 0, 0x06),
> >> };
> >>
> >> static struct regmap_irq_chip bxtwc_regmap_irq_chip_tmu = {
> >>   .name = "bxtwc_irq_chip_tmu",
> >>   .status_base = BXTWC_TMUIRQ,
> >>   .mask_base = BXTWC_MTMUIRQ,
> >>   .irqs = bxtwc_regmap_irqs_tmu,
> >>   .num_irqs = ARRAY_SIZE(bxtwc_regmap_irqs_tmu),
> >>   .num_regs = 1,
> >> };
> >>
> >> This patch fixes this issue by creating new enum of tmu irqs and
> >> resetting its starting index to 0.
> >>
> >> Signed-off-by: Kuppuswamy Sathyanarayanan 
> >> 
> >> ---
> >>  drivers/mfd/intel_soc_pmic_bxtwc.c | 5 -
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > Patch looks fine, but please remove the code segment from the commit
> > log.
> >
> > For the code:
> >
> > For my own reference:
> >   Acked-for-MFD-by: Lee Jones 
> >
> >> diff --git a/drivers/mfd/intel_soc_pmic_bxtwc.c 
> >> b/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> index 699c8c7..bb18e20 100644
> >> --- a/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> +++ b/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> @@ -94,7 +94,10 @@ enum bxtwc_irqs_level2 {
> >>   BXTWC_GPIO0_IRQ,
> >>   BXTWC_GPIO1_IRQ,
> >>   BXTWC_CRIT_IRQ,
> >> - BXTWC_TMU_IRQ,
> >> +};
> >> +
> >> +enum bxtwc_irqs_tmu {
> >> + BXTWC_TMU_IRQ = 0,
> >>  };
> >>
> >>  static const struct regmap_irq bxtwc_regmap_irqs[] = {
> >
> 
> 
> 

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH v1 1/7] mfd: intel_soc_pmic_bxtwc: fix TMU interrupt index

2017-04-12 Thread Lee Jones

On Wed, 12 Apr 2017, Sathyanarayanan Kuppuswamy Natarajan wrote:

> Hi Lee,
> 
> Thanks. Will remove the code segment in next version.

Please always reply inline.  Top posting is frowned upon.

> On Wed, Apr 12, 2017 at 3:45 AM, Lee Jones  wrote:
> > On Mon, 10 Apr 2017, sathyanarayanan.kuppusw...@linux.intel.com wrote:
> >
> >> From: Kuppuswamy Sathyanarayanan 
> >> 
> >>
> >> TMU interrupts are registered as a separate interrupt chip, and
> >> hence it should start its interrupt index(BXTWC_TMU_IRQ) number
> >> from 0. But currently, BXTWC_TMU_IRQ is defined as part of enum
> >> bxtwc_irqs_level2 and its index value is 11. Since this index
> >> value is used when calculating .num_irqs of regmap_irq_chip_tmu,
> >> it incorrectly reports number of irqs as 12 instead of actual
> >> value of 1.
> >>
> >> static const struct regmap_irq bxtwc_regmap_irqs_tmu[] = {
> >>   REGMAP_IRQ_REG(BXTWC_TMU_IRQ, 0, 0x06),
> >> };
> >>
> >> static struct regmap_irq_chip bxtwc_regmap_irq_chip_tmu = {
> >>   .name = "bxtwc_irq_chip_tmu",
> >>   .status_base = BXTWC_TMUIRQ,
> >>   .mask_base = BXTWC_MTMUIRQ,
> >>   .irqs = bxtwc_regmap_irqs_tmu,
> >>   .num_irqs = ARRAY_SIZE(bxtwc_regmap_irqs_tmu),
> >>   .num_regs = 1,
> >> };
> >>
> >> This patch fixes this issue by creating new enum of tmu irqs and
> >> resetting its starting index to 0.
> >>
> >> Signed-off-by: Kuppuswamy Sathyanarayanan 
> >> 
> >> ---
> >>  drivers/mfd/intel_soc_pmic_bxtwc.c | 5 -
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > Patch looks fine, but please remove the code segment from the commit
> > log.
> >
> > For the code:
> >
> > For my own reference:
> >   Acked-for-MFD-by: Lee Jones 
> >
> >> diff --git a/drivers/mfd/intel_soc_pmic_bxtwc.c 
> >> b/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> index 699c8c7..bb18e20 100644
> >> --- a/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> +++ b/drivers/mfd/intel_soc_pmic_bxtwc.c
> >> @@ -94,7 +94,10 @@ enum bxtwc_irqs_level2 {
> >>   BXTWC_GPIO0_IRQ,
> >>   BXTWC_GPIO1_IRQ,
> >>   BXTWC_CRIT_IRQ,
> >> - BXTWC_TMU_IRQ,
> >> +};
> >> +
> >> +enum bxtwc_irqs_tmu {
> >> + BXTWC_TMU_IRQ = 0,
> >>  };
> >>
> >>  static const struct regmap_irq bxtwc_regmap_irqs[] = {
> >
> 
> 
> 

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH v6] lightnvm: physical block device (pblk) target

2017-04-12 Thread Matias Bjørling


On 04/12/2017 03:19 PM, Javier González wrote:

This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions, and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.

An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and if a block
requires it, must also be reset before new writes can be
applied.

To manage the constraints, pblk maintains a logical to
physical address (L2P) table,  write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.

The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the cause of a power failure, pblk will perform a
scan to recover the L2P table.

The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and makes it easier to implement RAID or erasure coding
in the future.

pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.

Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /dev/block/*/pblk/ where * is the block
device name exposed.

This work also contains contributions from:
  Matias Bjørling 
  Simon A. F. Lund 
  Young Tack Jin 
  Huaicheng Li 

Signed-off-by: Javier González 
---


Thanks Javier. Applied to 4.12, and replaced the v5 version.

Re: [PATCH v6] lightnvm: physical block device (pblk) target

2017-04-12 Thread Matias Bjørling


On 04/12/2017 03:19 PM, Javier González wrote:

This patch introduces pblk, a host-side translation layer for
Open-Channel SSDs to expose them like block devices. The translation
layer allows data placement decisions, and I/O scheduling to be
managed by the host, enabling users to optimize the SSD for their
specific workloads.

An open-channel SSD has a set of LUNs (parallel units) and a
collection of blocks. Each block can be read in any order, but
writes must be sequential. Writes may also fail, and if a block
requires it, must also be reset before new writes can be
applied.

To manage the constraints, pblk maintains a logical to
physical address (L2P) table,  write cache, garbage
collection logic, recovery scheme, and logic to rate-limit
user I/Os versus garbage collection I/Os.

The L2P table is fully-associative and manages sectors at a
4KB granularity. Pblk stores the L2P table in two places, in
the out-of-band area of the media and on the last page of a
line. In the cause of a power failure, pblk will perform a
scan to recover the L2P table.

The user data is organized into lines. A line is data
striped across blocks and LUNs. The lines enable the host to
reduce the amount of metadata to maintain besides the user
data and makes it easier to implement RAID or erasure coding
in the future.

pblk implements multi-tenant support and can be instantiated
multiple times on the same drive. Each instance owns a
portion of the SSD - both regarding I/O bandwidth and
capacity - providing I/O isolation for each case.

Finally, pblk also exposes a sysfs interface that allows
user-space to peek into the internals of pblk. The interface
is available at /dev/block/*/pblk/ where * is the block
device name exposed.

This work also contains contributions from:
  Matias Bjørling 
  Simon A. F. Lund 
  Young Tack Jin 
  Huaicheng Li 

Signed-off-by: Javier González 
---


Thanks Javier. Applied to 4.12, and replaced the v5 version.

Re: [PATCH v1 1/1] mtd: mtk-nor: set controller's address width according to nor flash

2017-04-12 Thread Cyrille Pitchen

Hi Guochun,

Le 05/04/2017 à 10:37, Guochun Mao a écrit :
> When nor's size larger than 16MByte, nor's address width maybe
> set to 3 or 4, and controller should change address width according
> to nor's setting.
> 
> Signed-off-by: Guochun Mao 
> ---
>  drivers/mtd/spi-nor/mtk-quadspi.c |   27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/mtd/spi-nor/mtk-quadspi.c 
> b/drivers/mtd/spi-nor/mtk-quadspi.c
> index e661877..b637770 100644
> --- a/drivers/mtd/spi-nor/mtk-quadspi.c
> +++ b/drivers/mtd/spi-nor/mtk-quadspi.c
> @@ -104,6 +104,8 @@
>  #define MTK_NOR_MAX_RX_TX_SHIFT  6
>  /* can shift up to 56 bits (7 bytes) transfer by MTK_NOR_PRG_CMD */
>  #define MTK_NOR_MAX_SHIFT7
> +/* nor controller 4-byte address mode enable bit */
> +#define MTK_NOR_4B_ADDR_EN   BIT(4)
>  
>  /* Helpers for accessing the program data / shift data registers */
>  #define MTK_NOR_PRG_REG(n)   (MTK_NOR_PRGDATA0_REG + 4 * (n))
> @@ -230,10 +232,35 @@ static int mt8173_nor_write_buffer_disable(struct 
> mt8173_nor *mt8173_nor)
> 1);
>  }
>  
> +static void mt8173_nor_set_addr_width(struct mt8173_nor *mt8173_nor)
> +{
> + u8 val;
> + struct spi_nor *nor = _nor->nor;
> +
> + val = readb(mt8173_nor->base + MTK_NOR_DUAL_REG);
> +
> + switch (nor->addr_width) {
> + case 3:
> + val &= ~MTK_NOR_4B_ADDR_EN;
> + break;
> + case 4:
> + val |= MTK_NOR_4B_ADDR_EN;
> + break;
> + default:
> + dev_warn(mt8173_nor->dev, "Unexpected address width %u.\n",
> +  nor->addr_width);
> + break;
> + }
> +
> + writeb(val, mt8173_nor->base + MTK_NOR_DUAL_REG);
> +}
> +
>  static void mt8173_nor_set_addr(struct mt8173_nor *mt8173_nor, u32 addr)
>  {
>   int i;
>  
> + mt8173_nor_set_addr_width(mt8173_nor);
> +
>   for (i = 0; i < 3; i++) {

Should it be 'i < nor->addr_width' instead of 'i < 3' ?
Does it work when accessing data after 128Mbit ?

Best regards,

Cyrille

>   writeb(addr & 0xff, mt8173_nor->base + MTK_NOR_RADR0_REG + i * 
> 4);
>   addr >>= 8;
>

Re: [PATCH v1 1/1] mtd: mtk-nor: set controller's address width according to nor flash

2017-04-12 Thread Cyrille Pitchen

Hi Guochun,

Le 05/04/2017 à 10:37, Guochun Mao a écrit :
> When nor's size larger than 16MByte, nor's address width maybe
> set to 3 or 4, and controller should change address width according
> to nor's setting.
> 
> Signed-off-by: Guochun Mao 
> ---
>  drivers/mtd/spi-nor/mtk-quadspi.c |   27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/drivers/mtd/spi-nor/mtk-quadspi.c 
> b/drivers/mtd/spi-nor/mtk-quadspi.c
> index e661877..b637770 100644
> --- a/drivers/mtd/spi-nor/mtk-quadspi.c
> +++ b/drivers/mtd/spi-nor/mtk-quadspi.c
> @@ -104,6 +104,8 @@
>  #define MTK_NOR_MAX_RX_TX_SHIFT  6
>  /* can shift up to 56 bits (7 bytes) transfer by MTK_NOR_PRG_CMD */
>  #define MTK_NOR_MAX_SHIFT7
> +/* nor controller 4-byte address mode enable bit */
> +#define MTK_NOR_4B_ADDR_EN   BIT(4)
>  
>  /* Helpers for accessing the program data / shift data registers */
>  #define MTK_NOR_PRG_REG(n)   (MTK_NOR_PRGDATA0_REG + 4 * (n))
> @@ -230,10 +232,35 @@ static int mt8173_nor_write_buffer_disable(struct 
> mt8173_nor *mt8173_nor)
> 1);
>  }
>  
> +static void mt8173_nor_set_addr_width(struct mt8173_nor *mt8173_nor)
> +{
> + u8 val;
> + struct spi_nor *nor = _nor->nor;
> +
> + val = readb(mt8173_nor->base + MTK_NOR_DUAL_REG);
> +
> + switch (nor->addr_width) {
> + case 3:
> + val &= ~MTK_NOR_4B_ADDR_EN;
> + break;
> + case 4:
> + val |= MTK_NOR_4B_ADDR_EN;
> + break;
> + default:
> + dev_warn(mt8173_nor->dev, "Unexpected address width %u.\n",
> +  nor->addr_width);
> + break;
> + }
> +
> + writeb(val, mt8173_nor->base + MTK_NOR_DUAL_REG);
> +}
> +
>  static void mt8173_nor_set_addr(struct mt8173_nor *mt8173_nor, u32 addr)
>  {
>   int i;
>  
> + mt8173_nor_set_addr_width(mt8173_nor);
> +
>   for (i = 0; i < 3; i++) {

Should it be 'i < nor->addr_width' instead of 'i < 3' ?
Does it work when accessing data after 128Mbit ?

Best regards,

Cyrille

>   writeb(addr & 0xff, mt8173_nor->base + MTK_NOR_RADR0_REG + i * 
> 4);
>   addr >>= 8;
>

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Dan Williams

On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony  wrote:
> On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
>> > /* We only care about memory errors */
>> > if (!(mce->status & MCACOD))
>> > return NOTIFY_DONE;
>
> N.B. that isn't a valid test that this is a memory error. You need
>
>
> if (!(m->status & 0xef80) == BIT(7))
> return NOTIFY_DONE;
>
> See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

But Vishal's point is that even if we get this check correct the
notifier still requires no sleeping operations. So we would need to
move recoverable notifications to a separate blocking notifier chain.

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Dan Williams

On Wed, Apr 12, 2017 at 1:52 PM, Luck, Tony  wrote:
> On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
>> > /* We only care about memory errors */
>> > if (!(mce->status & MCACOD))
>> > return NOTIFY_DONE;
>
> N.B. that isn't a valid test that this is a memory error. You need
>
>
> if (!(m->status & 0xef80) == BIT(7))
> return NOTIFY_DONE;
>
> See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

But Vishal's point is that even if we get this check correct the
notifier still requires no sleeping operations. So we would need to
move recoverable notifications to a separate blocking notifier chain.

[patch V2 09/13] cpufreq/ia64: Replace racy task affinity logic

2017-04-12 Thread Thomas Gleixner

The get() and target() callbacks must run on the affected cpu. This is
achieved by temporarily setting the affinity of the calling thread to the
requested CPU and reset it to the original affinity afterwards.

That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.

Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.

Signed-off-by: Thomas Gleixner 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux...@vger.kernel.org
---

V2: Fixup kbuild-robot complaints

 drivers/cpufreq/ia64-acpi-cpufreq.c |   92 +++-
 1 file changed, 39 insertions(+), 53 deletions(-)

Index: b/drivers/cpufreq/ia64-acpi-cpufreq.c
===
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -34,6 +34,11 @@ struct cpufreq_acpi_io {
unsigned intresume;
 };
 
+struct cpufreq_acpi_req {
+   unsigned intcpu;
+   unsigned intstate;
+};
+
 static struct cpufreq_acpi_io  *acpi_io_data[NR_CPUS];
 
 static struct cpufreq_driver acpi_cpufreq_driver;
@@ -83,8 +88,7 @@ processor_get_pstate (
 static unsigned
 extract_clock (
struct cpufreq_acpi_io *data,
-   unsigned value,
-   unsigned int cpu)
+   unsigned value)
 {
unsigned long i;
 
@@ -98,60 +102,43 @@ extract_clock (
 }
 
 
-static unsigned int
+static long
 processor_get_freq (
-   struct cpufreq_acpi_io  *data,
-   unsigned intcpu)
+   void *arg)
 {
-   int ret = 0;
-   u32 value = 0;
-   cpumask_t   saved_mask;
-   unsigned long   clock_freq;
+   struct cpufreq_acpi_req *req = arg;
+   unsigned intcpu = req->cpu;
+   struct cpufreq_acpi_io  *data = acpi_io_data[cpu];
+   u32 value;
+   int ret;
 
pr_debug("processor_get_freq\n");
-
-   saved_mask = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(cpu));
if (smp_processor_id() != cpu)
-   goto migrate_end;
+   return -EAGAIN;
 
/* processor_get_pstate gets the instantaneous frequency */
ret = processor_get_pstate();
-
if (ret) {
-   set_cpus_allowed_ptr(current, _mask);
pr_warn("get performance failed with error %d\n", ret);
-   ret = 0;
-   goto migrate_end;
+   return ret;
}
-   clock_freq = extract_clock(data, value, cpu);
-   ret = (clock_freq*1000);
-
-migrate_end:
-   set_cpus_allowed_ptr(current, _mask);
-   return ret;
+   return 1000 * extract_clock(data, value);
 }
 
 
-static int
+static long
 processor_set_freq (
-   struct cpufreq_acpi_io  *data,
-   struct cpufreq_policy   *policy,
-   int state)
+   void *arg)
 {
-   int ret = 0;
-   u32 value = 0;
-   cpumask_t   saved_mask;
-   int retval;
+   struct cpufreq_acpi_req *req = arg;
+   unsigned intcpu = req->cpu;
+   struct cpufreq_acpi_io  *data = acpi_io_data[cpu];
+   int ret, state = req->state;
+   u32 value;
 
pr_debug("processor_set_freq\n");
-
-   saved_mask = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(policy->cpu));
-   if (smp_processor_id() != policy->cpu) {
-   retval = -EAGAIN;
-   goto migrate_end;
-   }
+   if (smp_processor_id() != cpu)
+   return -EAGAIN;
 
if (state == data->acpi_data.state) {
if (unlikely(data->resume)) {
@@ -159,8 +146,7 @@ processor_set_freq (
data->resume = 0;
} else {
pr_debug("Already at target state (P%d)\n", state);
-   retval = 0;
-   goto migrate_end;
+   return 0;
}
}
 
@@ -171,7 +157,6 @@ processor_set_freq (
 * First we write the target state's 'control' value to the
 * control_register.
 */
-
value = (u32) data->acpi_data.states[state].control;
 
pr_debug("Transitioning to state: 0x%08x\n", value);
@@ -179,17 +164,11 @@ processor_set_freq (
ret = processor_set_pstate(value);
if (ret) {
pr_warn("Transition failed with error %d\n", ret);
-   retval = -ENODEV;
-   goto migrate_end;
+   return -ENODEV;
}

[patch V2 09/13] cpufreq/ia64: Replace racy task affinity logic

2017-04-12 Thread Thomas Gleixner

The get() and target() callbacks must run on the affected cpu. This is
achieved by temporarily setting the affinity of the calling thread to the
requested CPU and reset it to the original affinity afterwards.

That's racy vs. concurrent affinity settings for that thread resulting in
code executing on the wrong CPU and overwriting the new affinity setting.

Replace it by work_on_cpu(). All call pathes which invoke the callbacks are
already protected against CPU hotplug.

Signed-off-by: Thomas Gleixner 
Cc: "Rafael J. Wysocki" 
Cc: Viresh Kumar 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux...@vger.kernel.org
---

V2: Fixup kbuild-robot complaints

 drivers/cpufreq/ia64-acpi-cpufreq.c |   92 +++-
 1 file changed, 39 insertions(+), 53 deletions(-)

Index: b/drivers/cpufreq/ia64-acpi-cpufreq.c
===
--- a/drivers/cpufreq/ia64-acpi-cpufreq.c
+++ b/drivers/cpufreq/ia64-acpi-cpufreq.c
@@ -34,6 +34,11 @@ struct cpufreq_acpi_io {
unsigned intresume;
 };
 
+struct cpufreq_acpi_req {
+   unsigned intcpu;
+   unsigned intstate;
+};
+
 static struct cpufreq_acpi_io  *acpi_io_data[NR_CPUS];
 
 static struct cpufreq_driver acpi_cpufreq_driver;
@@ -83,8 +88,7 @@ processor_get_pstate (
 static unsigned
 extract_clock (
struct cpufreq_acpi_io *data,
-   unsigned value,
-   unsigned int cpu)
+   unsigned value)
 {
unsigned long i;
 
@@ -98,60 +102,43 @@ extract_clock (
 }
 
 
-static unsigned int
+static long
 processor_get_freq (
-   struct cpufreq_acpi_io  *data,
-   unsigned intcpu)
+   void *arg)
 {
-   int ret = 0;
-   u32 value = 0;
-   cpumask_t   saved_mask;
-   unsigned long   clock_freq;
+   struct cpufreq_acpi_req *req = arg;
+   unsigned intcpu = req->cpu;
+   struct cpufreq_acpi_io  *data = acpi_io_data[cpu];
+   u32 value;
+   int ret;
 
pr_debug("processor_get_freq\n");
-
-   saved_mask = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(cpu));
if (smp_processor_id() != cpu)
-   goto migrate_end;
+   return -EAGAIN;
 
/* processor_get_pstate gets the instantaneous frequency */
ret = processor_get_pstate();
-
if (ret) {
-   set_cpus_allowed_ptr(current, _mask);
pr_warn("get performance failed with error %d\n", ret);
-   ret = 0;
-   goto migrate_end;
+   return ret;
}
-   clock_freq = extract_clock(data, value, cpu);
-   ret = (clock_freq*1000);
-
-migrate_end:
-   set_cpus_allowed_ptr(current, _mask);
-   return ret;
+   return 1000 * extract_clock(data, value);
 }
 
 
-static int
+static long
 processor_set_freq (
-   struct cpufreq_acpi_io  *data,
-   struct cpufreq_policy   *policy,
-   int state)
+   void *arg)
 {
-   int ret = 0;
-   u32 value = 0;
-   cpumask_t   saved_mask;
-   int retval;
+   struct cpufreq_acpi_req *req = arg;
+   unsigned intcpu = req->cpu;
+   struct cpufreq_acpi_io  *data = acpi_io_data[cpu];
+   int ret, state = req->state;
+   u32 value;
 
pr_debug("processor_set_freq\n");
-
-   saved_mask = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(policy->cpu));
-   if (smp_processor_id() != policy->cpu) {
-   retval = -EAGAIN;
-   goto migrate_end;
-   }
+   if (smp_processor_id() != cpu)
+   return -EAGAIN;
 
if (state == data->acpi_data.state) {
if (unlikely(data->resume)) {
@@ -159,8 +146,7 @@ processor_set_freq (
data->resume = 0;
} else {
pr_debug("Already at target state (P%d)\n", state);
-   retval = 0;
-   goto migrate_end;
+   return 0;
}
}
 
@@ -171,7 +157,6 @@ processor_set_freq (
 * First we write the target state's 'control' value to the
 * control_register.
 */
-
value = (u32) data->acpi_data.states[state].control;
 
pr_debug("Transitioning to state: 0x%08x\n", value);
@@ -179,17 +164,11 @@ processor_set_freq (
ret = processor_set_pstate(value);
if (ret) {
pr_warn("Transition failed with error %d\n", ret);
-   retval = -ENODEV;
-   goto migrate_end;
+   return -ENODEV;
}
 
data->acpi_data.state = state;
-
-   retval = 0;
-
-migrate_end:
-   set_cpus_allowed_ptr(current,

[PATCH] pwm: fix platform_no_drv_owner.cocci warnings

2017-04-12 Thread kbuild test robot

drivers/pwm/pwm-mediatek.c:210:3-8: No need to set .owner here. The core will 
do it.

 Remove .owner field if calls are used which set it automatically

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

CC: John Crispin 
Signed-off-by: Fengguang Wu 
---

 pwm-mediatek.c |1 -
 1 file changed, 1 deletion(-)

--- a/drivers/pwm/pwm-mediatek.c
+++ b/drivers/pwm/pwm-mediatek.c
@@ -207,7 +207,6 @@ MODULE_DEVICE_TABLE(of, mtk_pwm_of_match
 static struct platform_driver mtk_pwm_driver = {
.driver = {
.name = "mtk-pwm",
-   .owner = THIS_MODULE,
.of_match_table = mtk_pwm_of_match,
},
.probe = mtk_pwm_probe,

[PATCH] pwm: fix platform_no_drv_owner.cocci warnings

2017-04-12 Thread kbuild test robot

drivers/pwm/pwm-mediatek.c:210:3-8: No need to set .owner here. The core will 
do it.

 Remove .owner field if calls are used which set it automatically

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

CC: John Crispin 
Signed-off-by: Fengguang Wu 
---

 pwm-mediatek.c |1 -
 1 file changed, 1 deletion(-)

--- a/drivers/pwm/pwm-mediatek.c
+++ b/drivers/pwm/pwm-mediatek.c
@@ -207,7 +207,6 @@ MODULE_DEVICE_TABLE(of, mtk_pwm_of_match
 static struct platform_driver mtk_pwm_driver = {
.driver = {
.name = "mtk-pwm",
-   .owner = THIS_MODULE,
.of_match_table = mtk_pwm_of_match,
},
.probe = mtk_pwm_probe,

[patch V 2 04/13] ia64/sn/hwperf: Replace racy task affinity logic

2017-04-12 Thread Thomas Gleixner

Subject: ia64/sn/hwperf: Replace racy task affinity logic
From: Thomas Gleixner 
Date: Thu, 06 Apr 2017 14:56:18 +0200

sn_hwperf_op_cpu() which is invoked from an ioctl requires to run code on
the requested cpu. This is achieved by temporarily setting the affinity of
the calling user space thread to the requested CPU and reset it to the
original affinity afterwards.

That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.

Replace it by using work_on_cpu_safe() which guarantees to run the code on
the requested CPU or to fail in case the CPU is offline.

Signed-off-by: Thomas Gleixner 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
---

V2: Use the work function for real and remove the unused variable (reported
by kbuild robot)

 arch/ia64/sn/kernel/sn2/sn_hwperf.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: b/arch/ia64/sn/kernel/sn2/sn_hwperf.c
===
--- a/arch/ia64/sn/kernel/sn2/sn_hwperf.c
+++ b/arch/ia64/sn/kernel/sn2/sn_hwperf.c
@@ -598,12 +598,17 @@ static void sn_hwperf_call_sal(void *inf
op_info->ret = r;
 }
 
+static long sn_hwperf_call_sal_work(void *info)
+{
+   sn_hwperf_call_sal(info);
+   return 0;
+}
+
 static int sn_hwperf_op_cpu(struct sn_hwperf_op_info *op_info)
 {
u32 cpu;
u32 use_ipi;
int r = 0;
-   cpumask_t save_allowed;

cpu = (op_info->a->arg & SN_HWPERF_ARG_CPU_MASK) >> 32;
use_ipi = op_info->a->arg & SN_HWPERF_ARG_USE_IPI_MASK;
@@ -629,13 +634,9 @@ static int sn_hwperf_op_cpu(struct sn_hw
/* use an interprocessor interrupt to call SAL */
smp_call_function_single(cpu, sn_hwperf_call_sal,
op_info, 1);
-   }
-   else {
-   /* migrate the task before calling SAL */ 
-   save_allowed = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(cpu));
-   sn_hwperf_call_sal(op_info);
-   set_cpus_allowed_ptr(current, _allowed);
+   } else {
+   /* Call on the target CPU */
+   work_on_cpu_safe(cpu, sn_hwperf_call_sal_work, op_info);
}
}
r = op_info->ret;

Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-04-12 Thread Vlastimil Babka

On 12.4.2017 8:46, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   drivers/block/nbd.c
>   drivers/scsi/iscsi_tcp.c
>   net/core/dev.c
>   net/core/sock.c
> 
> between commit:
> 
>   717a94b5fc70 ("sched/core: Remove 'task' parameter and rename 
> tsk_restore_flags() to current_restore_flags()")
> 
> from the tip tree and commit:
> 
>   61d5ad5b2e8a ("treewide: convert PF_MEMALLOC manipulations to new helpers")
> 
> from the akpm-current tree.

Yeah, the first patch from Neil renames a function (as its subject says) and the
second patch from me converts most of its users to new helpers specific to the
PF_MEMALLOC flags.

> I fixed it up (the latter is just a superset of the former, so I used

It's not a complete superset though, more on that below.

> that) and can carry the fix as necessary. This is now fixed as far as
> linux-next is concerned, but any non trivial conflicts should be mentioned
> to your upstream maintainer when your tree is submitted for merging.
> You may also want to consider cooperating with the maintainer of the
> conflicting tree to minimise any particularly complex conflicts.

Hmm I could redo my patch on top of Neil's patch, but then Andrew would have to
carry Neil's patch as well just to have a working mmotm? And then make sure to
send my patch (but not Neil's) only after the tip tree is pulled? Would that
work for the maintainers involved?

> It looks like there may be more instances that the latter patch should
> update.

I see two remaining instances of current_restore_flags(). One in __do_softirq()
is even for PF_MEMALLOC, but there the flag is cleared first and then set back,
which is opposite of the common case that my helpers provide. The other in nfsd
is for PF_LESS_THROTTLE which is not common enough to earn own helpers yet. IIRC
Neil originally wanted to add a new one?

[patch V 2 04/13] ia64/sn/hwperf: Replace racy task affinity logic

2017-04-12 Thread Thomas Gleixner

Subject: ia64/sn/hwperf: Replace racy task affinity logic
From: Thomas Gleixner 
Date: Thu, 06 Apr 2017 14:56:18 +0200

sn_hwperf_op_cpu() which is invoked from an ioctl requires to run code on
the requested cpu. This is achieved by temporarily setting the affinity of
the calling user space thread to the requested CPU and reset it to the
original affinity afterwards.

That's racy vs. CPU hotplug and concurrent affinity settings for that
thread resulting in code executing on the wrong CPU and overwriting the
new affinity setting.

Replace it by using work_on_cpu_safe() which guarantees to run the code on
the requested CPU or to fail in case the CPU is offline.

Signed-off-by: Thomas Gleixner 
Cc: Tony Luck 
Cc: Fenghua Yu 
Cc: linux-i...@vger.kernel.org
---

V2: Use the work function for real and remove the unused variable (reported
by kbuild robot)

 arch/ia64/sn/kernel/sn2/sn_hwperf.c |   17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: b/arch/ia64/sn/kernel/sn2/sn_hwperf.c
===
--- a/arch/ia64/sn/kernel/sn2/sn_hwperf.c
+++ b/arch/ia64/sn/kernel/sn2/sn_hwperf.c
@@ -598,12 +598,17 @@ static void sn_hwperf_call_sal(void *inf
op_info->ret = r;
 }
 
+static long sn_hwperf_call_sal_work(void *info)
+{
+   sn_hwperf_call_sal(info);
+   return 0;
+}
+
 static int sn_hwperf_op_cpu(struct sn_hwperf_op_info *op_info)
 {
u32 cpu;
u32 use_ipi;
int r = 0;
-   cpumask_t save_allowed;

cpu = (op_info->a->arg & SN_HWPERF_ARG_CPU_MASK) >> 32;
use_ipi = op_info->a->arg & SN_HWPERF_ARG_USE_IPI_MASK;
@@ -629,13 +634,9 @@ static int sn_hwperf_op_cpu(struct sn_hw
/* use an interprocessor interrupt to call SAL */
smp_call_function_single(cpu, sn_hwperf_call_sal,
op_info, 1);
-   }
-   else {
-   /* migrate the task before calling SAL */ 
-   save_allowed = current->cpus_allowed;
-   set_cpus_allowed_ptr(current, cpumask_of(cpu));
-   sn_hwperf_call_sal(op_info);
-   set_cpus_allowed_ptr(current, _allowed);
+   } else {
+   /* Call on the target CPU */
+   work_on_cpu_safe(cpu, sn_hwperf_call_sal_work, op_info);
}
}
r = op_info->ret;

Re: linux-next: manual merge of the akpm-current tree with the tip tree

2017-04-12 Thread Vlastimil Babka

On 12.4.2017 8:46, Stephen Rothwell wrote:
> Hi Andrew,
> 
> Today's linux-next merge of the akpm-current tree got conflicts in:
> 
>   drivers/block/nbd.c
>   drivers/scsi/iscsi_tcp.c
>   net/core/dev.c
>   net/core/sock.c
> 
> between commit:
> 
>   717a94b5fc70 ("sched/core: Remove 'task' parameter and rename 
> tsk_restore_flags() to current_restore_flags()")
> 
> from the tip tree and commit:
> 
>   61d5ad5b2e8a ("treewide: convert PF_MEMALLOC manipulations to new helpers")
> 
> from the akpm-current tree.

Yeah, the first patch from Neil renames a function (as its subject says) and the
second patch from me converts most of its users to new helpers specific to the
PF_MEMALLOC flags.

> I fixed it up (the latter is just a superset of the former, so I used

It's not a complete superset though, more on that below.

> that) and can carry the fix as necessary. This is now fixed as far as
> linux-next is concerned, but any non trivial conflicts should be mentioned
> to your upstream maintainer when your tree is submitted for merging.
> You may also want to consider cooperating with the maintainer of the
> conflicting tree to minimise any particularly complex conflicts.

Hmm I could redo my patch on top of Neil's patch, but then Andrew would have to
carry Neil's patch as well just to have a working mmotm? And then make sure to
send my patch (but not Neil's) only after the tip tree is pulled? Would that
work for the maintainers involved?

> It looks like there may be more instances that the latter patch should
> update.

I see two remaining instances of current_restore_flags(). One in __do_softirq()
is even for PF_MEMALLOC, but there the flag is cleared first and then set back,
which is opposite of the common case that my helpers provide. The other in nfsd
is for PF_LESS_THROTTLE which is not common enough to earn own helpers yet. IIRC
Neil originally wanted to add a new one?

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >     /* We only care about memory errors */
> >     if (!(mce->status & MCACOD))
> >     return NOTIFY_DONE;

N.B. that isn't a valid test that this is a memory error. You need


if (!(m->status & 0xef80) == BIT(7))
return NOTIFY_DONE;

See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

-Tony

Re: [RFC PATCH] x86, mce: change the mce notifier to 'blocking' from 'atomic'

2017-04-12 Thread Luck, Tony

On Wed, Apr 12, 2017 at 01:27:05PM -0700, Verma, Vishal L wrote:
> >     /* We only care about memory errors */
> >     if (!(mce->status & MCACOD))
> >     return NOTIFY_DONE;

N.B. that isn't a valid test that this is a memory error. You need


if (!(m->status & 0xef80) == BIT(7))
return NOTIFY_DONE;

See: Intel SDM Volume 3B - 15.9.2 Compound Error Codes

-Tony

Re: [PATCH v2 2/3] net: macb: Remove CONFIG_OF around DT match table

2017-04-12 Thread Rob Herring

On Tue, Apr 11, 2017 at 11:41 PM, Florian Fainelli  wrote:
> A subsequent patch is going to make of_match_node() an inline stub when
> CONFIG_OF is disabled, which will let the compiler eliminate unused variables.
> In order not to clutter the code more, remove the CONFIG_OF #ifdef such that
> macb_dt_ids and what it references are always defined.
>
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/cadence/macb.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 30606b11b128..01016e9525ee 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -2811,7 +2811,6 @@ static int macb_init(struct platform_device *pdev)
> return 0;
>  }
>
> -#if defined(CONFIG_OF)
>  /* 1518 rounded up */
>  #define AT91ETHER_MAX_RBUFF_SZ 0x600
>  /* max number of receive buffers */
> @@ -3215,7 +3214,6 @@ static const struct of_device_id macb_dt_ids[] = {
> { /* sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, macb_dt_ids);

Did you actually check that the compiler can eliminate things when
modules are enabled? Because it's not going to be able to eliminate
MODULE_DEVICE_TABLE entry AFAICT.

Rob

Re: [PATCH v2 2/3] net: macb: Remove CONFIG_OF around DT match table

2017-04-12 Thread Rob Herring

On Tue, Apr 11, 2017 at 11:41 PM, Florian Fainelli  wrote:
> A subsequent patch is going to make of_match_node() an inline stub when
> CONFIG_OF is disabled, which will let the compiler eliminate unused variables.
> In order not to clutter the code more, remove the CONFIG_OF #ifdef such that
> macb_dt_ids and what it references are always defined.
>
> Signed-off-by: Florian Fainelli 
> ---
>  drivers/net/ethernet/cadence/macb.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb.c 
> b/drivers/net/ethernet/cadence/macb.c
> index 30606b11b128..01016e9525ee 100644
> --- a/drivers/net/ethernet/cadence/macb.c
> +++ b/drivers/net/ethernet/cadence/macb.c
> @@ -2811,7 +2811,6 @@ static int macb_init(struct platform_device *pdev)
> return 0;
>  }
>
> -#if defined(CONFIG_OF)
>  /* 1518 rounded up */
>  #define AT91ETHER_MAX_RBUFF_SZ 0x600
>  /* max number of receive buffers */
> @@ -3215,7 +3214,6 @@ static const struct of_device_id macb_dt_ids[] = {
> { /* sentinel */ }
>  };
>  MODULE_DEVICE_TABLE(of, macb_dt_ids);

Did you actually check that the compiler can eliminate things when
modules are enabled? Because it's not going to be able to eliminate
MODULE_DEVICE_TABLE entry AFAICT.

Rob

Re: [PATCH v2] libnvdimm: fix btt vs clear poison locking

2017-04-12 Thread Dan Williams

On Wed, Apr 12, 2017 at 5:49 AM, Jeff Moyer  wrote:
> Dan Williams  writes:
>
>> As a minimal fix, disable error clearing when the BTT is enabled for the
>> namespace. For the final fix a larger rework of the poison list locking
>> is needed.
>
> I think "fix" is a strong word for this patch.  ;-)  Looks fine, though.

Heh, true. I'll rename it to "band aid" since the true "fix" is coming in 4.12.

> Reviewed-by: Jeff Moyer 

Thanks!

Re: [PATCH v2] libnvdimm: fix btt vs clear poison locking

2017-04-12 Thread Dan Williams

On Wed, Apr 12, 2017 at 5:49 AM, Jeff Moyer  wrote:
> Dan Williams  writes:
>
>> As a minimal fix, disable error clearing when the BTT is enabled for the
>> namespace. For the final fix a larger rework of the poison list locking
>> is needed.
>
> I think "fix" is a strong word for this patch.  ;-)  Looks fine, though.

Heh, true. I'll rename it to "band aid" since the true "fix" is coming in 4.12.

> Reviewed-by: Jeff Moyer 

Thanks!

Re: net/ipv4: use-after-free in ipv4_datagram_support_cmsg

2017-04-12 Thread Eric Dumazet

On Wed, 2017-04-12 at 13:07 -0700, Cong Wang wrote:
> On Wed, Apr 12, 2017 at 8:39 AM, Willem de Bruijn
>  wrote:
> > ===
> >> BUG: KASAN: use-after-free in ipv4_datagram_support_cmsg
> >> net/ipv4/ip_sockglue.c:500 [inline] at addr 880059be0128
> >
> > Thanks for the report. This is accessing skb->dev from within recvmsg() at 
> > line
> >
> > info->ipi_ifindex = skb->dev->ifindex;
> >
> > Introduced in 829ae9d61165 ("net-timestamp: allow reading recv cmsg on
> > errqueue with origin tstamp"). At this time the device may indeed have
> > gone away. I'm having a look at a way to read this in the receive BH
> > and store the ifindex.
> 
> Why not use skb_iif?
> 
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index ebd953b..a2aef45 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -497,7 +497,7 @@ static bool ipv4_datagram_support_cmsg(const
> struct sock *sk,
> 
> info = PKTINFO_SKB_CB(skb);
> info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
> -   info->ipi_ifindex = skb->dev->ifindex;
> +   info->ipi_ifindex = skb->skb_iif;
> return true;
>  }

This seems to differ from the code in ipv4_pktinfo_prepare()

We probably want to unify things a bit.

/* skb->cb is overloaded: prior to this point it is IP{6}CB
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
 * element so the iif is picked up from the prior IPCB. If iif
 * is the loopback interface, then return the sending interface
 * (e.g., process binds socket to eth0 for Tx which is
 * redirected to loopback in the rtable/dst).
 */
if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
pktinfo->ipi_ifindex = inet_iif(skb);

Re: net/ipv4: use-after-free in ipv4_datagram_support_cmsg

2017-04-12 Thread Eric Dumazet

On Wed, 2017-04-12 at 13:07 -0700, Cong Wang wrote:
> On Wed, Apr 12, 2017 at 8:39 AM, Willem de Bruijn
>  wrote:
> > ===
> >> BUG: KASAN: use-after-free in ipv4_datagram_support_cmsg
> >> net/ipv4/ip_sockglue.c:500 [inline] at addr 880059be0128
> >
> > Thanks for the report. This is accessing skb->dev from within recvmsg() at 
> > line
> >
> > info->ipi_ifindex = skb->dev->ifindex;
> >
> > Introduced in 829ae9d61165 ("net-timestamp: allow reading recv cmsg on
> > errqueue with origin tstamp"). At this time the device may indeed have
> > gone away. I'm having a look at a way to read this in the receive BH
> > and store the ifindex.
> 
> Why not use skb_iif?
> 
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index ebd953b..a2aef45 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -497,7 +497,7 @@ static bool ipv4_datagram_support_cmsg(const
> struct sock *sk,
> 
> info = PKTINFO_SKB_CB(skb);
> info->ipi_spec_dst.s_addr = ip_hdr(skb)->saddr;
> -   info->ipi_ifindex = skb->dev->ifindex;
> +   info->ipi_ifindex = skb->skb_iif;
> return true;
>  }

This seems to differ from the code in ipv4_pktinfo_prepare()

We probably want to unify things a bit.

/* skb->cb is overloaded: prior to this point it is IP{6}CB
 * which has interface index (iif) as the first member of the
 * underlying inet{6}_skb_parm struct. This code then overlays
 * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
 * element so the iif is picked up from the prior IPCB. If iif
 * is the loopback interface, then return the sending interface
 * (e.g., process binds socket to eth0 for Tx which is
 * redirected to loopback in the rtable/dst).
 */
if (pktinfo->ipi_ifindex == LOOPBACK_IFINDEX)
pktinfo->ipi_ifindex = inet_iif(skb);

Re: [PATCH net-next v1 1/1] L2TP device MTU setup - tunnel socket needs a lock

2017-04-12 Thread R Parameswaran

Hi Guillaume,

Please see inline:

On Wed, Apr 12, 2017 at 12:53 AM, Guillaume Nault  wrote:
> On Tue, Apr 11, 2017 at 08:14:37PM -0700, R. Parameswaran wrote:
>>
>> The MTU overhead calculation in L2TP device set-up
>> merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
>> needs to be adjusted to lock the tunnel socket while
>> referencing the sub-data structures to derive the
>> socket's IP overhead.
>
> Thanks.
>
> Tested-by: Guillaume Nault 
>
> BTW, you don't need to add "v1" for the first version of a patch.
> There's also no need for numbering pathes when there's only one in the
> series. And we normally prefix the commit message with ": ".
> For this patch, your subject would look like " [PATCH net-next] l2tp: ...".
>
> Also, you could have added a "Reported-by:" tag (I don't really mind
> in this case, but that's good practice).

Thanks for correcting these (and for testing the changes) and sorry
for the Reported-by omission. I'll respin by tonight
with these, per reply to Dave.

regards,

Ramkumar

Re: [PATCH net-next v1 1/1] L2TP device MTU setup - tunnel socket needs a lock

2017-04-12 Thread R Parameswaran

Hi Guillaume,

Please see inline:

On Wed, Apr 12, 2017 at 12:53 AM, Guillaume Nault  wrote:
> On Tue, Apr 11, 2017 at 08:14:37PM -0700, R. Parameswaran wrote:
>>
>> The MTU overhead calculation in L2TP device set-up
>> merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
>> needs to be adjusted to lock the tunnel socket while
>> referencing the sub-data structures to derive the
>> socket's IP overhead.
>
> Thanks.
>
> Tested-by: Guillaume Nault 
>
> BTW, you don't need to add "v1" for the first version of a patch.
> There's also no need for numbering pathes when there's only one in the
> series. And we normally prefix the commit message with ": ".
> For this patch, your subject would look like " [PATCH net-next] l2tp: ...".
>
> Also, you could have added a "Reported-by:" tag (I don't really mind
> in this case, but that's good practice).

Thanks for correcting these (and for testing the changes) and sorry
for the Reported-by omission. I'll respin by tonight
with these, per reply to Dave.

regards,

Ramkumar

Re: [kernel-hardening] Re: [PATCH RFC v2 1/3] LSM: Allow per LSM module per "struct task_struct" blob.

2017-04-12 Thread Casey Schaufler

On 4/12/2017 9:22 AM, Djalal Harouni wrote:
> On Tue, Apr 11, 2017 at 6:43 AM, Kees Cook  wrote:
>> On Mon, Apr 10, 2017 at 1:00 PM, Djalal Harouni  wrote:
>>> On Mon, Apr 10, 2017 at 9:26 PM, Casey Schaufler  
>>> wrote:
 I think that would be the prudent approach. There is still
 the possibility that blob sharing (or full stacking, if you
 prefer) won't be accepted any time soon.
>>> Ok Casey! I will wait for more feedback, and if other maintainers do
>>> not object, I will convert it back to rhashtables in next iterations
>>> making sure that it should be simple to convert later to a blob
>>> sharing mechanism.
>> Would it be possible just to add a single field to task_struct if this
>> LSM is built in? I feel like rhashtables is a huge overhead when a
>> single field is all that's needed.
> Well, yes rhashtables can have an overhead especially when reclaiming
> memory back, I could not identify a way how to separate tables unless
> we use cgroups as an ID. Anyway this of course could be added in
> task_struct and updated to work like the capability security hooks
> rather than a proper LSM with its own name. But as noted in the other
> response, we may need task->security field for Yama anyway. I'm open
> to suggestion ? I may try to converge the task->security blob with
> what Casey is proposing and see! otherwise fallback to task_struct as
> a last resort!
>
> Thanks!

I can present a patch set based on my existing stacking
work that includes the move from module based memory
management to infrastructure memory management but pretty
well stops there. There will be changes to Kconfig to allow
stacking of everything except SELinux and Smack, because
that's the only combination with other conficts. Well,
there's /proc/attr, but I'd include the module specific
subdirectories, too. That would allow landlock to come in
and TOMOYO (and potentially YAMA) to use/share the task
blob. I see this as taking down a barrier rather than
otherwise pointless infrastructure change.

Re: [kernel-hardening] Re: [PATCH RFC v2 1/3] LSM: Allow per LSM module per "struct task_struct" blob.

2017-04-12 Thread Casey Schaufler

On 4/12/2017 9:22 AM, Djalal Harouni wrote:
> On Tue, Apr 11, 2017 at 6:43 AM, Kees Cook  wrote:
>> On Mon, Apr 10, 2017 at 1:00 PM, Djalal Harouni  wrote:
>>> On Mon, Apr 10, 2017 at 9:26 PM, Casey Schaufler  
>>> wrote:
 I think that would be the prudent approach. There is still
 the possibility that blob sharing (or full stacking, if you
 prefer) won't be accepted any time soon.
>>> Ok Casey! I will wait for more feedback, and if other maintainers do
>>> not object, I will convert it back to rhashtables in next iterations
>>> making sure that it should be simple to convert later to a blob
>>> sharing mechanism.
>> Would it be possible just to add a single field to task_struct if this
>> LSM is built in? I feel like rhashtables is a huge overhead when a
>> single field is all that's needed.
> Well, yes rhashtables can have an overhead especially when reclaiming
> memory back, I could not identify a way how to separate tables unless
> we use cgroups as an ID. Anyway this of course could be added in
> task_struct and updated to work like the capability security hooks
> rather than a proper LSM with its own name. But as noted in the other
> response, we may need task->security field for Yama anyway. I'm open
> to suggestion ? I may try to converge the task->security blob with
> what Casey is proposing and see! otherwise fallback to task_struct as
> a last resort!
>
> Thanks!

I can present a patch set based on my existing stacking
work that includes the move from module based memory
management to infrastructure memory management but pretty
well stops there. There will be changes to Kconfig to allow
stacking of everything except SELinux and Smack, because
that's the only combination with other conficts. Well,
there's /proc/attr, but I'd include the module specific
subdirectories, too. That would allow landlock to come in
and TOMOYO (and potentially YAMA) to use/share the task
blob. I see this as taking down a barrier rather than
otherwise pointless infrastructure change.

Re: [PATCH] kbuild: use -Oz instead of -Os when using clang

2017-04-12 Thread Masahiro Yamada

2017-04-12 12:38 GMT+09:00 Masahiro Yamada :
> Hi Matthias,
>
>
> 2017-04-11 3:08 GMT+09:00 Matthias Kaehlcke :
>> Hi Masahiro,
>>
>> El Fri, Mar 31, 2017 at 02:57:43AM +0900 Masahiro Yamada ha dit:
>>
>>> 2017-03-31 1:41 GMT+09:00 Matthias Kaehlcke :
>>> > El Fri, Mar 31, 2017 at 01:03:02AM +0900 Masahiro Yamada ha dit:
>>> >
>>> >> 2017-03-28 10:19 GMT+09:00 Matthias Kaehlcke :
>>> >> > This generates smaller resulting object code when compiled with clang.
>>> >> >
>>> >> > Signed-off-by: Matthias Kaehlcke 
>>> >> > ---
>>> >> >  Makefile | 3 ++-
>>> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
>>> >> >
>>> >> > diff --git a/Makefile b/Makefile
>>> >> > index b2faa9319372..66bcdbf58371 100644
>>> >> > --- a/Makefile
>>> >> > +++ b/Makefile
>>> >> > @@ -638,7 +638,8 @@ KBUILD_CFLAGS   += $(call 
>>> >> > cc-option,-fdata-sections,)
>>> >> >  endif
>>> >> >
>>> >> >  ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>>> >> > -KBUILD_CFLAGS  += -Os $(call cc-disable-warning,maybe-uninitialized,)
>>> >> > +KBUILD_CFLAGS  += $(call cc-option,-Oz,-Os)
>>> >> > +KBUILD_CFLAGS  += $(call cc-disable-warning,maybe-uninitialized,)
>>> >> >  else
>>> >> >  ifdef CONFIG_PROFILE_ALL_BRANCHES
>>> >> >  KBUILD_CFLAGS  += -O2 $(call cc-disable-warning,maybe-uninitialized,)
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> This is the same as the following commit in LLVMLinux
>>> >> except Author/Signed-off-by.
>>> >>
>>> >> Who should the authorship really belong to?
>>> >
>>> > (this time without html, sorry for the noise)
>>> >
>>> > It should belong to Behan, I missed to add a 'From' tag, sorry about
>>> > that. Should I resend with the tag or can you fix it when applying the
>>> > patch?
>>>
>>>
>>> No need to re-send it.  I can fixup it manually.
>>
>> I couldn't locate this patch in your tree, has it been picked up?
>>
>
>
> I will pick this up shortly.  Sorry for the delay.
>


Applied to linux-kbuild/kbuild.  Thanks!



-- 
Best Regards
Masahiro Yamada

Re: [PATCH] kbuild: use -Oz instead of -Os when using clang

2017-04-12 Thread Masahiro Yamada

2017-04-12 12:38 GMT+09:00 Masahiro Yamada :
> Hi Matthias,
>
>
> 2017-04-11 3:08 GMT+09:00 Matthias Kaehlcke :
>> Hi Masahiro,
>>
>> El Fri, Mar 31, 2017 at 02:57:43AM +0900 Masahiro Yamada ha dit:
>>
>>> 2017-03-31 1:41 GMT+09:00 Matthias Kaehlcke :
>>> > El Fri, Mar 31, 2017 at 01:03:02AM +0900 Masahiro Yamada ha dit:
>>> >
>>> >> 2017-03-28 10:19 GMT+09:00 Matthias Kaehlcke :
>>> >> > This generates smaller resulting object code when compiled with clang.
>>> >> >
>>> >> > Signed-off-by: Matthias Kaehlcke 
>>> >> > ---
>>> >> >  Makefile | 3 ++-
>>> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
>>> >> >
>>> >> > diff --git a/Makefile b/Makefile
>>> >> > index b2faa9319372..66bcdbf58371 100644
>>> >> > --- a/Makefile
>>> >> > +++ b/Makefile
>>> >> > @@ -638,7 +638,8 @@ KBUILD_CFLAGS   += $(call 
>>> >> > cc-option,-fdata-sections,)
>>> >> >  endif
>>> >> >
>>> >> >  ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>>> >> > -KBUILD_CFLAGS  += -Os $(call cc-disable-warning,maybe-uninitialized,)
>>> >> > +KBUILD_CFLAGS  += $(call cc-option,-Oz,-Os)
>>> >> > +KBUILD_CFLAGS  += $(call cc-disable-warning,maybe-uninitialized,)
>>> >> >  else
>>> >> >  ifdef CONFIG_PROFILE_ALL_BRANCHES
>>> >> >  KBUILD_CFLAGS  += -O2 $(call cc-disable-warning,maybe-uninitialized,)
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> This is the same as the following commit in LLVMLinux
>>> >> except Author/Signed-off-by.
>>> >>
>>> >> Who should the authorship really belong to?
>>> >
>>> > (this time without html, sorry for the noise)
>>> >
>>> > It should belong to Behan, I missed to add a 'From' tag, sorry about
>>> > that. Should I resend with the tag or can you fix it when applying the
>>> > patch?
>>>
>>>
>>> No need to re-send it.  I can fixup it manually.
>>
>> I couldn't locate this patch in your tree, has it been picked up?
>>
>
>
> I will pick this up shortly.  Sorry for the delay.
>


Applied to linux-kbuild/kbuild.  Thanks!



-- 
Best Regards
Masahiro Yamada

Re: [PATCH net-next v1 1/1] L2TP device MTU setup - tunnel socket needs a lock

2017-04-12 Thread R Parameswaran

Hi Dave,

Please see inline:

On Wed, Apr 12, 2017 at 7:13 AM, David Miller  wrote:
> From: "R. Parameswaran" 
> Date: Tue, 11 Apr 2017 20:14:37 -0700 (PDT)
>
>>
>> The MTU overhead calculation in L2TP device set-up
>> merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
>> needs to be adjusted to lock the tunnel socket while
>> referencing the sub-data structures to derive the
>> socket's IP overhead.
>
> This is missing a proper signoff.
>
> The subject line also needs to be fixed "[PATCH net-next] l2tp: " as explained
> by Guillaume.
>

Thanks, I will re-spin with these corrections by tonight PT.

regards,

Ramkumar
> Thanks.

Re: [PATCH net-next v1 1/1] L2TP device MTU setup - tunnel socket needs a lock

2017-04-12 Thread R Parameswaran

Hi Dave,

Please see inline:

On Wed, Apr 12, 2017 at 7:13 AM, David Miller  wrote:
> From: "R. Parameswaran" 
> Date: Tue, 11 Apr 2017 20:14:37 -0700 (PDT)
>
>>
>> The MTU overhead calculation in L2TP device set-up
>> merged via commit b784e7ebfce8cfb16c6f95e14e8532d0768ab7ff
>> needs to be adjusted to lock the tunnel socket while
>> referencing the sub-data structures to derive the
>> socket's IP overhead.
>
> This is missing a proper signoff.
>
> The subject line also needs to be fixed "[PATCH net-next] l2tp: " as explained
> by Guillaume.
>

Thanks, I will re-spin with these corrections by tonight PT.

regards,

Ramkumar
> Thanks.

[PATCH nf-next] ip_vs_sync: change comparison on sync_refresh_period

2017-04-12 Thread Aaron Conole

The sync_refresh_period variable is unsigned, so it can never be < 0.

Signed-off-by: Aaron Conole 
---
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index b03c280..123dc0f 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -520,7 +520,7 @@ static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
if (!(cp->flags & IP_VS_CONN_F_TEMPLATE) &&
pkts % sync_period != sysctl_sync_threshold(ipvs))
return 0;
-   } else if (sync_refresh_period <= 0 &&
+   } else if (!sync_refresh_period &&
   pkts != sysctl_sync_threshold(ipvs))
return 0;
 
-- 
2.9.3

[PATCH nf-next] ip_vs_sync: change comparison on sync_refresh_period

2017-04-12 Thread Aaron Conole

The sync_refresh_period variable is unsigned, so it can never be < 0.

Signed-off-by: Aaron Conole 
---
 net/netfilter/ipvs/ip_vs_sync.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index b03c280..123dc0f 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -520,7 +520,7 @@ static int ip_vs_sync_conn_needed(struct netns_ipvs *ipvs,
if (!(cp->flags & IP_VS_CONN_F_TEMPLATE) &&
pkts % sync_period != sysctl_sync_threshold(ipvs))
return 0;
-   } else if (sync_refresh_period <= 0 &&
+   } else if (!sync_refresh_period &&
   pkts != sysctl_sync_threshold(ipvs))
return 0;
 
-- 
2.9.3

[PATCH resend 1/4] uapi glibc compat: add libc compat code when not build for kernel

2017-04-12 Thread Hauke Mehrtens

Instead of checking if this header file is used in the glibc, check if
iti is not used in kernel context, this way it will also work with
other libc implementations like musl.

Acked-by: Mikko Rapeli 
Signed-off-by: Hauke Mehrtens 
---
 include/uapi/linux/libc-compat.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
index 44b8a6bd5fe1..7c1fead03c50 100644
--- a/include/uapi/linux/libc-compat.h
+++ b/include/uapi/linux/libc-compat.h
@@ -48,8 +48,8 @@
 #ifndef _UAPI_LIBC_COMPAT_H
 #define _UAPI_LIBC_COMPAT_H
 
-/* We have included glibc headers... */
-#if defined(__GLIBC__)
+/* We have included libc headers... */
+#if !defined(__KERNEL__)
 
 /* Coordinate with glibc net/if.h header. */
 #if defined(_NET_IF_H) && defined(__USE_MISC)
@@ -168,7 +168,7 @@
 /* If we did not see any headers from any supported C libraries,
  * or we are being included in the kernel, then define everything
  * that we need. */
-#else /* !defined(__GLIBC__) */
+#else /* defined(__KERNEL__) */
 
 /* Definitions for if.h */
 #define __UAPI_DEF_IF_IFCONF 1
@@ -208,6 +208,6 @@
 /* Definitions for xattr.h */
 #define __UAPI_DEF_XATTR   1
 
-#endif /* __GLIBC__ */
+#endif /* __KERNEL__ */
 
 #endif /* _UAPI_LIBC_COMPAT_H */
-- 
2.11.0

[PATCH nf-next] nf_conntrack: remove double assignment

2017-04-12 Thread Aaron Conole

The protonet pointer will unconditionally be rewritten, so just do the
needed assignment first.

Signed-off-by: Aaron Conole 
---
 net/netfilter/nf_conntrack_proto.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/netfilter/nf_conntrack_proto.c 
b/net/netfilter/nf_conntrack_proto.c
index 2d6ee18..e6d2945 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -441,9 +441,8 @@ EXPORT_SYMBOL_GPL(nf_ct_l4proto_unregister_one);
 void nf_ct_l4proto_pernet_unregister_one(struct net *net,
 struct nf_conntrack_l4proto *l4proto)
 {
-   struct nf_proto_net *pn = NULL;
+   struct nf_proto_net *pn = nf_ct_l4proto_net(net, l4proto);
 
-   pn = nf_ct_l4proto_net(net, l4proto);
if (pn == NULL)
return;
 
-- 
2.9.3

< 1 2 3 4 5 6 7 8 9 10 >

301 - 400 of 1922 matches

Mail list logo