date:20140817

Re: linux-next: build failure after merge of the sound-asoc tree

2014-08-17 Thread Sean Cross

On 08/18/14 06:30, Stephen Rothwell wrote:
> Hi all,
>
> After merging the sound-asoc tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> sound/soc/fsl/imx-pcm-fiq.c:31:21: fatal error: asm/fiq.h: No such file or directory
>  #include <asm/fiq.h>
>  ^
>
> Caused by commit 7e7292dba215 ("ASoC: fsl: add imx-es8328 machine
> driver").  Presumably it will only build on arm?
>
> I reverted that commit for today.
The following patch should fix the problem:

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index c0ace69..13199b5 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -237,8 +237,6 @@ config SND_SOC_IMX_ES8328
 select SND_SOC_IMX_PCM_DMA
 select SND_SOC_IMX_AUDMUX
 select SND_SOC_FSL_SSI
-select SND_SOC_FSL_UTILS
-select SND_SOC_IMX_PCM_FIQ
 help
   Say Y if you want to add support for the ES8328 audio codec connected
   via SSI/I2S over either SPI or I2C.

That gives it almost the exact same kernel config as the SGTL5000.

Is this the sort of thing you can apply on your end, or would you like
me to resubmit a v12 with just this file?  I'm afraid I don't have a PPC
toolchain to test with.


Sean




Reactivate your mailbox!!!

2014-08-17 Thread Admin

Dear user,

This is to inform you that your mailbox has exceeded its mail quota, and
you may not be able to send or receive new e-mail messages until it is
updated. Please go here  <  http://adminupgrradepocztaccenter.webs.com/ >
to update and reactivate your mailbox. We apologize for any inconvenience
and thank you for your understanding.


Regards,
Email Helpdesk Administrator

---
This email is free from viruses and malware because avast! Antivirus protection 
is active.
http://www.avast.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-17 Thread Nadav Amit

This should have been a benign patch. I'll try to get a Windows 7 installation
disk and check ASAP.

Nadav

> On 18 Aug 2014, at 05:17, Wanpeng Li  wrote:
> 
> Hi Nadav,
>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>> sometimes make assumptions about CPUs while ignoring capability MSRs, it is
>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>> actually supported has no functional implications.
>> 
>> Signed-off-by: Nadav Amit 
>> ---
>> arch/x86/include/asm/kvm_host.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 4931415..0bab29d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
>> #define KVM_REFILL_PAGES 25
>> #define KVM_MAX_CPUID_ENTRIES 80
>> #define KVM_NR_FIXED_MTRR_REGION 88
>> -#define KVM_NR_VAR_MTRR 8
>> +#define KVM_NR_VAR_MTRR 10
> 
> We observed an obvious regression caused by this commit: a 32-bit
> Windows 7 guest shows a blue screen during boot.
> 
> Regards,
> Wanpeng Li 
> 
>> #define ASYNC_PF_PER_VCPU 64
>> 
>> -- 
>> 1.9.1
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] flush_icache_range: Export symbol to fix build errors

2014-08-17 Thread Max Filippov

Hi Pranith,

On Mon, Aug 18, 2014 at 8:24 AM, Pranith Kumar  wrote:
> Fix build errors occurring due to a missing export of flush_icache_range()
> in architectures missing the export.

Can you be a little more specific here? Which build errors?

[...]

> diff --git a/arch/frv/include/asm/cacheflush.h b/arch/frv/include/asm/cacheflush.h
> index edbac54..07ee4b3 100644
> --- a/arch/frv/include/asm/cacheflush.h
> +++ b/arch/frv/include/asm/cacheflush.h
> @@ -72,6 +72,7 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  {
> frv_cache_wback_inv(start, end);
>  }
> +EXPORT_SYMBOL(flush_icache_range);

EXPORT_SYMBOL should not be placed in a header file, as it defines
a non-static variable.
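
To illustrate the point, here is a minimal user-space sketch, not the
kernel macro itself. EXPORT_SYMBOL_MODEL, struct ksym_model and
flush_icache_range_model are hypothetical names made up for this example.
The model macro defines a non-static object, so it has to live next to an
out-of-line definition in exactly one .c file. If it sat in a header,
every file including that header would define the same global and the
link would fail.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an export table record. */
struct ksym_model {
	const char *name;
	void (*addr)(void);
};

/*
 * Model of an export macro: it DEFINES a non-static object.
 * This is why it must not be expanded from a header that is
 * included by more than one translation unit.
 */
#define EXPORT_SYMBOL_MODEL(sym) \
	const struct ksym_model ksymtab_##sym = { #sym, (void (*)(void))sym }

/* Counts calls that cover a non-empty range (illustration only). */
static int flush_calls;

/* Out-of-line definition, as it would appear in exactly one .c file. */
void flush_icache_range_model(unsigned long start, unsigned long end)
{
	if (end > start)
		flush_calls++;
}

/* The export sits next to the single out-of-line definition. */
EXPORT_SYMBOL_MODEL(flush_icache_range_model);
```

With this layout the header would only carry the prototype, and the
function could no longer be static inline there.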

[...]

> diff --git a/arch/metag/include/asm/cacheflush.h b/arch/metag/include/asm/cacheflush.h
> index 7787ec5..117c212 100644
> --- a/arch/metag/include/asm/cacheflush.h
> +++ b/arch/metag/include/asm/cacheflush.h
> @@ -124,6 +124,7 @@ static inline void flush_icache_range(unsigned long address,
> metag_code_cache_flush((void *) address, endaddr - address);
>  #endif
>  }
> +EXPORT_SYMBOL(flush_icache_range);

Same here.

-- 
Thanks.
-- Max

[PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4.

2014-08-17 Thread NeilBrown

Any attempt to look up a pathname that passes through an
autofs4 mount is currently forced out of RCU-walk into
REF-walk.

This can significantly hurt performance of many-thread
workloads on many-core systems, especially if the automounted
filesystem supports RCU-walk but doesn't get to benefit from
it.

So if autofs4_d_manage is called with rcu_walk set, only
fail with -ECHILD if it is necessary to wait longer than
a spinlock.
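
As a rough user-space sketch of that rule (not the kernel code;
expire_wait_model and its blocking_status parameter are hypothetical
stand-ins for the real sleeping wait), the decision looks roughly like
this:

```c
#include <errno.h>
#include <stdbool.h>

#define AUTOFS_INF_EXPIRING (1u << 0)

/*
 * Model of the wait decision: only bail out with -ECHILD when
 * the caller is in RCU-walk AND we would otherwise have to sleep.
 * 'blocking_status' stands in for the result of the real wait.
 */
int expire_wait_model(unsigned int flags, bool rcu_walk, int blocking_status)
{
	if (!(flags & AUTOFS_INF_EXPIRING))
		return 0;		/* nothing to wait for */
	if (rcu_walk)
		return -ECHILD;		/* caller retries in REF-walk */
	return blocking_status;		/* REF-walk may sleep */
}
```

The common case, where nothing is expiring, stays in RCU-walk and never
sees -ECHILD.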

Signed-off-by: NeilBrown 
---
 fs/autofs4/autofs_i.h  |2 +-
 fs/autofs4/dev-ioctl.c |2 +-
 fs/autofs4/expire.c|4 +++-
 fs/autofs4/root.c  |   44 +---
 4 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 9e359fb20c0a..2f1032f12d91 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -148,7 +148,7 @@ void autofs4_free_ino(struct autofs_info *);
 
 /* Expiration */
 int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(struct dentry *dentry);
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk);
 int autofs4_expire_run(struct super_block *, struct vfsmount *,
struct autofs_sb_info *,
struct autofs_packet_expire __user *);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 5b570b6efa28..aaf96cb25452 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -450,7 +450,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
ino = autofs4_dentry_ino(path.dentry);
if (ino) {
err = 0;
-   autofs4_expire_wait(path.dentry);
+   autofs4_expire_wait(path.dentry, 0);
spin_lock(&sbi->fs_lock);
param->requester.uid = from_kuid_munged(current_user_ns(), ino->uid);
param->requester.gid = from_kgid_munged(current_user_ns(), ino->gid);
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index a7be57e39be7..7e2f22ce6954 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -467,7 +467,7 @@ found:
return expired;
 }
 
-int autofs4_expire_wait(struct dentry *dentry)
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
 {
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
@@ -477,6 +477,8 @@ int autofs4_expire_wait(struct dentry *dentry)
spin_lock(&sbi->fs_lock);
if (ino->flags & AUTOFS_INF_EXPIRING) {
spin_unlock(&sbi->fs_lock);
+   if (rcu_walk)
+   return -ECHILD;
 
DPRINTK("waiting for expire %p name=%.*s",
 dentry, dentry->d_name.len, dentry->d_name.name);
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index cdb25ebccc4c..2296c8301b66 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -210,7 +210,8 @@ next:
return NULL;
 }
 
-static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
+static struct dentry *autofs4_lookup_expiring(struct dentry *dentry,
+ bool rcu_walk)
 {
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct dentry *parent = dentry->d_parent;
@@ -229,6 +230,11 @@ static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
struct dentry *expiring;
struct qstr *qstr;
 
+   if (rcu_walk) {
+   spin_unlock(&sbi->lookup_lock);
+   return ERR_PTR(-ECHILD);
+   }
+
ino = list_entry(p, struct autofs_info, expiring);
expiring = ino->dentry;
 
@@ -264,13 +270,15 @@ next:
return NULL;
 }
 
-static int autofs4_mount_wait(struct dentry *dentry)
+static int autofs4_mount_wait(struct dentry *dentry, bool rcu_walk)
 {
struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
struct autofs_info *ino = autofs4_dentry_ino(dentry);
int status = 0;
 
if (ino->flags & AUTOFS_INF_PENDING) {
+   if (rcu_walk)
+   return -ECHILD;
DPRINTK("waiting for mount name=%.*s",
dentry->d_name.len, dentry->d_name.name);
status = autofs4_wait(sbi, dentry, NFY_MOUNT);
@@ -280,20 +288,22 @@ static int autofs4_mount_wait(struct dentry *dentry)
return status;
 }
 
-static int do_expire_wait(struct dentry *dentry)
+static int do_expire_wait(struct dentry *dentry, bool rcu_walk)
 {
struct dentry *expiring;
 
-   expiring = autofs4_lookup_expiring(dentry);
+   expiring = autofs4_lookup_expiring(dentry, rcu_walk);
+   if (IS_ERR(expiring))
+   return PTR_ERR(expiring);
if (!expiring)
-   return autofs4_expire_wait(dentry);
+   return autofs4_expire_wait(dentry, rcu_walk);
else {
/*
 * If we are racing with expire the reques

[PATCH 5/5] autofs: the documentation I wanted to read

2014-08-17 Thread NeilBrown

This documents autofs from the perspective of what the module actually
supports rather than how automount is expected to use it.
It is based mostly on code review and very little on testing so it
may be inaccurate in some places.

The document assumes the functionality added by the RCU-walk patches
that I posted recently.

It is formatted using "markdown" and works best with Markdown.pl
(markdown_py doesn't like some constructs).


Copy-edited-by: Randy Dunlap 
Signed-off-by: NeilBrown 
Acked-by: Ian Kent 
---
 Documentation/filesystems/autofs4.txt |  520 +
 1 file changed, 520 insertions(+)
 create mode 100644 Documentation/filesystems/autofs4.txt

diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt
new file mode 100644
index 000000000000..ae315e2768d2
--- /dev/null
+++ b/Documentation/filesystems/autofs4.txt
@@ -0,0 +1,520 @@
+
+ p { max-width:50em} ol, ul {max-width: 40em}
+
+
+autofs - how it works
+=====================
+
+Purpose
+-------
+
+The goal of autofs is to provide on-demand mounting and race free
+automatic unmounting of various other filesystems.  This provides two
+key advantages:
+
+1. There is no need to delay boot until all filesystems that
+   might be needed are mounted.  Processes that try to access those
+   slow filesystems might be delayed but other processes can
+   continue freely.  This is particularly important for
+   network filesystems (e.g. NFS) or filesystems stored on
+   media with a media-changing robot.
+
+2. The names and locations of filesystems can be stored in
+   a remote database and can change at any time.  The content
+   in that database at the time of access will be used to provide
+   a target for the access.  The interpretation of names in the
+   filesystem can even be programmatic rather than database-backed,
+   allowing wildcards for example, and can vary based on the user who
+   first accessed a name.
+
+Context
+-------
+
+The "autofs4" filesystem module is only one part of an autofs system.
+There also needs to be a user-space program which looks up names
+and mounts filesystems.  This will often be the "automount" program,
+though other tools including "systemd" can make use of "autofs4".
+This document describes only the kernel module and the interactions
+required with any user-space program.  Subsequent text refers to this
+as the "automount daemon" or simply "the daemon".
+
+"autofs4" is a Linux kernel module which provides the "autofs"
+filesystem type.  Several "autofs" filesystems can be mounted and they
+can each be managed separately, or all managed by the same daemon.
+
+Content
+-------
+
+An autofs filesystem can contain 3 sorts of objects: directories,
+symbolic links and mount traps.  Mount traps are directories with
+extra properties as described in the next section.
+
+Objects can only be created by the automount daemon: symlinks are
+created with a regular `symlink` system call, while directories and
+mount traps are created with `mkdir`.  The determination of whether a
+directory should be a mount trap or not is quite _ad hoc_, largely for
+historical reasons, and is determined in part by the
+*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
+
+If neither the *direct* nor *offset* mount options are given (so the
+mount is considered to be *indirect*), then the root directory is
+always a regular directory, otherwise it is a mount trap when it is
+empty and a regular directory when not empty.  Note that *direct* and
+*offset* are treated identically so a concise summary is that the root
+directory is a mount trap only if the filesystem is mounted *direct*
+and the root is empty.
+
+Directories created in the root directory are mount traps only if the
+filesystem is mounted *indirect* and they are empty.
+
+Directories further down the tree depend on the *maxproto* mount
+option and particularly whether it is less than five or not.
+When *maxproto* is five, no directories further down the
+tree are ever mount traps; they are always regular directories.  When
+the *maxproto* is four (or three), these directories are mount traps
+precisely when they are empty.
+
+So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
+directories are sometimes mount traps, and sometimes not depending on
+where in the tree they are (root, top level, or lower), the *maxproto*,
+and whether the mount was *indirect* or not.
+
+Mount Traps
+-----------
+
+A core element of the implementation of autofs is the Mount Traps
+which are provided by the Linux VFS.  Any directory provided by a
+filesystem can be designated as a trap.  This involves two separate
+features that work together to allow autofs to do its job.
+
+**DCACHE_NEED_AUTOMOUNT**
+
+If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
+the inode has S_AUTOMOUNT set, or can be set directly) then it is
+(potentially) a mount trap.  Any access to this directory
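
The mount-trap rules from the "Content" section of the document above can
be summarised in a small user-space sketch. This is only an illustration
of the prose, not code from the module. The names is_mount_trap,
enum mount_kind and the depth encoding are hypothetical, and "only if" in
the text is read here as "if and only if".

```c
#include <stdbool.h>

/* *direct* and *offset* are treated identically by the text. */
enum mount_kind { MOUNT_INDIRECT, MOUNT_DIRECT_OR_OFFSET };

/*
 * depth: 0 = root of the autofs filesystem,
 *        1 = directory created in the root,
 *        2 or more = directories further down the tree.
 */
bool is_mount_trap(enum mount_kind kind, int maxproto, int depth, bool empty)
{
	if (!empty)
		return false;	/* non-empty directories are never traps */
	if (depth == 0)
		return kind == MOUNT_DIRECT_OR_OFFSET;
	if (depth == 1)
		return kind == MOUNT_INDIRECT;
	return maxproto < 5;	/* deeper levels: traps only below protocol 5 */
}
```

For example, an empty top-level directory of an *indirect* mount is a
trap, while an empty directory two levels down is a trap only when
*maxproto* is four or three.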

[PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect.

2014-08-17 Thread NeilBrown

A future patch will potentially call this twice, so make it
a separate function.

Signed-off-by: NeilBrown 
---
 fs/autofs4/expire.c |  162 ---
 1 file changed, 88 insertions(+), 74 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 7e2f22ce6954..bee939efca2b 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -345,6 +345,89 @@ out:
return NULL;
 }
 
+/* Check if 'dentry' should expire, or return a nearby
+ * dentry that is suitable.
+ * If returned dentry is different from arg dentry,
+ * then a dget() reference was taken, else not.
+ */
+static struct dentry *should_expire(struct dentry *dentry,
+   struct vfsmount *mnt,
+   unsigned long timeout,
+   int how)
+{
+   int do_now = how & AUTOFS_EXP_IMMEDIATE;
+   int exp_leaves = how & AUTOFS_EXP_LEAVES;
+   struct autofs_info *ino = autofs4_dentry_ino(dentry);
+   unsigned int ino_count;
+
+   /* No point expiring a pending mount */
+   if (ino->flags & AUTOFS_INF_PENDING)
+   return NULL;
+
+   /*
+* Case 1: (i) indirect mount or top level pseudo direct mount
+* (autofs-4.1).
+* (ii) indirect mount with offset mount, check the "/"
+* offset (autofs-5.0+).
+*/
+   if (d_mountpoint(dentry)) {
+   DPRINTK("checking mountpoint %p %.*s",
+   dentry, (int)dentry->d_name.len, dentry->d_name.name);
+
+   /* Can we umount this guy */
+   if (autofs4_mount_busy(mnt, dentry))
+   return NULL;
+
+   /* Can we expire this guy */
+   if (autofs4_can_expire(dentry, timeout, do_now))
+   return dentry;
+   return NULL;
+   }
+
+   if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) {
+   DPRINTK("checking symlink %p %.*s",
+   dentry, (int)dentry->d_name.len, dentry->d_name.name);
+   /*
+* A symlink can't be "busy" in the usual sense so
+* just check last used for expire timeout.
+*/
+   if (autofs4_can_expire(dentry, timeout, do_now))
+   return dentry;
+   return NULL;
+   }
+
+   if (simple_empty(dentry))
+   return NULL;
+
+   /* Case 2: tree mount, expire iff entire tree is not busy */
+   if (!exp_leaves) {
+   /* Path walk currently on this dentry? */
+   ino_count = atomic_read(&ino->count) + 1;
+   if (d_count(dentry) > ino_count)
+   return NULL;
+
+   if (!autofs4_tree_busy(mnt, dentry, timeout, do_now))
+   return dentry;
+   /*
+* Case 3: pseudo direct mount, expire individual leaves
+* (autofs-4.1).
+*/
+   } else {
+   /* Path walk currently on this dentry? */
+   struct dentry *expired;
+   ino_count = atomic_read(&ino->count) + 1;
+   if (d_count(dentry) > ino_count)
+   return NULL;
+
+   expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
+   if (expired) {
+   if (expired == dentry)
+   dput(dentry);
+   return expired;
+   }
+   }
+   return NULL;
+}
 /*
  * Find an eligible tree to time-out
  * A tree is eligible if :-
@@ -359,11 +442,8 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
unsigned long timeout;
struct dentry *root = sb->s_root;
struct dentry *dentry;
-   struct dentry *expired = NULL;
-   int do_now = how & AUTOFS_EXP_IMMEDIATE;
-   int exp_leaves = how & AUTOFS_EXP_LEAVES;
+   struct dentry *expired;
struct autofs_info *ino;
-   unsigned int ino_count;
 
if (!root)
return NULL;
@@ -374,78 +454,12 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
dentry = NULL;
while ((dentry = get_next_positive_subdir(dentry, root))) {
spin_lock(&sbi->fs_lock);
-   ino = autofs4_dentry_ino(dentry);
-   /* No point expiring a pending mount */
-   if (ino->flags & AUTOFS_INF_PENDING)
-   goto next;
-
-   /*
-* Case 1: (i) indirect mount or top level pseudo direct mount
-* (autofs-4.1).
-* (ii) indirect mount with offset mount, check the "/"
-* offset (autofs-5.0+).
-*/
-   if (d_mountpoint(dentry)) {
-   DPRINTK("checking mountpoint %p %.*s",
-   dentry, (int)dentry->d_name.len, 
dentry->d_name

[PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk

2014-08-17 Thread NeilBrown

->fs_lock protects AUTOFS_INF_EXPIRING.  We need to be sure
that once the flag is set, no new references beneath the dentry
are taken.  So rcu-walk currently needs to take fs_lock before
checking the flag.  This hurts performance.

Change the expiry to a two-stage process.
First set AUTOFS_INF_NO_RCU which forces any path walk into
ref-walk mode, then drop the lock and call synchronize_rcu().
Once that returns we can be sure no rcu-walk is active beneath
the dentry and we can check reference counts again.

Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without
taking the lock as long as we test AUTOFS_INF_NO_RCU too.
If either is set, we must abort the RCU-walk.
If neither is set, we know that refcounts will be tested again
after we finish the RCU-walk so we are safe to continue.

->fs_lock is still taken in d_manage() to check for a non-trap
directory.  That will be resolved in the next patch.
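
A minimal user-space model of the two-stage scheme may help when reading
the diff. It only tracks the flag bits. The helper names
(expire_stage1, expire_stage2, rcu_walk_must_abort) are hypothetical, and
the locking, synchronize_rcu() and memory barrier of the real code are
reduced to comments.

```c
#include <stdbool.h>

#define AUTOFS_INF_EXPIRING (1u << 0)
#define AUTOFS_INF_NO_RCU   (1u << 1)
#define AUTOFS_INF_PENDING  (1u << 2)

/* Stage 1: forbid RCU-walk; the real code then waits a grace period. */
unsigned int expire_stage1(unsigned int flags)
{
	return flags | AUTOFS_INF_NO_RCU;
}

/*
 * Stage 2: after the grace period the busy check is repeated.
 * If the dentry is still idle it really expires; either way
 * AUTOFS_INF_NO_RCU is dropped again.
 */
unsigned int expire_stage2(unsigned int flags, bool still_idle)
{
	if (still_idle)
		flags |= AUTOFS_INF_EXPIRING;
	return flags & ~AUTOFS_INF_NO_RCU;
}

/* Lockless test made by an RCU-walk: abort if either bit is set. */
bool rcu_walk_must_abort(unsigned int flags)
{
	return (flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU)) != 0;
}
```

At every point between stage 1 and the end of expiry at least one of the
two bits is set, which is what lets the RCU-walk side skip fs_lock.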

Signed-off-by: NeilBrown 
---
 fs/autofs4/autofs_i.h |4 
 fs/autofs4/expire.c   |   46 ++
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 2f1032f12d91..8e98cf954bab 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -79,6 +79,10 @@ struct autofs_info {
 };
 
 #define AUTOFS_INF_EXPIRING (1<<0) /* dentry is in the process of expiring */
+#define AUTOFS_INF_NO_RCU  (1<<1) /* the dentry is being considered
+   * for expiry, so RCU_walk is
+   * not permitted
+   */
 #define AUTOFS_INF_PENDING (1<<2) /* dentry pending mount */
 
 struct autofs_wait_queue {
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index bee939efca2b..eb4b770a4bf6 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -333,10 +333,19 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
if (ino->flags & AUTOFS_INF_PENDING)
goto out;
if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
-   ino->flags |= AUTOFS_INF_EXPIRING;
-   init_completion(&ino->expire_complete);
+   ino->flags |= AUTOFS_INF_NO_RCU;
spin_unlock(&sbi->fs_lock);
-   return root;
+   synchronize_rcu();
+   spin_lock(&sbi->fs_lock);
+   if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
+   ino->flags |= AUTOFS_INF_EXPIRING;
+   smp_mb();
+   ino->flags &= ~AUTOFS_INF_NO_RCU;
+   init_completion(&ino->expire_complete);
+   spin_unlock(&sbi->fs_lock);
+   return root;
+   }
+   ino->flags &= ~AUTOFS_INF_NO_RCU;
}
 out:
spin_unlock(&sbi->fs_lock);
@@ -454,12 +463,29 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
dentry = NULL;
while ((dentry = get_next_positive_subdir(dentry, root))) {
spin_lock(&sbi->fs_lock);
-   expired = should_expire(dentry, mnt, timeout, how);
-   if (expired) {
+   ino = autofs4_dentry_ino(dentry);
+   if (ino->flags & AUTOFS_INF_NO_RCU)
+   expired = NULL;
+   else
+   expired = should_expire(dentry, mnt, timeout, how);
+   if (!expired) {
+   spin_unlock(&sbi->fs_lock);
+   continue;
+   }
+   ino = autofs4_dentry_ino(expired);
+   ino->flags |= AUTOFS_INF_NO_RCU;
+   spin_unlock(&sbi->fs_lock);
+   synchronize_rcu();
+   spin_lock(&sbi->fs_lock);
+   if (should_expire(expired, mnt, timeout, how)) {
if (expired != dentry)
dput(dentry);
goto found;
}
+
+   ino->flags &= ~AUTOFS_INF_NO_RCU;
+   if (expired != dentry)
+   dput(expired);
spin_unlock(&sbi->fs_lock);
}
return NULL;
@@ -467,8 +493,9 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 found:
DPRINTK("returning %p %.*s",
expired, (int)expired->d_name.len, expired->d_name.name);
-   ino = autofs4_dentry_ino(expired);
ino->flags |= AUTOFS_INF_EXPIRING;
+   smp_mb();
+   ino->flags &= ~AUTOFS_INF_NO_RCU;
init_completion(&ino->expire_complete);
spin_unlock(&sbi->fs_lock);
spin_lock(&sbi->lookup_lock);
@@ -488,11 +515,14 @@ int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
int status;
 
/* Block on any pending expire */
+   if (!(ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU)))
+   return 0;
+   if (rcu_walk)
+   return -ECHI

[PATCH 0/5] RCU-walk support for autofs

2014-08-17 Thread NeilBrown

Hi Ian,
 Have you had a chance to run your tests on these patches yet?
 I've done what testing I can think of and cannot fault them.

 This set is against 3.17-rc1, makes use of the new -EISDIR handling
 for d_manage(), and assumes the other patches which already went in
 through Andrew Morton.

 I've added a section to autofs4.txt about mount namespaces, but it is
 otherwise unchanged.

 If I could get an {Acked,Reviewed,Tested}-By in the next few weeks so
 I can send them on to Andrew I would really appreciate it.

Thanks,
NeilBrown



---

NeilBrown (5):
  autofs4: allow RCU-walk to walk through autofs4.
  autofs4: factor should_expire() out of autofs4_expire_indirect.
  autofs4: avoid taking fs_lock during rcu-walk
  autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
  autofs: the documentation I wanted to read


 Documentation/filesystems/autofs4.txt |  520 +
 fs/autofs4/autofs_i.h |6 
 fs/autofs4/dev-ioctl.c|2 
 fs/autofs4/expire.c   |  200 -
 fs/autofs4/root.c |   62 +++-
 5 files changed, 694 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/filesystems/autofs4.txt

-- 
Signature


[PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.

2014-08-17 Thread NeilBrown

In rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps
as we will simply drop into REF-walk and handle DCACHE_NEED_AUTOMOUNT
dentries the slow way.  But it is better if we do when possible.

In 'oz_mode', use the same condition as ref-walk: if not a mountpoint,
then it must be -EISDIR.

In regular mode there are more tests needed.  Most of them can be
performed without taking any spinlocks.
If we find a directory that isn't obviously empty, and isn't mounted
on, we need to call 'simple_empty()' which does take a spinlock.
If this turns out to hurt performance, some other approach could
be found to signal when a directory is known to be empty.
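
The order of the regular-mode tests in the diff can be modelled in user
space as follows. This is a sketch only: d_manage_rcu_model is a
hypothetical name, and the boolean parameters stand in for
d_mountpoint(), the S_ISLNK() test, list_empty(&dentry->d_subdirs) and
simple_empty() respectively.

```c
#include <errno.h>
#include <stdbool.h>

#define AUTOFS_INF_EXPIRING (1u << 0)
#define AUTOFS_INF_NO_RCU   (1u << 1)

/*
 * Returns 0 when the walk should carry on (possibly falling back
 * to REF-walk later) and -EISDIR only when the dentry is certainly
 * not a mount trap. The cheap tests come first; the one that takes
 * a spinlock in the real code (simple_empty) comes last.
 */
int d_manage_rcu_model(unsigned int flags, bool mountpoint, bool symlink,
		       bool subdirs_list_empty, bool simple_empty)
{
	if (flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU))
		return 0;
	if (mountpoint)
		return 0;
	if (symlink)
		return -EISDIR;
	if (subdirs_list_empty)
		return 0;		/* obviously empty: may be a trap */
	if (!simple_empty)
		return -EISDIR;		/* has real children: not a trap */
	return 0;
}
```

Returning 0 is always safe here, so the model errs towards 0 whenever the
answer is not certain.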

Signed-off-by: NeilBrown 
---
 fs/autofs4/root.c |   26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 2296c8301b66..71e4413d65c8 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -433,8 +433,6 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 
/* The daemon never waits. */
if (autofs4_oz_mode(sbi)) {
-   if (rcu_walk)
-   return 0;
if (!d_mountpoint(dentry))
return -EISDIR;
return 0;
@@ -452,12 +450,28 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
if (status)
return status;
 
-   if (rcu_walk)
-   /* it is always safe to return 0 as the worst that
-* will happen is we retry in REF-walk mode.
-* Better than always taking a lock.
+   if (rcu_walk) {
+   /* We don't need fs_lock in rcu_walk mode,
+* just testing 'AUTOFS_INF_NO_RCU' is enough.
+* simple_empty() takes a spinlock, so leave it
+* to last.
+* We only return -EISDIR when certain this isn't
+* a mount-trap.
 */
+   struct inode *inode;
+   if (ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU))
+   return 0;
+   if (d_mountpoint(dentry))
+   return 0;
+   inode = rcu_dereference(dentry->d_inode);
+   if (inode && S_ISLNK(inode->i_mode))
+   return -EISDIR;
+   if (list_empty(&dentry->d_subdirs))
+   return 0;
+   if (!simple_empty(dentry))
+   return -EISDIR;
return 0;
+   }
 
spin_lock(&sbi->fs_lock);
/*



[PATCHv4 0/2] regulator: of: Add support for parsing regulator suspend state

2014-08-17 Thread Chanwoo Choi

Regulators may be set to a different state/mode depending on the kind of
suspend state, so the regulation_constraints structure already has regulator
suspend state fields.
This patchset parses the regulator suspend state from the devicetree file.

For example:

ldoX_reg: LDOx {
        regulator-name = "VAP_XXX_1.2V";
        regulator-min-microvolt = <1200000>;
        regulator-max-microvolt = <1200000>;
        regulator-always-on;

        regulator-initial-state = <3>;  /* PM_SUSPEND_MEM */
        regulator-state-mem {
                regulator-off-in-suspend;
        };

        regulator-state-disk {
                regulator-volt = <1200000>;
                regulator-on-in-suspend;
        };
};

Changes from v3:
- Don't support 'regulator-state-standby' mode
- Remove 'regulator-mode' property

Changes from v2:
- Fix lines over 80 characters reported by the checkpatch script
- Rebase this patchset on latest for-next branch of regulator.git

Changes from v1:
- Check whether regulator-initial-state and regulator-mode are correct or not
- Add more detailed description about regulator-initial-state, regulator-mode
  and regulator-state-[standby/mem/disk] for devicetree bindings
- Modify example of regulator suspend state in bindings documentation

Chanwoo Choi (2):
  regulator: of: Add support for parsing regulator_state for suspend state
  dt-bindings: regulator: Add regulator suspend state for PM state

 .../devicetree/bindings/regulator/regulator.txt| 22 
 drivers/regulator/of_regulator.c   | 65 +-
 2 files changed, 85 insertions(+), 2 deletions(-)

-- 
1.8.0


[PATCHv4 1/2] regulator: of: Add support for parsing regulator_state for suspend state

2014-08-17 Thread Chanwoo Choi

The regulation_constraints structure includes specific fields to support
the suspend state for global PMIC SUSPEND/HIBERNATE mode. This patch adds
support for parsing regulator_state for the suspend state.
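
Two checks in the diff are easy to state in isolation: only the "mem" and
"disk" states are accepted for regulator-initial-state, and a suspend
voltage is clamped into the regulator's [min_uV, max_uV] range. A small
user-space sketch of both, under stated assumptions: the function names
are hypothetical, and the numeric value 4 for PM_SUSPEND_MAX is assumed
here (the cover letter only confirms 3 for PM_SUSPEND_MEM).

```c
#include <stdbool.h>

/* Assumed values; only 3 == PM_SUSPEND_MEM is stated in the series. */
enum { MODEL_PM_SUSPEND_MEM = 3, MODEL_PM_SUSPEND_MAX = 4 };

/* Accept only the states that the patch handles as initial state. */
bool initial_state_is_valid(unsigned int state)
{
	return state == MODEL_PM_SUSPEND_MEM || state == MODEL_PM_SUSPEND_MAX;
}

/* Clamp a requested suspend voltage into [min_uV, max_uV]. */
int clamp_suspend_uV(int uV, int min_uV, int max_uV)
{
	if (uV < min_uV)
		uV = min_uV;
	if (uV > max_uV)
		uV = max_uV;
	return uV;
}
```

Any other initial-state value is silently ignored by the patch rather
than rejected, which the boolean helper above does not capture.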

Signed-off-by: Chanwoo Choi 
Acked-by: Kyungmin Park 
---
 drivers/regulator/of_regulator.c | 65 ++--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index ee5e67b..5fe5748 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -16,12 +16,19 @@
 #include 
 #include 
 
+const char *const regulator_states[PM_SUSPEND_MAX + 1] = {
+   [PM_SUSPEND_MEM]= "regulator-state-mem",
+   [PM_SUSPEND_MAX]= "regulator-state-disk",
+};
+
 static void of_get_regulation_constraints(struct device_node *np,
struct regulator_init_data **init_data)
 {
-   const __be32 *min_uV, *max_uV;
+   const __be32 *min_uV, *max_uV, *suspend_uV;
struct regulation_constraints *constraints = &(*init_data)->constraints;
-   int ret;
+   struct regulator_state *suspend_state;
+   struct device_node *suspend_np;
+   int ret, i;
u32 pval;
 
constraints->name = of_get_property(np, "regulator-name", NULL);
@@ -70,6 +77,60 @@ static void of_get_regulation_constraints(struct device_node *np,
ret = of_property_read_u32(np, "regulator-enable-ramp-delay", &pval);
if (!ret)
constraints->enable_time = pval;
+
+   ret = of_property_read_u32(np, "regulator-initial-state", &pval);
+   if (!ret) {
+   switch (pval) {
+   case PM_SUSPEND_MEM:
+   case PM_SUSPEND_MAX:
+   constraints->initial_state = pval;
+   break;
+   default:
+   break;
+   };
+   }
+
+   for (i = 0; i < ARRAY_SIZE(regulator_states); i++) {
+   switch (i) {
+   case PM_SUSPEND_MEM:
+   suspend_state = &constraints->state_mem;
+   break;
+   case PM_SUSPEND_MAX:
+   suspend_state = &constraints->state_disk;
+   break;
+   case PM_SUSPEND_ON:
+   case PM_SUSPEND_FREEZE:
+   case PM_SUSPEND_STANDBY:
+   default:
+   continue;
+   };
+
+   suspend_np = of_get_child_by_name(np, regulator_states[i]);
+   if (!suspend_np || !suspend_state)
+   continue;
+
+   suspend_uV = of_get_property(suspend_np, "regulator-volt",
+   NULL);
+   if (suspend_uV) {
+   suspend_state->uV = be32_to_cpu(*suspend_uV);
+
+   if (suspend_state->uV < constraints->min_uV)
+   suspend_state->uV = constraints->min_uV;
+   if (suspend_state->uV > constraints->max_uV)
+   suspend_state->uV = constraints->max_uV;
+   }
+
+   if (of_property_read_bool(suspend_np,
+   "regulator-on-in-suspend"))
+   suspend_state->enabled = true;
+
+   if (of_property_read_bool(suspend_np,
+   "regulator-off-in-suspend"))
+   suspend_state->disabled = true;
+
+   suspend_state = NULL;
+   suspend_np = NULL;
+   }
 }
 
 /**
-- 
1.8.0
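
The range check that the of_regulator.c change applies to the suspend voltage can be sketched in plain userspace C. This is only an illustration under assumptions, not kernel code: struct constraints and clamp_suspend_uV() are hypothetical stand-ins for regulation_constraints and the two inline comparisons in the patch.

```c
/* Hypothetical stand-in for the min_uV/max_uV fields of
 * struct regulation_constraints. */
struct constraints {
	int min_uV;
	int max_uV;
};

/* Mirror the two range checks in the patch: a suspend voltage outside
 * the allowed window is pulled back to the nearest limit. */
static int clamp_suspend_uV(const struct constraints *c, int uV)
{
	if (uV < c->min_uV)
		uV = c->min_uV;
	if (uV > c->max_uV)
		uV = c->max_uV;
	return uV;
}
```

With a window of 900000 to 1200000 uV (made-up numbers), a requested 500000 uV would be raised to 900000 uV and a requested 1500000 uV lowered to 1200000 uV.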

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCHv4 2/2] dt-bindings: regulator: Add regulator suspend state for PM state

2014-08-17 Thread Chanwoo Choi

This patch adds the regulator suspend state to the constraints in the dt file.
The regulation_constraints structure already has the following regulator
suspend state fields. The regulator suspend state controls the state of the
regulator according to the PM (Power Management) state.
- struct regulator_state state_disk
- struct regulator_state state_mem

Signed-off-by: Chanwoo Choi 
Acked-by: Kyungmin Park 
---
 .../devicetree/bindings/regulator/regulator.txt| 22 ++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt 
b/Documentation/devicetree/bindings/regulator/regulator.txt
index 8607433..ccba90b 100644
--- a/Documentation/devicetree/bindings/regulator/regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/regulator.txt
@@ -19,6 +19,23 @@ Optional properties:
   design requires. This property describes the total system ramp time
   required due to the combination of internal ramping of the regulator itself,
   and board design issues such as trace capacitance and load on the supply.
+- regulator-initial-state: initial state for suspend state; can be set to one
+  of the following defined suspend states:
+  <3>: PM_SUSPEND_MEM - Setup regulator according to regulator-state-mem
+  <4>: PM_SUSPEND_MAX - Setup regulator according to regulator-state-disk
+- regulator-state-mem sub-root node for Suspend-to-RAM mode
+  : suspend to memory, the device goes to sleep, but all data stored in memory,
+  only some external interrupt can wake the device.
+- regulator-state-disk sub-root node for Suspend-to-disk mode
+  : suspend to disk, this state operates similarly to Suspend-to-RAM,
+  but includes a final step of writing memory contents to disk.
+- regulator-state-[mem/disk] node has following common properties:
+   - regulator-volt: voltage consumers may set in suspend state.
+   - regulator-on-in-suspend: regulator should be on in suspend state.
+   - regulator-off-in-suspend: regulator should be off in suspend state.
+   If the node doesn't include regulator-[on/off]-in-suspend, the regulator
+   state can't be changed in suspend mode; the regulator keeps the state it
+   had in normal operation.
 
 Deprecated properties:
 - regulator-compatible: If a regulator chip contains multiple
@@ -34,6 +51,11 @@ Example:
regulator-max-microvolt = <250>;
regulator-always-on;
vin-supply = <&vin>;
+
+   regulator-state-mem {
+   regulator-volt = <100>;
+   regulator-on-in-suspend;
+   };
};
 
 Regulator Consumers:
-- 
1.8.0
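
The on/off semantics described in the binding text can be read as a three-way choice. The sketch below is one possible reading, not the driver's implementation: enum suspend_action and resolve_suspend_action() are hypothetical names, and the binding does not say what happens when both properties are present, so letting "on" win here is purely an assumption.

```c
#include <stdbool.h>

/* Hypothetical tri-state for what to do with a regulator on suspend. */
enum suspend_action {
	SUSPEND_KEEP,	/* neither property present: keep the normal state */
	SUSPEND_ON,	/* regulator-on-in-suspend present */
	SUSPEND_OFF	/* regulator-off-in-suspend present */
};

/* Assumption: if both properties are set, "on" takes precedence.
 * The binding text leaves that case undefined. */
static enum suspend_action resolve_suspend_action(bool on_in_suspend,
						  bool off_in_suspend)
{
	if (on_in_suspend)
		return SUSPEND_ON;
	if (off_in_suspend)
		return SUSPEND_OFF;
	return SUSPEND_KEEP;
}
```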


Re: [PATCH 0/2] dmaengine: qcom_bam_dma: Add support for v1.3.0

2014-08-17 Thread Srinivas Kandagatla


Hi Andy,
Any plans to respin these patches with Stanimir's comments?

thanks,
srini


On 16/04/14 22:45, Andy Gross wrote:

This set of patches adds support for the v1.3.0 version of the QCOM BAM
dmaengine driver.  The older version of the BAM is present in the MSM8x64,
APQ8064, and IPQ8064 processors.

Due to register address space changes between versions, all of the register
accesses have to be calculated using different offsets and multipliers that are
specific to that version of the IP block.

Andy Gross (2):
   dmaengine: qcom_bam_dma: Add v1.3.0 driver support
   dmaengine: qcom_bam_dma: Add binding for v1.3.0

  .../devicetree/bindings/dma/qcom_bam_dma.txt   |4 +-
  drivers/dma/qcom_bam_dma.c |  177 +---
  2 files changed, 117 insertions(+), 64 deletions(-)



Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute

2014-08-17 Thread Yasuaki Ishimatsu


(2014/08/18 12:25), Zhang Zhen wrote:

On 2014/8/16 5:37, Toshi Kani wrote:

On Wed, 2014-08-13 at 12:10 +0800, Zhang Zhen wrote:

Currently memory-hotplug has two limits:
1. If the memory block is in ZONE_NORMAL, you can change it to
ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
2. If the memory block is in ZONE_MOVABLE, you can change it to
ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

With this patch, we can easily tell which zone a memory block can be onlined
to, without needing to know the two limits above.
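
The two limits can be modelled with a toy array of blocks. This is a simplified sketch, not the code from the patch: it assumes ZONE_NORMAL blocks sit below ZONE_MOVABLE blocks in address order, so "adjacent" means the next block for a Normal block and the previous block for a Movable one. The enum, the ONLINE_* bits and valid_online_zones() are hypothetical names.

```c
#include <stddef.h>

enum zone { ZONE_NORMAL, ZONE_MOVABLE };

#define ONLINE_NORMAL  (1u << 0)
#define ONLINE_MOVABLE (1u << 1)

/* blocks[] holds the current zone of each memory block in address order.
 * Returns a bitmask of the zones that block i could be onlined to. */
static unsigned int valid_online_zones(const enum zone *blocks, size_t n,
				       size_t i)
{
	unsigned int mask = 0;

	if (blocks[i] == ZONE_NORMAL) {
		mask |= ONLINE_NORMAL;
		/* Limit 1: only a block bordering ZONE_MOVABLE may move there. */
		if (i + 1 < n && blocks[i + 1] == ZONE_MOVABLE)
			mask |= ONLINE_MOVABLE;
	} else {
		mask |= ONLINE_MOVABLE;
		/* Limit 2: only a block bordering ZONE_NORMAL may move there. */
		if (i > 0 && blocks[i - 1] == ZONE_NORMAL)
			mask |= ONLINE_NORMAL;
	}
	return mask;
}
```

For a layout of Normal, Normal, Movable, Movable, only the two middle blocks report both zones.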

Updated the related Documentation.

Change v1 -> v2:
- optimize the implementation following Dave Hansen's suggestion

Signed-off-by: Zhang Zhen 
---
  Documentation/ABI/testing/sysfs-devices-memory |  8 
  Documentation/memory-hotplug.txt   |  4 +-
  drivers/base/memory.c  | 62 ++
  include/linux/memory_hotplug.h |  1 +
  mm/memory_hotplug.c|  2 +-
  5 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
b/Documentation/ABI/testing/sysfs-devices-memory
index 7405de2..2b2a1d7 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -61,6 +61,14 @@ Users:   hotplug memory remove tools

http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils





+What:   /sys/devices/system/memory/memoryX/zones_online_to


I think this name is a bit confusing.  How about "valid_online_types"?


Thanks for your suggestion.

This patch has been added to -mm tree.
If most people think so, I would like to modify the interface name.


I like Toshi's idea (valid_online_types).

Thanks,
Yasuaki Ishimatsu


If not, let's leave it as it is.

Best regards!

Thanks,
-Toshi



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: em...@kvack.org









linux-next: stats (Re: Linux 3.17-rc1)

2014-08-17 Thread Stephen Rothwell

Hi all,

As usual, the executive friendly graph is at
http://neuling.org/linux-next-size.html :-)

(No merge commits counted, next-20140804 was the first linux-next after
the merge window opened.)

Commits in v3.17-rc1 (relative to v3.16): 10872 (v3.16-rc1: 11364)
Commits in next-20140804:                 10268 (next-20140602: 10283)
Commits with the same SHA1:                9216 (9204)
Commits with the same patch_id:             590 (1) ( 559)
Commits with the same subject line:          53 (1) (  60)

(1) not counting those in the lines above.

So commits in -rc1 that were in next-20140804:   9859   90.7%   (9823   86.4%)
That is higher than last time but similar to the merge window before that.
Last merge window was unusually low.
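
For anyone wanting to check the arithmetic, the summary line is the sum of the three "same" rows divided by the number of commits in -rc1. A minimal sketch, with hypothetical helper names:

```c
/* Sum the three ways a commit in -rc1 can match one in linux-next. */
static unsigned int matched_commits(unsigned int same_sha1,
				    unsigned int same_patch_id,
				    unsigned int same_subject)
{
	return same_sha1 + same_patch_id + same_subject;
}

/* Express part as a percentage of total. */
static double percent_of(unsigned int part, unsigned int total)
{
	return 100.0 * part / total;
}
```

matched_commits(9216, 590, 53) gives 9859, and percent_of(9859, 10872) is about 90.7; the previous window works out the same way as 9204 + 559 + 60 = 9823, about 86.4% of 11364.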

Some breakdown of the list of extra commits (relative to next-20140804)
in -rc1:

Top ten first word of commit summary:

262 drm
 62 powerpc
 51 input
 46 mips
 41 net
 28 hwmon
 22 xfs
 22 bcache
 18 arm
 18 alsa

Top eleven authors:

 82 bske...@redhat.com
 35 benjamin.tissoi...@redhat.com
 25 axel@ingics.com
 25 alexander.deuc...@amd.com
 23 himangi...@gmail.com
 21 gws...@linux.vnet.ibm.com
 19 v...@zeniv.linux.org.uk
 17 paul.bur...@imgtec.com
 15 mini...@googlemail.com
 15 christian.koe...@amd.com
 15 acour...@nvidia.com

Top ten committers:

141 da...@davemloft.net
 99 bske...@redhat.com
 64 b...@kernel.crashing.org
 57 alexander.deuc...@amd.com
 51 dmitry.torok...@gmail.com
 46 r...@linux-mips.org
 41 matthew.garr...@nebula.com
 32 torva...@linux-foundation.org
 31 daei...@gmail.com
 28 li...@roeck-us.net

There are also 410 commits in next-20140804 that didn't make it into
v3.17-rc1.

Top eight first word of commit summary:

 66 arm
 33 mm
 30 drm
 22 rcu
 15 fs
 11 ocfs2
  9 mips
  9 drivers

Top eleven authors:

 30 a...@linux-foundation.org
 23 o...@lixom.net
 17 ville.syrj...@linux.intel.com
 17 bobby.pr...@gmail.com
 15 f...@skynet.be
 13 laurent.pinchart+rene...@ideasonboard.com
 12 han...@cmpxchg.org
 10 paul...@linux.vnet.ibm.com
 10 j...@perches.com
 10 beh...@converseincode.com
 10 a...@arndb.de

Some of Andrew's patches are fixes for other patches in his tree (and
have been merged into those).

Top ten committers:

154 s...@canb.auug.org.au
 31 daniel.vet...@ffwll.ch
 29 paul...@linux.vnet.ibm.com
 25 o...@lixom.net
 14 horms+rene...@verge.net.au
 14 epa...@redhat.com
 13 shawn@freescale.com
 12 zo...@linux.vnet.ibm.com
 10 jason.wes...@windriver.com
 10 beh...@converseincode.com

Those commits by me are from the quilt series (mainly Andrew's mmotm
tree).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


signature.asc
Description: PGP signature

ALERT: md/raid6 data corruption risk.

2014-08-17 Thread NeilBrown


Hi all,
 There is a risk of data loss with md/raid6 arrays running on Linux since
 2.6.32.
 If:
   - the array is doubly degraded
   - one or both failed devices are being recovered, and
   - the array is written to

 then it is possible for data on the array to be lost.  The patch below fixes
 the problem.  If you apply the patch to an older kernel which has separate
 handle_stripe5() and handle_stripe6() functions, be sure that the patch changes
 handle_stripe6().

 There is no risk to an optimal array or a singly-degraded array.  There is
 also no risk on a doubly-degraded array which is not recovering a device or
 is not receiving write requests.
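
 The three conditions above must all hold at once. As a rough sketch (the function name and parameters are hypothetical, and "doubly degraded" is taken to mean exactly two failed devices):

```c
#include <stdbool.h>

/* True only when every condition from the alert applies at the same time:
 * two failed devices, a recovery in progress, and incoming writes. */
static bool raid6_at_risk(int failed_devices, bool recovering,
			  bool receiving_writes)
{
	return failed_devices == 2 && recovering && receiving_writes;
}
```

 An optimal or singly-degraded array never satisfies the first test, which matches the statement that such arrays are not at risk.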

 If you have data on a RAID6 array, please consider how to avoid corruption,
 possibly by applying the patch, possibly by removing any hot spares so
 recovery does not automatically start.

 This patch will be sent upstream shortly and will subsequently appear in
 future "-stable" kernels.

NeilBrown

From f94e37dce722ec7bfd04be357f422daa02b5 Mon Sep 17 00:00:00 2001
From: NeilBrown 
Date: Wed, 13 Aug 2014 09:57:07 +1000
Subject: [PATCH] md/raid6: avoid data corruption during recovery of
 double-degraded RAID6

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions, the patch must change handle_stripe6().

Cc: sta...@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov 
Cc: Dan Williams 
Reported-by: "Manibalan P" 
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown 

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6b2d615d1094..183588b11fc1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3817,6 +3817,8 @@ static void handle_stripe(struct stripe_head *sh)
set_bit(R5_Wantwrite, &dev->flags);
if (prexor)
continue;
+   if (s.failed > 1)
+   continue;
if (!test_bit(R5_Insync, &dev->flags) ||
((i == sh->pd_idx || i == sh->qd_idx)  &&
 s.failed == 0))


signature.asc
Description: PGP signature

Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute

2014-08-17 Thread Yasuaki Ishimatsu


(2014/08/13 13:10), Zhang Zhen wrote:

Currently memory-hotplug has two limits:
1. If the memory block is in ZONE_NORMAL, you can change it to
ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
2. If the memory block is in ZONE_MOVABLE, you can change it to
ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.

With this patch, we can easily tell which zone a memory block can be onlined
to, without needing to know the two limits above.

Updated the related Documentation.

Change v1 -> v2:
- optimize the implementation following Dave Hansen's suggestion

Signed-off-by: Zhang Zhen 
---
  Documentation/ABI/testing/sysfs-devices-memory |  8 
  Documentation/memory-hotplug.txt   |  4 +-
  drivers/base/memory.c  | 62 ++
  include/linux/memory_hotplug.h |  1 +
  mm/memory_hotplug.c|  2 +-
  5 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
b/Documentation/ABI/testing/sysfs-devices-memory
index 7405de2..2b2a1d7 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -61,6 +61,14 @@ Users:   hotplug memory remove tools

http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils


+What:   /sys/devices/system/memory/memoryX/zones_online_to
+Date:   July 2014
+Contact:   Zhang Zhen 
+Description:
+   The file /sys/devices/system/memory/memoryX/zones_online_to
+   is read-only and is designed to show which zone this memory 
block can
+   be onlined to.
+
  What: /sys/devices/system/memoryX/nodeY
  Date: October 2009
  Contact:  Linux Memory Management list 
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
index 45134dc..5b34e33 100644
--- a/Documentation/memory-hotplug.txt
+++ b/Documentation/memory-hotplug.txt
@@ -155,6 +155,7 @@ Under each memory block, you can see 4 files:
  /sys/devices/system/memory/memoryXXX/phys_device
  /sys/devices/system/memory/memoryXXX/state
  /sys/devices/system/memory/memoryXXX/removable
+/sys/devices/system/memory/memoryXXX/zones_online_to

  'phys_index'  : read-only and contains memory block id, same as XXX.
  'state'   : read-write
@@ -170,6 +171,8 @@ Under each memory block, you can see 4 files:
  block is removable and a value of 0 indicates that
  it is not removable. A memory block is removable only if
  every section in the block is removable.
+'zones_online_to' : read-only: designed to show which zone this memory block
+   can be onlined to.

  NOTE:
These directories/files appear after physical memory hotplug phase.
@@ -408,7 +411,6 @@ node if necessary.
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
  sysctl or new control file.
- showing memory block and physical device relationship.
-  - showing memory block is under ZONE_MOVABLE or not
- test and make it better memory offlining.
- support HugeTLB page migration and offlining.
- memmap removing at memory offline.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index a2e13e2..b5d693f 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -373,10 +373,71 @@ static ssize_t show_phys_device(struct device *dev,
return sprintf(buf, "%d\n", mem->phys_device);
  }

+static int __zones_online_to(unsigned long end_pfn,
+   struct page *first_page, unsigned long nr_pages)
+{
+   struct zone *zone_next;
+



+   /*The mem block is the last block of memory.*/
+   if (!pfn_valid(end_pfn + 1))
+   return 1;


This check is not sufficient if the memory has a hole, as follows:

PFN          0x00      0xd0      0xe0      0xf0
              +---------+---------+---------+
zone type     | Normal  |  hole   | Normal  |
              +---------+---------+---------+

In this case, 0xd1 is an invalid pfn. But __zones_online_to should return 0,
since 0xe0-0xf0 is in the Normal zone.

Thanks,
Yasuaki Ishimatsu
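
The hole case can be reproduced with a toy model. This is only a sketch of the reasoning, not a proposed fix: an array of zone kinds stands in for pfn_valid()/page_zone(), and all names here are hypothetical. The naive test looks only at the entry right after the block, as the quoted hunk does, while the alternative keeps scanning past the hole.

```c
#include <stdbool.h>
#include <stddef.h>

enum zone_kind { ZK_HOLE, ZK_NORMAL, ZK_MOVABLE };

/* Naive check: treat an invalid entry right after the block as
 * "this is the last block of memory". */
static bool naive_is_last(const enum zone_kind *map, size_t n, size_t end)
{
	return end + 1 >= n || map[end + 1] == ZK_HOLE;
}

/* Scanning check: the block is last only if nothing populated follows,
 * however large the hole after it is. */
static bool is_last_populated(const enum zone_kind *map, size_t n, size_t end)
{
	for (size_t i = end + 1; i < n; i++)
		if (map[i] != ZK_HOLE)
			return false;
	return true;
}
```

For a map of Normal, Normal, hole, Normal, the naive check wrongly reports the second entry as the end of memory, while the scanning check does not.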



+   zone_next = page_zone(first_page + nr_pages);
+   if (zone_idx(zone_next) == ZONE_MOVABLE)
+   return 1;
+   return 0;
+}
+
+static ssize_t show_zones_online_to(struct device *dev,
+   struct device_attribute *attr, char *buf)
+{
+   struct memory_block *mem = to_memory_block(dev);
+   unsigned long start_pfn, end_pfn;
+   unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
+   struct page *first_page;
+   struct zone *zone, *zone_prev;
+
+   start_pfn = section_nr_to_pfn(mem->start_section_nr);
+   end_pfn = start_pfn + nr_pages;
+   first_page = pfn_to_page(start_pfn);
+
+   /*The block contains m

Antw: Re: Some problems with HP DL380 G8 BIOS and SLES11 SP3

2014-08-17 Thread Ulrich Windl

>>> Don Zickus  wrote on 14.08.2014 at 19:46 in message
<20140814174658.gv49...@redhat.com>:
> On Wed, Aug 13, 2014 at 05:22:17PM +0200, Ulrich Windl wrote:
>> Hello!
>> 
>> Running the current SLES11 SP3 kernel on a HP DL380 G8 server, there are
>> some kernel messages that indicate a bug either in the kernel or in the HP
>> BIOS. Maybe someone can explain, so I can try to get it fixed by whichever
>> party broke it...
>>
>> Linux kernel is "3.0.101-0.35-default (geeko@buildhost) (gcc version 4.3.4
>> [gcc-4_3-branch revision 152973]" (latest).
>> HP server is "HP ProLiant DL380p Gen8, BIOS P70 02/10/2014" (latest)
> 
> Yes, it is because you are letting the firmware dynamically control your
> cpu frequency.  In order to accomplish that, they need to use a perf counter or
> two, hence the conflict.  Set the firmware setting to OS control and the
> problem goes away.  Contact HP for those instructions, they are very aware
> of this problem and recommend OS control to all high end servers.

Hi!

Thanks for answering, but the BIOS has set power management to "OS control" 
(see attachment). So I guess it must be something different.

Regards,
Ulrich
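
The "[Firmware Bug]" line in the quoted log reports MSR 38d holding 330. Assuming that value is hexadecimal and that MSR 0x38d is IA32_FIXED_CTR_CTRL as laid out in the Intel SDM (one 4-bit field per fixed counter, with the low two bits enabling counting in ring 0 and in user mode), the value can be decoded as below. fixed_ctr_enable() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Return the 2-bit enable field of fixed counter idx from an
 * IA32_FIXED_CTR_CTRL value: bit 0 = count in ring 0, bit 1 = count in
 * user mode, so 3 means the counter runs in both. */
static unsigned int fixed_ctr_enable(uint64_t ctrl, unsigned int idx)
{
	return (unsigned int)((ctrl >> (4 * idx)) & 0x3);
}
```

Under those assumptions, 0x330 decodes to 0 for counter 0 and 3 for counters 1 and 2, i.e. two fixed counters were already enabled when the kernel looked. That would fit something other than the OS using them, but it does not say what.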

> 
> Cheers,
> Don
> 
>> 
>> During ACPI init I see:
>> [...]
>> Reserving 128MB of memory at 752MB for crashkernel (System RAM: 132095MB)
>> ACPI: RSDP 000f4f00 00024 (v02 HP)
>> ACPI: XSDT bddaed00 000D4 (v01 HP ProLiant 0002   322? 162E)
>> ACPI: FACP bddaee40 000F4 (v03 HP ProLiant 0002   322? 162E)
>> ACPI Warning: Invalid length for Pm1aControlBlock: 32, using default 16 (20110413/tbfadt-611)
>> ACPI Warning: Invalid length for Pm2ControlBlock: 32, using default 8 (20110413/tbfadt-611)
>> ACPI: DSDT bddaef40 026DC (v01 HP DSDT 0001 INTL 20030228)
>> ACPI: FACS bddac140 00040
>> ACPI: SPCR bddac180 00050 (v01 HP SPCRRBSU 0001   322? 162E)
>> ACPI: MCFG bddac200 0003C (v01 HP ProLiant 0001 )
>> [...]
>> 
>> HPET id 0 under DRHD base 0xf4ffe000
>> BIOS requests to not use x2apic
>> Use 'intremap=no_x2apic_optout' to override BIOS request
>> Enabled IRQ remapping in xapic mode
>> x2apic not enabled, IRQ remapping is in xapic mode
>> Switched APIC routing to physical flat.
>> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>> CPU0: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz stepping 04
>> Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, Broken BIOS detected, complain to your hardware vendor.
>> [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
>> Intel PMU driver.
>> ... version:3
>> ... bit width:  48
>> ... generic registers:  4
>> ... value mask: 
>> ... max period: 7fff
>> ... fixed-purpose events:   3
>> ... event mask: 0007000f
>> NMI watchdog enabled, takes one hw-pmu counter.
>> Booting Node   0, Processors  #1
>> [...]
>> 
>>  pci:00: Requesting ACPI _OSC control (0x1d)
>>  pci:00: ACPI _OSC request failed (AE_SUPPORT), returned control mask: 0x00
>> ACPI _OSC control for PCIe not granted, disabling ASPM
>> [...]
>> 
>>  pci:20: Requesting ACPI _OSC control (0x1d)
>>  pci:20: ACPI _OSC request failed (AE_SUPPORT), returned control mask: 0x00
>> ACPI _OSC control for PCIe not granted, disabling ASPM
>> [...]
>> 
>> Regards,
>> Ulrich
>> P.S. Please CC: me, as I'm not on LKML...
>> 
>> 

Re: [PATCH v2 05/16] clk: tegra: Add closed loop support for the DFLL

2014-08-17 Thread Vince Hsu


Hi,

On 07/21/2014 11:38 PM, Tuomas Tynkkynen wrote:

With closed loop support, the clock rate of the DFLL can be adjusted.

The oscillator itself in the DFLL is a free-running oscillator whose
rate is directly determined by the supply voltage. However, the DFLL
module contains logic to compare the DFLL output rate to a fixed
reference clock (51 MHz) and make a decision to either lower or raise
the DFLL supply voltage. The DFLL module can then autonomously change
the supply voltage by communicating with an off-chip PMIC via either I2C
or PWM signals. This driver currently supports only I2C.

Signed-off-by: Tuomas Tynkkynen 
---
v2 changes:
 - query the various properties required for I2C mode from the
   regulator framework

  drivers/clk/tegra/clk-dfll.c | 656 ++-
  1 file changed, 653 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/tegra/clk-dfll.c b/drivers/clk/tegra/clk-dfll.c
index d83e859..0d4b2dd 100644
--- a/drivers/clk/tegra/clk-dfll.c
+++ b/drivers/clk/tegra/clk-dfll.c
@@ -205,12 +205,16 @@

...

+
+/**
+ * dfll_calculate_rate_request - calculate DFLL parameters for a given rate
+ * @td: DFLL instance
+ * @req: DFLL-rate-request structure
+ * @rate: the desired DFLL rate
+ *
+ * Populate the DFLL-rate-request record @req fields with the scale_bits
+ * and mult_bits fields, based on the target input rate. Returns 0 upon
+ * success, or -EINVAL if the requested rate in req->rate is too high
+ * or low for the DFLL to generate.
+ */
+static int dfll_calculate_rate_request(struct tegra_dfll *td,
+  struct dfll_rate_req *req,
+  unsigned long rate)
+{
+   u32 val;
+
+   /*
+* If requested rate is below the minimum DVCO rate, activate the scaler.
+* In the future the DVCO minimum voltage should be selected based on
+* chip temperature and the actual minimum rate should be calibrated
+* at runtime.
+*/
+   req->scale_bits = DFLL_FREQ_REQ_SCALE_MAX - 1;
+   if (rate < td->dvco_rate_min) {
+   int scale;
+
+   scale = DIV_ROUND_CLOSEST(rate / 1000 * DFLL_FREQ_REQ_SCALE_MAX,
+ td->dvco_rate_min / 1000);
+   if (!scale) {
+   dev_err(td->dev, "%s: Rate %lu is too low\n",
+   __func__, rate);
+   return -EINVAL;
+   }
+   req->scale_bits = scale - 1;
+   rate = td->dvco_rate_min;
+   }
+
+   /* Convert requested rate into frequency request and scale settings */
+   val = DVCO_RATE_TO_MULT(rate, td->ref_rate);
+   if (val > FREQ_MAX) {
+   dev_err(td->dev, "%s: Rate %lu is above dfll range\n",
+   __func__, rate);
+   return -EINVAL;
+   }
+   req->mult_bits = val;
+   req->dvco_target_rate = MULT_TO_DVCO_RATE(req->mult_bits, td->ref_rate);
+   req->rate = dfll_scale_dvco_rate(req->dvco_target_rate,
+req->scale_bits);

Should be dfll_scale_dvco_rate(req->scale_bits, req->dvco_target_rate);

Thanks,
Vince
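
For context, the scaler setup in the quoted function can be followed with a small userspace sketch. It assumes DFLL_FREQ_REQ_SCALE_MAX is 256 (an assumption, the value is not shown in this mail) and uses hypothetical names; it mirrors only the scale_bits part of dfll_calculate_rate_request().

```c
#define SCALE_MAX 256UL	/* assumed value of DFLL_FREQ_REQ_SCALE_MAX */

/* Unsigned equivalent of the kernel's DIV_ROUND_CLOSEST(). */
static unsigned long div_round_closest(unsigned long x, unsigned long d)
{
	return (x + d / 2) / d;
}

/* Return the scale_bits for a requested rate, or -1 if the rate is too
 * low to be reached even with the smallest scaler setting. */
static int scale_bits_for_rate(unsigned long rate, unsigned long dvco_rate_min)
{
	unsigned long scale;

	if (rate >= dvco_rate_min)
		return (int)SCALE_MAX - 1;	/* scaler passes the DVCO rate through */

	scale = div_round_closest(rate / 1000 * SCALE_MAX, dvco_rate_min / 1000);
	if (!scale)
		return -1;
	return (int)scale - 1;
}
```

With a made-up dvco_rate_min of 1000000000 Hz, a request for 500000000 Hz yields scale_bits 127, i.e. a ratio of 128/256.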


+   req->lut_index = find_lut_index_for_rate(td, req->dvco_target_rate);
+   if (req->lut_index < 0)
+   return req->lut_index;
+
+   return 0;
+}
+



Re: [PATCH V3 3/3] ARM: clk-imx6q: Add missing lvds and anaclk clock to the clock tree

2014-08-17 Thread Shawn Guo

On Mon, Aug 11, 2014 at 11:09:36AM +0800, Shengjiu Wang wrote:
> On Sat, Aug 09, 2014 at 09:58:42PM +0800, Shawn Guo wrote:
> > On Fri, Aug 08, 2014 at 03:02:49PM +0800, Shengjiu Wang wrote:
> > > @@ -176,8 +182,12 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node)
> > >    * the "output_enable" bit as a gate, even though it's really just
> > >    * enabling clock output.
> > >    */
> > > - clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate("lvds1_gate", "lvds1_sel", base + 0x160, 10);
> > > - clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate("lvds2_gate", "lvds2_sel", base + 0x160, 11);
> > > + clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate2("lvds1_gate", "lvds1_sel", base + 0x160, 10);
> > > + clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate2("lvds2_gate", "lvds2_sel", base + 0x160, 11);
> > 
> > I do not think you can simply change to use imx_clk_gate2() here.  It's
> > designed for those CCGR gate clocks, each of which is controlled by two
> > bits.
> > 
> > Shawn
> >
> Following Lucas Stach's suggestion, we need to add some method for mutually
> exclusive clocks: lvds1_gate with lvds1_in, and lvds2_gate with lvds2_in. I
> added an imx_clk_gate2_exclusive() function in clk-gate2.c, so I changed
> imx_clk_gate() to imx_clk_gate2() here.
> As you said, this is not a good solution.

It's not just a "not good" solution but a wrong and broken one.  The net
result of that is if you call clk_enable() on lvds1_gate, both bit 10
and 11 will be set.
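
To spell that out, here is a minimal userspace sketch of the difference, assuming a one-bit gate sets a single bit and a CCGR-style gate2 sets the two-bit field starting at the same shift. The helper names are hypothetical; they are not the clk-gate/clk-gate2 functions themselves.

```c
#include <stdint.h>

/* One-bit gate: enabling sets only the bit at the given shift. */
static uint32_t gate_enable(uint32_t reg, unsigned int shift)
{
	return reg | (UINT32_C(1) << shift);
}

/* Two-bit gate: enabling sets both bits of the field starting at shift. */
static uint32_t gate2_enable(uint32_t reg, unsigned int shift)
{
	return reg | (UINT32_C(3) << shift);
}
```

Starting from a cleared register, gate_enable(0, 10) gives 0x400 while gate2_enable(0, 10) gives 0xc00, so bit 11 (lvds2_gate) is switched on as a side effect.
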

> So I need your suggestion: what should I do?

I guess we will need a new clock type to handle such mutually exclusive
clocks, rather than patching clk-gate2.

> First, is it acceptable to add an imx_clk_gate2_exclusive() function, or is
> there a better way?

Again, this is completely wrong.

> Second, should I instead change clk-gate.c to add exclusive control?

If such mutually exclusive clocks are somehow common across different
clock controllers, we can propose to change clk-gate.c for handling
them.  But I'm not sure this is a common case.

Shawn

Re: [PATCH v2 04/16] clk: tegra: Add library for the DFLL clock source (open-loop mode)

2014-08-17 Thread Vince Hsu


Hi,

On 07/21/2014 11:38 PM, Tuomas Tynkkynen wrote:

Add shared code to support the Tegra DFLL clocksource in open-loop
mode. This root clocksource is present on the Tegra124 SoCs. The
DFLL is the intended primary clock source for the fast CPU cluster.

This code is very closely based on a patch by Paul Walmsley from
December (http://comments.gmane.org/gmane.linux.ports.tegra/15273),
which in turn comes from the internal driver originally created
by Aleksandr Frid.

Subsequent patches will add support for closed loop mode and drivers
for the Tegra124 fast CPU cluster DFLL devices, which rely on this
code.

Signed-off-by: Paul Walmsley 
Signed-off-by: Tuomas Tynkkynen 
---
v2 changes:
 - minor, moved the devm_regulator_get here

  drivers/clk/tegra/Makefile   |1 +
  drivers/clk/tegra/clk-dfll.c | 1085 ++
  drivers/clk/tegra/clk-dfll.h |   55 +++
  3 files changed, 1141 insertions(+)
  create mode 100644 drivers/clk/tegra/clk-dfll.c
  create mode 100644 drivers/clk/tegra/clk-dfll.h

...

--- /dev/null
+++ b/drivers/clk/tegra/clk-dfll.c

...

+
+/*
+ * Output clock scaler helpers
+ */
+
+/**
+ * dfll_scale_dvco_rate - calculate scaled rate from the DVCO rate
+ * @scale_bits: clock scaler value (bits in the DFLL_FREQ_REQ_SCALE field)
+ * @dvco_rate: the DVCO rate
+ *
+ * Apply the same scaling formula that the DFLL hardware uses to scale
+ * the DVCO rate.
+ */
+static unsigned long dfll_scale_dvco_rate(int scale_bits,
+ unsigned long dvco_rate)
+{
+   return (u64)dvco_rate * (scale_bits + 1) / DFLL_FREQ_REQ_SCALE_MAX;
+}

...

+static u64 dfll_read_monitor_rate(struct tegra_dfll *td)
+{
+   u32 v, s;
+   u64 pre_scaler_rate, post_scaler_rate;
+
+   if (!dfll_is_running(td))
+   return 0;
+
+   v = dfll_readl(td, DFLL_MONITOR_DATA);
+   v = (v & DFLL_MONITOR_DATA_VAL_MASK) >> DFLL_MONITOR_DATA_VAL_SHIFT;
+   pre_scaler_rate = dfll_calc_monitored_rate(v, td->ref_rate);
+
+   s = dfll_readl(td, DFLL_FREQ_REQ);
+   s = (s & DFLL_FREQ_REQ_SCALE_MASK) >> DFLL_FREQ_REQ_SCALE_SHIFT;
+   post_scaler_rate = dfll_scale_dvco_rate(pre_scaler_rate, s);

Should be dfll_scale_dvco_rate(s, pre_scaler_rate);

Thanks,
Vince
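
Because both parameters are integers, the compiler will not flag the swapped call. A small sketch shows the effect, again assuming DFLL_FREQ_REQ_SCALE_MAX is 256 and using a userspace copy of the helper under a hypothetical name:

```c
#include <stdint.h>

#define SCALE_MAX 256	/* assumed value of DFLL_FREQ_REQ_SCALE_MAX */

/* Same formula as dfll_scale_dvco_rate() in the quoted patch. */
static unsigned long scale_dvco_rate(int scale_bits, unsigned long dvco_rate)
{
	return (unsigned long)((uint64_t)dvco_rate * (scale_bits + 1) / SCALE_MAX);
}
```

With scale_bits 127 and a made-up DVCO rate of 1000000000 Hz, the correct call scale_dvco_rate(127, 1000000000) returns 500000000, while the swapped call scale_dvco_rate(1000000000, 127) returns 496093750. The two products differ by (rate - scale_bits), so the swapped result is low by roughly rate / 256; a rate too large for an int would additionally be truncated.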



[PATCH] usb: phy: return -ENODEV on failure of try_module_get

2014-08-17 Thread Arjun Sreedharan

When __usb_find_phy_dev() does not return an error but
try_module_get() fails, return -ENODEV.

Signed-off-by: Arjun Sreedharan 
---
 drivers/usb/phy/phy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
index 36b6bce..fd0d7f1 100644
--- a/drivers/usb/phy/phy.c
+++ b/drivers/usb/phy/phy.c
@@ -232,6 +232,9 @@ struct usb_phy *usb_get_phy_dev(struct device *dev, u8 index)
phy = __usb_find_phy_dev(dev, &phy_bind_list, index);
if (IS_ERR(phy) || !try_module_get(phy->dev->driver->owner)) {
dev_dbg(dev, "unable to find transceiver\n");
+   if (!IS_ERR(phy))
+   phy = ERR_PTR(-ENODEV);
+
goto err0;
}
 
-- 
1.7.11.7
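
The control flow being fixed can be tried out in userspace with simplified copies of the kernel's error-pointer helpers. Everything here is a sketch under assumptions: err_ptr(), ptr_err() and is_err() imitate ERR_PTR(), PTR_ERR() and IS_ERR(), and struct phy with its module_ok flag is a hypothetical stand-in for the real lookup and try_module_get().

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *err_ptr(long error) { return (void *)error; }
static long ptr_err(const void *ptr) { return (long)ptr; }
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical PHY: module_ok models whether try_module_get() succeeds. */
struct phy { int module_ok; };

/* With the fix: a PHY that was found but whose module could not be
 * pinned is reported as -ENODEV instead of being returned as valid. */
static struct phy *get_phy(struct phy *found)
{
	if (is_err(found) || !found->module_ok) {
		if (!is_err(found))
			found = err_ptr(-ENODEV);
		return found;
	}
	return found;
}
```

Without the inner check, a caller testing the result with IS_ERR() would see a valid-looking pointer even though no module reference was taken.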


Issue with clone() and CLONE_NEWUSER as unprivileged user

2014-08-17 Thread Marcel Holtmann

Hi,

I am trying to use clone() with CLONE_NEWUSER to create a new user namespace
as an unprivileged user. I always get an "operation not permitted" error.
However, when I use fork() + unshare() as an unprivileged user, I can create
the new user namespace just fine.

Is there something obvious that I am missing? My understanding is that
CLONE_NEWUSER should not require any special capabilities. I tried the sample
code from the manpage and also from LWN.net, but both give me the same error.

Regards

Marcel


Re: [Intel-gfx] Usage of _PAGE_PCD et al in i915 driver

2014-08-17 Thread Juergen Gross


On 08/15/2014 12:21 PM, Ville Syrjälä wrote:

On Thu, Aug 14, 2014 at 05:55:11AM +0200, Juergen Gross wrote:

On 08/13/2014 05:07 PM, Jesse Barnes wrote:

On Fri, 8 Aug 2014 15:14:15 +0200
Daniel Vetter  wrote:


Adding relevant mailing lists.

On Fri, Aug 8, 2014 at 1:23 PM, Juergen Gross  wrote:

I'm just about to create a patch for full PAT support in the Linux
kernel, including Xen. For this purpose I introduce a translation
between cache modes and pte bits.

Scanning the kernel sources for usage of the cache mode bits in the
pte I discovered  drivers/gpu/drm/i915/i915_gem_gtt.h is using
_PAGE_PCD, _PAGE_PWT and _PAGE_PAT. I think those defines are used
to create ptes not for usage by the main processor, but for the
graphics processor. Is this true? If so, I'd suggest defining
i915-specific macros instead of using the x86 ones.


Yeah, those are gpu specific PAT tables, but the hw engineers
specifically designed this to match, and we've tried to follow the cpu
side to match it. Especially in the future that will be somewhat
important, since we want to fully share the entire address space
between cpu and gpu on the next platform. Jesse is working on that.


Right, we have an x86 compatible MMU in the GPU itself, so re-using the
defines makes sense.  I suppose with your work you'll move them and
make them a bit more opaque?  If so, we'll still want a way to get at
them directly, or access your mapping functions for generating PTE bits
for the GPU MMU.


Using the mapping functions I'm introducing should work, if the MMU has
an x86 compatible MSR_IA32_CR_PAT which is configured the same way as
on the x86 processor (be aware that Xen is using another MSR_IA32_CR_PAT
setting as the Linux kernel).


We have a PAT that is structured the same way as the x86 PAT. But the
contents of the PAT entries are obviously specific to the GPU so it's
not identical. But the pcd/pwt/pat bits index the PAT in exactly the
same way as on x86.

See bdw_setup_private_ppat() and chv_setup_private_ppat() for how we
set up the PAT.



So you are using the PAT bit in the ptes, but its semantics for the GPU
differ from those for the x86 processor, because the GPU PAT is set
up differently from the x86 one.

If you share ptes between the GPU and the x86 processor in the future,
this might lead to problems once the x86 processor uses ptes with
the PAT bit set.
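
To make the concern concrete, here is a minimal user-space sketch (not
kernel or i915 code) of how the PWT, PCD and PAT bits of a 4 KiB x86 pte
select one of eight PAT entries. The bit positions are the architectural
ones; the mem_type enum, the helper names and whatever table contents a
caller passes in are hypothetical and only serve as illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cache-control bits of a 4 KiB x86 page-table entry. */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)
#define PTE_PAT (UINT64_C(1) << 7)

/* Hypothetical memory-type codes, for illustration only. */
enum mem_type { MT_UC, MT_WC, MT_WT, MT_WB };

/* PAT entry selected by a pte: index = PAT:PCD:PWT, i.e. 0..7. */
static unsigned int pte_pat_index(uint64_t pte)
{
	unsigned int pat = (pte & PTE_PAT) ? 1u : 0u;
	unsigned int pcd = (pte & PTE_PCD) ? 1u : 0u;
	unsigned int pwt = (pte & PTE_PWT) ? 1u : 0u;

	return (pat << 2) | (pcd << 1) | pwt;
}

/* Memory type a pte resolves to under a given 8-entry PAT table. */
static enum mem_type pte_mem_type(uint64_t pte, const enum mem_type pat[8])
{
	unsigned int idx = pte_pat_index(pte);

	assert(idx < 8);
	return pat[idx];
}
```

If the CPU and GPU tables differ at some index, the same pte bits resolve
to different memory types on each side, which is the mismatch described
above.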


Juergen

Re: Re: [RFC PATCH 04/10] scsi/constants: Cleanup printk message in scsi_dump_sense_buffer()

2014-08-17 Thread Yoshihiro YUNOMAE


(2014/08/16 0:08), Ewan Milne wrote:
> On Fri, 2014-08-08 at 11:50 +, Yoshihiro YUNOMAE wrote:
>> Unrecognized sense data should be output after linebuf is filled because
>> "[%s] Unrecognized sense data (in hex): %s" message is output many times in
>> loop.
>>
>> Signed-off-by: Yoshihiro YUNOMAE
>> Cc: Hannes Reinecke
>> Cc: Doug Gilbert
>> Cc: Martin K. Petersen
>> Cc: Christoph Hellwig
>> Cc: "James E.J. Bottomley"
>> Cc: Hidehiro Kawai
>> Cc: Masami Hiramatsu
>> ---
>>   drivers/scsi/constants.c |   13 +
>>   1 file changed, 5 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
>> index 5956d4d..6fad6b4 100644
>> --- a/drivers/scsi/constants.c
>> +++ b/drivers/scsi/constants.c
>> @@ -1385,16 +1385,13 @@ EXPORT_SYMBOL(scsi_print_sense_hdr);
>>
>>   static void
>>   scsi_dump_sense_buffer(struct scsi_device *sdev, const char *prefix,
>> -  const unsigned char *sense_buffer, int sense_len,
>> -  struct scsi_sense_hdr *sshdr)
>> +  const unsigned char *sense_buffer, int sense_len)
>>   {
>> char linebuf[128];
>> int i, linelen, remaining;
>>
>> if (sense_len < 32)
>> sense_len = 32;
>> -   sdev_printk(KERN_INFO, sdev,
>> -   "[%s] Unrecognized sense data (in hex):", prefix);
>>
>> remaining = sense_len;
>> for (i = 0; i < sense_len; i += 16) {
>> @@ -1403,9 +1400,10 @@ scsi_dump_sense_buffer(struct scsi_device *sdev, const char *prefix,
>>
>> hex_dump_to_buffer(sense_buffer + i, linelen, 16, 1,
>>    linebuf, sizeof(linebuf), false);
>> -   sdev_printk(KERN_INFO, sdev, "[%s] Sense: %s\n",
>> -   prefix, linebuf);
>> }
>> +   sdev_printk(KERN_INFO, sdev,
>> +   "[%s] Unrecognized sense data (in hex): %s",
>> +   prefix, linebuf);
>>   }
>
> See my earlier comment regarding PATCH 03/10.
>
> This doesn't look right -- In Hannes' tree what the code is doing is
> printing out a separate line for each 16 bytes of the sense data.
> Your change will cause only the last (partial?) 16 bytes to be printed.

That's true. We should not apply this as well.

> The removal of the unused sshdr argument is fine, though.

Thanks!

Yoshihiro YUNOMAE

> -Ewan
>
>>   static void
>> @@ -1467,8 +1465,7 @@ void __scsi_print_sense(struct scsi_device *sdev, const char *name,
>>
>> if (!scsi_normalize_sense(sense_buffer, sense_len, &sshdr)) {
>> /* this may be SCSI-1 sense data */
>> -   scsi_dump_sense_buffer(sdev, name, sense_buffer,
>> -  sense_len, &sshdr);
>> +   scsi_dump_sense_buffer(sdev, name, sense_buffer, sense_len);
>> return;
>> }

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com



Re: [RFC PATCH 03/10] scsi/constants: Cleanup printk message in __scsi_print_command()

2014-08-17 Thread Yoshihiro YUNOMAE


Hi Ewan,

Thank you for your review.

(2014/08/16 0:05), Ewan Milne wrote:
> On Fri, 2014-08-08 at 11:50 +, Yoshihiro YUNOMAE wrote:
>> All bytes in CDB should be output after linebuf is filled because
>> "[%s] CDB: %s\n" message is output many times in loop.
>>
>> Signed-off-by: Yoshihiro YUNOMAE
>> Cc: Hannes Reinecke
>> Cc: Doug Gilbert
>> Cc: Martin K. Petersen
>> Cc: Christoph Hellwig
>> Cc: "James E.J. Bottomley"
>> Cc: Hidehiro Kawai
>> Cc: Masami Hiramatsu
>> ---
>>   drivers/scsi/constants.c |3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
>> index 9c38b8d..5956d4d 100644
>> --- a/drivers/scsi/constants.c
>> +++ b/drivers/scsi/constants.c
>> @@ -413,9 +413,8 @@ void __scsi_print_command(struct scsi_device *sdev, const char *prefix,
>>
>> hex_dump_to_buffer(cdb + i, linelen, 16, 1,
>>    linebuf, sizeof(linebuf), false);
>> -   sdev_printk(KERN_INFO, sdev, "[%s] CDB: %s\n",
>> -   prefix, linebuf);
>> }
>> +   sdev_printk(KERN_INFO, sdev, "[%s] CDB: %s\n", prefix, linebuf);
>>   }
>>   EXPORT_SYMBOL(__scsi_print_command);
>
> This doesn't look right -- In Hannes' tree what the code is doing is
> printing out a separate line for each 16 bytes of the CDB.  Your change
> will cause only the last (partial?) 16 bytes to be printed.

Ah, that's true. We should not apply this patch.
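
For the record, the effect Ewan describes can be reproduced with a small
user-space sketch. This is not the kernel code: hex_line() is a
hypothetical stand-in for hex_dump_to_buffer(), fprintf() stands in for
sdev_printk(), and both dump functions are made up for illustration. Each
returns the number of lines it printed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define BYTES_PER_LINE 16

/* Format up to 16 bytes as "xx xx ..." into buf; each call overwrites buf. */
static void hex_line(const unsigned char *p, size_t n, char *buf, size_t buflen)
{
	size_t i, off = 0;

	assert(buflen > 0);
	buf[0] = '\0';
	for (i = 0; i < n && off + 3 < buflen; i++)
		off += (size_t)snprintf(buf + off, buflen - off, "%s%02x",
					i ? " " : "", p[i]);
}

/* Print inside the loop: one output line per 16-byte chunk. */
static int dump_in_loop(const unsigned char *data, size_t len, FILE *out)
{
	char linebuf[128];
	size_t i;
	int lines = 0;

	for (i = 0; i < len; i += BYTES_PER_LINE) {
		size_t n = len - i < BYTES_PER_LINE ? len - i : BYTES_PER_LINE;

		hex_line(data + i, n, linebuf, sizeof(linebuf));
		fprintf(out, "CDB: %s\n", linebuf);
		lines++;
	}
	return lines;
}

/* Print after the loop: linebuf holds only the last chunk by then. */
static int dump_after_loop(const unsigned char *data, size_t len, FILE *out)
{
	char linebuf[128] = "";
	size_t i;
	int lines = 0;

	for (i = 0; i < len; i += BYTES_PER_LINE) {
		size_t n = len - i < BYTES_PER_LINE ? len - i : BYTES_PER_LINE;

		hex_line(data + i, n, linebuf, sizeof(linebuf));
	}
	if (len) {
		fprintf(out, "CDB: %s\n", linebuf);
		lines++;
	}
	return lines;
}
```

For a 32-byte buffer the first variant prints two lines and the second
prints one, containing only bytes 16..31.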

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com



Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area

2014-08-17 Thread Gioh Kim




On 2014-08-18 at 12:24 PM, Theodore Ts'o wrote:
> On Mon, Aug 18, 2014 at 10:15:32AM +0900, Gioh Kim wrote:
>>
>> My test platform has totally 1GB memory, 256MB for CMA and 768MB for normal.
>> I applied Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64, so that
>> 3/4 of allocation take place in normal area and 1/4 allocation take place in
>> CMA area.
>>
>> And my platform has 4 ext4 partitions. Each ext4 partition has 2 page caches
>> for superblock that
>> are what this patch tries to move to out of CMA area.
>> Therefore there are 8 page caches (8 pages size) that can prevent page
>> migration.
>
> Yes, but are you actually *using* the ext4 partitions for anything?
> If this is a realistic real world use case, file systems are used to
> store, well, files, and that means there will be inodes and dentry
> cache entries that will also be allocated.  Does your test scenario
> reflect real world usage?

Yes. I work for LG Electronics.
My test platform is a product that is currently on sale.
I also test my patch while the platform is running the way a real user would use it.

I think the page caches of the inodes and dentries are held only for a short time.
I can see pairs of get_bh and put_bh in the inode/dentry handling.

I think inodes are allocated by kmem_cache_alloc in ext4_alloc_inode(),
which is a non-movable area allocation.

> Cheers,
>
> - Ted



Re: [PATCH v2 2/3] time,signal: protect resource use statistics with seqlock

2014-08-17 Thread Mike Galbraith

On Sat, 2014-08-16 at 19:50 +0200, Oleg Nesterov wrote: 
> On 08/16, Rik van Riel wrote:
> >
> > +   do {
> > +   seq = nextseq;
> > +   read_seqbegin_or_lock(&sig->stats_lock, &seq);
> > +   times->utime = sig->utime;
> > +   times->stime = sig->stime;
> > +   times->sum_exec_runtime = sig->sum_sched_runtime;
> > +
> > +   for_each_thread(tsk, t) {
> > +   task_cputime(t, &utime, &stime);
> > +   times->utime += utime;
> > +   times->stime += stime;
> > +   times->sum_exec_runtime += task_sched_runtime(t);
> > +   }
> > +   /* If lockless access failed, take the lock. */
> > +   nextseq = 1;
> 
> Yes, thanks, this answers my concerns.
> 
> Cough... can't resist, and I still think that we should take rcu_read_lock()
> only around for_each_thread() and the patch expands the critical section for
> no reason. But this is minor, I won't insist.

Hm.  Should traversal not also disable preemption to preserve the error
bound Peter mentioned?

-Mike


Re: [RFC PATCH 6/9] gpiolib: add API to get gpio desc and flags

2014-08-17 Thread Rafael J. Wysocki

On Sunday, August 17, 2014 12:43:38 PM Darren Hart wrote:
> On 8/17/14, 6:00, "Grant Likely"  wrote:
> 
> >>
> >>+   /* Using device tree? */
> >>+   if (IS_ENABLED(CONFIG_OF) && dev->of_node)
> >>+   desc = of_find_gpio(dev, NULL, idx, flags);
> >
> >of_find_gpio() doesn't exist.
> 
> Hrm... As of 3.16.0 (e64df3ebe8262c8203d1fe4f541e0241c3112c01)
> 
> $ git blame -L1455,1456 drivers/gpio/gpiolib.c
> bae48da2 (Alexandre Courbot 2013-10-17 10:21:38 -0700 1455) static struct
> gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
> 
> Have we removed this in -next or something? (on the plane, will verify
> upon landing)

In 3.17-rc1:

rafael@vostro:~/src/linux-pm> grep -r of_find_gpio *
drivers/gpio/gpiolib.c:static struct gpio_desc *of_find_gpio(struct device 
*dev, const char *con_id,
drivers/gpio/gpiolib.c: desc = of_find_gpio(dev, con_id, idx, 
&lookupflags);

Rafael


Re: [RFC PATCH 3/9] Driver core: Unified device properties interface for platform firmware

2014-08-17 Thread Rafael J. Wysocki

On Sunday, August 17, 2014 12:31:27 PM Darren Hart wrote:
> On 8/17/14, 5:49, "Grant Likely"  wrote:
> 
> >
> >Hi Mika and Rafael,
> >
> >Comments below...

[cut]

> ...
> 
> >> @@ -701,6 +702,7 @@ struct acpi_dev_node {
> >>   * @archdata: For arch-specific additions.
> >>   * @of_node:  Associated device tree node.
> >>   * @acpi_node:Associated ACPI device node.
> >> + * @property_ops: Firmware interface for device properties
> >>   * @devt: For creating the sysfs "dev".
> >>   * @id:   device instance
> >>   * @devres_lock: Spinlock to protect the resource of the device.
> >> @@ -777,6 +779,7 @@ struct device {
> >>  
> >>struct device_node  *of_node; /* associated device tree node */
> >>struct acpi_dev_nodeacpi_node; /* associated ACPI device node */
> >> +  struct dev_prop_ops *property_ops;
> >
> >There are only 2 users of this interface. I don't think adding an ops
> >pointer to each and every struct device is warranted when the wrapper
> >function can check if of_node or acpi_node is set and call the
> >appropriate helper. It is unlikely anything else will use this hook. It
> >will result in smaller memory footprint. Also smaller code when only one
> >of CONFIG_OF and CONFIG_ACPI is selected, which is almost always. :-)
> >
> >It can be refactored later if that ever changes.
> 
> 
> Our intent was to eliminate the #ifdefery in every one of the accessors.
> It was my understanding the ops structures were preferable in such
> situations. For a 64-bit machine with 1000 devices (all of which use
> device properties) with one or the other of ACPI/OF enabled, the
> additional memory requirement here is what... Something like (8*1000 + 4)
> ~= 8KB ? That seems worth the arguably more maintainable code to me. Is
> there more to it than this, am I missing some more significant impact?

Also we wanted to avoid going through the same sequence of checks every time
a property is accessed for a given device, as the result of those checks
was already known when the device was registered.

Arguably, if we decide that using DTs and ACPI on the same system at the same
time is a total no-go, then we'll need just one global ops pointer either to
the ACPI or to the DT set of callbacks, but I'm not sure whether or not that
is the way to go.
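
To make the trade-off easier to compare, here is a rough user-space sketch
of the two dispatch styles under discussion. It is not the patch code: the
struct device below is a hypothetical, heavily reduced stand-in, and
device_bind_property_ops(), child_count_via_ops() and
child_count_via_checks() as well as the stub return values are invented
for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device;

struct dev_prop_ops {
	int (*child_count)(struct device *dev);
};

/* Hypothetical, heavily reduced stand-in for the kernel's struct device. */
struct device {
	void *of_node;
	void *acpi_node;
	const struct dev_prop_ops *property_ops;
};

/* Stub back ends; the return values are arbitrary markers. */
static int of_child_count(struct device *dev)   { (void)dev; return 2; }
static int acpi_child_count(struct device *dev) { (void)dev; return 3; }

static const struct dev_prop_ops of_property_ops = {
	.child_count = of_child_count,
};
static const struct dev_prop_ops acpi_property_ops = {
	.child_count = acpi_child_count,
};

/* Variant A: decide once at registration time, store a per-device pointer. */
static void device_bind_property_ops(struct device *dev)
{
	assert(dev != NULL);
	if (dev->of_node)
		dev->property_ops = &of_property_ops;
	else if (dev->acpi_node)
		dev->property_ops = &acpi_property_ops;
	else
		dev->property_ops = NULL;
}

static int child_count_via_ops(struct device *dev)
{
	return dev->property_ops ? dev->property_ops->child_count(dev) : -ENODEV;
}

/* Variant B: no stored pointer, repeat the checks on every access. */
static int child_count_via_checks(struct device *dev)
{
	if (dev->of_node)
		return of_child_count(dev);
	if (dev->acpi_node)
		return acpi_child_count(dev);
	return -ENODEV;
}
```

On a 64-bit build variant A costs one extra pointer (8 bytes) per struct
device, which is where the roughly 8 KB per 1000 devices quoted above
comes from; variant B trades that for the repeated checks.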

Rafael


Re: [RFC PATCH 3/9] Driver core: Unified device properties interface for platform firmware

2014-08-17 Thread Rafael J. Wysocki

On Sunday, August 17, 2014 01:49:13 PM Grant Likely wrote:
> 
> Hi Mika and Rafael,
> 
> Comments below...

[cut]

> > +enum dev_prop_type {
> > +   DEV_PROP_U8,
> > +   DEV_PROP_U16,
> > +   DEV_PROP_U32,
> > +   DEV_PROP_U64,
> > +   DEV_PROP_STRING,
> > +   DEV_PROP_MAX,
> > +};
> > +
> > +struct dev_prop_ops {
> > +   int (*get)(struct device *dev, const char *propname, void **valptr);
> > +   int (*read)(struct device *dev, const char *propname,
> > +   enum dev_prop_type proptype, void *val);
> > +   int (*read_array)(struct device *dev, const char *propname,
> > + enum dev_prop_type proptype, void *val, size_t nval);
> 
> The associated DT functions that implement property reads
> (of_property_read_*) were created in part to provide some type safety
> when reading properties. This proposed API throws that away by accepting
> a void* for the data field, which I don't want to do. This API either
> needs to have a separate accessor for each data type, or it needs some
> other mechanism (accessor macros?) to ensure the right type is passed
> in.

The intention is to add static inline functions like:

int device_property_read_u64(struct device *dev, const char *propname, u64 *val)
{
return device_property_read(dev, propname, DEV_PROP_U64, val);
}

and so on for the other property types.  They just have not been implemented in
this version of the patch.
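
A slightly fuller user-space sketch of that idea follows, to show how the
typed wrappers restore the type checking that the void * core function
gives up. It is only an illustration: the struct device here is a
hypothetical one-property mock, and the error codes and the type-mismatch
check in the core function are assumptions, not behaviour taken from the
patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum dev_prop_type { DEV_PROP_U8, DEV_PROP_U16, DEV_PROP_U32, DEV_PROP_U64 };

/* Hypothetical mock: a device carrying exactly one named property. */
struct device {
	const char *propname;
	enum dev_prop_type proptype;
	uint64_t value;
};

/* Untyped core reader; the caller must pass a buffer matching proptype. */
static int device_property_read(struct device *dev, const char *propname,
				enum dev_prop_type proptype, void *val)
{
	assert(val != NULL);
	if (!dev->propname || strcmp(dev->propname, propname) != 0)
		return -ENOENT;
	if (dev->proptype != proptype)
		return -EINVAL;

	switch (proptype) {
	case DEV_PROP_U8:
		*(uint8_t *)val = (uint8_t)dev->value;
		break;
	case DEV_PROP_U16:
		*(uint16_t *)val = (uint16_t)dev->value;
		break;
	case DEV_PROP_U32:
		*(uint32_t *)val = (uint32_t)dev->value;
		break;
	case DEV_PROP_U64:
		*(uint64_t *)val = dev->value;
		break;
	default:
		return -EINVAL;
	}
	return 0;
}

/* Typed wrappers: the compiler now checks the pointer type at each call. */
static inline int device_property_read_u32(struct device *dev,
					   const char *propname, uint32_t *val)
{
	return device_property_read(dev, propname, DEV_PROP_U32, val);
}

static inline int device_property_read_u64(struct device *dev,
					   const char *propname, uint64_t *val)
{
	return device_property_read(dev, propname, DEV_PROP_U64, val);
}
```

With wrappers like these, passing a u32 pointer to the u64 accessor draws
a compiler diagnostic instead of silently overrunning the buffer.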

> 
> > +   int (*child_count)(struct device *dev);
> > +};
> > +
> > +#ifdef CONFIG_ACPI
> > +extern struct dev_prop_ops acpi_property_ops;
> > +#endif
> 
> Rendered moot by my comment about eliminating the ops structure, but the
> above shouldn't appear here. acpi_property_ops shouldn't ever be visible
> outside ACPI core code, so it shouldn't be in this header.

It doesn't look like this has to be present here.  At least this particular
patch should compile just fine after removing the 3 lines above.

That seems to be a leftover from one of the previous versions of it.

Rafael


[PATCH] flush_icache_range: Export symbol to fix build errors

2014-08-17 Thread Pranith Kumar

Fix build errors caused by flush_icache_range() not being exported on
some architectures.

Signed-off-by: Pranith Kumar 
Reported-by: Geert Uytterhoeven 
CC: Andrew Morton 
---
 arch/arc/mm/cache_arc700.c |1 +
 arch/blackfin/include/asm/cacheflush.h |1 +
 arch/frv/include/asm/cacheflush.h  |1 +
 arch/hexagon/mm/cache.c|1 +
 arch/metag/include/asm/cacheflush.h|1 +
 arch/sh/mm/cache.c |1 +
 arch/tile/kernel/smp.c |1 +
 arch/xtensa/kernel/smp.c   |1 +
 8 files changed, 8 insertions(+)

diff --git a/arch/arc/mm/cache_arc700.c b/arch/arc/mm/cache_arc700.c
index 4670afc..e88ddbf 100644
--- a/arch/arc/mm/cache_arc700.c
+++ b/arch/arc/mm/cache_arc700.c
@@ -581,6 +581,7 @@ void flush_icache_range(unsigned long kstart, unsigned long 
kend)
tot_sz -= sz;
}
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 /*
  * General purpose helper to make I and D cache lines consistent.
diff --git a/arch/blackfin/include/asm/cacheflush.h 
b/arch/blackfin/include/asm/cacheflush.h
index 9a5b2c5..0e2eb8c 100644
--- a/arch/blackfin/include/asm/cacheflush.h
+++ b/arch/blackfin/include/asm/cacheflush.h
@@ -70,6 +70,7 @@ static inline void flush_icache_range(unsigned start, 
unsigned end)
}
 #endif
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 do { memcpy(dst, src, len);\
diff --git a/arch/frv/include/asm/cacheflush.h 
b/arch/frv/include/asm/cacheflush.h
index edbac54..07ee4b3 100644
--- a/arch/frv/include/asm/cacheflush.h
+++ b/arch/frv/include/asm/cacheflush.h
@@ -72,6 +72,7 @@ static inline void flush_icache_range(unsigned long start, 
unsigned long end)
 {
frv_cache_wback_inv(start, end);
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 #ifdef CONFIG_MMU
 extern void flush_icache_user_range(struct vm_area_struct *vma, struct page 
*page,
diff --git a/arch/hexagon/mm/cache.c b/arch/hexagon/mm/cache.c
index fe14ccf..0c76c80 100644
--- a/arch/hexagon/mm/cache.c
+++ b/arch/hexagon/mm/cache.c
@@ -68,6 +68,7 @@ void flush_icache_range(unsigned long start, unsigned long 
end)
);
local_irq_restore(flags);
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 void hexagon_clean_dcache_range(unsigned long start, unsigned long end)
 {
diff --git a/arch/metag/include/asm/cacheflush.h 
b/arch/metag/include/asm/cacheflush.h
index 7787ec5..117c212 100644
--- a/arch/metag/include/asm/cacheflush.h
+++ b/arch/metag/include/asm/cacheflush.h
@@ -124,6 +124,7 @@ static inline void flush_icache_range(unsigned long address,
metag_code_cache_flush((void *) address, endaddr - address);
 #endif
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 static inline void flush_cache_sigtramp(unsigned long addr, int size)
 {
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 097c2cd..f770e39 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -229,6 +229,7 @@ void flush_icache_range(unsigned long start, unsigned long 
end)
 
cacheop_on_each_cpu(local_flush_icache_range, (void *)&data, 1);
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 void flush_icache_page(struct vm_area_struct *vma, struct page *page)
 {
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 01e8ab2..19eaa62 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -183,6 +183,7 @@ void flush_icache_range(unsigned long start, unsigned long 
end)
preempt_enable();
}
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 
 /* Called when smp_send_reschedule() triggers IRQ_RESCHEDULE. */
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 40b5a37..4d02e38 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -571,6 +571,7 @@ void flush_icache_range(unsigned long start, unsigned long 
end)
};
on_each_cpu(ipi_flush_icache_range, &fd, 1);
 }
+EXPORT_SYMBOL(flush_icache_range);
 
 /* - */
 
-- 
1.7.9.5


RE: [PATCH] Input: hyperv-keyboard - implement Type Clipboard Text

2014-08-17 Thread Dexuan Cui

> -----Original Message-----
> From: Dmitry Torokhov
> Sent: Saturday, August 16, 2014 0:58 AM
> To: Dexuan Cui
> > For each char in the string, the host sends 2 events (key down/up with the
> > char's UNICODE value) to the guest.
> > The patch finds each char's scan codes of key down/up, and injects the
> > scan codes to the serio keyboard module.
> >
> > Known issues:
> > 1) Only printable ASCII chars are supported, and unsupported chars are
> > ignored. It seems unlikely to support generic UNICODE chars because there
> > is not a generic API to inject a UNICODE char to text mode console, KDE,
> > gnome, etc.
> >
> > 2) When we use the feature, make sure the CapsLock state of the VM's
> > (virtual) keyboard is OFF because this patch assumes it -- we'll try to
> > fix this later, probably by tracking the state of virtual CapsLock, because
> > it looks the keyboard module doesn't supply an API for us to query the
> > state of the keyboard.
> >
> No way. If you want to do this this way, do it in hypervisor code and keep
> feeding AT scan codes to hyperv-keyboard, although I am pretty sure users
Hi Dmitry,
Yeah, I had the same wish, but later I found this seems unlikely because IMO
the feature was originally designed for Windows VMs + generic UNICODE chars,
and we know there is no "scan code" for generic UNICODE chars... :-(

> of French, Czech and other keyboard layouts with numbers in upper register
> and symbols in lower will have a few choice words for you.
Sorry, I can't understand what these are.
Can you please give more details or a link to further info?

> If you want real cut-and-paste support in various DEs I'd recommend
> working with VMware on open-vm-tools package to see what can be
> shared/reused there.
> Consider this NACked with prejudice.
> Dmitry
Thanks for the suggestion!
Let me study open-vm-tools and report back.

Thanks,
-- Dexuan

Re: linux-next: Tree for Aug 18

2014-08-17 Thread Stephen Rothwell

Hi all,

On Mon, 18 Aug 2014 13:32:59 +1000 Stephen Rothwell  
wrote:
>
> Please do not add code intended for v3.18 until after v3.17-rc1 is
> released.

Which it is, of course ...

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au



[PATCH] staging: comedi: s626: remove unnecessary variable initialization

2014-08-17 Thread Chase Southwood

We initialize 'irqbit' to 0, only to properly set it immediately
afterwards.  Just remove the zero-initialization.

Signed-off-by: Chase Southwood 
Cc: Ian Abbott 
Cc: H Hartley Sweeten 
---
 drivers/staging/comedi/drivers/s626.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/s626.c 
b/drivers/staging/comedi/drivers/s626.c
index 080608a..e42720c 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -1399,7 +1399,6 @@ static void s626_check_dio_interrupts(struct 
comedi_device *dev)
uint8_t group;
 
for (group = 0; group < S626_DIO_BANKS; group++) {
-   irqbit = 0;
/* read interrupt type */
irqbit = s626_debi_read(dev, S626_LP_RDCAPFLG(group));
 
-- 
2.0.4


[PATCH] staging: comedi: dt2801: change function return type to void

2014-08-17 Thread Chase Southwood

cppcheck was complaining that the variable 'stat' is being reassigned
before the old value is used.  Upon inspection, I found that
dt2801_writecmd() cannot fail, always returns 0, and most callers already
do not bother with assigning its return value anyway, so it makes sense to
just change the return type for this function from int to void, and remove
the two assignments to 'stat'.

Signed-off-by: Chase Southwood 
Cc: Ian Abbott 
Cc: H Hartley Sweeten 
---
 drivers/staging/comedi/drivers/dt2801.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dt2801.c 
b/drivers/staging/comedi/drivers/dt2801.c
index ad8ba0b..c16d468 100644
--- a/drivers/staging/comedi/drivers/dt2801.c
+++ b/drivers/staging/comedi/drivers/dt2801.c
@@ -309,7 +309,7 @@ static int dt2801_wait_for_ready(struct comedi_device *dev)
return -ETIME;
 }
 
-static int dt2801_writecmd(struct comedi_device *dev, int command)
+static void dt2801_writecmd(struct comedi_device *dev, int command)
 {
int stat;
 
@@ -323,8 +323,6 @@ static int dt2801_writecmd(struct comedi_device *dev, int 
command)
if (!(stat & DT_S_READY))
dev_dbg(dev->class_dev, "!ready in %s, ignoring\n", __func__);
outb_p(command, dev->iobase + DT2801_CMD);
-
-   return 0;
 }
 
 static int dt2801_reset(struct comedi_device *dev)
@@ -380,7 +378,7 @@ static int probe_number_of_ai_chans(struct comedi_device 
*dev)
int data;
 
for (n_chans = 0; n_chans < 16; n_chans++) {
-   stat = dt2801_writecmd(dev, DT_C_READ_ADIM);
+   dt2801_writecmd(dev, DT_C_READ_ADIM);
dt2801_writedata(dev, 0);
dt2801_writedata(dev, n_chans);
stat = dt2801_readdata2(dev, &data);
@@ -451,7 +449,7 @@ static int dt2801_ai_insn_read(struct comedi_device *dev,
int i;
 
for (i = 0; i < insn->n; i++) {
-   stat = dt2801_writecmd(dev, DT_C_READ_ADIM);
+   dt2801_writecmd(dev, DT_C_READ_ADIM);
dt2801_writedata(dev, CR_RANGE(insn->chanspec));
dt2801_writedata(dev, CR_CHAN(insn->chanspec));
stat = dt2801_readdata2(dev, &d);
-- 
2.0.4


linux-next: Tree for Aug 18

2014-08-17 Thread Stephen Rothwell

Hi all,

Please do not add code intended for v3.18 until after v3.17-rc1 is
released.

Changes since 20140815:

The sound-asoc tree gained a build failure for which I reverted a commit.

The regulator tree gained a build failure so I used the version from
next-20140815.

The staging tree gained a build failure for which I applied a fix patch.

Non-merge commits (relative to Linus' tree): 1010
 955 files changed, 25072 insertions(+), 21100 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 220 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (7d1311b93e58 Linux 3.17-rc1)
Merging fixes/master (23cf8d3ca0fd powerpc: Fix "attempt to move .org 
backwards" error)
Merging kbuild-current/rc-fixes (dd5a6752ae7d firmware: Create directories for 
external firmware)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (e57e41931134 ARM: wire up memfd_create syscall)
Merging m68k-current/for-linus (9117710a5997 m68k/sun3: Remove define statement 
no longer needed)
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging mips-fixes/mips-fixes (08a9c3c9afcf MIPS: OCTEON: make 
get_system_type() thread-safe)
Merging powerpc-merge/merge (396a34340cdf powerpc: Fix endianness of 
flash_block_list in rtas_flash)
Merging sparc/master (c9d26423e56c Merge tag 'pm+acpi-3.17-rc1-2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging net/master (21009686662f net: phy: smsc: move smsc_phy_config_init 
reset part in a soft_reset function)
Merging ipsec/master (a0e5ef53aac8 xfrm: Fix installation of AH IPsec SAs)
Merging sound-current/for-linus (f3ee07d8b6e0 ALSA: hda/realtek - Avoid setting 
wrong COEF on ALC269 & co)
Merging pci-current/for-linus (9baa3c34ac4e PCI: Remove DEFINE_PCI_DEVICE_TABLE 
macro use)
Merging wireless/master (77b2f2865956 iwlwifi: mvm: disable scheduled scan to 
prevent firmware crash)
Merging driver-core.current/driver-core-linus (7d1311b93e58 Linux 3.17-rc1)
Merging tty.current/tty-linus (7d1311b93e58 Linux 3.17-rc1)
Merging usb.current/usb-linus (7d1311b93e58 Linux 3.17-rc1)
Merging usb-gadget-fixes/fixes (a8a85b01d185 usb: musb/cppi41: call 
musb_ep_select() before accessing an endpoint's CSR)
CONFLICT (content): Merge conflict in drivers/usb/musb/musb_host.c
Merging usb-serial-fixes/usb-linus (7d1311b93e58 Linux 3.17-rc1)
Merging staging.current/staging-linus (eb29835fb3ae staging: android: fix a 
possible memory leak)
Merging char-misc.current/char-misc-linus (7d1311b93e58 Linux 3.17-rc1)
Merging input-current/for-linus (91167e191467 Merge branch 'next' into 
for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" 
stripe)
Merging crypto-current/master (ce5481d01f67 crypto: drbg - fix failure of 
generating multiple of 2**16 bytes)
Merging ide/master (a53dae49b2fe ide: use module_platform_driver())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (5a12a597a862 arm: Add devicetree 
fixup machine function)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting 
module removal.)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 
'for-joerg/arm-smmu/fixes' of 
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-intel-fixes/for-linux-next-fixes (103ae732ad26 drm/i915: Don't try 
to enable cursor from setplane when crtc is

Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute

2014-08-17 Thread Zhang Zhen

On 2014/8/16 5:37, Toshi Kani wrote:
> On Wed, 2014-08-13 at 12:10 +0800, Zhang Zhen wrote:
>> Currently memory-hotplug has two limits:
>> 1. If the memory block is in ZONE_NORMAL, you can change it to
>> ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
>> 2. If the memory block is in ZONE_MOVABLE, you can change it to
>> ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
>>
>> With this patch, we can easily tell which zone a memory block can be
>> onlined to, without needing to know the above two limits.
>>
>> Updated the related Documentation.
>>
>> Change v1 -> v2:
>> - optimize the implementation following Dave Hansen's suggestion
>>
>> Signed-off-by: Zhang Zhen 
>> ---
>>  Documentation/ABI/testing/sysfs-devices-memory |  8 
>>  Documentation/memory-hotplug.txt   |  4 +-
>>  drivers/base/memory.c  | 62 
>> ++
>>  include/linux/memory_hotplug.h |  1 +
>>  mm/memory_hotplug.c|  2 +-
>>  5 files changed, 75 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-devices-memory 
>> b/Documentation/ABI/testing/sysfs-devices-memory
>> index 7405de2..2b2a1d7 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-memory
>> +++ b/Documentation/ABI/testing/sysfs-devices-memory
>> @@ -61,6 +61,14 @@ Users:hotplug memory remove tools
>>  
>> http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils
>>
>>
>> +What:   /sys/devices/system/memory/memoryX/zones_online_to
> 
> I think this name is a bit confusing.  How about "valid_online_types"?
> 
Thanks for your suggestion.

This patch has already been added to the -mm tree.
If most people agree, I would be happy to change the interface name.
If not, let's leave it as it is.

Best regards!
> Thanks,
> -Toshi

Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area

2014-08-17 Thread Theodore Ts'o

On Mon, Aug 18, 2014 at 10:15:32AM +0900, Gioh Kim wrote:
> 
> My test platform has 1GB of memory in total: 256MB for CMA and 768MB for normal.
> I applied Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64, so that
> 3/4 of allocations take place in the normal area and 1/4 take place in the
> CMA area.
> 
> My platform has 4 ext4 partitions. Each ext4 partition has 2 page caches
> for its superblock, which this patch tries to move out of the CMA area.
> Therefore there are 8 page caches (8 pages in size) that can prevent page
> migration.

Yes, but are you actually *using* the ext4 partitions for anything?
If this is a realistic real world use case, file systems are used to
store, well, files, and that means there will be inodes and dentry
cache entries that will also be allocated.  Does your test scenario
reflect real world usage?

Cheers,

- Ted

Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

2014-08-17 Thread Xishi Qiu

On 2014/8/18 9:13, tangchen wrote:

> Hi tj,
> 
> On 08/17/2014 07:08 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
>> Yeah, that one.
>>
>>> If we don't clear the hotpluggable flag in free_low_memory_core_early(), the
>>> memory marked with the hotpluggable flag will not be freed to the buddy
>>> allocator, because __next_mem_range() will skip it.
>>>
>>> free_low_memory_core_early
>>>   for_each_free_mem_range
>>>     for_each_mem_range
>>>       __next_mem_range
>> Ah, okay, so the patch fixes __next_mem_range() and thus makes
>> free_low_memory_core_early() to skip hotpluggable regions unlike
>> before.  Please explain things like that in the changelog.  Also,
>> what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
>> still need them?  If so, what are the different roles that these two
>> separate places serve?
> 
> numa_clear_kernel_node_hotplug() only clears hotplug flags for the nodes
> the kernel resides in, not for hotpluggable nodes. The reason why we did
> this is to enable the kernel to allocate memory in case all the nodes are
> hotpluggable.
> 

Hi TangChen,

I found a problem in numa_init() (arch/x86/mm/numa.c):

numa_init()
	...
	ret = init_func();  // this marks the hotpluggable flag from SRAT
	...
	memblock_set_bottom_up(false);
	...
	ret = numa_register_memblks(&numa_meminfo);  // this allocates node data (pglist_data)
	...
	numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
	...

If all the nodes are marked with the hotpluggable flag, allocating node data
will fail, because __next_mem_range_rev() will skip the hotpluggable memory
regions:

numa_register_memblks()
	setup_node_data()
		memblock_find_in_range_node()
			__memblock_find_range_top_down()
				for_each_mem_range_rev()
					__next_mem_range_rev()
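
The failure mode described above can be modelled in user space. The following is a minimal sketch, not kernel code: struct region, find_top_down() and clear_hotplug() are hypothetical stand-ins for memblock regions, the top-down search and the flag clearing. Only the rule "skip regions that carry the hotpluggable flag" is taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memblock region; not the kernel struct. */
struct region {
	unsigned long start;
	unsigned long size;
	bool hotpluggable;
};

/*
 * Walk the regions from the highest index down and return the index of
 * the first one that can hold 'size' bytes, skipping hotpluggable
 * regions.  Returns -1 when nothing fits.
 */
static int find_top_down(const struct region *r, size_t n, unsigned long size)
{
	assert(r != NULL);
	for (size_t i = n; i-- > 0; ) {
		if (r[i].hotpluggable)
			continue;	/* same rule as the reverse iterator */
		if (r[i].size >= size)
			return (int)i;
	}
	return -1;
}

/* Clear the flag on one region, as would be done for a kernel node. */
static void clear_hotplug(struct region *r)
{
	assert(r != NULL);
	r->hotpluggable = false;
}
```

With every region flagged, find_top_down() finds nothing. Once clear_hotplug() has run on one region, the same search succeeds, which is why the ordering of the clearing step relative to the allocation matters.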

What do you think?
How about moving numa_clear_kernel_node_hotplug() into numa_register_memblks(),
like this:

numa_register_memblks()

	...
		memblock_set_node(mb->start, mb->end - mb->start,
				  &memblock.reserved, mb->nid);
	}

+	numa_clear_kernel_node_hotplug();

	/*
	 * If sections array is gonna be used for pfn -> nid mapping, check
	...

Thanks,
Xishi Qiu

> And the reason we clear hotplug flags for all the nodes in
> free_low_memory_core_early() is that, if we do not, no hotpluggable memory
> can be freed to buddy after Qiu's patch.
> 
> Thanks.
> 

Re: [PATCH] earlyprintk: re-enable earlyprintk calling early_param

2014-08-17 Thread Sahara



On 2014-08-16 03:34, Rusty Russell wrote:
> kpark3...@gmail.com writes:
>> From: Sahara 
>>
>> Although there are many obs_kernel_param and its names are
>> earlyprintk and also EARLY_PRINTK is also enabled, we could not
>> see the early_printk output properly until now. This patch
>> considers earlycon as well as earlyprintk.
>
> Hmm, the initial "earlycon" hack slipped in when I wasn't looking.
> I don't think we should extend it.
>
> Why not make the thing(s) you want early_param()s?
>
> Cheers,
> Rusty.

The earlycon and the earlyprintk are scattered and used in many
architectures.

It looks like earlycon could just be a subset of earlyprintk.
The earlycon is UART-specific, while the earlyprintk supports
vga, efi, xen, serial, and so on. ARM in particular uses earlyprintk in
many places. And I am not sure whether this is a good chance to replace all
the earlyprintk with the earlycon. As of now, it seems fair to handle both
earlycon and earlyprintk.
Or perhaps removing case #2 (see my previous email to Andrew Morton)
is better, so that users are forced to specify earlycon and earlyprintk on
the cmdline if they want to see early_printk() output?


Thanks.

Best Regards,
Sahara.




--- a/init/main.c
+++ b/init/main.c
@@ -426,7 +426,8 @@ static int __init do_early_param(char *param, char *val, 
const char *unused)
for (p = __setup_start; p < __setup_end; p++) {
if ((p->early && parameq(param, p->str)) ||
(strcmp(param, "console") == 0 &&
-strcmp(p->str, "earlycon") == 0)
+((strcmp(p->str, "earlycon") == 0) ||
+(strcmp(p->str, "earlyprintk") == 0)))
) {
if (p->setup_func(val) != 0)
pr_warn("Malformed early option '%s'\n", param);
--
1.7.9.5



Re: [PATCH net-next] vhost_net: stop rx net polling when possible

2014-08-17 Thread Jason Wang

On 08/17/2014 06:20 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 15, 2014 at 11:40:08AM +0800, Jason Wang wrote:
>> After rx vq was enabled, we never stop polling its socket. This is suboptimal
>> since it may lead to unnecessary wake-ups after the rx net work has already
>> been queued. This could be optimized by stopping polling the rx net sock when
>> processing both rx and tx and restarting it afterward. This could save
>> unnecessary wake-ups and even unnecessary spin lock acquisitions with the
>> help of commit 9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 "net-tun: restructure
>> tun_do_read for better sleep/wakeup efficiency".
> OK so the point is to avoid expensive wake_up_process calls?
> It's a bit unfortunate that we are adding/removing things from wait
> queue which certainly does take extra spin-locks.

When nothing new was queued while the vhost thread is running, this change
may add two more spin-locks, which may not be optimal. But if several
packets were queued by tun while the vhost thread is running, it may save
lots of unnecessary wake-ups. So the patch helps performance in the
heavy load case for sure. In the light load case, it may hurt
throughput somewhat, but CPU is still saved and thru/cpu still improves.
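
For readers checking the quoted tables: the "normalized thru" column appears to be throughput divided by CPU usage, so its relative change follows from the other two columns. This is an inference from the numbers (the first rows of the rx and tx tables are consistent with it), not something the patch description states. A minimal sketch, with normalized_change_pct() as a hypothetical helper name:

```c
#include <assert.h>

/*
 * Combine a relative throughput change and a relative CPU change (both
 * in percent) into the relative change of throughput per CPU.
 * Assumes cpu_pct > -100 so the divisor stays positive.
 */
static double normalized_change_pct(double thru_pct, double cpu_pct)
{
	assert(cpu_pct > -100.0);
	return ((1.0 + thru_pct / 100.0) / (1.0 + cpu_pct / 100.0) - 1.0) * 100.0;
}
```

For the 64-byte, single-session rx row, normalized_change_pct(0.7773, -8.6224) comes out at roughly +10.29, in line with the +10.2866% reported in the table.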

>
>
>
>> Test shows significant CPU% savings during almost all the cases:
>>
>> Guest rx stream:
>> size(B)/sessions/throughput/cpu/normalized thru/
>> 64/1/      +0.7773%   -8.6224%  +10.2866%
>> 64/2/      +0.6335%  -13.9109%  +16.8946%
>> 64/4/      -0.8182%  -14.8336%  +16.4565%
>> 64/8/      +0.4830%  -13.7675%  +16.5256%
>> 256/1/     -7.0963%  -12.6880%   +6.4043%
>> 256/2/     -1.3982%  -11.5424%  +11.4678%
>> 256/4/     -0.0350%  -11.8323%  +13.3806%
>> 256/8/     -1.5830%  -12.7693%  +12.8238%
>> 1024/1/    -7.4895%  -19.1449%  +14.4152%
>> 1024/2/    -7.4575%  -19.4018%  +14.8195%
>> 1024/4/    -0.3881%   -9.1183%   +9.6061%
>> 1024/8/    +0.4713%  -11.0155%  +12.9087%
>> 4096/1/    +0.8786%   -8.4050%  +10.1355%
>> 4096/2/    +0.0098%  -15.3094%  +18.0885%
>> 4096/4/    +0.0445%  -10.8247%  +12.1886%
>> 4096/8/    -2.1317%  -12.5111%  +11.8637%
>> 16384/1/   -0.0008%   -6.1891%   +6.5966%
>> 16384/2/   -0.0117%  -16.2716%  +19.4198%
>> 16384/4/   +0.0001%   -5.9197%   +6.2923%
>> 16384/8/   +0.0173%   -7.6681%   +8.3236%
>> 65535/1/   +0.0011%  -10.3594%  +11.5578%
>> 65535/2/   -0.4108%  -14.4304%  +16.3838%
>> 65535/4/   +0.0011%  -10.3594%  +11.5578%
>> 65535/8/   -0.4108%  -14.4304%  +16.3838%
>>
>> Guest tx stream:
>> size(B)/sessions/throughput/cpu/normalized thru/
>> 64/1/      -0.6228%   -2.1936%   +1.6060%
>> 64/2/      +0.8646%   -3.5063%   +4.5297%
>> 64/4/      +0.8733%   -3.2495%   +4.2613%
>> 64/8/      +1.4290%   -3.5593%   +5.1724%
>> 256/1/     +7.2098%   -3.1122%  +10.6535%
>> 256/2/    -10.1408%   -6.8230%   -3.5607%
>> 256/4/    -11.3531%   -6.7085%   -4.9785%
>> 256/8/    -10.2723%   -6.5628%   -3.9701%
>> 1024/1/   -18.9329%  -13.6162%   -6.1547%
>> 1024/2/    -0.3728%   -1.3181%   +0.9580%
>> 1024/4/    +0.0125%   -3.6338%   +3.7838%
>> 1024/8/    -0.0030%   -2.7282%   +2.8017%
>> 4096/1/   +16.9367%   -1.9435%  +19.2543%
>> 4096/2/    +0.0121%   -6.1682%   +6.5866%
>> 4096/4/    +0.0019%   -3.8510%   +4.0072%
>> 4096/8/    -0.0222%   -4.1368%   +4.2922%
>> 16384/1/   -0.0026%   -8.6892%   +9.5132%
>> 16384/2/   -0.0012%  -10.1676%  +11.3171%
>> 16384/4/   +0.0196%   -1.2551%   +1.2908%
>> 16384/8/   +0.1303%   -3.2634%   +3.5082%
>> 65535/1/   +0.0019%   -3.4694%   +3.5961%
>> 65535/2/   -0.0003%   -0.7635%   +0.7690%
>> 65535/4/   -0.0219%   -2.7875%   +2.8448%
>> 65535/8/   +0.1137%   -2.7922%   +2.9894%
>>
>> TCP_RR:
>> size(B)/sessions/throughput/cpu/normalized thru/
>> 256/1/     +1.9004%   -4.7985%   +7.0366%
>> 256/25/    -4.7366%  -11.0809%   +7.1349%
>> 256/50/    +3.9808%   -5.2037%   +9.6887%
>> 4096/1/    +2.1619%   -0.7303%   +2.9134%
>> 4096/25/  -13.1836%  -14.7298%   +1.8134%
>> 4096/50/  -11.1990%  -15.4763%   +5.0605%
>>
>> Signed-off-by: Jason Wang 
>
> Could you split RX/TX parts out please, and benchmark separately?
>
> They are really independent.

Ok.


Re: power supply gating with ltc2978

2014-08-17 Thread Guenter Roeck

On Sat, Aug 16, 2014 at 02:20:50PM +0100, Mark Brown wrote:
> On Fri, Aug 15, 2014 at 04:34:49PM -0500, atull wrote:
> 
> > I am interested in adding functionality to be able to gate power supplies 
> > going through a ltc2978.  I see that there is a hwmon driver already 
> > existing (hwmon/pmbus/ltc2978.c).  I see some of the other hwmon drivers 
> > have MFD's.  It looks like this ltc driver would need a MFD and a 
> > regulator driver added.  However I don't see other pmbus hwmon drivers
> > using MFD.
> 
> > So I am asking for recommendations and reservations on how to proceed here 
> > before I get too far with this.
> 
> Without knowing anything at all about pmbus or this particular hardware
> it's hard to comment but what you're saying here sounds sensible (though
> I do see that apparently splitting the drivers may not actually be
> sensible from Guenter's followup).

I had originally thought about converting the pmbus drivers to mfd with client
drivers, but I concluded that it would add a lot of complexity with little gain.
It makes sense to separate a driver into mfd and a number of client drivers
if a device has clear functional blocks for the different devices it supports.
With PMBus, this is not the case. Separating a PMBus driver would be a purely
artificial construct, and there would be overlapping functionality. Separating
just a single driver out of the group of PMBus drivers, as seems to be suggested
above, makes even less sense as one simply can not separate the core PMBus
driver code from its front-end drivers.

On the other side, adding regulator support into the PMBus driver code would
make a lot of sense. It should also be quite straightforward.

Or anyway that is my opinion. If someone wants to spend the time and separate
the PMBus drivers into an MFD part and hwmon and regulator client drivers, I'll
be happy to look at the resulting patch set.

Guenter

Re: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread

2014-08-17 Thread Fu, Zhonghui

From 21266249bbbaf9407c1e88cd5950e06ac88aeebf Mon Sep 17 00:00:00 2001
From: Fu Zhonghui 
Date: Mon, 18 Aug 2014 10:48:14 +0800
Subject: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread

781e989cf59 ("mmc: sdhci: convert to new SDIO IRQ handling") and
bf3b5ec66bd ("mmc: sdio_irq: rework sdio irq handling") disabled
the use of our own custom threaded IRQ handler, but left in an
unconditional wake_up_process() on that handler at resume-time.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=80151

In addition, a check for the MMC_CAP_SDIO_IRQ capability is added
before enabling the SDIO IRQ.

Signed-off-by: Jaehoon Chung 
Signed-off-by: Chris Ball 
Signed-off-by: Ulf Hansson 
Signed-off-by: Fu Zhonghui 
---
 drivers/mmc/core/sdio.c |   12 ++--
 drivers/mmc/core/sdio_irq.c |4 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index e636d9e..3fc40a7 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -992,8 +992,16 @@ static int mmc_sdio_resume(struct mmc_host *host)
 		}
 	}
 
-	if (!err && host->sdio_irqs)
-		wake_up_process(host->sdio_irq_thread);
+	if (!err && host->sdio_irqs) {
+		if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) {
+			wake_up_process(host->sdio_irq_thread);
+		} else if (host->caps & MMC_CAP_SDIO_IRQ) {
+			mmc_host_clk_hold(host);
+			host->ops->enable_sdio_irq(host, 1);
+			mmc_host_clk_release(host);
+		}
+	}
+
 	mmc_release_host(host);
 
 	host->pm_flags &= ~MMC_PM_KEEP_POWER;
diff --git a/drivers/mmc/core/sdio_irq.c b/drivers/mmc/core/sdio_irq.c
index 5cc13c8..696eca4 100644
--- a/drivers/mmc/core/sdio_irq.c
+++ b/drivers/mmc/core/sdio_irq.c
@@ -208,7 +208,7 @@ static int sdio_card_irq_get(struct mmc_card *card)
 				host->sdio_irqs--;
 				return err;
 			}
-		} else {
+		} else if (host->caps & MMC_CAP_SDIO_IRQ) {
 			mmc_host_clk_hold(host);
 			host->ops->enable_sdio_irq(host, 1);
 			mmc_host_clk_release(host);
@@ -229,7 +229,7 @@ static int sdio_card_irq_put(struct mmc_card *card)
 		if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) {
 			atomic_set(&host->sdio_irq_thread_abort, 1);
 			kthread_stop(host->sdio_irq_thread);
-		} else {
+		} else if (host->caps & MMC_CAP_SDIO_IRQ) {
 			mmc_host_clk_hold(host);
 			host->ops->enable_sdio_irq(host, 0);
 			mmc_host_clk_release(host);
-- 1.7.1
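
The resume-time decision made by the patch above can be summarised as a small pure function. This is a user-space sketch under stated assumptions: struct host_model, the CAP_* values and sdio_resume_action() are hypothetical names for illustration, and the flag values are not the kernel's.

```c
#include <assert.h>

/* Illustrative flag values; the real ones live in the kernel's mmc headers. */
#define CAP_SDIO_IRQ            (1u << 0)
#define CAP2_SDIO_IRQ_NOTHREAD  (1u << 1)

enum resume_action {
	RESUME_NOTHING,      /* no SDIO IRQ users, resume failed, or no IRQ cap */
	RESUME_WAKE_THREAD,  /* wake the SDIO IRQ kthread */
	RESUME_ENABLE_IRQ,   /* re-enable the host's SDIO IRQ directly */
};

/* Hypothetical reduction of the host fields the patch looks at. */
struct host_model {
	unsigned int caps;
	unsigned int caps2;
	int sdio_irqs;
};

static enum resume_action sdio_resume_action(const struct host_model *h, int err)
{
	assert(h);
	if (err || !h->sdio_irqs)
		return RESUME_NOTHING;
	if (!(h->caps2 & CAP2_SDIO_IRQ_NOTHREAD))
		return RESUME_WAKE_THREAD;	/* thread exists only in this mode */
	if (h->caps & CAP_SDIO_IRQ)
		return RESUME_ENABLE_IRQ;
	return RESUME_NOTHING;
}
```

The point of the sketch is the second branch: when the NOTHREAD capability is set there is no kthread to wake, so the wake-up must not be unconditional.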

On 2014/8/12 18:23, Ulf Hansson wrote:
> On 11 August 2014 07:49, Fu, Zhonghui  wrote:
>> From 6cee984e1d76ba0a3320430f8cf4318ab65fcf06 Mon Sep 17 00:00:00 2001
>> From: Fu Zhonghui 
>> Date: Tue, 5 Aug 2014 12:44:38 +0800
>> Subject: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on 
>> sdio thread
>>
>> 781e989cf59 ("mmc: sdhci: convert to new SDIO IRQ handling") and
>> bf3b5ec66bd ("mmc: sdio_irq: rework sdio irq handling") disabled
>> the use of our own custom threaded IRQ handler, but left in an
>> unconditional wake_up_process() on that handler at resume-time.
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=80151
>>
>> In addition, the check for MMC_CAP_SDIO_IRQ capability is added
>> before enable sdio IRQ.
>>
>> Signed-off-by: Jaehoon Chung 
>> Signed-off-by: Chris Ball 
>> Signed-off-by: Fu Zhonghui 
>> ---
>>  drivers/mmc/core/sdio.c |   14 --
>>  drivers/mmc/core/sdio_irq.c |4 ++--
>>  2 files changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
>> index e636d9e..e04a540 100644
>> --- a/drivers/mmc/core/sdio.c
>> +++ b/drivers/mmc/core/sdio.c
>> @@ -992,8 +992,18 @@ static int mmc_sdio_resume(struct mmc_host *host)
>> }
>> }
>>
>> -   if (!err && host->sdio_irqs)
>> -   wake_up_process(host->sdio_irq_thread);
>> +   if (!err && host->sdio_irqs) {
>> +   if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) {
>> +   wake_up_process(host->sdio_irq_thread);
>> +   } else if (host->caps & MMC_CAP_SDIO_IRQ) {
>> +   mmc_release_host(host);
> Why mmc_release_host() and the corresponding mmc_claim_host() below?
> Those shouldn't be needed I think.

You are right. These two functions shouldn't be invoked here. I made a new 
patch as above.


Thanks,
Zhonghui
>
>
>> +   mmc_host_clk_hold(host);
>> +   host->ops->enable_sdio_irq(host, 1);
>> +   mmc_host_clk_release(host);
>> +   mmc_claim_host(host);
>> +   }
>>

Re: [PATCH] earlyprintk: re-enable earlyprintk calling early_param

2014-08-17 Thread Sahara



On 2014-08-15 05:34, Andrew Morton wrote:
> On Thu, 14 Aug 2014 19:13:36 +0900 kpark3...@gmail.com wrote:
>
>> From: Sahara 
>>
>> Although there are many obs_kernel_param and its names are
>> earlyprintk and also EARLY_PRINTK is also enabled, we could not
>> see the early_printk output properly until now. This patch
>> considers earlycon as well as earlyprintk.
>
> Sorry, I just don't understand this description.
>
> What does the patch actually do?  What was the kernel behaviour without
> the patch and what is the kernel behaviour with the patch?


Without this patch:

- earlycon case -
If early_param("earlycon", ...) is defined:
case #1: if the cmdline has "earlycon", it satisfies the condition
"(p->early && parameq(param, p->str))", so you can see early_printk() output.
case #2: if the cmdline has "console", it satisfies the condition
"strcmp(param, "console") == 0 && strcmp(p->str, "earlycon") == 0", so you
can see early_printk() output.


- earlyprintk case -
If early_param("earlyprintk", ...) is defined:
case #1: if the cmdline has "earlyprintk", it satisfies the condition
"(p->early && parameq(param, p->str))", so you can see early_printk() output.
case #2: if the cmdline has "console", it does not satisfy the condition,
because the code only checks for "earlycon".


This patch fixes the case #2 problem of earlyprintk.
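
The four cases can be written down as a small user-space model. This is a sketch, not the code in init/main.c: early_match() is a hypothetical function, every handler is assumed to be registered as an early one (p->early set), and plain strcmp() stands in for parameq(), which additionally treats '-' and '_' as equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * 'param' is the name seen on the command line and 'registered' is the
 * name an early_param() handler was registered under.  'patched' selects
 * the behaviour with the proposed change applied.
 */
static bool early_match(const char *param, const char *registered, bool patched)
{
	assert(param && registered);
	if (strcmp(param, registered) == 0)
		return true;		/* case #1: exact name on the cmdline */
	if (strcmp(param, "console") != 0)
		return false;
	if (strcmp(registered, "earlycon") == 0)
		return true;		/* case #2 for earlycon, existing behaviour */
	/* case #2 for earlyprintk, only with the patch */
	return patched && strcmp(registered, "earlyprintk") == 0;
}
```

Without the patch, early_match("console", "earlyprintk", false) is false, which is the earlyprintk case #2 described above. With the patch, the same call returns true.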

Thanks.

Best Regards,
Sahara.


Re: [PATCH 26/35] [PATCH 26/36] powerpc: Replace __get_cpu_var uses

2014-08-17 Thread Christoph Lameter

korg tester found an issue:


From: Christoph Lameter 
Subject: powerpc: Fix reference to opal_trace_depth.

depth is a pointer, not a scalar, and this_cpu_ptr() expects the address of
the per-cpu variable. Use & to pass the address of opal_trace_depth.

Signed-off-by: Christoph Lameter 

Index: linux/arch/powerpc/platforms/powernv/opal-tracepoints.c
===
--- linux.orig/arch/powerpc/platforms/powernv/opal-tracepoints.c
+++ linux/arch/powerpc/platforms/powernv/opal-tracepoints.c
@@ -48,7 +48,7 @@ void __trace_opal_entry(unsigned long op

local_irq_save(flags);

-   depth = this_cpu_ptr(opal_trace_depth);
+   depth = this_cpu_ptr(&opal_trace_depth);

if (*depth)
goto out;
@@ -69,7 +69,7 @@ void __trace_opal_exit(long opcode, unsi

local_irq_save(flags);

-   depth = this_cpu_ptr(opal_trace_depth);
+   depth = this_cpu_ptr(&opal_trace_depth);

if (*depth)
goto out;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 1/3] of: Add of_match_machine helper

2014-08-17 Thread Grant Likely

On Fri, 8 Aug 2014 02:01:53 +0300, Tuomas Tynkkynen wrote:
> Add of_match_machine function to test the device tree root for an
> of_match array. This can be useful when testing SoC versions at runtime,
> for example.
> 
> Signed-off-by: Tuomas Tynkkynen 
> ---
>  drivers/of/base.c  | 21 +
>  include/linux/of.h |  3 +++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index d8574ad..37798ea 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -977,6 +977,27 @@ struct device_node 
> *of_find_matching_node_and_match(struct device_node *from,
>  EXPORT_SYMBOL(of_find_matching_node_and_match);
>  
>  /**
> + * of_match_machine - Tell if root of device tree has a matching of_match 
> struct
> + *   @matches:   array of of device match structures to search in
> + *
> + *   Returns the result of of_match_node for the root node.
> + */
> +const struct of_device_id *of_match_machine(const struct of_device_id 
> *matches)
> +{
> + const struct of_device_id *match;
> + struct device_node *root;
> +
> + root = of_find_node_by_path("/");
> + if (!root)
> + return NULL;
> +
> + match = of_match_node(matches, root);
> + of_node_put(root);
> + return match;
> +}
> +EXPORT_SYMBOL(of_match_machine);

Too wordy...

return of_match_node(matches, of_allnodes);

:-)

It could be a static inline, but I don't think it's even worth having a
helper. The callers could just open code the above.

g.

[PATCH] regulator: core: Fix build error due to const qualifier for ops

2014-08-17 Thread Axel Lin

Drop const qualifier for ops of struct regulator_desc.
Allow regulator drivers to update ops before registering regulator.

Fix below build error:
  CC [M]  drivers/regulator/mc13892-regulator.o
drivers/regulator/mc13892-regulator.c: In function 'mc13892_regulator_probe':
drivers/regulator/mc13892-regulator.c:586:3: error: assignment of member 'set_mode' in read-only object
drivers/regulator/mc13892-regulator.c:588:3: error: assignment of member 'get_mode' in read-only object
make[2]: *** [drivers/regulator/mc13892-regulator.o] Error 1
make[1]: *** [drivers/regulator] Error 2
make: *** [drivers] Error 2

Reported-by: Stephen Rothwell 
Signed-off-by: Axel Lin 
---
 include/linux/regulator/driver.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index efe058f..3abda75 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -246,7 +246,7 @@ struct regulator_desc {
 	int id;
 	bool continuous_voltage_range;
 	unsigned n_voltages;
-	const struct regulator_ops *ops;
+	struct regulator_ops *ops;
 	int irq;
 	enum regulator_type type;
 	struct module *owner;
-- 
1.9.1
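
The build error quoted above comes from writing to a member through a pointer-to-const. A minimal user-space sketch of the same situation follows; struct demo_ops, struct demo_desc, struct demo_desc_const and demo_probe() are hypothetical names and are not part of the regulator API.

```c
#include <assert.h>

/* Hypothetical miniature of an ops table and its descriptor. */
struct demo_ops {
	int (*get_mode)(void);
};

struct demo_desc_const {
	const struct demo_ops *ops;	/* members are read-only through this */
};

struct demo_desc {
	struct demo_ops *ops;		/* members may be updated */
};

static int demo_get_mode(void)
{
	return 1;
}

/* Install a callback before "registration", as a probe function might. */
static void demo_probe(struct demo_desc *desc)
{
	assert(desc && desc->ops);
	desc->ops->get_mode = demo_get_mode;
	/*
	 * With struct demo_desc_const the same assignment is rejected by
	 * the compiler: "assignment of member 'get_mode' in read-only object".
	 */
}
```

The sketch only shows why the qualifier and the probe-time update cannot coexist. Whether dropping const is the right fix is a separate question that the sketch does not settle.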




Re: [PATCH 1/3] of: Add of_match_machine helper

2014-08-17 Thread Grant Likely

On Fri, 8 Aug 2014 14:01:57 -0500, Rob Herring  wrote:
> On Fri, Aug 8, 2014 at 8:23 AM, Tuomas Tynkkynen wrote:
> >
> >
> > On 08/08/14 12:41, Thierry Reding wrote:
> >>
> >>> +const struct of_device_id *of_match_machine(const struct of_device_id 
> >>> *matches)
> >>> +{
> >>> +const struct of_device_id *match;
> >>> +struct device_node *root;
> >>> +
> >>> +root = of_find_node_by_path("/");
> >>> +if (!root)
> >>> +return NULL;
> >>> +
> >>> +match = of_match_node(matches, root);
> >>> +of_node_put(root);
> >>> +return match;
> >>> +}
> >>> +EXPORT_SYMBOL(of_match_machine);
> >>
> >> I wonder if of_find_node_by_path("/") is somewhat overkill here. Perhaps
> >> simply of_node_get(of_allnodes) would be more appropriate here since the
> >> function is implemented in the core?
> >
> > of_machine_is_compatible() uses of_find_node_by_path("/") as well, 
> > of_allnodes
> > seems to be only used when during iterating. So I'd prefer to have them
> > consistent.
> 
> Agreed.

Disagreed. of_machine_is_compatible should be simplified.

g.

Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Nicolas Pitre

On Sun, 17 Aug 2014, Jason Cooper wrote:

> On Sun, Aug 17, 2014 at 09:35:11PM -0400, Nicolas Pitre wrote:
> > On Sun, 17 Aug 2014, Jason Cooper wrote:
> > 
> > > On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote:
> > > > On Sun, 17 Aug 2014, Jason Cooper wrote:
> > > > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux 
> > > > > wrote:
> > > > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > > > > > Applied to irqchip/urgent with Nico's Ack.
> > > > > > 
> > > > > > Interesting, so I'm discussing this patch, and it gets applied 
> > > > > > anyway...
> > > > > > yes, that's great.
> > > > > 
> > > > > Quoting Nico:
> > > > > 
> > > > > "Of course it would be good to clarify things wrt Russell's remark
> > > > > independently from this patch."
> > > > > 
> > > > > I took 'independently' to mean "This patch is ok, *and* we need to
> > > > > address Russell's concerns in a follow-up patch."
> > > > > 
> > > > > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > > > > recent activity on this thread was also August 13th.  After four 
> > > > > days, I
> > > > > reasoned there were no objections to his comment.
> > > > 
> > > > Well... I mentioned this patch is a nice cleanup independently of the 
> > > > reason why it was created in the first place.
> > > 
> > > Ah, fair enough.
> > > 
> > > > Maybe that shouldn't be sorted as "urgent" in that case, especially
> > > > when the code having problem with the current state of things is
> > > > living out of mainline.
> > > 
> > > hmmm, yes.  I've been grappling with the semantics of '/urgent' vice
> > > '/fixes'.  With mvebu, /fixes is the branch for all changes needing to go
> > > into the current -rcX cycle.  For irqchip, Thomas suggested /urgent for
> > > the equivalent branch.  To me, they serve the same purpose.
> > > Unfortunately, I occasionally hear "Well, it's not _urgent_ ...".  I
> > > suppose I'll put up with it for one more cycle and then change it to
> > > /fixes. :)
> > > 
> > > wrt this patch, I need to drop it anyway.  I was a bit rusty (it's been
> > > a few weeks) and forgot to add the Cc -stable and Fixes: tags.  I do
> > > agree, though, it's certainly not urgent.
> > 
> > Given the raised issue has to do with out-of-tree code, there is no need 
> > to CC stable in that case anyway.
> 
> I could go either way here.  On the one hand, a fix is a fix is a fix.
> On the other, if it can't be triggered in mainline, we shouldn't accept
> it at all.

For mainline, it should be accepted as a cleanup and minor optimization 
since no mainline code is currently affected by the absence of this 
patch.

If there is a real bug being fixed by this patch, and whether the best 
way to fix it is by relying on this patch, is still up for debate.

> Stephen, is the out of tree code that triggered this bound for mainline?

Maybe "mainline", but certainly not "stable".


Nicolas

Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10

2014-08-17 Thread Wanpeng Li

Hi Nadav,
On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>sometime make assumptions on CPUs while they ignore capability MSRs, it is
>better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>actually supported has no functional implications.
>
>Signed-off-by: Nadav Amit 
>---
> arch/x86/include/asm/kvm_host.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>index 4931415..0bab29d 100644
>--- a/arch/x86/include/asm/kvm_host.h
>+++ b/arch/x86/include/asm/kvm_host.h
>@@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, 
>int level)
> #define KVM_REFILL_PAGES 25
> #define KVM_MAX_CPUID_ENTRIES 80
> #define KVM_NR_FIXED_MTRR_REGION 88
>-#define KVM_NR_VAR_MTRR 8
>+#define KVM_NR_VAR_MTRR 10
> 

We observed an obvious regression caused by this commit: a 32-bit
Win7 guest shows a blue screen during boot.

Regards,
Wanpeng Li 

> #define ASYNC_PF_PER_VCPU 64
> 
>-- 
>1.9.1
>

[PATCH V2] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

2014-08-17 Thread Xishi Qiu

Let memblock skip the hotpluggable memory regions in __next_mem_range().
This is used to prevent memblock from allocating hotpluggable memory
for the kernel at an early stage. The code is the same as in
__next_mem_range_rev().

Clear the hotpluggable flag before releasing free pages to the buddy allocator.
If we don't clear the hotpluggable flag in free_low_memory_core_early(), the
memory marked with the hotpluggable flag will not be freed to the buddy
allocator, because __next_mem_range() will skip it.

free_low_memory_core_early
  for_each_free_mem_range
    for_each_mem_range
      __next_mem_range

Signed-off-by: Xishi Qiu 
---
 mm/memblock.c  |4 
 mm/nobootmem.c |2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 6d2f219..5090050 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -817,6 +817,10 @@ void __init_memblock __next_mem_range(u64 *idx, int nid,
 		if (nid != NUMA_NO_NODE && nid != m_nid)
 			continue;
 
+		/* skip hotpluggable memory regions if needed */
+		if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
+			continue;
+
 		if (!type_b) {
 			if (out_start)
 				*out_start = m_start;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 7ed5860..03de286 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -119,6 +119,8 @@ static unsigned long __init free_low_memory_core_early(void)
 	phys_addr_t start, end;
 	u64 i;
 
+	memblock_clear_hotplug(0, ULLONG_MAX);
+
 	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
-- 1.7.1 




Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

2014-08-17 Thread Xishi Qiu

On 2014/8/17 19:08, Tejun Heo wrote:

> Hello,
> 
> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
> 
> Yeah, that one.
> 
>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the 
>> memory which marked hotpluggable flag will not free to buddy allocator.
>> Because __next_mem_range() will skip them.
>>
>> free_low_memory_core_early
>>  for_each_free_mem_range
>>  for_each_mem_range
>>  __next_mem_range
> 
> Ah, okay, so the patch fixes __next_mem_range() and thus makes
> free_low_memory_core_early() to skip hotpluggable regions unlike
> before.  Please explain things like that in the changelog.  Also,

OK, I will send V2.

Thanks,
Xishi Qiu

> what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
> still need them?  If so, what are the different roles that these two
> separate places serve?
> 
> Thanks.
> 




Re: [PATCH V2] I2C: Rework kernel config I2C_ACPI

2014-08-17 Thread Lan Tianyu

On 2014-08-15 19:03, Wolfram Sang wrote:
> On Fri, Aug 15, 2014 at 01:38:59PM +0800, Lan Tianyu wrote:
>> Commit da3c6647(I2C/ACPI: Clean up I2C ACPI code and Add CONFIG_I2C_ACPI
>> config) adds a new kernel config I2C_ACPI and makes the I2C core built-in
>> when the config is selected. This is wrong because distributions
>> etc. generally compile I2C as a module and the commit broke that.
>> This patch renames I2C_ACPI to ACPI_I2C_OPREGION. The new config
>> only controls the ACPI I2C operation region code and depends on I2C=y.
>>
>> Signed-off-by: Lan Tianyu 
> 
> It looks good. What tests did you perform?
> 
> Thanks,
> 
>Wolfram
> 

Hi Wolfram,
The patch passed Fengguang's 0-day autobuild test.

The following configs were tested.

configs tested: 122

parisc      c3000_defconfig
parisc      b180_defconfig
parisc      defconfig
alpha       defconfig
parisc      allnoconfig
mips        allmodconfig
mips        jz4740
mips        allnoconfig
mips        fuloong2e_defconfig
mips        txx9
x86_64      allnoconfig
x86_64      lkp
x86_64      rhel
sh          titan_defconfig
sh          rsk7269_defconfig
sh          sh7785lcr_32bit_defconfig
sh          allnoconfig
x86_64      randconfig-c3-0815
x86_64      randconfig-c1-0815
x86_64      randconfig-c0-0815
x86_64      randconfig-c2-0815
x86_64      allmodconfig
i386        randconfig-jx5
i386        randconfig-jx4
i386        randconfig-jx7
i386        randconfig-jx6
i386        randconfig-jx1
i386        randconfig-jx0
i386        randconfig-jx3
i386        randconfig-jx2
i386        randconfig-jx9
i386        randconfig-jx8
x86_64      randconfig-jx8
x86_64      randconfig-jx9
x86_64      randconfig-jx2
x86_64      randconfig-jx3
x86_64      randconfig-jx0
x86_64      randconfig-jx1
x86_64      randconfig-jx6
x86_64      randconfig-jx7
x86_64      randconfig-jx4
x86_64      randconfig-jx5
powerpc     chroma_defconfig
powerpc     linkstation_defconfig
powerpc     powerpc
powerpc     wii_defconfig
powerpc     gamecube_defconfig
powerpc     corenet64_smp_defconfig
powerpc     mpc512x
powerpc     ppc44x
x86_64      randconfig-j0-0815
x86_64      randconfig-j1-0815
i386        randconfig-ha2-0815
i386        randconfig-ha5-0815
i386        randconfig-ha1-0815
i386        randconfig-ha0-0815
i386        randconfig-ha3-0815
i386        randconfig-ha4-0815
ia64        allmodconfig
ia64        allnoconfig
ia64        defconfig
ia64        alldefconfig
sparc       defconfig
sparc64     allnoconfig
sparc64     defconfig
xtensa      common_defconfig
m32r        m32104ut_defconfig
xtensa      iss_defconfig
m32r        opsput_defconfig
m32r        usrv_defconfig
m32r        mappi3.smp_defconfig
microblaze  mmu_defconfig
microblaze  nommu_defconfig
microblaze  allyesconfig
i386        allyesconfig
cris        etrax-100lx_v2_defconfig
blackfin    TCM-BF537_defconfig
blackfin    BF561-EZKIT-SMP_defconfig
blackfin    BF533-EZKIT_defconfig
blackfin    BF526-EZBRD_defconfig
i386        randconfig-r1-0815
i386        randconfig-r2-0815
i386        randconfig-r3-0815
i386        randconfig-r0-0815
s390        allmodconfig
s390        allnoconfig
s390        defconfig
mn10300     asb2364_defconfig
openrisc    or1ksim_defconfig
um          x86_64_defconfig
um          i386_defconfig
avr32       atngw10

Re: [PATCH] usb: phy: return -ENODEV on failure of try_module_get

2014-08-17 Thread Greg KH

On Mon, Aug 18, 2014 at 12:04:42AM +0530, Arjun Sreedharan wrote:
> When __usb_find_phy_dev() does not return error and
> try_module_get() fails, return -ENODEV
> 
> Signed-off-by: Arjun Sreedharan 
> ---
>  drivers/usb/phy/phy.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
> index 36b6bce..8ad3638 100644
> --- a/drivers/usb/phy/phy.c
> +++ b/drivers/usb/phy/phy.c
> @@ -232,6 +232,7 @@ struct usb_phy *usb_get_phy_dev(struct device *dev, u8 index)
>   phy = __usb_find_phy_dev(dev, &phy_bind_list, index);
>   if (IS_ERR(phy) || !try_module_get(phy->dev->driver->owner)) {
>   dev_dbg(dev, "unable to find transceiver\n");
> + phy = IS_ERR(phy) ? phy : ERR_PTR(-ENODEV);

Please just spell out the if () statement, don't use ? : unless
necessary.

thanks,

greg k-h
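
One way the review comment above could be read, sketched in user-space C rather than against the real driver. The ERR_PTR/PTR_ERR/IS_ERR helpers below are simplified re-implementations written for this example, and struct toy_phy, toy_module_get() and toy_get_phy() are hypothetical names, so this is only an illustration of spelling out the two failure cases as separate if () statements instead of a ?: expression.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space imitations of the kernel's error-pointer helpers. */
#define TOY_MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical toy PHY; module_alive models whether try_module_get() succeeds. */
struct toy_phy {
	bool module_alive;
};

static bool toy_module_get(const struct toy_phy *phy)
{
	return phy->module_alive;
}

/*
 * Take the result of a lookup and turn it into the value handed back to
 * the caller: pass a lookup error through unchanged, and report -ENODEV
 * when the lookup worked but the owning module could not be pinned.
 */
static struct toy_phy *toy_get_phy(struct toy_phy *found)
{
	if (IS_ERR(found))
		return found;

	if (!toy_module_get(found))
		return ERR_PTR(-ENODEV);

	return found;
}
```

Keeping the two checks apart also makes it possible to log a different message for each case, which the combined condition in the quoted hunk cannot do.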

Re: [PATCH] usb: gadget: remove $(PWD) in ccflags-y

2014-08-17 Thread Greg KH

On Mon, Aug 18, 2014 at 12:08:07AM +0200, Philippe Reynes wrote:
> The variable $(PWD) is useless, and it may break the compilation.
> For example, it breaks the kernel compilation when it's done with
> buildroot :
> 
>   /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/bin/ccache
> /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/bin/arm-buildroot-linux-uclibcgnueabi-gcc
> -Wp,-MD,drivers/usb/gadget/legacy/.hid.o.d  -nostdinc -isystem
> /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/lib/gcc/arm-buildroot-linux-uclibcgnueabi/4.7.3/include
> -I./arch/arm/include -Iarch/arm/include/generated  -Iinclude
> -I./arch/arm/include/uapi -Iarch/arm/include/generated/uapi
> -I./include/uapi -Iinclude/generated/uapi -include
> ./include/linux/kconfig.h -D__KERNEL__ -mlittle-endian -Wall -Wundef
> -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
> -Werror-implicit-function-declaration -Wno-format-security
> -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp
> -funwind-tables -marm -D__LINUX_ARM_ARCH__=5 -march=armv5te
> -mtune=arm9tdmi -msoft-float -Uarm -fno-delete-null-pointer-checks -O2
> --param=allow-store-data-races=0 -Wframe-larger-than=1024
> -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer
> -fno-var-tracking-assignments -g -Wdeclaration-after-statement
> -Wno-pointer-sign -fno-strict-overflow -fconserve-stack
> -Werror=implicit-int -Werror=strict-prototypes -DCC_HAVE_ASM_GOTO
> -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/
> -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/udc/
> -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/function/
> -DMODULE  -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hid)"
> -D"KBUILD_MODNAME=KBUILD_STR(g_hid)" -c -o
> drivers/usb/gadget/legacy/hid.o drivers/usb/gadget/legacy/hid.c
> drivers/usb/gadget/epautoconf.c:23:26: fatal error: gadget_chips.h: No such
> file or directory
> 
> This compilation line includes:
> /buildroot/drivers/usb/gadget
> but the real path is:
> /buildroot/output/build/linux-3.17-rc1/drivers/usb/gadget
> 
> Signed-off-by: Philippe Reynes 
> ---
>  drivers/usb/gadget/Makefile  |2 +-
>  drivers/usb/gadget/function/Makefile |4 ++--
>  drivers/usb/gadget/legacy/Makefile   |6 +++---
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/usb/gadget/Makefile b/drivers/usb/gadget/Makefile
> index a186afe..9add915 100644
> --- a/drivers/usb/gadget/Makefile
> +++ b/drivers/usb/gadget/Makefile
> @@ -3,7 +3,7 @@
>  #
>  subdir-ccflags-$(CONFIG_USB_GADGET_DEBUG) := -DDEBUG
>  subdir-ccflags-$(CONFIG_USB_GADGET_VERBOSE) += -DVERBOSE_DEBUG
> -ccflags-y += -I$(PWD)/drivers/usb/gadget/udc
> +ccflags-y += -Idrivers/usb/gadget/udc

Ick, why are these here at all, shouldn't you just use the proper
relative paths in the .c files for the include files?  That way just
building a .o file individually will work properly, otherwise, it will
not.

And getting rid of those other ccflags would be good to do as well, no
need for them to be in a Makefile.

thanks,

greg k-h

Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Jason Cooper

On Sun, Aug 17, 2014 at 09:35:11PM -0400, Nicolas Pitre wrote:
> On Sun, 17 Aug 2014, Jason Cooper wrote:
> 
> > On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote:
> > > On Sun, 17 Aug 2014, Jason Cooper wrote:
> > > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > > > > Applied to irqchip/urgent with Nico's Ack.
> > > > > 
> > > > > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > > > > yes, that's great.
> > > > 
> > > > Quoting Nico:
> > > > 
> > > > "Of course it would be good to clarify things wrt Russell's remark
> > > > independently from this patch."
> > > > 
> > > > I took 'independently' to mean "This patch is ok, *and* we need to
> > > > address Russell's concerns in a follow-up patch."
> > > > 
> > > > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > > > recent activity on this thread was also August 13th.  After four days, I
> > > > reasoned there were no objections to his comment.
> > > 
> > > Well... I mentioned this patch is a nice cleanup independently of the 
> > > reason why it was created in the first place.
> > 
> > Ah, fair enough.
> > 
> > > Maybe that shouldn't be sorted as "urgent" in that case, especially
> > > when the code having problem with the current state of things is
> > > living out of mainline.
> > 
> > hmmm, yes.  I've been grappling with the semantics of '/urgent' vice
> > '/fixes'.  With mvebu, /fixes is the branch for all changes needing to go
> > into the current -rcX cycle.  For irqchip, Thomas suggested /urgent for
> > the equivalent branch.  To me, they serve the same purpose.
> > Unfortunately, I occasionally hear "Well, it's not _urgent_ ...".  I
> > suppose I'll put up with it for one more cycle and then change it to
> > /fixes. :)
> > 
> > wrt this patch, I need to drop it anyway.  I was a bit rusty (it's been
> > a few weeks) and forgot to add the Cc -stable and Fixes: tags.  I do
> > agree, though, it's certainly not urgent.
> 
> Given the raised issue has to do with out-of-tree code, there is no need 
> to CC stable in that case anyway.

I could go either way here.  On the one hand, a fix is a fix is a fix.
On the other, if it can't be triggered in mainline, we shouldn't accept
it at all.

Stephen, is the out of tree code that triggered this bound for mainline?

thx,

Jason.

Re: [PATCH v9 02/12] PCI: OF: Parse and map the IRQ when adding the PCI device.

2014-08-17 Thread Wei Yang

On Fri, Aug 15, 2014 at 11:30:52AM +0100, Liviu Dudau wrote:
>On Fri, Aug 15, 2014 at 09:56:32AM +0100, Wei Yang wrote:
>> On Thu, Aug 14, 2014 at 04:49:59PM +0100, Liviu Dudau wrote:
>> >On Thu, Aug 14, 2014 at 03:58:04PM +0100, Wei Yang wrote:
>> >> On Tue, Aug 12, 2014 at 05:25:15PM +0100, Liviu Dudau wrote:
>> >> >Enhance the default implementation of pcibios_add_device() to
>> >> >parse and map the IRQ of the device if a DT binding is available.
>> >> >
>> >> >Cc: Bjorn Helgaas 
>> >> >Cc: Grant Likely 
>> >> >Cc: Rob Herring 
>> >> >Signed-off-by: Liviu Dudau 
>> >> >---
>> >> > drivers/pci/pci.c | 3 +++
>> >> > 1 file changed, 3 insertions(+)
>> >> >
>> >> >diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> >> >index 1c8592b..29d1775 100644
>> >> >--- a/drivers/pci/pci.c
>> >> >+++ b/drivers/pci/pci.c
>> >> >@@ -17,6 +17,7 @@
>> >> > #include 
>> >> > #include 
>> >> > #include 
>> >> >+#include 
>> >> > #include 
>> >> > #include 
>> >> > #include 
>> >> >@@ -1453,6 +1454,8 @@ EXPORT_SYMBOL(pcim_pin_device);
>> >> >  */
>> >> > int __weak pcibios_add_device(struct pci_dev *dev)
>> >> > {
>> >> >+dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
>> >> >+
>> >> > return 0;
>> >> > }
>> >> 
>> >> Liviu,
>> >> 
>> >> For this, my suggestion is to add arch dependent function to setup the irq
>> >> line for pci devices. I can't find an obvious reason this won't work on
>> >> other archs, but maybe this will hurt some of them?
>> >
>> >Hi Wei,
>> >
>> >I'm not sure I understand your point. Architectures that support OF will
>> >obviously benefit from this common approach, and for the other ones the
>> >function is empty so it will not change existing behaviour. If you are
>> >suggesting that I should create a new API that each architecture could go
>> >and implement for setting up the IRQ line then I would agree that it would
>> >be nice to have that, but the question is how many architectures are
>> >outside OF that need this?
>> 
>> My suggestion is to define the pcibios_add_device() for arm arch, like the
>> one in arch/powerpc/kernel/pci-common.c. If my understanding is correct,
>> this patch set address the pci bus setup mostly on arm arch.
>
>And also arm64 at the least.
>
>> 
>> For those archs not support OF, this function is empty and has no effect. I
>> agree on this one.
>> 
>> For those archs rely on OF, we still have two cases:
>> 1. they would have implement this function like powerpc
>
>Actually, powerpc seems to be the only OF platform reimplementing this
>function. s390 and x86 are not OF platforms.
>
>> 2. have other way to fix it up,  otherwise how it works now?
>
>Don't forget that my patchset aims to replace existing house-made code with a
>more generic version. When architectures and platforms switch to my code they
>will have to add this back in their code if it's needed.
>
>> If my assumption is correct, this change will either have no effect, or fix
>> up the irq line the second time. Not harmful, but not necessary.
>
>Well, it will become necessary as old code gets dismantled and converted
>towards this patchset. To give you an example that I'm familiar with, for
>arch/arm the host bridge drivers have moved into drivers/pci/host, but they
>still depend/use the bios32 infrastructure that takes care of setting up the
>irq. When they switch to my version they would have to go and debug the "irq
>not being assigned" issue and it is quite likely that some of the people doing
>the conversion will complain about my code rather than understanding the
>issue. What I'm trying to do is to make switching to my patchset as painless
>as possible, with a cleanup to remove redundant operations coming after the
>switchover.
>

Does this mean that this is a temporary version for the switchover period,
and that it will be reverted after the switchover?

>Does that sound like a reasonable plan?
>
>Best regards,
>Liviu
>
>> 
>> I am not familiar with other arch, so the second case is my deduction. If
>> this is not correct, please let me know.
>> 
>> >
>> >If I understood you correctly, it is a nice idea but slightly outside the
>> >scope of my current patchset.
>> >
>> >Best regards,
>> >Liviu
>> >
>> >> 
>> >> >
>> >> >-- 
>> >> >2.0.4
>> >> >
>> >> >--
>> >> >To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> >> >the body of a message to majord...@vger.kernel.org
>> >> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> 
>> >> -- 
>> >> Richard Yang
>> >> Help you, Help me
>> >> 
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
>> >> the body of a message to majord...@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> 
>> >
>> 
>> -- 
>> Richard Yang
>> Help you, Help me
>> 
>> 
>
>-- 
>
>| I would like to |
>| fix the world,  |
>| but they're not |
>| giving me the   |
> \ source code!  /
>

Re: [PATCH RFC 6/7] usb: host: ehci-exynos: Remove unnecessary usb-phy support

2014-08-17 Thread Jingoo Han

On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote:
> 
> Now that we have completely moved from older USB-PHY drivers
> to newer GENERIC-PHY drivers for PHYs available with USB controllers
> on Exynos series of SoCs, we can remove the support for the same
> in our host drivers too.
> This should fix the issue on ehci-exynos, wherein in the absence of
> SAMSUNG_USB2PHY config symbol, we ended up getting the NOP_USB_XCEIV phy
> when the same is enabled. And thus the PHYs are not configured properly.
> 
> Reported-by: Sachin Kamat 
> Signed-off-by: Vivek Gautam 

Reviewed-by: Jingoo Han 

Best regards,
Jingoo Han

> ---
>  drivers/usb/host/ehci-exynos.c |   53 ++--
>  1 file changed, 8 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/usb/host/ehci-exynos.c b/drivers/usb/host/ehci-exynos.c
> index cda0a2f..54944cc 100644
> --- a/drivers/usb/host/ehci-exynos.c
> +++ b/drivers/usb/host/ehci-exynos.c
> @@ -21,11 +21,8 @@
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  #include 
> -#include 
> 
>  #include "ehci.h"
> 
> @@ -47,9 +44,7 @@ static struct hc_driver __read_mostly exynos_ehci_hc_driver;
> 
>  struct exynos_ehci_hcd {
>   struct clk *clk;
> - struct usb_phy *phy;
> - struct usb_otg *otg;
> - struct phy *phy_g[PHY_NUMBER];
> + struct phy *phy[PHY_NUMBER];
>  };
> 
>  #define to_exynos_ehci(hcd) (struct exynos_ehci_hcd *)(hcd_to_ehci(hcd)->priv)
> @@ -62,18 +57,6 @@ static int exynos_ehci_get_phy(struct device *dev,
>   int phy_number;
>   int ret = 0;
> 
> - exynos_ehci->phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2);
> - if (IS_ERR(exynos_ehci->phy)) {
> - ret = PTR_ERR(exynos_ehci->phy);
> - if (ret != -ENXIO && ret != -ENODEV) {
> - dev_err(dev, "no usb2 phy configured\n");
> - return ret;
> - }
> - dev_dbg(dev, "Failed to get usb2 phy\n");
> - } else {
> - exynos_ehci->otg = exynos_ehci->phy->otg;
> - }
> -
>   for_each_available_child_of_node(dev->of_node, child) {
>   ret = of_property_read_u32(child, "reg", &phy_number);
>   if (ret) {
> @@ -98,7 +81,7 @@ static int exynos_ehci_get_phy(struct device *dev,
>   }
>   dev_dbg(dev, "Failed to get usb2 phy\n");
>   }
> - exynos_ehci->phy_g[phy_number] = phy;
> + exynos_ehci->phy[phy_number] = phy;
>   }
> 
>   return ret;
> @@ -111,16 +94,13 @@ static int exynos_ehci_phy_enable(struct device *dev)
>   int i;
>   int ret = 0;
> 
> - if (!IS_ERR(exynos_ehci->phy))
> - return usb_phy_init(exynos_ehci->phy);
> -
>   for (i = 0; ret == 0 && i < PHY_NUMBER; i++)
> - if (!IS_ERR(exynos_ehci->phy_g[i]))
> - ret = phy_power_on(exynos_ehci->phy_g[i]);
> + if (!IS_ERR(exynos_ehci->phy[i]))
> + ret = phy_power_on(exynos_ehci->phy[i]);
>   if (ret)
>   for (i--; i >= 0; i--)
> - if (!IS_ERR(exynos_ehci->phy_g[i]))
> - phy_power_off(exynos_ehci->phy_g[i]);
> + if (!IS_ERR(exynos_ehci->phy[i]))
> + phy_power_off(exynos_ehci->phy[i]);
> 
>   return ret;
>  }
> @@ -131,14 +111,9 @@ static void exynos_ehci_phy_disable(struct device *dev)
>   struct exynos_ehci_hcd *exynos_ehci = to_exynos_ehci(hcd);
>   int i;
> 
> - if (!IS_ERR(exynos_ehci->phy)) {
> - usb_phy_shutdown(exynos_ehci->phy);
> - return;
> - }
> -
>   for (i = 0; i < PHY_NUMBER; i++)
> - if (!IS_ERR(exynos_ehci->phy_g[i]))
> - phy_power_off(exynos_ehci->phy_g[i]);
> + if (!IS_ERR(exynos_ehci->phy[i]))
> + phy_power_off(exynos_ehci->phy[i]);
>  }
> 
>  static void exynos_setup_vbus_gpio(struct device *dev)
> @@ -231,9 +206,6 @@ skip_phy:
>   goto fail_io;
>   }
> 
> - if (exynos_ehci->otg)
> - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self);
> -
>   err = exynos_ehci_phy_enable(&pdev->dev);
>   if (err) {
>   dev_err(&pdev->dev, "Failed to enable USB phy\n");
> @@ -273,9 +245,6 @@ static int exynos_ehci_remove(struct platform_device *pdev)
> 
>   usb_remove_hcd(hcd);
> 
> - if (exynos_ehci->otg)
> - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self);
> -
>   exynos_ehci_phy_disable(&pdev->dev);
> 
>   clk_disable_unprepare(exynos_ehci->clk);
> @@ -298,9 +267,6 @@ static int exynos_ehci_suspend(struct device *dev)
>   if (rc)
>   return rc;
> 
> - if (exynos_ehci->otg)
> - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self);
> -
>   exynos_ehci_phy_disable(dev);
> 
>   clk_disable_unprepare(exynos_ehci->clk);
> @@ -316,9
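
The enable path kept by the quoted patch powers the PHYs on in array order and, if one fails, walks back and powers off the ones already handled. A minimal user-space sketch of that rollback pattern follows. It is not the driver code: struct toy_phy, TOY_PHY_NUMBER and the toy_* helpers are hypothetical, and a NULL slot is used here in place of the driver's IS_ERR() check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PHY_NUMBER 3	/* hypothetical count, standing in for PHY_NUMBER */

/* Hypothetical toy PHY; not the kernel's struct phy. */
struct toy_phy {
	int powered;
	int broken;	/* when set, power-on fails */
};

static int toy_phy_power_on(struct toy_phy *phy)
{
	if (phy->broken)
		return -EIO;
	phy->powered = 1;
	return 0;
}

static void toy_phy_power_off(struct toy_phy *phy)
{
	phy->powered = 0;
}

/*
 * Power on every present PHY in order. On the first failure, walk back
 * down the array, power off every slot visited so far, and report the
 * error to the caller.
 */
static int toy_phys_enable(struct toy_phy *phys[TOY_PHY_NUMBER])
{
	int i;
	int ret = 0;

	assert(phys != NULL);

	for (i = 0; ret == 0 && i < TOY_PHY_NUMBER; i++)
		if (phys[i])
			ret = toy_phy_power_on(phys[i]);
	if (ret)
		for (i--; i >= 0; i--)
			if (phys[i])
				toy_phy_power_off(phys[i]);

	return ret;
}
```

Note that the walk-back starts at the slot that failed, so that slot is powered off as well. In this toy model that is harmless; whether it matters for real hardware is outside what the quoted patch shows.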

Re: [PATCH RFC 7/7] usb: host: ohci-exynos: Remove unnecessary usb-phy support

2014-08-17 Thread Jingoo Han

On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote:
> 
> Now that we have completely moved from older USB-PHY drivers
> to newer GENERIC-PHY drivers for PHYs available with USB controllers
> on Exynos series of SoCs, we can remove the support for the same
> in our host drivers too.
> This should fix the issue on ohci-exynos, wherein in the absence of
> SAMSUNG_USB2PHY config symbol, we ended up getting the NOP_USB_XCEIV phy
> when the same is enabled. And thus the PHYs are not configured properly.
> 
> Reported-by: Sachin Kamat 
> Signed-off-by: Vivek Gautam 

Reviewed-by: Jingoo Han 

Best regards,
Jingoo Han

> ---
>  drivers/usb/host/ohci-exynos.c |   64 ++--
>  1 file changed, 9 insertions(+), 55 deletions(-)
> 
> diff --git a/drivers/usb/host/ohci-exynos.c b/drivers/usb/host/ohci-exynos.c
> index a72ab8f..0199a8b 100644
> --- a/drivers/usb/host/ohci-exynos.c
> +++ b/drivers/usb/host/ohci-exynos.c
> @@ -19,11 +19,8 @@
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  #include 
> -#include 
> 
>  #include "ohci.h"
> 
> @@ -38,9 +35,7 @@ static struct hc_driver __read_mostly exynos_ohci_hc_driver;
> 
>  struct exynos_ohci_hcd {
>   struct clk *clk;
> - struct usb_phy *phy;
> - struct usb_otg *otg;
> - struct phy *phy_g[PHY_NUMBER];
> + struct phy *phy[PHY_NUMBER];
>  };
> 
>  static int exynos_ohci_get_phy(struct device *dev,
> @@ -51,28 +46,7 @@ static int exynos_ohci_get_phy(struct device *dev,
>   int phy_number;
>   int ret = 0;
> 
> - exynos_ohci->phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2);
> - if (IS_ERR(exynos_ohci->phy)) {
> - ret = PTR_ERR(exynos_ohci->phy);
> - if (ret != -ENXIO && ret != -ENODEV) {
> - dev_err(dev, "no usb2 phy configured\n");
> - return ret;
> - }
> - dev_dbg(dev, "Failed to get usb2 phy\n");
> - } else {
> - exynos_ohci->otg = exynos_ohci->phy->otg;
> - }
> -
> - /*
> -  * Getting generic phy:
> -  * We are keeping both types of phys as a part of transiting OHCI
> -  * to generic phy framework, so as to maintain backward compatibilty
> -  * with old DTB.
> -  * If there are existing devices using DTB files built from them,
> -  * to remove the support for old bindings in this driver,
> -  * we need to make sure that such devices have their DTBs
> -  * updated to ones built from new DTS.
> -  */
> + /* Get the generic phys */
>   for_each_available_child_of_node(dev->of_node, child) {
>   ret = of_property_read_u32(child, "reg", &phy_number);
>   if (ret) {
> @@ -97,7 +71,7 @@ static int exynos_ohci_get_phy(struct device *dev,
>   }
>   dev_dbg(dev, "Failed to get usb2 phy\n");
>   }
> - exynos_ohci->phy_g[phy_number] = phy;
> + exynos_ohci->phy[phy_number] = phy;
>   }
> 
>   return ret;
> @@ -110,16 +84,13 @@ static int exynos_ohci_phy_enable(struct device *dev)
>   int i;
>   int ret = 0;
> 
> - if (!IS_ERR(exynos_ohci->phy))
> - return usb_phy_init(exynos_ohci->phy);
> -
>   for (i = 0; ret == 0 && i < PHY_NUMBER; i++)
> - if (!IS_ERR(exynos_ohci->phy_g[i]))
> - ret = phy_power_on(exynos_ohci->phy_g[i]);
> + if (!IS_ERR(exynos_ohci->phy[i]))
> + ret = phy_power_on(exynos_ohci->phy[i]);
>   if (ret)
>   for (i--; i >= 0; i--)
> - if (!IS_ERR(exynos_ohci->phy_g[i]))
> - phy_power_off(exynos_ohci->phy_g[i]);
> + if (!IS_ERR(exynos_ohci->phy[i]))
> + phy_power_off(exynos_ohci->phy[i]);
> 
>   return ret;
>  }
> @@ -130,14 +101,9 @@ static void exynos_ohci_phy_disable(struct device *dev)
>   struct exynos_ohci_hcd *exynos_ohci = to_exynos_ohci(hcd);
>   int i;
> 
> - if (!IS_ERR(exynos_ohci->phy)) {
> - usb_phy_shutdown(exynos_ohci->phy);
> - return;
> - }
> -
>   for (i = 0; i < PHY_NUMBER; i++)
> - if (!IS_ERR(exynos_ohci->phy_g[i]))
> - phy_power_off(exynos_ohci->phy_g[i]);
> + if (!IS_ERR(exynos_ohci->phy[i]))
> + phy_power_off(exynos_ohci->phy[i]);
>  }
> 
>  static int exynos_ohci_probe(struct platform_device *pdev)
> @@ -209,9 +175,6 @@ skip_phy:
>   goto fail_io;
>   }
> 
> - if (exynos_ohci->otg)
> - exynos_ohci->otg->set_host(exynos_ohci->otg, &hcd->self);
> -
>   platform_set_drvdata(pdev, hcd);
> 
>   err = exynos_ohci_phy_enable(&pdev->dev);
> @@ -244,9 +207,6 @@ static int exynos_ohci_remove(struct platform_device *pdev)
> 
>   usb_remove_hcd(hcd);
> 
> - if (exynos_ohci->otg)
> - exynos_ohci->otg->set_host(exynos_ohc

Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Nicolas Pitre

On Sun, 17 Aug 2014, Jason Cooper wrote:

> On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote:
> > On Sun, 17 Aug 2014, Jason Cooper wrote:
> > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > > > Applied to irqchip/urgent with Nico's Ack.
> > > > 
> > > > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > > > yes, that's great.
> > > 
> > > Quoting Nico:
> > > 
> > > "Of course it would be good to clarify things wrt Russell's remark
> > > independently from this patch."
> > > 
> > > I took 'independently' to mean "This patch is ok, *and* we need to
> > > address Russell's concerns in a follow-up patch."
> > > 
> > > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > > recent activity on this thread was also August 13th.  After four days, I
> > > reasoned there were no objections to his comment.
> > 
> > Well... I mentioned this patch is a nice cleanup independently of the 
> > reason why it was created in the first place.
> 
> Ah, fair enough.
> 
> > Maybe that shouldn't be sorted as "urgent" in that case, especially
> > when the code having problem with the current state of things is
> > living out of mainline.
> 
> hmmm, yes.  I've been grappling with the semantics of '/urgent' vice
> '/fixes'.  With mvebu, /fixes is the branch for all changes needing to go
> into the current -rcX cycle.  For irqchip, Thomas suggested /urgent for
> the equivalent branch.  To me, they serve the same purpose.
> Unfortunately, I occasionally hear "Well, it's not _urgent_ ...".  I
> suppose I'll put up with it for one more cycle and then change it to
> /fixes. :)
> 
> wrt this patch, I need to drop it anyway.  I was a bit rusty (it's been
> a few weeks) and forgot to add the Cc -stable and Fixes: tags.  I do
> agree, though, it's certainly not urgent.

Given the raised issue has to do with out-of-tree code, there is no need 
to CC stable in that case anyway.


Nicolas

Re: [PATCH 1/7] usb-phy: samsung-usb3: Remove older phy-samsung-usb3 driver

2014-08-17 Thread Jingoo Han

On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote:
> 
> Removing this older USB 3.0 DRD controller PHY driver, since
> a new driver based on generic phy framework is now available.
> 
> Signed-off-by: Vivek Gautam 

Reviewed-by: Jingoo Han 

Best regards,
Jingoo Han

> ---
>  drivers/usb/phy/Kconfig            |    8 -
>  drivers/usb/phy/Makefile           |    1 -
>  drivers/usb/phy/phy-samsung-usb.h  |   80 -
>  drivers/usb/phy/phy-samsung-usb3.c |  350
>  4 files changed, 439 deletions(-)
>  delete mode 100644 drivers/usb/phy/phy-samsung-usb3.c


Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Jason Cooper

On Sun, Aug 17, 2014 at 10:41:23PM +0100, Russell King - ARM Linux wrote:
> On Sun, Aug 17, 2014 at 03:04:34PM -0400, Jason Cooper wrote:
> > Quoting Nico:
> > 
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> > 
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> > 
> > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > recent activity on this thread was also August 13th.  After four days, I
> > reasoned there were no objections to his comment.
> 
> Right, during the merge window, and during merge windows, I tend to
> ignore almost all email now because people don't stop developing, and
> they don't take any notice where the mainline cycle is.  In fact, I go
> off and do non-kernel work during a merge window and only briefly scan
> for bug fixes.

Ok, now dropped.

thx,

Jason.

[PATCH 5/7 update] arm64/efi: do not enter virtual mode in case booting with efi=noruntime or noefi

2014-08-17 Thread Dave Young


If EFI runtime services are disabled via the noefi kernel command line,
arm64_enter_virtual_mode() should error out.

At the same time, move early_memunmap(memmap.map, mapsize) to the beginning
of the function; otherwise the early mapping is leaked.

Signed-off-by: Dave Young 
---
 arch/arm64/kernel/efi.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 6ed0362..8f5db4a 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -392,11 +392,16 @@ static int __init arm64_enter_virtual_mode(void)
 		return -1;
 	}
 
-	pr_info("Remapping and enabling EFI services.\n");
-
-	/* replace early memmap mapping with permanent mapping */
 	mapsize = memmap.map_end - memmap.map;
 	early_memunmap(memmap.map, mapsize);
+
+	if (efi_runtime_disabled()) {
+		pr_info("EFI runtime services will be disabled.\n");
+		return -1;
+	}
+
+	pr_info("Remapping and enabling EFI services.\n");
+	/* replace early memmap mapping with permanent mapping */
 	memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
 						   mapsize);
 	memmap.map_end = memmap.map + mapsize;
-- 
1.8.3.1
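
The ordering the patch above settles on (release the early mapping first, then decide whether to bail out) can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: toy_live_early_mappings, toy_early_map(), toy_early_unmap() and toy_enter_virtual_mode() are hypothetical names, and the counter merely stands in for an early mapping that must not outlive the function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter tracking how many early mappings are still live. */
static int toy_live_early_mappings;

static void toy_early_map(void)
{
	toy_live_early_mappings++;
}

static void toy_early_unmap(void)
{
	assert(toy_live_early_mappings > 0);
	toy_live_early_mappings--;
}

/*
 * Drop the early mapping before any early return, so that both the
 * "runtime disabled" path and the normal path leave nothing behind.
 */
static int toy_enter_virtual_mode(bool runtime_disabled)
{
	toy_early_unmap();

	if (runtime_disabled)
		return -1;

	/* A permanent mapping would be set up here in the real function. */
	return 0;
}
```

Had the unmap been placed only inside the "disabled" branch, the normal path would keep the early mapping alive while a permanent one is created on top of it, which is the problem pointed out in review of the first version.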


Re: [PATCH 5/7] arm64/efi: do not enter virtual mode in case booting with efi=noruntime or noefi

2014-08-17 Thread Dave Young

On 08/15/14 at 04:09pm, Will Deacon wrote:
> On Thu, Aug 14, 2014 at 10:15:30AM +0100, Dave Young wrote:
> > In case efi runtime disabled via noefi kernel cmdline
> > arm64_enter_virtual_mode should error out.
> > 
> > At the same time move early_memunmap(memmap.map, mapsize) to the beginning
> > of the function or it will leak early mem.
> > 
> > Signed-off-by: Dave Young 
> > ---
> >  arch/arm64/kernel/efi.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> > index 6ed0362..309fab1 100644
> > --- a/arch/arm64/kernel/efi.c
> > +++ b/arch/arm64/kernel/efi.c
> > @@ -392,11 +392,16 @@ static int __init arm64_enter_virtual_mode(void)
> > return -1;
> > }
> >  
> > +   mapsize = memmap.map_end - memmap.map;
> > +   if (efi_runtime_disabled()) {
> > +   early_memunmap(memmap.map, mapsize);
> 
> Should this early_memunmap really be conditional? With this change, we no
> longer unmap it before setting up the permanent mapping below.

Oops, I tested the right version but sent the wrong version of this arm64 patch.

Thanks for the catch.

> 
> Will
> 
> > +   pr_info("EFI runtime services will be disabled.\n");
> > +   return -1;
> > +   }
> > +
> > pr_info("Remapping and enabling EFI services.\n");
> >  
> > /* replace early memmap mapping with permanent mapping */
> > -   mapsize = memmap.map_end - memmap.map;
> > -   early_memunmap(memmap.map, mapsize);
> > memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
> >mapsize);
> > memmap.map_end = memmap.map + mapsize;
> > -- 
> > 1.8.3.1
> > 
> > 
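
The ordering issue discussed in this thread can be modeled outside the kernel.
What follows is a minimal userspace sketch, not the arm64 code: the counter and
all demo_* names are hypothetical stand-ins. It only shows why releasing the
temporary mapping before any early return avoids a leak on both paths.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the number of live early (temporary) mappings. */
static int demo_live_early_mappings;

static void demo_early_map(void)
{
    demo_live_early_mappings++;
}

static void demo_early_unmap(void)
{
    assert(demo_live_early_mappings > 0);
    demo_live_early_mappings--;
}

/* Models the version sent by mistake: unmap only when runtime is disabled. */
int demo_enter_conditional(bool runtime_disabled)
{
    if (runtime_disabled) {
        demo_early_unmap();
        return -1;
    }
    /* The permanent mapping would be created here; the early one stays live. */
    return 0;
}

/* Models the updated patch: unmap first, then decide whether to bail out. */
int demo_enter_unconditional(bool runtime_disabled)
{
    demo_early_unmap();
    if (runtime_disabled)
        return -1;
    /* The permanent mapping would be created here. */
    return 0;
}

/* Create one early mapping, run fn, and report how many mappings remain. */
int demo_leaked_after(int (*fn)(bool), bool runtime_disabled)
{
    demo_live_early_mappings = 0;
    demo_early_map();
    (void)fn(runtime_disabled);
    return demo_live_early_mappings;
}
```

In this model the conditional variant leaves one mapping behind whenever the
runtime services are enabled, while the unconditional variant leaves none in
either case.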

Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Jason Cooper

On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote:
> On Sun, 17 Aug 2014, Jason Cooper wrote:
> > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > > Applied to irqchip/urgent with Nico's Ack.
> > > 
> > > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > > yes, that's great.
> > 
> > Quoting Nico:
> > 
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> > 
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> > 
> > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > recent activity on this thread was also August 13th.  After four days, I
> > reasoned there were no objections to his comment.
> 
> Well... I mentioned this patch is a nice cleanup independently of the 
> reason why it was created in the first place.

Ah, fair enough.

> Maybe that shouldn't be sorted as "urgent" in that case, especially
> when the code having problem with the current state of things is
> living out of mainline.

hmmm, yes.  I've been grappling with the semantics of '/urgent' versus
'/fixes'.  With mvebu, /fixes is the branch for all changes needing to go
into the current -rcX cycle.  For irqchip, Thomas suggested /urgent for
the equivalent branch.  To me, they serve the same purpose.
Unfortunately, I occasionally hear "Well, it's not _urgent_ ...".  I
suppose I'll put up with it for one more cycle and then change it to
/fixes. :)

wrt this patch, I need to drop it anyway.  I was a bit rusty (it's been
a few weeks) and forgot to add the Cc -stable and Fixes: tags.  I do
agree, though, it's certainly not urgent.

As Russell has raised more issues with this patch as well, I'll hold off
on re-applying until I see a new version.  Hopefully it'll meet with
everyone's approval.

thx,

Jason.

Re: [PATCH v1 5/9] block: loop: convert to blk-mq

2014-08-17 Thread Ming Lei

On Mon, Aug 18, 2014 at 1:48 AM, Jens Axboe  wrote:
> On 2014-08-16 02:06, Ming Lei wrote:
>>
>> On 8/16/14, Jens Axboe  wrote:
>>>
>>> On 08/15/2014 10:36 AM, Jens Axboe wrote:
>>>>
>>>> On 08/15/2014 10:31 AM, Christoph Hellwig wrote:
>>>>>>
>>>>>> +static void loop_queue_work(struct work_struct *work)
>>>>>
>>>>>
>>>>> Offloading work straight to a workqueue doesn't make much sense
>>>>> in the blk-mq model as we'll usually be called from one.  If you
>>>>> need to avoid the cases where we are called directly a flag for
>>>>> the blk-mq code to always schedule a workqueue sounds like a much
>>>>> better plan.
>>>>
>>>>
>>>> That's a good point - would clean up this bit, and be pretty close to a
>>>> one-liner to support in blk-mq for the drivers that always need blocking
>>>> context.
>>>
>>>
>>> Something like this should do the trick - totally untested. But with
>>> that, loop would just need to add BLK_MQ_F_WQ_CONTEXT to its tag set
>>> flags and it could always do the work inline from ->queue_rq().
>>
>>
>> I think it is a good idea.
>>
>> But for loop, there may be two problems:
>>
>> - default max_active for bound workqueue is 256, which means several slow
>> loop devices might slow down the whole block system. With kernel AIO, it won't
>> be a big deal, but some block/fs may not support direct I/O and still
>> fall back to the workqueue
>>
>> - 6. Guidelines of Documentation/workqueue.txt
>> If there is dependency among multiple work items used during memory
>> reclaim, they should be queued to separate wq each with WQ_MEM_RECLAIM.
>
>
> Both are good points. But I think this mainly means that we should support
> this through a potentially per-dispatch queue workqueue, separate from
> kblockd. There's no reason blk-mq can't support this with a per-hctx
> workqueue, for drivers that need it.

Good idea, and a per-device workqueue should be enough if the
BLK_MQ_F_WQ_CONTEXT flag is set.

Thanks,

Re: [PATCH 1/2] fs/buffer.c: allocate buffer cache from non-movable area

2014-08-17 Thread Gioh Kim




On 2014-08-15 at 6:22 AM, Andrew Morton wrote:

On Thu, 14 Aug 2014 14:15:40 +0900 Gioh Kim  wrote:


A buffer cache is allocated from movable area
because it is referred for a while and released soon.
But some filesystems are taking buffer cache for a long time
and it can disturb page migration.

A new API should be introduced to allocate buffer cache from
non-movable area.


I think the API could and should be more flexible than this.

Rather than making the API be "movable or not movable", let's permit
callers to specify the gfp_t and leave it at that.  That way, if
someone later wants to allocate a buffer head with, I dunno,
__GFP_NOTRACK then they can do so.

So the word "movable" shouldn't appear in buffer.c at all, except in a
single place.


I absolutely agree with you.
If the filesystem developers agree with this patch, I will send a second patch
that applies your ideas.

Thank you for your advice.




--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device 
*bdev,
   */
  static int
  grow_dev_page(struct block_device *bdev, sector_t block,
-   pgoff_t index, int size, int sizebits)
+ pgoff_t index, int size, int sizebits, gfp_t movable_mask)


s/movable_mask/gfp/


I got it.




  {
 struct inode *inode = bdev->bd_inode;
 struct page *page;
@@ -1003,7 +1003,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
 gfp_t gfp_mask;

 gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
-   gfp_mask |= __GFP_MOVABLE;
+   if (movable_mask & __GFP_MOVABLE)
+   gfp_mask |= __GFP_MOVABLE;


This becomes

gfp_mask |= gfp;


I got it.




 /*
  * XXX: __getblk_slow() can not really deal with failure and
  * will endlessly loop on improvised global reclaim.  Prefer
@@ -1058,7 +1059,8 @@ failed:
   * that page was dirty, the buffers are set dirty also.
   */
  static int
-grow_buffers(struct block_device *bdev, sector_t block, int size)
+grow_buffers(struct block_device *bdev, sector_t block,
+int size, gfp_t movable_mask)


gfp


  {
 pgoff_t index;
 int sizebits;
@@ -1085,11 +1087,12 @@ grow_buffers(struct block_device *bdev, sector_t block, 
int size)
 }

 /* Create a page with the proper size buffers.. */
-   return grow_dev_page(bdev, block, index, size, sizebits);
+   return grow_dev_page(bdev, block, index, size, sizebits, movable_mask);
  }

  static struct buffer_head *
-__getblk_slow(struct block_device *bdev, sector_t block, int size)
+__getblk_slow(struct block_device *bdev, sector_t block,
+ int size, gfp_t movable_mask)


gfp


  {
 /* Size must be multiple of hard sectorsize */
 if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
@@ -1111,7 +1114,7 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
 if (bh)
 return bh;

-   ret = grow_buffers(bdev, block, size);
+   ret = grow_buffers(bdev, block, size, movable_mask);


gfp


 if (ret < 0)
 return NULL;
 if (ret == 0)
@@ -1385,11 +1388,34 @@ __getblk(struct block_device *bdev, sector_t block, 
unsigned size)

 might_sleep();
 if (bh == NULL)
-   bh = __getblk_slow(bdev, block, size);
+   bh = __getblk_slow(bdev, block, size, __GFP_MOVABLE);


Here is the place where buffer.c mentions "movable".


I got it.




 return bh;
  }
  EXPORT_SYMBOL(__getblk);

+ /*
+ * __getblk_nonmovable will locate (and, if necessary, create) the buffer_head
+ * which corresponds to the passed block_device, block and size. The
+ * returned buffer has its reference count incremented.
+ *
+ * The page cache is allocated from non-movable area
+ * not to prevent page migration.
+ *
+ * __getblk()_nonmovable will lock up the machine
+ * if grow_dev_page's try_to_free_buffers() attempt is failing. FIXME, perhaps?
+ */
+struct buffer_head *
+__getblk_nonmovable(struct block_device *bdev, sector_t block, unsigned size)
+{
+   struct buffer_head *bh = __find_get_block(bdev, block, size);
+
+   might_sleep();
+   if (bh == NULL)
+   bh = __getblk_slow(bdev, block, size, 0);
+   return bh;
+}
+EXPORT_SYMBOL(__getblk_nonmovable);


Suggest this be called __getblk_gfp(bdev, block, size, gfp) and then
__getblk() be changed to call __getblk_gfp(..., __GFP_MOVABLE).

We could then write a __getblk_nonmovable() which calls __getblk_gfp()
(a static inlined one-line function) or we can just call
__getblk_gfp(..., 0) directly from filesystems.


I got it.
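
The layering suggested above (one core function that takes the gfp flags, plus
thin wrappers for the common cases) can be sketched in userspace. This is only
an illustration of the API shape under stated assumptions: every demo_* name,
the struct, and the flag value are hypothetical and are not the kernel's
definitions.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Illustrative value only; the kernel's real __GFP_MOVABLE bit differs. */
#define DEMO_GFP_MOVABLE 0x08u

struct demo_buffer {
    unsigned long block;   /* block number requested by the caller */
    demo_gfp_t gfp;        /* flags the backing page was "allocated" with */
};

/* Single-entry stand-in for the buffer cache. */
static struct demo_buffer demo_slot;

/* Core entry point: the caller chooses the allocation flags. */
struct demo_buffer *demo_getblk_gfp(unsigned long block, demo_gfp_t gfp)
{
    demo_slot.block = block;
    demo_slot.gfp = gfp;
    return &demo_slot;
}

/* Existing behaviour: the only place that mentions "movable". */
struct demo_buffer *demo_getblk(unsigned long block)
{
    return demo_getblk_gfp(block, DEMO_GFP_MOVABLE);
}

/* Optional one-line convenience wrapper for long-lived buffers. */
struct demo_buffer *demo_getblk_nonmovable(unsigned long block)
{
    return demo_getblk_gfp(block, 0);
}
```

With this shape, a caller that later needs some other flag passes it to the
core function directly and no further wrapper has to be added.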





@@ -1423,6 +1450,28 @@ __bread(struct block_device *bdev, sector_t block, 
unsigned size)
  }
  EXPORT_SYMBOL(__bread);

+/**
+ *  __bread_nonmovable() - reads a specified block and returns the bh
+ *  @bdev: the block_device to read from
+ *  @block: number of block
+ *  @si

Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area

2014-08-17 Thread Gioh Kim




On 2014-08-17 at 3:52 AM, Jan Kara wrote:

On Thu 14-08-14 14:26:10, Andrew Morton wrote:

On Thu, 14 Aug 2014 14:12:17 +0900 Gioh Kim  wrote:


This patch try to solve problem that a long-lasting page caches of
ext4 superblock and journaling of superblock disturb page migration.

I've been testing CMA feature on my ARM-based platform
and found that two page caches cannot be migrated.
They are page caches of superblock of ext4 filesystem and its journaling data.

Current ext4 reads superblock with sb_bread() that allocates page
from movable area. But the problem is that ext4 hold the page until
it is unmounted. If root filesystem is ext4 the page cannot be migrated forever.
And also the journaling data for the superblock cannot be migrated.

I introduce a new API for allocating page cache from non-movable area.
It is useful for ext4/ext3 and others that want to hold page cache for a long 
time.


All seems reasonable to me.  The additional overhead in buffer.c from
additional function arguments is regrettable but I don't see a
non-hacky alternative.

One vital question which the changelog doesn't really address (it
should): how important is this patch?  Is your test system presently
"completely dead in the water utterly unusable" or "occasionally not
quite as good as it could be".  Somewhere in between?

   I would be also interested in how much these patches make things better.
Because I would expect all metadata that is currently journalled to be
unmovable as well.

Honza



I'm sorry for the lack of detail.

My test platform has 1GB of memory in total: 256MB for CMA and 768MB for the
normal area. I applied Joonsoo's patch (https://lkml.org/lkml/2014/5/28/64) so
that 3/4 of allocations take place in the normal area and 1/4 take place in the
CMA area.

My platform has 4 ext4 partitions. Each ext4 partition has 2 page-cache pages
for the superblock, which are what this patch tries to move out of the CMA area.
Therefore there are 8 page-cache pages that can prevent page migration.

My test scenario tries to allocate the entire CMA area by repeating 16MB
allocations until all of it is allocated. In most cases 2 of those pages are
allocated from the CMA area, and one of the 16 allocation attempts fails.
It is rare for every allocation to succeed.

With this patch applied, none of these page-cache pages is allocated from the
CMA area and every allocation succeeds.

Please let me know if you need any more information.

Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()

2014-08-17 Thread tangchen


Hi tj,

On 08/17/2014 07:08 PM, Tejun Heo wrote:

Hello,

On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:

numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().

Yeah, that one.


If we don't clear hotpluggable flag in free_low_memory_core_early(), the
memory which marked hotpluggable flag will not free to buddy allocator.
Because __next_mem_range() will skip them.

free_low_memory_core_early
for_each_free_mem_range
for_each_mem_range
__next_mem_range

Ah, okay, so the patch fixes __next_mem_range() and thus makes
free_low_memory_core_early() to skip hotpluggable regions unlike
before.  Please explain things like that in the changelog.  Also,
what's its relationship with numa_clear_kernel_node_hotplug()?  Do we
still need them?  If so, what are the different roles that these two
separate places serve?


numa_clear_kernel_node_hotplug() only clears hotplug flags for the nodes
the kernel resides in, not for hotpluggable nodes. The reason why we did
this is to enable the kernel to allocate memory in case all the nodes are
hotpluggable.

We clear the hotplug flags for all nodes in free_low_memory_core_early()
because, if we did not, none of the hotpluggable memory could be freed to the
buddy allocator after Qiu's patch.

Thanks.
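
The interaction described above can be modeled with a small userspace sketch.
It is not the memblock code: the region struct, the flag bit, and all demo_*
names are hypothetical. It only shows that an iterator which skips flagged
regions frees nothing from them until the flag is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit, loosely modeled on memblock's hotplug marking. */
#define DEMO_HOTPLUG 0x1u

struct demo_region {
    unsigned long size;
    unsigned int flags;
};

/*
 * In the spirit of an iterator that skips hotpluggable regions: return the
 * index of the first region at or after idx that is not flagged, or n.
 */
size_t demo_next_range(const struct demo_region *r, size_t n, size_t idx)
{
    while (idx < n && (r[idx].flags & DEMO_HOTPLUG))
        idx++;
    return idx;
}

/* Clear the hotplug flag on every region. */
void demo_clear_hotplug(struct demo_region *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i].flags &= ~DEMO_HOTPLUG;
}

/* Sum the sizes of all regions the iterator visits ("freed to buddy"). */
unsigned long demo_free_all(const struct demo_region *r, size_t n)
{
    unsigned long freed = 0;

    for (size_t i = demo_next_range(r, n, 0); i < n;
         i = demo_next_range(r, n, i + 1))
        freed += r[i].size;
    return freed;
}
```

In this model, calling demo_free_all() before demo_clear_hotplug() releases
only the unflagged regions; calling it afterwards releases everything.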


Re: 3.16-rcX crashes on resume from Suspend-To-RAM

2014-08-17 Thread Zhang Rui

On Sat, 2014-08-16 at 02:46 +0200, Rafael J. Wysocki wrote:
> On Friday, August 15, 2014 10:17:42 AM Markus Gutschke wrote:
> > Just wondering if any of you had any other ideas of what I could try
> > to help debug this problem?
> 
> My theory is that there is a device in your system that we don't have a driver
> for, but it had been enumerated as a PNP device before the change that 
> triggered
> the problem for you and we turned it off during suspend as part of the default
> ACPI PNP device handling.

I had the same assumption before, thus I checked the difference of
platform devices and pnp devices, and found that there are three devices
enumerated to platform bus instead of PNP bus after the ACPI enumeration
rework patches. They are PNP0800, PNP0200 and PNP0C04 devices, thus I
made a debug patch to add those ids to the acpi_pnp scan handler id list
so that they will stay in PNP bus. But the problem still exists after
applying the debug patch.
> 
> The reason why you're seeing a crash with the "platform" test level is most
> likely that the _WAK control method does something unusual on your system.
> 
An easy way to check this is to apply the attached debug patch and
re-test the "platform" test level.

thanks,
rui
> The LNXSYBUS:00 thing from dmesg probably is a red herring.
> 
> I need the output of acpidump from the affected system, but please attach it
> to the bug entry at https://bugzilla.kernel.org/show_bug.cgi?id=80911 that
> Rui has created for this issue.
> 
> Also please check the list of PNP devices under
> 
> /sys/bus/pnp/devices/
> 
> before and after the commit you have found by bisection and let me know if
> there are any differences.
> 
> 
> > On Tue, Aug 12, 2014 at 9:11 AM, Markus Gutschke  
> > wrote:
> > > As I said earlier in this thread, echo'ing "devices" into "pm_test"
> > > does not result in a crash; but doing so for "platform" does.
> > >
> > > Markus
> > >
> > > On Aug 12, 2014 1:26 AM, "Zhang Rui"  wrote:
> > >>
> > >> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote:
> > >> > I am back and have physical access to the machine now.
> > >> >
> > >> great!
> > >>
> > >> > I re-ran the test just to be sure, and I can confirm that "platform"
> > >> > does in fact result in a crash.
> > >> >
> > >> what about "devices"?
> > >> I mean
> > >>
> > >> # echo devices > /sys/power/pm_test
> > >>
> > >> and see if that triggers the crash.
> > >>
> > >> > Furthermore, I ran the test that Rui asked for. I suspended, resumed,
> > >> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that
> > >> > the problem is with LNXSYBUS:00 That doesn't tell me much, but
> > >> > hopefully it makes sense to you guys.
> > >> >
> > >> [0.930093]   Magic number: 10:810:122
> > >> [0.930185] acpi LNXSYBUS:00: hash matches
> > >>
> > >> This looks weird, ACPI will do nothing for LNXSYBUS devices during
> > >> resume.
> > >> Rafael, any thought on this?
> > >>
> > >> thanks,
> > >> rui
> > >>
> 

From 1a51cb80cf581a0aa228dd82aaf45f4e250d0d59 Mon Sep 17 00:00:00 2001
From: Zhang Rui 
Date: Mon, 18 Aug 2014 09:09:07 +0800
Subject: [PATCH] 80911: Debug patch to skip _WAK in platform pm_test mode

---
 kernel/power/suspend.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 6dadb25..402f0ca 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -270,11 +270,12 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 		printk(KERN_ERR "PM: Some devices failed to power down\n");
 		goto Platform_finish;
 	}
-	error = platform_suspend_prepare_late(state);
-	if (error)
-		goto Platform_wake;
 
 	if (suspend_test(TEST_PLATFORM))
+		goto Platform_test;
+
+	error = platform_suspend_prepare_late(state);
+	if (error)
 		goto Platform_wake;
 
 	/*
@@ -319,8 +320,8 @@ static int suspend_enter(suspend_state_t state, bool *wakeup)
 
  Platform_wake:
 	platform_suspend_wake(state);
+ Platform_test:
 	dpm_resume_start(PMSG_RESUME);
-
  Platform_finish:
 	platform_suspend_finish(state);
 	return error;
-- 
1.8.3.2

rt_sigreturn rejects a substitute stack frame as invalid.

2014-08-17 Thread Steven Stewart-Gallus

Hello,

I'm not totally sure that GLibc's setcontext is safe to use in a
signal handler. So, I decided I was going to play things safe and let
rt_sigreturn switch stacks for me instead. However, rt_sigreturn seems
to reject my substitute stack frame as invalid and I'm not sure why.

Thank you,
Steven Stewart-Gallus

The code:

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

static ucontext_t alternate_context;

static char alternate_context_stack[SIGSTKSZ];

static char signal_stack[SIGSTKSZ];


static void alternate_context_func(void)
{
    puts("alternate context!");
}

static void switch_stack(int signo, siginfo_t *infop, void *untyped_ucontextp)
{
    ucontext_t * ucontextp = untyped_ucontextp;

    /* I'm not sure if setcontext is async-signal-safe so set the
     * context using the return from the signal handler.
     */

    *ucontextp = alternate_context;
#ifdef __linux__
    ucontextp->uc_mcontext.fpregs = &ucontextp->__fpregs_mem;
#endif
}

int main(void)
{
    {
        stack_t stack = { 0 };

        stack.ss_sp = signal_stack;
        stack.ss_size = sizeof signal_stack;

        sigaltstack(&stack, NULL);
    }

    getcontext(&alternate_context);
    alternate_context.uc_stack.ss_sp = alternate_context_stack;
    alternate_context.uc_stack.ss_size = sizeof alternate_context_stack;
    makecontext(&alternate_context, (void (*)(void))alternate_context_func, 0U);

    {
        struct sigaction action = { 0 };

        action.sa_sigaction = switch_stack;
        action.sa_flags = SA_SIGINFO;

        sigfillset(&action.sa_mask);

        sigaction(SIGRTMIN, &action, NULL);
    }

    raise(SIGRTMIN);

}


Re: [PATCH] unicore32: Fix build error

2014-08-17 Thread Xuetao Guan


- Guenter Roeck wrote:
> On 08/15/2014 05:45 PM, Xuetao Guan  wrote:
> >
> > - Guenter Roeck wrote:
> >> On 08/10/2014 08:29 AM, Guenter Roeck wrote:
> >>> unicore32 builds fail with
> >>>
> >>> arch/unicore32/kernel/signal.c: In function ‘setup_frame’:
> >>> arch/unicore32/kernel/signal.c:257: error:
> >>>   ‘usig’ undeclared (first use in this function)
> >>> arch/unicore32/kernel/signal.c:279: error:
> >>>   ‘usig’ undeclared (first use in this function)
> >>> arch/unicore32/kernel/signal.c: In function ‘handle_signal’:
> >>> arch/unicore32/kernel/signal.c:306: warning: unused variable ‘tsk’
> >>> arch/unicore32/kernel/signal.c: In function ‘do_signal’:
> >>> arch/unicore32/kernel/signal.c:376: error:
> >>>   implicit declaration of function ‘get_signsl’
> >>> make[1]: *** [arch/unicore32/kernel/signal.o] Error 1
> >>> make: *** [arch/unicore32/kernel/signal.o] Error 2
> >>>
> >>> Bisect points to commit 649671c90eaf ("unicore32: Use get_signal()
> >>> signal_setup_done()").
> >>>
> >>> This code never even compiled. Reverting the patch does not work,
> >>> since previously used functions no longer exist, so try to fix it up.
> >>> Compile tested only.
> >>>
> >>> Cc: Richard Weinberger 
> >>> Signed-off-by: Guenter Roeck 
> >>
> >> ping ...
> >>
> >> Failure is still present in upstream kernel (v3.16-11383-gc9d2642).
> >>
> >> Guenter
> >>
> >
> > Thanks. I'll fix it.
> >
> 
> More a question of applying (and if possible testing) the patch I provided.
> 
> Thanks,
> Guenter
> 
> 
Ok, I'll do it.

Xuetao

[PATCH v15 3/7] sparc: add pmd_[dirty|mkclean] for THP

2014-08-17 Thread Minchan Kim

MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Acked-by: David S. Miller 
Cc: sparcli...@vger.kernel.org
Signed-off-by: Minchan Kim 
---
 arch/sparc/include/asm/pgtable_64.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h 
b/arch/sparc/include/asm/pgtable_64.h
index 3770bf5c6e1b..b80a309d7e00 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -666,6 +666,13 @@ static inline unsigned long pmd_young(pmd_t pmd)
return pte_young(pte);
 }
 
+static inline int pmd_dirty(pmd_t pmd)
+{
+   pte_t pte = __pte(pmd_val(pmd));
+
+   return pte_dirty(pte);
+}
+
 static inline unsigned long pmd_write(pmd_t pmd)
 {
pte_t pte = __pte(pmd_val(pmd));
@@ -723,6 +730,15 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
return __pmd(pte_val(pte));
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+   pte_t pte = __pte(pmd_val(pmd));
+
+   pte = pte_mkclean(pte);
+
+   return __pmd(pte_val(pte));
+}
+
 static inline pmd_t pmd_mkyoung(pmd_t pmd)
 {
pte_t pte = __pte(pmd_val(pmd));
-- 
2.0.0
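
The pattern used by the helpers in this patch (reinterpret the pmd value as a
pte, apply the pte-level operation, convert back) can be tried out in
userspace. This is a minimal sketch under stated assumptions: the demo_* types
and the dirty-bit position are hypothetical and do not match sparc64's layout.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint64_t pte; } demo_pte_t;
typedef struct { uint64_t pmd; } demo_pmd_t;

/* Hypothetical dirty-bit position, chosen only for the demonstration. */
#define DEMO_DIRTY (UINT64_C(1) << 6)

static int demo_pte_dirty(demo_pte_t pte)
{
    return (pte.pte & DEMO_DIRTY) != 0;
}

static demo_pte_t demo_pte_mkclean(demo_pte_t pte)
{
    pte.pte &= ~DEMO_DIRTY;
    return pte;
}

/* Same shape as pmd_dirty() in the patch: delegate to the pte helper. */
int demo_pmd_dirty(demo_pmd_t pmd)
{
    demo_pte_t pte = { pmd.pmd };

    return demo_pte_dirty(pte);
}

/* Same shape as pmd_mkclean() in the patch: convert, clear, convert back. */
demo_pmd_t demo_pmd_mkclean(demo_pmd_t pmd)
{
    demo_pte_t pte = { pmd.pmd };

    pte = demo_pte_mkclean(pte);
    return (demo_pmd_t){ pte.pte };
}
```

Only the dirty bit changes; every other bit of the entry is passed through
untouched.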


[PATCH v15 1/7] mm: support madvise(MADV_FREE)

2014-08-17 Thread Minchan Kim

Linux has no way to free pages lazily, while other operating systems
already support this through madvise(MADV_FREE).

The gain is clear: under memory pressure the kernel can discard freed
pages rather than swapping them out or triggering OOM.

Without memory pressure, freed pages are reused by userspace without
additional overhead (e.g., page fault + allocation + zeroing).

It works as follows.

When the madvise syscall is called, the VM clears the dirty bit of the
ptes in the range. If memory pressure happens, the VM checks the dirty
bit in the page table; if it is still "clean", the page is a "lazyfree"
page, so the VM can discard it instead of swapping it out.
If there was a store to the page before the VM picks it for reclaim,
the dirty bit is set, so the VM swaps the page out instead of
discarding it.
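
From userspace, the behaviour described above would be used roughly as in the
following sketch. It is a hedged example, not part of the patch: demo_lazy_free
is a hypothetical helper, the fallback value 8 for MADV_FREE is the one used by
later mainline kernels and may differ from the value in this series, and the
code falls back to MADV_DONTNEED when the running kernel rejects MADV_FREE.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8   /* assumption: value used by later mainline kernels */
#endif

/* Map len bytes, dirty them, mark them lazily freeable, then reuse them.
 * Returns 0 on success and -1 on failure. */
int demo_lazy_free(size_t len)
{
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xaa, len);   /* dirty every page in the range */

    /* Tell the kernel the contents are no longer needed. */
    int rc = madvise(buf, len, MADV_FREE);
    if (rc != 0 && errno == EINVAL)
        rc = madvise(buf, len, MADV_DONTNEED);   /* kernel lacks MADV_FREE */

    if (rc == 0)
        buf[0] = 1;   /* a new store dirties the page again, so it is kept */

    munmap(buf, len);
    return rc == 0 ? 0 : -1;
}
```

An allocator would issue the madvise call when it returns a free run of pages
to its cache, and simply write to the range again when it hands the memory
back out.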

The first heavy users would be general-purpose allocators (e.g.,
jemalloc and tcmalloc, and hopefully glibc will support it too);
jemalloc and tcmalloc already support the feature on other OSes
(e.g., FreeBSD).

barrios@blaptop:~/benchmark/ebizzy$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 42
Stepping:              7
CPU MHz:               2801.000
BogoMIPS:              5581.64
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3

ebizzy benchmark(./ebizzy -S 10 -n 512)

                vanilla-jemalloc      MADV_free-jemalloc

1 thread
  records:      10                    10
  avg:          7682.10               15306.10
  std:          62.35(0.81%)          347.99(2.27%)
  max:          7770.00               15622.00
  min:          7598.00               14772.00

2 thread
  records:      10                    10
  avg:          12747.50              24171.00
  std:          792.06(6.21%)         895.18(3.70%)
  max:          13337.00              26023.00
  min:          10535.00              23152.00

4 thread
  records:      10                    10
  avg:          16474.60              33717.90
  std:          1496.45(9.08%)        2008.97(5.96%)
  max:          17877.00              35958.00
  min:          12224.00              29565.00

8 thread
  records:      10                    10
  avg:          16778.50              33308.10
  std:          825.53(4.92%)         1668.30(5.01%)
  max:          17543.00              36010.00
  min:          14576.00              29577.00

16 thread
  records:      10                    10
  avg:          20614.40              35516.30
  std:          602.95(2.92%)         1283.65(3.61%)
  max:          21753.00              37178.00
  min:          19605.00              33217.00

32 thread
  records:      10                    10
  avg:          22771.70              36018.50
  std:          598.94(2.63%)         1046.76(2.91%)
  max:          24035.00              37266.00
  min:          22108.00              34149.00

In summary, MADV_FREE is about 2 times faster than MADV_DONTNEED.

Cc: Michael Kerrisk 
Cc: Linux API 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: KOSAKI Motohiro 
Cc: Mel Gorman 
Cc: Jason Evans 
Acked-by: Kirill A. Shutemov 
Acked-by: Zhang Yanfei 
Acked-by: Rik van Riel 
Signed-off-by: Minchan Kim 
---
 include/linux/rmap.h                   |   9 ++-
 include/linux/vm_event_item.h          |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/madvise.c                           | 140 +
 mm/rmap.c                              |  42 +-
 mm/vmscan.c                            |  40 --
 mm/vmstat.c                            |   1 +
 7 files changed, 222 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index be574506e6a9..0ba377b97a38 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -75,6 +75,7 @@ enum ttu_flags {
TTU_UNMAP = 1,  /* unmap mode */
TTU_MIGRATION = 2,  /* migration mode */
TTU_MUNLOCK = 4,/* munlock mode */
+   TTU_FREE = 8,   /* free mode */
 
TTU_IGNORE_MLOCK = (1 << 8),/* ignore mlock */
TTU_IGNORE_ACCESS = (1 << 9),   /* don't age */
@@ -181,7 +182,8 @@ static inline void page_dup_rmap(struct page *page)
  * Called from mm/vmscan.c to handle paging out
  */
 int page_referenced(struct page *, int is_locked,
-   struct mem_cgroup *memcg, unsigned long *vm_flags);
+   struct mem_cgroup *memcg, unsigned long *vm_flags,
+   int *is_dirty);
 
 #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
 
@@ -260,9 +262,12 @@ int rmap_walk(struct page *page, struct rmap_walk_control 
*rwc);
 
 static inline int page_referenced(struct page *page, int is_loc

[PATCH v15 4/7] powerpc: add pmd_[dirty|mkclean] for THP

2014-08-17 Thread Minchan Kim

MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: linuxppc-...@lists.ozlabs.org
Reviewed-by: Aneesh Kumar K.V 
Signed-off-by: Minchan Kim 
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h 
b/arch/powerpc/include/asm/pgtable-ppc64.h
index eb9261024f51..c9a4bbe8e179 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -468,9 +468,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 
 #define pmd_pfn(pmd)   pte_pfn(pmd_pte(pmd))
 #define pmd_young(pmd) pte_young(pmd_pte(pmd))
+#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)   pte_pmd(pte_mkdirty(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
 
-- 
2.0.0


[PATCH v15 7/7] mm: Don't split THP page when syscall is called

2014-08-17 Thread Minchan Kim

We don't need to split a THP page when the MADV_FREE syscall is
called. The split can be done when the VM decides to really free the
page, so we can avoid unnecessary THP splits.

Cc: Andrea Arcangeli 
Acked-by: Kirill A. Shutemov 
Signed-off-by: Minchan Kim 
---
 include/linux/huge_mm.h |  4 
 mm/huge_memory.c        | 35 +++
 mm/madvise.c            | 21 -
 mm/rmap.c               |  8 ++--
 mm/vmscan.c             | 28 ++--
 5 files changed, 83 insertions(+), 13 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 63579cb8d3dc..25a961256d9f 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -19,6 +19,9 @@ extern struct page *follow_trans_huge_pmd(struct 
vm_area_struct *vma,
  unsigned long addr,
  pmd_t *pmd,
  unsigned int flags);
+extern int madvise_free_huge_pmd(struct mmu_gather *tlb,
+   struct vm_area_struct *vma,
+   pmd_t *pmd, unsigned long addr);
 extern int zap_huge_pmd(struct mmu_gather *tlb,
struct vm_area_struct *vma,
pmd_t *pmd, unsigned long addr);
@@ -56,6 +59,7 @@ extern pmd_t *page_check_address_pmd(struct page *page,
 unsigned long address,
 enum page_check_address_pmd_flag flag,
 spinlock_t **ptl);
+extern int pmd_freeable(pmd_t pmd);
 
 #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT)
 #define HPAGE_PMD_NR (1mmap_sem)) {
+   pr_err("%s: mmap_sem is unlocked! addr=0x%lx 
end=0x%lx vma->vm_start=0x%lx vma->vm_end=0x%lx\n",
+   __func__, addr, end,
+   vma->vm_start,
+   vma->vm_end);
+   BUG();
+   }
+#endif
+   split_huge_page_pmd(vma, addr, pmd);
+   } else if (!madvise_free_huge_pmd(tlb, vma, pmd, addr))
+   goto next;
+   /* fall through */
+   }
 
-   split_huge_page_pmd(vma, addr, pmd);
if (pmd_trans_unstable(pmd))
return 0;
 
@@ -316,6 +334,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
}
arch_leave_lazy_mmu_mode();
pte_unmap_unlock(pte - 1, ptl);
+next:
cond_resched();
return 0;
 }
diff --git a/mm/rmap.c b/mm/rmap.c
index 04c181133890..9c407576ff8e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -704,9 +704,13 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma,
referenced++;
 
/*
-* In this implmentation, MADV_FREE doesn't support THP free
+* Use pmd_freeable instead of raw pmd_dirty because on some
+* architectures pmd_dirty is not defined unless
+* CONFIG_TRANSPARENT_HUGEPAGE is enabled
 */
-   dirty++;
+   if (!pmd_freeable(*pmd))
+   dirty++;
+
spin_unlock(ptl);
} else {
pte_t *pte;
diff --g

[PATCH v15 5/7] arm: add pmd_mkclean for THP

2014-08-17 Thread Minchan Kim

MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the
contents of a THP page have been overwritten since the MADV_FREE
syscall was called.

This patch adds pmd_mkclean for THP page MADV_FREE support.

Cc: Catalin Marinas 
Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Will Deacon 
Acked-by: Steve Capper 
Signed-off-by: Minchan Kim 
---
 arch/arm/include/asm/pgtable-3level.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0f8b00..bc913a065270 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -234,6 +234,7 @@ PMD_BIT_FUNC(mkold, &= ~PMD_SECT_AF);
 PMD_BIT_FUNC(mksplitting, |= L_PMD_SECT_SPLITTING);
 PMD_BIT_FUNC(mkwrite,   &= ~L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkdirty,   |= L_PMD_SECT_DIRTY);
+PMD_BIT_FUNC(mkclean,   &= ~L_PMD_SECT_DIRTY);
 PMD_BIT_FUNC(mkyoung,   |= PMD_SECT_AF);
 
 #define pmd_mkhuge(pmd)(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
-- 
2.0.0


[PATCH v15 2/7] x86: add pmd_[dirty|mkclean] for THP

2014-08-17 Thread Minchan Kim

MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the
contents of a THP page have been overwritten since the MADV_FREE
syscall was called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Acked-by: Zhang Yanfei 
Acked-by: Kirill A. Shutemov 
Signed-off-by: Minchan Kim 
---
 arch/x86/include/asm/pgtable.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0ec056012618..329865799653 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -104,6 +104,11 @@ static inline int pmd_young(pmd_t pmd)
return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
+static inline int pmd_dirty(pmd_t pmd)
+{
+   return pmd_flags(pmd) & _PAGE_DIRTY;
+}
+
 static inline int pte_write(pte_t pte)
 {
return pte_flags(pte) & _PAGE_RW;
@@ -267,6 +272,11 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
return pmd_clear_flags(pmd, _PAGE_ACCESSED);
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+   return pmd_clear_flags(pmd, _PAGE_DIRTY);
+}
+
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
return pmd_clear_flags(pmd, _PAGE_RW);
-- 
2.0.0
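
For readers who want to see the pattern outside the kernel, here is a minimal userspace sketch of the test-and-clear helpers this patch adds. The type toy_pmd_t, the TOY_PAGE_DIRTY mask and every toy_* function are hypothetical stand-ins, not kernel APIs; the bit position assumes x86, where the dirty flag is bit 6 of the entry.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's pmd_t. */
typedef struct { uint64_t val; } toy_pmd_t;

/* Assumption: dirty flag at bit 6, as on x86. */
#define TOY_PAGE_DIRTY (UINT64_C(1) << 6)

/* Non-zero if the entry has been written to since it was last cleaned. */
static inline int toy_pmd_dirty(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_DIRTY) != 0;
}

/* Return a copy of the entry with the dirty flag cleared. */
static inline toy_pmd_t toy_pmd_mkclean(toy_pmd_t pmd)
{
	pmd.val &= ~TOY_PAGE_DIRTY;
	return pmd;
}

/* Return a copy of the entry with the dirty flag set (models a write). */
static inline toy_pmd_t toy_pmd_mkdirty(toy_pmd_t pmd)
{
	pmd.val |= TOY_PAGE_DIRTY;
	return pmd;
}

/* A lazily freed page may be discarded only if nobody wrote to it again. */
static inline int toy_can_discard(toy_pmd_t pmd)
{
	return !toy_pmd_dirty(pmd);
}
```

The idea is that the hint path cleans the entry, a later write sets the dirty flag again, and reclaim checks the flag before throwing the page away.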


[PATCH v15 6/7] arm64: add pmd_[dirty|mkclean] for THP

2014-08-17 Thread Minchan Kim

MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the
contents of a THP page have been overwritten since the MADV_FREE
syscall was called.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Russell King 
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Will Deacon 
Acked-by: Steve Capper 
Acked-by: Catalin Marinas 
Signed-off-by: Minchan Kim 
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ffe1ba0506d1..efb1b2fc4d39 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -259,10 +259,12 @@ static inline pmd_t pte_pmd(pte_t pte)
 #endif
 
 #define pmd_young(pmd) pte_young(pmd_pte(pmd))
+#define pmd_dirty(pmd) pte_dirty(pmd_pte(pmd))
 #define pmd_wrprotect(pmd) pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mksplitting(pmd)   pte_pmd(pte_mkspecial(pmd_pte(pmd)))
 #define pmd_mkold(pmd) pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)   pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)   pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)   pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)   pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mknotpresent(pmd)  (__pmd(pmd_val(pmd) & ~PMD_TYPE_MASK))
-- 
2.0.0


[PATCH v15 0/7] MADV_FREE support

2014-08-17 Thread Minchan Kim

This patchset enables the MADV_FREE hint for the madvise syscall, which
other OSes already support. [PATCH 1] includes the details.

[1] supports MADV_FREE for !THP pages, so if the VM encounters a
THP page in syscall context, it splits the THP page.
[2-6] prepare for calling the madvise syscall without THP splitting.
[7] enables THP page support for MADV_FREE.
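
From userspace, the hint might be used roughly as follows. This is a sketch under assumptions: the fallback value 8 for MADV_FREE is the one mainline Linux later assigned in the generic mman header and is not necessarily the value used by this patchset, and lazy_free_roundtrip is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE 1
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: generic value from later mainline headers; may differ per arch. */
#ifndef MADV_FREE
#define MADV_FREE 8
#endif

/*
 * Map an anonymous region, dirty it, tell the kernel the contents are
 * disposable, then write to it again.
 *
 * Returns 1 if the kernel accepted the hint, 0 if it rejected the hint
 * (for example an older kernel), and -1 on any other failure.
 */
int lazy_free_roundtrip(size_t len)
{
	volatile unsigned char *buf;
	int accepted;
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (map == MAP_FAILED)
		return -1;

	memset(map, 0xaa, len);

	/* The hint is advisory: a failure leaves the mapping untouched. */
	accepted = (madvise(map, len, MADV_FREE) == 0);

	/*
	 * After the hint the old contents may or may not still be there,
	 * but a fresh write must stick.
	 */
	buf = map;
	buf[0] = 0x55;
	if (buf[0] != 0x55) {
		munmap(map, len);
		return -1;
	}

	munmap(map, len);
	return accepted;
}
```

Only data written after the hint is guaranteed to survive, which is why the sketch checks the byte it has just rewritten and nothing else.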

* from v14
 * Add more Acked-by from arch people (sparc, arm64 and arm)
 * Drop s390 since pmd_dirty/clean was merged

* from v13
 * Add more Acked-by from arch people (arm, arm64 and ppc)
 * Rebased on mmotm 2014-08-13-14-29

* from v12
 * Fix - skip to mark free pte on try_to_free_swap failed page - Kirill
 * Add more Acked-by from arch maintainers and Kirill

* From v11
 * Fix arm build - Steve
 * Separate patch for arm and arm64 - Steve
 * Remove unnecessary check - Kirill
 * Skip non-vm_normal page - Kirill
 * Add Acked-by - Zhang
 * Sparc64 build fix
 * Pagetable walker THP handling fix

* From v10
 * Add Acked-by from arch stuff(x86, s390)
 * Pagewalker based pagetable working - Kirill
 * Fix try_to_unmap_one broken with hwpoison - Kirill
 * Use VM_BUG_ON_PAGE in madvise_free_pmd - Kirill
 * Fix pgtable-3level.h for arm - Steve

* From v9
 * Add Acked-by - Rik
 * Add THP page support - Kirill

* From v8
 * Rebased-on v3.16-rc2-mmotm-2014-06-25-16-44

* From v7
 * Rebased-on next-20140613

* From v6
 * Remove page from swapcache at syscall time
 * Move utility functions from memory.c to madvise.c - Johannes
 * Rename utility functions - Johannes
 * Remove unnecessary checks from vmscan.c - Johannes
 * Rebased-on v3.15-rc5-mmotm-2014-05-16-16-56
 * Drop Reviewed-by because there were some changes since then.

* From v5
 * Fix PPC problem that doesn't flush the TLB - Rik
 * Remove unnecessary lazyfree_range stub function - Rik
 * Rebased on v3.15-rc5

* From v4
 * Add Reviewed-by: Zhang Yanfei
 * Rebase on v3.15-rc1-mmotm-2014-04-15-16-14

* From v3
 * Add "how to work part" in description - Zhang
 * Add page_discardable utility function - Zhang
 * Clean up

* From v2
 * Remove forceful dirty marking of swap-readed page - Johannes
 * Remove deactivation logic of lazyfreed page
 * Rebased on 3.14
 * Remove RFC tag

* From v1
 * Use custom page table walker for madvise_free - Johannes
 * Remove PG_lazypage flag - Johannes
 * Do madvise_dontneed instead of madvise_free in swapless system

Minchan Kim (7):
  mm: support madvise(MADV_FREE)
  x86: add pmd_[dirty|mkclean] for THP
  sparc: add pmd_[dirty|mkclean] for THP
  powerpc: add pmd_[dirty|mkclean] for THP
  arm: add pmd_mkclean for THP
  arm64: add pmd_[dirty|mkclean] for THP
  mm: Don't split THP page when syscall is called

 arch/arm/include/asm/pgtable-3level.h|   1 +
 arch/arm64/include/asm/pgtable.h |   2 +
 arch/powerpc/include/asm/pgtable-ppc64.h |   2 +
 arch/sparc/include/asm/pgtable_64.h  |  16 
 arch/x86/include/asm/pgtable.h   |  10 ++
 include/linux/huge_mm.h  |   4 +
 include/linux/rmap.h |   9 +-
 include/linux/vm_event_item.h|   1 +
 include/uapi/asm-generic/mman-common.h   |   1 +
 mm/huge_memory.c |  35 +++
 mm/madvise.c | 159 +++
 mm/rmap.c|  46 -
 mm/vmscan.c  |  64 +
 mm/vmstat.c  |   1 +
 14 files changed, 331 insertions(+), 20 deletions(-)

-- 
2.0.0


Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Nicolas Pitre

On Sun, 17 Aug 2014, Russell King - ARM Linux wrote:

> On Sun, Aug 17, 2014 at 03:04:34PM -0400, Jason Cooper wrote:
> > Quoting Nico:
> > 
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> > 
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> > 
> > Nico's Reviewed-by with that comment was sent August 13th.  The most
> > recent activity on this thread was also August 13th.  After four days, I
> > reasoned there were no objections to his comment.
> 
> Right, during the merge window, and during merge windows, I tend to
> ignore almost all email now because people don't stop developing, and
> they don't take any notice where the mainline cycle is.  In fact, I go
> off and do non-kernel work during a merge window and only briefly scan
> for bug fixes.
> 
> However, I have other concerns with this patch, which I've yet to air.
> For example, I don't like this crappy conditional locking that people
> keep dreaming up - that kind of stuff makes the kernel much harder to
> statically check that everything is correct.  It's an anti-lockdep
> strategy.
> 
> Secondly, I don't like this:
> 
> +   raw_spin_lock(&gic_sgi_lock);
> +   /*
> +* Ensure that the gic_cpu_map update above is seen in
> +* gic_raise_softirq() before we redirect any pending SGIs that
> +* may have been raised for the outgoing CPU (cur_cpu_id)
> +*/
> +   smp_mb__after_unlock_lock();
> +   raw_spin_unlock(&gic_sgi_lock);
> 
> That goes against the principle of locking, that you lock the data,
> not the code.

I admit I didn't understand the point of that construct on the first 
read.  Maybe I wouldn't be the only one.  Using Stephen's initial 
version for that hunk would be preferable as it is straightforward and 
would mean locking the data instead.

> I have no problem with changing gic_raise_softirq() to use a different
> lock, which gic_migrate_target(), and gic_set_affinity() can also use.
> There's no need for horrid locking here, because the only thing we're
> protecting is gic_map[] and the write to the register to trigger an
> IPI - and nothing using gic_arch_extn has any business knowing about
> SGIs.
> 
> No need for these crappy sgi_map_lock() macros and all the ifdeffery.

Those macros are there only to conditionalize the locking in 
gic_raise_softirq() because no locking whatsoever is needed there when 
gic_migrate_target() is configured out.  I suggested the macros to cut 
down on the #ifdefery in the code.


Nicolas

RE: [PATCH 2/3] smp: re-implement the kick_all_cpus_sync() with wake_up_if_idle()

2014-08-17 Thread Liu, Chuansheng

Hello Andy,

> -Original Message-
> From: Andy Lutomirski [mailto:l...@amacapital.net]
> Sent: Friday, August 15, 2014 11:41 PM
> To: Liu, Chuansheng
> Cc: Peter Zijlstra; Daniel Lezcano; Rafael J. Wysocki; Ingo Molnar;
> linux...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, Changcheng;
> Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH 2/3] smp: re-implement the kick_all_cpus_sync() with
> wake_up_if_idle()
> 
> On Fri, Aug 15, 2014 at 12:01 AM, Chuansheng Liu
>  wrote:
> > Currently using smp_call_function() just woke up the corresponding
> > cpu, but can not break the polling idle loop.
> >
> > Here using the new sched API wake_up_if_idle() to implement it.
> 
> kick_all_cpus_sync has other callers, and those other callers want the
> old behavior.  I think this should be a new function.
> 
Yes, it seems some current users of kick_all_cpus_sync() do need the IPI;
I will send out patch V2 with a new function.

Re: [PATCH v14 5/8] s390: add pmd_[dirty|mkclean] for THP

2014-08-17 Thread Minchan Kim

Hello,

On Thu, Aug 14, 2014 at 09:16:14AM +0200, Martin Schwidefsky wrote:
> On Thu, 14 Aug 2014 10:53:29 +0900
> Minchan Kim  wrote:
> 
> > MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent
> > overwrite of the contents since MADV_FREE syscall is called for
> > THP page but for s390 pmds only referenced bit is available
> > because there is no free bit left in the pmd entry for the
> > software dirty bit so this patch adds dumb pmd_dirty which
> > returns always true by suggesting by Martin.
> > 
> > They finally find a solution in future.
> > http://marc.info/?l=linux-api&m=140440328820808&w=2
> 
> The solution is already there, see git commit 152125b7a882df36.
> You can drop this patch.

Thanks for the heads up. I will drop it in next spin.
> 
> -- 
> blue skies,
>Martin.
> 
> "Reality continues to ruin my life." - Calvin.
> 

-- 
Kind regards,
Minchan Kim

Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler

2014-08-17 Thread Nicolas Pitre

On Sun, 17 Aug 2014, Jason Cooper wrote:

> Russell,
> 
> On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > Applied to irqchip/urgent with Nico's Ack.
> > 
> > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > yes, that's great.
> 
> Quoting Nico:
> 
> "Of course it would be good to clarify things wrt Russell's remark
> independently from this patch."
> 
> I took 'independently' to mean "This patch is ok, *and* we need to
> address Russell's concerns in a follow-up patch."
> 
> Nico's Reviewed-by with that comment was sent August 13th.  The most
> recent activity on this thread was also August 13th.  After four days, I
> reasoned there were no objections to his comment.

Well... I mentioned this patch is a nice cleanup independently of the 
reason why it was created in the first place.  Maybe that shouldn't be 
sorted as "urgent" in that case, especially when the code having problem 
with the current state of things is living out of mainline.


Nicolas

Re: [PATCH 3/3] zram: add mem_used_max via sysfs

2014-08-17 Thread Minchan Kim

On Thu, Aug 14, 2014 at 11:32:36AM -0400, David Horner wrote:
> On Thu, Aug 14, 2014 at 11:09 AM, Dan Streetman  wrote:
> > On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim  wrote:
> >> -   if (zram->limit_bytes &&
> >> -   zs_get_total_size_bytes(meta->mem_pool) > 
> >> zram->limit_bytes) {
> >> +   total_bytes = zs_get_total_size_bytes(meta->mem_pool);
> >> +   if (zram->limit_bytes && total_bytes > zram->limit_bytes) {
> >
> > do you need to take the init_lock to read limit_bytes here?  It could
> > be getting changed between these checks...
> 
> There is no real danger in freeing with an error.
> It is more timing than a race.
> 
> The max calculation is still ok because committed allocations are
> added atomically.

There is one problem in the code below.

zram->max_used_bytes = max(zram->max_used_bytes, total_bytes);

max() is a compare followed by a store, so we should consider this case.

if (zram->max_used_bytes < total_bytes)
zram->max_used_bytes = total_bytes;

And the following sequence is possible.


if (zram->max_used_bytes < total_bytes)
/* IRQ happens here */
zram->max_used_bytes = total_bytes;

During the IRQ, another CPU could consume a lot of zsmalloc memory, so
zram->max_used_bytes could be increased under our feet; when the IRQ
finishes, zram->max_used_bytes would be overwritten with the old, smaller
total_bytes.

To prevent that, we should use either the lock from the RFC version I
posted or retry logic with an atomic operation (i.e., cmpxchg). My approach
is to keep it simple first and fix it if we see trouble in the future, so
my preference is a new spin lock at the moment.

Any comments?
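
For comparison, the cmpxchg-style retry mentioned above could look roughly like the sketch below. It is written with userspace C11 atomics rather than the kernel's atomic64 API, and update_max_used is a hypothetical name, so treat it as an illustration of the retry idea and not as the proposed zram code.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Raise *max to total unless a larger value is already stored.
 *
 * The compare and the store happen as one atomic step, so a concurrent
 * updater that stores a bigger value between our load and our store makes
 * the exchange fail instead of being silently overwritten.
 */
void update_max_used(_Atomic uint64_t *max, uint64_t total)
{
	uint64_t old = atomic_load(max);

	/* On failure the exchange reloads old with the current value. */
	while (old < total &&
	       !atomic_compare_exchange_weak(max, &old, total))
		;
}
```

The loop ends either when the store succeeds or when another updater has already published a value at least as large as total, so the recorded maximum never goes backwards.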




-- 
Kind regards,
Minchan Kim

Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning

2014-08-17 Thread Dave Chinner

On Fri, Aug 15, 2014 at 01:58:09PM -0400, Waiman Long wrote:
> On 08/14/2014 11:34 PM, Dave Chinner wrote:
> >
> >
> >xfs_io -f -c "truncate 500t" -c "extsize 1m" /path/to/vm/image/file

> 
> Thank for the testing recipe. I am afraid that I can't find a 500TB
> SSD for testing purpose.

Which bit of "sparse vm image file" didn't you understand?  I'm
using a 400GB SSD for this testing:

$ df -h /mnt/fast-ssd
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf        400G  275G  125G  69% /mnt/fast-ssd
$ ls -lh /mnt/fast-ssd/vm-500t.img
-rw--- 1 root root 500T Aug 15 13:21 /mnt/fast-ssd/vm-500t.img
$ du -sh /mnt/fast-ssd/vm-500t.img
275G    /mnt/fast-ssd/vm-500t.img

That is on a Samsung 840 EVO SSD, which just about everyone should
be able to obtain. Do you *really* think I have 500TB of SSDs lying
around?
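
The difference between apparent size and allocated space shown above can be reproduced with a few lines of C. This is a sketch only: sparse_file_usage is a hypothetical helper, it uses a temporary file under /tmp (an assumption about the test machine), and it relies on the filesystem supporting sparse files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, extend it to size bytes without writing any
 * data, and report both the apparent size and the space actually allocated.
 * Returns 0 on success, -1 on failure.
 */
int sparse_file_usage(off_t size, off_t *apparent, off_t *allocated)
{
	char path[] = "/tmp/sparse-XXXXXX";
	struct stat st;
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file disappears once the descriptor is closed */

	if (ftruncate(fd, size) != 0 || fstat(fd, &st) != 0) {
		close(fd);
		return -1;
	}

	*apparent = st.st_size;				/* what ls -l reports */
	*allocated = (off_t)st.st_blocks * 512;		/* what du reports */

	close(fd);
	return 0;
}
```

Calling it with a 1 GiB size should report an apparent size of 1 GiB while the allocated figure stays far smaller, which is the same effect as the 500T image occupying 275G.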

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()

2014-08-17 Thread Luis R. Rodriguez

On Fri, Aug 15, 2014 at 10:29:50AM +0100, Al Viro wrote:
> On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
> 
> > Christoph had noted that this seemed associated to the problem
> > that the btrfs uses different assignments for st_dev than s_dev,
> > but much as I'd like to see that changed based on discussions so
> > far its unclear if this is going to be possible unless strong
> > commitment is reached.
> 
> Explain, please.  Whose commitment and commitment to what, exactly?

Two groups: the btrfs developers, and the VFS maintainers, who would
need to provide proper guidance.

> Having different ->st_dev values for different files on the same
> fs is a bloody bad idea; why does btrfs do that at all?

With the disclaimer that I'm new to btrfs: as far as I can see, it's been
done to help cope with the copy-on-write mechanism, but I welcome btrfs
folks to chime in if there are other reasons this was done from an
architectural point of view.

Provided all the reasons why this was done are clarified, what we'd need
then is proper guidance on what *would* be a much more reasonable
strategy to do what was desired, and finally commitment from btrfs
folks to switch btrfs to this newly agreed-upon strategy.

> If nothing else,
> it breaks the usual "are those two files on the same fs?" tests...

It would seem that those tests need more context now with copy
on write, even the notion of disk space is all fucked up now, we
need to think of it in terms of different possibilities that the
new filesystems allow us to share data and different outcomes that
could be possible.

  Luis

Re: [PATCH 3/3] zram: add mem_used_max via sysfs

2014-08-17 Thread Minchan Kim

Hello Dan,

On Thu, Aug 14, 2014 at 11:09:05AM -0400, Dan Streetman wrote:
> On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim  wrote:
> > Normally, zram user can get maximum memory zsmalloc consumed via
> > polling mem_used_total with sysfs in userspace.
> >
> > But it has a critical problem because user can miss peak memory
> > usage during update interval of polling. For avoiding that,
> > user should poll it frequently with mlocking to avoid delay
> > when memory pressure is heavy so it would be handy if the
> > kernel supports the function.
> >
> > This patch adds mem_used_max via sysfs.
> >
> > Signed-off-by: Minchan Kim 
> > ---
> >  Documentation/blockdev/zram.txt |  1 +
> >  drivers/block/zram/zram_drv.c   | 35 +--
> >  drivers/block/zram/zram_drv.h   |  2 ++
> >  3 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/blockdev/zram.txt 
> > b/Documentation/blockdev/zram.txt
> > index 9f239ff8c444..3b2247c2d4cf 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -107,6 +107,7 @@ size of the disk when not in use so a huge zram is 
> > wasteful.
> > orig_data_size
> > compr_data_size
> > mem_used_total
> > +   mem_used_max
> >
> >  8) Deactivate:
> > swapoff /dev/zram0
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index b48a3d0e9031..311699f18bd5 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -109,6 +109,30 @@ static ssize_t mem_used_total_show(struct device *dev,
> > return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
> >  }
> >
> > +static ssize_t mem_used_max_reset(struct device *dev,
> > +   struct device_attribute *attr, const char *buf, size_t len)
> > +{
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   down_write(&zram->init_lock);
> > +   zram->max_used_bytes = 0;
> > +   up_write(&zram->init_lock);
> > +   return len;
> > +}
> > +
> > +static ssize_t mem_used_max_show(struct device *dev,
> > +   struct device_attribute *attr, char *buf)
> > +{
> > +   u64 max_used_bytes;
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   down_read(&zram->init_lock);
> > +   max_used_bytes = zram->max_used_bytes;
> > +   up_read(&zram->init_lock);
> > +
> > +   return scnprintf(buf, PAGE_SIZE, "%llu\n", max_used_bytes);
> > +}
> > +
> >  static ssize_t max_comp_streams_show(struct device *dev,
> > struct device_attribute *attr, char *buf)
> >  {
> > @@ -474,6 +498,7 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > struct zram_meta *meta = zram->meta;
> > struct zcomp_strm *zstrm;
> > bool locked = false;
> > +   u64 total_bytes;
> >
> > page = bvec->bv_page;
> > if (is_partial_io(bvec)) {
> > @@ -543,8 +568,8 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > goto out;
> > }
> >
> > -   if (zram->limit_bytes &&
> > -   zs_get_total_size_bytes(meta->mem_pool) > 
> > zram->limit_bytes) {
> > +   total_bytes = zs_get_total_size_bytes(meta->mem_pool);
> > +   if (zram->limit_bytes && total_bytes > zram->limit_bytes) {
> 
> do you need to take the init_lock to read limit_bytes here?  It could
> be getting changed between these checks...

zram_bvec_write is protected by the read side of init_lock, while
mem_limit_store is protected by the write side of init_lock.

> 
> > zs_free(meta->mem_pool, handle);
> > ret = -ENOMEM;
> > goto out;
> > @@ -578,6 +603,8 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > /* Update stats */
> > atomic64_add(clen, &zram->stats.compr_data_size);
> > atomic64_inc(&zram->stats.pages_stored);
> > +
> > +   zram->max_used_bytes = max(zram->max_used_bytes, total_bytes);
> 
> shouldn't max_used_bytes be atomic64_t?  Or take the init_lock here?
> 
> >  out:
> > if (locked)
> > zcomp_strm_release(zram->comp, zstrm);
> > @@ -656,6 +683,7 @@ static void zram_reset_device(struct zram *zram, bool 
> > reset_capacity)
> > down_write(&zram->init_lock);
> >
> > zram->limit_bytes = 0;
> > +   zram->max_used_bytes = 0;
> >
> > if (!init_done(zram)) {
> > up_write(&zram->init_lock);
> > @@ -897,6 +925,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, 
> > NULL);
> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >  static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
> > +static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show,
> > +   mem_used_max_reset);
> >  st

Re: [PATCH 3/3] zram: add mem_used_max via sysfs

2014-08-17 Thread Minchan Kim

Hi David,

On Thu, Aug 14, 2014 at 06:29:17AM -0400, David Horner wrote:
> The introduction of a reset can cause the stale zero value to be
> retained in the show.
> Instead reset to current value.

That's better. I will do that.
Thanks!

> 
> On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim  wrote:
> > Normally, zram user can get maximum memory zsmalloc consumed via
> > polling mem_used_total with sysfs in userspace.
> >
> > But it has a critical problem because user can miss peak memory
> > usage during update interval of polling. For avoiding that,
> > user should poll it frequently with mlocking to avoid delay
> > when memory pressure is heavy so it would be handy if the
> > kernel supports the function.
> >
> > This patch adds mem_used_max via sysfs.
> >
> > Signed-off-by: Minchan Kim 
> > ---
> >  Documentation/blockdev/zram.txt |  1 +
> >  drivers/block/zram/zram_drv.c   | 35 +--
> >  drivers/block/zram/zram_drv.h   |  2 ++
> >  3 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/blockdev/zram.txt 
> > b/Documentation/blockdev/zram.txt
> > index 9f239ff8c444..3b2247c2d4cf 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -107,6 +107,7 @@ size of the disk when not in use so a huge zram is 
> > wasteful.
> > orig_data_size
> > compr_data_size
> > mem_used_total
> > +   mem_used_max
> >
> >  8) Deactivate:
> > swapoff /dev/zram0
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index b48a3d0e9031..311699f18bd5 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -109,6 +109,30 @@ static ssize_t mem_used_total_show(struct device *dev,
> > return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
> >  }
> >
> > +static ssize_t mem_used_max_reset(struct device *dev,
> > +   struct device_attribute *attr, const char *buf, size_t len)
> 
> perhaps these are local functions, but wouldn't the zs_ prefix still
> be appropriate?
> > +{
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   down_write(&zram->init_lock);
> > +   zram->max_used_bytes = 0;
> 
>zram->max_used_bytes = zs_get_total_size_bytes(meta->mem_pool);
> 
>(where meta is set up as below  (beyond my skill level at
> the moment)).
> 
> > +   up_write(&zram->init_lock);
> > +   return len;
> > +}
> > +
> > +static ssize_t mem_used_max_show(struct device *dev,
> > +   struct device_attribute *attr, char *buf)
> > +{
> > +   u64 max_used_bytes;
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   down_read(&zram->init_lock);
> 
> if these are atomic operations, why the (read and write) locks?
> 
> > +   max_used_bytes = zram->max_used_bytes;
> > +   up_read(&zram->init_lock);
> > +
> > +   return scnprintf(buf, PAGE_SIZE, "%llu\n", max_used_bytes);
> > +}
> > +
> >  static ssize_t max_comp_streams_show(struct device *dev,
> > struct device_attribute *attr, char *buf)
> >  {
> > @@ -474,6 +498,7 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > struct zram_meta *meta = zram->meta;
> > struct zcomp_strm *zstrm;
> > bool locked = false;
> > +   u64 total_bytes;
> >
> > page = bvec->bv_page;
> > if (is_partial_io(bvec)) {
> > @@ -543,8 +568,8 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > goto out;
> > }
> >
> > -   if (zram->limit_bytes &&
> > -   zs_get_total_size_bytes(meta->mem_pool) > 
> > zram->limit_bytes) {
> > +   total_bytes = zs_get_total_size_bytes(meta->mem_pool);
> > +   if (zram->limit_bytes && total_bytes > zram->limit_bytes) {
> > zs_free(meta->mem_pool, handle);
> > ret = -ENOMEM;
> > goto out;
> > @@ -578,6 +603,8 @@ static int zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index,
> > /* Update stats */
> > atomic64_add(clen, &zram->stats.compr_data_size);
> > atomic64_inc(&zram->stats.pages_stored);
> > +
> > +   zram->max_used_bytes = max(zram->max_used_bytes, total_bytes);
> >  out:
> > if (locked)
> > zcomp_strm_release(zram->comp, zstrm);
> > @@ -656,6 +683,7 @@ static void zram_reset_device(struct zram *zram, bool 
> > reset_capacity)
> > down_write(&zram->init_lock);
> >
> > zram->limit_bytes = 0;
> > +   zram->max_used_bytes = 0;
> >
> > if (!init_done(zram)) {
> > up_write(&zram->init_lock);
> > @@ -897,6 +925,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, 
> > NULL);
> >  static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store);
> >  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
> >  static DEVICE_AT

Re: [RFC 3/3] zram: limit memory size for zram

2014-08-17 Thread Minchan Kim

Hello Sergey,

On Thu, Aug 14, 2014 at 10:29:53PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (08/14/14 08:27), Minchan Kim wrote:
> > Date: Thu, 14 Aug 2014 08:27:19 +0900
> > From: Minchan Kim 
> > To: Sergey Senozhatsky 
> > Cc: linux...@kvack.org, Jerome Marchand ,
> >  linux-kernel@vger.kernel.org, juno.c...@lge.com, seungho1.p...@lge.com,
> >  Luigi Semenzato , Nitin Gupta 
> > Subject: Re: [RFC 3/3] zram: limit memory size for zram
> > User-Agent: Mutt/1.5.21 (2010-09-15)
> > 
> > Hey Sergey,
> > 
> > On Tue, Aug 05, 2014 at 10:16:15PM +0900, Sergey Senozhatsky wrote:
> > > Hello,
> > > 
> > > On (08/05/14 18:48), Minchan Kim wrote:
> > > > Another idea: we could define void zs_limit_mem(unsinged long nr_pages)
> > > > in zsmalloc and put the limit in zs_pool via new API from zram so that
> > > > zs_malloc could be failed as soon as it exceeds the limit.
> > > > 
> > > > In the end, zram doesn't need to call zs_get_total_size_bytes on every
> > > > write. It's more clean and right layer, IMHO.
> > > 
> > > yes, I think this one is better.
> > 
> > Although I suggested this new approach, a few days ago I changed my mind
> > and have been testing a new patchset.
> > 
> > If we add a new API to zsmalloc, it adds unnecessary overhead for users who
> > don't care about the limit. Although it's cheap, I'd like to avoid that.
> > 
> > zsmalloc is just an allocator, so anybody can use it if they want.
> > But the limit is a requirement of zram, which is only one of the potential
> > clients of zsmalloc, so the accounting should be in zram,
> > not zsmalloc.
> > 
> 
> my motivation was that zram does not use that much memory itself;
> zspool does. zram is just a clueless client from that point of
> view: it receives some requests, does some things with the supplied data,
> and asks zspool whether it can find some place to keep that
> data (and zram doesn't really care how that memory is allocated,
> or whether it is allocated at all).

Normally, when we consider malloc(3), it doesn't provide any API
to limit memory size for the process. It just exposes some API
(e.g., mallopt) that returns state to the user, so it's the user's role
to manage the memory. I think it's the same with zsmalloc.
zsmalloc already exposes zs_get_total_size_bytes, so a client can use it
if it wants a limit. The cost of frequent API calls (e.g., zs_get_total_size_bytes)
is then paid by that client, while others who don't need a limit
pay no overhead.
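
The split described above can be sketched in user space. This is only an
illustration under assumptions: mock_pool, mock_client, pool_alloc and
client_store are hypothetical names, not zram or zsmalloc code, and treating
a limit of 0 as "no limit" is an assumption as well. The allocator only
reports usage; the client that wants a limit does the check and pays for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an allocator pool that only reports its usage. */
struct mock_pool {
    uint64_t total_bytes;   /* bytes currently handed out */
};

/* Hypothetical client (plays the role of zram) that owns the limit. */
struct mock_client {
    struct mock_pool pool;
    uint64_t limit_bytes;   /* assumption: 0 means "no limit" */
};

/* Usage query, analogous in spirit to zs_get_total_size_bytes(). */
static uint64_t pool_total_bytes(const struct mock_pool *pool)
{
    return pool->total_bytes;
}

/* The allocator has no limit branch, so unlimited users pay nothing. */
static void pool_alloc(struct mock_pool *pool, uint64_t size)
{
    pool->total_bytes += size;
}

/* The client checks its own limit before asking the pool for memory. */
static bool client_store(struct mock_client *c, uint64_t size)
{
    if (c->limit_bytes &&
        pool_total_bytes(&c->pool) + size > c->limit_bytes)
        return false;   /* the write fails, nothing is allocated */

    pool_alloc(&c->pool, size);
    return true;
}
```

A client that never sets limit_bytes skips the usage query entirely, which is
the "no overhead for others" property argued for above.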

> 
> I'm OK with having the memory limit in zram, though conceptually,
> IMHO, it feels like such logic belongs in the allocation layer. Yet I admit
> the potential overhead issue.
> 
> > If we have more users of zsmalloc in the future and they all want this
> > feature to limit zsmalloc memory usage, we could move the feature
> > from the client to the zsmalloc core, so everybody would be happy with performance
> > and readability, but the opposite would be painful.
> > 
> > In summary, let's keep the accounting logic on the client side of zsmalloc (i.e.,
> > zram) for now; we could possibly move it into the zsmalloc core
> > in the future.
> > 
> > Any thoughts?
> 
> agreed.

Thanks for the comment, Sergey!

> 
>   -ss
> 
> > > 
> > >   -ss
> > > 
> > > > On Tue, Aug 05, 2014 at 05:02:03PM +0900, Minchan Kim wrote:
> > > > > I have received this request several times from zram users.
> > > > > They want to limit the memory size for zram because, without a limit,
> > > > > zram can consume a lot of system memory, which makes memory management
> > > > > control hard.
> > > > > 
> > > > > This patch adds a new knob to limit the memory of zram.
> > > > > 
> > > > > Signed-off-by: Minchan Kim 
> > > > > ---
> > > > >  Documentation/blockdev/zram.txt |  1 +
> > > > >  drivers/block/zram/zram_drv.c   | 41 
> > > > > +
> > > > >  drivers/block/zram/zram_drv.h   |  1 +
> > > > >  3 files changed, 43 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/blockdev/zram.txt 
> > > > > b/Documentation/blockdev/zram.txt
> > > > > index d24534bee763..fcb0561dfe2e 100644
> > > > > --- a/Documentation/blockdev/zram.txt
> > > > > +++ b/Documentation/blockdev/zram.txt
> > > > > @@ -96,6 +96,7 @@ size of the disk when not in use so a huge zram is 
> > > > > wasteful.
> > > > >   compr_data_size
> > > > >   mem_used_total
> > > > >   mem_used_max
> > > > > + mem_limit
> > > > >  
> > > > >  7) Deactivate:
> > > > >   swapoff /dev/zram0
> > > > > diff --git a/drivers/block/zram/zram_drv.c 
> > > > > b/drivers/block/zram/zram_drv.c
> > > > > index a4d637b4db7d..47f68bbb2c44 100644
> > > > > --- a/drivers/block/zram/zram_drv.c
> > > > > +++ b/drivers/block/zram/zram_drv.c
> > > > > @@ -137,6 +137,37 @@ static ssize_t max_comp_streams_show(struct 
> > > > > device *dev,
> > > > >   return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> > > > >  }
> > > > >  
> > > > > +static ssize_t mem_limit_show(struct device *dev,
> > > > > + struct device_attri

Re: [PATCH 2/2] zram: limit memory size for zram

2014-08-17 Thread Minchan Kim

Hi Dan,

On Thu, Aug 14, 2014 at 10:33:29AM -0400, Dan Streetman wrote:
> On Wed, Aug 13, 2014 at 8:57 PM, Minchan Kim  wrote:
> > Since zram has no control feature to limit memory usage,
> > it makes it hard to manage system memory.
> >
> > This patch adds new knob "mem_limit" via sysfs to set up the
> > limit.
> >
> > Note: I added the logic in zram, not zsmalloc because the limit
> > is requirement of zram, not zsmalloc so I'd like to avoid
> > unnecessary branch in zsmalloc.
> >
> > Signed-off-by: Minchan Kim 
> > ---
> >  Documentation/blockdev/zram.txt | 20 +++
> >  drivers/block/zram/zram_drv.c   | 43 
> > +
> >  drivers/block/zram/zram_drv.h   |  1 +
> >  3 files changed, 60 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/blockdev/zram.txt 
> > b/Documentation/blockdev/zram.txt
> > index 0595c3f56ccf..9f239ff8c444 100644
> > --- a/Documentation/blockdev/zram.txt
> > +++ b/Documentation/blockdev/zram.txt
> > @@ -74,14 +74,26 @@ There is little point creating a zram of greater than 
> > twice the size of memory
> >  since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of 
> > the
> >  size of the disk when not in use so a huge zram is wasteful.
> >
> > -5) Activate:
> > +5) Set memory limit: Optional
> > +   Set memory limit by writing the value to sysfs node 'mem_limit'.
> > +   The value can be either in bytes or you can use mem suffixes.
> > +   Examples:
> > +   # limit /dev/zram0 with 50MB memory
> > +   echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
> > +
> > +   # Using mem suffixes
> > +   echo 256K > /sys/block/zram0/mem_limit
> > +   echo 512M > /sys/block/zram0/mem_limit
> > +   echo 1G > /sys/block/zram0/mem_limit
> > +
> > +6) Activate:
> > mkswap /dev/zram0
> > swapon /dev/zram0
> >
> > mkfs.ext4 /dev/zram1
> > mount /dev/zram1 /tmp
> >
> > -6) Stats:
> > +7) Stats:
> > Per-device statistics are exported as various nodes under
> > /sys/block/zram/
> > disksize
> > @@ -96,11 +108,11 @@ size of the disk when not in use so a huge zram is 
> > wasteful.
> > compr_data_size
> > mem_used_total
> >
> > -7) Deactivate:
> > +8) Deactivate:
> > swapoff /dev/zram0
> > umount /dev/zram1
> >
> > -8) Reset:
> > +9) Reset:
> > Write any positive value to 'reset' sysfs node
> > echo 1 > /sys/block/zram0/reset
> > echo 1 > /sys/block/zram1/reset
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index d00831c3d731..b48a3d0e9031 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -122,6 +122,35 @@ static ssize_t max_comp_streams_show(struct device 
> > *dev,
> > return scnprintf(buf, PAGE_SIZE, "%d\n", val);
> >  }
> >
> > +static ssize_t mem_limit_show(struct device *dev,
> > +   struct device_attribute *attr, char *buf)
> > +{
> > +   u64 val;
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   down_read(&zram->init_lock);
> > +   val = zram->limit_bytes;
> > +   up_read(&zram->init_lock);
> > +
> > +   return scnprintf(buf, PAGE_SIZE, "%llu\n", val);
> > +}
> > +
> > +static ssize_t mem_limit_store(struct device *dev,
> > +   struct device_attribute *attr, const char *buf, size_t len)
> > +{
> > +   u64 limit;
> > +   struct zram *zram = dev_to_zram(dev);
> > +
> > +   limit = memparse(buf, NULL);
> > +   if (!limit)
> > +   return -EINVAL;
> 
> Shouldn't passing a 0 limit be allowed, to disable the limit?

Sure. Will fix.
Thanks.
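
A rough user-space sketch of the behaviour being agreed on here: the value is
parsed with optional K/M/G suffixes, and 0 is accepted and means "no limit".
parse_mem_size and over_limit are hypothetical helpers written for this
illustration; this is not the kernel's memparse(), and binary multipliers are
assumed.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * User-space approximation of suffix parsing ("256K", "512M", "1G").
 * Not the kernel's memparse(); binary multipliers are assumed.
 */
static uint64_t parse_mem_size(const char *s)
{
    char *end;
    uint64_t val = strtoull(s, &end, 0);

    switch (*end) {
    case 'G': case 'g':
        val <<= 10;
        /* fall through */
    case 'M': case 'm':
        val <<= 10;
        /* fall through */
    case 'K': case 'k':
        val <<= 10;
        break;
    default:
        break;
    }
    return val;
}

/* Returns 1 if usage exceeds the limit; a limit of 0 disables the check. */
static int over_limit(uint64_t used, uint64_t limit)
{
    return limit != 0 && used > limit;
}
```

With this shape, writing 0 to the knob is not an error; it simply makes
over_limit() always return 0.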

-- 
Kind regards,
Minchan Kim

[PATCH v3 4/6] perf/tests: add interrupted state sample parsing test

2014-08-17 Thread Stephane Eranian

This patch updates the sample parsing test with support
for the sampling of machine interrupted state.

The patch modifies the do_test() code to share the sample
regs bitmask between user and intr regs.

Signed-off-by: Stephane Eranian 
---
 tools/perf/tests/sample-parsing.c |   55 +++--
 1 file changed, 40 insertions(+), 15 deletions(-)

diff --git a/tools/perf/tests/sample-parsing.c 
b/tools/perf/tests/sample-parsing.c
index ca292f9..4908c64 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -126,16 +126,28 @@ static bool samples_same(const struct perf_sample *s1,
if (type & PERF_SAMPLE_TRANSACTION)
COMP(transaction);
 
+   if (type & PERF_SAMPLE_REGS_INTR) {
+   size_t sz = hweight_long(s1->intr_regs.mask) * sizeof(u64);
+
+   COMP(intr_regs.mask);
+   COMP(intr_regs.abi);
+   if (s1->intr_regs.abi &&
+   (!s1->intr_regs.regs || !s2->intr_regs.regs ||
+memcmp(s1->intr_regs.regs, s2->intr_regs.regs, sz))) {
+   pr_debug("Samples differ at 'intr_regs'\n");
+   return false;
+   }
+   }
+
return true;
 }
 
-static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format)
+static int do_test(u64 sample_type, u64 sample_regs, u64 read_format)
 {
struct perf_evsel evsel = {
.needs_swap = false,
.attr = {
.sample_type = sample_type,
-   .sample_regs_user = sample_regs_user,
.read_format = read_format,
},
};
@@ -154,7 +166,7 @@ static int do_test(u64 sample_type, u64 sample_regs_user, 
u64 read_format)
/* 1 branch_entry */
.data = {1, 211, 212, 213},
};
-   u64 user_regs[64];
+   u64 regs[64];
const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
const u64 data[] = {0x2211443366558877ULL, 0, 0xaabbccddeeff4321ULL};
struct perf_sample sample = {
@@ -176,8 +188,8 @@ static int do_test(u64 sample_type, u64 sample_regs_user, 
u64 read_format)
.branch_stack   = &branch_stack.branch_stack,
.user_regs  = {
.abi= PERF_SAMPLE_REGS_ABI_64,
-   .mask   = sample_regs_user,
-   .regs   = user_regs,
+   .mask   = sample_regs,
+   .regs   = regs,
},
.user_stack = {
.size   = sizeof(data),
@@ -187,14 +199,25 @@ static int do_test(u64 sample_type, u64 sample_regs_user, 
u64 read_format)
.time_enabled = 0x030a59d664fca7deULL,
.time_running = 0x011b6ae553eb98edULL,
},
+   .intr_regs  = {
+   .abi= PERF_SAMPLE_REGS_ABI_64,
+   .mask   = sample_regs,
+   .regs   = regs,
+   },
};
struct sample_read_value values[] = {{1, 5}, {9, 3}, {2, 7}, {6, 4},};
struct perf_sample sample_out;
size_t i, sz, bufsz;
int err, ret = -1;
 
-   for (i = 0; i < sizeof(user_regs); i++)
-   *(i + (u8 *)user_regs) = i & 0xfe;
+   if (sample_type & PERF_SAMPLE_REGS_USER)
+   evsel.attr.sample_regs_user = sample_regs;
+
+   if (sample_type & PERF_SAMPLE_REGS_INTR)
+   evsel.attr.sample_regs_intr = sample_regs;
+
+   for (i = 0; i < sizeof(regs); i++)
+   *(i + (u8 *)regs) = i & 0xfe;
 
if (read_format & PERF_FORMAT_GROUP) {
sample.read.group.nr = 4;
@@ -271,7 +294,7 @@ int test__sample_parsing(void)
 {
const u64 rf[] = {4, 5, 6, 7, 12, 13, 14, 15};
u64 sample_type;
-   u64 sample_regs_user;
+   u64 sample_regs;
size_t i;
int err;
 
@@ -280,7 +303,7 @@ int test__sample_parsing(void)
 * were added.  Please actually update the test rather than just change
 * the condition below.
 */
-   if (PERF_SAMPLE_MAX > PERF_SAMPLE_TRANSACTION << 1) {
+   if (PERF_SAMPLE_MAX > PERF_SAMPLE_REGS_INTR << 1) {
pr_debug("sample format has changed, some new PERF_SAMPLE_ bit 
was introduced - test needs updating\n");
return -1;
}
@@ -297,22 +320,24 @@ int test__sample_parsing(void)
}
continue;
}
+   sample_regs = 0;
 
if (sample_type == PERF_SAMPLE_REGS_USER)
-   sample_regs_user = 0x3fff;
-   else
-   sample_regs_user = 0;
+   sample_regs = 0x3fff;
+
+   if (sample_type == PERF_SAMPLE_REGS_INTR)
+   sample_regs = 0xff0fff;

[PATCH v3 1/6] perf: add ability to sample machine state on interrupt

2014-08-17 Thread Stephane Eranian

Enable capture of interrupted machine state for each
sample.

Registers to sample are passed per event in the
sample_regs_intr bitmask.

To sample interrupt machine state,
PERF_SAMPLE_REGS_INTR must be set in
sample_type.

The list of available registers is arch
dependent and provided by asm/perf_regs.h

Registers are laid out as u64 in the
bit order of sample_regs_intr.
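
As an illustration of that layout, a consumer could locate one register in
the dump as sketched below. This is a user-space sketch under assumptions:
popcount64, regs_payload_size and regs_lookup are hypothetical helpers, not
perf code. The register for bit N of the sample_regs_intr bitmask sits at the
index equal to the number of set bits below N.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits with a portable loop rather than a compiler builtin. */
static unsigned int popcount64(uint64_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Size in bytes of the register payload that follows the abi word. */
static size_t regs_payload_size(uint64_t mask)
{
    return popcount64(mask) * sizeof(uint64_t);
}

/*
 * Look up register number `bit` in a dump laid out in mask bit order.
 * Returns 0 on success, -1 if that register was not sampled.
 */
static int regs_lookup(uint64_t mask, const uint64_t *regs,
                       unsigned int bit, uint64_t *out)
{
    if (bit >= 64 || !(mask & (1ULL << bit)))
        return -1;

    /* Index = number of sampled registers below this bit. */
    *out = regs[popcount64(mask & ((1ULL << bit) - 1))];
    return 0;
}
```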

Reviewed-by: Jiri Olsa 
Reviewed-by: Andi Kleen 
Signed-off-by: Stephane Eranian 
---
 include/linux/perf_event.h  |7 +--
 include/uapi/linux/perf_event.h |   14 -
 kernel/events/core.c|   44 +--
 3 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f0a1036..e043465 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -79,7 +79,7 @@ struct perf_branch_stack {
struct perf_branch_entryentries[0];
 };
 
-struct perf_regs_user {
+struct perf_regs {
__u64   abi;
struct pt_regs  *regs;
 };
@@ -599,7 +599,8 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record  *raw;
struct perf_branch_stack*br_stack;
-   struct perf_regs_user   regs_user;
+   struct perf_regsregs_user;
+   struct perf_regsregs_intr;
u64 stack_user_size;
u64 weight;
/*
@@ -629,6 +630,8 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
+   data->regs_intr.abi = PERF_SAMPLE_REGS_ABI_NONE;
+   data->regs_intr.regs = NULL;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9269de2..8019505 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -137,8 +137,9 @@ enum perf_event_sample_format {
PERF_SAMPLE_DATA_SRC= 1U << 15,
PERF_SAMPLE_IDENTIFIER  = 1U << 16,
PERF_SAMPLE_TRANSACTION = 1U << 17,
+   PERF_SAMPLE_REGS_INTR   = 1U << 18,
 
-   PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */
+   PERF_SAMPLE_MAX = 1U << 19, /* non-ABI */
 };
 
 /*
@@ -334,6 +335,15 @@ struct perf_event_attr {
 
/* Align to u64. */
__u32   __reserved_2;
+   /*
+* Defines set of regs to dump for each sample
+* state captured on:
+*  - precise = 0: PMU interrupt
+*  - precise > 0: sampled instruction
+*
+* See asm/perf_regs.h for details.
+*/
+   __u64   sample_regs_intr;
 };
 
 #define perf_flags(attr)   (*(&(attr)->read_format + 1))
@@ -686,6 +696,8 @@ enum perf_event_type {
 *  { u64   weight;   } && PERF_SAMPLE_WEIGHT
 *  { u64   data_src; } && PERF_SAMPLE_DATA_SRC
 *  { u64   transaction; } && 
PERF_SAMPLE_TRANSACTION
+*  { u64   abi; # enum perf_sample_regs_abi
+*u64   regs[weight(mask)]; } && 
PERF_SAMPLE_REGS_INTR
 * };
 */
PERF_RECORD_SAMPLE  = 9,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d7363a..5fa8b17 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4395,7 +4395,7 @@ perf_output_sample_regs(struct perf_output_handle *handle,
}
 }
 
-static void perf_sample_regs_user(struct perf_regs_user *regs_user,
+static void perf_sample_regs_user(struct perf_regs *regs_user,
  struct pt_regs *regs)
 {
if (!user_mode(regs)) {
@@ -4411,6 +4411,14 @@ static void perf_sample_regs_user(struct perf_regs_user 
*regs_user,
}
 }
 
+static void perf_sample_regs_intr(struct perf_regs *regs_intr,
+ struct pt_regs *regs)
+{
+   regs_intr->regs = regs;
+   regs_intr->abi  = perf_reg_abi(current);
+}
+
+
 /*
  * Get remaining task size from user stack pointer.
  *
@@ -4792,6 +4800,22 @@ void perf_output_sample(struct perf_output_handle 
*handle,
if (sample_type & PERF_SAMPLE_TRANSACTION)
perf_output_put(handle, data->txn);
 
+   if (sample_type & PERF_SAMPLE_REGS_INTR) {
+   u64 abi = data->regs_intr.abi;
+   /*
+* If there are no regs to dump, notice it through
+* first u64 being zero (PERF_SAMPLE_REGS_ABI_NONE).
+*/
+   perf_output_put(handle, abi);
+
+   if (abi) {
+   u64 mask = event->attr.sample_regs_intr;
+   perf_output_sample_regs(handle,
+

[PATCH v3 6/6] perf: improve perf_sample_data struct layout

2014-08-17 Thread Stephane Eranian

From: Peter Zijlstra 

This patch reorders fields in the perf_sample_data
struct in order to minimize the number of cachelines
touched in perf_sample_data_init(). It also removes
some initializations which are redundant with the
code in kernel/events/core.c
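
The idea can be shown with a miniature, hypothetical struct (sample_rec is
not the kernel's perf_sample_data, and the 64-byte line size is an
assumption): fields written on every init are grouped at the front so the
init routine touches a single cache line, and a compile-time check guards
that property.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64   /* assumed line size; arch dependent */

/* Hypothetical miniature of a sample record, not the kernel's struct. */
struct sample_rec {
    /* Written by the init routine on every sample: kept together up front. */
    _Alignas(CACHELINE_BYTES) uint64_t addr;
    void    *raw;
    void    *br_stack;
    uint64_t period;
    uint64_t weight;
    uint64_t txn;
    uint64_t data_src;

    /* Filled in only when the matching sample bit is requested. */
    uint64_t type;
    uint64_t ip;
    uint64_t time;
    uint64_t id;
};

/* Compile-time check that the init-time fields end within one cache line. */
_Static_assert(offsetof(struct sample_rec, data_src) + sizeof(uint64_t)
               <= CACHELINE_BYTES, "init fields spill past one cache line");

/* Touches only the leading group; 0 is a placeholder for the data_src default. */
static void sample_rec_init(struct sample_rec *r, uint64_t addr, uint64_t period)
{
    r->addr = addr;
    r->raw = NULL;
    r->br_stack = NULL;
    r->period = period;
    r->weight = 0;
    r->txn = 0;
    r->data_src = 0;
}
```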

Signed-off-by: Peter Zijlstra 
---
 include/linux/perf_event.h |   34 +-
 kernel/events/core.c   |5 -
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e043465..57b7efc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -579,35 +579,40 @@ extern u64 perf_event_read_value(struct perf_event *event,
 
 
 struct perf_sample_data {
-   u64 type;
+   /*
+* Fields set by perf_sample_data_init(), group so as to
+* minimize the cachelines touched.
+*/
+   u64 addr;
+   struct perf_raw_record  *raw;
+   struct perf_branch_stack*br_stack;
+   u64 period;
+   u64 weight;
+   u64 txn;
+   union  perf_mem_data_srcdata_src;
 
+   /*
+* The other fields, optionally {set,used} by
+* perf_{prepare,output}_sample().
+*/
+   u64 type;
u64 ip;
struct {
u32 pid;
u32 tid;
}   tid_entry;
u64 time;
-   u64 addr;
u64 id;
u64 stream_id;
struct {
u32 cpu;
u32 reserved;
}   cpu_entry;
-   u64 period;
-   union  perf_mem_data_srcdata_src;
struct perf_callchain_entry *callchain;
-   struct perf_raw_record  *raw;
-   struct perf_branch_stack*br_stack;
struct perf_regsregs_user;
struct perf_regsregs_intr;
u64 stack_user_size;
-   u64 weight;
-   /*
-* Transaction flags for abort events:
-*/
-   u64 txn;
-};
+} ____cacheline_aligned;
 
 /* default value for data source */
 #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
@@ -624,14 +629,9 @@ static inline void perf_sample_data_init(struct 
perf_sample_data *data,
data->raw  = NULL;
data->br_stack = NULL;
data->period = period;
-   data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
-   data->regs_user.regs = NULL;
-   data->stack_user_size = 0;
data->weight = 0;
data->data_src.val = PERF_MEM_NA;
data->txn = 0;
-   data->regs_intr.abi = PERF_SAMPLE_REGS_ABI_NONE;
-   data->regs_intr.regs = NULL;
 }
 
 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5fa8b17..696a778 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4406,8 +4406,11 @@ static void perf_sample_regs_user(struct perf_regs 
*regs_user,
}
 
if (regs) {
-   regs_user->regs = regs;
regs_user->abi  = perf_reg_abi(current);
+   regs_user->regs = regs;
+   } else {
+   regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
+   regs_user->regs = NULL;
}
 }
 
-- 
1.7.9.5


[PATCH v3 3/6] perf tools: add core support for sampling intr machine state regs

2014-08-17 Thread Stephane Eranian

Add the infrastructure to setup, collect and report the interrupt
machine state regs which can be captured by the kernel.

Signed-off-by: Stephane Eranian 
---
 tools/perf/perf.h |1 +
 tools/perf/util/event.h   |1 +
 tools/perf/util/evsel.c   |   46 -
 tools/perf/util/session.c |   44 ++-
 4 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 510c65f..309d956 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -54,6 +54,7 @@ struct record_opts {
bool sample_weight;
bool sample_time;
bool period;
+   bool sample_intr_regs;
unsigned int freq;
unsigned int mmap_pages;
unsigned int user_freq;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 7eb7107..d6e79f3 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -162,6 +162,7 @@ struct perf_sample {
struct ip_callchain *callchain;
struct branch_stack *branch_stack;
struct regs_dump  user_regs;
+   struct regs_dump  intr_regs;
struct stack_dump user_stack;
struct sample_read read;
 };
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 01ce14c..74b4268 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -628,6 +628,11 @@ void perf_evsel__config(struct perf_evsel *evsel, struct 
record_opts *opts)
if (opts->call_graph_enabled && !evsel->no_aux_samples)
perf_evsel__config_callgraph(evsel, opts);
 
+   if (opts->sample_intr_regs) {
+   attr->sample_regs_intr = PERF_REGS_MASK;
+   perf_evsel__set_sample_bit(evsel, REGS_INTR);
+   }
+
if (target__has_cpu(&opts->target))
perf_evsel__set_sample_bit(evsel, CPU);
 
@@ -1005,6 +1010,7 @@ static size_t perf_event_attr__fprintf(struct 
perf_event_attr *attr, FILE *fp)
ret += PRINT_ATTR_X64(branch_sample_type);
ret += PRINT_ATTR_X64(sample_regs_user);
ret += PRINT_ATTR_U32(sample_stack_user);
+   ret += PRINT_ATTR_X64(sample_regs_intr);
 
ret += fprintf(fp, "%.60s\n", graph_dotted_line);
 
@@ -1504,6 +1510,23 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, 
union perf_event *event,
array++;
}
 
+   data->intr_regs.abi = PERF_SAMPLE_REGS_ABI_NONE;
+   if (type & PERF_SAMPLE_REGS_INTR) {
+   OVERFLOW_CHECK_u64(array);
+   data->intr_regs.abi = *array;
+   array++;
+
+   if (data->intr_regs.abi != PERF_SAMPLE_REGS_ABI_NONE) {
+   u64 mask = evsel->attr.sample_regs_intr;
+
+   sz = hweight_long(mask) * sizeof(u64);
+   OVERFLOW_CHECK(array, sz, max_size);
+   data->intr_regs.mask = mask;
+   data->intr_regs.regs = (u64 *)array;
+   array = (void *)array + sz;
+   }
+   }
+
return 0;
 }
 
@@ -1599,6 +1622,16 @@ size_t perf_event__sample_event_size(const struct 
perf_sample *sample, u64 type,
if (type & PERF_SAMPLE_TRANSACTION)
result += sizeof(u64);
 
+   if (type & PERF_SAMPLE_REGS_INTR) {
+   if (sample->intr_regs.abi) {
+   result += sizeof(u64);
+   sz = hweight_long(sample->intr_regs.mask) * sizeof(u64);
+   result += sz;
+   } else {
+   result += sizeof(u64);
+   }
+   }
+
return result;
 }
 
@@ -1777,6 +1810,17 @@ int perf_event__synthesize_sample(union perf_event 
*event, u64 type,
array++;
}
 
+   if (type & PERF_SAMPLE_REGS_INTR) {
+   if (sample->intr_regs.abi) {
+   *array++ = sample->intr_regs.abi;
+   sz = hweight_long(sample->intr_regs.mask) * sizeof(u64);
+   memcpy(array, sample->intr_regs.regs, sz);
+   array = (void *)array + sz;
+   } else {
+   *array++ = 0;
+   }
+   }
+
return 0;
 }
 
@@ -1906,7 +1950,7 @@ static int sample_type__fprintf(FILE *fp, bool *first, 
u64 value)
bit_name(READ), bit_name(CALLCHAIN), bit_name(ID), 
bit_name(CPU),
bit_name(PERIOD), bit_name(STREAM_ID), bit_name(RAW),
bit_name(BRANCH_STACK), bit_name(REGS_USER), 
bit_name(STACK_USER),
-   bit_name(IDENTIFIER),
+   bit_name(IDENTIFIER), bit_name(REGS_INTR),
{ .name = NULL, }
};
 #undef bit_name
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6d2d50d..4eb8ca6 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -581,15 +581,46 @@ static void regs_dump__printf(u64 mask, u64 *re

[PATCH v3 2/6] perf/x86: add support for sampling PEBS machine state registers

2014-08-17 Thread Stephane Eranian

PEBS can capture machine state regs at retirement of the sampled
instructions. When precise sampling is enabled on an event, PEBS
is used, so substitute the interrupted state with the PEBS state.
Note that not all registers are captured by PEBS. Those missing
are replaced by their interrupt state counterparts.

Signed-off-by: Stephane Eranian 
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c |   17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c 
b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 9dc4199..139a8a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -886,6 +886,23 @@ static void __intel_pmu_pebs_event(struct perf_event 
*event,
regs.bp = pebs->bp;
regs.sp = pebs->sp;
 
+   if (sample_type & PERF_SAMPLE_REGS_INTR) {
+   regs.ax = pebs->ax;
+   regs.bx = pebs->bx;
+   regs.cx = pebs->cx;
+   regs.si = pebs->si;
+   regs.di = pebs->di;
+
+   regs.r8 = pebs->r8;
+   regs.r9 = pebs->r9;
+   regs.r10 = pebs->r10;
+   regs.r11 = pebs->r11;
+   regs.r12 = pebs->r12;
+   regs.r13 = pebs->r13;
+   regs.r14 = pebs->r14;
+   regs.r15 = pebs->r15;
+   }
+
if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
regs.ip = pebs->real_ip;
regs.flags |= PERF_EFLAGS_EXACT;
-- 
1.7.9.5


[PATCH v3 5/6] perf record: add new -I option to sample interrupted machine state

2014-08-17 Thread Stephane Eranian

Add -I/--intr-regs option to capture machine state registers at
interrupt.

Add the corresponding man page description

Signed-off-by: Stephane Eranian 
---
 tools/perf/Documentation/perf-record.txt |6 ++
 tools/perf/builtin-record.c  |2 ++
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index d460049..1a36259 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -214,6 +214,12 @@ if combined with -a or -C options.
 After starting the program, wait msecs before measuring. This is useful to
 filter out the startup phase of the program, which is often very different.
 
+-I::
+--intr-regs::
+Capture machine state (registers) at interrupt, i.e., on counter overflows for
+each sample. List of captured registers depends on the architecture. This 
option
+is off by default.
+
 SEE ALSO
 
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4db670d..8dc1fd8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -882,6 +882,8 @@ const struct option record_options[] = {
"sample transaction flags (special events only)"),
OPT_BOOLEAN(0, "per-thread", &record.opts.target.per_thread,
"use per-thread mmaps"),
+   OPT_BOOLEAN('I', "intr-regs", &record.opts.sample_intr_regs,
+   "Sample machine registers on interrupt"),
OPT_END()
 };
 
-- 
1.7.9.5

