Re: linux-next: build failure after merge of the sound-asoc tree
On 08/18/14 06:30, Stephen Rothwell wrote:
> Hi all,
>
> After merging the sound-asoc tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> sound/soc/fsl/imx-pcm-fiq.c:31:21: fatal error: asm/fiq.h: No such file or directory
>  #include <asm/fiq.h>
>                      ^
>
> Caused by commit 7e7292dba215 ("ASoC: fsl: add imx-es8328 machine
> driver").  Presumably it will only build on arm?
>
> I reverted that commit for today.

The following patch should fix the problem:

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index c0ace69..13199b5 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -237,8 +237,6 @@ config SND_SOC_IMX_ES8328
 	select SND_SOC_IMX_PCM_DMA
 	select SND_SOC_IMX_AUDMUX
 	select SND_SOC_FSL_SSI
-	select SND_SOC_FSL_UTILS
-	select SND_SOC_IMX_PCM_FIQ
 	help
 	  Say Y if you want to add support for the ES8328 audio codec connected
 	  via SSI/I2S over either SPI or I2C.

That gives it almost the exact same kernel config as the SGTL5000.

Is this the sort of thing you can apply on your end, or would you like
me to resubmit a v12 with just this file?  I'm afraid I don't have a
PPC toolchain to test with.

Sean
Reactivate your mailbox!!!
Dear user,

This is to inform you that your mailbox has exceeded its mail quota and may not be able to send and receive new e-mail messages until it is updated. Please click here < http://adminupgrradepocztaccenter.webs.com/ > to update and reactivate your mailbox.

Thank you for your understanding. We apologize for any inconvenience.

Regards,
Email Helpdesk Administrator

---
This email is free from viruses and malware because avast! Antivirus protection is active.
http://www.avast.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10
This should have been a benign patch. I'll try to get windows 7 installation disk and check ASAP.

Nadav

> On 18 Aug 2014, at 05:17, Wanpeng Li wrote:
>
> Hi Nadav,
>> On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>> Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>> sometime make assumptions on CPUs while they ignore capability MSRs, it is
>> better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>> actually supported has no functional implications.
>>
>> Signed-off-by: Nadav Amit
>> ---
>>  arch/x86/include/asm/kvm_host.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 4931415..0bab29d 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
>>  #define KVM_REFILL_PAGES 25
>>  #define KVM_MAX_CPUID_ENTRIES 80
>>  #define KVM_NR_FIXED_MTRR_REGION 88
>> -#define KVM_NR_VAR_MTRR 8
>> +#define KVM_NR_VAR_MTRR 10
>
> We observed that there is obvious regression caused by this commit, 32bit
> win7 guest show blue screen during boot.
>
> Regards,
> Wanpeng Li
>
>>  #define ASYNC_PF_PER_VCPU 64
>>
>> --
>> 1.9.1
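The commit message's remark about capability MSRs can be made concrete with a small sketch. The number of variable-range MTRRs is advertised in the VCNT field (bits 7:0) of the IA32_MTRRCAP MSR (0xFE), so a guest that reads it does not need to assume 8. The code below only decodes a raw MSR value in user space; the helper names and the sample values are illustrative assumptions and are not taken from KVM.

```c
#include <stdint.h>

/* IA32_MTRRCAP (MSR 0xFE) fields as described in the Intel SDM. */
#define MTRRCAP_VCNT_MASK 0xffu      /* bits 7:0: variable-range MTRR count */
#define MTRRCAP_FIX       (1u << 8)  /* fixed-range MTRRs supported */
#define MTRRCAP_WC        (1u << 10) /* write-combining type supported */

/* Hypothetical helper: number of variable-range MTRRs in a raw MSR value. */
static unsigned int mtrrcap_var_count(uint64_t cap)
{
	return (unsigned int)(cap & MTRRCAP_VCNT_MASK);
}

/* Hypothetical helper: non-zero if fixed-range MTRRs are advertised. */
static int mtrrcap_has_fixed(uint64_t cap)
{
	return (cap & MTRRCAP_FIX) != 0;
}
```

With the sample value 0x508 (WC and FIX set, VCNT = 8) the first helper yields 8, and with 0x50a it yields 10. A guest that hard-codes 8 instead of reading VCNT is the kind of consumer that could be upset by the change from 8 to 10.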
Re: [PATCH] flush_icache_range: Export symbol to fix build errors
Hi Pranith,

On Mon, Aug 18, 2014 at 8:24 AM, Pranith Kumar wrote:
> Fix building errors occuring due to a missing export of flush_icache_range()
> in architectures missing the export.

Can you be a little more specific here, what build errors?

[...]

> diff --git a/arch/frv/include/asm/cacheflush.h b/arch/frv/include/asm/cacheflush.h
> index edbac54..07ee4b3 100644
> --- a/arch/frv/include/asm/cacheflush.h
> +++ b/arch/frv/include/asm/cacheflush.h
> @@ -72,6 +72,7 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
>  {
>  	frv_cache_wback_inv(start, end);
>  }
> +EXPORT_SYMBOL(flush_icache_range);

EXPORT_SYMBOL should not be placed into header file as it defines a
non-static variable.

[...]

> diff --git a/arch/metag/include/asm/cacheflush.h b/arch/metag/include/asm/cacheflush.h
> index 7787ec5..117c212 100644
> --- a/arch/metag/include/asm/cacheflush.h
> +++ b/arch/metag/include/asm/cacheflush.h
> @@ -124,6 +124,7 @@ static inline void flush_icache_range(unsigned long address,
>  	metag_code_cache_flush((void *) address, endaddr - address);
>  #endif
>  }
> +EXPORT_SYMBOL(flush_icache_range);

Same here.

--
Thanks.
-- Max
[PATCH 1/5] autofs4: allow RCU-walk to walk through autofs4.
Any attempt to look up a pathname that passes through an autofs4 mount
is currently forced out of RCU-walk into REF-walk.

This can significantly hurt performance of many-thread work loads on
many-core systems, especially if the automounted filesystem supports
RCU-walk but doesn't get to benefit from it.

So if autofs4_d_manage is called with rcu_walk set, only fail with
-ECHILD if it is necessary to wait longer than a spinlock.

Signed-off-by: NeilBrown
---
 fs/autofs4/autofs_i.h  |    2 +-
 fs/autofs4/dev-ioctl.c |    2 +-
 fs/autofs4/expire.c    |    4 +++-
 fs/autofs4/root.c      |   44 +---
 4 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 9e359fb20c0a..2f1032f12d91 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -148,7 +148,7 @@ void autofs4_free_ino(struct autofs_info *);

 /* Expiration */
 int is_autofs4_dentry(struct dentry *);
-int autofs4_expire_wait(struct dentry *dentry);
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk);
 int autofs4_expire_run(struct super_block *, struct vfsmount *,
 			struct autofs_sb_info *,
 			struct autofs_packet_expire __user *);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 5b570b6efa28..aaf96cb25452 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -450,7 +450,7 @@ static int autofs_dev_ioctl_requester(struct file *fp,
 	ino = autofs4_dentry_ino(path.dentry);
 	if (ino) {
 		err = 0;
-		autofs4_expire_wait(path.dentry);
+		autofs4_expire_wait(path.dentry, 0);
 		spin_lock(&sbi->fs_lock);
 		param->requester.uid = from_kuid_munged(current_user_ns(), ino->uid);
 		param->requester.gid = from_kgid_munged(current_user_ns(), ino->gid);
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index a7be57e39be7..7e2f22ce6954 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -467,7 +467,7 @@ found:
 	return expired;
 }

-int autofs4_expire_wait(struct dentry *dentry)
+int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
@@ -477,6 +477,8 @@ int autofs4_expire_wait(struct dentry *dentry)
 	spin_lock(&sbi->fs_lock);
 	if (ino->flags & AUTOFS_INF_EXPIRING) {
 		spin_unlock(&sbi->fs_lock);
+		if (rcu_walk)
+			return -ECHILD;

 		DPRINTK("waiting for expire %p name=%.*s",
 			dentry, dentry->d_name.len, dentry->d_name.name);
diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index cdb25ebccc4c..2296c8301b66 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -210,7 +210,8 @@ next:
 	return NULL;
 }

-static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
+static struct dentry *autofs4_lookup_expiring(struct dentry *dentry,
+					      bool rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct dentry *parent = dentry->d_parent;
@@ -229,6 +230,11 @@ static struct dentry *autofs4_lookup_expiring(struct dentry *dentry)
 		struct dentry *expiring;
 		struct qstr *qstr;

+		if (rcu_walk) {
+			spin_unlock(&sbi->lookup_lock);
+			return ERR_PTR(-ECHILD);
+		}
+
 		ino = list_entry(p, struct autofs_info, expiring);
 		expiring = ino->dentry;

@@ -264,13 +270,15 @@ next:
 	return NULL;
 }

-static int autofs4_mount_wait(struct dentry *dentry)
+static int autofs4_mount_wait(struct dentry *dentry, bool rcu_walk)
 {
 	struct autofs_sb_info *sbi = autofs4_sbi(dentry->d_sb);
 	struct autofs_info *ino = autofs4_dentry_ino(dentry);
 	int status = 0;

 	if (ino->flags & AUTOFS_INF_PENDING) {
+		if (rcu_walk)
+			return -ECHILD;
 		DPRINTK("waiting for mount name=%.*s",
 			dentry->d_name.len, dentry->d_name.name);
 		status = autofs4_wait(sbi, dentry, NFY_MOUNT);
@@ -280,20 +288,22 @@ static int autofs4_mount_wait(struct dentry *dentry)
 	return status;
 }

-static int do_expire_wait(struct dentry *dentry)
+static int do_expire_wait(struct dentry *dentry, bool rcu_walk)
 {
 	struct dentry *expiring;

-	expiring = autofs4_lookup_expiring(dentry);
+	expiring = autofs4_lookup_expiring(dentry, rcu_walk);
+	if (IS_ERR(expiring))
+		return PTR_ERR(expiring);
 	if (!expiring)
-		return autofs4_expire_wait(dentry);
+		return autofs4_expire_wait(dentry, rcu_walk);
 	else {
 		/*
 		 * If we are racing with expire the reques
[PATCH 5/5] autofs: the documentation I wanted to read
This documents autofs from the perspective of what the module actually
supports rather than how automount is expected to use it.

It is based mostly on code review and very little on testing so it may
be inaccurate in some places.

The document assumes the functionality added by the RCU-walk patches
that I posted recently.

It is formatted using "markdown" and works best with Markdown.pl
(markdown_py doesn't like some constructs).

Copy-edited-by: Randy Dunlap
Signed-off-by: NeilBrown
Acked-by: Ian Kent
---
 Documentation/filesystems/autofs4.txt |  520 +
 1 file changed, 520 insertions(+)
 create mode 100644 Documentation/filesystems/autofs4.txt

diff --git a/Documentation/filesystems/autofs4.txt b/Documentation/filesystems/autofs4.txt
new file mode 100644
index 000000000000..ae315e2768d2
--- /dev/null
+++ b/Documentation/filesystems/autofs4.txt
@@ -0,0 +1,520 @@
+
+ p { max-width:50em} ol, ul {max-width: 40em}
+
+
+autofs - how it works
+=====================
+
+Purpose
+-------
+
+The goal of autofs is to provide on-demand mounting and race free
+automatic unmounting of various other filesystems.  This provides two
+key advantages:
+
+1. There is no need to delay boot until all filesystems that
+   might be needed are mounted.  Processes that try to access those
+   slow filesystems might be delayed but other processes can
+   continue freely.  This is particularly important for
+   network filesystems (e.g. NFS) or filesystems stored on
+   media with a media-changing robot.
+
+2. The names and locations of filesystems can be stored in
+   a remote database and can change at any time.  The content
+   in that database at the time of access will be used to provide
+   a target for the access.  The interpretation of names in the
+   filesystem can even be programmatic rather than database-backed,
+   allowing wildcards for example, and can vary based on the user who
+   first accessed a name.
+
+Context
+-------
+
+The "autofs4" filesystem module is only one part of an autofs system.
+There also needs to be a user-space program which looks up names
+and mounts filesystems.  This will often be the "automount" program,
+though other tools including "systemd" can make use of "autofs4".
+This document describes only the kernel module and the interactions
+required with any user-space program.  Subsequent text refers to this
+as the "automount daemon" or simply "the daemon".
+
+"autofs4" is a Linux kernel module which provides the "autofs"
+filesystem type.  Several "autofs" filesystems can be mounted and they
+can each be managed separately, or all managed by the same daemon.
+
+Content
+-------
+
+An autofs filesystem can contain 3 sorts of objects: directories,
+symbolic links and mount traps.  Mount traps are directories with
+extra properties as described in the next section.
+
+Objects can only be created by the automount daemon: symlinks are
+created with a regular `symlink` system call, while directories and
+mount traps are created with `mkdir`.  The determination of whether a
+directory should be a mount trap or not is quite _ad hoc_, largely for
+historical reasons, and is determined in part by the
+*direct*/*indirect*/*offset* mount options, and the *maxproto* mount option.
+
+If neither the *direct* nor the *offset* mount option is given (so the
+mount is considered to be *indirect*), then the root directory is
+always a regular directory, otherwise it is a mount trap when it is
+empty and a regular directory when not empty.  Note that *direct* and
+*offset* are treated identically so a concise summary is that the root
+directory is a mount trap only if the filesystem is mounted *direct*
+and the root is empty.
+
+Directories created in the root directory are mount traps only if the
+filesystem is mounted *indirect* and they are empty.
+
+Directories further down the tree depend on the *maxproto* mount
+option and particularly whether it is less than five or not.
+When *maxproto* is five, no directories further down the
+tree are ever mount traps, they are always regular directories.  When
+the *maxproto* is four (or three), these directories are mount traps
+precisely when they are empty.
+
+So: non-empty (i.e. non-leaf) directories are never mount traps. Empty
+directories are sometimes mount traps, and sometimes not depending on
+where in the tree they are (root, top level, or lower), the *maxproto*,
+and whether the mount was *indirect* or not.
+
+Mount Traps
+-----------
+
+A core element of the implementation of autofs is the Mount Traps
+which are provided by the Linux VFS.  Any directory provided by a
+filesystem can be designated as a trap.  This involves two separate
+features that work together to allow autofs to do its job.
+
+**DCACHE_NEED_AUTOMOUNT**
+
+If a dentry has the DCACHE_NEED_AUTOMOUNT flag set (which gets set if
+the inode has S_AUTOMOUNT set, or can be set directly) then it is
+(potentially) a mount trap.  Any access to this directory
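The mount-trap rules from the "Content" section of the document above can be summarised as a small decision function. This is only a sketch of the prose rules as quoted, not code from the autofs4 module; the enum, the function name and the parameter names are all hypothetical.

```c
#include <stdbool.h>

/* Where a directory sits in an autofs tree (illustrative names only). */
enum autofs_level { LEVEL_ROOT, LEVEL_TOP, LEVEL_LOWER };

/*
 * Sketch of the documented rules.
 * direct_or_offset: the filesystem was mounted with *direct* or *offset*
 *                   (the document says the two are treated identically).
 * empty:            the directory currently has no entries.
 * maxproto:         value of the *maxproto* mount option.
 */
static bool is_mount_trap(enum autofs_level level, bool direct_or_offset,
			  bool empty, int maxproto)
{
	if (!empty)
		return false;		/* non-empty directories are never traps */

	switch (level) {
	case LEVEL_ROOT:
		return direct_or_offset;	/* root: direct/offset mounts only */
	case LEVEL_TOP:
		return !direct_or_offset;	/* top level: indirect mounts only */
	case LEVEL_LOWER:
		return maxproto < 5;		/* lower levels: older protocols only */
	}
	return false;
}
```

For example, an empty top-level directory in an indirect mount is a trap regardless of *maxproto*, while an empty directory further down is a trap only when *maxproto* is four or three.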
[PATCH 2/5] autofs4: factor should_expire() out of autofs4_expire_indirect.
Future patch will potentially call this twice, so make it separate.

Signed-off-by: NeilBrown
---
 fs/autofs4/expire.c |  162 ---
 1 file changed, 88 insertions(+), 74 deletions(-)

diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index 7e2f22ce6954..bee939efca2b 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -345,6 +345,89 @@ out:
 	return NULL;
 }

+/* Check if 'dentry' should expire, or return a nearby
+ * dentry that is suitable.
+ * If returned dentry is different from arg dentry,
+ * then a dget() reference was taken, else not.
+ */
+static struct dentry *should_expire(struct dentry *dentry,
+				    struct vfsmount *mnt,
+				    unsigned long timeout,
+				    int how)
+{
+	int do_now = how & AUTOFS_EXP_IMMEDIATE;
+	int exp_leaves = how & AUTOFS_EXP_LEAVES;
+	struct autofs_info *ino = autofs4_dentry_ino(dentry);
+	unsigned int ino_count;
+
+	/* No point expiring a pending mount */
+	if (ino->flags & AUTOFS_INF_PENDING)
+		return NULL;
+
+	/*
+	 * Case 1: (i) indirect mount or top level pseudo direct mount
+	 *	   (autofs-4.1).
+	 *	   (ii) indirect mount with offset mount, check the "/"
+	 *	   offset (autofs-5.0+).
+	 */
+	if (d_mountpoint(dentry)) {
+		DPRINTK("checking mountpoint %p %.*s",
+			dentry, (int)dentry->d_name.len, dentry->d_name.name);
+
+		/* Can we umount this guy */
+		if (autofs4_mount_busy(mnt, dentry))
+			return NULL;
+
+		/* Can we expire this guy */
+		if (autofs4_can_expire(dentry, timeout, do_now))
+			return dentry;
+		return NULL;
+	}
+
+	if (dentry->d_inode && S_ISLNK(dentry->d_inode->i_mode)) {
+		DPRINTK("checking symlink %p %.*s",
+			dentry, (int)dentry->d_name.len, dentry->d_name.name);
+		/*
+		 * A symlink can't be "busy" in the usual sense so
+		 * just check last used for expire timeout.
+		 */
+		if (autofs4_can_expire(dentry, timeout, do_now))
+			return dentry;
+		return NULL;
+	}
+
+	if (simple_empty(dentry))
+		return NULL;
+
+	/* Case 2: tree mount, expire iff entire tree is not busy */
+	if (!exp_leaves) {
+		/* Path walk currently on this dentry? */
+		ino_count = atomic_read(&ino->count) + 1;
+		if (d_count(dentry) > ino_count)
+			return NULL;
+
+		if (!autofs4_tree_busy(mnt, dentry, timeout, do_now))
+			return dentry;
+	/*
+	 * Case 3: pseudo direct mount, expire individual leaves
+	 *	   (autofs-4.1).
+	 */
+	} else {
+		/* Path walk currently on this dentry? */
+		struct dentry *expired;
+		ino_count = atomic_read(&ino->count) + 1;
+		if (d_count(dentry) > ino_count)
+			return NULL;
+
+		expired = autofs4_check_leaves(mnt, dentry, timeout, do_now);
+		if (expired) {
+			if (expired == dentry)
+				dput(dentry);
+			return expired;
+		}
+	}
+	return NULL;
+}
 /*
  * Find an eligible tree to time-out
  * A tree is eligible if :-
@@ -359,11 +442,8 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	unsigned long timeout;
 	struct dentry *root = sb->s_root;
 	struct dentry *dentry;
-	struct dentry *expired = NULL;
-	int do_now = how & AUTOFS_EXP_IMMEDIATE;
-	int exp_leaves = how & AUTOFS_EXP_LEAVES;
+	struct dentry *expired;
 	struct autofs_info *ino;
-	unsigned int ino_count;

 	if (!root)
 		return NULL;
@@ -374,78 +454,12 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	dentry = NULL;
 	while ((dentry = get_next_positive_subdir(dentry, root))) {
 		spin_lock(&sbi->fs_lock);
-		ino = autofs4_dentry_ino(dentry);
-		/* No point expiring a pending mount */
-		if (ino->flags & AUTOFS_INF_PENDING)
-			goto next;
-
-		/*
-		 * Case 1: (i) indirect mount or top level pseudo direct mount
-		 *	   (autofs-4.1).
-		 *	   (ii) indirect mount with offset mount, check the "/"
-		 *	   offset (autofs-5.0+).
-		 */
-		if (d_mountpoint(dentry)) {
-			DPRINTK("checking mountpoint %p %.*s",
-				dentry, (int)dentry->d_name.len, dentry->d_name
[PATCH 3/5] autofs4: avoid taking fs_lock during rcu-walk
->fs_lock protects AUTOFS_INF_EXPIRING. We need to be sure that once
the flag is set, no new references beneath the dentry are taken. So
rcu-walk currently needs to take fs_lock before checking the flag.
This hurts performance.

Change the expiry to a two-stage process. First set AUTOFS_INF_NO_RCU
which forces any path walk into ref-walk mode, then drop the lock and
call synchronize_rcu(). Once that returns we can be sure no rcu-walk
is active beneath the dentry and we can check reference counts again.

Now during an RCU-walk we can test AUTOFS_INF_EXPIRING without taking
the lock as long as we test AUTOFS_INF_NO_RCU too. If either is set,
we must abort the RCU-walk. If neither is set, we know that refcounts
will be tested again after we finish the RCU-walk so we are safe to
continue.

->fs_lock is still taken in d_manage() to check for a non-trap
directory. That will be resolved in the next patch.

Signed-off-by: NeilBrown
---
 fs/autofs4/autofs_i.h |    4
 fs/autofs4/expire.c   |   46 ++
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index 2f1032f12d91..8e98cf954bab 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -79,6 +79,10 @@ struct autofs_info {
 };

 #define AUTOFS_INF_EXPIRING	(1<<0) /* dentry is in the process of expiring */
+#define AUTOFS_INF_NO_RCU	(1<<1) /* the dentry is being considered
+					* for expiry, so RCU_walk is
+					* not permitted
+					*/
 #define AUTOFS_INF_PENDING	(1<<2) /* dentry pending mount */

 struct autofs_wait_queue {
diff --git a/fs/autofs4/expire.c b/fs/autofs4/expire.c
index bee939efca2b..eb4b770a4bf6 100644
--- a/fs/autofs4/expire.c
+++ b/fs/autofs4/expire.c
@@ -333,10 +333,19 @@ struct dentry *autofs4_expire_direct(struct super_block *sb,
 	if (ino->flags & AUTOFS_INF_PENDING)
 		goto out;
 	if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
-		ino->flags |= AUTOFS_INF_EXPIRING;
-		init_completion(&ino->expire_complete);
+		ino->flags |= AUTOFS_INF_NO_RCU;
 		spin_unlock(&sbi->fs_lock);
-		return root;
+		synchronize_rcu();
+		spin_lock(&sbi->fs_lock);
+		if (!autofs4_direct_busy(mnt, root, timeout, do_now)) {
+			ino->flags |= AUTOFS_INF_EXPIRING;
+			smp_mb();
+			ino->flags &= ~AUTOFS_INF_NO_RCU;
+			init_completion(&ino->expire_complete);
+			spin_unlock(&sbi->fs_lock);
+			return root;
+		}
+		ino->flags &= ~AUTOFS_INF_NO_RCU;
 	}
 out:
 	spin_unlock(&sbi->fs_lock);
@@ -454,12 +463,29 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 	dentry = NULL;
 	while ((dentry = get_next_positive_subdir(dentry, root))) {
 		spin_lock(&sbi->fs_lock);
-		expired = should_expire(dentry, mnt, timeout, how);
-		if (expired) {
+		ino = autofs4_dentry_ino(dentry);
+		if (ino->flags & AUTOFS_INF_NO_RCU)
+			expired = NULL;
+		else
+			expired = should_expire(dentry, mnt, timeout, how);
+		if (!expired) {
+			spin_unlock(&sbi->fs_lock);
+			continue;
+		}
+		ino = autofs4_dentry_ino(expired);
+		ino->flags |= AUTOFS_INF_NO_RCU;
+		spin_unlock(&sbi->fs_lock);
+		synchronize_rcu();
+		spin_lock(&sbi->fs_lock);
+		if (should_expire(expired, mnt, timeout, how)) {
 			if (expired != dentry)
 				dput(dentry);
 			goto found;
 		}
+
+		ino->flags &= ~AUTOFS_INF_NO_RCU;
+		if (expired != dentry)
+			dput(expired);
 		spin_unlock(&sbi->fs_lock);
 	}
 	return NULL;
@@ -467,8 +493,9 @@ struct dentry *autofs4_expire_indirect(struct super_block *sb,
 found:
 	DPRINTK("returning %p %.*s",
 		expired, (int)expired->d_name.len, expired->d_name.name);
-	ino = autofs4_dentry_ino(expired);
 	ino->flags |= AUTOFS_INF_EXPIRING;
+	smp_mb();
+	ino->flags &= ~AUTOFS_INF_NO_RCU;
 	init_completion(&ino->expire_complete);
 	spin_unlock(&sbi->fs_lock);
 	spin_lock(&sbi->lookup_lock);
@@ -488,11 +515,14 @@ int autofs4_expire_wait(struct dentry *dentry, int rcu_walk)
 	int status;

 	/* Block on any pending expire */
+	if (!(ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU)))
+		return 0;
+	if (rcu_walk)
+		return -ECHI
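The two-stage scheme in the commit message can be followed more easily as a set of flag transitions. The sketch below is a single-threaded user-space model under loud assumptions: it has no locking, no synchronize_rcu() and no memory barrier, and every name in it is hypothetical. It only shows which flag combinations an RCU-walk would see at each stage and why testing both flags is sufficient.

```c
#include <errno.h>

/* Flag values mirror the patch; the INF_ names here are local to this sketch. */
#define INF_EXPIRING (1u << 0)	/* dentry is in the process of expiring */
#define INF_NO_RCU   (1u << 1)	/* dentry is being considered for expiry */

/* RCU-walk side: a lockless test. Either flag forces a drop to ref-walk. */
static int rcu_walk_check(unsigned int flags)
{
	if (flags & (INF_EXPIRING | INF_NO_RCU))
		return -ECHILD;
	return 0;
}

/* Stage 1: mark the candidate. The real code then waits for a grace period. */
static unsigned int expire_begin(unsigned int flags)
{
	return flags | INF_NO_RCU;
}

/* Stage 2a: still not busy after the grace period, so commit to expiring. */
static unsigned int expire_commit(unsigned int flags)
{
	flags |= INF_EXPIRING;	/* set first, so both are briefly visible */
	flags &= ~INF_NO_RCU;
	return flags;
}

/* Stage 2b: became busy in the meantime, so back out. */
static unsigned int expire_abort(unsigned int flags)
{
	return flags & ~INF_NO_RCU;
}
```

At no point between expire_begin() and the end of expiry does a walker see both flags clear, which is the property the patch relies on: EXPIRING is set before NO_RCU is cleared.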
[PATCH 0/5] RCU-walk support for autofs
Hi Ian,

Have you had a chance to run your tests on these patches yet? I've done what testing I can think of and cannot fault them.

This set is against 3.17-rc1 and makes use of the new -EISDIR handling for d_manage() and assumes the other patches which already went in through Andrew Morton.

I've added a section to autofs4.txt about mount namespaces, but it is otherwise unchanged.

If I could get an {Acked,Reviewed,Tested}-By in the next few weeks so I can send them on to Andrew I would really appreciate it.

Thanks,
NeilBrown

---

NeilBrown (5):
      autofs4: allow RCU-walk to walk through autofs4.
      autofs4: factor should_expire() out of autofs4_expire_indirect.
      autofs4: avoid taking fs_lock during rcu-walk
      autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
      autofs: the documentation I wanted to read

 Documentation/filesystems/autofs4.txt |  520 +
 fs/autofs4/autofs_i.h                 |    6
 fs/autofs4/dev-ioctl.c                |    2
 fs/autofs4/expire.c                   |  200 -
 fs/autofs4/root.c                     |   62 +++-
 5 files changed, 694 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/filesystems/autofs4.txt
[PATCH 4/5] autofs4: d_manage() should return -EISDIR when appropriate in rcu-walk mode.
In rcu-walk mode we don't *have* to return -EISDIR for non-mount-traps
as we will simply drop into REF-walk and handle DCACHE_NEED_AUTOMOUNT
dentries the slow way. But it is better if we do when possible.

In 'oz_mode', use the same condition as ref-walk: if not a mountpoint,
then it must be -EISDIR.

In regular mode there are more tests needed. Most of them can be
performed without taking any spinlocks. If we find a directory that
isn't obviously empty, and isn't mounted on, we need to call
'simple_empty()' which does take a spinlock. If this turned out to
hurt performance, some other approach could be found to signal when a
directory is known to be empty.

Signed-off-by: NeilBrown
---
 fs/autofs4/root.c |   26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
index 2296c8301b66..71e4413d65c8 100644
--- a/fs/autofs4/root.c
+++ b/fs/autofs4/root.c
@@ -433,8 +433,6 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)

 	/* The daemon never waits. */
 	if (autofs4_oz_mode(sbi)) {
-		if (rcu_walk)
-			return 0;
 		if (!d_mountpoint(dentry))
 			return -EISDIR;
 		return 0;
@@ -452,12 +450,28 @@ static int autofs4_d_manage(struct dentry *dentry, bool rcu_walk)
 	if (status)
 		return status;

-	if (rcu_walk)
-		/* it is always safe to return 0 as the worst that
-		 * will happen is we retry in REF-walk mode.
-		 * Better than always taking a lock.
+	if (rcu_walk) {
+		/* We don't need fs_lock in rcu_walk mode,
+		 * just testing 'AUTOFS_INFO_NO_RCU' is enough.
+		 * simple_empty() takes a spinlock, so leave it
+		 * to last.
+		 * We only return -EISDIR when certain this isn't
+		 * a mount-trap.
 		 */
+		struct inode *inode;
+		if (ino->flags & (AUTOFS_INF_EXPIRING | AUTOFS_INF_NO_RCU))
+			return 0;
+		if (d_mountpoint(dentry))
+			return 0;
+		inode = rcu_dereference(dentry->d_inode);
+		if (inode && S_ISLNK(inode->i_mode))
+			return -EISDIR;
+		if (list_empty(&dentry->d_subdirs))
+			return 0;
+		if (!simple_empty(dentry))
+			return -EISDIR;
 		return 0;
+	}
 	spin_lock(&sbi->fs_lock);
 	/*
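The order of the rcu-walk tests in the hunk above is easier to see when written as a pure function over a snapshot of the dentry state. The following is a user-space sketch only: the struct and its fields are hypothetical stand-ins for the kernel checks named in the comments, and the cost remark about simple_empty() comes from the commit message, not from measurement.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of what the rcu-walk branch inspects. */
struct trap_snapshot {
	bool expiring_or_no_rcu;	/* AUTOFS_INF_EXPIRING or AUTOFS_INF_NO_RCU set */
	bool is_mountpoint;		/* d_mountpoint(dentry) */
	bool is_symlink;		/* S_ISLNK(inode->i_mode) */
	bool no_children;		/* list_empty(&dentry->d_subdirs) */
	bool really_empty;		/* simple_empty(dentry), which takes a spinlock */
};

/*
 * Returns 0 when path walk should proceed as for a possible mount trap,
 * or -EISDIR when the object is certainly not a mount trap.
 */
static int rcu_walk_manage(const struct trap_snapshot *s)
{
	if (s->expiring_or_no_rcu)
		return 0;	/* cannot decide here */
	if (s->is_mountpoint)
		return 0;	/* something is mounted on it */
	if (s->is_symlink)
		return -EISDIR;	/* symlinks are never traps */
	if (s->no_children)
		return 0;	/* obviously empty, so it may be a trap */
	if (!s->really_empty)
		return -EISDIR;	/* non-empty directory: not a trap */
	return 0;
}
```

The cheap tests come first, so the spinlock-taking emptiness check is reached only for directories that have child dentries but nothing mounted on them.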
[PATCHv4 0/2] regulator: of: Add support for parsing regulator suspend state
Regulators are set to a different state/mode depending on the kind of suspend state, and the regulation_constraints structure already has fields for the regulator suspend state. This patchset parses the regulator suspend state from the devicetree file.

For example:

	ldoX_reg: LDOx {
		regulator-name = "VAP_XXX_1.2V";
		regulator-min-microvolt = <1200000>;
		regulator-max-microvolt = <1200000>;
		regulator-always-on;
		regulator-initial-state = <3>; /* PM_SUSPEND_MEM */

		regulator-state-mem {
			regulator-off-in-suspend;
		};
		regulator-state-disk {
			regulator-volt = <1200000>;
			regulator-on-in-suspend;
		};
	};

Changes from v3:
- Don't support 'regulator-state-standby' mode
- Remove 'regulator-mode' property

Changes from v2:
- Fix over 80 lines by using checkpatch script
- Rebase this patchset on latest for-next branch of regulator.git

Changes from v1:
- Check whether regulator-initial-state and regulator-mode is correct or not
- Add more detailed description about regulator-initial-state, regulator-mode
  and regulator-state-[standby/mem/disk] for devicetree bindings
- Modify example of regulator suspend state in bindings documentation

Chanwoo Choi (2):
  regulator: of: Add support for parsing regulator_state for suspend state
  dt-bindings: regulator: Add regulator suspend state for PM state

 .../devicetree/bindings/regulator/regulator.txt | 22
 drivers/regulator/of_regulator.c                | 65 +-
 2 files changed, 85 insertions(+), 2 deletions(-)

--
1.8.0
[PATCHv4 1/2] regulator: of: Add support for parsing regulator_state for suspend state
The regulation_constraints structure includes specific fields to support
the suspend state for global PMIC SUSPEND/HIBERNATE mode. This patch adds
support for parsing regulator_state for the suspend state.

Signed-off-by: Chanwoo Choi
Acked-by: Kyungmin Park
---
 drivers/regulator/of_regulator.c | 65 ++--
 1 file changed, 63 insertions(+), 2 deletions(-)

diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index ee5e67b..5fe5748 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -16,12 +16,19 @@
 #include
 #include

+const char *const regulator_states[PM_SUSPEND_MAX + 1] = {
+	[PM_SUSPEND_MEM]	= "regulator-state-mem",
+	[PM_SUSPEND_MAX]	= "regulator-state-disk",
+};
+
 static void of_get_regulation_constraints(struct device_node *np,
 					struct regulator_init_data **init_data)
 {
-	const __be32 *min_uV, *max_uV;
+	const __be32 *min_uV, *max_uV, *suspend_uV;
 	struct regulation_constraints *constraints = &(*init_data)->constraints;
-	int ret;
+	struct regulator_state *suspend_state;
+	struct device_node *suspend_np;
+	int ret, i;
 	u32 pval;

 	constraints->name = of_get_property(np, "regulator-name", NULL);
@@ -70,6 +77,60 @@ static void of_get_regulation_constraints(struct device_node *np,
 	ret = of_property_read_u32(np, "regulator-enable-ramp-delay", &pval);
 	if (!ret)
 		constraints->enable_time = pval;
+
+	ret = of_property_read_u32(np, "regulator-initial-state", &pval);
+	if (!ret) {
+		switch (pval) {
+		case PM_SUSPEND_MEM:
+		case PM_SUSPEND_MAX:
+			constraints->initial_state = pval;
+			break;
+		default:
+			break;
+		};
+	}
+
+	for (i = 0; i < ARRAY_SIZE(regulator_states); i++) {
+		switch (i) {
+		case PM_SUSPEND_MEM:
+			suspend_state = &constraints->state_mem;
+			break;
+		case PM_SUSPEND_MAX:
+			suspend_state = &constraints->state_disk;
+			break;
+		case PM_SUSPEND_ON:
+		case PM_SUSPEND_FREEZE:
+		case PM_SUSPEND_STANDBY:
+		default:
+			continue;
+		};
+
+		suspend_np = of_get_child_by_name(np, regulator_states[i]);
+		if (!suspend_np || !suspend_state)
+			continue;
+
+		suspend_uV = of_get_property(suspend_np, "regulator-volt",
+					     NULL);
+		if (suspend_uV) {
+			suspend_state->uV = be32_to_cpu(*suspend_uV);
+
+			if (suspend_state->uV < constraints->min_uV)
+				suspend_state->uV = constraints->min_uV;
+			if (suspend_state->uV > constraints->max_uV)
+				suspend_state->uV = constraints->max_uV;
+		}
+
+		if (of_property_read_bool(suspend_np,
+					"regulator-on-in-suspend"))
+			suspend_state->enabled = true;
+
+		if (of_property_read_bool(suspend_np,
+					"regulator-off-in-suspend"))
+			suspend_state->disabled = true;
+
+		suspend_state = NULL;
+		suspend_np = NULL;
+	}
 }

 /**
--
1.8.0
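The voltage handling in the loop above amounts to clamping the requested suspend voltage into the regulator's normal [min_uV, max_uV] window. A minimal user-space sketch of just that step follows; the function name is hypothetical and the values in the note below are examples, not taken from any board file.

```c
/*
 * Clamp a requested suspend voltage (in microvolts) into the range
 * allowed by the regulator constraints. Mirrors the two comparisons
 * applied to suspend_state->uV in the patch.
 */
static int clamp_suspend_uV(int uV, int min_uV, int max_uV)
{
	if (uV < min_uV)
		uV = min_uV;
	if (uV > max_uV)
		uV = max_uV;
	return uV;
}
```

With constraints of 1000000 to 2500000 microvolts, a requested 900000 becomes 1000000, a requested 3300000 becomes 2500000, and 1200000 is left unchanged. One consequence is that an out-of-range regulator-volt is silently adjusted rather than rejected.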
[PATCHv4 2/2] dt-bindings: regulator: Add regulator suspend state for PM state
This patch adds the regulator suspend state to the constraints in the dt
file. The regulation_constraints structure already has the following
regulator suspend state fields. The regulator suspend state controls the
state of the regulator according to the PM (Power Management) state.
- struct regulator_state state_disk
- struct regulator_state state_mem

Signed-off-by: Chanwoo Choi
Acked-by: Kyungmin Park
---
 .../devicetree/bindings/regulator/regulator.txt | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/devicetree/bindings/regulator/regulator.txt b/Documentation/devicetree/bindings/regulator/regulator.txt
index 8607433..ccba90b 100644
--- a/Documentation/devicetree/bindings/regulator/regulator.txt
+++ b/Documentation/devicetree/bindings/regulator/regulator.txt
@@ -19,6 +19,23 @@ Optional properties:
   design requires. This property describes the total system ramp time
   required due to the combination of internal ramping of the regulator itself,
   and board design issues such as trace capacitance and load on the supply.
+- regulator-initial-state: initial state for suspend state, can be set to
+  one of the following defined suspend states:
+  <3>: PM_SUSPEND_MEM - Setup regulator according to regulator-state-mem
+  <4>: PM_SUSPEND_MAX - Setup regulator according to regulator-state-disk
+- regulator-state-mem sub-root node for Suspend-to-RAM mode
+  : suspend to memory, the device goes to sleep, but all data stored in memory,
+  only some external interrupt can wake the device.
+- regulator-state-disk sub-root node for Suspend-to-disk mode
+  : suspend to disk, this state operates similarly to Suspend-to-RAM,
+  but includes a final step of writing memory contents to disk.
+- regulator-state-[mem/disk] node has following common properties:
+  - regulator-volt: voltage consumers may set in suspend state.
+  - regulator-on-in-suspend: regulator should be on in suspend state.
+  - regulator-off-in-suspend: regulator should be off in suspend state.
+  If the node doesn't include regulator-[on/off]-in-suspend, the regulator
+  state can't be changed in suspend mode and the regulator keeps the state
+  it has in normal operation.

 Deprecated properties:
 - regulator-compatible: If a regulator chip contains multiple
@@ -34,6 +51,11 @@ Example:
 		regulator-max-microvolt = <2500000>;
 		regulator-always-on;
 		vin-supply = <&vin>;
+
+		regulator-state-mem {
+			regulator-volt = <1000000>;
+			regulator-on-in-suspend;
+		};
 	};

 Regulator Consumers:
--
1.8.0
Re: [PATCH 0/2] dmaengine: qcom_bam_dma: Add support for v1.3.0
Hi Andy,

Any plans to respin these patches with Stanimir's comments?

thanks,
srini

On 16/04/14 22:45, Andy Gross wrote:
> This set of patches adds support for the v1.3.0 version of the QCOM BAM
> dmaengine driver. The older version of the BAM is present in the MSM8x64,
> APQ8064, and IPQ8064 processors.
>
> Due to register address space changes between versions, all of the register
> accesses have to be calculated using different offsets and multipliers that
> are specific to that version of the IP block.
>
> Andy Gross (2):
>   dmaengine: qcom_bam_dma: Add v1.3.0 driver support
>   dmaengine: qcom_bam_dma: Add binding for v1.3.0
>
>  .../devicetree/bindings/dma/qcom_bam_dma.txt |   4 +-
>  drivers/dma/qcom_bam_dma.c                   | 177 +---
>  2 files changed, 117 insertions(+), 64 deletions(-)
Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute
(2014/08/18 12:25), Zhang Zhen wrote: On 2014/8/16 5:37, Toshi Kani wrote: On Wed, 2014-08-13 at 12:10 +0800, Zhang Zhen wrote: Currently memory-hotplug has two limits: 1. If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE. 2. If the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL. With this patch, we can easy to know a memory block can be onlined to which zone, and don't need to know the above two limits. Updated the related Documentation. Change v1 -> v2: - optimize the implementation following Dave Hansen's suggestion Signed-off-by: Zhang Zhen --- Documentation/ABI/testing/sysfs-devices-memory | 8 Documentation/memory-hotplug.txt | 4 +- drivers/base/memory.c | 62 ++ include/linux/memory_hotplug.h | 1 + mm/memory_hotplug.c| 2 +- 5 files changed, 75 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory index 7405de2..2b2a1d7 100644 --- a/Documentation/ABI/testing/sysfs-devices-memory +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -61,6 +61,14 @@ Users: hotplug memory remove tools http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils +What: /sys/devices/system/memory/memoryX/zones_online_to I think this name is a bit confusing. How about "valid_online_types"? Thanks for your suggestion. This patch has been added to -mm tree. If most people think so, i would like to modify the interface name. I like Toshi's idea (valid_online_types). Thanks, Yasuaki Ishimatsu If not, let's leave it as it is. Best regards! Thanks, -Toshi -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
linux-next: stats (Re: Linux 3.17-rc1)
Hi all,

As usual, the executive friendly graph is at
http://neuling.org/linux-next-size.html :-)

(No merge commits counted, next-20140804 was the first linux-next after
the merge window opened.)

Commits in v3.17-rc1 (relative to v3.16): 10872 (v3.16-rc1: 11364)
Commits in next-20140804:                 10268 (next-20140602: 10283)
  Commits with the same SHA1:              9216 (9204)
  Commits with the same patch_id:           590 (1) (559)
  Commits with the same subject line:        53 (1) (60)

(1) not counting those in the lines above.

So commits in -rc1 that were in next-20140804: 9859 90.7% (9823 86.4%)

That is higher than last time but similar to the merge window before
that. Last merge window was unusually low.

Some breakdown of the list of extra commits (relative to next-20140804)
in -rc1:

Top ten first word of commit summary:

    262 drm
     62 powerpc
     51 input
     46 mips
     41 net
     28 hwmon
     22 xfs
     22 bcache
     18 arm
     18 alsa

Top eleven authors:

     82 bske...@redhat.com
     35 benjamin.tissoi...@redhat.com
     25 axel@ingics.com
     25 alexander.deuc...@amd.com
     23 himangi...@gmail.com
     21 gws...@linux.vnet.ibm.com
     19 v...@zeniv.linux.org.uk
     17 paul.bur...@imgtec.com
     15 mini...@googlemail.com
     15 christian.koe...@amd.com
     15 acour...@nvidia.com

Top ten committers:

    141 da...@davemloft.net
     99 bske...@redhat.com
     64 b...@kernel.crashing.org
     57 alexander.deuc...@amd.com
     51 dmitry.torok...@gmail.com
     46 r...@linux-mips.org
     41 matthew.garr...@nebula.com
     32 torva...@linux-foundation.org
     31 daei...@gmail.com
     28 li...@roeck-us.net

There are also 410 commits in next-20140804 that didn't make it into
v3.17-rc1.

Top eight first word of commit summary:

     66 arm
     33 mm
     30 drm
     22 rcu
     15 fs
     11 ocfs2
      9 mips
      9 drivers

Top eleven authors:

     30 a...@linux-foundation.org
     23 o...@lixom.net
     17 ville.syrj...@linux.intel.com
     17 bobby.pr...@gmail.com
     15 f...@skynet.be
     13 laurent.pinchart+rene...@ideasonboard.com
     12 han...@cmpxchg.org
     10 paul...@linux.vnet.ibm.com
     10 j...@perches.com
     10 beh...@converseincode.com
     10 a...@arndb.de

Some of Andrew's patches are fixes for other patches in his tree (and
have been merged into those).

Top ten committers:

    154 s...@canb.auug.org.au
     31 daniel.vet...@ffwll.ch
     29 paul...@linux.vnet.ibm.com
     25 o...@lixom.net
     14 horms+rene...@verge.net.au
     14 epa...@redhat.com
     13 shawn@freescale.com
     12 zo...@linux.vnet.ibm.com
     10 jason.wes...@windriver.com
     10 beh...@converseincode.com

Those commits by me are from the quilt series (mainly Andrew's mmotm
tree).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
ALERT: md/raid6 data corruption risk.
Hi all,

There is a risk of data loss with md/raid6 arrays running on Linux
since 2.6.32.

If:
 - the array is doubly degraded,
 - one or both failed devices are being recovered, and
 - the array is written to,
then it is possible for data on the array to be lost.

The patch below fixes the problem. If you apply the patch to an older
kernel which has separate handle_stripe5() and handle_stripe6()
functions, be sure that the patch changes handle_stripe6().

There is no risk to an optimal array or a singly-degraded array. There
is also no risk on a doubly-degraded array which is not recovering a
device or is not receiving write requests.

If you have data on a RAID6 array, please consider how to avoid
corruption, possibly by applying the patch, possibly by removing any
hot spares so recovery does not automatically start.

This patch will be sent upstream shortly and will subsequently appear
in future "-stable" kernels.

NeilBrown

From f94e37dce722ec7bfd04be357f422daa02b5 Mon Sep 17 00:00:00 2001
From: NeilBrown
Date: Wed, 13 Aug 2014 09:57:07 +1000
Subject: [PATCH] md/raid6: avoid data corruption during recovery of
 double-degraded RAID6

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then. In an older kernel with separate handle_stripe5() and
handle_stripe6() functions, the patch must change handle_stripe6().

Cc: sta...@vger.kernel.org (2.6.32+)
Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov
Cc: Dan Williams
Reported-by: "Manibalan P"
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6b2d615d1094..183588b11fc1 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3817,6 +3817,8 @@ static void handle_stripe(struct stripe_head *sh)
 				set_bit(R5_Wantwrite, &dev->flags);
 				if (prexor)
 					continue;
+				if (s.failed > 1)
+					continue;
 				if (!test_bit(R5_Insync, &dev->flags) ||
 				    ((i == sh->pd_idx || i == sh->qd_idx) &&
 				     s.failed == 0))
Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute
(2014/08/13 13:10), Zhang Zhen wrote: Currently memory-hotplug has two limits: 1. If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE. 2. If the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL. With this patch, we can easy to know a memory block can be onlined to which zone, and don't need to know the above two limits. Updated the related Documentation. Change v1 -> v2: - optimize the implementation following Dave Hansen's suggestion Signed-off-by: Zhang Zhen --- Documentation/ABI/testing/sysfs-devices-memory | 8 Documentation/memory-hotplug.txt | 4 +- drivers/base/memory.c | 62 ++ include/linux/memory_hotplug.h | 1 + mm/memory_hotplug.c| 2 +- 5 files changed, 75 insertions(+), 2 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory index 7405de2..2b2a1d7 100644 --- a/Documentation/ABI/testing/sysfs-devices-memory +++ b/Documentation/ABI/testing/sysfs-devices-memory @@ -61,6 +61,14 @@ Users: hotplug memory remove tools http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils +What: /sys/devices/system/memory/memoryX/zones_online_to +Date: July 2014 +Contact: Zhang Zhen +Description: + The file /sys/devices/system/memory/memoryX/zones_online_to + is read-only and is designed to show which zone this memory block can + be onlined to. 
+ What: /sys/devices/system/memoryX/nodeY Date: October 2009 Contact: Linux Memory Management list diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt index 45134dc..5b34e33 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/memory-hotplug.txt @@ -155,6 +155,7 @@ Under each memory block, you can see 4 files: /sys/devices/system/memory/memoryXXX/phys_device /sys/devices/system/memory/memoryXXX/state /sys/devices/system/memory/memoryXXX/removable +/sys/devices/system/memory/memoryXXX/zones_online_to 'phys_index' : read-only and contains memory block id, same as XXX. 'state' : read-write @@ -170,6 +171,8 @@ Under each memory block, you can see 4 files: block is removable and a value of 0 indicates that it is not removable. A memory block is removable only if every section in the block is removable. +'zones_online_to' : read-only: designed to show which zone this memory block + can be onlined to. NOTE: These directories/files appear after physical memory hotplug phase. @@ -408,7 +411,6 @@ node if necessary. - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like sysctl or new control file. - showing memory block and physical device relationship. - - showing memory block is under ZONE_MOVABLE or not - test and make it better memory offlining. - support HugeTLB page migration and offlining. - memmap removing at memory offline. 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c index a2e13e2..b5d693f 100644 --- a/drivers/base/memory.c +++ b/drivers/base/memory.c @@ -373,10 +373,71 @@ static ssize_t show_phys_device(struct device *dev, return sprintf(buf, "%d\n", mem->phys_device); } +static int __zones_online_to(unsigned long end_pfn, + struct page *first_page, unsigned long nr_pages) +{ + struct zone *zone_next; + + /*The mem block is the last block of memory.*/ + if (!pfn_valid(end_pfn + 1)) + return 1; The check is not enough if memory has hole as follows: PFN 0x00 0xd0 0xe0 0xf0 +-+-+-+ zone type | Normal| hole| Normal| +-+-+-+ In this case, 0xd1 is invalid pfn. But __zones_online_to should return 0 since 0xe0-0xf0 is Normal zone. Thanks, Yasuaki Ishimatsu + zone_next = page_zone(first_page + nr_pages); + if (zone_idx(zone_next) == ZONE_MOVABLE) + return 1; + return 0; +} + +static ssize_t show_zones_online_to(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct memory_block *mem = to_memory_block(dev); + unsigned long start_pfn, end_pfn; + unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block; + struct page *first_page; + struct zone *zone, *zone_prev; + + start_pfn = section_nr_to_pfn(mem->start_section_nr); + end_pfn = start_pfn + nr_pages; + first_page = pfn_to_page(start_pfn); + + /*The block contains m
Antw: Re: Some problems with HP DL380 G8 BIOS and SLES11 SP3
>>> Don Zickus schrieb am 14.08.2014 um 19:46 in Nachricht <20140814174658.gv49...@redhat.com>: > On Wed, Aug 13, 2014 at 05:22:17PM +0200, Ulrich Windl wrote: >> Hello! >> >> Running the current SLES11 SP3 kernel on a HP DL380 G8 server, there are > some kernel messages that indicate a bug either in the kernel or in the HP > BIOS. Maybe someone can explain, so I can try to get it fixed whatever party > broke it... >> >> Linux kernel is "3.0.101-0.35-default (geeko@buildhost) (gcc version 4.3.4 > [gcc-4_3-branch revision 152973]" (latest). >> HP server is "HP ProLiant DL380p Gen8, BIOS P70 02/10/2014" (latest) > > Yes, it is because you are letting the firmware dynamically control your > cpu frequency. In order to accomplish they need to use a perf counter or > two, hence the conflict. Set the firmware setting to OS control and the > problem goes away. Contact HP for those instructions, they are very aware > of this problem and recommend OS control to all high end servers. Hi! Thanks for answering, but the BIOS has set power management to "OS control" (see attachment). So I guess it must be something different. Regards, Ulrich > > Cheers, > Don > >> >> During ACPI init I see: >> [...] >> Reserving 128MB of memory at 752MB for crashkernel (System RAM: 132095MB) >> ACPI: RSDP 000f4f00 00024 (v02 HP) >> ACPI: XSDT bddaed00 000D4 (v01 HP ProLiant 0002 322? > 162E) >> ACPI: FACP bddaee40 000F4 (v03 HP ProLiant 0002 322? > 162E) >> ACPI Warning: Invalid length for Pm1aControlBlock: 32, using default 16 > (2011041 >> 3/tbfadt-611) >> ACPI Warning: Invalid length for Pm2ControlBlock: 32, using default 8 > (20110413/ >> tbfadt-611) >> ACPI: DSDT bddaef40 026DC (v01 HP DSDT 0001 INTL > 20030228) >> ACPI: FACS bddac140 00040 >> ACPI: SPCR bddac180 00050 (v01 HP SPCRRBSU 0001 322? > 162E) >> ACPI: MCFG bddac200 0003C (v01 HP ProLiant 0001 > ) >> [...] 
>> >> HPET id 0 under DRHD base 0xf4ffe000 >> BIOS requests to not use x2apic >> Use 'intremap=no_x2apic_optout' to override BIOS request >> Enabled IRQ remapping in xapic mode >> x2apic not enabled, IRQ remapping is in xapic mode >> Switched APIC routing to physical flat. >> ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 >> CPU0: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz stepping 04 >> Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, Broken BIOS > detec >> ted, complain to your hardware vendor. >> [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330) >> Intel PMU driver. >> ... version:3 >> ... bit width: 48 >> ... generic registers: 4 >> ... value mask: >> ... max period: 7fff >> ... fixed-purpose events: 3 >> ... event mask: 0007000f >> NMI watchdog enabled, takes one hw-pmu counter. >> Booting Node 0, Processors #1 >> [...] >> >> pci:00: Requesting ACPI _OSC control (0x1d) >> pci:00: ACPI _OSC request failed (AE_SUPPORT), returned control mask: > 0x00 >> ACPI _OSC control for PCIe not granted, disabling ASPM >> [...] >> >> pci:20: Requesting ACPI _OSC control (0x1d) >> pci:20: ACPI _OSC request failed (AE_SUPPORT), returned control mask: > 0x00 >> ACPI _OSC control for PCIe not granted, disabling ASPM >> [...] >> >> Regards, >> Ulrich >> P.S. Please CC: me, as I'm not on LKML... >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in >> the body of a message to majord...@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 05/16] clk: tegra: Add closed loop support for the DFLL
Hi, On 07/21/2014 11:38 PM, Tuomas Tynkkynen wrote: With closed loop support, the clock rate of the DFLL can be adjusted. The oscillator itself in the DFLL is a free-running oscillator whose rate is directly determined the supply voltage. However, the DFLL module contains logic to compare the DFLL output rate to a fixed reference clock (51 MHz) and make a decision to either lower or raise the DFLL supply voltage. The DFLL module can then autonomously change the supply voltage by communicating with an off-chip PMIC via either I2C or PWM signals. This driver currently supports only I2C. Signed-off-by: Tuomas Tynkkynen --- v2 changes: - query the various properties required for I2C mode from the regulator framework drivers/clk/tegra/clk-dfll.c | 656 ++- 1 file changed, 653 insertions(+), 3 deletions(-) diff --git a/drivers/clk/tegra/clk-dfll.c b/drivers/clk/tegra/clk-dfll.c index d83e859..0d4b2dd 100644 --- a/drivers/clk/tegra/clk-dfll.c +++ b/drivers/clk/tegra/clk-dfll.c @@ -205,12 +205,16 @@ ... + +/** + * dfll_calculate_rate_request - calculate DFLL parameters for a given rate + * @td: DFLL instance + * @req: DFLL-rate-request structure + * @rate: the desired DFLL rate + * + * Populate the DFLL-rate-request record @req fields with the scale_bits + * and mult_bits fields, based on the target input rate. Returns 0 upon + * success, or -EINVAL if the requested rate in req->rate is too high + * or low for the DFLL to generate. + */ +static int dfll_calculate_rate_request(struct tegra_dfll *td, + struct dfll_rate_req *req, + unsigned long rate) +{ + u32 val; + + /* +* If requested rate is below the minimum DVCO rate, active the scaler. +* In the future the DVCO minimum voltage should be selected based on +* chip temperature and the actual minimum rate should be calibrated +* at runtime. 
+*/ + req->scale_bits = DFLL_FREQ_REQ_SCALE_MAX - 1; + if (rate < td->dvco_rate_min) { + int scale; + + scale = DIV_ROUND_CLOSEST(rate / 1000 * DFLL_FREQ_REQ_SCALE_MAX, + td->dvco_rate_min / 1000); + if (!scale) { + dev_err(td->dev, "%s: Rate %lu is too low\n", + __func__, rate); + return -EINVAL; + } + req->scale_bits = scale - 1; + rate = td->dvco_rate_min; + } + + /* Convert requested rate into frequency request and scale settings */ + val = DVCO_RATE_TO_MULT(rate, td->ref_rate); + if (val > FREQ_MAX) { + dev_err(td->dev, "%s: Rate %lu is above dfll range\n", + __func__, rate); + return -EINVAL; + } + req->mult_bits = val; + req->dvco_target_rate = MULT_TO_DVCO_RATE(req->mult_bits, td->ref_rate); + req->rate = dfll_scale_dvco_rate(req->dvco_target_rate, +req->scale_bits); Should be dfll_scale_dvco_rate(req->scale_bits, req->dvco_target_rate); Thanks, Vince + req->lut_index = find_lut_index_for_rate(td, req->dvco_target_rate); + if (req->lut_index < 0) + return req->lut_index; + + return 0; +} + -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V3 3/3] ARM: clk-imx6q: Add missing lvds and anaclk clock to the clock tree
On Mon, Aug 11, 2014 at 11:09:36AM +0800, Shengjiu Wang wrote: > On Sat, Aug 09, 2014 at 09:58:42PM +0800, Shawn Guo wrote: > > On Fri, Aug 08, 2014 at 03:02:49PM +0800, Shengjiu Wang wrote: > > > @@ -176,8 +182,12 @@ static void __init imx6q_clocks_init(struct > > > device_node *ccm_node) > > >* the "output_enable" bit as a gate, even though it's really just > > >* enabling clock output. > > >*/ > > > - clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate("lvds1_gate", "lvds1_sel", > > > base + 0x160, 10); > > > - clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate("lvds2_gate", "lvds2_sel", > > > base + 0x160, 11); > > > + clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate2("lvds1_gate", "lvds1_sel", > > > base + 0x160, 10); > > > + clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate2("lvds2_gate", "lvds2_sel", > > > base + 0x160, 11); > > > > I do not think you can simply change to use imx_clk_gate2() here. It's > > designed for those CCGR gate clocks, each of which is controlled by two > > bits. > > > > Shawn > > > As Lucas Stach's suggestion, we need to do add some method for mutually > exclusive clock, > lvds1_gate with lvds1_in, lvds2_gate with lvds2_in. I add > imx_clk_gate2_exclusive() function in clk-gate2.c. > So I change imx_clk_gate() to imx_clk_gate2() here. > As you said, this is not good solution. It's not just a "not good" solution but wrong and broken one. The net result of that is if you call clk_enable() on lvds1_gate, both bit 10 and 11 will be set. > So I need your suggestion, how can I do? I guess we will need a new clock type to handle such mutually exclusive clocks, rather than patching clk-gate2. > First, is it allowable that to add imx_clk_gate2_exclusive() function, is > there a more better way? Again, this is completely wrong. > second, or should I change the clk-gate.c to add exclusive control? If such mutually exclusive clocks are somehow common across different clock controllers, we can propose to change clk-gate.c for handling them. 
But I'm not sure this is a common case.

Shawn
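Shawn's remark that enabling lvds1_gate through imx_clk_gate2() would set both bit 10 and bit 11 can be illustrated with a small user-space model. This is only a sketch: the function names are made up for the example, and it assumes a single-bit gate sets one enable bit while a CCGR-style gate2 writes a two-bit field starting at the given bit index, as described in the review.

```c
#include <stdint.h>

/* Single-bit gate: one enable bit per clock (hypothetical model). */
static uint32_t gate_enable(uint32_t reg, unsigned int bit_idx)
{
	return reg | (1u << bit_idx);
}

/*
 * CCGR-style gate2: each clock owns a two-bit field starting at bit_idx
 * (hypothetical model of the behaviour described in the review).
 */
static uint32_t gate2_enable(uint32_t reg, unsigned int bit_idx)
{
	return reg | (3u << bit_idx);
}
```

With bit_idx 10, gate_enable() touches only bit 10 (0x400), while gate2_enable() yields 0xC00, which also covers bit 11, the bit that belongs to lvds2_gate in the hunk above.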
Re: [PATCH v2 04/16] clk: tegra: Add library for the DFLL clock source (open-loop mode)
Hi,

On 07/21/2014 11:38 PM, Tuomas Tynkkynen wrote:
> Add shared code to support the Tegra DFLL clocksource in open-loop
> mode. This root clocksource is present on the Tegra124 SoCs. The
> DFLL is the intended primary clock source for the fast CPU cluster.
>
> This code is very closely based on a patch by Paul Walmsley from
> December (http://comments.gmane.org/gmane.linux.ports.tegra/15273),
> which in turn comes from the internal driver originally created by
> Aleksandr Frid.
>
> Subsequent patches will add support for closed loop mode and drivers
> for the Tegra124 fast CPU cluster DFLL devices, which rely on this
> code.
>
> Signed-off-by: Paul Walmsley
> Signed-off-by: Tuomas Tynkkynen
> ---
> v2 changes:
>  - minor, moved the devm_regulator_get here
>
>  drivers/clk/tegra/Makefile   |    1 +
>  drivers/clk/tegra/clk-dfll.c | 1085 ++
>  drivers/clk/tegra/clk-dfll.h |   55 +++
>  3 files changed, 1141 insertions(+)
>  create mode 100644 drivers/clk/tegra/clk-dfll.c
>  create mode 100644 drivers/clk/tegra/clk-dfll.h
...
> --- /dev/null
> +++ b/drivers/clk/tegra/clk-dfll.c
...
> +
> +/*
> + * Output clock scaler helpers
> + */
> +
> +/**
> + * dfll_scale_dvco_rate - calculate scaled rate from the DVCO rate
> + * @scale_bits: clock scaler value (bits in the DFLL_FREQ_REQ_SCALE field)
> + * @dvco_rate: the DVCO rate
> + *
> + * Apply the same scaling formula that the DFLL hardware uses to scale
> + * the DVCO rate.
> + */
> +static unsigned long dfll_scale_dvco_rate(int scale_bits,
> +					  unsigned long dvco_rate)
> +{
> +	return (u64)dvco_rate * (scale_bits + 1) / DFLL_FREQ_REQ_SCALE_MAX;
> +}
...
> +static u64 dfll_read_monitor_rate(struct tegra_dfll *td)
> +{
> +	u32 v, s;
> +	u64 pre_scaler_rate, post_scaler_rate;
> +
> +	if (!dfll_is_running(td))
> +		return 0;
> +
> +	v = dfll_readl(td, DFLL_MONITOR_DATA);
> +	v = (v & DFLL_MONITOR_DATA_VAL_MASK) >> DFLL_MONITOR_DATA_VAL_SHIFT;
> +	pre_scaler_rate = dfll_calc_monitored_rate(v, td->ref_rate);
> +
> +	s = dfll_readl(td, DFLL_FREQ_REQ);
> +	s = (s & DFLL_FREQ_REQ_SCALE_MASK) >> DFLL_FREQ_REQ_SCALE_SHIFT;
> +	post_scaler_rate = dfll_scale_dvco_rate(pre_scaler_rate, s);

Should be dfll_scale_dvco_rate(s, pre_scaler_rate);

Thanks,
Vince
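The effect of the swapped arguments can be checked with a small user-space sketch. The helper below copies the formula quoted from the patch; the value 256 for DFLL_FREQ_REQ_SCALE_MAX and the sample rate are assumptions picked for illustration, not values confirmed by this thread.

```c
#include <stdint.h>

/* Assumed value for illustration only; the real constant lives in clk-dfll.c. */
#define DFLL_FREQ_REQ_SCALE_MAX 256

/* Same formula as dfll_scale_dvco_rate() in the patch, rebuilt in user space. */
static unsigned long scale_dvco_rate(int scale_bits, unsigned long dvco_rate)
{
	return (uint64_t)dvco_rate * (scale_bits + 1) / DFLL_FREQ_REQ_SCALE_MAX;
}
```

Under these assumptions, scale_dvco_rate(127, 1000000000UL) gives 500000000, while the swapped call scale_dvco_rate(1000000000, 127UL) gives 496093750. The two are close because the multiplication nearly commutes, which makes the bug easy to miss in testing. The swapped form also pushes the rate through an int parameter, so rates above INT_MAX would not survive the conversion.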
[PATCH] usb: phy: return -ENODEV on failure of try_module_get
When __usb_find_phy_dev() does not return an error and try_module_get()
fails, return -ENODEV.

Signed-off-by: Arjun Sreedharan
---
 drivers/usb/phy/phy.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
index 36b6bce..fd0d7f1 100644
--- a/drivers/usb/phy/phy.c
+++ b/drivers/usb/phy/phy.c
@@ -232,6 +232,9 @@ struct usb_phy *usb_get_phy_dev(struct device *dev, u8 index)
 	phy = __usb_find_phy_dev(dev, &phy_bind_list, index);
 	if (IS_ERR(phy) || !try_module_get(phy->dev->driver->owner)) {
 		dev_dbg(dev, "unable to find transceiver\n");
+		if (!IS_ERR(phy))
+			phy = ERR_PTR(-ENODEV);
+
 		goto err0;
 	}

-- 
1.7.11.7
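The logic of the fix can be modelled in user space. This is a rough sketch, not the kernel code: err_ptr(), is_err() and ptr_err() are simplified stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(), struct phy and get_phy() are hypothetical, and it assumes the err0 path hands phy back to the caller, as the patch implies.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static bool is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct phy { int id; };

/* Sample object used by the usage notes; purely illustrative. */
static struct phy demo_phy = { 1 };

/*
 * Hypothetical model of the patched branch: 'found' plays the role of the
 * lookup result and 'module_ok' the result of try_module_get().
 */
static struct phy *get_phy(struct phy *found, bool module_ok)
{
	if (is_err(found) || !module_ok) {
		/* The added lines: a lookup hit must not leak out on failure. */
		if (!is_err(found))
			found = err_ptr(-ENODEV);
		return found;	/* stands in for "goto err0" */
	}
	return found;
}
```

In this model, get_phy(&demo_phy, false) returns an error pointer carrying -ENODEV, whereas without the added check the caller would receive a valid-looking pointer for a module whose reference was never taken. An error already returned by the lookup is passed through unchanged.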
Issue with clone() and CLONE_NEWUSER as unprivileged user
Hi,

I am trying to use clone() and CLONE_NEWUSER to create a new user namespace as an unprivileged user. I always get an "operation not permitted" error. However, when I use fork() + unshare() as an unprivileged user, I can create the new user namespace just fine.

Is there something obvious that I am missing? My understanding is that CLONE_NEWUSER should not require any special capabilities. I tried the sample code from the manpage and also from LWN.net, but both give me the same error.

Regards

Marcel
Re: [Intel-gfx] Usage of _PAGE_PCD et al in i915 driver
On 08/15/2014 12:21 PM, Ville Syrjälä wrote: On Thu, Aug 14, 2014 at 05:55:11AM +0200, Juergen Gross wrote: On 08/13/2014 05:07 PM, Jesse Barnes wrote: On Fri, 8 Aug 2014 15:14:15 +0200 Daniel Vetter wrote: Adding relevant mailing lists. On Fri, Aug 8, 2014 at 1:23 PM, Juergen Gross wrote: I'm just about to create a patch for full PAT support in the Linux kernel, including Xen. For this purpose I introduce a translation between cache modes and pte bits. Scanning the kernel sources for usage of the cache mode bits in the pte I discovered drivers/gpu/drm/i915/i915_gem_gtt.h is using _PAGE_PCD, _PAGE_PWT and _PAGE_PAT. I think those defines are used to create ptes not for usage by the main processor, but for the graphics processor. Is this true? In this case I'd suggest to define i915-specific macros instead of using the x86 ones. Yeah, those are gpu specific PAT tables, but the hw engineers specifically designed this to match, and we've tried to follow the cpu side to match it. Especially in the future that will be somewhat important, since we want to fully share the entire address space between cpu and gpu on the next platform. Jesse is working on that. Right, we have an x86 compatible MMU in the GPU itself, so re-using the defines makes sense. I suppose with your work you'll move them and make them a bit more opaque? If so, we'll still want a way to get at them directly, or access your mapping functions for generating PTE bits for the GPU MMU. Using the mapping functions I'm introducing should work, if the MMU has an x86 compatible MSR_IA32_CR_PAT which is configured the same way as on the x86 processor (be aware that Xen is using another MSR_IA32_CR_PAT setting as the Linux kernel). We have a PAT that is structured the same way as the x86 PAT. But the contents of the PAT entries are obviously specific to the GPU so it's not identical. But the pcd/pwt/pat bits index the PAT in exactly the same way as on x86. 
See bdw_setup_private_ppat() and chv_setup_private_ppat() for how we set up the PAT. So you are using the PAT bit in the ptes, but the semantic for the GPU will be different as for the x86 processor, because the GPU PAT is set up differently from the x86 one. In case you are sharing ptes between GPU and x86 processor in future, this might lead to problems when the x86 processor will use ptes with the PAT bit set. Juergen -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
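The point about identical indexing but different table contents can be sketched as follows. The bit positions are the x86 ones for 4 KiB page table entries, and cpu_pat_poweron uses the architectural power-on PAT encoding. gpu_pat_example is a made-up placeholder and does not reflect what bdw_setup_private_ppat() actually programs.

```c
#include <stdint.h>

/* x86 PTE cache-control bit positions for 4 KiB pages. */
#define PAGE_PWT (1ull << 3)
#define PAGE_PCD (1ull << 4)
#define PAGE_PAT (1ull << 7)

/* x86 power-on PAT contents (6 = WB, 4 = WT, 7 = UC-, 0 = UC). */
static const uint8_t cpu_pat_poweron[8] = { 6, 4, 7, 0, 6, 4, 7, 0 };

/* Placeholder GPU table with arbitrary values, for illustration only. */
static const uint8_t gpu_pat_example[8] = { 0, 1, 4, 6, 6, 1, 4, 0 };

/* Index into the eight-entry PAT selected by a PTE: PAT:PCD:PWT. */
static unsigned int pte_pat_index(uint64_t pte)
{
	return (!!(pte & PAGE_PAT) << 2) |
	       (!!(pte & PAGE_PCD) << 1) |
	       !!(pte & PAGE_PWT);
}

/* Memory type a PTE resolves to under a given PAT layout. */
static unsigned int pte_memtype(uint64_t pte, const uint8_t pat[8])
{
	return pat[pte_pat_index(pte)];
}
```

The same PTE bits always select the same index, but pte_memtype() returns whatever the table in use holds at that index. That is the concern raised above: a PTE shared between CPU and GPU would mean different things to each if the two PATs are programmed differently.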
Re: Re: [RFC PATCH 04/10] scsi/constants: Cleanup printk message in scsi_dump_sense_buffer()
(2014/08/16 0:08), Ewan Milne wrote: On Fri, 2014-08-08 at 11:50 +, Yoshihiro YUNOMAE wrote: Unrecognized sense data should be output after linebuf is filled because "[%s] Unrecognized sense data (in hex): %s" message is output many times in loop. Signed-off-by: Yoshihiro YUNOMAE Cc: Hannes Reinecke Cc: Doug Gilbert Cc: Martin K. Petersen Cc: Christoph Hellwig Cc: "James E.J. Bottomley" Cc: Hidehiro Kawai Cc: Masami Hiramatsu --- drivers/scsi/constants.c | 13 + 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c index 5956d4d..6fad6b4 100644 --- a/drivers/scsi/constants.c +++ b/drivers/scsi/constants.c @@ -1385,16 +1385,13 @@ EXPORT_SYMBOL(scsi_print_sense_hdr); static void scsi_dump_sense_buffer(struct scsi_device *sdev, const char *prefix, - const unsigned char *sense_buffer, int sense_len, - struct scsi_sense_hdr *sshdr) + const unsigned char *sense_buffer, int sense_len) { char linebuf[128]; int i, linelen, remaining; if (sense_len < 32) sense_len = 32; - sdev_printk(KERN_INFO, sdev, - "[%s] Unrecognized sense data (in hex):", prefix); remaining = sense_len; for (i = 0; i < sense_len; i += 16) { @@ -1403,9 +1400,10 @@ scsi_dump_sense_buffer(struct scsi_device *sdev, const char *prefix, hex_dump_to_buffer(sense_buffer + i, linelen, 16, 1, linebuf, sizeof(linebuf), false); - sdev_printk(KERN_INFO, sdev, "[%s] Sense: %s\n", - prefix, linebuf); } + sdev_printk(KERN_INFO, sdev, + "[%s] Unrecognized sense data (in hex): %s", + prefix, linebuf); } See my earlier comment regarding PATCH 03/10. This doesn't look right -- In Hannes' tree what the code is doing is printing out a separate line for each 16 bytes of the sense data. Your change will cause only the last (partial?) 16 bytes to be printed. That's true. We should not apply this as well. The removal of the unused sshdr argument is fine, though. Thanks! 
Yoshihiro YUNOMAE -Ewan static void @@ -1467,8 +1465,7 @@ void __scsi_print_sense(struct scsi_device *sdev, const char *name, if (!scsi_normalize_sense(sense_buffer, sense_len, &sshdr)) { /* this may be SCSI-1 sense data */ - scsi_dump_sense_buffer(sdev, name, sense_buffer, - sense_len, &sshdr); + scsi_dump_sense_buffer(sdev, name, sense_buffer, sense_len); return; } -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae...@hitachi.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH 03/10] scsi/constants: Cleanup printk message in __scsi_print_command()
Hi Ewan,

Thank you for your review.

(2014/08/16 0:05), Ewan Milne wrote:
> On Fri, 2014-08-08 at 11:50 +, Yoshihiro YUNOMAE wrote:
>> All bytes in CDB should be output after linebuf is filled because
>> "[%s] CDB: %s\n" message is output many times in loop.
>>
>> Signed-off-by: Yoshihiro YUNOMAE
>> Cc: Hannes Reinecke
>> Cc: Doug Gilbert
>> Cc: Martin K. Petersen
>> Cc: Christoph Hellwig
>> Cc: "James E.J. Bottomley"
>> Cc: Hidehiro Kawai
>> Cc: Masami Hiramatsu
>> ---
>>  drivers/scsi/constants.c |    3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/scsi/constants.c b/drivers/scsi/constants.c
>> index 9c38b8d..5956d4d 100644
>> --- a/drivers/scsi/constants.c
>> +++ b/drivers/scsi/constants.c
>> @@ -413,9 +413,8 @@ void __scsi_print_command(struct scsi_device *sdev, const char *prefix,
>>  		hex_dump_to_buffer(cdb + i, linelen, 16, 1,
>>  				   linebuf, sizeof(linebuf), false);
>> -		sdev_printk(KERN_INFO, sdev, "[%s] CDB: %s\n",
>> -			    prefix, linebuf);
>>  	}
>> +	sdev_printk(KERN_INFO, sdev, "[%s] CDB: %s\n", prefix, linebuf);
>>  }
>>  EXPORT_SYMBOL(__scsi_print_command);
>
> This doesn't look right -- In Hannes' tree what the code is doing is
> printing out a separate line for each 16 bytes of the CDB. Your change
> will cause only the last (partial?) 16 bytes to be printed.

Ah, that's true. We should not apply this patch.

Thanks,
Yoshihiro YUNOMAE

--
Yoshihiro YUNOMAE
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: yoshihiro.yunomae...@hitachi.com
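The review point above can be reproduced outside the kernel. The following is a minimal userspace sketch, not the kernel code: hex_line() is a hypothetical stand-in for hex_dump_to_buffer(), and the "[sda]" prefix is made up. It contrasts printing inside the loop (one line per 16-byte chunk) with printing once after the loop, where the line buffer holds only the final chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's hex_dump_to_buffer(). */
void hex_line(const unsigned char *buf, size_t len, char *out, size_t outsz)
{
	size_t pos = 0;

	assert(outsz > 0);
	out[0] = '\0';
	/* Each byte needs at most 3 characters plus the terminating NUL. */
	for (size_t i = 0; i < len && pos + 4 <= outsz; i++)
		pos += (size_t)snprintf(out + pos, outsz - pos, "%s%02x",
					i ? " " : "", (unsigned)buf[i]);
}

/* Print inside the loop: every 16-byte chunk gets its own output line. */
int dump_per_chunk(const unsigned char *cdb, size_t len,
		   char *linebuf, size_t bufsz)
{
	int lines = 0;

	for (size_t i = 0; i < len; i += 16) {
		size_t linelen = len - i < 16 ? len - i : 16;

		hex_line(cdb + i, linelen, linebuf, bufsz);
		printf("[sda] CDB: %s\n", linebuf);
		lines++;
	}
	return lines;
}

/* Print after the loop: linebuf was overwritten on each pass, so only
 * the last (possibly partial) chunk is ever printed. */
int dump_after_loop(const unsigned char *cdb, size_t len,
		    char *linebuf, size_t bufsz)
{
	assert(bufsz > 0);
	linebuf[0] = '\0';
	for (size_t i = 0; i < len; i += 16) {
		size_t linelen = len - i < 16 ? len - i : 16;

		hex_line(cdb + i, linelen, linebuf, bufsz);
	}
	printf("[sda] CDB: %s\n", linebuf);
	return 1;
}
```

With a 20-byte buffer, dump_per_chunk() emits two lines while dump_after_loop() emits only the final four bytes, which is the regression Ewan describes.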
Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area
On 2014-08-18 12:24 PM, Theodore Ts'o wrote:
> On Mon, Aug 18, 2014 at 10:15:32AM +0900, Gioh Kim wrote:
>>
>> My test platform has totally 1GB memory, 256MB for CMA and 768MB for normal.
>> I applied Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64, so that
>> 3/4 of allocation take place in normal area and 1/4 allocation take place in
>> CMA area.
>>
>> And my platform has 4 ext4 partitions. Each ext4 partition has 2 page caches
>> for superblock that are what this patch tries to move to out of CMA area.
>> Therefore there are 8 page caches (8 pages size) that can prevent page
>> migration.
>
> Yes, but are you actually *using* the ext4 partitions for anything?
> If this is a realistic real world use case, file systems are used to
> store, well, files, and that means there will be inodes and dentry
> cache entries that will also be allocated. Does your test scenario
> reflect real world usage?

Yes. I work for LG Electronics, and my test platform is a product that is
currently on sale. I also tested my patch while the platform was running
the way a real user would use it.

I think the page caches of the inodes and dentries are held only for a
short time: I can see pairs of get_bh and put_bh in the inode/dentry
handling.
I think inodes are allocated by kmem_cache_alloc in ext4_alloc_inode(),
which is an allocation from the non-movable area.

> Cheers,
>
> - Ted
Re: [PATCH v2 2/3] time,signal: protect resource use statistics with seqlock
On Sat, 2014-08-16 at 19:50 +0200, Oleg Nesterov wrote:
> On 08/16, Rik van Riel wrote:
> >
> > +	do {
> > +		seq = nextseq;
> > +		read_seqbegin_or_lock(&sig->stats_lock, &seq);
> > +		times->utime = sig->utime;
> > +		times->stime = sig->stime;
> > +		times->sum_exec_runtime = sig->sum_sched_runtime;
> > +
> > +		for_each_thread(tsk, t) {
> > +			task_cputime(t, &utime, &stime);
> > +			times->utime += utime;
> > +			times->stime += stime;
> > +			times->sum_exec_runtime += task_sched_runtime(t);
> > +		}
> > +		/* If lockless access failed, take the lock. */
> > +		nextseq = 1;
>
> Yes, thanks, this answers my concerns.
>
> Cough... can't resist, and I still think that we should take rcu_read_lock()
> only around for_each_thread() and the patch expands the critical section for
> no reason. But this is minor, I won't insist.

Hm. Should traversal not also disable preemption to preserve the error
bound Peter mentioned?

-Mike
Re: [RFC PATCH 6/9] gpiolib: add API to get gpio desc and flags
On Sunday, August 17, 2014 12:43:38 PM Darren Hart wrote:
> On 8/17/14, 6:00, "Grant Likely" wrote:
>
> >>
> >>+	/* Using device tree? */
> >>+	if (IS_ENABLED(CONFIG_OF) && dev->of_node)
> >>+		desc = of_find_gpio(dev, NULL, idx, flags);
> >
> >of_find_gpio() doesn't exist.
>
> Hrm... As of 3.16.0 (e64df3ebe8262c8203d1fe4f541e0241c3112c01)
>
> $ git blame -L1455,1456 drivers/gpio/gpiolib.c
> bae48da2 (Alexandre Courbot 2013-10-17 10:21:38 -0700 1455) static struct
> gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
>
> Have we removed this in -next or something? (on the plane, will verify
> upon landing)

In 3.17-rc1:

rafael@vostro:~/src/linux-pm> grep -r of_find_gpio *
drivers/gpio/gpiolib.c:static struct gpio_desc *of_find_gpio(struct device *dev, const char *con_id,
drivers/gpio/gpiolib.c:		desc = of_find_gpio(dev, con_id, idx, &lookupflags);

Rafael
Re: [RFC PATCH 3/9] Driver core: Unified device properties interface for platform firmware
On Sunday, August 17, 2014 12:31:27 PM Darren Hart wrote:
> On 8/17/14, 5:49, "Grant Likely" wrote:
>
> >
> >Hi Mika and Rafael,
> >
> >Comments below...

[cut]

> ...
>
> >> @@ -701,6 +702,7 @@ struct acpi_dev_node {
> >>   * @archdata:	For arch-specific additions.
> >>   * @of_node:	Associated device tree node.
> >>   * @acpi_node:	Associated ACPI device node.
> >> + * @property_ops: Firmware interface for device properties
> >>   * @devt:	For creating the sysfs "dev".
> >>   * @id:		device instance
> >>   * @devres_lock: Spinlock to protect the resource of the device.
> >> @@ -777,6 +779,7 @@ struct device {
> >>
> >>  	struct device_node	*of_node; /* associated device tree node */
> >>  	struct acpi_dev_node	acpi_node; /* associated ACPI device node */
> >> +	struct dev_prop_ops	*property_ops;
> >
> >There are only 2 users of this interface. I don't think adding an ops
> >pointer to each and every struct device is warranted when the wrapper
> >function can check if of_node or acpi_node is set and call the
> >appropriate helper. It is unlikely anything else will use this hook. It
> >will result in smaller memory footprint. Also smaller code when only one
> >of CONFIG_OF and CONFIG_ACPI are selected, which is almost always. :-)
> >
> >It can be refactored later if that ever changes.
>
>
> Our intent was to eliminate the #ifdefery in every one of the accessors.
> It was my understanding the ops structures were preferable in such
> situations. For a 64-bit machine with 1000 devices (all of which use
> device properties) with one or the other of ACPI/OF enabled, the
> additional memory requirement here is what... Something like (8*1000 + 4)
> ~= 8KB ? That seems worth the arguably more maintainable code to me. Is
> there more to it than this, am I missing some more significant impact?
Also, we wanted to avoid going through the same sequence of checks every
time a property is accessed for a given device, because the outcome of
those checks is already known when the device is registered.

Arguably, if we decide that using DTs and ACPI on the same system at the
same time is a total no-go, then we'll need just one global ops pointer
either to the ACPI or to the DT set of callbacks, but I'm not sure whether
or not that is the way to go.

Rafael
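The two designs being weighed in this exchange can be sketched in plain C. This is a hedged illustration, not the proposed kernel interface: the struct layouts below are simplified mocks (in the kernel, acpi_node is an embedded struct, not a pointer), and property_backend() and ops_pointer_overhead() are hypothetical helper names. The first function shows the per-access check Grant suggests instead of a stored ops pointer; the second reproduces Darren's back-of-the-envelope memory estimate.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified mocks; the real kernel structures are far larger. */
struct device_node { int unused; };
struct acpi_node { int unused; };

struct device {
	struct device_node *of_node;	/* set when described by a device tree */
	struct acpi_node *acpi_node;	/* set when described by ACPI (mock) */
};

enum backend { BACKEND_NONE, BACKEND_OF, BACKEND_ACPI };

/* Grant's alternative: decide on every access, store nothing per device. */
enum backend property_backend(const struct device *dev)
{
	assert(dev != NULL);
	if (dev->of_node)
		return BACKEND_OF;
	if (dev->acpi_node)
		return BACKEND_ACPI;
	return BACKEND_NONE;
}

/* Darren's estimate: one extra pointer in every struct device. */
size_t ops_pointer_overhead(size_t ndevices)
{
	return ndevices * sizeof(void *);
}
```

On a 64-bit machine ops_pointer_overhead(1000) comes to 8000 bytes, which matches the "~= 8KB" figure quoted above; the trade-off is that memory against repeating the two pointer checks on each property access.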
Re: [RFC PATCH 3/9] Driver core: Unified device properties interface for platform firmware
On Sunday, August 17, 2014 01:49:13 PM Grant Likely wrote:
>
> Hi Mika and Rafael,
>
> Comments below...

[cut]

> > +enum dev_prop_type {
> > +	DEV_PROP_U8,
> > +	DEV_PROP_U16,
> > +	DEV_PROP_U32,
> > +	DEV_PROP_U64,
> > +	DEV_PROP_STRING,
> > +	DEV_PROP_MAX,
> > +};
> > +
> > +struct dev_prop_ops {
> > +	int (*get)(struct device *dev, const char *propname, void **valptr);
> > +	int (*read)(struct device *dev, const char *propname,
> > +		    enum dev_prop_type proptype, void *val);
> > +	int (*read_array)(struct device *dev, const char *propname,
> > +			  enum dev_prop_type proptype, void *val, size_t nval);
>
> The associated DT functions that implement property reads
> (of_property_read_*) were created in part to provide some type safety
> when reading properties. This proposed API throws that away by accepting
> a void* for the data field, which I don't want to do. This API either
> needs to have a separate accessor for each data type, or it needs some
> other mechanism (accessor macros?) to ensure the right type is passed
> in.

The intention is to add static inline functions like:

int device_property_read_u64(struct device *dev, const char *propname, u64 *val)
{
	return device_property_read(dev, propname, DEV_PROP_U64, val);
}

and so on for the other property types. They just have not been implemented
in this version of the patch.

> >
> > +	int (*child_count)(struct device *dev);
> > +};
> > +
> > +#ifdef CONFIG_ACPI
> > +extern struct dev_prop_ops acpi_property_ops;
> > +#endif
>
> Rendered moot by my comment about eliminating the ops structure, but the
> above shouldn't appear here. acpi_property_ops shouldn't ever be visible
> outside ACPI core code, so it shouldn't be in this header.

It doesn't look like this has to be present here. At least this particular
patch should compile just fine after removing the 3 lines above.

That seems to be a leftover from one of the previous versions of it.
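A small userspace sketch may help show why the typed wrappers address the type-safety concern. Everything here is a mock under stated assumptions: struct device holds a single made-up property, device_property_read() is a toy implementation of the void * reader rather than the one in the patch, and the property name used in the usage note is an arbitrary example. Only the shape of device_property_read_u64() follows the mail above; the u32 variant is an assumed sibling.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

enum dev_prop_type { DEV_PROP_U8, DEV_PROP_U16, DEV_PROP_U32, DEV_PROP_U64 };

/* Mock device carrying exactly one integer property. */
struct device {
	const char *propname;
	uint64_t value;
};

/* Untyped reader: the caller must keep proptype and *val in agreement. */
int device_property_read(struct device *dev, const char *propname,
			 enum dev_prop_type proptype, void *val)
{
	assert(dev != NULL && propname != NULL && val != NULL);
	if (strcmp(dev->propname, propname) != 0)
		return -ENOENT;

	switch (proptype) {
	case DEV_PROP_U8:
		*(uint8_t *)val = (uint8_t)dev->value;
		return 0;
	case DEV_PROP_U16:
		*(uint16_t *)val = (uint16_t)dev->value;
		return 0;
	case DEV_PROP_U32:
		*(uint32_t *)val = (uint32_t)dev->value;
		return 0;
	case DEV_PROP_U64:
		*(uint64_t *)val = dev->value;
		return 0;
	}
	return -EINVAL;
}

/* Typed wrappers: the compiler now checks the destination pointer type. */
static inline int device_property_read_u32(struct device *dev,
					   const char *propname, uint32_t *val)
{
	return device_property_read(dev, propname, DEV_PROP_U32, val);
}

static inline int device_property_read_u64(struct device *dev,
					   const char *propname, uint64_t *val)
{
	return device_property_read(dev, propname, DEV_PROP_U64, val);
}
```

With the wrappers, passing a uint32_t * to device_property_read_u64() draws a compiler diagnostic, whereas the raw void * call would silently write eight bytes into a four-byte object.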
Rafael
[PATCH] flush_icache_range: Export symbol to fix build errors
Fix build errors occurring due to a missing export of flush_icache_range()
on architectures that lack the export.

Signed-off-by: Pranith Kumar
Reported-by: Geert Uytterhoeven
CC: Andrew Morton
---
 arch/arc/mm/cache_arc700.c             |    1 +
 arch/blackfin/include/asm/cacheflush.h |    1 +
 arch/frv/include/asm/cacheflush.h      |    1 +
 arch/hexagon/mm/cache.c                |    1 +
 arch/metag/include/asm/cacheflush.h    |    1 +
 arch/sh/mm/cache.c                     |    1 +
 arch/tile/kernel/smp.c                 |    1 +
 arch/xtensa/kernel/smp.c               |    1 +
 8 files changed, 8 insertions(+)

diff --git a/arch/arc/mm/cache_arc700.c b/arch/arc/mm/cache_arc700.c
index 4670afc..e88ddbf 100644
--- a/arch/arc/mm/cache_arc700.c
+++ b/arch/arc/mm/cache_arc700.c
@@ -581,6 +581,7 @@ void flush_icache_range(unsigned long kstart, unsigned long kend)
 		tot_sz -= sz;
 	}
 }
+EXPORT_SYMBOL(flush_icache_range);

 /*
  * General purpose helper to make I and D cache lines consistent.
diff --git a/arch/blackfin/include/asm/cacheflush.h b/arch/blackfin/include/asm/cacheflush.h
index 9a5b2c5..0e2eb8c 100644
--- a/arch/blackfin/include/asm/cacheflush.h
+++ b/arch/blackfin/include/asm/cacheflush.h
@@ -70,6 +70,7 @@ static inline void flush_icache_range(unsigned start, unsigned end)
 	}
 #endif
 }
+EXPORT_SYMBOL(flush_icache_range);

 #define copy_to_user_page(vma, page, vaddr, dst, src, len) \
 do { memcpy(dst, src, len); \
diff --git a/arch/frv/include/asm/cacheflush.h b/arch/frv/include/asm/cacheflush.h
index edbac54..07ee4b3 100644
--- a/arch/frv/include/asm/cacheflush.h
+++ b/arch/frv/include/asm/cacheflush.h
@@ -72,6 +72,7 @@ static inline void flush_icache_range(unsigned long start, unsigned long end)
 {
 	frv_cache_wback_inv(start, end);
 }
+EXPORT_SYMBOL(flush_icache_range);

 #ifdef CONFIG_MMU
 extern void flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
diff --git a/arch/hexagon/mm/cache.c b/arch/hexagon/mm/cache.c
index fe14ccf..0c76c80 100644
--- a/arch/hexagon/mm/cache.c
+++ b/arch/hexagon/mm/cache.c
@@ -68,6 +68,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 	);
 	local_irq_restore(flags);
 }
+EXPORT_SYMBOL(flush_icache_range);

 void hexagon_clean_dcache_range(unsigned long start, unsigned long end)
 {
diff --git a/arch/metag/include/asm/cacheflush.h b/arch/metag/include/asm/cacheflush.h
index 7787ec5..117c212 100644
--- a/arch/metag/include/asm/cacheflush.h
+++ b/arch/metag/include/asm/cacheflush.h
@@ -124,6 +124,7 @@ static inline void flush_icache_range(unsigned long address,
 	metag_code_cache_flush((void *) address, endaddr - address);
 #endif
 }
+EXPORT_SYMBOL(flush_icache_range);

 static inline void flush_cache_sigtramp(unsigned long addr, int size)
 {
diff --git a/arch/sh/mm/cache.c b/arch/sh/mm/cache.c
index 097c2cd..f770e39 100644
--- a/arch/sh/mm/cache.c
+++ b/arch/sh/mm/cache.c
@@ -229,6 +229,7 @@ void flush_icache_range(unsigned long start, unsigned long end)

 	cacheop_on_each_cpu(local_flush_icache_range, (void *)&data, 1);
 }
+EXPORT_SYMBOL(flush_icache_range);

 void flush_icache_page(struct vm_area_struct *vma, struct page *page)
 {
diff --git a/arch/tile/kernel/smp.c b/arch/tile/kernel/smp.c
index 01e8ab2..19eaa62 100644
--- a/arch/tile/kernel/smp.c
+++ b/arch/tile/kernel/smp.c
@@ -183,6 +183,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 		preempt_enable();
 	}
 }
+EXPORT_SYMBOL(flush_icache_range);

 /* Called when smp_send_reschedule() triggers IRQ_RESCHEDULE. */
diff --git a/arch/xtensa/kernel/smp.c b/arch/xtensa/kernel/smp.c
index 40b5a37..4d02e38 100644
--- a/arch/xtensa/kernel/smp.c
+++ b/arch/xtensa/kernel/smp.c
@@ -571,6 +571,7 @@ void flush_icache_range(unsigned long start, unsigned long end)
 	};
 	on_each_cpu(ipi_flush_icache_range, &fd, 1);
 }
+EXPORT_SYMBOL(flush_icache_range);

 /* - */
--
1.7.9.5
RE: [PATCH] Input: hyperv-keyboard - implement Type Clipboard Text
> -----Original Message-----
> From: Dmitry Torokhov
> Sent: Saturday, August 16, 2014 0:58 AM
> To: Dexuan Cui
> > For each char in the string, the host sends 2 events (key down/up with the
> > char's UNICODE value) to the guest.
> > The patch finds each char's scan codes of key down/up, and injects the
> > scan codes to the serio keyboard module.
> >
> > Known issues:
> > 1) Only printable ASCII chars are supported, and unsupported chars are
> > ignored. It seems unlikely to support generic UNICODE chars because there
> > is not a generic API to inject a UNICODE char to text mode console, KDE,
> > gnome, etc.
> >
> > 2) When we use the feature, make sure the CapsLock state of the VM's
> > (virtual) keyboard is OFF because this patch assumes it -- we'll try to
> > fix this later, probably by tracking the state of virtual CapsLock, because
> > it looks the keyboard module doesn't supply an API for us to query the
> > state of the keyboard.
>
>
> No way. If you want to do this this way, do it in hypervisor code and keep
> feeding AT scan codes to hyperv-keyboard, although I am pretty sure users

Hi Dmitry,
Yeah, I had the same wish, but later I found this seems unlikely because IMO
the feature was firstly invented for Windows VM + generic UNICODE chars, and
we know there is no "scan code" for generic UNICODE chars... :-(

> of French, Czech and other keyboard layouts with numbers in upper register
> and symbols in lower will have a few choice words for you.

Sorry, I can't understand what these are. Can you please give more details
or a link to further info?

> If you want real cut-and-paste support in various DEs I'd recommend
> working with VMware on open-vm-tools package to see what can be
> shared/reused there.
> Consider this NACked with prejudice.
> Dmitry

Thanks for the suggestion! Let me study open-vm-tools and report back.
Thanks,
-- Dexuan
Re: linux-next: Tree for Aug 18
Hi all,

On Mon, 18 Aug 2014 13:32:59 +1000 Stephen Rothwell wrote:
>
> Please do not add code intended for v3.18 until after v3.17-rc1 is
> released.

Which it is, of course ...

--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au
[PATCH] staging: comedi: s626: remove unnecessary variable initialization
We initialize 'irqbit' to 0, only to properly set it immediately
afterwards. Just remove the zero-initialization.

Signed-off-by: Chase Southwood
Cc: Ian Abbott
Cc: H Hartley Sweeten
---
 drivers/staging/comedi/drivers/s626.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/s626.c b/drivers/staging/comedi/drivers/s626.c
index 080608a..e42720c 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -1399,7 +1399,6 @@ static void s626_check_dio_interrupts(struct comedi_device *dev)
 	uint8_t group;

 	for (group = 0; group < S626_DIO_BANKS; group++) {
-		irqbit = 0;
 		/* read interrupt type */
 		irqbit = s626_debi_read(dev, S626_LP_RDCAPFLG(group));

--
2.0.4
[PATCH] staging: comedi: dt2801: change function return type to void
cppcheck was complaining that the variable 'stat' is being reassigned
before the old value is used. Upon inspection, I found that
dt2801_writecmd() cannot fail, always returns 0, and most callers already
do not bother with assigning its return value anyway, so it makes sense to
just change the return type for this function from int to void, and remove
the two assignments to 'stat'.

Signed-off-by: Chase Southwood
Cc: Ian Abbott
Cc: H Hartley Sweeten
---
 drivers/staging/comedi/drivers/dt2801.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/comedi/drivers/dt2801.c b/drivers/staging/comedi/drivers/dt2801.c
index ad8ba0b..c16d468 100644
--- a/drivers/staging/comedi/drivers/dt2801.c
+++ b/drivers/staging/comedi/drivers/dt2801.c
@@ -309,7 +309,7 @@ static int dt2801_wait_for_ready(struct comedi_device *dev)
 	return -ETIME;
 }

-static int dt2801_writecmd(struct comedi_device *dev, int command)
+static void dt2801_writecmd(struct comedi_device *dev, int command)
 {
 	int stat;

@@ -323,8 +323,6 @@ static int dt2801_writecmd(struct comedi_device *dev, int command)
 	if (!(stat & DT_S_READY))
 		dev_dbg(dev->class_dev, "!ready in %s, ignoring\n", __func__);
 	outb_p(command, dev->iobase + DT2801_CMD);
-
-	return 0;
 }

 static int dt2801_reset(struct comedi_device *dev)
@@ -380,7 +378,7 @@ static int probe_number_of_ai_chans(struct comedi_device *dev)
 	int data;

 	for (n_chans = 0; n_chans < 16; n_chans++) {
-		stat = dt2801_writecmd(dev, DT_C_READ_ADIM);
+		dt2801_writecmd(dev, DT_C_READ_ADIM);
 		dt2801_writedata(dev, 0);
 		dt2801_writedata(dev, n_chans);
 		stat = dt2801_readdata2(dev, &data);
@@ -451,7 +449,7 @@ static int dt2801_ai_insn_read(struct comedi_device *dev,
 	int i;

 	for (i = 0; i < insn->n; i++) {
-		stat = dt2801_writecmd(dev, DT_C_READ_ADIM);
+		dt2801_writecmd(dev, DT_C_READ_ADIM);
 		dt2801_writedata(dev, CR_RANGE(insn->chanspec));
 		dt2801_writedata(dev, CR_CHAN(insn->chanspec));
 		stat = dt2801_readdata2(dev, &d);
--
2.0.4
linux-next: Tree for Aug 18
Hi all,

Please do not add code intended for v3.18 until after v3.17-rc1 is
released.

Changes since 20140815:

The sound-asoc tree gained a build failure for which I reverted a commit.

The regulator tree gained a build failure so I used the version from
next-20140815.

The staging tree gained a build failure for which I applied a fix patch.

Non-merge commits (relative to Linus' tree): 1010
 955 files changed, 25072 insertions(+), 21100 deletions(-)

I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one. You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source. There are also quilt-import.log and merge.log files
in the Next directory. Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (this fails its final link) and i386, sparc, sparc64 and arm
defconfig.

Below is a summary of the state of the merge.

I am currently merging 220 trees (counting Linus' and 30 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next . If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds. And to Paul
Gortmaker for triage and bug fixes.
--
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (7d1311b93e58 Linux 3.17-rc1)
Merging fixes/master (23cf8d3ca0fd powerpc: Fix "attempt to move .org backwards" error)
Merging kbuild-current/rc-fixes (dd5a6752ae7d firmware: Create directories for external firmware)
Merging arc-current/for-curr (89ca3b881987 Linux 3.15-rc4)
Merging arm-current/fixes (e57e41931134 ARM: wire up memfd_create syscall)
Merging m68k-current/for-linus (9117710a5997 m68k/sun3: Remove define statement no longer needed)
Merging metag-fixes/fixes (ffe6902b66aa asm-generic: remove _STK_LIM_MAX)
Merging mips-fixes/mips-fixes (08a9c3c9afcf MIPS: OCTEON: make get_system_type() thread-safe)
Merging powerpc-merge/merge (396a34340cdf powerpc: Fix endianness of flash_block_list in rtas_flash)
Merging sparc/master (c9d26423e56c Merge tag 'pm+acpi-3.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging net/master (21009686662f net: phy: smsc: move smsc_phy_config_init reset part in a soft_reset function)
Merging ipsec/master (a0e5ef53aac8 xfrm: Fix installation of AH IPsec SAs)
Merging sound-current/for-linus (f3ee07d8b6e0 ALSA: hda/realtek - Avoid setting wrong COEF on ALC269 & co)
Merging pci-current/for-linus (9baa3c34ac4e PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use)
Merging wireless/master (77b2f2865956 iwlwifi: mvm: disable scheduled scan to prevent firmware crash)
Merging driver-core.current/driver-core-linus (7d1311b93e58 Linux 3.17-rc1)
Merging tty.current/tty-linus (7d1311b93e58 Linux 3.17-rc1)
Merging usb.current/usb-linus (7d1311b93e58 Linux 3.17-rc1)
Merging usb-gadget-fixes/fixes (a8a85b01d185 usb: musb/cppi41: call musb_ep_select() before accessing an endpoint's CSR)
CONFLICT (content): Merge conflict in drivers/usb/musb/musb_host.c
Merging usb-serial-fixes/usb-linus (7d1311b93e58 Linux 3.17-rc1)
Merging staging.current/staging-linus (eb29835fb3ae staging: android: fix a possible memory leak)
Merging char-misc.current/char-misc-linus (7d1311b93e58 Linux 3.17-rc1)
Merging input-current/for-linus (91167e191467 Merge branch 'next' into for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" stripe)
Merging crypto-current/master (ce5481d01f67 crypto: drbg - fix failure of generating multiple of 2**16 bytes)
Merging ide/master (a53dae49b2fe ide: use module_platform_driver())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging devicetree-current/devicetree/merge (5a12a597a862 arm: Add devicetree fixup machine function)
Merging rr-fixes/fixes (79465d2fd48e module: remove warning about waiting module removal.)
Merging vfio-fixes/for-linus (239a87020b26 Merge branch 'for-joerg/arm-smmu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into for-linus)
Merging drm-intel-fixes/for-linux-next-fixes (103ae732ad26 drm/i915: Don't try to enable cursor from setplane when crtc is
Re: [PATCH v2] memory-hotplug: add sysfs zones_online_to attribute
On 2014/8/16 5:37, Toshi Kani wrote:
> On Wed, 2014-08-13 at 12:10 +0800, Zhang Zhen wrote:
>> Currently memory-hotplug has two limits:
>> 1. If the memory block is in ZONE_NORMAL, you can change it to
>> ZONE_MOVABLE, but this memory block must be adjacent to ZONE_MOVABLE.
>> 2. If the memory block is in ZONE_MOVABLE, you can change it to
>> ZONE_NORMAL, but this memory block must be adjacent to ZONE_NORMAL.
>>
>> With this patch, we can easy to know a memory block can be onlined to
>> which zone, and don't need to know the above two limits.
>>
>> Updated the related Documentation.
>>
>> Change v1 -> v2:
>> - optimize the implementation following Dave Hansen's suggestion
>>
>> Signed-off-by: Zhang Zhen
>> ---
>>  Documentation/ABI/testing/sysfs-devices-memory |  8
>>  Documentation/memory-hotplug.txt               |  4 +-
>>  drivers/base/memory.c                          | 62 ++
>>  include/linux/memory_hotplug.h                 |  1 +
>>  mm/memory_hotplug.c                            |  2 +-
>>  5 files changed, 75 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-devices-memory
>> b/Documentation/ABI/testing/sysfs-devices-memory
>> index 7405de2..2b2a1d7 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-memory
>> +++ b/Documentation/ABI/testing/sysfs-devices-memory
>> @@ -61,6 +61,14 @@ Users:	hotplug memory remove tools
>>
>> 		http://www.ibm.com/developerworks/wikis/display/LinuxP/powerpc-utils
>>
>>
>> +What:		/sys/devices/system/memory/memoryX/zones_online_to
>
> I think this name is a bit confusing. How about "valid_online_types"?
>

Thanks for your suggestion.

This patch has been added to the -mm tree. If most people think so, I would
like to modify the interface name. If not, let's leave it as it is.

Best regards!

> Thanks,
> -Toshi
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org
Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area
On Mon, Aug 18, 2014 at 10:15:32AM +0900, Gioh Kim wrote:
>
> My test platform has totally 1GB memory, 256MB for CMA and 768MB for normal.
> I applied Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64, so that
> 3/4 of allocation take place in normal area and 1/4 allocation take place in
> CMA area.
>
> And my platform has 4 ext4 partitions. Each ext4 partition has 2 page caches
> for superblock that are what this patch tries to move to out of CMA area.
> Therefore there are 8 page caches (8 pages size) that can prevent page
> migration.

Yes, but are you actually *using* the ext4 partitions for anything?
If this is a realistic real world use case, file systems are used to
store, well, files, and that means there will be inodes and dentry
cache entries that will also be allocated. Does your test scenario
reflect real world usage?

Cheers,

- Ted
Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
On 2014/8/18 9:13, tangchen wrote:
> Hi tj,
>
> On 08/17/2014 07:08 PM, Tejun Heo wrote:
>> Hello,
>>
>> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
>> Yeah, that one.
>>
>>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the
>>> memory which marked hotpluggable flag will not free to buddy allocator.
>>> Because __next_mem_range() will skip them.
>>>
>>> free_low_memory_core_early
>>> 	for_each_free_mem_range
>>> 		for_each_mem_range
>>> 			__next_mem_range
>> Ah, okay, so the patch fixes __next_mem_range() and thus makes
>> free_low_memory_core_early() to skip hotpluggable regions unlike
>> before. Please explain things like that in the changelog. Also,
>> what's its relationship with numa_clear_kernel_node_hotplug()? Do we
>> still need them? If so, what are the different roles that these two
>> separate places serve?
>
> numa_clear_kernel_node_hotplug() only clears hotplug flags for the nodes
> the kernel resides in, not for hotpluggable nodes. The reason why we did
> this is to enable the kernel to allocate memory in case all the nodes are
> hotpluggable.
>

Hi TangChen,

I found a problem in numa_init() (arch/x86/mm/numa.c):

numa_init()
	...
	ret = init_func();	// this will mark hotpluggable flag from SRAT
	...
	memblock_set_bottom_up(false);
	...
	ret = numa_register_memblks(&numa_meminfo);	// this will alloc node data(pglist_data)
	...
	numa_clear_kernel_node_hotplug();	// in case all the nodes are hotpluggable
	...

If all the nodes are marked hotpluggable, allocating the node data will
fail, because __next_mem_range_rev() will skip the hotpluggable memory
regions:

numa_register_memblks()
	setup_node_data()
		memblock_find_in_range_node()
			__memblock_find_range_top_down()
				for_each_mem_range_rev()
					__next_mem_range_rev()

What do you think? How about moving numa_clear_kernel_node_hotplug() into
numa_register_memblks(), like this:

numa_register_memblks()
	...
		memblock_set_node(mb->start, mb->end - mb->start,
				  &memblock.reserved, mb->nid);
	}

+	numa_clear_kernel_node_hotplug();

	/*
	 * If sections array is gonna be used for pfn -> nid mapping, check
	...

Thanks,
Xishi Qiu

> And we clear hotplug flags for all the nodes in free_low_memory_core_early()
> is because if we do not, all hotpluggable memory won't be able to be freed
> to buddy after Qiu's patch.
>
> Thanks.
>
> .
>
Re: [PATCH] earlyprintk: re-enable earlyprintk calling early_param
On 2014-08-16 03:34, Rusty Russell wrote:
> kpark3...@gmail.com writes:
>> From: Sahara
>>
>> Although there are many obs_kernel_param and its names are earlyprintk
>> and also EARLY_PRINTK is also enabled, we could not see the early_printk
>> output properly until now. This patch considers earlycon as well as
>> earlyprintk.
>
> Hmm, the initial "earlycon" hack slipped in when I wasn't looking.
>
> I don't think we should extend it. Why not make the thing(s) you want
> early_param()s?
>
> Cheers,
> Rusty.

earlycon and earlyprintk are scattered across, and used by, many
architectures. It looks as if earlycon could simply be a subset of
earlyprintk: earlycon is UART-specific, while earlyprintk supports vga,
efi, xen, serial, and so on. ARM in particular uses earlyprintk in many
places. And I am not sure whether this is a good opportunity to replace
every earlyprintk with earlycon. As of now, the patch is fair to both
earlycon and earlyprintk.

Or would it perhaps be better to remove case #2 (see my previous email to
Andrew Morton), so that users are forced to specify earlycon and
earlyprintk on the command line if they want to see early_printk() output?

Thanks.

Best Regards,
Sahara.

>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -426,7 +426,8 @@ static int __init do_early_param(char *param, char *val, const char *unused)
>>  	for (p = __setup_start; p < __setup_end; p++) {
>>  		if ((p->early && parameq(param, p->str)) ||
>>  		    (strcmp(param, "console") == 0 &&
>> -		     strcmp(p->str, "earlycon") == 0)
>> +		     ((strcmp(p->str, "earlycon") == 0) ||
>> +		      (strcmp(p->str, "earlyprintk") == 0)))
>>  		) {
>>  			if (p->setup_func(val) != 0)
>>  				pr_warn("Malformed early option '%s'\n", param);
>> --
>> 1.7.9.5
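The condition the patch ends up with is easier to read pulled out into a predicate. The following is a userspace sketch only: struct obs_param is a cut-down, hypothetical version of obs_kernel_param, early_param_matches() is not a kernel function, and plain strcmp() stands in for parameq(), which in the kernel also treats '-' and '_' as equal.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Cut-down, hypothetical version of struct obs_kernel_param. */
struct obs_param {
	const char *str;
	bool early;
};

/* Mirrors the patched test in do_early_param(): a handler runs early if
 * it is flagged early and its name matches, or if the command-line
 * parameter is "console" and the handler is earlycon or earlyprintk. */
bool early_param_matches(const struct obs_param *p, const char *param)
{
	assert(p != NULL && p->str != NULL && param != NULL);

	if (p->early && strcmp(param, p->str) == 0)
		return true;

	return strcmp(param, "console") == 0 &&
	       (strcmp(p->str, "earlycon") == 0 ||
		strcmp(p->str, "earlyprintk") == 0);
}
```

Written this way, the special case Rusty objects to stands out: "console=" triggers handlers whose names are hard-coded in the core, instead of those handlers registering themselves with early_param().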
Re: [PATCH net-next] vhost_net: stop rx net polling when possible
On 08/17/2014 06:20 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 15, 2014 at 11:40:08AM +0800, Jason Wang wrote:
>> After rx vq was enabled, we never stop polling its socket. This is sub optimal
>> when may lead unnecessary wake-ups after the rx net work has already been
>> queued. This could be optimized by stopping polling the rx net sock when
>> processing both rx and tx and restart it afterward. This could save unnecessary
>> wake-ups and even unnecessary spin locks acquiring with the help of commit
>> 9e641bdcfa4ef4d6e2fbaa59c1be0ad5d1551fd5 "net-tun: restructure tun_do_read for
>> better sleep/wakeup efficiency".
> OK so the point is to avoid expensive wake_up_process calls?
> It's a bit unfortunate that we are adding/removing things from wait
> queue which certainly does take extra spin-locks.

When nothing new was queued while the vhost thread was running, this change may add two more spin-lock acquisitions, which may not be optimal. But if several packets were queued by tun while the vhost thread was running, it may save lots of unnecessary wake-ups. So the patch certainly helps performance in the heavy-load case. In the light-load case it may cost some throughput, but CPU usage is still reduced and throughput per CPU still improves.
>
>> Test shows significant CPU% savings during almost all the cases:
>>
>> Guest rx stream:
>> size(B)/sessions  throughput  cpu        normalized thru
>> 64/1              +0.7773%    -8.6224%   +10.2866%
>> 64/2              +0.6335%    -13.9109%  +16.8946%
>> 64/4              -0.8182%    -14.8336%  +16.4565%
>> 64/8              +0.4830%    -13.7675%  +16.5256%
>> 256/1             -7.0963%    -12.6880%  +6.4043%
>> 256/2             -1.3982%    -11.5424%  +11.4678%
>> 256/4             -0.0350%    -11.8323%  +13.3806%
>> 256/8             -1.5830%    -12.7693%  +12.8238%
>> 1024/1            -7.4895%    -19.1449%  +14.4152%
>> 1024/2            -7.4575%    -19.4018%  +14.8195%
>> 1024/4            -0.3881%    -9.1183%   +9.6061%
>> 1024/8            +0.4713%    -11.0155%  +12.9087%
>> 4096/1            +0.8786%    -8.4050%   +10.1355%
>> 4096/2            +0.0098%    -15.3094%  +18.0885%
>> 4096/4            +0.0445%    -10.8247%  +12.1886%
>> 4096/8            -2.1317%    -12.5111%  +11.8637%
>> 16384/1           -0.0008%    -6.1891%   +6.5966%
>> 16384/2           -0.0117%    -16.2716%  +19.4198%
>> 16384/4           +0.0001%    -5.9197%   +6.2923%
>> 16384/8           +0.0173%    -7.6681%   +8.3236%
>> 65535/1           +0.0011%    -10.3594%  +11.5578%
>> 65535/2           -0.4108%    -14.4304%  +16.3838%
>> 65535/4           +0.0011%    -10.3594%  +11.5578%
>> 65535/8           -0.4108%    -14.4304%  +16.3838%
>>
>> Guest tx stream:
>> size(B)/sessions  throughput  cpu        normalized thru
>> 64/1              -0.6228%    -2.1936%   +1.6060%
>> 64/2              +0.8646%    -3.5063%   +4.5297%
>> 64/4              +0.8733%    -3.2495%   +4.2613%
>> 64/8              +1.4290%    -3.5593%   +5.1724%
>> 256/1             +7.2098%    -3.1122%   +10.6535%
>> 256/2             -10.1408%   -6.8230%   -3.5607%
>> 256/4             -11.3531%   -6.7085%   -4.9785%
>> 256/8             -10.2723%   -6.5628%   -3.9701%
>> 1024/1            -18.9329%   -13.6162%  -6.1547%
>> 1024/2            -0.3728%    -1.3181%   +0.9580%
>> 1024/4            +0.0125%    -3.6338%   +3.7838%
>> 1024/8            -0.0030%    -2.7282%   +2.8017%
>> 4096/1            +16.9367%   -1.9435%   +19.2543%
>> 4096/2            +0.0121%    -6.1682%   +6.5866%
>> 4096/4            +0.0019%    -3.8510%   +4.0072%
>> 4096/8            -0.0222%    -4.1368%   +4.2922%
>> 16384/1           -0.0026%    -8.6892%   +9.5132%
>> 16384/2           -0.0012%    -10.1676%  +11.3171%
>> 16384/4           +0.0196%    -1.2551%   +1.2908%
>> 16384/8           +0.1303%    -3.2634%   +3.5082%
>> 65535/1           +0.0019%    -3.4694%   +3.5961%
>> 65535/2           -0.0003%    -0.7635%   +0.7690%
>> 65535/4           -0.0219%    -2.7875%   +2.8448%
>> 65535/8           +0.1137%    -2.7922%   +2.9894%
>>
>> TCP_RR:
>> size(B)/sessions  throughput  cpu        normalized thru
>> 256/1             +1.9004%    -4.7985%   +7.0366%
>> 256/25            -4.7366%    -11.0809%  +7.1349%
>> 256/50            +3.9808%    -5.2037%   +9.6887%
>> 4096/1            +2.1619%    -0.7303%   +2.9134%
>> 4096/25           -13.1836%   -14.7298%  +1.8134%
>> 4096/50           -11.1990%   -15.4763%  +5.0605%
>>
>> Signed-off-by: Jason Wang
>
> Could you split RX/TX parts out please, and benchmark separately?
>
> They are really independent.

Ok.
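The thread does not define the "normalized thru" column. The posted figures are consistent with reading it as the relative change in throughput per unit of CPU, i.e. (1 + throughput change) / (1 + cpu change) - 1. The sketch below works that out under this assumption; the function name normalized_change_pct() is made up for the example and does not come from the patch or from any benchmark tool.

```c
/*
 * Assumption: "normalized thru" = relative change of throughput/CPU.
 * Inputs and result are percentages, e.g. +0.7773 and -8.6224.
 */
double normalized_change_pct(double thru_pct, double cpu_pct)
{
    double thru_ratio = 1.0 + thru_pct / 100.0; /* new/old throughput */
    double cpu_ratio = 1.0 + cpu_pct / 100.0;   /* new/old CPU usage */

    return (thru_ratio / cpu_ratio - 1.0) * 100.0;
}
```

Feeding in the first rx row (throughput +0.7773%, cpu -8.6224%) gives roughly +10.29%, in line with the +10.2866% listed; the 64/2 rx row works out the same way.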
Re: power supply gating with ltc2978
On Sat, Aug 16, 2014 at 02:20:50PM +0100, Mark Brown wrote:
> On Fri, Aug 15, 2014 at 04:34:49PM -0500, atull wrote:
>
> > I am interested in adding functionality to be able to gate power supplies
> > going through a ltc2978. I see that there is a hwmon driver already
> > existing (hwmon/pmbus/ltc2978.c). I see some of the other hwmon drivers
> > have MFD's. It looks like this ltc driver would need a MFD and a
> > regulator driver added. However I don't see other pmbus hwmon drivers
> > using MFD.
>
> > So I am asking for recommendations and reservations on how to proceed here
> > before I get too far with this.
>
> Without knowing anything at all about pmbus or this particular hardware
> it's hard to comment but what you're saying here sounds sensible (though
> I do see that apparently splitting the drivers may not actually be
> sensible from Guenter's followup).

I had originally thought about converting the pmbus drivers to mfd with client drivers, but I concluded that it would add a lot of complexity with little gain.

It makes sense to separate a driver into mfd and a number of client drivers if a device has clear functional blocks for the different devices it supports. With PMBus, this is not the case. Separating a PMBus driver would be a purely artificial construct, and there would be overlapping functionality. Separating just a single driver out of the group of PMBus drivers, as seems to be suggested above, makes even less sense, as one simply cannot separate the core PMBus driver code from its front-end drivers.

On the other side, adding regulator support into the PMBus driver code would make a lot of sense. It should also be quite straightforward.

Or anyway that is my opinion. If someone wants to spend the time and separate the PMBus drivers into an MFD part and hwmon and regulator client drivers, I'll be happy to look at the resulting patch set.
Guenter
Re: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread
>From 21266249bbbaf9407c1e88cd5950e06ac88aeebf Mon Sep 17 00:00:00 2001 From: Fu Zhonghui Date: Mon, 18 Aug 2014 10:48:14 +0800 Subject: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on sdio thread 781e989cf59 ("mmc: sdhci: convert to new SDIO IRQ handling") and bf3b5ec66bd ("mmc: sdio_irq: rework sdio irq handling") disabled the use of our own custom threaded IRQ handler, but left in an unconditional wake_up_process() on that handler at resume-time. Link: https://bugzilla.kernel.org/show_bug.cgi?id=80151 In addition, the check for MMC_CAP_SDIO_IRQ capability is added before enable sdio IRQ. Signed-off-by: Jaehoon Chung Signed-off-by: Chris Ball Signed-off-by: Ulf Hansson Signed-off-by: Fu Zhonghui --- drivers/mmc/core/sdio.c | 12 ++-- drivers/mmc/core/sdio_irq.c |4 ++-- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c index e636d9e..3fc40a7 100644 --- a/drivers/mmc/core/sdio.c +++ b/drivers/mmc/core/sdio.c @@ -992,8 +992,16 @@ static int mmc_sdio_resume(struct mmc_host *host) } } - if (!err && host->sdio_irqs) - wake_up_process(host->sdio_irq_thread); + if (!err && host->sdio_irqs) { + if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) { + wake_up_process(host->sdio_irq_thread); + } else if (host->caps & MMC_CAP_SDIO_IRQ) { + mmc_host_clk_hold(host); + host->ops->enable_sdio_irq(host, 1); + mmc_host_clk_release(host); + } + } + mmc_release_host(host); host->pm_flags &= ~MMC_PM_KEEP_POWER; diff --git a/drivers/mmc/core/sdio_irq.c b/drivers/mmc/core/sdio_irq.c index 5cc13c8..696eca4 100644 --- a/drivers/mmc/core/sdio_irq.c +++ b/drivers/mmc/core/sdio_irq.c @@ -208,7 +208,7 @@ static int sdio_card_irq_get(struct mmc_card *card) host->sdio_irqs--; return err; } - } else { + } else if (host->caps & MMC_CAP_SDIO_IRQ) { mmc_host_clk_hold(host); host->ops->enable_sdio_irq(host, 1); mmc_host_clk_release(host); @@ -229,7 +229,7 @@ static int sdio_card_irq_put(struct mmc_card *card) if 
(!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) { atomic_set(&host->sdio_irq_thread_abort, 1); kthread_stop(host->sdio_irq_thread); - } else { + } else if (host->caps & MMC_CAP_SDIO_IRQ) { mmc_host_clk_hold(host); host->ops->enable_sdio_irq(host, 0); mmc_host_clk_release(host); -- 1.7.1 On 2014/8/12 18:23, Ulf Hansson wrote: > On 11 August 2014 07:49, Fu, Zhonghui wrote: >> From 6cee984e1d76ba0a3320430f8cf4318ab65fcf06 Mon Sep 17 00:00:00 2001 >> From: Fu Zhonghui >> Date: Tue, 5 Aug 2014 12:44:38 +0800 >> Subject: [PATCH] mmc: core: sdio: Fix unconditional wake_up_process() on >> sdio thread >> >> 781e989cf59 ("mmc: sdhci: convert to new SDIO IRQ handling") and >> bf3b5ec66bd ("mmc: sdio_irq: rework sdio irq handling") disabled >> the use of our own custom threaded IRQ handler, but left in an >> unconditional wake_up_process() on that handler at resume-time. >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=80151 >> >> In addition, the check for MMC_CAP_SDIO_IRQ capability is added >> before enable sdio IRQ. >> >> Signed-off-by: Jaehoon Chung >> Signed-off-by: Chris Ball >> Signed-off-by: Fu Zhonghui >> --- >> drivers/mmc/core/sdio.c | 14 -- >> drivers/mmc/core/sdio_irq.c |4 ++-- >> 2 files changed, 14 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c >> index e636d9e..e04a540 100644 >> --- a/drivers/mmc/core/sdio.c >> +++ b/drivers/mmc/core/sdio.c >> @@ -992,8 +992,18 @@ static int mmc_sdio_resume(struct mmc_host *host) >> } >> } >> >> - if (!err && host->sdio_irqs) >> - wake_up_process(host->sdio_irq_thread); >> + if (!err && host->sdio_irqs) { >> + if (!(host->caps2 & MMC_CAP2_SDIO_IRQ_NOTHREAD)) { >> + wake_up_process(host->sdio_irq_thread); >> + } else if (host->caps & MMC_CAP_SDIO_IRQ) { >> + mmc_release_host(host); > Why mmc_release_host() and the corresponding mmc_claim_host() below? > Those shouldn't be needed I think. You are right. These two functions shouldn't be invoked here. 
I made a new patch as above. Thanks, Zhonghui > > >> + mmc_host_clk_hold(host); >> + host->ops->enable_sdio_irq(host, 1); >> + mmc_host_clk_release(host); >> + mmc_claim_host(host); >> + } >>
Re: [PATCH] earlyprintk: re-enable earlyprintk calling early_param
On 2014-08-15 05:34, Andrew Morton wrote:
> On Thu, 14 Aug 2014 19:13:36 +0900 kpark3...@gmail.com wrote:
>> From: Sahara
>>
>> Although there are many obs_kernel_param and its names are earlyprintk
>> and also EARLY_PRINTK is also enabled, we could not see the early_printk
>> output properly until now. This patch considers earlycon as well as
>> earlyprintk.
>
> Sorry, I just don't understand this description. What does the patch
> actually do? What was the kernel behaviour without the patch and what
> is the kernel behaviour with the patch?

Without this patch:

- earlycon case: early_param("earlycon", ...) is defined and
  case #1: the cmdline has "earlycon". This satisfies the condition "(p->early && parameq(param, p->str))", so you can see early_printk() output.
  case #2: the cmdline has "console". This satisfies the condition "strcmp(param, "console") == 0 && strcmp(p->str, "earlycon") == 0", so you can see early_printk() output.

- earlyprintk case: early_param("earlyprintk", ...) is defined and
  case #1: the cmdline has "earlyprintk". This satisfies the condition "(p->early && parameq(param, p->str))", so you can see early_printk() output.
  case #2: the cmdline has "console". This does not satisfy the condition, because the check only looks for "earlycon".

This patch fixes the case #2 problem of earlyprintk.

Thanks.

Best Regards,
Sahara.
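As a rough userspace illustration of the cases listed above, the sketch below models the matching condition in do_early_param() before and after the patch. The struct early_entry type and the matches_before()/matches_after() names are made up for this example, and plain strcmp() stands in for the kernel's parameq(); it is a simplified model, not the kernel code itself.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for one entry registered via early_param(). */
struct early_entry {
    const char *str; /* registered name, e.g. "earlycon" */
    bool early;      /* true for early_param() entries */
};

/* Before the patch: "console" is only forwarded to "earlycon" handlers. */
bool matches_before(const char *param, const struct early_entry *p)
{
    return (p->early && strcmp(param, p->str) == 0) ||
           (strcmp(param, "console") == 0 &&
            strcmp(p->str, "earlycon") == 0);
}

/* After the patch: "console" also reaches "earlyprintk" handlers. */
bool matches_after(const char *param, const struct early_entry *p)
{
    return (p->early && strcmp(param, p->str) == 0) ||
           (strcmp(param, "console") == 0 &&
            (strcmp(p->str, "earlycon") == 0 ||
             strcmp(p->str, "earlyprintk") == 0));
}
```

With an entry named "earlyprintk", matches_before("console", ...) is false while matches_after("console", ...) is true, which is the case #2 difference described in the mail.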
Re: [PATCH 26/35] [PATCH 26/36] powerpc: Replace __get_cpu_var uses
korg tester found an issue:

From: Christoph Lameter
Subject: powerpc: Fix reference to opal_trace_depth.

depth is an address and not a scalar. Use & to determine the address.

Signed-off-by: Christoph Lameter

Index: linux/arch/powerpc/platforms/powernv/opal-tracepoints.c
===
--- linux.orig/arch/powerpc/platforms/powernv/opal-tracepoints.c
+++ linux/arch/powerpc/platforms/powernv/opal-tracepoints.c
@@ -48,7 +48,7 @@ void __trace_opal_entry(unsigned long op
 
 	local_irq_save(flags);
 
-	depth = this_cpu_ptr(opal_trace_depth);
+	depth = this_cpu_ptr(&opal_trace_depth);
 
 	if (*depth)
 		goto out;
@@ -69,7 +69,7 @@ void __trace_opal_exit(long opcode, unsi
 
 	local_irq_save(flags);
 
-	depth = this_cpu_ptr(opal_trace_depth);
+	depth = this_cpu_ptr(&opal_trace_depth);
 
 	if (*depth)
 		goto out;
Re: [PATCH 1/3] of: Add of_match_machine helper
On Fri, 8 Aug 2014 02:01:53 +0300, Tuomas Tynkkynen wrote:
> Add of_match_machine function to test the device tree root for an
> of_match array. This can be useful when testing SoC versions at runtime,
> for example.
>
> Signed-off-by: Tuomas Tynkkynen
> ---
>  drivers/of/base.c  | 21 +
>  include/linux/of.h |  3 +++
>  2 files changed, 24 insertions(+)
>
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index d8574ad..37798ea 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -977,6 +977,27 @@ struct device_node *of_find_matching_node_and_match(struct device_node *from,
>  EXPORT_SYMBOL(of_find_matching_node_and_match);
>
>  /**
> + * of_match_machine - Tell if root of device tree has a matching of_match struct
> + * @matches: array of of device match structures to search in
> + *
> + * Returns the result of of_match_node for the root node.
> + */
> +const struct of_device_id *of_match_machine(const struct of_device_id *matches)
> +{
> +	const struct of_device_id *match;
> +	struct device_node *root;
> +
> +	root = of_find_node_by_path("/");
> +	if (!root)
> +		return NULL;
> +
> +	match = of_match_node(matches, root);
> +	of_node_put(root);
> +	return match;
> +}
> +EXPORT_SYMBOL(of_match_machine);

Too wordy...

	return of_match_node(matches, of_allnodes);

:-)

It could be a static inline, but I don't think it's even worth having a helper. The callers could just open code the above.

g.
[PATCH] regulator: core: Fix build error due to const qualifier for ops
Drop const qualifier for ops of struct regulator_desc.
Allow regulator drivers to update ops before registering regulator.

Fix below build error:

  CC [M]  drivers/regulator/mc13892-regulator.o
drivers/regulator/mc13892-regulator.c: In function 'mc13892_regulator_probe':
drivers/regulator/mc13892-regulator.c:586:3: error: assignment of member 'set_mode' in read-only object
drivers/regulator/mc13892-regulator.c:588:3: error: assignment of member 'get_mode' in read-only object
make[2]: *** [drivers/regulator/mc13892-regulator.o] Error 1
make[1]: *** [drivers/regulator] Error 2
make: *** [drivers] Error 2

Reported-by: Stephen Rothwell
Signed-off-by: Axel Lin
---
 include/linux/regulator/driver.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h
index efe058f..3abda75 100644
--- a/include/linux/regulator/driver.h
+++ b/include/linux/regulator/driver.h
@@ -246,7 +246,7 @@ struct regulator_desc {
 	int id;
 	bool continuous_voltage_range;
 	unsigned n_voltages;
-	const struct regulator_ops *ops;
+	struct regulator_ops *ops;
 	int irq;
 	enum regulator_type type;
 	struct module *owner;
--
1.9.1
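For readers unfamiliar with the error: writing a member through a pointer-to-const is what gcc rejects as "assignment of member ... in read-only object". The userspace sketch below reproduces the situation with made-up types (struct demo_ops, struct demo_desc, demo_probe()), none of which are regulator API. Note that the patch above simply drops the const; the sketch instead shows one alternative, filling in a writable copy and pointing the descriptor at it, purely as an illustration of the trade-off.

```c
#include <stddef.h>

/* Hypothetical miniature of an ops table with optional callbacks. */
struct demo_ops {
    int (*set_mode)(int mode);
    int (*get_mode)(void);
};

/* Descriptor holding a const-qualified ops pointer, as before the patch. */
struct demo_desc {
    const struct demo_ops *ops;
};

static int demo_set_mode(int mode) { return mode; }
static int demo_get_mode(void) { return 7; }

/* Shared read-only template and a writable per-device copy. */
static const struct demo_ops template_ops = { NULL, NULL };
static struct demo_ops patched_ops;

void demo_probe(struct demo_desc *desc)
{
    /*
     * desc->ops->set_mode = demo_set_mode;
     * would not compile: desc->ops points to a const object.
     */
    patched_ops = template_ops;           /* start from the template */
    patched_ops.set_mode = demo_set_mode; /* fill in the extra callbacks */
    patched_ops.get_mode = demo_get_mode;
    desc->ops = &patched_ops;             /* repointing is still allowed */
}
```

Keeping the pointer const preserves compile-time protection for every other driver's ops table, at the cost of an extra copy in the one driver that needs to adjust its callbacks at probe time.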
Re: [PATCH 1/3] of: Add of_match_machine helper
On Fri, 8 Aug 2014 14:01:57 -0500, Rob Herring wrote:
> On Fri, Aug 8, 2014 at 8:23 AM, Tuomas Tynkkynen wrote:
> >
> > On 08/08/14 12:41, Thierry Reding wrote:
> >>
> >>> +const struct of_device_id *of_match_machine(const struct of_device_id *matches)
> >>> +{
> >>> +	const struct of_device_id *match;
> >>> +	struct device_node *root;
> >>> +
> >>> +	root = of_find_node_by_path("/");
> >>> +	if (!root)
> >>> +		return NULL;
> >>> +
> >>> +	match = of_match_node(matches, root);
> >>> +	of_node_put(root);
> >>> +	return match;
> >>> +}
> >>> +EXPORT_SYMBOL(of_match_machine);
> >>
> >> I wonder if of_find_node_by_path("/") is somewhat overkill here. Perhaps
> >> simply of_node_get(of_allnodes) would be more appropriate here since the
> >> function is implemented in the core?
> >
> > of_machine_is_compatible() uses of_find_node_by_path("/") as well, of_allnodes
> > seems to be only used when during iterating. So I'd prefer to have them
> > consistent.
>
> Agreed.

Disagreed. of_machine_is_compatible should be simplified.

g.
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, 17 Aug 2014, Jason Cooper wrote: > On Sun, Aug 17, 2014 at 09:35:11PM -0400, Nicolas Pitre wrote: > > On Sun, 17 Aug 2014, Jason Cooper wrote: > > > > > On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote: > > > > On Sun, 17 Aug 2014, Jason Cooper wrote: > > > > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux > > > > > wrote: > > > > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote: > > > > > > > Applied to irqchip/urgent with Nico's Ack. > > > > > > > > > > > > Interesting, so I'm discussing this patch, and it gets applied > > > > > > anyway... > > > > > > yes, that's great. > > > > > > > > > > Quoting Nico: > > > > > > > > > > "Of course it would be good to clarify things wrt Russell's remark > > > > > independently from this patch." > > > > > > > > > > I took 'independently' to mean "This patch is ok, *and* we need to > > > > > address Russell's concerns in a follow-up patch." > > > > > > > > > > Nico's Reviewed-by with that comment was sent August 13th. The most > > > > > recent activity on this thread was also August 13th. After four > > > > > days, I > > > > > reasoned there were no objections to his comment. > > > > > > > > Well... I mentioned this patch is a nice cleanup independently of the > > > > reason why it was created in the first place. > > > > > > Ah, fair enough. > > > > > > > Maybe that shouldn't be sorted as "urgent" in that case, especially > > > > when the code having problem with the current state of things is > > > > living out of mainline. > > > > > > hmmm, yes. I've been grappling with the semantics of '/urgent' vice > > > '/fixes'. With mvebu, /fixes is the branch for all changes needing to go > > > into the current -rcX cycle. For irqchip, Thomas suggested /urgent for > > > the equivalent branch. To me, they serve the same purpose. > > > Unfortunately, I occasionally hear "Well, it's not _urgent_ ...". 
I > > > suppose I'll put up with it for one more cycle and then change it to > > > /fixes. :) > > > > > > wrt this patch, I need to drop it anyway. I was a bit rusty (it's been > > > a few weeks) and forgot to add the Cc -stable and Fixes: tags. I do > > > agree, though, it's certainly not urgent. > > > > Given the raised issue has to do with out-of-tree code, there is no need > > to CC stable in that case anyway. > > I could go either way here. On the one hand, a fix is a fix is a fix. > On the other, if it can't be triggered in mainline, we shouldn't accept > it at all. For mainline, it should be accepted as a cleanup and minor optimization since no mainline code is currently affected by the absence of this patch. If there is a real bug being fixed by this patch, and whether the best way to fix it is by relying on this patch, is still up for debate. > Stephen, is the out of tree code that triggered this bound for mainline? Maybe "mainline", but certainly not "stable". Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] KVM: x86: Increase the number of fixed MTRR regs to 10
Hi Nadav,
On Wed, Jun 18, 2014 at 05:21:19PM +0300, Nadav Amit wrote:
>Recent Intel CPUs have 10 variable range MTRRs. Since operating systems
>sometime make assumptions on CPUs while they ignore capability MSRs, it is
>better for KVM to be consistent with recent CPUs. Reporting more MTRRs than
>actually supported has no functional implications.
>
>Signed-off-by: Nadav Amit
>---
> arch/x86/include/asm/kvm_host.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>index 4931415..0bab29d 100644
>--- a/arch/x86/include/asm/kvm_host.h
>+++ b/arch/x86/include/asm/kvm_host.h
>@@ -95,7 +95,7 @@ static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
> #define KVM_REFILL_PAGES 25
> #define KVM_MAX_CPUID_ENTRIES 80
> #define KVM_NR_FIXED_MTRR_REGION 88
>-#define KVM_NR_VAR_MTRR 8
>+#define KVM_NR_VAR_MTRR 10
>

We observed an obvious regression caused by this commit: a 32-bit Win7 guest shows a blue screen during boot.

Regards,
Wanpeng Li

> #define ASYNC_PF_PER_VCPU 64
>
>--
>1.9.1
[PATCH V2] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
Let memblock skip the hotpluggable memory regions in __next_mem_range(). This prevents memblock from allocating hotpluggable memory for the kernel early in boot. The code is the same as in __next_mem_range_rev().

Clear the hotpluggable flag before releasing free pages to the buddy allocator. If we don't clear the hotpluggable flag in free_low_memory_core_early(), memory marked as hotpluggable will not be freed to the buddy allocator, because __next_mem_range() will skip it:

free_low_memory_core_early
	for_each_free_mem_range
		for_each_mem_range
			__next_mem_range

Signed-off-by: Xishi Qiu
---
 mm/memblock.c  |    4 
 mm/nobootmem.c |    2 ++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 6d2f219..5090050 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -817,6 +817,10 @@ void __init_memblock __next_mem_range(u64 *idx, int nid,
 		if (nid != NUMA_NO_NODE && nid != m_nid)
 			continue;
 
+		/* skip hotpluggable memory regions if needed */
+		if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
+			continue;
+
 		if (!type_b) {
 			if (out_start)
 				*out_start = m_start;
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 7ed5860..03de286 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -119,6 +119,8 @@ static unsigned long __init free_low_memory_core_early(void)
 	phys_addr_t start, end;
 	u64 i;
 
+	memblock_clear_hotplug(0, ULLONG_MAX);
+
 	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
 		count += __free_memory_core(start, end);
 
--
1.7.1
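To make the interaction between the two hunks concrete, here is a small userspace model, not memblock code: an iterator that skips regions carrying a hotplug flag, and a helper that clears the flag so the same iterator visits everything again. struct region, REGION_HOTPLUG, visible_bytes() and clear_hotplug() are all invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define REGION_HOTPLUG 0x1u /* hypothetical flag bit for this model */

struct region {
    unsigned long base;
    unsigned long size;
    unsigned int flags;
};

/* Sum the sizes of the regions the iterator would hand out. */
unsigned long visible_bytes(const struct region *r, size_t n, bool movable_node)
{
    unsigned long total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Mirrors the added check: skip hotpluggable regions if needed. */
        if (movable_node && (r[i].flags & REGION_HOTPLUG))
            continue;
        total += r[i].size;
    }
    return total;
}

/* Mirrors the idea of clearing the flag before freeing to the allocator. */
void clear_hotplug(struct region *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        r[i].flags &= ~REGION_HOTPLUG;
}
```

In this model, a flagged region is invisible while movable_node is set, which is the desired behaviour for early allocations; calling clear_hotplug() first makes the final walk see all regions, which is why the changelog says the flag must be cleared before pages are released.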
Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
On 2014/8/17 19:08, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
>
> Yeah, that one.
>
>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the
>> memory which marked hotpluggable flag will not free to buddy allocator.
>> Because __next_mem_range() will skip them.
>>
>> free_low_memory_core_early
>> for_each_free_mem_range
>> for_each_mem_range
>> __next_mem_range
>
> Ah, okay, so the patch fixes __next_mem_range() and thus makes
> free_low_memory_core_early() to skip hotpluggable regions unlike
> before. Please explain things like that in the changelog. Also,

OK, I will send V2.

Thanks,
Xishi Qiu

> what's its relationship with numa_clear_kernel_node_hotplug()? Do we
> still need them? If so, what are the different roles that these two
> separate places serve?
>
> Thanks.
>
Re: [PATCH V2] I2C: Rework kernel config I2C_ACPI
On 2014年08月15日 19:03, Wolfram Sang wrote: > On Fri, Aug 15, 2014 at 01:38:59PM +0800, Lan Tianyu wrote: >> Commit da3c6647(I2C/ACPI: Clean up I2C ACPI code and Add CONFIG_I2C_ACPI >> config) adds a new kernel config I2C_ACPI and make I2C core built in >> when the config is selected. This is wrong because distributions >> etc generally compile I2C as a module and the commit broken that. >> This patch is to rename I2C_ACPI to ACPI_I2C_OPREGION. New config >> only controls ACPI I2C operation region code and depends on I2C=y. >> >> Signed-off-by: Lan Tianyu > > It looks good. What tests did you perform? > > Thanks, > >Wolfram > Hi Wolfram: The patch passed through Fengguang's 0-day autobuild test. Following are config files tested. configs tested: 122 pariscc3000_defconfig parisc b180_defconfig parisc defconfig alpha defconfig pariscallnoconfig mips allmodconfig mips jz4740 mips allnoconfig mips fuloong2e_defconfig mips txx9 x86_64allnoconfig x86_64lkp x86_64 rhel shtitan_defconfig sh rsk7269_defconfig sh sh7785lcr_32bit_defconfig shallnoconfig x86_64 randconfig-c3-0815 x86_64 randconfig-c1-0815 x86_64 randconfig-c0-0815 x86_64 randconfig-c2-0815 x86_64 allmodconfig i386 randconfig-jx5 i386 randconfig-jx4 i386 randconfig-jx7 i386 randconfig-jx6 i386 randconfig-jx1 i386 randconfig-jx0 i386 randconfig-jx3 i386 randconfig-jx2 i386 randconfig-jx9 i386 randconfig-jx8 x86_64 randconfig-jx8 x86_64 randconfig-jx9 x86_64 randconfig-jx2 x86_64 randconfig-jx3 x86_64 randconfig-jx0 x86_64 randconfig-jx1 x86_64 randconfig-jx6 x86_64 randconfig-jx7 x86_64 randconfig-jx4 x86_64 randconfig-jx5 powerpc chroma_defconfig powerpc linkstation_defconfig powerpc powerpc powerpc wii_defconfig powerpcgamecube_defconfig powerpc corenet64_smp_defconfig powerpc mpc512x powerpcppc44x x86_64 randconfig-j0-0815 x86_64 randconfig-j1-0815 i386 randconfig-ha2-0815 i386 randconfig-ha5-0815 i386 randconfig-ha1-0815 i386 randconfig-ha0-0815 i386 randconfig-ha3-0815 i386 randconfig-ha4-0815 ia64 
allmodconfig ia64 allnoconfig ia64defconfig ia64 alldefconfig sparc defconfig sparc64 allnoconfig sparc64 defconfig xtensa common_defconfig m32r m32104ut_defconfig xtensa iss_defconfig m32r opsput_defconfig m32r usrv_defconfig m32r mappi3.smp_defconfig microblaze mmu_defconfig microblazenommu_defconfig microblaze allyesconfig i386 allyesconfig cris etrax-100lx_v2_defconfig blackfin TCM-BF537_defconfig blackfinBF561-EZKIT-SMP_defconfig blackfinBF533-EZKIT_defconfig blackfinBF526-EZBRD_defconfig i386 randconfig-r1-0815 i386 randconfig-r2-0815 i386 randconfig-r3-0815 i386 randconfig-r0-0815 s390 allmodconfig s390 allnoconfig s390defconfig mn10300 asb2364_defconfig openriscor1ksim_defconfig um x86_64_defconfig um i386_defconfig avr32 atngw10
Re: [PATCH] usb: phy: return -ENODEV on failure of try_module_get
On Mon, Aug 18, 2014 at 12:04:42AM +0530, Arjun Sreedharan wrote:
> When __usb_find_phy_dev() does not return error and
> try_module_get() fails, return -ENODEV
>
> Signed-off-by: Arjun Sreedharan
> ---
>  drivers/usb/phy/phy.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/usb/phy/phy.c b/drivers/usb/phy/phy.c
> index 36b6bce..8ad3638 100644
> --- a/drivers/usb/phy/phy.c
> +++ b/drivers/usb/phy/phy.c
> @@ -232,6 +232,7 @@ struct usb_phy *usb_get_phy_dev(struct device *dev, u8 index)
>  	phy = __usb_find_phy_dev(dev, &phy_bind_list, index);
>  	if (IS_ERR(phy) || !try_module_get(phy->dev->driver->owner)) {
>  		dev_dbg(dev, "unable to find transceiver\n");
> +		phy = IS_ERR(phy) ? phy : ERR_PTR(-ENODEV);

Please just spell out the if () statement, don't use ? : unless necessary.

thanks,

greg k-h
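As a sketch of what the spelled-out form could look like, the userspace snippet below re-creates simplified stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() helpers and wraps the logic in a made-up function, phy_or_error(), whose module_ok argument stands in for the result of try_module_get(). It is an illustration of the review comment under those assumptions, not the code that was eventually merged.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace stand-ins for the kernel's err.h helpers. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    /* Error codes live in the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical helper: module_ok models try_module_get() succeeding. */
void *phy_or_error(void *phy, int module_ok)
{
    if (IS_ERR(phy) || !module_ok) {
        /* Keep the lookup's own error; otherwise report -ENODEV. */
        if (!IS_ERR(phy))
            phy = ERR_PTR(-ENODEV);
    }
    return phy;
}
```

The explicit if reads in the same order as the commit message: an error from the lookup is passed through untouched, and only the "found but module unavailable" case is turned into -ENODEV.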
Re: [PATCH] usb: gadget: remove $(PWD) in ccflags-y
On Mon, Aug 18, 2014 at 12:08:07AM +0200, Philippe Reynes wrote: > The variable $(PWD) is useless, and it may break the compilation. > For example, it breaks the kernel compilation when it's done with > buildroot : > > /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/bin/ccache > /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/bin/arm-buildroot-linux-uclibcgnueabi-gcc > -Wp,-MD,drivers/usb/gadget/legacy/.hid.o.d -nostdinc -isystem > /home/trem/Codes/armadeus/armadeus/buildroot/output/host/usr/lib/gcc/arm-buildroot-linux-uclibcgnueabi/4.7.3/include > -I./arch/arm/include -Iarch/arm/include/generated -Iinclude > -I./arch/arm/include/uapi -Iarch/arm/include/generated/uapi > -I./include/uapi -Iinclude/generated/uapi -include > ./include/linux/kconfig.h -D__KERNEL__ -mlittle-endian -Wall -Wundef > -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common > -Werror-implicit-function-declaration -Wno-format-security > -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mno-thumb-interwork -mfpu=vfp > -funwind-tables -marm -D__LINUX_ARM_ARCH__=5 -march=armv5te > -mtune=arm9tdmi -msoft-float -Uarm -fno-delete-null-pointer-checks -O2 > --param=allow-store-data-races=0 -Wframe-larger-than=1024 > -fno-stack-protector -Wno-unused-but-set-variable -fomit-frame-pointer > -fno-var-tracking-assignments -g -Wdeclaration-after-statement > -Wno-pointer-sign -fno-strict-overflow -fconserve-stack > -Werror=implicit-int -Werror=strict-prototypes -DCC_HAVE_ASM_GOTO > -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/ > -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/udc/ > -I/home/trem/Codes/armadeus/armadeus/buildroot/drivers/usb/gadget/function/ > -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(hid)" > -D"KBUILD_MODNAME=KBUILD_STR(g_hid)" -c -o > drivers/usb/gadget/legacy/hid.o drivers/usb/gadget/legacy/hid.c > drivers/usb/gadget/epautoconf.c:23:26: erreur fatale: gadget_chips.h : > Aucun fichier ou dossier de ce type > 
> This compilation line include : > /buildroot/driver/usb/gadget > but the real path is : > /buildroot/output/build/linux-3.17-rc1/driver/usb/gadget > > Signed-off-by: Philippe Reynes > --- > drivers/usb/gadget/Makefile |2 +- > drivers/usb/gadget/function/Makefile |4 ++-- > drivers/usb/gadget/legacy/Makefile |6 +++--- > 3 files changed, 6 insertions(+), 6 deletions(-) > > diff --git a/drivers/usb/gadget/Makefile b/drivers/usb/gadget/Makefile > index a186afe..9add915 100644 > --- a/drivers/usb/gadget/Makefile > +++ b/drivers/usb/gadget/Makefile > @@ -3,7 +3,7 @@ > # > subdir-ccflags-$(CONFIG_USB_GADGET_DEBUG):= -DDEBUG > subdir-ccflags-$(CONFIG_USB_GADGET_VERBOSE) += -DVERBOSE_DEBUG > -ccflags-y+= -I$(PWD)/drivers/usb/gadget/udc > +ccflags-y+= -Idrivers/usb/gadget/udc Ick, why are these here at all, shouldn't you just use the proper relative paths in the .c files for the include files? That way just building a .o file individually will work properly, otherwise, it will not. And getting rid of those other ccflags would be good to do as well, no need for them to be in a Makefile. thanks, greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, Aug 17, 2014 at 09:35:11PM -0400, Nicolas Pitre wrote: > On Sun, 17 Aug 2014, Jason Cooper wrote: > > > On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote: > > > On Sun, 17 Aug 2014, Jason Cooper wrote: > > > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux > > > > wrote: > > > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote: > > > > > > Applied to irqchip/urgent with Nico's Ack. > > > > > > > > > > Interesting, so I'm discussing this patch, and it gets applied > > > > > anyway... > > > > > yes, that's great. > > > > > > > > Quoting Nico: > > > > > > > > "Of course it would be good to clarify things wrt Russell's remark > > > > independently from this patch." > > > > > > > > I took 'independently' to mean "This patch is ok, *and* we need to > > > > address Russell's concerns in a follow-up patch." > > > > > > > > Nico's Reviewed-by with that comment was sent August 13th. The most > > > > recent activity on this thread was also August 13th. After four days, I > > > > reasoned there were no objections to his comment. > > > > > > Well... I mentioned this patch is a nice cleanup independently of the > > > reason why it was created in the first place. > > > > Ah, fair enough. > > > > > Maybe that shouldn't be sorted as "urgent" in that case, especially > > > when the code having problem with the current state of things is > > > living out of mainline. > > > > hmmm, yes. I've been grappling with the semantics of '/urgent' vice > > '/fixes'. With mvebu, /fixes is the branch for all changes needing to go > > into the current -rcX cycle. For irqchip, Thomas suggested /urgent for > > the equivalent branch. To me, they serve the same purpose. > > Unfortunately, I occasionally hear "Well, it's not _urgent_ ...". I > > suppose I'll put up with it for one more cycle and then change it to > > /fixes. :) > > > > wrt this patch, I need to drop it anyway. 
I was a bit rusty (it's been
> > a few weeks) and forgot to add the Cc -stable and Fixes: tags. I do
> > agree, though, it's certainly not urgent.
>
> Given the raised issue has to do with out-of-tree code, there is no need
> to CC stable in that case anyway.

I could go either way here. On the one hand, a fix is a fix is a fix.
On the other, if it can't be triggered in mainline, we shouldn't accept
it at all.

Stephen, is the out of tree code that triggered this bound for mainline?

thx,

Jason.
Re: [PATCH v9 02/12] PCI: OF: Parse and map the IRQ when adding the PCI device.
On Fri, Aug 15, 2014 at 11:30:52AM +0100, Liviu Dudau wrote: >On Fri, Aug 15, 2014 at 09:56:32AM +0100, Wei Yang wrote: >> On Thu, Aug 14, 2014 at 04:49:59PM +0100, Liviu Dudau wrote: >> >On Thu, Aug 14, 2014 at 03:58:04PM +0100, Wei Yang wrote: >> >> On Tue, Aug 12, 2014 at 05:25:15PM +0100, Liviu Dudau wrote: >> >> >Enhance the default implementation of pcibios_add_device() to >> >> >parse and map the IRQ of the device if a DT binding is available. >> >> > >> >> >Cc: Bjorn Helgaas >> >> >Cc: Grant Likely >> >> >Cc: Rob Herring >> >> >Signed-off-by: Liviu Dudau >> >> >--- >> >> > drivers/pci/pci.c | 3 +++ >> >> > 1 file changed, 3 insertions(+) >> >> > >> >> >diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> >> >index 1c8592b..29d1775 100644 >> >> >--- a/drivers/pci/pci.c >> >> >+++ b/drivers/pci/pci.c >> >> >@@ -17,6 +17,7 @@ >> >> > #include >> >> > #include >> >> > #include >> >> >+#include >> >> > #include >> >> > #include >> >> > #include >> >> >@@ -1453,6 +1454,8 @@ EXPORT_SYMBOL(pcim_pin_device); >> >> > */ >> >> > int __weak pcibios_add_device(struct pci_dev *dev) >> >> > { >> >> >+dev->irq = of_irq_parse_and_map_pci(dev, 0, 0); >> >> >+ >> >> > return 0; >> >> > } >> >> >> >> Liviu, >> >> >> >> For this, my suggestion is to add arch dependent function to setup the irq >> >> line for pci devices. I can't find an obvious reason this won't work on >> >> other >> >> archs, but maybe this will hurt some of them? >> > >> >Hi Wei, >> > >> >I'm not sure I understand your point. Architectures that support OF will >> >obviously >> >benefit from this common approach, and for the other ones the function is >> >empty >> >so it will not change existing behaviour. If you are suggesting that I >> >should >> >create a new API that each architecture could go and implement for setting >> >up the >> >IRQ line then I would agree that it would be nice to have that, but the >> >question >> >is how many architectures are outside OF that need this? 
>> >> My suggestion is to define the pcibios_add_device() for arm arch, like the >> one >> in arch/powerpc/kernel/pci-common.c. If my understanding is correct, this >> patch set address the pci bus setup mostly on arm arch. > >And also arm64 at the least. > >> >> For those archs not support OF, this function is empty and has no effect. I >> agree on this one. >> >> For those archs rely on OF, we still have two cases: >> 1. they would have implement this function like powerpc > >Actually, powerpc seems to be the only OF platform reimplementing this >function. >s390 and x86 are not OF platforms. > >> 2. have other way to fix it up, otherwise how it works now? > >Don't forget that my patchset aims to replace existing house-made code with a >more >generic version. When architectures and platforms switch to my code they will >have >to add this back in their code if it's needed. > >> If my assumption is correct, this change will either have no effect, or fix >> up >> the irq line the second time. Not harmful, but not necessary. > >Well, it will become necessary as old code gets dismantled and converted >towards >this patchset. To give you an example that I'm familiar with, for arch/arm the >host bridge drivers have moved into drivers/pci/host, but they still depend/use >the bios32 infrastructure that takes care of setting up the irq. When they >switch >to my version they would have to go and debug the "irq not being assigned" >issue >and it is quite likely that some of the people doing the conversion will >complain >about my code rather than understanding the issue. What I'm trying to do is to >make switching to my patchset as painless as possible, with a cleanup to remove >redundant operations coming after the switchover. > This means this is a temporary version for the switchover period and will be reverted after switchover? >Does that sound like a reasonable plan? > >Best regards, >Liviu > >> >> I am not familiar with other arch, so the second case is my deduction. 
If this is not correct, please let me know.
>>
>> >If I understood you correctly, it is a nice idea but slightly outside the
>> >scope of my current patchset.
>> >
>> >Best regards,
>> >Liviu
>>
>> --
>> Richard Yang
>> Help you, Help me
>
>--
>| I would like to |
>| fix the world,  |
>| but they're not |
>| giving me the   |
> \ source code!  /
Re: [PATCH RFC 6/7] usb: host: ehci-exynos: Remove unnecessary usb-phy support
On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote: > > Now that we have completely moved from older USB-PHY drivers > to newer GENERIC-PHY drivers for PHYs available with USB controllers > on Exynos series of SoCs, we can remove the support for the same > in our host drivers too. > This should fix the issue on ehci-exynos, wherein in the absence of > SAMSUNG_USB2PHY config symbol, we ended up getting the NOP_USB_XCEIV phy > when the same is enabled. And thus the PHYs are not configured properly. > > Reported-by: Sachin Kamat > Signed-off-by: Vivek Gautam Reviewed-by: Jingoo Han Best regards, Jingoo Han > --- > drivers/usb/host/ehci-exynos.c | 53 > ++-- > 1 file changed, 8 insertions(+), 45 deletions(-) > > diff --git a/drivers/usb/host/ehci-exynos.c b/drivers/usb/host/ehci-exynos.c > index cda0a2f..54944cc 100644 > --- a/drivers/usb/host/ehci-exynos.c > +++ b/drivers/usb/host/ehci-exynos.c > @@ -21,11 +21,8 @@ > #include > #include > #include > -#include > -#include > #include > #include > -#include > > #include "ehci.h" > > @@ -47,9 +44,7 @@ static struct hc_driver __read_mostly exynos_ehci_hc_driver; > > struct exynos_ehci_hcd { > struct clk *clk; > - struct usb_phy *phy; > - struct usb_otg *otg; > - struct phy *phy_g[PHY_NUMBER]; > + struct phy *phy[PHY_NUMBER]; > }; > > #define to_exynos_ehci(hcd) (struct exynos_ehci_hcd > *)(hcd_to_ehci(hcd)->priv) > @@ -62,18 +57,6 @@ static int exynos_ehci_get_phy(struct device *dev, > int phy_number; > int ret = 0; > > - exynos_ehci->phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2); > - if (IS_ERR(exynos_ehci->phy)) { > - ret = PTR_ERR(exynos_ehci->phy); > - if (ret != -ENXIO && ret != -ENODEV) { > - dev_err(dev, "no usb2 phy configured\n"); > - return ret; > - } > - dev_dbg(dev, "Failed to get usb2 phy\n"); > - } else { > - exynos_ehci->otg = exynos_ehci->phy->otg; > - } > - > for_each_available_child_of_node(dev->of_node, child) { > ret = of_property_read_u32(child, "reg", &phy_number); > if (ret) { > @@ -98,7 +81,7 
@@ static int exynos_ehci_get_phy(struct device *dev, > } > dev_dbg(dev, "Failed to get usb2 phy\n"); > } > - exynos_ehci->phy_g[phy_number] = phy; > + exynos_ehci->phy[phy_number] = phy; > } > > return ret; > @@ -111,16 +94,13 @@ static int exynos_ehci_phy_enable(struct device *dev) > int i; > int ret = 0; > > - if (!IS_ERR(exynos_ehci->phy)) > - return usb_phy_init(exynos_ehci->phy); > - > for (i = 0; ret == 0 && i < PHY_NUMBER; i++) > - if (!IS_ERR(exynos_ehci->phy_g[i])) > - ret = phy_power_on(exynos_ehci->phy_g[i]); > + if (!IS_ERR(exynos_ehci->phy[i])) > + ret = phy_power_on(exynos_ehci->phy[i]); > if (ret) > for (i--; i >= 0; i--) > - if (!IS_ERR(exynos_ehci->phy_g[i])) > - phy_power_off(exynos_ehci->phy_g[i]); > + if (!IS_ERR(exynos_ehci->phy[i])) > + phy_power_off(exynos_ehci->phy[i]); > > return ret; > } > @@ -131,14 +111,9 @@ static void exynos_ehci_phy_disable(struct device *dev) > struct exynos_ehci_hcd *exynos_ehci = to_exynos_ehci(hcd); > int i; > > - if (!IS_ERR(exynos_ehci->phy)) { > - usb_phy_shutdown(exynos_ehci->phy); > - return; > - } > - > for (i = 0; i < PHY_NUMBER; i++) > - if (!IS_ERR(exynos_ehci->phy_g[i])) > - phy_power_off(exynos_ehci->phy_g[i]); > + if (!IS_ERR(exynos_ehci->phy[i])) > + phy_power_off(exynos_ehci->phy[i]); > } > > static void exynos_setup_vbus_gpio(struct device *dev) > @@ -231,9 +206,6 @@ skip_phy: > goto fail_io; > } > > - if (exynos_ehci->otg) > - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self); > - > err = exynos_ehci_phy_enable(&pdev->dev); > if (err) { > dev_err(&pdev->dev, "Failed to enable USB phy\n"); > @@ -273,9 +245,6 @@ static int exynos_ehci_remove(struct platform_device > *pdev) > > usb_remove_hcd(hcd); > > - if (exynos_ehci->otg) > - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self); > - > exynos_ehci_phy_disable(&pdev->dev); > > clk_disable_unprepare(exynos_ehci->clk); > @@ -298,9 +267,6 @@ static int exynos_ehci_suspend(struct device *dev) > if (rc) > return rc; > > - if 
(exynos_ehci->otg) > - exynos_ehci->otg->set_host(exynos_ehci->otg, &hcd->self); > - > exynos_ehci_phy_disable(dev); > > clk_disable_unprepare(exynos_ehci->clk); > @@ -316,9
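For anyone skimming the quoted hunks, the enable path that remains after this patch is a "power on everything, undo on first failure" loop. Below is a minimal userspace sketch of that control flow, not the driver code itself: `struct fake_phy`, `fake_phy_power_on()`, `fake_phy_power_off()` and `phys_enable()` are hypothetical stand-ins, a NULL slot plays the role of the driver's `IS_ERR()` check, and the value of `PHY_NUMBER` here is an assumption chosen only for illustration.

```c
#define PHY_NUMBER 3	/* illustrative value, not taken from the driver */

/* Hypothetical stand-in for struct phy: it only tracks the power state. */
struct fake_phy {
	int powered;
	int fail_on_power_on;	/* set to simulate a power-on error */
};

static int fake_phy_power_on(struct fake_phy *phy)
{
	if (phy->fail_on_power_on)
		return -1;
	phy->powered = 1;
	return 0;
}

static void fake_phy_power_off(struct fake_phy *phy)
{
	phy->powered = 0;
}

/*
 * Power on every present PHY. The first loop stops as soon as one call
 * fails; the second loop then walks back from the failing index down to
 * zero and powers each present PHY off again, so a failed enable leaves
 * nothing powered.
 */
static int phys_enable(struct fake_phy *phy[PHY_NUMBER])
{
	int i;
	int ret = 0;

	for (i = 0; ret == 0 && i < PHY_NUMBER; i++)
		if (phy[i])
			ret = fake_phy_power_on(phy[i]);
	if (ret)
		for (i--; i >= 0; i--)
			if (phy[i])
				fake_phy_power_off(phy[i]);

	return ret;
}
```

Note that the walk-back starts at the slot that failed, which is harmless in this model because powering off an unpowered stand-in is a no-op.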
Re: [PATCH RFC 7/7] usb: host: ohci-exynos: Remove unnecessary usb-phy support
On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote: > > Now that we have completely moved from older USB-PHY drivers > to newer GENERIC-PHY drivers for PHYs available with USB controllers > on Exynos series of SoCs, we can remove the support for the same > in our host drivers too. > This should fix the issue on ohci-exynos, wherein in the absence of > SAMSUNG_USB2PHY config symbol, we ended up getting the NOP_USB_XCEIV phy > when the same is enabled. And thus the PHYs are not configured properly. > > Reported-by: Sachin Kamat > Signed-off-by: Vivek Gautam Reviewed-by: Jingoo Han Best regards, Jingoo Han > --- > drivers/usb/host/ohci-exynos.c | 64 > ++-- > 1 file changed, 9 insertions(+), 55 deletions(-) > > diff --git a/drivers/usb/host/ohci-exynos.c b/drivers/usb/host/ohci-exynos.c > index a72ab8f..0199a8b 100644 > --- a/drivers/usb/host/ohci-exynos.c > +++ b/drivers/usb/host/ohci-exynos.c > @@ -19,11 +19,8 @@ > #include > #include > #include > -#include > -#include > #include > #include > -#include > > #include "ohci.h" > > @@ -38,9 +35,7 @@ static struct hc_driver __read_mostly exynos_ohci_hc_driver; > > struct exynos_ohci_hcd { > struct clk *clk; > - struct usb_phy *phy; > - struct usb_otg *otg; > - struct phy *phy_g[PHY_NUMBER]; > + struct phy *phy[PHY_NUMBER]; > }; > > static int exynos_ohci_get_phy(struct device *dev, > @@ -51,28 +46,7 @@ static int exynos_ohci_get_phy(struct device *dev, > int phy_number; > int ret = 0; > > - exynos_ohci->phy = devm_usb_get_phy(dev, USB_PHY_TYPE_USB2); > - if (IS_ERR(exynos_ohci->phy)) { > - ret = PTR_ERR(exynos_ohci->phy); > - if (ret != -ENXIO && ret != -ENODEV) { > - dev_err(dev, "no usb2 phy configured\n"); > - return ret; > - } > - dev_dbg(dev, "Failed to get usb2 phy\n"); > - } else { > - exynos_ohci->otg = exynos_ohci->phy->otg; > - } > - > - /* > - * Getting generic phy: > - * We are keeping both types of phys as a part of transiting OHCI > - * to generic phy framework, so as to maintain backward compatibilty 
> - * with old DTB. > - * If there are existing devices using DTB files built from them, > - * to remove the support for old bindings in this driver, > - * we need to make sure that such devices have their DTBs > - * updated to ones built from new DTS. > - */ > + /* Get the generic phys */ > for_each_available_child_of_node(dev->of_node, child) { > ret = of_property_read_u32(child, "reg", &phy_number); > if (ret) { > @@ -97,7 +71,7 @@ static int exynos_ohci_get_phy(struct device *dev, > } > dev_dbg(dev, "Failed to get usb2 phy\n"); > } > - exynos_ohci->phy_g[phy_number] = phy; > + exynos_ohci->phy[phy_number] = phy; > } > > return ret; > @@ -110,16 +84,13 @@ static int exynos_ohci_phy_enable(struct device *dev) > int i; > int ret = 0; > > - if (!IS_ERR(exynos_ohci->phy)) > - return usb_phy_init(exynos_ohci->phy); > - > for (i = 0; ret == 0 && i < PHY_NUMBER; i++) > - if (!IS_ERR(exynos_ohci->phy_g[i])) > - ret = phy_power_on(exynos_ohci->phy_g[i]); > + if (!IS_ERR(exynos_ohci->phy[i])) > + ret = phy_power_on(exynos_ohci->phy[i]); > if (ret) > for (i--; i >= 0; i--) > - if (!IS_ERR(exynos_ohci->phy_g[i])) > - phy_power_off(exynos_ohci->phy_g[i]); > + if (!IS_ERR(exynos_ohci->phy[i])) > + phy_power_off(exynos_ohci->phy[i]); > > return ret; > } > @@ -130,14 +101,9 @@ static void exynos_ohci_phy_disable(struct device *dev) > struct exynos_ohci_hcd *exynos_ohci = to_exynos_ohci(hcd); > int i; > > - if (!IS_ERR(exynos_ohci->phy)) { > - usb_phy_shutdown(exynos_ohci->phy); > - return; > - } > - > for (i = 0; i < PHY_NUMBER; i++) > - if (!IS_ERR(exynos_ohci->phy_g[i])) > - phy_power_off(exynos_ohci->phy_g[i]); > + if (!IS_ERR(exynos_ohci->phy[i])) > + phy_power_off(exynos_ohci->phy[i]); > } > > static int exynos_ohci_probe(struct platform_device *pdev) > @@ -209,9 +175,6 @@ skip_phy: > goto fail_io; > } > > - if (exynos_ohci->otg) > - exynos_ohci->otg->set_host(exynos_ohci->otg, &hcd->self); > - > platform_set_drvdata(pdev, hcd); > > err = 
exynos_ohci_phy_enable(&pdev->dev); > @@ -244,9 +207,6 @@ static int exynos_ohci_remove(struct platform_device > *pdev) > > usb_remove_hcd(hcd); > > - if (exynos_ohci->otg) > - exynos_ohci->otg->set_host(exynos_ohc
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, 17 Aug 2014, Jason Cooper wrote: > On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote: > > On Sun, 17 Aug 2014, Jason Cooper wrote: > > > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote: > > > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote: > > > > > Applied to irqchip/urgent with Nico's Ack. > > > > > > > > Interesting, so I'm discussing this patch, and it gets applied anyway... > > > > yes, that's great. > > > > > > Quoting Nico: > > > > > > "Of course it would be good to clarify things wrt Russell's remark > > > independently from this patch." > > > > > > I took 'independently' to mean "This patch is ok, *and* we need to > > > address Russell's concerns in a follow-up patch." > > > > > > Nico's Reviewed-by with that comment was sent August 13th. The most > > > recent activity on this thread was also August 13th. After four days, I > > > reasoned there were no objections to his comment. > > > > Well... I mentioned this patch is a nice cleanup independently of the > > reason why it was created in the first place. > > Ah, fair enough. > > > Maybe that shouldn't be sorted as "urgent" in that case, especially > > when the code having problem with the current state of things is > > living out of mainline. > > hmmm, yes. I've been grappling with the semantics of '/urgent' vice > '/fixes'. With mvebu, /fixes is the branch for all changes needing to go > into the current -rcX cycle. For irqchip, Thomas suggested /urgent for > the equivalent branch. To me, they serve the same purpose. > Unfortunately, I occasionally hear "Well, it's not _urgent_ ...". I > suppose I'll put up with it for one more cycle and then change it to > /fixes. :) > > wrt this patch, I need to drop it anyway. I was a bit rusty (it's been > a few weeks) and forgot to add the Cc -stable and Fixes: tags. I do > agree, though, it's certainly not urgent. 

Given the raised issue has to do with out-of-tree code, there is no need
to CC stable in that case anyway.

Nicolas
Re: [PATCH 1/7] usb-phy: samsung-usb3: Remove older phy-samsung-usb3 driver
On Thursday, August 14, 2014 11:24 PM, Vivek Gautam wrote:
>
> Removing this older USB 3.0 DRD controller PHY driver, since
> a new driver based on generic phy framework is now available.
>
> Signed-off-by: Vivek Gautam

Reviewed-by: Jingoo Han

Best regards,
Jingoo Han

> ---
>  drivers/usb/phy/Kconfig            |   8 -
>  drivers/usb/phy/Makefile           |   1 -
>  drivers/usb/phy/phy-samsung-usb.h  |  80 -
>  drivers/usb/phy/phy-samsung-usb3.c | 350
>  4 files changed, 439 deletions(-)
>  delete mode 100644 drivers/usb/phy/phy-samsung-usb3.c
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, Aug 17, 2014 at 10:41:23PM +0100, Russell King - ARM Linux wrote:
> On Sun, Aug 17, 2014 at 03:04:34PM -0400, Jason Cooper wrote:
> > Quoting Nico:
> >
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> >
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> >
> > Nico's Reviewed-by with that comment was sent August 13th. The most
> > recent activity on this thread was also August 13th. After four days, I
> > reasoned there were no objections to his comment.
>
> Right, during the merge window, and during merge windows, I tend to
> ignore almost all email now because people don't stop developing, and
> they don't take any notice where the mainline cycle is. In fact, I go
> off and do non-kernel work during a merge window and only briefly scan
> for bug fixes.

Ok, now dropped.

thx,

Jason.
[PATCH 5/7 update] arm64/efi: do not enter virtual mode in case booting with efi=noruntime or noefi
In case efi runtime disabled via noefi kernel cmdline
arm64_enter_virtual_mode should error out.

At the same time move early_memunmap(memmap.map, mapsize) to the
beginning of the function or it will leak early mem.

Signed-off-by: Dave Young
---
 arch/arm64/kernel/efi.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
index 6ed0362..8f5db4a 100644
--- a/arch/arm64/kernel/efi.c
+++ b/arch/arm64/kernel/efi.c
@@ -392,11 +392,16 @@ static int __init arm64_enter_virtual_mode(void)
 		return -1;
 	}
 
-	pr_info("Remapping and enabling EFI services.\n");
-
-	/* replace early memmap mapping with permanent mapping */
 	mapsize = memmap.map_end - memmap.map;
 	early_memunmap(memmap.map, mapsize);
+
+	if (efi_runtime_disabled()) {
+		pr_info("EFI runtime services will be disabled.\n");
+		return -1;
+	}
+
+	pr_info("Remapping and enabling EFI services.\n");
+	/* replace early memmap mapping with permanent mapping */
 	memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
 						   mapsize);
 	memmap.map_end = memmap.map + mapsize;
--
1.8.3.1
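The ordering this update settles on (drop the temporary mapping first, then decide whether to bail out) can be modelled in userspace C. The sketch below is an assumption-laden illustration, not kernel code: `struct fake_memmap` and `enter_virtual_mode()` are hypothetical names, `free()` stands in for `early_memunmap()`, and `malloc()` stands in for `ioremap_cache()`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the memmap state; not the kernel's structure. */
struct fake_memmap {
	void *map;		/* temporary (early) mapping */
	size_t size;
	bool permanent;		/* true once remapped permanently */
};

/*
 * Release the temporary mapping unconditionally, then check whether
 * runtime services are disabled. Returning before the release would
 * leak the early mapping on the disabled path.
 */
static int enter_virtual_mode(struct fake_memmap *mm, bool runtime_disabled)
{
	size_t size = mm->size;

	free(mm->map);			/* stands in for early_memunmap() */
	mm->map = NULL;

	if (runtime_disabled)
		return -1;

	mm->map = malloc(size);		/* stands in for ioremap_cache() */
	if (!mm->map)
		return -1;
	mm->permanent = true;
	return 0;
}
```

With the release placed ahead of the check, both the disabled path and the normal path leave no temporary mapping behind, which is the point Will raised about the earlier version.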
Re: [PATCH 5/7] arm64/efi: do not enter virtual mode in case booting with efi=noruntime or noefi
On 08/15/14 at 04:09pm, Will Deacon wrote:
> On Thu, Aug 14, 2014 at 10:15:30AM +0100, Dave Young wrote:
> > In case efi runtime disabled via noefi kernel cmdline
> > arm64_enter_virtual_mode should error out.
> >
> > At the same time move early_memunmap(memmap.map, mapsize) to the
> > beginning of the function or it will leak early mem.
> >
> > Signed-off-by: Dave Young
> > ---
> >  arch/arm64/kernel/efi.c | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
> > index 6ed0362..309fab1 100644
> > --- a/arch/arm64/kernel/efi.c
> > +++ b/arch/arm64/kernel/efi.c
> > @@ -392,11 +392,16 @@ static int __init arm64_enter_virtual_mode(void)
> >  		return -1;
> >  	}
> >
> > +	mapsize = memmap.map_end - memmap.map;
> > +	if (efi_runtime_disabled()) {
> > +		early_memunmap(memmap.map, mapsize);
>
> Should this early_memunmap really be conditional? With this change, we no
> longer unmap it before setting up the permanent mapping below.

Oops, I tested the right version but sent the wrong version of this arm64
patch. Thanks for the catch.

>
> Will
>
> > +		pr_info("EFI runtime services will be disabled.\n");
> > +		return -1;
> > +	}
> > +
> >  	pr_info("Remapping and enabling EFI services.\n");
> >
> >  	/* replace early memmap mapping with permanent mapping */
> > -	mapsize = memmap.map_end - memmap.map;
> > -	early_memunmap(memmap.map, mapsize);
> >  	memmap.map = (__force void *)ioremap_cache((phys_addr_t)memmap.phys_map,
> >  						   mapsize);
> >  	memmap.map_end = memmap.map + mapsize;
> > --
> > 1.8.3.1
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, Aug 17, 2014 at 08:04:45PM -0400, Nicolas Pitre wrote:
> On Sun, 17 Aug 2014, Jason Cooper wrote:
> > On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > > Applied to irqchip/urgent with Nico's Ack.
> > >
> > > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > > yes, that's great.
> >
> > Quoting Nico:
> >
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> >
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> >
> > Nico's Reviewed-by with that comment was sent August 13th. The most
> > recent activity on this thread was also August 13th. After four days, I
> > reasoned there were no objections to his comment.
>
> Well... I mentioned this patch is a nice cleanup independently of the
> reason why it was created in the first place.

Ah, fair enough.

> Maybe that shouldn't be sorted as "urgent" in that case, especially
> when the code having problem with the current state of things is
> living out of mainline.

hmmm, yes. I've been grappling with the semantics of '/urgent' vice
'/fixes'. With mvebu, /fixes is the branch for all changes needing to go
into the current -rcX cycle. For irqchip, Thomas suggested /urgent for
the equivalent branch. To me, they serve the same purpose.
Unfortunately, I occasionally hear "Well, it's not _urgent_ ...". I
suppose I'll put up with it for one more cycle and then change it to
/fixes. :)

wrt this patch, I need to drop it anyway. I was a bit rusty (it's been
a few weeks) and forgot to add the Cc -stable and Fixes: tags. I do
agree, though, it's certainly not urgent.

As Russell has raised more issues with this patch as well, I'll hold off
on re-applying until I see a new version. Hopefully it'll meet with
everyone's approval.

thx,

Jason.
Re: [PATCH v1 5/9] block: loop: convert to blk-mq
On Mon, Aug 18, 2014 at 1:48 AM, Jens Axboe wrote:
> On 2014-08-16 02:06, Ming Lei wrote:
>>
>> On 8/16/14, Jens Axboe wrote:
>>>
>>> On 08/15/2014 10:36 AM, Jens Axboe wrote:
>>>>
>>>> On 08/15/2014 10:31 AM, Christoph Hellwig wrote:
>>>>>>
>>>>>> +static void loop_queue_work(struct work_struct *work)
>>>>>
>>>>> Offloading work straight to a workqueue doesn't make much sense
>>>>> in the blk-mq model as we'll usually be called from one. If you
>>>>> need to avoid the cases where we are called directly a flag for
>>>>> the blk-mq code to always schedule a workqueue sounds like a much
>>>>> better plan.
>>>>
>>>> That's a good point - would clean up this bit, and be pretty close to a
>>>> one-liner to support in blk-mq for the drivers that always need blocking
>>>> context.
>>>
>>> Something like this should do the trick - totally untested. But with
>>> that, loop would just need to add BLK_MQ_F_WQ_CONTEXT to its tag set
>>> flags and it could always do the work inline from ->queue_rq().
>>
>> I think it is a good idea.
>>
>> But for loop, there may be two problems:
>>
>> - default max_active for bound workqueue is 256, which means several slow
>> loop devices might slow down whole block system. With kernel AIO, it won't
>> be a big deal, but some block/fs may not support direct I/O and still
>> fallback to workqueue
>>
>> - 6. Guidelines of Documentation/workqueue.txt
>> If there is dependency among multiple work items used during memory
>> reclaim, they should be queued to separate wq each with WQ_MEM_RECLAIM.
>
> Both are good points. But I think this mainly means that we should support
> this through a potentially per-dispatch queue workqueue, separate from
> kblockd. There's no reason blk-mq can't support this with a per-hctx
> workqueue, for drivers that need it.

Good idea, and per-device workqueue should be enough if
BLK_MQ_F_WQ_CONTEXT flag is set.

Thanks,
Re: [PATCH 1/2] fs/buffer.c: allocate buffer cache from non-movable area
On 2014-08-15 at 6:22 AM, Andrew Morton wrote:
> On Thu, 14 Aug 2014 14:15:40 +0900 Gioh Kim wrote:
>
>> A buffer cache is allocated from movable area
>> because it is referred for a while and released soon.
>> But some filesystems are taking buffer cache for a long time
>> and it can disturb page migration.
>>
>> A new API should be introduced to allocate buffer cache from
>> non-movable area.
>
> I think the API could and should be more flexible than this.
>
> Rather than making the API be "movable or not movable", let's permit
> callers to specify the gfp_t and leave it at that. That way, if
> someone later wants to allocate a buffer head with, I dunno,
> __GFP_NOTRACK then they can do so.
>
> So the word "movable" shouldn't appear in buffer.c at all, except in a
> single place.

I absolutely agree with you.
If the filesystem developers agree to this patch, I will send a second patch
that applies your ideas. Thank you for your advice.

>> --- a/fs/buffer.c
>> +++ b/fs/buffer.c
>> @@ -993,7 +993,7 @@ init_page_buffers(struct page *page, struct block_device *bdev,
>>   */
>>  static int
>>  grow_dev_page(struct block_device *bdev, sector_t block,
>> -		pgoff_t index, int size, int sizebits)
>> +		pgoff_t index, int size, int sizebits, gfp_t movable_mask)
>
> s/movable_mask/gfp/

I got it.

>>  {
>>  	struct inode *inode = bdev->bd_inode;
>>  	struct page *page;
>> @@ -1003,7 +1003,8 @@ grow_dev_page(struct block_device *bdev, sector_t block,
>>  	gfp_t gfp_mask;
>>
>>  	gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
>> -	gfp_mask |= __GFP_MOVABLE;
>> +	if (movable_mask & __GFP_MOVABLE)
>> +		gfp_mask |= __GFP_MOVABLE;
>
> This becomes
>
> 	gfp_mask |= gfp;

I got it.

>>  	/*
>>  	 * XXX: __getblk_slow() can not really deal with failure and
>>  	 * will endlessly loop on improvised global reclaim. Prefer
>> @@ -1058,7 +1059,8 @@ failed:
>>   * that page was dirty, the buffers are set dirty also.
>>   */
>>  static int
>> -grow_buffers(struct block_device *bdev, sector_t block, int size)
>> +grow_buffers(struct block_device *bdev, sector_t block,
>> +		int size, gfp_t movable_mask)
>
> gfp

>>  {
>>  	pgoff_t index;
>>  	int sizebits;
>> @@ -1085,11 +1087,12 @@ grow_buffers(struct block_device *bdev, sector_t block, int size)
>>  	}
>>
>>  	/* Create a page with the proper size buffers.. */
>> -	return grow_dev_page(bdev, block, index, size, sizebits);
>> +	return grow_dev_page(bdev, block, index, size, sizebits, movable_mask);
>>  }
>>
>>  static struct buffer_head *
>> -__getblk_slow(struct block_device *bdev, sector_t block, int size)
>> +__getblk_slow(struct block_device *bdev, sector_t block,
>> +		int size, gfp_t movable_mask)
>
> gfp

>>  {
>>  	/* Size must be multiple of hard sectorsize */
>>  	if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
>> @@ -,7 +1114,7 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
>>  		if (bh)
>>  			return bh;
>>
>> -		ret = grow_buffers(bdev, block, size);
>> +		ret = grow_buffers(bdev, block, size, movable_mask);
>
> gfp

>>  		if (ret < 0)
>>  			return NULL;
>>  		if (ret == 0)
>> @@ -1385,11 +1388,34 @@ __getblk(struct block_device *bdev, sector_t block, unsigned size)
>>
>>  	might_sleep();
>>  	if (bh == NULL)
>> -		bh = __getblk_slow(bdev, block, size);
>> +		bh = __getblk_slow(bdev, block, size, __GFP_MOVABLE);
>
> Here is the place where buffer.c mentions "movable".

I got it.

>>  	return bh;
>>  }
>>  EXPORT_SYMBOL(__getblk);
>>
>> +/*
>> + * __getblk_nonmovable will locate (and, if necessary, create) the buffer_head
>> + * which corresponds to the passed block_device, block and size. The
>> + * returned buffer has its reference count incremented.
>> + *
>> + * The page cache is allocated from non-movable area
>> + * not to prevent page migration.
>> + *
>> + * __getblk()_nonmovable will lock up the machine
>> + * if grow_dev_page's try_to_free_buffers() attempt is failing. FIXME, perhaps?
>> + */
>> +struct buffer_head *
>> +__getblk_nonmovable(struct block_device *bdev, sector_t block, unsigned size)
>> +{
>> +	struct buffer_head *bh = __find_get_block(bdev, block, size);
>> +
>> +	might_sleep();
>> +	if (bh == NULL)
>> +		bh = __getblk_slow(bdev, block, size, 0);
>> +	return bh;
>> +}
>> +EXPORT_SYMBOL(__getblk_nonmovable);
>
> Suggest this be called __getblk_gfp(bdev, block, size, gfp) and then
> __getblk() be changed to call __getblk_gfp(..., __GFP_MOVABLE).
>
> We could then write a __getblk_nonmovable() which calls __getblk_gfp()
> (a static inlined one-line function) or we can just call
> __getblk_gfp(..., 0) directly from filesystems.

I got it.

>> @@ -1423,6 +1450,28 @@ __bread(struct block_device *bdev, sector_t block, unsigned size)
>>  }
>>  EXPORT_SYMBOL(__bread);
>>
>> +/**
>> + * __bread_nonmovable() - reads a specified block and returns the bh
>> + * @bdev: the block_device to read from
>> + * @block: number of block
>> + * @si
Re: [PATCH 0/2] new APIs to allocate buffer-cache for superblock in non-movable area
On 2014-08-17 at 3:52 AM, Jan Kara wrote:
> On Thu 14-08-14 14:26:10, Andrew Morton wrote:
>> On Thu, 14 Aug 2014 14:12:17 +0900 Gioh Kim wrote:
>>
>>> This patch tries to solve the problem that long-lasting page caches of
>>> the ext4 superblock and of the superblock's journaling data disturb
>>> page migration.
>>>
>>> I've been testing the CMA feature on my ARM-based platform
>>> and found that two page caches cannot be migrated.
>>> They are the page caches of the ext4 superblock and its journaling data.
>>>
>>> Current ext4 reads the superblock with sb_bread(), which allocates the
>>> page from the movable area. The problem is that ext4 holds the page
>>> until it is unmounted. If the root filesystem is ext4, the page can
>>> never be migrated. The journaling data for the superblock cannot be
>>> migrated either.
>>>
>>> I introduce a new API for allocating page cache from the non-movable
>>> area. It is useful for ext4/ext3 and others that want to hold page
>>> cache for a long time.
>>
>> All seems reasonable to me. The additional overhead in buffer.c from
>> additional function arguments is regrettable but I don't see a
>> non-hacky alternative.
>>
>> One vital question which the changelog doesn't really address (it
>> should): how important is this patch? Is your test system presently
>> "completely dead in the water utterly unusable" or "occasionally not
>> quite as good as it could be". Somewhere in between?
>
> I would be also interested in how much these patches make things better.
> Because I would expect all metadata that is currently journalled to be
> unmovable as well.
>
> 								Honza

I'm sorry for the lack of detail.

My test platform has 1GB of memory in total: 256MB for CMA and 768MB for normal use.
I applied Joonsoo's patch: https://lkml.org/lkml/2014/5/28/64,
so that 3/4 of allocations take place in the normal area and 1/4 in the CMA area.

My platform has 4 ext4 partitions. Each ext4 partition has 2 page caches for
the superblock, which are what this patch tries to move out of the CMA area.
Therefore there are 8 page caches (8 pages in size) that can prevent page migration.

My test scenario tries to allocate the whole CMA area by repeating 16MB
allocations until all of the CMA area is allocated.
In most cases 2 pages are allocated from the CMA area and one of the 16
allocation attempts fails. It is rare that every allocation succeeds.

With this patch applied, no page cache is allocated from the CMA area and
every allocation succeeds.

Please let me know if you need any more information.
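For readers following the API discussion, the shape suggested in the review of the fs/buffer.c patch (one entry point that takes a gfp argument, with __getblk() passing __GFP_MOVABLE and long-lived users passing 0) can be modelled in plain userspace C. This is only a sketch under stated assumptions: every toy_ name and the flag value are made up for illustration and are not kernel definitions, and no real allocation takes place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's gfp_t or __GFP_MOVABLE. */
typedef unsigned int toy_gfp_t;
#define TOY_GFP_MOVABLE 0x08u

/* Records which allocation mask a lookup would have used. */
struct toy_bh {
	toy_gfp_t gfp_used;
};

/* Models the suggested __getblk_gfp(): the caller supplies extra gfp bits. */
static struct toy_bh toy_getblk_gfp(toy_gfp_t mapping_mask, toy_gfp_t gfp)
{
	struct toy_bh bh = { .gfp_used = mapping_mask | gfp };
	return bh;
}

/* Models __getblk(): the only place that mentions "movable". */
static struct toy_bh toy_getblk(toy_gfp_t mapping_mask)
{
	return toy_getblk_gfp(mapping_mask, TOY_GFP_MOVABLE);
}

/* Models a one-line wrapper for long-lived buffers such as a superblock. */
static struct toy_bh toy_getblk_nonmovable(toy_gfp_t mapping_mask)
{
	return toy_getblk_gfp(mapping_mask, 0);
}
```

With this shape a filesystem that pins a buffer until unmount would call the non-movable wrapper (or pass 0 directly), while every other caller keeps the existing movable behaviour.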
Re: [PATCH] mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()
Hi tj,

On 08/17/2014 07:08 PM, Tejun Heo wrote:
> Hello,
>
> On Sat, Aug 16, 2014 at 10:36:41PM +0800, Xishi Qiu wrote:
>> numa_clear_node_hotplug()? There is only numa_clear_kernel_node_hotplug().
>
> Yeah, that one.
>
>> If we don't clear hotpluggable flag in free_low_memory_core_early(), the
>> memory which marked hotpluggable flag will not free to buddy allocator.
>> Because __next_mem_range() will skip them.
>>
>> free_low_memory_core_early
>> 	for_each_free_mem_range
>> 		for_each_mem_range
>> 			__next_mem_range
>
> Ah, okay, so the patch fixes __next_mem_range() and thus makes
> free_low_memory_core_early() to skip hotpluggable regions unlike
> before. Please explain things like that in the changelog. Also,
> what's its relationship with numa_clear_kernel_node_hotplug()? Do we
> still need them? If so, what are the different roles that these two
> separate places serve?

numa_clear_kernel_node_hotplug() only clears the hotplug flags for the nodes
the kernel resides in, not for hotpluggable nodes. We did this so that the
kernel can still allocate memory in case all the nodes are hotpluggable.

We clear the hotplug flags for all nodes in free_low_memory_core_early()
because, if we did not, none of the hotpluggable memory could be freed to
the buddy allocator after Qiu's patch.

Thanks.
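The interaction described above can be illustrated with a toy model: an iterator that skips flagged regions hands out nothing from them until the flag is cleared. This is a userspace sketch only; toy_region, toy_free_all() and toy_clear_hotplug() are hypothetical names and do not correspond to the memblock data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory region; not the kernel's memblock region. */
struct toy_region {
	unsigned long size;
	bool hotpluggable;	/* models the hotplug flag on a region */
};

/*
 * Models an iterator that skips flagged regions, as __next_mem_range()
 * does after the patch. Returns how much memory would be released.
 */
static unsigned long toy_free_all(const struct toy_region *r, size_t n)
{
	unsigned long freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (r[i].hotpluggable)
			continue;	/* skipped: never reaches the allocator */
		freed += r[i].size;
	}
	return freed;
}

/* Models clearing the flag on every region before the final free pass. */
static void toy_clear_hotplug(struct toy_region *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		r[i].hotpluggable = false;
}
```

In the model, calling toy_free_all() without first calling toy_clear_hotplug() releases only the unflagged regions, which mirrors why the flags are cleared in free_low_memory_core_early().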
Re: 3.16-rcX crashes on resume from Suspend-To-RAM
On Sat, 2014-08-16 at 02:46 +0200, Rafael J. Wysocki wrote: > On Friday, August 15, 2014 10:17:42 AM Markus Gutschke wrote: > > Just wondering if any of you had any other ideas of what I could try > > to help debug this problem? > > My theory is that there is a device in your system that we don't have a driver > for, but it had been enumerated as a PNP device before the change that > triggered > the problem for you and we turned it off during suspend as part of the default > ACPI PNP device handling. I had the same assumption before, thus I checked the difference of platform devices and pnp devices, and found that there are three devices enumerated to platform bus instead of PNP bus after the ACPI enumeration rework patches. They are PNP0800, PNP0200 and PNP0C04 devices, thus I made a debug patch to add those ids to the acpi_pnp scan handler id list so that they will stay in PNP bus. But the problem still exists after applying the debug patch. > > The reason why you're seeing a crash with the "platform" test level is most > likely that the _WAK control method does something unusual on your system. > an easy way to check this is to apply the debug patch attached and re-test "platform" test level. thanks, rui > The LNXSYBUS:00 thing from dmesg probably is a red herring. > > I need the output of acpidump from the affected system, but please attach it > to the bug entry at https://bugzilla.kernel.org/show_bug.cgi?id=80911 that > Rui has created for this issue. > > Also please check the list of PNP devices under > > /sys/bus/pnp/devices/ > > before and after the commit you have found by bisection and let me know if > there are any differences. > > > > On Tue, Aug 12, 2014 at 9:11 AM, Markus Gutschke > > wrote: > > > As I said earlier in this thread, echo'ing "devices" into "pm_test" > > > does not result in a crash; but doing so for "platform" does. 
> > > > > > Markus > > > > > > On Aug 12, 2014 1:26 AM, "Zhang Rui" wrote: > > >> > > >> On Sat, 2014-08-09 at 03:14 -0700, Markus Gutschke wrote: > > >> > I am back and have physical access to the machine now. > > >> > > > >> great! > > >> > > >> > I re-ran the test just to be sure, and I can confirm that "platform" > > >> > does in fact result in a crash. > > >> > > > >> what about "devices"? > > >> I mean > > >> > > >> # echo devices > /sys/power/pm_test > > >> > > >> and see if that triggers the crash. > > >> > > >> > Furthermore, I ran the test that Rui asked for. I suspended, resumed, > > >> > and upon crashing power-cycled the machine ASAP. "dmesg" suggests that > > >> > the problem is with LNXSYBUS:00 That doesn't tell me much, but > > >> > hopefully it makes sense to you guys. > > >> > > > >> [0.930093] Magic number: 10:810:122 > > >> [0.930185] acpi LNXSYBUS:00: hash matches > > >> > > >> This looks weird, ACPI will do nothing for LNXSYBUS devices during > > >> resume. > > >> Rafael, any thought on this? 
> > >> > > >> thanks, > > >> rui > > >> > >From 1a51cb80cf581a0aa228dd82aaf45f4e250d0d59 Mon Sep 17 00:00:00 2001 From: Zhang Rui Date: Mon, 18 Aug 2014 09:09:07 +0800 Subject: [PATCH] 80911: Debug patch to skip _WAK in platform pm_test mode --- kernel/power/suspend.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index 6dadb25..402f0ca 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -270,11 +270,12 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) printk(KERN_ERR "PM: Some devices failed to power down\n"); goto Platform_finish; } - error = platform_suspend_prepare_late(state); - if (error) - goto Platform_wake; if (suspend_test(TEST_PLATFORM)) + goto Platform_test; + + error = platform_suspend_prepare_late(state); + if (error) goto Platform_wake; /* @@ -319,8 +320,8 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) Platform_wake: platform_suspend_wake(state); + Platform_test: dpm_resume_start(PMSG_RESUME); - Platform_finish: platform_suspend_finish(state); return error; -- 1.8.3.2
rt_sigreturn rejects a substitute stack frame as invalid.
Hello,

I'm not totally sure that glibc's setcontext is safe to use in a signal
handler. So, I decided to play it safe and let rt_sigreturn switch stacks
for me instead. However, rt_sigreturn seems to reject my substitute stack
frame as invalid and I'm not sure why.

Thank you,
Steven Stewart-Gallus

The code:

#include <signal.h>
#include <stdio.h>
#include <ucontext.h>

static ucontext_t alternate_context;
static char alternate_context_stack[SIGSTKSZ];
static char signal_stack[SIGSTKSZ];

static void alternate_context_func(void)
{
	puts("alternate context!");
}

static void switch_stack(int signo, siginfo_t *infop, void *untyped_ucontextp)
{
	ucontext_t *ucontextp = untyped_ucontextp;

	/* I'm not sure if setcontext is async-signal-safe so set the
	 * context using the return from the signal handler.
	 */
	*ucontextp = alternate_context;
#ifdef __linux__
	/* The copied fpregs pointer refers to the source context; repoint it. */
	ucontextp->uc_mcontext.fpregs = &ucontextp->__fpregs_mem;
#endif
}

int main(void)
{
	{
		/* Run the handler on its own signal stack. */
		stack_t stack = { 0 };
		stack.ss_sp = signal_stack;
		stack.ss_size = sizeof signal_stack;
		sigaltstack(&stack, NULL);
	}

	/* Build the context the handler will substitute. */
	getcontext(&alternate_context);
	alternate_context.uc_stack.ss_sp = alternate_context_stack;
	alternate_context.uc_stack.ss_size = sizeof alternate_context_stack;
	makecontext(&alternate_context, (void (*)(void))alternate_context_func, 0U);

	{
		struct sigaction action = { 0 };
		action.sa_sigaction = switch_stack;
		action.sa_flags = SA_SIGINFO;
		sigfillset(&action.sa_mask);
		sigaction(SIGRTMIN, &action, NULL);
	}

	raise(SIGRTMIN);
}
Re: [PATCH] unicore32: Fix build error
Guenter Roeck wrote:
> On 08/15/2014 05:45 PM, Xuetao Guan wrote:
> >
> > Guenter Roeck wrote:
> >> On 08/10/2014 08:29 AM, Guenter Roeck wrote:
> >>> unicore32 builds fail with
> >>>
> >>> arch/unicore32/kernel/signal.c: In function ‘setup_frame’:
> >>> arch/unicore32/kernel/signal.c:257: error:
> >>> ‘usig’ undeclared (first use in this function)
> >>> arch/unicore32/kernel/signal.c:279: error:
> >>> ‘usig’ undeclared (first use in this function)
> >>> arch/unicore32/kernel/signal.c: In function ‘handle_signal’:
> >>> arch/unicore32/kernel/signal.c:306: warning: unused variable ‘tsk’
> >>> arch/unicore32/kernel/signal.c: In function ‘do_signal’:
> >>> arch/unicore32/kernel/signal.c:376: error:
> >>> implicit declaration of function ‘get_signsl’
> >>> make[1]: *** [arch/unicore32/kernel/signal.o] Error 1
> >>> make: *** [arch/unicore32/kernel/signal.o] Error 2
> >>>
> >>> Bisect points to commit 649671c90eaf ("unicore32: Use get_signal()
> >>> signal_setup_done()").
> >>>
> >>> This code never even compiled. Reverting the patch does not work,
> >>> since previously used functions no longer exist, so try to fix it up.
> >>> Compile tested only.
> >>>
> >>> Cc: Richard Weinberger
> >>> Signed-off-by: Guenter Roeck
> >>
> >> ping ...
> >>
> >> Failure is still present in upstream kernel (v3.16-11383-gc9d2642).
> >>
> >> Guenter
> >>
> >
> > Thanks. I'll fix it.
> >
>
> More a question of applying (and if possible testing) the patch I provided.
>
> Thanks,
> Guenter
>

Ok, I'll do it.

Xuetao
[PATCH v15 3/7] sparc: add pmd_[dirty|mkclean] for THP
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called on it.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Acked-by: David S. Miller
Cc: sparcli...@vger.kernel.org
Signed-off-by: Minchan Kim
---
 arch/sparc/include/asm/pgtable_64.h | 16
 1 file changed, 16 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h b/arch/sparc/include/asm/pgtable_64.h
index 3770bf5c6e1b..b80a309d7e00 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -666,6 +666,13 @@ static inline unsigned long pmd_young(pmd_t pmd)
 	return pte_young(pte);
 }
 
+static inline int pmd_dirty(pmd_t pmd)
+{
+	pte_t pte = __pte(pmd_val(pmd));
+
+	return pte_dirty(pte);
+}
+
 static inline unsigned long pmd_write(pmd_t pmd)
 {
 	pte_t pte = __pte(pmd_val(pmd));
@@ -723,6 +730,15 @@ static inline pmd_t pmd_mkdirty(pmd_t pmd)
 	return __pmd(pte_val(pte));
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+	pte_t pte = __pte(pmd_val(pmd));
+
+	pte = pte_mkclean(pte);
+
+	return __pmd(pte_val(pte));
+}
+
 static inline pmd_t pmd_mkyoung(pmd_t pmd)
 {
 	pte_t pte = __pte(pmd_val(pmd));
--
2.0.0
[PATCH v15 1/7] mm: support madvise(MADV_FREE)
Linux doesn't have an ability to free pages lazy while other OS already have been supported that named by madvise(MADV_FREE). The gain is clear that kernel can discard freed pages rather than swapping out or OOM if memory pressure happens. Without memory pressure, freed pages would be reused by userspace without another additional overhead(ex, page fault + allocation + zeroing). How to work is following as. When madvise syscall is called, VM clears dirty bit of ptes of the range. If memory pressure happens, VM checks dirty bit of page table and if it found still "clean", it means it's a "lazyfree pages" so VM could discard the page instead of swapping out. Once there was store operation for the page before VM peek a page to reclaim, dirty bit is set so VM can swap out the page instead of discarding. Firstly, heavy users would be general allocators(ex, jemalloc, tcmalloc and hope glibc supports it) and jemalloc/tcmalloc already have supported the feature for other OS(ex, FreeBSD) barrios@blaptop:~/benchmark/ebizzy$ lscpu Architecture: x86_64 CPU op-mode(s):32-bit, 64-bit Byte Order:Little Endian CPU(s):4 On-line CPU(s) list: 0-3 Thread(s) per core:2 Core(s) per socket:2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family:6 Model: 42 Stepping: 7 CPU MHz: 2801.000 BogoMIPS: 5581.64 Virtualization:VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 4096K NUMA node0 CPU(s): 0-3 ebizzy benchmark(./ebizzy -S 10 -n 512) vanilla-jemalloc MADV_free-jemalloc 1 thread records: 10 records: 10 avg: 7682.10 avg: 15306.10 std: 62.35(0.81%)std: 347.99(2.27%) max: 7770.00 max: 15622.00 min: 7598.00 min: 14772.00 2 thread records: 10 records: 10 avg: 12747.50avg: 24171.00 std: 792.06(6.21%) std: 895.18(3.70%) max: 13337.00max: 26023.00 min: 10535.00min: 23152.00 4 thread records: 10 records: 10 avg: 16474.60avg: 33717.90 std: 1496.45(9.08%) std: 2008.97(5.96%) max: 17877.00max: 35958.00 min: 12224.00min: 29565.00 8 thread records: 10 records: 10 avg: 16778.50avg: 
33308.10 std: 825.53(4.92%) std: 1668.30(5.01%) max: 17543.00max: 36010.00 min: 14576.00min: 29577.00 16 thread records: 10 records: 10 avg: 20614.40avg: 35516.30 std: 602.95(2.92%) std: 1283.65(3.61%) max: 21753.00max: 37178.00 min: 19605.00min: 33217.00 32 thread records: 10 records: 10 avg: 22771.70avg: 36018.50 std: 598.94(2.63%) std: 1046.76(2.91%) max: 24035.00max: 37266.00 min: 22108.00min: 34149.00 In summary, MADV_FREE is about 2 time faster than MADV_DONTNEED. Cc: Michael Kerrisk Cc: Linux API Cc: Hugh Dickins Cc: Johannes Weiner Cc: KOSAKI Motohiro Cc: Mel Gorman Cc: Jason Evans Acked-by: Kirill A. Shutemov Acked-by: Zhang Yanfei Acked-by: Rik van Riel Signed-off-by: Minchan Kim --- include/linux/rmap.h | 9 ++- include/linux/vm_event_item.h | 1 + include/uapi/asm-generic/mman-common.h | 1 + mm/madvise.c | 140 + mm/rmap.c | 42 +- mm/vmscan.c| 40 -- mm/vmstat.c| 1 + 7 files changed, 222 insertions(+), 12 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index be574506e6a9..0ba377b97a38 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -75,6 +75,7 @@ enum ttu_flags { TTU_UNMAP = 1, /* unmap mode */ TTU_MIGRATION = 2, /* migration mode */ TTU_MUNLOCK = 4,/* munlock mode */ + TTU_FREE = 8, /* free mode */ TTU_IGNORE_MLOCK = (1 << 8),/* ignore mlock */ TTU_IGNORE_ACCESS = (1 << 9), /* don't age */ @@ -181,7 +182,8 @@ static inline void page_dup_rmap(struct page *page) * Called from mm/vmscan.c to handle paging out */ int page_referenced(struct page *, int is_locked, - struct mem_cgroup *memcg, unsigned long *vm_flags); + struct mem_cgroup *memcg, unsigned long *vm_flags, + int *is_dirty); #define TTU_ACTION(x) ((x) & TTU_ACTION_MASK) @@ -260,9 +262,12 @@ int rmap_walk(struct page *page, struct rmap_walk_control *rwc); static inline int page_referenced(struct page *page, int is_loc
[PATCH v15 4/7] powerpc: add pmd_[dirty|mkclean] for THP
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called on it.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: linuxppc-...@lists.ozlabs.org
Reviewed-by: Aneesh Kumar K.V
Signed-off-by: Minchan Kim
---
 arch/powerpc/include/asm/pgtable-ppc64.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index eb9261024f51..c9a4bbe8e179 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -468,9 +468,11 @@ static inline pte_t *pmdp_ptep(pmd_t *pmd)
 
 #define pmd_pfn(pmd)		pte_pfn(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
 
--
2.0.0
[PATCH v15 7/7] mm: Don't split THP page when syscall is called
We don't need to split THP page when MADV_FREE syscall is called. It could be done when VM decide really frees it so we could avoid unnecessary THP split. Cc: Andrea Arcangeli Acked-by: Kirill A. Shutemov Signed-off-by: Minchan Kim --- include/linux/huge_mm.h | 4 mm/huge_memory.c| 35 +++ mm/madvise.c| 21 - mm/rmap.c | 8 ++-- mm/vmscan.c | 28 ++-- 5 files changed, 83 insertions(+), 13 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 63579cb8d3dc..25a961256d9f 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -19,6 +19,9 @@ extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, unsigned long addr, pmd_t *pmd, unsigned int flags); +extern int madvise_free_huge_pmd(struct mmu_gather *tlb, + struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr); extern int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr); @@ -56,6 +59,7 @@ extern pmd_t *page_check_address_pmd(struct page *page, unsigned long address, enum page_check_address_pmd_flag flag, spinlock_t **ptl); +extern int pmd_freeable(pmd_t pmd); #define HPAGE_PMD_ORDER (HPAGE_PMD_SHIFT-PAGE_SHIFT) #define HPAGE_PMD_NR (1mmap_sem)) { + pr_err("%s: mmap_sem is unlocked! 
addr=0x%lx end=0x%lx vma->vm_start=0x%lx vma->vm_end=0x%lx\n", + __func__, addr, end, + vma->vm_start, + vma->vm_end); + BUG(); + } +#endif + split_huge_page_pmd(vma, addr, pmd); + } else if (!madvise_free_huge_pmd(tlb, vma, pmd, addr)) + goto next; + /* fall through */ + } - split_huge_page_pmd(vma, addr, pmd); if (pmd_trans_unstable(pmd)) return 0; @@ -316,6 +334,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, } arch_leave_lazy_mmu_mode(); pte_unmap_unlock(pte - 1, ptl); +next: cond_resched(); return 0; } diff --git a/mm/rmap.c b/mm/rmap.c index 04c181133890..9c407576ff8e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -704,9 +704,13 @@ static int page_referenced_one(struct page *page, struct vm_area_struct *vma, referenced++; /* -* In this implmentation, MADV_FREE doesn't support THP free +* Use pmd_freeable instead of raw pmd_dirty because in some +* of architecture, pmd_dirty is not defined unless +* CONFIG_TRANSPARNTE_HUGE is enabled */ - dirty++; + if (!pmd_freeable(*pmd)) + dirty++; + spin_unlock(ptl); } else { pte_t *pte; diff --g
[PATCH v15 5/7] arm: add pmd_mkclean for THP
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called on it.

This patch adds pmd_mkclean for THP page MADV_FREE support.

Cc: Catalin Marinas
Cc: Russell King
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Will Deacon
Acked-by: Steve Capper
Signed-off-by: Minchan Kim
---
 arch/arm/include/asm/pgtable-3level.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0f8b00..bc913a065270 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -234,6 +234,7 @@ PMD_BIT_FUNC(mkold,	&= ~PMD_SECT_AF);
 PMD_BIT_FUNC(mksplitting, |= L_PMD_SECT_SPLITTING);
 PMD_BIT_FUNC(mkwrite,   &= ~L_PMD_SECT_RDONLY);
 PMD_BIT_FUNC(mkdirty,   |= L_PMD_SECT_DIRTY);
+PMD_BIT_FUNC(mkclean,   &= ~L_PMD_SECT_DIRTY);
 PMD_BIT_FUNC(mkyoung,   |= PMD_SECT_AF);
 
 #define pmd_mkhuge(pmd)		(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
--
2.0.0
[PATCH v15 2/7] x86: add pmd_[dirty|mkclean] for THP
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called on it.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x...@kernel.org
Acked-by: Zhang Yanfei
Acked-by: Kirill A. Shutemov
Signed-off-by: Minchan Kim
---
 arch/x86/include/asm/pgtable.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0ec056012618..329865799653 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -104,6 +104,11 @@ static inline int pmd_young(pmd_t pmd)
 	return pmd_flags(pmd) & _PAGE_ACCESSED;
 }
 
+static inline int pmd_dirty(pmd_t pmd)
+{
+	return pmd_flags(pmd) & _PAGE_DIRTY;
+}
+
 static inline int pte_write(pte_t pte)
 {
 	return pte_flags(pte) & _PAGE_RW;
@@ -267,6 +272,11 @@ static inline pmd_t pmd_mkold(pmd_t pmd)
 	return pmd_clear_flags(pmd, _PAGE_ACCESSED);
 }
 
+static inline pmd_t pmd_mkclean(pmd_t pmd)
+{
+	return pmd_clear_flags(pmd, _PAGE_DIRTY);
+}
+
 static inline pmd_t pmd_wrprotect(pmd_t pmd)
 {
 	return pmd_clear_flags(pmd, _PAGE_RW);
--
2.0.0
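To show how the two helpers are meant to work together, here is a small userspace model of the dirty-bit protocol: clear the bit when the hint is given, and treat the entry as discardable only while the bit is still clear. It is a sketch under assumptions; toy_pmd_t, TOY_PAGE_DIRTY and the toy_ functions are invented for illustration, and the bit position is chosen arbitrarily rather than taken from any architecture header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a pmd entry; not the kernel's pmd_t. */
typedef struct {
	uint64_t val;
} toy_pmd_t;

/* Assumed bit position, for illustration only. */
#define TOY_PAGE_DIRTY (1ULL << 6)

/* Models pmd_dirty(): has the entry been written since it was cleaned? */
static inline int toy_pmd_dirty(toy_pmd_t pmd)
{
	return (pmd.val & TOY_PAGE_DIRTY) != 0;
}

/* Models pmd_mkclean(): what the MADV_FREE path does at syscall time. */
static inline toy_pmd_t toy_pmd_mkclean(toy_pmd_t pmd)
{
	pmd.val &= ~TOY_PAGE_DIRTY;
	return pmd;
}

/* Models a store to the page: hardware or a fault handler sets the bit. */
static inline toy_pmd_t toy_pmd_mkdirty(toy_pmd_t pmd)
{
	pmd.val |= TOY_PAGE_DIRTY;
	return pmd;
}

/* Reclaim-side decision: still clean means the page may be discarded. */
static inline int toy_can_discard(toy_pmd_t pmd)
{
	return !toy_pmd_dirty(pmd);
}
```

The other bits of the entry are left untouched by the clean and dirty operations, which is the property the per-architecture helpers in this series also need to preserve.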
[PATCH v15 6/7] arm64: add pmd_[dirty|mkclean] for THP
MADV_FREE needs pmd_dirty and pmd_mkclean to detect whether the contents
of a THP page have been overwritten since the MADV_FREE syscall was
called on it.

This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
support.

Cc: Russell King
Cc: linux-arm-ker...@lists.infradead.org
Acked-by: Will Deacon
Acked-by: Steve Capper
Acked-by: Catalin Marinas
Signed-off-by: Minchan Kim
---
 arch/arm64/include/asm/pgtable.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ffe1ba0506d1..efb1b2fc4d39 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -259,10 +259,12 @@ static inline pmd_t pte_pmd(pte_t pte)
 #endif
 
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mksplitting(pmd)	pte_pmd(pte_mkspecial(pmd_pte(pmd)))
 #define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
+#define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
 #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
 #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
 #define pmd_mknotpresent(pmd)	(__pmd(pmd_val(pmd) & ~PMD_TYPE_MASK))
--
2.0.0
[PATCH v15 0/7] MADV_FREE support
This patch set enables the MADV_FREE hint for the madvise syscall, which
other OSes already support. [PATCH 1] includes the details.

[1] supports MADV_FREE for !THP pages, so if the VM encounters a THP page
in syscall context, it splits the THP page.
[2-6] prepare for calling the madvise syscall without THP splitting.
[7] enables THP page support for MADV_FREE.

* from v14
 * Add more Acked-by from arch people (sparc, arm64 and arm)
 * Drop s390 since pmd_dirty/clean was merged

* from v13
 * Add more Acked-by from arch people (arm, arm64 and ppc)
 * Rebased on mmotm 2014-08-13-14-29

* from v12
 * Fix - skip to mark free pte on try_to_free_swap failed page - Kirill
 * Add more Acked-by from arch maintainers and Kirill

* From v11
 * Fix arm build - Steve
 * Separate patch for arm and arm64 - Steve
 * Remove unnecessary check - Kirill
 * Skip non-vm_normal page - Kirill
 * Add Acked-by - Zhang
 * Sparc64 build fix
 * Pagetable walker THP handling fix

* From v10
 * Add Acked-by from arch stuff (x86, s390)
 * Pagewalker based pagetable working - Kirill
 * Fix try_to_unmap_one broken with hwpoison - Kirill
 * Use VM_BUG_ON_PAGE in madvise_free_pmd - Kirill
 * Fix pgtable-3level.h for arm - Steve

* From v9
 * Add Acked-by - Rik
 * Add THP page support - Kirill

* From v8
 * Rebased on v3.16-rc2-mmotm-2014-06-25-16-44

* From v7
 * Rebased on next-20140613

* From v6
 * Remove page from swapcache at syscall time
 * Move utility functions from memory.c to madvise.c - Johannes
 * Rename utility functions - Johannes
 * Remove unnecessary checks from vmscan.c - Johannes
 * Rebased on v3.15-rc5-mmotm-2014-05-16-16-56
 * Drop Reviewed-by because there were some changes since then.

* From v5
 * Fix PPC problem which doesn't flush TLB - Rik
 * Remove unnecessary lazyfree_range stub function - Rik
 * Rebased on v3.15-rc5

* From v4
 * Add Reviewed-by: Zhang Yanfei
 * Rebase on v3.15-rc1-mmotm-2014-04-15-16-14

* From v3
 * Add "how to work part" in description - Zhang
 * Add page_discardable utility function - Zhang
 * Clean up

* From v2
 * Remove forceful dirty marking of swap-readed page - Johannes
 * Remove deactivation logic of lazyfreed page
 * Rebased on 3.14
 * Remove RFC tag

* From v1
 * Use custom page table walker for madvise_free - Johannes
 * Remove PG_lazypage flag - Johannes
 * Do madvise_dontneed instead of madvise_free in swapless system

Minchan Kim (7):
  mm: support madvise(MADV_FREE)
  x86: add pmd_[dirty|mkclean] for THP
  sparc: add pmd_[dirty|mkclean] for THP
  powerpc: add pmd_[dirty|mkclean] for THP
  arm: add pmd_mkclean for THP
  arm64: add pmd_[dirty|mkclean] for THP
  mm: Don't split THP page when syscall is called

 arch/arm/include/asm/pgtable-3level.h    |   1 +
 arch/arm64/include/asm/pgtable.h         |   2 +
 arch/powerpc/include/asm/pgtable-ppc64.h |   2 +
 arch/sparc/include/asm/pgtable_64.h      |  16
 arch/x86/include/asm/pgtable.h           |  10 ++
 include/linux/huge_mm.h                  |   4 +
 include/linux/rmap.h                     |   9 +-
 include/linux/vm_event_item.h            |   1 +
 include/uapi/asm-generic/mman-common.h   |   1 +
 mm/huge_memory.c                         |  35 +++
 mm/madvise.c                             | 159 +++
 mm/rmap.c                                |  46 -
 mm/vmscan.c                              |  64 +
 mm/vmstat.c                              |   1 +
 14 files changed, 331 insertions(+), 20 deletions(-)

--
2.0.0
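From the userspace side, an allocator would use the hint roughly as sketched below. This is a minimal illustration, not code from the series: lazy_free() and lazy_free_demo() are hypothetical helpers, and the fallback to MADV_DONTNEED when the kernel rejects MADV_FREE with EINVAL is an assumption about how a caller might cope with kernels that lack the feature.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and MADV_FREE on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tell the kernel that [buf, buf + len) is no longer
 * needed. Uses MADV_FREE when the headers define it and falls back to
 * MADV_DONTNEED if the running kernel does not accept it.
 * Returns 0 on success, -1 on failure with errno set by madvise().
 */
static int lazy_free(void *buf, size_t len)
{
#ifdef MADV_FREE
	if (madvise(buf, len, MADV_FREE) == 0)
		return 0;
	if (errno != EINVAL)
		return -1;
#endif
	return madvise(buf, len, MADV_DONTNEED);
}

/* Allocate, dirty, lazily free and then reuse an anonymous mapping. */
static int lazy_free_demo(void)
{
	size_t len = 16 * 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret;

	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* dirty every page */
	ret = lazy_free(buf, len);	/* contents may now be discarded */

	/* A store after the hint dirties the page again, so this survives. */
	buf[0] = 42;
	assert(buf[0] == 42);

	munmap(buf, len);
	return ret;
}
```

The point of the lazy variant is the reuse path: if there is no memory pressure, the store after the hint lands on the page that is already mapped, without a fresh fault, allocation and zeroing.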
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, 17 Aug 2014, Russell King - ARM Linux wrote:

> On Sun, Aug 17, 2014 at 03:04:34PM -0400, Jason Cooper wrote:
> > Quoting Nico:
> >
> > "Of course it would be good to clarify things wrt Russell's remark
> > independently from this patch."
> >
> > I took 'independently' to mean "This patch is ok, *and* we need to
> > address Russell's concerns in a follow-up patch."
> >
> > Nico's Reviewed-by with that comment was sent August 13th. The most
> > recent activity on this thread was also August 13th. After four days, I
> > reasoned there were no objections to his comment.
>
> Right, during the merge window, and during merge windows, I tend to
> ignore almost all email now because people don't stop developing, and
> they don't take any notice where the mainline cycle is. In fact, I go
> off and do non-kernel work during a merge window and only briefly scan
> for bug fixes.
>
> However, I have other concerns with this patch, which I've yet to air.
> For example, I don't like this crappy conditional locking that people
> keep dreaming up - that kind of stuff makes the kernel much harder to
> statically check that everything is correct. It's an anti-lockdep
> strategy.
>
> Secondly, I don't like this:
>
> +	raw_spin_lock(&gic_sgi_lock);
> +	/*
> +	 * Ensure that the gic_cpu_map update above is seen in
> +	 * gic_raise_softirq() before we redirect any pending SGIs that
> +	 * may have been raised for the outgoing CPU (cur_cpu_id)
> +	 */
> +	smp_mb__after_unlock_lock();
> +	raw_spin_unlock(&gic_sgi_lock);
>
> That goes against the principle of locking, that you lock the data,
> not the code.

I admit I didn't understand the point of that construct on the first
read. Maybe I wouldn't be the only one. Using Stephen's initial version
for that hunk would be preferable, as it is straightforward and would
mean locking the data instead.

> I have no problem with changing gic_raise_softirq() to use a different
> lock, which gic_migrate_target(), and gic_set_affinity() can also use.
> There's no need for horrid locking here, because the only thing we're
> protecting is gic_map[] and the write to the register to trigger an
> IPI - and nothing using gic_arch_extn has any business knowing about
> SGIs.
>
> No need for these crappy sgi_map_lock() macros and all the ifdeffery.

Those macros are there only to conditionalize the locking in
gic_raise_softirq(), because no locking whatsoever is needed there when
gic_migrate_target() is configured out. I suggested the macros to cut
down on the #ifdefery in the code.

Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
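Russell's point about locking the data rather than the code can be illustrated with a minimal userspace sketch. It is not the GIC driver code: cpu_map, last_sgi_write, raise_sgi() and migrate_target() are hypothetical stand-ins for the CPU map, the SGI trigger register, gic_raise_softirq() and gic_migrate_target(), and a C11 atomic_flag spinlock stands in for a kernel raw spinlock. The only idea shown is that one lock is tied to the map and the register write, and every path that touches them takes it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical size, for illustration only */

/* One lock, declared next to the data it protects. */
static atomic_flag cpu_map_lock = ATOMIC_FLAG_INIT;
static uint8_t cpu_map[NR_CPUS];	/* protected by cpu_map_lock */
static uint32_t last_sgi_write;		/* fake SGI register, same lock */

static void cpu_map_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&cpu_map_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void cpu_map_lock_release(void)
{
	atomic_flag_clear_explicit(&cpu_map_lock, memory_order_release);
}

/* Reader side: build the target mask and "write the register" under the lock. */
static uint32_t raise_sgi(const int *cpus, int n, unsigned int irq)
{
	uint32_t mask = 0, val;
	int i;

	cpu_map_lock_acquire();
	for (i = 0; i < n; i++) {
		assert(cpus[i] >= 0 && cpus[i] < NR_CPUS);
		mask |= cpu_map[cpus[i]];
	}
	val = (mask << 16) | irq;
	last_sgi_write = val;
	cpu_map_lock_release();
	return val;
}

/* Writer side: retarget one CPU's entry under the same lock. */
static void migrate_target(int cpu, uint8_t new_mask)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_map_lock_acquire();
	cpu_map[cpu] = new_mask;
	cpu_map_lock_release();
}
```

Because both paths take the same unconditional lock, a checker (or a reader) only has to verify one rule: cpu_map and the register write are never touched without cpu_map_lock held.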
RE: [PATCH 2/3] smp: re-implement the kick_all_cpus_sync() with wake_up_if_idle()
Hello Andy,

> -----Original Message-----
> From: Andy Lutomirski [mailto:l...@amacapital.net]
> Sent: Friday, August 15, 2014 11:41 PM
> To: Liu, Chuansheng
> Cc: Peter Zijlstra; Daniel Lezcano; Rafael J. Wysocki; Ingo Molnar;
> linux...@vger.kernel.org; linux-kernel@vger.kernel.org; Liu, Changcheng;
> Wang, Xiaoming; Chakravarty, Souvik K
> Subject: Re: [PATCH 2/3] smp: re-implement the kick_all_cpus_sync() with
> wake_up_if_idle()
>
> On Fri, Aug 15, 2014 at 12:01 AM, Chuansheng Liu wrote:
> > Currently using smp_call_function() just woke up the corresponding
> > cpu, but can not break the polling idle loop.
> >
> > Here using the new sched API wake_up_if_idle() to implement it.
>
> kick_all_cpus_sync has other callers, and those other callers want the
> old behavior. I think this should be a new function.
>
Yes, it seems some current users of kick_all_cpus_sync() do need the IPI.
I will try to send out patch V2 with a new function.
Re: [PATCH v14 5/8] s390: add pmd_[dirty|mkclean] for THP
Hello,

On Thu, Aug 14, 2014 at 09:16:14AM +0200, Martin Schwidefsky wrote:
> On Thu, 14 Aug 2014 10:53:29 +0900
> Minchan Kim wrote:
>
> > MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent
> > overwrite of the contents since MADV_FREE syscall is called for
> > THP page but for s390 pmds only referenced bit is available
> > because there is no free bit left in the pmd entry for the
> > software dirty bit so this patch adds dumb pmd_dirty which
> > returns always true by suggesting by Martin.
> >
> > They finally find a solution in future.
> > http://marc.info/?l=linux-api&m=140440328820808&w=2
>
> The solution is already there, see git commit 152125b7a882df36.
> You can drop this patch.

Thanks for the heads up. I will drop it in the next spin.

>
> --
> blue skies,
>    Martin.
>
> "Reality continues to ruin my life." - Calvin.

--
Kind regards,
Minchan Kim
Re: [PATCH v4] irqchip: gic: Allow gic_arch_extn hooks to call into scheduler
On Sun, 17 Aug 2014, Jason Cooper wrote:

> Russell,
>
> On Sun, Aug 17, 2014 at 07:55:23PM +0100, Russell King - ARM Linux wrote:
> > On Sun, Aug 17, 2014 at 01:32:36PM -0400, Jason Cooper wrote:
> > > Applied to irqchip/urgent with Nico's Ack.
> >
> > Interesting, so I'm discussing this patch, and it gets applied anyway...
> > yes, that's great.
>
> Quoting Nico:
>
> "Of course it would be good to clarify things wrt Russell's remark
> independently from this patch."
>
> I took 'independently' to mean "This patch is ok, *and* we need to
> address Russell's concerns in a follow-up patch."
>
> Nico's Reviewed-by with that comment was sent August 13th. The most
> recent activity on this thread was also August 13th. After four days, I
> reasoned there were no objections to his comment.

Well... I mentioned this patch is a nice cleanup independently of the
reason why it was created in the first place. Maybe it shouldn't be
sorted as "urgent" in that case, especially when the code that has a
problem with the current state of things lives outside of mainline.

Nicolas
Re: [PATCH 3/3] zram: add mem_used_max via sysfs
On Thu, Aug 14, 2014 at 11:32:36AM -0400, David Horner wrote:
> On Thu, Aug 14, 2014 at 11:09 AM, Dan Streetman wrote:
> > On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim wrote:
> >> -	if (zram->limit_bytes &&
> >> -		zs_get_total_size_bytes(meta->mem_pool) > zram->limit_bytes) {
> >> +	total_bytes = zs_get_total_size_bytes(meta->mem_pool);
> >> +	if (zram->limit_bytes && total_bytes > zram->limit_bytes) {
> >
> > do you need to take the init_lock to read limit_bytes here? It could
> > be getting changed between these checks...
>
> There is no real danger in freeing with an error.
> It is more timing than a race.
>
> The max calculation is still ok because committed allocations are
> added atomically.

There is one problem in the code below.

	zram->max_used_bytes = max(zram->max_used_bytes, total_bytes);

That is equivalent to the following, so we should consider this case:

	if (zram->max_used_bytes < total_bytes)
		zram->max_used_bytes = total_bytes;

And we could end up in a situation like this:

	if (zram->max_used_bytes < total_bytes)
		IRQ happens;
		zram->max_used_bytes = total_bytes;

During the IRQ, another CPU could consume a lot of zsmalloc memory, so
zram->max_used_bytes would be increased under our feet. When the IRQ is
finished, zram->max_used_bytes could then be overwritten with the old,
smaller total_bytes.

To prevent that, we should use either the lock I posted in the RFC
version or retry logic with an atomic operation (i.e., cmpxchg). My
approach is to keep it simple first and fix it if we see trouble in the
future, so my preference is a new spin lock at the moment.

Any comments?

--
Kind regards,
Minchan Kim
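The retry logic with cmpxchg mentioned as the alternative to a new spin lock could look roughly like the following userspace sketch. It uses C11 atomic_compare_exchange_weak() rather than any kernel primitive, and max_used_bytes and update_max_used() are hypothetical names chosen for illustration, not code from the zram patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for zram->max_used_bytes. */
static _Atomic uint64_t max_used_bytes;

/*
 * Raise the recorded maximum to total_bytes unless a larger value has
 * already been stored by someone else. Returns the maximum as seen by
 * this caller after the update attempt.
 */
static uint64_t update_max_used(uint64_t total_bytes)
{
	uint64_t old = atomic_load(&max_used_bytes);

	while (old < total_bytes) {
		/*
		 * The store only happens if max_used_bytes still equals old.
		 * On failure, old is refreshed with the current value and the
		 * comparison above is repeated, so a larger value written by
		 * another CPU in the meantime is never overwritten.
		 */
		if (atomic_compare_exchange_weak(&max_used_bytes, &old,
						 total_bytes))
			return total_bytes;
	}
	return old;
}
```

The trade-off is the one described above: the loop avoids a lock on the write path, at the cost of code that is harder to read than a plain check under a spin lock.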
Re: [PATCH 2/7] locking/rwsem: more aggressive use of optimistic spinning
On Fri, Aug 15, 2014 at 01:58:09PM -0400, Waiman Long wrote:
> On 08/14/2014 11:34 PM, Dave Chinner wrote:
> >
> >xfs_io -f -c "truncate 500t" -c "extsize 1m" /path/to/vm/image/file
>
> Thank for the testing recipe. I am afraid that I can't find a 500TB
> SSD for testing purpose.

Which bit of "sparse vm image file" didn't you understand?

I'm using a 400GB SSD for this testing:

$ df -h /mnt/fast-ssd
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdf        400G  275G  125G  69% /mnt/fast-ssd
$ ls -lh /mnt/fast-ssd/vm-500t.img
-rw------- 1 root root 500T Aug 15 13:21 /mnt/fast-ssd/vm-500t.img
$ du -sh /mnt/fast-ssd/vm-500t.img
275G    /mnt/fast-ssd/vm-500t.img

That is on a Samsung 840 EVO SSD, which just about everyone should be
able to obtain. Do you *really* think I have 500TB of SSDs lying around?

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com
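For readers unfamiliar with sparse files, the effect Dave describes (a 500T apparent size backed by far less disk) can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not part of the test recipe: sparse_file_usage() is a hypothetical helper, the /tmp location is arbitrary, and it relies on the Linux convention that st_blocks counts 512-byte units.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file, extend it to apparent_size bytes without
 * writing any data, and report both the apparent size and the number of
 * bytes actually allocated. Returns 0 on success, -1 on error.
 */
static int sparse_file_usage(off_t apparent_size, off_t *size,
			     off_t *allocated)
{
	char path[] = "/tmp/sparse-demo-XXXXXX";	/* arbitrary location */
	struct stat st;
	int ret = -1;
	int fd;

	assert(size != NULL && allocated != NULL);

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);	/* the file disappears once fd is closed */

	if (ftruncate(fd, apparent_size) == 0 && fstat(fd, &st) == 0) {
		*size = st.st_size;			/* what ls -l shows */
		*allocated = (off_t)st.st_blocks * 512;	/* what du shows */
		ret = 0;
	}
	close(fd);
	return ret;
}
```

On a filesystem that supports sparse files, asking for 1 GiB should report a size of 1 GiB and an allocated figure far below that, which is the same gap that ls and du show in the message above.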
Re: [RFC v3 0/2] vfs / btrfs: add support for ustat()
On Fri, Aug 15, 2014 at 10:29:50AM +0100, Al Viro wrote:
> On Thu, Aug 14, 2014 at 07:58:56PM -0700, Luis R. Rodriguez wrote:
>
> > Christoph had noted that this seemed associated to the problem
> > that the btrfs uses different assignments for st_dev than s_dev,
> > but much as I'd like to see that changed based on discussions so
> > far its unclear if this is going to be possible unless strong
> > commitment is reached.
>
> Explain, please. Whose commitment and commitment to what, exactly?

There are two groups: the btrfs developers, and the VFS maintainers,
who would need to provide proper guidance.

> Having different ->st_dev values for different files on the same
> fs is a bloody bad idea; why does btrfs do that at all?

With the disclaimer that I'm new to btrfs: as far as I can see, it was
done to help cope with the copy-on-write mechanism, but I welcome btrfs
folks to chime in if there are other reasons this was done from an
architectural point of view. Once all the reasons why this was done are
clarified, what we'd need is proper guidance on what *would* be a much
more reasonable strategy to do what was desired, and finally commitment
from the btrfs folks to switch btrfs to this newly agreed-upon strategy.

> If nothing else,
> it breaks the usual "are those two files on the same fs?" tests...

It would seem that those tests need more context now with copy-on-write.
Even the notion of disk space is all fucked up now; we need to think of
it in terms of the different ways the new filesystems allow us to share
data and the different outcomes that could be possible.

  Luis
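The "are those two files on the same fs?" test Al refers to is commonly written as a comparison of the st_dev values returned by stat(2). A minimal sketch, with same_st_dev() being a hypothetical helper name and not code from any of the tools under discussion:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths report the same st_dev, 0 if they differ,
 * and -1 if either stat() call fails.
 */
static int same_st_dev(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev;
}
```

If two files on one btrfs filesystem report different st_dev values, as described in this thread, a check like this returns 0 for them even though they share a filesystem, which is the breakage being pointed out.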
Re: [PATCH 3/3] zram: add mem_used_max via sysfs
Hello Dan, On Thu, Aug 14, 2014 at 11:09:05AM -0400, Dan Streetman wrote: > On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim wrote: > > Normally, zram user can get maximum memory zsmalloc consumed via > > polling mem_used_total with sysfs in userspace. > > > > But it has a critical problem because user can miss peak memory > > usage during update interval of polling. For avoiding that, > > user should poll it frequently with mlocking to avoid delay > > when memory pressure is heavy so it would be handy if the > > kernel supports the function. > > > > This patch adds mem_used_max via sysfs. > > > > Signed-off-by: Minchan Kim > > --- > > Documentation/blockdev/zram.txt | 1 + > > drivers/block/zram/zram_drv.c | 35 +-- > > drivers/block/zram/zram_drv.h | 2 ++ > > 3 files changed, 36 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/blockdev/zram.txt > > b/Documentation/blockdev/zram.txt > > index 9f239ff8c444..3b2247c2d4cf 100644 > > --- a/Documentation/blockdev/zram.txt > > +++ b/Documentation/blockdev/zram.txt > > @@ -107,6 +107,7 @@ size of the disk when not in use so a huge zram is > > wasteful. 
> > orig_data_size > > compr_data_size > > mem_used_total > > + mem_used_max > > > > 8) Deactivate: > > swapoff /dev/zram0 > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > index b48a3d0e9031..311699f18bd5 100644 > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -109,6 +109,30 @@ static ssize_t mem_used_total_show(struct device *dev, > > return scnprintf(buf, PAGE_SIZE, "%llu\n", val); > > } > > > > +static ssize_t mem_used_max_reset(struct device *dev, > > + struct device_attribute *attr, const char *buf, size_t len) > > +{ > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_write(&zram->init_lock); > > + zram->max_used_bytes = 0; > > + up_write(&zram->init_lock); > > + return len; > > +} > > + > > +static ssize_t mem_used_max_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + u64 max_used_bytes; > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_read(&zram->init_lock); > > + max_used_bytes = zram->max_used_bytes; > > + up_read(&zram->init_lock); > > + > > + return scnprintf(buf, PAGE_SIZE, "%llu\n", max_used_bytes); > > +} > > + > > static ssize_t max_comp_streams_show(struct device *dev, > > struct device_attribute *attr, char *buf) > > { > > @@ -474,6 +498,7 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > struct zram_meta *meta = zram->meta; > > struct zcomp_strm *zstrm; > > bool locked = false; > > + u64 total_bytes; > > > > page = bvec->bv_page; > > if (is_partial_io(bvec)) { > > @@ -543,8 +568,8 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > goto out; > > } > > > > - if (zram->limit_bytes && > > - zs_get_total_size_bytes(meta->mem_pool) > > > zram->limit_bytes) { > > + total_bytes = zs_get_total_size_bytes(meta->mem_pool); > > + if (zram->limit_bytes && total_bytes > zram->limit_bytes) { > > do you need to take the init_lock to read limit_bytes here? 
It could > be getting changed between these checks... The zram_bvec_write is protected by read-side init_lock while mem_limit_store is protected by write-side init_lock. > > > zs_free(meta->mem_pool, handle); > > ret = -ENOMEM; > > goto out; > > @@ -578,6 +603,8 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > /* Update stats */ > > atomic64_add(clen, &zram->stats.compr_data_size); > > atomic64_inc(&zram->stats.pages_stored); > > + > > + zram->max_used_bytes = max(zram->max_used_bytes, total_bytes); > > shouldn't max_used_bytes be atomic64_t? Or take the init_lock here? > > > out: > > if (locked) > > zcomp_strm_release(zram->comp, zstrm); > > @@ -656,6 +683,7 @@ static void zram_reset_device(struct zram *zram, bool > > reset_capacity) > > down_write(&zram->init_lock); > > > > zram->limit_bytes = 0; > > + zram->max_used_bytes = 0; > > > > if (!init_done(zram)) { > > up_write(&zram->init_lock); > > @@ -897,6 +925,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, > > NULL); > > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > > static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL); > > +static DEVICE_ATTR(mem_used_max, S_IRUGO | S_IWUSR, mem_used_max_show, > > + mem_used_max_reset); > > st
Re: [PATCH 3/3] zram: add mem_used_max via sysfs
Hi David, On Thu, Aug 14, 2014 at 06:29:17AM -0400, David Horner wrote: > The introduction of a reset can cause the stale zero value to be > retained in the show. > Instead reset to current value. It's better. I will do. Thanks! > > On Wed, Aug 13, 2014 at 9:12 PM, Minchan Kim wrote: > > Normally, zram user can get maximum memory zsmalloc consumed via > > polling mem_used_total with sysfs in userspace. > > > > But it has a critical problem because user can miss peak memory > > usage during update interval of polling. For avoiding that, > > user should poll it frequently with mlocking to avoid delay > > when memory pressure is heavy so it would be handy if the > > kernel supports the function. > > > > This patch adds mem_used_max via sysfs. > > > > Signed-off-by: Minchan Kim > > --- > > Documentation/blockdev/zram.txt | 1 + > > drivers/block/zram/zram_drv.c | 35 +-- > > drivers/block/zram/zram_drv.h | 2 ++ > > 3 files changed, 36 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/blockdev/zram.txt > > b/Documentation/blockdev/zram.txt > > index 9f239ff8c444..3b2247c2d4cf 100644 > > --- a/Documentation/blockdev/zram.txt > > +++ b/Documentation/blockdev/zram.txt > > @@ -107,6 +107,7 @@ size of the disk when not in use so a huge zram is > > wasteful. > > orig_data_size > > compr_data_size > > mem_used_total > > + mem_used_max > > > > 8) Deactivate: > > swapoff /dev/zram0 > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > index b48a3d0e9031..311699f18bd5 100644 > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -109,6 +109,30 @@ static ssize_t mem_used_total_show(struct device *dev, > > return scnprintf(buf, PAGE_SIZE, "%llu\n", val); > > } > > > > +static ssize_t mem_used_max_reset(struct device *dev, > > + struct device_attribute *attr, const char *buf, size_t len) > > perhaps these are local functions, but wouldn't the zs_ prefix still > be appropriate? 
> > +{ > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_write(&zram->init_lock); > > + zram->max_used_bytes = 0; > >zram->max_used_bytes = zs_get_total_size_bytes(meta->mem_pool); > >(where meta is set up as below (beyond my skill level at > the moment)). > > > + up_write(&zram->init_lock); > > + return len; > > +} > > + > > +static ssize_t mem_used_max_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + u64 max_used_bytes; > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_read(&zram->init_lock); > > if these are atomic operations, why the (read and write) locks? > > > + max_used_bytes = zram->max_used_bytes; > > + up_read(&zram->init_lock); > > + > > + return scnprintf(buf, PAGE_SIZE, "%llu\n", max_used_bytes); > > +} > > + > > static ssize_t max_comp_streams_show(struct device *dev, > > struct device_attribute *attr, char *buf) > > { > > @@ -474,6 +498,7 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > struct zram_meta *meta = zram->meta; > > struct zcomp_strm *zstrm; > > bool locked = false; > > + u64 total_bytes; > > > > page = bvec->bv_page; > > if (is_partial_io(bvec)) { > > @@ -543,8 +568,8 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > goto out; > > } > > > > - if (zram->limit_bytes && > > - zs_get_total_size_bytes(meta->mem_pool) > > > zram->limit_bytes) { > > + total_bytes = zs_get_total_size_bytes(meta->mem_pool); > > + if (zram->limit_bytes && total_bytes > zram->limit_bytes) { > > zs_free(meta->mem_pool, handle); > > ret = -ENOMEM; > > goto out; > > @@ -578,6 +603,8 @@ static int zram_bvec_write(struct zram *zram, struct > > bio_vec *bvec, u32 index, > > /* Update stats */ > > atomic64_add(clen, &zram->stats.compr_data_size); > > atomic64_inc(&zram->stats.pages_stored); > > + > > + zram->max_used_bytes = max(zram->max_used_bytes, total_bytes); > > out: > > if (locked) > > zcomp_strm_release(zram->comp, zstrm); > > 
@@ -656,6 +683,7 @@ static void zram_reset_device(struct zram *zram, bool > > reset_capacity) > > down_write(&zram->init_lock); > > > > zram->limit_bytes = 0; > > + zram->max_used_bytes = 0; > > > > if (!init_done(zram)) { > > up_write(&zram->init_lock); > > @@ -897,6 +925,8 @@ static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, > > NULL); > > static DEVICE_ATTR(reset, S_IWUSR, NULL, reset_store); > > static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL); > > static DEVICE_AT
Re: [RFC 3/3] zram: limit memory size for zram
Hello Sergey, On Thu, Aug 14, 2014 at 10:29:53PM +0900, Sergey Senozhatsky wrote: > Hello Minchan, > > On (08/14/14 08:27), Minchan Kim wrote: > > Date: Thu, 14 Aug 2014 08:27:19 +0900 > > From: Minchan Kim > > To: Sergey Senozhatsky > > Cc: linux...@kvack.org, Jerome Marchand , > > linux-kernel@vger.kernel.org, juno.c...@lge.com, seungho1.p...@lge.com, > > Luigi Semenzato , Nitin Gupta > > Subject: Re: [RFC 3/3] zram: limit memory size for zram > > User-Agent: Mutt/1.5.21 (2010-09-15) > > > > Hey Sergey, > > > > On Tue, Aug 05, 2014 at 10:16:15PM +0900, Sergey Senozhatsky wrote: > > > Hello, > > > > > > On (08/05/14 18:48), Minchan Kim wrote: > > > > Another idea: we could define void zs_limit_mem(unsinged long nr_pages) > > > > in zsmalloc and put the limit in zs_pool via new API from zram so that > > > > zs_malloc could be failed as soon as it exceeds the limit. > > > > > > > > In the end, zram doesn't need to call zs_get_total_size_bytes on every > > > > write. It's more clean and right layer, IMHO. > > > > > > yes, I think this one is better. > > > > Although I suggested this new one, a few days ago I changed the decision > > and was testing the new patchset. > > > > If we add new API for zsmalloc, it adds unnecessary overhead for users who > > doesn't care of limit. Although it's cheap, I'd like to avoid that. > > > > The zsmalloc is just allocator so anybody can use it if they want. > > But limitation is just requirement of zram who is a one of client > > being able to use zsmalloc potentially so accouting should be on zram, > > not zsmalloc. > > > > my motivation was that zram does not use that much memory itself, > zspool - does. zram is just a clueless client from that point of > view: it recives some requests, do some things with supplied data, > and asks zspool if the latter one can find some place to keep that > data (and zram doesn't really care how that memory will be allocated > or will not be). 

Normally, when we consider malloc(3), it doesn't provide any API to
limit the memory size for the process. It just exposes some API that
returns the allocator state (e.g., mallinfo) to the user, so it's the
user's role to manage the memory. I thought it's the same with zsmalloc:
it already exposes zs_get_total_size_bytes, so a client can enforce a
limit itself if it wants to. The cost of the frequent API calls (e.g.,
zs_get_total_size_bytes) should then fall on that client, while others
who don't need a limit pay no overhead.

>
> I'm OK if we will have memory limitation in ZRAM. though conceptually,
> IMHO, it feels that such logic belongs to allocation layer. yet I admit
> the potential overhead issue.
>
> > If we might have more users of zsmalloc in future and they all want this
> > feature that limit of zsmalloc memory usage, we might move the feature
> > from client to zsmalloc core so everybody would be happy for performance
> > and readability but opposite would be painful.
> >
> > In summary, let's keep the accounting logic in client side of zsmalloc(ie,
> > zram) at the moment but we could move it into zsmalloc core possibly
> > in future.
> >
> > Any thoughts?
>
> agreed.

Thanks for the comment, Sergey!

>
> 	-ss
>
> > >
> > > 	-ss
> > >
> > > > On Tue, Aug 05, 2014 at 05:02:03PM +0900, Minchan Kim wrote:
> > > > > I have received a request several time from zram users.
> > > > > They want to limit memory size for zram because zram can consume
> > > > > lot of memory on system without limit so it makes memory management
> > > > > control hard.
> > > > >
> > > > > This patch adds new knob to limit memory of zram.
> > > > > > > > > > Signed-off-by: Minchan Kim > > > > > --- > > > > > Documentation/blockdev/zram.txt | 1 + > > > > > drivers/block/zram/zram_drv.c | 41 > > > > > + > > > > > drivers/block/zram/zram_drv.h | 1 + > > > > > 3 files changed, 43 insertions(+) > > > > > > > > > > diff --git a/Documentation/blockdev/zram.txt > > > > > b/Documentation/blockdev/zram.txt > > > > > index d24534bee763..fcb0561dfe2e 100644 > > > > > --- a/Documentation/blockdev/zram.txt > > > > > +++ b/Documentation/blockdev/zram.txt > > > > > @@ -96,6 +96,7 @@ size of the disk when not in use so a huge zram is > > > > > wasteful. > > > > > compr_data_size > > > > > mem_used_total > > > > > mem_used_max > > > > > + mem_limit > > > > > > > > > > 7) Deactivate: > > > > > swapoff /dev/zram0 > > > > > diff --git a/drivers/block/zram/zram_drv.c > > > > > b/drivers/block/zram/zram_drv.c > > > > > index a4d637b4db7d..47f68bbb2c44 100644 > > > > > --- a/drivers/block/zram/zram_drv.c > > > > > +++ b/drivers/block/zram/zram_drv.c > > > > > @@ -137,6 +137,37 @@ static ssize_t max_comp_streams_show(struct > > > > > device *dev, > > > > > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > > > > > } > > > > > > > > > > +static ssize_t mem_limit_show(struct device *dev, > > > > > + struct device_attri
Re: [PATCH 2/2] zram: limit memory size for zram
Hi Dan, On Thu, Aug 14, 2014 at 10:33:29AM -0400, Dan Streetman wrote: > On Wed, Aug 13, 2014 at 8:57 PM, Minchan Kim wrote: > > Since zram has no control feature to limit memory usage, > > it makes hard to manage system memrory. > > > > This patch adds new knob "mem_limit" via sysfs to set up the > > limit. > > > > Note: I added the logic in zram, not zsmalloc because the limit > > is requirement of zram, not zsmalloc so I'd like to avoid > > unnecessary branch in zsmalloc. > > > > Signed-off-by: Minchan Kim > > --- > > Documentation/blockdev/zram.txt | 20 +++ > > drivers/block/zram/zram_drv.c | 43 > > + > > drivers/block/zram/zram_drv.h | 1 + > > 3 files changed, 60 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/blockdev/zram.txt > > b/Documentation/blockdev/zram.txt > > index 0595c3f56ccf..9f239ff8c444 100644 > > --- a/Documentation/blockdev/zram.txt > > +++ b/Documentation/blockdev/zram.txt > > @@ -74,14 +74,26 @@ There is little point creating a zram of greater than > > twice the size of memory > > since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of > > the > > size of the disk when not in use so a huge zram is wasteful. > > > > -5) Activate: > > +5) Set memory limit: Optional > > + Set memory limit by writing the value to sysfs node 'mem_limit'. > > + The value can be either in bytes or you can use mem suffixes. 
> > + Examples: > > + # limit /dev/zram0 with 50MB memory > > + echo $((50*1024*1024)) > /sys/block/zram0/mem_limit > > + > > + # Using mem suffixes > > + echo 256K > /sys/block/zram0/mem_limit > > + echo 512M > /sys/block/zram0/mem_limit > > + echo 1G > /sys/block/zram0/mem_limit > > + > > +6) Activate: > > mkswap /dev/zram0 > > swapon /dev/zram0 > > > > mkfs.ext4 /dev/zram1 > > mount /dev/zram1 /tmp > > > > -6) Stats: > > +7) Stats: > > Per-device statistics are exported as various nodes under > > /sys/block/zram/ > > disksize > > @@ -96,11 +108,11 @@ size of the disk when not in use so a huge zram is > > wasteful. > > compr_data_size > > mem_used_total > > > > -7) Deactivate: > > +8) Deactivate: > > swapoff /dev/zram0 > > umount /dev/zram1 > > > > -8) Reset: > > +9) Reset: > > Write any positive value to 'reset' sysfs node > > echo 1 > /sys/block/zram0/reset > > echo 1 > /sys/block/zram1/reset > > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c > > index d00831c3d731..b48a3d0e9031 100644 > > --- a/drivers/block/zram/zram_drv.c > > +++ b/drivers/block/zram/zram_drv.c > > @@ -122,6 +122,35 @@ static ssize_t max_comp_streams_show(struct device > > *dev, > > return scnprintf(buf, PAGE_SIZE, "%d\n", val); > > } > > > > +static ssize_t mem_limit_show(struct device *dev, > > + struct device_attribute *attr, char *buf) > > +{ > > + u64 val; > > + struct zram *zram = dev_to_zram(dev); > > + > > + down_read(&zram->init_lock); > > + val = zram->limit_bytes; > > + up_read(&zram->init_lock); > > + > > + return scnprintf(buf, PAGE_SIZE, "%llu\n", val); > > +} > > + > > +static ssize_t mem_limit_store(struct device *dev, > > + struct device_attribute *attr, const char *buf, size_t len) > > +{ > > + u64 limit; > > + struct zram *zram = dev_to_zram(dev); > > + > > + limit = memparse(buf, NULL); > > + if (!limit) > > + return -EINVAL; > > Shouldn't passing a 0 limit be allowed, to disable the limit? Sure. Will fix. Thanks. 
-- Kind regards, Minchan Kim
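To make the zero-limit point concrete, here is a rough userspace approximation of the suffix handling that the quoted documentation describes (K, M, G). It is a sketch only: parse_mem_size() is a hypothetical name, it is not the kernel's memparse(), and it ignores overflow and trailing garbage. Note that "0" parses to 0, which is why a check of the form `if (!limit) return -EINVAL;` also rejects the value one would naturally use to disable the limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a size such as "52428800", "256K", "512M" or "1G" into bytes.
 * Each suffix step multiplies by 1024; unknown suffixes are ignored.
 */
static uint64_t parse_mem_size(const char *s)
{
	char *end;
	uint64_t val;

	assert(s != NULL);

	val = strtoull(s, &end, 0);
	switch (*end) {
	case 'G':
	case 'g':
		val <<= 10;
		/* fall through */
	case 'M':
	case 'm':
		val <<= 10;
		/* fall through */
	case 'K':
	case 'k':
		val <<= 10;
		break;
	default:
		break;
	}
	return val;
}
```

With a parser like this, distinguishing "invalid input" from "limit disabled" has to be done by some means other than testing the parsed value for zero.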
[PATCH v3 4/6] perf/tests: add interrupted state sample parsing test
This patch updates the sample parsing test with support for the sampling of machine interrupted state. The patch modifies the do_test() code to share the sample regs bitmask between user and intr regs. Signed-off-by: Stephane Eranian --- tools/perf/tests/sample-parsing.c | 55 +++-- 1 file changed, 40 insertions(+), 15 deletions(-) diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c index ca292f9..4908c64 100644 --- a/tools/perf/tests/sample-parsing.c +++ b/tools/perf/tests/sample-parsing.c @@ -126,16 +126,28 @@ static bool samples_same(const struct perf_sample *s1, if (type & PERF_SAMPLE_TRANSACTION) COMP(transaction); + if (type & PERF_SAMPLE_REGS_INTR) { + size_t sz = hweight_long(s1->intr_regs.mask) * sizeof(u64); + + COMP(intr_regs.mask); + COMP(intr_regs.abi); + if (s1->intr_regs.abi && + (!s1->intr_regs.regs || !s2->intr_regs.regs || + memcmp(s1->intr_regs.regs, s2->intr_regs.regs, sz))) { + pr_debug("Samples differ at 'intr_regs'\n"); + return false; + } + } + return true; } -static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format) +static int do_test(u64 sample_type, u64 sample_regs, u64 read_format) { struct perf_evsel evsel = { .needs_swap = false, .attr = { .sample_type = sample_type, - .sample_regs_user = sample_regs_user, .read_format = read_format, }, }; @@ -154,7 +166,7 @@ static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format) /* 1 branch_entry */ .data = {1, 211, 212, 213}, }; - u64 user_regs[64]; + u64 regs[64]; const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL}; const u64 data[] = {0x2211443366558877ULL, 0, 0xaabbccddeeff4321ULL}; struct perf_sample sample = { @@ -176,8 +188,8 @@ static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format) .branch_stack = &branch_stack.branch_stack, .user_regs = { .abi = PERF_SAMPLE_REGS_ABI_64, - .mask = sample_regs_user, - .regs = user_regs, + .mask = sample_regs, + .regs = regs, }, .user_stack = { .size =
sizeof(data), @@ -187,14 +199,25 @@ static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format) .time_enabled = 0x030a59d664fca7deULL, .time_running = 0x011b6ae553eb98edULL, }, + .intr_regs = { + .abi= PERF_SAMPLE_REGS_ABI_64, + .mask = sample_regs, + .regs = regs, + }, }; struct sample_read_value values[] = {{1, 5}, {9, 3}, {2, 7}, {6, 4},}; struct perf_sample sample_out; size_t i, sz, bufsz; int err, ret = -1; - for (i = 0; i < sizeof(user_regs); i++) - *(i + (u8 *)user_regs) = i & 0xfe; + if (sample_type & PERF_SAMPLE_REGS_USER) + evsel.attr.sample_regs_user = sample_regs; + + if (sample_type & PERF_SAMPLE_REGS_INTR) + evsel.attr.sample_regs_intr = sample_regs; + + for (i = 0; i < sizeof(regs); i++) + *(i + (u8 *)regs) = i & 0xfe; if (read_format & PERF_FORMAT_GROUP) { sample.read.group.nr = 4; @@ -271,7 +294,7 @@ int test__sample_parsing(void) { const u64 rf[] = {4, 5, 6, 7, 12, 13, 14, 15}; u64 sample_type; - u64 sample_regs_user; + u64 sample_regs; size_t i; int err; @@ -280,7 +303,7 @@ int test__sample_parsing(void) * were added. Please actually update the test rather than just change * the condition below. */ - if (PERF_SAMPLE_MAX > PERF_SAMPLE_TRANSACTION << 1) { + if (PERF_SAMPLE_MAX > PERF_SAMPLE_REGS_INTR << 1) { pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n"); return -1; } @@ -297,22 +320,24 @@ int test__sample_parsing(void) } continue; } + sample_regs = 0; if (sample_type == PERF_SAMPLE_REGS_USER) - sample_regs_user = 0x3fff; - else - sample_regs_user = 0; + sample_regs = 0x3fff; + + if (sample_type == PERF_SAMPLE_REGS_INTR) + sample_regs = 0xff0fff;
[PATCH v3 1/6] perf: add ability to sample machine state on interrupt
Enable capture of interrupted machine state for each sample. Registers to sample are passed per event in the sample_regs_intr bitmask. To sample interrupt machine state, PERF_SAMPLE_REGS_INTR must be passed in sample_type. The list of available registers is arch dependent and provided by asm/perf_regs.h. Registers are laid out as u64 in the bit order of sample_regs_intr. Reviewed-by: Jiri Olsa Reviewed-by: Andi Kleen Signed-off-by: Stephane Eranian --- include/linux/perf_event.h | 7 +-- include/uapi/linux/perf_event.h | 14 - kernel/events/core.c | 44 +-- 3 files changed, 60 insertions(+), 5 deletions(-) diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index f0a1036..e043465 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -79,7 +79,7 @@ struct perf_branch_stack { struct perf_branch_entry entries[0]; }; -struct perf_regs_user { +struct perf_regs { __u64 abi; struct pt_regs *regs; }; @@ -599,7 +599,8 @@ struct perf_sample_data { struct perf_callchain_entry *callchain; struct perf_raw_record *raw; struct perf_branch_stack *br_stack; - struct perf_regs_user regs_user; + struct perf_regs regs_user; + struct perf_regs regs_intr; u64 stack_user_size; u64 weight; /* @@ -629,6 +630,8 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->weight = 0; data->data_src.val = PERF_MEM_NA; data->txn = 0; + data->regs_intr.abi = PERF_SAMPLE_REGS_ABI_NONE; + data->regs_intr.regs = NULL; } extern void perf_output_sample(struct perf_output_handle *handle, diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h index 9269de2..8019505 100644 --- a/include/uapi/linux/perf_event.h +++ b/include/uapi/linux/perf_event.h @@ -137,8 +137,9 @@ enum perf_event_sample_format { PERF_SAMPLE_DATA_SRC = 1U << 15, PERF_SAMPLE_IDENTIFIER = 1U << 16, PERF_SAMPLE_TRANSACTION = 1U << 17, + PERF_SAMPLE_REGS_INTR = 1U << 18, - PERF_SAMPLE_MAX = 1U << 18, /* non-ABI */ + PERF_SAMPLE_MAX = 1U << 19, /*
non-ABI */ }; /* @@ -334,6 +335,15 @@ struct perf_event_attr { /* Align to u64. */ __u32 __reserved_2; + /* +* Defines set of regs to dump for each sample +* state captured on: +* - precise = 0: PMU interrupt +* - precise > 0: sampled instruction +* +* See asm/perf_regs.h for details. +*/ + __u64 sample_regs_intr; }; #define perf_flags(attr) (*(&(attr)->read_format + 1)) @@ -686,6 +696,8 @@ enum perf_event_type { * { u64 weight; } && PERF_SAMPLE_WEIGHT * { u64 data_src; } && PERF_SAMPLE_DATA_SRC * { u64 transaction; } && PERF_SAMPLE_TRANSACTION +* { u64 abi; # enum perf_sample_regs_abi +*u64 regs[weight(mask)]; } && PERF_SAMPLE_REGS_INTR * }; */ PERF_RECORD_SAMPLE = 9, diff --git a/kernel/events/core.c b/kernel/events/core.c index 2d7363a..5fa8b17 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4395,7 +4395,7 @@ perf_output_sample_regs(struct perf_output_handle *handle, } } -static void perf_sample_regs_user(struct perf_regs_user *regs_user, +static void perf_sample_regs_user(struct perf_regs *regs_user, struct pt_regs *regs) { if (!user_mode(regs)) { @@ -4411,6 +4411,14 @@ static void perf_sample_regs_user(struct perf_regs_user *regs_user, } } +static void perf_sample_regs_intr(struct perf_regs *regs_intr, + struct pt_regs *regs) +{ + regs_intr->regs = regs; + regs_intr->abi = perf_reg_abi(current); +} + + /* * Get remaining task size from user stack pointer. * @@ -4792,6 +4800,22 @@ void perf_output_sample(struct perf_output_handle *handle, if (sample_type & PERF_SAMPLE_TRANSACTION) perf_output_put(handle, data->txn); + if (sample_type & PERF_SAMPLE_REGS_INTR) { + u64 abi = data->regs_intr.abi; + /* +* If there are no regs to dump, notice it through +* first u64 being zero (PERF_SAMPLE_REGS_ABI_NONE). +*/ + perf_output_put(handle, abi); + + if (abi) { + u64 mask = event->attr.sample_regs_intr; + perf_output_sample_regs(handle, +
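The record layout described above (one u64 abi word, then one u64 per set bit of the mask, in bit order) implies that a consumer finds a given register by counting the lower set bits of the mask. The following is a minimal userspace sketch of that lookup, not code from the patch: `mask_weight()` and `intr_reg_value()` are hypothetical helper names, and the bit numbers are whatever asm/perf_regs.h defines for the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of set bits in mask (the kernel has hweight64()). */
static unsigned int mask_weight(uint64_t mask)
{
	unsigned int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: fetch register number `bit` from a regs[] dump that
 * was produced for `mask`. Returns 0 and stores the value in *val, or -1 if
 * that register was not requested in the mask.
 */
static int intr_reg_value(uint64_t mask, const uint64_t *regs,
			  unsigned int bit, uint64_t *val)
{
	assert(bit < 64);

	if (!(mask & (1ULL << bit)))
		return -1;

	/* Index = how many requested registers have a lower bit number. */
	*val = regs[mask_weight(mask & ((1ULL << bit) - 1))];
	return 0;
}
```

With a mask of 0x29 (bits 0, 3 and 5), the dump holds three values and bit 3 is found at index 1. A register that is absent from the mask has no slot at all, which is why the lookup has to fail rather than return a default.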
[PATCH v3 6/6] perf: improve perf_sample_data struct layout
From: Peter Zijlstra

This patch reorders fields in the perf_sample_data struct in order to minimize the number of cachelines touched in perf_sample_data_init(). It also removes some initializations which are redundant with the code in kernel/events/core.c.

Signed-off-by: Peter Zijlstra
---
 include/linux/perf_event.h | 34 +-
 kernel/events/core.c       |  5 -
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e043465..57b7efc 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -579,35 +579,40 @@ extern u64 perf_event_read_value(struct perf_event *event,


 struct perf_sample_data {
-	u64				type;
+	/*
+	 * Fields set by perf_sample_data_init(), group so as to
+	 * minimize the cachelines touched.
+	 */
+	u64				addr;
+	struct perf_raw_record		*raw;
+	struct perf_branch_stack	*br_stack;
+	u64				period;
+	u64				weight;
+	u64				txn;
+	union  perf_mem_data_src	data_src;

+	/*
+	 * The other fields, optionally {set,used} by
+	 * perf_{prepare,output}_sample().
+	 */
+	u64				type;
 	u64				ip;
 	struct {
 		u32	pid;
 		u32	tid;
 	} tid_entry;
 	u64				time;
-	u64				addr;
 	u64				id;
 	u64				stream_id;
 	struct {
 		u32	cpu;
 		u32	reserved;
 	} cpu_entry;
-	u64				period;
-	union  perf_mem_data_src	data_src;
 	struct perf_callchain_entry	*callchain;
-	struct perf_raw_record		*raw;
-	struct perf_branch_stack	*br_stack;
 	struct perf_regs		regs_user;
 	struct perf_regs		regs_intr;
 	u64				stack_user_size;
-	u64				weight;
-	/*
-	 * Transaction flags for abort events:
-	 */
-	u64				txn;
-};
+} ____cacheline_aligned;

 /* default value for data source */
 #define PERF_MEM_NA (PERF_MEM_S(OP, NA)   |\
@@ -624,14 +629,9 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->raw  = NULL;
 	data->br_stack = NULL;
 	data->period = period;
-	data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
-	data->regs_user.regs = NULL;
-	data->stack_user_size = 0;
 	data->weight = 0;
 	data->data_src.val = PERF_MEM_NA;
 	data->txn = 0;
-	data->regs_intr.abi = PERF_SAMPLE_REGS_ABI_NONE;
-	data->regs_intr.regs = NULL;
 }

 extern void perf_output_sample(struct perf_output_handle *handle,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5fa8b17..696a778 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4406,8 +4406,11 @@ static void perf_sample_regs_user(struct perf_regs *regs_user,
 	}

 	if (regs) {
-		regs_user->regs = regs;
 		regs_user->abi  = perf_reg_abi(current);
+		regs_user->regs = regs;
+	} else {
+		regs_user->abi = PERF_SAMPLE_REGS_ABI_NONE;
+		regs_user->regs = NULL;
 	}
 }

--
1.7.9.5
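The effect of the reordering can be checked with offsetof(): if every field written by the init helper sits before the first "cold" field, and that boundary lies within one cacheline of an aligned object, the init path dirties a single line. The sketch below is a userspace stand-in, not the kernel struct. `struct sample_data_sketch`, `sample_data_init_sketch()`, `init_cachelines_touched()` and the 64-byte `CACHELINE_SKETCH` are all assumptions made for illustration, and the GCC/Clang `aligned` attribute stands in for the kernel's alignment macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SKETCH 64	/* assumption: 64-byte cachelines */

/* Stand-in for perf_sample_data: fields written on every sample come first. */
struct sample_data_sketch {
	/* hot: written by the init helper for every sample */
	uint64_t addr;
	void *raw;
	void *br_stack;
	uint64_t period;
	uint64_t weight;
	uint64_t txn;
	uint64_t data_src;
	/* cold: only filled in when the matching sample_type bit is set */
	uint64_t type;
	uint64_t ip;
	uint64_t time;
	uint64_t id;
} __attribute__((aligned(CACHELINE_SKETCH)));

/* Mirrors the shape of the init helper: it touches hot fields only. */
static void sample_data_init_sketch(struct sample_data_sketch *d,
				    uint64_t addr, uint64_t period)
{
	assert(d != NULL);
	d->addr = addr;
	d->raw = NULL;
	d->br_stack = NULL;
	d->period = period;
	d->weight = 0;
	d->txn = 0;
	d->data_src = 0;
}

/* Cachelines covered by the hot region [0, offsetof(type)) of an aligned object. */
static size_t init_cachelines_touched(void)
{
	size_t hot_end = offsetof(struct sample_data_sketch, type);

	return (hot_end + CACHELINE_SKETCH - 1) / CACHELINE_SKETCH;
}
```

The alignment attribute matters as much as the field order: without it the object could start in the middle of a line, and the same 56 bytes of hot fields could straddle two lines.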
[PATCH v3 3/6] perf tools: add core support for sampling intr machine state regs
Add the infrastructure to set up, collect and report the interrupt machine state regs which can be captured by the kernel.

Signed-off-by: Stephane Eranian
---
 tools/perf/perf.h         |  1 +
 tools/perf/util/event.h   |  1 +
 tools/perf/util/evsel.c   | 46 -
 tools/perf/util/session.c | 44 ++-
 4 files changed, 86 insertions(+), 6 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 510c65f..309d956 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -54,6 +54,7 @@ struct record_opts {
 	bool	     sample_weight;
 	bool	     sample_time;
 	bool	     period;
+	bool	     sample_intr_regs;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int user_freq;
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 7eb7107..d6e79f3 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -162,6 +162,7 @@ struct perf_sample {
 	struct ip_callchain *callchain;
 	struct branch_stack *branch_stack;
 	struct regs_dump  user_regs;
+	struct regs_dump  intr_regs;
 	struct stack_dump user_stack;
 	struct sample_read read;
 };
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 01ce14c..74b4268 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -628,6 +628,11 @@ void perf_evsel__config(struct perf_evsel *evsel, struct record_opts *opts)
 	if (opts->call_graph_enabled && !evsel->no_aux_samples)
 		perf_evsel__config_callgraph(evsel, opts);

+	if (opts->sample_intr_regs) {
+		attr->sample_regs_intr = PERF_REGS_MASK;
+		perf_evsel__set_sample_bit(evsel, REGS_INTR);
+	}
+
 	if (target__has_cpu(&opts->target))
 		perf_evsel__set_sample_bit(evsel, CPU);

@@ -1005,6 +1010,7 @@ static size_t perf_event_attr__fprintf(struct perf_event_attr *attr, FILE *fp)
 	ret += PRINT_ATTR_X64(branch_sample_type);
 	ret += PRINT_ATTR_X64(sample_regs_user);
 	ret += PRINT_ATTR_U32(sample_stack_user);
+	ret += PRINT_ATTR_X64(sample_regs_intr);

 	ret += fprintf(fp, "%.60s\n", graph_dotted_line);

@@ -1504,6 +1510,23 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}

+	data->intr_regs.abi = PERF_SAMPLE_REGS_ABI_NONE;
+	if (type & PERF_SAMPLE_REGS_INTR) {
+		OVERFLOW_CHECK_u64(array);
+		data->intr_regs.abi = *array;
+		array++;
+
+		if (data->intr_regs.abi != PERF_SAMPLE_REGS_ABI_NONE) {
+			u64 mask = evsel->attr.sample_regs_intr;
+
+			sz = hweight_long(mask) * sizeof(u64);
+			OVERFLOW_CHECK(array, sz, max_size);
+			data->intr_regs.mask = mask;
+			data->intr_regs.regs = (u64 *)array;
+			array = (void *)array + sz;
+		}
+	}
+
 	return 0;
 }

@@ -1599,6 +1622,16 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_TRANSACTION)
 		result += sizeof(u64);

+	if (type & PERF_SAMPLE_REGS_INTR) {
+		if (sample->intr_regs.abi) {
+			result += sizeof(u64);
+			sz = hweight_long(sample->intr_regs.mask) * sizeof(u64);
+			result += sz;
+		} else {
+			result += sizeof(u64);
+		}
+	}
+
 	return result;
 }

@@ -1777,6 +1810,17 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
 		array++;
 	}

+	if (type & PERF_SAMPLE_REGS_INTR) {
+		if (sample->intr_regs.abi) {
+			*array++ = sample->intr_regs.abi;
+			sz = hweight_long(sample->intr_regs.mask) * sizeof(u64);
+			memcpy(array, sample->intr_regs.regs, sz);
+			array = (void *)array + sz;
+		} else {
+			*array++ = 0;
+		}
+	}
+
 	return 0;
 }

@@ -1906,7 +1950,7 @@ static int sample_type__fprintf(FILE *fp, bool *first, u64 value)
 		bit_name(READ), bit_name(CALLCHAIN), bit_name(ID), bit_name(CPU),
 		bit_name(PERIOD), bit_name(STREAM_ID), bit_name(RAW),
 		bit_name(BRANCH_STACK), bit_name(REGS_USER), bit_name(STACK_USER),
-		bit_name(IDENTIFIER),
+		bit_name(IDENTIFIER), bit_name(REGS_INTR),
 		{ .name = NULL, }
 	};
 #undef bit_name
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6d2d50d..4eb8ca6 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -581,15 +581,46 @@ static void regs_dump__printf(u64 mask, u64 *re
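The parse, size and synthesize hunks above all rely on the same invariant: the block is one abi word, followed by hweight(mask) register words only when abi is non-zero. A standalone sketch of a writer and a bounds-checked reader for just that block follows. It is an illustration rather than the perf tool code: `struct regs_dump_sketch`, `intr_regs_write()`, `intr_regs_read()` and `popcount64_sketch()` are hypothetical names, and sizes are counted in u64 words instead of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the tool-side regs_dump: abi, mask, register values. */
struct regs_dump_sketch {
	uint64_t abi;
	uint64_t mask;
	const uint64_t *regs;
};

static size_t popcount64_sketch(uint64_t m)
{
	size_t n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/* Serialize the block into out[]; returns the number of u64 words written. */
static size_t intr_regs_write(uint64_t *out, const struct regs_dump_sketch *d)
{
	size_t n = 0;

	out[n++] = d->abi;		/* always present, 0 means "no regs" */
	if (d->abi) {
		size_t cnt = popcount64_sketch(d->mask);

		if (cnt)
			memcpy(&out[n], d->regs, cnt * sizeof(uint64_t));
		n += cnt;
	}
	return n;
}

/*
 * Parse the block from in[], which holds `avail` words. `mask` comes from
 * the event attribute, not from the record. Returns the number of words
 * consumed, or 0 if the buffer is too short.
 */
static size_t intr_regs_read(const uint64_t *in, size_t avail, uint64_t mask,
			     struct regs_dump_sketch *d)
{
	size_t cnt;

	assert(in != NULL && d != NULL);
	d->abi = 0;
	d->mask = 0;
	d->regs = NULL;

	if (avail < 1)
		return 0;
	d->abi = in[0];
	if (!d->abi)
		return 1;

	cnt = popcount64_sketch(mask);
	if (avail - 1 < cnt)
		return 0;		/* truncated record */
	d->mask = mask;
	d->regs = &in[1];
	return 1 + cnt;
}
```

Note that the record itself does not carry the mask. The reader has to take it from the event attribute, which is why the parse hunk reads evsel->attr.sample_regs_intr before it can compute the size.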
[PATCH v3 2/6] perf/x86: add support for sampling PEBS machine state registers
PEBS can capture machine state regs at retirement of the sampled instructions. When precise sampling is enabled on an event, PEBS is used, so substitute the interrupted state with the PEBS state. Note that not all registers are captured by PEBS. Those missing are replaced by the interrupt state counterparts.

Signed-off-by: Stephane Eranian
---
 arch/x86/kernel/cpu/perf_event_intel_ds.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_ds.c b/arch/x86/kernel/cpu/perf_event_intel_ds.c
index 9dc4199..139a8a5 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_ds.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_ds.c
@@ -886,6 +886,23 @@ static void __intel_pmu_pebs_event(struct perf_event *event,
 	regs.bp = pebs->bp;
 	regs.sp = pebs->sp;

+	if (sample_type & PERF_SAMPLE_REGS_INTR) {
+		regs.ax = pebs->ax;
+		regs.bx = pebs->bx;
+		regs.cx = pebs->cx;
+		regs.si = pebs->si;
+		regs.di = pebs->di;
+
+		regs.r8 = pebs->r8;
+		regs.r9 = pebs->r9;
+		regs.r10 = pebs->r10;
+		regs.r11 = pebs->r11;
+		regs.r12 = pebs->r12;
+		regs.r13 = pebs->r13;
+		regs.r14 = pebs->r14;
+		regs.r15 = pebs->r15;
+	}
+
 	if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format >= 2) {
 		regs.ip = pebs->real_ip;
 		regs.flags |= PERF_EFLAGS_EXACT;
--
1.7.9.5
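The substitution described in this patch follows a copy-then-override pattern: start from the interrupt-time register set, then overwrite each register the PEBS record actually holds. A reduced sketch of that pattern, with hypothetical types that carry only four registers, might look as follows. `struct regs_sketch`, `struct pebs_sketch` and `pebs_merge_regs()` are illustrative names, and treating cs as "not in the record" is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the interrupt register set. */
struct regs_sketch {
	uint64_t ax, bx, ip, cs;
};

/* Hypothetical, trimmed-down record: assumed to carry no cs field. */
struct pebs_sketch {
	uint64_t ax, bx, ip;
};

static struct regs_sketch pebs_merge_regs(const struct regs_sketch *iregs,
					   const struct pebs_sketch *pebs)
{
	struct regs_sketch regs;

	assert(iregs && pebs);

	regs = *iregs;		/* default: state at the PMU interrupt */
	regs.ax = pebs->ax;	/* override with state at instruction retirement */
	regs.bx = pebs->bx;
	regs.ip = pebs->ip;
	/* regs.cs keeps its interrupt-time value: nothing to override it with */
	return regs;
}
```

One consequence of this design is that a single sample can mix two points in time: overridden registers reflect the sampled instruction, while the rest reflect the later interrupt.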
[PATCH v3 5/6] perf record: add new -I option to sample interrupted machine state
Add -I/--intr-regs option to capture machine state registers at interrupt. Add the corresponding man page description.

Signed-off-by: Stephane Eranian
---
 tools/perf/Documentation/perf-record.txt | 6 ++
 tools/perf/builtin-record.c              | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index d460049..1a36259 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -214,6 +214,12 @@ if combined with -a or -C options.
 After starting the program, wait msecs before measuring. This is useful to
 filter out the startup phase of the program, which is often very different.

+-I::
+--intr-regs::
+Capture machine state (registers) at interrupt, i.e., on counter overflows for
+each sample. List of captured registers depends on the architecture. This option
+is off by default.
+
 SEE ALSO
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4db670d..8dc1fd8 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -882,6 +882,8 @@ const struct option record_options[] = {
 		    "sample transaction flags (special events only)"),
 	OPT_BOOLEAN(0, "per-thread", &record.opts.target.per_thread,
 		    "use per-thread mmaps"),
+	OPT_BOOLEAN('I', "intr-regs", &record.opts.sample_intr_regs,
+		    "Sample machine registers on interrupt"),
 	OPT_END()
 };

--
1.7.9.5