[GIT PULL] f2fs: request for tree inclusion
Hi Linus,

This is the first pull request for tree inclusion of Flash-Friendly File System (F2FS) towards the 3.8 merge window.

http://lwn.net/Articles/518718/
http://lwn.net/Articles/518988/
http://en.wikipedia.org/wiki/F2FS

The f2fs has been in the linux-next tree for a while, and several issues have been cleared as described in the signed tag below. I've also successfully tested f2fs on Linux 3.7 with the following test scenarios.

- Reliability test: Run fsstress on an SSD partition.
- Robustness test: Conduct sudden-power-off and examine the fs consistency repeatedly, while running a reliability test.

So, please pull the f2fs filesystem. If I'm missing any issues or made mistakes, please let me know.

Thanks,
Jaegeuk Kim

The following changes since commit 29594404d7fe73cd80eaa4ee8c43dcc53970c60e:

  Linux 3.7 (2012-12-10 19:30:57 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git tags/for-3.8-merge

for you to fetch changes up to e6aa9f36b2bfd6b30072c07b34f2a24becf1:

  f2fs: fix tracking parent inode number (2012-12-11 13:43:45 +0900)

Introduce a new file system, Flash-Friendly File System (F2FS), to Linux 3.8.
Highlights:
- Add initial f2fs source codes
- Fix an endian conversion bug
- Fix build failures on random configs
- Fix the power-off-recovery routine
- Minor cleanup, coding style, and typo patches

Greg Kroah-Hartman (1):
      f2fs: move proc files to debugfs

Huajun Li (1):
      f2fs: fix a typo in f2fs documentation

Jaegeuk Kim (22):
      f2fs: add document
      f2fs: add on-disk layout
      f2fs: add superblock and major in-memory structure
      f2fs: add super block operations
      f2fs: add checkpoint operations
      f2fs: add node operations
      f2fs: add segment operations
      f2fs: add file operations
      f2fs: add address space operations for data
      f2fs: add core inode operations
      f2fs: add inode operations for special inodes
      f2fs: add core directory operations
      f2fs: add xattr and acl functionalities
      f2fs: add garbage collection functions
      f2fs: add recovery routines for roll-forward
      f2fs: update Kconfig and Makefile
      f2fs: update the f2fs document
      f2fs: fix endian conversion bugs reported by sparse
      f2fs: adjust kernel coding style
      f2fs: resolve build failures
      f2fs: cleanup the f2fs_bio_alloc routine
      f2fs: fix tracking parent inode number

Namjae Jeon (10):
      f2fs: fix the compiler warning for uninitialized use of variable
      f2fs: show error in case of invalid mount arguments
      f2fs: remove unneeded memset from init_once
      f2fs: check read only condition before beginning write out
      f2fs: remove unneeded initialization
      f2fs: move error condition for mkdir at proper place
      f2fs: rewrite f2fs_bio_alloc to make it simpler
      f2fs: make use of GFP_F2FS_ZERO for setting gfp_mask
      f2fs: remove redundant call to f2fs_put_page in delete entry
      f2fs: introduce accessor to retrieve number of dentry slots

Sachin Kamat (1):
      f2fs: remove unneeded version.h header file from f2fs.h

Wei Yongjun (1):
      f2fs: remove unused variable

 Documentation/filesystems/00-INDEX |    2 +
 Documentation/filesystems/f2fs.txt |  421 +
 fs/Kconfig                         |    1 +
 fs/Makefile                        |    1 +
 fs/f2fs/Kconfig                    |   53 ++
 fs/f2fs/Makefile                   |    7 +
 fs/f2fs/acl.c                      |  414 +
 fs/f2fs/acl.h                      |   57 ++
 fs/f2fs/checkpoint.c               |  794
 fs/f2fs/data.c                     |  702 ++
 fs/f2fs/debug.c                    |  361
 fs/f2fs/dir.c                      |  672 ++
 fs/f2fs/f2fs.h                     | 1083 ++
 fs/f2fs/file.c                     |  636 +
 fs/f2fs/gc.c                       |  742 +++
 fs/f2fs/gc.h                       |  117 +++
 fs/f2fs/hash.c                     |   97 ++
 fs/f2fs/inode.c                    |  268 ++
 fs/f2fs/namei.c                    |  503 ++
 fs/f2fs/node.c                     | 1764 +++
 fs/f2fs/node.h                     |  353 +++
 fs/f2fs/recovery.c                 |  375
 fs/f2fs/segment.c                  | 1791
 fs/f2fs/segment.h                  |  618 +
 fs/f2fs/super.c                    |  657 +
 fs/f2fs/xattr.c                    |  440 +
 fs/f2fs/xattr.h                    |  145 +++
 include/linux/f2fs_fs.h            |  413 +
 include/uapi/linux/magic.h         |    1 +
 29 files changed, 13488 insertions(+)
 create mode 100644 Documentation/fi
Re: linux-next: manual merge of the akpm tree with Linus' tree
On 12/11/2012 09:22 AM, Stephen Rothwell wrote:
> Hi Andrew,
>
> Today's linux-next merge of the akpm tree got a conflict in
> include/linux/gfp.h between commit caf491916b1c ("Revert "revert "Revert
> "mm: remove __GFP_NO_KSWAPD""" and associated damage") from Linus' tree
> and commit "mm: add a __GFP_KMEMCG flag" from the akpm tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Fix is fine, thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH] mfd: wm5102: Mark only extant DSP registers volatile
Since regmap sometimes uses volatile as a proxy for readable, simply having a blanket condition can mark too many registers as readable.

Signed-off-by: Mark Brown
---
 drivers/mfd/wm5102-tables.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/mfd/wm5102-tables.c b/drivers/mfd/wm5102-tables.c
index 4a01192..0317d11 100644
--- a/drivers/mfd/wm5102-tables.c
+++ b/drivers/mfd/wm5102-tables.c
@@ -1837,9 +1837,6 @@ static bool wm5102_readable_register(struct device *dev, unsigned int reg)

 static bool wm5102_volatile_register(struct device *dev, unsigned int reg)
 {
-	if (reg > 0x)
-		return true;
-
 	switch (reg) {
 	case ARIZONA_SOFTWARE_RESET:
 	case ARIZONA_DEVICE_REVISION:
@@ -1884,7 +1881,13 @@ static bool wm5102_volatile_register(struct device *dev, unsigned int reg)
 	case ARIZONA_MIC_DETECT_3:
 		return true;
 	default:
-		return false;
+		if ((reg >= 0x10 && reg < 0x106000) ||
+		    (reg >= 0x18 && reg < 0x180800) ||
+		    (reg >= 0x19 && reg < 0x194800) ||
+		    (reg >= 0x1a8000 && reg < 0x1a9800))
+			return true;
+		else
+			return false;
 	}
 }
--
1.7.10.4
Re: [PATCH v2 1/2] zsmalloc: add function to query object size
On Mon, Dec 10, 2012 at 11:19:49PM -0800, Nitin Gupta wrote: > On 12/10/2012 07:59 PM, Minchan Kim wrote: > > On Fri, Dec 07, 2012 at 04:45:53PM -0800, Nitin Gupta wrote: > >> On Sun, Dec 2, 2012 at 11:52 PM, Minchan Kim wrote: > >>> On Sun, Dec 02, 2012 at 11:20:42PM -0800, Nitin Gupta wrote: > > > On Nov 30, 2012, at 5:54 AM, Minchan Kim > wrote: > > > On Thu, Nov 29, 2012 at 10:54:48PM -0800, Nitin Gupta wrote: > >> Changelog v2 vs v1: > >> - None > >> > >> Adds zs_get_object_size(handle) which provides the size of > >> the given object. This is useful since the user (zram etc.) > >> now do not have to maintain object sizes separately, saving > >> on some metadata size (4b per page). > >> > >> The object handle encodes pair which currently points > >> to the start of the object. Now, the handle implicitly stores the size > >> information by pointing to the object's end instead. Since zsmalloc is > >> a slab based allocator, the start of the object can be easily > >> determined > >> and the difference between the end offset encoded in the handle and the > >> start gives us the object size. > >> > >> Signed-off-by: Nitin Gupta > > Acked-by: Minchan Kim > > > > I already had a few comment in your previous versoin. > > I'm OK although you ignore them because I can make follow up patch about > > my nitpick but could you answer below my question? 
> > > >> --- > >> drivers/staging/zsmalloc/zsmalloc-main.c | 177 > >> +- > >> drivers/staging/zsmalloc/zsmalloc.h |1 + > >> 2 files changed, 127 insertions(+), 51 deletions(-) > >> > >> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c > >> b/drivers/staging/zsmalloc/zsmalloc-main.c > >> index 09a9d35..65c9d3b 100644 > >> --- a/drivers/staging/zsmalloc/zsmalloc-main.c > >> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c > >> @@ -112,20 +112,20 @@ > >> #define MAX_PHYSMEM_BITS 36 > >> #else /* !CONFIG_HIGHMEM64G */ > >> /* > >> - * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS > >> will just > >> + * If this definition of MAX_PHYSMEM_BITS is used, OFFSET_BITS will > >> just > >> * be PAGE_SHIFT > >> */ > >> #define MAX_PHYSMEM_BITS BITS_PER_LONG > >> #endif > >> #endif > >> #define _PFN_BITS(MAX_PHYSMEM_BITS - PAGE_SHIFT) > >> -#define OBJ_INDEX_BITS(BITS_PER_LONG - _PFN_BITS) > >> -#define OBJ_INDEX_MASK((_AC(1, UL) << OBJ_INDEX_BITS) - 1) > >> +#define OFFSET_BITS(BITS_PER_LONG - _PFN_BITS) > >> +#define OFFSET_MASK((_AC(1, UL) << OFFSET_BITS) - 1) > >> > >> #define MAX(a, b) ((a) >= (b) ? (a) : (b)) > >> /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ > >> #define ZS_MIN_ALLOC_SIZE \ > >> -MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) > >> +MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OFFSET_BITS)) > >> #define ZS_MAX_ALLOC_SIZEPAGE_SIZE > >> > >> /* > >> @@ -256,6 +256,11 @@ static int is_last_page(struct page *page) > >>return PagePrivate2(page); > >> } > >> > >> +static unsigned long get_page_index(struct page *page) > >> +{ > >> +return is_first_page(page) ? 
0 : page->index; > >> +} > >> + > >> static void get_zspage_mapping(struct page *page, unsigned int > >> *class_idx, > >>enum fullness_group *fullness) > >> { > >> @@ -433,39 +438,86 @@ static struct page *get_next_page(struct page > >> *page) > >>return next; > >> } > >> > >> -/* Encode as a single handle value */ > >> -static void *obj_location_to_handle(struct page *page, unsigned long > >> obj_idx) > >> +static struct page *get_prev_page(struct page *page) > >> { > >> -unsigned long handle; > >> +struct page *prev, *first_page; > >> > >> -if (!page) { > >> -BUG_ON(obj_idx); > >> -return NULL; > >> -} > >> +first_page = get_first_page(page); > >> +if (page == first_page) > >> +prev = NULL; > >> +else if (page == (struct page *)first_page->private) > >> +prev = first_page; > >> +else > >> +prev = list_entry(page->lru.prev, struct page, lru); > >> > >> -handle = page_to_pfn(page) << OBJ_INDEX_BITS; > >> -handle |= (obj_idx & OBJ_INDEX_MASK); > >> +return prev; > >> > >> -return (void *)handle; > >> } > >> > >> -/* Decode pair from the given object handle */ > >> -static void obj_handle_to_location(unsigned long handle, struct page > >> **page, > >> -unsigned long *obj_idx) > >> +static void *encode_ptr(struct page *page, unsigned long offset) > >> { > >> -*pag
[GIT PULL] core/locking change for v3.8
Linus,

Please pull the latest core-locking-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-locking-for-linus

   HEAD: 99fb4a122e96203dfd6c67d99d908aafd20f4753 lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()

Just a one-liner cleanup.

Thanks,

	Ingo

-->
Cyrill Gorcunov (1):
      lockdep: Use KSYM_NAME_LEN'ed buffer for __get_key_name()

 kernel/lockdep_proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/lockdep_proc.c b/kernel/lockdep_proc.c
index 91c32a0..b2c71c5 100644
--- a/kernel/lockdep_proc.c
+++ b/kernel/lockdep_proc.c
@@ -39,7 +39,7 @@ static void l_stop(struct seq_file *m, void *v)

 static void print_name(struct seq_file *m, struct lock_class *class)
 {
-	char str[128];
+	char str[KSYM_NAME_LEN];
	const char *name = class->name;

	if (!name) {
Re: [RFC v3] Support volatile range for anon vma
On Tue, Dec 11, 2012 at 08:17:42AM +0100, Mike Hommey wrote:
> On Tue, Dec 11, 2012 at 11:41:04AM +0900, Minchan Kim wrote:
> > - What's the madvise(addr, length, MADV_VOLATILE)?
> >
> >   It's a hint that user deliver to kernel so kernel can *discard*
> >   pages in a range anytime.
> >
> > - What happens if user access page(ie, virtual address) discarded
> >   by kernel?
> >
> >   The user can see zero-fill-on-demand pages as if madvise(DONTNEED).
>
> What happened to getting SIGBUS?

I thought it would force the user to handle the signal. But even if the
user can receive a signal, what can he do?  He could call
madvise(NOVOLATILE) in my old version, but I removed that in this version
so the user doesn't need any signal handling.

The problem with madvise(NOVOLATILE) is the time delay between the
allocator handing a free chunk to the user and the user really accessing
the memory.  Normally, the allocator should call madvise(NOVOLATILE) when
it returns a free chunk to the customer, but the user could access the
memory a long time afterwards.  During that window, the pages could be
swapped out, which works against the patch's goal.

Yes, it's not good for tmpfs volatile pages.  If you are interested in
tmpfs-volatile, please look at this:
https://lkml.org/lkml/2012/12/10/695

> Mike

--
Kind regards,
Minchan Kim
Re: [PATCH v3 4/4] leds: leds-pwm: Add device tree bindings
On Mon, Dec 10, 2012 at 11:00:37AM +0100, Peter Ujfalusi wrote:
[...]
> +LED sub-node properties:
> +- pwms : PWM property, please refer to:
> +	Documentation/devicetree/bindings/pwm/pwm.txt

Instead of only referring to the generic PWM binding document, this
should probably explain what the PWM device is used for.

> +err:
> +	if (priv->num_leds > 0) {
> +		for (count = priv->num_leds - 1; count >= 0; count--) {
> +			led_classdev_unregister(&priv->leds[count].cdev);
> +			pwm_put(priv->leds[count].pwm);
> +		}
> +	}

Can this not be written more simply as follows?

	while (priv->num_leds--) {
		...
	}

> static int led_pwm_remove(struct platform_device *pdev)
> {
> +	struct led_pwm_platform_data *pdata = pdev->dev.platform_data;
> 	struct led_pwm_priv *priv = platform_get_drvdata(pdev);
> 	int i;
>
> -	for (i = 0; i < priv->num_leds; i++)
> +	for (i = 0; i < priv->num_leds; i++) {
> 		led_classdev_unregister(&priv->leds[i].cdev);
> +		if (!pdata)
> +			pwm_put(priv->leds[i].pwm);
> +	}

Perhaps while at it we can add devm_of_pwm_get() along with exporting
of_pwm_get() so that you don't have to special-case this?

> +static const struct of_device_id of_pwm_leds_match[] = {
> +	{ .compatible = "pwm-leds", },
> +	{},
> +};

Doesn't this cause a compiler warning for !OF builds?

Thierry
Re: [PATCH 1/1] media: saa7146: don't use mutex_lock_interruptible() in device_release().
On Tue December 11 2012 04:05:28 Cyril Roelandt wrote: > Use uninterruptible mutex_lock in the release() file op to make sure all > resources are properly freed when a process is being terminated. Returning > -ERESTARTSYS has no effect for a terminating process and this may cause driver > resources not to be released. Acked-by: Hans Verkuil Thanks! Hans > This was found using the following semantic patch > (http://coccinelle.lip6.fr/): > > > @r@ > identifier fops; > identifier release_func; > @@ > static const struct v4l2_file_operations fops = { > .release = release_func > }; > > @depends on r@ > identifier r.release_func; > expression E; > @@ > static int release_func(...) > { > ... > - if (mutex_lock_interruptible(E)) return -ERESTARTSYS; > + mutex_lock(E); > ... > } > > > Signed-off-by: Cyril Roelandt > --- > drivers/media/common/saa7146/saa7146_fops.c |3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/media/common/saa7146/saa7146_fops.c > b/drivers/media/common/saa7146/saa7146_fops.c > index b3890bd..0afe98d 100644 > --- a/drivers/media/common/saa7146/saa7146_fops.c > +++ b/drivers/media/common/saa7146/saa7146_fops.c > @@ -265,8 +265,7 @@ static int fops_release(struct file *file) > > DEB_EE("file:%p\n", file); > > - if (mutex_lock_interruptible(vdev->lock)) > - return -ERESTARTSYS; > + mutex_lock(vdev->lock); > > if (vdev->vfl_type == VFL_TYPE_VBI) { > if (dev->ext_vv_data->capabilities & V4L2_CAP_VBI_CAPTURE) > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] zsmalloc: add function to query object size
On 12/10/2012 07:59 PM, Minchan Kim wrote: > On Fri, Dec 07, 2012 at 04:45:53PM -0800, Nitin Gupta wrote: >> On Sun, Dec 2, 2012 at 11:52 PM, Minchan Kim wrote: >>> On Sun, Dec 02, 2012 at 11:20:42PM -0800, Nitin Gupta wrote: On Nov 30, 2012, at 5:54 AM, Minchan Kim wrote: > On Thu, Nov 29, 2012 at 10:54:48PM -0800, Nitin Gupta wrote: >> Changelog v2 vs v1: >> - None >> >> Adds zs_get_object_size(handle) which provides the size of >> the given object. This is useful since the user (zram etc.) >> now do not have to maintain object sizes separately, saving >> on some metadata size (4b per page). >> >> The object handle encodes pair which currently points >> to the start of the object. Now, the handle implicitly stores the size >> information by pointing to the object's end instead. Since zsmalloc is >> a slab based allocator, the start of the object can be easily determined >> and the difference between the end offset encoded in the handle and the >> start gives us the object size. >> >> Signed-off-by: Nitin Gupta > Acked-by: Minchan Kim > > I already had a few comment in your previous versoin. > I'm OK although you ignore them because I can make follow up patch about > my nitpick but could you answer below my question? 
> >> --- >> drivers/staging/zsmalloc/zsmalloc-main.c | 177 >> +- >> drivers/staging/zsmalloc/zsmalloc.h |1 + >> 2 files changed, 127 insertions(+), 51 deletions(-) >> >> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c >> b/drivers/staging/zsmalloc/zsmalloc-main.c >> index 09a9d35..65c9d3b 100644 >> --- a/drivers/staging/zsmalloc/zsmalloc-main.c >> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c >> @@ -112,20 +112,20 @@ >> #define MAX_PHYSMEM_BITS 36 >> #else /* !CONFIG_HIGHMEM64G */ >> /* >> - * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will >> just >> + * If this definition of MAX_PHYSMEM_BITS is used, OFFSET_BITS will just >> * be PAGE_SHIFT >> */ >> #define MAX_PHYSMEM_BITS BITS_PER_LONG >> #endif >> #endif >> #define _PFN_BITS(MAX_PHYSMEM_BITS - PAGE_SHIFT) >> -#define OBJ_INDEX_BITS(BITS_PER_LONG - _PFN_BITS) >> -#define OBJ_INDEX_MASK((_AC(1, UL) << OBJ_INDEX_BITS) - 1) >> +#define OFFSET_BITS(BITS_PER_LONG - _PFN_BITS) >> +#define OFFSET_MASK((_AC(1, UL) << OFFSET_BITS) - 1) >> >> #define MAX(a, b) ((a) >= (b) ? (a) : (b)) >> /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ >> #define ZS_MIN_ALLOC_SIZE \ >> -MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) >> +MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OFFSET_BITS)) >> #define ZS_MAX_ALLOC_SIZEPAGE_SIZE >> >> /* >> @@ -256,6 +256,11 @@ static int is_last_page(struct page *page) >>return PagePrivate2(page); >> } >> >> +static unsigned long get_page_index(struct page *page) >> +{ >> +return is_first_page(page) ? 
0 : page->index; >> +} >> + >> static void get_zspage_mapping(struct page *page, unsigned int >> *class_idx, >>enum fullness_group *fullness) >> { >> @@ -433,39 +438,86 @@ static struct page *get_next_page(struct page >> *page) >>return next; >> } >> >> -/* Encode as a single handle value */ >> -static void *obj_location_to_handle(struct page *page, unsigned long >> obj_idx) >> +static struct page *get_prev_page(struct page *page) >> { >> -unsigned long handle; >> +struct page *prev, *first_page; >> >> -if (!page) { >> -BUG_ON(obj_idx); >> -return NULL; >> -} >> +first_page = get_first_page(page); >> +if (page == first_page) >> +prev = NULL; >> +else if (page == (struct page *)first_page->private) >> +prev = first_page; >> +else >> +prev = list_entry(page->lru.prev, struct page, lru); >> >> -handle = page_to_pfn(page) << OBJ_INDEX_BITS; >> -handle |= (obj_idx & OBJ_INDEX_MASK); >> +return prev; >> >> -return (void *)handle; >> } >> >> -/* Decode pair from the given object handle */ >> -static void obj_handle_to_location(unsigned long handle, struct page >> **page, >> -unsigned long *obj_idx) >> +static void *encode_ptr(struct page *page, unsigned long offset) >> { >> -*page = pfn_to_page(handle >> OBJ_INDEX_BITS); >> -*obj_idx = handle & OBJ_INDEX_MASK; >> +unsigned long ptr; >> +ptr = page_to_pfn(page) << OFFSET_BITS; >> +ptr |= offset & OFFSET_MASK; >> +return (void *)ptr; >> +} >> + >> +static void decode_ptr(unsigned long ptr, struct
Re: [PATCH] kvm/vmx: fix the return value of handle_vmcall()
On Mon, Dec 10, 2012 at 03:28:13PM -0600, Jesse Larrew wrote:
> The return value of kvm_emulate_hypercall() is intended to inform callers
> whether or not we need to exit to userspace. However, handle_vmcall()
> currently ignores the return value.

No, it is not. KVM does not handle vmcalls in userspace.

> This patch simply propagates the return value from kvm_emulate_hypercall()
> to callers so that it can be acted upon appropriately.
>
> Signed-off-by: Jesse Larrew
> ---
>  arch/x86/kvm/vmx.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f858159..8b37f5f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4682,8 +4682,7 @@ static int handle_halt(struct kvm_vcpu *vcpu)
>  static int handle_vmcall(struct kvm_vcpu *vcpu)
>  {
>  	skip_emulated_instruction(vcpu);
> -	kvm_emulate_hypercall(vcpu);
> -	return 1;
> +	return kvm_emulate_hypercall(vcpu);
>  }
>
>  static int handle_invd(struct kvm_vcpu *vcpu)
> --
> 1.7.11.7
>
> Jesse Larrew
> Software Engineer, KVM Team
> IBM Linux Technology Center
> Phone: (512) 973-2052 (T/L: 363-2052)
> jlar...@linux.vnet.ibm.com

--
	Gleb.
Re: [RFC v3] Support volatile range for anon vma
On Tue, Dec 11, 2012 at 11:41:04AM +0900, Minchan Kim wrote:
> - What's the madvise(addr, length, MADV_VOLATILE)?
>
>   It's a hint that user deliver to kernel so kernel can *discard*
>   pages in a range anytime.
>
> - What happens if user access page(ie, virtual address) discarded
>   by kernel?
>
>   The user can see zero-fill-on-demand pages as if madvise(DONTNEED).

What happened to getting SIGBUS?

Mike
Re: [PATCH 1/2] regulator: lp3971: Convert to get_voltage_sel
Hello, On 12/10/2012 12:46 PM, Axel Lin wrote: regulator_list_voltage_table() returns -EINVAL if selector >= n_voltages. Thus we don't need to check if reg is greater than BUCK_TARGET_VOL_MAX_IDX in lp3971_dcdc_get_voltage_sel. BUCK_TARGET_VOL_MIN_IDX and BUCK_TARGET_VOL_MAX_IDX are not used, remove them. Signed-off-by: Axel Lin Acked-by: Marek Szyprowski --- drivers/regulator/lp3971.c | 22 ++ 1 file changed, 6 insertions(+), 16 deletions(-) diff --git a/drivers/regulator/lp3971.c b/drivers/regulator/lp3971.c index 5f68ff1..9cb2c0f 100644 --- a/drivers/regulator/lp3971.c +++ b/drivers/regulator/lp3971.c @@ -73,8 +73,6 @@ static const unsigned int buck_voltage_map[] = { }; #define BUCK_TARGET_VOL_MASK 0x3f -#define BUCK_TARGET_VOL_MIN_IDX 0x01 -#define BUCK_TARGET_VOL_MAX_IDX 0x19 #define LP3971_BUCK_RAMP_REG(x) (buck_base_addr[x]+2) @@ -140,7 +138,7 @@ static int lp3971_ldo_disable(struct regulator_dev *dev) return lp3971_set_bits(lp3971, LP3971_LDO_ENABLE_REG, mask, 0); } -static int lp3971_ldo_get_voltage(struct regulator_dev *dev) +static int lp3971_ldo_get_voltage_sel(struct regulator_dev *dev) { struct lp3971 *lp3971 = rdev_get_drvdata(dev); int ldo = rdev_get_id(dev) - LP3971_LDO1; @@ -149,7 +147,7 @@ static int lp3971_ldo_get_voltage(struct regulator_dev *dev) reg = lp3971_reg_read(lp3971, LP3971_LDO_VOL_CONTR_REG(ldo)); val = (reg >> LDO_VOL_CONTR_SHIFT(ldo)) & LDO_VOL_CONTR_MASK; - return dev->desc->volt_table[val]; + return val; } static int lp3971_ldo_set_voltage_sel(struct regulator_dev *dev, @@ -168,7 +166,7 @@ static struct regulator_ops lp3971_ldo_ops = { .is_enabled = lp3971_ldo_is_enabled, .enable = lp3971_ldo_enable, .disable = lp3971_ldo_disable, - .get_voltage = lp3971_ldo_get_voltage, + .get_voltage_sel = lp3971_ldo_get_voltage_sel, .set_voltage_sel = lp3971_ldo_set_voltage_sel, }; @@ -201,24 +199,16 @@ static int lp3971_dcdc_disable(struct regulator_dev *dev) return lp3971_set_bits(lp3971, LP3971_BUCK_VOL_ENABLE_REG, mask, 0); } -static int 
lp3971_dcdc_get_voltage(struct regulator_dev *dev) +static int lp3971_dcdc_get_voltage_sel(struct regulator_dev *dev) { struct lp3971 *lp3971 = rdev_get_drvdata(dev); int buck = rdev_get_id(dev) - LP3971_DCDC1; u16 reg; - int val; reg = lp3971_reg_read(lp3971, LP3971_BUCK_TARGET_VOL1_REG(buck)); reg &= BUCK_TARGET_VOL_MASK; - if (reg <= BUCK_TARGET_VOL_MAX_IDX) - val = buck_voltage_map[reg]; - else { - val = 0; - dev_warn(&dev->dev, "chip reported incorrect voltage value.\n"); - } - - return val; + return reg; } static int lp3971_dcdc_set_voltage_sel(struct regulator_dev *dev, @@ -249,7 +239,7 @@ static struct regulator_ops lp3971_dcdc_ops = { .is_enabled = lp3971_dcdc_is_enabled, .enable = lp3971_dcdc_enable, .disable = lp3971_dcdc_disable, - .get_voltage = lp3971_dcdc_get_voltage, + .get_voltage_sel = lp3971_dcdc_get_voltage_sel, .set_voltage_sel = lp3971_dcdc_set_voltage_sel, }; Best regards -- Marek Szyprowski Samsung Poland R&D Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v3 2/4] leds: leds-pwm: Preparing the driver for device tree support
On Mon, Dec 10, 2012 at 11:00:35AM +0100, Peter Ujfalusi wrote: > In order to be able to add device tree support for leds-pwm driver we need > to rearrange the data structures used by the drivers. > > Signed-off-by: Peter Ujfalusi > --- > drivers/leds/leds-pwm.c | 39 +++ > 1 file changed, 23 insertions(+), 16 deletions(-) > > diff --git a/drivers/leds/leds-pwm.c b/drivers/leds/leds-pwm.c > index 351257c..02f0c0c 100644 > --- a/drivers/leds/leds-pwm.c > +++ b/drivers/leds/leds-pwm.c > @@ -30,6 +30,11 @@ struct led_pwm_data { > unsigned intperiod; > }; > > +struct led_pwm_priv { > + int num_leds; > + struct led_pwm_data leds[]; > +}; I think you want leds[0] here. Otherwise your structure is too large by sizeof(struct led_pwm_data *). > + > static void led_pwm_set(struct led_classdev *led_cdev, > enum led_brightness brightness) > { > @@ -47,25 +52,29 @@ static void led_pwm_set(struct led_classdev *led_cdev, > } > } > > +static inline int sizeof_pwm_leds_priv(int num_leds) Perhaps this should return size_t? > +{ > + return sizeof(struct led_pwm_priv) + > + (sizeof(struct led_pwm_data) * num_leds); > +} > + > static int led_pwm_probe(struct platform_device *pdev) > { > struct led_pwm_platform_data *pdata = pdev->dev.platform_data; > - struct led_pwm *cur_led; > - struct led_pwm_data *leds_data, *led_dat; > + struct led_pwm_priv *priv; > int i, ret = 0; > > if (!pdata) > return -EBUSY; > > - leds_data = devm_kzalloc(&pdev->dev, > - sizeof(struct led_pwm_data) * pdata->num_leds, > - GFP_KERNEL); > - if (!leds_data) > + priv = devm_kzalloc(&pdev->dev, sizeof_pwm_leds_priv(pdata->num_leds), > + GFP_KERNEL); I'm not sure if sizeof_pwm_leds_priv() requires to be a separate function. You could make it shorter by doing something like: size_t extra = sizeof(*led_dat) * pdata->num_leds; priv = devm_kzalloc(&pdev->dev, sizeof(*priv) + extra, GFP_KERNEL); But that's really just a matter of taste, so no further objections if you want to keep the inline function. 
Thierry
Re: [RFC][PATCH RT 3/4] sched/rt: Use IPI to trigger RT task push migration instead of pulling
On Mon, 2012-12-10 at 20:53 -0500, Steven Rostedt wrote:
> On Mon, 2012-12-10 at 17:15 -0800, Frank Rowand wrote:
> > I should have also mentioned some previous experience using IPIs to
> > avoid runq lock contention on wake up.  Someone encountered IPI
> > storms when using the TTWU_QUEUE feature, thus it defaults to off
> > for CONFIG_PREEMPT_RT_FULL:
> >
> > #ifndef CONFIG_PREEMPT_RT_FULL
> > /*
> >  * Queue remote wakeups on the target CPU and process them
> >  * using the scheduler IPI.  Reduces rq->lock contention/bounces.
> >  */
> > SCHED_FEAT(TTWU_QUEUE, true)
> > #else
> > SCHED_FEAT(TTWU_QUEUE, false)
>
> Interesting, but I'm wondering if this also does it for every wakeup? If
> you have 1000 tasks waking up on another CPU, this could potentially
> send out 1000 IPIs. The number of IPIs here looks to be # of tasks
> waking up, and perhaps more than that, as there could be multiple
> instances that try to wake up the same task.

Yeah.  In mainline, wakeup via IPI is disabled within a socket, because
it's too much of a performance hit for high frequency switchers.  (It
seems we're limited by the max rate at which we can IPI)

	-Mike
Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
On Mon, Dec 10, 2012 at 10:34 PM, H. Peter Anvin wrote:
> That doesn't work if the microcode is replaced at runtime. However, vmalloc
> doesn't work either since 32 bits needs any one blob to be physically
> contiguous. I have suggested Fenghua replace it with a linked list of
> kmalloc areas, one for each blob.

You mean: keep all of the versions, and the update code needs to walk the
list to find the latest before applying the update?

BTW, do we really need to update microcode so early?

Yinghai
[GIT PULL] clk: changes for 3.8
The following changes since commit 8f0d8163b50e01f398b14bcd4dc039ac5ab18d64: Linux 3.7-rc3 (2012-10-28 12:24:48 -0700) are available in the git repository at: git://git.linaro.org/people/mturquette/linux.git tags/clk-for-linus for you to fetch changes up to 8f87189653d60656e262060665f52c855508a301: MAINTAINERS: bad email address for Mike Turquette (2012-12-10 22:35:32 -0800) The common clock framework changes for 3.8 are comprised of lots of fixes for existing platforms as well as new ports for some ARM platforms. In addition there are new clk drivers for audio devices and MFDs. Axel Lin (1): clk: spear: Add stub functions for spear3[0|1|2]0_clk_init() Deepak Sikri (2): CLK: SPEAr: Update clock rate table CLK: SPEAr: Correct index scanning done for clock synths Fabio Estevam (1): clk: mxs: Use a better name for the USB PHY clock Linus Walleij (4): clk: add GPLv2 headers to the Versatile clock files clk: make ICST driver handle the VCO registers clk: move IM-PD1 clocks to drivers/clk clk: ux500: fix bit error Martin Fuzzey (1): clk: clock multiplexers may register out of order Mike Turquette (2): clk: introduce optional disable_unused callback MAINTAINERS: bad email address for Mike Turquette Pawel Moll (2): clk: Versatile Express clock generators ("osc") driver clk: Common clocks implementation for Versatile Express Peter Ujfalusi (1): CLK: clk-twl6040: Initial clock driver for OMAP4+ McPDM fclk clock Rajeev Kumar (1): CLK: SPEAr: Fix dev_id & con_id for multiple clocks Shiraz Hashim (2): CLK: SPEAr13xx: Fix mux clock names CLK: SPEAr13xx: fix parent names of multiple clocks Stephen Boyd (6): clk: Document .is_enabled op clk: Fix documentation typos clk: Don't return negative numbers for unsigned values with !clk clk: wm831x: Fix clk_register() error code checking clk: Add devm_clk_{register,unregister}() clk: wm831x: Use devm_clk_register() to simplify code Tony Prisk (1): CLK: vt8500: Fix SDMMC clk special cases Ulf Hansson (19): mfd: dbx500: Export 
prmcu_request_ape_opp_100_voltage clk: ux500: Support prcmu ape opp voltage clock clk: ux500: Update sdmmc clock to 100MHz for u8500 ARM: ux500: Remove cpufreq platform device mfd: db8500: Provide cpufreq table as platform data cpufreq: db8500: Register as a platform driver cpufreq: db8500: Fetch cpufreq table from platform data mfd: db8500: Connect ARMSS clk to ARM OPP clk: ux500: Support for prcmu_scalable_rate clock clk: ux500: Add armss clk and fixup smp_twd clk for u8500 cpufreq: db8500: Use armss clk to update frequency clk: ux500: Register i2c clock lookups for u8500 clk: ux500: Register ssp clock lookups for u8500 clk: ux500: Register msp clock lookups for u8500 clk: ux500: Update rtc clock lookup for u8500 clk: ux500: Register slimbus clock lookups for u8500 clk: ux500: Register rng clock lookups for u8500 clk: ux500: Register nomadik keypad clock lookups for u8500 clk: ux500: Initial support for abx500 clock driver Vipul Kumar Samar (3): CLK: SPEAr: Set CLK_SET_RATE_PARENT for few clocks CLK: SPEAr: Add missing clocks CLK: SPEAr: Remove unused dummy apb_pclk Viresh Kumar (1): clk: SPEAr: Vco-pll: Fix compilation warning Wei Yongjun (4): clk: fix return value check in of_fixed_clk_setup() clk: fix return value check in sirfsoc_of_clk_init() clk: fix return value check in bcm2835_init_clocks() CLK: clk-twl6040: fix return value check in twl6040_clk_probe() .../devicetree/bindings/clock/imx23-clock.txt |2 +- .../devicetree/bindings/clock/imx28-clock.txt |4 +- MAINTAINERS|1 - arch/arm/include/asm/hardware/sp810.h |2 + arch/arm/mach-integrator/impd1.c | 69 +- arch/arm/mach-ux500/cpu-db8500.c |6 - drivers/clk/Kconfig| 16 +- drivers/clk/Makefile |1 + drivers/clk/clk-bcm2835.c |8 +- drivers/clk/clk-fixed-rate.c |2 +- drivers/clk/clk-prima2.c | 84 +++ drivers/clk/clk-twl6040.c | 126 +++ drivers/clk/clk-vt8500.c | 18 ++ drivers/clk/clk-wm831x.c | 34 +-- drivers/clk/clk.c | 154 ++--- drivers/clk/mxs/clk-imx23.c|6 +- drivers/clk/mxs/clk-imx28.c| 10 +- 
drivers/clk/spear/clk-aux-synth.c |3 +- drivers/clk/spear
Re: [PATCH v3 1/4] leds: leds-pwm: Convert to use devm_get_pwm
On Mon, Dec 10, 2012 at 11:00:34AM +0100, Peter Ujfalusi wrote:
> Update the driver to use the new API for requesting pwm so we can take
> advantage of the pwm_lookup table to find the correct pwm to be used for the
> LED functionality.
> If the devm_get_pwm fails we fall back to legacy mode to try to get the pwm.
>
> Signed-off-by: Peter Ujfalusi
> ---
> drivers/leds/leds-pwm.c  | 19 ++-
> include/linux/leds_pwm.h |  2 +-
> 2 files changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/leds/leds-pwm.c b/drivers/leds/leds-pwm.c
> index 2157524..351257c 100644
> --- a/drivers/leds/leds-pwm.c
> +++ b/drivers/leds/leds-pwm.c
> @@ -67,12 +67,11 @@ static int led_pwm_probe(struct platform_device *pdev)
> 		cur_led = &pdata->leds[i];
> 		led_dat = &leds_data[i];
>
> -		led_dat->pwm = pwm_request(cur_led->pwm_id,
> -				cur_led->name);
> +		led_dat->pwm = devm_pwm_get(&pdev->dev, cur_led->name);
> 		if (IS_ERR(led_dat->pwm)) {
> 			ret = PTR_ERR(led_dat->pwm);
> -			dev_err(&pdev->dev, "unable to request PWM %d\n",
> -					cur_led->pwm_id);
> +			dev_err(&pdev->dev, "unable to request PWM for %s\n",
> +					cur_led->name);
> 			goto err;
> 		}

The commit message says that legacy mode is used as a fallback if devm_get_pwm() (that should really be devm_pwm_get(), btw) fails, but I don't see where pwm_request() is called.

Thierry
Re: [PATCH v3 3/4] pwm: core: Export of_pwm_request() so client drivers can also use it
On Mon, Dec 10, 2012 at 11:00:36AM +0100, Peter Ujfalusi wrote:
> Allow client driver to use of_pwm_request() to get the pwm they need. This
> is needed for drivers which handle more than one pwm separately, like
> leds-pwm driver which have:

Hi Peter,

I really was hoping that we didn't have to export this function, but I can't think of any other way to solve the problem at hand either. I'd prefer to rename the function to of_pwm_get() at the same time to keep consistent with other subsystems that provide similar functionality. Also, please use all-caps for PWM in prose. And while at it, you can drop the "core:" and "so client drivers can also use it" from the subject line.

> pwmleds {
> 	compatible = "pwm-leds";
> 	kpad {
> 		label = "omap4::keypad";
> 		pwms = <&twl_pwm 0 7812500>;
> 		max-brightness = <127>;
> 	};
>
> 	charging {
> 		label = "omap4:green:chrg";
> 		pwms = <&twl_pwmled 0 7812500>;
> 		max-brightness = <255>;
> 	};
> };
>
> in the dts files.
>
> Signed-off-by: Peter Ujfalusi
> ---
> drivers/pwm/core.c  | 2 +-
> include/linux/pwm.h | 7 +++
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pwm/core.c b/drivers/pwm/core.c
> index 903138b..3a7ebcc 100644
> --- a/drivers/pwm/core.c
> +++ b/drivers/pwm/core.c
> @@ -486,7 +486,7 @@ static struct pwm_chip *of_node_to_pwmchip(struct device_node *np)
>  * becomes mandatory for devices that look up the PWM device via the con_id
>  * parameter.
>  */
> -static struct pwm_device *of_pwm_request(struct device_node *np,
> +struct pwm_device *of_pwm_request(struct device_node *np,
> 				  const char *con_id)
> {
> 	struct pwm_device *pwm = NULL;

This is missing an EXPORT_SYMBOL_GPL.
> diff --git a/include/linux/pwm.h b/include/linux/pwm.h
> index 6d661f3..d70ffe3 100644
> --- a/include/linux/pwm.h
> +++ b/include/linux/pwm.h
> @@ -175,6 +175,7 @@ struct pwm_device *of_pwm_xlate_with_flags(struct pwm_chip *pc,
> 					   const struct of_phandle_args *args);
>
> struct pwm_device *pwm_get(struct device *dev, const char *consumer);
> +struct pwm_device *of_pwm_request(struct device_node *np, const char *con_id);

While at it, maybe rename the con_id parameter as well to match pwm_get().

Thierry
performance drop after using blkcg
Hi,

I plan to use blkcg (proportional BW) in my system, but I encounter a great performance drop after enabling blkcg. The testing tool is fio (version 2.0.7) and both the BW and IOPS fields are recorded. Two instances of the fio program are carried out simultaneously, each operating on a separate disk file (say /data/testfile1, /data/testfile2).

System environment:
kernel: 3.7.0-rc5
CFQ's slice_idle is disabled (slice_idle=0) while group_idle is enabled (group_idle=8).

FIO configuration (e.g. "read") for the first fio program (say FIO1):

[global]
description=Emulation of Intel IOmeter File Server Access Pattern

[iometer]
bssplit=4k/30:8k/40:16k/30
rw=read
direct=1
time_based
runtime=180s
ioengine=sync
filename=/data/testfile1
numjobs=32
group_reporting

Result before using blkcg (the value of BW is KB/s):

            FIO1 BW/IOPS       FIO2 BW/IOPS
  -----------------------------------------
  read      26799/2911         25861/2810
  write     138618/15071       138578/15069
  rw        72159/7838(r)      71851/7811(r)
            72171/7840(w)      71799/7805(w)
  randread  4982/543           5370/585
  randwrite 5192/566           6010/654
  randrw    2369/258(r)        3027/330(r)
            2369/258(w)        3016/328(w)

Result after using blkcg (create two blkio cgroups with the default blkio.weight (500) and put FIO1 and FIO2 into these cgroups respectively):

            FIO1 BW/IOPS       FIO2 BW/IOPS
  -----------------------------------------
  read      36651/3985         36470/3943
  write     75738/8229         75641/8221
  rw        49169/5342(r)      49168/5346(r)
            49200/5348(w)      49140/5341(w)
  randread  4876/532           4905/534
  randwrite 5535/603           5497/599
  randrw    2521/274(r)        2527/275(r)
            2510/273(w)        2532/274(w)

Comparing these results, we found a great performance drop (30%-40%) in some test cases (especially for the "write" and "rw" cases). Is it normal to see write/rw bandwidth decrease by 40% after using blkio-cgroup? If not, is there any way to improve or tune the performance?

Thanks.

--
Regards,
Zhao Shuai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[GIT PULL] pinctrl changes for v3.8
Hi Linus,

these are the first and major pinctrl changes for the v3.8 merge cycle. Some of this is used as a merge base for other trees, so I'd better be early on the trigger. The major changes are described in the signed tag. All has been in linux-next for a while.

This is the first time I've had to pull in external branches and use some parallel topics for pinctrl, so if I've done it wrong somehow just tell me. Anyway, please pull it in!

Yours,
Linus Walleij

The following changes since commit 77b67063bb6bce6d475e910d3b886a606d0d91f7:

  Linux 3.7-rc5 (2012-11-11 13:44:33 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git tags/pinctrl-for-v3.8

for you to fetch changes up to 7c8f86a451fe8c010eb93c62d4d69727ccdbe435:

  ARM: mmp: select PINCTRL for ARCH_MMP (2012-12-02 00:09:09 +0100)

This is the pinctrl big pull request for v3.8. As can be seen from the diffstat, the major changes are:

- A big conversion of the AT91 pinctrl driver and the associated ACKed platform changes under arch/arm/mach-at91 and its device trees. This has been coordinated with the AT91 maintainers to go in through the pinctrl tree.

- A larger chunk of changes to the SPEAr drivers and the addition of the "plgpio" driver for the SPEAr as well.

- The removal of the remnants of the Nomadik driver from the arch/arm tree and fusion of that into the Nomadik driver and platform data header files.

- Some local movement in the Marvell MVEBU drivers; these now have their own subdirectory.

- The addition of a chunk of code to gpiolib under drivers/gpio to register gpio-to-pin range mappings from the GPIO side of things. This has been requested by Grant Likely and is now implemented; it is particularly useful for device tree work.

Then we have incremental updates all over the place, many of these are cleanups and fixes from Axel Lin who has done a great job of removing minor mistakes and compilation annoyances.
Axel Lin (25): pinctrl: nomadik: Add terminating entry for platform_device_id table pinctrl: at91: Staticize non-exported symbols pinctrl: exynos: Add terminating entry for of_device_id table pinctrl: u300: Staticize non-exported symbols pinctrl: sirf: Staticize non-exported symbol pinctrl: Staticize pinconf_ops pinctrl: lantiq: Remove ltq_pmx_disable() function pinctrl: lantiq: Staticize non-exported symbols pinctrl: pinmux: Release all taken pins in pinmux_enable_setting error paths pinctrl: spear: Staticize non-exported symbols pinctrl: mxs: Make PINCTRL_MXS select PINMUX && PINCONF pinctrl: tegra: Make PINCTRL_TEGRA select PINMUX && PINCONF pinctrl: pxa3xx: Use devm_request_and_ioremap pinctrl: pxa3xx: Remove phy_base and phy_size from struct pxa3xx_pinmux_info pinctrl: tegra: Staticize non-exported symbols pinctrl: imx: Fix the logic checking if not able to find pin reg map pinctrl: spear: Fix the logic of setting reg in pmx_init_gpio_pingroup_addr pinctrl: coh901: Return proper error if irq_domain_add_linear() fails pinctrl: spear: Make get_gpio_pingroup return NULL when no gpio_pingroup found pinctrl: plgpio: Call clk_disable_unprepare only if clk_prepare_enable is called pinctrl: nomadik: Prevent NULL dereference if of_match_device returns NULL pinctrl: nomadik: Staticize non-exported symbols gpiolib: Fix use after free in gpiochip_add_pin_range pinctrl: Drop selecting PINCONF for MMP2, PXA168 and PXA910 ARM: mmp: select PINCTRL for ARCH_MMP Barry Song (1): pinctrl: sirf: enable the driver support new SiRFmarco SoC Haojian Zhuang (3): pinctrl: single: dump pinmux register value pinctrl: generic: add input schmitt disable parameter pinctrl: single: support gpio request and free Jean-Christophe PLAGNIOL-VILLARD (29): arm: at91: use macro to declare soc boot data ARM: at91: gpio: implement request at91: regroup gpio and pinctrl under the same ranges arm: at91: at91sam9x5: fix gpio number per bank ARM: at91: add dummies pinctrl for non dt platform ARM: at91: 
add pinctrl support arm: at91: dt: at91sam9 add pinctrl support arm: at91: dt: at91sam9 add serial pinctrl support tty: atmel_serial: add pinctrl support arm: at91: dt: sam9m10g45ek: use rts/cts pinctrl group for uart1 arm: at91: dt: sam9263ek: use rts/cts pinctrl group for uart0 arm: at91: dt: sam9g20ek: use rts/cts/dtr/dsr/dcd/ri pinctrl group for uart0 arm: at91: dt: at91sam9 add nand pinctrl support MTD: atmel_nand: add pinctrl consumer support pinctrl: at91: fix typo on PULL_UP gpio/at91: auto request and configure the pio as input when the interrupt is used via DT pinctrl/:
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
>>> Hi Simon,
>>>
>>> If we use "/sys/devices/system/memory/soft_offline_page" to offline a
>>> free page, the value of mce_bad_pages will be increased. Then the page is
>>> marked HWPoison, but it is still managed by the page buddy allocator.
>>>
>>> So if we offline it again, the value of mce_bad_pages will be increased
>>> again. Assume the page is not allocated during this short time.
>>>
>>> soft_offline_page()
>>> 	get_any_page()
>>> 		"else if (is_free_buddy_page(p))" branch return 0
>>> 	"goto done";
>>> 	"atomic_long_add(1, &mce_bad_pages);"
>>>
>>> I think it would be better to move "if (PageHWPoison(page))" to the
>>> beginning of soft_offline_page(). However I don't know what these words
>>> mean: "Synchronized using the page lock with memory_failure()"

Hi Xishi,

Unpoison will clear the PG_hwpoison flag after holding the page lock; memory_failure() and soft_offline_page() take the lock to avoid unpoison clearing the flag behind them.

Regards,
Wanpeng Li

Hi Wanpeng,

So, as you say, it is necessary to take the page lock every time we check the HWPoison flag, in order to avoid this conflict, right? Then why not use a global lock here, the way lock_memory_hotplug() is used in online_pages() and offline_pages()?

Thanks,
Xishi Qiu
RE: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -Original Message- > From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com] > Sent: Friday, December 07, 2012 10:09 PM > To: Xu, Dongxiao > Cc: xen-de...@lists.xen.org; linux-kernel@vger.kernel.org > Subject: Re: [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg > hook > > On Thu, Dec 06, 2012 at 09:08:42PM +0800, Dongxiao Xu wrote: > > While mapping sg buffers, checking to cross page DMA buffer is also > > needed. If the guest DMA buffer crosses page boundary, Xen should > > exchange contiguous memory for it. > > So this is when we cross those 2MB contingous swatch of buffers. > Wouldn't we get the same problem with the 'map_page' call? If the driver tried > to map say a 4MB DMA region? Yes, it also needs such check, as I just replied to Jan's mail. > > What if this check was done in the routines that provide the software static > buffers and there try to provide a nice DMA contingous swatch of pages? Yes, this approach also came to our mind, which needs to modify the driver itself. If so, it requires driver not using such static buffers (e.g., from kmalloc) to do DMA even if the buffer is continuous in native. Is this acceptable by kernel/driver upstream? Thanks, Dongxiao > > > > > Besides, it is needed to backup the original page contents and copy it > > back after memory exchange is done. > > > > This fixes issues if device DMA into software static buffers, and in > > case the static buffer cross page boundary which pages are not > > contiguous in real hardware. 
> > > > Signed-off-by: Dongxiao Xu > > Signed-off-by: Xiantao Zhang > > --- > > drivers/xen/swiotlb-xen.c | 47 > - > > 1 files changed, 46 insertions(+), 1 deletions(-) > > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > > index 58db6df..e8f0cfb 100644 > > --- a/drivers/xen/swiotlb-xen.c > > +++ b/drivers/xen/swiotlb-xen.c > > @@ -461,6 +461,22 @@ xen_swiotlb_sync_single_for_device(struct device > > *hwdev, dma_addr_t dev_addr, } > > EXPORT_SYMBOL_GPL(xen_swiotlb_sync_single_for_device); > > > > +static bool > > +check_continguous_region(unsigned long vstart, unsigned long order) { > > + unsigned long prev_ma = xen_virt_to_bus((void *)vstart); > > + unsigned long next_ma; > > + int i; > > + > > + for (i = 1; i < (1 << order); i++) { > > + next_ma = xen_virt_to_bus((void *)(vstart + i * PAGE_SIZE)); > > + if (next_ma != prev_ma + PAGE_SIZE) > > + return false; > > + prev_ma = next_ma; > > + } > > + return true; > > +} > > + > > /* > > * Map a set of buffers described by scatterlist in streaming mode for > DMA. > > * This is the scatter-gather version of the above > > xen_swiotlb_map_page @@ -489,7 +505,36 @@ > > xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist > > *sgl, > > > > for_each_sg(sgl, sg, nelems, i) { > > phys_addr_t paddr = sg_phys(sg); > > - dma_addr_t dev_addr = xen_phys_to_bus(paddr); > > + unsigned long vstart, order; > > + dma_addr_t dev_addr; > > + > > + /* > > +* While mapping sg buffers, checking to cross page DMA buffer > > +* is also needed. If the guest DMA buffer crosses page > > +* boundary, Xen should exchange contiguous memory for it. > > +* Besides, it is needed to backup the original page contents > > +* and copy it back after memory exchange is done. 
> > +	 */
> > +	if (range_straddles_page_boundary(paddr, sg->length)) {
> > +		vstart = (unsigned long)__va(paddr & PAGE_MASK);
> > +		order = get_order(sg->length + (paddr & ~PAGE_MASK));
> > +		if (!check_continguous_region(vstart, order)) {
> > +			unsigned long buf;
> > +			buf = __get_free_pages(GFP_KERNEL, order);
> > +			memcpy((void *)buf, (void *)vstart,
> > +				PAGE_SIZE * (1 << order));
> > +			if (xen_create_contiguous_region(vstart, order,
> > +					fls64(paddr))) {
> > +				free_pages(buf, order);
> > +				return 0;
> > +			}
> > +			memcpy((void *)vstart, (void *)buf,
> > +				PAGE_SIZE * (1 << order));
> > +			free_pages(buf, order);
> > +		}
> > +	}
> > +
> > +	dev_addr = xen_phys_to_bus(paddr);
> >
> > 	if (swiotlb_force ||
> > 	    !dma_capable(hwdev, dev_addr, sg->length) ||
> > --
> > 1.7.1
Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
On 12/10/2012 07:55 PM, Yinghai Lu wrote:
> And my suggestion is: after scan and find the ucode, save it to BRK, so
> we don't need to adjust the pointer again, and don't need to copy the
> blob and update the pointer again.

That doesn't work if the microcode is replaced at runtime. However, vmalloc doesn't work either, since 32 bits needs any one blob to be physically contiguous. I have suggested Fenghua replace it with a linked list of kmalloc areas, one for each blob.

	-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Re: [PATCH v2] staging: rtl8712: avoid a useless call to memset().
On Tue, Dec 11, 2012 at 01:20:48AM +0100, Cyril Roelandt wrote:
> In r8711_wx_get_wap(), make sure we do not call memcpy() on a memory area
> that has just been zeroed by a call to memset().

Acked-by: Dan Carpenter
Re: [PATCH 01/18] sched: select_task_rq_fair clean up
On 12/11/2012 10:58 AM, Alex Shi wrote:
> On 12/11/2012 12:23 PM, Preeti U Murthy wrote:
>> Hi Alex,
>>
>> On 12/10/2012 01:52 PM, Alex Shi wrote:
>>> It is impossible to miss a task allowed cpu in a eligible group.
>>
>> The one thing I am concerned with here is if there is a possibility of
>> the task changing its tsk_cpus_allowed() while this code is running.
>>
>> i.e. find_idlest_group() finds an idle group, then the tsk_cpus_allowed()
>> for the task changes, perhaps by the user himself, which might not include
>> the cpus in the idle group. After this, find_idlest_cpu() is called. I mean
>> a race condition, in short. Then we might not have an eligible cpu in that
>> group, right?
>
> your worry make sense, but the code handle the situation, in
> select_task_rq(), it will check the cpu allowed again. if the answer is
> no, it will fallback to old cpu.
>>
>>> And since find_idlest_group only return a different group which
>>> excludes old cpu, it's also impossible to find a new cpu same as old
>>> cpu.

I doubt this will work correctly. Consider the following situation: the sched domain search begins with an sd that encloses both socket1 and socket2:

  cpu0 cpu1 | cpu2 cpu3
  ----------|----------
   socket1  |  socket2

old cpu = cpu1

Iteration 1:
1. find_idlest_group() returns socket2 as idlest.
2. The task changes tsk_cpus_allowed to 0,1.
3. find_idlest_cpu() returns cpu2.

* Without your patch:
1. The condition after find_idlest_cpu() returns -1, and sd->child is chosen, which happens to be socket1.
2. In the next iteration, find_idlest_group() and find_idlest_cpu() will probably choose cpu0, which happens to be idler than cpu1 and is in tsk_cpus_allowed.

* With your patch:
1. The condition after find_idlest_cpu() does not exist, therefore a sched domain to which cpu2 belongs is chosen. This is socket2 (under the for_each_domain() loop).
2. In the next iteration, find_idlest_group() returns NULL, because no cpu there intersects with tsk_cpus_allowed.
3. In select_task_rq(), the fallback cpu is chosen, even though an idle cpu existed.

So my concern is: although select_task_rq() checks tsk_cpus_allowed(), you might end up choosing a different path through the sched domains compared to the code without this patch, as shown above.

In short, without the "if (new_cpu == -1)" condition we might get misled into doing unnecessary iterations over the wrong sched domains in select_task_rq_fair(). (Think about situations where not all the cpus of socket2 are disallowed by the task; then there will be more iterations in the wrong path of sched domains before exit, compared to what is shown above.)

Regards
Preeti U Murthy
Re: [PATCH] smpboot: calling smpboot_register_percpu_thread is unsafe during one CPU being down
On 12/11/2012 07:47 PM, Chuansheng Liu wrote:
>
> When one CPU is going down, and smpboot_register_percpu_thread is called,
> there is the race issue below:
>
> T1(CPUA):                        T2(CPUB):
> _cpu_down()                      smpboot_register_percpu_thread()
>   smpboot_park_threads()           ...
>   __stop_machine()                 __smpboot_create_thread(CPU_Dying)
>                                    [Currently, the being down CPU is online yet]
>     take_cpu_down()                smpboot_unpark_thread(CPU_Dying)
>       __cpu_disable()
>         native_cpu_disable()       Here the new kthread will get running
>                                    based on the CPU_Dying
>       set_cpu_online(cpu, false)
>     cpu_notify(CPU_DYING)
>
> After notified the CPU_DYING, the new created kthread for the dying CPU will
> be migrated to another CPU in migration_call().
>
> Here we need to use get_online_cpus()/put_online_cpus() when calling
> function smpboot_register_percpu_thread().
>
> Signed-off-by: liu chuansheng

Reviewed-by: Srivatsa S. Bhat

Regards,
Srivatsa S. Bhat

> ---
> kernel/smpboot.c |2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/smpboot.c b/kernel/smpboot.c
> index d6c5fc0..3fe708a 100644
> --- a/kernel/smpboot.c
> +++ b/kernel/smpboot.c
> @@ -266,6 +266,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
> 	unsigned int cpu;
> 	int ret = 0;
>
> +	get_online_cpus();
> 	mutex_lock(&smpboot_threads_lock);
> 	for_each_online_cpu(cpu) {
> 		ret = __smpboot_create_thread(plug_thread, cpu);
> @@ -278,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread)
> 	list_add(&plug_thread->list, &hotplug_threads);
> out:
> 	mutex_unlock(&smpboot_threads_lock);
> +	put_online_cpus();
> 	return ret;
> }
> EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread);
RE: [Xen-devel] [PATCH] xen/swiotlb: Exchange to contiguous memory for map_sg hook
> -Original Message- > From: Jan Beulich [mailto:jbeul...@suse.com] > Sent: Thursday, December 06, 2012 9:38 PM > To: Xu, Dongxiao > Cc: xen-de...@lists.xen.org; konrad.w...@oracle.com; > linux-kernel@vger.kernel.org > Subject: Re: [Xen-devel] [PATCH] xen/swiotlb: Exchange to contiguous memory > for map_sg hook > > >>> On 06.12.12 at 14:08, Dongxiao Xu wrote: > > While mapping sg buffers, checking to cross page DMA buffer is also > > needed. If the guest DMA buffer crosses page boundary, Xen should > > exchange contiguous memory for it. > > > > Besides, it is needed to backup the original page contents and copy it > > back after memory exchange is done. > > > > This fixes issues if device DMA into software static buffers, and in > > case the static buffer cross page boundary which pages are not > > contiguous in real hardware. > > > > Signed-off-by: Dongxiao Xu > > Signed-off-by: Xiantao Zhang > > --- > > drivers/xen/swiotlb-xen.c | 47 > > - > > 1 files changed, 46 insertions(+), 1 deletions(-) > > > > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c > > index 58db6df..e8f0cfb 100644 > > --- a/drivers/xen/swiotlb-xen.c > > +++ b/drivers/xen/swiotlb-xen.c > > @@ -461,6 +461,22 @@ xen_swiotlb_sync_single_for_device(struct device > > *hwdev, dma_addr_t dev_addr, } > > EXPORT_SYMBOL_GPL(xen_swiotlb_sync_single_for_device); > > > > +static bool > > +check_continguous_region(unsigned long vstart, unsigned long order) > > check_continguous_region(unsigned long vstart, unsigned int order) > > But - why do you need to do this check order based in the first place? > Checking > the actual length of the buffer should suffice. Thanks, the word "continguous" is mistyped in the function, it should be "contiguous".    check_contiguous_region() function is used to check whether pages are contiguous in hardware. The length only indicates whether the buffer crosses page boundary. 
If buffer crosses pages and they are not contiguous in hardware, we do need to exchange memory in Xen. > > > +{ > > + unsigned long prev_ma = xen_virt_to_bus((void *)vstart); > > + unsigned long next_ma; > > phys_addr_t or some such for both of them. Thanks. Should be dma_addr_t? > > > + int i; > > unsigned long Thanks. > > > + > > + for (i = 1; i < (1 << order); i++) { > > 1UL Thanks. > > > + next_ma = xen_virt_to_bus((void *)(vstart + i * PAGE_SIZE)); > > + if (next_ma != prev_ma + PAGE_SIZE) > > + return false; > > + prev_ma = next_ma; > > + } > > + return true; > > +} > > + > > /* > > * Map a set of buffers described by scatterlist in streaming mode for > DMA. > > * This is the scatter-gather version of the above > > xen_swiotlb_map_page @@ -489,7 +505,36 @@ > > xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist > > *sgl, > > > > for_each_sg(sgl, sg, nelems, i) { > > phys_addr_t paddr = sg_phys(sg); > > - dma_addr_t dev_addr = xen_phys_to_bus(paddr); > > + unsigned long vstart, order; > > + dma_addr_t dev_addr; > > + > > + /* > > +* While mapping sg buffers, checking to cross page DMA buffer > > +* is also needed. If the guest DMA buffer crosses page > > +* boundary, Xen should exchange contiguous memory for it. > > +* Besides, it is needed to backup the original page contents > > +* and copy it back after memory exchange is done. 
> > +*/ > > + if (range_straddles_page_boundary(paddr, sg->length)) { > > + vstart = (unsigned long)__va(paddr & PAGE_MASK); > > + order = get_order(sg->length + (paddr & ~PAGE_MASK)); > > + if (!check_continguous_region(vstart, order)) { > > + unsigned long buf; > > + buf = __get_free_pages(GFP_KERNEL, order); > > + memcpy((void *)buf, (void *)vstart, > > + PAGE_SIZE * (1 << order)); > > + if (xen_create_contiguous_region(vstart, order, > > + fls64(paddr))) { > > + free_pages(buf, order); > > + return 0; > > + } > > + memcpy((void *)vstart, (void *)buf, > > + PAGE_SIZE * (1 << order)); > > + free_pages(buf, order); > > + } > > + } > > + > > + dev_addr = xen_phys_to_bus(paddr); > > > > if (swiotlb_force || > > !dma_capable(hwdev, dev_addr, sg->length) || > > How about swiotlb_map_page() (for the compound page case)? Yes! This should also need similar handling. One thing needs further consideration is that, the above approach introdu
[GIT PULL] please pull infiniband.git
Hi Linus, Please pull from git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband.git tags/rdma-for-linus First batch of InfiniBand/RDMA changes for the 3.8 merge window: - A good chunk of Bart Van Assche's SRP fixes - UAPI disintegration from David Howells - mlx4 support for "64-byte CQE" hardware feature from Or Gerlitz - Other miscellaneous fixes Alan Cox (2): IB/ipath: Remove unreachable code RDMA/amsol1100: Fix missing break Bart Van Assche (14): IB/srp: Increase block layer timeout IB/srp: Eliminate state SRP_TARGET_CONNECTING IB/srp: Keep processing commands during host removal IB/srp: Simplify SCSI error handling IB/srp: Introduce srp_handle_qp_err() IB/srp: Process all error completions IB/srp: Suppress superfluous error messages IB/srp: Introduce the helper function srp_remove_target() IB/srp: Eliminate state SRP_TARGET_DEAD IB/srp: Document sysfs attributes srp_transport: Fix attribute registration srp_transport: Simplify attribute initialization code srp_transport: Document sysfs attributes IB/srp: Allow SRP disconnect through sysfs David Howells (1): UAPI: (Scripted) Disintegrate include/rdma Ishai Rabinovitz (1): IB/srp: destroy and recreate QP and CQs when reconnecting Jack Morgenstein (2): IB/mlx4: Fix spinlock order to avoid lockdep warnings mlx4_core: Fix potential deadlock in mlx4_eq_int() Julia Lawall (3): RDMA/nes: Use WARN() RDMA/cxgb4: use WARN RDMA/cxgb3: use WARN Or Gerlitz (1): mlx4: 64-byte CQE/EQE support Roland Dreier (4): Merge branches 'cxgb4', 'misc', 'mlx4', 'nes' and 'uapi' into for-next Merge branches 'cma' and 'mlx4' into for-next Merge branch 'srp' into for-next Merge branch 'nes' into for-next Tatyana Nikolova (7): RDMA/nes: Fix incorrect address of IP header RDMA/nes: Fix for unlinking skbs from empty list RDMA/nes: Fix for sending fpdus in order to hardware RDMA/nes: Fix for incorrect multicast address in the perfect filter table RDMA/nes: Fix for BUG_ON due to adding already-pending timer RDMA/nes: Fix for terminate 
timer crash RDMA/nes: Fix for crash when registering zero length MR for CQ Vu Pham (1): IB/srp: send disconnect request without waiting for CM timewait exit shefty (1): RDMA/cm: Change return value from find_gid_port() Documentation/ABI/stable/sysfs-driver-ib_srp | 156 Documentation/ABI/stable/sysfs-transport-srp | 19 ++ drivers/infiniband/core/cma.c |9 +- drivers/infiniband/hw/amso1100/c2_ae.c |1 + drivers/infiniband/hw/cxgb3/iwch_cm.c |6 +- drivers/infiniband/hw/cxgb4/cm.c |6 +- drivers/infiniband/hw/ipath/ipath_init_chip.c | 10 - drivers/infiniband/hw/mlx4/cm.c|4 +- drivers/infiniband/hw/mlx4/cq.c| 34 ++- drivers/infiniband/hw/mlx4/main.c | 27 +- drivers/infiniband/hw/mlx4/mlx4_ib.h |1 + drivers/infiniband/hw/mlx4/user.h | 12 +- drivers/infiniband/hw/nes/nes.h|1 + drivers/infiniband/hw/nes/nes_cm.c | 32 +-- drivers/infiniband/hw/nes/nes_hw.c |9 +- drivers/infiniband/hw/nes/nes_mgt.c| 42 ++-- drivers/infiniband/hw/nes/nes_nic.c| 13 +- drivers/infiniband/hw/nes/nes_verbs.c |9 +- drivers/infiniband/ulp/srp/ib_srp.c| 314 ++-- drivers/infiniband/ulp/srp/ib_srp.h| 11 +- drivers/net/ethernet/mellanox/mlx4/cmd.c | 11 +- drivers/net/ethernet/mellanox/mlx4/en_cq.c |2 +- drivers/net/ethernet/mellanox/mlx4/en_netdev.c |1 + drivers/net/ethernet/mellanox/mlx4/en_rx.c |5 +- drivers/net/ethernet/mellanox/mlx4/en_tx.c |5 +- drivers/net/ethernet/mellanox/mlx4/eq.c| 36 ++- drivers/net/ethernet/mellanox/mlx4/fw.c| 30 ++- drivers/net/ethernet/mellanox/mlx4/fw.h|1 + drivers/net/ethernet/mellanox/mlx4/main.c | 38 ++- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |1 + drivers/scsi/scsi_transport_srp.c | 51 ++-- include/linux/mlx4/device.h| 21 ++ include/rdma/Kbuild|6 - include/rdma/rdma_netlink.h| 36 +-- include/scsi/scsi_transport_srp.h |8 + include/uapi/rdma/Kbuild |6 + include/{ => uapi}/rdma/ib_user_cm.h |0 include/{ => uapi}/rdma/ib_user_mad.h |0 include/{ => uapi}/rdma/ib_user_sa.h |0 include/{ => uapi}/rdma/ib_user_verbs.h|0 include/uapi/rdma/rdma_netlink.h | 37 +++ inc
[GIT PULL] hwmon updates for 3.8-rc1
Hi Linus, Please pull hwmon updates for Linux 3.8-rc1 from signed tag: git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git hwmon-for-linus Thanks, Guenter -- The following changes since commit 9489e9dcae718d5fde988e4a684a0f55b5f94d17: Linux 3.7-rc7 (2012-11-25 17:59:19 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git tags/hwmon-for-linus for you to fetch changes up to 44f751cee1b4baef9e3b49c6bd954f8b12b097a6: hwmon: (da9055) Fix chan_mux[DA9055_ADC_ADCIN3] setting (2012-12-05 10:55:55 -0800) New driver: DA9055 Added/improved support for new chips in existing drivers: Z650/670, N550/570, ADS7830, AMD 16h family Ashish Jangam (1): hwmon: DA9055 HWMON driver Axel Lin (2): hwmon: da9052: Use da9052_reg_update for rmw operations hwmon: (da9055) Fix chan_mux[DA9055_ADC_ADCIN3] setting Boris Ostrovsky (1): x86,AMD: Power driver support for AMD's family 16h processors Guenter Roeck (4): hwmon: (coretemp) Drop dependency on PCI for TjMax detection on Atom CPUs hwmon: (coretemp) Use model table instead of if/else to identify CPU models hwmon: (coretemp) Drop N4xx, N5xx, D4xx, D5xx CPUs from tjmax table hwmon: (coretemp) List TjMax for Z650/670 and N550/570 Guillaume Roguez (1): hwmon: (ads7828) add support for ADS7830 Vivien Didelot (1): hwmon: (ads7828) driver cleanup Wei Yongjun (1): hwmon: (ina2xx) use module_i2c_driver to simplify the code Documentation/hwmon/ads7828 | 46 +++-- Documentation/hwmon/coretemp |2 + Documentation/hwmon/da9055| 47 + drivers/hwmon/Kconfig | 19 +- drivers/hwmon/Makefile|1 + drivers/hwmon/ads7828.c | 247 ++-- drivers/hwmon/coretemp.c | 60 +++--- drivers/hwmon/da9052-hwmon.c | 27 +-- drivers/hwmon/da9055-hwmon.c | 336 + drivers/hwmon/fam15h_power.c |4 + drivers/hwmon/ina2xx.c| 13 +- include/linux/platform_data/ads7828.h | 29 +++ 12 files changed, 607 insertions(+), 224 deletions(-) create mode 100644 Documentation/hwmon/da9055 create mode 100644 
drivers/hwmon/da9055-hwmon.c create mode 100644 include/linux/platform_data/ads7828.h
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On 2012/12/11 11:48, Simon Jeons wrote: > On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: >> On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: >>> On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > Oh, it will be put back to the lru list during migration. So does your "some > time" mean before calling check_new_page? Yes, until the next check_new_page() whenever that is. If the migration works it will be earlier, otherwise later. >>> >>> But I can't figure out any page reclaim path that checks if the page is set >>> PG_hwpoison; can poisoned pages be reclaimed? >> >> The only way to reclaim a page is to free and reallocate it. > > Then why is there no check in the reclaim path to avoid reclaiming > poisoned pages? > > -Simon Hi Simon, If the page is free, it will be set PG_hwpoison, and soft_offline_page() is done. When the page is allocated later, check_new_page() will find the poisoned page and isolate the whole buddy block (just drop the block). If the page is not free, soft_offline_page() tries to free it first; if that fails, it will migrate the page, but the page is still in the LRU list after migration: migrate_pages() unmap_and_move() if (rc != -EAGAIN) { ... putback_lru_page(page); } We can use lru_add_drain_all() to drain the lru pagevec; at last free_hot_cold_page() will be called, and free_pages_prepare() checks the poisoned pages: free_pages_prepare() free_pages_check() bad_page() Is this right, Andi? Thanks Xishi Qiu >> >> -Andi > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
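The flow described above can be modelled in a few lines of user-space C (a toy model, not kernel code: the PG_HWPOISON bit value and the helper name are illustrative): reclaim itself never inspects the poison bit; instead, any path that hands a page back out goes through a check_new_page()-style gate that refuses poisoned pages.

```c
#include <assert.h>

/* Toy model of the check_new_page()/free_pages_check() gate: the
 * allocator refuses to hand out a page whose poison flag is set.
 * The bit position of PG_HWPOISON here is illustrative only. */
#define PG_HWPOISON (1UL << 0)

/* Return the index of the first non-poisoned page, or -1 if none. */
static long alloc_first_good(const unsigned long *flags, long npages)
{
	long i;

	for (i = 0; i < npages; i++)
		if (!(flags[i] & PG_HWPOISON))
			return i;	/* usable page: the bad_page() path is not taken */
	return -1;			/* every candidate was poisoned */
}
```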
Re: [PATCH 02/18] sched: fix find_idlest_group mess logical
Hi Alex, On 12/11/2012 10:59 AM, Alex Shi wrote: > On 12/11/2012 01:08 PM, Preeti U Murthy wrote: >> Hi Alex, >> >> On 12/10/2012 01:52 PM, Alex Shi wrote: >>> There is 4 situations in the function: >>> 1, no task allowed group; >>> so min_load = ULONG_MAX, this_load = 0, idlest = NULL >>> 2, only local group task allowed; >>> so min_load = ULONG_MAX, this_load assigned, idlest = NULL >>> 3, only non-local task group allowed; >>> so min_load assigned, this_load = 0, idlest != NULL >>> 4, local group + another group are task allowed. >>> so min_load assigned, this_load assigned, idlest != NULL >>> >>> Current logical will return NULL in first 3 kinds of scenarios. >>> And still return NULL, if idlest group is heavier then the >>> local group in the 4th situation. >>> >>> Actually, I thought groups in situation 2,3 are also eligible to host >>> the task. And in 4th situation, agree to bias toward local group. >>> So, has this patch. >>> >>> Signed-off-by: Alex Shi >>> --- >>> kernel/sched/fair.c | 12 +--- >>> 1 files changed, 9 insertions(+), 3 deletions(-) >>> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >>> index df99456..b40bc2b 100644 >>> --- a/kernel/sched/fair.c >>> +++ b/kernel/sched/fair.c >>> @@ -2953,6 +2953,7 @@ find_idlest_group(struct sched_domain *sd, struct >>> task_struct *p, >>> int this_cpu, int load_idx) >>> { >>> struct sched_group *idlest = NULL, *group = sd->groups; >>> + struct sched_group *this_group = NULL; >>> unsigned long min_load = ULONG_MAX, this_load = 0; >>> int imbalance = 100 + (sd->imbalance_pct-100)/2; >>> >>> @@ -2987,14 +2988,19 @@ find_idlest_group(struct sched_domain *sd, struct >>> task_struct *p, >>> >>> if (local_group) { >>> this_load = avg_load; >>> - } else if (avg_load < min_load) { >>> + this_group = group; >>> + } >>> + if (avg_load < min_load) { >>> min_load = avg_load; >>> idlest = group; >>> } >>> } while (group = group->next, group != sd->groups); >>> >>> - if (!idlest || 100*this_load < 
imbalance*min_load) >>> - return NULL; >>> + if (this_group && idlest != this_group) >>> + /* Bias toward our group again */ >>> + if (100*this_load < imbalance*min_load) >>> + idlest = this_group; >> >> If the idlest group is heavier than this_group(or to put it better if >> the difference in the loads of the local group and idlest group is less >> than a threshold,it means there is no point moving the load from the >> local group) you return NULL,that immediately means this_group is chosen >> as the candidate group for the task to run,one does not have to >> explicitly return that. > > In situation 4, this_group is not NULL. True.The return value of find_idlest_group() indicates that there is no other idle group other than the local group(the group to which cpu belongs to). it does not indicate that there is no host group for the task.If this is the case,select_task_rq_fair() falls back to the group(sd->child) to which the cpu chosen in the previous iteration belongs to,This is nothing but this_group in the current iteration. Regards Preeti U Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/4] kprobes/powerpc: Do not disable External interrupts during single step
On 12/03/2012 08:37 PM, Suzuki K. Poulose wrote: From: Suzuki K. Poulose External/Decrementer exceptions have lower priority than the Debug Exception. So, we don't have to disable the External interrupts before a single step. However, on BookE, the Critical Input Exception (CE) has higher priority than a Debug Exception. Hence we mask it. Signed-off-by: Suzuki K. Poulose Cc: Sebastian Andrzej Siewior Cc: Ananth N Mavinakaynahalli Cc: Kumar Gala Cc: linuxppc-...@ozlabs.org --- arch/powerpc/kernel/kprobes.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c index e88c643..4901b34 100644 --- a/arch/powerpc/kernel/kprobes.c +++ b/arch/powerpc/kernel/kprobes.c @@ -104,13 +104,13 @@ void __kprobes arch_remove_kprobe(struct kprobe *p) static void __kprobes prepare_singlestep(struct kprobe *p, struct pt_regs *regs) { - /* We turn off async exceptions to ensure that the single step will -* be for the instruction we have the kprobe on, if we dont its -* possible we'd get the single step reported for an exception handler -* like Decrementer or External Interrupt */ - regs->msr &= ~MSR_EE; regs->msr |= MSR_SINGLESTEP; #ifdef CONFIG_PPC_ADV_DEBUG_REGS + /* +* We turn off Critical Input Exception(CE) to ensure that the single +* step will be for the instruction we have the probe on; if we don't, +* it is possible we'd get the single step reported for CE. +*/ regs->msr &= ~MSR_CE; mtspr(SPRN_DBCR0, mfspr(SPRN_DBCR0) | DBCR0_IC | DBCR0_IDM); #ifdef CONFIG_PPC_47x Ben, Kumar, Could you please review this patch? Thanks Suzuki
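The MSR manipulation the patch ends up with can be sketched as a pure function (the bit positions below are made up, not the real PowerPC MSR layout, and prep_singlestep_msr is a hypothetical name): single-step is always requested, MSR_EE is left untouched, and only BookE masks Critical Input exceptions.

```c
#include <assert.h>

/* Illustrative bit positions only -- not the real PowerPC MSR layout. */
#define MSR_EE		(1UL << 15)	/* external interrupt enable */
#define MSR_SINGLESTEP	(1UL << 10)	/* single-step request */
#define MSR_CE		(1UL << 17)	/* critical input enable (BookE) */

/* Sketch of prepare_singlestep() after the patch: MSR_EE is no longer
 * cleared; only BookE masks CE, because CE outranks the debug
 * exception on BookE. */
static unsigned long prep_singlestep_msr(unsigned long msr, int booke)
{
	msr |= MSR_SINGLESTEP;		/* request a single-step trap */
	if (booke)
		msr &= ~MSR_CE;		/* mask CE so the step is not stolen */
	return msr;			/* external interrupts stay enabled */
}
```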
[PATCH 2/2] net: remove obsolete simple_strto
This patch removes the redundant occurrences of simple_strto Signed-off-by: Abhijit Pawar --- net/core/netpoll.c|1 - net/mac80211/debugfs_sta.c|1 - net/netfilter/nf_conntrack_core.c |1 - 3 files changed, 0 insertions(+), 3 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 12c129f..3151acf 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -706,7 +706,6 @@ int netpoll_parse_options(struct netpoll *np, char *opt) *delim = 0; if (*cur == ' ' || *cur == '\t') np_info(np, "warning: whitespace is not allowed\n"); - np->remote_port = simple_strtol(cur, NULL, 10); if (kstrtou16(cur, 10, &np->remote_port)) goto parse_failed; cur = delim; diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c index 0dedb4b..6fb1168 100644 --- a/net/mac80211/debugfs_sta.c +++ b/net/mac80211/debugfs_sta.c @@ -220,7 +220,6 @@ static ssize_t sta_agg_status_write(struct file *file, const char __user *userbu } else return -EINVAL; - tid = simple_strtoul(buf, NULL, 0); ret = kstrtoul(buf, 0, &tid); if (ret) return ret; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 37d9e62..08cdc71 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1422,7 +1422,6 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp) if (!nf_conntrack_htable_size) return param_set_uint(val, kp); - hashsize = simple_strtoul(val, NULL, 0); rc = kstrtouint(val, 0, &hashsize); if (rc) return rc; -- 1.7.7.6
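The reason the kernel deprecated simple_strto* in favor of kstrto* is error handling: simple_strtoul() silently ignores trailing garbage and overflow, while kstrto* reports them. A minimal user-space analogue of the kstrtouint() contract (parse_uint is a made-up name for illustration; in the kernel you would call kstrtouint() directly):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* User-space analogue of kstrtouint(): unlike simple_strtoul(), it
 * fails on empty input, trailing junk, and overflow instead of
 * silently returning a partially parsed value. */
static int parse_uint(const char *s, unsigned int base, unsigned int *res)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(s, &end, base);
	if (end == s || *end != '\0')
		return -EINVAL;		/* nothing parsed, or trailing junk */
	if (errno == ERANGE || val > UINT_MAX)
		return -ERANGE;		/* out of range for unsigned int */
	*res = (unsigned int)val;
	return 0;
}
```

With simple_strtoul(), "12abc" would quietly parse as 12; here it is rejected, which is exactly the behavior the kstrto* conversions above gain.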
Re: [PATCH RESEND RESEND] net: remove obsolete simple_strto
On 12/11/2012 10:19 AM, David Miller wrote: > From: Abhijit Pawar > Date: Tue, 11 Dec 2012 09:04:20 +0530 > >> This patch replaces the obsolete simple_strto with kstrto >> >> Signed-off-by: Abhijit Pawar > > You can't submit replacement patches for ones which I have already > applied. > > Patches I apply are permanently applied, and therefore you must submit > changes relative to the ones I've applied already. > I am sorry for creating this confusion. I have created and sent a new patch which you can apply on top of the old one to fix the issues. -- - Abhijit
Re: [PATCH 02/18] sched: fix find_idlest_group mess logical
On 12/11/2012 01:08 PM, Preeti U Murthy wrote: > Hi Alex, > > On 12/10/2012 01:52 PM, Alex Shi wrote: >> There is 4 situations in the function: >> 1, no task allowed group; >> so min_load = ULONG_MAX, this_load = 0, idlest = NULL >> 2, only local group task allowed; >> so min_load = ULONG_MAX, this_load assigned, idlest = NULL >> 3, only non-local task group allowed; >> so min_load assigned, this_load = 0, idlest != NULL >> 4, local group + another group are task allowed. >> so min_load assigned, this_load assigned, idlest != NULL >> >> Current logical will return NULL in first 3 kinds of scenarios. >> And still return NULL, if idlest group is heavier then the >> local group in the 4th situation. >> >> Actually, I thought groups in situation 2,3 are also eligible to host >> the task. And in 4th situation, agree to bias toward local group. >> So, has this patch. >> >> Signed-off-by: Alex Shi >> --- >> kernel/sched/fair.c | 12 +--- >> 1 files changed, 9 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index df99456..b40bc2b 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -2953,6 +2953,7 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, >>int this_cpu, int load_idx) >> { >> struct sched_group *idlest = NULL, *group = sd->groups; >> +struct sched_group *this_group = NULL; >> unsigned long min_load = ULONG_MAX, this_load = 0; >> int imbalance = 100 + (sd->imbalance_pct-100)/2; >> >> @@ -2987,14 +2988,19 @@ find_idlest_group(struct sched_domain *sd, struct >> task_struct *p, >> >> if (local_group) { >> this_load = avg_load; >> -} else if (avg_load < min_load) { >> +this_group = group; >> +} >> +if (avg_load < min_load) { >> min_load = avg_load; >> idlest = group; >> } >> } while (group = group->next, group != sd->groups); >> >> -if (!idlest || 100*this_load < imbalance*min_load) >> -return NULL; >> +if (this_group && idlest != this_group) >> +/* Bias toward our group again */ >> +if 
(100*this_load < imbalance*min_load) >> +idlest = this_group; > > If the idlest group is heavier than this_group(or to put it better if > the difference in the loads of the local group and idlest group is less > than a threshold,it means there is no point moving the load from the > local group) you return NULL,that immediately means this_group is chosen > as the candidate group for the task to run,one does not have to > explicitly return that. In situation 4, this_group is not NULL. > > Let me explain: > find_idlest_group()-if it returns NULL to mark your case4,it means there > is no idler group than the group to which this_cpu belongs to, at that > level of sched domain.Which is fair enough. > > So now the question is under such a circumstance which is the idlest > group so far.It is the group containing this_cpu,i.e.this_group.After > this sd->child is chosen which is nothing but this_group(sd hierarchy > moves towards the cpu it belongs to). Again here the idlest group search > begins. > >> + >> return idlest; >> } >> >> > Regards > Preeti U Murthy > -- Thanks Alex -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 01/18] sched: select_task_rq_fair clean up
On 12/11/2012 12:23 PM, Preeti U Murthy wrote: > Hi Alex, > > On 12/10/2012 01:52 PM, Alex Shi wrote: >> It is impossible to miss a task allowed cpu in an eligible group. > > The one thing I am concerned with here is if there is a possibility of > the task changing its tsk_cpus_allowed() while this code is running. > > i.e. find_idlest_group() finds an idle group, then the tsk_cpus_allowed() > for the task changes, perhaps by the user himself, which might not include > the cpus in the idle group. After this find_idlest_cpu() is called. I mean > a race condition in short. Then we might not have an eligible cpu in that > group, right? Your worry makes sense, but the code handles this situation: in select_task_rq(), the allowed cpus are checked again; if the new cpu is not allowed, it falls back to the old cpu. > >> And since find_idlest_group only returns a different group which >> excludes the old cpu, it's also impossible to find a new cpu same as the old >> cpu. > > This I agree with. > >> Signed-off-by: Alex Shi >> --- >> kernel/sched/fair.c |5 - >> 1 files changed, 0 insertions(+), 5 deletions(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 59e072b..df99456 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -3150,11 +3150,6 @@ select_task_rq_fair(struct task_struct *p, int >> sd_flag, int wake_flags) >> } >> >> new_cpu = find_idlest_cpu(group, p, cpu); >> -if (new_cpu == -1 || new_cpu == cpu) { >> -/* Now try balancing at a lower domain level of cpu */ >> -sd = sd->child; >> -continue; >> -} >> >> /* Now try balancing at a lower domain level of new_cpu */ >> cpu = new_cpu; >> > Regards > Preeti U Murthy > -- Thanks Alex
linux-next: manual merge of the akpm tree with Linus' tree
Hi Andrew, Today's linux-next merge of the akpm tree got a conflict in include/linux/gfp.h between commit caf491916b1c ("Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage") from Linus' tree and commit "mm: add a reminder comment for __GFP_BITS_SHIFT" from the akpm tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwell s...@canb.auug.org.au diff --cc include/linux/gfp.h index 976a8e3,c0fb4d8..000 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@@ -30,11 -30,11 +30,12 @@@ struct vm_area_struct #define ___GFP_HARDWALL 0x2u #define ___GFP_THISNODE 0x4u #define ___GFP_RECLAIMABLE0x8u -#define ___GFP_NOTRACK0x10u -#define ___GFP_OTHER_NODE 0x20u -#define ___GFP_WRITE 0x40u -#define ___GFP_KMEMCG 0x80u +#define ___GFP_NOTRACK0x20u +#define ___GFP_NO_KSWAPD 0x40u +#define ___GFP_OTHER_NODE 0x80u +#define ___GFP_WRITE 0x100u +#define ___GFP_KMEMCG 0x200u + /* If the above are modified, __GFP_BITS_SHIFT may need updating */ /* * GFP bitmasks..
linux-next: manual merge of the akpm tree with Linus' tree
Hi Andrew, Today's linux-next merge of the akpm tree got a conflict in include/linux/gfp.h between commit caf491916b1c ("Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage") from Linus' tree and commit "mm: add a __GFP_KMEMCG flag" from the akpm tree. I fixed it up (see below) and can carry the fix as necessary (no action is required). -- Cheers, Stephen Rothwells...@canb.auug.org.au diff --cc include/linux/gfp.h index 31e8041,5520344..000 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@@ -30,10 -30,10 +30,11 @@@ struct vm_area_struct #define ___GFP_HARDWALL 0x2u #define ___GFP_THISNODE 0x4u #define ___GFP_RECLAIMABLE0x8u -#define ___GFP_NOTRACK0x10u -#define ___GFP_OTHER_NODE 0x20u -#define ___GFP_WRITE 0x40u -#define ___GFP_KMEMCG 0x80u +#define ___GFP_NOTRACK0x20u +#define ___GFP_NO_KSWAPD 0x40u +#define ___GFP_OTHER_NODE 0x80u +#define ___GFP_WRITE 0x100u ++#define ___GFP_KMEMCG 0x200u /* * GFP bitmasks.. @@@ -86,17 -86,16 +87,17 @@@ #define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) /* Page is reclaimable */ #define __GFP_NOTRACK ((__force gfp_t)___GFP_NOTRACK) /* Don't track with kmemcheck */ +#define __GFP_NO_KSWAPD ((__force gfp_t)___GFP_NO_KSWAPD) #define __GFP_OTHER_NODE ((__force gfp_t)___GFP_OTHER_NODE) /* On behalf of other node */ #define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) /* Allocator intends to dirty page */ - + #define __GFP_KMEMCG ((__force gfp_t)___GFP_KMEMCG) /* Allocation comes from a memcg-accounted resource */ /* * This may seem redundant, but it's a way of annotating false positives vs. * allocations that simply cannot be supported (e.g. page tables). 
*/ #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK) - #define __GFP_BITS_SHIFT 25 /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT 24 /* Room for N __GFP_FOO bits */ ++#define __GFP_BITS_SHIFT 26 /* Room for N __GFP_FOO bits */ #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /* This equals 0, but use constants in case they ever change */
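The invariant both merge fixups preserve is that __GFP_BITS_SHIFT must be at least the position of the highest ___GFP_* flag bit plus one, since __GFP_BITS_MASK is built as (1 << __GFP_BITS_SHIFT) - 1. A generic check of that relationship (bits_needed is a hypothetical helper for illustration, not a kernel function):

```c
#include <assert.h>

/* Position of the highest set bit, plus one: the minimum value
 * __GFP_BITS_SHIFT must have so that __GFP_BITS_MASK covers a flag
 * of the given value. Hypothetical helper, not kernel code. */
static unsigned int bits_needed(unsigned long highest_flag)
{
	unsigned int n = 0;

	while (highest_flag) {
		highest_flag >>= 1;
		n++;
	}
	return n;
}
```

So if adding a flag pushes the top bit from position 24 to position 25, the shift must go to 26, which is what the resolved hunk above does.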
[PATCH] smpboot: calling smpboot_register_percpu_thread is unsafe during one CPU being down
When one CPU is going down and smpboot_register_percpu_thread() is called, there is the race below: T1(CPUA): T2(CPUB): _cpu_down()smpboot_register_percpu_thread() smpboot_park_threads() ... __stop_machine() __smpboot_create_thread(CPU_Dying) [At this point, the CPU going down is still online] take_cpu_down() smpboot_unpark_thread(CPU_Dying) __cpu_disable() native_cpu_disable() Here the new kthread is allowed to run on the dying CPU set_cpu_online(cpu, false) cpu_notify(CPU_DYING) After the CPU_DYING notification, the newly created kthread for the dying CPU will be migrated to another CPU in migration_call(). So we need to use get_online_cpus()/put_online_cpus() in smpboot_register_percpu_thread(). Signed-off-by: liu chuansheng --- kernel/smpboot.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/kernel/smpboot.c b/kernel/smpboot.c index d6c5fc0..3fe708a 100644 --- a/kernel/smpboot.c +++ b/kernel/smpboot.c @@ -266,6 +266,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread) unsigned int cpu; int ret = 0; + get_online_cpus(); mutex_lock(&smpboot_threads_lock); for_each_online_cpu(cpu) { ret = __smpboot_create_thread(plug_thread, cpu); @@ -278,6 +279,7 @@ int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread) list_add(&plug_thread->list, &hotplug_threads); out: mutex_unlock(&smpboot_threads_lock); + put_online_cpus(); return ret; } EXPORT_SYMBOL_GPL(smpboot_register_percpu_thread); -- 1.7.0.4
Re: [PATCH 02/18] sched: fix find_idlest_group mess logical
Hi Alex, On 12/10/2012 01:52 PM, Alex Shi wrote: > There is 4 situations in the function: > 1, no task allowed group; > so min_load = ULONG_MAX, this_load = 0, idlest = NULL > 2, only local group task allowed; > so min_load = ULONG_MAX, this_load assigned, idlest = NULL > 3, only non-local task group allowed; > so min_load assigned, this_load = 0, idlest != NULL > 4, local group + another group are task allowed. > so min_load assigned, this_load assigned, idlest != NULL > > Current logical will return NULL in first 3 kinds of scenarios. > And still return NULL, if idlest group is heavier then the > local group in the 4th situation. > > Actually, I thought groups in situation 2,3 are also eligible to host > the task. And in 4th situation, agree to bias toward local group. > So, has this patch. > > Signed-off-by: Alex Shi > --- > kernel/sched/fair.c | 12 +--- > 1 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index df99456..b40bc2b 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2953,6 +2953,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > int this_cpu, int load_idx) > { > struct sched_group *idlest = NULL, *group = sd->groups; > + struct sched_group *this_group = NULL; > unsigned long min_load = ULONG_MAX, this_load = 0; > int imbalance = 100 + (sd->imbalance_pct-100)/2; > > @@ -2987,14 +2988,19 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > > if (local_group) { > this_load = avg_load; > - } else if (avg_load < min_load) { > + this_group = group; > + } > + if (avg_load < min_load) { > min_load = avg_load; > idlest = group; > } > } while (group = group->next, group != sd->groups); > > - if (!idlest || 100*this_load < imbalance*min_load) > - return NULL; > + if (this_group && idlest != this_group) > + /* Bias toward our group again */ > + if (100*this_load < imbalance*min_load) > + idlest = this_group; If the idlest group is heavier 
than this_group(or to put it better if the difference in the loads of the local group and idlest group is less than a threshold,it means there is no point moving the load from the local group) you return NULL,that immediately means this_group is chosen as the candidate group for the task to run,one does not have to explicitly return that. Let me explain: find_idlest_group()-if it returns NULL to mark your case4,it means there is no idler group than the group to which this_cpu belongs to, at that level of sched domain.Which is fair enough. So now the question is under such a circumstance which is the idlest group so far.It is the group containing this_cpu,i.e.this_group.After this sd->child is chosen which is nothing but this_group(sd hierarchy moves towards the cpu it belongs to). Again here the idlest group search begins. > + > return idlest; > } > > Regards Preeti U Murthy -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
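The bias test this thread keeps coming back to is plain fixed-point percentage arithmetic. A standalone sketch (prefer_local is a made-up name; the expression mirrors the 100*this_load < imbalance*min_load comparison in find_idlest_group(), with imbalance derived from sd->imbalance_pct as in the patch):

```c
#include <assert.h>

/* Sketch of the local-group bias test in find_idlest_group(): the
 * remote "idlest" group wins only if its load undercuts the local
 * group's load by more than the imbalance margin. With the common
 * imbalance_pct of 125, the margin works out to 112, i.e. the local
 * group is kept unless the remote group is ~12% lighter. */
static int prefer_local(unsigned long this_load, unsigned long min_load,
			unsigned int imbalance_pct)
{
	unsigned int imbalance = 100 + (imbalance_pct - 100) / 2;

	return 100 * this_load < imbalance * min_load;
}
```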
[PATCH v3 4/5][RESEND] page_alloc: Make movablecore_map has higher priority
If kernelcore or movablecore is specified at the same time with movablecore_map, movablecore_map will have higher priority to be satisfied. This patch will make find_zone_movable_pfns_for_nodes() calculate zone_movable_pfn[] with the limit from zone_movable_limit[]. change log: Move find_usable_zone_for_movable() to free_area_init_nodes() so that sanitize_zone_movable_limit() in patch 3 could use initialized movable_zone. Reported-by: Wu Jianguo Signed-off-by: Tang Chen Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- mm/page_alloc.c | 28 +--- 1 files changed, 25 insertions(+), 3 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 52c368e..00fa67d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4839,9 +4839,17 @@ static void __init find_zone_movable_pfns_for_nodes(void) required_kernelcore = max(required_kernelcore, corepages); } - /* If kernelcore was not specified, there is no ZONE_MOVABLE */ - if (!required_kernelcore) + /* +* If neither kernelcore/movablecore nor movablecore_map is specified, +* there is no ZONE_MOVABLE. But if movablecore_map is specified, the +* start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[]. +*/ + if (!required_kernelcore) { + if (movablecore_map.nr_map) + memcpy(zone_movable_pfn, zone_movable_limit, + sizeof(zone_movable_pfn)); goto out; + } /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; @@ -4871,10 +4879,24 @@ restart: for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { unsigned long size_pages; + /* +* Find more memory for kernelcore in +* [zone_movable_pfn[nid], zone_movable_limit[nid]). 
+*/ start_pfn = max(start_pfn, zone_movable_pfn[nid]); if (start_pfn >= end_pfn) continue; + if (zone_movable_limit[nid]) { + end_pfn = min(end_pfn, zone_movable_limit[nid]); + /* No range left for kernelcore in this node */ + if (start_pfn >= end_pfn) { + zone_movable_pfn[nid] = + zone_movable_limit[nid]; + break; + } + } + /* Account for what is only usable for kernelcore */ if (start_pfn < usable_startpfn) { unsigned long kernel_pages; @@ -4934,12 +4956,12 @@ restart: if (usable_nodes && required_kernelcore > usable_nodes) goto restart; +out: /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ for (nid = 0; nid < MAX_NUMNODES; nid++) zone_movable_pfn[nid] = roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); -out: /* restore the node_state */ node_states[N_HIGH_MEMORY] = saved_node_state; } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
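The clamping this patch adds to the kernelcore search reduces to capping a pfn range at the node's movable limit. A standalone sketch (kernelcore_range_end is a hypothetical name; the logic mirrors the end_pfn = min(end_pfn, zone_movable_limit[nid]) step above, with 0 meaning "no limit for this node"):

```c
#include <assert.h>

/* Cap the kernelcore search range [start_pfn, end_pfn) at the node's
 * ZONE_MOVABLE lower bound. Returns the capped end pfn, or -1 when
 * nothing is left for kernelcore on this node. Hypothetical helper
 * for illustration, not kernel code. */
static long kernelcore_range_end(unsigned long start_pfn,
				 unsigned long end_pfn,
				 unsigned long limit /* 0 = no limit */)
{
	if (limit && end_pfn > limit)
		end_pfn = limit;	/* do not eat into ZONE_MOVABLE */
	return end_pfn > start_pfn ? (long)end_pfn : -1;
}
```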
[PATCH v3 3/5][RESEND] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
This patch introduces a new array zone_movable_limit[] to store the ZONE_MOVABLE limit from movablecore_map boot option for all nodes. The function sanitize_zone_movable_limit() will find out to which node the ranges in movable_map.map[] belongs, and calculates the low boundary of ZONE_MOVABLE for each node. change log: Do find_usable_zone_for_movable() to initialize movable_zone so that sanitize_zone_movable_limit() could use it. Reported-by: Wu Jianguo Signed-off-by: Tang Chen Signed-off-by: Liu Jiang Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- mm/page_alloc.c | 79 ++- 1 files changed, 78 insertions(+), 1 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1c91d16..52c368e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES]; static unsigned long __initdata required_kernelcore; static unsigned long __initdata required_movablecore; static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES]; +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES]; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ int movable_zone; @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid, return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); } +/** + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array. + * + * zone_movable_limit is initialized as 0. This function will try to get + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and + * assigne them to zone_movable_limit. + * zone_movable_limit[nid] == 0 means no limit for the node. 
+ * + * Note: Each range is represented as [start_pfn, end_pfn) + */ +static void __meminit sanitize_zone_movable_limit(void) +{ + int map_pos = 0, i, nid; + unsigned long start_pfn, end_pfn; + + if (!movablecore_map.nr_map) + return; + + /* Iterate all ranges from minimum to maximum */ + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + /* +* If we have found lowest pfn of ZONE_MOVABLE of the node +* specified by user, just go on to check next range. +*/ + if (zone_movable_limit[nid]) + continue; + +#ifdef CONFIG_ZONE_DMA + /* Skip DMA memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA]; +#endif + +#ifdef CONFIG_ZONE_DMA32 + /* Skip DMA32 memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32]; +#endif + +#ifdef CONFIG_HIGHMEM + /* Skip lowmem if ZONE_MOVABLE is highmem. */ + if (zone_movable_is_highmem() && + start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]) + start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]; +#endif + + if (start_pfn >= end_pfn) + continue; + + while (map_pos < movablecore_map.nr_map) { + if (end_pfn <= movablecore_map.map[map_pos].start_pfn) + break; + + if (start_pfn >= movablecore_map.map[map_pos].end_pfn) { + map_pos++; + continue; + } + + /* +* The start_pfn of ZONE_MOVABLE is either the minimum +* pfn specified by movablecore_map, or 0, which means +* the node has no ZONE_MOVABLE. 
+*/ + zone_movable_limit[nid] = max(start_pfn, + movablecore_map.map[map_pos].start_pfn); + + break; + } + } +} + #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid, unsigned long zone_type, @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid, return zholes_size[zone_type]; } +static void __meminit sanitize_zone_movable_limit(void) +{ +} + #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat, @@ -4768,7 +4844,6 @@ static void __init find_zone_movable_pfns_for_nodes(void) goto out; /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - find_usable_zone_for_movable(); usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; restart: @@ -4923,6 +4998,8 @@
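The core of sanitize_zone_movable_limit() is an interval-intersection test between a node's memory range and a user-specified movablecore range. A standalone sketch (movable_lower_bound is a hypothetical name; it mirrors the max(start_pfn, map[i].start_pfn) step above, with 0 meaning "no ZONE_MOVABLE limit for this node"):

```c
#include <assert.h>

/* Given a node memory range [start, end) and a movablecore_map range
 * [m_start, m_end), return the node's ZONE_MOVABLE lower bound:
 * max(start, m_start) when the ranges intersect, 0 otherwise.
 * Hypothetical helper for illustration, not kernel code. */
static unsigned long movable_lower_bound(unsigned long start,
					 unsigned long end,
					 unsigned long m_start,
					 unsigned long m_end)
{
	if (end <= m_start || start >= m_end)
		return 0;		/* half-open ranges do not overlap */
	return start > m_start ? start : m_start;
}
```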
RE: [PATCH 3/4 v5] iommu/fsl: Add iommu domain attributes required by fsl PAMU driver.
> -Original Message- > From: Wood Scott-B07421 > Sent: Tuesday, December 11, 2012 6:31 AM > To: Sethi Varun-B16395 > Cc: Wood Scott-B07421; Joerg Roedel; linux-kernel@vger.kernel.org; > io...@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; Tabi > Timur-B04825 > Subject: Re: [PATCH 3/4 v5] iommu/fsl: Add iommu domain attributes > required by fsl PAMU driver. > > On 12/10/2012 04:10:06 AM, Sethi Varun-B16395 wrote: > > > > > > > -Original Message- > > > From: Wood Scott-B07421 > > > Sent: Tuesday, December 04, 2012 11:53 PM > > > To: Sethi Varun-B16395 > > > Cc: Wood Scott-B07421; Joerg Roedel; linux-kernel@vger.kernel.org; > > > io...@lists.linux-foundation.org; linuxppc-...@lists.ozlabs.org; > > Tabi > > > Timur-B04825 > > > Subject: Re: [PATCH 3/4 v5] iommu/fsl: Add iommu domain attributes > > > required by fsl PAMU driver. > > > > > > On 12/04/2012 05:53:33 AM, Sethi Varun-B16395 wrote: > > > > > > > > > > > > > -Original Message- > > > > > From: Wood Scott-B07421 > > > > > Sent: Monday, December 03, 2012 10:34 PM > > > > > To: Sethi Varun-B16395 > > > > > Cc: Joerg Roedel; linux-kernel@vger.kernel.org; > > iommu@lists.linux- > > > > > foundation.org; Wood Scott-B07421; > > linuxppc-...@lists.ozlabs.org; > > > > Tabi > > > > > Timur-B04825 > > > > > Subject: Re: [PATCH 3/4 v5] iommu/fsl: Add iommu domain > > attributes > > > > > required by fsl PAMU driver. 
> > > > > > > > > > On 12/03/2012 10:57:29 AM, Sethi Varun-B16395 wrote: > > > > > > > > > > > > > > > > > > > -Original Message- > > > > > > > From: iommu-boun...@lists.linux-foundation.org > > [mailto:iommu- > > > > > > > boun...@lists.linux-foundation.org] On Behalf Of Joerg > > Roedel > > > > > > > Sent: Sunday, December 02, 2012 7:33 PM > > > > > > > To: Sethi Varun-B16395 > > > > > > > Cc: linux-kernel@vger.kernel.org; > > > > io...@lists.linux-foundation.org; > > > > > > Wood > > > > > > > Scott-B07421; linuxppc-...@lists.ozlabs.org; Tabi > > Timur-B04825 > > > > > > > Subject: Re: [PATCH 3/4 v5] iommu/fsl: Add iommu domain > > > > attributes > > > > > > > required by fsl PAMU driver. > > > > > > > > > > > > > > Hmm, we need to work out a good abstraction for this. > > > > > > > > > > > > > > On Tue, Nov 20, 2012 at 07:24:56PM +0530, Varun Sethi wrote: > > > > > > > > Added the following domain attributes required by FSL PAMU > > > > driver: > > > > > > > > 1. Subwindows field added to the iommu domain geometry > > > > attribute. > > > > > > > > > > > > > > Are the Subwindows mapped with full size or do you map only > > > > parts > > > > > > of the > > > > > > > subwindows? > > > > > > > > > > > > > [Sethi Varun-B16395] It's possible to map a part of the > > subwindow > > > > i.e. > > > > > > size of the mapping can be less than the sub window size. > > > > > > > > > > > > > > +* This attribute indicates number of DMA subwindows > > > > > supported > > > > > > by > > > > > > > > +* the geometry. If there is a single window that maps > > > > the > > > > > > entire > > > > > > > > +* geometry, attribute must be set to "1". A value of > > > > "0" > > > > > > implies > > > > > > > > +* that this mechanism is not used at all(normal paging > > > > is > > > > > > used). > > > > > > > > +* Value other than* "0" or "1" indicates the actual > > > > number > > > > > of > > > > > > > > +* subwindows. 
> > > > > > > > +*/ > > > > > > > > > > > > > > This semantic is ugly, how about a feature detection > > mechanism? > > > > > > > > > > > > > [Sethi Varun-B16395] A feature mechanism to query the type of > > > > IOMMU? > > > > > > > > > > A feature mechanism to determine whether this subwindow > > mechanism is > > > > > available, and what the limits are. > > > > > > > > > So, we use the IOMMU capability interface to find out if IOMMU > > > > supports sub windows or not, right? But still number of sub > > windows > > > > would be specified as a part of the geometry and the valid value > > for > > > > sub windows would 0,1 or actual number of sub windows. > > > > > > How does a user of the interface find out what values are possible > > for > > > the "actual number of subwindows"? How does a user of the > > interface find > > > out whether there are any limitations on specifying a value of zero > > (in > > > the case of PAMU, that would be a maximum 1 MiB naturally-aligned > > > aperture to support arbitrary 4KiB mappings)? > > How about if we say that the default value for subwindows is zero and > > this what you get when you read the geometry (iommu_get_attr) after > > initializing the domain? In that case the user would know that > > implication of setting subwindows to zero with respect to the aperture > > size. > > So it would default to the maximum aperture size possible with no > subwindows? That might be OK, though is there a way to reset the domain > later on to get back to that informational state? > [Sethi Varun-B16395] Yes, that can be done via
Re: [PATCH RESEND RESEND] net: remove obsolete simple_strto
From: Abhijit Pawar
Date: Tue, 11 Dec 2012 09:04:20 +0530

> This patch replace the obsolete simple_strto with kstrto
>
> Signed-off-by: Abhijit Pawar

You can't submit replacement patches for ones which I have already
applied.

Patches I apply are permanently applied, and therefore you must submit
changes relative to the ones I've applied already.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RESEND] net: remove obsolete simple_strto
From: Abhijit Pawar
Date: Tue, 11 Dec 2012 09:03:41 +0530

> On 12/11/2012 12:40 AM, David Miller wrote:
>> From: Abhijit Pawar
>> Date: Mon, 10 Dec 2012 14:42:28 +0530
>>
>>> This patch replace the obsolete simple_strto with kstrto
>>>
>>> Signed-off-by: Abhijit Pawar
>>
>> Applied.
>>
> Hi David,
> It seems that there are occurences of simple_strto* still present in the
> couple of files which are not yet removed correctly by this patch. I
> will send a modified patch shortly. Please revert this commit and use
> the newly sent patch to merge with the tree.

Again, you cannot send "modified" patches.

When I say I've applied your patch, that cannot be undone.

You must therefore send me fixup patches relative to the ones I've
applied already.
Re: [PATCH RESEND] net: remove obsolete simple_strto
From: Abhijit Pawar
Date: Tue, 11 Dec 2012 06:36:59 +0530

> It looks like there are two occurences of simple_strtoul which has not been
> removed cleanly from the patch.
> They are in netpoll.c and debugfs_sta.c
> I will send the modified corrected clean patch shortly.

You can't simply send me a replacement patch, since I already applied
the original one and that patch will not be reverted.
Re: [RFC v2] Support volatile range for anon vma
On Fri, Dec 07, 2012 at 04:49:56PM -0800, John Stultz wrote: > On 12/04/2012 08:18 PM, Minchan Kim wrote: > >On Tue, Dec 04, 2012 at 11:13:40AM -0800, John Stultz wrote: > >>I don't think the problem is when vmas being marked VM_VOLATILE are > >>being merged, its that when we mark the vma as *non-volatile*, and > >>remove the VM_VOLATILE flag we merge the non-volatile vmas with > >>neighboring vmas. So preserving the purged flag during that merge is > >>important. Again, the example I used to trigger this was an > >>alternating pattern of volatile and non volatile vmas, then marking > >>the entire range non-volatile (though sometimes in two overlapping > >>passes). > >Understood. Thanks. > >Below patch solves your problems? It's simple than yours. > > Yea, this is nicer then my fix. > Although I still need the purged handling in the vma merge code for > me to see the behavior I expect in my tests. > > I've integrated your patch and repushed my queue here: > http://git.linaro.org/gitweb?p=people/jstultz/android-dev.git;a=shortlog;h=refs/heads/dev/minchan-anonvol > > git://git.linaro.org/people/jstultz/android-dev.git dev/minchan-anonvol > > >Anyway, both yours and mine are not right fix. > >As I mentioned, locking scheme is broken. > >We need anon_vma_lock to handle purged and we should consider fork > >case, too. > Hrm. I'm sure you're right, as I've not yet fully grasped all the > locking rules here. Could you clarify how it is broken? And why is > the anon_vma_lock needed to manage the purged state that is part of > the vma itself? If you don't hold anon->lock, merge/split/fork can race with try_to_unmap so vma->[purged|vm_flags] would lose consistency. > > thanks > -john > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majord...@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . 
--
Kind regards,
Minchan Kim
Re: [patch v2 4/6] memcg: simplify mem_cgroup_iter
On Mon, Nov 26, 2012 at 10:47 AM, Michal Hocko wrote: > Current implementation of mem_cgroup_iter has to consider both css and > memcg to find out whether no group has been found (css==NULL - aka the > loop is completed) and that no memcg is associated with the found node > (!memcg - aka css_tryget failed because the group is no longer alive). > This leads to awkward tweaks like tests for css && !memcg to skip the > current node. > > It will be much easier if we got rid off css variable altogether and > only rely on memcg. In order to do that the iteration part has to skip > dead nodes. This sounds natural to me and as a nice side effect we will > get a simple invariant that memcg is always alive when non-NULL and all > nodes have been visited otherwise. > > We could get rid of the surrounding while loop but keep it in for now to > make review easier. It will go away in the following patch. > > Signed-off-by: Michal Hocko > --- > mm/memcontrol.c | 56 > +++ > 1 file changed, 27 insertions(+), 29 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 6bcc97b..d1bc0e8 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -1086,7 +1086,6 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup > *root, > rcu_read_lock(); > while (!memcg) { > struct mem_cgroup_reclaim_iter *uninitialized_var(iter); > - struct cgroup_subsys_state *css = NULL; > > if (reclaim) { > int nid = zone_to_nid(reclaim->zone); > @@ -1112,53 +,52 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup > *root, > * explicit visit. > */ > if (!last_visited) { > - css = &root->css; > + memcg = root; > } else { > struct cgroup *prev_cgroup, *next_cgroup; > > prev_cgroup = (last_visited == root) ? 
NULL > : last_visited->css.cgroup; > - next_cgroup = cgroup_next_descendant_pre(prev_cgroup, > - root->css.cgroup); > - if (next_cgroup) > - css = cgroup_subsys_state(next_cgroup, > - mem_cgroup_subsys_id); > - } > +skip_node: > + next_cgroup = cgroup_next_descendant_pre( > + prev_cgroup, root->css.cgroup); > > - /* > -* Even if we found a group we have to make sure it is alive. > -* css && !memcg means that the groups should be skipped and > -* we should continue the tree walk. > -* last_visited css is safe to use because it is protected by > -* css_get and the tree walk is rcu safe. > -*/ > - if (css == &root->css || (css && css_tryget(css))) > - memcg = mem_cgroup_from_css(css); > + /* > +* Even if we found a group we have to make sure it is > +* alive. css && !memcg means that the groups should > be > +* skipped and we should continue the tree walk. > +* last_visited css is safe to use because it is > +* protected by css_get and the tree walk is rcu safe. > +*/ > + if (next_cgroup) { > + struct mem_cgroup *mem = mem_cgroup_from_cont( > + next_cgroup); > + if (css_tryget(&mem->css)) > + memcg = mem; I see a functional change after this, where we now hold a refcnt of css if memcg is root. It is not the case before this change. --Ying > + else { > + prev_cgroup = next_cgroup; > + goto skip_node; > + } > + } > + } > > if (reclaim) { > - struct mem_cgroup *curr = memcg; > - > if (last_visited) > css_put(&last_visited->css); > > - if (css && !memcg) > - curr = mem_cgroup_from_css(css); > - > /* make sure that the cached memcg is not removed */ > - if (curr) > - css_get(&curr->css); > - iter->last_visited = curr; > + if (memcg) > + css_get(&memcg->css); > + iter->last_visited = memcg; > > - if (!
Re: [RFC v2] Support volatile range for anon vma
On Fri, Dec 07, 2012 at 04:20:30PM -0800, John Stultz wrote: > On 12/04/2012 11:01 PM, Minchan Kim wrote: > >Hi John, > > > >On Tue, Dec 04, 2012 at 11:13:40AM -0800, John Stultz wrote: > >> > >>I don't think the problem is when vmas being marked VM_VOLATILE are > >>being merged, its that when we mark the vma as *non-volatile*, and > >>remove the VM_VOLATILE flag we merge the non-volatile vmas with > >>neighboring vmas. So preserving the purged flag during that merge is > >>important. Again, the example I used to trigger this was an > >>alternating pattern of volatile and non volatile vmas, then marking > >>the entire range non-volatile (though sometimes in two overlapping > >>passes). > >If I understand correctly, you mean following as. > > > >chunk1 = mmap(8M) > >chunk2 = chunk1 + 2M; > >chunk3 = chunk2 + 2M > >chunk4 = chunk3 + 2M > > > >madvise(chunk1, 2M, VOLATILE); > >madvise(chunk4, 2M, VOLATILE); > > > >/* > > * V : volatile vma > > * N : non volatile vma > > * So Now vma is VNVN. > > */ > >And chunk4 is purged. > > > >int ret = madvise(chunk1, 8M, NOVOLATILE); > >ASSERT(ret == 1); > >/* And you expect VNVN->N ?*/ > > > >Right? > > Yes. That's exactly right. > > >If so, why should non-volatile function semantic allow it which cross over > >non-volatile areas in a range? I would like to fail such case because > >in case of MADV_REMOVE, it fails in the middle of operation if it encounter > >VM_LOCKED. > > > >What do you think about it? > Right, so I think this issue is maybe a problematic part of the VMA > based approach. While marking an area as nonvolatile twice might > not make a lot of sense, I think userland applications would not > appreciate the constraint that madvise(VOLATILE/NONVOLATILE) calls > be made in perfect pairs of identical sizes. 
>
> For instance, if a browser has rendered a web page, but the page is
> so large that only a sliding window/view of that page is visible at
> one time, it may want to mark the regions not currently in the view
> as volatile. So it would be nice (albeit naive) for that
> application that when the view location changed, it would just mark
> the new region as non-volatile, and any region not in the current
> view as volatile. This would be easier then trying to calculate the
> diff of the old view region boundaries vs the new and modifying only
> the ranges that changed. Granted, doing so might be more efficient,
> but I'm not sure we can be sure every similar case would be more
> efficient.
>
> So in my mind, double-clearing a flag should be allowed (as well as
> double-setting), as well as allowing for setting/clearing
> overlapping regions.

It might be, and as you said, it's not matched by normal madvise
operation. So if users really want it, we might need another interface,
such as a new system call like mlock.

Although we can implement it, my concern is mmap_sem hold time. The VMA
approach needs the exclusive mmap_sem, which prevents concurrent page
fault handling, so it would undermine anon volatile's goal for
user-space allocators. So I would like to avoid further work under the
exclusive mmap_sem as far as possible.

Of course, you can argue that if we don't support such a semantic, users
can call madvise(NOVOLATILE) several times with several ranges, which
could be even worse. Right. But I suggest that plumbers implement range
management smartly, and that we keep the kernel implementation
simple/fast. I'm not firm on this. If users really want such a semantic,
I can support it with a new system call.

Frankly speaking, I would like to remove the madvise(NOVOLATILE) call.
If you already saw the patch I sent this morning, you can guess what it
is.
The problem with anon volatile plus madvise(NOVOLATILE) is the time
delay between the allocator handing out a free chunk and the user really
accessing the memory. Normally, when the allocator returns a free chunk
to the customer, it should call madvise(NOVOLATILE), but the user could
access the memory a long time afterwards. During that window, the pages
could be swapped out. So I decided to remove madvise(NOVOLATILE) and
handle it at the first page fault. Yeb.

The same rule can't be applied to tmpfs volatile, which does need the
NOVOLATILE semantic. Hmm, I am leaning toward a new system call:

int mvolatile(const void *addr, size_t len, int mode);
int munvolatile(const void *addr, size_t len);

If someone calls mvolatile with AUTO mode, it would work like my anon
volatile, while in MANUAL mode the user must call munvolatile before
using the memory. It might meet both your goal and mine. But adding a
new system call is a last resort. :)

>
> Aside from if the behavior should be allowed or not, the error mode
> of madvise is problematic as well, since failures can happen mid way
> through the operation, leaving the vmas in the range specified
> inconsistent. Since usually its only advisory, such inconsistent
> states aren't really problematic, and repeating the last action is
> probably fine.

True.

>
> The problem with NOVOLATILE's purged state, with vmas, is that if
> we hi
Re: [PATCH v4 1/7] DMA: PL330: use prefix in reg names to build under x86
On 10 December 2012 19:12, Davide Ciminaghi wrote:
> From: Alessandro Rubini
>
> This driver would not compile if ARM_AMBA is selected under x86,
> because "CS" and "DS" are already defined there. But AMBA
> is used in the x86 world by a PCI-to-AMBA bridge, to be submitted.
>
> The patch just adds the "PL330_" prefix to all registers,
> so it can be built by randomconfig after ARM_AMBA appears within x86.
> No other technical changes have been performed.
> The patch was build-tested only.
>
> Signed-off-by: Alessandro Rubini
> Acked-by: Giancarlo Asnaghi
> [Davide Ciminaghi : only registers prefixed]
> Signed-off-by: Davide Ciminaghi
> ---

Acked-by: Jassi Brar

Thanks.
Re: [PATCH 01/18] sched: select_task_rq_fair clean up
Hi Alex,

On 12/10/2012 01:52 PM, Alex Shi wrote:
> It is impossible to miss a task allowed cpu in a eligible group.

The one thing I am concerned with here is if there is a possibility of
the task changing its tsk_cpus_allowed() while this code is running,
i.e. find_idlest_group() finds an idle group, then the
tsk_cpus_allowed() for the task changes, perhaps by the user himself,
and might not include the cpus in the idle group. After this,
find_idlest_cpu() is called. I mean a race condition, in short. Then we
might not have an eligible cpu in that group, right?

> And since find_idlest_group only return a different group which
> excludes old cpu, it's also imporissible to find a new cpu same as old
> cpu.

This I agree with.

> Signed-off-by: Alex Shi
> ---
>  kernel/sched/fair.c |5 -
>  1 files changed, 0 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 59e072b..df99456 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3150,11 +3150,6 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>  	}
>
>  	new_cpu = find_idlest_cpu(group, p, cpu);
> -	if (new_cpu == -1 || new_cpu == cpu) {
> -		/* Now try balancing at a lower domain level of cpu */
> -		sd = sd->child;
> -		continue;
> -	}
>
>  	/* Now try balancing at a lower domain level of new_cpu */
>  	cpu = new_cpu;
>

Regards
Preeti U Murthy
Re: Linux 3.7
Hi all,

On Mon, 10 Dec 2012 19:59:50 -0800 Linus Torvalds wrote:
>
> Anyway, it's been a somewhat drawn out release despite the 3.7 merge
> window having otherwise appeared pretty straightforward, and none of
> the rc's were all that big either. But we're done, and this means that
> the merge window will close on Christmas eve.
>
> Or rather, I'll probably close it a couple of days early. For obvious
> reasons. It's the main commercial holiday of the year, after all.
>
> So aim for winter solstice, and no later. Deal? And even then, I might
> be deep into the glögg.

Hopefully people will submit earlier rather than later, as currently
there are more commits in linux-next than ever before ... Also, please
resist the usual rebase-before-submitting frenzy :-(

See http://neuling.org/linux-next-size.html (double click to see the
complete history).

--
Cheers,
Stephen Rothwell    s...@canb.auug.org.au
linux-next: manual merge of the gpio-lw tree with the driver-core tree
Hi Linus,

Today's linux-next merge of the gpio-lw tree got a conflict in
drivers/gpio/gpio-stmpe.c between commit 3836309d9346 ("gpio: remove use
of __devinit") from the driver-core tree and commit fc13d5a5b17c ("gpio:
Provide the STMPE GPIO driver with its own IRQ Domain") from the gpio-lw
tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

--
Cheers,
Stephen Rothwell    s...@canb.auug.org.au

diff --cc drivers/gpio/gpio-stmpe.c
index 6411600,3e1d398..000
--- a/drivers/gpio/gpio-stmpe.c
+++ b/drivers/gpio/gpio-stmpe.c
@@@ -288,21 -292,37 +292,37 @@@ int stmpe_gpio_irq_map(struct irq_domai
  	return 0;
  }
  
- static void stmpe_gpio_irq_remove(struct stmpe_gpio *stmpe_gpio)
+ void stmpe_gpio_irq_unmap(struct irq_domain *d, unsigned int virq)
  {
- 	int base = stmpe_gpio->irq_base;
- 	int irq;
- 
- 	for (irq = base; irq < base + stmpe_gpio->chip.ngpio; irq++) {
  #ifdef CONFIG_ARM
- 		set_irq_flags(irq, 0);
+ 	set_irq_flags(virq, 0);
  #endif
- 		irq_set_chip_and_handler(irq, NULL, NULL);
- 		irq_set_chip_data(irq, NULL);
+ 	irq_set_chip_and_handler(virq, NULL, NULL);
+ 	irq_set_chip_data(virq, NULL);
+ }
+ 
+ static const struct irq_domain_ops stmpe_gpio_irq_simple_ops = {
+ 	.unmap = stmpe_gpio_irq_unmap,
+ 	.map = stmpe_gpio_irq_map,
+ 	.xlate = irq_domain_xlate_twocell,
+ };
+ 
 -static int __devinit stmpe_gpio_irq_init(struct stmpe_gpio *stmpe_gpio)
++static int stmpe_gpio_irq_init(struct stmpe_gpio *stmpe_gpio)
+ {
+ 	int base = stmpe_gpio->irq_base;
+ 
+ 	stmpe_gpio->domain = irq_domain_add_simple(NULL,
+ 				stmpe_gpio->chip.ngpio, base,
+ 				&stmpe_gpio_irq_simple_ops, stmpe_gpio);
+ 	if (!stmpe_gpio->domain) {
+ 		dev_err(stmpe_gpio->dev, "failed to create irqdomain\n");
+ 		return -ENOSYS;
  	}
+ 
+ 	return 0;
  }
  
 -static int __devinit stmpe_gpio_probe(struct platform_device *pdev)
 +static int stmpe_gpio_probe(struct platform_device *pdev)
  {
  	struct stmpe *stmpe = dev_get_drvdata(pdev->dev.parent);
  	struct device_node *np = pdev->dev.of_node;
[GIT PULL] MMC updates for 3.8-rc1
Hi Linus, Please pull from: git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git tags/mmc-updates-for-3.8-rc1 to receive the MMC merge for 3.8. There are currently no conflicts, and these patches have been tested in linux-next. Thanks. The following changes since commit 91ab252ac5a5c3461dd6910797611e9172626aed: mmc: sh-mmcif: avoid oops on spurious interrupts (second try) (2012-12-06 13:54:35 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git tags/mmc-updates-for-3.8-rc1 for you to fetch changes up to 71e69211eac889898dec5a21270347591eb2d001: mmc: sdhci: implement the .card_event() method (2012-12-07 13:56:03 -0500) MMC highlights for 3.8: Core: - Expose access to the eMMC RPMB ("Replay Protected Memory Block") area by extending the existing mmc_block ioctl. - Add SDIO powered-suspend DT properties to the core MMC DT binding. - Add no-1-8-v DT flag for boards where the SD controller reports that it supports 1.8V but the board itself has no way to switch to 1.8V. - More work on switching to 1.8V UHS support using a vqmmc regulator. - Fix up a case where the slot-gpio helper may fail to reset the host controller properly if a card was removed during a transfer. - Fix several cases where a broken device could cause an infinite loop while we wait for a register to update. Drivers: - at91-mci: Remove obsolete driver, atmel-mci handles these devices now. - sdhci-dove: Allow using GPIOs for card-detect notifications. - sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon. - sdhci-s3c: Add pinctrl support. - wmt-sdmmc: New driver for WonderMedia SD/MMC controllers. 
Abhilash Kesavan (4): mmc: dt: Fix typo in filename mmc: dt: Add optional pm properties to core binding mmc: sdhci-pltfm: Support optional pm properties mmc: dw_mmc: Add sdio power bindings Al Cooper (1): mmc: Limit MMC speed to 52MHz if not HS200 Andy Shevchenko (2): mmc: dw_mmc: use __devexit_p macro for .remove() mmc: dw_mmc: use helper macro module_platform_driver() Arnd Bergmann (1): mmc: dw_mmc: fix more const pointer warnings Balaji T K (4): mmc: omap_hsmmc: remove warning message for debounce clock mmc: omap_hsmmc: Fix Oops in case of data errors mmc: omap_hsmmc: No reset of cmd state machine for DCRC mmc: omap_hsmmc: Update error code for response_busy cmd Daniel Drake (2): mmc: sdhci: add quirk for lack of 1.8v support mmc: dt: add no-1-8-v device tree flag Daniel Mack (2): mmc: omap_hsmmc: claim pinctrl at probe time mmc: omap_hsmmc: add DT property for max bus frequency Fabio Estevam (1): mmc: mxs-mmc: Remove platform data Felipe Balbi (1): mmc: omap_hsmmc: Introduce omap_hsmmc_prepare/complete Guennadi Liakhovetski (6): mmc: sh_mobile_sdhi: fix clock frequency printing mmc: sh_mobile_sdhi: remove unneeded clock connection ID mmc: sh_mmcif: remove unneeded clock connection ID mmc: add a card-event host operation mmc: extend the slot-gpio card-detection to use host's .card_event() method mmc: sdhci: implement the .card_event() method Haijun Zhang (1): mmc: eSDHC: Recover from ADMA errors Hebbar, Gururaja (1): mmc: omap_hsmmc: Enable HSPE bit for high speed cards Jaehoon Chung (2): mmc: dw_mmc: relocate where dw_mci_setup_bus() is called from mmc: dw_mmc: remove duplicated buswidth code Javier Martin (1): mmc: mxcmmc: fix SD cards not being detected sometimes. 
Jerry Huang (1): mmc: sdhci-of-esdhc: support commands with busy response expecting TC Johan Rudholm (1): mmc: core: debugfs: Add signal_voltage to ios dump Kevin Liu (5): mmc: sdhci: Balance vmmc regulator_enable(), and always enable vqmmc mmc: sdhci-pxav3: Add base clock quirk mmc: host: Make UHS timing values fully unique mmc: sdhci: Use regulator min/max voltage range according to spec mmc: sdhci-pxav3: add quirks2 Kyoungil Kim (1): mmc: sdio: Use multiple scatter/gather list Lee Jones (1): mmc: Standardise capability type Loic Pallardy (5): mmc: core: Expose access to RPMB partition mmc: card: Do not scan RPMB partitions mmc: core: Extend sysfs to ext_csd parameters for RPMB support mmc: core: Add mmc_set_blockcount feature mmc: card: Add RPMB support in IOCTL interface Ludovic Desroches (1): mmc: at91-mci: remove obsolete driver Madhvapathi Sriram (1): mmc: sdhci-pci: Enable SDHCI_CAN_DO_HISPD for Ricoh SDHCI controller Marina Makienko (1): mmc: vub300: add missing usb_put_dev Rafael J. Wysocki (1): mmc: sdio: Add empty bus-level suspend/resume callbacks Russell King (3): mmc: sdhci-dove: use devm_clk_get() mmc: sdhci-dov
Re: [[PATCH v9 3/3] 1/1] virtio_console: Remove buffers from out_vq at port removal
On (Tue) 11 Dec 2012 [09:39:41], Rusty Russell wrote:
> Amit Shah writes:
>
> > On (Fri) 16 Nov 2012 [11:22:09], Rusty Russell wrote:
> >> Amit Shah writes:
> >> > From: Sjur Brændeland
> >> >
> >> > Remove buffers from the out-queue when a port is removed. Rproc_serial
> >> > communicates with remote processors that may crash and leave buffers in
> >> > the out-queue. The virtio serial ports may have buffers in the out-queue
> >> > as well, e.g. for non-blocking ports and the host didn't consume them
> >> > yet.
> >> >
> >> > [Amit: Remove WARN_ON for generic ports case.]
> >> >
> >> > Signed-off-by: Sjur Brændeland
> >> > Signed-off-by: Amit Shah
> >>
> >> I already have this in my pending queue; I've promoted it to my
> >> virtio-next branch now.
> >
> > Rusty, I still see this series in your pending queue, not in
> > virtio-next. Did anything change in the meantime?
>
> Hmm:
>
> 40e625ac50f40d87ddba93280d0a503425aa68e9?

I'm sorry, I meant the remoteproc code, not this patch.

		Amit
Linux 3.7
Whee. After an extra rc release, 3.7 is now out. After a few more trials at fixing things, in the end we ended up reverting the kswapd changes that caused problems. And with the extra rc, I had decided to risk doing the buffer.c cleanups that would otherwise have just been marked for stable during the next merge window, and had enough time to fix a few problems that people found there too. There's also a fix for a SCSI driver bug that was exposed by the last-minute workqueue fixes in rc8. Other than that, there's a few networking fixes, and some trivial fixes for sparc and MIPS. Anyway, it's been a somewhat drawn out release despite the 3.7 merge window having otherwise appeared pretty straightforward, and none of the rc's were all that big either. But we're done, and this means that the merge window will close on Christmas eve. Or rather, I'll probably close it a couple of days early. For obvious reasons. It's the main commercial holiday of the year, after all. So aim for winter solstice, and no later. Deal? And even then, I might be deep into the glögg. Linus --- Chris Ball (1): Revert misapplied "mmc: sh-mmcif: avoid oops on spurious interrupts" Dan Carpenter (1): vfs: clear to the end of the buffer on partial buffer reads David Daney (1): MIPS: Avoid mcheck by flushing page range in huge_ptep_set_access_flags() David Howells (2): MODSIGN: Don't use enum-type bitfields in module signature info block ASN.1: Fix an indefinite length skip error David S. Miller (2): sparc64: exit_group should kill register windows just like plain exit. sparc: Fix piggyback with newer binutils. 
Dmitry Adamushko (1): MIPS: Fix endless loop when processing signals for kernel tasks Eric Dumazet (1): net: gro: fix possible panic in skb_gro_receive() Florian Fainelli (1): Input: matrix-keymap - provide proper module license Guennadi Liakhovetski (1): mmc: sh-mmcif: avoid oops on spurious interrupts (second try) Heiko Stübner (1): mmc: sdhci-s3c: fix missing clock for gpio card-detect James Hogan (2): linux/kernel.h: define SYMBOL_PREFIX modsign: add symbol prefix to certificate list Johannes Berg (1): ipv4: ip_check_defrag must not modify skb before unsharing Johannes Weiner (2): mm: vmscan: do not keep kswapd looping forever due to individual uncompactable zones mm: vmscan: fix inappropriate zone congestion clearing Linus Torvalds (5): vfs: avoid "attempt to access beyond end of device" warnings vfs: fix O_DIRECT read past end of block device Revert "mm: avoid waking kswapd for THP allocations when compaction is deferred or contended" Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage Linux 3.7 Mel Gorman (2): mm: compaction: validate pfn range passed to isolate_freepages_block tmpfs: fix shared mempolicy leak Neal Cardwell (4): inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state inet_diag: validate byte code to prevent oops in inet_diag_bc_run() inet_diag: avoid unsafe and nonsensical prefix matches in inet_diag_bc_run() inet_diag: validate port comparison byte code to prevent unsafe reads Ralf Baechle (3): MIPS: N32: Fix preadv(2) and pwritev(2) entry points. MIPS: N32: Fix signalfd4 syscall entry point MIPS: R3000/R3081: Fix CPU detection. 
Richard Weinberger (2): UBI: remove PEB from free tree in get_peb_for_wl() UBI: dont call ubi_self_check_all_ff() in __wl_get_peb() Tejun Heo (1): workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s Thomas Gleixner (1): watchdog: Fix CPU hotplug regression Tim Gardner (1): lib/Makefile: Fix oid_registry build dependency Xiaotian Feng (1): megaraid: fix BUG_ON() from incorrect use of delayed work Yuchung Cheng (1): tcp: bug fix Fast Open client retransmission -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 1/2] zsmalloc: add function to query object size
On Fri, Dec 07, 2012 at 04:45:53PM -0800, Nitin Gupta wrote: > On Sun, Dec 2, 2012 at 11:52 PM, Minchan Kim wrote: > > On Sun, Dec 02, 2012 at 11:20:42PM -0800, Nitin Gupta wrote: > >> > >> > >> On Nov 30, 2012, at 5:54 AM, Minchan Kim > >> wrote: > >> > >> > On Thu, Nov 29, 2012 at 10:54:48PM -0800, Nitin Gupta wrote: > >> >> Changelog v2 vs v1: > >> >> - None > >> >> > >> >> Adds zs_get_object_size(handle) which provides the size of > >> >> the given object. This is useful since the user (zram etc.) > >> >> now does not have to maintain object sizes separately, saving > >> >> on some metadata size (4b per page). > >> >> > >> >> The object handle encodes a <page, offset> pair which currently points > >> >> to the start of the object. Now, the handle implicitly stores the size > >> >> information by pointing to the object's end instead. Since zsmalloc is > >> >> a slab based allocator, the start of the object can be easily determined > >> >> and the difference between the end offset encoded in the handle and the > >> >> start gives us the object size. > >> >> > >> >> Signed-off-by: Nitin Gupta > >> > Acked-by: Minchan Kim > >> > > >> > I already had a few comments on your previous version. > >> > I'm OK even if you ignore them, because I can make a follow-up patch for > >> > my nitpicks, but could you answer my question below? 
> >> > > >> >> --- > >> >> drivers/staging/zsmalloc/zsmalloc-main.c | 177 > >> >> +- > >> >> drivers/staging/zsmalloc/zsmalloc.h |1 + > >> >> 2 files changed, 127 insertions(+), 51 deletions(-) > >> >> > >> >> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c > >> >> b/drivers/staging/zsmalloc/zsmalloc-main.c > >> >> index 09a9d35..65c9d3b 100644 > >> >> --- a/drivers/staging/zsmalloc/zsmalloc-main.c > >> >> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c > >> >> @@ -112,20 +112,20 @@ > >> >> #define MAX_PHYSMEM_BITS 36 > >> >> #else /* !CONFIG_HIGHMEM64G */ > >> >> /* > >> >> - * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will > >> >> just > >> >> + * If this definition of MAX_PHYSMEM_BITS is used, OFFSET_BITS will > >> >> just > >> >> * be PAGE_SHIFT > >> >> */ > >> >> #define MAX_PHYSMEM_BITS BITS_PER_LONG > >> >> #endif > >> >> #endif > >> >> #define _PFN_BITS(MAX_PHYSMEM_BITS - PAGE_SHIFT) > >> >> -#define OBJ_INDEX_BITS(BITS_PER_LONG - _PFN_BITS) > >> >> -#define OBJ_INDEX_MASK((_AC(1, UL) << OBJ_INDEX_BITS) - 1) > >> >> +#define OFFSET_BITS(BITS_PER_LONG - _PFN_BITS) > >> >> +#define OFFSET_MASK((_AC(1, UL) << OFFSET_BITS) - 1) > >> >> > >> >> #define MAX(a, b) ((a) >= (b) ? (a) : (b)) > >> >> /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ > >> >> #define ZS_MIN_ALLOC_SIZE \ > >> >> -MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OBJ_INDEX_BITS)) > >> >> +MAX(32, (ZS_MAX_PAGES_PER_ZSPAGE << PAGE_SHIFT >> OFFSET_BITS)) > >> >> #define ZS_MAX_ALLOC_SIZEPAGE_SIZE > >> >> > >> >> /* > >> >> @@ -256,6 +256,11 @@ static int is_last_page(struct page *page) > >> >>return PagePrivate2(page); > >> >> } > >> >> > >> >> +static unsigned long get_page_index(struct page *page) > >> >> +{ > >> >> +return is_first_page(page) ? 
0 : page->index; > >> >> +} > >> >> + > >> >> static void get_zspage_mapping(struct page *page, unsigned int > >> >> *class_idx, > >> >>enum fullness_group *fullness) > >> >> { > >> >> @@ -433,39 +438,86 @@ static struct page *get_next_page(struct page > >> >> *page) > >> >>return next; > >> >> } > >> >> > >> >> -/* Encode as a single handle value */ > >> >> -static void *obj_location_to_handle(struct page *page, unsigned long > >> >> obj_idx) > >> >> +static struct page *get_prev_page(struct page *page) > >> >> { > >> >> -unsigned long handle; > >> >> +struct page *prev, *first_page; > >> >> > >> >> -if (!page) { > >> >> -BUG_ON(obj_idx); > >> >> -return NULL; > >> >> -} > >> >> +first_page = get_first_page(page); > >> >> +if (page == first_page) > >> >> +prev = NULL; > >> >> +else if (page == (struct page *)first_page->private) > >> >> +prev = first_page; > >> >> +else > >> >> +prev = list_entry(page->lru.prev, struct page, lru); > >> >> > >> >> -handle = page_to_pfn(page) << OBJ_INDEX_BITS; > >> >> -handle |= (obj_idx & OBJ_INDEX_MASK); > >> >> +return prev; > >> >> > >> >> -return (void *)handle; > >> >> } > >> >> > >> >> -/* Decode pair from the given object handle */ > >> >> -static void obj_handle_to_location(unsigned long handle, struct page > >> >> **page, > >> >> -unsigned long *obj_idx) > >> >> +static void *encode_ptr(struct page *page, unsigned long offset) > >> >> { > >> >> -*page = pfn_to_page(handle >> OBJ_INDEX_BITS); > >> >> -*obj_idx = handle & OBJ_INDEX_MASK; > >> >> +unsigned long ptr; > >> >> +ptr = page_to_pfn(page) << OFFSET_BITS; > >> >> +ptr |= offset & OFFSET_MASK; > >> >> +return (void *)p
Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
On Mon, Dec 10, 2012 at 7:41 PM, H. Peter Anvin wrote: > On 12/10/2012 06:39 PM, Yinghai Lu wrote: >> >> No, you should not copy that several times. >> >> just pre-allocate some kbytes in BRK, and copy to there one time. >> > > He doesn't copy it several times. He just saves an offset into the > initrd blob. The ucode sits inside the initrd blob; the code scans that blob and saves a pointer to the ucode, which the BSP uses and then the APs use after that. When the initrd gets freed, that ucode is copied somewhere else... and his patch missed that the initrd can get relocated, for both 64-bit and 32-bit, so the APs would not find that saved ucode. After I pointed that out, he said he will update the pointer when relocating the initrd for the APs. And my suggestion is: after scanning and finding the ucode, save it to the BRK, so there is no need to adjust the pointer again, and no need to copy the blob and update the pointer again. Yinghai
Re: [PATCH] lpfc: init: fix misspelling word in mailbox command waiting comments
On 12/11/2012 11:53 AM, re...@cn.fujitsu.com wrote: From: Ren Mingxin Superfluous, sorry for disturbing everyone :-( Ren
[PATCH] lpfc: init: fix misspelling word in mailbox command waiting comments
From: Ren Mingxin Correct misspelling of "outstanding" in mailbox command waiting comments. Signed-off-by: Ren Mingxin Signed-off-by: Pan Dayu --- drivers/scsi/lpfc/lpfc_init.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c index 7dc4218..8533160 100644 --- a/drivers/scsi/lpfc/lpfc_init.c +++ b/drivers/scsi/lpfc/lpfc_init.c @@ -2566,7 +2566,7 @@ lpfc_block_mgmt_io(struct lpfc_hba *phba, int mbx_action) } spin_unlock_irqrestore(&phba->hbalock, iflag); - /* Wait for the outstnading mailbox command to complete */ + /* Wait for the outstanding mailbox command to complete */ while (phba->sli.mbox_active) { /* Check active mailbox complete status every 2ms */ msleep(2); -- 1.7.1
Re: [PATCH 3/4] regulator: s5m8767: Fix to work even if no DVS gpio present
On Mon, Dec 10, 2012 at 06:19:41PM +0530, Amit Daniel Kachhap wrote: > Signed-off-by: Amit Daniel Kachhap Applied, thanks.
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On Tue, 2012-12-11 at 04:19 +0100, Andi Kleen wrote: > On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: > > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > > > Oh, it will be putback to lru list during migration. So does your "some > > > > time" mean before call check_new_page? > > > > > > Yes until the next check_new_page() whenever that is. If the migration > > > works it will be earlier, otherwise later. > > > > But I can't figure out any page reclaim path that checks whether the page > > has PG_hwpoison set; can poisoned pages be reclaimed? > > The only way to reclaim a page is to free and reallocate it. Then why is there no check in the reclaim path to avoid reclaiming a poisoned page? -Simon > > -Andi
Re: [PATCH 2/4] regulator: s5m8767: Fix to read the first DVS register.
On Mon, Dec 10, 2012 at 06:19:40PM +0530, Amit Daniel Kachhap wrote: > This patch modifies the DVS register read function to select the correct DVS1 > register. This change is required because the GPIO select pin is 000 in the > uninitialized state and hence selects the DVS1 register. Applied, thanks.
resend--[PATCH] improve read ahead in kernel
Resending due to a format error. Subject: [PATCH] In a low-memory scenario, imagine an mp3 or a video is playing: we need to read the mp3 or video file into the page cache, but when the system is short of memory, the page cache of that mp3 or video file gets reclaimed. Once a page is read into memory and then reclaimed, it causes audio or video glitches, and it increases the I/O load at the same time. Signed-off-by: xiaobing tu --- include/linux/mm_types.h |4 mm/filemap.c |4 mm/readahead.c | 20 +--- mm/vmscan.c | 10 -- 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5b42f1b..2739995 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -149,6 +149,10 @@ struct page { */ void *shadow; #endif +#ifdef CONFIG_LOWMEMORY_READAHEAD +unsigned int ioprio; +#endif + } /* * If another subsystem starts using the double word pairing for atomic diff --git a/mm/filemap.c b/mm/filemap.c index a0701e6..ca3a3e8 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -129,6 +129,10 @@ void __delete_from_page_cache(struct page *page) page->mapping = NULL; /* Leave page->index set: truncation lookup relies upon it */ mapping->nrpages--; +#ifdef CONFIG_LOWMEMORY_READAHEAD +page->ioprio = 0; +#endif + __dec_zone_page_state(page, NR_FILE_PAGES); if (PageSwapBacked(page)) __dec_zone_page_state(page, NR_SHMEM); diff --git a/mm/readahead.c b/mm/readahead.c index cbcbb02..5c2d2ff 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -159,6 +159,11 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp, int page_idx; int ret = 0; loff_t isize = i_size_read(inode); +#ifdef CONFIG_LOWMEMORY_READAHEAD +int class = 0; +if (current->io_context) +class = IOPRIO_PRIO_CLASS(current->io_context->ioprio); +#endif if (isize == 0) goto out; @@ -177,12 +182,21 @@ __do_page_cache_readahead(struct address_space *mapping, struct file *filp, rcu_read_lock(); page = radix_tree_lookup(&mapping->page_tree, page_offset); 
rcu_read_unlock(); -if (page) -continue; - +if (page) { +#ifdef CONFIG_LOWMEMORY_READAHEAD +if (class == IOPRIO_CLASS_RT) +page->ioprio = 1; +#endif +continue; +} page = page_cache_alloc_readahead(mapping); if (!page) break; +#ifdef CONFIG_LOWMEMORY_READAHEAD +if (class == IOPRIO_CLASS_RT) +page->ioprio = 1; +#endif + page->index = page_offset; list_add(&page->lru, &page_pool); if (page_idx == nr_to_read - lookahead_size) diff --git a/mm/vmscan.c b/mm/vmscan.c index 753a2dc..0a1cae8 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -728,8 +728,14 @@ static enum page_references page_check_references(struct page *page, } /* Reclaim if clean, defer dirty pages to writeback */ -if (referenced_page && !PageSwapBacked(page)) -return PAGEREF_RECLAIM_CLEAN; +if (referenced_page && !PageSwapBacked(page)) { +#ifdef CONFIG_LOWMEMORY_READAHEAD +if (page->ioprio == 1) { +return PAGEREF_ACTIVATE; +} else +#endif +return PAGEREF_RECLAIM_CLEAN; +} return PAGEREF_RECLAIM; } -- 1.7.6
Re: [PATCH 1/4] regulator: s5m8767: Fix to work when platform registers less regulators
On Mon, Dec 10, 2012 at 06:19:39PM +0530, Amit Daniel Kachhap wrote: > Signed-off-by: Amit Daniel Kachhap Applied, thanks.
Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
On 12/10/2012 06:39 PM, Yinghai Lu wrote: > > No, you should not copy that several times. > > just pre-allocate some kbytes in BRK, and copy to there one time. > He doesn't copy it several times. He just saves an offset into the initrd blob. -hpa
Re: [PATCH 00/12] Refactoring the ab8500 battery management drivers
On Fri, Nov 30, 2012 at 11:57:22AM +, Lee Jones wrote: > The aim of this and subsequent patch-sets is to refactor battery > management services provided by the ab8500 MFD. This first patch-set > brings a few modifications to the collection which happened on the > internal kernel, but were never Mainlined. There are lots more of > these to come. We also tidy-up some of the Device Tree related patches > which are currently pending in -next. It fails to apply... Applying: ab8500_charger: Charger current step-up/down Applying: ab8500_fg: Don't clear the CCMuxOffset bit Applying: ab8500_btemp: Detect battery type in workqueue Applying: ab8500_btemp: Fix crazy tabbing implementation fatal: sha1 information is lacking or useless (drivers/power/ab8500_bmdata.c). Repository lacks necessary blobs to fall back on 3-way merge. Cannot fall back to three-way merge. Patch failed at 0001 ab8500_btemp: Fix crazy tabbing implementation I have tried battery tree (as of eba3b670a9166a91be5a, Nov 18), I have tried pristine Linus' tree, and I have tried linux-next. All failed in different places. I have tried to apply the "ab8500_btemp: Fix crazy tabbing implementation" manually (which applied with fuzz), but then the other patches failed. So I gave up. What is the base of the patches? Looking at the patch... diff --git a/drivers/power/ab8500_bmdata.c b/drivers/power/ab8500_bmdata.c index f16b60c..2623b16 100644 --- a/drivers/power/ab8500_bmdata.c +++ b/drivers/power/ab8500_bmdata.c There is really no f16b60c object in any tree, which makes me think that you use some private tree. Thanks, Anton.
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
> "There are not so many free pages in a typical server system", sorry I don't > quite understand it. Linux tries to keep most memory in caches. As Linus says, "free memory is bad memory". > > buffered_rmqueue() > prep_new_page() > check_new_page() > bad_page() > > If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M > memory will be dropped. prep_new_page() is only called on whatever is allocated, and MAX_ORDER allocations are much smaller than most of memory. If you allocate a large order page then yes, the complete page is dropped. This is today generally true in hwpoison. It would be one possible area of improvement (probably mostly if 1GB pages become more common than they are today). It's usually not a problem because most allocations are small order, systems generally have very few memory errors, and even the largest MAX_ORDER pages are a small fraction of the total memory. If you lose larger amounts of memory you usually quickly hit something that HWPoison cannot handle. -Andi
[PATCH RESEND RESEND] net: remove obsolete simple_strto
This patch replaces the obsolete simple_strto* functions with kstrto*. Signed-off-by: Abhijit Pawar --- net/core/netpoll.c |6 -- net/ipv4/netfilter/ipt_CLUSTERIP.c |9 +++-- net/mac80211/debugfs_sta.c |4 +++- net/netfilter/nf_conntrack_core.c |6 -- 4 files changed, 18 insertions(+), 7 deletions(-) diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 77a0388..3151acf 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -674,7 +674,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt) if ((delim = strchr(cur, '@')) == NULL) goto parse_failed; *delim = 0; - np->local_port = simple_strtol(cur, NULL, 10); + if (kstrtou16(cur, 10, &np->local_port)) + goto parse_failed; cur = delim; } cur++; @@ -705,7 +706,8 @@ int netpoll_parse_options(struct netpoll *np, char *opt) *delim = 0; if (*cur == ' ' || *cur == '\t') np_info(np, "warning: whitespace is not allowed\n"); - np->remote_port = simple_strtol(cur, NULL, 10); + if (kstrtou16(cur, 10, &np->remote_port)) + goto parse_failed; cur = delim; } cur++; diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index fe5daea..75e33a7 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -661,6 +661,7 @@ static ssize_t clusterip_proc_write(struct file *file, const char __user *input, #define PROC_WRITELEN 10 char buffer[PROC_WRITELEN+1]; unsigned long nodenum; + int rc; if (size > PROC_WRITELEN) return -EIO; @@ -669,11 +670,15 @@ static ssize_t clusterip_proc_write(struct file *file, const char __user *input, buffer[size] = 0; if (*buffer == '+') { - nodenum = simple_strtoul(buffer+1, NULL, 10); + rc = kstrtoul(buffer+1, 10, &nodenum); + if (rc) + return rc; if (clusterip_add_node(c, nodenum)) return -ENOMEM; } else if (*buffer == '-') { - nodenum = simple_strtoul(buffer+1, NULL,10); + rc = kstrtoul(buffer+1, 10, &nodenum); + if (rc) + return rc; if (clusterip_del_node(c, nodenum)) return -ENOENT; } else diff --git a/net/mac80211/debugfs_sta.c 
b/net/mac80211/debugfs_sta.c index 49a1c70..6fb1168 100644 --- a/net/mac80211/debugfs_sta.c +++ b/net/mac80211/debugfs_sta.c @@ -220,7 +220,9 @@ static ssize_t sta_agg_status_write(struct file *file, const char __user *userbu } else return -EINVAL; - tid = simple_strtoul(buf, NULL, 0); + ret = kstrtoul(buf, 0, &tid); + if (ret) + return ret; if (tid >= IEEE80211_NUM_TIDS) return -EINVAL; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index af17516..08cdc71 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1409,7 +1409,7 @@ EXPORT_SYMBOL_GPL(nf_ct_alloc_hashtable); int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp) { - int i, bucket; + int i, bucket, rc; unsigned int hashsize, old_size; struct hlist_nulls_head *hash, *old_hash; struct nf_conntrack_tuple_hash *h; @@ -1422,7 +1422,9 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp) if (!nf_conntrack_htable_size) return param_set_uint(val, kp); - hashsize = simple_strtoul(val, NULL, 0); + rc = kstrtouint(val, 0, &hashsize); + if (rc) + return rc; if (!hashsize) return -EINVAL; -- 1.7.7.6
Re: [PATCH RESEND] net: remove obsolete simple_strto
On 12/11/2012 12:40 AM, David Miller wrote: > From: Abhijit Pawar > Date: Mon, 10 Dec 2012 14:42:28 +0530 > >> This patch replaces the obsolete simple_strto* with kstrto* >> >> Signed-off-by: Abhijit Pawar > > Applied. > Hi David, It seems that there are occurrences of simple_strto* still present in a couple of files which are not yet converted correctly by this patch. I will send a modified patch shortly. Please revert this commit and use the newly sent patch to merge with the tree. -- - Abhijit
Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
On 12/11/2012 11:07 AM, Jianguo Wu wrote: On 2012/12/11 10:33, Tang Chen wrote: This patch introduces a new array zone_movable_limit[] to store the ZONE_MOVABLE limit from the movablecore_map boot option for all nodes. The function sanitize_zone_movable_limit() will find out to which node the ranges in movable_map.map[] belong, and calculates the low boundary of ZONE_MOVABLE for each node. Signed-off-by: Tang Chen Signed-off-by: Jiang Liu Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- mm/page_alloc.c | 77 +++ 1 files changed, 77 insertions(+), 0 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1c91d16..4853619 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES]; static unsigned long __initdata required_kernelcore; static unsigned long __initdata required_movablecore; static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES]; +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES]; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ int movable_zone; @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid, return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); } +/** + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array. + * + * zone_movable_limit is initialized as 0. This function will try to get + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and + * assign them to zone_movable_limit. + * zone_movable_limit[nid] == 0 means no limit for the node. 
+ * + * Note: Each range is represented as [start_pfn, end_pfn) + */ +static void __meminit sanitize_zone_movable_limit(void) +{ + int map_pos = 0, i, nid; + unsigned long start_pfn, end_pfn; + + if (!movablecore_map.nr_map) + return; + + /* Iterate all ranges from minimum to maximum */ + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + /* +* If we have found lowest pfn of ZONE_MOVABLE of the node +* specified by user, just go on to check next range. +*/ + if (zone_movable_limit[nid]) + continue; + +#ifdef CONFIG_ZONE_DMA + /* Skip DMA memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA]; +#endif + +#ifdef CONFIG_ZONE_DMA32 + /* Skip DMA32 memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32]; +#endif + +#ifdef CONFIG_HIGHMEM + /* Skip lowmem if ZONE_MOVABLE is highmem. */ + if (zone_movable_is_highmem() && Hi Tang, I think zone_movable_is_highmem() does not work correctly here. sanitize_zone_movable_limit zone_movable_is_highmem <-- uses movable_zone here find_zone_movable_pfns_for_nodes find_usable_zone_for_movable <-- movable_zone is specified here I think Jiang Liu's patch works fine for highmem, please refer to: http://marc.info/?l=linux-mm&m=135476085816087&w=2 Hi Wu, Yes, I forgot the movable_zone thing. Thanks for reminding me. :) As for Liu's patch you just mentioned, I didn't use it because I don't think we should skip kernelcore when movablecore_map is specified. If these 2 options do not conflict, we should satisfy them both. :) Of course, I also think Liu's suggestion is wonderful. But I think we need more discussion on it. :) I'll fix it soon. Thanks. 
:) Thanks, Jianguo Wu + start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]) + start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]; +#endif + + if (start_pfn >= end_pfn) + continue; + + while (map_pos < movablecore_map.nr_map) { + if (end_pfn <= movablecore_map.map[map_pos].start_pfn) + break; + + if (start_pfn >= movablecore_map.map[map_pos].end_pfn) { + map_pos++; + continue; + } + + /* +* The start_pfn of ZONE_MOVABLE is either the minimum +* pfn specified by movablecore_map, or 0, which means +* the node has no ZONE_MOVABLE. +*/ + zone_movable_limit[nid] = max(start_pfn, + movablecore_map.map[map_pos].start_pfn); + + break; + } + } +} + #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ static inline unsigned lon
[PATCH] Fix Irq Subsystem menu
Hi; In menuconfig, General setup -> Irq subsystem contains two possible menu-items. Sometimes, neither menu-item exists. This patch prevents the Irq subsystem menu from appearing at all unless it will contain at least one menu-item, preventing a confusing, empty menu. --- linux-3.7-rc8/kernel/irq/Kconfig.orig 2012-12-05 20:59:00.963707538 -0500 +++ linux-3.7-rc8/kernel/irq/Kconfig 2012-12-05 21:00:18.454788693 -0500 @@ -3,7 +3,6 @@ config HAVE_GENERIC_HARDIRQS bool if HAVE_GENERIC_HARDIRQS -menu "IRQ subsystem" # # Interrupt subsystem related configuration options # @@ -56,6 +55,13 @@ config GENERIC_IRQ_CHIP config IRQ_DOMAIN bool +# Support forced irq threading +config IRQ_FORCED_THREADING + bool + +menu "IRQ subsystem" + depends on ( IRQ_DOMAIN && DEBUG_FS ) || MAY_HAVE_SPARSE_IRQ + config IRQ_DOMAIN_DEBUG bool "Expose hardware/virtual IRQ mapping via debugfs" depends on IRQ_DOMAIN && DEBUG_FS @@ -66,10 +72,6 @@ config IRQ_DOMAIN_DEBUG If you don't know what this means you don't need it. -# Support forced irq threading -config IRQ_FORCED_THREADING - bool - config SPARSE_IRQ bool "Support sparse irq numbering" if MAY_HAVE_SPARSE_IRQ ---help--- Signed-off-by: Paul Thompson
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On 2012/12/11 10:58, Andi Kleen wrote: >> That sounds like overkill. There are not so many free pages in a >> typical server system. > > As Fengguang said -- memory error handling is tricky. Lots of things > could be done in theory, but they all have a cost in testing and > maintenance. > > In general they are only worth doing if the situation is common and > represents a significant percentage of the total pages of a relevant server > workload. > > -Andi > Hi Andi and Fengguang, "There are not so many free pages in a typical server system", sorry I don't quite understand it. buffered_rmqueue() prep_new_page() check_new_page() bad_page() If we alloc 2^10 pages and one of them is a poisoned page, then the whole 4M memory will be dropped. Thanks, Xishi Qiu
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On Mon, Dec 10, 2012 at 09:13:11PM -0600, Simon Jeons wrote: > On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > > Oh, it will be putback to lru list during migration. So does your "some > > > time" mean before call check_new_page? > > > > Yes until the next check_new_page() whenever that is. If the migration > > works it will be earlier, otherwise later. > > But I can't figure out any page reclaim path that checks whether the page > has PG_hwpoison set; can poisoned pages be reclaimed? The only way to reclaim a page is to free and reallocate it. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On Tue, 2012-12-11 at 04:01 +0100, Andi Kleen wrote: > > Oh, it will be putback to lru list during migration. So does your "some > > time" mean before call check_new_page? > > Yes until the next check_new_page() whenever that is. If the migration > works it will be earlier, otherwise later. But I can't figure out any page reclaim path that checks whether the page has PG_hwpoison set; can poisoned pages be reclaimed? -Simon > > -andi
[PATCH 1/1] media: saa7146: don't use mutex_lock_interruptible() in device_release().
Use uninterruptible mutex_lock in the release() file op to make sure all resources are properly freed when a process is being terminated. Returning -ERESTARTSYS has no effect for a terminating process and this may cause driver resources not to be released. This was found using the following semantic patch (http://coccinelle.lip6.fr/): @r@ identifier fops; identifier release_func; @@ static const struct v4l2_file_operations fops = { .release = release_func }; @depends on r@ identifier r.release_func; expression E; @@ static int release_func(...) { ... - if (mutex_lock_interruptible(E)) return -ERESTARTSYS; + mutex_lock(E); ... } Signed-off-by: Cyril Roelandt --- drivers/media/common/saa7146/saa7146_fops.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/media/common/saa7146/saa7146_fops.c b/drivers/media/common/saa7146/saa7146_fops.c index b3890bd..0afe98d 100644 --- a/drivers/media/common/saa7146/saa7146_fops.c +++ b/drivers/media/common/saa7146/saa7146_fops.c @@ -265,8 +265,7 @@ static int fops_release(struct file *file) DEB_EE("file:%p\n", file); - if (mutex_lock_interruptible(vdev->lock)) - return -ERESTARTSYS; + mutex_lock(vdev->lock); if (vdev->vfl_type == VFL_TYPE_VBI) { if (dev->ext_vv_data->capabilities & V4L2_CAP_VBI_CAPTURE) -- 1.7.10.4
[PATCH 0/1] media: saa7146: don't use mutex_lock_interruptible in device_release()
This is the same kind of bug as the one fixed by ddc43d6dc7df0849fe41b91460fa76145cf87b67: mutex_lock() must be used in the device_release file operation in order for all resources to be freed, since returning -ERESTARTSYS has no effect here. I stole the commit log from Sylwester Nawrocki, who fixed a few of these issues, since I could not formulate it better. --- Cyril Roelandt (1): media: saa7146: don't use mutex_lock_interruptible() in device_release(). drivers/media/common/saa7146/saa7146_fops.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) -- 1.7.10.4
Re: [PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
On 2012/12/11 10:33, Tang Chen wrote: > This patch introduces a new array zone_movable_limit[] to store the > ZONE_MOVABLE limit from movablecore_map boot option for all nodes. > The function sanitize_zone_movable_limit() will find out to which > node the ranges in movable_map.map[] belongs, and calculates the > low boundary of ZONE_MOVABLE for each node. > > Signed-off-by: Tang Chen > Signed-off-by: Jiang Liu > Reviewed-by: Wen Congyang > Reviewed-by: Lai Jiangshan > Tested-by: Lin Feng > --- > mm/page_alloc.c | 77 > +++ > 1 files changed, 77 insertions(+), 0 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 1c91d16..4853619 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -206,6 +206,7 @@ static unsigned long __meminitdata > arch_zone_highest_possible_pfn[MAX_NR_ZONES]; > static unsigned long __initdata required_kernelcore; > static unsigned long __initdata required_movablecore; > static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES]; > +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES]; > > /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ > int movable_zone; > @@ -4340,6 +4341,77 @@ static unsigned long __meminit > zone_absent_pages_in_node(int nid, > return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); > } > > +/** > + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array. > + * > + * zone_movable_limit is initialized as 0. This function will try to get > + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and > + * assigne them to zone_movable_limit. > + * zone_movable_limit[nid] == 0 means no limit for the node. 
> + * > + * Note: Each range is represented as [start_pfn, end_pfn) > + */ > +static void __meminit sanitize_zone_movable_limit(void) > +{ > + int map_pos = 0, i, nid; > + unsigned long start_pfn, end_pfn; > + > + if (!movablecore_map.nr_map) > + return; > + > + /* Iterate all ranges from minimum to maximum */ > + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { > + /* > + * If we have found lowest pfn of ZONE_MOVABLE of the node > + * specified by user, just go on to check next range. > + */ > + if (zone_movable_limit[nid]) > + continue; > + > +#ifdef CONFIG_ZONE_DMA > + /* Skip DMA memory. */ > + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA]) > + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA]; > +#endif > + > +#ifdef CONFIG_ZONE_DMA32 > + /* Skip DMA32 memory. */ > + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32]) > + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32]; > +#endif > + > +#ifdef CONFIG_HIGHMEM > + /* Skip lowmem if ZONE_MOVABLE is highmem. */ > + if (zone_movable_is_highmem() && Hi Tang, I think zone_movable_is_highmem() does not work correctly here.
sanitize_zone_movable_limit zone_movable_is_highmem <--using movable_zone here find_zone_movable_pfns_for_nodes find_usable_zone_for_movable <--movable_zone is specified here I think Jiang Liu's patch works fine for highmem, please refer to: http://marc.info/?l=linux-mm&m=135476085816087&w=2 Thanks, Jianguo Wu > + start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]) > + start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]; > +#endif > + > + if (start_pfn >= end_pfn) > + continue; > + > + while (map_pos < movablecore_map.nr_map) { > + if (end_pfn <= movablecore_map.map[map_pos].start_pfn) > + break; > + > + if (start_pfn >= movablecore_map.map[map_pos].end_pfn) { > + map_pos++; > + continue; > + } > + > + /* > + * The start_pfn of ZONE_MOVABLE is either the minimum > + * pfn specified by movablecore_map, or 0, which means > + * the node has no ZONE_MOVABLE. > + */ > + zone_movable_limit[nid] = max(start_pfn, > + movablecore_map.map[map_pos].start_pfn); > + > + break; > + } > + } > +} > + > #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ > static inline unsigned long __meminit zone_spanned_pages_in_node(int nid, > unsigned long zone_type, > @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit > zone_absent_pages_in_node(int nid, > return zholes_size[zone_type]; > } > > +static void __meminit sanitize_zone_movable_limit(v
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
> Oh, it will be putback to lru list during migration. So does your "some > time" mean before call check_new_page? Yes until the next check_new_page() whenever that is. If the migration works it will be earlier, otherwise later. -andi
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
> That sounds like overkill. There are not so many free pages in a > typical server system. As Fengguang said -- memory error handling is tricky. Lots of things could be done in theory, but they all have a cost in testing and maintenance. In general they are only worth doing if the situation is common and represents a significant percentage of the total pages of a relevant server workload. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: Serial8250 doesn't populate in /proc/iomem?
[+cc linux-arm, linux-samsung-soc, linux-serial]

On Sun, Dec 9, 2012 at 11:25 PM, Woody Wu wrote:
> Hi, list
>
> I found some io memory information is lost from /proc/iomem and want to
> find out why.
>
> I have a 2.6.16 kernel running on an ARM board (Samsung S3C2410). From
> the kernel log, I see 16 8250 serial ports were detected, and each of
> those ports has a memory address:
>
> Serial: 8250/16550 driver $Revision: 1.90 $ 16 ports, IRQ sharing enabled
> serial8250: tts0 at MMIO 0xe140 (irq = 16) is a 16550A
> serial8250: tts1 at MMIO 0xe148 (irq = 16) is a 16550A
> serial8250: tts2 at MMIO 0xe1400010 (irq = 17) is a 16550A
> serial8250: tts3 at MMIO 0xe1400018 (irq = 17) is a 16550A
> serial8250: tts4 at MMIO 0xe1400020 (irq = 18) is a 16550A
> serial8250: tts5 at MMIO 0xe1400028 (irq = 18) is a 16550A
> serial8250: tts6 at MMIO 0xe1400030 (irq = 19) is a 16550A
> serial8250: tts7 at MMIO 0xe1400038 (irq = 19) is a 16550A
> serial8250: tts8 at MMIO 0xe1400040 (irq = 48) is a 16550A
> serial8250: tts9 at MMIO 0xe1400048 (irq = 48) is a 16550A
> serial8250: tts10 at MMIO 0xe1400050 (irq = 49) is a 16550A
> serial8250: tts11 at MMIO 0xe1400058 (irq = 49) is a 16550A
> serial8250: tts12 at MMIO 0xe1400060 (irq = 60) is a 16550A
> serial8250: tts13 at MMIO 0xe1400068 (irq = 60) is a 16550A
> serial8250: tts14 at MMIO 0xe1400070 (irq = 61) is a 16550A
> serial8250: tts15 at MMIO 0xe1400078 (irq = 61) is a 16550A
>
> I can read/write these serial ports from /dev/ttys*, in other words,
> they do exist. I also can find the driver info from /proc/devices:
>
> Character devices:
> ...
> 4 /dev/vc/0
> 4 tty
> 4 tts
> 5 /dev/tty
> 5 /dev/console
> 5 /dev/ptmx
> 7 vcs
> ...
>
> The problem is, I don't understand why there is no information about
> these ports in /proc/iomem file.
The 'iomem' file now contains:
> 1100-11000ffe : AX88796B
> 1900-19000ffe : AX88796B
> 3000-33ff : System RAM
> 3001c000-301e826b : Kernel text
> 301ea000-302234a3 : Kernel data
> 4900-490f : s3c2410-ohci
> 4900-490f : ohci_hcd
> 4e00-4e0f : s3c2410-nand
> 4e00-4e0f : s3c2410-nand
> 5000-50003fff : s3c2410-uart.0
> 5000-50ff : s3c2410-uart
> 50004000-50007fff : s3c2410-uart.1
> 50004000-500040ff : s3c2410-uart
> 50008000-5000bfff : s3c2410-uart.2
> 50008000-500080ff : s3c2410-uart
> 5300-530f : s3c2410-wdt
> 5400-540f : s3c2410-i2c
> 5a00-5a0f : s3c2410-sdi
> 5a00-5a0f : mmci-s3c2410
>
> You see, there is no serial8250 information.
>
> Can anyone here please tell me how this can happen? Does it mean the
> serial8250 driver doesn't populate or register in /proc/iomem?

That looks like a bug to me. There should be entries in /proc/iomem for the hardware device (showing that something responds at that address) and for the driver that claims the device. I think the 8250 core does the reservation in serial8250_request_std_resource(). You could try putting some printks in that path to see whether it's exercised. You're running a fairly old kernel, so it's possible the bug has already been fixed.

Bjorn
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On Tue, Dec 11, 2012 at 10:25:00AM +0800, Xishi Qiu wrote: > On 2012/12/10 23:38, Andi Kleen wrote: > > >> It is another topic, I mean since the page is poisoned, so why not isolate > >> it > >> from page buddy allocator in soft_offline_page() rather than in > >> check_new_page(). > >> I find soft_offline_page() only migrate the page and mark HWPoison, the > >> poisoned > >> page is still managed by page buddy allocator. > > > > Doing it in check_new_page is the only way if the page is currently > > allocated by someone. Since that's not uncommon it's simplest to always > > do it this way. > > > > -Andi > > > > Hi Andi, > > The poisoned page is isolated in check_new_page, however the whole buddy > block will > be dropped, it seems to be a waste of memory. > > Can we separate the poisoned page from the buddy block, then *only* drop the > poisoned > page? That sounds like overkill. There are not so many free pages in a typical server system. Thanks, Fengguang
Re: [RFC v3] Support volatile range for anon vma
Sorry, resending with a compile error fixed. :(

From 0cfd3b65e4e90ab59abe8a337334414f92423cad Mon Sep 17 00:00:00 2001
From: Minchan Kim
Date: Tue, 11 Dec 2012 11:38:30 +0900
Subject: [RFC v3] Support volatile range for anon vma

This is still [RFC v3] because it has only passed my simple test with TCMalloc tweaking. I hope for more input from user-space allocator people, and for them to test this patch with their allocators, because getting real value out of it might require a change in arena management design.

Changelog from v2
* Remove madvise(addr, length, MADV_NOVOLATILE).
* Add a vmstat counter for the number of discarded volatile pages
* Discard volatile pages without promotion in the reclaim path

This is based on v3.6.

- What's the madvise(addr, length, MADV_VOLATILE)?
  It's a hint the user delivers to the kernel so the kernel can *discard* pages in the range at any time.

- What happens if the user accesses a page (ie, virtual address) discarded by the kernel?
  The user sees zero-fill-on-demand pages, as with madvise(DONTNEED).

- What happens if the user accesses a page (ie, virtual address) the kernel has not discarded?
  The user sees the old data without a page fault.

- What's different from madvise(DONTNEED)?
  System call semantics: DONTNEED makes sure the user always sees zero-fill pages after the madvise call, while with VOLATILE the user may see either zero-fill pages or the old data.
  Internal implementation: madvise(DONTNEED) has to zap all mapped pages in the range, so the overhead increases linearly with the number of mapped pages. Worse, if the user then writes to a zapped page, a page fault + page allocation + memset happens. madvise(VOLATILE) just marks a flag in the range (ie, the VMA). It doesn't touch the pages at all, so the overhead of the system call is very small. If memory pressure happens, the VM can discard pages in VMAs marked VOLATILE.
If the user writes to an address whose page the VM has discarded, he sees zero-fill pages, so the cost is the same as DONTNEED; but if memory pressure isn't severe, the user sees the old data without (page fault + page allocation + memset).

The VOLATILE mark is removed in the page fault handler when the first page fault occurs in a marked vma, so subsequent page faults follow the normal page fault path. That's why the user doesn't need a madvise(MADV_NOVOLATILE) interface.

- What's the benefit compared to DONTNEED?
  1. The system call overhead is smaller because VOLATILE just marks a flag on the VMA instead of zapping all the pages in the range.
  2. It has a chance to eliminate overheads (ex, page fault + page allocation + memset(PAGE_SIZE)).

- Isn't there any drawback?
  DONTNEED doesn't need exclusive mmap_sem locking, so concurrent page faults from other threads are allowed. But VOLATILE needs exclusive mmap_sem, so other threads would be blocked if they try to access not-yet-mapped pages. That's why I designed madvise(VOLATILE)'s overhead to be as small as possible. The other concern with exclusive mmap_sem is when a page fault occurs in a VOLATILE-marked vma: we have to remove the flag from the vma and merge adjacent vmas, which needs exclusive mmap_sem. That can slow down page fault handling and prevent concurrent page faults. But we need such handling only once, on the first page fault after we mark the VMA VOLATILE, and only if memory pressure happened so a page was actually discarded. So it wouldn't be common, and the benefit we get from this feature should outweigh the loss.

- What's the target?
  Firstly, user-space allocators like ptmalloc and tcmalloc, or heap management of virtual machines like Dalvik. It also comes in handy for embedded systems which don't have a swap device, so they can't reclaim anonymous pages. By discarding instead of swapping, it can be used on non-swap systems.
For that, we have to age the anon lru list even though we have no swap, because I don't want to discard volatile pages with top priority when memory pressure happens: volatile in this patch means "We don't need to swap out because the user can handle the situation where the data disappears suddenly", NOT "They are useless, so hurry up and reclaim them". So I want to apply the same aging rule as normal pages to them.

Anonymous page background aging on a non-swap system would be a trade-off for getting this feature. We had even done it until [1] was merged two years ago, and I believe the gain of this patch will beat the loss from anon lru aging's overhead once allocators start to use madvise. (This patch doesn't include background aging in the case of a non-swap system, but it's trivial if we decide to.)

[1] 74e3f3c3, vmscan: prevent background aging of anon page in no swap system

Cc: Michael Kerrisk
Cc: Arun Sharma
Cc: san...@google.com
Cc: Paul Turner
CC: David Rientjes
Cc: John Stultz
Cc: Andrew Morton
Cc: Christoph Lameter
Cc: Android Kernel Team
Cc: Robert Love
Cc: Mel Gorman
Cc: Hugh Dickins
Cc: Dave Hansen
Cc: Rik van Riel
Cc: Dave Chinner
Cc: Neil Brown
Cc: Mike Hommey
Cc: Taras Glek
Cc: KOSAKI Motohiro
Cc: Ch
Re: [tip:x86/microcode] x86/microcode_intel_early.c: Early update ucode on Intel's CPU
On Mon, Dec 3, 2012 at 4:18 PM, Yu, Fenghua wrote: >> From: yhlu.ker...@gmail.com [mailto:yhlu.ker...@gmail.com] On Behalf Of >> Yinghai Lu >> >> may need to copy the ucode.bin to BRK at first. that will make code >> much simple, and later does not need to >> copy them back in free_bootmem_initrd. >> >> aka, this patchset is not ready for 3.8 even. >> > > I will relocate saved microcode blob (mc_saved_in_initrd) after initrd is > relocated in updated patches. Thus, mc_saved_in_initrd always point to > right initrd during boot time. No, you should not copy that several times. just pre-allocate some kbytes in BRK, and copy to there one time. Yinghai
Re: TIP tree's master branch failed to boot up
On 12/11/2012 10:34 AM, H. Peter Anvin wrote: > On 12/10/2012 06:22 PM, Michael Wang wrote: >> On 12/11/2012 01:02 AM, H. Peter Anvin wrote: >>> On 12/09/2012 08:50 PM, Michael Wang wrote: Hi, Folks I'm testing with the latest tip tree's master branch 3.7.0-rc8 and failed to boot up my server, it's hung at very beginning and I could not catch any useful log, is there any one else got this problem or I'm the only one?. Regards, Michael Wang >>> >>> 32 or 64 bits? >> >> 64 bits. >> > > Thanks. We're working on that patchset and have found a bunch of > issues, hopefully we can get them resolved very soon. I see, let me know if you need more info :) Regards, Michael Wang > > -hpa
[PATCH v3 3/5] page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
This patch introduces a new array zone_movable_limit[] to store the ZONE_MOVABLE limit from movablecore_map boot option for all nodes. The function sanitize_zone_movable_limit() will find out to which node the ranges in movable_map.map[] belong, and calculate the low boundary of ZONE_MOVABLE for each node. Signed-off-by: Tang Chen Signed-off-by: Jiang Liu Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- mm/page_alloc.c | 77 +++ 1 files changed, 77 insertions(+), 0 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1c91d16..4853619 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -206,6 +206,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES]; static unsigned long __initdata required_kernelcore; static unsigned long __initdata required_movablecore; static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES]; +static unsigned long __meminitdata zone_movable_limit[MAX_NUMNODES]; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ int movable_zone; @@ -4340,6 +4341,77 @@ static unsigned long __meminit zone_absent_pages_in_node(int nid, return __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); } +/** + * sanitize_zone_movable_limit - Sanitize the zone_movable_limit array. + * + * zone_movable_limit is initialized as 0. This function will try to get + * the first ZONE_MOVABLE pfn of each node from movablecore_map, and + * assign them to zone_movable_limit. + * zone_movable_limit[nid] == 0 means no limit for the node.
+ * + * Note: Each range is represented as [start_pfn, end_pfn) + */ +static void __meminit sanitize_zone_movable_limit(void) +{ + int map_pos = 0, i, nid; + unsigned long start_pfn, end_pfn; + + if (!movablecore_map.nr_map) + return; + + /* Iterate all ranges from minimum to maximum */ + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + /* +* If we have found lowest pfn of ZONE_MOVABLE of the node +* specified by user, just go on to check next range. +*/ + if (zone_movable_limit[nid]) + continue; + +#ifdef CONFIG_ZONE_DMA + /* Skip DMA memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA]; +#endif + +#ifdef CONFIG_ZONE_DMA32 + /* Skip DMA32 memory. */ + if (start_pfn < arch_zone_highest_possible_pfn[ZONE_DMA32]) + start_pfn = arch_zone_highest_possible_pfn[ZONE_DMA32]; +#endif + +#ifdef CONFIG_HIGHMEM + /* Skip lowmem if ZONE_MOVABLE is highmem. */ + if (zone_movable_is_highmem() && + start_pfn < arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]) + start_pfn = arch_zone_lowest_possible_pfn[ZONE_HIGHMEM]; +#endif + + if (start_pfn >= end_pfn) + continue; + + while (map_pos < movablecore_map.nr_map) { + if (end_pfn <= movablecore_map.map[map_pos].start_pfn) + break; + + if (start_pfn >= movablecore_map.map[map_pos].end_pfn) { + map_pos++; + continue; + } + + /* +* The start_pfn of ZONE_MOVABLE is either the minimum +* pfn specified by movablecore_map, or 0, which means +* the node has no ZONE_MOVABLE. 
+*/ + zone_movable_limit[nid] = max(start_pfn, + movablecore_map.map[map_pos].start_pfn); + + break; + } + } +} + #else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ static inline unsigned long __meminit zone_spanned_pages_in_node(int nid, unsigned long zone_type, @@ -4358,6 +4430,10 @@ static inline unsigned long __meminit zone_absent_pages_in_node(int nid, return zholes_size[zone_type]; } +static void __meminit sanitize_zone_movable_limit(void) +{ +} + #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ static void __meminit calculate_node_totalpages(struct pglist_data *pgdat, @@ -4923,6 +4999,7 @@ void __init free_area_init_nodes(unsigned long *max_zone_pfn) /* Find the PFNs that ZONE_MOVABLE begins at in each node */ memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn)); + sanitize_zone_movable_limit(); find_zone_movable_pfns_for_nodes(); /* Print out the zone ranges */ -- 1.7.1
[PATCH v3 5/5] page_alloc: Bootmem limit with movablecore_map
This patch makes sure bootmem will not allocate memory from areas that may be ZONE_MOVABLE. The map info is from the movablecore_map boot option. Signed-off-by: Tang Chen Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- include/linux/memblock.h |1 + mm/memblock.c| 18 +- 2 files changed, 18 insertions(+), 1 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index d452ee1..6e25597 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -42,6 +42,7 @@ struct memblock { extern struct memblock memblock; extern int memblock_debug; +extern struct movablecore_map movablecore_map; #define memblock_dbg(fmt, ...) \ if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__) diff --git a/mm/memblock.c b/mm/memblock.c index 6259055..197c3be 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -101,6 +101,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start, { phys_addr_t this_start, this_end, cand; u64 i; + int curr = movablecore_map.nr_map - 1; /* pump up @end */ if (end == MEMBLOCK_ALLOC_ACCESSIBLE) @@ -114,13 +115,28 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start, this_start = clamp(this_start, start, end); this_end = clamp(this_end, start, end); - if (this_end < size) +restart: + if (this_end <= this_start || this_end < size) continue; + for (; curr >= 0; curr--) { + if ((movablecore_map.map[curr].start_pfn << PAGE_SHIFT) + < this_end) + break; + } + cand = round_down(this_end - size, align); + if (curr >= 0 && + cand < movablecore_map.map[curr].end_pfn << PAGE_SHIFT) { + this_end = movablecore_map.map[curr].start_pfn + << PAGE_SHIFT; + goto restart; + } + if (cand >= this_start) return cand; } + return 0; } -- 1.7.1
Re: TIP tree's master branch failed to boot up
On 12/10/2012 06:22 PM, Michael Wang wrote: > On 12/11/2012 01:02 AM, H. Peter Anvin wrote: >> On 12/09/2012 08:50 PM, Michael Wang wrote: >>> Hi, Folks >>> >>> I'm testing with the latest tip tree's master branch 3.7.0-rc8 and >>> failed to boot up my server, it's hung at very beginning and I could not >>> catch any useful log, is there any one else got this problem or I'm the >>> only one?. >>> >>> Regards, >>> Michael Wang >>> >> >> 32 or 64 bits? > > 64 bits. > Thanks. We're working on that patchset and have found a bunch of issues, hopefully we can get them resolved very soon. -hpa
Re: TIP tree's master branch failed to boot up
On 12/10/2012 06:54 PM, Ingo Molnar wrote: > > I've Cc:-ed hpa, he merged the x86/microcode bits. > > Michael Wang, I've excluded x86/microcode from the latest > version of the -tip tree which I just pushed out: > > 4e00fd4c93e0 Merge branch 'x86/cleanups' > > Can you confirm that your server boots now? Yes, it works. Regards, Michael Wang > > Thanks, > > Ingo > > * Michael Wang wrote: > >> On 12/10/2012 12:50 PM, Michael Wang wrote: >>> Hi, Folks >>> >>> I'm testing with the latest tip tree's master branch 3.7.0-rc8 and >>> failed to boot up my server, it's hung at very beginning and I could not >>> catch any useful log, is there any one else got this problem or I'm the >>> only one?. >> >> And bisect catch below commit: >> >> commit 56e7dba100a50f674627a3764fd4da4a6ec93295 >> Merge: ea8432f 16544f8 >> Author: Ingo Molnar >> Date: Fri Dec 7 12:13:11 2012 +0100 >> >> Merge branch 'x86/microcode' >> >> Regards, >> Michael Wang
[PATCH v3 0/5] Add movablecore_map boot option
[What we are doing]
This patchset provides a boot option for users to specify the ZONE_MOVABLE memory map for each node in the system.

movablecore_map=nn[KMG]@ss[KMG]

This option makes sure the memory range from ss to ss+nn is movable memory.

[Why we do this]
If we hot remove memory, that memory cannot contain kernel memory, because Linux cannot migrate kernel memory currently. Therefore, we have to guarantee that the hot-removed memory contains only movable memory.

Linux has two boot options, kernelcore= and movablecore=, for creating movable memory. These boot options can specify the amount of memory to use as kernel or movable memory. Using them, we can create a ZONE_MOVABLE which has only movable memory. But they do not fulfill the requirement of memory hot remove, because even if we specify the boot options, movable memory is distributed across the nodes evenly. So when we want to hot remove memory whose memory range is 0x8000-0c000, we have no way to specify that memory as movable memory. So we propose a new feature which specifies a memory range to use as movable memory.

[Ways to do this]
There may be 2 ways to specify movable memory.
1. use firmware information
2. use boot option

1. use firmware information
According to ACPI spec 5.0, the SRAT table has a memory affinity structure, and the structure has a Hot Pluggable Field. See "5.2.16.2 Memory Affinity Structure". If we use this information, we might be able to specify movable memory via firmware. For example, if the Hot Pluggable Field is enabled, Linux sets the memory as movable memory.

2. use boot option
This is our proposal. A new boot option can specify a memory range to use as movable memory.

[How we do this]
We chose the second way, because with the first way, users cannot easily change the memory range to use as movable memory. We think that if we create movable memory, a performance regression may occur because of NUMA. In that case, the user can easily turn off the feature if we prepare the boot option.
And if we prepare the boot option, the user can easily select which memory to use as movable memory.

[How to use]
Specify the following boot option:
movablecore_map=nn[KMG]@ss[KMG]

That means the physical address range from ss to ss+nn will be allocated as ZONE_MOVABLE. And the following points should be considered.

1) If the range is involved in a single node, then from ss to the end of the node will be ZONE_MOVABLE.
2) If the range covers two or more nodes, then from ss to the end of the node will be ZONE_MOVABLE, and all the other nodes will only have ZONE_MOVABLE.
3) If no range is in the node, then the node will have no ZONE_MOVABLE unless kernelcore or movablecore is specified.
4) This option could be specified at most MAX_NUMNODES times.
5) If kernelcore or movablecore is also specified, movablecore_map will have higher priority to be satisfied.
6) This option has no conflict with the memmap option.

Change log:
v2 -> v3:
1) Use memblock_alloc_try_nid() instead of memblock_alloc_nid() to allocate memory twice if a whole node is ZONE_MOVABLE.
2) Add DMA, DMA32 addresses check, make sure ZONE_MOVABLE won't use these addresses. Suggested by Wu Jianguo
3) Add lowmem addresses check, when the system has highmem, make sure ZONE_MOVABLE won't use lowmem. Suggested by Liu Jiang
4) Fix misuse of pfns in movablecore_map.map[] as physical addresses.
Tang Chen (4):
  page_alloc: add movable_memmap kernel parameter
  page_alloc: Introduce zone_movable_limit[] to keep movable limit for nodes
  page_alloc: Make movablecore_map has higher priority
  page_alloc: Bootmem limit with movablecore_map

Yasuaki Ishimatsu (1):
  x86: get pg_data_t's memory from other node

 Documentation/kernel-parameters.txt | 17 +++
 arch/x86/mm/numa.c |5 +-
 include/linux/memblock.h|1 +
 include/linux/mm.h | 11 ++
 mm/memblock.c | 18 +++-
 mm/page_alloc.c | 238 ++-
 6 files changed, 282 insertions(+), 8 deletions(-)
[PATCH v3 2/5] page_alloc: add movable_memmap kernel parameter
This patch adds functions to parse the movablecore_map boot option. Since the option could be specified more than once, all the maps will be stored in the global variable movablecore_map.map array. We also keep the array in monotonically increasing order by start_pfn, and merge all overlapped ranges. Signed-off-by: Tang Chen Signed-off-by: Lai Jiangshan Reviewed-by: Wen Congyang Tested-by: Lin Feng --- Documentation/kernel-parameters.txt | 17 + include/linux/mm.h | 11 +++ mm/page_alloc.c | 126 +++ 3 files changed, 154 insertions(+), 0 deletions(-) diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 9776f06..785f878 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1620,6 +1620,23 @@ bytes respectively. Such letter suffixes can also be entirely omitted. that the amount of memory usable for all allocations is not too small. + movablecore_map=nn[KMG]@ss[KMG] + [KNL,X86,IA-64,PPC] This parameter is similar to + memmap except it specifies the memory map of + ZONE_MOVABLE. + If more areas are all within one node, then from + lowest ss to the end of the node will be ZONE_MOVABLE. + If an area covers two or more nodes, the area from + ss to the end of the 1st node will be ZONE_MOVABLE, + and all the rest nodes will only have ZONE_MOVABLE. + If memmap is specified at the same time, the + movablecore_map will be limited within the memmap + areas. If kernelcore or movablecore is also specified, + movablecore_map will have higher priority to be + satisfied. So the administrator should be careful that + the amount of movablecore_map areas are not too large. + Otherwise kernel won't have enough memory to start.
+ MTD_Partition= [MTD] Format: ,,, diff --git a/include/linux/mm.h b/include/linux/mm.h index bcaab4e..29622c2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1328,6 +1328,17 @@ extern void free_bootmem_with_active_regions(int nid, unsigned long max_low_pfn); extern void sparse_memory_present_with_active_regions(int nid); +#define MOVABLECORE_MAP_MAX MAX_NUMNODES +struct movablecore_entry { + unsigned long start_pfn;/* start pfn of memory segment */ + unsigned long end_pfn; /* end pfn of memory segment */ +}; + +struct movablecore_map { + int nr_map; + struct movablecore_entry map[MOVABLECORE_MAP_MAX]; +}; + #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */ #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a8f2c87..1c91d16 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -198,6 +198,9 @@ static unsigned long __meminitdata nr_all_pages; static unsigned long __meminitdata dma_reserve; #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP +/* Movable memory ranges, will also be used by memblock subsystem. */ +struct movablecore_map movablecore_map; + static unsigned long __meminitdata arch_zone_lowest_possible_pfn[MAX_NR_ZONES]; static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES]; static unsigned long __initdata required_kernelcore; @@ -5003,6 +5006,129 @@ static int __init cmdline_parse_movablecore(char *p) early_param("kernelcore", cmdline_parse_kernelcore); early_param("movablecore", cmdline_parse_movablecore); +/** + * insert_movablecore_map - Insert a memory range in to movablecore_map.map. + * @start_pfn: start pfn of the range + * @end_pfn: end pfn of the range + * + * This function will also merge the overlapped ranges, and sort the array + * by start_pfn in monotonic increasing order. 
+ */ +static void __init insert_movablecore_map(unsigned long start_pfn, + unsigned long end_pfn) +{ + int pos, overlap; + + /* +* pos will be at the 1st overlapped range, or the position +* where the element should be inserted. +*/ + for (pos = 0; pos < movablecore_map.nr_map; pos++) + if (start_pfn <= movablecore_map.map[pos].end_pfn) + break; + + /* If there is no overlapped range, just insert the element. */ + if (pos == movablecore_map.nr_map || + end_pfn < movablecore_map.map[pos].start_pfn) { + /* +* If pos is not the end of array, we need to move all +* the rest elements backward. +*/ + if (pos < movablecore_map.nr_map) + memmove(&
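The function body above is truncated in this excerpt. The insert-with-merge behaviour its comment documents — keep the array sorted by start_pfn and coalesce overlapping ranges — can be sketched as a self-contained userspace program. The fixed-size array, type names, and half-open [start, end) convention here are illustrative assumptions, not the kernel's exact code:

```c
#include <string.h>

#define MAP_MAX 32

/* Half-open pfn range [start, end); a stand-in for movablecore_entry. */
struct range { unsigned long start, end; };

static struct range map[MAP_MAX];
static int nr_map;

/* Insert a range, keeping map[] sorted by start and merging overlaps. */
static void insert_range(unsigned long start, unsigned long end)
{
	int pos, overlap;

	/* pos: first range that could overlap, or the insertion point. */
	for (pos = 0; pos < nr_map; pos++)
		if (start <= map[pos].end)
			break;

	/* No overlap: shift the tail up and insert a new element. */
	if (pos == nr_map || end < map[pos].start) {
		if (pos < nr_map)
			memmove(&map[pos + 1], &map[pos],
				(nr_map - pos) * sizeof(map[0]));
		map[pos].start = start;
		map[pos].end = end;
		nr_map++;
		return;
	}

	/* Overlap: grow map[pos] over every range the new one touches. */
	for (overlap = pos + 1; overlap < nr_map; overlap++)
		if (end < map[overlap].start)
			break;
	if (start < map[pos].start)
		map[pos].start = start;
	if (map[overlap - 1].end > end)
		end = map[overlap - 1].end;
	map[pos].end = end;
	if (overlap < nr_map)
		memmove(&map[pos + 1], &map[overlap],
			(nr_map - overlap) * sizeof(map[0]));
	nr_map -= overlap - pos - 1;
}
```

Inserting (10,20), (40,50), (5,15), then (15,45) leaves a single merged range (5,50), matching the "merge all overlapped ranges" behaviour the commit message describes.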
[PATCH v3 4/5] page_alloc: Make movablecore_map has higher priority
If kernelcore or movablecore is specified at the same time as movablecore_map, movablecore_map will have higher priority to be satisfied. This patch makes find_zone_movable_pfns_for_nodes() calculate zone_movable_pfn[] with the limit from zone_movable_limit[]. Signed-off-by: Tang Chen Reviewed-by: Wen Congyang Reviewed-by: Lai Jiangshan Tested-by: Lin Feng --- mm/page_alloc.c | 35 +++ 1 files changed, 31 insertions(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4853619..e7b6db5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4839,12 +4839,25 @@ static void __init find_zone_movable_pfns_for_nodes(void) required_kernelcore = max(required_kernelcore, corepages); } - /* If kernelcore was not specified, there is no ZONE_MOVABLE */ - if (!required_kernelcore) + /* +* No matter kernelcore/movablecore was limited or not, movable_zone +* should always be set to a usable zone index. +*/ + find_usable_zone_for_movable(); + + /* +* If neither kernelcore/movablecore nor movablecore_map is specified, +* there is no ZONE_MOVABLE. But if movablecore_map is specified, the +* start pfn of ZONE_MOVABLE has been stored in zone_movable_limit[]. +*/ + if (!required_kernelcore) { + if (movablecore_map.nr_map) + memcpy(zone_movable_pfn, zone_movable_limit, + sizeof(zone_movable_pfn)); goto out; + } /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - find_usable_zone_for_movable(); usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; restart: @@ -4872,10 +4885,24 @@ restart: for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { unsigned long size_pages; + /* +* Find more memory for kernelcore in +* [zone_movable_pfn[nid], zone_movable_limit[nid]).
+*/ start_pfn = max(start_pfn, zone_movable_pfn[nid]); if (start_pfn >= end_pfn) continue; + if (zone_movable_limit[nid]) { + end_pfn = min(end_pfn, zone_movable_limit[nid]); + /* No range left for kernelcore in this node */ + if (start_pfn >= end_pfn) { + zone_movable_pfn[nid] = + zone_movable_limit[nid]; + break; + } + } + /* Account for what is only usable for kernelcore */ if (start_pfn < usable_startpfn) { unsigned long kernel_pages; @@ -4935,12 +4962,12 @@ restart: if (usable_nodes && required_kernelcore > usable_nodes) goto restart; +out: /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ for (nid = 0; nid < MAX_NUMNODES; nid++) zone_movable_pfn[nid] = roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); -out: /* restore the node_state */ node_states[N_HIGH_MEMORY] = saved_node_state; } -- 1.7.1
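The hunk's key rule is that each node's remaining kernelcore search window is [zone_movable_pfn[nid], zone_movable_limit[nid]), where a limit of zero means the node has no movablecore_map limit. A minimal sketch of that clamp, using a hypothetical helper name rather than the kernel's inline code:

```c
/*
 * Clamp a memblock range [*start, *end) to the window still usable for
 * kernelcore on a node: at or above movable_pfn, and below movable_limit
 * (0 means the node has no movablecore_map limit).
 * Returns 1 if any pages remain for kernelcore, 0 otherwise.
 */
static int clamp_for_kernelcore(unsigned long *start, unsigned long *end,
				unsigned long movable_pfn,
				unsigned long movable_limit)
{
	if (*start < movable_pfn)
		*start = movable_pfn;
	if (movable_limit && *end > movable_limit)
		*end = movable_limit;
	return *start < *end;
}
```

When the window closes (start >= end), the patch records zone_movable_limit[nid] into zone_movable_pfn[nid] and stops searching that node.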
[PATCH v3 1/5] x86: get pg_data_t's memory from other node
From: Yasuaki Ishimatsu If the system creates a movable node, in which all of the node's memory is allocated as ZONE_MOVABLE, setup_node_data() cannot allocate memory for the node's pg_data_t from that node. So, use memblock_alloc_try_nid() instead of memblock_alloc_nid(), so that the allocation can fall back to other nodes when node-local allocation fails. Signed-off-by: Yasuaki Ishimatsu Signed-off-by: Lai Jiangshan Signed-off-by: Tang Chen Signed-off-by: Jiang Liu --- arch/x86/mm/numa.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c index 2d125be..db939b6 100644 --- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -222,10 +222,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end) nd_pa = __pa(nd); remapped = true; } else { - nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid); + nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); if (!nd_pa) { - pr_err("Cannot find %zu bytes in node %d\n", - nd_size, nid); + pr_err("Cannot find %zu bytes in any node\n", nd_size); return; } nd = __va(nd_pa); -- 1.7.1
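The fallback behaviour that memblock_alloc_try_nid() provides over memblock_alloc_nid() can be illustrated with a toy per-node allocator. The node table, sizes, and function names below are made up for illustration and model only the "preferred node first, then any node" pattern:

```c
#include <stddef.h>

#define NR_NODES 4

/* Toy model: free bytes per node; nodes 0 and 2 are fully movable here. */
static long node_free[NR_NODES] = { 0, 4096, 0, 4096 };

/* Node-local allocation only; returns the owning node or -1 on failure. */
static int toy_alloc_nid(size_t size, int nid)
{
	if (node_free[nid] < (long)size)
		return -1;
	node_free[nid] -= (long)size;
	return nid;
}

/*
 * The fallback pattern of memblock_alloc_try_nid(): prefer the requested
 * node, then fall back to any node that can satisfy the request.
 */
static int toy_alloc_try_nid(size_t size, int nid)
{
	int n, ret = toy_alloc_nid(size, nid);

	if (ret >= 0)
		return ret;
	for (n = 0; n < NR_NODES; n++)
		if ((ret = toy_alloc_nid(size, n)) >= 0)
			return ret;
	return -1;
}
```

With node 0 fully movable (no usable memory), a request for node 0 succeeds from node 1 instead of failing outright, which is exactly what the pg_data_t allocation needs.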
[RFC v3] Support volatile range for anon vma
This is still [RFC v3] because it has only passed my simple test with TCMalloc tweaking. I hope for more input from user-space allocator people, and for them to test the patch with their allocators, because getting real value out of it might require changes to their arena management design. Changelog from v2 * Remove madvise(addr, length, MADV_NOVOLATILE). * Add a vmstat counter for the number of discarded volatile pages * Discard volatile pages without promotion in the reclaim path This is based on v3.6. - What is madvise(addr, length, MADV_VOLATILE)? It is a hint the user delivers to the kernel so the kernel can *discard* pages in the range at any time. - What happens if the user accesses a page (i.e., virtual address) discarded by the kernel? The user sees zero-fill-on-demand pages, as with madvise(DONTNEED). - What happens if the user accesses a page (i.e., virtual address) that has not been discarded by the kernel? The user sees the old data without a page fault. - How is it different from madvise(DONTNEED)? System call semantics: DONTNEED guarantees the user always sees zero-fill pages after calling madvise, while with VOLATILE the user may see either zero-fill pages or the old data. Internal implementation: madvise(DONTNEED) has to zap all mapped pages in the range, so its overhead increases linearly with the number of mapped pages. Worse, if the user then writes to a zapped page, a page fault + page allocation + memset follows. madvise(VOLATILE) only marks a flag on the range (i.e., the VMA). It does not touch the pages at all, so the overhead of the system call is very small. If memory pressure happens, the VM can discard pages in VMAs marked VOLATILE. If the user writes to an address whose page was discarded by the VM, he sees zero-fill pages, so the cost is the same as DONTNEED; but if memory pressure is not severe, the user sees the old data without (page fault + page allocation + memset). The VOLATILE mark is removed in the page fault handler when the first page fault occurs in a marked VMA, so subsequent page faults follow the normal page fault path.
That is why the user does not need a madvise(MADV_NOVOLATILE) interface. - What is the benefit compared to DONTNEED? 1. The system call overhead is smaller because VOLATILE just marks a flag on the VMA instead of zapping all the pages in the range. 2. It has a chance to eliminate overheads (e.g., page fault + page allocation + memset(PAGE_SIZE)). - Isn't there any drawback? DONTNEED does not need exclusive mmap_sem locking, so concurrent page faults from other threads are allowed. But VOLATILE needs exclusive mmap_sem, so other threads would be blocked if they try to access not-yet-mapped pages. That is why I designed madvise(VOLATILE)'s overhead to be as small as possible. Another concern with exclusive mmap_sem is when a page fault occurs in a VOLATILE-marked VMA. We have to remove the flag from the VMA and merge adjacent VMAs, which needs exclusive mmap_sem. That can slow down page fault handling and prevent concurrent page faults. But we need such handling only once, at the first page fault after we mark the VMA VOLATILE, and only if memory pressure happened and a page was discarded. So it is not common, and the benefit we get from this feature should outweigh the loss. - What is it targeting? Firstly, user-space allocators like ptmalloc and tcmalloc, or the heap management of virtual machines like Dalvik. It also comes in handy for embedded systems that have no swap device and therefore cannot reclaim anonymous pages. By discarding instead of swapping, it can be used on non-swap systems. For that, we have to age the anon LRU list even though we have no swap, because I do not want to discard volatile pages with top priority when memory pressure happens: volatile in this patch means "we don't need to swap out because the user can handle data disappearing suddenly", NOT "they are useless, so hurry up and reclaim them". So I want to apply the same aging rule as for normal pages to them. Anonymous page background aging on a non-swap system would be a trade-off for getting this feature.
In fact, we had done exactly that for two years until [1] was merged, and I believe the gain from this patch will beat the loss from anon LRU aging's overhead once all allocators start to use madvise. (This patch doesn't include background aging for the non-swap case, but adding it is trivial if we decide to.) [1] 74e3f3c3, vmscan: prevent background aging of anon page in no swap system Cc: Michael Kerrisk Cc: Arun Sharma Cc: san...@google.com Cc: Paul Turner CC: David Rientjes Cc: John Stultz Cc: Andrew Morton Cc: Christoph Lameter Cc: Android Kernel Team Cc: Robert Love Cc: Mel Gorman Cc: Hugh Dickins Cc: Dave Hansen Cc: Rik van Riel Cc: Dave Chinner Cc: Neil Brown Cc: Mike Hommey Cc: Taras Glek Cc: KOSAKI Motohiro Cc: Christoph Lameter Cc: KAMEZAWA Hiroyuki Signed-off-by: Minchan Kim --- arch/x86/mm/fault.c |2 + include/asm-generic/mman-common.h |6 ++ include/linux/mm.h|7 ++- include/linux/rmap.h
Re: [PATCH 5/6] ACPI: Replace struct acpi_bus_ops with enum type
On Mon, Dec 10, 2012 at 5:28 PM, Rafael J. Wysocki wrote: >> >> OK, thanks for the pointers. I actually see more differences between our >> patchsets. For one example, you seem to have left the parent->ops.bind() >> stuff in acpi_add_single_object() which calls it even drivers_autoprobe is >> set. > > Sorry, that should have been "which calls it even when drivers_autoprobe is > not set". I need to be more careful. > Oh, Jiang Liu had one patch to remove that workaround. http://git.kernel.org/?p=linux/kernel/git/yinghai/linux-yinghai.git;a=commitdiff;h=b40dba80c2b8395570d8357e6b3f417c27c84504 ACPI/pci-bind: remove bind/unbind callbacks from acpi_device_ops Maybe you can review those patches in my for-pci-next2... they are ACPI related anyway. Those patches have been there for a while, and Bjorn has not had time to digest them. Or would you prefer that I resend an updated version as one whole patchset? Thanks Yinghai
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On 2012/12/10 23:38, Andi Kleen wrote: >> It is another topic: since the page is poisoned, why not isolate it >> from the buddy allocator in soft_offline_page() rather than in >> check_new_page()? >> I find soft_offline_page() only migrates the page and marks it HWPoison; the >> poisoned page is still managed by the buddy allocator. > > Doing it in check_new_page is the only way if the page is currently > allocated by someone. Since that's not uncommon it's simplest to always > do it this way. > > -Andi > Hi Andi, The poisoned page is isolated in check_new_page(), but the whole buddy block will be dropped, which seems a waste of memory. Can we separate the poisoned page from the buddy block and *only* drop the poisoned page? Thanks Xishi Qiu
Re: TIP tree's master branch failed to boot up
On 12/11/2012 01:02 AM, H. Peter Anvin wrote: > On 12/09/2012 08:50 PM, Michael Wang wrote: >> Hi, Folks >> >> I'm testing with the latest tip tree's master branch 3.7.0-rc8 and >> failed to boot up my server; it hangs at the very beginning and I could not >> catch any useful log. Has anyone else hit this problem, or am I the only one? >> >> Regards, >> Michael Wang >> > > 32 or 64 bits? 64 bits. Regards, Michael Wang > > -hpa >
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
On Tue, 2012-12-11 at 03:03 +0100, Andi Kleen wrote: > > IIUC, soft offlining will isolate and migrate hwpoisoned page, and this > > page will not be accessed by memory management subsystem until unpoison, > > correct? > > No, soft offlining can still allow accesses for some time. It'll never kill > anything. Oh, it will be put back to the LRU list during migration. So does your "some time" mean before check_new_page() is called? -Simon > > Hard tries much harder and will kill. > > In some cases (unshrinkable kernel allocation) they end up doing the same > because there isn't any other alternative though. However these are > expected to only apply to a small percentage of pages in a typical > system. > > -Andi
Re: [PATCH V2] MCE: fix an error of mce_bad_pages statistics
> IIUC, soft offlining will isolate and migrate hwpoisoned page, and this > page will not be accessed by memory management subsystem until unpoison, > correct? No, soft offlining can still allow accesses for some time. It'll never kill anything. Hard offlining tries much harder and will kill. In some cases (an unshrinkable kernel allocation) they end up doing the same thing, because there isn't any other alternative. However these cases are expected to apply to only a small percentage of pages in a typical system. -Andi