date:20130405

Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Greg KH

On Sat, Apr 06, 2013 at 01:05:59PM +0800, Chen Gang wrote:
> On 2013-04-06 07:48, Arve Hjønnevåg wrote:
> > On Fri, Apr 5, 2013 at 3:01 PM, Greg KH  wrote:
> >> > On Fri, Apr 05, 2013 at 04:05:25PM +0800, Chen Gang wrote:
> >>> >>
> >>> >>   need break when 'target_thread' get value, firstly.
> >>> >>
> >>> >> 'tmp' is a stack (thread->transaction_stack),
> >>> >> if 'proc' was the same between child node and parent node,
> >>> >> the child would have higher priority than parent.
> >> >
> >> > Are you sure about this?
> >> >
> >> > have you tested it?
> >> >
> > Theoretically this should not change the behavior. The purpose of this
> > code is to make sure only one thread per process is part of a transaction
> > stack, so if it finds more than one transaction with a matching
> > process, they should all point to the same thread object. I think a
> > better change description is needed though.
> 
> 
>   ok, thanks.
>   I will send patch v2 (also mark you as Signed-off).

You can NEVER add someone else's "Signed-off-by", unless they actually
do it (hint, that did not happen here at all.)

Please go read Documentation/SubmittingPatches again to learn exactly
what Signed-off-by: really is (a legal agreement), it is not something
to throw around lightly like this.

> (if the patch v2 still need improvement, please reply in time).

In time for what?

Please test these patches before you resend them.

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v3 7/7] f2fs: add tracepoints to debug checkpoint request

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints to debug checkpoint request.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/checkpoint.c|1 +
 include/trace/events/f2fs.h |   18 ++
 2 files changed, 19 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index c0606b1..f1bcf35 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -606,6 +606,7 @@ static void do_checkpoint(struct f2fs_sb_info *sbi, bool is_umount)
void *kaddr;
int i;
 
+   trace_f2fs_do_checkpoint(sbi->sb);
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META))
sync_meta_pages(sbi, META, LONG_MAX);
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 858375b..8ec02ea 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -485,6 +485,24 @@ DEFINE_EVENT(f2fs_page_type_op, f2fs_write_page,
TP_ARGS(page, type)
 );
 
+TRACE_EVENT(f2fs_do_checkpoint,
+   TP_PROTO(struct super_block *sb),
+
+   TP_ARGS(sb),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= sb->s_dev;
+   ),
+
+   TP_printk("dev %d,%d ",
+ MAJOR(__entry->dev), MINOR(__entry->dev))
+
+);
+
 #endif /* _TRACE_F2FS_H */
 
  /* This part must be outside protection */
-- 
1.7.9.5


[PATCH v3 6/7] f2fs: add tracepoints for write page operations

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints to debug the various page write operations,
such as writes of data pages and meta pages.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/checkpoint.c|2 ++
 fs/f2fs/data.c  |2 ++
 include/trace/events/f2fs.h |   62 +++
 3 files changed, 66 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 93fd57d..c0606b1 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -20,6 +20,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
+#include <trace/events/f2fs.h>
 
 static struct kmem_cache *orphan_entry_slab;
 static struct kmem_cache *inode_entry_slab;
@@ -75,6 +76,7 @@ static int f2fs_write_meta_page(struct page *page,
struct inode *inode = page->mapping->host;
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 
+   trace_f2fs_write_page(page, META);
/* Should not write any meta pages, if any IO error was occurred */
if (wbc->for_reclaim ||
is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ERROR_FLAG)) {
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a517ec2..9ed6500 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -490,6 +490,7 @@ static int f2fs_write_data_page(struct page *page,
unsigned offset;
int err = 0;
 
+   trace_f2fs_write_page(page, DATA);
if (page->index < end_index)
goto out;
 
@@ -598,6 +599,7 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
struct dnode_of_data dn;
int err = 0;
 
+   trace_f2fs_write_begin(inode, pos, len, flags);
/* for nobh_write_end */
*fsdata = NULL;
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 5665619..858375b 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -423,6 +423,68 @@ TRACE_EVENT(f2fs_reserve_new_block,
   __entry->nid)
 );
 
+TRACE_EVENT(f2fs_write_begin,
+
+   TP_PROTO(struct inode *inode, loff_t pos, unsigned int len,
+unsigned int flags),
+
+   TP_ARGS(inode, pos, len, flags),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(loff_t, pos)
+   __field(unsigned int, len)
+   __field(unsigned int, flags)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->ino= inode->i_ino;
+   __entry->pos= pos;
+   __entry->len= len;
+   __entry->flags  = flags;
+   ),
+
+   TP_printk("dev %d,%d ino %lu pos %lld len %u flags %u",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->pos, __entry->len, __entry->flags)
+);
+
+DECLARE_EVENT_CLASS(f2fs_page_type_op,
+   TP_PROTO(struct page *page, int type),
+
+   TP_ARGS(page, type),
+
+   TP_STRUCT__entry(
+   __field(pgoff_t, index)
+   __field(int, type)
+   __field(ino_t,  ino)
+   __field(dev_t,  dev)
+
+   ),
+
+   TP_fast_assign(
+   __entry->index  = page->index;
+   __entry->type   = type;
+   __entry->ino= page->mapping->host->i_ino;
+   __entry->dev= page->mapping->host->i_sb->s_dev;
+   ),
+
+   TP_printk("dev %d,%d ino %lu page_index %lu type %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ (unsigned long) __entry->index, __entry->type)
+);
+
+DEFINE_EVENT(f2fs_page_type_op, f2fs_write_page,
+
+   TP_PROTO(struct page *page, int type),
+
+   TP_ARGS(page, type)
+);
+
 #endif /* _TRACE_F2FS_H */
 
  /* This part must be outside protection */
-- 
1.7.9.5


[PATCH v3 5/7] f2fs: add tracepoints to debug the block allocation & fallocate

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints to debug the block allocation & fallocate.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/data.c  |1 +
 fs/f2fs/file.c  |3 ++
 include/trace/events/f2fs.h |   76 +++
 3 files changed, 80 insertions(+)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d5d5a7c..a517ec2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -56,6 +56,7 @@ int reserve_new_block(struct dnode_of_data *dn)
if (!inc_valid_block_count(sbi, dn->inode, 1))
return -ENOSPC;
 
+   trace_f2fs_reserve_new_block(dn->inode, dn->nid);
__set_data_blkaddr(dn, NEW_ADDR);
dn->data_blkaddr = NEW_ADDR;
sync_inode_page(dn);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index f082a16..412fe77 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -528,6 +528,7 @@ static long f2fs_fallocate(struct file *file, int mode,
struct inode *inode = file_inode(file);
long ret;
 
+   trace_f2fs_fallocate_enter(inode, offset, len, mode);
if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
return -EOPNOTSUPP;
 
@@ -540,6 +541,8 @@ static long f2fs_fallocate(struct file *file, int mode,
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
mark_inode_dirty(inode);
}
+
+   trace_f2fs_fallocate_exit(inode, offset, len, ret);
return ret;
 }
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index f9efe99..5665619 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -347,6 +347,82 @@ TRACE_EVENT(f2fs_get_victim,
   __entry->type)
 );
 
+TRACE_EVENT(f2fs_fallocate_enter,
+   TP_PROTO(struct inode *inode, loff_t offset, loff_t len, int mode),
+
+   TP_ARGS(inode, offset, len, mode),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(loff_t, pos)
+   __field(loff_t, len)
+   __field(int,mode)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->ino= inode->i_ino;
+   __entry->pos= offset;
+   __entry->len= len;
+   __entry->mode   = mode;
+   ),
+
+   TP_printk("dev %d,%d ino %lu pos %lld len %lld mode %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino, __entry->pos,
+ __entry->len, __entry->mode)
+);
+
+TRACE_EVENT(f2fs_fallocate_exit,
+   TP_PROTO(struct inode *inode, loff_t offset,
+loff_t len, int ret),
+
+   TP_ARGS(inode, offset, len, ret),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(loff_t, pos)
+   __field(loff_t, len)
+   __field(int,ret)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->ino= inode->i_ino;
+   __entry->pos= offset;
+   __entry->len= len;
+   __entry->ret= ret;
+   ),
+
+   TP_printk("dev %d,%d ino %lu pos %lld len %lld ret %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ __entry->pos, __entry->len,
+ __entry->ret)
+);
+
+TRACE_EVENT(f2fs_reserve_new_block,
+   TP_PROTO(struct inode *inode, unsigned int nid),
+
+   TP_ARGS(inode, nid),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(unsigned int, nid)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->nid= nid;
+   ),
+
+   TP_printk("dev %d,%d: with Nid %u ",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+  __entry->nid)
+);
+
 #endif /* _TRACE_F2FS_H */
 
  /* This part must be outside protection */
-- 
1.7.9.5


[PATCH v3 4/7] f2fs: add tracepoints for GC threads

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints for tracing the garbage collector
threads in f2fs with status of collection & type.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/gc.c|2 ++
 include/trace/events/f2fs.h |   20 
 2 files changed, 22 insertions(+)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 54ac13d..93bb0f9 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -23,6 +23,7 @@
 #include "node.h"
 #include "segment.h"
 #include "gc.h"
+#include <trace/events/f2fs.h>
 
 static struct kmem_cache *winode_slab;
 
@@ -239,6 +240,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct victim_sel_policy p;
int nsearched = 0;
 
+   trace_f2fs_get_victim(sbi->sb, gc_type);
p.alloc_mode = alloc_mode;
select_policy(sbi, gc_type, type, &p);
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 3df0525..f9efe99 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -327,6 +327,26 @@ DEFINE_EVENT(f2fs_data_block, f2fs_get_data_block_exit,
TP_ARGS(inode, block, ret)
 );
 
+TRACE_EVENT(f2fs_get_victim,
+   TP_PROTO(struct super_block *sb, int gc_type),
+
+   TP_ARGS(sb, gc_type),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(int,type)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= sb->s_dev;
+   __entry->type   = gc_type;
+   ),
+
+   TP_printk("dev %d,%d  GC_type %d ",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+  __entry->type)
+);
+
 #endif /* _TRACE_F2FS_H */
 
  /* This part must be outside protection */
-- 
1.7.9.5


[PATCH v3 2/7] f2fs: add tracepoints for truncate operation

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints for tracing truncate operations such as
truncating node/data blocks, f2fs_truncate, etc.

Tracepoints are added at the entry and exit of each operation
to trace its success or failure.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/file.c  |6 ++-
 fs/f2fs/node.c  |7 +++
 include/trace/events/f2fs.h |  108 +++
 3 files changed, 120 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index c937d7b..f082a16 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -175,6 +175,8 @@ static int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
struct f2fs_node *raw_node;
__le32 *addr;
 
+   trace_f2fs_truncate_data_blocks_range_enter(dn->inode,
+dn->data_blkaddr, dn->nid);
raw_node = page_address(dn->node_page);
addr = blkaddr_in_node(raw_node) + ofs;
 
@@ -193,6 +195,8 @@ static int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
sync_inode_page(dn);
}
dn->ofs_in_node = ofs;
+
+   trace_f2fs_truncate_data_blocks_range_exit(dn->inode, nr_free);
return nr_free;
 }
 
@@ -271,7 +275,7 @@ void f2fs_truncate(struct inode *inode)
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode)))
return;
-
+   trace_f2fs_truncate(inode);
if (!truncate_blocks(inode, i_size_read(inode))) {
inode->i_mtime = inode->i_ctime = CURRENT_TIME;
mark_inode_dirty(inode);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 7555fb7..7db4813 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -19,6 +19,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
+#include <trace/events/f2fs.h>
 
 static struct kmem_cache *nat_entry_slab;
 static struct kmem_cache *free_nid_slab;
@@ -547,6 +548,7 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs,
int freed = 0;
int i, ret;
 
+   trace_f2fs_truncate_nodes_enter(dn->inode, dn->data_blkaddr, dn->nid);
if (dn->nid == 0)
return NIDS_PER_BLOCK + 1;
 
@@ -594,10 +596,12 @@ static int truncate_nodes(struct dnode_of_data *dn, unsigned int nofs,
} else {
f2fs_put_page(page, 1);
}
+   trace_f2fs_truncate_nodes_exit(dn->inode, freed);
return freed;
 
 out_err:
f2fs_put_page(page, 1);
+   trace_f2fs_truncate_nodes_exit(dn->inode, ret);
return ret;
 }
 
@@ -612,6 +616,8 @@ static int truncate_partial_nodes(struct dnode_of_data *dn,
int i;
int idx = depth - 2;
 
+   trace_f2fs_truncate_partial_nodes_enter(dn->inode,
+dn->data_blkaddr, dn->nid);
nid[0] = le32_to_cpu(ri->i_nid[offset[0] - NODE_DIR1_BLOCK]);
if (!nid[0])
return 0;
@@ -652,6 +658,7 @@ static int truncate_partial_nodes(struct dnode_of_data *dn,
 fail:
for (i = depth - 3; i >= 0; i--)
f2fs_put_page(pages[i], 1);
+   trace_f2fs_truncate_partial_nodes_exit(dn->inode, err);
return err;
 }
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index fd50db9..0d39f58 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -158,6 +158,114 @@ DEFINE_EVENT(f2fs_file_inode_ret, f2fs_unlink_exit,
TP_ARGS(inode, ret)
 );
 
+DECLARE_EVENT_CLASS(f2fs__truncate_op,
+   TP_PROTO(struct inode *inode, block_t blk_addr, unsigned int nid),
+
+   TP_ARGS(inode, blk_addr, nid),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(block_t, addr)
+   __field(unsigned int,   nid)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->ino= inode->i_ino;
+   __entry->addr   = blk_addr;
+   __entry->nid= nid;
+   ),
+
+   TP_printk("dev %d,%d ino %lu block_address %llu Nid %d ",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino, __entry->addr, __entry->nid)
+);
+
+DEFINE_EVENT(f2fs__truncate_op, f2fs_truncate_data_blocks_range_enter,
+
+   TP_PROTO(struct inode *inode, block_t blk_addr, unsigned int nid),
+
+   TP_ARGS(inode, blk_addr, nid)
+);
+
+DEFINE_EVENT(f2fs__truncate_op, f2fs_truncate_nodes_enter,
+
+   TP_PROTO(struct inode *inode, block_t blk_addr, unsigned int nid),
+
+   TP_ARGS(inode, blk_addr, nid)
+);
+
+DEFINE_EVENT(f2fs__truncate_op, f2fs_truncate_partial_nodes_enter,
+
+   TP_PROTO(struct inode *inode, block_t blk_addr, unsigned int nid),
+
+   TP_ARGS(inode, blk_addr, nid)
+);
+
+DECLARE_EVENT_CLASS(f2fs__truncate_op_exit,
+   TP_PROTO(struct inode *inode, int

[PATCH v3 3/7] f2fs: add tracepoint for tracing the page i/o operations

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints for page i/o operations and block allocation
tracing during page read operation.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/data.c  |9 ++-
 include/trace/events/f2fs.h |   61 +++
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 4f4da0d..d5d5a7c 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -22,6 +22,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
+#include <trace/events/f2fs.h>
 
 /*
  * Lock ordering for the change of data block address:
@@ -346,6 +347,8 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct page *page,
struct block_device *bdev = sbi->sb->s_bdev;
struct bio *bio;
 
+   trace_f2fs_readpage(page);
+
down_read(&sbi->bio_sem);
 
/* Allocate a new bio */
@@ -383,6 +386,7 @@ static int get_data_block_ro(struct inode *inode, sector_t iblock,
pgoff_t pgofs;
int err;
 
+   trace_f2fs_get_data_block_enter(inode, iblock, 0);
/* Get the page offset from the block offset(iblock) */
pgofs = (pgoff_t)(iblock >> (PAGE_CACHE_SHIFT - blkbits));
 
@@ -392,8 +396,10 @@ static int get_data_block_ro(struct inode *inode, sector_t iblock,
/* When reading holes, we need its node page */
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, pgofs, LOOKUP_NODE_RA);
-   if (err)
+   if (err) {
+   trace_f2fs_get_data_block_exit(inode, iblock, err);
return (err == -ENOENT) ? 0 : err;
+   }
 
/* It does not support data allocation */
BUG_ON(create);
@@ -418,6 +424,7 @@ static int get_data_block_ro(struct inode *inode, sector_t iblock,
bh_result->b_size = (i << blkbits);
}
f2fs_put_dnode(&dn);
+   trace_f2fs_get_data_block_exit(inode, iblock, 0);
return 0;
 }
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 0d39f58..3df0525 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -266,6 +266,67 @@ TRACE_EVENT(f2fs_truncate,
  (unsigned long) __entry->ino)
 );
 
+TRACE_EVENT_CONDITION(f2fs_readpage,
+   TP_PROTO(struct page *page),
+
+   TP_ARGS(page),
+
+   TP_CONDITION(page->mapping),
+
+   TP_STRUCT__entry(
+   __field(pgoff_t, index)
+   __field(ino_t,  ino)
+   __field(dev_t,  dev)
+
+   ),
+
+   TP_fast_assign(
+   __entry->index  = page->index;
+   __entry->ino= page->mapping->host->i_ino;
+   __entry->dev= page->mapping->host->i_sb->s_dev;
+   ),
+
+   TP_printk("dev %d,%d ino %lu page_index %lu",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ (unsigned long) __entry->index)
+);
+
+DECLARE_EVENT_CLASS(f2fs_data_block,
+   TP_PROTO(struct inode *inode, sector_t block, int ret),
+
+   TP_ARGS(inode, block, ret),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(sector_t,   block)
+   __field(int,ret)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= inode->i_sb->s_dev;
+   __entry->ino= inode->i_ino;
+   __entry->block  = block;
+   __entry->ret= ret;
+   ),
+
+   TP_printk("dev %d,%d ino %lu block number %llu error %d",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+   (unsigned long long) __entry->block, __entry->ret)
+);
+
+DEFINE_EVENT(f2fs_data_block, f2fs_get_data_block_enter,
+   TP_PROTO(struct inode *inode, sector_t block, int ret),
+   TP_ARGS(inode, block, ret)
+);
+
+DEFINE_EVENT(f2fs_data_block, f2fs_get_data_block_exit,
+   TP_PROTO(struct inode *inode, sector_t block, int ret),
+   TP_ARGS(inode, block, ret)
+);
+
 #endif /* _TRACE_F2FS_H */
 
  /* This part must be outside protection */
-- 
1.7.9.5


[PATCH v3 1/7] f2fs: add tracepoints for sync & Inode operations

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints in f2fs for tracing the sync
operations, such as filesystem sync and file sync enter/exit.
This will help to trace the code in debugging scenarios.

Also add tracepoints for tracing the various inode operations
like building inode, eviction of inode, link/unlink of
inodes.

Signed-off-by: Namjae Jeon 
Signed-off-by: Pankaj Kumar 
---
 fs/f2fs/file.c  |3 +
 fs/f2fs/inode.c |3 +
 fs/f2fs/namei.c |3 +
 fs/f2fs/super.c |4 ++
 include/trace/events/f2fs.h |  164 +++
 5 files changed, 177 insertions(+)
 create mode 100644 include/trace/events/f2fs.h

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ac8cbb2..c937d7b 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -25,6 +25,7 @@
 #include "segment.h"
 #include "xattr.h"
 #include "acl.h"
+#include <trace/events/f2fs.h>
 
 static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
struct vm_fault *vmf)
@@ -118,6 +119,7 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
if (inode->i_sb->s_flags & MS_RDONLY)
return 0;
 
+   trace_f2fs_sync_file_enter(file, datasync);
ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
if (ret)
return ret;
@@ -155,6 +157,7 @@ int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
}
 out:
mutex_unlock(&inode->i_mutex);
+   trace_f2fs_sync_file_exit(inode, ret);
return ret;
 }
 
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index f798ddf..41ea158 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -15,6 +15,7 @@
 
 #include "f2fs.h"
 #include "node.h"
+#include <trace/events/f2fs.h>
 
 void f2fs_set_inode_flags(struct inode *inode)
 {
@@ -93,6 +94,7 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino)
struct inode *inode;
int ret;
 
+   trace_f2fs_iget(sb, ino);
inode = iget_locked(sb, ino);
if (!inode)
return ERR_PTR(-ENOMEM);
@@ -236,6 +238,7 @@ void f2fs_evict_inode(struct inode *inode)
 {
struct f2fs_sb_info *sbi = F2FS_SB(inode->i_sb);
 
+   trace_f2fs_evict_inode(inode);
truncate_inode_pages(&inode->i_data, 0);
 
if (inode->i_ino == F2FS_NODE_INO(sbi) ||
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index 7c6e219..a16036a 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -18,6 +18,7 @@
 #include "node.h"
 #include "xattr.h"
 #include "acl.h"
+#include <trace/events/f2fs.h>
 
 static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
 {
@@ -230,6 +231,7 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
struct page *page;
int err = -ENOENT;
 
+   trace_f2fs_unlink_enter(dir, dentry);
f2fs_balance_fs(sbi);
 
de = f2fs_find_entry(dir, &dentry->d_name, &page);
@@ -248,6 +250,7 @@ static int f2fs_unlink(struct inode *dir, struct dentry *dentry)
/* In order to evict this inode,  we set it dirty */
mark_inode_dirty(inode);
 fail:
+   trace_f2fs_unlink_exit(inode, err);
return err;
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a756204..0d5300b 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -29,6 +29,9 @@
 #include "segment.h"
 #include "xattr.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/f2fs.h>
+
 static struct kmem_cache *f2fs_inode_cachep;
 
 enum {
@@ -134,6 +137,7 @@ int f2fs_sync_fs(struct super_block *sb, int sync)
 {
struct f2fs_sb_info *sbi = F2FS_SB(sb);
 
+   trace_f2fs_sync_fs(sb, sync);
if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
return 0;
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
new file mode 100644
index 000..fd50db9
--- /dev/null
+++ b/include/trace/events/f2fs.h
@@ -0,0 +1,164 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM f2fs
+
+#if !defined(_TRACE_F2FS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_F2FS_H
+
+#include <linux/tracepoint.h>
+
+
+TRACE_EVENT(f2fs_sync_file_enter,
+   TP_PROTO(struct file *file, int datasync),
+
+   TP_ARGS(file, datasync),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(ino_t,  parent)
+   __field(int,datasync)
+   ),
+
+   TP_fast_assign(
+   struct dentry *dentry = file->f_path.dentry;
+
+   __entry->dev= dentry->d_inode->i_sb->s_dev;
+   __entry->ino= dentry->d_inode->i_ino;
+   __entry->datasync   = datasync;
+   __entry->parent = dentry->d_parent->d_inode->i_ino;
+   ),
+
+   TP_printk("dev %d,%d ino %lu parent %lu datasync %d ",
+ MAJOR(__entry->dev), MINOR(__entry->dev),
+ (unsigned long) __entry->ino,
+ (unsigned long) __entry->parent, __entry->datasync)
+);
+

[PATCH v3 0/7] f2fs: add tracepoints support in f2fs filesystem

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Add tracepoints in the f2fs filesystem for tracing filesystem
operations for information and debugging purposes when needed. The
tracepoints are grouped by functionality.

Change Log:
v3: Introduced the TRACE_EVENT_CONDITION() macro for checking the
condition page->mapping inside the tracepoint call, as
per Steve's review comment on the patch.

v2: Added the DECLARE_EVENT_CLASS() macro for combining similar
trace calls that take the same type of arguments.

v1: Introduced the tracepoint functions in f2fs filesystem.
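
To illustrate the two macros named in the change log, here is a
user-space sketch in plain C. It is only an analogy and does not use the
kernel tracing API: page_type_format, DEFINE_PAGE_TYPE_EVENT and
trace_readpage_if_mapped are hypothetical names made up for this
example. The idea it tries to show is that an event class shares one
record layout and one formatter among several events, and that a
conditional event emits nothing when its condition is false.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One record layout shared by every event in the (hypothetical) class. */
struct page_type_entry {
	unsigned long ino;
	unsigned long index;
	int type;
};

/* One formatter shared by every event in the class. */
static int page_type_format(char *buf, size_t len, const char *name,
			    const struct page_type_entry *e)
{
	return snprintf(buf, len, "%s: ino %lu page_index %lu type %d",
			name, e->ino, e->index, e->type);
}

/*
 * Hypothetical analogue of DEFINE_EVENT(): each event is a thin wrapper
 * that fills the shared record and reuses the shared formatter.
 */
#define DEFINE_PAGE_TYPE_EVENT(name)					\
	int trace_##name(char *buf, size_t len, unsigned long ino,	\
			 unsigned long index, int type)			\
	{								\
		struct page_type_entry e = { ino, index, type };	\
		return page_type_format(buf, len, #name, &e);		\
	}

DEFINE_PAGE_TYPE_EVENT(write_page)
DEFINE_PAGE_TYPE_EVENT(read_page)

/*
 * Hypothetical analogue of TRACE_EVENT_CONDITION(): the record is only
 * produced when the condition (a non-NULL mapping) holds, so the body
 * never dereferences a missing mapping.
 */
int trace_readpage_if_mapped(char *buf, size_t len, const void *mapping,
			     unsigned long index)
{
	if (!mapping)
		return 0;	/* condition false: nothing is emitted */
	return snprintf(buf, len, "readpage: page_index %lu", index);
}
```

Both generated functions write into a caller-supplied buffer, so adding
a third event of the same shape costs one more DEFINE_PAGE_TYPE_EVENT()
line rather than a new record layout and formatter.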

Namjae Jeon (7):
  f2fs: add tracepoints for sync & Inode operations
  f2fs: add tracepoints for truncate operation
  f2fs: add tracepoint for tracing the page i/o operations
  f2fs: add tracepoints for GC threads
  f2fs: add tracepoints to debug the block allocation & fallocate
  f2fs: add tracepoints for write page operations
  f2fs: add tracepoints to debug checkpoint request

-- 
1.7.9.5


[PATCH] f2fs: fix typo mistakes

2013-04-05 Thread Namjae Jeon

From: Namjae Jeon 

Fix typos.
1. I think that it should be 'L' instead of 'V'.
2. Change 'Frone' to 'Front'.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/data.c|2 +-
 fs/f2fs/segment.h |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9ed6500..76ff48b 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -136,7 +136,7 @@ void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
goto end_update;
}
 
-   /* Frone merge */
+   /* Front merge */
if (fofs == start_fofs - 1 && blk_addr == start_blkaddr - 1) {
fi->ext.fofs--;
fi->ext.blk_addr--;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 4c2cd9e..aac74cd 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -11,7 +11,7 @@
 /* constant macro */
 #define NULL_SEGNO ((unsigned int)(~0))
 
-/* V: Logical segment # in volume, R: Relative segment # in main area */
+/* L: Logical segment # in volume, R: Relative segment # in main area */
 #define GET_L2R_SEGNO(free_i, segno)   (segno - free_i->start_segno)
 #define GET_R2L_SEGNO(free_i, segno)   (segno + free_i->start_segno)
 
-- 
1.7.9.5
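
As a small illustration of the comment corrected above, the two macros
convert between a logical segment number (counted from the start of the
volume) and a relative one (counted from the start of the main area).
The sketch below is a stand-alone approximation: the struct is a
simplified, hypothetical stand-in that carries only start_segno, and the
macro bodies have extra parentheses that the originals do not have.

```c
#include <assert.h>

/* Simplified stand-in: only the field the macros actually read. */
struct free_segmap_info {
	unsigned int start_segno;	/* first segment of the main area */
};

/* L: Logical segment # in volume, R: Relative segment # in main area */
#define GET_L2R_SEGNO(free_i, segno)	((segno) - (free_i)->start_segno)
#define GET_R2L_SEGNO(free_i, segno)	((segno) + (free_i)->start_segno)
```

With start_segno set to 100, logical segment 105 maps to relative
segment 5, and GET_R2L_SEGNO() maps it back to 105.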


Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Chen Gang

On 2013-04-06 13:05, Chen Gang wrote:
> On 2013-04-06 07:48, Arve Hjønnevåg wrote:
>> Theoretically this should not change the behavior. The purpose of this
>> code is to make sure only one thread per process is part of a transaction
>> stack, so if it finds more than one transaction with a matching
>> process, they should all point to the same thread object. I think a
>> better change description is needed though.
> 

  Oh, sorry, I forgot to confirm one thing before sending patch v2.
(The reason may be that my English is not very good.)

  I guess what you mean is:
in this condition:
  one thread is related to one process.
  also, one process is related to one thread.

  Is that correct?

  thanks.

> 
>   ok, thanks.
>   I will send patch v2 (also mark you as Signed-off).
> (if the patch v2 still need improvement, please reply in time).
> 
> 
>   thanks.
> 
>   :-)
> 


-- 
Chen Gang

Asianux Corporation
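
The point under discussion can be sketched in stand-alone C. This is a
simplified model, not the binder driver code: the struct layouts and
the names find_target_first and find_target_last are hypothetical. It
walks a transaction stack from the top and compares "stop at the first
match" (with break) against "keep the last match" (without break). If
every entry for a given process points to the same thread, as described
above, the two give the same answer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct proc { int pid; };
struct thread { struct proc *proc; int id; };
struct transaction {
	struct thread *from;			/* thread that started it */
	struct transaction *from_parent;	/* next entry down the stack */
};

/* No break: walk the whole stack, the match nearest the bottom wins. */
struct thread *find_target_last(struct transaction *top, struct proc *target)
{
	struct thread *found = NULL;

	for (struct transaction *tmp = top; tmp; tmp = tmp->from_parent)
		if (tmp->from && tmp->from->proc == target)
			found = tmp->from;
	return found;
}

/* With break: stop at the first match, nearest the top of the stack. */
struct thread *find_target_first(struct transaction *top, struct proc *target)
{
	struct thread *found = NULL;

	for (struct transaction *tmp = top; tmp; tmp = tmp->from_parent) {
		if (tmp->from && tmp->from->proc == target) {
			found = tmp->from;
			break;
		}
	}
	return found;
}
```

The two functions can only disagree when the stack holds two different
threads of the same process, which is the situation the existing code
is meant to rule out.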

Re: [PATCH] Add non-zero module sections to sysfs

2013-04-05 Thread Rusty Russell

Sebastian Wankerl  writes:
> On 04/05/13 06:00, Rusty Russell wrote:
>> Sebastian Wankerl  writes:
>>> On 04/04/13 03:00, Rusty Russell wrote:
>>>> Sebastian Wankerl  writes:
>>>>> Add non-zero module sections to sysfs on architectures unequal to PARISC.
>>>>> KGDB needs all module sections for proper module debugging. Therefore,
>>>>> commit
>>>>> 35dead4235e2b67da7275b4122fed37099c2f462 is revoked except for PARISC
>>>>> architecture.
>>>> #ifdef CONFIG_PARISC in the middle of kernel/module.c is super-ugly, and
>>>> wrong.
>>> I don't see why this is wrong. It used to load all sections to sysfs
>>> until the patch mentioned. Actually, it is the PARISC build chain which
>>> is broken.
>
> We worked on that topic further. Now we have another suggestion: would
> it be okay to add a field to struct module for use by kgdb where we save
> the section names for our use. This seems to be the most valuable
> solution as solving the sysfs stuff is rather hard.

It is hard.  But being a kernel hacker isn't just about making newbies
eat flaming death; sometimes we need to solve problems.

We'll see what we can do; but we'll continue this in the branch of the
thread that cc's linux-parisc...

Thanks,
Rusty.

Re: [PATCH] Add non-zero module sections to sysfs

2013-04-05 Thread Rusty Russell

James Bottomley  writes:
> On Fri, 2013-04-05 at 14:30 +1030, Rusty Russell wrote:
>> Sebastian Wankerl  writes:
>> > On 04/04/13 03:00, Rusty Russell wrote:
>> >> Sebastian Wankerl  writes:
>> >>> Add non-zero module sections to sysfs on architectures unequal to PARISC.
>> >>> KGDB needs all module sections for proper module debugging. Therefore, 
>> >>> commit 
>> >>> 35dead4235e2b67da7275b4122fed37099c2f462 is revoked except for PARISC
>> >>> architecture.
>
> Thanks for actually cc'ing us.
>
>> >> #ifdef CONFIG_PARISC in the middle of kernel/module.c is super-ugly, and
>> >> wrong.
>> >
>> > I don't see why this is wrong. It used to load all sections to sysfs
>> > until the patch mentioned. Actually, it is the PARISC build chain which
>> > is broken.
>> 
>> Exactly.  Don't workaround it here, revert it and put the
>> duplicate-section-name fixup in parisc where it belongs.
>> 
>> Assuming parisc still produces these dup sections: that patch is 4 years
>> old now.
>
> Just so you know: this isn't a parisc specific problem.  Gcc produces
> duplicate section names under various circumstances, but the one that
> bites us is -ffunction-sections.

*This* is a PA-RISC specific issue.  -ffunction-sections is a different
problem, which this hack wouldn't help.

> Note that there are proposals to use
> -ffunction-sections on all architectures (so we can garbage collect
> unused functions) in which case you'll induce the bug identified in
> 35dead4235e2b67da7275b4122fed37099c2f462 on every architecture

Good point, though I note that we seem to have stalled on
-ffunction-sections.  (And I vaguely recall an issue with
-ffunction-sections and using ld -o which would fold duplicate named
sections back together reducing elimination opportunities).

> The problem is our assumption that section names be unique.  This
> assumption is wrong.  The ELF spec says (version 1.1 page 1-15): "An
> object file may have more than one section with the same name."  We need
> to fix the kernel not to rely on a bogus assumption ... but we had no
> idea how to do that in a way that preserved the backwards compatibility
> of sections subdirectory.
>
> I admit that 35dead4235e2b67da7275b4122fed37099c2f462 is a hack, but now
> the problem has got attention, can we fix it properly?

Yep.  The original patch didn't go through me, or we would have had this
discussion back then...

The use of section names in sysfs goes back to one Mr. Corbet.  Why did
he do it that way?  Because gdb's add-symbol-file makes the same
assumption.  So if we fixed the sysfs somehow, it still wouldn't be
useful, since there's no way to tell gdb :(
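
To make the uniqueness assumption concrete, here is a minimal userspace
sketch (not kernel code). It assumes the section names have already been
collected into an array; the helper name first_duplicate_section() is
made up for illustration. Any table for which it returns something other
than -1 is one that a name-keyed interface cannot represent faithfully.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a kernel API: return the index of the first
 * section whose name already appeared earlier in the table, or -1 if
 * every name is unique.
 */
static int first_duplicate_section(const char *const names[], size_t n)
{
	size_t i, j;

	assert(names != NULL || n == 0);

	for (i = 1; i < n; i++)
		for (j = 0; j < i; j++)
			if (strcmp(names[i], names[j]) == 0)
				return (int)i;
	return -1;
}
```

The quadratic scan is deliberate: a module has few sections, so
simplicity wins over a hash table here.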

The real answer: don't use -ffunction-sections on modules; it's probably
not as important there as for the rest of the kernel.  And the new shiny
is -flto anyway.

And that leaves us with a PA-RISC specific issue, for which we should
move the fix to PA-RISC.

Thoughts?
Rusty.

Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Chen Gang

On 2013-04-06 06:01, Greg KH wrote:
> On Fri, Apr 05, 2013 at 04:05:25PM +0800, Chen Gang wrote:
>>
>>   need break when 'target_thread' get value, firstly.
>>
>> 'tmp' is a stack (thread->transaction_stack),
>> if 'proc' was the same between child node and parent node,
>> the child would have higher priority than parent.
> 
> Are you sure about this?
> 
> have you tested it?
> 
> greg k-h
> 
> 

  Sorry, I have not tested it.

  I found it only by reading the code, and I wanted to discuss it with
  the relevant maintainers (although I used a [PATCH] title instead of
  a [Suggestion] title).

  If what I have done is not suitable, please provide suggestions.

  Thanks.

-- 
Chen Gang

Asianux Corporation

Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Chen Gang

On 2013-04-06 07:48, Arve Hjønnevåg wrote:
> On Fri, Apr 5, 2013 at 3:01 PM, Greg KH  wrote:
>> > On Fri, Apr 05, 2013 at 04:05:25PM +0800, Chen Gang wrote:
>>> >>
>>> >>   need break when 'target_thread' get value, firstly.
>>> >>
>>> >> 'tmp' is a stack (thread->transaction_stack),
>>> >> if 'proc' was the same between child node and parent node,
>>> >> the child would have higher priority than parent.
>> >
>> > Are you sure about this?
>> >
>> > have you tested it?
>> >
> Theoretically this should not change the behavior. The purpose of this
> code is to make sure only one thread per process is part of a transaction
> stack, so if it finds more than one transaction with a matching
> process, they should all point to the same thread object. I think a
> better change description is needed though.
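
The point above can be illustrated with a small userspace model. This is
a sketch under stated assumptions, not the binder code: the struct
layouts are reduced to the fields mentioned in the thread, and
find_target_thread() with its stop_at_first switch is a made-up helper
that walks the stack either with or without the proposed break.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; only the fields discussed in the thread exist. */
struct proc { int pid; };
struct thread { struct proc *proc; };
struct transaction {
	struct thread *from;
	struct transaction *from_parent;
};

/*
 * Walk the transaction stack from child to parent looking for a thread
 * that belongs to target_proc.  With stop_at_first set, return the first
 * match (the patched behaviour); otherwise keep walking so the last
 * match wins (the existing behaviour).
 */
static struct thread *find_target_thread(struct transaction *stack,
					 const struct proc *target_proc,
					 int stop_at_first)
{
	struct thread *target_thread = NULL;
	struct transaction *tmp;

	assert(target_proc != NULL);

	for (tmp = stack; tmp != NULL; tmp = tmp->from_parent) {
		if (tmp->from && tmp->from->proc == target_proc) {
			target_thread = tmp->from;
			if (stop_at_first)
				break;
		}
	}
	return target_thread;
}
```

As long as every matching transaction points at the same thread object,
both variants return the same pointer; they only diverge if that
invariant is broken, which is the case the change description would
need to argue about.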


  OK, thanks.
  I will send patch v2 (and also mark you as Signed-off).
(If patch v2 still needs improvement, please reply in time.)


  Thanks.

  :-)

-- 
Chen Gang

Asianux Corporation

Re: [RFC] revoke(2) and generic handling of things like remove_proc_entry()

2013-04-05 Thread Al Viro

On Fri, Apr 05, 2013 at 05:29:32AM +0100, Al Viro wrote:
> 4) nasty semantics issue - mmap() vs. revoke (of any sort, including
> remove_proc_entry(), etc.).  Suppose a revokable file had been mmapped;
> now it's going away.  What should we do to its VMAs?  Right now sysfs
> and procfs get away with that, but only because there's only one thing
> that has ->mmap() there - /proc/bus/pci and sysfs equivalents.  I've
> no idea how does pci_mmap_page_range() interact with PCI hotplug (and
> I'm not at all sure that whatever it does isn't racy wrt device removal),
> but I suspect that it strongly depends on lack of ->fault() for those
> VMAs, which makes killing all PTEs pointing to pages in question enough.
> How generic do we want to make it?  Anybody wanting to add more files
> that could be mmapped in procfs/sysfs/debugfs deserves to be hurt, but
> if we start playing with revoke(2), restriction might become inconvenient.
> I'm not sure what kind of behaviour do we want there - *BSD at least
> used to have revoke(2) only for character devices that had no mmap()...

Actually, after looking at what sysfs does...   We might get away with
the following
* new vma flag - VM_REVOKABLE; set by mmap() if ->f_revoke is
non-NULL.  We are short on spare bits there, but there still are some...
* start_using_vma(vma) that checks the presence of that flag,
returns true if it's absent and __start_using(vma->vm_file->f_revoke)
otherwise; a matching stop_using_vma(vma) as well.
* surround vma method calls with start_using_vma/stop_using_vma,
similar to file ones.  Do what fs/sysfs/bin.c wrappers do for revoked
ones - VM_FAULT_SIGBUS for ->fault() and ->page_mkwrite(), -EINVAL for
->access() and ->set_policy(), vma->vm_policy for ->get_policy(),
0 for ->migrate(), "do nothing" for ->open() (and I'm not at all sure that
this one is correct), hell knows what for ->close().  Note that the *only*
instance with ->open and without ->close is sysfs pile of wrappers itself...
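
A rough single-threaded model of the wrappers listed above, for
illustration only. Assumptions: the flag and fault-code values are
invented, the *_model structs are stand-ins for the kernel ones,
do_start_using()/do_stop_using() stand in for the __start_using() side
of the mockup, and real code would need atomics and a way to wait for
users to drain on revoke.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VM_REVOKABLE	0x1	/* value invented for the sketch */
#define VM_FAULT_SIGBUS	0x2	/* likewise; only the name is borrowed */

struct revoke_state {
	int users;	/* callers currently inside a method */
	bool revoked;	/* set once the object has gone away */
};

struct file_model { struct revoke_state *f_revoke; };

struct vma_model {
	unsigned long vm_flags;
	struct file_model *vm_file;
	int (*fault)(struct vma_model *vma);
};

static bool do_start_using(struct revoke_state *r)
{
	if (r->revoked)
		return false;
	r->users++;
	return true;
}

static void do_stop_using(struct revoke_state *r)
{
	assert(r->users > 0);
	r->users--;
}

/* True if the flag is absent, otherwise defer to the revoke state. */
static bool start_using_vma(struct vma_model *vma)
{
	if (!(vma->vm_flags & VM_REVOKABLE))
		return true;
	return do_start_using(vma->vm_file->f_revoke);
}

static void stop_using_vma(struct vma_model *vma)
{
	if (vma->vm_flags & VM_REVOKABLE)
		do_stop_using(vma->vm_file->f_revoke);
}

/* Wrapped ->fault(): SIGBUS for revoked mappings, passthrough otherwise. */
static int call_fault(struct vma_model *vma)
{
	int ret;

	if (!start_using_vma(vma))
		return VM_FAULT_SIGBUS;
	ret = vma->fault(vma);
	stop_using_vma(vma);
	return ret;
}

/* Trivial fault handler used to exercise the wrapper. */
static int ok_fault(struct vma_model *vma)
{
	(void)vma;
	return 0;
}
```

The other methods would follow the same shape, differing only in the
value returned for a revoked vma.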

Hell knows...  We have few enough call sites for ->vm_op->foo() to make
it feasible and overhead would be trivial.  OTOH, I'm not sure what's the
right behaviour for mmap of something like drm after revoke(2) - leaving
writable pages there looks wrong...

BTW, snd_card_disconnect() doesn't do anything to existing mappings; smells
like a bug, and there we do have ones with non-trivial ->mmap().  Could
ALSA folks comment?

One note about the mockup implementation upthread - __release_revoke() should
suck in a bit more than just ->release() - turning fasync off should also go
there.

[PATCH -next v2] drbd: fix error return code in drbd_init()

2013-04-05 Thread Wei Yongjun

From: Wei Yongjun 

Fix to return a negative error code from the error handling
case instead of 0, as returned elsewhere in this function.

Signed-off-by: Wei Yongjun 
---
 drivers/block/drbd/drbd_main.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 1b93a726..93b3505 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2763,8 +2763,6 @@ int __init drbd_init(void)
 	/*
 	 * allocate all necessary structs
 	 */
-	err = -ENOMEM;
-
 	init_waitqueue_head(&drbd_pp_wait);
 
 	drbd_proc = NULL; /* play safe for drbd_cleanup */
@@ -2774,6 +2772,7 @@ int __init drbd_init(void)
 	if (err)
 		goto fail;
 
+	err = -ENOMEM;
 	drbd_proc = proc_create_data("drbd", S_IFREG | S_IRUGO , NULL, &drbd_proc_fops, NULL);
 	if (!drbd_proc) {
 		printk(KERN_ERR "drbd: unable to register proc file\n");
@@ -2804,7 +2803,6 @@ int __init drbd_init(void)
 fail:
 	drbd_cleanup();
 	if (err == -ENOMEM)
-		/* currently always the case */
 		printk(KERN_ERR "drbd: ran out of memory\n");
 	else
 		printk(KERN_ERR "drbd: initialization failure\n");
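
For readers skimming the diff, the pattern being fixed can be modelled in
userspace as below. This is only a sketch: register_step(), alloc_step()
and demo_init() are hypothetical stand-ins, not drbd functions. The point
is that a step which reports failure only as NULL needs err set
immediately before it, because an earlier successful call has already
overwritten err with 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical step that returns its own negative errno on failure. */
static int register_step(int fail)
{
	return fail ? -EBUSY : 0;
}

/* Hypothetical step that reports failure only by returning NULL. */
static void *alloc_step(int fail)
{
	static int dummy;

	return fail ? NULL : &dummy;
}

static int demo_init(int fail_register, int fail_alloc)
{
	int err;

	err = register_step(fail_register);
	if (err)
		goto fail;

	err = -ENOMEM;	/* set right before the NULL-returning step */
	if (!alloc_step(fail_alloc))
		goto fail;

	return 0;
fail:
	assert(err != 0);	/* a failure path must never report success */
	return err;
}
```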


[PATCH] watchdog: Fix race condition in registration code

2013-04-05 Thread Guenter Roeck

A race condition exists when registering the first watchdog device.
Sequence of events:

- watchdog_register_device calls watchdog_dev_register
- watchdog_dev_register creates the watchdog misc device by calling
  misc_register.
  At that time, the matching character device (/dev/watchdog0) does not yet
  exist, and old_wdd is not set either.
- Userspace gets an event and opens /dev/watchdog
- watchdog_open is called and sets wdd = old_wdd, which is still NULL,
  and tries to dereference it. This causes the kernel to panic.

Seen with systemd trying to open /dev/watchdog immediately after
it was created.

Reported-by: Arkadiusz Miskiewicz 
Signed-off-by: Guenter Roeck 
---
Arkadiusz,

would be great if you can test this in your system.

 drivers/watchdog/watchdog_dev.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
index 08b48bb..faf4e18 100644
--- a/drivers/watchdog/watchdog_dev.c
+++ b/drivers/watchdog/watchdog_dev.c
@@ -523,6 +523,7 @@ int watchdog_dev_register(struct watchdog_device *watchdog)
 	int err, devno;
 
 	if (watchdog->id == 0) {
+		old_wdd = watchdog;
 		watchdog_miscdev.parent = watchdog->parent;
 		err = misc_register(&watchdog_miscdev);
 		if (err != 0) {
@@ -531,9 +532,9 @@ int watchdog_dev_register(struct watchdog_device *watchdog)
 			if (err == -EBUSY)
 				pr_err("%s: a legacy watchdog module is probably present.\n",
 					watchdog->info->identity);
+			old_wdd = NULL;
 			return err;
 		}
-		old_wdd = watchdog;
 	}
 
 	/* Fill in the data structures */
-- 
1.7.9.7


[PATCH -next] virtio_console: make local symbols static

2013-04-05 Thread Wei Yongjun

From: Wei Yongjun 

Those symbols are only used within this file, and should be static.

Signed-off-by: Wei Yongjun 
---
 drivers/char/virtio_console.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index 13ad9b1..f73ad64 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -78,8 +78,8 @@ struct ports_driver_data {
 };
 static struct ports_driver_data pdrvdata;
 
-DEFINE_SPINLOCK(pdrvdata_lock);
-DECLARE_COMPLETION(early_console_added);
+static DEFINE_SPINLOCK(pdrvdata_lock);
+static DECLARE_COMPLETION(early_console_added);
 
 /* This struct holds information that's relevant only for console ports */
 struct console {
@@ -1202,7 +1202,7 @@ int __init virtio_cons_early_init(int (*put_chars)(u32, const char *, int))
 	return hvc_instantiate(0, 0, &hv_ops);
 }
 
-int init_port_console(struct port *port)
+static int init_port_console(struct port *port)
 {
 	int ret;
 


Re: 3.8.3 and 3.9git occasional watchdog oops

2013-04-05 Thread Guenter Roeck

On Thu, Apr 04, 2013 at 06:59:59PM -0700, Guenter Roeck wrote:
> On Fri, Apr 05, 2013 at 12:23:30AM +0200, Arkadiusz Miskiewicz wrote:
> > On Thursday 14 of March 2013, Arkadiusz Miśkiewicz wrote:
> > > Hi.
> > > 
> > > Just hit watchdog related oops in 3.8.3 kernel. Unfortunately photos only.
> > > 
> > > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8942.JPG
> > > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.8.3/IMG_8941.JPG
> > 
> > 3.9git from today isn't any better unfortunately:
> > 
> > http://ixion.pld-linux.org/~arekm/watchdog-oops-3.9git.jpg
> > 
> > > 
> > > oops started after I enabled systemd watchdog functionality. Cannot
> > > reproduce easily.
> > > 
> > > watchdog here (thinkpad t400) is:
> > >  iTCO_wdt: Found a ICH9M-E TCO device (Version=2, TCOBASE=0x1060)
> > 
> > 
> Wonder if there is a race condition in the watchdog driver: The watchdog device
> is opened before watchdog_register_device returns. I suspect systemd waits for
> a udev event, or by some other means detects that /dev/watchdog was created,
> and opens it immediately.
> 
> I just have no idea where exactly the race condition, if there is one, is
> hiding. Or maybe I am completely off track.
> 
I _think_ I understand the sequence of events.

- The driver is the first watchdog driver to register.
- watchdog_dev_register() gets called and creates the watchdog misc device
  by calling misc_register().
  At that time, the matching character device (/dev/watchdog0) does not yet
  exist, and old_wdd is not set either.
- Userspace gets an event and opens /dev/watchdog
- watchdog_open() is called and sets wdd = old_wdd, which is still NULL,
  and tries to dereference it. Bang.

If this is the problem, a simple fix would be to set old_wdd before calling
misc_register().
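
The ordering problem can be shown with a single-threaded userspace model.
This is a sketch, not the driver: model_misc_register() and model_open()
are invented stand-ins, and the "race" is simulated by having the open
happen inside the register call, i.e. at the earliest moment userspace
could see the device.

```c
#include <assert.h>
#include <stddef.h>

struct wdd { int id; };

static struct wdd *old_wdd;	/* what open() will dereference */
static int open_saw_null;	/* set if open() ran before old_wdd was valid */

/* Stand-in for watchdog_open(): runs the moment the device is visible. */
static void model_open(void)
{
	if (old_wdd == NULL)
		open_saw_null = 1;	/* the real code would oops here */
}

/* Stand-in for misc_register(): the device becomes visible in this call. */
static int model_misc_register(int fail)
{
	if (fail)
		return -1;
	model_open();
	return 0;
}

/* Current ordering: publish the pointer only after registration. */
static int register_racy(struct wdd *w, int fail)
{
	int err = model_misc_register(fail);

	if (err != 0)
		return err;
	old_wdd = w;		/* too late: open() may already have run */
	return 0;
}

/* Proposed ordering: publish first, undo if registration fails. */
static int register_fixed(struct wdd *w, int fail)
{
	int err;

	assert(w != NULL);

	old_wdd = w;
	err = model_misc_register(fail);
	if (err != 0) {
		old_wdd = NULL;
		return err;
	}
	return 0;
}
```

Resetting the pointer on the failure path matters, otherwise a failed
registration would leave a dangling old_wdd behind.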

Can you test a patch?

Guenter

Re: [RFC] revoke(2) and generic handling of things like remove_proc_entry()

2013-04-05 Thread Hannes Frederic Sowa

On Fri, Apr 05, 2013 at 05:29:32AM +0100, Al Viro wrote:
> 4) nasty semantics issue - mmap() vs. revoke (of any sort, including
> remove_proc_entry(), etc.).  Suppose a revokable file had been mmapped;
> now it's going away.  What should we do to its VMAs?  Right now sysfs
> and procfs get away with that, but only because there's only one thing
> that has ->mmap() there - /proc/bus/pci and sysfs equivalents.  I've
> no idea how does pci_mmap_page_range() interact with PCI hotplug (and
> I'm not at all sure that whatever it does isn't racy wrt device removal),
> but I suspect that it strongly depends on lack of ->fault() for those
> VMAs, which makes killing all PTEs pointing to pages in question enough.
> How generic do we want to make it?  Anybody wanting to add more files
> that could be mmapped in procfs/sysfs/debugfs deserves to be hurt, but
> if we start playing with revoke(2), restriction might become inconvenient.
> I'm not sure what kind of behaviour do we want there - *BSD at least
> used to have revoke(2) only for character devices that had no mmap()...

I am seeing possible problems in software implementing its own memory
management on top of SIGSEGV, e.g. Java. I hope they sanely distinguish
between heap mappings and file mmaps.

FreeBSD allows tearing down an mmap on MAC security relabel. Two possible
actions are available: SIGSEGV generation by tearing down the mapping
forcefully, or enabling some kind of copy-on-write semantics on revoke:

http://svnweb.freebsd.org/base/head/sys/security/mac/mac_process.c?revision=248084&view=markup

I'd like to see something like revoke being worked on, thanks!

Greetings,

  Hannes


Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-05 Thread Yinghai Lu

On Fri, Apr 5, 2013 at 6:25 PM, Neil Horman  wrote:
> I'm sorry.  Forgot to change the wording of the error for the new model that
> I'm following here.  Although the message is mostly right, as the BIOS is
> responsible for setting and clearing the IRQ remapping feature bit in the
> chip's capabilities register.
>
> I'll fix and repost Monday

>>> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
>>> index d56f8c1..2b56e92 100644
>>> --- a/drivers/iommu/irq_remapping.c
>>> +++ b/drivers/iommu/irq_remapping.c
>>> @@ -19,6 +19,7 @@
>>>  int irq_remapping_enabled;
>>>
>>>  int disable_irq_remap;
>>> +int irq_remap_broken;
>>>  int disable_sourceid_checking;
>>>  int no_x2apic_optout;
>>>
>>> @@ -216,6 +217,17 @@ int irq_remapping_supported(void)
>>> if (disable_irq_remap)
>>> return 0;
>>>
>>> +   if (irq_remap_broken) {
>>> +   WARN_TAINT(1, TAIN_FIRMWARE_WORKAROUND,
>>> +  "This system BIOS has enabled interrupt remapping\n"
>>> +  "on a chipset that contains an erratum making that\n"
>>> +  "feature unstable.  Please reboot with nointremap\n"
>>> +  "added to the kernel command line and contact\n"
>>> +  "your BIOS vendor for an update");

Also please put that warning code into
drivers/iommu/intel_irq_remapping.c::intel_irq_remapping_supported()

It does not belong in drivers/iommu/irq_remapping.c.

Thanks

Yinghai

Re: [GIT PULL] PCI updates for v3.9

2013-04-05 Thread Bjorn Helgaas

[+cc linux-kernel]

On Fri, Apr 5, 2013 at 7:31 PM, Bjorn Helgaas  wrote:
> Hi Linus,
>
> Here are some fixes for v3.9.  They include fixes for an ASPM problem
> that affects pre-1.1 PCIe devices, a kexec problem, the platform ROM
> image problem, a couple hotplug issues related to PM, and a fix for
> PCI-EISA bridges that have been broken for a long time.
>
> Bjorn
>
>
> The following changes since commit 8bb9660418e05bb1845ac1a2428444d78e322cc7:
>
>   Linux 3.9-rc4 (2013-03-23 16:52:44 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git tags/pci-v3.9-fixes-1
>
> for you to fetch changes up to de7d5f729c72638f41d7c17487bccb1c570ff144:
>
>   PCI/PM: Disable runtime PM of PCIe ports (2013-04-03 15:54:59 -0600)
>
> 
> PCI updates for v3.9:
>
>   ASPM
>       Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>   kexec
>       PCI: Don't try to disable Bus Master on disconnected PCI devices
>   Platform ROM images
>       PCI: Add PCI ROM helper for platform-provided ROM images
>       nouveau: Attempt to use platform-provided ROM image
>       radeon: Attempt to use platform-provided ROM image
>   Hotplug
>       PCI/ACPI: Always resume devices on ACPI wakeup notifications
>       PCI/PM: Disable runtime PM of PCIe ports
>   EISA
>       EISA/PCI: Fix bus res reference
>       EISA/PCI: Init EISA early, before PNP
>
> 
> Bjorn Helgaas (3):
>   Merge branch 'pci/mjg-rom' into for-linus
>   Merge branch 'pci/yinghai-eisa' into for-linus
>   Revert "PCI/ACPI: Request _OSC control before scanning PCI root bus"
>
> Konstantin Khlebnikov (1):
>   PCI: Don't try to disable Bus Master on disconnected PCI devices
>
> Matthew Garrett (3):
>   PCI: Add PCI ROM helper for platform-provided ROM images
>   nouveau: Attempt to use platform-provided ROM image
>   radeon: Attempt to use platform-provided ROM image
>
> Rafael J. Wysocki (2):
>   PCI/ACPI: Always resume devices on ACPI wakeup notifications
>   PCI/PM: Disable runtime PM of PCIe ports
>
> Yinghai Lu (2):
>   EISA/PCI: Fix bus res reference
>   EISA/PCI: Init EISA early, before PNP
>
>  drivers/acpi/pci_root.c                         | 76 -
>  drivers/eisa/pci_eisa.c                         | 67 +++---
>  drivers/gpu/drm/nouveau/core/subdev/bios/base.c | 17 ++
>  drivers/gpu/drm/radeon/radeon_bios.c            | 26 +
>  drivers/pci/pci-acpi.c                          | 15 ++---
>  drivers/pci/pci-driver.c                        |  5 +-
>  drivers/pci/pcie/portdrv_pci.c                  | 13 -
>  drivers/pci/rom.c                               | 67 ++
>  include/linux/pci.h                             |  1 +
>  9 files changed, 169 insertions(+), 118 deletions(-)

Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-05 Thread Bjorn Helgaas

On Fri, Apr 5, 2013 at 1:31 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> For the 5520 and 5500 chipsets which contained an erratum (specifically erratum
> 53), which noted that these chipsets can't properly do interrupt remapping, and
> as a result they recommend that interrupt remapping be disabled in bios.  While
> many vendors have a bios update to do exactly that, not all do, and of course
> not all users update their bios to a level that corrects the problem.  As a
> result, occasionally interrupts can arrive at a cpu even after affinity for that
> interrupt has been moved, leading to lost or spurious interrupts, usually
> characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that they have systems for which their BIOS level is such
> that this feature was not properly turned off.  As such, it would be good to
> give them a reminder that their systems are vulnerable to this problem.

I'd still like to mention the bugzilla URL in the changelog
(https://bugzilla.redhat.com/show_bug.cgi?id=887006) if it can be made
public.

> ...

> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 3755ef4..bfa3139 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int func)
>  }
>  #endif
>
> +#ifdef CONFIG_IRQ_REMAP
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +   u8 revision;
> +
> +   revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID);
> +
> +   /*
> +* Revision 0x13 of this chipset supports irq remapping
> +* but has an erratum that breaks its behavior, flag it as such
> +*/
> +   if (revision == 0x13)
> +   irq_remap_broken = 1;
> +
> +}
> +#else
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +}
> +#endif
> +
>  #define QFLAG_APPLY_ONCE   0x1
>  #define QFLAG_APPLIED  0x2
>  #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
> @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = {
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
> { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
> +   { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> +   { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> {}
>  };
>
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index d56f8c1..2b56e92 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -19,6 +19,7 @@
>  int irq_remapping_enabled;
>
>  int disable_irq_remap;
> +int irq_remap_broken;
>  int disable_sourceid_checking;
>  int no_x2apic_optout;
>
> @@ -216,6 +217,17 @@ int irq_remapping_supported(void)
> if (disable_irq_remap)
> return 0;
>
> +   if (irq_remap_broken) {
> +   WARN_TAINT(1, TAIN_FIRMWARE_WORKAROUND,

This looks like a typo (s/TAIN/TAINT/).

> +  "This system BIOS has enabled interrupt remapping\n"
> +  "on a chipset that contains an erratum making that\n"
> +  "feature unstable.  Please reboot with nointremap\n"
> +  "added to the kernel command line and contact\n"
> +  "your BIOS vendor for an update");

I suspect your updated message won't mention "nointremap", but if it
does, Documentation/kernel-parameters.txt says that option is
deprecated and "intremap=off" should be used instead.

> +   disable_irq_remap = 1;

Tell me if I have this correct:

Before this patch, we had interrupt remapping enabled and
virtualization enabled.  This is safe, but devices might need resets
to deal with lost or spurious interrupts.

After this patch, these same machines will by default have interrupt
remapping disabled and virtualization enabled.  The lost or spurious
interrupt problem should be gone, but we now have the IRQ injection
security bug.

If that's really the change we're making, I'm not comfortable applying
this patch.  But I don't know the details of the IRQ injection
problem, so maybe my understanding of the implications is wrong.
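
To pin down the behaviour change being asked about, here is a userspace
model of the decision the patched irq_remapping_supported() makes. It is
a sketch only: struct remap_state and both function names are invented,
and hw_supported stands in for whatever remap_ops->supported() would
report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct remap_state {
	bool disable_irq_remap;	/* e.g. remapping turned off on the command line */
	bool irq_remap_broken;	/* set by the early quirk */
	bool hw_supported;	/* what the remapping backend reports */
};

/* Mirrors the quirk's test: only revision 0x13 is flagged. */
static bool chipset_is_broken(unsigned int revision)
{
	return revision == 0x13;
}

/* Mirrors the patched check: a broken chipset forces remapping off. */
static bool remapping_supported(struct remap_state *s)
{
	assert(s != NULL);

	if (s->disable_irq_remap)
		return false;
	if (s->irq_remap_broken) {
		s->disable_irq_remap = true;
		return false;
	}
	return s->hw_supported;
}
```

In this model an affected machine goes from "remapping on" to "remapping
off" with no user action, which is exactly the default change in
question.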

> +   return 0;
> +   }
> +
> if (!remap_ops || !remap_ops->supported)
> return 0;
>
> diff --git a/drivers/iommu/irq_remapping.h b/drivers/iommu/irq_remapping.h
> index ecb6376..d7537e4 100644
> ---

[git pull] device-mapper fixes for 3.9-rc6

2013-04-05 Thread Alasdair G Kergon

Please pull from:

  git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm tags/dm-3.9-fixes-2
 
to get the following device-mapper fixes for 3.9.

Thanks,
Alasdair
 

A pair of patches to fix the writethrough mode of the device-mapper
cache target when the device being cached is not itself wrapped with
device-mapper.


Darrick J. Wong (1):
  dm cache: fix writes to cache device in writethrough mode

Mike Snitzer (1):
  dm cache: reduce bio front_pad size in writeback mode

 drivers/md/dm-cache-target.c |   51 +++---
 1 file changed, 38 insertions(+), 13 deletions(-)

Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-05 Thread Neil Horman

I'm sorry.  Forgot to change the wording of the error for the new model that
I'm following here.  Although the message is mostly right, as the BIOS is
responsible for setting and clearing the IRQ remapping feature bit in the
chip's capabilities register.

I'll fix and repost Monday

Neil

Yinghai Lu  wrote:

>On Fri, Apr 5, 2013 at 12:31 PM, Neil Horman  wrote:
>> A few years back intel published a spec update:
>> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>>
>> For the 5520 and 5500 chipsets which contained an erratum (specifically erratum
>> 53), which noted that these chipsets can't properly do interrupt remapping, and
>> as a result they recommend that interrupt remapping be disabled in bios.  While
>> many vendors have a bios update to do exactly that, not all do, and of course
>> not all users update their bios to a level that corrects the problem.  As a
>> result, occasionally interrupts can arrive at a cpu even after affinity for that
>> interrupt has been moved, leading to lost or spurious interrupts, usually
>> characterized by the message:
>> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>>
>> There have been several incidents recently of people seeing this error, and
>> investigation has shown that they have systems for which their BIOS level is such
>> that this feature was not properly turned off.  As such, it would be good to
>> give them a reminder that their systems are vulnerable to this problem.
>>
>> Signed-off-by: Neil Horman 
>> CC: Prarit Bhargava 
>> CC: Don Zickus 
>> CC: Don Dutile 
>> CC: Bjorn Helgaas 
>> CC: Asit Mallick 
>> CC: David Woodhouse 
>> CC: linux-...@vger.kernel.org
>> ---
>>
>> Change notes:
>>
>> v2)
>>
>> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
>> chipset series is x86 only.  I decided however to keep the quirk as a regular
>> quirk, not an early_quirk.  Early quirks have no way currently to determine if
>> BIOS has properly disabled the feature in the iommu, at least not without
>> significant hacking, and since it's quite possible this will be a short lived
>> quirk, should Don Z's workaround code prove successful (and it looks like it may
>> well), I don't think that necessary.
>>
>> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
>> string.  I opted to leave the newlines in place however, as I really couldn't
>> find a way to keep the text on a single line that is still legible from a code
>> perspective.  I think there's enough language in there that using cscope on just
>> about any substring will turn it up, and again, this may be a short
>> lived quirk.
>>
>> v3)
>>
>> * Removed defines from pci_ids.h, and used direct id values as per request 
>> from
>> Bjorn.
>>
>> v4)
>>
>> * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David
>> Woodhouse
>>
>> v5)
>>
>> * Moved check to an early quirk, and flagged the broken chip, so we could
>> reasonably disable irq remapping during bootup.
>>
>> v6)
>> * Clean up of stupid extra thrash in quirks.c
>> ---
>>  arch/x86/kernel/early-quirks.c | 25 +
>>  drivers/iommu/irq_remapping.c  | 12 
>>  drivers/iommu/irq_remapping.h  |  1 +
>>  3 files changed, 38 insertions(+)
>>
>> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
>> index 3755ef4..bfa3139 100644
>> --- a/arch/x86/kernel/early-quirks.c
>> +++ b/arch/x86/kernel/early-quirks.c
>> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int func)
>>  }
>>  #endif
>>
>> +#ifdef CONFIG_IRQ_REMAP
>> +static void __init intel_remapping_check(int num, int slot, int func)
>> +{
>> +   u8 revision;
>> +
>> +   revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID);
>> +
>> +   /*
>> +* Revision 0x13 of this chipset supports irq remapping
>> +* but has an erratum that breaks its behavior, flag it as such
>> +*/
>> +   if (revision == 0x13)
>> +   irq_remap_broken = 1;
>> +
>> +}
>> +#else
>> +static void __init intel_remapping_check(int num, int slot, int func)
>> +{
>> +}
>> +#endif
>> +
>>  #define QFLAG_APPLY_ONCE   0x1
>>  #define QFLAG_APPLIED  0x2
>>  #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
>> @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = {
>>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
>> { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
>> +   { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
>> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
>> +   { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
>> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
>> {}
>>  };
>>
>> diff --git a/drivers/iommu/irq_remapping.c

cgroup: status-quo and userland efforts

2013-04-05 Thread Tejun Heo

Hello, guys.

 Status-quo
 ==========

It's been about a year since I wrote up a summary on cgroup status quo
and future plans.  We're not there yet but much closer than we were
before.  At least the locking and object life-time management aren't
crazy anymore and most controllers now support proper hierarchy
although not all of them agree on how to treat inheritance.

IIRC, the yet-to-be-converted ones are blk-throttle and perf.  cpu
needs to be updated so that it at least supports a similar mechanism
as cfq-iosched for configuring ratio between tasks on an internal
cgroup and its children.  Also, we really should update how cpuset
handles a cgroup becoming empty (no cpus or memory node left due to
hot-unplug).  It currently transfers all its tasks to the nearest
ancestor with executing resources, which is an irreversible process
which would affect all other co-mounted controllers.  We probably want
it to just take on the masks of the ancestor until its own executing
resources become online again, and the new behavior should be gated
behind a switch (Li, can you please look into this?).

While we still have a ways to go, I feel relatively confident saying
that we aren't too far out now, well, except for the writeback mess
that still needs to be tackled.  Anyways, once the remaining bits are
settled, we can proceed to implement the unified hierarchy mode I've
been talking about forever.  I can't think of any fundamental
roadblocks at the moment but who knows?  The devil usually is in the
details.  Let's hope it goes okay.

So, while we aren't moving as fast as we wish we were, the kernel side
of things is falling into place.  At least, that's how I see it.
From now on, I think how to make it actually useable to userland
deserves a bit more focus, and by "useable to userland", I don't mean
some group hacking up an elaborate, manual configuration which is
tailored to the point of being eccentric to suit the needs of the said
group.  There's nothing wrong with that and they can continue to do
so, but it just isn't generically useable or useful.  It should be
possible to generically and automatically split resources among, say,
several servers and a couple users sharing a system without resorting
to an indecipherable ad-hoc shell script running off rc.local.


 Userland efforts
 ================

There are currently a few userland efforts trying to make interfacing
with cgroup less painful.

* libcg: Make cgroup interface accessible from programming languages
  with support for configuration persistency, which also brings its
  own config files to remember what to do on the next boot.  Sans the
  persistence part, it just seems to directly translate the filesystem
  interface to function interface.

  http://libcg.sourceforge.net/

* Workman: It's a rather young project but as its name (workload
  management) implies, its aims are higher level than that of libcg.
  It aims to provide high-level resource allocation and management and
  introduces new concepts like resource partitions to represent its
  view of resource hierarchy.  Like libcg, this one is implemented as
  a library but provides bindings for more languages.

  https://gitorious.org/workman/pages/Home

* Pax Controla Groupiana: A document on how not to step on others'
  toes while using cgroup.  It's not a software project but tries to
  define precautions that a software or user can take to avoid
  breaking or confusing other users of the cgroup filesystem.

  http://www.freedesktop.org/wiki/Software/systemd/PaxControlGroups

All try to play nice with other possible users of the cgroup
filesystem - be it libvirt cgroup, applications doing their own cgroup
tricks, or hand-crafted custom scripts.  While the approach is
understandable given that those usages already exist, I don't think
it's a workable solution in the long term.  There are several reasons
for that.

* The configurations aren't independent.  e.g. for weight-based
  controllers, your weight is only meaningful in relation to other
  weights at that level.  Distributing configuration to whatever
  entities which may write to cgroupfs simply cannot work.  It's
  fundamentally flawed.

* It's fragile like hell.  There's no accountability.  Nobody really
  knows what's going on.  Is this subdirectory still there due to a
  bug in this program, or something or someone else created it and
  crashed / forgot to remove it, or what?  Oh, the cgroup I wanted to
  create already exists.  Maybe the previous instance created it and
  then crashed or maybe some other program just happened to choose the
  same name.  Who owns config knobs in that directory?  This way lies
  madness.  I understand why the Pax doc exists but I'm not sure its
  long-term effect would be positive - best practices which ultimately
  lead to utter confusion and fragility.

* In many cases, resource distribution is a system-wide policy decision
  and determining what to do often requires system-wide knowledge.
  You can't
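
The first point in the list above (a weight only means something relative
to its siblings) can be made concrete with a minimal sketch in plain C.
The helper name share_permille and the numbers are illustrative
assumptions, not kernel code and not any controller's real arithmetic.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API: return the share of cgroup
 * `idx` in parts per thousand, given the weights of all siblings at
 * one level of the hierarchy.  Returns 0 for an empty or zero-weight
 * level, or for an out-of-range index.
 */
static unsigned int share_permille(const unsigned int *weights, size_t n,
                                   size_t idx)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += weights[i];
    if (total == 0 || idx >= n)
        return 0;
    return (unsigned int)((weights[idx] * 1000UL) / total);
}
```

With weights {100, 100} the first group gets half of the resource.  If
some other writer to cgroupfs adds a sibling with weight 800, the same
weight of 100 shrinks to a tenth, although the first group's own
configuration never changed.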

Re: [RFC PATCH arm: initial TI-Nspire support]

2013-04-05 Thread Daniel Tang

Hi,

First of all, thank you for your comments!

On 04/04/2013, at 10:12 PM, Arnd Bergmann  wrote:

> For new platforms, we want to have only the absolute minimum amount of
> code in arch/arm and move everything else into drivers. However, that
> is only possible using device tree. It should not add any significant
> complexity to your code, and you can easily bundle the device tree blob
> with the kernel.

Given that most of your comments described some very fundamental changes (esp 
switching to DTB) to the structure of our port, we've decided we'll probably 
start from scratch and fix the issues you outlined as we reimplement our 
platform.

At the moment, we're working on getting a basic DTB-booting kernel working so 
our next patch will be starting from basics.

Cheers,
tangrs

Re: [PATCH 10/10] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

2013-04-05 Thread KOSAKI Motohiro

(3/25/13 11:12 AM), Michal Hocko wrote:
> On Fri 22-03-13 16:23:55, Naoya Horiguchi wrote:
> [...]
>> @@ -2086,11 +2085,7 @@ int hugetlb_treat_movable_handler(struct ctl_table 
>> *table, int write,
>>  void __user *buffer,
>>  size_t *length, loff_t *ppos)
>>  {
>> -proc_dointvec(table, write, buffer, length, ppos);
>> -if (hugepages_treat_as_movable)
>> -htlb_alloc_mask = GFP_HIGHUSER_MOVABLE;
>> -else
>> -htlb_alloc_mask = GFP_HIGHUSER;
>> +/* hugepages_treat_as_movable is obsolete and to be removed. */
> 
> WARN_ON_ONCE("This knob is obsolete and has no effect. It is scheduled for 
> removal")

Indeed.

Re: [PATCH v6 4/4] xen/arm: introduce xen_early_init, use PSCI on xen

2013-04-05 Thread Nicolas Pitre

On Sat, 6 Apr 2013, Stefano Stabellini wrote:

> On Fri, 5 Apr 2013, Nicolas Pitre wrote:
> > On Fri, 5 Apr 2013, Rob Herring wrote:
> > 
> > > On 04/05/2013 02:36 PM, Nicolas Pitre wrote:
> > > > On Fri, 5 Apr 2013, Stefano Stabellini wrote:
> > > > 
> > > >> This is what happens:
> > > >>
> > > >> - No Xen
> > > >> Xen is not running on the platform and a Xen hypervisor node is not
> > > >> available on device tree.
> > > >> Everything keeps working seamlessly, this patch doesn't change 
> > > >> anything.
> > > >>
> > > >> - we are running on Xen
> > > >> Xen is running on the platform, we are running as a guest on Xen and an
> > > >> hypervisor node is available on device tree.
> > > >> Let's also assume that there aren't any "arm,cci" compatible nodes on
> > > >> device tree because Xen wouldn't export this kind of information to any
> > > >> guests right now. Therefore PSCI should be used to boot secondary cpus.
> > > >> Because the versatile express machine sets smp_init to
> > > >> vexpress_smp_init_ops, vexpress_smp_init_ops will be called.
> > > >> vexpress_smp_init_ops sets smp_ops to vexpress_smp_ops, that *break*
> > > >> Xen.
> > > > 
> > > > OK I see.
> > > > 
> > > >> With this patch, xen_smp_init will be called instead of
> > > >> vexpress_smp_init_ops, and smp_ops will be set to psci_smp_ops,
> > > >> therefore *unbreaking* Xen.
> > > > 
> > > > However that breaks MCPM.
> > > 
> > > You mean on bare metal, right? For the bare metal, "xen,xen" property
> > > would not be present and xen_smp_init is not used. So the vexpress MCPM
> > > ops will be used. Aren't Dom0 cpu's basically virtual cpus? If Xen ever
> > > needs the MCPM support, the Xen hook itself can figure out whether to
> > > use MCPM support.
> > 
> > Well, if Xen has its own mdesc distinct from the VExpress one then 
> > things
> > are indeed fine.
> 
> It's not about the mdesc: Xen has its own hypervisor node on device tree
> if and only if Xen is running on the platform, therefore the Xen early
> hook is never going to do anything at all on native.
> 
> In other words, this patch should NOT change the behaviour of Linux on
> native, and if it did, it would be a bug.

Perfect.


Nicolas

Re: [PATCH 09/10] memory-hotplug: enable memory hotplug to handle hugepage

2013-04-05 Thread KOSAKI Motohiro

(3/22/13 4:23 PM), Naoya Horiguchi wrote:
> Currently we can't offline memory blocks which contain hugepages because
> a hugepage is considered as an unmovable page. But now with this patch
> series, a hugepage has become movable, so by using hugepage migration we
> can offline such memory blocks.
> 
> What's different from other users of hugepage migration is that we need
> to decompose all the hugepages inside the target memory block into free
> buddy pages after hugepage migration, because otherwise free hugepages
> remaining in the memory block interfere with the memory offlining.
> For this reason we introduce new functions dissolve_free_huge_page() and
> dissolve_free_huge_pages().
> 
> Other than that, this patch straightforwardly adds hugepage migration
> code, that is, it adds hugepage code to the functions which scan over
> pfns and collect hugepages to be migrated, and adds a hugepage
> allocation function to alloc_migrate_target().
> 
> As for larger hugepages (1GB for x86_64), it's not easy to hot-remove
> them because they are larger than a memory block. So we now simply let
> the operation fail as it is.
> 
> ChangeLog v2:
>  - changed return value type of is_hugepage_movable() to bool
>  - is_hugepage_movable() uses list_for_each_entry() instead of *_safe()
>  - moved if(PageHuge) block before get_page_unless_zero() in 
> do_migrate_range()
>  - do_migrate_range() returns -EBUSY for hugepages larger than memory block
>  - dissolve_free_huge_pages() calculates scan step and sets it to minimum
>hugepage size
> 
> Signed-off-by: Naoya Horiguchi 
> ---
>  include/linux/hugetlb.h |  6 +
>  mm/hugetlb.c| 58 
> +
>  mm/memory_hotplug.c | 42 +++
>  mm/page_alloc.c | 12 ++
>  mm/page_isolation.c |  5 +
>  5 files changed, 114 insertions(+), 9 deletions(-)
> 
> diff --git v3.9-rc3.orig/include/linux/hugetlb.h 
> v3.9-rc3/include/linux/hugetlb.h
> index 981eff8..8220a8a 100644
> --- v3.9-rc3.orig/include/linux/hugetlb.h
> +++ v3.9-rc3/include/linux/hugetlb.h
> @@ -69,6 +69,7 @@ int dequeue_hwpoisoned_huge_page(struct page *page);
>  void putback_active_hugepage(struct page *page);
>  void putback_active_hugepages(struct list_head *l);
>  void migrate_hugepage_add(struct page *page, struct list_head *list);
> +bool is_hugepage_movable(struct page *page);
>  void copy_huge_page(struct page *dst, struct page *src);
>  
>  extern unsigned long hugepages_treat_as_movable;
> @@ -134,6 +135,7 @@ static inline int dequeue_hwpoisoned_huge_page(struct 
> page *page)
>  #define putback_active_hugepage(p) 0
>  #define putback_active_hugepages(l) 0
>  #define migrate_hugepage_add(p, l) 0
> +#define is_hugepage_movable(x) 0

Should be false instead of 0.
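
A sketch of what that could look like when compiled outside the kernel.
Writing the stub as a static inline instead of a macro is an assumption
made here for illustration (the patch uses a macro), and struct page is
only forward-declared as an opaque stand-in.

```c
#include <stdbool.h>

struct page;    /* opaque stand-in for the kernel's struct page */

/* Stub for builds without hugetlb support: nothing is a movable hugepage. */
static inline bool is_hugepage_movable(struct page *page)
{
    (void)page;
    return false;
}
```

An inline stub keeps the bool return type of the real prototype and
still type-checks its argument, which a bare macro does not.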


>  static inline void copy_huge_page(struct page *dst, struct page *src)
>  {
>  }
> @@ -356,6 +358,9 @@ static inline int hstate_index(struct hstate *h)
>   return h - hstates;
>  }
>  
> +extern void dissolve_free_huge_pages(unsigned long start_pfn,
> +  unsigned long end_pfn);
> +
>  #else
>  struct hstate {};
>  #define alloc_huge_page(v, a, r) NULL
> @@ -376,6 +381,7 @@ static inline unsigned int pages_per_huge_page(struct 
> hstate *h)
>  }
>  #define hstate_index_to_shift(index) 0
>  #define hstate_index(h) 0
> +#define dissolve_free_huge_pages(s, e) 0

No need for the 0 here.

>  #endif
>  
>  #endif /* _LINUX_HUGETLB_H */
> diff --git v3.9-rc3.orig/mm/hugetlb.c v3.9-rc3/mm/hugetlb.c
> index d9d3dd7..ef79871 100644
> --- v3.9-rc3.orig/mm/hugetlb.c
> +++ v3.9-rc3/mm/hugetlb.c
> @@ -844,6 +844,36 @@ static int free_pool_huge_page(struct hstate *h, 
> nodemask_t *nodes_allowed,
>   return ret;
>  }
>  
> +/* Dissolve a given free hugepage into free pages. */
> +static void dissolve_free_huge_page(struct page *page)
> +{
> + spin_lock(&hugetlb_lock);
> + if (PageHuge(page) && !page_count(page)) {
> + struct hstate *h = page_hstate(page);
> + int nid = page_to_nid(page);
> + list_del(&page->lru);
> + h->free_huge_pages--;
> + h->free_huge_pages_node[nid]--;
> + update_and_free_page(h, page);
> + }
> + spin_unlock(&hugetlb_lock);
> +}
> +
> +/* Dissolve free hugepages in a given pfn range. Used by memory hotplug. */
> +void dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn)
> +{
> + unsigned int order = 8 * sizeof(void *);
> + unsigned long pfn;
> + struct hstate *h;
> +
> + /* Set scan step to minimum hugepage size */
> + for_each_hstate(h)
> + if (order > huge_page_order(h))
> + order = huge_page_order(h);
> + for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << order)
> + dissolve_free_huge_page(pfn_to_page(pfn));
> +}

hotplug.c must not have such a pure huge page function.
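
For readers who want to try the scan-step logic quoted above in
isolation, here is a userspace sketch, not the patch itself.  The orders
array stands in for for_each_hstate() and huge_page_order(), and both
function names are made up.  It computes the step as 1UL << order so the
shift is done in unsigned long; the quoted code writes 1 << order, an
int shift, which is fine for realistic orders.

```c
#include <stddef.h>

/* Smallest order among the configured hugepage sizes (stand-in for hstates). */
static unsigned int min_hugepage_order(const unsigned int *orders, size_t n)
{
    unsigned int order = 8 * sizeof(void *);   /* larger than any real order */
    size_t i;

    for (i = 0; i < n; i++)
        if (order > orders[i])
            order = orders[i];
    return order;
}

/* Number of candidate hugepage heads visited in [start_pfn, end_pfn). */
static unsigned long count_scan_steps(unsigned long start_pfn,
                                      unsigned long end_pfn,
                                      unsigned int order)
{
    unsigned long pfn, steps = 0;

    if (order >= 8 * sizeof(unsigned long))
        return 0;               /* no hugepage sizes configured */
    for (pfn = start_pfn; pfn < end_pfn; pfn += 1UL << order)
        steps++;
    return steps;
}
```

With orders 9 and 18 (2MB and 1GB pages on x86_64 with 4KB base pages)
the step is 512 pfns, so a range of 32768 pfns is probed 64 times.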


> +
>  static struct page *alloc_buddy_huge_page(struct hstate *h, int nid)
>

Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Arve Hjønnevåg

On Fri, Apr 5, 2013 at 3:01 PM, Greg KH  wrote:
> On Fri, Apr 05, 2013 at 04:05:25PM +0800, Chen Gang wrote:
>>
>>   need break when 'target_thread' get value, firstly.
>>
>> 'tmp' is a stack (thread->transaction_stack),
>> if 'proc' was the same between child node and parent node,
>> the child would have higher priority than parent.
>
> Are you sure about this?
>
> have you tested it?
>

Theoretically this should not change the behavior. The purpose of this
code is to make sure only one thread per process is part of a transaction
stack, so if it finds more than one transaction with a matching
process, they should all point to the same thread object. I think a
better change description is needed though.
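
That argument can be checked with a small userspace sketch.  The struct
and function names are made up and much simpler than the binder driver's
own data structures; the only assumption carried over is the invariant
stated above, that all transactions on the stack that belong to one
process refer to the same thread.

```c
#include <stddef.h>

struct thread { int id; };      /* illustrative stand-ins */
struct txn {
    int proc;                   /* owning process */
    struct thread *from;        /* thread that sent it */
    struct txn *parent;         /* next entry down the stack */
};

/* Walk the whole stack; the last match wins (loop without break). */
static struct thread *find_last(struct txn *top, int proc)
{
    struct thread *t = NULL;

    for (; top; top = top->parent)
        if (top->proc == proc)
            t = top->from;
    return t;
}

/* Stop at the first match (loop with break). */
static struct thread *find_first(struct txn *top, int proc)
{
    for (; top; top = top->parent)
        if (top->proc == proc)
            return top->from;
    return NULL;
}
```

As long as the invariant holds, both walks return the same thread, so
the break only saves the rest of the walk.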

--
Arve Hjønnevåg

Re: [PATCH] cpufreq/intel_pstate: Set timer timeout correctly

2013-04-05 Thread Parag Warudkar



On Fri, 5 Apr 2013, Viresh Kumar wrote:

> On Thu, Apr 4, 2013 at 11:05 PM,   wrote:
> > From: Dirk Brandewie 
> >
> > The current calculation of the delay time is wrong and a cut and paste
> > error from a previous experimental driver.  This can result in the
> > timeout being set to jiffies + 1, which sets up the driver to race with
> > itself if the APIC timer interrupt happens at just the right time.
> >
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=920289
> >
> > Reported-by: Adam Williamson 
> > Reported-by: Parag Warudkar 
> >
> > Signed-off-by: Dirk Brandewie 
> > ---
> >  drivers/cpufreq/intel_pstate.c |1 -
> >  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> Looks fine, but I would like to see a Tested-by from Adam/Parag
> as they haven't said anything about this patch (even in bugzilla).
> 

I am running with the patch since yesterday - everything looks good.

The issue hasn't been reproducible on demand, but some code reading and
Dirk's explanation suggest the patch should fix the issue.

So - Tested-by: Parag Warudkar 

Parag

Re: [PATCH 0/8] Android Binder IPC Fixes

2013-04-05 Thread Arve Hjønnevåg

On Fri, Apr 5, 2013 at 3:00 PM, Greg KH  wrote:
> On Thu, Apr 04, 2013 at 01:32:30PM +0100, Serban Constantinescu wrote:
>> Hi all,
>>
>> This set of patches will clean-up and fix some of the issues that arise
>> with the current binder interface when moving to a 64bit kernel. All these
>> changes will not affect the existing 32bit Android interface and are meant
>> to stand as the base for the 64bit binder compat layer.
>>
>> This patch set has been successfully tested on 32bit platforms(ARMv7 
>> VExpress)
>> and 64bit platforms(ARMv8 RTSM) running a 32bit Android userspace and an in
>> kernel binder compat layer.
>
> I need some acks from some of the Android developers before I can take
> this.
>

I still think it is better to change user-space to use 64 bit pointer
types when running on a 64 bit kernel. These changes do not seem to
allow 64 bit user-space processes on a 64 bit kernel.
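
As an illustration of why the pointer width matters, and not a
description of the actual binder ABI: a structure that carries user
pointers as native pointers changes layout between 32-bit and 64-bit
builds, while one that carries them as fixed 64-bit integers does not.
The struct, field and helper names below are hypothetical.

```c
#include <stdint.h>

/* Layout depends on the ABI of whoever compiles it. */
struct msg_native {
    void *buffer;
    unsigned long size;
};

/* Same layout for 32-bit and 64-bit user space. */
struct msg_fixed {
    uint64_t buffer;    /* user pointer carried as a 64-bit integer */
    uint64_t size;
};

/* Store a pointer in the fixed-width field. */
static inline uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Recover the pointer from the fixed-width field. */
static inline void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```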

--
Arve Hjønnevåg

Re: [PATCH v6] irq: add quirk for broken interrupt remapping on 55XX chipsets

2013-04-05 Thread Yinghai Lu

On Fri, Apr 5, 2013 at 12:31 PM, Neil Horman  wrote:
> A few years back intel published a spec update:
> http://www.intel.com/content/dam/doc/specification-update/5520-and-5500-chipset-ioh-specification-update.pdf
>
> It covers the 5520 and 5500 chipsets and contains an erratum (specifically
> erratum 53) noting that these chipsets can't properly do interrupt
> remapping, and as a result they recommend that interrupt remapping be
> disabled in the BIOS.  While many vendors have a BIOS update to do exactly
> that, not all do, and of course not all users update their BIOS to a level
> that corrects the problem.  As a result, interrupts can occasionally arrive
> at a CPU even after affinity for that interrupt has been moved, leading to
> lost or spurious interrupts, usually characterized by the message:
> kernel: do_IRQ: 7.71 No irq handler for vector (irq -1)
>
> There have been several incidents recently of people seeing this error, and
> investigation has shown that their systems have a BIOS level at which this
> feature was not properly turned off.  As such, it would be good to give
> them a reminder that their systems are vulnerable to this problem.
>
> Signed-off-by: Neil Horman 
> CC: Prarit Bhargava 
> CC: Don Zickus 
> CC: Don Dutile 
> CC: Bjorn Helgaas 
> CC: Asit Mallick 
> CC: David Woodhouse 
> CC: linux-...@vger.kernel.org
> ---
>
> Change notes:
>
> v2)
>
> * Moved the quirk to the x86 arch, since consensus seems to be that the 55XX
> chipset series is x86 only.  I decided however to keep the quirk as a regular
> quirk, not an early_quirk.  Early quirks have no way currently to determine if
> BIOS has properly disabled the feature in the iommu, at least not without
> significant hacking, and since it's quite possible this will be a short-lived
> quirk, should Don Z's workaround code prove successful (and it looks like it
> may well), I don't think that is necessary.
>
> * Removed the WARNING banner from the quirk, and added the HW_ERR token to the
> string.  I opted to leave the newlines in place however, as I really couldn't
> find a way to keep the text on a single line that is still legible from a code
> perspective.  I think there's enough language in there that using cscope on
> just about any substring will turn it up, and again, this may be a
> short-lived quirk.
>
> v3)
>
> * Removed defines from pci_ids.h, and used direct id values as per request
> from Bjorn.
>
> v4)
>
> * Converted pr_warn to WARN_TAINT(TAINT_FIRMWARE_WORKAROUND) as per David
> Woodhouse
>
> v5)
>
> * Moved check to an early quirk, and flagged the broken chip, so we could
> reasonably disable irq remapping during bootup.
>
> v6)
> * Clean up of stupid extra thrash in quirks.c
> ---
>  arch/x86/kernel/early-quirks.c | 25 +
>  drivers/iommu/irq_remapping.c  | 12 
>  drivers/iommu/irq_remapping.h  |  1 +
>  3 files changed, 38 insertions(+)
>
> diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c
> index 3755ef4..bfa3139 100644
> --- a/arch/x86/kernel/early-quirks.c
> +++ b/arch/x86/kernel/early-quirks.c
> @@ -192,6 +192,27 @@ static void __init ati_bugs_contd(int num, int slot, int 
> func)
>  }
>  #endif
>
> +#ifdef CONFIG_IRQ_REMAP
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +   u8 revision;
> +
> +   revision = pci_read_config_byte(num, slot, func , PCI_REVISION_ID);
> +
> +   /*
> +* Revision 0x13 of this chipset supports irq remapping
> +* but has an erratum that breaks its behavior, flag it as such
> +*/
> +   if (revision == 0x13)
> +   irq_remap_broken = 1;
> +
> +}
> +#else
> +static void __init intel_remapping_check(int num, int slot, int func)
> +{
> +}
> +#endif
> +
>  #define QFLAG_APPLY_ONCE   0x1
>  #define QFLAG_APPLIED  0x2
>  #define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
> @@ -221,6 +242,10 @@ static struct chipset early_qrk[] __initdata = {
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
> { PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
>   PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
> +   { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> +   { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST,
> + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check },
> {}
>  };
>
> diff --git a/drivers/iommu/irq_remapping.c b/drivers/iommu/irq_remapping.c
> index d56f8c1..2b56e92 100644
> --- a/drivers/iommu/irq_remapping.c
> +++ b/drivers/iommu/irq_remapping.c
> @@ -19,6 +19,7 @@
>  int irq_remapping_enabled;
>
>  int disable_irq_remap;
> +int irq_remap_broken;
>  int disable_sourceid_checking;
>  int no_x2apic_optout;
>
> @@ -216,6 +217,17 @@ int irq_remapping_supported(void)
> if (disable_irq_remap)
> return 0;
>
> +   if (irq_remap_broken) {
>

Re: [ANNOUNCE] 3.8.4-rt2

2013-04-05 Thread Paul Gortmaker

On Tue, Mar 26, 2013 at 4:17 PM, Sebastian Andrzej Siewior
 wrote:
>
> Dear RT Folks,
>
> I'm pleased to announce the 3.8.4-rt2 release.
>
> changes since v3.8.4-rt1:
> - build fix for i915 (reported by "Luis Claudio R. Goncalves")
> - build fix for fscache (reported by tglx)
> - build fix for !RT (kernel/softirq.c did not compile)
> - per-cpu rwsem fixed for RT (required only by uprobes so far)
> - slub: delay the execution of the ->ctor() hook for newly created
>   objects. This lowers the worst case latencies.
>
> Known issues:
>
>   - SLxB is broken on PowerPC.
>
> The delta patch against v3.8.4-rt1 is appended below and can be found
> here:
>
>   
> https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/incr/patch-3.8.4-rt1-rt2.patch.xz
>
> The RT patch against 3.8.4 can be found here:
>
>   
> https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patch-3.8.4-rt2.patch.xz
>
> The split quilt queue is available at:
>
>   
> https://www.kernel.org/pub/linux/kernel/projects/rt/3.8/patches-3.8.4-rt2.tar.xz

Thanks for the work in putting this together.  Just a heads up that the
split queue fails on a patch with no author/date/subject as follows:

[...]
(174/286) Applying: rt-add-rt-to-mutex-headers.patch
(175/286) Applying: rwsem-add-rt-variant.patch
(176/286) Applying: rt: Add the preempt-rt lock replacement APIs
(177/286) Patch format detection failed.
git am of percpu-rwsem-compilefix.patch failed.

Looking at percpu-rwsem-compilefix.patch -- it starts with three dashes,
so it looks like the shortlog and long-log got inadvertently thrown out,
as well as any author or optional date information, etc.
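
A rough pre-flight check for this kind of breakage could look like the
sketch below.  It is explicitly not git's own format detection, just an
assumed heuristic: it reports whether "From:" and "Subject:" lines
appear before the first "---" separator line.  The function name is
made up.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Heuristic only, NOT what git am does internally: return true if the
 * text has both a "From:" and a "Subject:" line before the first line
 * that consists of exactly "---".
 */
static bool has_mail_headers(const char *patch)
{
    bool from = false, subject = false;
    const char *line = patch;

    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len == 3 && strncmp(line, "---", 3) == 0)
            break;
        if (len >= 5 && strncmp(line, "From:", 5) == 0)
            from = true;
        if (len >= 8 && strncmp(line, "Subject:", 8) == 0)
            subject = true;
        if (!eol)
            break;
        line = eol + 1;
    }
    return from && subject;
}
```

A file that starts directly with the three dashes, like the two patches
named in this mail, fails the check immediately.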

Fixing that one, I get further, until I get to:

[...]
(270/286) Applying: wait-simple: Simple waitqueue implementation
(271/286) Applying: rcutiny: Use simple waitqueue
(272/286) Patch format detection failed.
git am of treercu-use-simple-waitqueue.patch failed.

This "treercu-use-simple-waitqueue.patch" has the exact same problem.

Note that I'm using git directly, and not quilt -- which is more
strict in what it will accept, hence it catches these kinds of things.

Thanks,
Paul.
--

> Sebastian
> ---

Re: [PATCH v6 4/4] xen/arm: introduce xen_early_init, use PSCI on xen

2013-04-05 Thread Stefano Stabellini

On Fri, 5 Apr 2013, Nicolas Pitre wrote:
> On Fri, 5 Apr 2013, Rob Herring wrote:
> 
> > On 04/05/2013 02:36 PM, Nicolas Pitre wrote:
> > > On Fri, 5 Apr 2013, Stefano Stabellini wrote:
> > > 
> > >> This is what happens:
> > >>
> > >> - No Xen
> > >> Xen is not running on the platform and a Xen hypervisor node is not
> > >> available on device tree.
> > >> Everything keeps working seamlessly, this patch doesn't change anything.
> > >>
> > >> - we are running on Xen
> > >> Xen is running on the platform, we are running as a guest on Xen and an
> > >> hypervisor node is available on device tree.
> > >> Let's also assume that there aren't any "arm,cci" compatible nodes on
> > >> device tree because Xen wouldn't export this kind of information to any
> > >> guests right now. Therefore PSCI should be used to boot secondary cpus.
> > >> Because the versatile express machine sets smp_init to
> > >> vexpress_smp_init_ops, vexpress_smp_init_ops will be called.
> > >> vexpress_smp_init_ops sets smp_ops to vexpress_smp_ops, that *break*
> > >> Xen.
> > > 
> > > OK I see.
> > > 
> > >> With this patch, xen_smp_init will be called instead of
> > >> vexpress_smp_init_ops, and smp_ops will be set to psci_smp_ops,
> > >> therefore *unbreaking* Xen.
> > > 
> > > However that breaks MCPM.
> > 
> > You mean on bare metal, right? For the bare metal, "xen,xen" property
> > would not be present and xen_smp_init is not used. So the vexpress MCPM
> > ops will be used. Aren't Dom0 cpu's basically virtual cpus? If Xen ever
> > needs the MCPM support, the Xen hook itself can figure out whether to
> > use MCPM support.
> 
> Well, if Xen has its own mdesc distinct from the VExpress one then 
> things
> are indeed fine.

It's not about the mdesc: Xen has its own hypervisor node on device tree
if and only if Xen is running on the platform, therefore the Xen early
hook is never going to do anything at all on native.

In other words, this patch should NOT change the behaviour of Linux on
native, and if it did, it would be a bug.
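
The selection described in this thread can be modelled in a few lines
of userspace C.  The struct is a stand-in and pick_smp_ops() is a
hypothetical name; only psci_smp_ops and vexpress_smp_ops are names
taken from the discussion, and the boolean stands for "a Xen hypervisor
node is present in the device tree".

```c
#include <stdbool.h>

struct smp_operations { const char *name; };    /* stand-in type */

static const struct smp_operations psci_smp_ops     = { "psci" };
static const struct smp_operations vexpress_smp_ops = { "vexpress" };

/*
 * On native hardware there is no hypervisor node, so the machine's own
 * ops are used unchanged; under Xen, PSCI is used instead.
 */
static const struct smp_operations *pick_smp_ops(bool xen_node_present)
{
    return xen_node_present ? &psci_smp_ops : &vexpress_smp_ops;
}
```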

Re: [PATCH 3/3] x86: kernel base offset ASLR

2013-04-05 Thread Kees Cook

On Fri, Apr 5, 2013 at 1:43 PM, Borislav Petkov  wrote:
> On Fri, Apr 05, 2013 at 01:19:39PM -0700, Julien Tinnes wrote:
>> I think it'd be perfectly ok for OOPS to print out the kernel base.
>
> Yeah, ok, this still would need some massaging of the oops output per
> script, but it shouldn't be a big problem.
>
> Also, you probably need to make clear in the oops itself that the
> addresses have been randomized. Or, is the mere presence of kernel base
> going to imply that?

There is already a hook in the patch that prints the offset:

+dump_kernel_offset(struct notifier_block *self, unsigned long v, void *p)
+{
+   pr_emerg("Kernel Offset: 0x%lx\n",
+(unsigned long)&_text - __START_KERNEL);
...
+   atomic_notifier_chain_register(&panic_notifier_list,
+   &kernel_offset_notifier);

But of course, this can get improved.
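
The arithmetic behind that printed value, and the "massaging" an
oops-decoding script would have to do with it, can be sketched as
follows.  Both helper names are hypothetical; the only thing taken from
the patch is that the offset is the runtime address of _text minus
__START_KERNEL.

```c
/* How far the running kernel text sits from its link-time address. */
static unsigned long kernel_offset(unsigned long runtime_text,
                                   unsigned long link_base)
{
    return runtime_text - link_base;
}

/* Map an address from a randomized oops back to the link-time layout. */
static unsigned long unrandomize(unsigned long addr, unsigned long offset)
{
    return addr - offset;
}
```

A script would subtract the printed offset from every address in the
oops before looking it up in System.map or vmlinux.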

-Kees

--
Kees Cook
Chrome OS Security

Re: [PATCH 1/8] staging: android: binder: replace explicit size types

2013-04-05 Thread Arve Hjønnevåg

On Thu, Apr 4, 2013 at 5:32 AM, Serban Constantinescu
 wrote:
>
> Since the binder driver uses both uint32_t and unsigned int any further
> kernel changes will be difficult to read. This patch fixes the inconsistent
> types usage.
>

Would it make more sense to only change the types that need to be
larger on a 64 bit system?

--
Arve Hjønnevåg

Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Theodore Ts'o

On Sat, Apr 06, 2013 at 12:18:11AM +0200, Jiri Slaby wrote:
> Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
> it still sucks. Updating a kernel in a VM still results in "Your system
> is too SLOW to play this!" by mplayer and frame dropping.

What was the first kernel where you didn't have the problem?  Were you
using the 3.8 kernel earlier, and did you see the interactivity
problems there?

What else was running on your desktop at the same time?  How was
the file system mounted, and can you send me the output of dumpe2fs -h
/dev/XXX?  Oh, and what options were you using when you kicked off
the VM?

The other thing that would be useful is to enable the jbd2_run_stats
tracepoint and to send the output of the trace log when you notice the
interactivity problems.

Thanks,

- Ted

Re: [PATCH v6 4/4] xen/arm: introduce xen_early_init, use PSCI on xen

2013-04-05 Thread Stefano Stabellini

On Fri, 5 Apr 2013, Rob Herring wrote:
> On 04/05/2013 02:36 PM, Nicolas Pitre wrote:
> > On Fri, 5 Apr 2013, Stefano Stabellini wrote:
> > 
> >> This is what happens:
> >>
> >> - No Xen
> >> Xen is not running on the platform and a Xen hypervisor node is not
> >> available on device tree.
> >> Everything keeps working seamlessly, this patch doesn't change anything.
> >>
> >> - we are running on Xen
> >> Xen is running on the platform, we are running as a guest on Xen and an
> >> hypervisor node is available on device tree.
> >> Let's also assume that there aren't any "arm,cci" compatible nodes on
> >> device tree because Xen wouldn't export this kind of information to any
> >> guests right now. Therefore PSCI should be used to boot secondary cpus.
> >> Because the versatile express machine sets smp_init to
> >> vexpress_smp_init_ops, vexpress_smp_init_ops will be called.
> >> vexpress_smp_init_ops sets smp_ops to vexpress_smp_ops, that *break*
> >> Xen.
> > 
> > OK I see.
> > 
> >> With this patch, xen_smp_init will be called instead of
> >> vexpress_smp_init_ops, and smp_ops will be set to psci_smp_ops,
> >> therefore *unbreaking* Xen.
> > 
> > However that breaks MCPM.
> 
> You mean on bare metal, right? For the bare metal, "xen,xen" property
> would not be present and xen_smp_init is not used. So the vexpress MCPM
> ops will be used.

That is correct.


> Aren't Dom0 cpu's basically virtual cpus? If Xen ever
> needs the MCPM support, the Xen hook itself can figure out whether to
> use MCPM support.

Right.

[PATCH v2] mac802154: Keep track of the channel when changed

2013-04-05 Thread Alan Ott

Two sections checked whether the current channel != the new channel
without ever setting the current channel variables.

1. net/mac802154/tx.c: Prevent set_channel() from getting called every
time a packet is sent.

2. net/mac802154/mib.c: Lock (pib_lock) accesses to current_channel and
current_page and make sure they are updated when the channel has been
changed.

Signed-off-by: Alan Ott 
---
 net/mac802154/mib.c | 12 +++-
 net/mac802154/tx.c  |  3 +++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/mac802154/mib.c b/net/mac802154/mib.c
index f03e55f..8ded97c 100644
--- a/net/mac802154/mib.c
+++ b/net/mac802154/mib.c
@@ -176,9 +176,15 @@ static void phy_chan_notify(struct work_struct *work)
struct mac802154_sub_if_data *priv = netdev_priv(nw->dev);
int res;
 
+   mutex_lock(&priv->hw->phy->pib_lock);
res = hw->ops->set_channel(&hw->hw, priv->page, priv->chan);
if (res)
pr_debug("set_channel failed\n");
+   else {
+   priv->hw->phy->current_channel = priv->chan;
+   priv->hw->phy->current_page = priv->page;
+   }
+   mutex_unlock(&priv->hw->phy->pib_lock);
 
kfree(nw);
 }
@@ -195,8 +201,11 @@ void mac802154_dev_set_page_channel(struct net_device 
*dev, u8 page, u8 chan)
priv->chan = chan;
spin_unlock_bh(&priv->mib_lock);
 
+   mutex_lock(&priv->hw->phy->pib_lock);
if (priv->hw->phy->current_channel != priv->chan ||
priv->hw->phy->current_page != priv->page) {
+   mutex_unlock(&priv->hw->phy->pib_lock);
+
work = kzalloc(sizeof(*work), GFP_ATOMIC);
if (!work)
return;
@@ -204,5 +213,6 @@ void mac802154_dev_set_page_channel(struct net_device *dev, 
u8 page, u8 chan)
INIT_WORK(&work->work, phy_chan_notify);
work->dev = dev;
queue_work(priv->hw->dev_workqueue, &work->work);
-   }
+   } else
+   mutex_unlock(&priv->hw->phy->pib_lock);
 }
diff --git a/net/mac802154/tx.c b/net/mac802154/tx.c
index 3fd3e07..6d16473 100644
--- a/net/mac802154/tx.c
+++ b/net/mac802154/tx.c
@@ -58,6 +58,9 @@ static void mac802154_xmit_worker(struct work_struct *work)
pr_debug("set_channel failed\n");
goto out;
}
+
+   xw->priv->phy->current_channel = xw->chan;
+   xw->priv->phy->current_page = xw->page;
}
 
res = xw->priv->ops->xmit(&xw->priv->hw, xw->skb);
-- 
1.7.11.2
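
The idea behind the patch can be sketched in user-space C: compare the requested page/channel with a cached copy, touch the hardware only on a change, and update the cache only after the hardware call succeeds. This is a simplified illustration, not the driver code. `struct phy_state`, `hw_set_channel()` and `change_channel_if_needed()` are hypothetical names, and the `pib_lock` mutex that the patch takes around these accesses is left out for brevity.

```c
struct phy_state {
    int current_page;
    int current_channel;
    int set_channel_calls;  /* counts how often the "hardware" was touched */
};

/* Hypothetical stand-in for the driver's set_channel() op; always succeeds. */
static int hw_set_channel(struct phy_state *phy, int page, int chan)
{
    (void)page;
    (void)chan;
    phy->set_channel_calls++;
    return 0;
}

/*
 * Switch channel only when the request differs from the cached state.
 * In the kernel, the compare and the update would sit under pib_lock.
 */
static int change_channel_if_needed(struct phy_state *phy, int page, int chan)
{
    int res;

    if (phy->current_page == page && phy->current_channel == chan)
        return 0;                   /* nothing to do, skip the hardware */

    res = hw_set_channel(phy, page, chan);
    if (res)
        return res;                 /* leave the cache untouched on failure */

    phy->current_page = page;       /* remember what the hardware now uses */
    phy->current_channel = chan;
    return 0;
}
```

Without the two cache updates, the comparison would keep seeing a stale channel and the hardware call would repeat for every packet, which is the behaviour the patch description reports.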


Re: [RFC] revoke(2) and generic handling of things like remove_proc_entry()

2013-04-05 Thread Greg Kroah-Hartman

On Fri, Apr 05, 2013 at 09:51:37PM +0100, Al Viro wrote:
> On Fri, Apr 05, 2013 at 12:56:09PM -0700, Greg Kroah-Hartman wrote:
> > > 4) nasty semantics issue - mmap() vs. revoke (of any sort, including
> > > remove_proc_entry(), etc.).  Suppose a revokable file had been mmapped;
> > > now it's going away.  What should we do to its VMAs?  Right now sysfs
> > > and procfs get away with that, but only because there's only one thing
> > > that has ->mmap() there - /proc/bus/pci and sysfs equivalents.  I've
> > > no idea how does pci_mmap_page_range() interact with PCI hotplug (and
> > > I'm not at all sure that whatever it does isn't racy wrt device removal),
> > 
> > The page range should just start returning 0xff all over the place, the
> > BIOS should have kept the mapping around, as it can't really assign it
> > anywhere else, so all _should_ be fine here.
> 
> Umm... 0xff or SIGSEGV?

I think, at first glance, 0xff, as the area is still "mapped" to the
device, and that never gets invalidated from what I can tell, despite the
device now being gone.

> > I think that's a reasonable constraint, although tearing down the VMAs
> > might be possible if we just invalidate the file handle "forcefully"
> > (i.e. manually tear them down and then further accesses should fail with
> > a SIGSEGV, or am I missing something more basic here?)
> 
> The question is how to do that in a reasonably clean way; we would've done
> as part of ->kick(), I suppose, or right next to it.

I don't really know, sorry.

> > > 6) how do we get from revoke(2) to call of revoke_it() on the right 
> > > object?
> > > Note that revoke(2) is done by pathname; we might want an ...at() variant,
> > > but all we'll have to play with will be inode, not an opened file.
> > 
> > Can we make revoke(2) require a valid file handle?  Is there a POSIX
> > spec for revoke(2) that we have to follow here, or given that we haven't
> > had one yet, are we free to define whatever we want without people
> > getting that upset?
> 
> BSD one takes a pathname and so do all derived ones...

Ugh, ok, they were there first, fair enough.

Hm, how do they solve this type of race condition?  Last time I looked
(middle of last year) at one of the revoke BSD implementations, I don't
recall anything special to try to prevent this.  Is it that they just
don't care as almost no one uses it, and it's only for tty devices?  Or
did I miss something?

thanks,

greg k-h

[tip:x86/urgent] x86: Fix rebuild with EFI_STUB enabled

2013-04-05 Thread tip-bot for Jan Beulich

Commit-ID:  918708245e92941df16a634dc201b407d12bcd91
Gitweb: http://git.kernel.org/tip/918708245e92941df16a634dc201b407d12bcd91
Author: Jan Beulich 
AuthorDate: Wed, 3 Apr 2013 15:47:33 +0100
Committer:  H. Peter Anvin 
CommitDate: Fri, 5 Apr 2013 13:59:23 -0700

x86: Fix rebuild with EFI_STUB enabled

eboot.o and efi_stub_$(BITS).o didn't get added to "targets", and hence
their .cmd files don't get included by the build machinery, leading to
the files always getting rebuilt.

Rather than adding the two files individually, take the opportunity and
add $(VMLINUX_OBJS) to "targets" instead, thus allowing the assignment
at the top of the file to be shrunk quite a bit.

At the same time, remove a pointless flags override line - the variable
assigned to was misspelled anyway, and the options added are
meaningless for assembly sources.

[ hpa: the patch is not minimal, but I am taking it for -urgent anyway
  since the excess impact of the patch seems to be small enough. ]

Signed-off-by: Jan Beulich 
Link: http://lkml.kernel.org/r/515c5d250278000ca...@nat28.tlf.novell.com
Cc: Matthew Garrett 
Cc: Matt Fleming 
Signed-off-by: H. Peter Anvin 
---
 arch/x86/boot/compressed/Makefile | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 8a84501..5ef205c 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -4,7 +4,7 @@
 # create a compressed vmlinux image from the original vmlinux
 #
 
-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 
vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo head_$(BITS).o misc.o string.o 
cmdline.o early_serial_console.o piggy.o
+targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma 
vmlinux.bin.xz vmlinux.bin.lzo
 
 KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
 KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
@@ -29,7 +29,6 @@ VMLINUX_OBJS = $(obj)/vmlinux.lds $(obj)/head_$(BITS).o 
$(obj)/misc.o \
$(obj)/piggy.o
 
 $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone
-$(obj)/efi_stub_$(BITS).o: KBUILD_CLFAGS += -fshort-wchar -mno-red-zone
 
 ifeq ($(CONFIG_EFI_STUB), y)
VMLINUX_OBJS += $(obj)/eboot.o $(obj)/efi_stub_$(BITS).o
@@ -43,7 +42,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 $(obj)/vmlinux.bin: vmlinux FORCE
$(call if_changed,objcopy)
 
-targets += vmlinux.bin.all vmlinux.relocs
+targets += $(patsubst $(obj)/%,%,$(VMLINUX_OBJS)) vmlinux.bin.all 
vmlinux.relocs
 
 CMD_RELOCS = arch/x86/tools/relocs
 quiet_cmd_relocs = RELOCS  $@

Re: [PATCH 1/3] pci: Add PCI ROM helper for platform-provided ROM images

2013-04-05 Thread Chris Murphy

On Apr 5, 2013, at 2:35 PM, Bjorn Helgaas  wrote:

> On Fri, Apr 5, 2013 at 2:31 PM, Chris Murphy  
> wrote:
>> 
>> 
>> Are they in 3.9.0-0.rc5.git2.1.f19? I'm seeing a regression from 3.8.5 with 
>> the radeon driver not finding BIOS ROM as well.
>> https://bugzilla.redhat.com/show_bug.cgi?id=949083
> 
> No.  I haven't asked Linus to pull my branch yet (was just thinking it
> was time to do that, coincidentally :))

The patch appears to fix Bug 949083 radeon issue as well. I've updated the bug 
report.

Re: [PATCH 07/10] mbind: add hugepage migration code to mbind()

2013-04-05 Thread KOSAKI Motohiro

>> -if (!new_hpage)
>> +/*
>> + * Getting a new hugepage with alloc_huge_page() (which can happen
>> + * when migration is caused by mbind()) can return ERR_PTR value,
>> + * so we need take care of the case here.
>> + */
>> +if (!new_hpage || IS_ERR_VALUE(new_hpage))
>>  return -ENOMEM;
> 
> Please no. get_new_page returns NULL or a page. You are hooking a wrong
> callback here. The error value doesn't make any sense here. IMO you
> should just wrap alloc_huge_page by something that returns NULL or page.

I suggest just the opposite: new_vma_page() should always return ENOMEM,
ENOSPC, etc. instead of NULL, and the caller should propagate it to userland.
I guess userland wants to distinguish why mbind failed.

Anyway, if new_vma_page() has a chance to return both NULL and -ENOMEM, that's
a bug.
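
The "NULL or page" contract asked for in the quoted review can be sketched in user-space C as a thin wrapper that turns an errno-encoded pointer into NULL. This is only an illustration under stated assumptions: `err_ptr()` and `is_err()` imitate the kernel's ERR_PTR()/IS_ERR() helpers, and `struct page`, `alloc_page_or_err()` and `alloc_page_or_null()` are hypothetical stand-ins, not the mm code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

struct page { int id; };

/* User-space imitation of ERR_PTR(): encode a negative errno as a pointer. */
static void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* User-space imitation of IS_ERR(): true for the top MAX_ERRNO addresses. */
static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static struct page the_page = { 1 };

/* Hypothetical allocator that reports failure as an encoded errno. */
static struct page *alloc_page_or_err(int fail)
{
    return fail ? err_ptr(-ENOMEM) : &the_page;
}

/* Wrapper with a single failure value: callers only ever see NULL or a page. */
static struct page *alloc_page_or_null(int fail)
{
    struct page *p = alloc_page_or_err(fail);

    return is_err(p) ? NULL : p;
}
```

Either convention works as long as a callback sticks to one of them; mixing NULL and encoded errno values in the same return path is the case called a bug above.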


Re: Excessive stall times on ext4 in 3.9-rc2

2013-04-05 Thread Jiri Slaby

On 04/03/2013 12:19 PM, Mel Gorman wrote:
> On Tue, Apr 02, 2013 at 11:14:36AM -0400, Theodore Ts'o wrote:
>> On Tue, Apr 02, 2013 at 11:06:51AM -0400, Theodore Ts'o wrote:
>>>
>>> Can you try 3.9-rc4 or later and see if the problem still persists?
>>> There were a number of ext4 issues especially around low memory
>>> performance which weren't resolved until -rc4.
>>
>> Actually, sorry, I took a closer look and I'm not as sure going to
>> -rc4 is going to help (although we did have some ext4 patches to fix a
>> number of bugs that flowed in as late as -rc4).
>>
> 
> I'm running with -rc5 now. I have not noticed much interactivity problems
> as such but the stall detection script reported that mutt stalled for
> 20 seconds opening an inbox and imapd blocked for 59 seconds doing path
> lookups, imaps blocked again for 12 seconds doing an atime update, an RSS
> reader blocked for 3.5 seconds writing a file. etc.
> 
> There has been no reclaim activity in the system yet and 2G is still free
> so it's very unlikely to be a page or slab reclaim problem.

Ok, so now I'm running 3.9.0-rc5-next-20130404, it's not that bad, but
it still sucks. Updating a kernel in a VM still results in "Your system
is too SLOW to play this!" by mplayer and frame dropping.

3.5G out of 6G memory used, the rest is I/O cache.

I have 7200RPM disks in my desktop.

-- 
js
suse labs

Re: [PATCH 07/10] mbind: add hugepage migration code to mbind()

2013-04-05 Thread KOSAKI Motohiro

> @@ -1277,14 +1279,10 @@ static long do_mbind(unsigned long start, unsigned 
> long len,
>   if (!err) {
>   int nr_failed = 0;
>  
> - if (!list_empty(&pagelist)) {
> - WARN_ON_ONCE(flags & MPOL_MF_LAZY);
> - nr_failed = migrate_pages(&pagelist, new_vma_page,
> + WARN_ON_ONCE(flags & MPOL_MF_LAZY);

???
Does MPOL_MF_LAZY now always trigger the warning? That seems really insane.



Re: [PATCH v3 17/22] x86, ACPI, numa, ia64: split SLIT handling out

2013-04-05 Thread Yinghai Lu

On Fri, Apr 5, 2013 at 2:54 PM, Tony Luck  wrote:
> On Thu, Apr 4, 2013 at 4:46 PM, Yinghai Lu  wrote:
>> It should not break ia64 by replacing acpi_numa_init with
>> acpi_numa_init_srat/acpi_numa_init_slit/acpi_num_arch_fixup.
>
> You are right - it doesn't break ia64.  All my test configs still
> build.  Machines both with and without NUMA still boot and
> nothing strange happens.
>
> Tested-by: Tony Luck 

Great, Thanks a lot for testing them.

Yinghai

Re: [PATCH 3/3] x86: kernel base offset ASLR

2013-04-05 Thread Julien Tinnes

On Fri, Apr 5, 2013 at 3:08 PM, H. Peter Anvin  wrote:
> On 04/05/2013 03:06 PM, Julien Tinnes wrote:
>>
>> Speaking of IDT, and to capture some off-thread discussion here, we
>> should remember that the "SGDT" and "SIDT" instructions aren't
>> privileged on x86, so user-land can leak these out without any way for
>> the kernel to intercept that.
>>
>> Adding their own random offsets to these two tables would be very
>> useful. This could be done in a later patchset of course.
>>
>
> Yes, if the GDT or IDT position is at all correlated to the kernel
> position this is pointless.

Let's say it's less useful :) Remote attacks and from-inside-a-VM
attacks would still be mitigated.

Julien

Re: [PATCH 3/3] x86: kernel base offset ASLR

2013-04-05 Thread H. Peter Anvin

On 04/05/2013 03:06 PM, Julien Tinnes wrote:
> 
> Speaking of IDT, and to capture some off-thread discussion here, we
> should remember that the "SGDT" and "SIDT" instructions aren't
> privileged on x86, so user-land can leak these out without any way for
> the kernel to intercept that.
> 
> Adding their own random offsets to these two tables would be very
> useful. This could be done in a later patchset of course.
> 

Yes, if the GDT or IDT position is at all correlated to the kernel
position this is pointless.

-hpa



Re: [PATCH 3/3] x86: kernel base offset ASLR

2013-04-05 Thread Julien Tinnes

On Fri, Apr 5, 2013 at 12:11 AM, Ingo Molnar  wrote:
>
> * Kees Cook  wrote:
>
>> This creates CONFIG_RANDOMIZE_BASE, so that the base offset of the kernel
>> can be randomized at boot.
>>
>> This makes kernel vulnerabilities harder to reliably exploit, especially
>> from remote attacks and local processes in seccomp containers. Keeping
>> the location of kernel addresses secret becomes very important when using
>> this feature, so enabling kptr_restrict and dmesg_restrict is recommended.
>> Besides direct address leaks, several other attacks are possible to bypass
>> this on local systems, including cache timing[1]. However, the benefits of
>> this feature in certain environments exceed the perceived weaknesses[2].
>>
>> An added security benefit is making the IDT read-only.
>>
>> Current entropy is low, since the kernel has basically a minimum 2MB
>> alignment and has been built with -2G memory addressing. As a result,
>> available entropy will be 8 bits in the best case. The e820 entries on
>> a given system may further limit the available memory.
>>
>> This feature is presently incompatible with hibernation.
>>
>> When built into the kernel, the "noaslr" kernel command line option will
>> disable the feature.
>>
>> Heavily based on work by Dan Rosenberg[3] and Neill Clift.
>>
>> [1] 
>> http://www.internetsociety.org/sites/default/files/Practical%20Timing%20Side%20Channel%20Attacks%20Against%20Kernel%20Space%20ASLR.pdf
>> [2] http://forums.grsecurity.net/viewtopic.php?f=7&t=3367
>> [3] http://lkml.indiana.edu/hypermail/linux/kernel/1105.3/index.html#00520
>>
>> Signed-off-by: Kees Cook 
>> Cc: Eric Northup 
>> ---
>>  Documentation/kernel-parameters.txt  |4 +
>>  arch/x86/Kconfig |   51 +++--
>>  arch/x86/Makefile|3 +
>>  arch/x86/boot/compressed/head_32.S   |   21 +-
>>  arch/x86/boot/compressed/head_64.S   |  135 
>> --
>>  arch/x86/include/asm/fixmap.h|4 +
>>  arch/x86/include/asm/page_32_types.h |2 +
>>  arch/x86/include/asm/page_64_types.h |4 -
>>  arch/x86/include/asm/page_types.h|4 +
>>  arch/x86/kernel/asm-offsets.c|   14 
>>  arch/x86/kernel/setup.c  |   24 ++
>>  arch/x86/kernel/traps.c  |6 ++
>>  12 files changed, 251 insertions(+), 21 deletions(-)
>
> Before going into the details, I have a structural request: could you
> please further increase the granularity of the patch-set?
>
> In particular I'd suggest introducing a helper Kconfig bool that makes the
> IDT readonly - instead of using CONFIG_RANDOMIZE_BASE for that.
> CONFIG_RANDOMIZE_BASE can then select this helper Kconfig switch.
>
> Users could also select a readonly-IDT - even if they don't want a
> randomized kernel.

Speaking of IDT, and to capture some off-thread discussion here, we
should remember that the "SGDT" and "SIDT" instructions aren't
privileged on x86, so user-land can leak these out without any way for
the kernel to intercept that.

Adding their own random offsets to these two tables would be very
useful. This could be done in a later patchset of course.

Julien
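
As a rough check on the "8 bits in the best case" figure quoted above, the entropy of a base offset is the base-2 logarithm of the number of aligned slots the kernel can land in. The sketch below is plain C arithmetic, not kernel code. The 512 MiB window is an assumption chosen because it reproduces the quoted 8 bits with 2 MiB alignment; the thread itself does not state the window size.

```c
#include <stdint.h>

/*
 * Number of whole bits of entropy when a base address is picked among
 * window_bytes / align_bytes equally likely aligned slots (floor of log2).
 */
static unsigned int entropy_bits(uint64_t window_bytes, uint64_t align_bytes)
{
    uint64_t slots = window_bytes / align_bytes;
    unsigned int bits = 0;

    while (slots > 1) {     /* count how many times slots can be halved */
        slots >>= 1;
        bits++;
    }
    return bits;
}
```

With these helper semantics, `entropy_bits(512ULL << 20, 2ULL << 20)` gives 8, while a full 2 GiB window at the same alignment would give 10, so the usable window has to be smaller than the whole 2 GiB range for the quoted figure to hold. Memory holes reported by e820 can only reduce the slot count further.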

Re: [PATCH] staging/adt7316 Fix some 'interesting' string operations

2013-04-05 Thread Greg Kroah-Hartman

On Thu, Apr 04, 2013 at 02:37:24PM -0700, Luck, Tony wrote:
> Calling memcmp() to check the value of the first byte in a string is overkill.
> Just use buf[0] == '1' or buf[0] != '1' as appropriate.
> 
> Signed-off-by: Tony Luck 

I'll let Jonathan take this through his tree which eventually makes it
to mine.

thanks,

greg k-h
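
The change described in the quoted patch can be illustrated with two small user-space helpers. This is a sketch, assuming the original code compared a single byte with memcmp(); the function names are hypothetical and not taken from the adt7316 driver.

```c
#include <string.h>

/* One-byte comparison routed through memcmp(), the style being removed. */
static int is_one_memcmp(const char *buf)
{
    return memcmp(buf, "1", 1) == 0;
}

/* Direct test of the first character, as the patch suggests. */
static int is_one_direct(const char *buf)
{
    return buf[0] == '1';
}
```

Both helpers give the same answer for any string, including an empty one, since only the first byte is ever examined.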

Re: [PATCH] Staging: Android: looping issue, need break when get value firstly.

2013-04-05 Thread Greg KH

On Fri, Apr 05, 2013 at 04:05:25PM +0800, Chen Gang wrote:
> 
>   need break when 'target_thread' get value, firstly.
> 
> 'tmp' is a stack (thread->transaction_stack),
> if 'proc' was the same between child node and parent node,
> the child would have higher priority than parent.

Are you sure about this?

have you tested it?

greg k-h
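
The question of whether a break changes the result can be made concrete with a small user-space model of a transaction stack. This is a sketch only: `struct txn` and both lookup functions are hypothetical and simplify whatever the binder code actually stores per stack entry.

```c
#include <stddef.h>

struct txn {
    int proc;            /* process that owns this stack entry */
    int thread;          /* thread recorded for that process */
    struct txn *parent;  /* next entry further down the stack */
};

/* Walk the whole stack: the LAST match (nearest the bottom) wins. */
static int find_thread_no_break(const struct txn *top, int proc)
{
    int found = -1;

    for (const struct txn *t = top; t != NULL; t = t->parent)
        if (t->proc == proc)
            found = t->thread;
    return found;
}

/* Stop at the FIRST match (nearest the top of the stack). */
static int find_thread_break(const struct txn *top, int proc)
{
    for (const struct txn *t = top; t != NULL; t = t->parent)
        if (t->proc == proc)
            return t->thread;
    return -1;
}
```

The two lookups agree whenever every entry for a given process records the same thread, and they differ only if a child and a parent entry of the same process name different threads. Whether that second case can occur is exactly what testing would need to show.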

Re: [RFC] drivers/staging/silicom/bp_proc.c removal

2013-04-05 Thread Greg Kroah-Hartman

On Fri, Apr 05, 2013 at 06:48:02PM +0100, Al Viro wrote:
> On Fri, Apr 05, 2013 at 10:31:56AM -0700, Puff . wrote:
> > No reason not to.
> 
> Done (in vfs.git#for-next, should propagate in a few)

Thanks for doing that, you beat me to it.

greg k-h

Re: [PATCH 0/8] Android Binder IPC Fixes

2013-04-05 Thread Greg KH

On Thu, Apr 04, 2013 at 01:32:30PM +0100, Serban Constantinescu wrote:
> Hi all,
> 
> This set of patches will clean-up and fix some of the issues that arise
> with the current binder interface when moving to a 64bit kernel. All these
> changes will not affect the existing 32bit Android interface and are meant
> to stand as the base for the 64bit binder compat layer.
> 
> This patch set has been successfully tested on 32bit platforms(ARMv7 VExpress)
> and 64bit platforms(ARMv8 RTSM) running a 32bit Android userspace and an in
> kernel binder compat layer.

I need some acks from some of the Android developers before I can take
this.

thanks,

greg k-h

Re: [PATCH v2] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Jacob Pan

On Fri, 05 Apr 2013 14:26:35 -0700
Joe Perches  wrote:

> > +/* in the order of enum rapl_primitives */
> > +static struct rapl_primitive_info rpi[] = {  
> 
> const?
I do need to override one entry for a special case. The hardware uses a
different bit location for the same lock functionality.

The other comments are well taken,

-- 
Thanks,

Jacob

Re: [PATCH v3 17/22] x86, ACPI, numa, ia64: split SLIT handling out

2013-04-05 Thread Tony Luck

On Thu, Apr 4, 2013 at 4:46 PM, Yinghai Lu  wrote:
> It should not break ia64 by replacing acpi_numa_init with
> acpi_numa_init_srat/acpi_numa_init_slit/acpi_num_arch_fixup.

You are right - it doesn't break ia64.  All my test configs still
build.  Machines both with and without NUMA still boot and
nothing strange happens.

Tested-by: Tony Luck 

Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Greg KH

On Fri, Apr 05, 2013 at 02:33:40PM -0700, Jacob Pan wrote:
> On Fri, 5 Apr 2013 13:23:09 -0700
> Greg KH  wrote:
> 
> > On Wed, Apr 03, 2013 at 10:35:51AM -0700, Jacob Pan wrote:
> > > On Wed, 3 Apr 2013 09:35:09 -0700
> > > Greg KH  wrote:
> > > 
> > > > On Tue, Apr 02, 2013 at 09:48:18PM -0700, Jacob Pan wrote:
> > > > > > Let's step back and start over, what exactly are you trying to
> > > > > > tell userspace?  What data do you have that you need to
> > > > > > express to it?  How do you want userspace to see/use it?
> > > > > 
> > > > > It is a good idea to step back and let me explain what I wanted
> > > > > to do here for userspace.
> > > > > 
> > > > > I have two kinds of applications that might use this driver.
> > > > > 1. simple use case where user sets a power limit for a RAPL
> > > > > domain. e.g. set graphics unit power limit to 7w
> > > > > > 2. advanced use case where the user can do fine tuning on top of
> > > > > > simple power limit, e.g. the dynamic response parameters of
> > > > > power control logic, event notifications, etc.
> > > > > 
> > > > > > For #1, this driver registers with the abstract generic thermal
> > > > > layer (/sys/class/thermal) and presents itself as a set of
> > > > > cooling devices with a single knob per domain for power limits.
> > > > > root@chromoly:/sys/class/thermal/cooling_device15# echo 7000 >
> > > > > cur_state 
> > > > 
> > > > Great, how about submitting that functionality as patch 1 of your
> > > > series?  That seems like a very "normal" thermal driver, right?
> > > > 
> > > yes, that would be a normal thermal cooling device driver. I will do
> > > that first. Thanks for the suggestion.
> > 
> > Do that first, get it merged, and then let's work on the second part.
> > The patch for that will be much more obvious as to what you are
> > attempting to do.
> > 
> Sorry I was too busy to work on v2 before seeing this. I agree I need
> to simplify the interface, I just need to come up with a more
> intelligent way to abstract that and do the best guesses for the user.
> Hopefully, v2 will serve as a confirmation on the comments I got from
> v1. i.e. kobject->struct device, removed dependencies on sysfs internal
> data struct, etc.

I'm not going to review that part of the code, sorry, as it's about to
be ripped out anyway :)

> > > >  Perhaps the thermal interface could be expanded to provide
> > > > more functionality that you need?
> > > yes, some of them such as limits. But not all the data in the list
> > > above are suitable for thermal interface. That is why I am trying to
> > > balance between abstracted generic data and RAPL specific data while
> > > still allow linking between the two.
> > 
> > What is not in the existing interface?  And as this is a thermal
> > device, why can't you add them there?
> existing interface has only cur_state and max_state, I have been
> working with Rui (thermal maintainer) on adding more knobs for cooling
> devices, such as limit low, limit high, event control. I believe they
> can all be added.

Great, then you will not need any of the driver-specific sysfs files,
struct device usage, or kobject mess, so your code should be a lot
smaller.

greg k-h

kernel BUG at kernel/smpboot.c:134!

2013-04-05 Thread Dave Hansen

Hey Thomas,

I seem to be running in to smpboot_thread_fn()'s

BUG_ON(td->cpu != smp_processor_id());

pretty regularly, both at boot and if I boot with maxcpus=x and then
online the CPUs from sysfs after boot.  It's a 160-logical-cpu system,
so it's quite a beast.  I _seem_ to be hitting it more often at higher
cpu counts, but it doesn't trigger on bringing up a particular CPU as
far as I can tell.

This is on a pull of mainline from today, e0a77f263.  Any ideas?

> [  790.223270] [ cut here ]
> [  790.223966] kernel BUG at kernel/smpboot.c:134!
> [  790.224739] invalid opcode:  [#1] SMP 
> [  790.225671] Modules linked in:
> [  790.226428] CPU 81 
> [  790.226909] Pid: 3909, comm: migration/135 Tainted: GW
> 3.9.0-rc5-00184-gb6a9b7f-dirty #118 FUJITSU-SV PRIMEQUEST 1800E2/SB
> [  790.228775] RIP: 0010:[]  [] 
> smpboot_thread_fn+0x258/0x280
> [  790.230205] RSP: 0018:88bfef9c1e08  EFLAGS: 00010202
> [  790.231090] RAX: 0051 RBX: 88bfefb82000 RCX: 
> b888
> [  790.231653] RDX: 88bfef9c1fd8 RSI: 881fff00 RDI: 
> 0087
> [  790.232085] RBP: 88bfef9c1e38 R08: 0001 R09: 
> 
> [  790.232850] R10: 0018 R11:  R12: 
> 88bfec9e22e0
> [  790.233561] R13: 81e587a0 R14: 88bfec9e22e0 R15: 
> 
> [  790.234004] FS:  () GS:881fff00() 
> knlGS:
> [  790.234918] CS:  0010 DS:  ES:  CR0: 8005003b
> [  790.235602] CR2: 7fa89a333c62 CR3: 01e0b000 CR4: 
> 07e0
> [  790.236110] DR0:  DR1:  DR2: 
> 
> [  790.236584] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  790.237329] Process migration/135 (pid: 3909, threadinfo 88bfef9c, 
> task 88bfec9e22e0)
> [  790.238321] Stack:
> [  790.238882]  88bfef9c1e38  88ffef421cc0 
> 88bfef9c1ec0
> [  790.245415]  88bfefb82000 8110bc90 88bfef9c1f48 
> 810ff1df
> [  790.250755]  0001 0087 88bfefb82000 
> 
> [  790.253365] Call Trace:
> [  790.254121]  [] ? __smpboot_create_thread+0x180/0x180
> [  790.255428]  [] kthread+0xef/0x100
> [  790.256071]  [] ? wait_for_completion+0x124/0x180
> [  790.256697]  [] ? __init_kthread_worker+0x80/0x80
> [  790.257325]  [] ret_from_fork+0x7c/0xb0
> [  790.258233]  [] ? __init_kthread_worker+0x80/0x80
> [  790.258942] Code: ef 3d 01 01 48 89 df e8 87 b0 16 00 48 83 05 67 ef 3d 01 
> 01 48 83 c4 10 31 c0 5b 41 5c 41 5d 41 5e 5d c3 48 83 05 90 ef 3d 01 01 <0f> 
> 0b 48 83 05 96 ef 3d 01 01 48 83 05 56 ef 3d 01 01 0f 0b 48 
> [  790.276178] RIP  [] smpboot_thread_fn+0x258/0x280
> [  790.276735]  RSP 
> [  790.278348] ---[ end trace 84baa2bee1434240 ]---



Re: libata: how to duplicate the exact xfer_mask?

2013-04-05 Thread Chris Frey

On Fri, Apr 05, 2013 at 01:07:25AM -0400, Chris Frey wrote:
> I'd like to duplicate these settings, so that it does not timeout, but
> when I use:
> 
>   libata.force=ata1:udma/44,ata1:pio4

I figured out what I was doing wrong.  It should be:

libata.force=1:udma/44,1:pio4

- Chris


Re: [PATCH v2] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Greg Kroah-Hartman

On Fri, Apr 05, 2013 at 02:26:35PM -0700, Joe Perches wrote:
> On Fri, 2013-04-05 at 14:02 -0700, Jacob Pan wrote:
> > +static ssize_t store_event_control(struct device *dev,
> > +   struct device_attribute *attr,
> > +   const char *buf,
> > +   size_t size)
> > +{
> > +   struct rapl_domain *rd = dev_get_drvdata(dev);
> > +   unsigned int efd, new_threshold;
> > +   struct file *efile = NULL;
> > +   int ret = 0;
> > +   int prim;
> > +   struct rapl_event *ep;
> > +   u64 val;
> > +   char cmd[MAX_PRIM_NAME];
> > +
> > +   if (sscanf(buf, "%u %s %u", &efd, cmd, &new_threshold) != 3)
> > +   return -EINVAL;
> 
> This sscanf looks fragile.
> 
> buf = "1 some_really_long_name_longer_than_MAX_PRIM_NAME 2"
> 
> stack overrun.
> 
> Where does buf come from?

It comes from the sysfs core, which limits it to a PAGE_SIZE.  But yes,
it does look fragile, and flat out wrong, but I'm not going into that
just yet, as that whole api should just be deleted for now.

greg k-h
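
The overrun pointed out above comes from the unbounded %s conversion. One common way to bound it, shown here as a user-space sketch and not as the driver's code, is to give %s a field width one less than the buffer size. `PRIM_NAME_WIDTH` and `parse_event_control()` are hypothetical; the real value of MAX_PRIM_NAME is not given in the thread.

```c
#include <errno.h>
#include <stdio.h>

#define PRIM_NAME_WIDTH 31          /* hypothetical: buffer size minus one */
#define STR_(x) #x
#define STR(x) STR_(x)              /* turns 31 into the string "31" */

/*
 * Parse "<efd> <name> <threshold>". The width on %s stops sscanf() from
 * writing more than PRIM_NAME_WIDTH characters plus the terminating NUL.
 */
static int parse_event_control(const char *buf, unsigned int *efd,
                               char cmd[PRIM_NAME_WIDTH + 1],
                               unsigned int *threshold)
{
    if (sscanf(buf, "%u %" STR(PRIM_NAME_WIDTH) "s %u",
               efd, cmd, threshold) != 3)
        return -EINVAL;
    return 0;
}
```

An over-long name is cut at the width limit, the leftover characters then fail the final %u conversion, and the call returns -EINVAL instead of writing past the end of `cmd`.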


Re: [PATCH 1/1] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Jacob Pan

On Fri, 5 Apr 2013 13:23:09 -0700
Greg KH  wrote:

> On Wed, Apr 03, 2013 at 10:35:51AM -0700, Jacob Pan wrote:
> > On Wed, 3 Apr 2013 09:35:09 -0700
> > Greg KH  wrote:
> > 
> > > On Tue, Apr 02, 2013 at 09:48:18PM -0700, Jacob Pan wrote:
> > > > > Let's step back and start over, what exactly are you trying to
> > > > > tell userspace?  What data do you have that you need to
> > > > > express to it?  How do you want userspace to see/use it?
> > > > 
> > > > It is a good idea to step back and let me explain what I wanted
> > > > to do here for userspace.
> > > > 
> > > > I have two kinds of applications that might use this driver.
> > > > 1. simple use case where user sets a power limit for a RAPL
> > > > domain. e.g. set graphics unit power limit to 7w
> > > > 2. advanced use case where the user can do fine tuning on top of
> > > > simple power limit, e.g. the dynamic response parameters of
> > > > power control logic, event notifications, etc.
> > > > 
> > > > For #1, this driver registers with the abstract generic thermal
> > > > layer (/sys/class/thermal) and presents itself as a set of
> > > > cooling devices with a single knob per domain for power limits.
> > > > root@chromoly:/sys/class/thermal/cooling_device15# echo 7000 >
> > > > cur_state 
> > > 
> > > Great, how about submitting that functionality as patch 1 of your
> > > series?  That seems like a very "normal" thermal driver, right?
> > > 
> > yes, that would be a normal thermal cooling device driver. I will do
> > that first. Thanks for the suggestion.
> 
> Do that first, get it merged, and then let's work on the second part.
> The patch for that will be much more obvious as to what you are
> attempting to do.
> 
Sorry I was too busy to work on v2 before seeing this. I agree I need
to simplify the interface, I just need to come up with a more
intelligent way to abstract that and do the best guesses for the user.
Hopefully, v2 will serve as a confirmation on the comments I got from
v1. i.e. kobject->struct device, removed dependencies on sysfs internal
data struct, etc.

> > >  Perhaps the thermal interface could be expanded to provide
> > > more functionality that you need?
> > yes, some of them such as limits. But not all the data in the list
> > above are suitable for thermal interface. That is why I am trying to
> > balance between abstracted generic data and RAPL specific data while
> > still allow linking between the two.
> 
> What is not in the existing interface?  And as this is a thermal
> device, why can't you add them there?
existing interface has only cur_state and max_state, I have been
working with Rui (thermal maintainer) on adding more knobs for cooling
devices, such as limit low, limit high, event control. I believe they
can all be added.

What is not appropriate for the thermal interface are things like
energy counters, accumulated throttle time, and time constants that are
used by the internal control algorithm.
> 
> > The way I envisioned how a thermal/power management app would use
> > is: 1. go through generic thermal layer sysfs and find available
> > RAPL domains
> > 2. if the app wants to do more fine grained control, it follows the
> > device symlink to locate the RAPL domain specific sysfs area.
> 
> So any application will have to know all of the device-specific
> attributes?  That totally defeats the purpose of a generic api that
> the kernel is providing.  You are creating device-specific apis that
> will not work over the long-run (i.e. next 5-10 years.)  Please don't
> do that unless you have exhausted _all_ other alternatives.
> 
I agree we should improve the abstraction to avoid using device
specific attributes. Unfortunately, on the other side power tuning tends
to be very platform specific.
> So, get your first driver accepted, using the in-kernel thermal api,
> and then, if you still feel you wish to do device-specific
> extensions, we can discuss that then.
> 
Agreed. I just want to make sure the rapl class and per domain 'struct
device' layout in v2 is what you suggested. I avoided using raw kobjects
and sysfs internals.

The challenge for the simplified/abstract model is that the driver
would have to guess what is best for the user.
e.g. when user selects power limit of 8200mw for the core power plane.
Besides setting that limit, the driver will decide the following for
the user:
 - dynamics of the control, rise time, overshoot, steady state error
 - whether or not allow P/T states to go below OS request value
 - correlation with instantaneous power limit

I believe it is all doable; I just need to strike a balance that works
across different platforms. But I do believe it is a better interface for
most generic applications.

> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe
> platform-driver-x86" in the body of a message to
> majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html

[Jacob Pan]

-- 
Thanks,

Jacob

Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Michael R. Hines

Well, I have the "is_dup_page()" commented out...when RDMA is 
activated.


Is there something else in QEMU that could be touching the page that I 
don't know about?


- Michael


On 04/05/2013 05:03 PM, Roland Dreier wrote:

> On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
>  wrote:
> > Sorry, I was wrong. ignore the comments about cgroups. That's still broken.
> > (i.e. trying to register RDMA memory while using a cgroup swap limit cause
> > the process get killed).
> >
> > But the GIFT flag patch works (my understanding is that GIFT flag allows the
> > adapter to transmit stale memory information, it does not have anything to
> > do with cgroups specifically).
>
> The point of the GIFT patch is to avoid triggering copy-on-write so
> that memory doesn't blow up during migration.  If that doesn't work
> then there's no point to the patch.
>
>   - R.




Re: [PATCH v2] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Joe Perches

On Fri, 2013-04-05 at 14:02 -0700, Jacob Pan wrote:
> RAPL(Running Average Power Limit) interface provides platform software
> with the ability to monitor, control, and get notifications on SOC
> power consumptions. Since its first appearance on Sandy Bridge, more
> features have been added to extend its usage. In RAPL, platforms are
> divided into domains for fine grained control. These domains include
> package, DRAM controller, CPU core (Power Plane 0), graphics uncore
> (power plane 1), etc.

Some more trivia...

> diff --git a/drivers/platform/x86/intel_rapl.c b/drivers/platform/x86/intel_rapl.c
[]
> +/* in the order of enum rapl_primitives */
> +static struct rapl_primitive_info rpi[] = {

const?

> + /* name, mask, shift, msr index, unit divisor*/
> + PRIMITIVE_INFO_INIT(energy, ENERGY_STATUS_MASK, 0,
> + RAPL_DOMAIN_MSR_STATUS, ENERGY_UNIT,
> + RAPL_PRIMITIVE_EVENT_CAP),

> +static int rapl_set_cur_state(struct thermal_cooling_device *cdev,
> + unsigned long state)

I think most of this would look nicer if you
adopted the net style of aligning multi-line
statements to the appropriate open parenthesis.

> +static ssize_t store_event_control(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf,
> + size_t size)
> +{
> + struct rapl_domain *rd = dev_get_drvdata(dev);
> + unsigned int efd, new_threshold;
> + struct file *efile = NULL;
> + int ret = 0;
> + int prim;
> + struct rapl_event *ep;
> + u64 val;
> + char cmd[MAX_PRIM_NAME];
> +
> + if (sscanf(buf, "%u %s %u", &efd, cmd, &new_threshold) != 3)
> + return -EINVAL;

This sscanf looks fragile.

buf = "1 some_really_long_name_longer_than_MAX_PRIM_NAME 2"

stack overrun.

Where does buf come from?

> +#define primitive_show_fn(n) \
> +
> +#define primitive_store_fn(n) \

Can't both of these be consolidated into 2 functions using
offsetof and/or adding a string argument?
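
A rough userspace sketch of what that consolidation could look like: one generic helper selects the field by byte offset, so no function has to be generated per primitive. The struct layout, field names and show_field() are hypothetical and only illustrate the offsetof technique; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-domain data; field names are illustrative. */
struct domain_data {
    unsigned long long energy;
    unsigned long long power_limit;
};

/* One generic "show" helper: the field is selected by its byte offset. */
static int show_field(const struct domain_data *d, size_t offset,
                      char *buf, size_t len)
{
    const unsigned long long *field;

    assert(d && buf && offset + sizeof(*field) <= sizeof(*d));
    field = (const unsigned long long *)((const char *)d + offset);
    return snprintf(buf, len, "%llu\n", *field);
}
```

A caller would pass offsetof(struct domain_data, power_limit) (or another member) as the offset; a table of name/offset pairs could then replace the two macro families.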

> +static struct attribute *all_attrs[] = {

const?

> + _attr_energy.attr,


> +static void rapl_update_domain_data(void)
> +{
> + int i, j;
> + u64 val;
> + bool xlate;
> +
> + for (i = 0; i < rg_data.nr_domains; i++) {
> + /* exclude non-raw primitives */
> + for (j = 0; j < NR_RAW_PRIMITIVES; j++)
> + xlate = !!(rpi[j].unit);

You don't really need the !!.  The compiler does that.
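
A small sketch of why the !! is redundant here: in C99 and later, converting any scalar to bool (_Bool) yields 0 or 1, so the implicit conversion on assignment already normalises the value. This only holds when the destination really is bool, not an integer typedef. has_unit() is a hypothetical name used for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Conversion to bool already maps any non-zero value to true (1). */
static bool has_unit(unsigned int unit)
{
    bool xlate = unit;      /* same result as !!(unit) */

    assert(xlate == 0 || xlate == 1);
    return xlate;
}
```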

> +/* for global rapl data */
> +static struct class_attribute rapl_class_attrs[] = {

const?

> + GLOBAL_CLASS_RO_ATTR(energy_unit_divisor),



Re: [PATCH v6 4/4] xen/arm: introduce xen_early_init, use PSCI on xen

2013-04-05 Thread Nicolas Pitre

On Fri, 5 Apr 2013, Rob Herring wrote:

> On 04/05/2013 02:36 PM, Nicolas Pitre wrote:
> > On Fri, 5 Apr 2013, Stefano Stabellini wrote:
> > 
> >> This is what happens:
> >>
> >> - No Xen
> >> Xen is not running on the platform and a Xen hypervisor node is not
> >> available on device tree.
> >> Everything keeps working seamlessly, this patch doesn't change anything.
> >>
> >> - we are running on Xen
> >> Xen is running on the platform, we are running as a guest on Xen and an
> >> hypervisor node is available on device tree.
> >> Let's also assume that there aren't any "arm,cci" compatible nodes on
> >> device tree because Xen wouldn't export this kind of information to any
> >> guests right now. Therefore PSCI should be used to boot secondary cpus.
> >> Because the versatile express machine sets smp_init to
> >> vexpress_smp_init_ops, vexpress_smp_init_ops will be called.
> >> vexpress_smp_init_ops sets smp_ops to vexpress_smp_ops, that *break*
> >> Xen.
> > 
> > OK I see.
> > 
> >> With this patch, xen_smp_init will be called instead of
> >> vexpress_smp_init_ops, and smp_ops will be set to psci_smp_ops,
> >> therefore *unbreaking* Xen.
> > 
> > However that breaks MCPM.
> 
> You mean on bare metal, right? For the bare metal, "xen,xen" property
> would not be present and xen_smp_init is not used. So the vexpress MCPM
> ops will be used. Aren't Dom0 cpu's basically virtual cpus? If Xen ever
> needs the MCPM support, the Xen hook itself can figure out whether to
> use MCPM support.

Well, if Xen has its own mdesc distinct from the VExpress one then things
are indeed fine.


Nicolas

Re: bcache: BUG in fuzz testing without devices

2013-04-05 Thread Kent Overstreet

On Tue, Apr 02, 2013 at 12:21:02PM -0400, Sasha Levin wrote:
> Hi all,
> 
> It seems that trying to fuzz bcache without any devices triggers a BUG:
> 
> That BUG looks very intentional there, any reason for adding it instead of
> exiting if there aren't any devices?

The fuzz tester is meant to stress test some of the btree code - it's a
purely in memory test. But it looks like it's rotted - thanks for the
bug report, I'll disable it for now.

Re: [Linux-zigbee-devel] [PATCH] mac802154: Keep track of the channel when changed

2013-04-05 Thread Alan Ott

On 04/05/2013 05:05 PM, Werner Almesberger wrote:
> Alan Ott wrote:
>> Prevent set_channel() from getting called every time a packet is sent. This
>> looks like it was an oversight.
> at86rf230.c and derivatives avoid this problem by setting
> phy->current_* in the *_channel function.
>
> But I'd agree that it's nicer to do this in one place, not in
> every driver.
>
> In case a driver had a weird failure mode in which it leaves the
> original channel but only makes it halfway to the new channel, it
> could still set phy->current_* and return an error. So there's no
> loss of functionality with your change.

Hmm... I just noticed that mib.c does the same thing (and doesn't set
phy->current_*). I'll need to fix that one too (and resubmit). :(

Alan.




Re: [PATCH 05/10] migrate: add hugepage migration code to migrate_pages()

2013-04-05 Thread KOSAKI Motohiro

(3/22/13 4:23 PM), Naoya Horiguchi wrote:
> This patch extends check_range() to handle vma with VM_HUGETLB set.
> We will be able to migrate hugepage with migrate_pages(2) after
> applying the enablement patch which comes later in this series.
> 
> Note that for larger hugepages (covered by pud entries, 1GB for
> x86_64 for example), we simply skip it now.

check_range() largely duplicates mm_walk and is a quirky subset of it.
Instead, could you replace it with mm_walk and enhance/clean up mm_walk?



[PATCH] staging: dgrp: implement error handling in dgrp_create_class_sysfs_files()

2013-04-05 Thread Alexey Khoroshilov

There is no error handling in dgrp_create_class_sysfs_files().
The patch adds code to check return values and propagate them to
dgrp_init_module().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov 
---
 drivers/staging/dgrp/dgrp_common.h |2 +-
 drivers/staging/dgrp/dgrp_driver.c |6 +-
 drivers/staging/dgrp/dgrp_sysfs.c  |   30 +-
 3 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/dgrp/dgrp_common.h b/drivers/staging/dgrp/dgrp_common.h
index 0583fe9..2832b8e 100644
--- a/drivers/staging/dgrp/dgrp_common.h
+++ b/drivers/staging/dgrp/dgrp_common.h
@@ -66,7 +66,7 @@ extern void dgrp_register_dpa_hook(struct proc_dir_entry *de);
 extern void dgrp_dpa_data(struct nd_struct *, int, u8 *, int);
 
 /* from dgrp_sysfs.c */
-extern void dgrp_create_class_sysfs_files(void);
+extern int dgrp_create_class_sysfs_files(void);
 extern void dgrp_remove_class_sysfs_files(void);
 
 extern void dgrp_create_node_class_sysfs_files(struct nd_struct *nd);
diff --git a/drivers/staging/dgrp/dgrp_driver.c b/drivers/staging/dgrp/dgrp_driver.c
index aa26258..e456dc6c 100644
--- a/drivers/staging/dgrp/dgrp_driver.c
+++ b/drivers/staging/dgrp/dgrp_driver.c
@@ -66,6 +66,8 @@ module_exit(dgrp_cleanup_module);
  */
 static int dgrp_init_module(void)
 {
+   int ret;
+
INIT_LIST_HEAD(_struct_list);
 
spin_lock_init(&dgrp_poll_data.poll_lock);
@@ -74,7 +76,9 @@ static int dgrp_init_module(void)
dgrp_poll_data.timer.function = dgrp_poll_handler;
dgrp_poll_data.timer.data = (unsigned long) &dgrp_poll_data;
 
-   dgrp_create_class_sysfs_files();
+   ret = dgrp_create_class_sysfs_files();
+   if (ret)
+   return ret;
 
dgrp_register_proc();
 
diff --git a/drivers/staging/dgrp/dgrp_sysfs.c b/drivers/staging/dgrp/dgrp_sysfs.c
index be179ad..7d1b36d 100644
--- a/drivers/staging/dgrp/dgrp_sysfs.c
+++ b/drivers/staging/dgrp/dgrp_sysfs.c
@@ -85,30 +85,50 @@ static struct attribute_group dgrp_global_settings_attribute_group = {
 
 
 
-void dgrp_create_class_sysfs_files(void)
+int dgrp_create_class_sysfs_files(void)
 {
int ret = 0;
int max_majors = 1U << (32 - MINORBITS);
 
dgrp_class = class_create(THIS_MODULE, "digi_realport");
+   if (IS_ERR(dgrp_class))
+   return PTR_ERR(dgrp_class);
ret = class_create_file(dgrp_class, _attr_driver_version);
+   if (ret)
+   goto err_class;
 
dgrp_class_global_settings_dev = device_create(dgrp_class, NULL,
MKDEV(0, max_majors + 1), NULL, "driver_settings");
-
+   if (IS_ERR(dgrp_class_global_settings_dev)) {
+   ret = PTR_ERR(dgrp_class_global_settings_dev);
+   goto err_file;
+   }
ret = sysfs_create_group(&dgrp_class_global_settings_dev->kobj,
&dgrp_global_settings_attribute_group);
if (ret) {
pr_alert("%s: failed to create sysfs global settings device attributes.\n",
__func__);
-   sysfs_remove_group(&dgrp_class_global_settings_dev->kobj,
-   &dgrp_global_settings_attribute_group);
-   return;
+   goto err_dev1;
}
 
dgrp_class_nodes_dev = device_create(dgrp_class, NULL,
MKDEV(0, max_majors + 2), NULL, "nodes");
+   if (IS_ERR(dgrp_class_nodes_dev)) {
+   ret = PTR_ERR(dgrp_class_nodes_dev);
+   goto err_group;
+   }
 
+   return 0;
+err_group:
+   sysfs_remove_group(&dgrp_class_global_settings_dev->kobj,
+   &dgrp_global_settings_attribute_group);
+err_dev1:
+   device_destroy(dgrp_class, MKDEV(0, max_majors + 1));
+err_file:
+   class_remove_file(dgrp_class, _attr_driver_version);
+err_class:
+   class_destroy(dgrp_class);
+   return ret;
 }
 
 
-- 
1.7.9.5
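
The unwinding structure used in the patch above (labels that undo earlier steps in reverse order) can be sketched in plain userspace C. This is only an illustration of the pattern: struct ctx, setup() and the fail_at switch are hypothetical, and malloc()/free() stand in for the class, file and device creation calls.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical resources standing in for the class, file and device. */
struct ctx {
    void *a, *b, *c;
};

/* fail_at selects which step (1..3) fails, 0 means none; illustrative only. */
static int setup(struct ctx *x, int fail_at)
{
    int ret = -1;

    assert(x);
    x->a = x->b = x->c = NULL;

    x->a = (fail_at == 1) ? NULL : malloc(16);
    if (!x->a)
        return ret;             /* nothing to undo yet */

    x->b = (fail_at == 2) ? NULL : malloc(16);
    if (!x->b)
        goto err_a;

    x->c = (fail_at == 3) ? NULL : malloc(16);
    if (!x->c)
        goto err_b;

    return 0;

err_b:                          /* undo in reverse order of setup */
    free(x->b);
    x->b = NULL;
err_a:
    free(x->a);
    x->a = NULL;
    return ret;
}
```

Each label releases exactly what was acquired before the failing step, so a failure at any point leaves nothing behind and the caller only has to check the return value.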


Re: [PATCH v2 2/9] arm: mvebu: Align the internal registers virtual base to support LPAE

2013-04-05 Thread Gregory CLEMENT

On 04/05/2013 10:50 PM, Arnd Bergmann wrote:
> On Friday 05 April 2013, Gregory CLEMENT wrote:
>> From: Lior Amsalem 
>>
>> In order to be able to support LPAE, the internal registers virtual
>> base must be aligned to 2MB.
>>
>> Signed-off-by: Lior Amsalem 
>> Signed-off-by: Gregory CLEMENT 
> 
> This is a surprising limitation. Can you extend the above text to go into more
> detail where that alignment requirement comes from?
> 

The explanation I had was that with LPAE the section size is 2MB; in
earlyprintk we map the internal registers, and that mapping must be
section aligned.
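
As a small illustration of the constraint described above, a 2MB section boundary means the low 21 bits of the virtual base must be zero. The sketch below is plain C with hypothetical helper names; any concrete addresses fed to it are examples, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE_LPAE (UINT32_C(1) << 21)   /* 2MB, per the explanation above */

/* True if a virtual base address sits on a 2MB section boundary. */
static int is_section_aligned(uint32_t virt_base)
{
    return (virt_base & (SECTION_SIZE_LPAE - 1)) == 0;
}

/* Round an address down to the start of its 2MB section. */
static uint32_t section_align_down(uint32_t addr)
{
    /* The mask trick below requires a power-of-two section size. */
    assert((SECTION_SIZE_LPAE & (SECTION_SIZE_LPAE - 1)) == 0);
    return addr & ~(SECTION_SIZE_LPAE - 1);
}
```

An address that is only 1MB aligned (bit 20 set) fails the check, which is the case a base chosen for 1MB sections can run into once LPAE is enabled.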

>   Arnd
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

[PATCH net,1/2] hyperv: Fix a kernel warning from netvsc_linkstatus_callback()

2013-04-05 Thread Haiyang Zhang

The warning about local_bh_enable inside IRQ happens when disconnecting a
virtual NIC.

The reason for the warning is that netif_tx_disable() is called when the NIC
is disconnected, and it is called within irq context. netif_tx_disable() calls
local_bh_enable(), which displays a warning if it runs in irq context.

The fix is to remove the unnecessary netif_tx_disable() and netif_wake_queue()
calls in netvsc_linkstatus_callback().

Reported-by: Richard Genoud 
Tested-by: Long Li 
Tested-by: Richard Genoud 
Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 

---
 drivers/net/hyperv/netvsc_drv.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 5f85205..8341b62 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -241,13 +241,11 @@ void netvsc_linkstatus_callback(struct hv_device *device_obj,
 
if (status == 1) {
netif_carrier_on(net);
-   netif_wake_queue(net);
ndev_ctx = netdev_priv(net);
schedule_delayed_work(&ndev_ctx->dwork, 0);
schedule_delayed_work(&ndev_ctx->dwork, msecs_to_jiffies(20));
} else {
netif_carrier_off(net);
-   netif_tx_disable(net);
}
 }
 
-- 
1.7.4.1


[PATCH net,2/2] hyperv: Fix RNDIS send_completion code path

2013-04-05 Thread Haiyang Zhang

In some cases, the VM_PKT_COMP message can arrive later than the RNDIS
completion message, which will free the packet memory. This may cause a panic
due to access to freed memory in netvsc_send_completion().

This patch fixes this problem by removing rndis_filter_send_request_completion()
from the code path. The function was a no-op.

Reported-by: Long Li 
Tested-by: Long Li 
Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 

---
 drivers/net/hyperv/netvsc.c   |   17 -
 drivers/net/hyperv/rndis_filter.c |   14 +-
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 1cd7748..f5f0f09 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -470,8 +470,10 @@ static void netvsc_send_completion(struct hv_device *device,
packet->trans_id;
 
/* Notify the layer above us */
-   nvsc_packet->completion.send.send_completion(
-   nvsc_packet->completion.send.send_completion_ctx);
+   if (nvsc_packet)
+   nvsc_packet->completion.send.send_completion(
+   nvsc_packet->completion.send.
+   send_completion_ctx);
 
num_outstanding_sends =
atomic_dec_return(&net_device->num_outstanding_sends);
@@ -498,6 +500,7 @@ int netvsc_send(struct hv_device *device,
int ret = 0;
struct nvsp_message sendMessage;
struct net_device *ndev;
+   u64 req_id;
 
net_device = get_outbound_net_device(device);
if (!net_device)
@@ -518,20 +521,24 @@ int netvsc_send(struct hv_device *device,
0x;
sendMessage.msg.v1_msg.send_rndis_pkt.send_buf_section_size = 0;
 
+   if (packet->completion.send.send_completion)
+   req_id = (u64)packet;
+   else
+   req_id = 0;
+
if (packet->page_buf_cnt) {
ret = vmbus_sendpacket_pagebuffer(device->channel,
  packet->page_buf,
  packet->page_buf_cnt,
  &sendMessage,
  sizeof(struct nvsp_message),
- (unsigned long)packet);
+ req_id);
} else {
ret = vmbus_sendpacket(device->channel, &sendMessage,
sizeof(struct nvsp_message),
-   (unsigned long)packet,
+   req_id,
VM_PKT_DATA_INBAND,
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
-
}
 
if (ret == 0) {
diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
index 2b657d4..0775f0a 100644
--- a/drivers/net/hyperv/rndis_filter.c
+++ b/drivers/net/hyperv/rndis_filter.c
@@ -61,9 +61,6 @@ struct rndis_request {
 
 static void rndis_filter_send_completion(void *ctx);
 
-static void rndis_filter_send_request_completion(void *ctx);
-
-
 
 static struct rndis_device *get_rndis_device(void)
 {
@@ -241,10 +238,7 @@ static int rndis_filter_send_request(struct rndis_device *dev,
packet->page_buf[0].len;
}
 
-   packet->completion.send.send_completion_ctx = req;/* packet; */
-   packet->completion.send.send_completion =
-   rndis_filter_send_request_completion;
-   packet->completion.send.send_completion_tid = (unsigned long)dev;
+   packet->completion.send.send_completion = NULL;
 
ret = netvsc_send(dev->net_dev->dev, packet);
return ret;
@@ -999,9 +993,3 @@ static void rndis_filter_send_completion(void *ctx)
/* Pass it back to the original handler */
filter_pkt->completion(filter_pkt->completion_ctx);
 }
-
-
-static void rndis_filter_send_request_completion(void *ctx)
-{
-   /* Noop */
-}
-- 
1.7.4.1
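
The idea behind the patch above can be modelled in a few lines of userspace C: a packet without a completion callback is sent with request id 0, and the completion path never dereferences a packet for id 0. This is a simplified sketch under that reading of the diff; struct pkt, make_req_id(), on_send_completion() and count_completion() are hypothetical names, not the driver's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of a packet with an optional callback. */
struct pkt {
    void (*send_completion)(void *ctx);
    void *send_completion_ctx;
};

/* Example callback used for illustration: counts how often it ran. */
static void count_completion(void *ctx)
{
    ++*(int *)ctx;
}

/* Request id handed to the transport: 0 means "no completion wanted". */
static uint64_t make_req_id(struct pkt *p)
{
    assert(p);
    return p->send_completion ? (uint64_t)(uintptr_t)p : 0;
}

/* Completion path: only touch the packet when the id is non-zero. */
static int on_send_completion(uint64_t req_id)
{
    struct pkt *p = (struct pkt *)(uintptr_t)req_id;

    if (!p)
        return 0;               /* nothing to notify; packet may be freed */
    p->send_completion(p->send_completion_ctx);
    return 1;
}
```

Because the id is zero whenever nobody asked for a callback, a late completion message for such a packet carries nothing that could point at freed memory.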


Re: [PATCH 04/10] migrate: clean up migrate_huge_page()

2013-04-05 Thread KOSAKI Motohiro

(3/22/13 4:23 PM), Naoya Horiguchi wrote:
> Due to the previous patch, soft_offline_huge_page() switches to use
> migrate_pages(), and migrate_huge_page() is not used any more.
> So let's remove it.
> 
> Signed-off-by: Naoya Horiguchi 

Acked-by: KOSAKI Motohiro 




Re: [PATCH 03/10] soft-offline: use migrate_pages() instead of migrate_huge_page()

2013-04-05 Thread KOSAKI Motohiro

(3/27/13 9:00 AM), Michal Hocko wrote:
> On Tue 26-03-13 16:35:35, Naoya Horiguchi wrote:
> [...]
>> The differences is that migrate_huge_page() has one hugepage as an argument,
>> and migrate_pages() has a pagelist with multiple hugepages.
>> I already told this before and I'm not sure it's enough to answer the 
>> question,
>> so I explain another point about why this patch do like it.
> 
> OK, I am blind. It is
> +   list_move(>lru, );
> +   ret = migrate_pages(, new_page, MPOL_MF_MOVE_ALL,
> +   MIGRATE_SYNC, MR_MEMORY_FAILURE);
> 
> which moves it from active_list and so you have to put it back.
> 
>> I think that we must do putback_*pages() for source pages whether migration
>> succeeds or not.
>> But when we call migrate_pages() with a pagelist,
>> the caller can't access to the successfully migrated source pages
>> after migrate_pages() returns, because they are no longer on the pagelist.
>> So putback of the successfully migrated source pages should be done *in*
>> unmap_and_move() and/or unmap_and_move_huge_page().
> 
> If the migration succeeds then the page becomes unused and free after
> its last reference drops. So I do not see any reason to put it back to
> active list and free it right afterwards.
> On the other hand unmap_and_move does the same thing (although page
> reference counting is a bit more complicated in that case) so it would
> be good to keep in sync with regular pages case.

Even if pages are isolated from the lists, there are several paths that can
increase the page count. So putback_pages() closes a race when page count != 1.

I'm not sure, but I guess follow_hugepage() can cause the same race.



Re: [PATCH v2 0/9] arm: mvebu: Enable LPAE support for Armada XP SoCs

2013-04-05 Thread Gregory CLEMENT

On 04/05/2013 10:46 PM, Arnd Bergmann wrote:
> On Friday 05 April 2013, Gregory CLEMENT wrote:
>> The Armada XP SoCs have LPAE support. This is the second version of the
>> patch set which allows running the kernel on these SoCs with LPAE support.
>>
>> The biggest changes are the conversion of the device tree file to 64
>> bits in order to be able to use more than 4GB of memory (without this
>> the LPAE is pointless).
>>
> 
> The series looks good overall, I've commented on trivial details.

So there is still hope to have this patch set in 3.10 :)

> 
> Also, please use "ARM: mvebu: ..." in the subject rather than the lower-case
> version.

OK I will, but I saw a lot of 'arm' written in lower-case in the subject of
emails on the LAKML.
> 
>   Arnd
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

Re: [PATCH v2] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Greg Kroah-Hartman

On Fri, Apr 05, 2013 at 02:02:04PM -0700, Jacob Pan wrote:
> RAPL(Running Average Power Limit) interface provides platform software
> with the ability to monitor, control, and get notifications on SOC
> power consumptions. Since its first appearance on Sandy Bridge, more
> features have been added to extend its usage. In RAPL, platforms are
> divided into domains for fine grained control. These domains include
> package, DRAM controller, CPU core (Power Plane 0), graphics uncore
> (power plane 1), etc.
> 
> The purpose of this driver is to expose RAPL for userspace
> consumption. Overall, RAPL fits in the generic thermal layer in
> that platform level power capping and monitoring are mainly used for
> thermal management and thermal layer provides the abstracted interface
> needed to have portable applications.
> 
> Specifically, userspace is presented with per domain cooling device
> with sysfs links to its true device. Although RAPL domain provides many
> parameters for fine tuning, long term power limit is exposed as the
> single knob via cooling device state. Whereas the rest of the
> parameters are still accessible via the linked RAPL class devices.
> This interface allows both simple and advanced use cases.
> 
> Eventfd is used to provide notifications to the userspace. At per domain
> level, users can choose any event capable parameters to register for
> threshold crossing notifications. This is shamelessly "borrowed" from
> cgroup with some trimming/fitting.
> 
> Zhang, Rui's initial RAPL driver was used as a reference and starting
> point. Many thanks.
> https://lkml.org/lkml/2011/5/26/93
> 
> Unlike the patch above, which is mainly for monitoring, this driver
> focuses on the control and usability by user applications.
> 
> Signed-off-by: Jacob Pan 

As described in the cover letter response:
NACKed-by: Greg Kroah-Hartman 

please redo this based on those comments.

greg k-h

Re: [Linux-zigbee-devel] [PATCH] mac802154: Keep track of the channel when changed

2013-04-05 Thread Werner Almesberger

Alan Ott wrote:
> Prevent set_channel() from getting called every time a packet is sent. This
> looks like it was an oversight.

at86rf230.c and derivatives avoid this problem by setting
phy->current_* in the *_channel function.

But I'd agree that it's nicer to do this in one place, not in
every driver.

In case a driver had a weird failure mode in which it leaves the
original channel but only makes it halfway to the new channel, it
could still set phy->current_* and return an error. So there's no
loss of functionality with your change.

- Werner
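
The "do this in one place" approach discussed above amounts to a small cache in front of the driver callback: compare the requested page/channel with the recorded one and only call into the hardware when they differ. The sketch below is a simplified userspace model; struct phy, hw_set_channel() and set_channel_cached() are hypothetical stand-ins, with only the phy->current_* idea taken from the thread.

```c
#include <assert.h>

/* Hypothetical, simplified PHY state; the thread refers to phy->current_*. */
struct phy {
    int current_channel;
    int current_page;
    int hw_calls;               /* counts calls into the (simulated) driver */
};

/* Stand-in for a driver's set_channel op; always succeeds here. */
static int hw_set_channel(struct phy *phy, int page, int chan)
{
    (void)page;
    (void)chan;
    phy->hw_calls++;
    return 0;
}

/* Only touch the hardware when the requested channel differs. */
static int set_channel_cached(struct phy *phy, int page, int chan)
{
    int ret;

    assert(phy);
    if (phy->current_page == page && phy->current_channel == chan)
        return 0;

    ret = hw_set_channel(phy, page, chan);
    if (ret == 0) {             /* record the new state in one place */
        phy->current_page = page;
        phy->current_channel = chan;
    }
    return ret;
}
```

A driver with the partial-failure mode described above could still update the recorded state itself before returning an error, since the wrapper only overwrites it on success.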

Re: [PATCH v2] RAPL (Running Average Power Limit) driver

2013-04-05 Thread Greg Kroah-Hartman

On Fri, Apr 05, 2013 at 02:02:03PM -0700, Jacob Pan wrote:
> Changes since V1:
>   - use 'struct device' instead of raw kobject to represent
> RAPL domains
>   - changed eventfd control interface to use event string
> instead of passing file descriptors that cannot be
> authenticated in sysfs directory
>   - clean ups based on v1 reviews
>   - use kcalloc for arrays
>   - drop dependencies on X86
>   - misc cleanups

Please no, split this up into at least 2 patches, the first being the
"standard" thermal driver, and the rest for the "custom" extensions you
are proposing.

That way the first can get accepted easily, and into 3.10 (hopefully),
so people can use it and not stall everything as we continue to iterate
over the "custom" things.

As is, this driver isn't acceptable, sorry.

greg k-h

Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Roland Dreier

On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
 wrote:
> Sorry, I was wrong. ignore the comments about cgroups. That's still broken.
> (i.e. trying to register RDMA memory while using a cgroup swap limit cause
> the process get killed).
>
> But the GIFT flag patch works (my understanding is that GIFT flag allows the
> adapter to transmit stale memory information, it does not have anything to
> do with cgroups specifically).

The point of the GIFT patch is to avoid triggering copy-on-write so
that memory doesn't blow up during migration.  If that doesn't work
then there's no point to the patch.

 - R.

[PATCH v2] Introduce Intel RAPL cooling device driver

2013-04-05 Thread Jacob Pan

The RAPL (Running Average Power Limit) interface provides platform software
with the ability to monitor, control, and get notifications on SoC
power consumption. Since its first appearance on Sandy Bridge, more
features have been added to extend its usage. In RAPL, platforms are
divided into domains for fine-grained control. These domains include
package, DRAM controller, CPU core (power plane 0), graphics uncore
(power plane 1), etc.

The purpose of this driver is to expose RAPL for userspace
consumption. Overall, RAPL fits in the generic thermal layer in
that platform level power capping and monitoring are mainly used for
thermal management and thermal layer provides the abstracted interface
needed to have portable applications.

Specifically, userspace is presented with a per-domain cooling device
with sysfs links to its true device. Although a RAPL domain provides many
parameters for fine tuning, the long-term power limit is exposed as the
single knob via the cooling device state, whereas the rest of the
parameters are still accessible via the linked RAPL class devices.
This interface allows both simple and advanced use cases.

Eventfd is used to provide notifications to userspace. At the per-domain
level, the user can choose any event-capable parameter to register for
threshold crossing notifications. This is shamelessly "borrowed" from
cgroup with some trimming/fitting.
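
As a rough illustration of the registration step, the sketch below builds the
control string in the shape of the "3 average_power 12000" example given in
the ABI text and writes it to a control file. The helper names
(rapl_format_event, rapl_arm_event) and the order "fd, attribute, threshold"
are assumptions drawn from that single example, not a confirmed interface.

```c
#include <stdio.h>

/*
 * Build the control string "fd attribute threshold", e.g.
 * "3 average_power 12000". Returns what snprintf() returns.
 * (Hypothetical helper; the field order is taken from the ABI example.)
 */
static int rapl_format_event(char *buf, size_t len, int efd,
			     const char *attr, unsigned long threshold_mw)
{
	return snprintf(buf, len, "%d %s %lu", efd, attr, threshold_mw);
}

/*
 * Write the control string to the given event_control path.
 * Returns 0 on success, -1 on formatting or I/O failure.
 * (Hypothetical helper; the path is supplied by the caller.)
 */
static int rapl_arm_event(const char *path, int efd,
			  const char *attr, unsigned long threshold_mw)
{
	char cmd[128];
	FILE *f;
	int n = rapl_format_event(cmd, sizeof(cmd), efd, attr, threshold_mw);

	if (n < 0 || (size_t)n >= sizeof(cmd))
		return -1;	/* formatting failed or string truncated */

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(cmd, f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```

A caller would typically obtain the descriptor from eventfd(2), pass it to
rapl_arm_event() together with the domain's event_control path, and then
block in read(2) on the descriptor until the threshold is crossed.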

Zhang, Rui's initial RAPL driver was used as a reference and starting
point. Many thanks.
https://lkml.org/lkml/2011/5/26/93

Unlike the patch above, which is mainly for monitoring, this driver
focuses on control and usability by user applications.

Signed-off-by: Jacob Pan 
---
 Documentation/ABI/testing/sysfs-class-intel-rapl |  121 ++
 drivers/platform/x86/Kconfig |9 +
 drivers/platform/x86/Makefile|1 +
 drivers/platform/x86/intel_rapl.c| 1285 ++
 drivers/platform/x86/intel_rapl.h|  244 
 5 files changed, 1660 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-intel-rapl
 create mode 100644 drivers/platform/x86/intel_rapl.c
 create mode 100644 drivers/platform/x86/intel_rapl.h

diff --git a/Documentation/ABI/testing/sysfs-class-intel-rapl b/Documentation/ABI/testing/sysfs-class-intel-rapl
new file mode 100644
index 000..5d2dded
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-intel-rapl
@@ -0,0 +1,121 @@
+What:  /sys/class/rapl/polling_freq_hz
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:
+   Frequency in Hz used to poll RAPL data. Only activated when
+   user selects an event that needs to be polled or power limit
+   is set for one of the RAPL domains.
+Users: Any thermal/power management applications that want to do power capping
+   and battery life management on modern Intel SoCs.
+
+What:  /sys/class/rapl/time_unit_divisor
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Indicating time increment size
+
+What:  /sys/class/rapl/energy_unit_divisor
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Indicating energy counter increment size
+
+What:  /sys/class/rapl/power_unit_divisor
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Indicating power unit increment size, used in power limits
+
+What:  /sys/class/rapl/xxx_domain/average_power
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Average power over the current sampling period in milliwatts
+
+What:  /sys/class/rapl/xxx_domain/energy
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Total amount of energy consumed (in joules) since the last time
+   this counter was cleared.
+
+What:  /sys/class/rapl/xxx_domain/event_control
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Arms event thresholds on selected data. Write to this file
+   in the format of   
+   where command must be one of the event capable file names such
+   as average_power. e.g. if user creates and eventfd=3, the write:
+  "3 average_power 12000"
+   eventfd will be notified when average_power of this domain 
crosses
+   12000mW on either directions.
+
+What:  /sys/class/rapl/xxx_domain/max_power
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Maximum power (mwatt) derived from the electrical spec of
+   this domain
+
+What:  /sys/class/rapl/xxx_domain/min_power
+Date:  April 2013
+KernelVersion: 3.10
+Contact:   Jacob Pan 
+Description:Minimum power (mwatt) derived from the electrical spec of
+

[PATCH v2] RAPL (Running Average Power Limit) driver

2013-04-05 Thread Jacob Pan

Changes since V2:
- use 'struct device' instead of raw kobject to represent
  RAPL domains
- changed eventfd control interface to use event string
  instead of passing file descriptors that cannot be
  authenticated in sysfs directory
- clean ups based on v1 reviews
- use kcalloc for arrays
- drop dependencies on X86
- misc cleanups

I do have a checkpatch error which I think is not valid. I can
fix it by avoiding the macro/adding more lines.

ERROR: Macros with complex values should be enclosed in parenthesis
#735: FILE: platform/x86/intel_rapl.c:735:
+#define RO_PRIMITIVE(name) \
+   primitive_show_fn(name) \
+   static DEVICE_ATTR(name, S_IRUGO, show_##name, NULL)

If I put parentheses around it to mute the error then it won't compile.
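
A reduced user-space analogue may make the compile failure easier to see. The
names below (struct attr, SHOW_FN, RO_ATTR) are hypothetical stand-ins, not
the driver's code; the point is only that a macro expanding to file-scope
declarations is not an expression, so it cannot be wrapped in parentheses.

```c
#include <stdio.h>

/* Hypothetical stand-in for a sysfs attribute: a name plus a show callback. */
struct attr {
	const char *name;
	int (*show)(char *buf, size_t len);
};

/* Expands to a complete function definition (a declaration, not an expression). */
#define SHOW_FN(field, value)					\
	static int show_##field(char *buf, size_t len)		\
	{							\
		return snprintf(buf, len, "%d\n", (value));	\
	}

/*
 * Expands to two file-scope declarations: the show function and the
 * attribute that points at it. Wrapping this body in ( ... ) would ask
 * the compiler to parse declarations as an expression, and
 * do { } while (0) is not allowed at file scope either.
 */
#define RO_ATTR(field, value)					\
	SHOW_FN(field, value)					\
	static const struct attr attr_##field = { #field, show_##field }

RO_ATTR(max_power, 12000);
RO_ATTR(min_power, 3000);
```

Splitting the macro into two separate invocations per attribute (one for the
function, one for the attribute) is the "more lines" alternative mentioned
above.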

I attempted to come up with a simple version as Greg suggested to
just expose cooling device but then I realized we could lose many
key functionalities partially due to the correlations among RAPL
control knobs. I am still working on this since it is not as simple as
deleting the advanced interfaces, I would have to do some setup based
on best guesses.

Jacob Pan (1):
  Introduce Intel RAPL cooling device driver

 Documentation/ABI/testing/sysfs-class-intel-rapl |  121 ++
 drivers/platform/x86/Kconfig |9 +
 drivers/platform/x86/Makefile|1 +
 drivers/platform/x86/intel_rapl.c| 1285 ++
 drivers/platform/x86/intel_rapl.h|  244 
 5 files changed, 1660 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-class-intel-rapl
 create mode 100644 drivers/platform/x86/intel_rapl.c
 create mode 100644 drivers/platform/x86/intel_rapl.h

-- 
1.7.9.5


Re: [PATCH resend] fs/proc: Move kfree outside pde_unload_lock

2013-04-05 Thread Al Viro

On Fri, Apr 05, 2013 at 03:56:17PM -0500, Nathan Zimmer wrote:

> That didn't produce anything.  I'll run some bisections over the
> weekend and see what I can sort out.

*Ugh*

I'd try to build with DEBUG_KMEMLEAK and slap printks on the entry
and exit from close_pdeo().  If that doesn't show anything interesting,
it's probably unrelated to procfs...

Re: [PATCH v2 1/9] arm: mvebu: Limit the DMA zone when LPAE is selected

2013-04-05 Thread Gregory CLEMENT

On 04/05/2013 10:41 PM, Arnd Bergmann wrote:
> On Friday 05 April 2013, Gregory CLEMENT wrote:
>> When LPAE is activated on Armada XP, all registers and IOs are still
>> 32bit, the 40bit extension is on the CPU to DRAM path (windows) only.
>> That means that all the DMA transfers are restricted to the low 32 bits
>> address space. This limitation is achieved by selecting ZONE_DMA.
>>
>> Signed-off-by: Gregory CLEMENT 
> 
> 
> Shouldn't that be ZONE_DMA32?
> 

Well, the common ARM code doesn't handle ZONE_DMA32, whereas with
ZONE_DMA, setup_dma_zone() in arch/arm/mm/init.c does exactly what
I want: setting arm_dma_limit to 0x.

ZONE_DMA32 is used on arm64 however.

>   Arnd
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

[PATCHv2] ARM: arch_timer: Silence debug preempt warnings

2013-04-05 Thread Stephen Boyd

Hot-plugging with CONFIG_DEBUG_PREEMPT=y on a device with arm
architected timers causes a slew of "using smp_processor_id() in
preemptible" warnings:

  BUG: using smp_processor_id() in preemptible [] code: sh/111
  caller is arch_timer_cpu_notify+0x14/0xc8

This happens because sometimes the cpu notifier,
arch_timer_cpu_notify(), is called in preemptible context and
other times in non-preemptible context but we use this_cpu_ptr()
to retrieve the clockevent in all cases. We're only going to
actually use the pointer in non-preemptible context though, so
push the this_cpu_ptr() access down into the cases to force the
checks to occur only in non-preemptible contexts.
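
The effect can be mimicked in user space. The sketch below is only a model:
checked_this_cpu_ptr() stands in for this_cpu_ptr() with the debug check
enabled, and the action codes, flag, and counters are hypothetical
placeholders rather than kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical action codes; the values are placeholders, not the kernel's. */
enum cpu_action { CPU_ONLINE, CPU_STARTING, CPU_DYING };

static bool preemptible;	/* simulated context of the caller */
static int debug_warnings;	/* counts simulated debug-preempt warnings */
static int timer_state;		/* stands in for the per-cpu clockevent */

/* Models a checked per-cpu access: warn when called from preemptible context. */
static int *checked_this_cpu_ptr(void)
{
	if (preemptible)
		debug_warnings++;
	return &timer_state;
}

/* Before: the pointer is fetched for every action, even ones that ignore it. */
static void notify_eager(enum cpu_action action)
{
	int *evt = checked_this_cpu_ptr();

	switch (action) {
	case CPU_STARTING:
		*evt = 1;
		break;
	case CPU_DYING:
		*evt = 0;
		break;
	default:
		break;
	}
}

/* After: the pointer is fetched only inside the cases that use it. */
static void notify_lazy(enum cpu_action action)
{
	switch (action) {
	case CPU_STARTING:
		*checked_this_cpu_ptr() = 1;
		break;
	case CPU_DYING:
		*checked_this_cpu_ptr() = 0;
		break;
	default:
		break;
	}
}
```

In this model, calling notify_eager(CPU_ONLINE) while preemptible is set bumps
the warning counter, whereas notify_lazy(CPU_ONLINE) does not, because the
checked access is never reached for that action.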

Cc: Mark Rutland 
Cc: Marc Zyngier 
Signed-off-by: Stephen Boyd 
---

Changes since v1:
 * Pushed down this_cpu_ptr and added a comment

 drivers/clocksource/arm_arch_timer.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index d7ad425..a65a710 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -248,14 +248,16 @@ static void __cpuinit arch_timer_stop(struct clock_event_device *clk)
 static int __cpuinit arch_timer_cpu_notify(struct notifier_block *self,
   unsigned long action, void *hcpu)
 {
-   struct clock_event_device *evt = this_cpu_ptr(arch_timer_evt);
-
+   /*
+* Grab cpu pointer in each case to avoid spurious
+* preemptible warnings
+*/
switch (action & ~CPU_TASKS_FROZEN) {
case CPU_STARTING:
-   arch_timer_setup(evt);
+   arch_timer_setup(this_cpu_ptr(arch_timer_evt));
break;
case CPU_DYING:
-   arch_timer_stop(evt);
+   arch_timer_stop(this_cpu_ptr(arch_timer_evt));
break;
}
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


Re: [PATCH resend] fs/proc: Move kfree outside pde_unload_lock

2013-04-05 Thread Nathan Zimmer


On 04/05/2013 12:36 PM, Al Viro wrote:
> On Fri, Apr 05, 2013 at 12:05:26PM -0500, Nathan Zimmer wrote:
>> On 04/04/2013 03:44 PM, Al Viro wrote:
>>> On Thu, Apr 04, 2013 at 12:12:05PM -0500, Nathan Zimmer wrote:
>>>> Ok I am cloning the tree now.
>>>> It does look like the patches would conflict.
>>>> I'll run some tests and take a deeper look.
>>>
>>> FWIW, I've just pushed there a tentative patch that switches to hopefully
>>> saner locking (head should be at cb673c115c1f99d3480471ca5d8cb3f89a1e3bee).
>>> Is that more or less what you want wrt spinlock contention?
>>>
>>> One note: for any given pde_opener, close_pdeo() can be called at most
>>> by two threads - final fput() and remove_proc_entry() resp.  I think
>>> the use of completion + flag is safe there; pde->pde_unload_lock
>>> should serialize the critical areas.
>>
>> Something isn't quite right.  I keep getting hung during boot.
>> dracut: Mounted root filesystem /dev/sda8
>> dracut: Switching root
>>
>> I'll try to get some more info on a smaller box.
>
> Umm...  Try to add WARN_ON(1) in entry_rundown(), just to see what's
> getting hit (don't bother with entry name, stack trace will be enough).

That didn't produce anything.  I'll run some bisections over the weekend
and see what I can sort out.


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Michael R. Hines


To be more specific, here's what I did:

1. Apply kernel module patch - re-insert module
2. QEMU does: ibv_reg_mr(IBV_ACCESS_GIFT | IBV_ACCESS_REMOTE_READ)
3. Start the RDMA migration
4. Migration completes without any errors

This test does *not* work with a cgroup swap limit, however. The process 
gets killed. (Both with and without GIFT)


- Michael

On 04/05/2013 04:43 PM, Roland Dreier wrote:
> On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines wrote:
>> I also removed the IBV_*_WRITE flags on the sender-side and activated
>> cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
>> with RDMA also succeeded without any problems (both with *and* without GIFT
>> also worked).
>
> Not sure I'm interpreting this correctly.  Are you saying that things
> worked without actually setting the GIFT flag?   In which case why are
> we adding this flag?
>
>   - R.




Re: [RFC] revoke(2) and generic handling of things like remove_proc_entry()

2013-04-05 Thread Al Viro

On Fri, Apr 05, 2013 at 12:56:09PM -0700, Greg Kroah-Hartman wrote:

> Which methods do you mean here?

file->f_op->some_method()

> The vfs core would call start_using(), or would filesystems / drivers
> need to do this?

The former; we have relatively few places that call file_operations
members directly and we'd turn each of those into
	if (likely(start_using(file))) {
		res = file->f_op->foo();
		stop_using(file);
	} else {
		res = error_value_appropriate_for_foo;
	}
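
For readers who want something concrete to poke at, here is a minimal
user-space sketch of that calling pattern using C11 atomics. Everything in it
is an assumption made for illustration: struct revocable, the busy-waiting
revoke_it(), guarded_call(), demo_method(), and the choice of -ENODEV as the
error value are hypothetical and say nothing about how the kernel side would
actually be implemented.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-file state: callers inside a method, plus a revoked flag. */
struct revocable {
	atomic_int users;
	atomic_bool revoked;
};

/* Announce ourselves first, then check the flag; back out if revoked. */
static bool start_using(struct revocable *r)
{
	atomic_fetch_add(&r->users, 1);
	if (atomic_load(&r->revoked)) {
		atomic_fetch_sub(&r->users, 1);
		return false;
	}
	return true;
}

static void stop_using(struct revocable *r)
{
	atomic_fetch_sub(&r->users, 1);
}

/*
 * Set the flag so no new caller gets in, then wait for current callers
 * to leave. A real implementation would sleep; this sketch busy-waits.
 */
static void revoke_it(struct revocable *r)
{
	atomic_store(&r->revoked, true);
	while (atomic_load(&r->users) > 0)
		;
}

/* The wrapper pattern from the mail, with -ENODEV as a placeholder error. */
static int guarded_call(struct revocable *r, int (*method)(void *), void *arg)
{
	int res;

	if (start_using(r)) {
		res = method(arg);
		stop_using(r);
	} else {
		res = -ENODEV;
	}
	return res;
}

/* Trivial method used to exercise the wrapper: returns the pointed-to int. */
static int demo_method(void *arg)
{
	return *(int *)arg;
}
```

Incrementing the counter before testing the flag (with the default
sequentially consistent ordering) is what lets the revoker rely on "flag set
and counter at zero" meaning that no method call is in flight or can start.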
 
> > 4) nasty semantics issue - mmap() vs. revoke (of any sort, including
> > remove_proc_entry(), etc.).  Suppose a revokable file had been mmapped;
> > now it's going away.  What should we do to its VMAs?  Right now sysfs
> > and procfs get away with that, but only because there's only one thing
> > that has ->mmap() there - /proc/bus/pci and sysfs equivalents.  I've
> > no idea how does pci_mmap_page_range() interact with PCI hotplug (and
> > I'm not at all sure that whatever it does isn't racy wrt device removal),
> 
> The page range should just start returning 0xff all over the place, the
> BIOS should have kept the mapping around, as it can't really assign it
> anywhere else, so all _should_ be fine here.

Umm... 0xff or SIGSEGV?

> I think that's a reasonable constraint, although tearing down the VMAs
> might be possible if we just invalidate the file handle "forcefully"
> (i.e. manually tear them down and then further accesses should fail with a
> SIGSEGV, or am I missing something more basic here?)

The question is how to do that in a reasonably clean way; we would've done
it as part of ->kick(), I suppose, or right next to it.

> > 6) how do we get from revoke(2) to call of revoke_it() on the right object?
> > Note that revoke(2) is done by pathname; we might want an ...at() variant,
> > but all we'll have to play with will be inode, not an opened file.
> 
> Can we make revoke(2) require a valid file handle?  Is there a POSIX
> spec for revoke(2) that we have to follow here, or given that we haven't
> had one yet, are we free to define whatever we want without people
> getting that upset?

BSD one takes a pathname and so do all derived ones...

Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Michael R. Hines

Sorry, I was wrong. Ignore the comments about cgroups. That's still
broken (i.e. trying to register RDMA memory while using a cgroup swap
limit causes the process to get killed).


But the GIFT flag patch works (my understanding is that GIFT flag allows 
the adapter to transmit stale memory information, it does not have 
anything to do with cgroups specifically).


Am I missing something? I was only testing the GIFT flag patch.

Note: I only turned it on - I did not verify the (non)consistency of the
memory that was transmitted.


- Michael


On 04/05/2013 04:43 PM, Roland Dreier wrote:
> On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines wrote:
>> I also removed the IBV_*_WRITE flags on the sender-side and activated
>> cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
>> with RDMA also succeeded without any problems (both with *and* without GIFT
>> also worked).
>
> Not sure I'm interpreting this correctly.  Are you saying that things
> worked without actually setting the GIFT flag?   In which case why are
> we adding this flag?
>
>   - R.




Re: [PATCH v6 4/4] xen/arm: introduce xen_early_init, use PSCI on xen

2013-04-05 Thread Rob Herring

On 04/05/2013 02:36 PM, Nicolas Pitre wrote:
> On Fri, 5 Apr 2013, Stefano Stabellini wrote:
> 
>> This is what happens:
>>
>> - No Xen
>> Xen is not running on the platform and a Xen hypervisor node is not
>> available on device tree.
>> Everything keeps working seamlessly, this patch doesn't change anything.
>>
>> - we are running on Xen
>> Xen is running on the platform, we are running as a guest on Xen and an
>> hypervisor node is available on device tree.
>> Let's also assume that there aren't any "arm,cci" compatible nodes on
>> device tree because Xen wouldn't export this kind of information to any
>> guests right now. Therefore PSCI should be used to boot secondary cpus.
>> Because the versatile express machine sets smp_init to
>> vexpress_smp_init_ops, vexpress_smp_init_ops will be called.
>> vexpress_smp_init_ops sets smp_ops to vexpress_smp_ops, that *break*
>> Xen.
> 
> OK I see.
> 
>> With this patch, xen_smp_init will be called instead of
>> vexpress_smp_init_ops, and smp_ops will be set to psci_smp_ops,
>> therefore *unbreaking* Xen.
> 
> However that breaks MCPM.

You mean on bare metal, right? For the bare metal, "xen,xen" property
would not be present and xen_smp_init is not used. So the vexpress MCPM
ops will be used. Aren't Dom0 cpu's basically virtual cpus? If Xen ever
needs the MCPM support, the Xen hook itself can figure out whether to
use MCPM support.

Rob


Re: [PATCH v2 2/9] arm: mvebu: Align the internal registers virtual base to support LPAE

2013-04-05 Thread Arnd Bergmann

On Friday 05 April 2013, Gregory CLEMENT wrote:
> From: Lior Amsalem 
> 
> In order to be able to support LPAE, the internal registers virtual
> base must be aligned to 2MB.
> 
> Signed-off-by: Lior Amsalem 
> Signed-off-by: Gregory CLEMENT 

This is a surprising limitation. Can you extend the above text to go into more
detail where that alignment requirement comes from?

Arnd

Re: [PATCH v2 7/9] arm: dts: mvebu: introduce internal-regs node

2013-04-05 Thread Gregory CLEMENT

On 04/05/2013 10:43 PM, Arnd Bergmann wrote:
> On Friday 05 April 2013, Gregory CLEMENT wrote:
>> Signed-off-by: Gregory CLEMENT 
> 
> The patch looks good but the description is a bit short.
> 

It cannot be more brief! :)

I explained the purpose of this patch in the cover letter and forgot to
add this explanation here.

I will expand it for next version.

Thanks.
>   Arnd
> 


-- 
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com

Re: [PATCH v2 0/9] arm: mvebu: Enable LPAE support for Armada XP SoCs

2013-04-05 Thread Arnd Bergmann

On Friday 05 April 2013, Gregory CLEMENT wrote:
> The Armada XP SoCs have LPAE support. This is the second version of the patch
> set which allows running the kernel on these SoCs with LPAE support.
> 
> The biggest changes are the conversion of the device tree file to 64
> bits in order to be able to use more than 4GB of memory (without this
> the LPAE is pointless).
> 

The series looks good overall, I've commented on trivial details.

Also, please use "ARM: mvebu: ..." in the subject rather than the lower-case
version.

Arnd

Re: [PATCH v2 8/9] arm: dts: mvebu: fix cpus section indentation

2013-04-05 Thread Arnd Bergmann

On Friday 05 April 2013, Gregory CLEMENT wrote:
> From: Thomas Petazzoni 
> 
> Signed-off-by: Thomas Petazzoni 

This should have a description, even though it's completely trivial.
I would also recommend moving this patch first, as the general rule
is to do cleanups first.

Arnd

Re: [PATCH v2 7/9] arm: dts: mvebu: introduce internal-regs node

2013-04-05 Thread Arnd Bergmann

On Friday 05 April 2013, Gregory CLEMENT wrote:
> Signed-off-by: Gregory CLEMENT 

The patch looks good but the description is a bit short.

Arnd

Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-05 Thread Roland Dreier

On Fri, Apr 5, 2013 at 1:17 PM, Michael R. Hines
 wrote:
> I also removed the IBV_*_WRITE flags on the sender-side and activated
> cgroups with the "memory.memsw.limit_in_bytes" activated and the migration
> with RDMA also succeeded without any problems (both with *and* without GIFT
> also worked).

Not sure I'm interpreting this correctly.  Are you saying that things
worked without actually setting the GIFT flag?   In which case why are
we adding this flag?

 - R.

Re: [PATCH 3/3] x86: kernel base offset ASLR

2013-04-05 Thread Borislav Petkov

On Fri, Apr 05, 2013 at 01:19:39PM -0700, Julien Tinnes wrote:
> I think it'd be perfectly ok for OOPS to print out the kernel base.

Yeah, ok, this still would need some massaging of the oops output per
script, but it shouldn't be a big problem.

Also, you probably need to make clear in the oops itself that the
addresses have been randomized. Or, is the mere presence of kernel base
going to imply that?

> Restricting access to these oopses becomes a different problem
> (privilege separation). Some existing sandboxes (Chromium, vsftpd,
> openssh..) are already defending against it.

Ok.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.

[PATCH 1/4] x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.

2013-04-05 Thread Konrad Rzeszutek Wilk

During the ACPI S3 resume path the trampoline code handles it already.

During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());

which is then used during the resume path and has the same exact
value as what the store/load_gdt do with the saved_context
(which is saved/restored via save/restore_processor_state()).

The flow during resume is complex and for 64-bit kernels we use three GDTs
- one early bootstrap GDT (wakeup_gdt) that we load to work around
broken BIOSes, an early Protected Mode to Long Mode transition one
(tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).

The early one ('wakeup_gdt') is loaded in 'trampoline_start' for working
around broken BIOSes, and then when we end up in Protected Mode in the
startup_32 (in trampoline_64.S, not head_32.S) we use the 'tr_gdt'
(still in trampoline_64.S). This 'tr_gdt' has a 32-bit code segment,
64-bit code segment with L=1, and a 32-bit data segment.

Once we have transitioned from Protected Mode to Long Mode we then
set the GDT to 'early_gdt_descr' and then via an lretq emerge in
wakeup_long64 (set via 'initial_code' variable in acpi_suspend_lowlevel).

In the wakeup_long64 we end up restoring the %rip (which is set to
'resume_point') and jump there.

In 'resume_point' we call 'restore_processor_state' which does
the load_gdt on the saved context. This load_gdt is redundant as the
GDT loaded via early_gdt_descr is the same.
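
As background on what "load_gdt on the saved context" operates on, the sketch
below models the descriptor that lgdt/sgdt use in 64-bit mode (a 16-bit limit
followed by a 64-bit base) and shows how the gdt_limit/gdt_base pair in the
saved context can be read as one such descriptor. It assumes the enclosing
structure is packed; struct saved_gdt_fields and saved_gdt_desc() are
hypothetical reductions for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* lgdt/sgdt operand in 64-bit mode: 16-bit limit, then 64-bit base. */
struct desc_ptr {
	uint16_t size;
	uint64_t address;
} __attribute__((packed));

/*
 * Reduced stand-in for the three GDT fields of the saved context.
 * Packing is assumed here: without it the compiler would insert padding
 * before gdt_base and limit/base would no longer be contiguous.
 */
struct saved_gdt_fields {
	uint16_t gdt_pad;	/* unused, keeps gdt_base naturally aligned in context */
	uint16_t gdt_limit;
	uint64_t gdt_base;
} __attribute__((packed));

/* Read gdt_limit and gdt_base as one 10-byte descriptor (copy, no aliasing). */
static struct desc_ptr saved_gdt_desc(const struct saved_gdt_fields *ctxt)
{
	struct desc_ptr d;

	memcpy(&d,
	       (const char *)ctxt + offsetof(struct saved_gdt_fields, gdt_limit),
	       sizeof(d));
	return d;
}
```

The copy through memcpy() is a user-space nicety to stay clear of aliasing
and alignment warnings; the essential point is only that the limit and base
sit back to back, so the pair can be treated as a single descriptor.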

Here is the call-chain:
 wakeup_start
   |- lgdtl wakeup_gdt [the work-around for broken BIOSes]
   |
   \-- trampoline_start (trampoline_64.S)
         |- lgdtl tr_gdt
         |
         \-- startup_32 (trampoline_64.S)
               |
               \-- startup_64 (trampoline_64.S)
                     |
                     \-- secondary_startup_64
                           |- lgdtl early_gdt_descr
                           | ...
                           |- movq initial_code(%rip), %rax
                           |-.. lretq
                           \-- wakeup_long64
                                 |-- other registers are reloaded
                                 |-- call restore_processor_state
The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that along
with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and cr3
(restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address') in hibernate_asm_64.S is invoked which restores
the contents of most registers. Naturally the resume path benefits from
already being in 64-bit mode, so it does not have to load the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note that
the restoration of the restore image page-tables is done prior to this.

After the 'restore_registers' it returns and we end up calling
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as the bootup kernel has already loaded the GDT which
is at the same physical location as the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as saved - meaning we are not trying to restore
a different kernel in the virtual address space of a new kernel.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/include/asm/suspend_64.h | 3 ---
 arch/x86/power/cpu.c  | 2 --
 2 files changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/suspend_64.h b/arch/x86/include/asm/suspend_64.h
index 09b0bf1..97b84e0 100644
--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -25,9 +25,6 @@ struct saved_context {
u64 misc_enable;
bool misc_enable_saved;
unsigned long efer;
-   u16 gdt_pad;
-   u16 gdt_limit;
-   unsigned long gdt_base;
u16 idt_pad;
u16 idt_limit;
unsigned long idt_base;
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 3c68768..fdca260 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -66,7 +66,6 @@ static void __save_processor_state(struct saved_context *ctxt)
    store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-   store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
    store_idt((struct desc_ptr *)&ctxt->idt_limit);
 #endif
store_tr(ctxt->tr);
@@ -187,7 +186,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
    load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-   load_gdt((const struct desc_ptr *)&ctxt->gdt_limit);
    load_idt((const struct desc_ptr *)&ctxt->idt_limit);
 #endif
 
-- 
1.8.1.4


[PATCH 2/4] x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed

2013-04-05 Thread Konrad Rzeszutek Wilk

During the ACPI S3 suspend, we store the GDT in the wakeup_header (see
wakeup_asm.S) field called 'pmode_gdt', which is then used during the
resume path and has the same exact value as what the store/load_gdt do
with the saved_context (which is saved/restored via
save/restore_processor_state()).

The flow during resume from ACPI S3 is simpler than the 64-bit
counterpart. We only use the early bootstrap once (wakeup_gdt) and
do various checks in real mode.

After the checks are completed, we load the saved GDT ('pmode_gdt') and
continue on with the resume (by heading to startup_32 in trampoline_32.S) -
which quickly jumps to what was saved in 'pmode_entry'
aka 'wakeup_pmode_return'.

The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
saved in do_suspend_lowlevel initially). After that it ends up calling
the 'ret_point' which calls 'restore_processor_state()'.

We have two opportunities to remove code where we restore the same GDT
twice.

Here is the call chain:
 wakeup_start
   |- lgdtl wakeup_gdt [the work-around broken BIOSes]
   |
   | - lgdtl pmode_gdt [the real one]
   |
   \-- startup_32 (in trampoline_32.S)
  \-- wakeup_pmode_return (in wakeup_32.S)
   |- lgdtl saved_gdt [the real one]
   \-- ret_point
 |..
 |- call restore_processor_state

The hibernate path is much simpler. During the saving of the hibernation
image we call save_processor_state() and save the contents of that
along with the rest of the kernel in the hibernation image destination.
We save the EIP of 'restore_registers' (restore_jump_address) and
cr3 (restore_cr3).

During hibernate resume, the 'restore_registers' (via the
'restore_jump_address') in hibernate_asm_32.S is invoked which
restores the contents of most registers. Naturally the resume path benefits
from already being in 32-bit mode, so it does not have to reload the GDT.

It only reloads the cr3 (from restore_cr3) and continues on. Note
that the restoration of the restore image page-tables is done prior to
this.

After the 'restore_registers' it returns and we end up called
restore_processor_state() - where we reload the GDT. The reload of
the GDT is not needed as bootup kernel has already loaded the GDT
which is at the same physical location as the the restored kernel.

Note that the hibernation path assumes the GDT is correct during its
'restore_registers'. The assumption in the code is that the restored
image is the same as the saved one - meaning we are not trying to restore
a different kernel in the virtual address space of a new kernel.
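
Put differently, a reload is redundant whenever the descriptor already in
the GDTR equals the one we are about to load. A minimal sketch of that
check follows; struct toy_gdt_desc and toy_gdt_reload_is_noop() are
hypothetical names for illustration and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the (limit, base) pair held in the GDTR. */
struct toy_gdt_desc {
	uint16_t limit;		/* table size in bytes, minus one */
	uintptr_t base;		/* address of the table */
};

/* True when loading 'saved' over 'loaded' would leave the GDTR unchanged. */
static bool toy_gdt_reload_is_noop(const struct toy_gdt_desc *loaded,
				   const struct toy_gdt_desc *saved)
{
	return loaded->limit == saved->limit && loaded->base == saved->base;
}
```

If the boot kernel and the restored image were to place the GDT at
different addresses, the check would return false and the reload could not
be dropped, which is why the "same image" assumption matters.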

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/include/asm/suspend_32.h | 1 -
 arch/x86/kernel/acpi/wakeup_32.S  | 3 ---
 arch/x86/power/cpu.c              | 2 --
 3 files changed, 6 deletions(-)

diff --git a/arch/x86/include/asm/suspend_32.h b/arch/x86/include/asm/suspend_32.h
index 487055c..f6064b7 100644
--- a/arch/x86/include/asm/suspend_32.h
+++ b/arch/x86/include/asm/suspend_32.h
@@ -15,7 +15,6 @@ struct saved_context {
 	unsigned long cr0, cr2, cr3, cr4;
 	u64 misc_enable;
 	bool misc_enable_saved;
-	struct desc_ptr gdt;
 	struct desc_ptr idt;
 	u16 ldt;
 	u16 tss;
diff --git a/arch/x86/kernel/acpi/wakeup_32.S b/arch/x86/kernel/acpi/wakeup_32.S
index 13ab720..91adb1b 100644
--- a/arch/x86/kernel/acpi/wakeup_32.S
+++ b/arch/x86/kernel/acpi/wakeup_32.S
@@ -18,7 +18,6 @@ wakeup_pmode_return:
 	movw	%ax, %gs
 
 	# reload the gdt, as we need the full 32 bit address
-	lgdt	saved_gdt
 	lidt	saved_idt
 	lldt	saved_ldt
 	ljmp	$(__KERNEL_CS), $1f
@@ -44,7 +43,6 @@ bogus_magic:
 
 
 save_registers:
-	sgdt	saved_gdt
 	sidt	saved_idt
 	sldt	saved_ldt
 	str	saved_tss
@@ -93,7 +91,6 @@ ENTRY(saved_magic)	.long	0
 ENTRY(saved_eip)	.long	0
 
 # saved registers
-saved_gdt:	.long	0,0
 saved_idt:	.long	0,0
 saved_ldt:	.long	0
 saved_tss:	.long	0
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index fdca260..571176f 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -62,7 +62,6 @@ static void __save_processor_state(struct saved_context *ctxt)
 	 * descriptor tables
 	 */
 #ifdef CONFIG_X86_32
-	store_gdt(&ctxt->gdt);
 	store_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
@@ -182,7 +181,6 @@ static void __restore_processor_state(struct saved_context *ctxt)
 	 * ltr is done i fix_processor_context().
 	 */
 #ifdef CONFIG_X86_32
-	load_gdt(&ctxt->gdt);
 	load_idt(&ctxt->idt);
 #else
 /* CONFIG_X86_64 */
-- 
1.8.1.4


[RFC PATCH] axe the store_gdt() pvops call. (v1)

2013-04-05 Thread Konrad Rzeszutek Wilk

A long, long time ago (way back in October 2012), when I posted the patches
that would make it possible to do ACPI S3 with Xen, Peter pointed out that:
"excellent set of pvops calls that should be nukable to Kingdom Come. There
is no reason, ever, to read the IDT and GDT from the kernel... the kernel
already knows what they should be!"

http://lkml.indiana.edu/hypermail/linux/kernel/1210.2/01555.html
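
To illustrate Peter's point: a descriptor for the GDT can be derived from
the table the kernel itself owns, without asking the CPU via sgdt. Below
is a rough user-space sketch, assuming a made-up table toy_gdt with
TOY_GDT_ENTRIES eight-byte entries. struct toy_desc_ptr and
toy_known_gdt_descr() are hypothetical names; the kernel's own helpers
and constants are not used here.

```c
#include <stdint.h>

#define TOY_GDT_ENTRIES 32	/* illustrative value, not the kernel's constant */

/* Hypothetical (size, address) pair in the shape that lgdt consumes. */
struct toy_desc_ptr {
	uint16_t size;		/* table size in bytes, minus one */
	uintptr_t address;	/* where the table lives */
};

/* The table is owned by the program, so its location and size are known. */
static uint64_t toy_gdt[TOY_GDT_ENTRIES];

/* Build the descriptor from what we already know instead of reading it back. */
static struct toy_desc_ptr toy_known_gdt_descr(void)
{
	struct toy_desc_ptr d;

	d.size = (uint16_t)(sizeof(toy_gdt) - 1);
	d.address = (uintptr_t)toy_gdt;
	return d;
}
```

The descriptor can be rebuilt this way at any time, so nothing has to be
stored across suspend for it.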

Merge windows happen, bugs happen, and only this week was I able to carve
out some time to dig into this. I started with the GDT and found
out that we can remove the code that saves and reloads it. My fear was
that ACPI S3 would break, but fortunately it has its own mechanism for
reloading the GDT (and as the 32-bit patch shows - it has a redundant one
as well!). Tested on 32-bit and 64-bit kernels - doing ACPI S3 and
hibernate as well. The machines were a ThinkPad T61 and an Asus M5A97.

This RFC patch set removes store_gdt() as well as some of
the 32-bit ACPI S3 code. Please review at your leisure.

The patches are also visible at:
 
 git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/for-hpa-3.10

 arch/x86/include/asm/paravirt.h   |  4 
 arch/x86/include/asm/paravirt_types.h |  2 +-
 arch/x86/include/asm/suspend_32.h     |  1 -
 arch/x86/include/asm/suspend_64.h     |  3 ---
 arch/x86/kernel/acpi/sleep.c          |  2 +-
 arch/x86/kernel/acpi/wakeup_32.S      |  3 ---
 arch/x86/kernel/doublefault_32.c      |  2 +-
 arch/x86/kernel/paravirt.c            |  1 -
 arch/x86/kvm/vmx.c                    |  2 +-
 arch/x86/power/cpu.c                  | 13 +++--
 arch/x86/xen/enlighten.c              |  1 -
 11 files changed, 11 insertions(+), 23 deletions(-)

Konrad Rzeszutek Wilk (3):
  x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.
  x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
  x86/xen/store_gdt: Remove the pvops variant of store_gdt.

kon...@kernel.org (1):
  x86/wakeup/sleep:  Use pvops functions for changing GDT entries.
