date:20131118

Re: [patch] pinctrl: abx500: fix some more bitwise AND tests

2013-11-18 Thread Linus Walleij

On Sun, Nov 10, 2013 at 12:35 AM, Dan Carpenter
 wrote:

> I sent a patch to fix some bitwise AND tests but I guess I missed some.
> Sorry about that.
>
> Signed-off-by: Dan Carpenter 

Patch applied for fixes, thanks!

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: linux-next: manual merge of the gpio tree with the pm tree

2013-11-18 Thread Linus Walleij

On Tue, Nov 19, 2013 at 2:43 AM, Stephen Rothwell  wrote:

> Today's linux-next merge of the gpio tree got a conflict in
> drivers/gpio/gpiolib.c between commit 7b1998116bbb ("ACPI / driver core:
> Store an ACPI device pointer in struct acpi_dev_node") from the pm tree
> and commit f760f1967ee8 ("gpiolib: use dedicated flags for GPIO
> properties") from the gpio tree.
>
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Oh I never saw that patch before ... anyway it seems to be
correct, thanks.

Yours,
Linus Walleij

Re: [PATCH] mfd: Make MFD_AS3722 depend on I2C=y

2013-11-18 Thread Michal Marek

On 19.11.2013 00:19, Stephen Warren wrote:
> On 11/18/2013 10:25 AM, Michal Marek wrote:
>> MFD_AS3722 can only be builtin, so it needs I2C builtin as well.
>> With I2C=m, we get:
>>
>> drivers/mfd/as3722.c:372: undefined reference to `devm_regmap_init_i2c'
>> drivers/built-in.o: In function `as3722_i2c_driver_init':
>> drivers/mfd/as3722.c:444: undefined reference to `i2c_register_driver'
>> drivers/built-in.o: In function `as3722_i2c_driver_exit':
>> drivers/mfd/as3722.c:444: undefined reference to `i2c_del_driver'
> 
> Shouldn't Kconfig handle this; if a Boolean config option depends on a
> tri-state config option, shouldn't it automatically validate that the
> tri-state is "y" not "m"?

That's what I initially thought as well. But then you have cases where
intuitively you prefer the current m -> y promotion:

config EXT3_FS
	tristate "Ext3 journalling file system support"

config EXT3_DEFAULTS_TO_ORDERED
	bool "Default to 'data=ordered' in ext3"
	depends on EXT3_FS
	default y

Ternary logic is fun only as long as it is not mixed with boolean :-(.
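
The promotion rule discussed here can be modelled in a few lines of C.
This is only a sketch of the behaviour described in this thread, not
Kconfig's actual implementation; the enum and the helper names
bool_limit() and dep_equals_y() are made up for illustration.

```c
/* Tristate values, ordered the way Kconfig orders them: n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

/*
 * Highest value a bool symbol may take for a given "depends on" value.
 * A bool cannot be 'm', so a dependency that evaluates to 'm' is
 * promoted to 'y' (hypothetical helper, for illustration only).
 */
static enum tristate bool_limit(enum tristate dep)
{
	return dep == TRI_M ? TRI_Y : dep;
}

/* "depends on FOO=y": the comparison yields 'y' or 'n', never 'm'. */
static enum tristate dep_equals_y(enum tristate foo)
{
	return foo == TRI_Y ? TRI_Y : TRI_N;
}
```

With I2C=m, bool_limit(TRI_M) is TRI_Y, which is how a builtin-only
driver ends up enabled against a modular I2C, while
bool_limit(dep_equals_y(TRI_M)) stays TRI_N.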

Michal

[ANNOUNCE] Jailhouse: A Linux-based Partitioning Hypervisor

2013-11-18 Thread Jan Kiszka

We are happy to announce the Jailhouse project, now also to a broader
community!

Jailhouse is a partitioning hypervisor that can create asymmetric
multiprocessing (AMP) setups on Linux-based systems. That means it runs
bare-metal applications or non-Linux OSes alongside a standard Linux kernel
on one multicore hardware platform. Jailhouse ensures isolation between
these "cells", as we call them, via hardware-assisted virtualization.
The typical workloads we expect to see in non-Linux cells are
applications with highly demanding real-time, safety or security
requirements. In contrast to comparable hypervisors, Jailhouse is loaded
and configured via Linux, not the other way around. Give it a try to see
and "feel" the difference.

The aim of Jailhouse is to keep the amount of code responsible for
establishing and maintaining cell isolation as small as possible. And
with small we mean a few thousand lines of code at the privilege level
of the hypervisor. This is obviously much less than you can achieve with
full-featured hypervisors like KVM. See also the Jailhouse presentation
at this year's KVM Forum for the differentiation between KVM and
Jailhouse, as well as possible combinations of both:

https://docs.google.com/file/d/0B6HTUUWSPdd-Zl93MVhlMnRJRjg

Jailhouse is clearly in an incubator stage. We currently only support
Intel x86, including a demonstration setup inside QEMU/KVM. Also, we
still lack a number of features and measures in order to truly and provably
isolate cells from each other. Besides working on this, ARM support is
on our road map as well. As we would like to motivate early feedback,
including potential contributions, we already released the code under GPLv2:

https://github.com/siemens/jailhouse

Aside from the master branch, you can also find a first step towards the
KVM-on-Jailhouse concept presented at KVM Forum 2013.

Looking forward to your feedback!

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

Re: [PATCH] x86: boot: Fix mixed indentation in a20.c

2013-11-18 Thread Ingo Molnar

* H. Peter Anvin  wrote:

> On 11/18/2013 09:50 AM, Johannes Löthberg wrote:
> > Replace all mixed indentation with tabs
> > 
> > Signed-off-by: Johannes Löthberg 
> 
> NAK.  Not worth the churn in the absence of other changes.

Yes.

If a20.c was cleaned up altogether, that might be a more tempting 
change, it has a good number of inconsistencies and improving it all 
would make it more readable.

But it makes little sense to do an incomplete cleanup.

Thanks,

Ingo

[PATCH] ipc,shm: fix shm_file deletion races

2013-11-18 Thread Greg Thelen

When IPC_RMID races with other shm operations there's potential for
use-after-free of the shm object's associated file (shm_file).

Here's the race before this patch:
  TASK 1                     TASK 2
  ------                     ------
  shm_rmid()
    ipc_lock_object()
                             shmctl()
                             shp = shm_obtain_object_check()

    shm_destroy()
      shm_unlock()
      fput(shp->shm_file)
                             ipc_lock_object()
                             shmem_lock(shp->shm_file)
 

The oops is caused because shm_destroy() calls fput() after dropping the
ipc_lock.  fput() clears the file's f_inode, f_path.dentry, and
f_path.mnt, which causes various NULL pointer references in task 2.  I
reliably see the oops in task 2 if with shmlock, shmu

This patch fixes the races by:
1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock().
2) modify at risk operations to check shm_file while holding
   ipc_object_lock().
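
The two steps above can be sketched in userspace C. This is an
illustration of the locking pattern only, not the kernel code: struct
seg, seg_destroy() and seg_use() are hypothetical names, a pthread mutex
stands in for the ipc object lock, and free() stands in for fput().

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for shmid_kernel: a lock plus an owned resource. */
struct seg {
	pthread_mutex_t lock;
	void *file;		/* NULL once teardown has started */
};

/* Teardown: detach the pointer under the lock, release it afterwards. */
static void seg_destroy(struct seg *s)
{
	void *file;

	pthread_mutex_lock(&s->lock);
	file = s->file;
	s->file = NULL;
	pthread_mutex_unlock(&s->lock);

	free(file);		/* stands in for fput() after the unlock */
}

/* At-risk operation: check the pointer while holding the lock. */
static int seg_use(struct seg *s)
{
	int err = 0;

	pthread_mutex_lock(&s->lock);
	if (s->file == NULL)
		err = -EIDRM;	/* segment is being torn down */
	/* else: s->file may be used here, the lock is still held */
	pthread_mutex_unlock(&s->lock);
	return err;
}
```

A caller that loses the race sees -EIDRM instead of dereferencing a
file that has already been released.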

Example workloads, which each trigger oops...

Workload 1:
  while true; do
    id=$(shmget 1 4096)
    shm_rmid $id &
    shmlock $id &
    wait
  done

  The oops stack shows accessing NULL f_inode due to racing fput:
    _raw_spin_lock
    shmem_lock
    SyS_shmctl

Workload 2:
  while true; do
    id=$(shmget 1 4096)
    shmat $id 4096 &
    shm_rmid $id &
    wait
  done

  The oops stack is similar to workload 1 due to NULL f_inode:
    touch_atime
    shmem_mmap
    shm_mmap
    mmap_region
    do_mmap_pgoff
    do_shmat
    SyS_shmat

Workload 3:
  while true; do
    id=$(shmget 1 4096)
    shmlock $id
    shm_rmid $id &
    shmunlock $id &
    wait
  done

  The oops stack shows a second fput tripping on a NULL f_inode.  The
  first fput() completed via shm_destroy(), but a racing thread did
  a get_file() and queued this fput():
    locks_remove_flock
    __fput
    fput
    task_work_run
    do_notify_resume
    int_signal

Fixes: c2c737a0461e ("ipc,shm: shorten critical region for shmat")
Fixes: 2caacaa82a51 ("ipc,shm: shorten critical region for shmctl")
Signed-off-by: Greg Thelen 
Cc:   # 3.10.17+ 3.11.6+
---
 ipc/shm.c | 28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index d69739610fd4..0bdf21c6814e 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -208,15 +208,18 @@ static void shm_open(struct vm_area_struct *vma)
  */
 static void shm_destroy(struct ipc_namespace *ns, struct shmid_kernel *shp)
 {
+   struct file *shm_file;
+
+   shm_file = shp->shm_file;
+   shp->shm_file = NULL;
ns->shm_tot -= (shp->shm_segsz + PAGE_SIZE - 1) >> PAGE_SHIFT;
shm_rmid(ns, shp);
shm_unlock(shp);
-   if (!is_file_hugepages(shp->shm_file))
-   shmem_lock(shp->shm_file, 0, shp->mlock_user);
+   if (!is_file_hugepages(shm_file))
+   shmem_lock(shm_file, 0, shp->mlock_user);
else if (shp->mlock_user)
-   user_shm_unlock(file_inode(shp->shm_file)->i_size,
-   shp->mlock_user);
-   fput (shp->shm_file);
+   user_shm_unlock(file_inode(shm_file)->i_size, shp->mlock_user);
+   fput(shm_file);
ipc_rcu_putref(shp, shm_rcu_free);
 }
 
@@ -983,6 +986,13 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
}
 
shm_file = shp->shm_file;
+
+   /* check if shm_destroy() is tearing down shp */
+   if (shm_file == NULL) {
+   err = -EIDRM;
+   goto out_unlock0;
+   }
+
if (is_file_hugepages(shm_file))
goto out_unlock0;
 
@@ -1101,6 +1111,14 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
goto out_unlock;
 
ipc_lock_object(&shp->shm_perm);
+
+   /* check if shm_destroy() is tearing down shp */
+   if (shp->shm_file == NULL) {
+   ipc_unlock_object(&shp->shm_perm);
+   err = -EIDRM;
+   goto out_unlock;
+   }
+
path = shp->shm_file->f_path;
path_get(&path);
shp->shm_nattch++;
-- 
1.8.4.1


Re: [tip:sched/urgent] sched: Optimize task_sched_runtime()

2013-11-18 Thread Ingo Molnar


* Davidlohr Bueso  wrote:

> On Wed, 2013-11-13 at 09:25 -0800, tip-bot for Peter Zijlstra wrote:
> > Commit-ID:  911b2898b3c9fe0048e9485ad1629ed4fce330fd
> > Gitweb: 
> > http://git.kernel.org/tip/911b2898b3c9fe0048e9485ad1629ed4fce330fd
> > Author: Peter Zijlstra 
> > AuthorDate: Mon, 11 Nov 2013 18:21:56 +0100
> > Committer:  Ingo Molnar 
> > CommitDate: Wed, 13 Nov 2013 13:33:54 +0100
> > 
> > sched: Optimize task_sched_runtime()
> > 
> > Large multi-threaded apps like to hit this using do_sys_times() and
> > then queue up on the rq->lock.
> > 
> > Avoid when possible.
> > 
> > Larry reported ~20% performance increase in his test case.
> > 
> > Reported-by: Larry Woodman 
> > Suggested-by: Paul Turner 
> > Signed-off-by: Peter Zijlstra 
> > Cc: KOSAKI Motohiro 
> > Cc: Linus Torvalds 
> > Cc: Andrew Morton 
> > Link: 
> > http://lkml.kernel.org/r/2013172925.gg26...@twins.programming.kicks-ass.net
> > Signed-off-by: Ingo Molnar 
> 
> For what it's worth:
> 
> Tested-by: Davidlohr Bueso 

Thanks for the testing - the change is upstream already and unless it 
causes regressions it will be part of the v3.13 kernel.

Thanks,

Ingo

Re: [Xen-devel] [PATCH RESEND v2 0/2] xen: vnuma introduction for pv guest

2013-11-18 Thread Dario Faggioli

On lun, 2013-11-18 at 16:58 -0500, Elena Ufimtseva wrote:
> Xen vnuma introduction.
> 
> The patchset introduces vnuma to paravirtualized Xen guests
> running as domU.
> Xen subop hypercall is used to retrieve vnuma topology information.
> Based on the retrieved topology from Xen, NUMA number of nodes,
> memory ranges, distance table and cpumask is being set.
> If initialization is incorrect, sets 'dummy' node and unsets
> nodemask. vNUMA topology is constructed by Xen toolstack.
> 
> Example of vnuma enabled pv domain dmesg:
> 
Mmm... So, why the resend? When you do things like this, you usually
tell the reason, if only, to let people know what happened and which
series they should actually look at and review. :-)

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)




Re: [Xen-devel] [PATCH v2 1/2] xen: vnuma support for PV guests running as domU

2013-11-18 Thread Dario Faggioli

On lun, 2013-11-18 at 15:25 -0500, Elena Ufimtseva wrote:
> Signed-off-by: Elena Ufimtseva 

> diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
> index 96ab2c0..de9deab 100644
> --- a/arch/x86/xen/Makefile
> +++ b/arch/x86/xen/Makefile
> @@ -13,7 +13,7 @@ CFLAGS_mmu.o:= $(nostackp)
>  obj-y:= enlighten.o setup.o multicalls.o mmu.o irq.o \
>   time.o xen-asm.o xen-asm_$(BITS).o \
>   grant-table.o suspend.o platform-pci-unplug.o \
> - p2m.o
> + p2m.o vnuma.o
>  
>  obj-$(CONFIG_EVENT_TRACING) += trace.o

I think David said something about this during last round (going
fetchin'-cuttin'-pastin' it):

"
obj-$(CONFIG_NUMA) += vnuma.o

Then you can remove the #ifdef CONFIG_NUMA from xen/vnuma.c
"
 
> diff --git a/arch/x86/xen/vnuma.c b/arch/x86/xen/vnuma.c

> +/* 
> + * Called from numa_init if numa_off = 0;
^ if numa_off = 1 ?

> + * we set numa_off = 0 if xen_vnuma_supported()
> + * returns true and its a domU;
> + */
> +int __init xen_numa_init(void)
> +{

> + if (nr_nodes > num_possible_cpus()) {
> + pr_debug("vNUMA: Node without cpu is not supported in this version.\n");
> + goto out;
> + }
> +
This is a super-minor thing, but I wouldn't say "in this version". It
makes people think that there will be a later version where that will be
supported, which we don't know. :-)

> + /*
> +  * Set a dummy node and return success.  This prevents calling any
> +  * hardware-specific initializers which do not work in a PV guest.
> +  * Taken from dummy_numa_init code.
> +  */
>
This is a lot better... Thanks! :-)

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R Ltd., Cambridge (UK)




[PATCH 2/2] f2fs: introduce a bio array for per-page write bios

2013-11-18 Thread Jaegeuk Kim

f2fs has three bio types, NODE, DATA, and META, and manages some data
structures for each bio type.

The code is a little bit messy, so this patch introduces a bio array
which groups the individual data structures as follows.

struct f2fs_bio_info {
	struct bio *bio;		/* bios to merge */
	sector_t last_block_in_bio;	/* last block number */
	struct mutex io_mutex;		/* mutex for bio */
};

struct f2fs_sb_info {
	...
	struct f2fs_bio_info write_io[NR_PAGE_TYPE];	/* for write bios */
	...
};

The code changes from this new data structure are trivial.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/f2fs.h| 12 +---
 fs/f2fs/segment.c | 44 ++--
 fs/f2fs/super.c   |  2 +-
 3 files changed, 32 insertions(+), 26 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e9038bb..05f8fe1 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -361,6 +361,12 @@ enum page_type {
META_FLUSH,
 };
 
+struct f2fs_bio_info {
+   struct bio *bio;/* bios to merge */
+   sector_t last_block_in_bio; /* last block number */
+   struct mutex io_mutex;  /* mutex for bio */
+};
+
 struct f2fs_sb_info {
struct super_block *sb; /* pointer to VFS super block */
struct proc_dir_entry *s_proc;  /* proc entry */
@@ -374,9 +380,9 @@ struct f2fs_sb_info {
 
/* for segment-related operations */
struct f2fs_sm_info *sm_info;   /* segment manager */
-   struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
-   sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
-   struct mutex write_mutex[NR_PAGE_TYPE]; /* mutex for writing IOs */
+
+   /* for bio operations */
+   struct f2fs_bio_info write_io[NR_PAGE_TYPE];/* for write bios */
 
/* for checkpoint */
struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 119af0b..9607cc4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -838,65 +838,65 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
 {
int rw = sync ? WRITE_SYNC : WRITE;
enum page_type btype = PAGE_TYPE_OF_BIO(type);
-   struct bio *bio = sbi->bio[btype];
+   struct f2fs_bio_info *io = &sbi->write_io[btype];
struct bio_private *p;
 
-   if (!bio)
+   if (!io->bio)
return;
 
-   sbi->bio[btype] = NULL;
-
if (type >= META_FLUSH)
rw = WRITE_FLUSH_FUA;
if (btype == META)
rw |= REQ_META;
 
-   p = bio->bi_private;
+   p = io->bio->bi_private;
p->sbi = sbi;
-   bio->bi_end_io = f2fs_end_io_write;
+   io->bio->bi_end_io = f2fs_end_io_write;
 
-   trace_f2fs_do_submit_bio(sbi->sb, btype, sync, bio);
+   trace_f2fs_do_submit_bio(sbi->sb, btype, sync, io->bio);
 
if (type == META_FLUSH) {
DECLARE_COMPLETION_ONSTACK(wait);
p->is_sync = true;
p->wait = &wait;
-   submit_bio(rw, bio);
+   submit_bio(rw, io->bio);
wait_for_completion(&wait);
} else {
p->is_sync = false;
-   submit_bio(rw, bio);
+   submit_bio(rw, io->bio);
}
+   io->bio = NULL;
 }
 
 void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type, bool sync)
 {
-   enum page_type btype = PAGE_TYPE_OF_BIO(type);
+   struct f2fs_bio_info *io = &sbi->write_io[PAGE_TYPE_OF_BIO(type)];
 
-   if (!sbi->bio[btype])
+   if (!io->bio)
return;
 
-   mutex_lock(&sbi->write_mutex[btype]);
+   mutex_lock(&io->io_mutex);
do_submit_bio(sbi, type, sync);
-   mutex_unlock(&sbi->write_mutex[btype]);
+   mutex_unlock(&io->io_mutex);
 }
 
 static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
block_t blk_addr, enum page_type type)
 {
struct block_device *bdev = sbi->sb->s_bdev;
+   struct f2fs_bio_info *io = &sbi->write_io[type];
int bio_blocks;
 
verify_block_addr(sbi, blk_addr);
 
-   mutex_lock(&sbi->write_mutex[type]);
+   mutex_lock(&io->io_mutex);
 
inc_page_count(sbi, F2FS_WRITEBACK);
 
-   if (sbi->bio[type] && sbi->last_block_in_bio[type] != blk_addr - 1)
+   if (io->bio && io->last_block_in_bio != blk_addr - 1)
do_submit_bio(sbi, type, false);
 alloc_new:
-   if (sbi->bio[type] == NULL) {
+   if (io->bio == NULL) {
struct bio_private *priv;
 retry:
priv = kmalloc(sizeof(struct bio_private), GFP_NOFS);
@@ -906,9 +906,9 @@ retry:
}
 
bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
-   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
-   sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi, blk_addr);
-

[PATCH 1/2] f2fs: disable the extent cache ops on high fragmented files

2013-11-18 Thread Jaegeuk Kim

The f2fs manages an extent cache to search a number of consecutive data blocks
very quickly.

However it conducts unnecessary cache operations if the file is highly
fragmented with no valid extent cache.

In such a case, we don't need to handle the extent cache and can simply
disable the cache facility.

Nevertheless, this patch gives one more chance to enable the extent cache.

For example,
1. create a file
2. write data sequentially which produces a large valid extent cache
3. update some data, resulting in a fragmented extent
4. if the fragmented extent is too small, then drop extent cache
5. close the file

6. open the file again
7. give another chance to make a new extent cache
8. write data sequentially again which creates another big extent cache.
...
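
The drop-and-retry behaviour in the steps above can be sketched as
follows. This is a simplified model, not the f2fs code: struct ext_state
and both helpers are hypothetical, and how the flag is cleared when the
file is opened again is not shown in the diff, so reopen() is an
assumption.

```c
#include <stdbool.h>

#define MIN_EXTENT_LEN 16	/* mirrors F2FS_MIN_EXTENT_LEN in the patch */

/* Hypothetical in-memory per-inode state; not the real f2fs_inode_info. */
struct ext_state {
	unsigned int len;	/* length of the cached extent, 0 = none */
	bool no_extent;		/* set once the cache has been given up */
};

/* Called after an update shrank the extent; returns true if dropped. */
static bool maybe_drop_extent(struct ext_state *e)
{
	if (e->no_extent)
		return false;		/* cache already disabled */
	if (e->len < MIN_EXTENT_LEN) {
		e->len = 0;
		e->no_extent = true;	/* skip cache ops from now on */
		return true;
	}
	return false;
}

/* Assumed: reopening starts from fresh in-memory state. */
static void reopen(struct ext_state *e)
{
	e->no_extent = false;
}
```

Once the flag is set, lookups and updates can return early without
taking the extent lock, which is where the saving comes from.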

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c | 22 ++
 fs/f2fs/f2fs.h |  3 +++
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 5920639..2e54522 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -71,6 +71,9 @@ static int check_extent_cache(struct inode *inode, pgoff_t pgofs,
pgoff_t start_fofs, end_fofs;
block_t start_blkaddr;
 
+   if (is_inode_flag_set(fi, FI_NO_EXTENT))
+   return 0;
+
read_lock(&fi->ext.ext_lock);
if (fi->ext.len == 0) {
read_unlock(&fi->ext.ext_lock);
@@ -109,6 +112,7 @@ void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
struct f2fs_inode_info *fi = F2FS_I(dn->inode);
pgoff_t fofs, start_fofs, end_fofs;
block_t start_blkaddr, end_blkaddr;
+   int need_update = true;
 
f2fs_bug_on(blk_addr == NEW_ADDR);
fofs = start_bidx_of_node(ofs_of_node(dn->node_page), fi) +
@@ -117,6 +121,9 @@ void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
/* Update the page address in the parent node */
__set_data_blkaddr(dn, blk_addr);
 
+   if (is_inode_flag_set(fi, FI_NO_EXTENT))
+   return;
+
write_lock(&fi->ext.ext_lock);
 
start_fofs = fi->ext.fofs;
@@ -163,14 +170,21 @@ void update_extent_cache(block_t blk_addr, struct dnode_of_data *dn)
fofs - start_fofs + 1;
fi->ext.len -= fofs - start_fofs + 1;
}
-   goto end_update;
+   } else {
+   need_update = false;
}
-   write_unlock(&fi->ext.ext_lock);
-   return;
 
+   /* Finally, if the extent is very fragmented, let's drop the cache. */
+   if (fi->ext.len < F2FS_MIN_EXTENT_LEN) {
+   fi->ext.len = 0;
+   set_inode_flag(fi, FI_NO_EXTENT);
+   need_update = true;
+   }
 end_update:
write_unlock(&fi->ext.ext_lock);
-   sync_inode_page(dn);
+   if (need_update)
+   sync_inode_page(dn);
+   return;
 }
 
 struct page *find_data_page(struct inode *inode, pgoff_t index, bool sync)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index c2de549..e9038bb 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -169,6 +169,8 @@ enum {
 #define F2FS_LINK_MAX  32000   /* maximum link count per file */
 
 /* for in-memory extent cache entry */
+#define F2FS_MIN_EXTENT_LEN    16  /* minimum extent length */
+
 struct extent_info {
rwlock_t ext_lock;  /* rwlock for consistency */
unsigned int fofs;  /* start offset in a file */
@@ -889,6 +891,7 @@ enum {
FI_NO_ALLOC,/* should not allocate any blocks */
FI_UPDATE_DIR,  /* should update inode block for consistency */
FI_DELAY_IPUT,  /* used for the recovery */
+   FI_NO_EXTENT,   /* not to use the extent cache */
FI_INLINE_XATTR,/* used for inline xattr */
 };
 
-- 
1.8.4.474.g128a96c


Re: [PATCH/RFC] wait_for_completion_timeout() considered harmful.

2013-11-18 Thread Ingo Molnar


* Jonathan Corbet  wrote:

> On Tue, 19 Nov 2013 00:42:09 +0100
> Peter Zijlstra  wrote:
> 
> > I briefly talked to Thomas about this earlier today and we need to 
> > fix this at a lower level -- the quick 'n dirty solution is to add 
> > 1 jiffy down in the timer-wheel when we enqueue these things.
> 
> That can lead to situations like the one I encountered years ago 
> where msleep(1) would snooze for 20ms.  I didn't get much love for 
> my idea of switching msleep() to hrtimers back then, but I still 
> think it might be better to provide the resolution that the 
> interface appears to promise.

That looks like a sensible approach - mind resending that patch? We 
can put it into the timer tree and see what happens.
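
For reference, the arithmetic behind the 20ms figure can be sketched
like this. It assumes a jiffy-based sleep that rounds the request up to
whole jiffies and then adds one more jiffy so it never returns early;
HZ=100 is an assumption, since the thread does not say which HZ was in
use. Both helper names are made up for illustration.

```c
/* Round a millisecond count up to whole jiffies for a given HZ. */
static unsigned long ms_to_jiffies(unsigned long ms, unsigned long hz)
{
	return (ms * hz + 999) / 1000;
}

/*
 * Upper bound, in ms, of a sleep that rounds up to jiffies and then
 * adds one extra jiffy so that it never returns early.
 */
static unsigned long worst_case_sleep_ms(unsigned long ms, unsigned long hz)
{
	unsigned long jiffies = ms_to_jiffies(ms, hz) + 1;

	return jiffies * 1000 / hz;
}
```

With HZ=100 a 1ms request becomes two jiffies, i.e. up to 20ms; with
HZ=1000 the same request is bounded by 2ms.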

Thanks,

Ingo

Re: perf bug: bad page map

2013-11-18 Thread Ingo Molnar


* Vince Weaver  wrote:

> On Mon, 18 Nov 2013, One Thousand Gnomes wrote:
> 
> > On Mon, 18 Nov 2013 11:41:22 -0500 (EST)
> > Vince Weaver  wrote:
> > 
> > > On Mon, 18 Nov 2013, Peter Zijlstra wrote:
> > > 
> > > > On Fri, Nov 15, 2013 at 01:04:23PM -0500, Vince Weaver wrote:
> > > > > 
> > > > > (figured out the minicom issue).
> > > > > 
> > > > > Anyway while trying to reproduce the last bug I instead got this with
> > > > > the perf_fuzzer.
> > > > > 
> > > > > Is it worth continuing to run and report these issues?  I'm losing track
> > > > > of all the open bugs.
> > > > 
> > > > This looks like ext4. Not entirely sure how perf ties into this.
> > > 
> > > It's believable the filesystem could have issues (it's a fuzzer machine, 
> > > so it's had 100+ unclean shutdowns on an SSD drive in the past few months)
> > > but as far as I know there shouldn't have been any filesystem accesses 
> > > happening at all when the bug triggered.
> > 
> > Obvious question - does it pass fsck currently. If it does then
> > presumably it was sane at the time it went pop ?
> 
> # e2fsck -f /dev/sda1
> e2fsck 1.42.8 (20-Jun-2013)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/sda1: 620972/3514368 files (0.5% non-contiguous), 9796212/14047744 blocks
> 
> so it looks clean now...

Also, in no way should a corrupted filesystem be able to provoke 
kernel crashes. So even if the filesystem had errors, this would still 
be a kernel bug we need to fix.

Thanks,

Ingo

Re: [PATCH] x86/ACPI: Make Sony Vaio Z1 series to use "reboot=pci" default

2013-11-18 Thread Ingo Molnar

* Adam Williamson  wrote:

> On Fri, 2013-11-01 at 15:02 +0100, Joerg Roedel wrote:
> > On Sun, Oct 27, 2013 at 12:06:29AM -0700, Adam Williamson wrote:
> > > On Sat, 2013-10-26 at 11:15 +0200, Ingo Molnar wrote:
> > > >   CONFIG_IRQ_REMAP=y
> > > 
> > > Agh, sorry - I had this down in my mind as a boot time 
> > > parameter, not a compile time option. I'm off on vacation for a 
> > > week in the morning, and it's too late to wait around for a 
> > > kernel compile tonight :/ So I'll have to check this one when I 
> > > get back. Sorry again.
> > 
> > Note, instead of recompiling the kernel, you can also pass 
> > 'intremap=off' on the kernel cmdline to disable interrupt 
> > remapping and test with that.
> 
> Sorry for the delay, folks - just got back to this. Booting with 
> 'intremap=off' results in a slow reboot, i.e., doesn't fix the bug. 
> Is that a sufficient test, Ingo, or do you still want me to build 
> with CONFIG_IRQ_REMAP=n and try that?

That should be a sufficient test, I suspect.

Can you disable virtualization in the BIOS - does that affect reboot 
speed?

I'm just shooting into the dark here - if you can make your system 
boot bzImages then you might as well be better off trying to bisect 
it.

On Fedora booting bzImages of vanilla kernels is reasonably 
straightforward: a 'make localconfig' done while you are booted into a 
Fedora kernel ought to pick up everything into your .config and you 
won't need any modules to be able to boot up to userspace. That should 
ease bisection.

Thanks,

Ingo

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Ingo Molnar


* David Ahern  wrote:

> This is mmap'ed output, not the ring buffers or its stack. As the 
> output file grows, new pages are needed and those are allocated on 
> access via page faults. The ftruncate only extends the file size, it 
> does not allocate pages at that time.

Hm, doesn't MAP_POPULATE prefault pages in this case as well? 
Prefaulting would avoid the most obvious page fault driven feedback 
loops and it would probably be faster as well, because it avoids all 
the pagefaults ...
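
A minimal userspace sketch of what is being suggested: extend the file
with ftruncate() and map it with MAP_POPULATE so the kernel is asked to
fault pages in up front. The helper name map_output_prefaulted() is
hypothetical and this is not the perf code; whether prefaulting removes
all later faults for perf's output file is exactly the open question
here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for MAP_POPULATE */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Extend fd to len bytes and map it with prefaulting requested.
 * Returns the mapping, or MAP_FAILED on error.
 */
static void *map_output_prefaulted(int fd, size_t len)
{
	if (ftruncate(fd, (off_t)len) != 0)
		return MAP_FAILED;

	/* MAP_POPULATE asks the kernel to populate the page tables now. */
	return mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_POPULATE, fd, 0);
}
```

The caller writes into the returned region as before and unmaps it with
munmap() when moving on to the next chunk of the file.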

Thanks,

Ingo

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Ingo Molnar


* Namhyung Kim  wrote:

> > Well, with 1 khz sampling of a single threaded workload it's 8MB 
> > per second - that's 80 MB for 10 seconds profiling - not the end 
> > of the world.
> 
> We now use 4 khz sampling frequency by default, just FYI. :)

Yes, but if someone samples with 1 khz that's still plenty enough for 
a long enough run.

With 4k sampling you could do 2.5 seconds of profiling and still have 
80 MB of data :-)
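
The numbers work out as below. The per-sample size is derived from the
"1 khz is 8MB per second" figure quoted above, taking 1 MB as 10^6
bytes; both are assumptions of this sketch, and profile_mb() is a
made-up helper.

```c
/* Bytes per sample implied by "1 kHz -> 8 MB per second". */
#define BYTES_PER_SAMPLE (8UL * 1000 * 1000 / 1000)

/* Data volume in MB for a given sampling frequency and duration. */
static double profile_mb(unsigned long freq_hz, double seconds)
{
	return (double)freq_hz * seconds * BYTES_PER_SAMPLE / 1e6;
}
```

profile_mb(1000, 10.0) and profile_mb(4000, 2.5) both come out at 80,
which is the point being made: the same budget just buys a shorter run.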

Thanks,

Ingo

Re: [PATCH] Add a text_poke syscall

2013-11-18 Thread Ingo Molnar


* Andi Kleen  wrote:

> From: Andi Kleen 
> 
> Properly patching running code ("cross modification")
> is a quite complicated business on x86.
> 
> The CPU has specific rules that need to be followed, including
> multiple global barriers.
> 
> Self modifying code is getting more popular, so it's important
> to make it easy to follow the rules.
> 
> The kernel does it properly with text_poke_bp(). But the same
> method is hard to do for user programs.
> 
> This patch adds a (x86 specific) text_poke() syscall that exposes
> the text_poke_bp() machinery to user programs.
> 
> The interface is practically the same as text_poke_bp, just as
> a syscall. I added an extra timeout parameter, that
> will potentially allow batching the global barriers in
> the future. Right now it is enforced to be 0.
> 
> The call also still has a global lock, so it has some scaling
> limitations. If it was commonly used this could be fixed
> by setting up a list of break point locations. Then
> a lock would only be held to modify the list.
> 
> Right now the implementation is just as simple as possible.
> 
> Proposed man page:
> 
> NAME
>   text_poke - Safely modify running instructions (x86)
> 
> SYNOPSIS
>   int text_poke(void *addr, const void *opcode, size_t len,
> void (*handler)(void), int timeout);
> 
> DESCRIPTION
>   The text_poke system call allows safely modifying code that may
>   be currently executing in parallel on other threads.
>   Patch the instruction at addr with the new instructions
>   at opcode of length len. The target instruction will temporarily
>   be patched with a break point, before it is replaced
>   with the final replacement instruction. When the break point
>   hits the code handler will be called in the context
>   of the thread. The handler does not save any registers
>   and cannot return. Typically it would consist of the
>   original instruction and then a jump to after the original
>   instruction. The handler is only needed during the
>   patching process and can be overwritten once the syscall
>   returns. timeout defines an optional timeout to indicate
>   to the kernel how long the patching could be delayed.
>   Right now it has to be 0.
> 
> EXAMPLE
> 
> volatile int finished;
> 
> extern char patch[], recovery[], repl[];
> 
> struct res {
> long total;
> long val1, val2, handler;
> };
> 
> int text_poke(void *insn, void *repl, int len, void *handler, int to)
> {
> return syscall(314, insn, repl, len, handler, to);
> }
> 
> void *tfunc(void *arg)
> {
> struct res *res = (struct res *)arg;
> 
> while (!finished) {
> int val;
> asm volatile(   ".globl patch\n"
> ".globl recovery\n"
> ".global repl\n"
>   /* original code to be patched */
> "patch: mov $1,%0\n"
> "1:\n"
> ".section \".text.patchup\",\"x\"\n"
>   /* Called when a race happens during patching.
>  Just execute the original code and jump 
> back. */
> "recovery:\n"
> " mov $3,%0\n"
> " jmp 1b\n"
>   /* replacement code that gets patched in: */
> "repl:\n"
> " mov $2,%0\n"
> ".previous" : "=a" (val));
> if (val == 1)
> res->val1++;
> else if (val == 3)
> res->handler++;
> else
> res->val2++;
> res->total++;
> }
> return NULL;
> }
> 
> int main(int ac, char **av)
> {
> int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> int ps = sysconf(_SC_PAGESIZE);
> pthread_t pthr[ncpus];
> struct res res[ncpus];
> int i;
> 
> srand(1);
> memset(&res, 0, sizeof(struct res) * ncpus);
> mprotect(patch - (unsigned long)patch % ps, ps,
>PROT_READ|PROT_WRITE|PROT_EXEC);
> for (i = 0; i < ncpus - 1; i++)
> pthread_create(&pthr[i], NULL, tfunc, &res[i]);
> for (i = 0; i < 50; i++) {
> text_poke(patch, repl, 5, recovery, 0);
> nanosleep(&((struct timespec) { 0, rand() % 100 }), NULL);
> text_poke(repl, patch, 5, recovery, 0);
> }
> finished = 1;
> for (i = 0; i < ncpus - 1; i++) {
> pthread_join(pthr[i], NULL);
> printf("%d: val1 %lu val2 %lu handler %lu to %lu\n",
> i, res[i].val1, res[i].val2,

RE: [f2fs-dev] [PATCH 1/2] f2fs: clean up the do_submit_bio flow

2013-11-18 Thread Jaegeuk Kim

Hi,

2013-11-19 (Tue), 13:25 +0800, Chao Yu:
> Hi
> 
> > -Original Message-
> > From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
> > Sent: Monday, November 18, 2013 5:12 PM
> > Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> > linux-f2fs-de...@lists.sourceforge.net
> > Subject: [f2fs-dev] [PATCH 1/2] f2fs: clean up the do_submit_bio flow
> > 
> > This patch introduces PAGE_TYPE_OF_BIO() and cleans up do_submit_bio() with 
> > it.
> > 
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/f2fs.h|  1 +
> >  fs/f2fs/segment.c | 39 +--
> >  2 files changed, 22 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > index fe5c2fc..1c783fd 100644
> > --- a/fs/f2fs/f2fs.h
> > +++ b/fs/f2fs/f2fs.h
> > @@ -351,6 +351,7 @@ enum count_type {
> >   * with waiting the bio's completion
> >   * ... Only can be used with META.
> >   */
> > +#define PAGE_TYPE_OF_BIO(type) (type) > META ? META : (type)


I'll add parentheses as you suggested. Thanks.

> >  enum page_type {
> > DATA,
> > NODE,
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> > index 1f83999..dad5f1a 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -837,32 +837,35 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
> > enum page_type type, bool sync)
> >  {
> > int rw = sync ? WRITE_SYNC : WRITE;
> > -   enum page_type btype = type > META ? META : type;
> > +   enum page_type btype = PAGE_TYPE_OF_BIO(type);
> 
> ->f2fs_submit_bio()
>   : enum page_type btype = PAGE_TYPE_OF_BIO(type);
>   ->do_submit_bio()
>   : enum page_type btype = PAGE_TYPE_OF_BIO(type);
> 
> Could we remove PAGE_TYPE_OF_BIO, or use f2fs_bug_on instead,
> in do_submit_bio()? It looks redundant, and
> submit_write_page() will not pass a type larger than META.

f2fs_submit_bio(type) calls do_submit_bio(type), and there the type can be
META_FLUSH when it comes from sync_meta_pages().
So, we need to do this. :)
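
For anyone following along, here is a small user-space sketch of the two
points in this exchange: META_FLUSH has to be folded onto the META bio, and
the macro wants outer parentheses. It is illustrative only. The enum is a
stand-in whose ordering is assumed, and bio_slot(), posted_plus_ten() and
fixed_plus_ten() are hypothetical helpers that do not exist in f2fs.

```c
#include <assert.h>

/* Stand-in for the f2fs enum; only these names appear in the thread and
 * their relative order is an assumption. */
enum page_type { DATA, NODE, META, META_FLUSH };

/* As posted: the expansion is not wrapped in parentheses. */
#define PAGE_TYPE_OF_BIO_POSTED(type) (type) > META ? META : (type)

/* With the outer parentheses agreed on in this reply. */
#define PAGE_TYPE_OF_BIO(type) ((type) > META ? META : (type))

/* Hypothetical helper: index of the per-type bio a request lands in. */
static int bio_slot(enum page_type type)
{
	int slot = PAGE_TYPE_OF_BIO(type);

	assert(slot <= META);	/* META_FLUSH must share the META bio */
	return slot;
}

/* Expands to (type) > META ? META : ((type) + 10): the "+ 10" is
 * swallowed by the else branch. */
static int posted_plus_ten(enum page_type type)
{
	return PAGE_TYPE_OF_BIO_POSTED(type) + 10;
}

/* Expands to ((type) > META ? META : (type)) + 10, as intended. */
static int fixed_plus_ten(enum page_type type)
{
	return PAGE_TYPE_OF_BIO(type) + 10;
}
```

With these definitions bio_slot(META_FLUSH) yields META, while
posted_plus_ten(META_FLUSH) evaluates to 2 and fixed_plus_ten(META_FLUSH)
to 12.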

-- 
Jaegeuk Kim
Samsung


[PATCH v3] i2c: s3c2410: dont need CPU_FREQ transitions for exynos series

2013-11-18 Thread Naveen Krishna Chatradhi

For Exynos4 and Exynos5 SoCs from Samsung the i2c clock is based
on a fixed 66 MHz peripheral clock, and therefore is completely
independent of the cpu frequency.
Thus, registering for a CPU freq notifier is very wasteful.

This patch modifies the code such that the i2c bus registers for
cpu_freq_transition only if CONFIG_CPU_FREQ_S3C24XX is enabled.

This change should save a bunch of cpufreq transition calls
that do not apply to Exynos SoCs.

Signed-off-by: Naveen Krishna Chatradhi 
Acked-by: Kyungmin Park 
Reviewed-by: Doug Anderson 
---
Changes since v2:
None, Rebased on for-next of linux-i2c git repo.

Changes since v1:
Use CONFIG_CPU_FREQ_S3C24XX instead of (CONFIG_CPU_FREQ & !CONFIG_EXYNOS)
As commented by Tomasz

 drivers/i2c/busses/i2c-s3c2410.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
index bf8fb94..fa51dff 100644
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -123,7 +123,7 @@ struct s3c24xx_i2c {
struct s3c2410_platform_i2c *pdata;
int gpios[2];
struct pinctrl  *pctrl;
-#ifdef CONFIG_CPU_FREQ
+#if defined(CONFIG_CPU_FREQ_S3C24XX)
struct notifier_block   freq_transition;
 #endif
 };
@@ -843,7 +843,7 @@ static int s3c24xx_i2c_clockrate(struct s3c24xx_i2c *i2c, 
unsigned int *got)
return 0;
 }
 
-#ifdef CONFIG_CPU_FREQ
+#if defined(CONFIG_CPU_FREQ_S3C24XX)
 
 #define freq_to_i2c(_n) container_of(_n, struct s3c24xx_i2c, freq_transition)
 
-- 
1.7.10.4


Re: [PATCH] pci: check PCI_EXP_FLAGS_SLOT before setting hotplug bridge

2013-11-18 Thread Adam Lee

On Mon, Nov 18, 2013 at 10:38:17AM -0700, Bjorn Helgaas wrote:
> [+cc Myron, Amos, Thomas, Ben]
> 
> On Mon, Nov 18, 2013 at 2:40 AM, Adam Lee  wrote:
> > This patch adds the PCI_EXP_FLAGS_SLOT check back before setting
> > hotplug bridge, which is omitted by an API switching commit,
> > 59875ae489609b2267548dc85160c5f0f0c6f9d4 "PCI/core: Use PCI Express
> > Capability accessors".
> >
> > Some Lenovo laptops hang in booting without this fix.
> 
> What kernel version hangs?  I suspect you might be missing 6d3a1741f1
> ("PCI: Support PCIe Capability Slot registers only for ports with
> slots"), because it *looks* like the current kernel should work
> correctly even without your patch.

No, applying 6d3a1741f1 and d3694d4fa3 doesn't fix the hang.

It hangs in acpi_evaluate_integer() on kernels from
59875ae489609b2267548dc85160c5f0f0c6f9d4 "PCI/core: Use PCI Express
Capability accessors" up to, but not including,
ac212b6980d8d5eda705864fc5a8ecddc6d6eacc "ACPI / processor: Use common
hotplug infrastructure", i.e. 3.4~3.11. (double confirmed)

I didn't mention this because:
1. That check was obviously dropped by mistake; an API switching patch
should not remove things like that.
2. I have run some tests, and adding the check back is harmless.
3. I believe ac212b6 only works around the hang by accident; the bug
still exists.
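
To make the check under discussion concrete, here is a minimal user-space
sketch of the decision, assuming the two registers have already been read.
The bit values are the ones from the kernel's pci_regs.h; struct
pcie_port_regs and is_hotplug_bridge() are hypothetical names used only for
this illustration, not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as defined in the kernel's pci_regs.h. */
#define PCI_EXP_FLAGS_SLOT	0x0100		/* Slot implemented */
#define PCI_EXP_SLTCAP_HPC	0x00000040	/* Hot-Plug Capable */

/* Hypothetical snapshot of the two registers involved. */
struct pcie_port_regs {
	uint16_t flags;		/* PCI Express Capabilities register */
	uint32_t slot_cap;	/* Slot Capabilities register */
};

static bool is_hotplug_bridge(const struct pcie_port_regs *regs)
{
	assert(regs != NULL);

	/* No slot implemented: do not trust the slot capability bits. */
	if (!(regs->flags & PCI_EXP_FLAGS_SLOT))
		return false;

	return (regs->slot_cap & PCI_EXP_SLTCAP_HPC) != 0;
}
```

The point of the first test is that a port without an implemented slot is
never treated as a hotplug bridge, whatever its slot capability bits say.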

-- 
Adam Lee

Re: [PATCH v2] itg3200: add dt support.

2013-11-18 Thread NeilBrown

On Tue, 19 Nov 2013 02:49:38 +0100 Sebastian Reichel  wrote:

> Hi,
> 
> On Tue, Nov 19, 2013 at 11:30:13AM +1100, NeilBrown wrote:
> > No new configuration, just a 'compatible' string and
> > documentation.
> 
> itg3200 looks like a candidate for the list of trivial i2c
> devices [0] to me.
> 
> [0] Documentation/devicetree/bindings/i2c/trivial-devices.txt
> 
> -- Sebastian

Hmmm... a file that isn't referenced anywhere else in the kernel
documentation, so it is unlikely to be found except by people who know it is
there.

I can see that having lots of files for trivial devices is rather clumsy, but
there must be a better way.
I don't suppose we could just put the documentation in the device-driver file
in some format similar to kernel-doc and just extract it if it is wanted
separately?  I think people read code a lot more than they read
documentation. I know I do.

But I can re-send with an update to trivial-devices.txt if that is what is
wanted.

Thanks,
NeilBrown


[PATCH] cpufreq/stats: Add "unknown" frequency field in stats tables

2013-11-18 Thread Viresh Kumar

commit 46a310b ([CPUFREQ] Don't set stat->last_index to -1 if the pol->cur has
incorrect value.) tries to handle the case where policy->cur does not match any
entry in freq_table.

As indicated in the above commit, the exact-match search in freq_table_get_index
returns -1, which is stored in stat->last_index. However, as a result of
the above commit, cpufreq_stat_notifier_trans, which updates the statistics,
fails to record any *further* valid transitions, because stat->last_index
stays at -1: the condition occurs at boot and is never resolved.

To fix this issue, let's create another entry in time_in_state and trans_table
that tells the user that the CPU was running at an unknown frequency for some time.
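
The idea can be sketched in user space as follows. This is only an
illustration under assumptions: struct freq_stats is a simplified stand-in
for struct cpufreq_stats, FREQ_UNKNOWN mirrors the -1 marker the patch
compares against, and returning the last index for a miss is my reading of
the intent, not a copy of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinel stored in the last table slot. */
#define FREQ_UNKNOWN ((unsigned int)-1)

/* Hypothetical, simplified stand-in for struct cpufreq_stats. */
struct freq_stats {
	const unsigned int *freq_table;	/* last entry is FREQ_UNKNOWN */
	size_t max_state;		/* entries, sentinel included */
};

/* Exact-match lookup that never fails: misses go to the last slot. */
static size_t freq_index(const struct freq_stats *stat, unsigned int freq)
{
	size_t i;

	assert(stat->max_state > 0);

	/* Search only the real frequencies, not the sentinel. */
	for (i = 0; i < stat->max_state - 1; i++)
		if (stat->freq_table[i] == freq)
			return i;

	/* Out-of-table frequency: account it to the "unknown" slot. */
	return stat->max_state - 1;
}
```

Because the lookup always returns a valid index, later transitions keep
being counted even if the frequency seen at boot was not in the table.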

This is what the output looks like on my thinkpad (some entries removed to keep
it simple):

$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state 
280 46
260 138
120 65
100 152
80 34803
unknown 0

$ cat /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table 
   From  :    To
         :   2801000       280       260       240       220       200   unknown
  2801000: 01520 91317         0
      280:        13         0         4         1         0         1         0
      260:        26         1         0         5         1         1         0
      240:        11         0         6         0         1         1         0
      220:         8         1         5         3         0         0         0
      200:        11         1         2         1         2         0         0
  unknown:         0         0         0         0         0         0         0

Reported-by: Carlos Hernandez 
Reported-and-tested-by: Nishanth Menon 
Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/cpufreq_stats.c | 45 -
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 4cf0d28..ebb21cd 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -72,9 +72,13 @@ static ssize_t show_time_in_state(struct cpufreq_policy 
*policy, char *buf)
return 0;
cpufreq_stats_update(stat->cpu);
for (i = 0; i < stat->state_num; i++) {
-   len += sprintf(buf + len, "%u %llu\n", stat->freq_table[i],
-   (unsigned long long)
-   jiffies_64_to_clock_t(stat->time_in_state[i]));
+   if (stat->freq_table[i] == -1)
+   len += sprintf(buf + len, "unknown");
+   else
+   len += sprintf(buf + len, "%u", stat->freq_table[i]);
+
+   len += sprintf(buf + len, " %llu\n", (unsigned long long)
+   jiffies_64_to_clock_t(stat->time_in_state[i]));
}
return len;
 }
@@ -94,8 +98,12 @@ static ssize_t show_trans_table(struct cpufreq_policy 
*policy, char *buf)
for (i = 0; i < stat->state_num; i++) {
if (len >= PAGE_SIZE)
break;
-   len += snprintf(buf + len, PAGE_SIZE - len, "%9u ",
-   stat->freq_table[i]);
+   if (stat->freq_table[i] == -1)
+   len += snprintf(buf + len, PAGE_SIZE - len, "%9s ",
+   "unknown");
+   else
+   len += snprintf(buf + len, PAGE_SIZE - len, "%9u ",
+   stat->freq_table[i]);
}
if (len >= PAGE_SIZE)
return PAGE_SIZE;
@@ -106,8 +114,12 @@ static ssize_t show_trans_table(struct cpufreq_policy 
*policy, char *buf)
if (len >= PAGE_SIZE)
break;
 
-   len += snprintf(buf + len, PAGE_SIZE - len, "%9u: ",
-   stat->freq_table[i]);
+   if (stat->freq_table[i] == -1)
+   len += snprintf(buf + len, PAGE_SIZE - len, "%9s: ",
+   "unknown");
+   else
+   len += snprintf(buf + len, PAGE_SIZE - len, "%9u: ",
+   stat->freq_table[i]);
 
for (j = 0; j < stat->state_num; j++) {
if (len >= PAGE_SIZE)
@@ -145,10 +157,12 @@ static struct attribute_group stats_attr_group = {
 static int freq_table_get_index(struct cpufreq_stats *stat, unsigned int freq)
 {
int index;
-   for (index = 0; index < stat->max_state; index++)
+   for (index = 0; index < stat->max_state - 1; index++)
if (stat->freq_table[index] == freq)
return index;
-   return -1;
+
+   /* Last state is INVALID, to mark out of table frequency */
+

[PATCH 9/9 v2] vfio pci: Add vfio iommu implementation for FSL_PAMU

2013-11-18 Thread Bharat Bhushan

This patch adds vfio iommu support for Freescale IOMMU (PAMU -
Peripheral Access Management Unit).

The Freescale PAMU is an aperture-based IOMMU with the following
characteristics.  Each device has an entry in a table in memory
describing the iova->phys mapping. The mapping has:
   -an overall aperture that is power of 2 sized, and has a start iova that
is naturally aligned
   -has 1 or more windows within the aperture
   -number of windows must be power of 2, max is 256
   -size of each window is determined by aperture size / # of windows
   -iova of each window is determined by aperture start iova / # of windows
   -the mapped region in each window can be different from
the window size; the mapping size must be a power of 2
   -physical address of the mapping must be naturally aligned
with the mapping size
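
As a rough illustration of the geometry described above, here is a small
user-space sketch. It is not taken from the driver: struct aperture and the
helper names are hypothetical, and the window placement (windows laid out
back to back from the aperture start) is an assumption on my part rather
than something this description states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical description of one device's aperture. */
struct aperture {
	uint64_t start;		/* start iova, naturally aligned */
	uint64_t size;		/* power of 2 */
	uint32_t nr_windows;	/* power of 2, at most 256 */
};

static bool aperture_valid(const struct aperture *a)
{
	return is_pow2(a->size) &&
	       (a->start & (a->size - 1)) == 0 &&
	       is_pow2(a->nr_windows) &&
	       a->nr_windows <= 256 &&
	       a->nr_windows <= a->size;
}

/* Window size is aperture size / number of windows. */
static uint64_t window_size(const struct aperture *a)
{
	assert(aperture_valid(a));
	return a->size / a->nr_windows;
}

/* Assumption: window idx starts idx window-sizes into the aperture. */
static uint64_t window_iova(const struct aperture *a, uint32_t idx)
{
	assert(idx < a->nr_windows);
	return a->start + (uint64_t)idx * window_size(a);
}
```

For example, a 256 MiB aperture at 0x40000000 split into 4 windows gives
64 MiB windows, and under the layout assumed here window 3 starts at
0x4C000000.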

Some of the code is derived from TYPE1 iommu (drivers/vfio/vfio_iommu_type1.c).

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - Use lock around msi-dma list
 - check for overlap between dma and msi-dma pages
 - Some code cleanup as per various comments

 drivers/vfio/Kconfig   |6 +
 drivers/vfio/Makefile  |1 +
 drivers/vfio/vfio_iommu_fsl_pamu.c | 1003 
 include/uapi/linux/vfio.h  |  100 
 4 files changed, 1110 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vfio/vfio_iommu_fsl_pamu.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 26b3d9d..7d1da26 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -8,11 +8,17 @@ config VFIO_IOMMU_SPAPR_TCE
depends on VFIO && SPAPR_TCE_IOMMU
default n
 
+config VFIO_IOMMU_FSL_PAMU
+   tristate
+   depends on VFIO
+   default n
+
 menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
select VFIO_IOMMU_TYPE1 if X86
select VFIO_IOMMU_SPAPR_TCE if (PPC_POWERNV || PPC_PSERIES)
+   select VFIO_IOMMU_FSL_PAMU if FSL_PAMU
help
  VFIO provides a framework for secure userspace device drivers.
  See Documentation/vfio.txt for more details.
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index c5792ec..7461350 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_VFIO) += vfio.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o 
vfio_iommu_spapr_tce.o
+obj-$(CONFIG_VFIO_IOMMU_FSL_PAMU) += vfio_iommu_common.o vfio_iommu_fsl_pamu.o
 obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_fsl_pamu.c 
b/drivers/vfio/vfio_iommu_fsl_pamu.c
new file mode 100644
index 000..66efc84
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_fsl_pamu.c
@@ -0,0 +1,1003 @@
+/*
+ * VFIO: IOMMU DMA mapping support for FSL PAMU IOMMU
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ *
+ * Copyright (C) 2013 Freescale Semiconductor, Inc.
+ *
+ * Author: Bharat Bhushan 
+ *
+ * This file is derived from driver/vfio/vfio_iommu_type1.c
+ *
+ * The Freescale PAMU is an aperture-based IOMMU with the following
+ * characteristics.  Each device has an entry in a table in memory
+ * describing the iova->phys mapping. The mapping has:
+ *  -an overall aperture that is power of 2 sized, and has a start iova that
+ *   is naturally aligned
+ *  -has 1 or more windows within the aperture
+ * -number of windows must be power of 2, max is 256
+ * -size of each window is determined by aperture size / # of windows
+ * -iova of each window is determined by aperture start iova / # of windows
+ * -the mapped region in each window can be different than
+ *  the window size...mapping must power of 2
+ * -physical address of the mapping must be naturally aligned
+ *  with the mapping size
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include  /* pci_bus_type */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_iommu_common.h"
+
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Bharat Bhushan "
+#define DRIVER_DESC "FSL PAMU IOMMU driver for VFIO"
+
+struct vfio_iommu {
+   struct iommu_domain *domain;
+   struct mutexlock;
+   dma_addr_t  aperture_start;
+   dma_addr_t

Re: [GIT] Security subsystem updates for 3.13

2013-11-18 Thread James Morris

On Mon, 18 Nov 2013, Linus Torvalds wrote:

> On Mon, Nov 18, 2013 at 3:30 PM, James Morris  wrote:
> > On Mon, 18 Nov 2013, Josh Boyer wrote:
> >>
> >> Unless I'm missing something, I don't think this has landed in Linus'
> >> tree yet.  Linus, did this pull request get NAKed or fall through the
> >> cracks?
> >
> > I think Linus is on vacation and merging only sporadically at the moment.
> 
> No, while it is true that I've been traveling, I've also been actively
> merging, and no, that pull request isn't lost.
> 
> I don't really like the look of it though (particularly all the magic
> new keys changes), so I've been delaying it and merging other regular
> stuff that I don't have any issues with. I'm leaving that pull for
> later in order to go over it more carefully when I'm not doing fifteen
> other pulls the same day..
> 
> If somebody wants to explain the rationale for the new keys code, that might
> help.

Thanks -- I've cc'd David for comment.


-- 
James Morris


Re: [PATCH RFC 00/10] ARM: STi: Add dwmac glue and reset controller

2013-11-18 Thread Giuseppe CAVALLARO


On 11/12/2013 2:51 PM, srinivas.kandaga...@st.com wrote:

From: Srinivas Kandagatla 

Hi All,

This patch series adds Ethernet support to STi series SOCs STiH415 and STiH416.
The STi SOC series integrates the dwmac IP from Synopsys; however, there is a
hardware glue on top of this standard IP, and this glue needs to be configured
before the actual dwmac can be used.
To support this, a new driver, dwmac-sti, is introduced whose responsibility is
to configure the dwmac glue before the dwmac driver runs; this is achieved by
making the dwmac device node a child of the ethernet glue node. Inspired by
usb/dwc3.
Also, the glue needs to come out of softreset, which is why we have added a
softreset controller driver, which looked perfectly neat, rather than
driving the softreset bit from the glue driver.

Also, as part of power management in the glue driver, I found that there was no
function to determine if the child device is a wakeup source or not.
I have added a new API, device_child_may_wakeup, which could be useful for
drivers like this. The "PM / wakeup : Introduce device_child_may_wakeup" patch
has that new API and the "net: stmmac:sti: Add STi SOC glue driver." glue driver
uses it.

The reason for combining all these patches in a same series is because of
dependencies.

This patch series is tested on B2000 and B2020 boards with STiH415, STiH416
SOC on ethernet 100/1000 Links.

Comments?


Hello Srini,
All these patches are ok for me and, as you know, I have already started
using them while porting other SoCs. Glue logic is mandatory now!

Thanks
peppe



Thanks,
srini

Srinivas Kandagatla (6):
   drivers: reset: stih415: add softreset controller
   drivers: reset: stih416: add softreset controller
   PM / wakeup : Introduce device_child_may_wakeup
   net: stmmac:sti: Add STi SOC glue driver.
   ARM: STi: Add STiH415 ethernet support.
   ARM: STi: Add STiH416 ethernet support.

Stephen Gallimore (4):
   drivers: reset: STi SoC system configuration reset controller support
   drivers: reset: Reset controller driver for STiH415
   drivers: reset: Reset controller driver for STiH416
   ARM: STi: Add reset controller support to mach-sti Kconfig

  .../devicetree/bindings/net/sti-dwmac.txt  |   45 +++
  .../devicetree/bindings/reset/st,sti-powerdown.txt |   46 +++
  .../devicetree/bindings/reset/st,sti-softreset.txt |   45 +++
  arch/arm/boot/dts/stih415-clock.dtsi   |   14 +
  arch/arm/boot/dts/stih415-pinctrl.dtsi |   82 ++
  arch/arm/boot/dts/stih415.dtsi |   67 +
  arch/arm/boot/dts/stih416-clock.dtsi   |   14 +
  arch/arm/boot/dts/stih416-pinctrl.dtsi |  106 +++
  arch/arm/boot/dts/stih416.dtsi |   69 +
  arch/arm/boot/dts/stih41x-b2000.dtsi   |   32 +++
  arch/arm/boot/dts/stih41x-b2020.dtsi   |   33 +++
  arch/arm/mach-sti/Kconfig  |3 +
  drivers/base/power/wakeup.c|   23 ++
  drivers/net/ethernet/stmicro/stmmac/Makefile   |1 +
  drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c|  294 
  drivers/reset/Kconfig  |2 +
  drivers/reset/Makefile |3 +
  drivers/reset/sti/Kconfig  |   15 +
  drivers/reset/sti/Makefile |4 +
  drivers/reset/sti/reset-stih415.c  |   99 +++
  drivers/reset/sti/reset-stih416.c  |  101 +++
  drivers/reset/sti/reset-syscfg.c   |  186 
  drivers/reset/sti/reset-syscfg.h   |   69 +
  .../dt-bindings/reset-controller/stih415-resets.h  |   23 ++
  .../dt-bindings/reset-controller/stih416-resets.h  |   25 ++
  include/linux/pm_wakeup.h  |1 +
  26 files changed, 1402 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/devicetree/bindings/net/sti-dwmac.txt
  create mode 100644 Documentation/devicetree/bindings/reset/st,sti-powerdown.txt
  create mode 100644 Documentation/devicetree/bindings/reset/st,sti-softreset.txt
  create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
  create mode 100644 drivers/reset/sti/Kconfig
  create mode 100644 drivers/reset/sti/Makefile
  create mode 100644 drivers/reset/sti/reset-stih415.c
  create mode 100644 drivers/reset/sti/reset-stih416.c
  create mode 100644 drivers/reset/sti/reset-syscfg.c
  create mode 100644 drivers/reset/sti/reset-syscfg.h
  create mode 100644 include/dt-bindings/reset-controller/stih415-resets.h
  create mode 100644 include/dt-bindings/reset-controller/stih416-resets.h




RE: [f2fs-dev] [PATCH 1/2] f2fs: clean up the do_submit_bio flow

2013-11-18 Thread Chao Yu

Hi

> -Original Message-
> From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
> Sent: Monday, November 18, 2013 5:12 PM
> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 
> linux-f2fs-de...@lists.sourceforge.net
> Subject: [f2fs-dev] [PATCH 1/2] f2fs: clean up the do_submit_bio flow
> 
> This patch introduces PAGE_TYPE_OF_BIO() and cleans up do_submit_bio() with 
> it.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h|  1 +
>  fs/f2fs/segment.c | 39 +--
>  2 files changed, 22 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index fe5c2fc..1c783fd 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -351,6 +351,7 @@ enum count_type {
>   *   with waiting the bio's completion
>   * ...   Only can be used with META.
>   */
> +#define PAGE_TYPE_OF_BIO(type)   (type) > META ? META : (type)
>  enum page_type {
>   DATA,
>   NODE,
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 1f83999..dad5f1a 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -837,32 +837,35 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
>   enum page_type type, bool sync)
>  {
>   int rw = sync ? WRITE_SYNC : WRITE;
> - enum page_type btype = type > META ? META : type;
> + enum page_type btype = PAGE_TYPE_OF_BIO(type);

->f2fs_submit_bio()
: enum page_type btype = PAGE_TYPE_OF_BIO(type);
->do_submit_bio()
: enum page_type btype = PAGE_TYPE_OF_BIO(type);

Could we remove PAGE_TYPE_OF_BIO, or use f2fs_bug_on instead,
in do_submit_bio()? It looks redundant, and
submit_write_page() will not pass a type larger than META.

> + struct bio *bio = sbi->bio[btype];
> + struct bio_private *p;
> +
> + if (!bio)
> + return;
> +
> + sbi->bio[btype] = NULL;
> 
>   if (type >= META_FLUSH)
>   rw = WRITE_FLUSH_FUA;
> -
>   if (btype == META)
>   rw |= REQ_META;
> 
> - if (sbi->bio[btype]) {
> - struct bio_private *p = sbi->bio[btype]->bi_private;
> - p->sbi = sbi;
> - sbi->bio[btype]->bi_end_io = f2fs_end_io_write;
> + p = bio->bi_private;
> + p->sbi = sbi;
> + bio->bi_end_io = f2fs_end_io_write;
> 
> - trace_f2fs_do_submit_bio(sbi->sb, btype, sync, sbi->bio[btype]);
> + trace_f2fs_do_submit_bio(sbi->sb, btype, sync, bio);
> 
> - if (type == META_FLUSH) {
> - DECLARE_COMPLETION_ONSTACK(wait);
> - p->is_sync = true;
> - p->wait = &wait;
> - submit_bio(rw, sbi->bio[btype]);
> - wait_for_completion();
> - wait_for_completion(&wait);
> - p->is_sync = false;
> - submit_bio(rw, sbi->bio[btype]);
> - }
> - sbi->bio[btype] = NULL;
> + if (type == META_FLUSH) {
> + DECLARE_COMPLETION_ONSTACK(wait);
> + p->is_sync = true;
> + p->wait = &wait;
> + submit_bio(rw, bio);
> + wait_for_completion(&wait);
> + } else {
> + p->is_sync = false;
> + submit_bio(rw, bio);
>   }
>  }
> 
> --
> 1.8.4.474.g128a96c
> 
> 

[PATCH 1/9 v2] pci:msi: add weak function for returning msi region info

2013-11-18 Thread Bharat Bhushan

In an aperture type of IOMMU (like FSL PAMU), the VFIO-iommu system needs to know
the MSI region in order to map its window in h/w. This patch only defines the
required weak functions; they will be used by follow-up patches.
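
For readers unfamiliar with the weak-function pattern, a user-space analogue
is sketched below. It assumes a GCC or Clang toolchain (the weak attribute is
a compiler extension), uses a simplified copy of struct msi_region with
uint64_t standing in for dma_addr_t, and is meant only to show the mechanism,
not to reproduce the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified copy of the structure added by this patch. */
struct msi_region {
	int region_num;
	uint64_t addr;	/* dma_addr_t in the kernel */
	size_t size;
};

/* Weak defaults: used unless another object file supplies a strong
 * definition with the same name. They report "no MSI regions". */
__attribute__((weak)) int arch_msi_get_region_count(void)
{
	return 0;
}

__attribute__((weak)) int arch_msi_get_region(int region_num,
					       struct msi_region *region)
{
	(void)region_num;
	(void)region;
	return 0;
}

/* Generic entry points; callers never name the arch hooks directly. */
int msi_get_region_count(void)
{
	return arch_msi_get_region_count();
}

int msi_get_region(int region_num, struct msi_region *region)
{
	assert(region != NULL);
	return arch_msi_get_region(region_num, region);
}
```

An architecture that does have MSI regions would provide non-weak versions
of the two arch_ functions, and the linker would pick those instead of the
defaults.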

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - Added description on "struct msi_region" 

 drivers/pci/msi.c   |   22 ++
 include/linux/msi.h |   14 ++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index d5f90d6..2643a29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -67,6 +67,28 @@ int __weak arch_msi_check_device(struct pci_dev *dev, int 
nvec, int type)
return chip->check_device(chip, dev, nvec, type);
 }
 
+int __weak arch_msi_get_region_count(void)
+{
+   return 0;
+}
+
+int __weak arch_msi_get_region(int region_num, struct msi_region *region)
+{
+   return 0;
+}
+
+int msi_get_region_count(void)
+{
+   return arch_msi_get_region_count();
+}
+EXPORT_SYMBOL(msi_get_region_count);
+
+int msi_get_region(int region_num, struct msi_region *region)
+{
+   return arch_msi_get_region(region_num, region);
+}
+EXPORT_SYMBOL(msi_get_region);
+
 int __weak arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 {
struct msi_desc *entry;
diff --git a/include/linux/msi.h b/include/linux/msi.h
index b17ead8..ade1480 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -51,6 +51,18 @@ struct msi_desc {
 };
 
 /*
+ * This structure is used to get
+ * - physical address
+ * - size
+ * of a msi region
+ */
+struct msi_region {
+   int region_num; /* MSI region number */
+   dma_addr_t addr; /* Address of MSI region */
+   size_t size; /* Size of MSI region */
+};
+
+/*
  * The arch hooks to setup up msi irqs. Those functions are
  * implemented as weak symbols so that they /can/ be overriden by
  * architecture specific code if needed.
@@ -64,6 +76,8 @@ void arch_restore_msi_irqs(struct pci_dev *dev, int irq);
 
 void default_teardown_msi_irqs(struct pci_dev *dev);
 void default_restore_msi_irqs(struct pci_dev *dev, int irq);
+int arch_msi_get_region_count(void);
+int arch_msi_get_region(int region_num, struct msi_region *region);
 
 struct msi_chip {
struct module *owner;
-- 
1.7.0.4



[PATCH 4/9 v2] powerpc: msi: Extend the msi region interface to get info from fsl_msi

2013-11-18 Thread Bharat Bhushan

The FSL MSI will provide the interface to get:
  - Number of MSI regions (which is the number of MSI banks for powerpc)
  - The region address range: the physical page which holds the
    address/addresses used for generating MSI interrupts,
    and the size of the page.

These are required to create IOMMU (Freescale PAMU) mapping for
devices which are directly assigned using VFIO.
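
The region computation amounts to rounding the MSIIR address down to its
page. A minimal user-space sketch, assuming a 4 KiB page; SKETCH_PAGE_SIZE,
struct region and msiir_to_region() are hypothetical names for illustration,
and the address used in the note below is made up:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical result type: page-aligned base plus size. */
struct region {
	uint64_t addr;
	uint64_t size;
};

/* Report the whole page that contains the 4-byte MSIIR register. */
static struct region msiir_to_region(uint64_t msiir)
{
	struct region r;

	r.size = SKETCH_PAGE_SIZE;
	assert((r.size & (r.size - 1)) == 0);	/* mask trick needs 2^n */
	r.addr = msiir & ~(r.size - 1);
	return r;
}
```

For a made-up MSIIR address of 0xffe41740 this reports a 4096-byte region
starting at 0xffe41000, which is the granularity an IOMMU window mapping
can work with.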

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - Atomic increment of bank index for parallel probe of msi node 

 arch/powerpc/sysdev/fsl_msi.c |   42 +++-
 arch/powerpc/sysdev/fsl_msi.h |   11 -
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index 77efbae..eeebbf0 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -109,6 +109,34 @@ static int fsl_msi_init_allocator(struct fsl_msi *msi_data)
return 0;
 }
 
+static int fsl_msi_get_region_count(void)
+{
+   int count = 0;
+   struct fsl_msi *msi_data;
+
+   list_for_each_entry(msi_data, &msi_head, list)
+   count++;
+
+   return count;
+}
+
+static int fsl_msi_get_region(int region_num, struct msi_region *region)
+{
+   struct fsl_msi *msi_data;
+
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if (msi_data->bank_index == region_num) {
+   region->region_num = msi_data->bank_index;
+   /* Setting PAGE_SIZE as MSIIR is a 4 byte register */
+   region->size = PAGE_SIZE;
+   region->addr = msi_data->msiir & ~(region->size - 1);
+   return 0;
+   }
+   }
+
+   return -ENODEV;
+}
+
 static int fsl_msi_check_device(struct pci_dev *pdev, int nvec, int type)
 {
if (type == PCI_CAP_ID_MSIX)
@@ -150,7 +178,8 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int 
hwirq,
if (reg && (len == sizeof(u64)))
address = be64_to_cpup(reg);
else
-   address = fsl_pci_immrbar_base(hose) + msi_data->msiir_offset;
+   address = fsl_pci_immrbar_base(hose) +
+  (msi_data->msiir & 0xf);
 
msg->address_lo = lower_32_bits(address);
msg->address_hi = upper_32_bits(address);
@@ -393,6 +422,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
const struct fsl_msi_feature *features;
int len;
u32 offset;
+   static atomic_t bank_index = ATOMIC_INIT(-1);
 
match = of_match_device(fsl_of_msi_ids, &dev->dev);
if (!match)
@@ -436,18 +466,15 @@ static int fsl_of_msi_probe(struct platform_device *dev)
dev->dev.of_node->full_name);
goto error_out;
}
-   msi->msiir_offset =
-   features->msiir_offset + (res.start & 0xf);
 
/*
 * First read the MSIIR/MSIIR1 offset from dts
 * On failure use the hardcode MSIIR offset
 */
if (of_address_to_resource(dev->dev.of_node, 1, &msiir))
-   msi->msiir_offset = features->msiir_offset +
-   (res.start & MSIIR_OFFSET_MASK);
+   msi->msiir = res.start + features->msiir_offset;
else
-   msi->msiir_offset = msiir.start & MSIIR_OFFSET_MASK;
+   msi->msiir = msiir.start;
}
 
msi->feature = features->fsl_pic_ip;
@@ -521,6 +548,7 @@ static int fsl_of_msi_probe(struct platform_device *dev)
}
}
 
+   msi->bank_index = atomic_inc_return(&bank_index);
list_add_tail(&msi->list, &msi_head);
 
/* The multiple setting ppc_md.setup_msi_irqs will not harm things */
@@ -528,6 +556,8 @@ static int fsl_of_msi_probe(struct platform_device *dev)
ppc_md.setup_msi_irqs = fsl_setup_msi_irqs;
ppc_md.teardown_msi_irqs = fsl_teardown_msi_irqs;
ppc_md.msi_check_device = fsl_msi_check_device;
+   ppc_md.msi_get_region_count = fsl_msi_get_region_count;
+   ppc_md.msi_get_region = fsl_msi_get_region;
} else if (ppc_md.setup_msi_irqs != fsl_setup_msi_irqs) {
dev_err(&dev->dev, "Different MSI driver already installed!\n");
err = -ENODEV;
diff --git a/arch/powerpc/sysdev/fsl_msi.h b/arch/powerpc/sysdev/fsl_msi.h
index df9aa9f..a2cc5a2 100644
--- a/arch/powerpc/sysdev/fsl_msi.h
+++ b/arch/powerpc/sysdev/fsl_msi.h
@@ -31,14 +31,21 @@ struct fsl_msi {
struct irq_domain *irqhost;
 
unsigned long cascade_irq;
-
-   u32 msiir_offset; /* Offset of MSIIR, relative to start of CCSR */
+   phys_addr_t msiir; /* MSIIR Address in CCSR */
u32 ibs_shift; /* Shift of interrupt bit select */
u32 srs_shift; /* Shift of the shared interrupt register select */

Re: [PATCH RFC 0/9]net: stmmac PM related fixes.

2013-11-18 Thread Giuseppe CAVALLARO


On 11/18/2013 12:30 PM, srinivas.kandaga...@st.com wrote:

From: Srinivas Kandagatla 

Hi Peppe,

During PM_SUSPEND_FREEZE testing, I noticed that PM support in STMMAC is
partly broken. I had to rearrange the code to do PM correctly. There were a lot
of things I personally did not like, and some bits did not work in the first
place. I thought this was a good opportunity to clean up the mess.

Here is what I did:

1> Test PM suspend freeze via pm_test
It did not work for the following reason:
  - Power to the gmac may be removed when it enters a low power state.
stmmac_resume could not cope with such behaviour: it expected the IP
register contents to be the same as before entering low power. This
assumption is wrong. So I started to add some code to do hardware
initialization, and that is when I started to rearrange the code. stmmac_open
contains both resource and memory allocations and hardware initialization. I
had to separate these two things into two different functions.

These two patches do that
   net: stmmac: move dma allocation to new function
   net: stmmac: move hardware setup for stmmac_open to new function

The rest of the patches fix loose ends, things like mdio
reset, which might be necessary in cases like hibernation (I did not test).

In hibernation cases the driver was just unregistering with subsystems and
releasing resources, which I did not like, and it is not necessary to do this as
part of PM. So using the same stmmac_suspend/resume made more sense for
hibernation cases than using stmmac_open/release.
Also fixed a NULL pointer dereference bug.

2> Test WOL via PM_SUSPEND_FREEZE
I did get a wakeup interrupt, but it could not wake up a frozen system.
So I had to add pm_wakeup_event to the driver; see the patch
net: stmmac: notify the PM core of a wakeup event.

Also, a few patches like
   net: stmmac: make stmmac_mdio_reset non-static
   net: stmmac: restore pinstate in pm resume.
help the resume function to reset the phy and put the pins back in the default
state.

Comments?


Srini, as we discussed internally before the patches were sent to the
net ML, I agree with your work. Some parts of the PM code were fully
tested on our product kernels (where the PM was a bit different,
especially on HoM) and nobody raised issues with this code to me.
Also, some of the rework you did, for example moving the dma allocation into
a new function, is fine with me.

So you continue to have my Acked-by for all.

peppe



Thanks,
srini

Srinivas Kandagatla (9):
   net: stmmac: support max-speed device tree property
   net: stmmac: mdio: remove reset gpio free
   net: stmmac: move dma allocation to new function
   net: stmmac: move hardware setup for stmmac_open to new function
   net: stmmac: make stmmac_mdio_reset non-static
   net: stmmac: fix power management suspend-resume case
   net: stmmac: use suspend functions for hibernation
   net: stmmac: restore pinstate in pm resume.
   net: stmmac: notify the PM core of a wakeup event.

  drivers/net/ethernet/stmicro/stmmac/stmmac.h   |4 +-
  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  |  360 ++--
  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c  |3 +-
  .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |   51 +--
  include/linux/stmmac.h |1 +
  5 files changed, 209 insertions(+), 210 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH 6/9 v2] powerpc: pci: Extend msi iova page setup to arch specific

2013-11-18 Thread Bharat Bhushan

This patch extends the interface to arch-specific code for setting
the msi iova address for a msi page. Machine-specific code is not yet
implemented.

Signed-off-by: Bharat Bhushan 
---
v2
 - new patch

 arch/powerpc/include/asm/machdep.h |2 ++
 arch/powerpc/kernel/msi.c  |   10 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8d1b787..e87b806 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -132,6 +132,8 @@ struct machdep_calls {
/* Returns the requested region's address and size */
int (*msi_get_region)(int region_num,
  struct msi_region *region);
+   int (*msi_set_iova)(struct pci_dev *pdev, int region_num,
+   dma_addr_t iova, bool set);
 #endif
 
void(*restart)(char *cmd);
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 1a67787..e2bd555 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -13,6 +13,16 @@
 
 #include 
 
+int arch_msi_set_iova(struct pci_dev *pdev, int region_num,
+ dma_addr_t iova, bool set)
+{
+   if (ppc_md.msi_set_iova) {
+   pr_debug("msi: Using platform set_iova routine.\n");
+   return ppc_md.msi_set_iova(pdev, region_num, iova, set);
+   }
+   return 0;
+}
+
 int arch_msi_get_region_count(void)
 {
if (ppc_md.msi_get_region_count) {
-- 
1.7.0.4



[PATCH 7/9 v2] pci: msi: Extend msi iova setting interface to powerpc arch

2013-11-18 Thread Bharat Bhushan

Now we keep track of devices which have an msi page mapped to a specific
iova page, for each msi bank. When composing the MSI address and data,
this list is traversed. If the device is found in the list then the
configured iova page is used; otherwise the iova page is taken as before.
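
The lookup described above can be illustrated with a minimal user-space sketch. This is not the driver code: the names `iova_dev` and `compose_msi_address` are hypothetical, a plain singly linked list stands in for the kernel list, and locking is left out. It assumes, as the hunk for fsl_compose_msi_msg further down does, that the low 12 bits of the MSIIR address are kept as the offset within the configured iova page.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one (device, iova page) entry of an MSI bank. */
struct iova_dev {
	const void *dev;        /* opaque device identity */
	uint64_t iova;          /* page-aligned iova configured for this device */
	struct iova_dev *next;
};

/*
 * Return the address a device should write to in order to raise an MSI.
 * If the device has a configured iova page, combine it with the in-page
 * offset of the MSIIR register; otherwise fall back to the default
 * address computed by the caller.
 */
static uint64_t compose_msi_address(const struct iova_dev *head,
				    const void *dev, uint64_t msiir,
				    uint64_t default_addr)
{
	const struct iova_dev *d;

	for (d = head; d != NULL; d = d->next) {
		if (d->dev == dev)
			return d->iova | (msiir & 0xfff);
	}
	return default_addr;
}
```

A device without an entry keeps the address it had before this patch, so only devices that were explicitly given an iova page see different behaviour.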

Signed-off-by: Bharat Bhushan 
---
v2
 - new patch

 arch/powerpc/sysdev/fsl_msi.c |   90 +
 arch/powerpc/sysdev/fsl_msi.h |   16 ++-
 2 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
index eeebbf0..52d2beb 100644
--- a/arch/powerpc/sysdev/fsl_msi.c
+++ b/arch/powerpc/sysdev/fsl_msi.c
@@ -137,6 +137,75 @@ static int fsl_msi_get_region(int region_num, struct msi_region *region)
return -ENODEV;
 }
 
+/* Add device to the list of devices which have an iova page mapping */
+static int fsl_msi_add_iova_device(struct fsl_msi *msi_data,
+  struct pci_dev *pdev, dma_addr_t iova)
+{
+   struct fsl_msi_device *device;
+
+   mutex_lock(&msi_data->lock);
+   list_for_each_entry(device, &msi_data->device_list, list) {
+   /* If mapping already exists then update with new page mapping */
+   if (device->dev == pdev) {
+   device->iova = iova;
+   mutex_unlock(&msi_data->lock);
+   return 0;
+   }
+   }
+
+   device = kzalloc(sizeof(struct fsl_msi_device), GFP_KERNEL);
+   if (!device) {
+   pr_err("%s: Memory allocation failed\n", __func__);
+   mutex_unlock(&msi_data->lock);
+   return -ENOMEM;
+   }
+
+   device->dev = pdev;
+   device->iova = iova;
+   list_add_tail(&device->list, &msi_data->device_list);
+   mutex_unlock(&msi_data->lock);
+   return 0;
+}
+
+/* Remove device from the list of devices which have an iova page mapping */
+static int fsl_msi_del_iova_device(struct fsl_msi *msi_data,
+  struct pci_dev *pdev)
+{
+   struct fsl_msi_device *device;
+
+   mutex_lock(&msi_data->lock);
+   list_for_each_entry(device, &msi_data->device_list, list) {
+   if (device->dev == pdev) {
+   list_del(&device->list);
+   kfree(device);
+   break;
+   }
+   }
+   mutex_unlock(&msi_data->lock);
+   return 0;
+}
+
+/* set/clear device iova mapping for the requested msi region */
+static int fsl_msi_set_iova(struct pci_dev *pdev, int region_num,
+   dma_addr_t iova, bool set)
+{
+   struct fsl_msi *msi_data;
+   int ret = -EINVAL;
+
+   list_for_each_entry(msi_data, &msi_head, list) {
+   if (msi_data->bank_index != region_num)
+   continue;
+
+   if (set)
+   ret = fsl_msi_add_iova_device(msi_data, pdev, iova);
+   else
+   ret = fsl_msi_del_iova_device(msi_data, pdev);
+
+   break;
+   }
+   return ret;
+}
+
 static int fsl_msi_check_device(struct pci_dev *pdev, int nvec, int type)
 {
if (type == PCI_CAP_ID_MSIX)
@@ -167,6 +236,7 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
struct msi_msg *msg,
struct fsl_msi *fsl_msi_data)
 {
+   struct fsl_msi_device *device;
struct fsl_msi *msi_data = fsl_msi_data;
struct pci_controller *hose = pci_bus_to_host(pdev->bus);
u64 address; /* Physical address of the MSIIR */
@@ -181,6 +251,15 @@ static void fsl_compose_msi_msg(struct pci_dev *pdev, int hwirq,
address = fsl_pci_immrbar_base(hose) +
   (msi_data->msiir & 0xf);
 
+   mutex_lock(&msi_data->lock);
+   list_for_each_entry(device, &msi_data->device_list, list) {
+   if (device->dev == pdev) {
+   address = device->iova | (msi_data->msiir & 0xfff);
+   break;
+   }
+   }
+   mutex_unlock(&msi_data->lock);
+
msg->address_lo = lower_32_bits(address);
msg->address_hi = upper_32_bits(address);
 
@@ -356,6 +435,7 @@ static int fsl_of_msi_remove(struct platform_device *ofdev)
struct fsl_msi *msi = platform_get_drvdata(ofdev);
int virq, i;
struct fsl_msi_cascade_data *cascade_data;
+   struct fsl_msi_device *device;
 
if (msi->list.prev != NULL)
list_del(&msi->list);
@@ -371,6 +451,13 @@ static int fsl_of_msi_remove(struct platform_device *ofdev)
msi_bitmap_free(&msi->bitmap);
if ((msi->feature & FSL_PIC_IP_MASK) != FSL_PIC_IP_VMPIC)
iounmap(msi->msi_regs);
+
+   mutex_lock(&msi->lock);
+   list_for_each_entry(device, &msi->device_list, list) {
+   list_del(&device->list);
+   kfree(device);
+   }
+   mutex_unlock(&msi->lock);
kfree(msi);
 
return 0;
@@ -436,6 +523,8 @@

Re: [PATCH] Add a text_poke syscall

2013-11-18 Thread H. Peter Anvin

On 11/18/2013 04:27 PM, Andi Kleen wrote:
> 
> Proposed man page:
> 
> NAME
>   text_poke - Safely modify running instructions (x86)
> 
> SYNOPSIS
>   int text_poke(void *addr, const void *opcode, size_t len,
> void (*handler)(void), int timeout);
> 
> DESCRIPTION
>   The text_poke system call allows safely modifying code that may
>   be currently executing in parallel on other threads.
>   Patch the instruction at addr with the new instructions
>   at opcode of length len. The target instruction will temporarily
>   be patched with a break point, before it is replaced
>   with the final replacement instruction. When the break point
>   hits the code handler will be called in the context
>   of the thread. The handler does not save any registers
>   and cannot return. Typically it would consist of the
>   original instruction and then a jump to after the original
>   instruction. The handler is only needed during the
>   patching process and can be overwritten once the syscall
>   returns. timeout defines an optional timeout to indicate
>   to the kernel how long the patching could be delayed.
>   Right now it has to be 0.
> 

I think I would prefer an interface which took a list of patch points,
or implemented only the aspects which are impossible to do in user space.

All we really need in the kernel is the IPI broadcasts - the rest can be
done in user space, including intercepting SIGTRAP.  For userspace it is
probably best to just put a thread to sleep until the patching is
done, which can be done with a futex.

One advantage of doing this in userspace is that the kernel doesn't
have to be responsible for avoiding holding a thread due to a slightly
different SIGTRAP -- it will all come out after the signal handler is
restored, anyway.

That being said, the user space code would really need to be librarized.

-hpa


[PATCH 8/9 v2] vfio: moving some functions in common file

2013-11-18 Thread Bharat Bhushan

Some functions defined in vfio_iommu_type1.c are generic (not specific
to the type1 iommu) and we want to use them for the FSL IOMMU (PAMU) and,
going forward, in the iommu-none driver.
So I have created a new file named vfio_iommu_common.c and moved some
of the generic functions into this file.

I agree (with Alex Williamson and myself :-)) that some more functions
can be moved to this new common file (with some changes in type1/fsl_pamu
and others). But in this patch I avoided making these changes and
just moved the functions which are straightforward and allow me to
get the fsl-powerpc vfio framework in place.

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - removed unnecessary header file inclusion
 - marked functions which are internal to *common.c as static

 drivers/vfio/Makefile|4 +-
 drivers/vfio/vfio_iommu_common.c |  227 ++
 drivers/vfio/vfio_iommu_common.h |   27 +
 drivers/vfio/vfio_iommu_type1.c  |  206 +--
 4 files changed, 257 insertions(+), 207 deletions(-)
 create mode 100644 drivers/vfio/vfio_iommu_common.c
 create mode 100644 drivers/vfio/vfio_iommu_common.h

diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 72bfabc..c5792ec 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,4 +1,4 @@
 obj-$(CONFIG_VFIO) += vfio.o
-obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
-obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
+obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_common.o vfio_iommu_type1.o
+obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_common.o vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_PCI) += pci/
diff --git a/drivers/vfio/vfio_iommu_common.c b/drivers/vfio/vfio_iommu_common.c
new file mode 100644
index 000..08eea71
--- /dev/null
+++ b/drivers/vfio/vfio_iommu_common.c
@@ -0,0 +1,227 @@
+/*
+ * VFIO: Common code for vfio IOMMU support
+ *
+ * Copyright (C) 2012 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ * Author: Bharat Bhushan 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ * Author: Tom Lyon, p...@cisco.com
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static bool disable_hugepages;
+module_param_named(disable_hugepages,
+  disable_hugepages, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_hugepages,
+"Disable VFIO IOMMU support for IOMMU hugepages.");
+
+struct vwork {
+   struct mm_struct *mm;
+   long npage;
+   struct work_struct work;
+};
+
+/* delayed decrement/increment for locked_vm */
+static void vfio_lock_acct_bg(struct work_struct *work)
+{
+   struct vwork *vwork = container_of(work, struct vwork, work);
+   struct mm_struct *mm;
+
+   mm = vwork->mm;
+   down_write(&mm->mmap_sem);
+   mm->locked_vm += vwork->npage;
+   up_write(&mm->mmap_sem);
+   mmput(mm);
+   kfree(vwork);
+}
+
+void vfio_lock_acct(long npage)
+{
+   struct vwork *vwork;
+   struct mm_struct *mm;
+
+   if (!current->mm || !npage)
+   return; /* process exited or nothing to do */
+
+   if (down_write_trylock(&current->mm->mmap_sem)) {
+   current->mm->locked_vm += npage;
+   up_write(&current->mm->mmap_sem);
+   return;
+   }
+
+   /*
+* Couldn't get mmap_sem lock, so must setup to update
+* mm->locked_vm later. If locked_vm were atomic, we
+* wouldn't need this silliness
+*/
+   vwork = kmalloc(sizeof(struct vwork), GFP_KERNEL);
+   if (!vwork)
+   return;
+   mm = get_task_mm(current);
+   if (!mm) {
+   kfree(vwork);
+   return;
+   }
+   INIT_WORK(&vwork->work, vfio_lock_acct_bg);
+   vwork->mm = mm;
+   vwork->npage = npage;
+   schedule_work(&vwork->work);
+}
+
+/*
+ * Some mappings aren't backed by a struct page, for example an mmap'd
+ * MMIO range for our own or another device.  These use a different
+ * pfn conversion and shouldn't be tracked as locked pages.
+ */
+static bool is_invalid_reserved_pfn(unsigned long pfn)
+{
+   if (pfn_valid(pfn)) {
+   bool reserved;
+   struct page *tail = pfn_to_page(pfn);
+   struct page *head = compound_trans_head(tail);
+   reserved = !!(PageReserved(head));
+   if (head != tail) {
+   /*
+* "head" is not a dangling pointer
+* (compound_trans_head takes care of that)
+* but the hugepage may have been split
+* from under us (and we may not hold a
+* reference count on the head page so it can
+

[PATCH 5/9 v2] pci/msi: interface to set an iova for a msi region

2013-11-18 Thread Bharat Bhushan

This patch defines an interface by which a msi page
can be mapped to a specific iova page.

This is a requirement for aperture-type IOMMUs (like Freescale PAMU),
where we map the msi iova page just after the guest memory iova address range.
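
The mechanism the diff below relies on is a weak default that an architecture may override. The following is a user-space sketch of that pattern only, assuming a GCC or Clang toolchain; `WEAK_SYM` and the `dma_addr_t` typedef are stand-ins chosen for this example (the kernel uses its own `__weak` and `dma_addr_t`), and the `bool` parameter is shown as `int` to keep the sketch free of extra headers.

```c
#include <stddef.h>

/* Stand-in for the kernel's __weak annotation (GCC/Clang attribute). */
#define WEAK_SYM __attribute__((weak))

typedef unsigned long long dma_addr_t;  /* stand-in for the kernel type */
struct pci_dev;                         /* opaque in this sketch */

/* Generic default: architectures with no special needs do nothing. */
int WEAK_SYM arch_msi_set_iova(struct pci_dev *pdev, int region_num,
			       dma_addr_t iova, int set)
{
	(void)pdev;
	(void)region_num;
	(void)iova;
	(void)set;
	return 0;
}

/* Public entry point: forwards to whichever definition was linked in. */
int msi_set_iova(struct pci_dev *pdev, int region_num,
		 dma_addr_t iova, int set)
{
	return arch_msi_set_iova(pdev, region_num, iova, set);
}
```

If another object file provides a non-weak `arch_msi_set_iova`, the linker picks that one and the default above is dropped, so callers of `msi_set_iova` need no conditional code.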

Signed-off-by: Bharat Bhushan 
---
v2
 - new patch

 drivers/pci/msi.c   |   13 +
 include/linux/pci.h |8 
 2 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 2643a29..040609f 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -77,6 +77,19 @@ int __weak arch_msi_get_region(int region_num, struct msi_region *region)
return 0;
 }
 
+int __weak arch_msi_set_iova(struct pci_dev *pdev, int region_num,
+dma_addr_t iova, bool set)
+{
+   return 0;
+}
+
+int msi_set_iova(struct pci_dev *pdev, int region_num,
+dma_addr_t iova, bool set)
+{
+   return arch_msi_set_iova(pdev, region_num, iova, set);
+}
+EXPORT_SYMBOL(msi_set_iova);
+
 int msi_get_region_count(void)
 {
return arch_msi_get_region_count();
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c587034..c6d3e58 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1195,6 +1195,12 @@ static inline int msi_get_region(int region_num, struct msi_region *region)
 {
return 0;
 }
+
+static inline int msi_set_iova(struct pci_dev *pdev, int region_num,
+  dma_addr_t iova, bool set)
+{
+   return 0;
+}
 #else
 int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
 int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
@@ -1209,6 +1215,8 @@ void pci_restore_msi_state(struct pci_dev *dev);
 int pci_msi_enabled(void);
 int msi_get_region_count(void);
 int msi_get_region(int region_num, struct msi_region *region);
+int msi_set_iova(struct pci_dev *pdev, int region_num,
+dma_addr_t iova, bool set);
 #endif
 
 #ifdef CONFIG_PCIEPORTBUS
-- 
1.7.0.4



[PATCH 2/9 v2] pci: msi: expose msi region information functions

2013-11-18 Thread Bharat Bhushan

Now that all the interfaces for getting the msi region are defined,
this patch exposes them to the rest of the kernel. They will be used by
the vfio subsystem for setting up the iommu for MSI interrupts of directly
assigned devices.

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - None

 include/linux/pci.h |   13 +
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index da172f9..c587034 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1142,6 +1142,7 @@ struct msix_entry {
u16 entry;  /* driver uses to specify entry, OS writes */
 };
 
+struct msi_region;
 
 #ifndef CONFIG_PCI_MSI
 static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
@@ -1184,6 +1185,16 @@ static inline int pci_msi_enabled(void)
 {
return 0;
 }
+
+static inline int msi_get_region_count(void)
+{
+   return 0;
+}
+
+static inline int msi_get_region(int region_num, struct msi_region *region)
+{
+   return 0;
+}
 #else
 int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
 int pci_enable_msi_block_auto(struct pci_dev *dev, unsigned int *maxvec);
@@ -1196,6 +1207,8 @@ void pci_disable_msix(struct pci_dev *dev);
 void msi_remove_pci_irq_vectors(struct pci_dev *dev);
 void pci_restore_msi_state(struct pci_dev *dev);
 int pci_msi_enabled(void);
+int msi_get_region_count(void);
+int msi_get_region(int region_num, struct msi_region *region);
 #endif
 
 #ifdef CONFIG_PCIEPORTBUS
-- 
1.7.0.4



[PATCH 3/9 v2] powerpc: pci: Add arch specific msi region interface

2013-11-18 Thread Bharat Bhushan

This patch adds the interface to get the msi region information from arch
specific code. The machine specific code is not yet defined.
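
The arch layer added here dispatches through optional function pointers and falls back to 0 when a platform installs nothing. A minimal user-space sketch of that pattern follows; `platform_calls`, `platform_md`, `arch_get_region_count` and `demo_region_count` are hypothetical names for this example and only mirror the shape of `ppc_md` in the diff below.

```c
#include <stddef.h>

/* Table of optional platform hooks, in the spirit of ppc_md. */
struct platform_calls {
	int (*msi_get_region_count)(void);
};

/* Static storage: every hook starts out as NULL. */
static struct platform_calls platform_md;

/* Arch-level wrapper: use the platform hook if one was installed. */
int arch_get_region_count(void)
{
	if (platform_md.msi_get_region_count)
		return platform_md.msi_get_region_count();
	return 0;       /* no platform support: report zero regions */
}

/* Example hook a platform driver might install at probe time. */
static int demo_region_count(void)
{
	return 3;
}
```

A platform driver would assign `platform_md.msi_get_region_count = demo_region_count;` once during initialization; until then callers simply see zero regions.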

Signed-off-by: Bharat Bhushan 
---
v1->v2
 - None

 arch/powerpc/include/asm/machdep.h |8 
 arch/powerpc/kernel/msi.c  |   18 ++
 2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8b48090..8d1b787 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -30,6 +30,7 @@ struct file;
 struct pci_controller;
 struct kimage;
 struct pci_host_bridge;
+struct msi_region;
 
 struct machdep_calls {
char*name;
@@ -124,6 +125,13 @@ struct machdep_calls {
int (*setup_msi_irqs)(struct pci_dev *dev,
  int nvec, int type);
void(*teardown_msi_irqs)(struct pci_dev *dev);
+
+   /* Returns the number of MSI regions (banks) */
+   int (*msi_get_region_count)(void);
+
+   /* Returns the requested region's address and size */
+   int (*msi_get_region)(int region_num,
+ struct msi_region *region);
 #endif
 
void(*restart)(char *cmd);
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 8bbc12d..1a67787 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -13,6 +13,24 @@
 
 #include 
 
+int arch_msi_get_region_count(void)
+{
+   if (ppc_md.msi_get_region_count) {
+   pr_debug("msi: Using platform get_region_count routine.\n");
+   return ppc_md.msi_get_region_count();
+   }
+   return 0;
+}
+
+int arch_msi_get_region(int region_num, struct msi_region *region)
+{
+   if (ppc_md.msi_get_region) {
+   pr_debug("msi: Using platform get_region routine.\n");
+   return ppc_md.msi_get_region(region_num, region);
+   }
+   return 0;
+}
+
 int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
 {
if (!ppc_md.setup_msi_irqs || !ppc_md.teardown_msi_irqs) {
-- 
1.7.0.4



[PATCH 0/9 v2] vfio-pci: add support for Freescale IOMMU (PAMU)

2013-11-18 Thread Bharat Bhushan

From: Bharat Bhushan 

PAMU (FSL IOMMU) has a concept of a primary window and subwindows.
The primary window corresponds to the complete guest iova address space
(including MSI space); in IOMMU API terms this is the
geometry. The IOVA base of each subwindow is determined from the number of
subwindows (configurable using the iommu API).
The MSI I/O page must be within the geometry and the maximum supported
subwindows, so the MSI I/O page is set up just after the guest memory iova space.
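
As a rough illustration of the layout described above, the sketch below computes subwindow bases and the first subwindow available for the MSI page. It rests on assumptions that are not stated in this series: that the geometry is split into equally sized subwindows and that guest memory starts at the geometry base. The function names are hypothetical and the real PAMU driver may impose further constraints.

```c
#include <stdint.h>

/* Base iova of subwindow idx when the geometry is split into count equal parts. */
static uint64_t subwindow_base(uint64_t geom_base, uint64_t geom_size,
			       unsigned int count, unsigned int idx)
{
	return geom_base + (geom_size / count) * idx;
}

/*
 * Index of the first subwindow that starts at or after the end of guest
 * memory, i.e. a candidate slot for the MSI page.
 */
static unsigned int first_free_subwindow(uint64_t geom_size, unsigned int count,
					 uint64_t guest_mem_size)
{
	uint64_t sw_size = geom_size / count;

	return (unsigned int)((guest_mem_size + sw_size - 1) / sw_size);
}
```

With a 2 GiB geometry split into 8 subwindows and 1 GiB of guest memory, the first free subwindow is index 4, which starts exactly where guest memory ends.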

Patches 1/9-4/9 (inclusive) define the interface to get:
  - Number of MSI regions (which is the number of MSI banks for powerpc)
  - MSI-region address range: the physical page which has the
address/addresses used for generating MSI interrupts,
and the size of the page.

Patches 5/9-7/9 (inclusive) define the interface for setting up the
MSI iova-base for a msi region (bank) for a device, so that when the
msi-message is composed this configured iova is used.
Earlier we were using an iommu interface for getting the configured iova,
which was not correct, and Alex Williamson suggested this type of interface.

Patch 8/9 moves some common functions into a separate file so that they
can be used by the FSL_PAMU implementation (the next patch uses them).
They will be used later for the iommu-none implementation. I believe we
can do more of this but will take it step by step.

Finally, the last patch actually adds the support for FSL-PAMU :)

v1->v2
 - Added interface for setting msi iova for a msi region for a device.
   Earlier I added an iommu interface for the same, but as per review comments
   that was removed and a direct interface between vfio and msi is now created.
 - Incorporated review comments (details are in the individual patches)

Bharat Bhushan (9):
  pci:msi: add weak function for returning msi region info
  pci: msi: expose msi region information functions
  powerpc: pci: Add arch specific msi region interface
  powerpc: msi: Extend the msi region interface to get info from
fsl_msi
  pci/msi: interface to set an iova for a msi region
  powerpc: pci: Extend msi iova page setup to arch specific
  pci: msi: Extend msi iova setting interface to powerpc arch
  vfio: moving some functions in common file
  vfio pci: Add vfio iommu implementation for FSL_PAMU

 arch/powerpc/include/asm/machdep.h |   10 +
 arch/powerpc/kernel/msi.c  |   28 +
 arch/powerpc/sysdev/fsl_msi.c  |  132 +-
 arch/powerpc/sysdev/fsl_msi.h  |   25 +-
 drivers/pci/msi.c  |   35 ++
 drivers/vfio/Kconfig   |6 +
 drivers/vfio/Makefile  |5 +-
 drivers/vfio/vfio_iommu_common.c   |  227 
 drivers/vfio/vfio_iommu_common.h   |   27 +
 drivers/vfio/vfio_iommu_fsl_pamu.c | 1003 
 drivers/vfio/vfio_iommu_type1.c|  206 +
 include/linux/msi.h|   14 +
 include/linux/pci.h|   21 +
 include/uapi/linux/vfio.h  |  100 
 14 files changed, 1623 insertions(+), 216 deletions(-)
 create mode 100644 drivers/vfio/vfio_iommu_common.c
 create mode 100644 drivers/vfio/vfio_iommu_common.h
 create mode 100644 drivers/vfio/vfio_iommu_fsl_pamu.c



Re: [PATCH 5/6] watchdog: davinci: reuse driver for keystone arch

2013-11-18 Thread Guenter Roeck


On 11/18/2013 09:18 AM, Ivan Khoronzhuk wrote:

The keystone arch uses the same watchdog IP, so add the "ti,keystone-wdt"
compatible and correct the identity.

The Keystone arch uses clocks in DT and the source clock for the watchdog
has to be specified, so add this to the binding.

Signed-off-by: Ivan Khoronzhuk 
Acked-by: Santosh Shilimkar 


Reviewed-by: Guenter Roeck 


---
  .../devicetree/bindings/watchdog/davinci-wdt.txt   |   11 +--
  drivers/watchdog/Kconfig   |4 ++--
  drivers/watchdog/davinci_wdt.c |3 ++-
  3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt b/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
index e450134..0f1aa99 100644
--- a/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
@@ -1,16 +1,23 @@
-DaVinci Watchdog Timer (WDT) Controller
+Texas Instruments DaVinci/Keystone Watchdog Timer (WDT) Controller

  Required properties:
-- compatible : Should be "ti,davinci-wdt"
+- compatible : Should be "ti,davinci-wdt" or "ti,keystone-wdt"
  - reg : Should contain WDT registers location and length
+- clocks : phandle reference to the controller clock.
+  Required only for Keystone arch. See clock-bindings.txt

  Optional properties:
  - timeout-sec : Contains the watchdog timeout in seconds

+Documentation:
+Davinci DM646x - http://www.ti.com/lit/ug/spruer5b/spruer5b.pdf
+Keystone - http://www.ti.com/lit/ug/sprugv5a/sprugv5a.pdf
+
  Examples:

  wdt: wdt@232 {
compatible = "ti,davinci-wdt";
reg = <0x0232 0x80>;
timeout-sec = <30>;
+   clocks = <>;
  };
diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index d7db13d..addfc2c 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -270,12 +270,12 @@ config IOP_WATCHDOG

  config DAVINCI_WATCHDOG
tristate "DaVinci watchdog"
-   depends on ARCH_DAVINCI
+   depends on ARCH_DAVINCI || ARCH_KEYSTONE
select WATCHDOG_CORE
select WATCHDOG_NOWAYOUT
help
  Say Y here if to include support for the watchdog timer
- in the DaVinci DM644x/DM646x processors.
+ in the DaVinci DM644x/DM646x or Keystone processors.
  To compile this driver as a module, choose M here: the
  module will be called davinci_wdt.

diff --git a/drivers/watchdog/davinci_wdt.c b/drivers/watchdog/davinci_wdt.c
index 55deaf8..a6d365a 100644
--- a/drivers/watchdog/davinci_wdt.c
+++ b/drivers/watchdog/davinci_wdt.c
@@ -143,7 +143,7 @@ static unsigned int davinci_wdt_get_timeleft(struct watchdog_device *wdd)

  static const struct watchdog_info davinci_wdt_info = {
.options = WDIOF_KEEPALIVEPING,
-   .identity = "DaVinci Watchdog",
+   .identity = "DaVinci/Keystone Watchdog",
  };

  static const struct watchdog_ops davinci_wdt_ops = {
@@ -212,6 +212,7 @@ static int davinci_wdt_remove(struct platform_device *pdev)

  static const struct of_device_id davinci_wdt_of_match[] = {
{ .compatible = "ti,davinci-wdt", },
+   { .compatible = "ti,keystone-wdt", },
{},
  };
  MODULE_DEVICE_TABLE(of, davinci_wdt_of_match);




Re: [PATCH 4/6] watchdog: davinci: add "timeout-sec" property

2013-11-18 Thread Guenter Roeck


On 11/18/2013 09:18 AM, Ivan Khoronzhuk wrote:

Since the Davinci WDT has been switched to use the WDT core, it is now able
to support the timeout-sec property, so add it to its binding description.

Signed-off-by: Ivan Khoronzhuk 
Acked-by: Santosh Shilimkar 


Acked-by: Guenter Roeck 


---
  .../devicetree/bindings/watchdog/davinci-wdt.txt   |4 
  1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt b/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
index 75558cc..e450134 100644
--- a/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
+++ b/Documentation/devicetree/bindings/watchdog/davinci-wdt.txt
@@ -4,9 +4,13 @@ Required properties:
  - compatible : Should be "ti,davinci-wdt"
  - reg : Should contain WDT registers location and length

+Optional properties:
+- timeout-sec : Contains the watchdog timeout in seconds
+
  Examples:

  wdt: wdt@232 {
compatible = "ti,davinci-wdt";
reg = <0x0232 0x80>;
+   timeout-sec = <30>;
  };




[PATCH] arch: hexagon: include: asm: add "vga.h" in Kbuild

2013-11-18 Thread Chen Gang

The generic "vga.h" needs to be included, otherwise compiling with
allmodconfig fails. The related error:

CC [M]  drivers/gpu/drm/drm_irq.o
  In file included from include/linux/vgaarb.h:34:0,
   from drivers/gpu/drm/drm_irq.c:42:
  include/video/vga.h:22:21: fatal error: asm/vga.h: No such file or directory

Also move "preempt.h" up to match the sort order.


Signed-off-by: Chen Gang 
---
 arch/hexagon/include/asm/Kbuild |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 67c3450..424e28e 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -31,6 +31,7 @@ generic-y += pci.h
 generic-y += percpu.h
 generic-y += poll.h
 generic-y += posix_types.h
+generic-y += preempt.h
 generic-y += resource.h
 generic-y += rwsem.h
 generic-y += scatterlist.h
@@ -52,5 +53,5 @@ generic-y += trace_clock.h
 generic-y += types.h
 generic-y += ucontext.h
 generic-y += unaligned.h
+generic-y += vga.h
 generic-y += xor.h
-generic-y += preempt.h
-- 
1.7.7.6

Re: [PATCH 3/6] watchdog: davinci: add GET_TIMELEFT option support

2013-11-18 Thread Guenter Roeck


On 11/18/2013 09:18 AM, Ivan Khoronzhuk wrote:

The davinci watchdog counter can be read while it is counting,
so add the ability to report the time remaining before
the system reboots.

Signed-off-by: Ivan Khoronzhuk 
Acked-by: Santosh Shilimkar 


Reviewed-by: Guenter Roeck 


---
  drivers/watchdog/davinci_wdt.c |   26 ++
  1 file changed, 26 insertions(+)

diff --git a/drivers/watchdog/davinci_wdt.c b/drivers/watchdog/davinci_wdt.c
index b353df5..55deaf8 100644
--- a/drivers/watchdog/davinci_wdt.c
+++ b/drivers/watchdog/davinci_wdt.c
@@ -116,6 +116,31 @@ static int davinci_wdt_ping(struct watchdog_device *wdd)
return 0;
  }

+static unsigned int davinci_wdt_get_timeleft(struct watchdog_device *wdd)
+{
+   u64 timer_counter;
+   unsigned long freq;
+   u32 val;
+   struct davinci_wdt_device *davinci_wdt = watchdog_get_drvdata(wdd);
+
+   /* if timeout has occurred then return 0 */
+   val = ioread32(davinci_wdt->base + WDTCR);
+   if (val & WDFLAG)
+   return 0;
+
+   freq = clk_get_rate(davinci_wdt->clk);
+
+   if (!freq)
+   return 0;
+
+   timer_counter = ioread32(davinci_wdt->base + TIM12);
+   timer_counter |= ((u64)ioread32(davinci_wdt->base + TIM34) << 32);
+
+   do_div(timer_counter, freq);
+
+   return wdd->timeout - timer_counter;
+}
+
  static const struct watchdog_info davinci_wdt_info = {
.options = WDIOF_KEEPALIVEPING,
.identity = "DaVinci Watchdog",
@@ -126,6 +151,7 @@ static const struct watchdog_ops davinci_wdt_ops = {
.start  = davinci_wdt_start,
.stop   = davinci_wdt_ping,
.ping   = davinci_wdt_ping,
+   .get_timeleft   = davinci_wdt_get_timeleft,
  };

  static int davinci_wdt_probe(struct platform_device *pdev)
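
The arithmetic in davinci_wdt_get_timeleft() above can be tried out in user
space. The following is only a sketch under stated assumptions: wdt_timeleft()
and its parameters are hypothetical names, tim12/tim34 stand in for the values
read from the TIM12 and TIM34 registers, and the final clamp is an addition
that is not part of the patch.

```c
#include <stdint.h>

/*
 * User-space sketch of the arithmetic in davinci_wdt_get_timeleft().
 * The function name and parameters are hypothetical; tim12 and tim34
 * stand in for the values read from the TIM12 and TIM34 registers.
 */
static unsigned int wdt_timeleft(uint32_t tim12, uint32_t tim34,
				 unsigned long freq, unsigned int timeout)
{
	uint64_t counter;
	uint64_t elapsed;

	/* No usable clock rate: nothing sensible to report. */
	if (!freq)
		return 0;

	/* Low word comes from TIM12, high word from TIM34. */
	counter = (uint64_t)tim12 | ((uint64_t)tim34 << 32);

	/* Convert ticks to whole seconds (the patch uses do_div()). */
	elapsed = counter / freq;

	/* Assumption, not in the patch: clamp instead of wrapping around. */
	if (elapsed >= timeout)
		return 0;

	return timeout - (unsigned int)elapsed;
}
```

For example, with a 24 MHz clock and a 60 second timeout, a counter value of
240000000 ticks corresponds to 10 elapsed seconds, so 50 seconds remain.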




Re: [PATCH 1/6] watchdog: davinci: change driver to use WDT core

2013-11-18 Thread Guenter Roeck


On 11/18/2013 09:18 AM, Ivan Khoronzhuk wrote:

To reduce code duplication and improve readability, use the WDT core
code to handle the WDT interface.

Remove io_lock, as the WDT core uses a mutex to lock each wdt device.
Remove wdt_status, as the WDT core tracks state with its own variable.

watchdog_init_timeout() can read the timeout value from the timeout-sec
property if the passed value is out of bounds. The heartbeat is
initialized as follows: if heartbeat is not set through the module
parameter, try to read its value from the timeout-sec property of the
WDT node. If the node has none, use the default value.

The heartbeat is held in wdd->timeout by the WDT core, so use it to
set the timeout period.
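
The fallback order described above (module parameter, then timeout-sec, then
the default) can be sketched in plain C. This is only an illustration of the
ordering, not the WDT core's code: pick_timeout(), timeout_in_bounds() and all
of their parameters are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical helper: a timeout is usable if non-zero and within bounds. */
static bool timeout_in_bounds(unsigned int t, unsigned int min,
			      unsigned int max)
{
	return t != 0 && t >= min && t <= max;
}

/*
 * Hypothetical sketch of the selection order from the commit message:
 * 1. the module parameter, if it is within bounds;
 * 2. the timeout-sec property of the WDT node, if present and in bounds;
 * 3. the driver default otherwise.
 */
static unsigned int pick_timeout(unsigned int param, bool has_dt_prop,
				 unsigned int dt_value, unsigned int def,
				 unsigned int min, unsigned int max)
{
	if (timeout_in_bounds(param, min, max))
		return param;

	if (has_dt_prop && timeout_in_bounds(dt_value, min, max))
		return dt_value;

	return def;
}
```

With bounds of 1 to 600 seconds and a default of 60, an unset parameter (0)
combined with timeout-sec = <45> would select 45, and a node without the
property would fall back to 60.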

Signed-off-by: Ivan Khoronzhuk 
Acked-by: Santosh Shilimkar 
---
  drivers/watchdog/Kconfig   |2 +
  drivers/watchdog/davinci_wdt.c |  152 ++--
  2 files changed, 39 insertions(+), 115 deletions(-)

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index d1d53f3..d7db13d 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -271,6 +271,8 @@ config IOP_WATCHDOG
  config DAVINCI_WATCHDOG
tristate "DaVinci watchdog"
depends on ARCH_DAVINCI
+   select WATCHDOG_CORE
+   select WATCHDOG_NOWAYOUT


Is this mandatory, i.e. is it correct that the watchdog cannot be stopped once
started?

Assuming it is,

Reviewed-by: Guenter Roeck 


help
  Say Y here if to include support for the watchdog timer
  in the DaVinci DM644x/DM646x processors.
diff --git a/drivers/watchdog/davinci_wdt.c b/drivers/watchdog/davinci_wdt.c
index bead774..cb9e8c5 100644
--- a/drivers/watchdog/davinci_wdt.c
+++ b/drivers/watchdog/davinci_wdt.c
@@ -3,7 +3,7 @@
   *
   * Watchdog driver for DaVinci DM644x/DM646x processors
   *
- * Copyright (C) 2006 Texas Instruments.
+ * Copyright (C) 2006-2013 Texas Instruments.
   *
   * 2007 (c) MontaVista Software, Inc. This file is licensed under
   * the terms of the GNU General Public License version 2. This program
@@ -15,18 +15,12 @@
  #include 
  #include 
  #include 
-#include 
-#include 
  #include 
  #include 
-#include 
  #include 
-#include 
-#include 
  #include 
  #include 
  #include 
-#include 
  #include 

  #define MODULE_NAME "DAVINCI-WDT: "
@@ -61,31 +55,12 @@
  #define WDKEY_SEQ0(0xa5c6 << 16)
  #define WDKEY_SEQ1(0xda7e << 16)

-static int heartbeat = DEFAULT_HEARTBEAT;
-
-static DEFINE_SPINLOCK(io_lock);
-static unsigned long wdt_status;
-#define WDT_IN_USE0
-#define WDT_OK_TO_CLOSE   1
-#define WDT_REGION_INITED 2
-#define WDT_DEVICE_INITED 3
-
+static int heartbeat;
  static void __iomem   *wdt_base;
  struct clk*wdt_clk;
+static struct watchdog_device  wdt_wdd;

-static void wdt_service(void)
-{
-   spin_lock(&io_lock);
-
-   /* put watchdog in service state */
-   iowrite32(WDKEY_SEQ0, wdt_base + WDTCR);
-   /* put watchdog in active state */
-   iowrite32(WDKEY_SEQ1, wdt_base + WDTCR);
-
-   spin_unlock(&io_lock);
-}
-
-static void wdt_enable(void)
+static int davinci_wdt_start(struct watchdog_device *wdd)
  {
u32 tgcr;
u32 timer_margin;
@@ -93,8 +68,6 @@ static void wdt_enable(void)

wdt_freq = clk_get_rate(wdt_clk);

-   spin_lock(&io_lock);
-
/* disable, internal clock source */
iowrite32(0, wdt_base + TCR);
/* reset timer, set mode to 64-bit watchdog, and unreset */
@@ -105,9 +78,9 @@ static void wdt_enable(void)
iowrite32(0, wdt_base + TIM12);
iowrite32(0, wdt_base + TIM34);
/* set timeout period */
-   timer_margin = (((u64)heartbeat * wdt_freq) & 0x);
+   timer_margin = (((u64)wdd->timeout * wdt_freq) & 0x);
iowrite32(timer_margin, wdt_base + PRD12);
-   timer_margin = (((u64)heartbeat * wdt_freq) >> 32);
+   timer_margin = (((u64)wdd->timeout * wdt_freq) >> 32);
iowrite32(timer_margin, wdt_base + PRD34);
/* enable run continuously */
iowrite32(ENAMODE12_PERIODIC, wdt_base + TCR);
@@ -119,84 +92,28 @@ static void wdt_enable(void)
iowrite32(WDKEY_SEQ0 | WDEN, wdt_base + WDTCR);
/* put watchdog in active state */
iowrite32(WDKEY_SEQ1 | WDEN, wdt_base + WDTCR);
-
-   spin_unlock(&io_lock);
-}
-
-static int davinci_wdt_open(struct inode *inode, struct file *file)
-{
-   if (test_and_set_bit(WDT_IN_USE, &wdt_status))
-   return -EBUSY;
-
-   wdt_enable();
-
-   return nonseekable_open(inode, file);
+   return 0;
  }

-static ssize_t
-davinci_wdt_write(struct file *file, const char *data, size_t len,
- loff_t *ppos)
+static int davinci_wdt_ping(struct watchdog_device *wdd)
  {
-   if (len)
-   wdt_service();
-
-   return len;
+   /* put watchdog in service state */
+   iowrite32(WDKEY_SEQ0, wdt_base + WDTCR);
+   /* put watchdog in active state */
+   iowrite32(WDKEY_SEQ1, wdt_base +

Re: [PATCH v3 2/2] ARM: bcm281xx: watchdog configuration

2013-11-18 Thread Guenter Roeck


On 11/18/2013 05:24 PM, Markus Mayer wrote:

This commit enables the watchdog driver for the BCM281xx family of SoCs.

Signed-off-by: Markus Mayer 
Reviewed-by: Matt Porter 


Acked-by: Guenter Roeck 


---
  arch/arm/configs/bcm_defconfig |3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/arm/configs/bcm_defconfig b/arch/arm/configs/bcm_defconfig
index 6e49310..8b98e11 100644
--- a/arch/arm/configs/bcm_defconfig
+++ b/arch/arm/configs/bcm_defconfig
@@ -130,3 +130,6 @@ CONFIG_CRC_ITU_T=y
  CONFIG_CRC7=y
  CONFIG_XZ_DEC=y
  CONFIG_AVERAGE=y
+CONFIG_WATCHDOG=y
+CONFIG_BCM_KONA_WDT=y
+CONFIG_BCM_KONA_WDT_DEBUG=y




Re: [PATCH v3 1/2] watchdog: bcm281xx: Watchdog Driver

2013-11-18 Thread Guenter Roeck


On 11/18/2013 05:24 PM, Markus Mayer wrote:

This commit adds support for the watchdog timer used on the BCM281xx
family of SoCs.

Signed-off-by: Markus Mayer 
Reviewed-by: Matt Porter 
---
  drivers/watchdog/Kconfig|   22 +++
  drivers/watchdog/Makefile   |1 +
  drivers/watchdog/bcm_kona_wdt.c |  366 +++
  3 files changed, 389 insertions(+)
  create mode 100644 drivers/watchdog/bcm_kona_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index d1d53f3..fe8bd21 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -1121,6 +1121,28 @@ config BCM2835_WDT
  To compile this driver as a loadable module, choose M here.
  The module will be called bcm2835_wdt.

+config BCM_KONA_WDT
+   tristate "BCM Kona Watchdog"
+   depends on ARCH_BCM
+   select WATCHDOG_CORE
+   help
+ Support for the watchdog timer on the following Broadcom BCM281xx
+ family, which includes BCM11130, BCM11140, BCM11351, BCM28145 and
+ BCM28155 variants.
+
+ Say 'Y' or 'M' here to enable the driver. The module will be called
+ bcm_kona_wdt.
+
+config BCM_KONA_WDT_DEBUG
+   bool "DEBUGFS support for BCM Kona Watchdog"
+   depends on BCM_KONA_WDT
+   help
+ If enabled, adds /sys/kernel/debug/bcm-kona-wdt/info which provides
+ access to the driver's internal data structures as well as watchdog
+ timer hardware registers.
+
+ If in doubt, say 'N'.
+
  config LANTIQ_WDT
tristate "Lantiq SoC watchdog"
depends on LANTIQ
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 6c5bb27..7c860ca 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_IMX2_WDT) += imx2_wdt.o
  obj-$(CONFIG_UX500_WATCHDOG) += ux500_wdt.o
  obj-$(CONFIG_RETU_WATCHDOG) += retu_wdt.o
  obj-$(CONFIG_BCM2835_WDT) += bcm2835_wdt.o
+obj-$(CONFIG_BCM_KONA_WDT) += bcm_kona_wdt.o

  # AVR32 Architecture
  obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
diff --git a/drivers/watchdog/bcm_kona_wdt.c b/drivers/watchdog/bcm_kona_wdt.c
new file mode 100644
index 000..f43687f
--- /dev/null
+++ b/drivers/watchdog/bcm_kona_wdt.c
@@ -0,0 +1,366 @@
+/*
+ * Copyright (C) 2013 Broadcom Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation version 2.
+ *
+ * This program is distributed "as is" WITHOUT ANY WARRANTY of any
+ * kind, whether express or implied; without even the implied warranty
+ * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 


This include should no longer be needed.

Otherwise,

Reviewed-by: Guenter Roeck 



+#include 
+#include 
+#include 
+#include 
+
+#define SECWDOG_CTRL_REG   0x
+#define SECWDOG_COUNT_REG  0x0004
+
+#define SECWDOG_RESERVED_MASK  0x1dff
+#define SECWDOG_WD_LOAD_FLAG   0x1000
+#define SECWDOG_EN_MASK0x0800
+#define SECWDOG_SRSTEN_MASK0x0400
+#define SECWDOG_RES_MASK   0x00f0
+#define SECWDOG_COUNT_MASK 0x000f
+
+#define SECWDOG_MAX_COUNT  SECWDOG_COUNT_MASK
+#define SECWDOG_CLKS_SHIFT 20
+#define SECWDOG_MAX_RES15
+#define SECWDOG_DEFAULT_RESOLUTION 4
+#define SECWDOG_MAX_TRY1000
+
+#define SECS_TO_TICKS(x, w)((x) << (w)->resolution)
+#define TICKS_TO_SECS(x, w)((x) >> (w)->resolution)
+
+#define BCM_KONA_WDT_NAME  "bcm_kona_wdt"
+
+struct bcm_kona_wdt {
+   void __iomem *base;
+   /*
+* One watchdog tick is 1/(2^resolution) seconds. Resolution can take
+* the values 0-15, meaning one tick can be 1s to 30.52us. Our default
+* resolution of 4 means one tick is 62.5ms.
+*
+* The watchdog counter is 20 bits. Depending on resolution, the maximum
+* counter value of 0xf expires after about 12 days (resolution 0)
+* down to only 32s (resolution 15). The default resolution of 4 gives
+* us a maximum of about 18 hours and 12 minutes before the watchdog
+* times out.
+*/
+   int resolution;
+   spinlock_t lock;
+#ifdef CONFIG_BCM_KONA_WDT_DEBUG
+   struct dentry *debugfs;
+#endif
+};
+
+#ifdef CONFIG_BCM_KONA_WDT_DEBUG
+static unsigned long busy_count;
+#endif
+
+static int secure_register_read(void __iomem *addr)
+{
+   uint32_t val;
+   unsigned count = 0;
+
+   /*
+* If the WD_LOAD_FLAG is set, the watchdog counter field is being
+* updated in hardware. Once the WD timer is updated in hardware, it
+* gets cleared.
+*/
+

RE: [f2fs-dev] [PATCH] f2fs: split sbi->write_mutex for DATA/NODE/META to avoid unnecessary race

2013-11-18 Thread Chao Yu

Hi

> -Original Message-
> From: Jaegeuk Kim [mailto:jaegeuk@samsung.com]
> Sent: Tuesday, November 19, 2013 11:36 AM
> To: Chao Yu
> Cc: linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH] f2fs: split sbi->write_mutex for 
> DATA/NODE/META to
> avoid unnecessary race
> 
> Hi,
> 
> I think we don't need to make two patches for this.
> How about this?

This seems reasonable, and I will refer to this patch.

> 
> From 71c27f78e72d680edcd7b1c0917842343044653c Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim 
> Date: Mon, 18 Nov 2013 17:16:17 +0900
> Subject: [PATCH] f2fs: use sbi->write_mutex for write bios
> 
> This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
> There is no reason to use the semaphore when f2fs submits read and write
> IOs.
> Instead, let's use a write mutex and cover the sbi->bio[] by the lock.
> 
> Change log from v1:
>  o split write_mutex suggested by Chao Yu
> 
> Chao described,
> "All DATA/NODE/META bio buffers in superblock is protected by
> 'sbi->write_mutex', but each bio buffer area is independent, So we
> should split write_mutex to three for DATA/NODE/META."
> 
> Signed-off-by: Chao Yu 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c|  4 
>  fs/f2fs/f2fs.h|  2 +-
>  fs/f2fs/segment.c | 13 +
>  fs/f2fs/super.c   |  6 +-
>  4 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 076a60c..5920639 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -383,8 +383,6 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct
> page *page,
> 
>   trace_f2fs_readpage(page, blk_addr, type);
> 
> - down_read(&sbi->bio_sem);
> -
>   /* Allocate a new bio */
>   bio = f2fs_bio_alloc(bdev, 1);
> 
> @@ -394,13 +392,11 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct
> page *page,
> 
>   if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
>   bio_put(bio);
> - up_read(&sbi->bio_sem);
>   f2fs_put_page(page, 1);
>   return -EFAULT;
>   }
> 
>   submit_bio(type, bio);
> - up_read(&sbi->bio_sem);
>   return 0;
>  }
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 6a49554..6e67f28 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -374,7 +374,7 @@ struct f2fs_sb_info {
>   struct f2fs_sm_info *sm_info;   /* segment manager */
>   struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
>   sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
> - struct rw_semaphore bio_sem;/* IO semaphore */
> + struct mutex write_mutex[NR_PAGE_TYPE]; /* mutex for writing IOs */
> 
>   /* for checkpoint */
>   struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index dad5f1a..119af0b 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -871,9 +871,14 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
> 
>  void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type,
> bool sync)
>  {
> - down_write(&sbi->bio_sem);
> + enum page_type btype = PAGE_TYPE_OF_BIO(type);
> +
> + if (!sbi->bio[btype])
> + return;
> +
> + mutex_lock(&sbi->write_mutex[btype]);
>   do_submit_bio(sbi, type, sync);
> - up_write(&sbi->bio_sem);
> + mutex_unlock(&sbi->write_mutex[btype]);
>  }
> 
>  static void submit_write_page(struct f2fs_sb_info *sbi, struct page
> *page,
> @@ -884,7 +889,7 @@ static void submit_write_page(struct f2fs_sb_info
> *sbi, struct page *page,
> 
>   verify_block_addr(sbi, blk_addr);
> 
> - down_write(&sbi->bio_sem);
> + mutex_lock(&sbi->write_mutex[type]);
> 
>   inc_page_count(sbi, F2FS_WRITEBACK);
> 
> @@ -919,7 +924,7 @@ retry:
> 
>   sbi->last_block_in_bio[type] = blk_addr;
> 
> - up_write(&sbi->bio_sem);
> + mutex_unlock(&sbi->write_mutex[type]);
>   trace_f2fs_submit_write_page(page, blk_addr, type);
>  }
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index a022412..e194578 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -820,6 +820,7 @@ static int f2fs_fill_super(struct super_block *sb,
> void *data, int silent)
>   struct buffer_head *raw_super_buf;
>   struct inode *root;
>   long err = -EINVAL;
> + int i;
> 
>   /* allocate memory for f2fs-specific super block info */
>   sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
> @@ -876,7 +877,10 @@ static int f2fs_fill_super(struct super_block *sb,
> void *data, int silent)
>   mutex_init(&sbi->node_write);
>   sbi->por_doing = false;
>   spin_lock_init(&sbi->stat_lock);
> - init_rwsem(&sbi->bio_sem);
> +
> + for (i = 0; i < NR_PAGE_TYPE; i++)
> + mutex_init(&sbi->write_mutex[i]);
> +
>   init_rwsem(&sbi->cp_rwsem);
>   init_waitqueue_head(&sbi->cp_wait);
>   init_sb_info(sbi);
> --
> 1.8.4.474.g128a96c
> 
>
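
The idea of splitting one write lock into one lock per bio type can be
illustrated outside the kernel. The following is only a user-space sketch
using POSIX mutexes, not f2fs code: struct sb_info, sb_init(),
submit_write_page() and submit_bio() are hypothetical stand-ins, and a simple
counter takes the place of the merged bio.

```c
#include <pthread.h>

/* The three independent buffer areas named in the patch description. */
enum page_type { DATA, NODE, META, NR_PAGE_TYPE };

/* Hypothetical stand-in for the relevant part of struct f2fs_sb_info. */
struct sb_info {
	pthread_mutex_t write_mutex[NR_PAGE_TYPE]; /* one lock per type */
	unsigned long pending[NR_PAGE_TYPE];       /* pages queued per type */
};

static void sb_init(struct sb_info *sbi)
{
	int i;

	for (i = 0; i < NR_PAGE_TYPE; i++) {
		pthread_mutex_init(&sbi->write_mutex[i], NULL);
		sbi->pending[i] = 0;
	}
}

/* Queue one page; only writers of the same type contend on the lock. */
static void submit_write_page(struct sb_info *sbi, enum page_type type)
{
	pthread_mutex_lock(&sbi->write_mutex[type]);
	sbi->pending[type]++;
	pthread_mutex_unlock(&sbi->write_mutex[type]);
}

/* Flush the queue for one type and return how many pages were pending. */
static unsigned long submit_bio(struct sb_info *sbi, enum page_type type)
{
	unsigned long n;

	pthread_mutex_lock(&sbi->write_mutex[type]);
	n = sbi->pending[type];
	sbi->pending[type] = 0;
	pthread_mutex_unlock(&sbi->write_mutex[type]);

	return n;
}
```

In this sketch a thread queueing DATA pages never waits for a thread flushing
NODE or META pages, which is the contention the split is meant to avoid.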

[PATCH] arch: hexagon: Kconfig: add HAVE_DMA_ATTR in Kconfig and remove "linux/dma-mapping.h" from "asm/dma-mapping.h"

2013-11-18 Thread Chen Gang

When HAS_DMA is set and the generic implementation is used, HAVE_DMA_ATTRS
must be enabled; otherwise compiling with allmodconfig fails with the
following error:

CC [M]  drivers/ata/libata-core.o
  drivers/ata/libata-core.c: In function 'ata_sg_clean':
  drivers/ata/libata-core.c:4598:3: error: implicit declaration of function 
'dma_unmap_sg' [-Werror=implicit-function-declaration]
  drivers/ata/libata-core.c: In function 'ata_sg_setup':
  drivers/ata/libata-core.c:4708:2: error: implicit declaration of function 
'dma_map_sg' [-Werror=implicit-function-declaration]

"linux/dma-mapping.h" already includes "asm/dma-mapping.h", so remove the
"linux/dma-mapping.h" include from "asm/dma-mapping.h".


Signed-off-by: Chen Gang 
---
 arch/hexagon/Kconfig   |1 +
 arch/hexagon/include/asm/dma-mapping.h |1 -
 2 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
index 09df260..fbc5c78 100644
--- a/arch/hexagon/Kconfig
+++ b/arch/hexagon/Kconfig
@@ -28,6 +28,7 @@ config HEXAGON
select GENERIC_CLOCKEVENTS_BROADCAST
select MODULES_USE_ELF_RELA
select GENERIC_CPU_DEVICES
+   select HAVE_DMA_ATTRS
---help---
  Qualcomm Hexagon is a processor architecture designed for high
  performance and low power across a wide variety of applications.
diff --git a/arch/hexagon/include/asm/dma-mapping.h 
b/arch/hexagon/include/asm/dma-mapping.h
index 85e9935..1696542 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -25,7 +25,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
1.7.7.6

Re: [PATCH] ARM: tegra: properly use FUSE clock

2013-11-18 Thread Alex Courbot


On 11/19/2013 08:48 AM, Stephen Warren wrote:

On 11/18/2013 04:43 AM, Thierry Reding wrote:

On Mon, Nov 18, 2013 at 07:40:47PM +0900, Alexandre Courbot wrote:

FUSE clock is enabled by most bootloaders, but we cannot expect
it to be on in all contexts (e.g. kexec).

This patch adds a FUSE clkdev to all Tegra platforms and makes
sure it is enabled before touching FUSE registers.
tegra_init_fuse() is invoked during very early boot and thus
cannot rely on the clock framework ; therefore the FUSE clock is
forcibly enabled using a register write in that function, and
remains that way until the clock framework can be used.

Signed-off-by: Alexandre Courbot 
---
arch/arm/mach-tegra/fuse.c       | 41 +++-
drivers/clk/tegra/clk-tegra114.c |  1 +
drivers/clk/tegra/clk-tegra124.c |  1 +
drivers/clk/tegra/clk-tegra20.c  |  1 +


Isn't this missing the clock driver changes for Tegra30? Ah...
Tegra30 already has this clock defined. I wonder why only Tegra30
has it. grep says that fuse-tegra isn't used by any drivers, which
also indicates that perhaps we don't need the .dev_id in the first
place. We should be able to get by with just the .con_id = "fuse".

Also are there any reasons to keep this in one single patch? Since
none of the fuse clocks are used yet, I think the clock changes
could be a separate patch that can go in through the clock tree.
And there isn't even a hard runtime dependency, since if the Tegra
changes were to go in without the clock changes, then the fallback
code in this patch should still turn the clock on properly. It just
might not be turned off again, but isn't that something we can live
with for a short period of time? I think perhaps that could even be
improved, see further below.

I've added Mike on Cc, he'll need to either take the patch in
through his tree or Ack this one, so he needs to see it
eventually.


4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-tegra/fuse.c b/arch/arm/mach-tegra/fuse.c
index 9a4e910c3796..3b9191b930b5 100644
--- a/arch/arm/mach-tegra/fuse.c
+++ b/arch/arm/mach-tegra/fuse.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 

  #include "fuse.h"
@@ -54,6 +55,7 @@ int tegra_cpu_speedo_id;  /* only exist in Tegra30 and later */
  int tegra_soc_speedo_id;
  enum tegra_revision tegra_revision;

+static struct clk *fuse_clk;
  static int tegra_fuse_spare_bit;
  static void (*tegra_init_speedo_data)(void);

@@ -77,6 +79,22 @@ static const char *tegra_revision_name[TEGRA_REVISION_MAX] = {
[TEGRA_REVISION_A04] = "A04",
  };

+static void tegra_fuse_enable_clk(void)
+{
+   if (IS_ERR(fuse_clk))
+   fuse_clk = clk_get_sys("fuse-tegra", "fuse");
+   if (IS_ERR(fuse_clk))
+   return;


Perhaps instead of just returning here, this should actually be
where the code to enable the clock should go.


+   clk_prepare_enable(fuse_clk);
+}
+
+static void tegra_fuse_disable_clk(void)
+{
+   if (IS_ERR(fuse_clk))
+   return;


And this is where we could disable it again. That way we should
get equal functionality in both cases.


That would need a shared lock with the clock code; at some point, the
clock will be registered, and the clock subsystem in control of the
enable bit. I think having a very early tegra_init_fuse() come along
and force the clock on, and then having the rest of the fuse code use
the clock object as soon as it's available, is the safest approach.

Of course, I suppose there's still a window where the following might
happen:

cpu 0:
- tegra_fuse_enable_clk entered
- fails to clk_get
cpu 1
- tegra clk driver is registered
- clk subsystem initcall disables all
  unused clocks
- access a fuse register

-> badness


It seems to me that both solutions require a shared lock with the clock 
code in order to be theoretically safe. The situation you described 
requires a lock to be addressed ; and unless I missed something, 
anything that could break with Thierry's proposal could also only do so 
if we "hold" the clock when the tegra clk driver is registered.


However we can consider ourselves safe for both cases if we know for 
sure that there is no fuse function in use when this happens. Since 
of_clk_init() is called very early during boot with SMP and preemption 
disabled, isn't that always the case?




I'm not sure how to protect against that, unless we simply assume that
all the fuse driver functions are guaranteed to happen early and
before the clk subsystem's initcall, so that can't happen?


That's probably the current behavior actually - the clk subsystem's 
initcall disables the fuse clock, fuse functions are never used after 
that, and we only notice when we try to kexec into another kernel that 
the fuse clock is not enabled as expected.


With that in mind, it's probably a good idea to preemptively implement 
proper clock enabling

Re: linux-next: Tree for Nov 19

2013-11-18 Thread Stephen Rothwell

Hi Guenter,

On Mon, 18 Nov 2013 20:08:54 -0800 Guenter Roeck  wrote:
>
> > Merging h8300-remove/h8300-remove (b400126add8f CREDITS: Add Yoshinori Sato 
> > for h8300)
> 
> This tree has been merged upstream and is no longer needed.

Thanks, I will remove it tomorrow.
-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au



Re: [tip:x86/asm] x86-64, copy_user: Remove zero byte check before copy user buffer.

2013-11-18 Thread H. Peter Anvin

On 11/16/2013 10:44 PM, Linus Torvalds wrote:
> So this doesn't do the 32-bit truncation in the error path of the generic
> string copy. Oversight?
> 
>Linus

Hi Linus,

Do you have a preference:

1. Consider the 32-bit truncation incidental (take it or leave it);
2. Require the 32-bit truncation; or
3. Get rid of it completely?

-hpa



Re: [PATCH] ARM: tegra: properly use FUSE clock

2013-11-18 Thread Alex Courbot


On 11/18/2013 08:43 PM, Thierry Reding wrote:


On Mon, Nov 18, 2013 at 07:40:47PM +0900, Alexandre Courbot wrote:

FUSE clock is enabled by most bootloaders, but we cannot expect it to be
on in all contexts (e.g. kexec).

This patch adds a FUSE clkdev to all Tegra platforms and makes sure
it is enabled before touching FUSE registers. tegra_init_fuse() is
invoked during very early boot and thus cannot rely on the clock
framework ; therefore the FUSE clock is forcibly enabled using a
register write in that function, and remains that way until the
clock framework can be used.

Signed-off-by: Alexandre Courbot 
---
  arch/arm/mach-tegra/fuse.c   | 41 +++-
  drivers/clk/tegra/clk-tegra114.c |  1 +
  drivers/clk/tegra/clk-tegra124.c |  1 +
  drivers/clk/tegra/clk-tegra20.c  |  1 +


Isn't this missing the clock driver changes for Tegra30? Ah... Tegra30
already has this clock defined. I wonder why only Tegra30 has it. grep
says that fuse-tegra isn't used by any drivers, which also indicates
that perhaps we don't need the .dev_id in the first place. We should be
able to get by with just the .con_id = "fuse".


Will fix that.


Also are there any reasons to keep this in one single patch? Since none
of the fuse clocks are used yet, I think the clock changes could be a
separate patch that can go in through the clock tree. And there isn't
even a hard runtime dependency, since if the Tegra changes were to go in
without the clock changes, then the fallback code in this patch should
still turn the clock on properly. It just might not be turned off again,
but isn't that something we can live with for a short period of time? I
think perhaps that could even be improved, see further below.

I've added Mike on Cc, he'll need to either take the patch in through
his tree or Ack this one, so he needs to see it eventually.


I will split the change into two patches - at first I thought it would 
not be worth the trouble, but I overlooked the fact this needed to go 
through the clock source tree.





  4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mach-tegra/fuse.c b/arch/arm/mach-tegra/fuse.c
index 9a4e910c3796..3b9191b930b5 100644
--- a/arch/arm/mach-tegra/fuse.c
+++ b/arch/arm/mach-tegra/fuse.c
@@ -22,6 +22,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 

  #include "fuse.h"
@@ -54,6 +55,7 @@ int tegra_cpu_speedo_id;  /* only exist in 
Tegra30 and later */
  int tegra_soc_speedo_id;
  enum tegra_revision tegra_revision;

+static struct clk *fuse_clk;
  static int tegra_fuse_spare_bit;
  static void (*tegra_init_speedo_data)(void);

@@ -77,6 +79,22 @@ static const char *tegra_revision_name[TEGRA_REVISION_MAX] = 
{
[TEGRA_REVISION_A04] = "A04",
  };

+static void tegra_fuse_enable_clk(void)
+{
+   if (IS_ERR(fuse_clk))
+   fuse_clk = clk_get_sys("fuse-tegra", "fuse");
+   if (IS_ERR(fuse_clk))
+   return;


Perhaps instead of just returning here, this should actually be where
the code to enable the clock should go.


+   clk_prepare_enable(fuse_clk);
+}
+
+static void tegra_fuse_disable_clk(void)
+{
+   if (IS_ERR(fuse_clk))
+   return;


And this is where we could disable it again. That way we should get
equal functionality in both cases.


What Stephen said, basically - but let me address that in the other mail.

Thanks for the review!
Alex.


RE: [PATCH v6 3/5] qrwlock: Enable fair queue read/write lock

2013-11-18 Thread Long, Wai Man

Sorry for the late reply.

The unfair option was made the default largely to fix the problem of 
recursive read locks in interrupt handlers. You are right that this is no 
longer a valid reason.

An unfair lock may have better performance characteristics, especially for a 
highly contended lock. Other than that, I have no reason to keep the unfair 
option. I don't mind taking it out and making the fair qrwlock the default. It 
does simplify the code. I will send out an updated patchset with only the fair 
qrwlock.

Thanks for the suggestion.
-Longman

-Original Message-
From: linus...@gmail.com [mailto:linus...@gmail.com] On Behalf Of Linus Torvalds
Sent: Monday, November 18, 2013 1:12 PM
To: Long, Wai Man
Cc: Thomas Gleixner; Ingo Molnar; H. Peter Anvin; Arnd Bergmann; 
linux-a...@vger.kernel.org; the arch/x86 maintainers; Linux Kernel Mailing 
List; Peter Zijlstra; Steven Rostedt; Andrew Morton; Michel Lespinasse; Andi 
Kleen; Rik van Riel; Paul E. McKenney; Raghavendra K T; George Spelvin; Tim 
Chen; Chandramouleeswaran, Aswin; Norton, Scott J
Subject: Re: [PATCH v6 3/5] qrwlock: Enable fair queue read/write lock

On Tue, Nov 12, 2013 at 6:48 AM, Waiman Long  wrote:
> By default, queue rwlock is fair among writers and gives preference to 
> readers allowing them to steal lock even if a writer is waiting. 
> However, there is a desire to have a fair variant of rwlock that is 
> more deterministic. To enable this [..]

Is there really any point in having the option for unfair at all?

From your timings, it looks like the unfair locks are more expensive for the 
writer side, but since pretty much the whole point of rwlocks is when readers 
are the common case, I don't think we care.

And I'm not at all convinced we want the complexity of two different kinds of 
rwlocks with different semantics and extra code for said semantics..

Your *original* fair rwlocks were unusable, since they didn't allow for the irq 
semantics that most users need, but afaik your current version always makes an 
irq/bh-context reader work even when the lock is otherwise trying to be fair, 
so this whole dual behavior seems to be largely pointless.

No?

Linus

Re: [PATCH v2] tools lib traceevent: Report better error message on bad function args

2013-11-18 Thread Namhyung Kim

On Tue, Nov 19, 2013 at 2:38 AM, Steven Rostedt  wrote:
> When Jiri Olsa was writing a function callback for
> scsi_trace_parse_cdb(), he thought that the traceevent library had a
> bug in it because he was getting this error:
>
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>
> But in truth, he didn't have the right number of arguments for the
> function callback, and the error was the library detecting the
> discrepancy. A better error message would have prevented the confusion:
>
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_timeout has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_start has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_error has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_done has more
>
> Or
>
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_timeout only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_start only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_error only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_done only uses 3

Acked-by: Namhyung Kim 

Thanks,
Namhyung
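
As a rough illustration of the check the quoted commit message describes, the sketch below compares the number of arguments an event passes with the number a handler expects and formats a message in one of the two shapes quoted above. This is not the traceevent code: check_func_args() and all of its parameters are made-up names for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the traceevent API: report a mismatch
 * between the argument count a handler expects and the count an event uses.
 * Returns 0 when the counts match, -1 (with msg filled in) otherwise.
 */
static int check_func_args(const char *func, const char *event,
			   int expected, int used,
			   char *msg, size_t msg_len)
{
	assert(func && event && msg && msg_len > 0);

	msg[0] = '\0';
	if (used > expected) {
		snprintf(msg, msg_len,
			 "function '%s()' only expects %d arguments but event %s has more",
			 func, expected, event);
		return -1;
	}
	if (used < expected) {
		snprintf(msg, msg_len,
			 "function '%s()' expects %d arguments but event %s only uses %d",
			 func, expected, event, used);
		return -1;
	}
	return 0;
}
```

The point of the design is that the message names both the function and the event, so the reader can tell a handler registration problem from a parser bug.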

[PATCH] kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS cleanly

2013-11-18 Thread Yuanhan Liu

Remove CONFIG_USE_GENERIC_SMP_HELPERS left by commit
0a06ff06("kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS").

Cc: Christoph Hellwig 
Cc: Andrew Morton 
Signed-off-by: Yuanhan Liu 
---
 drivers/block/null_blk.c |8 
 net/Kconfig  |4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index b5d8423..ea192ec 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -223,7 +223,7 @@ static void null_softirq_done_fn(struct request *rq)
blk_end_request_all(rq, 0);
 }
 
-#if defined(CONFIG_SMP) && defined(CONFIG_USE_GENERIC_SMP_HELPERS)
+#ifdef CONFIG_SMP
 
 static void null_ipi_cmd_end_io(void *data)
 {
@@ -260,7 +260,7 @@ static void null_cmd_end_ipi(struct nullb_cmd *cmd)
put_cpu();
 }
 
-#endif /* CONFIG_SMP && CONFIG_USE_GENERIC_SMP_HELPERS */
+#endif /* CONFIG_SMP */
 
 static inline void null_handle_cmd(struct nullb_cmd *cmd)
 {
@@ -270,7 +270,7 @@ static inline void null_handle_cmd(struct nullb_cmd *cmd)
end_cmd(cmd);
break;
case NULL_IRQ_SOFTIRQ:
-#if defined(CONFIG_SMP) && defined(CONFIG_USE_GENERIC_SMP_HELPERS)
+#ifdef CONFIG_SMP
null_cmd_end_ipi(cmd);
 #else
end_cmd(cmd);
@@ -571,7 +571,7 @@ static int __init null_init(void)
 {
unsigned int i;
 
-#if !defined(CONFIG_SMP) || !defined(CONFIG_USE_GENERIC_SMP_HELPERS)
+#if !defined(CONFIG_SMP)
if (irqmode == NULL_IRQ_SOFTIRQ) {
pr_warn("null_blk: softirq completions not available.\n");
pr_warn("null_blk: using direct completions.\n");
diff --git a/net/Kconfig b/net/Kconfig
index 0715db6..d334678 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -224,7 +224,7 @@ source "net/hsr/Kconfig"
 
 config RPS
boolean
-   depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
+   depends on SMP && SYSFS
default y
 
 config RFS_ACCEL
@@ -235,7 +235,7 @@ config RFS_ACCEL
 
 config XPS
boolean
-   depends on SMP && USE_GENERIC_SMP_HELPERS
+   depends on SMP
default y
 
 config NETPRIO_CGROUP
-- 
1.7.7.6


Re: linux-next: Tree for Nov 19

2013-11-18 Thread Guenter Roeck

On Tue, Nov 19, 2013 at 02:39:54PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
Hi Stephen,

> Merging h8300-remove/h8300-remove (b400126add8f CREDITS: Add Yoshinori Sato 
> for h8300)

This tree has been merged upstream and is no longer needed.

Thanks,
Guenter

Re: [PATCH] cpufreq: cpufreq-cpu0: Use a sane boot frequency when booting with a mismatched bootloader configuration

2013-11-18 Thread Viresh Kumar

On 19 November 2013 07:51, Shawn Guo  wrote:
> No, I did not say that.  IMO, when cpufreq-cpu0 sees a mismatch, it has
> no way to know or assume which one is correct and which is incorrect.
> The best thing it can do is to fail out without changing anything about
> running frequency and voltage.

Not specifically on this patch, but this is what I feel about this issue:

- As we are discussing on the other thread, there is scope of adding
"unknown" field in tables so that people would know that they were
running out of table freq at some point..
- This is a common problem for all drivers/platforms and not only
cpufreq-cpu0, so the solution has to be generic and not driver
specific.. So, at least I don't want to get this patch in at any cost,
unless there is a generic solution present..
- There are non-dt drivers as well, and so freq table is present
with the kernel and we can't support all frequencies that bootloader
may end up with..
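
To make the "running out of table freq" case concrete, here is a minimal sketch of a lookup that reports when the frequency left by the bootloader is not listed in a driver's table. The function name, the flat table layout and the use of kHz values are assumptions for illustration; this is not cpufreq core code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative only: return the index of cur_khz in table, or -1 when the
 * running frequency is not listed (i.e. "unknown" from the driver's view).
 */
static int find_freq_index(const unsigned int *table, size_t n,
			   unsigned int cur_khz)
{
	size_t i;

	assert(table != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (table[i] == cur_khz)
			return (int)i;
	return -1;
}
```

A caller that gets -1 back knows only that the frequency is unlisted, not which side is wrong, which is the situation the quoted text describes.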

--
viresh

linux-next: Tree for Nov 19

2013-11-18 Thread Stephen Rothwell

Hi all,

Please do *not* add any v3.14 material to linux-next until after
v3.13-rc1 is released.

Changes since 20131118:

The nfs tree lost its build failure but gained another for which I
reverted a commit.

The gpio tree gained a conflict against the pm tree.



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" as mentioned in the FAQ on the wiki
(see below).

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log files
in the Next directory.  Between each merge, the tree was built with
a ppc64_defconfig for powerpc and an allmodconfig for x86_64 and a
multi_v7_defconfig for arm. After the final fixups (if any), it is also
built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig and
allyesconfig (minus CONFIG_PROFILE_ALL_BRANCHES - this fails its final
link) and i386, sparc, sparc64 and arm defconfig. These builds also have
CONFIG_ENABLE_WARN_DEPRECATED, CONFIG_ENABLE_MUST_CHECK and
CONFIG_DEBUG_INFO disabled when necessary.

Below is a summary of the state of the merge.

I am currently merging 210 trees (counting Linus' and 29 trees of patches
pending for Linus' tree), more are welcome (even if they are currently
empty). Thanks to those who have contributed, and to those who haven't,
please do.

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

There is a wiki covering stuff to do with linux-next at
http://linux.f-seidel.de/linux-next/pmwiki/ .  Thanks to Frank Seidel.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (2d3c627502f2 Revert "init/Kconfig: add option to disable 
kernel compression")
Merging fixes/master (2d3c627502f2 Revert "init/Kconfig: add option to disable 
kernel compression")
Merging kbuild-current/rc-fixes (19514fc665ff arm, kbuild: make "make install" 
not depend on vmlinux)
Merging arc-current/for-curr (737d5b980be8 ARC: [plat-arcfpga] defconfig update)
Merging arm-current/fixes (9170217510cd ARM: 7888/1: seccomp: not compatible 
with ARM OABI)
Merging m68k-current/for-linus (77a42796786c m68k: Remove deprecated 
IRQF_DISABLED)
Merging metag-fixes/fixes (3b2f64d00c46 Linux 3.11-rc2)
Merging powerpc-merge/merge (8b5ede69d24d powerpc/irq: Don't switch to irq 
stack from softirq stack)
Merging sparc/master (6a328f3fe042 sparc64: merge fix)
Merging net/master (d11a347de3f5 be2net: Delete secondary unicast MAC addresses 
during be_close)
Merging ipsec/master (be408cd3e1fe Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net)
Merging sound-current/for-linus (2793769f4450 ALSA: hda - Enable mute/mic-mute 
LEDs for more Thinkpads with Realtek codec)
Merging pci-current/for-linus (a2c8f94ffa84 Revert "workqueue: allow 
work_on_cpu() to be called recursively")
Merging wireless/master (3b1bace9960b brcmfmac: fix possible memory leak)
Merging driver-core.current/driver-core-linus (31d141e3a666 Linux 3.12-rc6)
Merging tty.current/tty-linus (6e757ad2c92c tty/serial: at91: fix uart/usart 
selection for older products)
Merging usb.current/usb-linus (e1466ad5b1ae USB: serial: ftdi_sio: add id for 
Z3X Box device)
Merging staging.current/staging-linus (31d141e3a666 Linux 3.12-rc6)
Merging char-misc.current/char-misc-linus (31d141e3a666 Linux 3.12-rc6)
Merging input-current/for-linus (42249094f794 Merge branch 'next' into 
for-linus)
Merging md-current/for-linus (d47648fcf061 raid5: avoid finding "discard" 
stripe)
Merging crypto-current/master (f262f0f5cad0 crypto: s390 - Fix aes-cbc IV 
corruption)
CONFLICT (content): Merge conflict in drivers/crypto/caam/jr.c
Merging ide/master (c2f7d1e103ef ide: pmac: remove unnecessary 
pci_set_drvdata())
Merging dwmw2/master (5950f0803ca9 pcmcia: remove RPX board stuff)
Merging sh-current/sh-fixes-for-linus (44033109e99c SH: Convert out[bwl] macros 
to inline functions)
Merging devicetree-current/devicetree/merge (1931ee143b0a Revert "drivers: of: 
add initialization code for dma reserved memory")
Merging rr-fixes/fixes (f6537f2f0eba scripts/kallsyms: filter symbols not in 
kernel address space)
Merging mfd-fixes/master (ed2fe55fd91e mfd: ti-ssp: Fix build)
Merging vfio-fixes/for-linus (d93b3ac0edb8 VFIO: vfio_iommu_type1: fix bug 
caused by break in neste

Re: [f2fs-dev] [PATCH] f2fs: split sbi->write_mutex for DATA/NODE/META to avoid unnecessary race

2013-11-18 Thread Jaegeuk Kim

Hi,

I think we don't need to make two patches for this.
How about this?

From 71c27f78e72d680edcd7b1c0917842343044653c Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim 
Date: Mon, 18 Nov 2013 17:16:17 +0900
Subject: [PATCH] f2fs: use sbi->write_mutex for write bios

This patch removes an unnecessary semaphore (i.e., sbi->bio_sem).
There is no reason to use the semaphore when f2fs submits read and write
IOs.
Instead, let's use a write mutex and cover the sbi->bio[] by the lock.

Change log from v1:
 o split write_mutex suggested by Chao Yu

Chao described,
"All DATA/NODE/META bio buffers in superblock is protected by
'sbi->write_mutex', but each bio buffer area is independent, So we
should split write_mutex to three for DATA/NODE/META."
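
The idea of one lock per buffer type can be sketched in user space as follows. This is only an analogy using pthread mutexes: struct bio_locks, queue_page() and the pending[] counters are invented for the example, and the enum is a simplified stand-in for the f2fs page types.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-in for the f2fs page types. */
enum page_type { DATA, NODE, META, NR_PAGE_TYPE };

/* User-space stand-in for the per-type write locks; not f2fs code. */
struct bio_locks {
	pthread_mutex_t write_mutex[NR_PAGE_TYPE];
	int pending[NR_PAGE_TYPE];	/* pages queued per type */
};

static void bio_locks_init(struct bio_locks *bl)
{
	int i;

	for (i = 0; i < NR_PAGE_TYPE; i++) {
		pthread_mutex_init(&bl->write_mutex[i], NULL);
		bl->pending[i] = 0;
	}
}

/* Only writers of the same type contend on the same mutex. */
static void queue_page(struct bio_locks *bl, enum page_type type)
{
	assert((int)type >= 0 && (int)type < NR_PAGE_TYPE);

	pthread_mutex_lock(&bl->write_mutex[type]);
	bl->pending[type]++;
	pthread_mutex_unlock(&bl->write_mutex[type]);
}
```

Because each type has independent state, a DATA writer never has to wait for a NODE or META writer in this arrangement.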

Signed-off-by: Chao Yu 
Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/data.c|  4 
 fs/f2fs/f2fs.h|  2 +-
 fs/f2fs/segment.c | 13 +
 fs/f2fs/super.c   |  6 +-
 4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 076a60c..5920639 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -383,8 +383,6 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct
page *page,
 
trace_f2fs_readpage(page, blk_addr, type);
 
-   down_read(&sbi->bio_sem);
-
/* Allocate a new bio */
bio = f2fs_bio_alloc(bdev, 1);
 
@@ -394,13 +392,11 @@ int f2fs_readpage(struct f2fs_sb_info *sbi, struct
page *page,
 
if (bio_add_page(bio, page, PAGE_CACHE_SIZE, 0) < PAGE_CACHE_SIZE) {
bio_put(bio);
-   up_read(&sbi->bio_sem);
f2fs_put_page(page, 1);
return -EFAULT;
}
 
submit_bio(type, bio);
-   up_read(&sbi->bio_sem);
return 0;
 }
 
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 6a49554..6e67f28 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -374,7 +374,7 @@ struct f2fs_sb_info {
struct f2fs_sm_info *sm_info;   /* segment manager */
struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
-   struct rw_semaphore bio_sem;/* IO semaphore */
+   struct mutex write_mutex[NR_PAGE_TYPE]; /* mutex for writing IOs */
 
/* for checkpoint */
struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index dad5f1a..119af0b 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -871,9 +871,14 @@ static void do_submit_bio(struct f2fs_sb_info *sbi,
 
 void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum page_type type,
bool sync)
 {
-   down_write(&sbi->bio_sem);
+   enum page_type btype = PAGE_TYPE_OF_BIO(type);
+
+   if (!sbi->bio[btype])
+   return;
+
+   mutex_lock(&sbi->write_mutex[btype]);
do_submit_bio(sbi, type, sync);
-   up_write(&sbi->bio_sem);
+   mutex_unlock(&sbi->write_mutex[btype]);
 }
 
 static void submit_write_page(struct f2fs_sb_info *sbi, struct page
*page,
@@ -884,7 +889,7 @@ static void submit_write_page(struct f2fs_sb_info
*sbi, struct page *page,
 
verify_block_addr(sbi, blk_addr);
 
-   down_write(&sbi->bio_sem);
+   mutex_lock(&sbi->write_mutex[type]);
 
inc_page_count(sbi, F2FS_WRITEBACK);
 
@@ -919,7 +924,7 @@ retry:
 
sbi->last_block_in_bio[type] = blk_addr;
 
-   up_write(&sbi->bio_sem);
+   mutex_unlock(&sbi->write_mutex[type]);
trace_f2fs_submit_write_page(page, blk_addr, type);
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a022412..e194578 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -820,6 +820,7 @@ static int f2fs_fill_super(struct super_block *sb,
void *data, int silent)
struct buffer_head *raw_super_buf;
struct inode *root;
long err = -EINVAL;
+   int i;
 
/* allocate memory for f2fs-specific super block info */
sbi = kzalloc(sizeof(struct f2fs_sb_info), GFP_KERNEL);
@@ -876,7 +877,10 @@ static int f2fs_fill_super(struct super_block *sb,
void *data, int silent)
mutex_init(&sbi->node_write);
sbi->por_doing = false;
spin_lock_init(&sbi->stat_lock);
-   init_rwsem(&sbi->bio_sem);
+
+   for (i = 0; i < NR_PAGE_TYPE; i++)
+   mutex_init(&sbi->write_mutex[i]);
+
init_rwsem(&sbi->cp_rwsem);
init_waitqueue_head(&sbi->cp_wait);
init_sb_info(sbi);
-- 
1.8.4.474.g128a96c



-- 
Jaegeuk Kim
Samsung


Re: [PATCH] FS: Fixed buffer overflow issue in seq_read()

2013-11-18 Thread Linus Torvalds

On Mon, Nov 18, 2013 at 7:28 PM, Al Viro  wrote:
>
> BTW, I've several old commits that didn't go into the first pile (e.g.
> taking read_seqbegin_or_lock() and friends from fs/dcache.c into
> linux/seqlock.h, where they obviously belong, etc.) and several regression
> fixes; are you OK with pull request tomorrow?  I can post it tonight,
> but I'd prefer to leave local torture running overnight...

Tomorrow's fine.

   Linus

Re: [PATCH] FS: Fixed buffer overflow issue in seq_read()

2013-11-18 Thread Al Viro

On Mon, Nov 18, 2013 at 07:13:54PM -0800, Linus Torvalds wrote:
> 
> 
> On Tue, 19 Nov 2013, Al Viro wrote:
> > 
> > seq_file: always clear m->count when we free m->buf
> 
> Ok, applied.
> 
> What do you think about then just abstracting out that now common sequence 
> of re-allocating a larger buffer, while clearing m->count?

Sure, no problem, but then we really have only 2 places doing that and no
visible cause to grow more of them.  With this common sequence being that
short, I'm not sure that effort to recall the definition of that helper
won't be more than that to understand the open-coded variant.  Matter of
taste, but IMO in this case the helper makes it slightly less readable...
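
For readers weighing the two options, this is roughly what such a helper would look like, written as a user-space analogue with malloc()/free(). The struct and the name seq_grow_buf() are hypothetical; the kernel code works on struct seq_file and uses the kernel allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the buffer fields of struct seq_file. */
struct seq_buf_sketch {
	char *buf;
	size_t size;	/* allocated bytes */
	size_t count;	/* valid bytes in buf */
};

/*
 * Hypothetical helper: drop the old buffer, clear count, allocate a buffer
 * twice as large. Returns 0 on success, -1 if the allocation fails (buf is
 * then NULL and count is already 0, so no stale length survives).
 */
static int seq_grow_buf(struct seq_buf_sketch *m)
{
	assert(m != NULL && m->size > 0);

	free(m->buf);
	m->count = 0;		/* old contents are gone, so count must go too */
	m->size <<= 1;
	m->buf = malloc(m->size);
	return m->buf ? 0 : -1;
}
```

The body is three statements, which is the substance of the readability argument: the helper saves little and hides the fact that count is reset.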

BTW, I've several old commits that didn't go into the first pile (e.g.
taking read_seqbegin_or_lock() and friends from fs/dcache.c into
linux/seqlock.h, where they obviously belong, etc.) and several regression
fixes; are you OK with pull request tomorrow?  I can post it tonight,
but I'd prefer to leave local torture running overnight...

[PATCH net v4 3/4] r8152: support stopping/waking tx queue

2013-11-18 Thread Hayes Wang

The maximum packet number which a tx aggregation buffer could contain
is the buffer size / (packet size + descriptor size).

If the tx buffer is empty and the tx queue length is more than the
maximum value which is defined above, stop the tx queue. Wake the tx
queue after any queued packet is filled into an available tx buffer.
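
The bound described above is a single integer division. A small standalone sketch, where max_tx_packets() is a made-up name and the sizes in the note below are assumed values rather than figures taken from the driver:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Upper bound on the packets one aggregation buffer can hold:
 * buffer size / (packet size + descriptor size), rounded down.
 */
static unsigned int max_tx_packets(size_t buf_sz, size_t pkt_sz, size_t desc_sz)
{
	assert(pkt_sz + desc_sz > 0);

	return (unsigned int)(buf_sz / (pkt_sz + desc_sz));
}
```

For instance, assuming a 16384-byte buffer, a 1522-byte worst-case frame and an 8-byte descriptor, the bound is 16384 / 1530 = 10 packets. A smaller packet size raises the bound, so the value has to be recomputed whenever the MTU changes.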

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 8a786b6..0ac2b53 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -365,6 +365,7 @@ struct r8152 {
struct mii_if_info mii;
int intr_interval;
u32 msg_enable;
+   u32 tx_qlen;
u16 ocp_base;
u8 *intr_buff;
u8 version;
@@ -1173,6 +1174,9 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct 
tx_agg *agg)
remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
}
 
+   if (netif_queue_stopped(tp->netdev))
+   netif_wake_queue(tp->netdev);
+
usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
  agg->head, (int)(tx_data - (u8 *)agg->head),
  (usb_complete_t)write_bulk_callback, agg);
@@ -1393,6 +1397,10 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff 
*skb,
 
skb_queue_tail(&tp->tx_queue, skb);
 
+   if (list_empty(&tp->tx_free) &&
+   skb_queue_len(&tp->tx_queue) > tp->tx_qlen)
+   netif_stop_queue(netdev);
+
if (!list_empty(&tp->tx_free))
tasklet_schedule(&tp->tl);
 
@@ -1423,6 +1431,14 @@ static void rtl8152_nic_reset(struct r8152 *tp)
}
 }
 
+static void set_tx_qlen(struct r8152 *tp)
+{
+   struct net_device *netdev = tp->netdev;
+
+   tp->tx_qlen = rx_buf_sz / (netdev->mtu + VLAN_ETH_HLEN + VLAN_HLEN +
+  sizeof(struct tx_desc));
+}
+
 static inline u8 rtl8152_get_speed(struct r8152 *tp)
 {
return ocp_read_byte(tp, MCU_TYPE_PLA, PLA_PHYSTATUS);
@@ -1434,6 +1450,7 @@ static int rtl8152_enable(struct r8152 *tp)
int i, ret;
u8 speed;
 
+   set_tx_qlen(tp);
speed = rtl8152_get_speed(tp);
if (speed & _10bps) {
ocp_data = ocp_read_word(tp, MCU_TYPE_PLA, PLA_EEEP_CR);
-- 
1.8.3.1


[PATCH net v4 4/4] r8152: fix incorrect type in assignment

2013-11-18 Thread Hayes Wang

The data from the hardware should be little endian. Correct the
declaration.
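
Why the annotation matters can be shown with a user-space analogue of le32_to_cpu(): the bytes from the device have a fixed order, so the value must be decoded explicitly instead of being read as a native integer. The function name le32_decode() is invented for this sketch; 0x7fff is the RX_LEN_MASK value from the driver source quoted in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Portable little-endian decode of a 32-bit descriptor word. It gives the
 * same result on little-endian and big-endian hosts.
 */
static uint32_t le32_decode(const uint8_t b[4])
{
	assert(b != NULL);

	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Declaring the fields as __le32 lets a static checker such as sparse flag any place where the raw value is used without a conversion like this one.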

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0ac2b53..fb35e6e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -307,22 +307,22 @@ enum rtl8152_flags {
 #define MCU_TYPE_USB   0x
 
 struct rx_desc {
-   u32 opts1;
+   __le32 opts1;
 #define RX_LEN_MASK0x7fff
-   u32 opts2;
-   u32 opts3;
-   u32 opts4;
-   u32 opts5;
-   u32 opts6;
+   __le32 opts2;
+   __le32 opts3;
+   __le32 opts4;
+   __le32 opts5;
+   __le32 opts6;
 };
 
 struct tx_desc {
-   u32 opts1;
+   __le32 opts1;
 #define TX_FS  (1 << 31) /* First segment of a packet */
 #define TX_LS  (1 << 30) /* Final segment of a packet */
 #define TX_LEN_MASK0x3
 
-   u32 opts2;
+   __le32 opts2;
 #define UDP_CS (1 << 31) /* Calculate UDP/IP checksum */
 #define TCP_CS (1 << 30) /* Calculate TCP/IP checksum */
 #define IPV4_CS(1 << 29) /* Calculate IPv4 checksum */
@@ -877,7 +877,7 @@ static void write_bulk_callback(struct urb *urb)
 static void intr_callback(struct urb *urb)
 {
struct r8152 *tp;
-   __u16 *d;
+   __le16 *d;
int status = urb->status;
int res;
 
-- 
1.8.3.1


[PATCH net v4 2/4] r8152: modify the tx flow

2013-11-18 Thread Hayes Wang

Remove the code that sends the packet from rtl8152_start_xmit().
Let rtl8152_start_xmit() only queue the packet and schedule a
tasklet to send the queued packets. This simplifies the code and makes
sure all packets are sent in their original order.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 46 +++---
 1 file changed, 3 insertions(+), 43 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 428600d..8a786b6 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1388,53 +1388,13 @@ static netdev_tx_t rtl8152_start_xmit(struct sk_buff 
*skb,
struct net_device *netdev)
 {
struct r8152 *tp = netdev_priv(netdev);
-   struct net_device_stats *stats = rtl8152_get_stats(netdev);
-   unsigned long flags;
-   struct tx_agg *agg = NULL;
-   struct tx_desc *tx_desc;
-   unsigned int len;
-   u8 *tx_data;
-   int res;
 
skb_tx_timestamp(skb);
 
-   /* If tx_queue is not empty, it means at least one previous packt */
-   /* is waiting for sending. Don't send current one before it.  */
-   if (skb_queue_empty(&tp->tx_queue))
-   agg = r8152_get_tx_agg(tp);
-
-   if (!agg) {
-   skb_queue_tail(&tp->tx_queue, skb);
-   return NETDEV_TX_OK;
-   }
-
-   tx_desc = (struct tx_desc *)agg->head;
-   tx_data = agg->head + sizeof(*tx_desc);
-   agg->skb_num = agg->skb_len = 0;
+   skb_queue_tail(&tp->tx_queue, skb);
 
-   len = skb->len;
-   r8152_tx_csum(tp, tx_desc, skb);
-   memcpy(tx_data, skb->data, len);
-   dev_kfree_skb_any(skb);
-   agg->skb_num++;
-   agg->skb_len += len;
-   usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
- agg->head, len + sizeof(*tx_desc),
- (usb_complete_t)write_bulk_callback, agg);
-   res = usb_submit_urb(agg->urb, GFP_ATOMIC);
-   if (res) {
-   /* Can we get/handle EPIPE here? */
-   if (res == -ENODEV) {
-   netif_device_detach(tp->netdev);
-   } else {
-   netif_warn(tp, tx_err, netdev,
-  "failed tx_urb %d\n", res);
-   stats->tx_dropped++;
-   spin_lock_irqsave(&tp->tx_lock, flags);
-   list_add_tail(&agg->list, &tp->tx_free);
-   spin_unlock_irqrestore(&tp->tx_lock, flags);
-   }
-   }
+   if (!list_empty(&tp->tx_free))
+   tasklet_schedule(&tp->tl);
 
return NETDEV_TX_OK;
 }
-- 
1.8.3.1


[PATCH net v4 0/4] r8152 bug fixes

2013-11-18 Thread Hayes Wang

For the patch #1, I modify the type of the variable "pkt_len" from "unsigned"
to "unsigned int".

Hayes Wang (4):
  r8152: fix tx/rx memory overflow
  r8152: modify the tx flow
  r8152: support stopping/waking tx queue
  r8152: fix incorrect type in assignment

 drivers/net/usb/r8152.c | 109 
 1 file changed, 45 insertions(+), 64 deletions(-)

-- 
1.8.3.1


[PATCH net v4 1/4] r8152: fix tx/rx memory overflow

2013-11-18 Thread Hayes Wang

The tx/rx paths could access memory outside the desired range.
Modify the way the end of the memory is checked to avoid this.

For r8152_tx_agg_fill(), the variable remain may become negative.
However, it is declared unsigned, so the while loop wouldn't
break when reaching the end of the desired memory. Although changing
the declaration from unsigned to signed is enough to fix it, I also
modify the checking method to be safe. Replace

remain = rx_buf_sz - sizeof(*tx_desc) -
 (u32)((void *)tx_data - agg->head);

with

remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);

to make sure the variable remain is always positive. Then, the
overflow wouldn't happen.
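
The wraparound described here is easy to reproduce outside the driver. In the sketch below, MIN_LEN stands in for ETH_ZLEN and both function names are invented; the first function has the buggy shape, the second the signed one.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_LEN 60	/* stands in for ETH_ZLEN in this sketch */

/* Buggy shape: the subtraction wraps, so it looks like plenty of room. */
static int has_room_unsigned(uint32_t buf_sz, uint32_t used)
{
	uint32_t remain = buf_sz - used;	/* wraps when used > buf_sz */

	assert(buf_sz > 0);
	return remain >= MIN_LEN;
}

/* Signed shape: the remainder can go negative and then fails the test. */
static int has_room_signed(int buf_sz, int used)
{
	int remain = buf_sz - used;

	assert(buf_sz > 0);
	return remain >= MIN_LEN;
}
```

With a 100-byte buffer and 104 bytes consumed, the unsigned version computes 0xfffffffc and keeps looping, while the signed version computes -4 and stops. Note also that on typical platforms comparing an int against a sizeof expression converts the int to an unsigned type, so keeping the value non-negative, as the text above describes, is the more robust choice.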

For rx_bottom(), the rx_desc should not be used to calculate the
packet length before making sure the rx_desc is in the desired range.
Split the check into two parts. First, check that the descriptor is in
the memory. Then use the descriptor to find the packet length and
check that the packet is in the memory as well.
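
The two-part check can be sketched on a plain byte buffer. Everything here is an assumption made for illustration: the record layout (a 4-byte little-endian length followed by the payload), DESC_SZ, and the name next_record_len() are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_SZ 4	/* illustrative descriptor: one 32-bit length word */

/*
 * Return the payload length of the record at offset off, or -1 if either
 * the descriptor or the payload would extend past the received data.
 */
static long next_record_len(const uint8_t *buf, size_t actual_len, size_t off)
{
	uint32_t len;

	assert(buf != NULL);

	/* Part 1: the descriptor itself must lie inside the buffer. */
	if (off > actual_len || actual_len - off < DESC_SZ)
		return -1;

	/* Only now is it safe to read the length field. */
	len = (uint32_t)buf[off] | ((uint32_t)buf[off + 1] << 8) |
	      ((uint32_t)buf[off + 2] << 16) | ((uint32_t)buf[off + 3] << 24);

	/* Part 2: the payload the descriptor announces must fit as well. */
	if (len > actual_len - off - DESC_SZ)
		return -1;

	return (long)len;
}
```

Reading the length before part 1 passes is the mistake being fixed: the length would come from bytes that were never received.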

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f3fce41..428600d 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -24,7 +24,7 @@
 #include 
 
 /* Version Information */
-#define DRIVER_VERSION "v1.01.0 (2013/08/12)"
+#define DRIVER_VERSION "v1.02.0 (2013/10/28)"
 #define DRIVER_AUTHOR "Realtek linux nic maintainers "
 #define DRIVER_DESC "Realtek RTL8152 Based USB 2.0 Ethernet Adapters"
 #define MODULENAME "r8152"
@@ -1136,14 +1136,14 @@ r8152_tx_csum(struct r8152 *tp, struct tx_desc *desc, 
struct sk_buff *skb)
 
 static int r8152_tx_agg_fill(struct r8152 *tp, struct tx_agg *agg)
 {
-   u32 remain;
+   int remain;
u8 *tx_data;
 
tx_data = agg->head;
agg->skb_num = agg->skb_len = 0;
-   remain = rx_buf_sz - sizeof(struct tx_desc);
+   remain = rx_buf_sz;
 
-   while (remain >= ETH_ZLEN) {
+   while (remain >= ETH_ZLEN + sizeof(struct tx_desc)) {
struct tx_desc *tx_desc;
struct sk_buff *skb;
unsigned int len;
@@ -1152,12 +1152,14 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct 
tx_agg *agg)
if (!skb)
break;
 
+   remain -= sizeof(*tx_desc);
len = skb->len;
if (remain < len) {
skb_queue_head(&tp->tx_queue, skb);
break;
}
 
+   tx_data = tx_agg_align(tx_data);
tx_desc = (struct tx_desc *)tx_data;
tx_data += sizeof(*tx_desc);
 
@@ -1167,9 +1169,8 @@ static int r8152_tx_agg_fill(struct r8152 *tp, struct 
tx_agg *agg)
agg->skb_len += len;
dev_kfree_skb_any(skb);
 
-   tx_data = tx_agg_align(tx_data + len);
-   remain = rx_buf_sz - sizeof(*tx_desc) -
-(u32)((void *)tx_data - agg->head);
+   tx_data += len;
+   remain = rx_buf_sz - (int)(tx_agg_align(tx_data) - agg->head);
}
 
usb_fill_bulk_urb(agg->urb, tp->udev, usb_sndbulkpipe(tp->udev, 2),
@@ -1188,7 +1189,6 @@ static void rx_bottom(struct r8152 *tp)
list_for_each_safe(cursor, next, &tp->rx_done) {
struct rx_desc *rx_desc;
struct rx_agg *agg;
-   unsigned pkt_len;
int len_used = 0;
struct urb *urb;
u8 *rx_data;
@@ -1204,17 +1204,22 @@ static void rx_bottom(struct r8152 *tp)
 
rx_desc = agg->head;
rx_data = agg->head;
-   pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
-   len_used += sizeof(struct rx_desc) + pkt_len;
+   len_used += sizeof(struct rx_desc);
 
-   while (urb->actual_length >= len_used) {
+   while (urb->actual_length > len_used) {
struct net_device *netdev = tp->netdev;
struct net_device_stats *stats;
+   unsigned int pkt_len;
struct sk_buff *skb;
 
+   pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
if (pkt_len < ETH_ZLEN)
break;
 
+   len_used += pkt_len;
+   if (urb->actual_length < len_used)
+   break;
+
stats = rtl8152_get_stats(netdev);
 
pkt_len -= 4; /* CRC */
@@ -1234,9 +1239,8 @@ static void rx_bottom(struct r8152 *tp)
 
rx_data = rx_agg_align(rx_data + pkt_len + 4);
rx_desc = (struct rx_desc *)rx_data;
-   pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;

Re: [f2fs-dev] [PATCH] f2fs: split sbi->write_mutex for DATA/NODE/META to avoid unnecessary race

2013-11-18 Thread Jaegeuk Kim

Hi,

2013-11-19 (Tue), 09:43 +0800, Chao Yu:
> All DATA/NODE/META bio buffers in superblock is protected by 
> 'sbi->write_mutex', but each bio buffer area is independent, So we 
> should split write_mutex to three for DATA/NODE/META.

Agreed, one comment below though.
Anyway I'll send v2.
Thanks, :)

> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/f2fs.h|2 +-
>  fs/f2fs/segment.c |8 
>  fs/f2fs/super.c   |4 +++-
>  3 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 2df1e61..1c67521 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -362,7 +362,7 @@ struct f2fs_sb_info {
>   struct f2fs_sm_info *sm_info;   /* segment manager */
>   struct bio *bio[NR_PAGE_TYPE];  /* bios to merge */
>   sector_t last_block_in_bio[NR_PAGE_TYPE];   /* last block number */
> - struct mutex write_mutex;   /* mutex for writing IOs */
> + struct mutex write_mutex[NR_PAGE_TYPE]; /* mutex for writing IOs */
>  
>   /* for checkpoint */
>   struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 556965f..a36713ae 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -661,9 +661,9 @@ void f2fs_submit_bio(struct f2fs_sb_info *sbi, enum
> page_type type, bool sync)
>   if (!sbi->bio[btype])
>   return;
>  
> - mutex_lock(&sbi->write_mutex);
> + mutex_lock(&sbi->write_mutex[type]);

btype instead of type.

>   do_submit_bio(sbi, type, sync);
> - mutex_unlock(&sbi->write_mutex);
> + mutex_unlock(&sbi->write_mutex[type]);
>  }
>  
>  static void submit_write_page(struct f2fs_sb_info *sbi, struct page *page,
> @@ -674,7 +674,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi,
> struct page *page,
>  
>   verify_block_addr(sbi, blk_addr);
>  
> - mutex_lock(&sbi->write_mutex);
> + mutex_lock(&sbi->write_mutex[type]);
>  
>   inc_page_count(sbi, F2FS_WRITEBACK);
>  
> @@ -709,7 +709,7 @@ retry:
>  
>   sbi->last_block_in_bio[type] = blk_addr;
>  
> - mutex_unlock(&sbi->write_mutex);
> + mutex_unlock(&sbi->write_mutex[type]);
>   trace_f2fs_submit_write_page(page, blk_addr, type);
>  }
>  
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index f56941c..6928c0a 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -874,7 +874,9 @@ static int f2fs_fill_super(struct super_block *sb, void
> *data, int silent)
>   mutex_init(&sbi->node_write);
>   sbi->por_doing = false;
>   spin_lock_init(&sbi->stat_lock);
> - mutex_init(&sbi->write_mutex);
> + mutex_init(&sbi->write_mutex[DATA]);
> + mutex_init(&sbi->write_mutex[NODE]);
> + mutex_init(&sbi->write_mutex[META]);
>   init_rwsem(&sbi->cp_rwsem);
>   init_waitqueue_head(&sbi->cp_wait);
>   init_sb_info(sbi);

-- 
Jaegeuk Kim
Samsung


Re: [tip:sched/urgent] sched: Optimize task_sched_runtime()

2013-11-18 Thread Davidlohr Bueso

On Wed, 2013-11-13 at 09:25 -0800, tip-bot for Peter Zijlstra wrote:
> Commit-ID:  911b2898b3c9fe0048e9485ad1629ed4fce330fd
> Gitweb: http://git.kernel.org/tip/911b2898b3c9fe0048e9485ad1629ed4fce330fd
> Author: Peter Zijlstra 
> AuthorDate: Mon, 11 Nov 2013 18:21:56 +0100
> Committer:  Ingo Molnar 
> CommitDate: Wed, 13 Nov 2013 13:33:54 +0100
> 
> sched: Optimize task_sched_runtime()
> 
> Large multi-threaded apps like to hit this using do_sys_times() and
> then queue up on the rq->lock.
> 
> Avoid when possible.
> 
> Larry reported ~20% performance increase in his test case.
> 
> Reported-by: Larry Woodman 
> Suggested-by: Paul Turner 
> Signed-off-by: Peter Zijlstra 
> Cc: KOSAKI Motohiro 
> Cc: Linus Torvalds 
> Cc: Andrew Morton 
> Link: 
> http://lkml.kernel.org/r/2013172925.gg26...@twins.programming.kicks-ass.net
> Signed-off-by: Ingo Molnar 

For what it's worth:

Tested-by: Davidlohr Bueso 


Re: [PATCH] x86, efi: change name of efi_no_storage_paranoia parameter to efi_storage_paranoia

2013-11-18 Thread Madper Xie


isimatu.yasu...@jp.fujitsu.com writes:

> Hi Matt,
>
> Sorry for the late reply.
>
>
> (2013/11/11 19:54), Matt Fleming wrote:
>> On Mon, 11 Nov, at 05:52:59PM, Yasuaki Ishimatsu wrote:
>>> Hi Matt,
>>>
>>> I use a FUJITSU x86 box.
>>> This does not become bricked even if I use all efi variable storage.
>>> Thus I want a way to not need to specify efi_no_storage_paranoia
>>> parameter.
>>
>> The efi_no_storage_paranoia parameter was introduced because some
>> machines do not initiate garbage collection of the NVRAM until you
>> allocate all space - basically it's a switch to turn off the "save 5KB
>> of storage at all times" workaround that is needed to avoid bricking
>> some machines.
>>
>> The intention of the switch is not to allow you to fill your NVRAM just
>> because you can. If that is something you want to do then I think it's
>> fair to require you to explicitly turn on efi_no_storage_paranoia. But
>> I'm assuming here that you are doing something like writing lots and
>> lots of pstore entries and just want to write as many as your variable
>> storage will allow? Or are you doing something more fundamental like
>> creating Boot entries?
>>
>> What are you doing to run into the 5KB reserve? How much NVRAM does your
>> machine come with?
>
> I just add a boot entry to NVRAM with the efibootmgr command. But when Linux boots up,
> the remaining NVRAM is less than 5 KB, so I cannot add a new entry.
>
Howdy Yasuaki,
  If the remaining NVRAM is less than 5 KB, your write will trigger an
  NVRAM storage reclamation. However, you still failed to create the entry,
  so I'm curious which items occupy so much NVRAM storage space.

> Thanks,
> Yasuaki Ishimatsu

--
Best,
Madper

Re: [PATCH] FS: Fixed buffer overflow issue in seq_read()

2013-11-18 Thread Linus Torvalds



On Tue, 19 Nov 2013, Al Viro wrote:
> 
> seq_file: always clear m->count when we free m->buf

Ok, applied.

What do you think about then just abstracting out that now common sequence
of re-allocating a larger buffer, while clearing m->count?

IOW, something like the appended..

  Linus

---
 fs/seq_file.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 1d641bb108d2..f513e1efa49d 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -82,6 +82,14 @@ int seq_open(struct file *file, const struct seq_operations *op)
 }
 EXPORT_SYMBOL(seq_open);
 
+static int grow_seq_buf(struct seq_file *m)
+{
+   kfree(m->buf);
+   m->count = 0;
+   m->buf = kmalloc(m->size <<= 1, GFP_KERNEL);
+   return m->buf != NULL;
+}
+
 static int traverse(struct seq_file *m, loff_t offset)
 {
loff_t pos = 0, index;
@@ -135,10 +143,7 @@ static int traverse(struct seq_file *m, loff_t offset)
 
 Eoverflow:
m->op->stop(m, p);
-   kfree(m->buf);
-   m->count = 0;
-   m->buf = kmalloc(m->size <<= 1, GFP_KERNEL);
-   return !m->buf ? -ENOMEM : -EAGAIN;
+   return grow_seq_buf(m) ? -EAGAIN : -ENOMEM;
 }
 
 /**
@@ -232,10 +237,7 @@ ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)
if (m->count < m->size)
goto Fill;
m->op->stop(m, p);
-   kfree(m->buf);
-   m->count = 0;
-   m->buf = kmalloc(m->size <<= 1, GFP_KERNEL);
-   if (!m->buf)
+   if (!grow_seq_buf(m))
goto Enomem;
m->version = 0;
pos = m->index;
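
For illustration, the same free/reset/double sequence can be sketched in user space. This is a minimal sketch rather than the kernel code: struct seq_buf_sketch and grow_buf_sketch() are hypothetical names, and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three seq_file fields the helper touches. */
struct seq_buf_sketch {
	char *buf;	/* current buffer, may be NULL */
	size_t size;	/* capacity of buf in bytes */
	size_t count;	/* bytes of valid data in buf */
};

/*
 * Drop the old buffer, forget its contents, and allocate one twice as big.
 * Returns 1 on success and 0 if the allocation failed (buf is NULL then),
 * mirroring the "m->buf != NULL" return value in the patch.
 */
static int grow_buf_sketch(struct seq_buf_sketch *m)
{
	assert(m->size > 0);	/* doubling a zero size would never grow */
	free(m->buf);
	m->count = 0;		/* always cleared together with the free */
	m->size <<= 1;
	m->buf = malloc(m->size);
	return m->buf != NULL;
}
```

As in the patch, the old contents are discarded rather than copied, so a caller that overflowed the buffer has to regenerate its output after the helper returns.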

Re: [PATCH 3.12 00/19] 3.12.1-stable review

2013-11-18 Thread Guenter Roeck

On Mon, Nov 18, 2013 at 10:37:13AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.12.1 release.
> There are 19 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Nov 20 18:36:14 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.12.1-rc1.gz
> and the diffstat can be found below.
> 
Build test results:
total: 110 pass: 107 skipped: 3 fail: 0

qemu tests all passed.

Results match those seen with the previous release and are as expected.

Guenter

Re: [PATCH 3.11 00/25] 3.11.9-stable review

2013-11-18 Thread Guenter Roeck

On Mon, Nov 18, 2013 at 10:40:29AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.11.9 release.
> There are 25 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Nov 20 18:40:06 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.11.9-rc1.gz
> and the diffstat can be found below.
> 
Build test results:
total: 110 pass: 108 skipped: 2 fail: 0

qemu tests all passed.

Results match the results seen with the previous release and are as expected.

Guenter

[PATCH] arch: hexagon: kernel: add export symbol function __delay()

2013-11-18 Thread Chen Gang

We need to add a __delay() implementation; otherwise allmodconfig fails to
build in the next-20131118 tree.

The related error:

CC  kernel/locking/spinlock_debug.o
  kernel/locking/spinlock_debug.c: In function '__spin_lock_debug':
  kernel/locking/spinlock_debug.c:114:3: error: implicit declaration of function '__delay' [-Werror=implicit-function-declaration]


Signed-off-by: Chen Gang 
---
 arch/hexagon/include/asm/delay.h |1 +
 arch/hexagon/kernel/time.c   |9 +
 2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/hexagon/include/asm/delay.h b/arch/hexagon/include/asm/delay.h
index 5307971..8933b9b1 100644
--- a/arch/hexagon/include/asm/delay.h
+++ b/arch/hexagon/include/asm/delay.h
@@ -21,6 +21,7 @@
 
 #include 
 
+extern void __delay(unsigned long cycles);
 extern void __udelay(unsigned long usecs);
 
 #define udelay(usecs) __udelay((usecs))
diff --git a/arch/hexagon/kernel/time.c b/arch/hexagon/kernel/time.c
index d0c4f5a..17fbf45 100644
--- a/arch/hexagon/kernel/time.c
+++ b/arch/hexagon/kernel/time.c
@@ -229,6 +229,15 @@ void __init time_init(void)
late_time_init = time_init_deferred;
 }
 
+void __delay(unsigned long cycles)
+{
+   unsigned long long start = __vmgettime();
+
+   while ((__vmgettime() - start) < cycles)
+   cpu_relax();
+}
+EXPORT_SYMBOL(__delay);
+
 /*
  * This could become parametric or perhaps even computed at run-time,
  * but for now we take the observed simulator jitter.
-- 
1.7.7.6

Re: [PATCH 3.10 00/24] 3.10.20-stable review

2013-11-18 Thread Guenter Roeck

On Mon, Nov 18, 2013 at 10:42:10AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.10.20 release.
> There are 24 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Nov 20 18:42:04 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.10.20-rc1.gz
> and the diffstat can be found below.
> 

Build test results:
total: 110 pass: 110 skipped: 0 fail: 0

qemu tests all pass.

Results match the results seen with the previous release and are as expected.

Guenter

Re: [PATCH 3.4 00/12] 3.4.70-stable review

2013-11-18 Thread Guenter Roeck

On Mon, Nov 18, 2013 at 10:41:33AM -0800, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 3.4.70 release.
> There are 12 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed Nov 20 18:41:11 UTC 2013.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.70-rc1.gz
> and the diffstat can be found below.
> 

Build test results:
total: 103 pass: 89 skipped: 10 fail: 4

qemu tests all pass.

Results match the results seen with the previous release and are as expected.

Guenter

Re: [PATCH] x86, efi: change name of efi_no_storage_paranoia parameter to efi_storage_paranoia

2013-11-18 Thread Yasuaki Ishimatsu


Hi Matt,

Sorry for the late reply.

(2013/11/11 19:54), Matt Fleming wrote:
> On Mon, 11 Nov, at 05:52:59PM, Yasuaki Ishimatsu wrote:
>> Hi Matt,
>>
>> I use a FUJITSU x86 box.
>> This does not become bricked even if I use all efi variable storage.
>> Thus I want a way to not need to specify efi_no_storage_paranoia
>> parameter.
>
> The efi_no_storage_paranoia parameter was introduced because some
> machines do not initiate garbage collection of the NVRAM until you
> allocate all space - basically it's a switch to turn off the "save 5KB
> of storage at all times" workaround that is needed to avoid bricking
> some machines.
>
> The intention of the switch is not to allow you to fill your NVRAM just
> because you can. If that is something you want to do then I think it's
> fair to require you to explicitly turn on efi_no_storage_paranoia. But
> I'm assuming here that you are doing something like writing lots and
> lots of pstore entries and just want to write as many as your variable
> storage will allow? Or are you doing something more fundamental like
> creating Boot entries?
>
> What are you doing to run into the 5KB reserve? How much NVRAM does your
> machine come with?

I just add a boot entry to NVRAM with the efibootmgr command. But when Linux boots up,
the remaining NVRAM is less than 5 KB, so I cannot add a new entry.

Thanks,
Yasuaki Ishimatsu


Re: [PATCH 0/3] Early use of boot service memory

2013-11-18 Thread Yinghai Lu

On Mon, Nov 18, 2013 at 5:32 PM, H. Peter Anvin  wrote:
> On 11/15/2013 10:30 AM, Vivek Goyal wrote:
>>
>> And IOMMU support is very flaky with kdump. And IOMMU's can be turned
>> off at command line. And that would force one to remove crashkernel_low=0.
>> So change of one command line option forces change of another. It is
>> complicated.
>>
>> Also there are very few systems which work with IOMMU on. A lot more
>> which work without IOMMU. We have all these DMAR issues and still nobody
>> has been able to address IOMMU issues properly.
>>
>
> Why do we need such a big bounce buffer for kdump swiotlb anyway?
> Surely the vast majority of all dump devices don't need it, so it is
> there for completeness, no?

Yes, because the normal path needs that 64M+32k default.

We may reduce that amount to 16M or 18M and let the second kernel
allocate less for swiotlb.

Thanks

Yinghai

Re: [PATCH] ARM: move firmware_ops to drivers/firmware

2013-11-18 Thread Alex Courbot


On 11/18/2013 08:58 PM, Catalin Marinas wrote:
> On Mon, Nov 18, 2013 at 03:05:59AM +, Alex Courbot wrote:
>> On 11/18/2013 12:59 AM, Catalin Marinas wrote:
>>> On 17 November 2013 08:49, Alexandre Courbot  wrote:
>>>> The ARM tree includes a firmware_ops interface that is designed to
>>>> implement support for simple, TrustZone-based firmwares but could
>>>> also cover other use-cases. It has been suggested that this
>>>> interface might be useful to other architectures (e.g. arm64) and
>>>> that it should be moved out of arch/arm.
>>>
>>> NAK. I'm for code sharing with arm via common locations but this API
>>> goes against the ARMv8 firmware standardisation efforts like PSCI,
>>> encouraging each platform to define their own non-standard interface.
>>
>> I have to say, I pretty much agree with your NAK.
>>
>> The reason for this patch is that the suggestion to move firmware_ops
>> out of arch/arm is the last (I hope) thing that prevents my Trusted
>> Foundation support series from being merged.
>
> Moving it into drivers shouldn't be a workaround. Nice try ;).

Hehe. I thought that just sending a patch would settle the issue one way
or the other and avoid a huge discussion. Woke up this morning to see
how wrong I was.

>> Now if we can all agree:
>>
>> * that ARMv8 will only use PSCI
>
> Or spin-table (which does not require secure calls). Otherwise, if
> secure firmware is present, SoCs should use PSCI (as the only firmware
> standard currently supported in the arm64 kernel).
>
> However, things evolve and we may have other needs in the future or PSCI
> may not be sufficient or we get newer PSCI revisions. This can be
> extended but my requirement is to decouple booting standard from SoC
> support (together with the aim of having no SoC-specific code under
> arch/arm64). I really don't see why SoCs can't agree on one (or very
> few) standard booting protocol (and legacy argument doesn't work since
> the ARMv8 firmware needs to be converted to AArch64 anyway).
>
>> * that there is no use-case of this interface outside of arch/arm as of
>> today (and none foreseen in the near future)
>
> The firmware_ops are only used under arch/arm so far, I don't see any
> drivers doing anything with it. Also, l2x0_init is ARMv7 only.
>
> On arm64, support for PSCI is handled via cpu_operations in the latest
> kernel. That's an arm64 abstraction and is extensible (but we want to
> keep tight control of this, hence no register_cpu_ops function).
>
>> * that the firmware_ops interface is quite ARMv7-specific anyway,
>
> This was introduced to allow SoC code to enable hooks for SoC-specific
> firmware calls like cpu_idle, l2x0_init. By standardising the interface
> and decoupling it from SoC code on arm64, we don't need per-SoC
> firmware_ops.
>
> Of course, trusted foundations interface could be plugged into cpu_ops
> on arm64 but I will NAK it on the grounds of not using the PSCI API, nor
> the SMC calling convention (and it's easy to fix when porting to ARMv8).
> If a supported standard API is used, then there is no need for
> additional code in the kernel.
>
> BTW, is legacy code the reason for not converting the SMC # to PSCI?
> It's already supported on ARMv7, so you may not have much code left to
> merge in the kernel ;).

The problem here is twofold:

1) we are just consumers of the TrustZone secure monitor who receive a
binary and do not have any control over its calling conventions. I agree
that it would be trivial to make it compatible with PSCI, but it's just
not something we can do by ourselves (TF does not even follow the SMC
calling convention). If this problem is to be addressed, it should be
done by forcing the TrustZone secure monitor providers to follow PSCI.

2) devices have already shipped with this firmware. Are we going to just
renounce supporting them, even though the necessary support is
lightweight and fits within already existing interfaces?

I certainly do hope that for ARMv8 things will be different and more
standardized. But that's not something that can be guaranteed unless ARM
strongly enforces it to firmware vendors. In case such a non-standard
firmware gets used again, I *do* hope that using cpu_ops will be
preferred over saying "this device cannot be supported in mainline, ever".

The kernel already supports non-standard hardware, BIOS, ACPI through
hacks that are *way* more horrible than that. This should certainly not
be encouraged, but that's not a valid reason to forbid otherwise
perfectly fine devices to run mainline IMHO.

>> * that should a need to move it (for whatever reason) occur later, it
>> will be easy to do (as this patch hopefully demonstrates).
>
> I agree, it's not hard to unify this but so far I haven't seen a good
> reason.

Same here. arm64 has its own cpu_operations. Other archs have different
needs and if we move this to a common place it will just become a messy
placeholder for function pointers from which each arch will only use a
subset.

Not to mention that if we follow the logic completely, we should then
implement PSCI on top of cpu_ops and cpu_ops on top of

Re: 3.10.16 cgroup_mutex deadlock

2013-11-18 Thread Li Zefan

> Thanks Tejun and Hugh.  Sorry for my late entry in getting around to
> testing this fix. On the surface it sounds correct however I'd like to
> test this on top of 3.10.* since that is what we'll likely be running.
> I've tried to apply Hugh's patch above on top of 3.10.19 but it
> appears there are a number of conflicts.  Looking over the changes and
> my understanding of the problem I believe on 3.10 only the
> cgroup_free_fn needs to be run in a separate workqueue.  Below is the
> patch I've applied on top of 3.10.19, which I'm about to start
> testing.  If it looks like I botched the backport in any way please
> let me know so I can test a proper fix on top of 3.10.19.
> 

You didn't move css free_work to the dedicated wq as Tejun's patch does.
css free_work won't acquire cgroup_mutex, but when destroying a lot of
cgroups, we can have a lot of css free_work in the workqueue, so I'd
suggest you also use cgroup_destroy_wq for it.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[PATCH v2] tools lib traceevent: Report better error message on bad function args

2013-11-18 Thread Steven Rostedt

When Jiri Olsa was writing a function callback for
scsi_trace_parse_cdb(), he thought that the traceevent library had a
bug in it because he was getting this error:

  Error: expected ')' but read ','
  Error: expected ')' but read ','
  Error: expected ')' but read ','
  Error: expected ')' but read ','

But in truth, he didn't have the right number of arguments for the
function callback, and the error was the library detecting the
discrepancy. A better error message would have prevented the confusion:

  Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_timeout has more
  Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_start has more
  Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_error has more
  Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_done has more

Or

  Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_timeout only uses 3
  Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_start only uses 3
  Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_error only uses 3
  Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_done only uses 3

Signed-off-by: Steven Rostedt 
---

V2:
   Added '[PATCH.*]' in subject. Don't know how I missed that the first time.

   Free token and farg. Thanks to Namhyung Kim for pointing that out.


diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 0362d57..87dda60 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -2691,7 +2691,6 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
struct print_arg *farg;
enum event_type type;
char *token;
-   const char *test;
int i;
 
arg->type = PRINT_FUNC;
@@ -2708,15 +2707,19 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
}
 
type = process_arg(event, farg, &token);
-   if (i < (func->nr_args - 1))
-   test = ",";
-   else
-   test = ")";
-
-   if (test_type_token(type, token, EVENT_DELIM, test)) {
-   free_arg(farg);
-   free_token(token);
-   return EVENT_ERROR;
+   if (i < (func->nr_args - 1)) {
+   if (type != EVENT_DELIM || strcmp(token, ",") != 0) {
+   warning("Error: function '%s()' expects %d arguments but event %s only uses %d",
+   func->name, func->nr_args,
+   event->name, i + 1);
+   goto err;
+   }
+   } else {
+   if (type != EVENT_DELIM || strcmp(token, ")") != 0) {
+   warning("Error: function '%s()' only expects %d arguments but event %s has more",
+   func->name, func->nr_args, event->name);
+   goto err;
+   }
}
 
*next_arg = farg;
@@ -2728,6 +2731,11 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
*tok = token;
 
return type;
+
+err:
+   free_arg(farg);
+   free_token(token);
+   return EVENT_ERROR;
 }
 
 static enum event_type

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Namhyung Kim

On Tue, Nov 19, 2013 at 2:33 AM, David Ahern  wrote:
> On 11/18/13, 7:30 PM, Namhyung Kim wrote:
>>
>> On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
>>>
>>> On 11/18/13, 7:13 PM, Namhyung Kim wrote:

>>>> I think it should be
>>>>
>>>> perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192
>>>> -a -- sleep 1
>>>>
>>>> (at least to generate the feedback spiral more efficiently..)
>>>
>>>
>>> you don't need the cycles. faults by itself works. Each event contains
>>> 2 pages of data in the sample. With mmap-based output a single
>>> sample (1 page fault in any process) generates 2-3 page faults by perf
>>> which cause 2-3 >8k samples to be generated, which generates faults,
>>> 
>>
>>
>> But after perf touches all pages in ring-buffer and stack, it won't
>> generate page-faults for itself anymore, right?
>>
>> Hmm.. thinking it again, perf has all ring-buffer pages in memory when
>> mmap() called, right?  If so why not doing something like MAP_POPULATE
>> so that it doesn't need to generate minor-faults?
>
>
> This is mmap'ed output, not the ring buffers or its stack. As the output
> file grows, new pages are needed and those are allocated on access via page
> faults. The ftruncate only extends the file size, it does not allocate pages
> at that time.

Argh, I was completely confused! ;-)

Please feel free to ignore what I said here..

Thanks,
Namhyung

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread David Ahern


On 11/18/13, 7:30 PM, Namhyung Kim wrote:
> On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
>> On 11/18/13, 7:13 PM, Namhyung Kim wrote:
>>> I think it should be
>>>
>>> perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>>>
>>> (at least to generate the feedback spiral more efficiently..)
>>
>> you don't need the cycles. faults by itself works. Each event contains
>> 2 pages of data in the sample. With mmap-based output a single
>> sample (1 page fault in any process) generates 2-3 page faults by perf
>> which cause 2-3 >8k samples to be generated, which generates faults,
>>
>
> But after perf touches all pages in ring-buffer and stack, it won't
> generate page-faults for itself anymore, right?
>
> Hmm.. thinking it again, perf has all ring-buffer pages in memory when
> mmap() called, right?  If so why not doing something like MAP_POPULATE
> so that it doesn't need to generate minor-faults?

This is mmap'ed output, not the ring buffers or its stack. As the output 
file grows, new pages are needed and those are allocated on access via 
page faults. The ftruncate only extends the file size, it does not 
allocate pages at that time.
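
A small user-space sketch of this point, assuming a Linux system with a writable /tmp. The helper names are made up for illustration and this is not perf's code. It extends a fresh file with ftruncate(), maps it, and uses getrusage() to count the faults taken when each page is first written.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Total page faults (minor + major) taken by this process so far. */
static long total_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt + ru.ru_majflt;
}

/*
 * Extend a fresh temporary file to npages pages with ftruncate(), map it
 * shared, then write one byte per page. Stores the number of faults taken
 * while touching the pages in *faults. Returns 0 on success, -1 on error.
 * The /tmp path is an assumption of this sketch.
 */
static int touch_fresh_file_pages(size_t npages, long *faults)
{
	char path[] = "/tmp/mmap-sketch-XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	long before, after;
	size_t len, off;
	char *p;
	int fd;

	assert(faults != NULL);
	if (page <= 0 || npages == 0)
		return -1;
	len = npages * (size_t)page;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file lives until fd is closed */

	if (ftruncate(fd, (off_t)len) != 0) {	/* grows size, touches nothing */
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	before = total_faults();
	for (off = 0; off < len; off += (size_t)page)
		p[off] = 1;		/* first touch of each page */
	after = total_faults();

	munmap(p, len);
	close(fd);

	if (before < 0 || after < 0)
		return -1;
	*faults = after - before;
	return 0;
}
```

Called with npages = 16, the fault count should come back nonzero, since nothing is mapped until the pages are touched; the exact number depends on the kernel and filesystem.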


David

RE: [PATCH net v3 1/4] r8152: fix tx/rx memory overflow

2013-11-18 Thread hayeswang

David Miller [mailto:da...@davemloft.net] 
> Sent: Saturday, November 16, 2013 6:40 AM
> To: Hayeswang
> Cc: net...@vger.kernel.org; nic_swsd; 
> linux-kernel@vger.kernel.org; linux-...@vger.kernel.org
> Subject: Re: [PATCH net v3 1/4] r8152: fix tx/rx memory overflow
> 
> From: Hayes Wang 
> Date: Fri, 15 Nov 2013 15:57:56 +0800
> 
> > +   unsigned pkt_len;
> 
> Please fully specify the type as "unsigned int".  Please check for this
> problem in the rest of your patches too.

I will fix it.

I checked the other patches and did not find the same problem.

Thanks.
 
Best Regards,
Hayes


Re: [PATCH] Add a text_poke syscall

2013-11-18 Thread Andy Lutomirski

On 11/18/2013 04:27 PM, Andi Kleen wrote:
> From: Andi Kleen 
> 
> Properly patching running code ("cross modification")
> is a quite complicated business on x86.
> 
> The CPU has specific rules that need to be followed, including
> multiple global barriers.
> 
> Self modifying code is getting more popular, so it's important
> to make it easy to follow the rules.
> 
> The kernel does it properly with text_poke_bp(). But the same
> method is hard to do for user programs.
> 
> This patch adds a (x86 specific) text_poke() syscall that exposes
> the text_poke_bp() machinery to user programs.
> 
> The interface is practically the same as text_poke_bp, just as
> a syscall. I added an extra timeout parameter, that
> will potentially allow batching the global barriers in
> the future. Right now it is enforced to be 0.
> 
> The call also still has a global lock, so it has some scaling
> limitations. If it was commonly used this could be fixed
> by setting up a list of break point locations. Then
> a lock would only be held to modify the list.
> 
> Right now the implementation is just as simple as possible.
> 
> Proposed man page:
> 
> NAME
>   text_poke - Safely modify running instructions (x86)
> 
> SYNOPSIS
>   int text_poke(void *addr, const void *opcode, size_t len,
> void (*handler)(void), int timeout);
> 
> DESCRIPTION
>   The text_poke system call safely modifies code that may
>   be currently executing in parallel on other threads.
>   Patch the instruction at addr with the new instructions
>   at opcode of length len. The target instruction will temporarily
>   be patched with a break point, before it is replaced
>   with the final replacement instruction. When the break point
>   hits the code handler will be called in the context
>   of the thread. The handler does not save any registers
>   and cannot return. Typically it would consist of the
>   original instruction and then a jump to after the original
>   instruction. The handler is only needed during the
>   patching process and can be overwritten once the syscall
>   returns. timeout defines an optional timeout to indicate
>   to the kernel how long the patching could be delayed.
>   Right now it has to be 0.
> 
> EXAMPLE
> 
> volatile int finished;
> 
> extern char patch[], recovery[], repl[];
> 
> struct res {
> long total;
> long val1, val2, handler;
> };
> 
> int text_poke(void *insn, void *repl, int len, void *handler, int to)
> {
> return syscall(314, insn, repl, len, handler, to);
> }
> 
> void *tfunc(void *arg)
> {
> struct res *res = (struct res *)arg;
> 
> while (!finished) {
> int val;
> asm volatile(   ".globl patch\n"
> ".globl recovery\n"
> ".global repl\n"
>   /* original code to be patched */
> "patch: mov $1,%0\n"
> "1:\n"
> ".section \".text.patchup\",\"x\"\n"
>   /* Called when a race happens during patching.
>  Just execute the original code and jump back. */
> "recovery:\n"
> " mov $3,%0\n"
> " jmp 1b\n"
>   /* replacement code that gets patched in: */
> "repl:\n"
> " mov $2,%0\n"
> ".previous" : "=a" (val));
> if (val == 1)
> res->val1++;
> else if (val == 3)
> res->handler++;
> else
> res->val2++;
> res->total++;
> }
> return NULL;
> }
> 
> int main(int ac, char **av)
> {
> int ncpus = sysconf(_SC_NPROCESSORS_ONLN);
> int ps = sysconf(_SC_PAGESIZE);
> pthread_t pthr[ncpus];
> struct res res[ncpus];
> int i;
> 
> srand(1);
> memset(&res, 0, sizeof(struct res) * ncpus);
> mprotect(patch - (unsigned long)patch % ps, ps,
>PROT_READ|PROT_WRITE|PROT_EXEC);
> for (i = 0; i < ncpus - 1; i++)
> pthread_create(&pthr[i], NULL, tfunc, &res[i]);
> for (i = 0; i < 50; i++) {
> text_poke(patch, repl, 5, recovery, 0);
> nanosleep(&((struct timespec) { 0, rand() % 100 }), NULL);
> text_poke(repl, patch, 5, recovery, 0);
> }
> finished = 1;
> for (i = 0; i < ncpus - 1; i++) {
> pthread_join(pthr[i], NULL);
> printf("%d: val1 %lu val2 %lu handler %lu to %lu\n",
> i, res[i].val1,

Re: tools lib traceevent: Report better error message on bad function args

2013-11-18 Thread Steven Rostedt

On Tue, 19 Nov 2013 11:23:56 +0900
Namhyung Kim  wrote:
 
> > arg->type = PRINT_FUNC;
> > @@ -2708,15 +2707,19 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
> > }
> >  
> > type = process_arg(event, farg, &token);
> > -   if (i < (func->nr_args - 1))
> > -   test = ",";
> > -   else
> > -   test = ")";
> > -
> > -   if (test_type_token(type, token, EVENT_DELIM, test)) {
> > -   free_arg(farg);
> > -   free_token(token);
> > -   return EVENT_ERROR;
> > +   if (i < (func->nr_args - 1)) {
> > +   if (type != EVENT_DELIM || strcmp(token, ",") != 0) {
> > +   warning("Error: function '%s()' expects %d arguments but event %s only uses %d",
> > +   func->name, func->nr_args,
> > +   event->name, i + 1);
> > +   return EVENT_ERROR;
> > +   }
> > +   } else {
> > +   if (type != EVENT_DELIM || strcmp(token, ")") != 0) {
> > +   warning("Error: function '%s()' only expects %d arguments but event %s has more",
> > +   func->name, func->nr_args, event->name);
> > +   return EVENT_ERROR;
> 
> It seems that you forgot to free farg and token in the error paths.
> 

Ug, you're right!

Thanks for the review. v2 coming up.

-- Steve

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Namhyung Kim

On Mon, 18 Nov 2013 19:17:37 -0700, David Ahern wrote:
> On 11/18/13, 7:13 PM, Namhyung Kim wrote:
>> I think it should be
>>
>>   perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>>
>> (at least to generate the feedback spiral more efficiently..)
>
> you don't need the cycles. faults by itself works. Each event contains > 2
> pages of data in the sample. With mmap-based output a single
> sample (1 page fault in any process) generates 2-3 page faults by perf
> which cause 2-3 >8k samples to be generated, which generates faults,
> 

But after perf touches all pages in the ring buffer and stack, it won't
generate page faults for itself anymore, right?

Hmm.. thinking about it again, perf has all ring-buffer pages in memory when
mmap() is called, right?  If so, why not do something like MAP_POPULATE
so that it doesn't need to generate minor faults?
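
To make the MAP_POPULATE idea concrete, here is a minimal userspace sketch. It only shows what the flag does on an anonymous mapping; `map_prefaulted` is a made-up helper name, and whether the same flag is appropriate for perf's ring buffer or output-file mapping is exactly the open question above, not something this sketch settles.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0	/* flag unavailable here: fall back to no prefaulting */
#endif

/*
 * Hypothetical helper: map an anonymous, private region and ask the
 * kernel to prefault its pages at mmap() time, so that the first
 * accesses should not need to take minor faults.
 * Returns NULL on failure.
 */
static void *map_prefaulted(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}
```

The caller still has to munmap() the region with the same length when done.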

Thanks,
Namhyung

Re: [PATCH v2] X86: MM: Add PAT Type write-through in combination with mtrr

2013-11-18 Thread H. Peter Anvin

On 11/03/2013 04:02 AM, Andreas Werner wrote:
> Revision 2:
>   added comment in code.
> 
> This patch adds the Write-through memory type in combination with mtrr.
> If you call ioremap_cache to request cachable memory (write-back) the
> function tries to set the PAT to write-back only if the mtrr setting of
> the requested region is also marked as Write-Back.
> 
> If the mtrr regions are marked e.g. as Write-through or with other
> types, the function will always return UC- memory.
> 
> If you check the Intel document "IA-32 SDM vol 3a table Effective
> Memory Type", there are many other combinations possible.
> 
> This patch will only add the following combination:
> PAT=Write-Back + MTRR=Write-Through.
> 
> Since marking IO Memory as cachable is not valid, WT is the
> best way for caching/bursting on MMIO Devices.
> 
> Tested on - Intel (R) Atom E680 (Tunnel Creek)
>   - Intel (R) Core(TM)2 Duo
> 
> Signed-off-by: Andreas Werner 
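
The behaviour described in the quoted changelog can be sketched as a small decision function. This is only a model of the text above, not the kernel's PAT code: the enum, the function name and the `patch_applied` switch are all made up for illustration.

```c
/* Hypothetical memory-type labels, only for this sketch. */
enum mem_type { MT_WB, MT_WT, MT_UC_MINUS, MT_OTHER };

/*
 * Model of a write-back (ioremap_cache-style) request as described in
 * the changelog:
 *   - MTRR says WB            -> the request gets WB
 *   - MTRR says WT            -> UC- without the patch, WT with it
 *   - any other MTRR type     -> UC-
 */
static enum mem_type type_for_wb_request(enum mem_type mtrr, int patch_applied)
{
	if (mtrr == MT_WB)
		return MT_WB;
	if (patch_applied && mtrr == MT_WT)
		return MT_WT;
	return MT_UC_MINUS;
}
```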

I don't quite know where this ended up, but I am *really* not happy
about going back to using MTRRs to mark I/O devices with the chronic
problems of MTRR exhaustion that entails.  As such I do insist that PAT
is properly updated to support WT if we're going to do this.

-hpa



Re: tools lib traceevent: Report better error message on bad function args

2013-11-18 Thread Namhyung Kim

Hi Steven,

On Mon, 18 Nov 2013 19:11:31 -0500, Steven Rostedt wrote:
> When Jiri Olsa was writing a function callback for
> scsi_trace_parse_cdb(), he thought that the traceevent library had a
> bug in it because he was getting this error:
>
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>   Error: expected ')' but read ','
>
> But in truth, he didn't have the right number of arguments for the
> function callback, and the error was the library detecting the
> discrepancy. A better error message would have prevented the confusion:
>
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_timeout has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_start has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_error has more
>   Error: function 'scsi_trace_parse_cdb()' only expects 2 arguments but event scsi_dispatch_cmd_done has more
>
> Or
>
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_timeout only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_start only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_error only uses 3
>   Error: function 'scsi_trace_parse_cdb()' expects 4 arguments but event scsi_dispatch_cmd_done only uses 3
>
> Signed-off-by: Steven Rostedt 
>
> diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
> index 0362d57..f7aab3d 100644
> --- a/tools/lib/traceevent/event-parse.c
> +++ b/tools/lib/traceevent/event-parse.c
> @@ -2691,7 +2691,6 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
>   struct print_arg *farg;
>   enum event_type type;
>   char *token;
> - const char *test;
>   int i;
>  
>   arg->type = PRINT_FUNC;
> @@ -2708,15 +2707,19 @@ process_func_handler(struct event_format *event, struct pevent_function_handler
>   }
>  
>   type = process_arg(event, farg, &token);
> - if (i < (func->nr_args - 1))
> - test = ",";
> - else
> - test = ")";
> -
> - if (test_type_token(type, token, EVENT_DELIM, test)) {
> - free_arg(farg);
> - free_token(token);
> - return EVENT_ERROR;
> + if (i < (func->nr_args - 1)) {
> + if (type != EVENT_DELIM || strcmp(token, ",") != 0) {
> + warning("Error: function '%s()' expects %d arguments but event %s only uses %d",
> + func->name, func->nr_args,
> + event->name, i + 1);
> + return EVENT_ERROR;
> + }
> + } else {
> + if (type != EVENT_DELIM || strcmp(token, ")") != 0) {
> + warning("Error: function '%s()' only expects %d arguments but event %s has more",
> + func->name, func->nr_args, event->name);
> + return EVENT_ERROR;

It seems that you forgot to free farg and token in the error paths.

Thanks,
Namhyung


> + }
>   }
>  
>   *next_arg = farg;
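
A minimal sketch of the cleanup being asked for above, using stand-in types rather than the real traceevent structures. Everything here (`struct arg`, `release_arg`, `release_token`, `check_delim`, the counters) is hypothetical and only illustrates the pattern of releasing both objects before returning from an error path; it is not the v2 patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library's argument object. */
struct arg { int dummy; };

static int freed_args;		/* counts calls to release_arg() */
static int freed_tokens;	/* counts calls to release_token() */

static void release_arg(struct arg *a) { free(a); freed_args++; }
static void release_token(char *t) { free(t); freed_tokens++; }

/*
 * Check that the token following an argument is the expected delimiter.
 * On a mismatch, release both the argument and the token before
 * reporting the error, so the error path does not leak them.
 * Returns 0 on success, -1 on error.
 */
static int check_delim(struct arg *farg, char *token, const char *expected)
{
	if (token == NULL || strcmp(token, expected) != 0) {
		release_arg(farg);
		release_token(token);
		return -1;
	}
	return 0;
}
```

On success the caller keeps ownership of both objects, which matches how the surrounding loop goes on to use farg.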

Re: [PATCH] x86: boot: Fix mixed indentation in a20.c

2013-11-18 Thread H. Peter Anvin

On 11/18/2013 09:50 AM, Johannes Löthberg wrote:
> Replace all mixed indentation with tabs
> 
> Signed-off-by: Johannes Löthberg 

NAK.  Not worth the churn in the absence of other changes.

-hpa



[GIT] Crypto Update for 3.13

2013-11-18 Thread Herbert Xu

Hi Linus:

Here is a resend of the crypto update for 3.13:

* Made x86 ablk_helper generic for ARM.
* Phased out chainiv in favour of eseqiv (affects IPsec).
* Fixed aes-cbc IV corruption on s390.
* Added constant-time crypto_memneq which replaces memcmp.

* Fixed aes-ctr in omap-aes.
* Added OMAP3 ROM RNG support.
* Added PRNG support for MSM SoCs.
* Added and used the Job Ring API in caam.

* Misc fixes.
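
As a rough illustration of what "constant-time" means for the crypto_memneq item above: a comparison whose running time does not depend on where the first differing byte is. The sketch below is not the kernel's implementation; `memneq_sketch` is a made-up name, and a production version needs more care to stop the compiler from reintroducing an early exit.

```c
#include <stddef.h>

/*
 * Illustrative constant-time inequality test. Returns 0 if the two
 * buffers are equal and nonzero otherwise. Unlike memcmp(), the loop
 * always walks all `size` bytes and accumulates the differences, so
 * it has no data-dependent early return.
 */
static unsigned long memneq_sketch(const void *a, const void *b, size_t size)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned long neq = 0;
	size_t i;

	for (i = 0; i < size; i++)
		neq |= (unsigned long)(pa[i] ^ pb[i]);

	return neq;
}
```

The result is only meaningful as zero or nonzero; unlike memcmp() it gives no ordering.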


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git



Alex Porosanu (7):
  crypto: caam - fix RNG state handle instantiation descriptor
  crypto: caam - fix hash, alg and rng registration if CAAM driver not initialized
  crypto: caam - fix RNG4 instantiation
  crypto: caam - split RNG4 instantiation function
  crypto: caam - uninstantiate RNG state handle 0 if instantiated by caam driver
  crypto: caam - fix RNG4 AAI defines
  crypto: caam - enable instantiation of all RNG4 state handles

Ard Biesheuvel (2):
  crypto: create generic version of ablk_helper
  crypto: move x86 to the generic version of ablk_helper

Ben Hutchings (1):
  hwrng: via-rng - Mark device ID table as __maybe_unused

Fabio Estevam (4):
  crypto: dcp - Use devm_ioremap_resource()
  crypto: dcp - Use devm_request_irq()
  crypto: dcp - Fix the path for releasing the resources
  crypto: dcp - Check the return value from devm_ioremap_resource()

Herbert Xu (2):
  crypto: skcipher - Use eseqiv even on UP machines
  crypto: s390 - Fix aes-cbc IV corruption

James Yonan (1):
  crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks

Joel Fernandes (1):
  crypto: omap-aes - Fix CTR mode counter length

Joni Lapilainen (1):
  crypto: omap-sham - Add missing modalias

Jussi Kivilinna (2):
  crypto: sha256_ssse3 - use correct module alias for sha224
  crypto: x86 - restore avx2_supported check

Linus Walleij (1):
  crypto: tegra - use kernel entropy instead of ad-hoc

Mathias Krause (6):
  crypto: authenc - Export key parsing helper function
  crypto: authencesn - Simplify key parsing
  crypto: ixp4xx - Simplify and harden key parsing
  crypto: picoxcell - Simplify and harden key parsing
  crypto: talitos - Simplify key parsing
  padata: make the sequence counter an atomic_t

Michael Ellerman (2):
  hwrng: pseries - Use KBUILD_MODNAME in pseries-rng.c
  hwrng: pseries - Return errors to upper levels in pseries-rng.c

Michael Opdenacker (1):
  crypto: mv_cesa: remove deprecated IRQF_DISABLED

Neil Horman (1):
  crypto: ansi_cprng - Fix off by one error in non-block size request

Oliver Neukum (1):
  crypto: sha256_ssse3 - also test for BMI2

Pali Rohár (1):
  hwrng: OMAP3 ROM Random Number Generator support

Ruchika Gupta (3):
  crypto: caam - Add Platform driver for Job Ring
  crypto: caam - Add API's to allocate/free Job Rings
  crypto: caam - Modify the interface layers to use JR API's

Sachin Kamat (7):
  crypto: mv_cesa - Staticize local symbols
  crypto: omap-aes - Staticize local symbols
  crypto: tegra-aes - Staticize tegra_aes_cra_exit
  crypto: tegra-aes - Fix NULL pointer dereference
  crypto: tegra-aes - Use devm_clk_get
  crypto: sahara - Remove redundant of_match_ptr
  crypto: mv_cesa - Remove redundant of_match_ptr

Stanimir Varbanov (2):
  ARM: DT: msm: Add Qualcomm's PRNG driver binding document
  hwrng: msm - Add PRNG support for MSM SoC's

Stephen Warren (1):
  ARM: tegra: remove tegra_chip_uid()

Yashpal Dutta (1):
  crypto: caam - map src buffer before access

kbuild test robot (1):
  crypto: ablk_helper - Replace memcpy with struct assignment

 .../devicetree/bindings/rng/qcom,prng.txt  |   17 +
 arch/arm/mach-tegra/fuse.c |   10 -
 arch/s390/crypto/aes_s390.c|   19 +-
 arch/x86/crypto/Makefile   |3 +-
 arch/x86/crypto/aesni-intel_glue.c |2 +-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |2 +-
 arch/x86/crypto/camellia_aesni_avx_glue.c  |2 +-
 arch/x86/crypto/cast5_avx_glue.c   |2 +-
 arch/x86/crypto/cast6_avx_glue.c   |2 +-
 arch/x86/crypto/serpent_avx2_glue.c|2 +-
 arch/x86/crypto/serpent_avx_glue.c |2 +-
 arch/x86/crypto/serpent_sse2_glue.c|2 +-
 arch/x86/crypto/sha256_ssse3_glue.c|4 +-
 arch/x86/crypto/twofish_avx_glue.c |2 +-
 arch/x86/include/asm/simd.h|   11 +
 crypto/Kconfig |   23 +-
 crypto/Makefile|8 +-
 {arch/x86/crypto => crypto}/ablk_helper.c  |   13 +-
 crypto/ablkcipher.c|   21 +-
 crypto/ansi_cprng.c|4 +-

Re: [PATCH] cpufreq: cpufreq-cpu0: Use a sane boot frequency when booting with a mismatched bootloader configuration

2013-11-18 Thread Shawn Guo

On Mon, Nov 18, 2013 at 10:41:36AM -0600, Nishanth Menon wrote:
> In the case of mismatch, to consider that device tree may be wrong in
> driver is also to assume that hardware was always configured correctly
> and we assume description is the flawed data.

No, I did not say that.  IMO, when cpufreq-cpu0 sees a mismatch, it has
no way to know or assume which one is correct and which is incorrect.
The best thing it can do is to fail out without changing anything about
running frequency and voltage.

Shawn

> That is just plain
> wrong. We need to assume device tree is the correct description of the
> hardware and any mismatch must be assumed as bad configuration - and
> this is what the patch does.
> 
> Now, if the description is wrong, that is a dts bug of its own.
> 
> If the suggestion is to improve my commit message, I am more than
> happy to do so - please suggest how I could improve the same.


Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread David Ahern


On 11/18/13, 7:13 PM, Namhyung Kim wrote:

> I think it should be
>
>    perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> (at least to generate the feedback spiral more efficiently..)


you don't need the cycles. faults by itself works. Each event contains > 
2 pages of data in the sample. With mmap-based output a single sample (1 
page fault in any process) generates 2-3 page faults by perf which cause 
2-3 >8k samples to be generated, which generates faults, 


David



Re: [PATCH v2 2/3] hwrng: OMAP3 ROM Random Number Generator support

2013-11-18 Thread Herbert Xu

On Mon, Nov 18, 2013 at 10:51:30PM +0100, Pali Rohár wrote:
> On Wednesday 16 October 2013 14:57:34 Herbert Xu wrote:
> > On Tue, Oct 08, 2013 at 12:04:09PM -0700, Tony Lindgren wrote:
> > > * Pali Rohár  [130920 06:33]:
> > > > This driver provides kernel-side support for the Random
> > > > Number Generator hardware found on OMAP34xx processors.
> > > > 
> > > > This driver comes from Maemo 2.6.28 kernel and was tested
> > > > on Nokia RX-51. It is platform device because it needs
> > > > board specific function for smc calls.
> > > 
> > > This one should be merged via the hw_random patches
> > > separately:
> > > 
> > > Acked-by: Tony Lindgren 
> > 
> > Patch applied.  Thanks!
> 
> Hello, I still do not see this patch (2/3) in Linus' tree. But 
> patches 1/3 and 3/3 are already merged. So is there any problem?

2/3 is still in my tree so when Linus pulls it it'll be there.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Namhyung Kim

On Mon, 18 Nov 2013 17:34:49 -0700, David Ahern wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
> I think Peter is asking about:
> perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1

I think it should be

  perf record -e cycles -F 4000 -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1

(at least to generate the feedback spiral more efficiently..)

Well, I know that we don't support this now.  But wouldn't it make sense
to support this kind of thing?

Thanks,
Namhyung

Re: [PATCH 3/6] Squashfs: add multi-threaded decompression using percpu variables

2013-11-18 Thread Minchan Kim

Hello Phillip,

Sorry for the late response.

On Thu, Nov 14, 2013 at 05:04:36PM +, Phillip Lougher wrote:
> CCing Junjiro Okijima and Stephen Hemminger
> 
> On 08/11/13 02:42, Minchan Kim wrote:
> >
> >Hello Phillip,
> >
> >On Thu, Nov 07, 2013 at 08:24:22PM +, Phillip Lougher wrote:
> >>Add a multi-threaded decompression implementation which uses
> >>percpu variables.
> >>
> >>Using percpu variables has advantages and disadvantages over
> >>implementations which do not use percpu variables.
> >>
> >>Advantages: the nature of percpu variables ensures decompression is
> >>load-balanced across the multiple cores.
> >>
> >>Disadvantages: it limits decompression to one thread per core.
> >
> >At a glance, I understand your concern but I don't see benefit to
> >make this feature as separate new config because we can modify the
> >number of decompressor per core in the future.
> >I don't want to create new config SQUASHFS_DECOMP_MULTI_3,
> >SQUASHFS_DECOMP_MULTI_4 and so on. :)
> 
> You misunderstand
> 
> I have been sent two multi-threaded implementations in the
> past which use percpu variables:
> 
> 1.  First patch set:
> 
>http://www.spinics.net/lists/linux-fsdevel/msg34365.html
> 
>Later in early 2011, I explained why I'd not merged the
>patches, and promised to do so when I got time
> 
>http://www.spinics.net/lists/linux-fsdevel/msg42392.html
> 
> 
> 2.  Second patch set sent in 2011
> 
>http://www.spinics.net/lists/linux-fsdevel/msg44111.html
> 
> So, these patches have been in my inbox, waiting until I got
> time to refactor Squashfs so that they could be merged... and
> I finally got to do this last month, which is why I'm merging
> a combined version of both patches now.
> 
> As to why have *two* implementations, I previously explained these
> two approaches are complementary, and merging both allows the
> user to decide which method of parallelising Squashfs they want
> to do.

The only benefit I see in decompressor_multi_percpu is the limited
memory overhead, but that could be solved by dynamic shrinking, as
I mentioned earlier.

About CPU usage, I'm not sure why decompressor_multi would be so bad.
If it's really a concern, we can fix it by letting the admin limit the
number of decompression streams via sysfs or something.

Decompression load balancing? Actually, I don't understand your claim.
Could you elaborate on it a bit?
I can't see why decompressor_multi_percpu is better than
decompressor_multi.

> 
> The percpu implementation is a good approach to parallelising
> Squashfs.  It is extremely simple, both in code and overhead.

I agree about the code, but I'm not sure about the overhead.
I admit decompressor_multi is bigger, but it consists of simple operations.
Sometimes kmalloc's cost could be higher if system memory pressure is
severe, so the file system's performance would fluctuate. If that's
your concern, we could add min/high thresholds so that it guarantees
at least some speed.

> The decompression hotpath simply consists of taking a percpu
> variable, doing the decompression, and then a release.

When the decompression buffer is created dynamically, there is some overhead
compared to the percpu approach, but once it has been created, the
decompression overhead would be marginal.

> Looking at code sizes:
> 
> fs/squashfs/decompressor_multi.c|  199 +++
> fs/squashfs/decompressor_multi_percpu.c |  104 
> fs/squashfs/decompressor_single.c   |   85 +
> 
> The simplicity of the percpu approach is readily apparent, at 104
> lines it is only slightly larger than the single threaded
> implementation.
> 
> Personally I like both approaches, and I have no reason not to
> merge both implementations I have been sent.

Okay. I'm not a maintainer, so I'm not strongly against your thought.
I just wanted to unify them if neither of them is a significant win,
because I thought it would make the maintainer and contributors happier to
avoid more CONFIG options that have to be considered whenever a feature
is added.

> 
> But what does the community think here?  Do you want the percpu
> implementation?  Do you see value in having two implementations?
> Feedback is appreciated.

As I mentioned, my opinion is: let's unify them if either of them
is a significant win. It would be good to see some benchmark results.

> 
> >
> >How about this?
> >
> >1. Let's make CONFIG_DECOMPRESSOR_MAX which could be tuned by admin
> >  in Kconfig. default is CPU *2 or CPU, Otherwise, we can put it
> >  to sysfs so user can tune it in runtime.
> >
> >2. put decompressor shrink logic by slab shrinker so if system has
> >  memory pressure, we could catch the event and free some of decompressor
> >  but memory pressure is not severe again in the future, we can create
> >  new decompressor until reaching threshold user define.
> >  We could know system memory is enough by GFP_NOWAIT, not GFP_KERNEL
> >  in get_decomp_stream's allocation indirectly.
> 
> This adds extra complexity to an implementation already

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Namhyung Kim

On Mon, 18 Nov 2013 17:34:49 -0700, David Ahern wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
> I think Peter is asking about:
> perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> And as expected it is a massive feedback spiraling out of control.

How about adding an option to exclude the perf tools from recording for
system-wide (or cpu-wide) session?

This way, we can prevent the feedback loops for page-fault or syscall
events you mentioned IMHO.

Thanks,
Namhyung

Re: [3.11.4] Thunderbolt/PCI unplug oops in pci_pme_list_scan

2013-11-18 Thread Yijing Wang

> The pcie_portdrv .probe() method calls pci_enable_device() once, in
> pcie_port_device_register(), but the .remove() method calls
> pci_disable_device() twice, in pcie_port_device_remove() and in
> pcie_portdrv_remove().
> 
> That causes a "disabling already-disabled device" warning when removing a
> PCIe port device.  This happens all the time when removing Thunderbolt
> devices, but is also easy to reproduce with, e.g.,
> "echo :00:1c.3 > /sys/bus/pci/drivers/pcieport/unbind"
> 
> This patch removes the disable from pcie_portdrv_remove().
> 
> [bhelgaas: changelog, tag for stable]
> Reported-by: David Bulkow 
> Reported-by: Mika Westerberg 
> Signed-off-by: Yinghai Lu 
> Signed-off-by: Bjorn Helgaas 
> CC: sta...@vger.kernel.org# v2.6.32+

Hi Bjorn,
   This issue on x86 seems to have been introduced after commit 928bea9
("PCI: Delay enabling bridges until they're needed").
So does this patch need to be backported to 2.6.32+?
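
For readers following along, the imbalance described in the quoted changelog (one enable in probe, two disables in remove) can be modelled with a toy counter. This is not the PCI core's data structure or API; `struct toy_dev` and the `toy_*` helpers are invented purely to show why the second disable trips a warning.

```c
/* Toy model of a device enable count. */
struct toy_dev {
	int enable_cnt;
	int warnings;	/* number of unbalanced disables observed */
};

static void toy_enable(struct toy_dev *d)
{
	d->enable_cnt++;
}

static void toy_disable(struct toy_dev *d)
{
	if (d->enable_cnt == 0) {
		/* Models the "disabling already-disabled device" warning. */
		d->warnings++;
		return;
	}
	d->enable_cnt--;
}

/* Probe enables once; remove disables `disables` times. */
static int toy_probe_remove(int disables)
{
	struct toy_dev d = { 0, 0 };
	int i;

	toy_enable(&d);
	for (i = 0; i < disables; i++)
		toy_disable(&d);

	return d.warnings;
}
```

With two disables the model reports one warning; with the extra disable dropped, as in the patch below, it reports none.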

> ---
>  drivers/pci/pcie/portdrv_pci.c |1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index cd1e57e51aa7..0d8fdc48e642 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -223,7 +223,6 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>  static void pcie_portdrv_remove(struct pci_dev *dev)
>  {
>   pcie_port_device_remove(dev);
> - pci_disable_device(dev);
>  }
>  
>  static int error_detected_iter(struct device *device, void *data)


-- 
Thanks!
Yijing


Re: [PATCH] nsproxy: Check to make sure count is truly zero before freeing

2013-11-18 Thread Steven Rostedt

On Tue, 19 Nov 2013 09:22:54 +0800
Gao feng  wrote:

> On 11/19/2013 08:04 AM, Steven Rostedt wrote:
> > 
> > I'll start out saying that this email was a complete oops. I only kept
> > it around for reference, as this didn't fix the bug we were seeing, and
> > I used this email to just document what I initially thought.
> > 
> 
> Can you describe the panic situation and the way to reproduce?
> it's useful for us to find out the real problem.
> 

I need to talk with more people to get more information as it happened
in our test labs. From what I remember, it was caused on 3.10-rt (even
with PREEMPT_RT disabled). We are currently testing 3.10 vanilla, to
make sure that it really does occur on mainline.

From what I understand, it was ltp-lite that caused the panic and it
happened on a box with 16 CPUs.

I'll keep you posted when we get more data that is relevant to mainline.

This is another reason that the email went out prematurely.

-- Steve

Re: [PATCH 4/5] perf record: mmap output file - v5

2013-11-18 Thread Namhyung Kim

Hi David,

On Tue, Nov 19, 2013 at 12:34 AM, David Ahern  wrote:
> On 11/18/13, 5:24 PM, Namhyung Kim wrote:
>
>>>>> What now? Can we add the mmap path as an option?
>>>>
>>>> I'd say an option is always a possibility, but someone please try
>>>> what happens if you use stupid large events (dwarf stack copies) on
>>>> PERF_COUNT_SW_PAGE_FAULTS (.period=1) while recording with mmap().
>>>>
>>>> The other option is to simply disallow PERF_SAMPLE_STACK_USER for
>>>> that event.
>>>>
>>>> Personally I think 8k copies for every event are way stupid anyway,
>>>> that's a metric ton of data at a huge cost.
>>>
>>>
>>> Well, with 1 khz sampling of a single threaded workload it's 8MB per
>>> second - that's 80 MB for 10 seconds profiling - not the end of the
>>> world.
>>
>>
>> We now use 4 khz sampling frequency by default, just FYI. :)
>
>
> I think Peter is asking about:
> perf record -e faults -c 1 --call-graph dwarf,8192 -a -- sleep 1
>
> And as expected it is a massive feedback spiraling out of control.

Ah, I missed that part - just blindly answered about the freq -
thinking he's talking about the default freq of perf record/top.

Anyway, for the above case, I guess it won't matter much, as the stack is
usually in memory, so no page fault will occur even when recording with mmap,
unless the system suffers from high memory pressure, right?

But I agree that copying 8KB for each sample seems too large.
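
For reference, the arithmetic behind the figures quoted above, as a tiny sketch. `stack_copy_rate` is a made-up helper; it counts only the user-stack copy for a single sampled thread and ignores every other field in the sample.

```c
#include <stdint.h>

/*
 * Bytes of stack data recorded per second when each sample carries
 * `stack_bytes` of copied user stack and samples arrive at `sample_hz`.
 */
static uint64_t stack_copy_rate(uint64_t sample_hz, uint64_t stack_bytes)
{
	return sample_hz * stack_bytes;
}
```

With 8192-byte copies this gives 8,192,000 bytes per second at 1 kHz (the "8MB per second" above) and 32,768,000 bytes per second at the 4 kHz default.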

-- 
Thanks,
Namhyung

Re: [PATCH v2] itg3200: add dt support.

2013-11-18 Thread Sebastian Reichel

Hi,

On Tue, Nov 19, 2013 at 11:30:13AM +1100, NeilBrown wrote:
> No new configuration, just a 'compatible' string and
> documentation.

itg3200 looks like a candidate for the list of trivial i2c
devices [0] to me.

[0] Documentation/devicetree/bindings/i2c/trivial-devices.txt

-- Sebastian

