date:20120726

[ 18/23] pnfs-obj: dont leak objio_state if ore_write/read fails

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit 9909d45a8557455ca5f8ee7af0f253debc851f1a upstream.

[Bug since 3.2 Kernel]
Signed-off-by: Boaz Harrosh 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/nfs/objlayout/objio_osd.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -453,7 +453,10 @@ int objio_read_pagelist(struct nfs_read_
objios->ios->done = _read_done;
dprintk("%s: offset=0x%llx length=0x%x\n", __func__,
rdata->args.offset, rdata->args.count);
-   return ore_read(objios->ios);
+   ret = ore_read(objios->ios);
+   if (unlikely(ret))
+   objio_free_result(&objios->oir);
+   return ret;
 }
 
 /*
@@ -537,8 +540,10 @@ int objio_write_pagelist(struct nfs_writ
dprintk("%s: offset=0x%llx length=0x%x\n", __func__,
wdata->args.offset, wdata->args.count);
ret = ore_write(objios->ios);
-   if (unlikely(ret))
+   if (unlikely(ret)) {
+   objio_free_result(&objios->oir);
return ret;
+   }
 
if (objios->sync)
_write_done(objios->ios, objios);


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[ 17/23] ore: Remove support of partial IO request (NFS crash)

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit 62b62ad873f2accad9222a4d7ffbe1e93f6714c1 upstream.

Due to OOM situations the ORE might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes the NFS core, and exofs is just fine without it,
just remove this contraption and fail.

TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets
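
A minimal sketch of the behavioural change described above, written as
standalone C rather than kernel code. The names `struct io_req`,
`finish_old()` and `finish_new()` are hypothetical and exist only for
illustration; `unprepared` stands for the number of bytes that could not
be set up when the allocation failed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one request field that matters here. */
struct io_req {
	uint64_t length;	/* bytes the caller asked for */
};

/*
 * Old behaviour: if at least part of the request was prepared, shorten
 * the request and report success so the caller sees a short IO.
 */
static int finish_old(struct io_req *req, uint64_t unprepared, int ret)
{
	if (ret) {
		if (unprepared == req->length)
			return ret;		/* nothing was prepared: fail */
		req->length -= unprepared;	/* partial: shorten the request */
	}
	return 0;
}

/* New behaviour: any preparation error fails the whole request. */
static int finish_new(struct io_req *req, uint64_t unprepared, int ret)
{
	(void)req;
	(void)unprepared;
	return ret;
}
```

With the new variant the caller never has to cope with a silently
shortened request, which is the case the message says the NFS core could
not handle.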

[Bug since 3.2 Kernel]
CC: Benny Halevy 
Signed-off-by: Boaz Harrosh 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/exofs/ore.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -735,13 +735,7 @@ static int _prepare_for_striping(struct
 out:
ios->numdevs = devs_in_group;
ios->pages_consumed = cur_pg;
-   if (unlikely(ret)) {
-   if (length == ios->length)
-   return ret;
-   else
-   ios->length -= length;
-   }
-   return 0;
+   return ret;
 }
 
 int ore_create(struct ore_io_state *ios)



[ 16/23] ore: Fix NFS crash by supporting any unaligned RAID IO

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit 9ff19309a9623f2963ac5a136782ea4d8b5d67fb upstream.

In RAID_5/6 we used not to permit an IO whose end byte is not
stripe_size aligned and which spans more than one stripe,
i.e. the caller must check after submission whether the actual
transferred byte count is shorter, and would then need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with its short write mechanism. But late testing has
exposed a CRASH when this is used with non-RPC layout-drivers.

The change at NFS is deep and risky; in its place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principle here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests:
one like the old code, before the calculation of the first stripe,
and another at a new site, before the calculation of the last stripe.
If either "boundary" is aligned, or the complete IO is within a single
stripe, we do a single read like before.

The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1. _read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before, 2+3 before the last
stripe, and in the case of all in a single stripe then 1+2+3
is performed additively.
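
A rough sketch of the boundary test that decides which of the read
steps are needed, assuming byte offsets and a fixed `stripe_size`. This
is a simplification for illustration: `struct r4w_plan` and `plan_r4w()`
are hypothetical names, and the real code works on pages and per-device
stripe info rather than on plain byte arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a write of [offset, offset+length) needs. */
struct r4w_plan {
	bool read_head;		/* start is not stripe aligned */
	bool read_tail;		/* end is not stripe aligned */
	bool single_stripe;	/* whole IO fits inside one stripe */
};

static struct r4w_plan plan_r4w(uint64_t offset, uint64_t length,
				uint64_t stripe_size)
{
	uint64_t end = offset + length;
	struct r4w_plan p;

	p.read_head = (offset % stripe_size) != 0;
	p.read_tail = (end % stripe_size) != 0;
	p.single_stripe = length > 0 &&
		offset / stripe_size == (end - 1) / stripe_size;
	return p;
}
```

When both `read_head` and `read_tail` are set and the IO spans several
stripes, this is the case that previously had to be cut short and
resubmitted by the caller.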

Why did I not think of it before? Well, I had a stroke of
genius because I have stared at this code for 2 years and did
not find this simple solution until today. Not that I did not try.

This solution is much better for NFS than the previous supposed
solution because the short write was dealt with out-of-band after
IO_done, which would cause a seeky IO pattern, whereas here
we execute in order. In both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission.)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return; what will happen is that this case would not
occur anymore.

hurray!!

[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
Signed-off-by: Boaz Harrosh 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/exofs/ore_raid.c |   67 +++-
 1 file changed, 36 insertions(+), 31 deletions(-)

--- a/fs/exofs/ore_raid.c
+++ b/fs/exofs/ore_raid.c
@@ -461,16 +461,12 @@ static void _mark_read4write_pages_uptod
  * ios->sp2d[p][*], xor is calculated the same way. These pages are
  * allocated/freed and don't go through cache
  */
-static int _read_4_write(struct ore_io_state *ios)
+static int _read_4_write_first_stripe(struct ore_io_state *ios)
 {
-   struct ore_io_state *ios_read;
struct ore_striping_info read_si;
struct __stripe_pages_2d *sp2d = ios->sp2d;
u64 offset = ios->si.first_stripe_start;
-   u64 last_stripe_end;
-   unsigned bytes_in_stripe = ios->si.bytes_in_stripe;
-   unsigned i, c, p, min_p = sp2d->pages_in_unit, max_p = -1;
-   int ret;
+   unsigned c, p, min_p = sp2d->pages_in_unit, max_p = -1;

if (offset == ios->offset) /* Go to start collect $200 */
goto read_last_stripe;
@@ -478,6 +474,9 @@ static int _read_4_write(struct ore_io_s
min_p = _sp2d_min_pg(sp2d);
max_p = _sp2d_max_pg(sp2d);

+   ORE_DBGMSG("stripe_start=0x%llx ios->offset=0x%llx min_p=%d max_p=%d\n",
+  offset, ios->offset, min_p, max_p);
+
for (c = 0; ; c++) {
ore_calc_stripe_info(ios->layout, offset, 0, &read_si);
read_si.obj_offset += min_p * PAGE_SIZE;
@@ -512,6 +511,18 @@ static int _read_4_write(struct ore_io_s
}

 read_last_stripe:
+   return 0;
+}
+
+static int _read_4_write_last_stripe(struct ore_io_state *ios)
+{
+   struct ore_striping_info read_si;
+   struct __stripe_pages_2d *sp2d = ios->sp2d;
+   u64 offset;
+   u64 last_stripe_end;
+   unsigned bytes_in_stripe = ios->si.bytes_in_stripe;
+   unsigned c, p, min_p = sp2d->pages_in_unit, max_p = -1;
+
offset = ios->offset + ios->length;
if (offset % PAGE_SIZE)
_add_to_r4w_last_page(ios, &offset);
@@ -527,15 +538,15 @@ read_last_stripe:
c = _dev_order(ios->layout->group_width * ios->layout->mirrors_p1,
   ios->layout->mirrors_p1, read_si.par_dev, read_si.dev);

-   BUG_ON(ios->si.first_stripe_start + bytes_in_stripe != last_stripe_end);
-   /* unaligned IO must be within a single stripe */
-
if (min_p == sp2d->pages_in_unit) {
/* Didn't do it yet */
min_p = _sp2d_min_pg(sp2d);
max_p = _sp2d_max_pg(sp2d);
}

+   ORE_DBGMSG("offset=0x%llx stripe_end=0x%llx min_p=%d max_p=%d\n",
+  offset, last_stripe_end, min_p, max_p);
+
while

[ 15/23] UBIFS: fix a bug in empty space fix-up

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Artem Bityutskiy 

commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream.

UBIFS has a feature called "empty space fix-up" which is a quirk to work around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, which makes the empty space at
the end of half-filled eraseblocks unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
it in the master node and finds it out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
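
To make the arithmetic concrete, here is a small standalone sketch of
the offset the fix passes down. `ALIGN_POW2()` is a local stand-in that
rounds up to a power-of-two boundary, `log_head_free_offs()` is a
hypothetical helper, and the 32-byte commit start node size used in the
usage note is an assumption made for the example.

```c
/* Round x up to the next multiple of a; a must be a power of two. */
#define ALIGN_POW2(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/*
 * Offset of the first empty byte in a log LEB that holds nothing but
 * a commit start node, given the minimal I/O unit of the flash.
 */
static unsigned int log_head_free_offs(unsigned int cs_node_sz,
				       unsigned int min_io_size)
{
	return ALIGN_POW2(cs_node_sz, min_io_size);
}
```

For a NAND with 2048-byte pages, `log_head_free_offs(32, 2048)` gives
2048, so the fix-up starts after the page that holds the commit start
node instead of at offset 0.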

Signed-off-by: Artem Bityutskiy 
Reported-by: Iwo Mergler 
Tested-by: Iwo Mergler 
Reported-by: James Nute 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/ubifs/sb.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/fs/ubifs/sb.c
+++ b/fs/ubifs/sb.c
@@ -724,8 +724,12 @@ static int fixup_free_space(struct ubifs
lnum = ubifs_next_log_lnum(c, lnum);
}

-   /* Fixup the current log head */
-   err = fixup_leb(c, c->lhead_lnum, c->lhead_offs);
+   /*
+* Fixup the log head which contains the only a CS node at the
+* beginning.
+*/
+   err = fixup_leb(c, c->lhead_lnum,
+   ALIGN(UBIFS_CS_NODE_SZ, c->min_io_size));
if (err)
goto out;


[ 14/23] MIPS: Properly align the .data..init_task section.

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 

commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream.

Improper alignment can lead to unbootable systems and/or random
crashes.

[r...@linux-mips.org: This is a long-standing bug since
6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp.
c422a10917f75fd19fa7fe07023e384dae6f (lmo) [MIPS: Clean up linker script
using new linker script macros.] so dates back to 2.6.32.]

Signed-off-by: David Daney 
Cc: linux-m...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3881/
Signed-off-by: Ralf Baechle 
Signed-off-by: Greg Kroah-Hartman 

---
 arch/mips/include/asm/thread_info.h |4 ++--
 arch/mips/kernel/vmlinux.lds.S  |3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -60,6 +60,8 @@ struct thread_info {
 register struct thread_info *__current_thread_info __asm__("$28");
 #define current_thread_info()  __current_thread_info
 
+#endif /* !__ASSEMBLY__ */
+
 /* thread information allocation */
 #if defined(CONFIG_PAGE_SIZE_4KB) && defined(CONFIG_32BIT)
 #define THREAD_SIZE_ORDER (1)
@@ -97,8 +99,6 @@ register struct thread_info *__current_t
 
 #define free_thread_info(info) kfree(info)
 
-#endif /* !__ASSEMBLY__ */
-
 #define PREEMPT_ACTIVE 0x1000
 
 /*
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -1,5 +1,6 @@
 #include <asm/asm-offsets.h>
 #include <asm/page.h>
+#include <asm/thread_info.h>
 #include <asm-generic/vmlinux.lds.h>
 
 #undef mips
@@ -72,7 +73,7 @@ SECTIONS
.data : {   /* Data */
. = . + DATAOFFSET; /* for CONFIG_MAPPED_KERNEL */
 
-   INIT_TASK_DATA(PAGE_SIZE)
+   INIT_TASK_DATA(THREAD_SIZE)
NOSAVE_DATA
CACHELINE_ALIGNED_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)
READ_MOSTLY_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)



[ 13/23] HID: multitouch: Add support for Baanto touchscreen

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiri Kosina 

commit 9ed326951806c424b42dcf2e1125e25a98fb13d1 upstream.

Reported-by: Tvrtko Ursulin 
Tested-by: Tvrtko Ursulin 
Signed-off-by: Jiri Kosina 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/hid/hid-core.c   |1 +
 drivers/hid/hid-ids.h|3 +++
 drivers/hid/hid-multitouch.c |4 ++++
 3 files changed, 8 insertions(+)

--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -1391,6 +1391,7 @@ static const struct hid_device_id hid_ha
{ HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUS_T91MT) },
{ HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUSTEK_MULTITOUCH_YFO) },
{ HID_USB_DEVICE(USB_VENDOR_ID_BELKIN, USB_DEVICE_ID_FLIP_KVM) },
+   { HID_USB_DEVICE(USB_VENDOR_ID_BAANTO, USB_DEVICE_ID_BAANTO_MT_190W2), },
{ HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE) },
{ HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE_2) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_PIXCIR_MULTI_TOUCH) },
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -160,6 +160,9 @@
 #define USB_VENDOR_ID_AVERMEDIA   0x07ca
 #define USB_DEVICE_ID_AVER_FM_MR800   0xb800
 
+#define USB_VENDOR_ID_BAANTO   0x2453
+#define USB_DEVICE_ID_BAANTO_MT_190W2  0x0100
+
 #define USB_VENDOR_ID_BELKIN   0x050d
 #define USB_DEVICE_ID_FLIP_KVM 0x3201
 
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -783,6 +783,10 @@ static const struct hid_device_id mt_dev
HID_USB_DEVICE(USB_VENDOR_ID_ATMEL,
USB_DEVICE_ID_ATMEL_MXT_DIGITIZER) },
 
+   /* Baanto multitouch devices */
+   { .driver_data = MT_CLS_DEFAULT,
+   HID_USB_DEVICE(USB_VENDOR_ID_BAANTO,
+   USB_DEVICE_ID_BAANTO_MT_190W2) },
/* Cando panels */
{ .driver_data = MT_CLS_DUAL_INRANGE_CONTACTNUMBER,
HID_USB_DEVICE(USB_VENDOR_ID_CANDO,



[ 09/23] ext4: fix duplicated mnt_drop_write call in EXT4_IOC_MOVE_EXT

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Al Viro 

commit 331ae4962b975246944ea039697a8f1cadce42bb upstream.

Caused, AFAICS, by mismerge in commit ff9cb1c4eead ("Merge branch
'for_linus' into for_linus_merged")

Signed-off-by: Al Viro 
Cc: Theodore Ts'o 
Signed-off-by: Linus Torvalds 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/ext4/ioctl.c |1 -
 1 file changed, 1 deletion(-)

--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -261,7 +261,6 @@ group_extend_out:
err = ext4_move_extents(filp, donor_filp, me.orig_start,
me.donor_start, me.len, &me.moved_len);
mnt_drop_write_file(filp);
-   mnt_drop_write(filp->f_path.mnt);
 
if (copy_to_user((struct move_extent __user *)arg,
 &me, sizeof(me)))



[ 07/23] ntp: Fix STA_INS/DEL clearing bug

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream.

In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I
introduced a bug that kept the STA_INS or STA_DEL bit
from being cleared from time_status via adjtimex()
without forcing STA_PLL first.

Usually once the STA_INS is set, it isn't cleared
until the leap second is applied, so it's unlikely this
affected anyone. However, during testing I noticed it
took some effort to cancel a leap second once STA_INS
was set.
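
The effect of the fix can be sketched as a tiny state step function in
plain C. The `LEAP_*` states and `FLAG_*` bits here are hypothetical
stand-ins for the kernel's TIME_* states and STA_INS/STA_DEL status
bits, and `leap_step()` only models the two cases touched by this
patch, not the whole of second_overflow().

```c
/* Hypothetical stand-ins for the leap-second states and status flags. */
enum leap_state { LEAP_OK, LEAP_INS, LEAP_DEL, LEAP_OOP, LEAP_WAIT };

#define FLAG_INS 0x0010
#define FLAG_DEL 0x0020

/* One per-second step for the "insert pending" and "delete pending" states. */
static enum leap_state leap_step(enum leap_state state, int status,
				 unsigned long secs)
{
	switch (state) {
	case LEAP_INS:
		if (!(status & FLAG_INS))
			return LEAP_OK;		/* request was cancelled */
		if (secs % 86400 == 0)
			return LEAP_OOP;	/* leap second being inserted */
		break;
	case LEAP_DEL:
		if (!(status & FLAG_DEL))
			return LEAP_OK;		/* request was cancelled */
		if ((secs + 1) % 86400 == 0)
			return LEAP_WAIT;	/* leap second deleted */
		break;
	default:
		break;
	}
	return state;
}
```

Clearing the flag is now enough to fall back to the idle state on the
next step, which is the cancellation path the message says used to take
some effort.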

Signed-off-by: John Stultz 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Richard Cochran 
Cc: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
Signed-off-by: Greg Kroah-Hartman 

---
 kernel/time/ntp.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -409,7 +409,9 @@ int second_overflow(unsigned long secs)
time_state = TIME_DEL;
break;
case TIME_INS:
-   if (secs % 86400 == 0) {
+   if (!(time_status & STA_INS))
+   time_state = TIME_OK;
+   else if (secs % 86400 == 0) {
leap = -1;
time_state = TIME_OOP;
time_tai++;
@@ -418,7 +420,9 @@ int second_overflow(unsigned long secs)
}
break;
case TIME_DEL:
-   if ((secs + 1) % 86400 == 0) {
+   if (!(time_status & STA_DEL))
+   time_state = TIME_OK;
+   else if ((secs + 1) % 86400 == 0) {
leap = 1;
time_tai--;
time_state = TIME_WAIT;



[ 05/23] target: Clean up returning errors in PR handling code

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Roland Dreier 

commit d35212f3ca3bf4fb49d15e37f530c9931e2d2183 upstream.

 - instead of (PTR_ERR(file) < 0) just use IS_ERR(file)
 - return -EINVAL instead of EINVAL
 - all other error returns in target_scsi3_emulate_pr_out() use
   "goto out" -- get rid of the one remaining straight "return."
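
For readers less familiar with the error-pointer idiom, a userspace
sketch follows. `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are local
re-implementations modelled on the kernel helpers of the same names, and
`open_failure_code()` is a hypothetical function that mirrors the
corrected return expression.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the value stored in an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True only for pointers in the top MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Error to report when an open attempt did not yield a usable file. */
static long open_failure_code(const void *file)
{
	return IS_ERR(file) ? PTR_ERR(file) : -ENOENT;
}
```

Testing `PTR_ERR(file) < 0` instead is fragile because a valid pointer
whose top bit is set also converts to a negative long, so the raw
address could be returned as if it were an errno.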

Signed-off-by: Roland Dreier 
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/target/target_core_pr.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -2038,7 +2038,7 @@ static int __core_scsi3_write_aptpl_to_f
if (IS_ERR(file) || !file || !file->f_dentry) {
pr_err("filp_open(%s) for APTPL metadata"
" failed\n", path);
-   return (PTR_ERR(file) < 0 ? PTR_ERR(file) : -ENOENT);
+   return IS_ERR(file) ? PTR_ERR(file) : -ENOENT;
}
 
iov[0].iov_base = &buf[0];
@@ -3826,7 +3826,7 @@ int target_scsi3_emulate_pr_out(struct s
" SPC-2 reservation is held, returning"
" RESERVATION_CONFLICT\n");
cmd->scsi_sense_reason = TCM_RESERVATION_CONFLICT;
-   ret = EINVAL;
+   ret = -EINVAL;
goto out;
}
 
@@ -3836,7 +3836,8 @@ int target_scsi3_emulate_pr_out(struct s
 */
if (!cmd->se_sess) {
cmd->scsi_sense_reason = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out;
}
 
if (cmd->data_length < 24) {



[ 04/23] cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Jeff Layton 

commit 3ae629d98bd5ed77585a878566f04f310adbc591 upstream.

We currently rely on being able to kmap all of the pages in an async
read or write request. If you're on a machine that has CONFIG_HIGHMEM
set then that kmap space is limited, sometimes to as low as 512 slots.

With 512 slots, we can only support up to a 2M r/wsize, and that's
assuming that we can get our greedy little hands on all of them. There
are other users however, so it's possible we'll end up stuck with a
size that large.

Since we can't handle a rsize or wsize larger than that currently, cap
those options at the number of kmap slots we have. We could consider
capping it even lower, but we currently default to a max of 1M. Might as
well allow those luddites on 32 bit arches enough rope to hang
themselves.

A more robust fix would be to teach the send and receive routines how
to contend with an array of pages so we don't need to marshal up a kvec
array at all. That's a fairly significant overhaul though, so we'll need
this limit in place until that's ready.
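
The numbers above work out as follows. This is a standalone sketch:
`cap_to_kmap()` is a hypothetical helper, and the 4096-byte page size
in the usage note is an assumption for the example.

```c
/*
 * Clamp a requested rsize/wsize to what can be kmapped at once,
 * i.e. the number of kmap slots times the page size.
 */
static unsigned int cap_to_kmap(unsigned int size, unsigned int kmap_slots,
				unsigned int page_size)
{
	unsigned int limit = kmap_slots * page_size;

	return size < limit ? size : limit;
}
```

With 512 slots and 4096-byte pages the limit is 2097152 bytes (2M), so
`cap_to_kmap(4u << 20, 512, 4096)` is clamped to 2M while a 1M request
passes through unchanged.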

Reported-by: Jian Li 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/cifs/connect.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3348,6 +3348,18 @@ void cifs_setup_cifs_sb(struct smb_vol *
 #define CIFS_DEFAULT_NON_POSIX_RSIZE (60 * 1024)
 #define CIFS_DEFAULT_NON_POSIX_WSIZE (65536)

+/*
+ * On hosts with high memory, we can't currently support wsize/rsize that are
+ * larger than we can kmap at once. Cap the rsize/wsize at
+ * LAST_PKMAP * PAGE_SIZE. We'll never be able to fill a read or write request
+ * larger than that anyway.
+ */
+#ifdef CONFIG_HIGHMEM
+#define CIFS_KMAP_SIZE_LIMIT   (LAST_PKMAP * PAGE_CACHE_SIZE)
+#else /* CONFIG_HIGHMEM */
+#define CIFS_KMAP_SIZE_LIMIT   (1<<24)
+#endif /* CONFIG_HIGHMEM */
+
 static unsigned int
 cifs_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
 {
@@ -3378,6 +3390,9 @@ cifs_negotiate_wsize(struct cifs_tcon *t
wsize = min_t(unsigned int, wsize,
server->maxBuf - sizeof(WRITE_REQ) + 4);

+   /* limit to the amount that we can kmap at once */
+   wsize = min_t(unsigned int, wsize, CIFS_KMAP_SIZE_LIMIT);
+
/* hard limit of CIFS_MAX_WSIZE */
wsize = min_t(unsigned int, wsize, CIFS_MAX_WSIZE);

@@ -3419,6 +3434,9 @@ cifs_negotiate_rsize(struct cifs_tcon *t
if (!(server->capabilities & CAP_LARGE_READ_X))
rsize = min_t(unsigned int, CIFSMaxBufSize, rsize);

+   /* limit to the amount that we can kmap at once */
+   rsize = min_t(unsigned int, rsize, CIFS_KMAP_SIZE_LIMIT);
+
/* hard limit of CIFS_MAX_RSIZE */
rsize = min_t(unsigned int, rsize, CIFS_MAX_RSIZE);


[ 03/23] cifs: always update the inode cache with the results from a FIND_*

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Jeff Layton 

commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream.

When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.

Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.
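
The decision described here can be sketched in isolation. The types
`struct cached_inode` and `struct find_attr` and the function
`refresh_cached()` are hypothetical, reduced to two fields for
illustration; the real code copies all attributes from the FIND result.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced views of a cached inode and a FIND result. */
struct cached_inode {
	uint64_t uniqueid;
	uint64_t size;
};

struct find_attr {
	uint64_t uniqueid;
	uint64_t size;
};

/*
 * Returns 1 if the cached inode was refreshed in place, 0 if the cached
 * entry does not match and the caller should drop it and start over.
 */
static int refresh_cached(struct cached_inode *inode,
			  const struct find_attr *fattr)
{
	if (inode && inode->uniqueid == fattr->uniqueid) {
		inode->size = fattr->size;	/* take the fresh attributes */
		return 1;
	}
	return 0;
}
```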

Reported-and-Tested-by: Andrew Bartlett 
Reported-by: Bill Robertson 
Reported-by: Dion Edwards 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Greg Kroah-Hartman 

---
 fs/cifs/readdir.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/fs/cifs/readdir.c
+++ b/fs/cifs/readdir.c
@@ -86,9 +86,12 @@ cifs_readdir_lookup(struct dentry *paren
 
dentry = d_lookup(parent, name);
if (dentry) {
-   /* FIXME: check for inode number changes? */
-   if (dentry->d_inode != NULL)
+   inode = dentry->d_inode;
+   /* update inode in place if i_ino didn't change */
+   if (inode && CIFS_I(inode)->uniqueid == fattr->cf_uniqueid) {
+   cifs_fattr_to_inode(inode, fattr);
return dentry;
+   }
d_drop(dentry);
dput(dentry);
}



[ 01/23] md: avoid crash when stopping md array races with closing other open fds.

2012-07-26 Thread Greg Kroah-Hartman

From: Greg KH 

3.4-stable review patch.  If anyone has any objections, please let me know.

--

From: NeilBrown 

commit a05b7ea03d72f36edb0cec05e8893803335c61a0 upstream.

md will refuse to stop an array if any other fd (or mounted fs) is
using it.
When any fs is unmounted or when the last open fd is closed, all
pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
so there will be no pending IO to worry about when the array is
stopped.

However in order to send the STOP_ARRAY ioctl to stop the array one
must first get an open fd on the block device.
If some fd is being used to write to the block device and it is closed
after mdadm opens the block device, but before mdadm issues the
STOP_ARRAY ioctl, then there will be no last-close on the md device so
__blkdev_put will not call sync_blockdev.

If this happens, then IO can still be in-flight while md tears down
the array and bad things can happen (use-after-free and subsequent
havoc).

So in the case where do_md_stop is being called from an open file
descriptor, call sync_blockdev after taking the mutex to ensure there
will be no new openers.

This is needed when setting a read-write device to read-only too.
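
A reduced sketch of the check-then-flush order described above, as
plain C with no locking. `stop_check()` and its `synced` out-parameter
are hypothetical; in the patch the flush is a sync_blockdev() call made
while the open mutex is held.

```c
#include <errno.h>
#include <stddef.h>

/*
 * openers: number of current opens of the md device.
 * bdev:    non-NULL when the stop request arrives through an open fd,
 *          in which case that fd itself accounts for one opener.
 * synced:  set to 1 when pending IO would be flushed before teardown.
 */
static int stop_check(int openers, const void *bdev, int *synced)
{
	*synced = 0;

	/* Anyone besides the requester still has the device open. */
	if (openers > !!bdev)
		return -EBUSY;

	/* No last-close will happen for us, so flush explicitly. */
	if (bdev)
		*synced = 1;

	return 0;
}
```

Passing the block device instead of a boolean lets the same argument
both adjust the opener count and identify what to flush.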

Reported-by: majianpeng 
Signed-off-by: NeilBrown 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/md/md.c |   36 +++-
 1 file changed, 23 insertions(+), 13 deletions(-)

--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3744,8 +3744,8 @@ array_state_show(struct mddev *mddev, ch
return sprintf(page, "%s\n", array_states[st]);
 }
 
-static int do_md_stop(struct mddev * mddev, int ro, int is_open);
-static int md_set_readonly(struct mddev * mddev, int is_open);
+static int do_md_stop(struct mddev * mddev, int ro, struct block_device *bdev);
+static int md_set_readonly(struct mddev * mddev, struct block_device *bdev);
 static int do_md_run(struct mddev * mddev);
 static int restart_array(struct mddev *mddev);
 
@@ -3761,14 +3761,14 @@ array_state_store(struct mddev *mddev, c
/* stopping an active array */
if (atomic_read(&mddev->openers) > 0)
return -EBUSY;
-   err = do_md_stop(mddev, 0, 0);
+   err = do_md_stop(mddev, 0, NULL);
break;
case inactive:
/* stopping an active array */
if (mddev->pers) {
if (atomic_read(&mddev->openers) > 0)
return -EBUSY;
-   err = do_md_stop(mddev, 2, 0);
+   err = do_md_stop(mddev, 2, NULL);
} else
err = 0; /* already inactive */
break;
@@ -3776,7 +3776,7 @@ array_state_store(struct mddev *mddev, c
break; /* not supported yet */
case readonly:
if (mddev->pers)
-   err = md_set_readonly(mddev, 0);
+   err = md_set_readonly(mddev, NULL);
else {
mddev->ro = 1;
set_disk_ro(mddev->gendisk, 1);
@@ -3786,7 +3786,7 @@ array_state_store(struct mddev *mddev, c
case read_auto:
if (mddev->pers) {
if (mddev->ro == 0)
-   err = md_set_readonly(mddev, 0);
+   err = md_set_readonly(mddev, NULL);
else if (mddev->ro == 1)
err = restart_array(mddev);
if (err == 0) {
@@ -5124,15 +5124,17 @@ void md_stop(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(md_stop);
 
-static int md_set_readonly(struct mddev *mddev, int is_open)
+static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 {
int err = 0;
mutex_lock(&mddev->open_mutex);
-   if (atomic_read(&mddev->openers) > is_open) {
+   if (atomic_read(&mddev->openers) > !!bdev) {
printk("md: %s still in use.\n",mdname(mddev));
err = -EBUSY;
goto out;
}
+   if (bdev)
+   sync_blockdev(bdev);
if (mddev->pers) {
__md_stop_writes(mddev);
 
@@ -5154,18 +5156,26 @@ out:
  *   0 - completely stop and dis-assemble array
  *   2 - stop but do not disassemble array
  */
-static int do_md_stop(struct mddev * mddev, int mode, int is_open)
+static int do_md_stop(struct mddev * mddev, int mode,
+ struct block_device *bdev)
 {
struct gendisk *disk = mddev->gendisk;
struct md_rdev *rdev;
 
mutex_lock(&mddev->open_mutex);
-   if (atomic_read(&mddev->openers) > is_open ||
+   if (atomic_read(&mddev->openers) > !!bdev ||
mddev->sysfs_active) {
printk("md: %s still in use.\n",mdname(mddev));
mutex_unlock(&mddev->open_mutex);
return -EBUSY;
}
+   if (bdev)
+   /* It is possible IO was issued on some other
+* open file which was

Re: No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Alan Cox

On Thu, 26 Jul 2012 12:08:14 -0700
Greg KH  wrote:

>   tty: Move the handling of the tty release logic

Can we lose that one specifically? I've chased down Ian Abbott's problem
and replicated it, and that is the offending patch, not the lock localise
(which still needs to be kept out as it depends upon this one).

I have it fixed but it's not had enough testing for -rc1, and moving the
termios data already has enough spectacular hits-all-drivers fallout for 3.6.

Alan

RE: [PATCH 1/1] kthread: disable preemption during complete()

2012-07-26 Thread Peter Boonstoppel

> > tglx has patches that make the kthread create/destroy stuff from hotplug
> > go away.. that seems like the better approach.

> Right. That cpu hotplug setup/teardown stuff is ugly.

If that stuff gets removed completely that's great. The only change I'm aware of
right now is the workqueue one:
http://thread.gmane.org/gmane.linux.kernel/1329164

> > The main thing is avoiding the wakeup preemption from the complete()
> > because we're going to sleep right after anyway.

You are very likely to be preempted by the complete(), since the newly created 
thread has a relatively high vruntime.

> > The comment doesn't really make that clear.

> Right, the comment is crap. It has nothing to do with kthread_bind()
> and stuff. The whole purpose is to avoid the pointless preemption
> after wakeup.

The only case I want to solve is the kthread_bind()->wait_task_inactive() 
scenario. On our platforms this patch reduces average cpu_up() time from about 
9ms to 8ms, but max time goes down from 37ms to 8.5ms. cpu_up() latency becomes 
much more predictable.


PeterB

Re: [PATCH] mfd: add MAX8907 core driver

2012-07-26 Thread Stephen Warren

On 07/26/2012 02:35 PM, Mark Brown wrote:
> On Thu, Jul 26, 2012 at 01:40:30PM -0600, Stephen Warren wrote:

>> +if (!irqd_irq_disabled(d) && (value & irq_data->offs)) {
> 
> This looks very suspicious...  why do we need to call 
> irqd_irq_disabled() here?

I believe the status register reflects the unmasked status, it's just
the interrupt signal that's affected by the mask.

>> +static void max8907_irq_enable(struct irq_data *data)
>> +{
>> +	/* Everything happens in max8907_irq_sync_unlock */
>> +}
> 
>> +static void max8907_irq_disable(struct irq_data *data)
>> +{
>> +	/* Everything happens in max8907_irq_sync_unlock */
>> +}
> 
> The fact that these functions are empty is the second part of the
> above suspicious check for disabled IRQs.  We're just completely
> ignoring the caller here.  What would idiomatically happen is that
> we'd update a variable here then write it out in the unmask.
> 
> If these functions really should be empty then they should be
> omitted.
> 
>> +static int max8907_irq_set_wake(struct irq_data *data, unsigned int on)
>> +{
>> +	/* Everything happens in max8907_irq_sync_unlock */
>> +
>> +	return 0;
>> +}
> 
> Again, this doesn't look clever at all.

So the idea here was that the IRQ core is already maintaining state
which describes which IRQs are enabled/disabled and wake/not. Rather
than have irq_enable/irq_disable/set_wake do nothing but save the same
state to irq_chip-specific structures, I removed the body of those
functions and instead just call irqd_irq_disabled() etc. wherever I
would have accessed the "local" state. Is that not a legitimate design
then?
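
For comparison, the idiom Mark describes (record the requested state in the enable/disable callbacks, then write it out once when the bus lock is released) can be sketched in plain userspace C. This is only a minimal sketch under stated assumptions, not the MAX8907 driver: `fake_chip`, `fake_irq_enable()`, `fake_irq_disable()` and `fake_irq_sync_unlock()` are hypothetical names, and the "register" is a struct member standing in for a write over a slow bus.

```c
#include <stdint.h>

/* Hypothetical stand-in for a PMIC that sits behind a slow bus. */
struct fake_chip {
	uint8_t mask_cache;   /* desired mask; a set bit means "masked" */
	uint8_t mask_reg;     /* what the simulated hardware register holds */
	unsigned bus_writes;  /* number of simulated bus transactions */
};

void fake_chip_init(struct fake_chip *chip)
{
	chip->mask_cache = 0xff;  /* everything masked at start */
	chip->mask_reg = 0xff;
	chip->bus_writes = 0;
}

/* The callbacks only touch the cache, so they never need the bus. */
void fake_irq_enable(struct fake_chip *chip, unsigned int bit)
{
	chip->mask_cache &= (uint8_t)~(1u << bit);
}

void fake_irq_disable(struct fake_chip *chip, unsigned int bit)
{
	chip->mask_cache |= (uint8_t)(1u << bit);
}

/* Called where sleeping is allowed: flush the cache if it changed. */
void fake_irq_sync_unlock(struct fake_chip *chip)
{
	if (chip->mask_reg != chip->mask_cache) {
		chip->mask_reg = chip->mask_cache;
		chip->bus_writes++;
	}
}
```

In this model several enable/disable calls between two unlocks collapse into a single register write, which is the usual motivation for keeping a local copy of the mask. Whether reading the IRQ core's own state instead is acceptable is exactly the question raised above; the sketch does not settle it.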

[ 00/40] 3.0.39-stable review

2012-07-26 Thread Greg KH

This is the start of the stable review cycle for the 3.0.39 release.
There are 40 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat Jul 28 21:14:09 UTC 2012.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.0.39-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-
 .../trace/postprocess/trace-vmscan-postprocess.pl  |8 +-
 Makefile   |4 +-
 arch/mips/include/asm/thread_info.h|4 +-
 arch/mips/kernel/vmlinux.lds.S |3 +-
 drivers/base/memory.c  |   58 ++--
 drivers/md/dm-raid1.c  |2 +-
 drivers/md/dm-region-hash.c|5 +-
 fs/btrfs/disk-io.c |5 +-
 fs/cifs/readdir.c  |7 +-
 fs/hugetlbfs/inode.c   |3 +-
 fs/nfs/internal.h  |2 +-
 fs/nfs/write.c |4 +-
 fs/ubifs/sb.c  |8 +-
 include/linux/cpuset.h |   45 ++-
 include/linux/fs.h |   11 +-
 include/linux/init_task.h  |8 +
 include/linux/memcontrol.h |3 +-
 include/linux/migrate.h|   23 +-
 include/linux/mmzone.h |   14 +
 include/linux/sched.h  |2 +-
 include/linux/swap.h   |7 +-
 include/trace/events/vmscan.h  |   85 +-
 kernel/cpuset.c|   63 ++---
 kernel/fork.c  |3 +
 kernel/time/ntp.c  |8 +-
 mm/compaction.c|   26 +-
 mm/filemap.c   |   11 +-
 mm/hugetlb.c   |   13 +-
 mm/memcontrol.c|3 +-
 mm/memory-failure.c|2 +-
 mm/memory_hotplug.c|2 +-
 mm/mempolicy.c |   30 +-
 mm/migrate.c   |  240 ++--
 mm/page_alloc.c|  118 +---
 mm/slab.c  |   13 +-
 mm/slub.c  |   39 ++-
 mm/vmscan.c|  289 
 mm/vmstat.c|2 +-
 38 files changed, 820 insertions(+), 353 deletions(-)


[ 00/23] 3.4.7-stable review

2012-07-26 Thread Greg KH

This is the start of the stable review cycle for the 3.4.7 release.
There are 23 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat Jul 28 21:14:04 UTC 2012.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v3.0/stable-review/patch-3.4.7-rc1.gz
and the diffstat can be found below.

thanks,

greg k-h

-
 Makefile|4 +--
 arch/arm/plat-s5p/clock.c   |1 +
 arch/mips/include/asm/thread_info.h |4 +--
 arch/mips/kernel/vmlinux.lds.S  |3 +-
 drivers/hid/hid-core.c  |1 +
 drivers/hid/hid-ids.h   |6 
 drivers/hid/hid-input.c |3 ++
 drivers/hid/hid-multitouch.c|4 +++
 drivers/hid/usbhid/hid-quirks.c |1 +
 drivers/md/dm-raid1.c   |3 +-
 drivers/md/dm-region-hash.c |5 ++-
 drivers/md/dm-thin.c|6 +++-
 drivers/md/md.c |   36 ---
 drivers/md/raid1.c  |   10 --
 drivers/target/target_core_cdb.c|2 +-
 drivers/target/target_core_pr.c |7 ++--
 drivers/target/tcm_fc/tfc_cmd.c |2 ++
 fs/cifs/cifssmb.c   |   30 
 fs/cifs/connect.c   |   18 ++
 fs/cifs/readdir.c   |7 ++--
 fs/exofs/ore.c  |8 +
 fs/exofs/ore_raid.c |   67 +++
 fs/ext4/ioctl.c |1 -
 fs/nfs/objlayout/objio_osd.c|9 +++--
 fs/ubifs/sb.c   |8 +++--
 kernel/time/ntp.c   |8 +++--
 mm/vmscan.c |5 ++-
 27 files changed, 184 insertions(+), 75 deletions(-)


Re: [GIT PULL] PWM subsystem for v3.6

2012-07-26 Thread Linus Torvalds

On Thu, Jul 26, 2012 at 12:16 AM, Thierry Reding wrote:
>
> The new PWM subsystem aims at collecting all implementations of the
> legacy PWM API and to eventually replace it completely. The subsystem
> has been in development for over half a year now and many drivers have
> already been converted. It has been in linux-next for a couple of weeks
> and there have been no major issues so I think it is ready for inclusion
> in your tree.

For new subsystems like this, I really want ack's from the people who
are expected to use it.

For a gitorious pull like this, I also want signed tags with the gpg
key having signatures from people I recognize. I don't think I have
such a key from you.

  Linus

Re: [RFC][PATCH 0/14] PM / shmobile: Pass power domain information via DT (was: Re: [RFD] PM: Device tree representation of power domains)

2012-07-26 Thread Mark Brown

On Wed, Jul 25, 2012 at 05:38:35PM -0700, Kevin Hilman wrote:

> That being said, I'm not sure why ti,hwmods is being used as an example
> for powerdomains.  hwmods describe the integration of SoC IP blocks
> (base addr, IRQ, DMA channel etc., which are being moved to DT) as well
> as a bunch of SoC specific PM register descriptions.  This stuff is
> SoC-specific PM register layout, so being very SoC specific, it has the
> 'ti' prefix in the DT binding.

I think the thing here is that one aspect of that SoC integration is
which power domain the blocks are in.  Describing which power domain an
IP is in isn't a million miles away from describing which hwmod applies
to an IP.


[GIT PULL] a large btrfs update

2012-07-26 Thread Chris Mason

Hi Linus,

Since this has a conflict, I've split it into two branches.

# against 3.5, my diffstat is based on this branch
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus

# against your git as of today
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs.git for-linus-merged

for-linus-merged has an extra commit on top as well that changes the
btrfs send/receive code to Al's new dentry_open.  It's a small commit,
and my guess is that you'll cherry pick it and do your own merge.

I will be on vacation next week, but Josef Bacik is in charge of dealing
with any problems and sending fixes.  He has git trees on kernel.org, so
there won't be any problems sending things in.

This pull request is very large, and the two main features in here have
been under testing/devel for quite a while.

We have subvolume quotas from the strato developers.  This enables full
tracking of how many blocks are allocated to each subvolume (and all
snapshots) and you can set limits on a per-subvolume basis.  You can
also create quota groups and toss multiple subvolumes into a big group.
It's everything you need to be a web hosting company and give each user
their own subvolume.

The userland side of the quotas is being refreshed, they'll send out
details on where to grab it soon.

Next is the kernel side of btrfs send/receive from Alexander Block.
This leverages the same infrastructure as the quota code to figure out
relationships between blocks and their owners.  It can then compute the
difference between two snapshots and send the diffs in a neutral format
into userland.

The basic model:

create a snapshot
send that snapshot as the initial backup
make changes
create a second snapshot
send the incremental as a backup
delete the first snapshot
(use the second snapshot for the next incremental)

The receive portion is all in userland, and in the 'next' branch of my
btrfs-progs repo.

There's still some work to do in terms of optimizing the send side from
kernel to userland.  The really important part is figuring out how two
snapshots are different, and this is where we are concentrating right
now.  The initial send of a dataset is a little slower than tar, but the
incremental sends are dramatically faster than what rsync can do.

On top of all of that, we have a nice queue of fixes, cleanups and
optimizations.

Liu Bo (13) commits (+120/-67):
Btrfs: check write access to mount earlier while creating snapshots (+11/-11)
Btrfs: fix a bug of writting free space cache during balance (+21/-3)
Btrfs: fix btrfs_is_free_space_inode to recognize btree inode (+4/-2)
Btrfs: fix typo in cow_file_range_async and async_cow_submit (+2/-2)
Btrfs: make btrfs's allocation smoothly with preallocation (+3/-1)
Btrfs: use mnt_want_write_file instead of mnt_want_write (+2/-2)
Btrfs: do not set subvolume flags in readonly mode (+28/-14)
Btrfs: kill root from btrfs_is_free_space_inode (+16/-15)
Btrfs: do not abort transaction in prealloc case (+5/-1)
Btrfs: remove redundant r/o check for superblock (+0/-7)
Btrfs: add ro notification to dump_space_info (+3/-3)
Btrfs: improve multi-thread buffer read (+24/-5)
Btrfs: do not count in readonly bytes (+1/-1)

Arne Jansen (12) commits (+2488/-26):
Btrfs: Test code to change the order of delayed-ref processing (+49/-0)
Btrfs: check the root passed to btrfs_end_transaction (+12/-0)
Btrfs: qgroup implementation and prototypes (+1681/-1)
Btrfs: added helper to create new trees (+83/-1)
Btrfs: add helper for tree enumeration (+77/-0)
Btrfs: qgroup state and initialization (+31/-0)
Btrfs: add helper for tree enumeration (+75/-0)
Btrfs: quota tree support and startup (+42/-6)
Btrfs: hooks to reserve qgroup space (+29/-0)
Btrfs: add qgroup inheritance (+61/-18)
Btrfs: qgroup on-disk format (+136/-0)
Btrfs: add qgroup ioctls (+212/-0)

Josef Bacik (8) commits (+160/-133):
Btrfs: don't return true in releasepage unless we actually freed the eb (+5/-4)
Btrfs: lock the transition from dirty to writeback for an eb (+9/-0)
Btrfs: flush delayed inodes if we're short on space (+83/-38)
Btrfs: fix potential race in extent buffer freeing (+3/-6)
Btrfs: change how we indicate we're adding csums (+18/-15)
Btrfs: rework shrink_delalloc (+24/-57)
Btrfs: add DEVICE_READY ioctl (+18/-2)
Btrfs: remove ->dirty_inode (+0/-11)

Jan Schmidt (5) commits (+295/-225):
Btrfs: join tree mod log code with the code holding back delayed refs (+240/-219)
Btrfs: hooks for qgroup to record delayed refs (+36/-6)
Btrfs: fix buffer leak in btrfs_next_old_leaf (+1/-0)
Btrfs: fix buffer leak in btrfs_next_old_leaf (+1/-0)
Btrfs: call the qgroup accounting functions (+17/-0)

Alexander Block (5) commits (+5465/-21):
Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive (+4717/-1)
Btrfs: introduce

Re: [PATCH] bcma: fix regression in pmu workaround reg masks

2012-07-26 Thread Linus Torvalds

Oops. For some reason I didn't see this email this morning, and grew
impatient and committed the patch without the sign-off and with a
different changelog.

My bad. Too much email.

Anyway, this is commit 1f03bf06e4e3 in my tree.

  Linus

On Thu, Jul 26, 2012 at 2:15 AM, Hauke Mehrtens  wrote:
> This fixes a regression introduced in:
> commit b9562545ef0b13c0440ccd8d6dd4111fb77cb17a
...

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-26 Thread Larry Woodman


On 07/26/2012 02:37 PM, Rik van Riel wrote:

On 07/23/2012 12:04 AM, Hugh Dickins wrote:


I spent hours trying to dream up a better patch, trying various
approaches.  I think I have a nice one now, what do you think?  And
more importantly, does it work?  I have not tried to test it at all,
that I'm hoping to leave to you, I'm sure you'll attack it with gusto!

If you like it, please take it over and add your comments and signoff
and send it in.  The second part won't come up in your testing, and could
be made a separate patch if you prefer: it's a related point that struck
me while I was playing with a different approach.

I'm sorely tempted to leave a dangerous pair of eyes off the Cc,
but that too would be unfair.

Subject-to-your-testing-
Signed-off-by: Hugh Dickins 


This patch looks good to me.

Larry, does Hugh's patch survive your testing?


It doesn't.  However, it's got a slightly different footprint because this
is RHEL6 and there have been changes to the hugetlbfs_inode code.  Also, we
are seeing the problem via group_exit() rather than shmdt().  Also, I print
out the actual _mapcount at the BUG and most of the time it's 1, but I have
seen it as high as 6.



dell-per620-01.lab.bos.redhat.com login: MAPCOUNT = 2
[ cut here ]
kernel BUG at mm/filemap.c:131!
invalid opcode:  [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
CPU 8
Modules linked in: autofs4 sunrpc ipv6 acpi_pad power_meter dcdbas 
microcode sb_edac edac_core iTCO_wdt i]


Pid: 3106, comm: mpitest Not tainted 2.6.32-289.el6.sharedpte.x86_64 #17 
Dell Inc. PowerEdge R620/07NDJ2
RIP: 0010:[]  [] 
__remove_from_page_cache+0xe2/0x100

RSP: 0018:880434897b78  EFLAGS: 00010002
RAX: 0001 RBX: ea00074ec000 RCX: 10f6
RDX:  RSI: 0046 RDI: 0046
RBP: 880434897b88 R08: 81c01a00 R09: 
R10:  R11: 0004 R12: 880432683d98
R13: 880432683db0 R14:  R15: ea00074ec000
FS:  () GS:88002828() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 003a1d38c4a8 CR3: 01a85000 CR4: 000406e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process mpitest (pid: 3106, threadinfo 880434896000, task 
880431abb500)

Stack:
 ea00074ec000  880434897bb8 81114ab4
 880434897bb8 02ab 02a0 880434897c08
 880434897cb8 811f758d 88022dd8 
Call Trace:
 [] remove_from_page_cache+0x54/0x90
 [] truncate_hugepages+0x11d/0x200
 [] ? hugetlbfs_delete_inode+0x0/0x30
 [] hugetlbfs_delete_inode+0x18/0x30
 [] generic_delete_inode+0xde/0x1d0
 [] hugetlbfs_drop_inode+0x5d/0x70
 [] iput+0x62/0x70
 [] dentry_iput+0x90/0x100
 [] d_kill+0x31/0x60
 [] dput+0x7c/0x150
 [] __fput+0x189/0x210
 [] fput+0x25/0x30
 [] filp_close+0x5d/0x90
 [] put_files_struct+0x7f/0xf0
 [] exit_files+0x53/0x70
 [] do_exit+0x18d/0x870
 [] ? audit_syscall_entry+0x272/0x2a0
 [] do_group_exit+0x58/0xd0
 [] sys_exit_group+0x17/0x20
 [] system_call_fastpath+0x16/0x1b


Re: [PATCH v3 1/4] ACPI: Add acpi_pr_<level>() interfaces

2012-07-26 Thread Toshi Kani

On Thu, 2012-07-26 at 13:22 -0600, Bjorn Helgaas wrote:
> On Wed, Jul 25, 2012 at 5:12 PM, Toshi Kani  wrote:
> > This patch introduces acpi_pr_<level>(), where <level> is a kernel
> > message level such as err/warn/info, to support improved logging
> > messages for ACPI, esp. in hotplug operations.  acpi_pr_<level>()
> > appends "ACPI" prefix and ACPI object path to the messages.  This
> > improves diagnostics in hotplug operations since it identifies an
> > object that caused an issue in a log file.
> >
> > acpi_pr_<level>() takes acpi_handle as an argument, which is passed
> > to ACPI hotplug notify handlers from the ACPICA.  Therefore, it is
> > always available unlike other kernel objects, such as device.
> >
> > For example, the statement below
> >   acpi_pr_err(handle, "Device don't exist, dropping EJECT\n");
> > logs an error message like this at KERN_ERR.
> >   ACPI: \_SB_.SCK4.CPU4: Device don't exist, dropping EJECT
> >
> > ACPI drivers can use acpi_pr_<level>() when they need to identify
> > a target ACPI object in their messages, such as error messages.
> 
> It's definitely an improvement to have *something* that identifies a
> device in these messages.  But the ACPI namespace path is not really
> intended to be user-consumable, so I don't think we should expose it
> indiscriminately.  I think we should be using the ACPI device name
> ("PNP0C02:00") whenever possible.  Given the device name, we can get
> the path from the sysfs "path" file.

Hi Bjorn,

Thanks for reviewing!  Yes, an ACPI device path is not easy for regular
users to interpret.  I also agree with you that the device name
is a better choice when users always diagnose issues by themselves right
after they perform an operation.  However, there are also cases where
users ask someone to diagnose an issue remotely from the log files, and
hotplug operations are performed automatically.  In such cases, using
the ACPI device name alone is problematic for hotplug operations since a
device name comes with an instance number that continues to change with
hot-add/remove operations.  Here is one example scenario.  Let's say a
user has an automatic load-balancer or power-saving feature that can
trigger hundreds of CPU hotplug operations per day.  This user then finds
that there were multiple hotplug errors logged in the past few days, and
asks an IT guy to look at the error messages.  By the time the user finds
the issue, all the device names are gone from sysfs after repeated hotplug
operations.  The IT guy would have no idea from the error messages whether
those errors were happening on a particular device, since the
instance numbers continue to change.

> > The usage model is similar to dev_<level>().  acpi_pr_<level>() can
> > be used when device is not created/valid, which may be the case for
> > ACPI hotplug handlers.  ACPI drivers can continue to use dev_<level>()
> > when device is valid.
> 
> I'd argue that ACPI driver code should never be called unless the
> device is valid, so drivers should *always* be able to use
> dev_<level>().  Obviously, ACPI hotplug is currently screwed up (it's
> mostly handled in drivers rather than in the ACPI core), so in some of
> those hotplug paths in the drivers, we may not have a device yet.  But
> those cases should be minimal.

I think it makes sense for ACPI hotplug notify handlers to use
acpi_pr_<level>() for their error messages since ACPI device names are
static on the platform.  This info greatly helps in the scenario I
described above.  Outside of the hotplug notify handlers, I agree
with you that dev_<level>() should be used.
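
The prefixing behaviour described in the patch text (an "ACPI" tag plus the object path in front of every message) can be illustrated with a small userspace sketch. It is not the kernel implementation: `fake_acpi_printk()`, the `fake_acpi_pr_*()` macros and the `FAKE_KERN_*` level strings are hypothetical names, the path is passed in as a string instead of being resolved from an `acpi_handle`, and the output goes into a caller-supplied buffer rather than the kernel log.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical userspace model: format "<level>ACPI: <path>: <message>"
 * into buf. A real helper would look the path up from an acpi_handle;
 * here the caller supplies the path string directly.
 */
int fake_acpi_printk(char *buf, size_t size, const char *level,
		     const char *path, const char *fmt, ...)
{
	va_list args;
	int n, m;

	n = snprintf(buf, size, "%sACPI: %s: ", level, path);
	if (n < 0 || (size_t)n >= size)
		return n;  /* error, or the prefix alone was truncated */

	va_start(args, fmt);
	m = vsnprintf(buf + n, size - (size_t)n, fmt, args);
	va_end(args);

	return m < 0 ? m : n + m;
}

/* Illustrative level markers only; not the kernel's KERN_* definitions. */
#define FAKE_KERN_ERR  "<3>"
#define FAKE_KERN_INFO "<6>"

/* One thin wrapper per level, mirroring the dev_<level>() usage model. */
#define fake_acpi_pr_err(buf, size, path, ...) \
	fake_acpi_printk(buf, size, FAKE_KERN_ERR, path, __VA_ARGS__)
#define fake_acpi_pr_info(buf, size, path, ...) \
	fake_acpi_printk(buf, size, FAKE_KERN_INFO, path, __VA_ARGS__)
```

Because the path is part of every message, two errors logged days apart for the same object carry the same identifying text in this model, which is the property argued for above.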

> Another possible approach to this is to add a %p extension rather than
> adding acpi_printk().  Then you could do, e.g., 'printk("%pA ...\n",
> handle)', and printk could interpolate the namespace path.  But I
> really think there should be very few places where we need the path,
> so I'm not sure it's worth it.

The address of the handle is not very helpful either when someone needs to
analyze log files.

Thanks,
-Toshi

> > ACPI drivers also continue to use pr_<level>() when ACPI device
> > path does not have to be appended to the messages, such as boot-up
> > messages.
> >
> > Note: ACPI_[WARNING|INFO|ERROR]() are intended for the ACPICA and
> > are not associated with the kernel message level.
> >
> > Signed-off-by: Toshi Kani 
> > Tested-by: Vijay Mohan Pandarathil 
> > ---
> >  drivers/acpi/utils.c|   34 ++
> >  include/acpi/acpi_bus.h |   31 +++
> >  2 files changed, 65 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/acpi/utils.c b/drivers/acpi/utils.c
> > index 3e87c9c..ec0c6f9 100644
> > --- a/drivers/acpi/utils.c
> > +++ b/drivers/acpi/utils.c
> > @@ -454,3 +454,37 @@ acpi_evaluate_hotplug_ost(acpi_handle handle, u32 
> > source_event,
> >  #endif
> >  }
> >  EXPORT_SYMBOL(acpi_evaluate_hotplug_ost);
> > +
> > +/**
> > + * acpi_printk: Print messages with ACPI prefix and object path
> > + *
> > + * This function is intended to be called through acpi_pr_<level> macros.
> > + */
> >

Re: [PATCH] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables v2

2012-07-26 Thread Rik van Riel


On 07/20/2012 09:49 AM, Mel Gorman wrote:

This V2 is still the mmap_sem approach that fixes a potential deadlock
problem pointed out by Michal.


Larry and I were looking around the hugetlb code some
more, and found what looks like yet another race.

In hugetlb_no_page, we have the following code:


	spin_lock(&mm->page_table_lock);
	size = i_size_read(mapping->host) >> huge_page_shift(h);
	if (idx >= size)
		goto backout;

	ret = 0;
	if (!huge_pte_none(huge_ptep_get(ptep)))
		goto backout;

	if (anon_rmap)
		hugepage_add_new_anon_rmap(page, vma, address);
	else
		page_dup_rmap(page);
	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
				&& (vma->vm_flags & VM_SHARED)));
	set_huge_pte_at(mm, address, ptep, new_pte);
	...
	spin_unlock(&mm->page_table_lock);

Notice how we check !huge_pte_none with our own
mm->page_table_lock held.

This offers no protection at all against other
processes, that also hold their own page_table_lock.

In short, it looks like it is possible for multiple
processes to go through the above code simultaneously,
potentially resulting in:

1) one process overwriting the pte just created by
   another process

2) data corruption, as one partially written page
   gets superseded by a newly zeroed page, but no
   TLB invalidates get sent to other CPUs

3) a memory leak of a huge page

Is there anything that would make this race impossible,
or is this a real bug?

If so, are there more like it in the hugetlbfs code?
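
The suspected interleaving can be modelled deterministically in userspace C. This is a sketch of the concern as stated, not evidence that the kernel actually behaves this way: it assumes two faulting processes reach the same shared PTE slot while each holds only its own lock, and `shared_pte`, `fault_racy()` and `fault_serialized()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of one shared huge-page PTE slot; 0 means "none". */
struct shared_pte {
	unsigned long page;       /* page currently mapped, 0 if none */
	unsigned int overwrites;  /* times a live entry was replaced */
};

/* Step 1 of the fault path: the huge_pte_none() style check. */
bool pte_is_none(const struct shared_pte *pte)
{
	return pte->page == 0;
}

/* Step 2: install the new entry (the set_huge_pte_at() analogue). */
void pte_install(struct shared_pte *pte, unsigned long page)
{
	if (pte->page != 0)
		pte->overwrites++;  /* another faulter got here first */
	pte->page = page;
}

/*
 * Interleaving that per-process locks do not exclude:
 * both faulters check first, then both install.
 */
unsigned int fault_racy(struct shared_pte *pte,
			unsigned long page_a, unsigned long page_b)
{
	bool a_sees_none = pte_is_none(pte);
	bool b_sees_none = pte_is_none(pte);

	if (a_sees_none)
		pte_install(pte, page_a);
	if (b_sees_none)
		pte_install(pte, page_b);
	return pte->overwrites;
}

/* With one lock shared by both faulters, check and install stay paired. */
unsigned int fault_serialized(struct shared_pte *pte,
			      unsigned long page_a, unsigned long page_b)
{
	if (pte_is_none(pte))
		pte_install(pte, page_a);
	if (pte_is_none(pte))
		pte_install(pte, page_b);
	return pte->overwrites;
}
```

In the racy ordering the second install replaces the first faulter's page, which corresponds to outcomes 1) and 3) in the list above; the serialized ordering leaves the first page in place and the second faulter backs out.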

Re: regulators: creating a regulator device for the AC/USB/BAT/charge component of a PMIC?

2012-07-26 Thread Mark Brown

On Thu, Jul 26, 2012 at 12:02:31PM -0600, Stephen Warren wrote:

> A couple of the regulators I'm looking at (I guess many/most in fact)
> are structured as:

> Battery, AC, USB, ... -> PMIC -> main output (unregulated?)

Yes, that's very normal.

> Should this "main output" be represented as a regulator itself?

It can be if you like - most things will be depending on it, often what
people call the battery supply is actually the main power rail in the
schematic.  Having it there certainly won't do any harm and may be
useful.

> However, some of the regulators in the TPS6586x at least are fed
> directly from the SYS output by an internal connection within the PMIC
> (e.g. LDO5). Currently, the driver sets up these regulators as having no
> supply, which seems wrong too. Presumably the PMIC driver should
> internally hook up its SYS as LDO5's supply without needing any platform
> data or DT ldo5-supply property to do this?

Yes, I think if we're going to represent such supplies the driver should
just do it and not force everyone to cut'n'paste.  Though to be honest
if it's a supply that's purely internal to the primary PMIC there's no
real need, if the system core rail gets turned off software really isn't
an issue any more.


Re: [GIT PULL] GPIO changes for v3.6

2012-07-26 Thread Linus Torvalds

On Wed, Jul 25, 2012 at 3:48 PM, Linus Walleij  wrote:
>
> in Grant's absence, these are my queued and -next-tested changes
> for v3.6, please pull them in. Grant's "merge" branch prior to his
> absence was merged in as a base for this patch series.

Please verify my conflict resolution in drivers/gpio/gpio-mxc.c.

The resolution seemed pretty straight-forward, but I obviously didn't
test the result, so it would be best to double-check.

 Linus

[PATCH 6/7] xen/mmu: Copy and revector the P2M tree.

2012-07-26 Thread Konrad Rzeszutek Wilk

Please first read the description in "xen/p2m: Add logic to revector a
P2M tree to use __va leafs" patch.

The 'xen_revector_p2m_tree()' function allocates a new P2M tree,
copies the contents of the old one into it, and returns the new one.

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

We have revectored the P2M tree (and the one for save/restore as well)
to use new shiny __va address to new MFNs. The xen_start_info
has been taken care of already in 'xen_setup_kernel_pagetable()' and
xen_start_info->shared_info in 'xen_setup_shared_info()', so
we are free to roam and delete PMD entries - which is exactly what
we are going to do. We rip out the __ka for the old P2M array.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   57 
 1 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7f54b75..05e8492 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1183,9 +1183,64 @@ static __init void xen_mapping_pagetable_reserve(u64 
start, u64 end)
 
 static void xen_post_allocator_init(void);
 
+#ifdef CONFIG_X86_64
+void __init xen_cleanhighmap(unsigned long vaddr, unsigned long vaddr_end)
+{
+   unsigned long kernel_end = roundup((unsigned long)_brk_end, PMD_SIZE) - 
1;
+   pmd_t *pmd = level2_kernel_pgt + pmd_index(vaddr);
+
+   /* NOTE: The loop is more greedy than the cleanup_highmap variant.
+* We include the PMD passed in on _both_ boundaries. */
+   for (; vaddr <= vaddr_end && (pmd < (level2_kernel_pgt + PAGE_SIZE));
+   pmd++, vaddr += PMD_SIZE) {
+   if (pmd_none(*pmd))
+   continue;
+   if (vaddr < (unsigned long) _text || vaddr > kernel_end)
+   set_pmd(pmd, __pmd(0));
+   }
+   /* In case we did something silly, we should crash in this function
+* instead of somewhere later and be confusing. */
+   xen_mc_flush();
+}
+#else
+void __init xen_cleanhighmap(unsigned long vaddr, unsigned long vaddr_end)
+{
+}
+#endif
 static void __init xen_pagetable_setup_done(pgd_t *base)
 {
+   unsigned long size;
+   unsigned long addr;
+
xen_setup_shared_info();
+   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
+   unsigned long new_mfn_list;
+
+   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned 
long));
+
+   new_mfn_list = xen_revector_p2m_tree();
+
+   /* On 32-bit, we get zero so this never gets executed. */
+   if (new_mfn_list && new_mfn_list != xen_start_info->mfn_list) {
+   /* using __ka address! */
+   memset((void *)xen_start_info->mfn_list, 0, size);
+
+   /* We should be in __ka space. */
+   BUG_ON(xen_start_info->mfn_list < __START_KERNEL_map);
+   addr = xen_start_info->mfn_list;
+   size = PAGE_ALIGN(xen_start_info->nr_pages * 
sizeof(unsigned long));
+   /* We roundup to the PMD, which means that if anybody 
at this stage is
+* using the __ka address of xen_start_info or 
xen_start_info->shared_info
+* they are in going to crash. Fortunatly we have 
already revectored
+* in xen_setup_kernel_pagetable and in 
xen_setup_shared_info. */
+   size = roundup(size, PMD_SIZE);
+   xen_cleanhighmap(addr, addr + size);
+
+   memblock_free(__pa(xen_start_info->mfn_list), size);
+   /* And revector! Bye bye old array */
+   xen_start_info->mfn_list = new_mfn_list;
+   }
+   }
xen_post_allocator_init();
 }
 
@@ -1823,6 +1878,8 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
unsigned long max_pfn)
}
/* Our (by three pages) smaller Xen pagetable that we are using */
memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
+   /* Revector the xen_start_info */
+   xen_start_info = (struct start_info *)__va(__pa(xen_start_info));
 }
 #else  /* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
-- 
1.7.7.6
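
The clearing loop in xen_cleanhighmap() can be followed more easily in a userspace model. The sketch below is not the kernel code: `fake_cleanhighmap()` and the `FAKE_*` constants are hypothetical, a plain array stands in for level2_kernel_pgt, a nonzero entry stands in for a present PMD, and the loop is bounded by an assumed 512 entries per table.

```c
#include <stddef.h>

#define FAKE_PMD_SIZE 0x200000UL  /* assumed 2MB covered by one PMD entry */
#define FAKE_NR_PMDS  512         /* assumed entries in one PMD table */

/*
 * Hypothetical model of the loop: pmds[] covers addresses starting at
 * base, and a nonzero entry counts as present. Every present entry in
 * [vaddr, vaddr_end] (both ends inclusive) that lies outside
 * [text_start, kernel_end] is cleared. Returns the number cleared.
 */
size_t fake_cleanhighmap(unsigned long *pmds, unsigned long base,
			 unsigned long vaddr, unsigned long vaddr_end,
			 unsigned long text_start, unsigned long kernel_end)
{
	size_t idx = (vaddr - base) / FAKE_PMD_SIZE;
	size_t cleared = 0;

	for (; vaddr <= vaddr_end && idx < FAKE_NR_PMDS;
	     idx++, vaddr += FAKE_PMD_SIZE) {
		if (pmds[idx] == 0)
			continue;       /* nothing mapped here */
		if (vaddr < text_start || vaddr > kernel_end) {
			pmds[idx] = 0;  /* drop the mapping */
			cleared++;
		}
	}
	return cleared;
}
```

The inclusive upper bound is the "more greedy" behaviour the patch comment mentions: the entry that contains vaddr_end is cleared too, which is why callers must already have stopped using any __ka address in that range.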


[RFC PATCH] Boot PV guests with more than 128GB (v1) for 3.7

2012-07-26 Thread Konrad Rzeszutek Wilk

These depend on the "documentation, refactor and cleanups (v2)" patches
I posted (https://lkml.org/lkml/2012/7/26/469).

The details of this problem are nicely explained in:

 [PATCH 5/7] xen/p2m: Add logic to revector a P2M tree to use __va
 [PATCH 6/7] xen/mmu: Copy and revector the P2M tree.
 [PATCH 7/7] xen/mmu: Remove from __ka space PMD entries for

and the supporting patches are just nice optimizations. Pasting in
what those patches mentioned:


During bootup Xen supplies us with a P2M array. It sticks
it right after the ramdisk, as can be seen with a 128GB PV guest:

(certain parts removed for clarity):
xc_dom_build_image: called
xc_dom_alloc_segment:   kernel   : 0x8100 -> 0x81e43000 
 (pfn 0x1000 + 0xe43 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
xc_dom_alloc_segment:   ramdisk  : 0x81e43000 -> 0x925c7000 
 (pfn 0x1e43 + 0x10784 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
xc_dom_alloc_segment:   phys2mach: 0x925c7000 -> 0xa25c7000 
 (pfn 0x125c7 + 0x1 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x1 at 0x7f0942dd2000
xc_dom_alloc_page   :   start info   : 0xa25c7000 (pfn 0x225c7)
xc_dom_alloc_page   :   xenstore : 0xa25c8000 (pfn 0x225c8)
xc_dom_alloc_page   :   console  : 0xa25c9000 (pfn 0x225c9)
nr_page_tables: 0x/48: 0x -> 
0x, 1 table(s)
nr_page_tables: 0x007f/39: 0xff80 -> 
0x, 1 table(s)
nr_page_tables: 0x3fff/30: 0x8000 -> 
0xbfff, 1 table(s)
nr_page_tables: 0x001f/21: 0x8000 -> 
0xa27f, 276 table(s)
xc_dom_alloc_segment:   page tables  : 0xa25ca000 -> 0xa26e1000 
 (pfn 0x225ca + 0x117 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
xc_dom_alloc_page   :   boot stack   : 0xa26e1000 (pfn 0x226e1)
xc_dom_build_image  : virt_alloc_end : 0xa26e2000
xc_dom_build_image  : virt_pgtab_end : 0xa280

So the physical memory and virtual (using __START_KERNEL_map addresses)
layout looks as so:

     phys                              __ka
 /----------\                    /--------------\
 | 0        | empty              | 0x8000       |
 | ..       |                    | ..           |
 | 16MB     | <= kernel starts   | 0x8100       |
 | ..       |                    |              |
 | 30MB     | <= kernel ends =>  | 0x81e43000   |
 | ..       |  & ramdisk starts  | ..           |
 | 293MB    | <= ramdisk ends => | 0x925c7000   |
 | ..       |  & P2M starts      | ..           |
 | ..       |                    | ..           |
 | 549MB    | <= P2M ends =>     | 0xa25c7000   |
 | ..       | start_info         | 0xa25c7000   |
 | ..       | xenstore           | 0xa25c8000   |
 | ..       | console            | 0xa25c9000   |
 | 549MB    | <= page tables =>  | 0xa25ca000   |
 | ..       |                    |              |
 | 550MB    | <= PGT end =>      | 0xa26e1000   |
 | ..       | boot stack         |              |
 \----------/                    \--------------/

As can be seen, the ramdisk, P2M and pagetables are taking
a bit of the __ka address space. This is a problem since
MODULES_VADDR starts at 0xa000 - and the P2M sits
right in there! During bootup this results in the inability to
load modules, with this error:

[ cut here ]
WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 
vmap_page_range_noflush+0x2d9/0x370()
Call Trace:
 [] warn_slowpath_common+0x7a/0xb0
 [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [] warn_slowpath_null+0x15/0x20
 [] vmap_page_range_noflush+0x2d9/0x370
 [] map_vm_area+0x2d/0x50
 [] __vmalloc_node_range+0x160/0x250
 [] ? module_alloc_update_bounds+0x19/0x80
 [] ? load_module+0x66/0x19c0
 [] module_alloc+0x5c/0x60
 [] ? module_alloc_update_bounds+0x19/0x80
 [] module_alloc_update_bounds+0x19/0x80
 [] load_module+0xfa3/0x19c0
 [] ? security_file_permission+0x86/0x90
 [] sys_init_module+0x5a/0x220
 [] system_call_fastpath+0x16/0x1b
---[ end trace fd8f7704fdea0291 ]---
vmalloc: allocation failure, allocated 16384 of 20480 bytes
modprobe: page allocation failure: order:0, mode:0xd2
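
The sizes involved can be checked with a little arithmetic. The sketch below assumes 4KB pages, one 8-byte entry per page in the P2M array, and a module area that begins 512MB above the start of the __ka mapping (the usual x86_64 layout); the 293MB ramdisk end comes from the layout above. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define MB (1024ULL * 1024ULL)
#define GB (1024ULL * MB)

/* Bytes needed for a flat P2M array: one 8-byte entry per 4KB page. */
uint64_t p2m_bytes(uint64_t guest_bytes)
{
	uint64_t nr_pages = guest_bytes / 4096;

	return nr_pages * 8;
}

/*
 * Offset, relative to the start of the __ka mapping, at which the P2M
 * ends when it is placed directly after the ramdisk.
 */
uint64_t p2m_end_offset(uint64_t ramdisk_end_offset, uint64_t guest_bytes)
{
	return ramdisk_end_offset + p2m_bytes(guest_bytes);
}

/* Assumption: the module area starts 512MB into the __ka mapping. */
#define MODULES_OFFSET (512 * MB)

/* Nonzero when the P2M reaches into the module area. */
int p2m_overlaps_modules(uint64_t ramdisk_end_offset, uint64_t guest_bytes)
{
	return p2m_end_offset(ramdisk_end_offset, guest_bytes) > MODULES_OFFSET;
}
```

Under these assumptions a 128GB guest needs a 256MB array, so with the ramdisk ending at 293MB the P2M ends at 549MB, past the 512MB mark; with the same ramdisk a 64GB guest would stay below it.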

Since the __va and __ka are 1:1 up to MODULES_VADDR and
cleanup_highmap rids __ka of the ramdisk mapping, what
we want to do is similar - get rid of the P2M in the __ka
address space. There are two ways of fixing this:

 1) All P2M lookups instead of using the __ka address would
use the __va address. This means we can safely erase from
__ka space the PMD pointers that point to the PFNs for
P2M array and be OK.
 2). Allocate a new array, copy the existing P2M into it,
revector the P2M tree to use that, and return the old
P2M to the memory

[PATCH 5/7] xen/p2m: Add logic to revector a P2M tree to use __va leafs.

2012-07-26 Thread Konrad Rzeszutek Wilk

During bootup Xen supplies us with a P2M array. It sticks
it right after the ramdisk, as can be seen with a 128GB PV guest:

(certain parts removed for clarity):
xc_dom_build_image: called
xc_dom_alloc_segment:   kernel   : 0x8100 -> 0x81e43000 
 (pfn 0x1000 + 0xe43 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1000+0xe43 at 0x7f097d8bf000
xc_dom_alloc_segment:   ramdisk  : 0x81e43000 -> 0x925c7000 
 (pfn 0x1e43 + 0x10784 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x1e43+0x10784 at 0x7f0952dd2000
xc_dom_alloc_segment:   phys2mach: 0x925c7000 -> 0xa25c7000 
 (pfn 0x125c7 + 0x1 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x125c7+0x1 at 0x7f0942dd2000
xc_dom_alloc_page   :   start info   : 0xa25c7000 (pfn 0x225c7)
xc_dom_alloc_page   :   xenstore : 0xa25c8000 (pfn 0x225c8)
xc_dom_alloc_page   :   console  : 0xa25c9000 (pfn 0x225c9)
nr_page_tables: 0x/48: 0x -> 
0x, 1 table(s)
nr_page_tables: 0x007f/39: 0xff80 -> 
0x, 1 table(s)
nr_page_tables: 0x3fff/30: 0x8000 -> 
0xbfff, 1 table(s)
nr_page_tables: 0x001f/21: 0x8000 -> 
0xa27f, 276 table(s)
xc_dom_alloc_segment:   page tables  : 0xa25ca000 -> 0xa26e1000 
 (pfn 0x225ca + 0x117 pages)
xc_dom_pfn_to_ptr: domU mapping: pfn 0x225ca+0x117 at 0x7f097d7a8000
xc_dom_alloc_page   :   boot stack   : 0xa26e1000 (pfn 0x226e1)
xc_dom_build_image  : virt_alloc_end : 0xa26e2000
xc_dom_build_image  : virt_pgtab_end : 0xa280

So the physical memory and virtual (using __START_KERNEL_map addresses)
layout looks as so:

  phys __ka
/\   /---\
| 0  | empty | 0x8000|
| .. |   | ..|
| 16MB   | <= kernel starts  | 0x8100|
| .. |   |   |
| 30MB   | <= kernel ends => | 0x81e43000|
| .. |  & ramdisk starts | ..|
| 293MB  | <= ramdisk ends=> | 0x925c7000|
| .. |  & P2M starts | ..|
| .. |   | ..|
| 549MB  | <= P2M ends=> | 0xa25c7000|
| .. | start_info| 0xa25c7000|
| .. | xenstore  | 0xa25c8000|
| .. | console   | 0xa25c9000|
| 549MB  | <= page tables => | 0xa25ca000|
| .. |   |   |
| 550MB  | <= PGT end => | 0xa26e1000|
| .. | boot stack|   |
\/   \---/

As can be seen, the ramdisk, P2M and pagetables take up
a fair amount of the __ka address space. This is a problem since
MODULES_VADDR starts at 0xa000 - and the P2M sits
right in there! As a result, modules cannot be loaded during
bootup, and the attempt fails with this error:

[ cut here ]
WARNING: at /home/konrad/ssd/linux/mm/vmalloc.c:106 
vmap_page_range_noflush+0x2d9/0x370()
Call Trace:
 [] warn_slowpath_common+0x7a/0xb0
 [] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [] warn_slowpath_null+0x15/0x20
 [] vmap_page_range_noflush+0x2d9/0x370
 [] map_vm_area+0x2d/0x50
 [] __vmalloc_node_range+0x160/0x250
 [] ? module_alloc_update_bounds+0x19/0x80
 [] ? load_module+0x66/0x19c0
 [] module_alloc+0x5c/0x60
 [] ? module_alloc_update_bounds+0x19/0x80
 [] module_alloc_update_bounds+0x19/0x80
 [] load_module+0xfa3/0x19c0
 [] ? security_file_permission+0x86/0x90
 [] sys_init_module+0x5a/0x220
 [] system_call_fastpath+0x16/0x1b
---[ end trace fd8f7704fdea0291 ]---
vmalloc: allocation failure, allocated 16384 of 20480 bytes
modprobe: page allocation failure: order:0, mode:0xd2
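
The overlap can be checked with a little arithmetic. The sketch below is
user-space C, not kernel code. It assumes 4KB pages, one 8-byte entry per
page in the P2M array, and that the addresses shown above are shortened
forms of 0xffffffff........ kernel-map addresses; P2M_START_SKETCH and
MODULES_VADDR_SKETCH are written out under that assumption and are not
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Assumed full forms of the addresses in the log (see the note above). */
#define P2M_START_SKETCH     0xffffffff925c7000ULL
#define MODULES_VADDR_SKETCH 0xffffffffa0000000ULL

/* Bytes needed for a P2M array: one 8-byte entry per guest page. */
uint64_t p2m_bytes(uint64_t guest_bytes)
{
	assert(guest_bytes % PAGE_SIZE == 0);
	return guest_bytes / PAGE_SIZE * sizeof(uint64_t);
}

/* Nonzero if [start, start + len) reaches into the module area. */
int overlaps_modules(uint64_t start, uint64_t len)
{
	return start + len > MODULES_VADDR_SKETCH;
}
```

For the 128GB guest, p2m_bytes(128ULL << 30) comes out as 0x10000000
(256MB), which matches the 0x925c7000 -> 0xa25c7000 span of the phys2mach
segment and puts the end of the array past the assumed module area start.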

Since the __va and __ka are 1:1 up to MODULES_VADDR and
cleanup_highmap rids __ka of the ramdisk mapping, what
we want to do is similar - get rid of the P2M in the __ka
address space. There are two ways of fixing this:

 1) All P2M lookups instead of using the __ka address would
use the __va address. This means we can safely erase from
__ka space the PMD pointers that point to the PFNs for
P2M array and be OK.
 2) Allocate a new array, copy the existing P2M into it,
revector the P2M tree to use that, and return the old
P2M to the memory allocator. This has the advantage that
it sets the stage for using XEN_ELF_NOTE_INIT_P2M
feature. That feature allows us to set the exact virtual
address space we want for the P2M - and allows us to
boot as initial domain on large machines.

So we pick option 2).
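
The copy-and-revector idea of option 2) can be modelled in user space.
This is only a sketch under assumptions: the leaf pointers are held in a
flat array, malloc stands in for the kernel allocator, and the name
revector_leaves is made up for illustration. It is not the code from
this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the old array [old, old + n) into a freshly allocated one and
 * redirect every leaf pointer that points into the old array to the
 * same offset in the copy. Leaf pointers that point elsewhere are left
 * alone. Returns the new array, or NULL if the allocation failed.
 * The caller may release the old array afterwards.
 */
unsigned long *revector_leaves(unsigned long **leaves, size_t nr_leaves,
			       unsigned long *old, size_t n)
{
	unsigned long *fresh;
	uintptr_t lo = (uintptr_t)old;
	uintptr_t hi = (uintptr_t)(old + n);
	size_t i;

	assert(leaves != NULL && old != NULL && n > 0);

	fresh = malloc(n * sizeof(*fresh));
	if (!fresh)
		return NULL;
	memcpy(fresh, old, n * sizeof(*fresh));

	for (i = 0; i < nr_leaves; i++) {
		uintptr_t p = (uintptr_t)leaves[i];

		if (p >= lo && p < hi)
			leaves[i] = fresh + (leaves[i] - old);
	}
	return fresh;
}
```

The point of the design is that lookups keep working through the tree
while the backing storage moves, so the old array can be handed back.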

This patch only lays the groundwork in the P2M code. The patch
that modifies the MMU is called "xen/mmu: Copy and revector the P2M tree."

Signed-off-by: Konrad Rzeszutek Wilk 
---

[PATCH 7/7] xen/mmu: Remove from __ka space PMD entries for pagetables.

2012-07-26 Thread Konrad Rzeszutek Wilk

Please first read the description in "xen/mmu: Copy and revector the
P2M tree."

At this stage, the __ka address space (which is what the old
P2M tree was using) is partially disassembled. The cleanup_highmap
has removed the PMD entries from 0-16MB and anything past _brk_end
up to the max_pfn_mapped (which is the end of the ramdisk).

The xen_remove_p2m_tree and the code around it have ripped out the __ka
mappings for the old P2M array.

Here we continue by doing the same for the area where the Xen page-tables were.
This is safe, as the page-tables are addressed using __va.
For good measure we delete anything that is within MODULES_VADDR
and up to the end of the PMD.

At this point the __ka only contains PMD entries for the start
of the kernel up to __brk.
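
The range arithmetic used in the patch below can be tried out in user
space. A minimal sketch, assuming 4KB pages, 2MB per PMD entry and 1GB
per PUD entry (the usual x86-64 values); round_up_to and pt_clean_end
are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PMD_SIZE  (2ULL << 20)   /* range covered by one PMD entry */
#define PUD_SIZE  (1ULL << 30)   /* range covered by one PUD entry */

/* Round x up to the next multiple of y. */
uint64_t round_up_to(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x + y - 1) / y * y;
}

/* End of the __ka range cleared for nr_pt_frames page-table pages. */
uint64_t pt_clean_end(uint64_t pt_base, uint64_t nr_pt_frames)
{
	return pt_base + round_up_to(nr_pt_frames * PAGE_SIZE, PMD_SIZE);
}
```

With the 0x117 page-table pages from the log, the size rounds up from
0x117000 to 0x200000, so the cleared range extends past the boot stack
page. That is why the comment in the patch warns about __ka users of
the initial boot stack.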

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   20 
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 05e8492..738feca 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1241,6 +1241,26 @@ static void __init xen_pagetable_setup_done(pgd_t *base)
xen_start_info->mfn_list = new_mfn_list;
}
}
+#ifdef CONFIG_X86_64
+   /* At this stage, cleanup_highmap has already cleaned __ka space
+* from _brk_limit way up to the max_pfn_mapped (which is the end of
+* the ramdisk). We continue on, erasing PMD entries that point to page
+* tables - do note that they are accessible at this stage via __va.
+* For good measure we also round up to the PMD - which means that if
+* anybody is using __ka address to the initial boot-stack - and try
+* to use it - they are going to crash. The xen_start_info has been
+* taken care of already in xen_setup_kernel_pagetable. */
+   addr = xen_start_info->pt_base;
+   size = roundup(xen_start_info->nr_pt_frames * PAGE_SIZE, PMD_SIZE);
+
+   xen_cleanhighmap(addr, addr + size);
+   xen_start_info->pt_base = (unsigned long)__va(__pa(xen_start_info->pt_base));
+
+   /* This is superfluous and shouldn't be necessary, but you know what,
+* let's do it. The MODULES_VADDR -> MODULES_END should be clear of
+* anything at this stage. */
+   xen_cleanhighmap(MODULES_VADDR, roundup(MODULES_VADDR, PUD_SIZE) - 1);
+#endif
xen_post_allocator_init();
 }
 
-- 
1.7.7.6


[PATCH 4/7] xen/mmu: Recycle the Xen provided L4, L3, and L2 pages

2012-07-26 Thread Konrad Rzeszutek Wilk

As we are not using them. We end up only using the L1 pagetables
and grafting those to our page-tables.
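
The trimming loop in the patch below can be modelled on plain PFNs. This
is a simplified sketch, not the patch's code: it uses an inclusive
[first, last] range and a made-up helper name, trim_edges. The candidate
order (pgd, l2, l3) mirrors the addr[] array in the patch.

```c
#include <assert.h>

/*
 * Shrink the PFN range [*first, *last] (inclusive) by dropping an edge
 * page whenever that page is one of the n candidates. Because the
 * candidates are in no particular order, the scan is repeated n times.
 * Returns how many candidate pages were trimmed off.
 */
unsigned trim_edges(unsigned long *first, unsigned long *last,
		    const unsigned long *cand, unsigned n)
{
	unsigned trimmed = 0, pass, j;

	assert(first && last && cand);

	for (pass = 0; pass < n; pass++) {
		for (j = 0; j < n; j++) {
			if (*first > *last)
				return trimmed;	/* range is empty */
			if (cand[j] == *first) {
				(*first)++;
				trimmed++;
			} else if (cand[j] == *last) {
				(*last)--;
				trimmed++;
			}
		}
	}
	return trimmed;
}
```

With the toolstack layout [L4], [L3], [L2], [L1], ... all three pages
sit at the front and are trimmed. With the initial-domain layout
[L4], [L1], [L2], [L3], ... only the L4 page is at an edge, so only
that one is recovered.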

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   38 --
 1 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 48bdc9f..7f54b75 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1724,6 +1724,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
pud_t *l3;
pmd_t *l2;
+   unsigned long addr[3];
+   unsigned long pt_base, pt_end;
+   unsigned i;
 
/* max_pfn_mapped is the last pfn mapped in the initial memory
 * mappings. Considering that on Xen after the kernel mappings we
@@ -1731,6 +1734,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 * set max_pfn_mapped to the last real pfn mapped. */
max_pfn_mapped = PFN_DOWN(__pa(xen_start_info->mfn_list));
 
+   pt_base = PFN_DOWN(__pa(xen_start_info->pt_base));
+   pt_end = PFN_DOWN(__pa(xen_start_info->pt_base + (xen_start_info->nr_pt_frames * PAGE_SIZE)));
+
/* Zap identity mapping */
init_level4_pgt[0] = __pgd(0);
 
@@ -1749,6 +1755,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
 
+   addr[0] = (unsigned long)pgd;
+   addr[1] = (unsigned long)l2;
+   addr[2] = (unsigned long)l3;
/* Graft it onto L4[272][0]. Note that we creating an aliasing problem:
 * Both L4[272][0] and L4[511][511] have entries that point to the same
 * L2 (PMD) tables. Meaning that if you modify it in __va space
@@ -1791,12 +1800,29 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
__xen_write_cr3(true, __pa(init_level4_pgt));
xen_mc_issue(PARAVIRT_LAZY_CPU);
 
-   /* Offset by one page since the original pgd is going bye bye */
-   memblock_reserve(__pa(xen_start_info->pt_base + PAGE_SIZE),
-(xen_start_info->nr_pt_frames * PAGE_SIZE) - PAGE_SIZE);
-   /* and also RW it so it can actually be used. */
-   set_page_prot(pgd, PAGE_KERNEL);
-   clear_page(pgd);
+   /* We can't that easily rip out L3 and L2, as the Xen pagetables are
+* set out this way: [L4], [L1], [L2], [L3], [L1], [L1] ...  for
+* the initial domain. For guests using the toolstack, they are in:
+* [L4], [L3], [L2], [L1], [L1], order .. */
+   for (i = 0; i < ARRAY_SIZE(addr); i++) {
+   unsigned j;
+   /* No idea about the order the addr are in, so just do them twice. */
+   for (j = 0; j < ARRAY_SIZE(addr); j++) {
+   if (pt_base == PFN_DOWN(__pa(addr[j]))) {
+   set_page_prot((void *)addr[j], PAGE_KERNEL);
+   clear_page((void *)addr[j]);
+   pt_base++;
+
+   }
+   if (pt_end == PFN_DOWN(__pa(addr[j]))) {
+   set_page_prot((void *)addr[j], PAGE_KERNEL);
+   clear_page((void *)addr[j]);
+   pt_end--;
+   }
+   }
+   }
+   /* Our (by three pages) smaller Xen pagetable that we are using */
+   memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
 }
 #else  /* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
-- 
1.7.7.6


[PATCH 2/7] xen/mmu: For 64-bit do not call xen_map_identity_early

2012-07-26 Thread Konrad Rzeszutek Wilk

Because we do not need it. During startup Xen provides
us with all the mapped memory that we need to function.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 7247e5a..a59070b 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -84,6 +84,7 @@
  */
 DEFINE_SPINLOCK(xen_reservation_lock);
 
+#ifdef CONFIG_X86_32
 /*
  * Identity map, in addition to plain kernel map.  This needs to be
  * large enough to allocate page table pages to allocate the rest.
@@ -91,7 +92,7 @@ DEFINE_SPINLOCK(xen_reservation_lock);
  */
 #define LEVEL1_IDENT_ENTRIES   (PTRS_PER_PTE * 4)
 static RESERVE_BRK_ARRAY(pte_t, level1_ident_pgt, LEVEL1_IDENT_ENTRIES);
-
+#endif
 #ifdef CONFIG_X86_64
 /* l3 pud for userspace vsyscall mapping */
 static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
@@ -1628,7 +1629,7 @@ static void set_page_prot(void *addr, pgprot_t prot)
if (HYPERVISOR_update_va_mapping((unsigned long)addr, pte, 0))
BUG();
 }
-
+#ifdef CONFIG_X86_32
 static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 {
unsigned pmdidx, pteidx;
@@ -1679,7 +1680,7 @@ static void __init xen_map_identity_early(pmd_t *pmd, unsigned long max_pfn)
 
set_page_prot(pmd, PAGE_KERNEL_RO);
 }
-
+#endif
 void __init xen_setup_machphys_mapping(void)
 {
struct xen_machphys_mapping mapping;
@@ -1765,14 +1766,12 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
/* Note that we don't do anything with level1_fixmap_pgt which
 * we don't need. */
 
-   /* Set up identity map */
-   xen_map_identity_early(level2_ident_pgt, max_pfn);
-
/* Make pagetable pieces RO */
set_page_prot(init_level4_pgt, PAGE_KERNEL_RO);
set_page_prot(level3_ident_pgt, PAGE_KERNEL_RO);
set_page_prot(level3_kernel_pgt, PAGE_KERNEL_RO);
set_page_prot(level3_user_vsyscall, PAGE_KERNEL_RO);
+   set_page_prot(level2_ident_pgt, PAGE_KERNEL_RO);
set_page_prot(level2_kernel_pgt, PAGE_KERNEL_RO);
set_page_prot(level2_fixmap_pgt, PAGE_KERNEL_RO);
 
-- 
1.7.7.6


[PATCH 1/7] xen/mmu: use copy_page instead of memcpy.

2012-07-26 Thread Konrad Rzeszutek Wilk

After all, this is what it is there for.
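
The replacement is only equivalent when the table being copied fills
exactly one page. A small user-space sketch of that condition, assuming
4KB pages and 8-byte entries; pmd_sketch_t and copy_page_sketch are
stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096
#define PTRS_PER_PMD 512

typedef struct { uint64_t pmd; } pmd_sketch_t;   /* 8-byte entry */

/* A 512-entry table of 8-byte entries fills exactly one page. */
_Static_assert(sizeof(pmd_sketch_t) * PTRS_PER_PMD == PAGE_SIZE,
	       "PMD table must be exactly one page");

/* User-space stand-in for copy_page(): always copies one whole page. */
void copy_page_sketch(void *to, const void *from)
{
	assert(to != NULL && from != NULL && to != from);
	memcpy(to, from, PAGE_SIZE);
}
```

For a table that is smaller than a page, a whole-page copy would also
copy whatever follows the table, so the size check above is worth
keeping in mind for each call site.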

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   13 ++---
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 6ba6100..7247e5a 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1754,14 +1754,14 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 * it will be also modified in the __ka space! (But if you just
 * modify the PMD table to point to other PTE's or none, then you
 * are OK - which is what cleanup_highmap does) */
-   memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+   copy_page(level2_ident_pgt, l2);
/* Graft it onto L4[511][511] */
-   memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+   copy_page(level2_kernel_pgt, l2);
 
/* Get [511][510] and graft that in level2_fixmap_pgt */
l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
-   memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+   copy_page(level2_fixmap_pgt, l2);
/* Note that we don't do anything with level1_fixmap_pgt which
 * we don't need. */
 
@@ -1821,8 +1821,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
 */
swapper_kernel_pmd =
extend_brk(sizeof(pmd_t) * PTRS_PER_PMD, PAGE_SIZE);
-   memcpy(swapper_kernel_pmd, initial_kernel_pmd,
-  sizeof(pmd_t) * PTRS_PER_PMD);
+   copy_page(swapper_kernel_pmd, initial_kernel_pmd);
swapper_pg_dir[KERNEL_PGD_BOUNDARY] =
__pgd(__pa(swapper_kernel_pmd) | _PAGE_PRESENT);
set_page_prot(swapper_kernel_pmd, PAGE_KERNEL_RO);
@@ -1851,11 +1850,11 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
  512*1024);
 
kernel_pmd = m2v(pgd[KERNEL_PGD_BOUNDARY].pgd);
-   memcpy(initial_kernel_pmd, kernel_pmd, sizeof(pmd_t) * PTRS_PER_PMD);
+   copy_page(initial_kernel_pmd, kernel_pmd);
 
xen_map_identity_early(initial_kernel_pmd, max_pfn);
 
-   memcpy(initial_page_table, pgd, sizeof(pgd_t) * PTRS_PER_PGD);
+   copy_page(initial_page_table, pgd);
initial_page_table[KERNEL_PGD_BOUNDARY] =
__pgd(__pa(initial_kernel_pmd) | _PAGE_PRESENT);
 
-- 
1.7.7.6


[PATCH 3/7] xen/mmu: Release the Xen provided L4 (PGD) back.

2012-07-26 Thread Konrad Rzeszutek Wilk

Since we are not using it and somebody else could use it.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index a59070b..48bdc9f 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1782,20 +1782,21 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
/* Unpin Xen-provided one */
pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(__pa(pgd)));
 
-   /* Switch over */
-   pgd = init_level4_pgt;
-
/*
 * At this stage there can be no user pgd, and no page
 * structure to attach it to, so make sure we just set kernel
 * pgd.
 */
xen_mc_batch();
-   __xen_write_cr3(true, __pa(pgd));
+   __xen_write_cr3(true, __pa(init_level4_pgt));
xen_mc_issue(PARAVIRT_LAZY_CPU);
 
-   memblock_reserve(__pa(xen_start_info->pt_base),
-xen_start_info->nr_pt_frames * PAGE_SIZE);
+   /* Offset by one page since the original pgd is going bye bye */
+   memblock_reserve(__pa(xen_start_info->pt_base + PAGE_SIZE),
+(xen_start_info->nr_pt_frames * PAGE_SIZE) - PAGE_SIZE);
+   /* and also RW it so it can actually be used. */
+   set_page_prot(pgd, PAGE_KERNEL);
+   clear_page(pgd);
 }
 #else  /* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
-- 
1.7.7.6


[PATCH 1/2] xen/swiotlb: If iommu=soft was not passed in on > 4GB, don't turn it on.

2012-07-26 Thread Konrad Rzeszutek Wilk

If we boot a 64-bit guest with more than 4GB of memory, the SWIOTLB
gets turned on:
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
software IO TLB [mem 0xfb43d000-0xff43cfff] (64MB) mapped at [8800fb43d000-8800ff43cfff]

which is OK if we have PCI devices, but not if we do not. In a PV
guest the SWIOTLB ends up asking the hypervisor for precious lowmem
 - and 64MB of it per guest. On a 32GB machine, this limits the
number of 4GB guests that can start, due to lowmem exhaustion.

What we do is detect whether the user supplied the e820_hole=1
parameter, which is used to construct an E820 that is similar to
the machine's - so that the PCI regions do not overlap with RAM regions.
We check for that by looking at the E820 and seeing if it diverges
from the standard - and if it does not (and if iommu=soft was not turned on),
we disable the pci_swiotlb_detect_4gb check.
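
The check described above can be sketched in user space. The region type
values are the standard E820 ones; the struct, the function names and
MAX_DMA32_PFN_SKETCH are illustrative stand-ins (assuming 4KB pages) and
not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Standard E820 region type values. */
enum { E820_RAM = 1, E820_RESERVED = 2, E820_ACPI = 3, E820_NVS = 4 };

#define MAX_DMA32_PFN_SKETCH ((1ULL << 32) >> 12)   /* first PFN above 4GB */

struct e820_entry_sketch {
	unsigned long long addr;
	unsigned long long size;
	unsigned type;
};

/* True if any region is ACPI data or ACPI NVS, i.e. the map looks
 * like a host map rather than a plain PV guest map. */
bool map_has_acpi(const struct e820_entry_sketch *map, size_t nr)
{
	size_t i;

	assert(map != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (map[i].type == E820_ACPI || map[i].type == E820_NVS)
			return true;
	return false;
}

/* A PV guest above 4GB that did not ask for swiotlb and has a plain
 * map should not get the native SWIOTLB. */
bool should_set_no_iommu(bool pv_domain, bool swiotlb_requested,
			 unsigned long long max_pfn, bool has_acpi)
{
	return pv_domain && !swiotlb_requested &&
	       max_pfn > MAX_DMA32_PFN_SKETCH && !has_acpi;
}
```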

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/pci-swiotlb-xen.c |   26 ++
 1 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 967633a..56f373e 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -8,6 +8,10 @@
 #include 
 #include 
 
+#include 
+#include 
+#include 
+
 int xen_swiotlb __read_mostly;
 
 static struct dma_map_ops xen_swiotlb_dma_ops = {
@@ -24,7 +28,19 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
 };
+bool __init e820_has_acpi(void)
+{
+   int i;
 
+   /* Check if the user supplied the e820_hole parameter
+* which would create a machine looking E820 region. */
+   for (i = 0; i < e820.nr_map; i++) {
+   if ((e820.map[i].type == E820_ACPI) ||
+   (e820.map[i].type == E820_NVS))
+   return true;
+   }
+   return false;
+}
 /*
  * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
  *
@@ -33,7 +49,17 @@ static struct dma_map_ops xen_swiotlb_dma_ops = {
  */
 int __init pci_xen_swiotlb_detect(void)
 {
+#ifdef CONFIG_X86_64
 
+   /* Having more than 4GB triggers the native SWIOTLB to activate.
+* The way to turn it off is to set no_iommu. */
+   printk(KERN_INFO "swiotlb: %d\n", swiotlb);
+   if (xen_pv_domain() && !swiotlb && max_pfn > MAX_DMA32_PFN) {
+   /* Normal PV guests only have E820_RSV and E820_RAM regions */
+   if (!e820_has_acpi())
+   no_iommu = 1;
+   }
+#endif
/* If running as PV guest, either iommu=soft, or swiotlb=force will
 * activate this IOMMU. If running as PV privileged, activate it
 * irregardless.
-- 
1.7.7.6


[PATCH 2/2] xen/swiotlb: If user supplied e820_hole=1 in the guest config, enable SWIOTLB.

2012-07-26 Thread Konrad Rzeszutek Wilk

We can detect that the user is using e820_hole=1 by parsing the E820.
If it shows regions other than E820_RAM or E820_RESERVED, then the
user is bent on providing us with a PCI device and forgot
to pass 'iommu=soft'. So let's enable it for them.
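
The rule added by the patch below can be written as a small pure
function. This is a sketch with a made-up name, auto_enable_xen_swiotlb,
and boolean parameters standing in for the kernel's state; it only
models the new step, not the rest of pci_xen_swiotlb_detect.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns the resulting state of the Xen SWIOTLB flag. If it is already
 * on, it stays on. Otherwise it is switched on only for an unprivileged
 * PV guest whose E820 contains ACPI or NVS regions.
 */
bool auto_enable_xen_swiotlb(bool pv_domain, bool initial_domain,
			     bool already_on, bool has_acpi)
{
	/* The initial domain is by definition a PV domain in this model. */
	assert(!initial_domain || pv_domain);

	if (already_on)
		return true;
	return pv_domain && !initial_domain && has_acpi;
}
```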

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/pci-swiotlb-xen.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 56f373e..9b1aebb 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -74,6 +74,11 @@ int __init pci_xen_swiotlb_detect(void)
if (xen_pv_domain())
swiotlb = 0;
 
+   /* If it hasn't been activated yet, and it has E820 that looks like
+* the user supplied e820_hole=1, then turn it on. */
+   if (xen_pv_domain() && !xen_initial_domain() &&
+   !xen_swiotlb && e820_has_acpi())
+   xen_swiotlb = 1;
return xen_swiotlb;
 }
 
-- 
1.7.7.6


[RFC PATCH] Xen-SWIOTLB fixes (v1) for 3.7

2012-07-26 Thread Konrad Rzeszutek Wilk

This is an RFC patch for 3.7.

I thought I had addressed this in the past but I can't seem to find
the patches. There is one bug - if one boots a 64-bit PV guest
with more than 4GB, the SWIOTLB gets turned on - and 64MB of precious
low memory gets used. If you launch more than 10 of them on a 32GB
machine you are going to run into trouble as the lowmem gets
exhausted.

On the other hand, the user might want to have 10 guests with 4GB
and each with a PCI device!

So to fix this, we are going to figure out whether the user had
provided the e820_hole=1 parameter in the guest config. The effect
of that parameter is that a massaged host's E820 is used in the guest
 - and we check if it has E820_ACPI or E820_NVS. If so, the user
really wanted to pass in PCI devices to the guest.

Since we now have a routine to check for the e820_hole we can optimize and
see if the user forgot the "iommu=soft" and automatically turn that
on.

The patches are RFC because it looks like something has bit-rotted since the
last time I used this (when Fedora Core 16 was released), as I can't
get a guest to boot with e820_hole :-(. But they [the patches] look
sound to me and they do fix the bug of allocating SWIOTLB for normal
PV guests.


Re: [RFC][PATCH 0/14] PM / shmobile: Pass power domain information via DT (was: Re: [RFD] PM: Device tree representation of power domains)

2012-07-26 Thread Rafael J. Wysocki

On Thursday, July 26, 2012, Kevin Hilman wrote:
> "Rafael J. Wysocki"  writes:
> 
> > On Wednesday, July 25, 2012, Arnd Bergmann wrote:
> >> On Tuesday 24 July 2012, Rafael J. Wysocki wrote:
> >> > On Tuesday, July 24, 2012, Arnd Bergmann wrote:
> >> > > On Tuesday 24 July 2012, Rafael J. Wysocki wrote:
> >> > > > On Tuesday, July 24, 2012, Arnd Bergmann wrote:
> >> > > > > On Saturday 21 July 2012, Rafael J. Wysocki wrote:
> >> > > 
> >> > > > > 
> >> > > > > Sorry for taking so long to reply. I am really not that familiar 
> >> > > > > with the
> >> > > > > power domain requirements, but I do have two comments on your 
> >> > > > > approach:
> >> > > > > 
> >> > > > > * I think when we want to add a generic concept to the device tree 
> >> > > > > such
> >> > > > >   as power domains, we should always make it specified in a 
> >> > > > > generic way.
> >> > > > 
> >> > > > Do we really want that?  I'm a bit skeptical, because apparently 
> >> > > > nobody
> >> > > > cares, as the (zero) response to this patchset evidently indicates 
> >> > > > and
> >> > > > since nobody cares, it's probably better not to add such "generic" 
> >> > > > things
> >> > > > just yet.
> 
> Sorry to jump in late, but it's been another busy dev cycle and I
> haven't had the time to look at this series in detail.  But just so you
> know that somebody cares, we're also interested in bindings that will be
> useful on other SoCs for PM domains.
> 
> However, since OMAP powerdomain support pre-dates generic powerdomains ,
> the "generic" power domains aren't quite generic enough get for OMAP,
> and I haven't had the time to extend the generic code, we haven't yet
> moved to generic powerdomains.
> 
> >> > > 
> >> > > Well, the trouble with bindings is that they are much harder to change
> >> > > later, at least in incompatible ways. 
> >> > 
> >> > Hmm, so I think you wanted to say that it might be burdensome to retain 
> >> > the
> >> > code handling the old binding once we had started to use a new generic 
> >> > one.
> >> > 
> >> > I can agree with that, but that's quite similar to user space interfaces.
> >> > Once we've exposed a user space interface of some kind and someone starts
> >> > to use it, we'll have to maintain it going forward for the user in 
> >> > question.
> >> > However, there is a way to deprecate old user space interfaces and it has
> >> > happened.
> >> > 
> >> > In this particular case the burden would be on Renesas, but I don't 
> >> > think it
> >> > would affect anybody else.
> >> 
> >> [adding devicetree-disc...@lists.ozlabs.org]
> >> 
> >> In case of user space interfaces, we also try very hard to avoid cases
> >> where we know that we will have to change things later.
> >
> > [Cough, cough]  Yeah, sure.  Except that that's rather difficult to 
> > anticipate
> > usually.
> >
> >> I don't think it's that hard to define a generic binding here, we just
> >> need to make sure it's extensible.
> >> 
> >> One thing I would like to avoid is having to add to every single
> >> device binding two separate optional properties defined like
> >> 
> >> diff --git a/Documentation/devicetree/bindings/mmc/mmci.txt 
> >> b/Documentation/devicetree/bindings/mmc/mmci.txt
> >> index 2b584ca..353152e 100644
> >> --- a/Documentation/devicetree/bindings/mmc/mmci.txt
> >> +++ b/Documentation/devicetree/bindings/mmc/mmci.txt
> >> @@ -13,3 +13,9 @@ Required properties:
> >>  Optional properties:
> >>  - mmc-cap-mmc-highspeed  : indicates whether MMC is high speed capable
> >>  - mmc-cap-sd-highspeed   : indicates whether SD is high speed capable
> >> +- pm-domain: a phandle pointing to the power domain
> >> + controlling this device
> >> + See ../pm-domain/generic.txt
> >> +- renesas,pm-domain: a string with the name of the power domain
> >> + controlling this device.
> >> + See ../pm-domain/renesas.txt
> >> 
> >> Even if you say that the burden is only on Renesas to maintain all those
> >> changes to every binding they use, there is also a burden on people trying
> >> to understand the binding and deciding which one to use.
> >
> > What about (tongue in cheek) "renesas,hwmod", then?  That won't be confused
> > with the generic "pm-domain" in any way, will it?  And since TI did that, we
> > surely should be allowed to do it as well, no?
> >
> > Seriously, I'm not fundamentally opposed to using phandles for that in 
> > analogy
> > with regulators, but I'm afraid we won't get it right from the start and it
> > will turn out that we need to change the definition of the binding somehow
> > and _that_ is going to be painful.  Pretty much like changing generic user
> > space interfaces is (as opposed to changing interfaces of limited scope).
> >
> > However, if that route is taken, I'll expect you to require TI to change 
> > their
> > hwmod binding in the analogous way.
> 
> FWIW, we're already working on making ti,hwmods

[PATCH 1/4] xen/p2m: Fix the comment describing the P2M tree.

2012-07-26 Thread Konrad Rzeszutek Wilk

It mixed up the p2m_mid_missing with p2m_missing. Also
remove some extra spaces.
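
The relationship the corrected comment describes can be shown with a toy
three-level tree. This is a user-space sketch with a tiny fan-out
(PER_PAGE is 4 here, not the real per-page count) and made-up _sketch
function names; it only illustrates that p2m_mid_missing holds pointers
to p2m_missing, which in turn holds invalid entries.

```c
#include <assert.h>
#include <stddef.h>

#define PER_PAGE 4                       /* toy fan-out */
#define INVALID_P2M_ENTRY (~0UL)

static unsigned long p2m_missing[PER_PAGE];        /* leaf: all ~0 */
static unsigned long *p2m_mid_missing[PER_PAGE];   /* mid: all point to p2m_missing */
static unsigned long **p2m_top[PER_PAGE];          /* top: all point to p2m_mid_missing */

/* Build an empty tree in which every lookup ends at an invalid entry. */
void p2m_init_sketch(void)
{
	size_t i;

	for (i = 0; i < PER_PAGE; i++) {
		p2m_missing[i] = INVALID_P2M_ENTRY;
		p2m_mid_missing[i] = p2m_missing;
		p2m_top[i] = p2m_mid_missing;
	}
}

/* Walk top -> mid -> leaf for a given pfn. */
unsigned long p2m_lookup_sketch(unsigned long pfn)
{
	unsigned long top = pfn / (PER_PAGE * PER_PAGE);
	unsigned long mid = (pfn / PER_PAGE) % PER_PAGE;
	unsigned long idx = pfn % PER_PAGE;

	assert(top < PER_PAGE);
	return p2m_top[top][mid][idx];
}
```

Sharing one missing mid page and one missing leaf page means an empty
tree costs only a couple of pages, and a real leaf is installed by
replacing a single pointer.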

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/p2m.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 64effdc..e4adbfb 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -22,7 +22,7 @@
  *
  * P2M_PER_PAGE depends on the architecture, as a mfn is always
  * unsigned long (8 bytes on 64-bit, 4 bytes on 32), leading to
- * 512 and 1024 entries respectively. 
+ * 512 and 1024 entries respectively.
  *
  * In short, these structures contain the Machine Frame Number (MFN) of the PFN.
  *
@@ -139,11 +139,11 @@
  *  /| ~0, ~0,   |
  * | \---/
  * |
- * p2m_missing p2m_missing
- * /--\ /\
- * | [p2m_mid_missing]+>| ~0, ~0, ~0 |
- * | [p2m_mid_missing]+>| ..., ~0|
- * \--/ \/
+ *   p2m_mid_missing   p2m_missing
+ * /-\ /\
+ * | [p2m_missing]   +>| ~0, ~0, ~0 |
+ * | [p2m_missing]   +>| ..., ~0|
+ * \-/ \/
  *
  * where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN | IDENTITY_BIT)
  */
@@ -423,7 +423,7 @@ static void free_p2m_page(void *p)
free_page((unsigned long)p);
 }
 
-/* 
+/*
  * Fully allocate the p2m structure for a given pfn.  We need to check
  * that both the top and mid levels are allocated, and make sure the
  * parallel mfn tree is kept in sync.  We may race with other cpus, so
-- 
1.7.7.6


[PATCH 2/4] xen/x86: Use memblock_reserve for sensitive areas.

2012-07-26 Thread Konrad Rzeszutek Wilk

Reserve each sensitive area with its own memblock_reserve call
instead of one big memblock_reserve. This way we can be more
selective in freeing regions (and it also makes it easier
to understand what is where).
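
The size computed for the mfn_list reservation in the patch below is a
page-aligned byte count. A minimal user-space sketch of that arithmetic,
assuming 4KB pages and 8-byte entries; page_align and
mfn_list_reserve_size are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

/* Round a byte count up to a whole number of pages. */
uint64_t page_align(uint64_t bytes)
{
	return (bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* Bytes to reserve for an mfn_list holding one 8-byte entry per page. */
uint64_t mfn_list_reserve_size(uint64_t nr_pages)
{
	/* Guard against overflow in the multiplication and the round-up. */
	assert(nr_pages <= (UINT64_MAX - (PAGE_SIZE - 1)) / sizeof(uint64_t));
	return page_align(nr_pages * sizeof(uint64_t));
}
```

A guest with 513 pages needs 4104 bytes of entries, which rounds up to
two pages. The round-up covers the partially filled last page of the
array.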

[v1: Move the auto_translate_physmap to proper line]
Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/enlighten.c |   38 ++
 arch/x86/xen/p2m.c   |5 +
 arch/x86/xen/setup.c |9 -
 3 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ff962d4..9b1afa4 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -998,7 +998,44 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
 
return ret;
 }
+static void __init xen_reserve_mfn(unsigned long mfn)
+{
+   unsigned long pfn;
+
+   if (!mfn)
+   return;
+   pfn = mfn_to_pfn(mfn);
+   if (phys_to_machine_mapping_valid(pfn))
+   memblock_reserve(PFN_PHYS(pfn), PAGE_SIZE);
+}
+static void __init xen_reserve_internals(void)
+{
+   unsigned long size;
+
+   if (!xen_pv_domain())
+   return;
+
+   memblock_reserve(__pa(xen_start_info), PAGE_SIZE);
+
+   xen_reserve_mfn(PFN_DOWN(xen_start_info->shared_info));
+   xen_reserve_mfn(xen_start_info->store_mfn);
 
+   if (!xen_initial_domain())
+   xen_reserve_mfn(xen_start_info->console.domU.mfn);
+
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return;
+
+   /*
+* ALIGN up to compensate for the p2m_page pointing to an array that
+* can be partially filled (look in xen_build_dynamic_phys_to_machine).
+*/
+
+   size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long));
+   memblock_reserve(__pa(xen_start_info->mfn_list), size);
+
+   /* The pagetables are reserved in mmu.c */
+}
 void xen_setup_shared_info(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
@@ -1362,6 +1399,7 @@ asmlinkage void __init xen_start_kernel(void)
xen_raw_console_write("mapping kernel into physical memory\n");
pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
 
+   xen_reserve_internals();
/* Allocate and initialize top and mid mfn levels for p2m structure */
xen_build_mfn_list_list();
 
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index e4adbfb..6a2bfa4 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -388,6 +388,11 @@ void __init xen_build_dynamic_phys_to_machine(void)
}
 
m2p_override_init();
+
+   /* NOTE: We cannot call memblock_reserve here for the mfn_list as there
+* aren't enough pieces to make it work (for one - we are still using the
+* Xen provided pagetable). Do it later in xen_reserve_internals.
+*/
 }
 
 unsigned long get_phys_to_machine(unsigned long pfn)
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index a4790bf..9efca75 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -424,15 +424,6 @@ char * __init xen_memory_setup(void)
e820_add_region(ISA_START_ADDRESS, ISA_END_ADDRESS - ISA_START_ADDRESS,
E820_RESERVED);
 
-   /*
-* Reserve Xen bits:
-*  - mfn_list
-*  - xen_start_info
-* See comment above "struct start_info" in 
-*/
-   memblock_reserve(__pa(xen_start_info->mfn_list),
-xen_start_info->pt_base - xen_start_info->mfn_list);
-
sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), _map);
 
return "Xen";
-- 
1.7.7.6
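
For readers following the size computation in xen_reserve_internals(), here is a rough userspace sketch of the rounding involved. It assumes 4 KiB pages; SKETCH_PAGE_SIZE, sketch_page_align() and mfn_list_reserve_size() are made-up names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round x up to the next multiple of the page size (the job PAGE_ALIGN does). */
static unsigned long sketch_page_align(unsigned long x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Bytes to reserve for an mfn_list that holds one entry per page. */
static unsigned long mfn_list_reserve_size(unsigned long nr_pages)
{
	/* Guard the multiplication below against overflow. */
	assert(nr_pages <= (~0UL) / sizeof(unsigned long));
	return sketch_page_align(nr_pages * sizeof(unsigned long));
}
```

With 8-byte entries, 1000 pages need 8000 bytes of list, which rounds up to two full pages (8192 bytes). The rounding is what covers a last p2m page that is only partially filled.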


[PATCH 4/4] xen/mmu: Provide comments describing the _ka and _va aliasing issue

2012-07-26 Thread Konrad Rzeszutek Wilk

Which is that the level2_kernel_pgt (__ka virtual addresses)
and level2_ident_pgt (__va virtual address) contain the same
PMD entries. So if you modify a PTE in __ka, it will be reflected
in __va (and vice-versa).

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/mmu.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 4ac21a4..6ba6100 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1734,19 +1734,36 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
init_level4_pgt[0] = __pgd(0);
 
/* Pre-constructed entries are in pfn, so convert to mfn */
+   /* L4[272] -> level3_ident_pgt
+* L4[511] -> level3_kernel_pgt */
convert_pfn_mfn(init_level4_pgt);
+
+   /* L3_i[0] -> level2_ident_pgt */
convert_pfn_mfn(level3_ident_pgt);
+   /* L3_k[510] -> level2_kernel_pgt
+* L3_k[511] -> level2_fixmap_pgt */
convert_pfn_mfn(level3_kernel_pgt);
 
+   /* We get [511][511] and have Xen's version of level2_kernel_pgt */
l3 = m2v(pgd[pgd_index(__START_KERNEL_map)].pgd);
l2 = m2v(l3[pud_index(__START_KERNEL_map)].pud);
 
+   /* Graft it onto L4[272][0]. Note that we are creating an aliasing problem:
+* Both L4[272][0] and L4[511][511] have entries that point to the same
+* L2 (PMD) tables. Meaning that if you modify it in __va space
+* it will be also modified in the __ka space! (But if you just
+* modify the PMD table to point to other PTE's or none, then you
+* are OK - which is what cleanup_highmap does) */
memcpy(level2_ident_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+   /* Graft it onto L4[511][511] */
memcpy(level2_kernel_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
 
+   /* Get [511][510] and graft that in level2_fixmap_pgt */
l3 = m2v(pgd[pgd_index(__START_KERNEL_map + PMD_SIZE)].pgd);
l2 = m2v(l3[pud_index(__START_KERNEL_map + PMD_SIZE)].pud);
memcpy(level2_fixmap_pgt, l2, sizeof(pmd_t) * PTRS_PER_PMD);
+   /* Note that we don't do anything with level1_fixmap_pgt which
+* we don't need. */
 
/* Set up identity map */
xen_map_identity_early(level2_ident_pgt, max_pfn);
-- 
1.7.7.6
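
The aliasing described in the comments can be pictured with a small userspace model. This is only a sketch under loose assumptions: plain pointer arrays stand in for the L2 (PMD) tables, one array stands in for a PTE page, and every name in it is invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PTRS 4	/* stand-in for PTRS_PER_PMD */

typedef unsigned long sketch_pte_t;

/* One shared "PTE page" and the "L2 tables" whose entries point at it. */
static sketch_pte_t pte_page[SKETCH_PTRS];
static sketch_pte_t *l2_source[SKETCH_PTRS];
static sketch_pte_t *l2_ident[SKETCH_PTRS];	/* the __va view */
static sketch_pte_t *l2_kernel[SKETCH_PTRS];	/* the __ka view */

static void graft_tables(void)
{
	l2_source[0] = pte_page;
	/* Copying a table copies the pointers, not the PTE pages behind them. */
	memcpy(l2_ident, l2_source, sizeof(l2_source));
	memcpy(l2_kernel, l2_source, sizeof(l2_source));
	assert(l2_ident[0] == l2_kernel[0]);
}

/* Case 1: write a PTE through the __va view; the __ka view sees it too. */
static void set_pte_via_ident(unsigned int idx, sketch_pte_t val)
{
	assert(idx < SKETCH_PTRS && l2_ident[0] != NULL);
	l2_ident[0][idx] = val;
}

/* Case 2: clear the PMD entry in one table only; the other table is untouched. */
static void clear_kernel_pmd(void)
{
	l2_kernel[0] = NULL;
}
```

Case 1 is the aliasing problem. Case 2 matches the "you are OK" situation from the comment, where only a PMD entry is repointed or cleared.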


[PATCH 3/4] xen/mmu: The xen_setup_kernel_pagetable doesn't need to return anything.

2012-07-26 Thread Konrad Rzeszutek Wilk

We don't need to return the new PGD - as we do not use it.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 arch/x86/xen/enlighten.c |5 +
 arch/x86/xen/mmu.c   |   10 ++
 arch/x86/xen/xen-ops.h   |2 +-
 3 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 9b1afa4..2b67948 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1295,7 +1295,6 @@ asmlinkage void __init xen_start_kernel(void)
 {
struct physdev_set_iopl set_iopl;
int rc;
-   pgd_t *pgd;
 
if (!xen_start_info)
return;
@@ -1387,8 +1386,6 @@ asmlinkage void __init xen_start_kernel(void)
acpi_numa = -1;
 #endif
 
-   pgd = (pgd_t *)xen_start_info->pt_base;
-
/* Don't do the full vcpu_info placement stuff until we have a
   possible map and a non-dummy shared_info. */
per_cpu(xen_vcpu, 0) = _shared_info->vcpu_info[0];
@@ -1397,7 +1394,7 @@ asmlinkage void __init xen_start_kernel(void)
early_boot_irqs_disabled = true;
 
xen_raw_console_write("mapping kernel into physical memory\n");
-   pgd = xen_setup_kernel_pagetable(pgd, xen_start_info->nr_pages);
+   xen_setup_kernel_pagetable((pgd_t *)xen_start_info->pt_base, xen_start_info->nr_pages);
 
xen_reserve_internals();
/* Allocate and initialize top and mid mfn levels for p2m structure */
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3a73785..4ac21a4 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -1719,8 +1719,7 @@ static void convert_pfn_mfn(void *v)
  * of the physical mapping once some sort of allocator has been set
  * up.
  */
-pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
-unsigned long max_pfn)
+void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
pud_t *l3;
pmd_t *l2;
@@ -1781,8 +1780,6 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
 
memblock_reserve(__pa(xen_start_info->pt_base),
 xen_start_info->nr_pt_frames * PAGE_SIZE);
-
-   return pgd;
 }
 #else  /* !CONFIG_X86_64 */
 static RESERVE_BRK_ARRAY(pmd_t, initial_kernel_pmd, PTRS_PER_PMD);
@@ -1825,8 +1822,7 @@ static void __init xen_write_cr3_init(unsigned long cr3)
pv_mmu_ops.write_cr3 = _write_cr3;
 }
 
-pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
-unsigned long max_pfn)
+void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
 {
pmd_t *kernel_pmd;
 
@@ -1858,8 +1854,6 @@ pgd_t * __init xen_setup_kernel_pagetable(pgd_t *pgd,
 
memblock_reserve(__pa(xen_start_info->pt_base),
 xen_start_info->nr_pt_frames * PAGE_SIZE);
-
-   return initial_page_table;
 }
 #endif /* CONFIG_X86_64 */
 
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 202d4c1..2230f57 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -27,7 +27,7 @@ void xen_setup_mfn_list_list(void);
 void xen_setup_shared_info(void);
 void xen_build_mfn_list_list(void);
 void xen_setup_machphys_mapping(void);
-pgd_t *xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
+void xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn);
 void xen_reserve_top(void);
 extern unsigned long xen_max_p2m_pfn;
 
-- 
1.7.7.6


[PATCH] documentation, refactor, and cleanups (v2) for 3.7

2012-07-26 Thread Konrad Rzeszutek Wilk

Attached are four patches that document the P2M and MMU code a bit more,
and also make some of the code cleaner and easier to read.



Re: [RFC] page-table walkers vs memory order

2012-07-26 Thread Peter Zijlstra

On Tue, 2012-07-24 at 14:51 -0700, Hugh Dickins wrote:
> I do love the status quo, but an audit would be welcome.  When
> it comes to patches, personally I tend to prefer ACCESS_ONCE() and
> smp_read_barrier_depends() and accompanying comments to be hidden away
> in the underlying macros or inlines where reasonable, rather than
> repeated all over; but I may have my priorities wrong on that.
> 
> 
Yeah, I was being lazy, and I totally forgot to actually look at the
alpha code.

How about we do a generic (cribbed from rcu_dereference):

#define page_table_deref(p)                                         \
({                                                                  \
        typeof(*p) *__p = (typeof(*p) __force *)ACCESS_ONCE(p);     \
        smp_read_barrier_depends();                                 \
        ((typeof(*p) __force __kernel *)(__p));                     \
})

and use that all over to dereference page-tables. That way all this
lives in one place. Granted, I'll have to go edit all arch code, but I
seem to be doing that on a frequent basis anyway :/
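
To make the shape of the proposal concrete, here is a userspace approximation that builds with GCC or Clang (it relies on statement expressions and __typeof__). The ACCESS_ONCE() and smp_read_barrier_depends() definitions are stand-ins written for this sketch, not the kernel's own, and the sparse annotations (__force, __kernel) are dropped. The sketch_pte/sketch_pmd types and read_pte() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; assumptions for this sketch only. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define smp_read_barrier_depends() do { } while (0)	/* a real barrier only on Alpha */

#define page_table_deref(p)					\
({								\
	__typeof__(*(p)) *__p = ACCESS_ONCE(p);			\
	smp_read_barrier_depends();				\
	__p;							\
})

struct sketch_pte { unsigned long val; };
struct sketch_pmd { struct sketch_pte *table; };

/* Load the table pointer exactly once, then read through the loaded copy. */
static unsigned long read_pte(struct sketch_pmd *pmd, size_t idx)
{
	struct sketch_pte *pte = page_table_deref(pmd->table);

	assert(pte != NULL);
	return pte[idx].val;
}
```

The point of the wrapper is that a walker written like read_pte() never re-reads pmd->table after the barrier, so the single-load and data-dependency rules are stated in one place instead of at every call site.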



Re: [PATCH] mfd: add MAX8907 core driver

2012-07-26 Thread Mark Brown

On Thu, Jul 26, 2012 at 01:40:30PM -0600, Stephen Warren wrote:

> +struct max8907_irq_data {
> + int reg;
> + int mask_reg;
> + int offs;   /* bit offset in mask register */
> + bool is_rtc;
> +};

This (and all the code in here) looks very much like regmap-irq (or one
of the pre-regmap drivers I wrote which were factored out to there)...
why can't we use regmap_irq?

Looking at the code it looks like a very similar pattern to the arizona
chips where you've got two IRQ domains in the chip which can be handled
with a single virtual IRQ to do the demux.  We could factor that out
too easily enough, I might just do that...

> + if (!irqd_irq_disabled(d) && (value & irq_data->offs)) {

This looks very suspicious...  why do we need to call
irqd_irq_disabled() here?

> + regmap_write(chip->regmap_gen, MAX8907_REG_CHG_IRQ1_MASK, irq_chg[0]);
> + regmap_write(chip->regmap_gen, MAX8907_REG_CHG_IRQ2_MASK, irq_chg[1]);
> + regmap_write(chip->regmap_gen, MAX8907_REG_ON_OFF_IRQ1_MASK,
> +  irq_on[0]);
> + regmap_write(chip->regmap_gen, MAX8907_REG_ON_OFF_IRQ2_MASK,
> +  irq_on[1]);
> + regmap_write(chip->regmap_rtc, MAX8907_REG_RTC_IRQ_MASK, irq_rtc);

If you have the cache enabled regmap_update_bits() is your friend here,
it'll suppress duplicate I/O.
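
As a rough illustration of why a cached read-modify-write avoids redundant bus traffic, here is a self-contained userspace model. It is not the regmap implementation; struct sketch_regmap, sketch_bus_write() and sketch_update_bits() are invented names, and the "bus" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NUM_REGS 8

struct sketch_regmap {
	unsigned char cache[SKETCH_NUM_REGS];	/* last value written to each register */
	unsigned int bus_writes;		/* number of simulated bus transfers */
};

/* Pretend bus write: update the cache and count the transfer. */
static void sketch_bus_write(struct sketch_regmap *map, unsigned int reg,
			     unsigned char val)
{
	map->cache[reg] = val;
	map->bus_writes++;
}

/* Update only the bits in mask; returns true if a bus write was issued. */
static bool sketch_update_bits(struct sketch_regmap *map, unsigned int reg,
			       unsigned char mask, unsigned char val)
{
	unsigned char old, tmp;

	assert(reg < SKETCH_NUM_REGS);
	old = map->cache[reg];
	tmp = (unsigned char)((old & ~mask) | (val & mask));
	if (tmp == old)
		return false;	/* nothing changed, so skip the I/O */
	sketch_bus_write(map, reg, tmp);
	return true;
}
```

Applied to the mask registers quoted above, a sync_unlock that calls an update-bits style helper for each register would only touch the bus for registers whose mask actually changed.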

> +static void max8907_irq_enable(struct irq_data *data)
> +{
> + /* Everything happens in max8907_irq_sync_unlock */
> +}

> +static void max8907_irq_disable(struct irq_data *data)
> +{
> + /* Everything happens in max8907_irq_sync_unlock */
> +}

The fact that these functions are empty is the second part of the above
suspicious check for disabled IRQs.  We're just completely ignoring the
caller here.  What would idiomatically happen is that we'd update a
variable here then write it out in the unmask.

If these functions really should be empty then they should be omitted.

> +static int max8907_irq_set_wake(struct irq_data *data, unsigned int on)
> +{
> + /* Everything happens in max8907_irq_sync_unlock */
> +
> + return 0;
> +}

Again, this doesn't look clever at all.

> + if (irqd_is_wakeup_set(d)) {
> + /* 1 -- disable, 0 -- enable */
> + switch (irq_data->mask_reg) {

This loop we should just port over into the regmap code.

> +static const struct regmap_config max8907_regmap_gen_config = {
> + .reg_bits = 8,
> + .val_bits = 8,
> + .volatile_reg = max8907_gen_is_volatile_reg,
> + .writeable_reg = max8907_gen_is_writeable_reg,
> + .max_register = MAX8907_REG_LDO20VOUT,
> + .cache_type = REGCACHE_RBTREE,
> +};

Your IRQ registers appear to be clear on read which means you should
have a precious_reg callback too otherwise someone looking at the
register map in debugfs can ack interrupts.


Re: thp and memory barrier assumptions

2012-07-26 Thread Peter Zijlstra

On Thu, 2012-07-26 at 22:31 +0200, Peter Zijlstra wrote:
> __do_huge_pmd_anonymous_page() contains:
> 
> /*
>  * The spinlocking to take the lru_lock inside
>  * page_add_new_anon_rmap() acts as a full memory
>  * barrier to be sure clear_huge_page writes become
>  * visible after the set_pmd_at() write.
>  */
> page_add_new_anon_rmap(page, vma, haddr);
> 
> 
> page_add_new_anon_rmap() doesn't look to actually do a LOCK+UNLOCK
> except for unevictable pages.
> 
> But even if it did do an unconditional LOCK+UNLOCK that doesn't make a
> full memory barrier, see Documentation/memory-barriers.txt.
> 
> In particular:
> 
> *A = a;
> LOCK
> UNLOCK
> *B = b;
> 
> may occur as:
> 
> LOCK, STORE *B, STORE *A, UNLOCK
> 


Also, what is that barrier() in handle_mm_fault() doing? And why doesn't
it have a comment explaining that?

thp and memory barrier assumptions

2012-07-26 Thread Peter Zijlstra


__do_huge_pmd_anonymous_page() contains:

/*
 * The spinlocking to take the lru_lock inside
 * page_add_new_anon_rmap() acts as a full memory
 * barrier to be sure clear_huge_page writes become
 * visible after the set_pmd_at() write.
 */
page_add_new_anon_rmap(page, vma, haddr);


page_add_new_anon_rmap() doesn't look to actually do a LOCK+UNLOCK
except for unevictable pages.

But even if it did do an unconditional LOCK+UNLOCK that doesn't make a
full memory barrier, see Documentation/memory-barriers.txt.

In particular:

*A = a;
LOCK
UNLOCK
*B = b;

may occur as:

LOCK, STORE *B, STORE *A, UNLOCK
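
For comparison, here is what an explicit ordering guarantee looks like in portable C11, as a userspace analogy only. It is not the kernel fix and does not use kernel primitives; sketch_pmd, publish_page() and lookup_page() are hypothetical names. The writer clears a buffer and then publishes its pointer with release semantics, so the ordering does not depend on the side effects of some unrelated lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_BYTES 64

struct sketch_page { unsigned char data[SKETCH_PAGE_BYTES]; };

/* Stand-in for the pmd slot that other CPUs read. */
static _Atomic(struct sketch_page *) sketch_pmd;

/* Writer: clear the page, then publish it. */
static void publish_page(struct sketch_page *page)
{
	assert(page != NULL);
	memset(page->data, 0, sizeof(page->data));	/* plays the role of clear_huge_page() */
	/* Release: the clearing stores above cannot be reordered after this store. */
	atomic_store_explicit(&sketch_pmd, page, memory_order_release);
}

/* Reader: a non-NULL result comes with the cleared contents visible. */
static struct sketch_page *lookup_page(void)
{
	return atomic_load_explicit(&sketch_pmd, memory_order_acquire);
}
```

In kernel terms the writer side would be a write barrier between clearing the page and the set_pmd_at() store, which is the ordering the quoted comment wants but a LOCK+UNLOCK pair does not provide.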


Re: [PATCH 07/21] ASoC: io: Prevent use of regmap if request fails

2012-07-26 Thread Mark Brown

On Thu, Jul 26, 2012 at 05:05:51PM +0100, Lee Jones wrote:
> On 26/07/12 16:25, Mark Brown wrote:

> >You're supposed to use it for the data you use to call back into the
> >underlying I/O code.

> I don't understand. What 'data'?

Whatever your I/O layer so desires, the core doesn't care.  It's
generally whatever the lower layer that does your I/O takes to identify
the device.

> Surely if .read and .write are populated in 'struct
> snd_soc_codec_driver', then it should just call back into those?

Yes, and in fact that's what we do!


Re: [PATCH] cpuidle: coupled: fix sleeping while atomic in cpu notifier

2012-07-26 Thread Rafael J. Wysocki

On Thursday, July 26, 2012, Rafael J. Wysocki wrote:
> On Thursday, July 26, 2012, Colin Cross wrote:
> > On Thu, Jul 26, 2012 at 12:55 PM, Rafael J. Wysocki  wrote:
> > > On Wednesday, July 25, 2012, Colin Cross wrote:
> > >> The cpu hotplug notifier gets called in both atomic and non-atomic
> > >> contexts, it is not always safe to lock a mutex.  Filter out all events
> > >> except the six necessary ones, which are all sleepable, before taking
> > >> the mutex.
> > >
> > > > I wonder what mutual exclusion mechanism we rely on when the mutex is not
> > > taken?
> > 
> > We don't need any mutual exclusion because the notifier returns immediately.
> 
> Don't we need to disable preemption even?

Sorry, scratch that.  It returns NOTIFY_OK if we're not going to take the
mutex.

Thanks,
Rafael

Re: [PATCH] cpuidle: coupled: fix sleeping while atomic in cpu notifier

2012-07-26 Thread Rafael J. Wysocki

On Thursday, July 26, 2012, Colin Cross wrote:
> On Thu, Jul 26, 2012 at 12:55 PM, Rafael J. Wysocki  wrote:
> > On Wednesday, July 25, 2012, Colin Cross wrote:
> >> The cpu hotplug notifier gets called in both atomic and non-atomic
> >> contexts, it is not always safe to lock a mutex.  Filter out all events
> >> except the six necessary ones, which are all sleepable, before taking
> >> the mutex.
> >
> > I wonder what mutual exclusion mechanism we rely on when the mutex is not
> > taken?
> 
> We don't need any mutual exclusion because the notifier returns immediately.

Don't we need to disable preemption even?

Re: [PATCH] vxge: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Jon Mason

On Thu, Jul 26, 2012 at 12:08 PM, Tim Gardner  wrote:
> Cc: Jon Mason 
> Cc: "David S. Miller" 
> Cc: Joe Perches 
> Cc: Jiri Pirko 
> Cc: Stephen Hemminger 
> Cc: Paul Gortmaker 
> Cc: net...@vger.kernel.org
> Signed-off-by: Tim Gardner 
> ---
>  drivers/net/ethernet/neterion/vxge/vxge-main.c |9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
> index de21904..d4832b2 100644
> --- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
> +++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
> @@ -4203,6 +4203,9 @@ out:
> return ret;
>  }
>
> +#define VXGE_PXE_FIRMWARE "vxge/X3fw-pxe.ncf"
> +#define VXGE_FIRMWARE "vxge/X3fw.ncf"
> +
>  static int vxge_probe_fw_update(struct vxgedev *vdev)
>  {
> u32 maj, min, bld;
> @@ -4245,9 +4248,9 @@ static int vxge_probe_fw_update(struct vxgedev *vdev)
> }
> }
> if (gpxe)
> -   fw_name = "vxge/X3fw-pxe.ncf";
> +   fw_name = VXGE_PXE_FIRMWARE;
> else
> -   fw_name = "vxge/X3fw.ncf";
> +   fw_name = VXGE_FIRMWARE;
>
> ret = vxge_fw_upgrade(vdev, fw_name, 0);
> /* -EINVAL and -ENOENT are not fatal errors for flashing firmware on
> @@ -4855,3 +4858,5 @@ vxge_closer(void)
>  }
>  module_init(vxge_starter);
>  module_exit(vxge_closer);
> +MODULE_FIRMWARE(VXGE_PXE_FIRMWARE);
> +MODULE_FIRMWARE(VXGE_FIRMWARE);

IIUC, MODULE_FIRMWARE is only necessary for devices that need firmware
to operate.  vxge hardware has an image in flash on the nic, and the
modified code is used to update the firmware image on the adapter.
So, this change isn't doing what you want it to do.

Also, wasn't this already discussed (https://lkml.org/lkml/2012/4/12/401)?

Thanks,
Jon



> --
> 1.7.9.5
>

Re: [PATCH 0/8] acpi-cpufreq: Move modern AMD cpufreq support to acpi-cpufreq

2012-07-26 Thread Rafael J. Wysocki

On Thursday, July 26, 2012, Thomas Renninger wrote:
> On Thursday, July 26, 2012 02:28:36 PM Andre Przywara wrote:
> > The programming model for cpufreq on current AMD CPUs is almost
> > identical to the one used on Intel and VIA hardware. This patchset
> > merges support into acpi-cpufreq and removes it from powernow-k8.
>  
> > This patchset is heavily based on Matthew Garrett's V4 from last July.
> > The boosting part has been mostly reworked and documentation for it
> > has been added. Also there was a need for (yet another) BIOS quirk
> > on AMD desktop boards.
> > 
> > Signed-off-by: Andre Przywara 
> 
> I had a look at Matthew's patches and I like the idea.
> 
> I didn't review Andre's in detail, but if they are based on
> Matthew's and I expect they got some testing, I guess it should
> be fine to push them with the next merge window.

Good, thanks for your opinion, it helps a lot! :-)

I'll have a deeper look at the patches in the next couple of days and will
queue them up for v3.7 if I don't find anything objectionable in them.

Thanks,
Rafael

Re: [RFC][PATCH] Make io_submit non-blocking

2012-07-26 Thread Ankit Jain

On 07/25/2012 04:20 AM, Christoph Hellwig wrote:
> On Wed, Jul 25, 2012 at 08:31:10AM +1000, Dave Chinner wrote:
>> FWIW, if you are going to change generic code, you need to present
>> results for other filesystems as well (xfs, btrfs are typical), as
>> they may not have the same problems as ext4 or react the same way to
>> your change. The result might simply be "it is 20% slower"
> 
> And most importantly block devices, as they are one of the biggest
> use cases of AIO.  With an almost no-op get_blocks callback I can't
> see how this change would provide any gain there.

I tried running fio against a block device, disk partition and a
ramdisk. I ran this with a single job though. For disks, bandwidth
seems to stay nearly the same with submit latencies getting better.
And for ramdisk, bandwidth also sees improvement. I should probably
be doing better tests, any suggestions on what or how I can test?
For block devices, if the patch doesn't make it worse, at least, then
that should be good enough?

-- disk ---
                                         submit latencies (usec)
      B/w          iops    runtime       min   max     avg  std dev
Read:
Old:  417335 B/s   101     252668msec      4   231   40.03    21.66
New:  419099 B/s   102     251282msec      0   169    8.20     6.95

Write:
Old:  412667 B/s   100     252668msec      3   272   47.65    24.58
New:  415481 B/s   101     251282msec      0   134    7.95     7.11

-- ramdisk ---
                                         submit latencies (usec)
      B/w          iops    runtime       min   max     avg  std dev
Read:
Old:  708235KB/s   177058  1227msec        1    51    1.61     0.72
New:  822157KB/s   205539  1059msec        0    14    0.38     0.52

Write:
Old:  710510KB/s   177627  1227msec        2    46    2.33     0.81
New:  821658KB/s   205414  1059msec        0    24    0.40     0.53

Full fio results are attached, and I dropped cache before running
the tests.

-- 
Ankit Jain
SUSE Labs
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2109: Thu Jul 26 17:14:55 2012
  read : io=102844KB, bw=419099 B/s, iops=102 , runt=251282msec
slat (usec): min=0 , max=169 , avg= 8.20, stdev= 6.95
clat (usec): min=335 , max=3356.7K, avg=255054.47, stdev=158234.29
 lat (usec): min=342 , max=3356.7K, avg=255063.32, stdev=158234.33
clat percentiles (msec):
 |  1.00th=[8],  5.00th=[   50], 10.00th=[   84], 20.00th=[  130],
 | 30.00th=[  169], 40.00th=[  204], 50.00th=[  237], 60.00th=[  269],
 | 70.00th=[  306], 80.00th=[  351], 90.00th=[  437], 95.00th=[  529],
 | 99.00th=[  791], 99.50th=[  914], 99.90th=[ 1237], 99.95th=[ 1483],
 | 99.99th=[ 2073]
bw (KB/s)  : min=  111, max=  646, per=100.00%, avg=410.90, stdev=84.69
  write: io=101956KB, bw=415481 B/s, iops=101 , runt=251282msec
slat (usec): min=0 , max=134 , avg= 7.95, stdev= 7.11
clat (usec): min=189 , max=928209 , avg=58138.79, stdev=76776.72
 lat (usec): min=194 , max=928221 , avg=58147.37, stdev=76776.86
clat percentiles (usec):
 |  1.00th=[  498],  5.00th=[  828], 10.00th=[ 1624], 20.00th=[ 4960],
 | 30.00th=[12352], 40.00th=[22144], 50.00th=[33536], 60.00th=[46848],
 | 70.00th=[63232], 80.00th=[90624], 90.00th=[148480], 95.00th=[203776],
 | 99.00th=[370688], 99.50th=[460800], 99.90th=[643072], 99.95th=[716800],
 | 99.99th=[831488]
bw (KB/s)  : min=   31, max=  864, per=100.00%, avg=408.11, stdev=111.34
lat (usec) : 250=0.02%, 500=0.54%, 750=1.27%, 1000=1.51%
lat (msec) : 2=2.39%, 4=3.60%, 10=4.63%, 20=5.96%, 50=13.51%
lat (msec) : 100=14.18%, 250=27.95%, 500=21.04%, 750=2.78%, 1000=0.46%
lat (msec) : 2000=0.15%, >=2000=0.01%
  cpu  : usr=0.51%, sys=1.52%, ctx=52135, majf=0, minf=23
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.9%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued: total=r=25711/w=25489/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
   READ: io=102844KB, aggrb=409KB/s, minb=409KB/s, maxb=409KB/s, mint=251282msec, maxt=251282msec
  WRITE: io=101956KB, aggrb=405KB/s, minb=405KB/s, maxb=405KB/s, mint=251282msec, maxt=251282msec
fio rand-rw-disk-2-raw.fio --output=/home/radical/src/play/ios-test/logs-with-drop-cache/ad6d29a/raw-disk-2-raw-ad6d29a.log --max-jobs=2 --latency-log --bandwidth-log
ad6d29a sent upstream
random_rw: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.0.8-9-gfb9f0
Starting 1 process

random_rw: (groupid=0, jobs=1): err= 0: pid=2117: Thu Jul 26 17:53:41 2012
  read : io=102976KB, bw=417335 B/s, iops=101 , runt=252668msec
slat (usec): min=4 , max=231 , avg=40.03, stdev=21.66
clat (usec): min=236 , max=4075.6K, avg=254175.39, stdev=158853.64
 lat (usec): min=339 , max=4075.7K, avg=254216.22, stdev=158853.33
clat percentiles

Re: [PATCH] cpuidle: coupled: fix sleeping while atomic in cpu notifier

2012-07-26 Thread Colin Cross

On Thu, Jul 26, 2012 at 12:55 PM, Rafael J. Wysocki  wrote:
> On Wednesday, July 25, 2012, Colin Cross wrote:
>> The cpu hotplug notifier gets called in both atomic and non-atomic
>> contexts, it is not always safe to lock a mutex.  Filter out all events
>> except the six necessary ones, which are all sleepable, before taking
>> the mutex.
>
> I wonder what mutual exclusion mechanism we rely on when the mutex is not
> taken?

We don't need any mutual exclusion because the notifier returns immediately.
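
The filtering logic can be sketched outside the kernel as follows. The SK_* constants are illustrative values chosen for this example (the kernel headers are the authority on the real ones), and event_needs_mutex(), sketch_notify() and lock_depth are invented names, with a counter standing in for the mutex.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; assumptions for this sketch. */
enum {
	SK_CPU_ONLINE		= 0x0002,
	SK_CPU_UP_PREPARE	= 0x0003,
	SK_CPU_UP_CANCELED	= 0x0004,
	SK_CPU_DOWN_PREPARE	= 0x0005,
	SK_CPU_DOWN_FAILED	= 0x0006,
	SK_CPU_DEAD		= 0x0007,
	SK_CPU_DYING		= 0x0008,
	SK_CPU_TASKS_FROZEN	= 0x0010,
};

/* True only for the six sleepable events; the FROZEN flag is masked off first. */
static bool event_needs_mutex(unsigned long action)
{
	switch (action & ~(unsigned long)SK_CPU_TASKS_FROZEN) {
	case SK_CPU_UP_PREPARE:
	case SK_CPU_DOWN_PREPARE:
	case SK_CPU_ONLINE:
	case SK_CPU_DEAD:
	case SK_CPU_UP_CANCELED:
	case SK_CPU_DOWN_FAILED:
		return true;
	default:
		return false;
	}
}

static int lock_depth;	/* stand-in for the mutex */

/* Returns 1 if the event was handled under the "mutex", 0 if it was filtered. */
static int sketch_notify(unsigned long action)
{
	if (!event_needs_mutex(action))
		return 0;	/* bail out before touching any shared state */
	lock_depth++;		/* mutex_lock() */
	assert(lock_depth == 1);
	lock_depth--;		/* mutex_unlock() */
	return 1;
}
```

An event delivered in atomic context, such as the DYING one in this sketch, takes the early return and never reaches the lock, which is why no other exclusion is needed on that path.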

Re: [PATCH] cpuidle: coupled: fix sleeping while atomic in cpu notifier

2012-07-26 Thread Rafael J. Wysocki

On Wednesday, July 25, 2012, Colin Cross wrote:
> The cpu hotplug notifier gets called in both atomic and non-atomic
> contexts, it is not always safe to lock a mutex.  Filter out all events
> except the six necessary ones, which are all sleepable, before taking
> the mutex.

I wonder what mutual exclusion mechanism we rely on when the mutex is not taken?

Rafael


> Signed-off-by: Colin Cross 
> ---
>  drivers/cpuidle/coupled.c |   12 
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/cpuidle/coupled.c b/drivers/cpuidle/coupled.c
> index 2c9bf26..c24dda0 100644
> --- a/drivers/cpuidle/coupled.c
> +++ b/drivers/cpuidle/coupled.c
> @@ -678,6 +678,18 @@ static int cpuidle_coupled_cpu_notify(struct notifier_block *nb,
>   int cpu = (unsigned long)hcpu;
>   struct cpuidle_device *dev;
>  
> + switch (action & ~CPU_TASKS_FROZEN) {
> + case CPU_UP_PREPARE:
> + case CPU_DOWN_PREPARE:
> + case CPU_ONLINE:
> + case CPU_DEAD:
> + case CPU_UP_CANCELED:
> + case CPU_DOWN_FAILED:
> + break;
> + default:
> + return NOTIFY_OK;
> + }
> +
>   mutex_lock(_lock);
>  
>   dev = per_cpu(cpuidle_devices, cpu);
> 


Re: [PATCH] kernel/watchdog.c : fix smp_processor_id() warning

2012-07-26 Thread Don Zickus

On Wed, Jul 25, 2012 at 12:39:45PM +0800, Ming Lei wrote:
> Use raw_smp_processor_id in lockup_detector_bootcpu_resume()
> because it is enough when non-boot CPUs are offline.
> 
> This patch fixes the following warning when DEBUG_PREEMPT
> is enabled.

Is this patched on top of linux-next?

It seems right based on the code usage.  Though it makes me sad the resume
code has to hack into the cpu notifiers like that.

Cheers,
Don

> 
> [  168.259429] BUG: using smp_processor_id() in preemptible [] code: pm/1577
> [  168.259460] caller is lockup_detector_bootcpu_resume+0x8/0x48
> [  168.259490] [] (unwind_backtrace+0x0/0x11c) from [] (debug_smp_processor_id+0xbc/0xf0)
> [  168.259521] [] (debug_smp_processor_id+0xbc/0xf0) from [] (lockup_detector_bootcpu_resume+0x8/0x48)
> [  168.259552] [] (lockup_detector_bootcpu_resume+0x8/0x48) from [] (suspend_devices_and_enter+0x1f8/0x358)
> [  168.259552] [] (suspend_devices_and_enter+0x1f8/0x358) from [] (pm_suspend+0x13c/0x204)
> [  168.259582] [] (pm_suspend+0x13c/0x204) from [] (state_store+0xb0/0xd4)
> [  168.259582] [] (state_store+0xb0/0xd4) from [] (kobj_attr_store+0x14/0x20)
> [  168.259613] [] (kobj_attr_store+0x14/0x20) from [] (sysfs_write_file+0x10c/0x140)
> [  168.259643] [] (sysfs_write_file+0x10c/0x140) from [] (vfs_write+0xb0/0x138)
> [  168.259643] [] (vfs_write+0xb0/0x138) from [] (sys_write+0x3c/0x68)
> [  168.259674] [] (sys_write+0x3c/0x68) from [] (ret_fast_syscall+0x0/0x48)
> [  168.260375] Enabling non-boot CPUs ...
> 
> Signed-off-by: Ming Lei 
> ---
>  kernel/watchdog.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 69add8a..7ddb11b 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -623,7 +623,7 @@ static struct notifier_block cpu_nfb = {
>   */
>  void lockup_detector_bootcpu_resume(void)
>  {
> - void *cpu = (void *)(long)smp_processor_id();
> + void *cpu = (void *)(long)raw_smp_processor_id();
>  
>   cpu_callback(_nfb, CPU_DEAD_FROZEN, cpu);
>   cpu_callback(_nfb, CPU_UP_PREPARE_FROZEN, cpu);
> -- 
> 1.7.9.5
> 

[PATCH] mfd: add MAX8907 core driver

2012-07-26 Thread Stephen Warren

From: Gyungoh Yoo 

The MAX8907 is an I2C-based power-management IC containing voltage
regulators, a reset controller, a real-time clock, and a touch-screen
controller.

The original driver was written by:
* Gyungoh Yoo 

Various fixes and enhancements by:
* Jin Park 
* Tom Cherry 
* Prashant Gaikwad 
* Dan Willemsen 
* Laxman Dewangan 

During upstreaming, I (swarren):
* Converted to regmap.
* Converted to irq domain, and stopped storing state in globals.
* Allowed probing from device tree.
* Renamed from max8907c->max8907, since the driver covers at least the
C and B revisions.
* General cleanup.

Signed-off-by: Gyungoh Yoo 
Signed-off-by: Stephen Warren 
---
Note that I also have a regulator driver for this part, which I'll post
in the near future. That driver will depend on this patch (at least the
header file). I'm not sure how dependencies between the mfd and regulator
trees are usually managed.

 .../devicetree/bindings/regulator/max8907.txt  |   49 +++
 drivers/mfd/Kconfig|   11 +
 drivers/mfd/Makefile   |1 +
 drivers/mfd/max8907-irq.c  |  436 
 drivers/mfd/max8907.c  |  213 ++
 include/linux/mfd/max8907.h|  248 +++
 6 files changed, 958 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/regulator/max8907.txt
 create mode 100644 drivers/mfd/max8907-irq.c
 create mode 100644 drivers/mfd/max8907.c
 create mode 100644 include/linux/mfd/max8907.h

diff --git a/Documentation/devicetree/bindings/regulator/max8907.txt b/Documentation/devicetree/bindings/regulator/max8907.txt
new file mode 100644
index 000..dd48036
--- /dev/null
+++ b/Documentation/devicetree/bindings/regulator/max8907.txt
@@ -0,0 +1,49 @@
+MAX8907 regulator
+
+Required properties:
+- compatible: "maxim,max8907"
+- reg: I2C slave address
+- interrupts: The interrupt output of the controller
+- regulators: A node that houses a sub-node for each regulator within the
+  device. Each sub-node is identified using the regulator-compatible
+  property, with valid values listed below. The content of each sub-node
+  is defined by the standard binding for regulators; see regulator.txt.
+
+Valid regulator-compatible values are:
+
+  sd1, sd2, sd3, ldo1, ldo2, ldo3, ldo4, ldo5, ldo6, ldo7, ldo8, ldo9, ldo10,
+  ldo11, ldo12, ldo13, ldo14, ldo15, ldo16, ldo17, ldo18, ldo19, ldo20, out5v,
+  out33v, bbat, sdby, vrtc, wled.
+
+Example:
+
+   max8907@3c {
+   compatible = "maxim,max8907";
+   reg = <0x3c>;
+   interrupts = <0 86 0x4>;
+
+   regulators {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   regulator@0 {
+   reg = <0>;
+   regulator-compatible = "sd1";
+   regulator-name = "nvvdd_sv1,vdd_cpu_pmu";
+   regulator-min-microvolt = <100>;
+   regulator-max-microvolt = <100>;
+   regulator-always-on;
+   };
+
+   regulator@1 {
+   reg = <1>;
+   regulator-compatible = "sd2";
+   regulator-name = "nvvdd_sv2,vdd_core";
+   regulator-min-microvolt = <120>;
+   regulator-max-microvolt = <120>;
+   regulator-always-on;
+   };
+...
+   };
+   };
+   };
diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index 66fd378..1ef2814 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -476,6 +476,17 @@ config MFD_MAX77693
  additional drivers must be enabled in order to use the functionality
  of the device.
 
+config MFD_MAX8907
+   tristate "Maxim Semiconductor MAX8907 PMIC Support"
+   select MFD_CORE
+   depends on I2C=y && GENERIC_HARDIRQS
+   select REGMAP_I2C
+   help
+ Say yes here to support for Maxim Semiconductor MAX8907. This is
+ a Power Management IC. This driver provides common support for
+ accessing the device; additional drivers must be enabled in order
+ to use the functionality of the device.
+
 config MFD_MAX8925
bool "Maxim Semiconductor MAX8925 PMIC Support"
depends on I2C=y && GENERIC_HARDIRQS
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 79dd22d..3cc47ee 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -92,6 +92,7 @@ obj-$(CONFIG_MFD_DA9052_I2C)  +=

Re: [PATCH 1/1] kthread: disable preemption during complete()

2012-07-26 Thread Peter Zijlstra

On Thu, 2012-07-26 at 17:54 +0200, Oleg Nesterov wrote:
> Yes, but this "avoid the preemption after wakeup" can actually help
> kthread_bind()->wait_task_inactive() ?

Yeah.

> This reminds me, Peter had a patch which teaches wait_task_inactive()
> to use sched_in/sched_out notifiers to avoid the polling... 

I did, but from what I could remember you shot holes in it and I didn't
find a way to plug them and not make it a bigger mess than it is now :/

Re: [PATCH v3 4/4] ACPI: Update Container hotplug messages

2012-07-26 Thread Bjorn Helgaas

On Wed, Jul 25, 2012 at 5:12 PM, Toshi Kani  wrote:
> Updated Container hotplug log messages with acpi_pr_<level>()
> and pr_<level>().
>
> Signed-off-by: Toshi Kani 
> ---
>  drivers/acpi/container.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
> index 01a986d..643e962 100644
> --- a/drivers/acpi/container.c
> +++ b/drivers/acpi/container.c
> @@ -99,7 +99,7 @@ static int acpi_container_add(struct acpi_device *device)
>
>
> if (!device) {
> -   printk(KERN_ERR PREFIX "device is NULL\n");
> +   pr_err(PREFIX "device is NULL\n");
> return -EINVAL;
> }

This whole "if (!device)" check and the printk should be deleted.  If
the ACPI core calls .add() with a null acpi_device pointer, it's a
core bug, and it's better to take the oops and get the backtrace.

>
> @@ -164,7 +164,7 @@ static void container_notify_cb(acpi_handle handle, u32 
> type, void *context)
> case ACPI_NOTIFY_BUS_CHECK:
> /* Fall through */
> case ACPI_NOTIFY_DEVICE_CHECK:
> -   printk(KERN_WARNING "Container driver received %s event\n",
> +   pr_warn("Container driver received %s event\n",
>(type == ACPI_NOTIFY_BUS_CHECK) ?
>"ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK");

This message looks dubious.  Receiving this event should be a normal
occurrence, so the message might be useful for debugging, but doesn't
seem like a KERN_WARNING event for the user.

>
> @@ -185,7 +185,7 @@ static void container_notify_cb(acpi_handle handle, u32 
> type, void *context)
>
> result = container_device_add(, handle);
> if (result) {
> -   pr_warn("Failed to add container\n");
> +   acpi_pr_warn(handle, "Failed to add container\n");
> break;
> }
>
> --
> 1.7.7.6
>

Re: [PATCH v3 3/4] ACPI: Update Memory hotplug messages

2012-07-26 Thread Bjorn Helgaas

On Wed, Jul 25, 2012 at 5:12 PM, Toshi Kani  wrote:
> Updated Memory hotplug log messages with acpi_pr_<level>()
> and pr_<level>().
>
> Signed-off-by: Toshi Kani 
> ---
>  drivers/acpi/acpi_memhotplug.c |   24 
>  1 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index 06c55cd..dcc8f4d 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -170,7 +170,7 @@ acpi_memory_get_device(acpi_handle handle,
> /* Get the parent device */
> result = acpi_bus_get_device(phandle, &pdevice);
> if (result) {
> -   printk(KERN_WARNING PREFIX "Cannot get acpi bus device");
> +   acpi_pr_warn(phandle, "Cannot get acpi bus device\n");
> return -EINVAL;
> }
>
> @@ -180,14 +180,14 @@ acpi_memory_get_device(acpi_handle handle,
>  */
> result = acpi_bus_add(&device, pdevice, handle, ACPI_BUS_TYPE_DEVICE);
> if (result) {
> -   printk(KERN_WARNING PREFIX "Cannot add acpi bus");
> +   acpi_pr_warn(handle, "Cannot add acpi bus\n");
> return -EINVAL;
> }
>
>end:
> *mem_device = acpi_driver_data(device);
> if (!(*mem_device)) {
> -   printk(KERN_ERR "\n driver data not found");
> +   acpi_pr_err(handle, "driver data not found\n");

acpi_driver_data() requires a valid acpi_device *, so dev_err() should
work here.

> return -ENODEV;
> }
>
> @@ -224,7 +224,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> /* Get the range from the _CRS */
> result = acpi_memory_get_device_resources(mem_device);
> if (result) {
> -   printk(KERN_ERR PREFIX "get_device_resources failed\n");
> +   pr_err(PREFIX "get_device_resources failed\n");

And here.

> mem_device->state = MEMORY_INVALID_STATE;
> return result;
> }
> @@ -257,7 +257,7 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
> num_enabled++;
> }
> if (!num_enabled) {
> -   printk(KERN_ERR PREFIX "add_memory failed\n");
> +   acpi_pr_err(mem_device->device->handle, "add_memory 
> failed\n");

And here.

> mem_device->state = MEMORY_INVALID_STATE;
> return -EINVAL;
> }
> @@ -353,7 +353,7 @@ static void acpi_memory_device_notify(acpi_handle handle, 
> u32 event, void *data)
> ACPI_DEBUG_PRINT((ACPI_DB_INFO,
>   "\nReceived DEVICE CHECK 
> notification for device\n"));
> if (acpi_memory_get_device(handle, &mem_device)) {
> -   printk(KERN_ERR PREFIX "Cannot find driver data\n");
> +   acpi_pr_err(handle, "Cannot find driver data\n");
> break;
> }
>
> @@ -361,7 +361,7 @@ static void acpi_memory_device_notify(acpi_handle handle, 
> u32 event, void *data)
> break;
>
> if (acpi_memory_enable_device(mem_device)) {
> -   pr_err(PREFIX "Cannot enable memory device\n");
> +   acpi_pr_err(handle, "Cannot enable memory device\n");

And here.

> break;
> }
>
> @@ -373,12 +373,12 @@ static void acpi_memory_device_notify(acpi_handle 
> handle, u32 event, void *data)
>   "\nReceived EJECT REQUEST notification for 
> device\n"));
>
> if (acpi_bus_get_device(handle, &device)) {
> -   printk(KERN_ERR PREFIX "Device doesn't exist\n");
> +   acpi_pr_err(handle, "Device doesn't exist\n");
> break;
> }
> mem_device = acpi_driver_data(device);
> if (!mem_device) {
> -   printk(KERN_ERR PREFIX "Driver Data is NULL\n");
> +   acpi_pr_err(handle, "Driver Data is NULL\n");

And here.

> break;
> }
>
> @@ -389,7 +389,7 @@ static void acpi_memory_device_notify(acpi_handle handle, 
> u32 event, void *data)
>  *  with generic sysfs driver
>  */
> if (acpi_memory_disable_device(mem_device)) {
> -   pr_err(PREFIX "Disable memory device\n");
> +   acpi_pr_err(handle, "Disable memory device\n");

And here.  (What is this message supposed to mean, anyway?)

> /*
>  * If _EJ0 was called but failed, _OST is not
>  * necessary.
> @@ -449,7 +449,7 @@ static int acpi_memory_device_add(struct acpi_device 
> *device)
> /* Set the device state */
> mem_device->state = MEMORY_POWER_ON_STATE;

Re: [PATCH v3 2/4] ACPI: Update CPU hotplug messages

2012-07-26 Thread Bjorn Helgaas

On Wed, Jul 25, 2012 at 5:12 PM, Toshi Kani  wrote:
> Updated CPU hotplug log messages with acpi_pr_<level>(),
> dev_<level>() and pr_<level>().  Some messages are also
> changed for clarity.
>
> Signed-off-by: Toshi Kani 
> Tested-by: Vijay Mohan Pandarathil 
> ---
>  drivers/acpi/processor_driver.c |   36 +---
>  1 files changed, 21 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> index a6bdeaa..225f252 100644
> --- a/drivers/acpi/processor_driver.c
> +++ b/drivers/acpi/processor_driver.c
> @@ -282,7 +282,9 @@ static int acpi_processor_get_info(struct acpi_device 
> *device)
> /* Declared with "Processor" statement; match ProcessorID */
> status = acpi_evaluate_object(pr->handle, NULL, NULL, 
> );
> if (ACPI_FAILURE(status)) {
> -   printk(KERN_ERR PREFIX "Evaluating processor 
> object\n");
> +   acpi_pr_err(pr->handle,
> +   "Failed to evaluate processor object 
> (0x%x)\n",
> +   status);

This looks like it could be a dev_err().

> return -ENODEV;
> }
>
> @@ -301,8 +303,9 @@ static int acpi_processor_get_info(struct acpi_device 
> *device)
> status = acpi_evaluate_integer(pr->handle, METHOD_NAME__UID,
> NULL, );
> if (ACPI_FAILURE(status)) {
> -   printk(KERN_ERR PREFIX
> -   "Evaluating processor _UID [%#x]\n", status);
> +   acpi_pr_err(pr->handle,
> +   "Failed to evaluate processor _UID (0x%x)\n",
> +   status);

And this.

> return -ENODEV;
> }
> device_declaration = 1;
> @@ -345,7 +348,7 @@ static int acpi_processor_get_info(struct acpi_device 
> *device)
> if (!object.processor.pblk_address)
> ACPI_DEBUG_PRINT((ACPI_DB_INFO, "No PBLK (NULL address)\n"));
> else if (object.processor.pblk_length != 6)
> -   printk(KERN_ERR PREFIX "Invalid PBLK length [%d]\n",
> +   acpi_pr_err(pr->handle, "Invalid PBLK length [%d]\n",

And this.

> object.processor.pblk_length);
> else {
> pr->throttling.address = object.processor.pblk_address;
> @@ -429,8 +432,8 @@ static int acpi_cpu_soft_notify(struct notifier_block 
> *nfb,
>  * Initialize missing things
>  */
> if (pr->flags.need_hotplug_init) {
> -   printk(KERN_INFO "Will online and init hotplugged "
> -  "CPU: %d\n", pr->id);
> +   pr_info("Will online and init hotplugged CPU: %d\n",
> +   pr->id);
> WARN(acpi_processor_start(pr), "Failed to start CPU:"
> " %d\n", pr->id);
> pr->flags.need_hotplug_init = 0;
> @@ -491,14 +494,16 @@ static __ref int acpi_processor_start(struct 
> acpi_processor *pr)
>>cdev->device.kobj,
>"thermal_cooling");
> if (result) {
> -   printk(KERN_ERR PREFIX "Create sysfs link\n");
> +   dev_err(>dev,
> +   "Failed to create sysfs link 'thermal_cooling'\n");
> goto err_thermal_unregister;
> }
> result = sysfs_create_link(>cdev->device.kobj,
>>dev.kobj,
>"device");
> if (result) {
> -   printk(KERN_ERR PREFIX "Create sysfs link\n");
> +   dev_err(>cdev->device,
> +   "Failed to create sysfs link 'device'\n");
> goto err_remove_sysfs_thermal;
> }
>
> @@ -560,8 +565,7 @@ static int __cpuinit acpi_processor_add(struct 
> acpi_device *device)
>  */
> if (per_cpu(processor_device_array, pr->id) != NULL &&
> per_cpu(processor_device_array, pr->id) != device) {
> -   printk(KERN_WARNING "BIOS reported wrong ACPI id "
> -   "for the processor\n");
> +   pr_warn("BIOS reported wrong ACPI id for the processor\n");

And this.

> result = -ENODEV;
> goto err_free_cpumask;
> }
> @@ -715,7 +719,7 @@ static void acpi_processor_hotplug_notify(acpi_handle 
> handle,
>
> result = acpi_processor_device_add(handle, );
> if (result) {
> -   pr_err(PREFIX "Unable to add the device\n");
> +   acpi_pr_err(handle, "Unable to add the device\n");
> break;
> }
>
> @@ -727,17 +731,19 @@ static void

Re: [PATCH v3 1/4] ACPI: Add acpi_pr_<level>() interfaces

2012-07-26 Thread Bjorn Helgaas

On Wed, Jul 25, 2012 at 5:12 PM, Toshi Kani  wrote:
> This patch introduces acpi_pr_<level>(), where <level> is a kernel
> message level such as err/warn/info, to support improved logging
> messages for ACPI, esp. in hotplug operations.  acpi_pr_<level>()
> appends "ACPI" prefix and ACPI object path to the messages.  This
> improves diagnostics in hotplug operations since it identifies an
> object that caused an issue in a log file.
>
> acpi_pr_<level>() takes acpi_handle as an argument, which is passed
> to ACPI hotplug notify handlers from the ACPICA.  Therefore, it is
> always available unlike other kernel objects, such as device.
>
> For example, the statement below
>   acpi_pr_err(handle, "Device don't exist, dropping EJECT\n");
> logs an error message like this at KERN_ERR.
>   ACPI: \_SB_.SCK4.CPU4: Device don't exist, dropping EJECT
>
> ACPI drivers can use acpi_pr_<level>() when they need to identify
> a target ACPI object in their messages, such as error messages.

It's definitely an improvement to have *something* that identifies a
device in these messages.  But the ACPI namespace path is not really
intended to be user-consumable, so I don't think we should expose it
indiscriminately.  I think we should be using the ACPI device name
("PNP0C02:00") whenever possible.  Given the device name, we can get
the path from the sysfs "path" file.

> The usage model is similar to dev_<level>().  acpi_pr_<level>() can
> be used when device is not created/valid, which may be the case for
> ACPI hotplug handlers.  ACPI drivers can continue to use dev_<level>()
> when device is valid.

I'd argue that ACPI driver code should never be called unless the
device is valid, so drivers should *always* be able to use
dev_<level>().  Obviously, ACPI hotplug is currently screwed up (it's
mostly handled in drivers rather than in the ACPI core), so in some of
those hotplug paths in the drivers, we may not have a device yet.  But
those cases should be minimal.

Another possible approach to this is to add a %p extension rather than
adding acpi_printk().  Then you could do, e.g., 'printk("%pA ...\n",
handle)', and printk could interpolate the namespace path.  But I
really think there should be very few places where we need the path,
so I'm not sure it's worth it.
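
For readers following the discussion, here is a minimal user-space sketch of the message shape the quoted patch produces ("ACPI: <path>: <message>"). It is not the kernel implementation: format_acpi_msg() is a hypothetical name, it writes into a caller-supplied buffer instead of calling printk(), and the path is passed in as a string rather than looked up from an acpi_handle.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the quoted acpi_printk().
 * Builds "ACPI: <path>: <message>" into out[]. The quoted patch obtains
 * the path with acpi_get_name(); here the caller supplies it directly.
 * Returns the number of characters the full message needs, or a
 * negative value on a formatting error.
 */
static int format_acpi_msg(char *out, size_t len, const char *path,
                           const char *fmt, ...)
{
    va_list args;
    int used, rest;

    assert(out && len > 0 && fmt);

    /* Fall back to an empty path, as the quoted patch does on failure. */
    used = snprintf(out, len, "ACPI: %s: ", path ? path : "");
    if (used < 0 || (size_t)used >= len)
        return used;

    va_start(args, fmt);
    rest = vsnprintf(out + used, len - (size_t)used, fmt, args);
    va_end(args);

    return rest < 0 ? rest : used + rest;
}
```

Called with the path "\_SB_.SCK4.CPU4" and a format string, this yields one line with the prefix, the path, and the formatted message, which is the layout shown in the patch description.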

> ACPI drivers also continue to use pr_<level>() when ACPI device
> path does not have to be appended to the messages, such as boot-up
> messages.
>
> Note: ACPI_[WARNING|INFO|ERROR]() are intended for the ACPICA and
> are not associated with the kernel message level.
>
> Signed-off-by: Toshi Kani 
> Tested-by: Vijay Mohan Pandarathil 
> ---
>  drivers/acpi/utils.c|   34 ++
>  include/acpi/acpi_bus.h |   31 +++
>  2 files changed, 65 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/acpi/utils.c b/drivers/acpi/utils.c
> index 3e87c9c..ec0c6f9 100644
> --- a/drivers/acpi/utils.c
> +++ b/drivers/acpi/utils.c
> @@ -454,3 +454,37 @@ acpi_evaluate_hotplug_ost(acpi_handle handle, u32 
> source_event,
>  #endif
>  }
>  EXPORT_SYMBOL(acpi_evaluate_hotplug_ost);
> +
> +/**
> + * acpi_printk: Print messages with ACPI prefix and object path
> + *
> + * This function is intended to be called through acpi_pr_<level> macros.
> + */
> +void
> +acpi_printk(const char *level, acpi_handle handle, const char *fmt, ...)
> +{
> +   struct va_format vaf;
> +   va_list args;
> +   struct acpi_buffer buffer = {
> +   .length = ACPI_ALLOCATE_BUFFER,
> +   .pointer = NULL
> +   };
> +   const char *path;
> +   acpi_status ret;
> +
> +   va_start(args, fmt);
> +   vaf.fmt = fmt;
> +   vaf.va = &args;
> +
> +   ret = acpi_get_name(handle, ACPI_FULL_PATHNAME, &buffer);
> +   if (ret == AE_OK)
> +   path = buffer.pointer;
> +   else
> +   path = "";
> +
> +   printk("%sACPI: %s: %pV", level, path, &vaf);
> +
> +   va_end(args);
> +   kfree(buffer.pointer);
> +}
> +EXPORT_SYMBOL(acpi_printk);
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index bde976e..1c855b8 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -85,6 +85,37 @@ struct acpi_pld {
>
>  acpi_status
>  acpi_get_physical_device_location(acpi_handle handle, struct acpi_pld *pld);
> +
> +void acpi_printk(const char *level, acpi_handle handle, const char *fmt, 
> ...);
> +
> +#define acpi_pr_emerg(handle, fmt, ...)\
> +   acpi_printk(KERN_EMERG, handle, fmt, ##__VA_ARGS__)
> +#define acpi_pr_alert(handle, fmt, ...)\
> +   acpi_printk(KERN_ALERT, handle, fmt, ##__VA_ARGS__)
> +#define acpi_pr_crit(handle, fmt, ...) \
> +   acpi_printk(KERN_CRIT, handle, fmt, ##__VA_ARGS__)
> +#define acpi_pr_err(handle, fmt, ...)  \
> +   acpi_printk(KERN_ERR, handle, fmt, ##__VA_ARGS__)
> +#define acpi_pr_warn(handle, fmt, ...) \
> +   acpi_printk(KERN_WARNING, handle, fmt,

Re: No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Jiri Slaby

On 07/26/2012 09:20 PM, Greg KH wrote:
> I'll just be ignoring any new stuff
> until then, with the exception of patches to fix build errors in the
> tty-next tree :)

Hehe, OK :).

-- 
js
suse labs

Re: No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Greg KH

On Thu, Jul 26, 2012 at 09:12:29PM +0200, Jiri Slaby wrote:
> On 07/26/2012 09:08 PM, Greg KH wrote:
> > Jiri, I know this postpones your patches from being merged, sorry about
> > that, but this gives us a few more months to ensure that they are
> > working properly :)
> 
> Fine with me.
> 
> When should I send you 3.7 material I have in my local queue -- now or
> after 3.6-rc1 is out as usual?

After 3.6-rc1 is out is good, as I'll just be ignoring any new stuff
until then, with the exception of patches to fix build errors in the
tty-next tree :)

thanks,

greg k-h

Re: [PATCH 02/24] xen/arm: hypercalls

2012-07-26 Thread Christopher Covington

Hi Stefano,

On 07/26/2012 11:33 AM, Stefano Stabellini wrote:
> Use r12 to pass the hypercall number to the hypervisor.
> 
> We need a register to pass the hypercall number because we might not
> know it at compile time and HVC only takes an immediate argument.

You're not going to JIT assemble the appropriate HVC instruction? Darn.

How many call numbers are there, though? 8? It seems like it'd be
reasonable to take the approach that seems to be favored for MRC/MCR
instructions, using a function containing switch statement that chooses
between several inline assembly instructions based off an enum passed to
the function. See for example arch_timer_reg_read in
arch/arm/kernel/arch_timer.c.
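
The switch-based approach described above can be sketched in plain C as follows. This is only an illustration of the dispatch pattern: the enum values and hcall_dispatch() are hypothetical, and ISSUE_FIXED() is a stand-in that records the immediate where kernel code would place one inline-assembly statement per case.

```c
#include <stdio.h>

/* Hypothetical call numbers; the real set is defined by the hypervisor ABI. */
enum hcall { HCALL_A, HCALL_B, HCALL_C };

/* Records which fixed immediate the last dispatch selected. */
static int last_immediate = -1;

/*
 * Stand-in for an instruction that only accepts a compile-time immediate.
 * In kernel code each case would contain its own asm volatile(...) with
 * the immediate encoded in the instruction.
 */
#define ISSUE_FIXED(imm) (last_immediate = (imm))

/* Pick one of several fixed-immediate variants from a run-time value. */
static int hcall_dispatch(enum hcall nr)
{
    switch (nr) {
    case HCALL_A:
        ISSUE_FIXED(0);
        break;
    case HCALL_B:
        ISSUE_FIXED(1);
        break;
    case HCALL_C:
        ISSUE_FIXED(2);
        break;
    default:
        return -1; /* unknown call number: nothing issued */
    }
    return 0;
}
```

The trade-off is that every supported call number needs its own case, so this only stays manageable while the set of numbers is small, which is the question raised above.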

Regards,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

Re: No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Linus Torvalds

On Thu, Jul 26, 2012 at 12:08 PM, Greg KH  wrote:
>
> I don't really feel comfortable sending you the tty tree at the present
> time to have merged for 3.6-rc1.

Good. This is what I like to see. If it's not ready to be merged,
let's not merge it. I don't think anybody will mind horribly.

  Linus

Re: No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Jiri Slaby

On 07/26/2012 09:08 PM, Greg KH wrote:
> Jiri, I know this postpones your patches from being merged, sorry about
> that, but this gives us a few more months to ensure that they are
> working properly :)

Fine with me.

When should I send you 3.7 material I have in my local queue -- now or
after 3.6-rc1 is out as usual?

thanks,
-- 
js
suse labs

Regression in staging:r8712u since 3.4 merge

2012-07-26 Thread Larry Finger

Since kernel 3.4, driver r8712u has yielded intermittent errors when connected 
to a secure connection. With Firefox, the message is "Secure Connection Failed: 
SSL received a record with an incorrect Message Authentication Code (Error code: 
ssl_error_bad_mac_read)". A retry may work eventually. When using wget with an 
https URL, the error message is "SSL3_GET_RECORD: decryption failed or bad 
record mac".


This regression is the basis for 
https://bugzilla.kernel.org/show_bug.cgi?id=45071.

Although intermittent, I managed to bisect the problem. The bad commit is

==
commit c8628155ece363487b57d33441ea0359018c0fa7
Author: Eric Dumazet 
Date:   Sun Mar 18 11:07:47 2012 +

tcp: reduce out_of_order memory use

With increasing receive window sizes, but speed of light not improved
that much, out of order queue can contain a huge number of skbs, waiting
to be moved to receive_queue when missing packets can fill the holes.

Some devices happen to use fat skbs (truesize of 4096 + sizeof(struct
sk_buff)) to store regular (MTU <= 1500) frames. This makes highly
probable sk_rmem_alloc hits sk_rcvbuf limit, which can be 4Mbytes in
many cases.

When limit is hit, tcp stack calls tcp_collapse_ofo_queue(), a true
latency killer and cpu cache blower.

Doing the coalescing attempt each time we add a frame in ofo queue
permits to keep memory use tight and in many cases avoid the
tcp_collapse() thing later.

Tested on various wireless setups (b43, ath9k, ...) known to use big skb
truesize, this patch removed the "packets collapsed in receive queue due
to low socket buffer" I had before.

This also reduced average memory used by tcp sockets.

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: H.K. Jerry Chu 
Cc: Tom Herbert 
Cc: Ilpo Järvinen 
Acked-by: Neal Cardwell 
Signed-off-by: David S. Miller 


As every other network driver is OK with this patch, I know the problem is in 
r8712u. Do you have any thoughts on what it might be doing wrong to cause this 
problem?


Thanks,

Larry


No big TTY/serial patch merge for 3.6-rc1

2012-07-26 Thread Greg KH

Hi Linus,

I don't really feel comfortable sending you the tty tree at the present
time to have merged for 3.6-rc1.  It contains some tty changes that are
still causing build problems, as Stephen has pointed out over the past
week.  These problems are being resolved by Alan, but I don't feel that
the fixes have had time to be fully tested.  Given their late arrival
(i.e. the past few days) and the lack of any real testing time in
linux-next, I'd really like to postpone the whole merge until
3.7.

Right now, this really isn't a whole lot of patches, there are only 62
patches in the tty-next tree.  I've included below the full diffstat and
shortlog of them if anyone wants to see them.

There are maybe a few patches below that I think I should cherry-pick
and have you pull, but that's just a handful, and are only for a few
drivers, nothing in the tty core code at all.

Jiri, I know this postpones your patches from being merged, sorry about
that, but this gives us a few more months to ensure that they are
working properly :)

Alan, please keep sending me patches to fix these merge issues, but for
now, I think it's best to wait until 3.7 for this to go to Linus.

thanks,

greg k-h

--

 .../bindings/tty/serial/nxp-lpc32xx-hsuart.txt |   14 +
 .../devicetree/bindings/tty/serial/of-serial.txt   |3 +
 arch/ia64/hp/sim/simserial.c   |2 +-
 arch/um/drivers/chan_kern.c|4 +-
 arch/um/drivers/line.c |   32 +-
 arch/um/drivers/line.h |3 +-
 drivers/bluetooth/hci_ath.c|2 +-
 drivers/char/mwave/mwavedd.c   |   16 +-
 drivers/char/pcmcia/synclink_cs.c  |   24 +-
 drivers/isdn/gigaset/interface.c   |4 +-
 drivers/isdn/i4l/isdn_tty.c|   16 +-
 drivers/misc/ibmasm/uart.c |   16 +-
 drivers/mmc/card/sdio_uart.c   |   20 +-
 drivers/net/ethernet/sgi/ioc3-eth.c|   22 +-
 drivers/net/irda/irtty-sir.c   |   10 +-
 drivers/net/usb/hso.c  |   12 +-
 drivers/tty/amiserial.c|   20 +-
 drivers/tty/cyclades.c |   82 +-
 drivers/tty/hvc/hvsi_lib.c |2 +-
 drivers/tty/isicom.c   |8 +-
 drivers/tty/moxa.c |   10 +-
 drivers/tty/mxser.c|   20 +-
 drivers/tty/n_gsm.c|8 +-
 drivers/tty/n_tty.c|8 +-
 drivers/tty/pty.c  |  144 ++--
 drivers/tty/rocket.c   |   18 +-
 drivers/tty/serial/8250/8250.c |   88 +--
 drivers/tty/serial/8250/8250.h |   31 +-
 drivers/tty/serial/8250/8250_acorn.c   |   22 +-
 drivers/tty/serial/8250/8250_dw.c  |   38 +-
 drivers/tty/serial/8250/8250_gsc.c |   26 +-
 drivers/tty/serial/8250/8250_hp300.c   |   26 +-
 drivers/tty/serial/8250/8250_pci.c |  126 +--
 drivers/tty/serial/8250/8250_pnp.c |   28 +-
 drivers/tty/serial/8250/serial_cs.c|   30 +-
 drivers/tty/serial/Kconfig |   19 +
 drivers/tty/serial/Makefile|1 +
 drivers/tty/serial/amba-pl011.c|   34 +-
 drivers/tty/serial/bfin_uart.c |2 +-
 drivers/tty/serial/crisv10.c   |   26 +-
 drivers/tty/serial/imx.c   |2 +-
 drivers/tty/serial/ioc4_serial.c   |2 +-
 drivers/tty/serial/jsm/jsm_tty.c   |8 +-
 drivers/tty/serial/lpc32xx_hs.c|  823 
 drivers/tty/serial/of_serial.c |   14 +-
 drivers/tty/serial/pch_uart.c  |   59 +-
 drivers/tty/serial/pxa.c   |   14 +
 drivers/tty/serial/samsung.c   |   30 +-
 drivers/tty/serial/serial_core.c   |   34 +-
 drivers/tty/synclink.c |   36 +-
 drivers/tty/synclink_gt.c  |   24 +-
 drivers/tty/synclinkmp.c   |   24 +-
 drivers/tty/tty_io.c   |  104 +--
 drivers/tty/tty_ioctl.c|  100 +--
 drivers/tty/tty_ldisc.c|   12 +-
 drivers/tty/tty_port.c |   23 +-
 drivers/tty/vt/keyboard.c  |   50 +-
 drivers/tty/vt/vt.c|   63 +-
 drivers/tty/vt/vt_ioctl.c  |   47 +-
 drivers/usb/class/cdc-acm.c|2 +-
 drivers/usb/serial/ark3116.c

[PATCH] vxge: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Tim Gardner

Cc: Jon Mason 
Cc: "David S. Miller" 
Cc: Joe Perches 
Cc: Jiri Pirko 
Cc: Stephen Hemminger 
Cc: Paul Gortmaker 
Cc: net...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
 drivers/net/ethernet/neterion/vxge/vxge-main.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c 
b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index de21904..d4832b2 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -4203,6 +4203,9 @@ out:
return ret;
 }
 
+#define VXGE_PXE_FIRMWARE "vxge/X3fw-pxe.ncf"
+#define VXGE_FIRMWARE "vxge/X3fw.ncf"
+
 static int vxge_probe_fw_update(struct vxgedev *vdev)
 {
u32 maj, min, bld;
@@ -4245,9 +4248,9 @@ static int vxge_probe_fw_update(struct vxgedev *vdev)
}
}
if (gpxe)
-   fw_name = "vxge/X3fw-pxe.ncf";
+   fw_name = VXGE_PXE_FIRMWARE;
else
-   fw_name = "vxge/X3fw.ncf";
+   fw_name = VXGE_FIRMWARE;
 
ret = vxge_fw_upgrade(vdev, fw_name, 0);
/* -EINVAL and -ENOENT are not fatal errors for flashing firmware on
@@ -4855,3 +4858,5 @@ vxge_closer(void)
 }
 module_init(vxge_starter);
 module_exit(vxge_closer);
+MODULE_FIRMWARE(VXGE_PXE_FIRMWARE);
+MODULE_FIRMWARE(VXGE_FIRMWARE);
-- 
1.7.9.5


Re: [PATCH 2/3] drivers/misc: Add realtek pci card reader driver

2012-07-26 Thread Bjørn Mork

 writes:

> +static bool msi_en = 1;
> +module_param(msi_en, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(msi_en, "Enable MSI");
> +
> +static bool adma_mode = 1;
> +module_param(adma_mode, bool, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(adma_mode, "ADMA Mode");

Why would I want to disable these features?  And what if I have two
devices and want different settings for them?


> +int rtsx_pci_read_register(struct rtsx_pdev *pdev, u16 addr, u8 *data)
> +{
> + u32 val = 2 << 30;
> + int i;
> +
> + if (data)
> + *data = 0;

Why would anyone want to call this function with a NULL pointer?

> +
> + val |= (u32)(addr & 0x3FFF) << 16;
> + rtsx_pci_writel(pdev, RTSX_HAIMR, val);
> +
> + for (i = 0; i < MAX_RW_REG_CNT; i++) {
> + val = rtsx_pci_readl(pdev, RTSX_HAIMR);
> + if ((val & (1 << 31)) == 0)
> + break;
> + }
> +
> + if (i >= MAX_RW_REG_CNT)
> + return -ETIMEDOUT;
> +
> + if (data)
> + *data = (u8)(val & 0xFF);

And even if they did, why go through the read and then check again?
Register reading side effects?  Would be nice if that was mentioned in a
comment. 
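
To make the two points above concrete, here is a user-space sketch of the same polling read with a mandatory data pointer, so the NULL checks disappear. Everything hardware-related is mocked: struct fake_dev, fake_writel(), fake_readl() and MAX_POLLS are hypothetical stand-ins, and the bit layout (busy flag in bit 31, 14-bit address at bit 16, data in the low byte) is taken from the quoted code only.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_POLLS  1024
#define HAIMR_BUSY (1u << 31)

/* Hypothetical mock device: the command completes after a few polls. */
struct fake_dev {
    uint32_t haimr;            /* last value written to the command register */
    int polls_until_done;      /* reads that still report "busy" */
    uint8_t regs[0x4000];      /* backing store for the 14-bit address space */
};

static void fake_writel(struct fake_dev *d, uint32_t val)
{
    d->haimr = val;
}

static uint32_t fake_readl(struct fake_dev *d)
{
    uint16_t addr;

    if (d->polls_until_done > 0) {
        d->polls_until_done--;
        return d->haimr;       /* busy bit still set */
    }
    addr = (d->haimr >> 16) & 0x3FFF;
    return (d->haimr & ~HAIMR_BUSY & ~0xFFu) | d->regs[addr];
}

/* data must be non-NULL; a caller that passes NULL has a bug to fix. */
static int read_register(struct fake_dev *d, uint16_t addr, uint8_t *data)
{
    uint32_t val = HAIMR_BUSY | ((uint32_t)(addr & 0x3FFF) << 16);
    int i;

    fake_writel(d, val);

    for (i = 0; i < MAX_POLLS; i++) {
        val = fake_readl(d);
        if (!(val & HAIMR_BUSY)) {
            *data = (uint8_t)(val & 0xFF);
            return 0;
        }
    }
    return -ETIMEDOUT;
}
```

Whether the real hardware has read side effects that would justify a data-less read is exactly the open question in the review; this sketch assumes it does not.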

> + pr_debug("SG table count = %d\n", pdev->sgi);

dev_dbg here and many other places, maybe?  Always nice to see which
device is spitting out such messages.
> + BUG_ON(!buf || (buf_len <= 0));

OK?  And then I do what? Give you a call?

> + pr_info("%s: pdev->msi_en = %d, pci->irq = %d\n",
> + __func__, pdev->msi_en, pdev->pci->irq);

Same as for the debugging:  dev_info is nicer.

> + pr_err("rtsx_sdmmc: unable to grab IRQ %d, disabling device\n",
> + pdev->pci->irq);


Likewise for other levels.

> +static unsigned char get_card_type(u32 card_status)
> +{
> + unsigned char type = 0;
> +
> + switch (card_status) {
> + case XD_EXIST:
> + type = RTSX_TYPE_XD;
> + break;
> +
> + case MS_EXIST:
> + type = RTSX_TYPE_MS;
> + break;
> +
> + case SD_EXIST:
> + type = RTSX_TYPE_SD;
> + break;
> +
> + default:
> + type = 0;
> + break;

Seems a bit redundant given that you initialized it to 0.

> +static u32 get_card_status(unsigned char type)
> +{
> + u32 card_status = 0;
> +
> + switch (type) {
> + case RTSX_TYPE_XD:
> + card_status = XD_EXIST;
> + break;
> +
> + case RTSX_TYPE_MS:
> + card_status = MS_EXIST;
> + break;
> +
> + case RTSX_TYPE_SD:
> + card_status = SD_EXIST;
> + break;
> +
> + default:
> + card_status = 0;
> + break;

Same as above.

> +static int rtsx_pci_extra_init_hw(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> + return 0;
> +}
> +
> +static int rtsx_pci_optimize_phy(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> + return 0;
> +}
> +
> +static void rtsx_pci_turn_on_led(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> +}
> +
> +static void rtsx_pci_turn_off_led(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> +}
> +
> +static void rtsx_pci_enable_auto_blink(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> +}
> +
> +static void rtsx_pci_disable_auto_blink(struct rtsx_pdev *pdev)
> +{
> + pr_warn("%s\n", __func__);
> +}

Can all these stubs really be necessary?  
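
One common alternative to per-hook stubs is to leave unused hooks NULL and check at the call site. The sketch below shows that pattern in plain C; struct chip, struct chip_ops and the helper names are hypothetical and are not taken from the driver under review.

```c
#include <stddef.h>

struct chip;

/* Optional per-chip hooks; a NULL pointer means "nothing to do". */
struct chip_ops {
    int (*extra_init_hw)(struct chip *c);
    void (*turn_on_led)(struct chip *c);
};

struct chip {
    struct chip_ops ops;
    int led_on;
};

/* Wrappers hide the NULL check so callers stay simple. */
static int chip_extra_init_hw(struct chip *c)
{
    return c->ops.extra_init_hw ? c->ops.extra_init_hw(c) : 0;
}

static void chip_turn_on_led(struct chip *c)
{
    if (c->ops.turn_on_led)
        c->ops.turn_on_led(c);
}

/* Example hook for a chip that does have an LED. */
static void demo_led_on(struct chip *c)
{
    c->led_on = 1;
}
```

With this layout the default case of the ops setup only has to leave the structure zeroed, and no warning is printed for chips that simply lack a feature.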


> +static void rtsx_pci_init_ops(struct rtsx_pdev *pdev)
> +{
> + switch (PCI_PID(pdev)) {
> + case 0x5209:
> + pr_info("Initialize 0x5209\n");
> + pdev->ops.extra_init_hw = rts5209_extra_init_hw;
> + pdev->ops.optimize_phy = rts5209_optimize_phy;
> + pdev->ops.turn_on_led = rts5209_turn_on_led;
> + pdev->ops.turn_off_led = rts5209_turn_off_led;
> + pdev->ops.enable_auto_blink = rts5209_enable_auto_blink;
> + pdev->ops.disable_auto_blink = rts5209_disable_auto_blink;
> + break;
> +
> + case 0x5229:
> + pr_info("Initialize 0x5229\n");
> + pdev->ops.extra_init_hw = rts5229_extra_init_hw;
> + pdev->ops.optimize_phy = rts5229_optimize_phy;
> + pdev->ops.turn_on_led = rts5229_turn_on_led;
> + pdev->ops.turn_off_led = rts5229_turn_off_led;
> + pdev->ops.enable_auto_blink = rts5229_enable_auto_blink;
> + pdev->ops.disable_auto_blink = rts5229_disable_auto_blink;
> + break;
> +
> + default:
> + pr_warn("Initialize dummy ops\n");
> + pdev->ops.extra_init_hw = rtsx_pci_extra_init_hw;
> + pdev->ops.optimize_phy = rtsx_pci_optimize_phy;
> + pdev->ops.turn_on_led = rtsx_pci_turn_on_led;
> + pdev->ops.turn_off_led = rtsx_pci_turn_off_led;
> + pdev->ops.enable_auto_blink =

[PATCH] pvrusb2: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Tim Gardner

Cc: Mike Isely 
Cc: Mauro Carvalho Chehab 
Cc: linux-me...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
 drivers/media/video/pvrusb2/pvrusb2-devattr.c |   17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/media/video/pvrusb2/pvrusb2-devattr.c 
b/drivers/media/video/pvrusb2/pvrusb2-devattr.c
index d8c8982..adc501d3 100644
--- a/drivers/media/video/pvrusb2/pvrusb2-devattr.c
+++ b/drivers/media/video/pvrusb2/pvrusb2-devattr.c
@@ -54,8 +54,9 @@ static const struct pvr2_device_client_desc pvr2_cli_29xxx[] 
= {
{ .module_id = PVR2_CLIENT_ID_DEMOD },
 };
 
+#define PVR2_FIRMWARE_29xxx "v4l-pvrusb2-29xxx-01.fw"
 static const char *pvr2_fw1_names_29xxx[] = {
-   "v4l-pvrusb2-29xxx-01.fw",
+   PVR2_FIRMWARE_29xxx,
 };
 
 static const struct pvr2_device_desc pvr2_device_29xxx = {
@@ -87,8 +88,9 @@ static const struct pvr2_device_client_desc pvr2_cli_24xxx[] 
= {
{ .module_id = PVR2_CLIENT_ID_DEMOD },
 };
 
+#define PVR2_FIRMWARE_24xxx "v4l-pvrusb2-24xxx-01.fw"
 static const char *pvr2_fw1_names_24xxx[] = {
-   "v4l-pvrusb2-24xxx-01.fw",
+   PVR2_FIRMWARE_24xxx,
 };
 
 static const struct pvr2_device_desc pvr2_device_24xxx = {
@@ -369,8 +371,9 @@ static const struct pvr2_device_client_desc 
pvr2_cli_73xxx[] = {
  .i2c_address_list = "\x42"},
 };
 
+#define PVR2_FIRMWARE_73xxx "v4l-pvrusb2-73xxx-01.fw"
 static const char *pvr2_fw1_names_73xxx[] = {
-   "v4l-pvrusb2-73xxx-01.fw",
+   PVR2_FIRMWARE_73xxx,
 };
 
 static const struct pvr2_device_desc pvr2_device_73xxx = {
@@ -475,8 +478,9 @@ static const struct pvr2_dvb_props pvr2_751xx_dvb_props = {
 };
 #endif
 
+#define PVR2_FIRMWARE_75xxx "v4l-pvrusb2-73xxx-01.fw"
 static const char *pvr2_fw1_names_75xxx[] = {
-   "v4l-pvrusb2-73xxx-01.fw",
+   PVR2_FIRMWARE_75xxx,
 };
 
 static const struct pvr2_device_desc pvr2_device_750xx = {
@@ -556,7 +560,10 @@ struct usb_device_id pvr2_device_table[] = {
 };
 
 MODULE_DEVICE_TABLE(usb, pvr2_device_table);
-
+MODULE_FIRMWARE(PVR2_FIRMWARE_29xxx);
+MODULE_FIRMWARE(PVR2_FIRMWARE_24xxx);
+MODULE_FIRMWARE(PVR2_FIRMWARE_73xxx);
+MODULE_FIRMWARE(PVR2_FIRMWARE_75xxx);
 
 /*
   Stuff for Emacs to see, in order to encourage consistent editing style:
-- 
1.7.9.5


Re: [PATCH v2 1/1] mmc: block: Add write packing control

2012-07-26 Thread merez


On Thu, July 26, 2012 8:28 am, S, Venkatraman wrote:
> On Tue, Jul 24, 2012 at 2:14 PM,   wrote:
>> On Mon, July 23, 2012 5:22 am, S, Venkatraman wrote:
>>> On Mon, Jul 23, 2012 at 5:13 PM,   wrote:
>>>> On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
>>>>> Hi,  [removing Jens and the documentation list, since now we're
>>>>> talking about the MMC side only]
>>>>>
>>>>> On Wed, Jul 18 2012, me...@codeaurora.org wrote:
>>>>>> Is there anything else that holds this patch from being pushed to
>>>>>> mmc-next?
>>>>>
>>>>> Yes, I'm still uncomfortable with the write packing patchsets for a
>>>>> couple of reasons, and I suspect that the sum of those reasons means
>>>>> that we should probably plan on holding off merging it until after 3.6.
>>>>>
>>>>> Here are the open issues; please correct any misunderstandings:
>>>>>
>>>>> With Seungwon's patchset ("Support packed write command"):
>>>>>
>>>>> * I still don't have a good set of representative benchmarks showing
>>>>>   what kind of performance changes come with this patchset.  It seems
>>>>>   like we've had a small amount of testing on one controller/eMMC part
>>>>>   combo from Seungwon, and an entirely different test from Maya, and
>>>>>   the results aren't documented fully anywhere to the level of
>>>>>   describing what the hardware was, what the test was, and what the
>>>>>   results were before and after the patchset.
>>>>
>>>> Currently, there is only one card vendor that supports packed commands.
>>>> Following are our sequential write (LMDD) test results on 2 of our
>>>> targets (in MB/s):
>>>>
>>>>                        No packing    Packing
>>>> Target 1 (SDR 50MHz)       15           25
>>>> Target 2 (DDR 50MHz)       20           30
>>>>
>>>>> With the reads-during-writes regression:
>>>>>
>>>>> * Venkat still has open questions about the nature of the read
>>>>>   regression, and thinks we should understand it with blktrace before
>>>>>   trying to fix it.  Maya has a theory about writes overwhelming reads,
>>>>>   but Venkat doesn't understand why this would explain the observed
>>>>>   bandwidth drop.
>>>>
>>>> The degradation of read due to writes is not a new behavior and exists
>>>> also without the write packing feature (which only increases the
>>>> degradation). Our investigation of this phenomenon led us to the
>>>> conclusion that a new scheduling policy should be used for mobile
>>>> devices, but this is not related to the current discussion of the write
>>>> packing feature.
>>>> The write packing feature increases the degradation of read due to
>>>> write since it allows the MMC to fetch many write requests in a row,
>>>> instead of fetching only one at a time.  Therefore some of the read
>>>> requests will have to wait for the completion of more write requests
>>>> before they can be issued.
>>>
>>> I am a bit puzzled by this claim. One thing I checked carefully when
>>> reviewing write packing patches from SJeon was that the code didn't
>>> plough through a mixed list of reads and writes and selected only
>>> writes.
>>> This section of the code in "mmc_blk_prep_packed_list()", from v8
>>> patchset..
>>>
>>> +   if (rq_data_dir(cur) != rq_data_dir(next)) {
>>> +           put_back = 1;
>>> +           break;
>>> +   }
>>>
>>> means that once a read is encountered in the middle of write packing,
>>> the packing is stopped at that point and it is executed. Then the next
>>> blk_fetch_request should get the next read and continue as before.
>>>
>>> IOW, the ordering of reads and writes is _not_ altered when using
>>> packed commands.
>>> For example if there were 5 write requests, followed by 1 read,
>>> followed by 5 more write requests in the request_queue, the first 5
>>> writes will be executed as one "packed command", then the read will be
>>> executed, and then the remaining 5 writes will be executed as one
>>> "packed command". So the read does not have to wait any more than it
>>> waited before (packing feature)
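
The packing rule described in the example above can be modelled in a few
lines of user-space C. This is only a sketch under stated assumptions: the
enum, the function name and the array-based queue are invented for
illustration and are not taken from the MMC block driver, and the model
ignores any limit on the size of a pack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request direction, standing in for rq_data_dir(). */
enum dir { DIR_READ, DIR_WRITE };

/*
 * Count how many commands are issued for a queue of requests when
 * consecutive writes are packed into one command and a change of
 * direction ends the current pack. Reads are issued one at a time.
 */
static size_t count_issued_commands(const enum dir *queue, size_t n)
{
	size_t cmds = 0;
	size_t i = 0;

	assert(queue != NULL || n == 0);

	while (i < n) {
		if (queue[i] == DIR_WRITE) {
			/* Pack writes until the direction changes. */
			while (i < n && queue[i] == DIR_WRITE)
				i++;
		} else {
			i++;	/* a read is issued on its own */
		}
		cmds++;
	}
	return cmds;
}
```

For a queue of 5 writes, 1 read and 5 writes, the function counts 3 issued
commands: one packed write, the read, and a second packed write.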
>>
>> Let me try to better explain with your example.
>> Without packing the MMC layer will fetch 2 write requests and wait for
>> the first write request completion before fetching another write request.
>> During this time the read request could be inserted into the CFQ and
>> since it has higher priority than the async write it will be dispatched
>> in the next fetch. So, the result would be 2 write requests followed by
>> one read request and the read would have to wait for completion of only
>> 2 write requests.
>> With packing, all the 5 write requests will be fetched in a row, and
>> then the read will arrive and be dispatched in the next fetch. Then the
>> read will have to wait for the completion of 5 write requests.
>>
>> Few more clarifications:
>> Due to the plug list mechanism in the block layer the applications can
>> "aggregate" several requests to be inserted into the scheduler before
>> waking the MMC queue thread.
>> This leads to a situation where there are several write requests in the
>>

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-26 Thread Rik van Riel


On 07/23/2012 12:04 AM, Hugh Dickins wrote:


I spent hours trying to dream up a better patch, trying various
approaches.  I think I have a nice one now, what do you think?  And
more importantly, does it work?  I have not tried to test it at all,
that I'm hoping to leave to you, I'm sure you'll attack it with gusto!

If you like it, please take it over and add your comments and signoff
and send it in.  The second part won't come up in your testing, and could
be made a separate patch if you prefer: it's a related point that struck
me while I was playing with a different approach.

I'm sorely tempted to leave a dangerous pair of eyes off the Cc,
but that too would be unfair.

Subject-to-your-testing-
Signed-off-by: Hugh Dickins 


This patch looks good to me.

Larry, does Hugh's patch survive your testing?



[PATCH] cx23885: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Tim Gardner

Cc: Mauro Carvalho Chehab 
Cc: Steven Toth 
Cc: Hans Verkuil 
Cc: linux-me...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
 drivers/media/video/cx23885/cx23885-417.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/media/video/cx23885/cx23885-417.c 
b/drivers/media/video/cx23885/cx23885-417.c
index f5c79e5..5d5052d 100644
--- a/drivers/media/video/cx23885/cx23885-417.c
+++ b/drivers/media/video/cx23885/cx23885-417.c
@@ -1786,3 +1786,5 @@ int cx23885_417_register(struct cx23885_dev *dev)
 
return 0;
 }
+
+MODULE_FIRMWARE(CX23885_FIRM_IMAGE_NAME);
-- 
1.7.9.5


[PATCH] cx231xx: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Tim Gardner

Cc: Mauro Carvalho Chehab 
Cc: Hans Verkuil 
Cc: linux-me...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
 drivers/media/video/cx231xx/cx231xx-417.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/media/video/cx231xx/cx231xx-417.c 
b/drivers/media/video/cx231xx/cx231xx-417.c
index ce2f622..b024e51 100644
--- a/drivers/media/video/cx231xx/cx231xx-417.c
+++ b/drivers/media/video/cx231xx/cx231xx-417.c
@@ -2193,3 +2193,5 @@ int cx231xx_417_register(struct cx231xx *dev)
 
return 0;
 }
+
+MODULE_FIRMWARE(CX231xx_FIRM_IMAGE_NAME);
-- 
1.7.9.5


Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-26 Thread Rik van Riel


On 07/20/2012 10:36 AM, Michal Hocko wrote:


--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -81,7 +81,12 @@ static void huge_pmd_share(struct mm_struct *mm, unsigned 
long addr, pud_t *pud)
if (saddr) {
spte = huge_pte_offset(svma->vm_mm, saddr);
if (spte) {
-   get_page(virt_to_page(spte));
+   struct page *spte_page = virt_to_page(spte);
+   if (!is_hugetlb_pmd_page_valid(spte_page)) {


What prevents somebody else from marking the hugetlb
pmd invalid, between here...


+   spte = NULL;
+   continue;
+   }


... and here?


+   get_page(spte_page);
break;
}


I think we need to take the refcount before checking whether
the hugetlb pmd is still valid.
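
The ordering suggested here (take the reference first, then check
validity, and drop the reference again if the check fails) can be sketched
in user-space C11. Everything in this block is hypothetical: struct
shared_page, put_ref() and try_share() are illustration names, not kernel
interfaces, and the sketch does not claim to be a complete fix for the
race, since it says nothing about what the invalidating side must do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared page-table page. */
struct shared_page {
	atomic_int refcount;
	atomic_bool valid;
};

static void put_ref(struct shared_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);	/* dropping a reference we never held is a bug */
	(void)old;
}

/*
 * Take the reference first, then check validity. If the page has been
 * marked invalid, drop the reference again and report failure.
 */
static bool try_share(struct shared_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
	if (!atomic_load(&p->valid)) {
		put_ref(p);
		return false;
	}
	return true;
}
```

With this ordering, a page that is invalidated after the check still has
the caller's reference accounted for, which is the window the question
above is about.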

Also, disregard my previous email in this thread, I just
read Mel's detailed explanation and wrapped my brain
around the bug :)

Re: Andre Hedrick (anhedric) has died

2012-07-26 Thread Jeff Garzik


On 07/20/2012 12:39 AM, Nate Lawson wrote:

Dear Linux hackers,

Sorry for the intrusion on this technical list. I wanted to let Andre's fellow 
Linux developers know that he died this past weekend. For those that don't know 
him, Andre was an active developer for the ATA driver a while back.

I have known Andre for about 9 years, although I haven't seen much of him in 
the past year. He and I attended several retreats together in 2002 and 2003. 
I'd like to share a story from the first time I met him.

I was a FreeBSD developer at the time and had done some work in the SCSI (CAM) subsystem. 
I met Andre at a retreat center in the Marin woods. As we ate dinner, he mentioned that 
he was a software developer. "So am I", I said.


A very sad day.  Andre taught me a lot about ATA, and could be 
considered one of the grandfathers of libata.


http://hedrick4419.blogspot.com/
http://www.theregister.co.uk/2012/07/26/andre_hedrick/?2

Rest in peace,

Jeff



[PATCH] ivtv: Declare MODULE_FIRMWARE usage

2012-07-26 Thread Tim Gardner

Cc: Andy Walls 
Cc: Mauro Carvalho Chehab 
Cc: ivtv-de...@ivtvdriver.org
Cc: linux-me...@vger.kernel.org
Signed-off-by: Tim Gardner 
---
 drivers/media/video/ivtv/ivtv-firmware.c |4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/media/video/ivtv/ivtv-firmware.c 
b/drivers/media/video/ivtv/ivtv-firmware.c
index 02c5ade..6ec7705 100644
--- a/drivers/media/video/ivtv/ivtv-firmware.c
+++ b/drivers/media/video/ivtv/ivtv-firmware.c
@@ -396,3 +396,7 @@ int ivtv_firmware_check(struct ivtv *itv, char *where)
 
return res;
 }
+
+MODULE_FIRMWARE(CX2341X_FIRM_ENC_FILENAME);
+MODULE_FIRMWARE(CX2341X_FIRM_DEC_FILENAME);
+MODULE_FIRMWARE(IVTV_DECODE_INIT_MPEG_FILENAME);
-- 
1.7.9.5


Re: [PATCH 01/17] perf: Unified API to record selective sets of arch registers

2012-07-26 Thread Arnaldo Carvalho de Melo

Em Sun, Jul 22, 2012 at 02:14:24PM +0200, Jiri Olsa escreveu:
> This brings a new API to help the selective dump of registers on
> event sampling, and its implementation for x86 arch.
> 
> Added HAVE_PERF_REGS config option to determine if the architecture
> provides perf registers ABI.
> 
> The information about desired registers will be passed in u64 mask.
> It's up to the architecture to map the registers into the mask bits.
> 
> For the x86 arch implementation, both 32 and 64 bit registers
> bits are defined within single enum to ensure 64 bit system can
> provide register dump for compat task if needed in the future.
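
As a rough illustration of the "u64 mask" idea in the description above,
the following user-space sketch copies out only the registers whose bit is
set in the mask. The small enum and the function name are invented for
this example; they are not the patch's API, and the register array merely
stands in for whatever the architecture would read from pt_regs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of register ids; not the full x86 enum. */
enum { REG_AX, REG_BX, REG_CX, REG_DX, REG_MAX };

/*
 * Copy the registers selected by 'mask' from 'regs' into 'out', in
 * ascending bit order. Returns the number of values written.
 */
static size_t dump_selected_regs(uint64_t mask, const uint64_t *regs,
				 size_t nregs, uint64_t *out)
{
	size_t n = 0;

	assert(nregs <= 64);	/* a u64 mask can describe at most 64 ids */

	for (size_t bit = 0; bit < nregs; bit++) {
		if (mask & (UINT64_C(1) << bit))
			out[n++] = regs[bit];
	}
	return n;
}
```

A caller asking for AX and CX would pass
(1 << REG_AX) | (1 << REG_CX) and get two values back, in that order.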


Anton, Paul, Ben,

Does this look OK for PPC?

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> Original-patch-by: Frederic Weisbecker 
> ---
>  arch/Kconfig |6 +++
>  arch/x86/Kconfig |1 +
>  arch/x86/include/asm/perf_regs.h |   33 ++
>  arch/x86/kernel/Makefile |2 +
>  arch/x86/kernel/perf_regs.c  |   90 
> ++
>  include/linux/perf_regs.h|   19 
>  6 files changed, 151 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/include/asm/perf_regs.h
>  create mode 100644 arch/x86/kernel/perf_regs.c
>  create mode 100644 include/linux/perf_regs.h
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 8c3d957..32f4873 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -222,6 +222,12 @@ config HAVE_PERF_EVENTS_NMI
> subsystem.  Also has support for calculating CPU cycle events
> to determine how many clock cycles in a given period.
>  
> +config HAVE_PERF_REGS
> + bool
> + help
> +   Support selective register dumps for perf events. This includes
> +   bit-mapping of each registers and a unique architecture id.
> +
>  config HAVE_ARCH_JUMP_LABEL
>   bool
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 94de2c5..acebbd6 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -60,6 +60,7 @@ config X86
>   select HAVE_MIXED_BREAKPOINTS_REGS
>   select PERF_EVENTS
>   select HAVE_PERF_EVENTS_NMI
> + select HAVE_PERF_REGS
>   select ANON_INODES
>   select HAVE_ALIGNED_STRUCT_PAGE if SLUB && !M386
>   select HAVE_CMPXCHG_LOCAL if !M386
> diff --git a/arch/x86/include/asm/perf_regs.h 
> b/arch/x86/include/asm/perf_regs.h
> new file mode 100644
> index 000..3f2207b
> --- /dev/null
> +++ b/arch/x86/include/asm/perf_regs.h
> @@ -0,0 +1,33 @@
> +#ifndef _ASM_X86_PERF_REGS_H
> +#define _ASM_X86_PERF_REGS_H
> +
> +enum perf_event_x86_regs {
> + PERF_REG_X86_AX,
> + PERF_REG_X86_BX,
> + PERF_REG_X86_CX,
> + PERF_REG_X86_DX,
> + PERF_REG_X86_SI,
> + PERF_REG_X86_DI,
> + PERF_REG_X86_BP,
> + PERF_REG_X86_SP,
> + PERF_REG_X86_IP,
> + PERF_REG_X86_FLAGS,
> + PERF_REG_X86_CS,
> + PERF_REG_X86_SS,
> + PERF_REG_X86_DS,
> + PERF_REG_X86_ES,
> + PERF_REG_X86_FS,
> + PERF_REG_X86_GS,
> + PERF_REG_X86_R8,
> + PERF_REG_X86_R9,
> + PERF_REG_X86_R10,
> + PERF_REG_X86_R11,
> + PERF_REG_X86_R12,
> + PERF_REG_X86_R13,
> + PERF_REG_X86_R14,
> + PERF_REG_X86_R15,
> +
> + PERF_REG_X86_32_MAX = PERF_REG_X86_GS + 1,
> + PERF_REG_X86_64_MAX = PERF_REG_X86_R15 + 1,
> +};
> +#endif /* _ASM_X86_PERF_REGS_H */
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 8215e56..8d7a619 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -100,6 +100,8 @@ obj-$(CONFIG_SWIOTLB) += pci-swiotlb.o
>  obj-$(CONFIG_OF) += devicetree.o
>  obj-$(CONFIG_UPROBES)+= uprobes.o
>  
> +obj-$(CONFIG_PERF_EVENTS)+= perf_regs.o
> +
>  ###
>  # 64 bit specific files
>  ifeq ($(CONFIG_X86_64),y)
> diff --git a/arch/x86/kernel/perf_regs.c b/arch/x86/kernel/perf_regs.c
> new file mode 100644
> index 000..c00c92a
> --- /dev/null
> +++ b/arch/x86/kernel/perf_regs.c
> @@ -0,0 +1,90 @@
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#ifdef CONFIG_X86_32
> +#define PERF_REG_X86_MAX PERF_REG_X86_32_MAX
> +#else
> +#define PERF_REG_X86_MAX PERF_REG_X86_64_MAX
> +#endif
> +
> +#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
> +
> +static unsigned int pt_regs_offset[PERF_REG_X86_MAX] = {
> + PT_REGS_OFFSET(PERF_REG_X86_AX, ax),
> + PT_REGS_OFFSET(PERF_REG_X86_BX, bx),
> + PT_REGS_OFFSET(PERF_REG_X86_CX, cx),
> + PT_REGS_OFFSET(PERF_REG_X86_DX, dx),
> + PT_REGS_OFFSET(PERF_REG_X86_SI, si),
> + PT_REGS_OFFSET(PERF_REG_X86_DI, di),
> + PT_REGS_OFFSET(PERF_REG_X86_BP, bp),
> + PT_REGS_OFFSET(PERF_REG_X86_SP, sp),
> + PT_REGS_OFFSET(PERF_REG_X86_IP, ip),
> + PT_REGS_OFFSET(PERF_REG_X86_FLAGS, flags),
> + PT_REGS_OFFSET(PERF_REG_X86_CS, cs),
> + PT_REGS_OFFSET(PERF_REG_X86_SS, ss),
> +#ifdef CONFIG_X86_32
> +

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-26 Thread Rik van Riel


On 07/23/2012 12:04 AM, Hugh Dickins wrote:


Please don't be upset if I say that I don't like either of your patches.
Mainly for obvious reasons - I don't like Mel's because anything with
trylock retries and nested spinlocks worries me before I can even start
to think about it; and I don't like Michal's for the same reason as Mel,
that it spreads more change around in common paths than we would like.


I have a naive question.

In huge_pmd_share, we protect ourselves by taking
the mapping->i_mmap_mutex.

Is there any reason we could not take the i_mmap_mutex
in the huge_pmd_unshare path?

I see that hugetlb_change_protection already takes that
lock. Is there something preventing __unmap_hugepage_range
from also taking mapping->i_mmap_mutex?

That way the sharing and the unsharing code are
protected by the same, per shm segment, lock.

[BUG] NTFS code doesn't sanitize folder names sufficiently

2012-07-26 Thread Marian Beermann


Hello everyone,

today I noticed some very odd behaviour, which could lead people to 
believe they have lost data, because it is possible to create directories 
with backslashes in their names.


I am currently running kernel 3.5.

To reproduce the problem to its full extent you'll need a 
Windows computer, but to see what's wrong Linux completely suffices :-)


On a Linux computer
1. Create a directory named TestA on an NTFS partition
2. Create a subdirectory of TestA named TestB
3. Create a third directory alongside TestA named TestA\TestB (the 
fundamental problem is this: backslashes in directory names)


Connect the drive containing the NTFS partition now to a Windows 
computer and navigate to the directory containing TestA and TestA\TestB. 
If you navigate to the folder (not path!) TestA\TestB you'll actually 
see the contents of the path TestA\TestB (the subfolder TestB) and not 
the contents of the directory.
It is not possible on a Windows machine to access the contents of the 
directory named TestA\TestB. This is not a bug in Windows, it's caused 
by a bug in the NTFS driver, which allows illegal characters.


The solution to this would be to disallow creation of files and folders 
on NTFS drives containing illegal characters.
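
A check of the kind proposed could look roughly like the following
user-space sketch. The function name is made up for illustration, and
where such a check would sit in the NTFS driver is not addressed here. The
character set is the one Windows rejects in file and directory names
(control characters and \ / : * ? " < > |); reserved device names and
trailing dots or spaces are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if 'name' is non-empty and contains no character that
 * Windows rejects in a file or directory name.
 */
static bool name_is_windows_safe(const char *name)
{
	static const char reserved[] = "\\/:*?\"<>|";
	const unsigned char *p;

	assert(name != NULL);

	if (*name == '\0')
		return false;

	for (p = (const unsigned char *)name; *p; p++) {
		/* Control characters and the reserved set are both rejected. */
		if (*p < 0x20 || strchr(reserved, *p))
			return false;
	}
	return true;
}
```

With this check, "TestA" is accepted while "TestA\TestB" as a single
name is rejected.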


Best regards
Marian Beermann

(notice: I'm not subscribed to linux-fsdevel...)

Re: [PATCH] uprobes: don't enable/disable single step if the user did it

2012-07-26 Thread Oleg Nesterov

Well. I agree, this needs changes. To begin with, uprobe should avoid
user_enable_single_step() which does access_process_vm(). And I suspect
uprobes have the problems with TIF_FORCED_TF logic.

But I am not sure about this patch...

On 07/26, Sebastian Andrzej Siewior wrote:
>
> @@ -1528,7 +1528,10 @@ static void handle_swbp(struct pt_regs *regs)
>
>   utask->state = UTASK_SSTEP;
>   if (!pre_ssout(uprobe, regs, bp_vaddr)) {
> - user_enable_single_step(current);
> + if (test_tsk_thread_flag(current, TIF_SINGLESTEP))
> + uprobe->flags |= UPROBE_USER_SSTEP;
> + else
> + user_enable_single_step(current);

This is x86 specific, TIF_SINGLESTEP is not defined on every arch.

> @@ -1569,7 +1572,10 @@ static void handle_singlestep(struct uprobe_task 
> *utask, struct pt_regs *regs)
>   put_uprobe(uprobe);
>   utask->active_uprobe = NULL;
>   utask->state = UTASK_RUNNING;
> - user_disable_single_step(current);
> + if (uprobe->flags & UPROBE_USER_SSTEP)
> + uprobe->flags &= ~UPROBE_USER_SSTEP;
> + else
> + user_disable_single_step(current);

This is not enough (and I am not sure this is portable).

If SINGLESTEP was set, we should send SIGTRAP here. With this patch
we return with X86_EFLAGS_TF set, gdb will be notified only after the
next insn. And if we notify gdb, there is no need to keep X86_EFLAGS_TF.

I'm afraid this needs more thinking and new arch-dependent helpers.

Oleg.


Re: [PATCH 02/17] perf: Add ability to attach user level registers dump to sample

2012-07-26 Thread Jiri Olsa

On Thu, Jul 26, 2012 at 07:42:55PM +0200, Stephane Eranian wrote:
> On Wed, Jul 25, 2012 at 8:27 PM, Jiri Olsa  wrote:
> > On Wed, Jul 25, 2012 at 07:39:18PM +0200, Stephane Eranian wrote:
> >> On Sun, Jul 22, 2012 at 2:14 PM, Jiri Olsa  wrote:
> >
> > SNIP
> >
> >> > +   if (sample_type & PERF_SAMPLE_REGS_USER) {
> >> > +   u64 avail = (data->regs_user != NULL);
> >> > +
> >> > +   /*
> >> > +* If there are no regs to dump, notice it through
> >> > +* first u64 being zero.
> >> > +*/
> >> > +   perf_output_put(handle, avail);
> >> > +
> >> The only role of avail is to report whether or not you've captured actual
> >> registers. Could it be used to report the sampled process ABI (32 vs. 64)
> >> instead? Something like:
> >>   PERF_SAMPLE_REGS_ABI_NONE -> no regs captured (emulate your
> >> current behavior)
> >>   PERF_SAMPLE_REGS_ABI_32 -> 32 bit ABI regs captured
> >>   PERF_SAMPLE_REGS_ABI_64 -> 64 bit ABI regs captured
> >>
> >> That could help the tools interpret the register values.
> >
> > Yes, I think that could help once we start dealing with compat tasks.
> >
> You don't control whether or not you capture compat tasks. So
> you have to deal with those right now.
> 
> > The current userspace code stays untouched, because it checks for
> > 'avail != 0', which stays even with your change.
> >
> > I think this could be sent later with all other fixes I'm already
> > working on. But I can work/send it preferentially before whole patchset
> > is taken if you like.
> >
> Well, why not do it now. You'd have to rename the available field
> into something more sensible. Also need to prepare it for future
> extension if they ever become necessary.

I'll send new version shortly
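
For reference, the ABI word discussed above might be laid out along these
lines. The three names come from the thread; the numeric values and the
two helper functions are assumptions made for this sketch, chosen so that
NONE is zero and an existing "!= 0" check in user space keeps working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: NONE must be 0 so that "abi != 0" still means "regs present". */
enum perf_sample_regs_abi {
	PERF_SAMPLE_REGS_ABI_NONE = 0,
	PERF_SAMPLE_REGS_ABI_32   = 1,
	PERF_SAMPLE_REGS_ABI_64   = 2,
};

/* Hypothetical helper: were any registers captured for this sample? */
static bool regs_present(uint64_t abi)
{
	return abi != PERF_SAMPLE_REGS_ABI_NONE;
}

/* Hypothetical helper: human-readable name for the ABI word. */
static const char *regs_abi_name(uint64_t abi)
{
	switch (abi) {
	case PERF_SAMPLE_REGS_ABI_NONE:
		return "none";
	case PERF_SAMPLE_REGS_ABI_32:
		return "32-bit";
	case PERF_SAMPLE_REGS_ABI_64:
		return "64-bit";
	default:
		assert(abi > PERF_SAMPLE_REGS_ABI_64);	/* left for future extension */
		return "unknown";
	}
}
```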

thanks,
jirka

Re: Question about the fallocate system call

2012-07-26 Thread Pádraig Brady

On 07/26/2012 03:30 PM, Jidong Xiao wrote:
> Hi,
> 
> I just have a simple question about fallocate.
> 
> I want to test the punch hole function of fallocate(). So I wrote such
> a simple program:
> 
> yosemite:/mnt # cat test.c
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(void)
> {
> int fd;
> 
> fd = open("testfile", O_RDWR);
> fallocate(fd,FALLOC_FL_PUNCH_HOLE,0,500*1024*1024);
> close(fd);
> 
> return 0;
> }
> 
> I created a file called "testfile" whose size is 1GB, however, when I
> run the above program, the size of the testfile simply won't change,
> if I use stat command to check the file status, nothing is changed when I
> execute the above program. My filesystem is ext4, as I understand,
> ideally when I run the above program, the file size should decrease
> from 1GB to 512MB, is there anything wrong with the program or I just
> understood incorrectly?
> 
> Thank you for any inputs/comments.

code looks OK,
but you're not checking the return from fallocate().
I'm guessing it's returning -1 with errno = ENOTSUP
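
A variant of the test program that checks the return value might look
like the sketch below. The helper name punch_hole() is invented for this
example. One detail worth noting: per the fallocate(2) manual page,
FALLOC_FL_PUNCH_HOLE has to be ORed with FALLOC_FL_KEEP_SIZE, so the mode
used in the quoted program is rejected as it stands.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Punch a hole of 'len' bytes at 'offset' in the file at 'path'.
 * Returns 0 on success, or -1 with errno set and a message on stderr.
 */
static int punch_hole(const char *path, off_t offset, off_t len)
{
	int fd, ret, saved;

	assert(path != NULL);

	fd = open(path, O_RDWR);
	if (fd < 0) {
		perror("open");
		return -1;
	}

	/* PUNCH_HOLE is only accepted together with KEEP_SIZE. */
	ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			offset, len);
	saved = errno;
	if (ret < 0)
		fprintf(stderr, "fallocate: %s\n", strerror(saved));

	close(fd);
	errno = saved;	/* keep the fallocate() error visible to the caller */
	return ret;
}
```

Because of FALLOC_FL_KEEP_SIZE, a successful call such as
punch_hole("testfile", 0, 500 * 1024 * 1024) leaves the size reported by
stat unchanged; what should drop is the number of allocated blocks.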

cheers,
Pádraig.

Re: [PATCH -alternative] mm: hugetlbfs: Close race during teardown of hugetlbfs shared page tables V2 (resend)

2012-07-26 Thread Larry Woodman


On 07/26/2012 01:42 PM, Rik van Riel wrote:

On 07/23/2012 12:04 AM, Hugh Dickins wrote:


Please don't be upset if I say that I don't like either of your patches.
Mainly for obvious reasons - I don't like Mel's because anything with
trylock retries and nested spinlocks worries me before I can even start
to think about it; and I don't like Michal's for the same reason as Mel,
that it spreads more change around in common paths than we would like.


I have a naive question.

In huge_pmd_share, we protect ourselves by taking
the mapping->i_mmap_mutex.

Is there any reason we could not take the i_mmap_mutex
in the huge_pmd_unshare path?


I think it is already taken on every path into huge_pmd_unshare().

Larry


I see that hugetlb_change_protection already takes that
lock. Is there something preventing __unmap_hugepage_range
from also taking mapping->i_mmap_mutex?

That way the sharing and the unsharing code are
protected by the same, per shm segment, lock.



Re: linux-next: manual merge of the akpm tree with the tip tree

2012-07-26 Thread Andrew Morton

On Wed, 25 Jul 2012 12:26:13 -0700
Andrew Morton  wrote:

> On Wed, 25 Jul 2012 21:03:51 +0200
> Ingo Molnar  wrote:
> 
> > > This means that if the above code reappears in linux-next or 
> > > mainline, the current copy of 
> > > mm-memcg-fix-compaction-migration-failing-due-to-memcg-limits.patch 
> > > will no longer update it, and I probably won't notice that 
> > > omission.
> > 
> > Did you plan to send Johannes's memcg bits to Linus in this 
> > merge window?
> 
> Yes.  I was kinda thinking of starting the bombing run on Monday but I
> guess I could do the MM queue on Thursday.

Sorry, this didn't work out: there's still too much stuff which hasn't
gone into mainline yet (slab, NFS, others).  Merging the MM code now
would involve a worrying amount of last-minute code rework and would
cause the owners of those trees to have to do last-minute rework as
well.  This is why I always go last.

I'll take another look on Monday.

Re: [PATCH v3 0/7] ZPODD patches for scsi tree

2012-07-26 Thread Jeff Garzik


On 07/26/2012 10:41 AM, Aaron Lu wrote:

On Thu, Jul 26, 2012 at 09:43:37AM -0400, Jeff Garzik wrote:

On 07/26/2012 06:05 AM, Aaron Lu wrote:

I can't set a flag in libata-acpi.c since a related function is
missing in scsi-misc tree. Will fix this when 3.6-rc1 released.



What does this mean?  Would you be more specific?



The patch "libata-acpi: add ata port runtime D3Cold support" by Lin Ming
introduced a function ata_acpi_wake_dev in libata-acpi.c, and only lives
in libata-next tree but not scsi-misc tree.

[...]

Another minor issue is, I need to use the can_power_off and
wakeup_by_user flag of the scsi_device structure in sr patches, but
they are all introduced in patches in libata-next tree, so I have to
re-define them in this patch set. Will cause conflict if James send
these sr patches to Linus. Any way to avoid this?



Linus said he just merged the libata patches, so they should appear in 
the torvalds/linux.git as soon as he pushes out (in the next 12 hours, 
I'm guessing).


Up to James how he wants to coordinate after that...  he might pull in 
Linus's tree into scsi-misc or another solution.


Jeff



regulators: creating a regulator device for the AC/USB/BAT/charge component of a PMIC?

2012-07-26 Thread Stephen Warren

Mark, Liam,

A couple of the regulators I'm looking at (I guess many/most in fact)
are structured as:

Battery, AC, USB, ... -> PMIC -> main output (unregulated?)

main output -> PMIC input pins for some of the SW-controllable
regulators. This is an external connection on the board.

Should this "main output" be represented as a regulator itself?

In more graphical/concrete terms, take the TPS6586x:

+---+
|   |
AC  --> | \ |
USB --> |  |--> SYS | >---\
BAT --> | / | |
|   VIN_SM0 | <---/
| v |
|   SM0 OUT | ---> other devices
...

... where SM0 is one of the regulators the driver already exposes.

I assume SYS should be an explicit regulator device, because all the
other regulators within the PMIC can be set up to require that a supply
be specified (in the DT, a vin-sm0-supply property is mandatory for the
TPS6586x driver), so some regulator object must exist and be provided as
the supply.

The alternative would be to ignore this aspect of the
PMIC, and just create a standalone fixed regulator to act as the supply
for the SM0 regulator. However, this doesn't seem like an accurate model
of the HW.

However, some of the regulators in the TPS6586x at least are fed
directly from the SYS output by an internal connection within the PMIC
(e.g. LDO5). Currently, the driver sets up these regulators as having no
supply, which seems wrong too. Presumably the PMIC driver should
internally hook up its SYS as LDO5's supply without needing any platform
data or DT ldo5-supply property to do this?

What are your thoughts here?

Re: [RFC PATCH 12/13] driver core: firmware loader: use small timeout for cache device firmware

2012-07-26 Thread Borislav Petkov

On Thu, Jul 26, 2012 at 11:48:17PM +0800, Ming Lei wrote:
> On Thu, Jul 26, 2012 at 8:36 PM, Borislav Petkov  wrote:
> > On Wed, Jul 25, 2012 at 01:00:12AM +0800, Ming Lei wrote:
> >> Because device_cache_firmwares only cache the firmware which has been
> >> loaded sucessfully at leat once, using a small loading timeout should
> >
> > least
> >
> >> be OK.
> >
> > Your commit message doesn't explain why exactly we decrease the timeout:
> 
> I have explained it. Because the firmware has been loaded successfully at
> least once, it is very unlikely to time out.
> 
> > you should probably say that this patch overrides the default 60s
> > timeout because we're in pre-suspend/-hibernate mode where we have
> > userspace and are able to load the firmware quickly.
> 
> No, it is not what I was saying.

Ok, maybe I'm not understanding this then. So explain to me this: why
do you need that timeout value of 10, how did we decide it to be 10
(and not 20 or 30 or whatever)? Generally, why do we need to reprogram
the timer to a smaller timeout instead of simply doing the completion
without a timeout?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Re: [BUG] Problem with commit cf03c5dac83609f09d9f0e9fa3c09d86daed614d

2012-07-26 Thread Dirk Gouders

Seth Forshee  writes:

> On Thu, Jul 26, 2012 at 05:07:57PM +0200, Dirk Gouders wrote:
>> Hi Seth,
>> 
>> thanks for your reply and sorry for the noise.
>> 
>> I followed your advice and tried to boot with the WLAN interface turned
>> off, and the problem still exists.  I'll start a new bisect session,
>> probably with one of the commits you mentioned as the first good commit.
>
> Just to make sure there's not any confusion ...
>
> What I was suggesting was not just disabling the network interface but
> completely preventing brcmsmac from being loaded. The oops you saw
> happens in the context of the driver's probe function and would happen
> regardless of whether or not the interface is enabled.

Sorry for the confusion.

I already started a new bisect session.  With a bad commit, I disabled
brcmsmac in the kernel config and the problem still exists.  Now I will
continue with brcmsmac disabled and see where the bisect ends.

Dirk

[GIT PATCH] char/misc patches for 3.6-rc1

2012-07-26 Thread Greg KH

The following changes since commit 84a1caf1453c3d44050bd22db958af4a7f99315c:

  Linux 3.5-rc7 (2012-07-14 15:40:28 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git/ tags/char-misc-3.6-rc1

for you to fetch changes up to 6078188e2ba1d61a2119ddb2289e88c2c2a015ab:

  mei: use module_pci_driver (2012-07-19 15:33:30 -0700)


CHAR/MISC patches for 3.6-rc1

Here's the "big" pull request for 3.6-rc1 for the char/misc drivers.

It's really just a few updates to the mei driver, plus 4 other tiny patches,
nothing big at all.

Signed-off-by: Greg Kroah-Hartman 


Alexandre Pereira da Silva (1):
  misc: at25: Parse dt settings

Camuso, Tony (1):
  misc: hpilo: increase number of max supported channels

Devendra Naga (1):
  powerpc/BSR: cleanup the error path of bsr_init

Greg Kroah-Hartman (1):
  Merge 3.5-rc7 into char-misc-next.

Tomas Winkler (14):
  mei: mei.txt: minor grammar fixes
  mei: check for error codes that mei_flow_ctrl_creds retuns
  mei: make mei_write_message more readable
  mei: mei_irq_thread_write_handler check for overflow
  mei: group wd_interface_reg with watchdog variables within struct mei_device
  mei: don't query HCSR for host buffer depth
  mei: revamp host buffer interface function
  mei: mei_device can be const for mei register access functions
  mei: remove write only wariable wd_due_counter
  mei: mei_wd_host_init: update the comment
  mei: introduce mei_data2slots wrapper
  mei: streamline the _mei_irq_thread_close/ioctol functions
  mei: mei_irq_thread_write_handler - line break fix
  mei: use module_pci_driver

 Documentation/devicetree/bindings/misc/at25.txt |   21 +++
 Documentation/misc-devices/mei/mei.txt  |   14 +-
 drivers/char/bsr.c  |6 +-
 drivers/misc/eeprom/at25.c  |   61 +---
 drivers/misc/hpilo.c|   33 +++--
 drivers/misc/hpilo.h|4 +-
 drivers/misc/mei/init.c |4 +-
 drivers/misc/mei/interface.c|   85 +---
 drivers/misc/mei/interface.h|   18 ++-
 drivers/misc/mei/interrupt.c|  169 ++-
 drivers/misc/mei/iorw.c |8 +-
 drivers/misc/mei/main.c |   48 +--
 drivers/misc/mei/mei_dev.h  |   24 ++--
 drivers/misc/mei/wd.c   |6 +-
 14 files changed, 242 insertions(+), 259 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/misc/at25.txt

[GIT PATCH] Driver core merge for 3.6-rc1

2012-07-26 Thread Greg KH

The following changes since commit 84a1caf1453c3d44050bd22db958af4a7f99315c:

  Linux 3.5-rc7 (2012-07-14 15:40:28 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git/ tags/driver-core-3.6-rc1

for you to fetch changes up to 6791457a090d9a234a40b501c2536f0aefaeae4b:

  printk: Export struct log size and member offsets through vmcoreinfo (2012-07-19 17:14:18 -0700)


Driver core merge for 3.6-rc1

Here's the big driver core pull request for 3.6-rc1.

Unlike 3.5, this kernel should be a lot tamer, with the printk changes now
settled down.  All we have here is some extcon driver updates, w1 driver
updates, a few printk cleanups that weren't needed for 3.5, but are good to
have now, and some other minor fixes/changes in the driver core.

All of these have been in the linux-next releases for a while now.

Signed-off-by: Greg Kroah-Hartman 


Andrew Morton (1):
  sysfs: fail dentry revalidation after namespace change fix

Arend van Spriel (1):
  debugfs: change parameter check in debugfs_remove() functions

Axel Lin (2):
  extcon: Set platform drvdata in gpio_extcon_probe() and fix irq leak
  extcon: Convert extcon_gpio to devm_gpio_request_one

Chanwoo Choi (1):
  extcon: MAX77693: Add extcon-max77693 driver to support Maxim MAX77693 MUIC device

Devendra Naga (1):
  w1: cleanup w1_uevent

Glauber Costa (1):
  sysfs: fail dentry revalidation after namespace change

Greg Kroah-Hartman (3):
  Revert "w1: introduce a slave mutex for serializing IO"
  Merge v3.5-rc5 into driver-core-next
  Merge 3.5-rc7 into driver-core-next

Hans de Goede (1):
  device-core: Ensure drvdata = NULL when no driver is bound

K. Y. Srinivasan (1):
  Drivers: hv: Change the hex constant to a decimal constant

Kay Sievers (4):
  kmsg - properly print over-long continuation lines
  kmsg - avoid warning for CONFIG_PRINTK=n compilations
  kmsg - export "continuation record" flag to /dev/kmsg
  kmsg - do not flush partial lines when the console is busy

Lars-Peter Clausen (2):
  driver-core: Move kobj_to_dev from genhd.h to device.h
  driver-core: Use kobj_to_dev instead of re-implementing it

Mark Brown (5):
  Extcon: Staticise extcon_class
  Extcon: Arizona: Add driver for Wolfson Arizona class devices
  driver core: Move deferred devices to the end of dpm_list before probing
  extcon: arizona: Update cable reporting calls and split headset
  extcon: arizona: Stop microphone detection if we give up on it

Markus Franke (1):
  w1: Add 1-wire slave device driver for DS28E04-100

Mel Gorman (1):
  stable: Allow merging of backports for serious user-visible performance issues

Ming Lei (1):
  driver core: fix shutdown races with probe/remove(v3)

MyungJoo Ham (1):
  MAINTAINERS: Add entries for extcon (external connector) subsystem.

NeilBrown (4):
  w1: introduce a slave mutex for serializing IO
  w1: omap_hdq: Fix some error/debug handling.
  w1: omap_hdq: use wait_event_timeout to wait for read to complete.
  W1: split master mutex to avoid deadlocks.

Otavio Salvador (1):
  w1: Fix a typo in 'hardware' word

Paul Gortmaker (1):
  stable: update references to older 2.6 versions for 3.x

Peter Meerwald (1):
  extcon: spelling of detach in function doc

Rabin Vincent (1):
  driver core: always handle dpm_order

Rafael J. Wysocki (1):
  PM / Runtime: Do not increment device usage counts before probing

Randy Dunlap (1):
  driver core: fix some kernel-doc warnings in dma*.c

Sebastian Ott (2):
  driver core: move uevent call to driver_register
  driver core: don't trigger uevent after failure

Vivek Goyal (1):
  printk: Export struct log size and member offsets through vmcoreinfo

 Documentation/ABI/stable/sysfs-driver-w1_ds28e04 |   15 +
 Documentation/ABI/testing/dev-kmsg   |   29 +-
 Documentation/stable_kernel_rules.txt|   19 +-
 Documentation/w1/slaves/w1_ds28e04   |   36 +
 MAINTAINERS  |8 +
 drivers/base/bus.c   |1 -
 drivers/base/core.c  |   71 +-
 drivers/base/dd.c|   20 +-
 drivers/base/dma-buf.c   |1 +
 drivers/base/dma-coherent.c  |1 +
 drivers/base/driver.c|6 +-
 drivers/base/firmware_class.c|6 +-
 drivers/extcon/Kconfig   |   18 +
 drivers/extcon/Makefile  |2 +
 drivers/extcon/extcon-arizona.c  |  490 ++
 drivers/extcon/extcon-max77693.c |  779 ++
 drivers/extcon/extcon_class.c

Re: [RFC PATCH 08/13] driver core: firmware loader: fix device lifetime

2012-07-26 Thread Borislav Petkov

On Thu, Jul 26, 2012 at 11:44:48PM +0800, Ming Lei wrote:
> On Thu, Jul 26, 2012 at 8:20 PM, Borislav Petkov  wrote:
> >
> > Ok, here's what I got from looking at the patch:
> >
> > Your commit message says: "Also request_firmware_nowait should be called
> > in atomic context now, so fix the obsolete comments."
> >
> > Atomic context in my book means you're not allowed to sleep at all.
> 
> In fact, I mean the function can be called in atomic context now, and
> I know some time ago the function will create kthread to execute
> the request_firmware, and atomic context is not allowed.

Right, but when called with GFP_KERNEL mask, it can sleep, right?

> > But the comment says that it is possible to sleep a little. This is very
> > wrongly formulated AFAICT.
> 
> The function can be run in both contexts, and I don't see any words which
> says the function will sleep.

"
...
 *  Asynchronous variant of request_firmware() for user contexts where
 *  it is not possible to sleep for long time.
 **/
"

Not possible to sleep for a long time means the function still *can*
sleep... even for a short time. For a certain definition of "short."

> > But, since request_firmware_nowait receives a GFP mask as one of its
> > arguments and some of its callers don't supply GFP_ATOMIC then this
> > has nothing to do with atomic contexts at all. Then, you should simply
> > explain in the comment why exactly callers aren't allowed to be sleeping
> > for a long time. And using adjectives like "long" or "short" is very
> > misleading in such explanations so please be more specific as to why the
> 
> It is the original one, and I don't think it is wrong. Also it
> shouldn't be covered
> by this patch.
> 
> Maybe I shouldn't have fixed the comment in this patch.

Why, simply fix the comment to adhere to what the function does. And
since it can sleep, maybe the easiest fix is to say simply that:
"function can sleep", right?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

Re: [tip:x86:fpu 2/2] arch/x86/kernel/signal.c:626:4: error: implicit declaration of function '__setup_frame'

2012-07-26 Thread Suresh Siddha

On Thu, 2012-07-26 at 07:27 +0800, Fengguang Wu wrote:
> Hi Suresh,
> 
> Kernel build failed on
> 
> tree:   tip/x86/fpu x86/fpu
> head:   29221d4b89d4e50f05ade42ad3b22e92bb564ca4
> commit: 29221d4b89d4e50f05ade42ad3b22e92bb564ca4 [2/2] x86, fpu: Unify signal 
> handling code paths for x86 and x86_64 kernels
> config: x86_64-randconfig-s003 (attached as .config)
> 
> All related error/warning messages:
> 
> arch/x86/kernel/signal.c: In function 'setup_rt_frame':
> arch/x86/kernel/signal.c:626:4: error: implicit declaration of function 
> '__setup_frame' [-Werror=implicit-function-declaration]
> cc1: some warnings being treated as errors
> --
> arch/x86/kernel/xsave.c: In function 'save_fsave_header':
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: 'X86_FXSR_MAGIC' undeclared (first use 
> in this function)
> arch/x86/kernel/xsave.c:145:7: note: each undeclared identifier is reported 
> only once for each function it appears in
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c:145:7: error: dereferencing pointer to incomplete type
> arch/x86/kernel/xsave.c: In function 'save_user_xstate':
> arch/x86/kernel/xsave.c:209:15: warning: ignoring return value of 
> '__clear_user', declared with attribute warn_unused_result [-Wunused-result]
> 

Appended the patch for this. Thanks!
---
From: Suresh Siddha 
Subject: x86, fpu: fix x86_64 build without CONFIG_IA32_EMULATION

Fengguang's automated build reported some compilation failures:
> arch/x86/kernel/signal.c: In function 'setup_rt_frame':
> arch/x86/kernel/signal.c:626:4: error: implicit declaration of function 
> '__setup_frame'
> arch/x86/kernel/xsave.c: In function 'save_fsave_header':
> arch/x86/kernel/xsave.c:144:7: error: dereferencing pointer to incomplete type
> ...

Fix x86_64 kernel build without CONFIG_IA32_EMULATION.

Code saving the fsave prefix is applicable only for CONFIG_X86_32 or
CONFIG_IA32_EMULATION. Use config_enabled() checks to remove the unnecessary
code at compile time for x86_64 kernels built without CONFIG_IA32_EMULATION.

Also while we are at this, fix a spurious warning:
> arch/x86/kernel/xsave.c:209:15: warning: ignoring return value of 
> ‘__clear_user’, declared with attribute warn_unused_result

Signed-off-by: Suresh Siddha 
---
 arch/x86/include/asm/fpu-internal.h |2 +-
 arch/x86/kernel/xsave.c |   10 --
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h 
b/arch/x86/include/asm/fpu-internal.h
index 35ad161..5779184 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -22,7 +22,7 @@
 #include 
 #include 
 
-#ifdef CONFIG_IA32_EMULATION
+#ifdef CONFIG_X86_64
 # include 
 # include 
 int ia32_setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 2917e34..a23d100 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -205,8 +205,8 @@ static inline int save_user_xstate(struct xsave_struct 
__user *buf)
else
err = fsave_user((struct i387_fsave_struct __user *) buf);
 
-   if (unlikely(err))
-   __clear_user(buf, xstate_size);
+   if (unlikely(err) && __clear_user(buf, xstate_size))
+   err = -EFAULT;
return err;
 }
 
@@ -236,6 +236,9 @@ int save_xstate_sig(void __user *buf, void __user *buf_fx, 
int size)
struct task_struct *tsk = current;
int ia32_fxstate = (buf != buf_fx);
 
+
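
The __clear_user hunk above can be illustrated outside the kernel. What
follows is only a sketch, assuming GCC or Clang (for the warn_unused_result
attribute). fake_clear_user() and save_state() are hypothetical stand-ins
invented for this example, not the real kernel functions; the stand-in
returns the number of bytes it could not clear, 0 on success.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for __clear_user(): returns the number of bytes
 * that could NOT be cleared, i.e. 0 on success. Ignoring the result
 * triggers a compiler warning because of the attribute.
 */
__attribute__((warn_unused_result))
unsigned long fake_clear_user(void *buf, unsigned long n)
{
	if (!buf)
		return n;	/* simulate a faulting destination */
	memset(buf, 0, n);
	return 0;
}

/*
 * Same shape as the patched error path: if saving failed, try to clear
 * the buffer; if clearing fails as well, report -EFAULT. Otherwise the
 * original error (or 0) is returned unchanged.
 */
int save_state(void *buf, unsigned long size, int save_err)
{
	int err = save_err;

	if (err && fake_clear_user(buf, size))
		err = -EFAULT;
	return err;
}
```

Because the return value now takes part in the condition, the warning goes
away without a cast, and a failed clear is no longer silently dropped.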

Re: [PATCH 02/17] perf: Add ability to attach user level registers dump to sample

2012-07-26 Thread Stephane Eranian

On Wed, Jul 25, 2012 at 8:27 PM, Jiri Olsa  wrote:
> On Wed, Jul 25, 2012 at 07:39:18PM +0200, Stephane Eranian wrote:
>> On Sun, Jul 22, 2012 at 2:14 PM, Jiri Olsa  wrote:
>
> SNIP
>
>> > +   if (sample_type & PERF_SAMPLE_REGS_USER) {
>> > +   u64 avail = (data->regs_user != NULL);
>> > +
>> > +   /*
>> > +* If there are no regs to dump, notice it through
>> > +* first u64 being zero.
>> > +*/
>> > +   perf_output_put(handle, avail);
>> > +
>> The only role of avail is to report whether or not you've captured actual
>> registers. Could it be used to report the sampled process ABI (32 vs. 64)
>> instead? Something like:
>>   PERF_SAMPLE_REGS_ABI_NONE -> no regs captured (emulate your
>> current behavior)
>>   PERF_SAMPLE_REGS_ABI_32 -> 32 bit ABI regs captured
>>   PERF_SAMPLE_REGS_ABI_64 -> 64 bit ABI regs captured
>>
>> That could help the tools interpret the register values.
>
> Yes, I think that could help once we start dealing with compat tasks.
>
You don't control whether or not you capture compat tasks. So
you have to deal with those right now.

> The current userspace code stays untouched, because it checks for
> 'avail != 0', which stays even with your change.
>
> I think this could be sent later with all other fixes I'm already
> working on. But I can work/send it preferentially before whole patchset
> is taken if you like.
>
Well, why not do it now? You'd have to rename the avail field
to something more sensible. It also needs to be prepared for future
extensions, should they ever become necessary.
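
For illustration, the tool side of the ABI idea might look roughly like the
sketch below. The three constant names are the ones proposed above; their
numeric values (0/1/2) and the helpers regs_present() and regs_abi_name()
are assumptions made up for this example and are not taken from the
patchset.

```c
#include <stdint.h>

/* Names as proposed in this thread; the values are assumed. */
enum perf_sample_regs_abi {
	PERF_SAMPLE_REGS_ABI_NONE = 0,	/* no regs captured */
	PERF_SAMPLE_REGS_ABI_32   = 1,	/* 32 bit ABI regs captured */
	PERF_SAMPLE_REGS_ABI_64   = 2,	/* 64 bit ABI regs captured */
};

/*
 * Hypothetical helper: the existing "first u64 != 0" check keeps
 * working as long as NONE stays 0.
 */
int regs_present(uint64_t abi)
{
	return abi != PERF_SAMPLE_REGS_ABI_NONE;
}

/* Hypothetical helper: map the leading u64 to a printable ABI name. */
const char *regs_abi_name(uint64_t abi)
{
	switch (abi) {
	case PERF_SAMPLE_REGS_ABI_NONE:
		return "none";
	case PERF_SAMPLE_REGS_ABI_32:
		return "32-bit";
	case PERF_SAMPLE_REGS_ABI_64:
		return "64-bit";
	default:
		return "unknown";	/* room for future extensions */
	}
}
```

Keeping NONE at zero is what lets userspace that only tests for a non-zero
first word stay untouched.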
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 6/7] kdb: Mark safe commands as KDB_SAFE and KDB_SAFE_NO_ARGS

2012-07-26 Thread Anton Vorontsov

On Thu, Jul 26, 2012 at 06:07:09PM +0100, Alan Cox wrote:
> > The following commands were marked as "safe":
> > 
> > Clear Breakpoint
> > Enable Breakpoint
> > Disable Breakpoint
> > Display exception frame
> > Stack traceback
> 
> This is sufficient to steal cryptographic keys in many environments. In
> fact you merely need two or three breakpoints and to log the order they
> are hit through the crypto computation.

Neat. :-)

Breakpoints are no good then.

> > Display stack for process
> 
> Exposes all sorts of user data unless you mean just the call trace, in
> which case it's still quite useful.
> 
> > Display stack all processes
> 
> Ditto

What I think is: if we just mark single stepping (as well as
breakpoints) as unsafe, wouldn't it then be hard to impossible to get
anything meaningful out of the call trace?

> > Send a signal to a process
> 
> Like say sending SIGSTOP to security monitoring threads or the battery
> manager on locked devices that rely on software battery management ?

Yeah, will need to zap it too.

> It's an interesting idea but you need almost nothing to extract keys from
> a system or to subvert it.

Apart from the above issues?

(Now it might seem that we cut almost everything from KDB, but KDB is
not just about ordinary debugging facilities like breakpoints or
variable watches. KDB is a shell that also implements commands to query
the kernel about its state: e.g. in the Android case there is an "irqs"
command that just shows interrupt counters, which is a nice feature if
used w/ the KDB NMI/FIQ debugger[1], so you can see which interrupt is
misbehaving. There is also a 'dmesg' command, and 'summary' and maybe others.)

Thanks!

[1] http://lwn.net/Articles/506673/

-- 
Anton Vorontsov
Email: cbouatmai...@gmail.com

Re: [PATCH] uprobes: don't enable/disable signle step if the user did it

2012-07-26 Thread Sebastian Andrzej Siewior


On 07/26/2012 05:20 PM, Sebastian Andrzej Siewior wrote:

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f935327..772eb3a 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1528,7 +1528,10 @@ static void handle_swbp(struct pt_regs *regs)

utask->state = UTASK_SSTEP;
if (!pre_ssout(uprobe, regs, bp_vaddr)) {
-   user_enable_single_step(current);
+   if (test_tsk_thread_flag(current, TIF_SINGLESTEP))
+   uprobe->flags |= UPROBE_USER_SSTEP;
+   else
+   user_enable_single_step(current);


After looking at it for a bit I noticed that the state should be saved
in utask instead of uprobe, because the uprobe might be shared with
another task.
I will resend the fixed patch unless someone comes up with something
else.

Sebastian
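
A small user-space model of why the flag belongs in per-task state. This is
only a sketch: struct task_state, TASK_USER_SSTEP, pre_step() and
post_step() are hypothetical names that mimic the shape of the kernel
objects, and the post_step() half is my assumption about how the saved
state would be consumed later.

```c
#include <stdbool.h>

#define TASK_USER_SSTEP 0x1u	/* hypothetical per-task flag */

/* Per-task state, standing in for utask. One instance per task. */
struct task_state {
	bool user_singlestep;	/* models TIF_SINGLESTEP set by a debugger */
	unsigned int flags;	/* per-task flags, not shared */
	bool kernel_sstep;	/* models user_enable_single_step() */
};

/* Before stepping over the probed instruction. */
void pre_step(struct task_state *t)
{
	if (t->user_singlestep)
		t->flags |= TASK_USER_SSTEP;	/* remember: user owns it */
	else
		t->kernel_sstep = true;
}

/* After the step: undo only what pre_step() itself enabled. */
void post_step(struct task_state *t)
{
	if (t->flags & TASK_USER_SSTEP)
		t->flags &= ~TASK_USER_SSTEP;	/* leave user's setting alone */
	else
		t->kernel_sstep = false;
}
```

If the flag lived in the shared probe object instead, two tasks hitting the
same probe, one of them under a debugger, would overwrite each other's
record of who enabled single stepping.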

Re: [PATCH] Shorten constant names for EFI variable attributes

2012-07-26 Thread Matthew Garrett

On Thu, Jul 26, 2012 at 11:28:32AM -0600, Khalid Aziz wrote:

> I also do not believe that kernel must use the constant names
> mentioned in the specification especially when the name reaches 50
> characters. We can not get away from having to create aliases. Do
> you think having aliases in efi.h can cause mixed use of long names
> and short names in future code in the kernel? Can we address this by
> suggesting to future code authors that they should use the short
> names in their code? Should we consider inclusion of this patch in
> the kernel?

I'd be surprised if it were a problem - we should catch any of those 
cases in code review, or gate the aliases under #ifndef __KERNEL__

-- 
Matthew Garrett | mj...@srcf.ucam.org

Re: [PATCH v8 0/7] power management patch set

2012-07-26 Thread Kumar Gala


On Jul 26, 2012, at 9:02 AM, Li Yang wrote:

> On Fri, Jul 20, 2012 at 8:42 PM, Zhao Chenhui
>  wrote:
>> Changes for v8:
>> * Separated the cpu hotplug patch into three patches, as follows
>> [PATCH v8 1/7] powerpc/smp: use a struct epapr_spin_table to replace macros
>> [PATCH v8 2/7] powerpc/smp: add generic_set_cpu_up() to set cpu_state as 
>> CPU_UP_PREPARE
>> [PATCH v8 4/7] powerpc/85xx: add HOTPLUG_CPU support
>> 
>> * Replaced magic numbers with macros in "[PATCH 5/7] powerpc/85xx: add sleep 
>> and deep sleep support"
>> 
>> * no change to the rest of the patch set
> 
> Hi Kumar,
> 
> How about picking up this series for 3.6?  The review seems to
> have settled down for this revision.

It's too late for 3.6, but will look at queuing it up for 3.7.

> Hi Scott,
> 
> Thanks for the review comments provided.  We'd like to get the ACK
> from you for the series if you can.
> 
> Regards,
> Leo

- k
