Re: [dm-devel] [PATCH v5 12/12] block: Only clone bio vecs that are in use

2012-08-09 Thread Tejun Heo
Hello,

On Wed, Aug 08, 2012 at 04:47:46PM -0700, Muthu Kumar wrote:
> You are changing the meaning of __bio_clone() here. In old code, the
> number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
> code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dest's
> bi_iovec[0] and also restricting the number of allocated io_vecs of
> the clone. It may be useful for cases where we would like an identical
> copy of the original bio (may not be in current code base, but this
> implementation is definitely not what one would expect from the name
> "clone").

Implementation details changed somewhat but the high-level semantics
didn't change at all.  Any driver not messing with bio internals - and
they shouldn't - shouldn't notice the change.  No in-kernel drivers
seem to be broken by the change.  If you ask me, this looks more like
a bug fix to me where the bug is a silly behavior restricting
usefulness of the interface.
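
To make the two behaviours concrete, here is a minimal userspace sketch.
It is not kernel code: toy_bio, toy_vec and the toy_* functions are made
up for illustration, and only the field names bi_idx and bi_vcnt are
borrowed from the thread. It assumes a consumer that only ever looks at
the vecs from bi_idx onwards, which is the "not messing with bio
internals" case.

```c
#include <string.h>

#define TOY_MAX_VECS 8

/* Toy stand-ins; these are NOT the kernel's struct bio / struct bio_vec. */
struct toy_vec { int page; };

struct toy_bio {
	unsigned short bi_idx;   /* first vec still to be processed */
	unsigned short bi_vcnt;  /* number of valid vecs */
	struct toy_vec vecs[TOY_MAX_VECS];
};

/* Old behaviour: copy every vec and keep bi_idx/bi_vcnt unchanged. */
static void toy_clone_all(struct toy_bio *dst, const struct toy_bio *src)
{
	memcpy(dst->vecs, src->vecs, src->bi_vcnt * sizeof(src->vecs[0]));
	dst->bi_idx = src->bi_idx;
	dst->bi_vcnt = src->bi_vcnt;
}

/* New behaviour: copy only vecs [bi_idx, bi_vcnt), so src's vec at
 * bi_idx becomes dst's vec 0. */
static void toy_clone_in_use(struct toy_bio *dst, const struct toy_bio *src)
{
	unsigned short n = src->bi_vcnt - src->bi_idx;

	memcpy(dst->vecs, src->vecs + src->bi_idx, n * sizeof(src->vecs[0]));
	dst->bi_idx = 0;
	dst->bi_vcnt = n;
}

/* What a well-behaved consumer sees: the first vec left to process. */
static int toy_first_page(const struct toy_bio *b)
{
	return b->vecs[b->bi_idx].page;
}
```

With a source at bi_idx = 2 of 4 vecs, both clones hand the same first
page to such a consumer; only code that reads bi_idx or bi_vcnt directly
can tell them apart.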

> Maybe call this new implementation something else (and use it for bcache)?

This doesn't only change __bio_clone() but all clone interfaces stacked
on top of it, so, no way.  This ain't windows.

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] ALSA: hda/realtek - Keep analog bypass muted on ALC892

2012-08-09 Thread Takashi Iwai
At Thu,  9 Aug 2012 08:47:19 +0200,
Thierry Reding wrote:
> 
> Unmuting the analog bypass causes any input to be directly bypassed to
> the outputs, which is usually not expected by the user.
> 
> Signed-off-by: Thierry Reding 
> ---
> Note that this patch could be generalized to include all codec variants
> that have this second channel. As I don't have access to all the data
> sheets I'll leave that decision up to somebody with more hardware
> available.

Hmm, are you really sure that this is an exclusive switch between the two?
All Realtek codecs with this mixer widget mix the two sources together.
Otherwise it would not be a "mixer" widget at all; it would be a
"selector" widget.

And I haven't received a bug report about this, although the ALC892 is
not a new chip and the auto-parser has been enabled for a long time.
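
The mixer/selector distinction can be sketched in a few lines of
userspace C. This is a toy model only: toy_input and the toy_* functions
are hypothetical and have nothing to do with the driver's real data
structures. It assumes a mixer sums every unmuted input, while a
selector passes exactly one input through.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one widget input; not the HD-audio driver's structures. */
struct toy_input {
	int level;   /* signal level arriving on this input */
	bool muted;  /* per-input mute, as on a mixer's input amps */
};

/* Mixer: the output is the sum of every unmuted input. */
static int toy_mixer_out(const struct toy_input *in, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++)
		if (!in[i].muted)
			sum += in[i].level;
	return sum;
}

/* Selector: exactly one input, chosen by index, reaches the output. */
static int toy_selector_out(const struct toy_input *in, size_t n, size_t sel)
{
	return sel < n ? in[sel].level : 0;
}
```

In this model, unmuting the second mixer input adds its signal to the
first rather than replacing it, which is why muting input 1 and
selecting input 0 are not the same operation.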


thanks,

Takashi


> 
>  sound/pci/hda/patch_realtek.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> index 4f81dd4..54d0917 100644
> --- a/sound/pci/hda/patch_realtek.c
> +++ b/sound/pci/hda/patch_realtek.c
> @@ -3752,8 +3752,15 @@ static void alc_auto_set_output_and_unmute(struct hda_codec *codec,
>   if (nid_has_mute(codec, mix, HDA_INPUT)) {
>   snd_hda_codec_write(codec, mix, 0, AC_VERB_SET_AMP_GAIN_MUTE,
>   AMP_IN_UNMUTE(0));
> - snd_hda_codec_write(codec, mix, 0, AC_VERB_SET_AMP_GAIN_MUTE,
> - AMP_IN_UNMUTE(1));
> + /*
> +  * On ALC892, the second entry in the connection list is the
> +  * analog bypass. Unmuting it will cause any inputs to be
> +  * directly forwarded to the corresponding outputs.
> +  */
> + if (codec->subsystem_id != 0x10ec0892)
> + snd_hda_codec_write(codec, mix, 0,
> + AC_VERB_SET_AMP_GAIN_MUTE,
> + AMP_IN_UNMUTE(1));
>   }
>   /* initialize volume */
>   nid = alc_look_for_out_vol_nid(codec, pin, dac);
> -- 
> 1.7.11.4
> 


Re: [PATCH v5 10/12] block: Add bio_clone_kmalloc()

2012-08-09 Thread Kent Overstreet
On Wed, Aug 08, 2012 at 11:55:04PM -0700, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 06:57:04PM -0700, Kent Overstreet wrote:
> > That means bio_clone_kmalloc will just become:
> > 
> > static inline struct bio *bio_clone_kmalloc(struct bio *bio,
> > gfp_t gfp_mask)
> > {
> > > return bio_clone_bioset(bio, gfp_mask, BIO_KMALLOC_POOL);
> > }
> > 
> > (or maybe NULL there, I think using NULL for the interface makes sense,
> > I just don't want to use it for bi_pool).
> > 
> > Do you still want the /** for a one line wrapper like that?
> 
> I don't know.  But do you think you can do similar thing to alloc
> interface too?

Already did:

commit 313e0a46b1681a8e02b2fe9a86cfc3b82599be58
Author: Kent Overstreet 
Date:   Wed Aug 8 20:30:16 2012 -0700

block: Add bio_clone_bioset(), bio_clone_kmalloc()

Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch added.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet 
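
The shape of that change, one implementation that takes the pool as a
parameter plus thin wrappers that pick a pool, can be sketched in
userspace as follows. All toy_* names are hypothetical. Passing NULL for
"no pool, use the plain allocator" mirrors the option floated earlier in
this thread and is an assumption here, not necessarily what the final
patch does.

```c
#include <stddef.h>

/* Toy stand-ins: toy_pool plays the role of a bio_set, toy_obj of a bio. */
struct toy_pool { const char *name; };

struct toy_obj {
	int payload;
	const struct toy_pool *from;  /* pool the clone was taken from */
};

static struct toy_pool toy_default_pool = { "default" };

/* The single real implementation: the pool is a parameter.
 * pool == NULL is taken to mean "plain allocator, no pool". */
static int toy_clone_pool(struct toy_obj *dst, const struct toy_obj *src,
			  const struct toy_pool *pool)
{
	if (!dst || !src)
		return -1;
	dst->payload = src->payload;
	dst->from = pool;
	return 0;
}

/* Wrapper for the common case: always use the default pool. */
static inline int toy_clone(struct toy_obj *dst, const struct toy_obj *src)
{
	return toy_clone_pool(dst, src, &toy_default_pool);
}

/* Wrapper for callers that want no pool at all. */
static inline int toy_clone_plain(struct toy_obj *dst, const struct toy_obj *src)
{
	return toy_clone_pool(dst, src, NULL);
}
```

Callers that used to open code "allocate, then copy" can then call
whichever wrapper matches their pool, and the copy logic lives in one
place.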

diff --git a/block/blk-core.c b/block/blk-core.c
index e9058c2..10a6e08 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2768,16 +2768,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
blk_rq_init(NULL, rq);
 
__rq_for_each_bio(bio_src, rq_src) {
-   bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+   bio = bio_clone_bioset(bio_src, gfp_mask, bs);
if (!bio)
goto free_and_out;
 
-   __bio_clone(bio, bio_src);
-
-   if (bio_integrity(bio_src) &&
-   bio_integrity_clone(bio, bio_src, gfp_mask, bs))
-   goto free_and_out;
-
if (bio_ctr && bio_ctr(bio, bio_src, data))
goto free_and_out;
 
diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
index 87311eb..1bbc681 100644
--- a/drivers/block/osdblk.c
+++ b/drivers/block/osdblk.c
@@ -266,11 +266,10 @@ static struct bio *bio_chain_clone(struct bio *old_chain, gfp_t gfpmask)
struct bio *tmp, *new_chain = NULL, *tail = NULL;
 
while (old_chain) {
-   tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+   tmp = bio_clone_kmalloc(old_chain, gfpmask);
if (!tmp)
goto err_out;
 
-   __bio_clone(tmp, old_chain);
tmp->bi_bdev = NULL;
gfpmask &= ~__GFP_WAIT;
tmp->bi_next = NULL;
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a8f5cdc..d978f7e 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1105,8 +1105,8 @@ static void __issue_target_request(struct clone_info *ci, struct dm_target *ti,
 * ci->bio->bi_max_vecs is BIO_INLINE_VECS anyway, for both flush
 * and discard, so no need for concern about wasted bvec allocations.
 */
-   clone = bio_alloc_bioset(GFP_NOIO, ci->bio->bi_max_vecs, ci->md->bs);
-   __bio_clone(clone, ci->bio);
+   clone = bio_clone_bioset(ci->bio, GFP_NOIO, ci->md->bs);
+
if (len) {
clone->bi_sector = ci->sector;
clone->bi_size = to_bytes(len);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f9d16dc..069c3bc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -173,28 +173,10 @@ EXPORT_SYMBOL_GPL(bio_alloc_mddev);
 struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
struct mddev *mddev)
 {
-   struct bio *b;
-
if (!mddev || !mddev->bio_set)
return bio_clone(bio, gfp_mask);
 
-   b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, mddev->bio_set);
-   if (!b)
-   return NULL;
-
-   __bio_clone(b, bio);
-   if (bio_integrity(bio)) {
-   int ret;
-
-   ret = bio_integrity_clone(b, bio, gfp_mask, mddev->bio_set);
-
-   if (ret < 0) {
-   bio_put(b);
-   return NULL;
-   }
-   }
-
-   return b;
+   return bio_clone_bioset(bio, gfp_mask, mddev->bio_set);
 }
 EXPORT_SYMBOL_GPL(bio_clone_mddev);
 
diff --git a/fs/bio.c b/fs/bio.c
index c0b9bf3..71f1ac5 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -419,16 +419,19 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
 EXPORT_SYMBOL(__bio_clone);
 
 /**
- * bio_clone   -   clone a bio
+ * bio_clone_bioset -  clone a bio
  * @bio: bio to clone
  * @gfp_mask: allocation priority
+ * @bs: bio_set to allocate from
  *
  * Like __bio_clone, only also allocates the returned bio
  */
-struct bio *bio_clone(struct bio *bio, gfp_t gfp_ma

Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> At Thu,  9 Aug 2012 08:45:23 +0200,
> Thierry Reding wrote:
> > 
> > Recent changes to the firmware loading helpers cause drivers to stall
> > when firmware is loaded during the module_init() call. The snd-hda-intel
> > module requests firmware if the patch= parameter is used to load a patch
> > file. This patch works around the problem by deferring the probe in such
> > cases, which will cause the module to load successfully and the driver
> > binding to the device outside the module_init() call.
> 
> Is the "recent" change meant 3.6 kernel, or in linux-next?
> 
> In anyway, I don't understand why such a change was allowed.  Most
> drivers do call request_firmware() at the device probing time.
> If this really has to be resolved in the driver side, it must be a bug
> in the firmware loader core code.

A good explanation of the problem and subsequent discussion can be found
here:


http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975

Thierry




RE: [PATCH 6/7] regulator: lp872x: set_current_limit should select the maximum current in specific range

2012-08-09 Thread Kim, Milo
> -Original Message-
> From: Axel Lin [mailto:axel@gmail.com]
> Sent: Wednesday, August 08, 2012 9:23 PM
> To: Mark Brown
> Cc: Kim, Milo; Girdwood, Liam; linux-kernel@vger.kernel.org
> Subject: [PATCH 6/7] regulator: lp872x: set_current_limit should select
> the maximum current in specific range
> 
> Signed-off-by: Axel Lin 
> ---
>  drivers/regulator/lp872x.c |   18 --
>  1 file changed, 8 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/regulator/lp872x.c b/drivers/regulator/lp872x.c
> index 212c38e..6199d0f 100644
> --- a/drivers/regulator/lp872x.c
> +++ b/drivers/regulator/lp872x.c
> @@ -374,8 +374,8 @@ static int lp8725_buck_set_current_limit(struct regulator_dev *rdev,
>  {
>   struct lp872x *lp = rdev_get_drvdata(rdev);
>   enum lp872x_regulator_id buck = rdev_get_id(rdev);
> - int i, max = ARRAY_SIZE(lp8725_buck_uA);
> - u8 addr, val;
> + int i;
> + u8 addr;
> 
>   switch (buck) {
>   case LP8725_ID_BUCK1:
> @@ -388,17 +388,15 @@ static int lp8725_buck_set_current_limit(struct regulator_dev *rdev,
>   return -EINVAL;
>   }
> 
> - for (i = 0 ; i < max ; i++)
> + for (i = ARRAY_SIZE(lp8725_buck_uA) - 1 ; i >= 0; i--) {
>   if (lp8725_buck_uA[i] >= min_uA &&
>   lp8725_buck_uA[i] <= max_uA)
> - break;
> -
> - if (i == max)
> - return -EINVAL;
> -
> - val = i << LP8725_BUCK_CL_S;
> + return lp872x_update_bits(lp, addr,
> +   LP8725_BUCK_CL_M,
> +   i << LP8725_BUCK_CL_S);
> + }
> 
> - return lp872x_update_bits(lp, addr, LP8725_BUCK_CL_M, val);
> + return -EINVAL;
>  }
> 
>  static int lp8725_buck_get_current_limit(struct regulator_dev *rdev)
> --
> 1.7.9.5
> 
> 

This looks like a semantic change rather than a fix for an operational issue.
Could you explain in more detail why we need this patch?
Thank you, as always!

Best Regards,
Milo
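
For reference, the behavioural difference between the old and new loops
in the quoted patch can be shown with a small userspace sketch. The
table values below are illustrative only, since the real lp8725_buck_uA
contents are not shown in this thread, and the toy_* names are
hypothetical. It assumes the table is sorted in ascending order.

```c
#include <stddef.h>

/* Illustrative table only; NOT the driver's actual lp8725_buck_uA values. */
static const int toy_limits_uA[] = { 400000, 800000, 1000000, 1400000 };
#define TOY_N (sizeof(toy_limits_uA) / sizeof(toy_limits_uA[0]))

/* Old loop: scan upwards, so the smallest entry inside
 * [min_uA, max_uA] wins. Returns the table index, or -1 if none fits. */
static int toy_pick_lowest(int min_uA, int max_uA)
{
	for (size_t i = 0; i < TOY_N; i++)
		if (toy_limits_uA[i] >= min_uA && toy_limits_uA[i] <= max_uA)
			return (int)i;
	return -1;
}

/* New loop: scan downwards, so the largest entry inside
 * [min_uA, max_uA] wins. Returns the table index, or -1 if none fits. */
static int toy_pick_highest(int min_uA, int max_uA)
{
	for (size_t i = TOY_N; i-- > 0; )
		if (toy_limits_uA[i] >= min_uA && toy_limits_uA[i] <= max_uA)
			return (int)i;
	return -1;
}
```

The two only differ when more than one table entry falls inside the
requested range; with a single match, or none, they return the same
result.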


Re: [RFC PATCH v2 10/16] ACPIHP: system device hotplug driver skeleton

2012-08-09 Thread Tang Chen
Hi Liu~

I compiled this driver as a module, acpihp_drv. And when I loaded this module,
it gave the following error message:

# modprobe acpihp_drv
(the command hangs up, no return after 10 min)

#dmesg
..
[  126.643350] BUG: unable to handle kernel NULL pointer dereference at 0078
[  126.644007] IP: [] mutex_lock+0x19/0x37
[  126.644007] PGD 105277a067 PUD 104f823067 PMD 0 
[  126.644007] Oops: 0002 [#1] SMP 
[  126.644007] Modules linked in: acpihp_drv(+) ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat iptable_mangle bridge stp llc sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel microcode lpc_ich mfd_core pcspkr i2c_i801 i2c_core ioatdma e1000e acpi_memhotplug i7core_edac edac_core igb dca mptsas mptscsih mptbase scsi_transport_sas
[  126.644007] CPU 10
[  126.644007] Pid: 2821, comm: modprobe Tainted: G   A 3.6.0-rc1+ #6 FUJITSU-SV PRIMEQUEST 1800E/SB
[  126.644007] RIP: 0010:[]  [] mutex_lock+0x19/0x37
[  126.644007] RSP: 0018:8810589a9de8  EFLAGS: 00010246
[  126.644007] RAX:  RBX: 0078 RCX: 
[  126.644007] RDX:  RSI:  RDI: 0078
[  126.644007] RBP: 8810589a9e08 R08:  R09: 8810589a9d88
[  126.644007] R10: 13e5 R11: 13e5 R12: a01460d0
[  126.644007] R13:  R14: a014732b R15: 00bf
[  126.644007] FS:  7fecb1802700() GS:88105e64() knlGS:
[  126.644007] CS:  0010 DS:  ES:  CR0: 80050033
[  126.644007] CR2: 0078 CR3: 001052772000 CR4: 07e0
[  126.644007] DR0:  DR1:  DR2: 
[  126.644007] DR3:  DR6: 0ff0 DR7: 0400
[  126.644007] Process modprobe (pid: 2821, threadinfo 8810589a8000, task 8810592f8000)
[  126.644007] Stack:
[  126.644007]  8810589a9e08 810be37f a0146220 
81a7b390
[  126.644007]  8810589a9e58 81317eb3 8810589a9e48 

[  126.644007]  81a342c0 81a342e0  
a0146220
[  126.644007] Call Trace:
[  126.644007]  [] ? tracepoint_module_notify+0xd9/0x14a
[  126.644007]  [] class_interface_register+0x4a/0xbc
[  126.644007]  [] ? 0xa00b7fff
[  126.644007]  [] acpihp_drv_init+0x10/0x12 [acpihp_drv]
[  126.644007]  [] do_one_initcall+0x7f/0x139
[  126.644007]  [] sys_init_module+0x12d3/0x14e3
[  126.644007]  [] ? ddebug_dyndbg_boot_param_cb+0x45/0x45
[  126.644007]  [] system_call_fastpath+0x16/0x1b
[  126.644007] Code: 48 8b 04 25 80 c6 00 00 48 89 43 18 31 c0 5b 5b c9 c3 55 48 89 e5 53 48 83 ec 18 66 66 66 66 90 48 89 fb e8 5a 0c 00 00 48 89 df  ff 0f 79 05 e8 06 ff ff ff 65 48 8b 04 25 80 c6 00 00 48 89
[  126.644007] RIP  [] mutex_lock+0x19/0x37
[  126.644007]  RSP 
[  126.644007] CR2: 0078
[  129.981335] ---[ end trace da17e9c9de8dd560 ]---
[  139.085895] nr_pdflush_threads exported in /proc is scheduled for removal
[  139.167394] sysctl: The scan_unevictable_pages sysctl/node-interface has been disabled for lack of a legitimate use case.  If you have one, please send an email to linux...@kvack.org.

Looks like it dereferenced a NULL pointer here.
Maybe it was my mistake and I didn't configure the environment correctly.
Would you please give me some advice?
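
One way to read a small fault address such as the 0078 in RDI and CR2
above: it is what you get when code takes the address of a member that
sits 0x78 bytes into a structure whose base pointer is NULL, and then
hands that address to mutex_lock(). The sketch below only demonstrates
the arithmetic. The struct layout is hypothetical, and which structure is
actually involved here is not established by the log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x78 bytes of other members, then the lock.
 * This is NOT a real kernel structure. */
struct toy_private {
	char other_members[0x78];
	int lock;
};

/* The address that &p->lock would evaluate to for a given base address,
 * computed with offsetof() so that nothing is dereferenced. */
static uintptr_t toy_lock_address(uintptr_t base)
{
	return base + offsetof(struct toy_private, lock);
}
```

Under that assumed layout, a base of 0 yields 0x78, so the faulting
address points at a missing parent object rather than at the lock
itself.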

Thanks. :)


On 08/04/2012 08:13 PM, Jiang Liu wrote:
> From: Jiang Liu 
> 
> This patch implements a skeleton for ACPI based system device hotplug driver.
> This device class driver will be bound to and manage ACPI hotplug slots.
> 
> This is the default hotplug driver for ACPI based system device hotplug.
> 
> Signed-off-by: Jiang Liu 
> Signed-off-by: Hanjun Guo 
> ---
>  drivers/acpi/Kconfig  |   12 ++
>  drivers/acpi/hotplug/Makefile |3 +
>  drivers/acpi/hotplug/acpihp_drv.h |   62 +++
>  drivers/acpi/hotplug/drv_main.c   |  331 
> +
>  4 files changed, 408 insertions(+)
>  create mode 100644 drivers/acpi/hotplug/acpihp_drv.h
>  create mode 100644 drivers/acpi/hotplug/drv_main.c
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index c9f7918..89047a3 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -354,6 +354,18 @@ config ACPI_HOTPLUG_ENUM_EJ0
>  
> It's the default method to detect ACPI hotplug slots.
>  
> +config ACPI_HOTPLUG_DRIVER
> + tristate "ACPI Based System Device Hotplug Driver"
> + depends on ACPI_HOTPLUG
> + default y
> + help
> +   This driver enables ACPI based system device hotplug, including
> +   physical processor, memory device, IO host bridge and computer
> +   node etc.
> +
> +   To compile

[PATCH net-next] time: jiffies_delta_to_clock_t() helper to the rescue

2012-08-09 Thread Eric Dumazet
From: Eric Dumazet 

Various /proc/net files sometimes report crazy timer values, expressed
in clock_t units.

This happens when an expired timer delta (expires - jiffies) is passed
to jiffies_to_clock_t().

This function has an overflow in:

return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC / USER_HZ);

commit cbbc719fccdb8cb (time: Change jiffies_to_clock_t() argument type
to unsigned long) only got around the problem.

As we can't output negative values in /proc/net/tcp without breaking
various tools, I suggest adding a jiffies_delta_to_clock_t() wrapper
that caps the negative delta to a 0 value.
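
A userspace sketch of the effect, for illustration only: the toy_*
functions and the TOY_HZ / TOY_USER_HZ values are assumptions, and the
stand-in conversion is just the simple division that applies when HZ is
a multiple of USER_HZ, not the kernel's full implementation. It shows how
a negative delta turns into a huge value once it is treated as unsigned,
and how clamping to 0 first avoids that.

```c
/* Assumed tick rates for the sketch; real kernels set HZ at build time. */
#define TOY_HZ       1000L
#define TOY_USER_HZ  100L

/* Stand-in for jiffies_to_clock_t() in the simple case where HZ is a
 * multiple of USER_HZ. Takes an unsigned argument, like the real one. */
static unsigned long toy_jiffies_to_clock_t(unsigned long x)
{
	return x / (unsigned long)(TOY_HZ / TOY_USER_HZ);
}

/* Mirrors the proposed helper: clamp a negative delta to 0 before
 * converting, so an already expired timer reports 0. */
static unsigned long toy_jiffies_delta_to_clock_t(long delta)
{
	return toy_jiffies_to_clock_t(delta > 0 ? (unsigned long)delta : 0UL);
}
```

Passing a delta of -5 straight to the unsigned conversion gives a value
in the hundreds of millions or more, depending on the width of long,
while the clamped helper returns 0.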

Signed-off-by: Eric Dumazet 
Reported-by: Maciej Żenczykowski 
Cc: Thomas Gleixner 
Cc: Paul Gortmaker 
Cc: Andrew Morton 
Cc: hank 
---
 include/linux/jiffies.h   |6 ++
 net/bridge/br_fdb.c   |2 +-
 net/bridge/br_stp_timer.c |2 +-
 net/core/rtnetlink.c  |2 +-
 net/ipv4/igmp.c   |7 +--
 net/ipv4/tcp_ipv4.c   |   13 +
 net/ipv6/tcp_ipv6.c   |9 +++--
 7 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 265e2c3..aded9b1 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -303,7 +303,13 @@ extern void jiffies_to_timespec(const unsigned long jiffies,
 extern unsigned long timeval_to_jiffies(const struct timeval *value);
 extern void jiffies_to_timeval(const unsigned long jiffies,
   struct timeval *value);
+
 extern clock_t jiffies_to_clock_t(unsigned long x);
+static inline clock_t jiffies_delta_to_clock_t(long delta)
+{
+   return jiffies_to_clock_t(max(0L, delta));
+}
+
 extern unsigned long clock_t_to_jiffies(unsigned long x);
 extern u64 jiffies_64_to_clock_t(u64 x);
 extern u64 nsec_to_clock_t(u64 x);
diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index d21f323..9ce430b 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -312,7 +312,7 @@ int br_fdb_fillbuf(struct net_bridge *br, void *buf,
 
fe->is_local = f->is_local;
if (!f->is_static)
-   fe->ageing_timer_value = jiffies_to_clock_t(jiffies - f->updated);
+   fe->ageing_timer_value = jiffies_delta_to_clock_t(jiffies - f->updated);
++fe;
++num;
}
diff --git a/net/bridge/br_stp_timer.c b/net/bridge/br_stp_timer.c
index a6747e6..c3530a8 100644
--- a/net/bridge/br_stp_timer.c
+++ b/net/bridge/br_stp_timer.c
@@ -170,5 +170,5 @@ void br_stp_port_timer_init(struct net_bridge_port *p)
 unsigned long br_timer_value(const struct timer_list *timer)
 {
return timer_pending(timer)
-   ? jiffies_to_clock_t(timer->expires - jiffies) : 0;
+   ? jiffies_delta_to_clock_t(timer->expires - jiffies) : 0;
 }
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 2c5a0a0..db037c9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -618,7 +618,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
   long expires, u32 error)
 {
struct rta_cacheinfo ci = {
-   .rta_lastuse = jiffies_to_clock_t(jiffies - dst->lastuse),
+   .rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse),
.rta_used = dst->__use,
.rta_clntref = atomic_read(&(dst->__refcnt)),
.rta_error = error,
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 6699f23..0b5580c 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -2435,6 +2435,8 @@ static int igmp_mc_seq_show(struct seq_file *seq, void *v)
struct ip_mc_list *im = (struct ip_mc_list *)v;
struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
char   *querier;
+   long delta;
+
 #ifdef CONFIG_IP_MULTICAST
querier = IGMP_V1_SEEN(state->in_dev) ? "V1" :
  IGMP_V2_SEEN(state->in_dev) ? "V2" :
@@ -2448,11 +2450,12 @@ static int igmp_mc_seq_show(struct seq_file *seq, void *v)
    state->dev->ifindex, state->dev->name, state->in_dev->mc_count, querier);
}
 
+   delta = im->timer.expires - jiffies;
seq_printf(seq,
   "\t\t\t\t%08X %5d %d:%08lX\t\t%d\n",
   im->multiaddr, im->users,
-  im->tm_running, im->tm_running ?
-  jiffies_to_clock_t(im->timer.expires-jiffies) : 0,
+  im->tm_running,
+  im->tm_running ? jiffies_delta_to_clock_t(delta) : 0,
   im->reporter);
}
return 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 42b2a6a..c660d2c 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2385,7 +2385,7 @@ static void get_openreq4

Re: [PATCH] ALSA: hda/realtek - Keep analog bypass muted on ALC892

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 09:02:10AM +0200, Takashi Iwai wrote:
> At Thu,  9 Aug 2012 08:47:19 +0200,
> Thierry Reding wrote:
> > 
> > Unmuting the analog bypass causes any input to be directly bypassed to
> > the outputs, which is usually not expected by the user.
> > 
> > Signed-off-by: Thierry Reding 
> > ---
> > Note that this patch could be generalized to include all codec variants
> > that have this second channel. As I don't have access to all the data
> > sheets I'll leave that decision up to somebody with more hardware
> > available.
> 
> Hmm, are you really sure that this is exclusive switch between two?
> All Realtek codecs with this mixer widget mixes two sources up.
> Otherwise it must not be a "mixer" widget at all but it must be a
> "selector" widget.

It's not an exclusive switch. But the second input is the analog bypass
that basically takes the signal directly from the input path and routes
it to the output.

> And I haven't received a bug report regarding this although ALC892 is
> no new chip and the auto-parser has been enabled for so long time.

I've seen this happen when feeding a signal into the CD-IN (NID 0x1c),
which is then directly routed to the output, so I actually hear that
signal at the output, even if there is no software loopback to capture
the input and play it back via the DAC.

This is probably not a very common use-case, so maybe nobody else has
come across it. Looking at the block diagram in the ALC892 datasheet
(page 5) clearly shows where the input of this second switch comes from.
I don't think this is intended to be unmuted by default, but only in
very special cases where you actually want to pass the analog input
signal directly to the output without going through the ADC and back out
the DAC.

Thierry




Re: [PATCH v5 08/12] block: Introduce new bio_split()

2012-08-09 Thread Tejun Heo
On Wed, Aug 08, 2012 at 06:39:23PM -0700, Kent Overstreet wrote:
> On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> > One more thing.
> > 
> > On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > + if (bio_integrity(bio)) {
> > > + bio_integrity_clone(ret, bio, gfp, bs);
> > > + bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> > 
> > Is this equivalent to bio_integrity_split() performance-wise?
> 
> Strictly speaking, no. But it has the advantage of being drastically
> simpler - and the only one only worked for single page bios so I
> would've had to come up with something new from scratch, and as
> confusing as the integrity stuff is I wouldn't trust the result.

There's already bio_integrity_split() and you're actively dropping it.

> I'm skeptical that it's going to matter in practice given how much
> iteration is done elsewhere in the course of processing a bio and given
> that this stuff isn't used with high end SSDs...

If you think the active dropping is justified, please state the change
and its justification clearly.  You're burying the active change in
two separate patches without even mentioning it or cc'ing people who
care about bio-integrity (Martin K. Petersen).  Ummm this is
simply unacceptable and makes me a lot more suspicious about the
patchset.  Please be explicit about changes you make.  Peer-review
breaks down unless such trust can be maintained.

Thanks.

-- 
tejun


Re: Using vanilla kernels headers to do backport testing -- issues with memcpy()

2012-08-09 Thread Sedat Dilek
On Thu, Aug 9, 2012 at 5:30 AM, Luis R. Rodriguez  wrote:
> On Wed, Aug 8, 2012 at 5:43 AM, Ozan Çağlayan  wrote:
>> On Tue, Aug 7, 2012 at 10:28 PM, Luis R. Rodriguez  wrote:
>
>>> mcgrof@garbanzo ~/compat (git::master)$ objdump -T
>>> /home/mcgrof/compat-ksrc/lib/modules/3.4.4-030404-generic/build/scripts/genksyms/genksyms
>
> <-- snip -->
>
>>> Bleh:
>>>
>>>   DF *UND*    GLIBC_2.14  memcpy
>
> <-- snip -->
>
>> Can't we just rebuild genksyms in compat? I already have glibc 2.15 on
>> my system so I can't test this but normally this should adapt to the
>> build environment as it rebuilds and relinks the binary.
>
> Indeed! Implemented and pushed upstream:
>
> https://github.com/mcgrof/compat/commit/42faf2dc8d8bbbdc5b8913183fcd021a27e953c9
>

I am not a native English speaker, but this sentence sounds a bit strange to
me ("used" followed by "require" and double "Linux"?).
I can imagine what you want to express but have no better wording right now.

+  echo -e "   This was added since kernels >= 3.4 used require"
+  echo -e "   a glibc >= 2.14 for memcpy(), and not all Linux"
+  echo -e "   Linux distributions have such glibc.

- Sedat -

> So if you run into this issue all you have to do now is run:
>
> ./bin/get-compat-kernels -r
>
> New users of the script will automatically have this run after it
> downloads and installs the kernel headers.
>
> Case closed, thanks :)
>
>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 01/22] ARM: add mechanism for late code patching

2012-08-09 Thread Tixy
On Wed, 2012-08-08 at 12:56 -0400, Nicolas Pitre wrote:
> On Wed, 8 Aug 2012, Russell King - ARM Linux wrote:
> > Done correctly, it shouldn't be a problem, but I wouldn't say that
> > arch/arm/kernel/kprobes-test*.c is done correctly.  It's seen quite
> > a number of patching attempts since it was introduced for various
> > problems, and I've seen quite a number of builds fail for various
> > reasons in this file (none which I could be bothered to investigate.)

> >
> As for build failures with that test code, I'd suggest you simply drop a 
> note to Tixy who is normally very responsive.

Indeed. If there are build failures, I'm happy to investigate and fix.

-- 
Tixy



Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Takashi Iwai
At Thu, 9 Aug 2012 09:08:13 +0200,
Thierry Reding wrote:
> 
> On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > At Thu,  9 Aug 2012 08:45:23 +0200,
> > Thierry Reding wrote:
> > > 
> > > Recent changes to the firmware loading helpers cause drivers to stall
> > > when firmware is loaded during the module_init() call. The snd-hda-intel
> > > module requests firmware if the patch= parameter is used to load a patch
> > > file. This patch works around the problem by deferring the probe in such
> > > cases, which will cause the module to load successfully and the driver
> > > binding to the device outside the module_init() call.
> > 
> > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > 
> > In anyway, I don't understand why such a change was allowed.  Most
> > drivers do call request_firmware() at the device probing time.
> > If this really has to be resolved in the driver side, it must be a bug
> > in the firmware loader core code.
> 
> A good explanation of the problem and subsequent discussion can be found
> here:
> 
>   
> http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975

Yeah, but it doesn't justify this ugly module option.
It's a simple bug.  Papering over it with this option doesn't fix
anything.


Takashi


Re: [PATCH v5 08/12] block: Introduce new bio_split()

2012-08-09 Thread Kent Overstreet
On Thu, Aug 09, 2012 at 12:22:17AM -0700, Tejun Heo wrote:
> On Wed, Aug 08, 2012 at 06:39:23PM -0700, Kent Overstreet wrote:
> > On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> > > One more thing.
> > > 
> > > On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > > +   if (bio_integrity(bio)) {
> > > > +   bio_integrity_clone(ret, bio, gfp, bs);
> > > > +   bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > > +   bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> > > 
> > > Is this equivalent to bio_integrity_split() performance-wise?
> > 
> > Strictly speaking, no. But it has the advantage of being drastically
> > simpler - and the only one only worked for single page bios so I
> > would've had to come up with something new from scratch, and as
> > confusing as the integrity stuff is I wouldn't trust the result.
> 
> There's already bio_integrity_split() and you're actively dropping it.

Because it only works for single page bios, AFAICT. I'd have to start
from scratch.
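
The clone-then-trim idea in the quoted hunk can be sketched in userspace
like this. It is a toy model: toy_integrity and the toy_* functions are
hypothetical, and toy_trim() only mimics the (offset, sectors) argument
shape of the bio_integrity_trim() calls quoted above. It assumes the
metadata simply describes a contiguous run of sectors.

```c
/* Toy stand-in for integrity metadata covering a run of sectors. */
struct toy_integrity {
	unsigned start;    /* first sector covered */
	unsigned sectors;  /* number of sectors covered */
};

/* Keep only `sectors` sectors, beginning `offset` sectors into the
 * currently covered range. */
static void toy_trim(struct toy_integrity *ip, unsigned offset, unsigned sectors)
{
	ip->start += offset;
	ip->sectors = sectors;
}

/* Split by copying the whole range and then trimming both copies:
 * *front ends up with the first n sectors, *orig keeps the rest.
 * The caller must ensure n <= orig->sectors. */
static void toy_split(struct toy_integrity *orig, struct toy_integrity *front,
		      unsigned n)
{
	unsigned total = orig->sectors;

	*front = *orig;                /* the "clone" step */
	toy_trim(front, 0, n);         /* front half: [start, start + n) */
	toy_trim(orig, n, total - n);  /* remainder: [start + n, start + total) */
}
```

The cost of this approach is that the whole range is copied before
being trimmed, which is the performance question raised above; the
benefit is that no separate split logic has to exist.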

> > I'm skeptical that it's going to matter in practice given how much
> > iteration is done elsewhere in the course of processing a bio and given
> > that this stuff isn't used with high end SSDs...
> 
> If you think the active dropping is justified, please let the change
> and justification clearly stated.  You're burying the active change in
> two separate patches without even mentioning it or cc'ing people who
> care about bio-integrity (Martin K. Petersen). 

Not intentionally; he isn't in MAINTAINERS, so get_maintainer.pl missed
him and it slipped by while I was looking for people to CC. Added him.


Re: [PATCH] ALSA: hda/realtek - Keep analog bypass muted on ALC892

2012-08-09 Thread Takashi Iwai
At Thu, 9 Aug 2012 09:18:48 +0200,
Thierry Reding wrote:
> 
> On Thu, Aug 09, 2012 at 09:02:10AM +0200, Takashi Iwai wrote:
> > At Thu,  9 Aug 2012 08:47:19 +0200,
> > Thierry Reding wrote:
> > > 
> > > Unmuting the analog bypass causes any input to be directly bypassed to
> > > the outputs, which is usually not expected by the user.
> > > 
> > > Signed-off-by: Thierry Reding 
> > > ---
> > > Note that this patch could be generalized to include all codec variants
> > > that have this second channel. As I don't have access to all the data
> > > sheets I'll leave that decision up to somebody with more hardware
> > > available.
> > 
> > Hmm, are you really sure that this is exclusive switch between two?
> > All Realtek codecs with this mixer widget mixes two sources up.
> > Otherwise it must not be a "mixer" widget at all but it must be a
> > "selector" widget.
> 
> It's not an exclusive switch. But the second input is the analog bypass
> that basically takes the signal directly from the input path and routes
> it to the output.

Ah, so you mean the analog loopback path.
Ordinary users often actually want this.

> > And I haven't received a bug report regarding this although ALC892 is
> > no new chip and the auto-parser has been enabled for so long time.
> 
> I've seen this happen when feeding a signal into the CD-IN (NID 0x1c),
> which is then directly routed to the output, so I actually hear that
> signal at the output, even if there is no software loopback to capture
> the input and play it back via the DAC.
> 
> This is probably not a very common use-case, so maybe nobody else has
> come across it. Looking at the block diagram in the ALC892 datasheet
> (page 5) clearly shows where the input of this second switch comes from.
> I don't think this is intended to be unmuted by default, but only in
> very special cases where you actually want to pass the analog input
> signal directly to the output without going through the ADC and back out
> the DAC.

Each input has the analog loopback mixer switch (and off as default).
Why it doesn't suffice?


thanks,

Takashi


Re: [PATCH 10/10] leds: renesas: fix error handling

2012-08-09 Thread Arnd Bergmann
On Thursday 09 August 2012, Bryan Wu wrote:
> 
> On Thu, Aug 9, 2012 at 5:27 AM, Arnd Bergmann  wrote:
> > bfe4c041 "leds: convert Renesas TPU LED driver to devm_kzalloc() and
> > cleanup error exit path" introduced a possible case in which r_tpu_probe
> > calls iounmap on a wild pointer. This changes the one case that was
> > missed in the same way as the other error paths.
> >
> > Without this patch, building kota2_defconfig results in:
> >
> > drivers/leds/leds-renesas-tpu.c: In function 'r_tpu_probe':
> > drivers/leds/leds-renesas-tpu.c:246:6: warning: 'ret' may be used 
> > uninitialized in this function [-Wuninitialized]
> > drivers/leds/leds-renesas-tpu.c:308:17: warning: 'p' may be used 
> > uninitialized in this function [-Wuninitialized]
> >
> 
> Thanks, Arnd. If you don't mind, I will apply this fixing through my led tree.
> 

Ok, excellent. One less patch for me to worry about. Dropped from my tree now.

Arnd


Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> At Thu, 9 Aug 2012 09:08:13 +0200,
> Thierry Reding wrote:
> > 
> > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > Thierry Reding wrote:
> > > > 
> > > > Recent changes to the firmware loading helpers cause drivers to stall
> > > > when firmware is loaded during the module_init() call. The snd-hda-intel
> > > > module requests firmware if the patch= parameter is used to load a patch
> > > > file. This patch works around the problem by deferring the probe in such
> > > > cases, which will cause the module to load successfully and the driver
> > > > binding to the device outside the module_init() call.
> > > 
> > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > 
> > > In anyway, I don't understand why such a change was allowed.  Most
> > > drivers do call request_firmware() at the device probing time.
> > > If this really has to be resolved in the driver side, it must be a bug
> > > in the firmware loader core code.
> > 
> > A good explanation of the problem and subsequent discussion can be found
> > here:
> > 
> > 
> > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> 
> Yeah, but it doesn't justify this ugly module option.
> It's a simple bug.  Papering over it with this option doesn't fix
> anything.

It's not an option, all it does is defer probing if and only if the
patch parameter was specified to make sure the firmware load won't
stall. I realize that this may not be an optimal solution, but at least
it fixes the problem with no fallout.

A proper fix would require a larger rewrite because it would entail
using the asynchronous firmware load operations. That in turn would
require the initialization to be split into several stages.

Thierry
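
For readers following the thread, the staged initialization described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the kernel's asynchronous interface is request_firmware_nowait(), but its signature is not reproduced here, and every name below (struct fw, model_request_fw_nowait, model_fw_arrives, struct chip, chip_probe, chip_probe_continue) is hypothetical and not taken from the snd-hda-intel driver.

```c
#include <stddef.h>

/* Hypothetical stand-in for the firmware blob handed to the driver. */
struct fw {
	const char *data;
	size_t size;
};

/* Continuation invoked once the (modelled) load has finished. */
typedef void (*fw_cont_t)(const struct fw *fw, void *context);

/* One outstanding request is enough for this model. */
struct pending_request {
	fw_cont_t cont;
	void *context;
	int armed;
};

static struct pending_request pending;

/* Model of an asynchronous request: record the continuation and return. */
int model_request_fw_nowait(void *context, fw_cont_t cont)
{
	if (pending.armed)
		return -1;	/* model limitation: one request at a time */
	pending.cont = cont;
	pending.context = context;
	pending.armed = 1;
	return 0;
}

/* Model of the loader completing later, outside the init path. */
void model_fw_arrives(const struct fw *fw)
{
	if (!pending.armed)
		return;
	pending.armed = 0;
	pending.cont(fw, pending.context);
}

/* Hypothetical driver state: which stage was reached, and whether a patch was applied. */
struct chip {
	int stage;
	int patched;
};

/* Second stage: everything that depends on the patch file runs here. */
void chip_probe_continue(const struct fw *fw, void *context)
{
	struct chip *chip = context;

	chip->patched = (fw != NULL);
	chip->stage = 2;
}

/* First stage: start the load and return without waiting for it. */
int chip_probe(struct chip *chip)
{
	chip->stage = 1;
	chip->patched = 0;
	return model_request_fw_nowait(chip, chip_probe_continue);
}
```

In this model chip_probe() returns while the chip is still in stage 1, so any failure in chip_probe_continue() would have to be cleaned up after the probe has already reported success, which is the kind of complication a split initialization brings with it.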




[PATCH] alarmtimer: implement minimum alarm interval for allowing suspend

2012-08-09 Thread Todd Poynor
Make alarmtimer suspend return -EBUSY if the next alarm will fire in less
than 2 seconds.  This allows one RTC seconds tick to occur subsequent
to this check before the alarm wakeup time is set, ensuring the wakeup
time is still in the future (assuming the RTC does not tick one more
second prior to setting the alarm).

If suspend is rejected due to an imminent alarm, hold a wakeup source
for 2 seconds to process the alarm prior to reattempting suspend.

If setting the alarm incurs an -ETIME for an alarm set in the past,
or any other problem setting the alarm, abort suspend and hold a
wakelock for 1 second while the alarm is allowed to be serviced or
other hopefully transient conditions preventing the alarm clear up.

Signed-off-by: Todd Poynor 
---
 kernel/time/alarmtimer.c |   18 +++++++++++++-----
 1 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index aa27d39..f979d85 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -46,6 +46,8 @@ static struct alarm_base {
 static ktime_t freezer_delta;
 static DEFINE_SPINLOCK(freezer_delta_lock);
 
+static struct wakeup_source *ws;
+
 #ifdef CONFIG_RTC_CLASS
 /* rtc timer and device for setting alarm wakeups at suspend */
 static struct rtc_timer		rtctimer;
@@ -250,6 +252,7 @@ static int alarmtimer_suspend(struct device *dev)
 	unsigned long flags;
 	struct rtc_device *rtc;
 	int i;
+	int ret;
 
 	spin_lock_irqsave(&freezer_delta_lock, flags);
 	min = freezer_delta;
@@ -279,8 +282,10 @@ static int alarmtimer_suspend(struct device *dev)
 	if (min.tv64 == 0)
 		return 0;
 
-	/* XXX - Should we enforce a minimum sleep time? */
-	WARN_ON(min.tv64 < NSEC_PER_SEC);
+	if (ktime_to_ns(min) < 2 * NSEC_PER_SEC) {
+		__pm_wakeup_event(ws, 2 * MSEC_PER_SEC);
+		return -EBUSY;
+	}
 
 	/* Setup an rtc timer to fire that far in the future */
 	rtc_timer_cancel(rtc, &rtctimer);
@@ -288,9 +293,11 @@ static int alarmtimer_suspend(struct device *dev)
 	now = rtc_tm_to_ktime(tm);
 	now = ktime_add(now, min);
 
-	rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
-
-	return 0;
+	/* Set alarm, if in the past reject suspend briefly to handle */
+	ret = rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
+	if (ret < 0)
+		__pm_wakeup_event(ws, 1 * MSEC_PER_SEC);
+	return ret;
 }
 #else
 static int alarmtimer_suspend(struct device *dev)
@@ -821,6 +828,7 @@ static int __init alarmtimer_init(void)
 		error = PTR_ERR(pdev);
 		goto out_drv;
 	}
+	ws = wakeup_source_register("alarmtimer");
 	return 0;
 
 out_drv:
-- 
1.7.7.3
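
The decision logic in the patch above can be modelled in user space to make the three outcomes easier to see. This is a minimal sketch, not kernel code: struct decision, suspend_decision() and the rtc_ret parameter (standing in for the return value of rtc_timer_start()) are hypothetical names introduced for illustration. Only the 2 second threshold, the -EBUSY return and the 2 s and 1 s wakeup source timeouts come from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL
#define MSEC_PER_SEC 1000LL

/* Hypothetical result: what suspend returns and how long to stay awake. */
struct decision {
	int ret;		/* 0 = allow suspend, negative errno = abort */
	long long hold_ms;	/* wakeup source timeout in ms, 0 = none */
};

/*
 * min_ns:  time until the next alarm in nanoseconds, 0 = no alarm pending.
 * rtc_ret: stand-in for the return value of rtc_timer_start().
 */
struct decision suspend_decision(int64_t min_ns, int rtc_ret)
{
	struct decision d = { 0, 0 };

	/* No alarm pending: nothing to program, suspend may proceed. */
	if (min_ns == 0)
		return d;

	/* Alarm is imminent: reject suspend and stay awake to service it. */
	if (min_ns < 2 * NSEC_PER_SEC) {
		d.ret = -EBUSY;
		d.hold_ms = 2 * MSEC_PER_SEC;
		return d;
	}

	/* Programming the RTC failed (for example -ETIME): abort briefly. */
	if (rtc_ret < 0) {
		d.ret = rtc_ret;
		d.hold_ms = 1 * MSEC_PER_SEC;
	}
	return d;
}
```

For example, suspend_decision(1500000000LL, 0) models an alarm 1.5 s away and yields -EBUSY with a 2000 ms hold, while an alarm 5 s away with a failing RTC call propagates that error with a 1000 ms hold.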



Re: [RFC PATCH v2 10/16] ACPIHP: system device hotplug driver skeleton

2012-08-09 Thread Jiang Liu
Hi Tang,
Thanks for testing.
Currently there's a limitation that you need to insert acpihp_enum 
driver first.
Will fix this issue in next version.
Regards!
Gerry

On 2012-8-9 15:12, Tang Chen wrote:
> Hi Liu~
> 
> I compiled this driver as a module, acpihp_drv. And when I loaded this 
> module, it
> gave the following error message:
> 
> # modprobe acpihp_drv
> (the command hangs up, no return after 10 min)
> 
> #dmesg
> ..
> [  126.643350] BUG: unable to handle kernel NULL pointer dereference at 
> 0078
> [  126.644007] IP: [] mutex_lock+0x19/0x37
> [  126.644007] PGD 105277a067 PUD 104f823067 PMD 0 
> [  126.644007] Oops: 0002 [#1] SMP 
> [  126.644007] Modules linked in: acpihp_drv(+) ebtable_nat ebtables 
> ipt_MASQUERADE iptable_nat nf_nat iptable_mangle bridge stp llc sunrpc 
> cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 
> nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun 
> uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel 
> microcode lpc_ich mfd_core pcspkr i2c_i801 i2c_core ioatdma e1000e 
> acpi_memhotplug i7core_edac edac_core igb dca mptsas mptscsih mptbase 
> scsi_transport_sas
> [  126.644007] CPU 10 
> [  126.644007] Pid: 2821, comm: modprobe Tainted: G   A 3.6.0-rc1+ #6 
> FUJITSU-SV PRIMEQUEST 1800E/SB
> [  126.644007] RIP: 0010:[]  [] 
> mutex_lock+0x19/0x37
> [  126.644007] RSP: 0018:8810589a9de8  EFLAGS: 00010246
> [  126.644007] RAX:  RBX: 0078 RCX: 
> 
> [  126.644007] RDX:  RSI:  RDI: 
> 0078
> [  126.644007] RBP: 8810589a9e08 R08:  R09: 
> 8810589a9d88
> [  126.644007] R10: 13e5 R11: 13e5 R12: 
> a01460d0
> [  126.644007] R13:  R14: a014732b R15: 
> 00bf
> [  126.644007] FS:  7fecb1802700() GS:88105e64() 
> knlGS:
> [  126.644007] CS:  0010 DS:  ES:  CR0: 80050033
> [  126.644007] CR2: 0078 CR3: 001052772000 CR4: 
> 07e0
> [  126.644007] DR0:  DR1:  DR2: 
> 
> [  126.644007] DR3:  DR6: 0ff0 DR7: 
> 0400
> [  126.644007] Process modprobe (pid: 2821, threadinfo 8810589a8000, task 
> 8810592f8000)
> [  126.644007] Stack:
> [  126.644007]  8810589a9e08 810be37f a0146220 
> 81a7b390
> [  126.644007]  8810589a9e58 81317eb3 8810589a9e48 
> 
> [  126.644007]  81a342c0 81a342e0  
> a0146220
> [  126.644007] Call Trace:
> [  126.644007]  [] ? tracepoint_module_notify+0xd9/0x14a
> [  126.644007]  [] class_interface_register+0x4a/0xbc
> [  126.644007]  [] ? 0xa00b7fff
> [  126.644007]  [] acpihp_drv_init+0x10/0x12 [acpihp_drv]
> [  126.644007]  [] do_one_initcall+0x7f/0x139
> [  126.644007]  [] sys_init_module+0x12d3/0x14e3
> [  126.644007]  [] ? ddebug_dyndbg_boot_param_cb+0x45/0x45
> [  126.644007]  [] system_call_fastpath+0x16/0x1b
> [  126.644007] Code: 48 8b 04 25 80 c6 00 00 48 89 43 18 31 c0 5b 5b c9 c3 55 
> 48 89 e5 53 48 83 ec 18 66 66 66 66 90 48 89 fb e8 5a 0c 00 00 48 89 df  
> ff 0f 79 05 e8 06 ff ff ff 65 48 8b 04 25 80 c6 00 00 48 89 
> [  126.644007] RIP  [] mutex_lock+0x19/0x37
> [  126.644007]  RSP 
> [  126.644007] CR2: 0078
> [  129.981335] ---[ end trace da17e9c9de8dd560 ]---
> [  139.085895] nr_pdflush_threads exported in /proc is scheduled for removal
> [  139.167394] sysctl: The scan_unevictable_pages sysctl/node-interface has 
> been disabled for lack of a legitimate use case.  If you have one, please 
> send an email to linux...@kvack.org.
> 
> Looks like it dereferenced a NULL pointer here.
> May be it was my mistake that I didn't configure the environment correctly.
> Would you please give me some advice ?
> 
> Thanks. :)
> 
> 
> On 08/04/2012 08:13 PM, Jiang Liu wrote:
>> From: Jiang Liu 
>>
>> This patch implements a skeleton for ACPI based system device hotplug driver.
>> This device class driver will be bound to and manage ACPI hotplug slots.
>>
>> This is the default hotplug driver for ACPI based system device hotplug.
>>
>> Signed-off-by: Jiang Liu 
>> Signed-off-by: Hanjun Guo 
>> ---
>>  drivers/acpi/Kconfig  |   12 ++
>>  drivers/acpi/hotplug/Makefile |3 +
>>  drivers/acpi/hotplug/acpihp_drv.h |   62 +++
>>  drivers/acpi/hotplug/drv_main.c   |  331 
>> +
>>  4 files changed, 408 insertions(+)
>>  create mode 100644 drivers/acpi/hotplug/acpihp_drv.h
>>  create mode 100644 drivers/acpi/hotplug/drv_main.c
>>
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index c9f7918..89047a3 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -354,6 +354,18 @@ config ACP

Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Takashi Iwai
At Thu, 9 Aug 2012 09:36:42 +0200,
Thierry Reding wrote:
> 
> On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > At Thu, 9 Aug 2012 09:08:13 +0200,
> > Thierry Reding wrote:
> > > 
> > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > Thierry Reding wrote:
> > > > > 
> > > > > Recent changes to the firmware loading helpers cause drivers to stall
> > > > > when firmware is loaded during the module_init() call. The 
> > > > > snd-hda-intel
> > > > > module requests firmware if the patch= parameter is used to load a 
> > > > > patch
> > > > > file. This patch works around the problem by deferring the probe in 
> > > > > such
> > > > > cases, which will cause the module to load successfully and the driver
> > > > > binding to the device outside the module_init() call.
> > > > 
> > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > 
> > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > drivers do call request_firmware() at the device probing time.
> > > > If this really has to be resolved in the driver side, it must be a bug
> > > > in the firmware loader core code.
> > > 
> > > A good explanation of the problem and subsequent discussion can be found
> > > here:
> > > 
> > >   
> > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > 
> > Yeah, but it doesn't justify this ugly module option.
> > It's a simple bug.  Papering over it with this option doesn't fix
> > anything.
> 
> It's not an option, all it does is defer probing if and only if the
> patch parameter was specified to make sure the firmware load won't
> stall. I realize that this may not be an optimal solution, but at least
> it fixes the problem with no fallout.

Ah sorry, I misread the patch.

Then it shouldn't be checked at that point.  Since 3.5 kernel, the
probing code was already split for vga_switcheroo support.  The point
you added is the second stage.


Takashi


Re: [RFC]block: disable discard request merge temporarily

2012-08-09 Thread Christoph Hellwig
On Thu, Aug 09, 2012 at 01:28:03PM +0800, Shaohua Li wrote:
> The SCSI discard request merge never worked, and looks no solution for in
> future, let's disable it temporarily.
> 
> Signed-off-by: Shaohua Li 

Looks good and is equivalent to what I proposed over a year ago.

Reviewed-by: Christoph Hellwig 


Re: [PATCH 09/10] ARM: rpc: Fix building RiscPC

2012-08-09 Thread Russell King - ARM Linux
On Wed, Aug 08, 2012 at 11:27:57PM +0200, Arnd Bergmann wrote:
> ARMv3 support was removed in 357c9c1f07 "ARM: Remove support for ARMv3
> ARM610 and ARM710 CPUs", which explicitly left parts of the CPU32v3
> support in place for building RiscPC. However, this does not actually
> build in my test setup.
> 
> This is probably not the right solution, but maybe someone has a better
> idea for how to deal with this.

This isn't the right solution.  The RiscPC bus does not support half-word
load/stores, so we must continue to build that target for the 32v3
architecture (which is why I carefully left that stuff behind.)

The mistake is in what I did with arch/arm/lib - the armv3 code in there
needs to be resurrected instead.


Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

2012-08-09 Thread Mel Gorman
On Thu, Aug 09, 2012 at 08:51:27AM +0900, Minchan Kim wrote:
> > > > > Just out of curiosity.
> > > > > What's the problem did you see? (ie, What's the problem do this patch 
> > > > > solve?)
> > > > 
> > > > Everything in this series is related to the problem in the leader - high
> > > > order allocation success rates are lower. This patch increases the 
> > > > success
> > > > rates when allocating under load.
> > > > 
> > > > > AFAIUC, it seem to solve consecutive allocation success ratio through
> > > > > getting several free pageblocks all at once in a process/kswapd
> > > > > reclaim context. Right?
> > > > 
> > > > Only pageblocks if it is order-9 on x86, it reclaims an amount that 
> > > > depends
> > > > on an allocation size. This only happens during reclaim/compaction 
> > > > context
> > > > when we know that a high-order allocation has recently failed. The 
> > > > objective
> > > > is to reclaim enough order-0 pages so that compaction can succeed again.
> > > 
> > > Your patch increases the number of pages to be reclaimed with considering
> > > the number of fail case during deferring period and your test proved it's
> > > really good. Without your patch, why can't VM reclaim enough pages?
> > 
> > It could reclaim enough pages but it doesn't. nr_to_reclaim is
> > SWAP_CLUSTER_MAX and that gets short-cutted in direct reclaim at least
> > by 
> > 
> > if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > goto out;
> > 
> > I could set nr_to_reclaim in try_to_free_pages() of course and drive
> > it from there but that's just different, not better. If driven from
> > do_try_to_free_pages(), it is also possible that priorities will rise.
> > When they reach DEF_PRIORITY-2, it will also start stalling and setting
> > pages for immediate reclaim which is more disruptive than not desirable
> > in this case. That is a more wide-reaching change than I would expect for
> > this problem and could cause another regression related to THP requests
> > causing interactive jitter.
> 
> Agreed.
> I hope it should be added by changelog.
> 

I guess but it's not really part of this patch is it? The decision on
where to drive should_continue_reclaim from was made in commit [3e7d3449:
mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim].

Anyway changelog now reads as

If allocation fails after compaction then compaction may be deferred
for a number of allocation attempts. If there are subsequent failures,
compact_defer_shift is increased to defer for longer periods. This
patch uses that information to scale the number of pages reclaimed with
compact_defer_shift until allocations succeed again. The rationale is
that reclaiming the normal number of pages still allowed compaction to
fail and its success depends on the number of pages. If it's failing,
reclaim more pages until it succeeds again.

Note that this is not implying that VM reclaim is not reclaiming enough
pages or that its logic is broken. try_to_free_pages() always asks for
SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
what it does. Direct reclaim stops normally with this check.

if (sc->nr_reclaimed >= sc->nr_to_reclaim)
goto out;

should_continue_reclaim delays when that check is made until a minimum number
of pages for reclaim/compaction are reclaimed. It is possible that this patch
could instead set nr_to_reclaim in try_to_free_pages() and drive it from
there but that behaves differently and not necessarily for the better.
If driven from do_try_to_free_pages(), it is also possible that priorities
will rise. When they reach DEF_PRIORITY-2, it will also start stalling
and setting pages for immediate reclaim which is more disruptive than is
desirable in this case. That is a more wide-reaching change that could
cause another regression related to THP requests causing interactive jitter.
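
To put rough numbers on the scaling described in this changelog, here is a small user-space sketch. It assumes the unscaled reclaim/compaction target is 2UL << order pages and that the scaling is a left shift by compact_defer_shift; both are assumptions made for illustration. The helpers pages_for_compaction() and direct_reclaim_done() are hypothetical names, not functions from mm/vmscan.c.

```c
#include <stdbool.h>

#define SWAP_CLUSTER_MAX 32UL

/*
 * Assumed reclaim target for reclaim/compaction: twice the allocation
 * size in pages, scaled up by the number of times compaction has been
 * deferred recently.
 */
unsigned long pages_for_compaction(unsigned int order,
				   unsigned int compact_defer_shift)
{
	unsigned long pages = 2UL << order;

	return pages << compact_defer_shift;
}

/* The check quoted above: direct reclaim stops once the target is met. */
bool direct_reclaim_done(unsigned long nr_reclaimed,
			 unsigned long nr_to_reclaim)
{
	return nr_reclaimed >= nr_to_reclaim;
}
```

Under these assumptions an order-9 request targets 1024 pages with no recent deferrals and 4096 pages once compact_defer_shift has reached 2, whereas the plain nr_to_reclaim check is already satisfied after SWAP_CLUSTER_MAX (32) pages.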

> > 
> > > Other processes steal the pages reclaimed?
> > 
> > Or the page it reclaimed were in pageblocks that could not be used.
> > 
> > > Why I ask a question is that I want to know what's the problem at current
> > > VM.
> > > 
> > 
> > We cannot reliably tell in advance whether compaction is going to succeed
> > in the future without doing a full scan of the zone which would be both
> > very heavy and race with any allocation requests. Compaction needs free
> > pages to succeed so the intention is to scale the number of pages reclaimed
> > with the number of recent compaction failures.
> 
> > If allocation fails after compaction then compaction may be deferred for
> > a number of allocation attempts. If there are subsequent failures,
> > compact_defer_shift is increased to defer for longer periods. This patch
> > uses that information to scale the number of pages reclaimed with
> > compact_defer_shift until allocations succeed again.
> > 
> > Signed-off-by: Mel Gorman 
> > ---
> >  mm/vmscan.c |   10 ++
> >  1 file changed, 10 insertions(+)
> > 
> > di

Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Takashi Iwai
At Thu, 09 Aug 2012 09:42:48 +0200,
Takashi Iwai wrote:
> 
> At Thu, 9 Aug 2012 09:36:42 +0200,
> Thierry Reding wrote:
> > 
> > On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > > At Thu, 9 Aug 2012 09:08:13 +0200,
> > > Thierry Reding wrote:
> > > > 
> > > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > > Thierry Reding wrote:
> > > > > > 
> > > > > > Recent changes to the firmware loading helpers cause drivers to 
> > > > > > stall
> > > > > > when firmware is loaded during the module_init() call. The 
> > > > > > snd-hda-intel
> > > > > > module requests firmware if the patch= parameter is used to load a 
> > > > > > patch
> > > > > > file. This patch works around the problem by deferring the probe in 
> > > > > > such
> > > > > > cases, which will cause the module to load successfully and the 
> > > > > > driver
> > > > > > binding to the device outside the module_init() call.
> > > > > 
> > > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > > 
> > > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > > drivers do call request_firmware() at the device probing time.
> > > > > If this really has to be resolved in the driver side, it must be a bug
> > > > > in the firmware loader core code.
> > > > 
> > > > A good explanation of the problem and subsequent discussion can be found
> > > > here:
> > > > 
> > > > 
> > > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > > 
> > > Yeah, but it doesn't justify this ugly module option.
> > > It's a simple bug.  Papering over it with this option doesn't fix
> > > anything.
> > 
> > It's not an option, all it does is defer probing if and only if the
> > patch parameter was specified to make sure the firmware load won't
> > stall. I realize that this may not be an optimal solution, but at least
> > it fixes the problem with no fallout.
> 
> Ah sorry, I misread the patch.
> 
> Then it shouldn't be checked at that point.  Since 3.5 kernel, the
> probing code was already split for vga_switcheroo support.  The point
> you added is the second stage.

... and the patch won't work properly if there are multiple HD-audio 
controllers.  Hmm.


Takashi



Re: [PATCH] ALSA: hda/realtek - Keep analog bypass muted on ALC892

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 09:33:43AM +0200, Takashi Iwai wrote:
> At Thu, 9 Aug 2012 09:18:48 +0200,
> Thierry Reding wrote:
> > 
> > On Thu, Aug 09, 2012 at 09:02:10AM +0200, Takashi Iwai wrote:
> > > At Thu,  9 Aug 2012 08:47:19 +0200,
> > > Thierry Reding wrote:
> > > > 
> > > > Unmuting the analog bypass causes any input to be directly bypassed to
> > > > the outputs, which is usually not expected by the user.
> > > > 
> > > > Signed-off-by: Thierry Reding 
> > > > ---
> > > > Note that this patch could be generalized to include all codec variants
> > > > that have this second channel. As I don't have access to all the data
> > > > sheets I'll leave that decision up to somebody with more hardware
> > > > available.
> > > 
> > > Hmm, are you really sure that this is exclusive switch between two?
> > > All Realtek codecs with this mixer widget mixes two sources up.
> > > Otherwise it must not be a "mixer" widget at all but it must be a
> > > "selector" widget.
> > 
> > It's not an exclusive switch. But the second input is the analog bypass
> > that basically takes the signal directly from the input path and routes
> > it to the output.
> 
> Ah so you mean it as the analog loopback path.

Yes.

> > > And I haven't received a bug report regarding this although ALC892 is
> > > no new chip and the auto-parser has been enabled for so long time.
> > 
> > I've seen this happen when feeding a signal into the CD-IN (NID 0x1c),
> > which is then directly routed to the output, so I actually hear that
> > signal at the output, even if there is no software loopback to capture
> > the input and play it back via the DAC.
> > 
> > This is probably not a very common use-case, so maybe nobody else has
> > come across it. Looking at the block diagram in the ALC892 datasheet
> > (page 5) clearly shows where the input of this second switch comes from.
> > I don't think this is intended to be unmuted by default, but only in
> > very special cases where you actually want to pass the analog input
> > signal directly to the output without going through the ADC and back out
> > the DAC.
> 
> Each input has the analog loopback mixer switch (and off as default).
> Why it doesn't suffice?

I hadn't thought about that. I'll need to recheck if that solves the
issue that I was seeing.

Thierry




Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 09:50:57AM +0200, Takashi Iwai wrote:
> At Thu, 09 Aug 2012 09:42:48 +0200,
> Takashi Iwai wrote:
> > 
> > At Thu, 9 Aug 2012 09:36:42 +0200,
> > Thierry Reding wrote:
> > > 
> > > On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > > > At Thu, 9 Aug 2012 09:08:13 +0200,
> > > > Thierry Reding wrote:
> > > > > 
> > > > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > > > Thierry Reding wrote:
> > > > > > > 
> > > > > > > Recent changes to the firmware loading helpers cause drivers to 
> > > > > > > stall
> > > > > > > when firmware is loaded during the module_init() call. The 
> > > > > > > snd-hda-intel
> > > > > > > module requests firmware if the patch= parameter is used to load 
> > > > > > > a patch
> > > > > > > file. This patch works around the problem by deferring the probe 
> > > > > > > in such
> > > > > > > cases, which will cause the module to load successfully and the 
> > > > > > > driver
> > > > > > > binding to the device outside the module_init() call.
> > > > > > 
> > > > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > > > 
> > > > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > > > drivers do call request_firmware() at the device probing time.
> > > > > > If this really has to be resolved in the driver side, it must be a 
> > > > > > bug
> > > > > > in the firmware loader core code.
> > > > > 
> > > > > A good explanation of the problem and subsequent discussion can be 
> > > > > found
> > > > > here:
> > > > > 
> > > > >   
> > > > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > > > 
> > > > Yeah, but it doesn't justify this ugly module option.
> > > > It's a simple bug.  Papering over it with this option doesn't fix
> > > > anything.
> > > 
> > > It's not an option, all it does is defer probing if and only if the
> > > patch parameter was specified to make sure the firmware load won't
> > > stall. I realize that this may not be an optimal solution, but at least
> > > it fixes the problem with no fallout.
> > 
> > Ah sorry, I misread the patch.
> > 
> > Then it shouldn't be checked at that point.  Since 3.5 kernel, the
> > probing code was already split for vga_switcheroo support.  The point
> > you added is the second stage.
> 
> ... and the patch won't work properly if there are multiple HD-audio 
> controllers.  Hmm.

Right... the deferred probe would mess up the matching done by the
static dev variable. So maybe a proper implementation that uses
asynchronous firmware loading is inevitable.

Thierry
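
The interaction between a deferred probe and a static device counter can be shown with a small user-space model. This is a sketch under loud assumptions: it assumes the probe routine hands out slot numbers from a static counter in the order probes complete, and that a deferred probe does not consume a slot. Which probe gets deferred is passed in by the caller, because the actual condition in the patch is not reproduced here. model_reset() and model_probe() are hypothetical names.

```c
#include <stdbool.h>

/* Same value the kernel uses for EPROBE_DEFER; defined locally for this model. */
#define EPROBE_DEFER 517

/* Stands in for the static counter inside the probe routine. */
static int dev;

/* Reset the model between scenarios. */
void model_reset(void)
{
	dev = 0;
}

/*
 * Model of a probe routine that assigns the next free slot on success.
 * A deferred probe returns early and does not consume a slot.
 */
int model_probe(bool defer)
{
	if (defer)
		return -EPROBE_DEFER;
	return dev++;
}
```

With two controllers probed in order and no deferral, the first gets slot 0 and the second slot 1. If the first probe is deferred, the second controller takes slot 0 and the retried first controller ends up in slot 1, so per-slot module parameters written with the original order in mind would no longer line up with the hardware.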




RE: [PATCH] net: add new QCA alx ethernet driver

2012-08-09 Thread Joe Perches
On Thu, 2012-08-09 at 06:49 +, Ren, Cloud wrote:
> Luis has ever asked joe for adding as a Signed-off-by. At that time joe agreed
> with it.

What I replied to Luis and some qca list was:

On Fri, 2012-01-20 at 19:49 -0800, Joe Perches wrote:
> On Fri, 2012-01-20 at 18:25 -0800, Luis R. Rodriguez wrote:
> > Hey Joe, do you mind if we add your SOB tag on the next RFC for
> > submission of this driver? I see all your patches were merged by Ren.
> > I want to make sure you get proper credit for your contributions.
> Whatever's appropriate.

At this time I don't think what I did amounts to a
sign-off of the whole driver.

I appreciate the thoughtfulness of trying to credit
me as a person by improving the driver a little.  I
just wouldn't say I signed off on the whole thing.

cheers, Joe




Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 09:42:48AM +0200, Takashi Iwai wrote:
> At Thu, 9 Aug 2012 09:36:42 +0200,
> Thierry Reding wrote:
> > 
> > On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > > At Thu, 9 Aug 2012 09:08:13 +0200,
> > > Thierry Reding wrote:
> > > > 
> > > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > > Thierry Reding wrote:
> > > > > > 
> > > > > > Recent changes to the firmware loading helpers cause drivers to 
> > > > > > stall
> > > > > > when firmware is loaded during the module_init() call. The 
> > > > > > snd-hda-intel
> > > > > > module requests firmware if the patch= parameter is used to load a 
> > > > > > patch
> > > > > > file. This patch works around the problem by deferring the probe in 
> > > > > > such
> > > > > > cases, which will cause the module to load successfully and the 
> > > > > > driver
> > > > > > binding to the device outside the module_init() call.
> > > > > 
> > > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > > 
> > > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > > drivers do call request_firmware() at the device probing time.
> > > > > If this really has to be resolved in the driver side, it must be a bug
> > > > > in the firmware loader core code.
> > > > 
> > > > A good explanation of the problem and subsequent discussion can be found
> > > > here:
> > > > 
> > > > 
> > > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > > 
> > > Yeah, but it doesn't justify this ugly module option.
> > > It's a simple bug.  Papering over it with this option doesn't fix
> > > anything.
> > 
> > It's not an option, all it does is defer probing if and only if the
> > patch parameter was specified to make sure the firmware load won't
> > stall. I realize that this may not be an optimal solution, but at least
> > it fixes the problem with no fallout.
> 
> Ah sorry, I misread the patch.
> 
> Then it shouldn't be checked at that point.  Since 3.5 kernel, the
> probing code was already split for vga_switcheroo support.

Yes, I saw that. But unless you actually use vga_switcheroo, the second
stage, azx_probe_continue(), will still be called from azx_probe() and
therefore ultimately from module_init().

Before coming up with this patch I actually did play around a bit with
using the asynchronous firmware load functions but it turned out to be
rather difficult to do so I opted for the easy way. The biggest problem
I faced was that since patch loading needs to be done very early on, a
lot of the initialization would need to be done after .probe() and many
things could still fail, so cleaning up after errors would become
increasingly difficult.

> The point you added is the second stage.

I don't understand this sentence.

Thierry




Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

2012-08-09 Thread Mel Gorman
On Thu, Aug 09, 2012 at 10:33:58AM +0900, Minchan Kim wrote:
> Hi Mel,
> 
> Just one question below.
> 

Sure! Your questions usually get me thinking about the right part of the
series, this series in particular :)

> > 
> > @@ -708,6 +750,10 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
> > goto out;
> > }
> > }
> > +
> > +   /* Capture a page now if it is a suitable size */
> 
> Why do we capture only when we migrate MIGRATE_MOVABLE type?
> If you have a reason, it should have been added as a comment.
> 

Good question and there is an answer. However, I also spotted a problem when
thinking about this more where !MIGRATE_MOVABLE allocations are forced to
do a full compaction. The simple solution would be to only set cc->page for
MIGRATE_MOVABLE but there is a better approach that I've implemented in the
patch below. It includes a comment that should answer your question. Does
this make sense to you?

diff --git a/mm/compaction.c b/mm/compaction.c
index 63af8d2..384164e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -53,13 +53,31 @@ static inline bool migrate_async_suitable(int migratetype)
 static void compact_capture_page(struct compact_control *cc)
 {
 	unsigned long flags;
-	int mtype;
+	int mtype, mtype_low, mtype_high;
 
 	if (!cc->page || *cc->page)
 		return;
 
+	/*
+	 * For MIGRATE_MOVABLE allocations we capture a suitable page ASAP
+	 * regardless of the migratetype of the freelist it is captured from.
+	 * This is fine because the order for a high-order MIGRATE_MOVABLE
+	 * allocation is typically at least a pageblock size and overall
+	 * fragmentation is not impaired. Other allocation types must
+	 * capture pages from their own migratelist because otherwise they
+	 * could pollute other pageblocks like MIGRATE_MOVABLE with
+	 * difficult to move pages and make fragmentation worse overall.
+	 */
+	if (cc->migratetype == MIGRATE_MOVABLE) {
+		mtype_low = 0;
+		mtype_high = MIGRATE_PCPTYPES;
+	} else {
+		mtype_low = cc->migratetype;
+		mtype_high = cc->migratetype + 1;
+	}
+
 	/* Speculatively examine the free lists without zone lock */
-	for (mtype = 0; mtype < MIGRATE_PCPTYPES; mtype++) {
+	for (mtype = mtype_low; mtype < mtype_high; mtype++) {
 		int order;
 		for (order = cc->order; order < MAX_ORDER; order++) {
 			struct page *page;
@@ -752,8 +770,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
 		}
 
 		/* Capture a page now if it is a suitable size */
-		if (cc->migratetype == MIGRATE_MOVABLE)
-			compact_capture_page(cc);
+		compact_capture_page(cc);
 	}
 
 out:


Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Takashi Iwai
At Thu, 9 Aug 2012 10:07:13 +0200,
Thierry Reding wrote:
> 
> On Thu, Aug 09, 2012 at 09:42:48AM +0200, Takashi Iwai wrote:
> > At Thu, 9 Aug 2012 09:36:42 +0200,
> > Thierry Reding wrote:
> > > 
> > > On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > > > At Thu, 9 Aug 2012 09:08:13 +0200,
> > > > Thierry Reding wrote:
> > > > > 
> > > > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > > > Thierry Reding wrote:
> > > > > > > 
> > > > > > > Recent changes to the firmware loading helpers cause drivers to stall
> > > > > > > when firmware is loaded during the module_init() call. The snd-hda-intel
> > > > > > > module requests firmware if the patch= parameter is used to load a patch
> > > > > > > file. This patch works around the problem by deferring the probe in such
> > > > > > > cases, which will cause the module to load successfully and the driver
> > > > > > > binding to the device outside the module_init() call.
> > > > > > 
> > > > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > > > 
> > > > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > > > drivers do call request_firmware() at the device probing time.
> > > > > > If this really has to be resolved in the driver side, it must be a bug
> > > > > > in the firmware loader core code.
> > > > > 
> > > > > A good explanation of the problem and subsequent discussion can be found
> > > > > here:
> > > > > 
> > > > >   
> > > > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > > > 
> > > > Yeah, but it doesn't justify this ugly module option.
> > > > It's a simple bug.  Papering over it with this option doesn't fix
> > > > anything.
> > > 
> > > It's not an option, all it does is defer probing if and only if the
> > > patch parameter was specified to make sure the firmware load won't
> > > stall. I realize that this may not be an optimal solution, but at least
> > > it fixes the problem with no fallout.
> > 
> > Ah sorry, I misread the patch.
> > 
> > Then it shouldn't be checked at that point.  Since 3.5 kernel, the
> > probing code was already split for vga_switcheroo support.
> 
> Yes, I saw that. But unless you actually use vga_switcheroo, the second
> stage, azx_probe_continue(), will still be called from azx_probe() and
> therefore ultimately from module_init().

Yeah, but this could be easily delayed.  The split was already done,
so the next step would be to return after the first half at probe,
then call the second half later.

> Before coming up with this patch I actually did play around a bit with
> using the asynchronous firmware load functions but it turned out to be
> rather difficult to do so I opted for the easy way. The biggest problem
> I faced was that since patch loading needs to be done very early on, a
> lot of the initialization would need to be done after .probe() and many
> things could still fail, so cleaning up after errors would become
> increasingly difficult.

async probe is also on my TODO list, but it's deferred ;)

> > The point you added is the second stage.
> 
> I don't understand this sentence.

I meant that your patch added the check at the second-half probing
function (azx_probe_continue()).  That is, it could be already the
point triggered by vga_switcheroo handler, not via module_init any
longer.

So, after rethinking what you suggested, I wrote a quick patch below.
Could you check whether this works?


Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index c8aced1..4e5839a 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -559,13 +559,17 @@ enum {
  * VGA-switcher support
  */
 #ifdef SUPPORT_VGA_SWITCHEROO
+#define use_vga_switcheroo(chip)   ((chip)->use_vga_switcheroo)
+#else
+#define use_vga_switcheroo(chip)   0
+#endif
+
+#if defined(SUPPORT_VGA_SWITCHEROO) || defined(CONFIG_SND_HDA_PATCH_LOADER)
 #define DELAYED_INIT_MARK
 #define DELAYED_INITDATA_MARK
-#define use_vga_switcheroo(chip)   ((chip)->use_vga_switcheroo)
 #else
 #define DELAYED_INIT_MARK  __devinit
 #define DELAYED_INITDATA_MARK  __devinitdata
-#define use_vga_switcheroo(chip)   0
 #endif
 
 static char *driver_short_names[] DELAYED_INITDATA_MARK = {
@@ -3154,6 +3158,20 @@ static int __devinit azx_probe(struct pci_dev *pci,
 	struct azx *chip;
 	int err;
 
+#ifdef CONFIG_SND_HDA_PATCH_LOADER
+	/* delayed probe */
+	card = pci_get_drvdata(pci);
+	if (card) {
+		struct azx *chip = card->private_data;
+		if (chip->disabled)
+			return 0; /* will be loaded via vga_switcheroo */
+		err = azx_probe_continue(chip);
+		if (err < 0)
+			goto out_free;
+		return 0;
+	}
+#endif
+
  

[PATCH v2 1/3] mfd: add lp8788 mfd driver

2012-08-09 Thread Kim, Milo
TI LP8788 PMU supports regulators, battery charger, RTC,
ADC, backlight driver and current sinks.

Patch v2.
(a) For interrupt handling, use generic irq rather than irq-domain
(b) Replace EXPORT_SYMBOL() with EXPORT_SYMBOL_GPL() for regmap apis
(c) Remove adc functions in mfd driver
  : add new iio driver for supporting adc functions.
    A separate patch will be submitted.

Signed-off-by: Milo(Woogyom) Kim 
---
 drivers/mfd/Kconfig  |9 +
 drivers/mfd/Makefile |2 +
 drivers/mfd/lp8788-irq.c |  240 +
 drivers/mfd/lp8788.c |  255 ++
 include/linux/mfd/lp8788-isink.h |   52 ++
 include/linux/mfd/lp8788.h   |  364 ++
 6 files changed, 922 insertions(+), 0 deletions(-)
 create mode 100644 drivers/mfd/lp8788-irq.c
 create mode 100644 drivers/mfd/lp8788.c
 create mode 100644 include/linux/mfd/lp8788-isink.h
 create mode 100644 include/linux/mfd/lp8788.h

diff --git a/drivers/mfd/Kconfig b/drivers/mfd/Kconfig
index d1facef..c7b6400 100644
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@ -450,6 +450,15 @@ config PMIC_ADP5520
  individual components like LCD backlight, LEDs, GPIOs and Kepad
  under the corresponding menus.
 
+config MFD_LP8788
+   bool "Texas Instruments LP8788 Power Management Unit Driver"
+   depends on I2C=y
+   select MFD_CORE
+   select REGMAP_I2C
+   help
+ TI LP8788 PMU supports regulators, battery charger, RTC,
+ adc, backlight driver and current sinks.
+
 config MFD_MAX77686
bool "Maxim Semiconductor MAX77686 PMIC Support"
depends on I2C=y && GENERIC_HARDIRQS
diff --git a/drivers/mfd/Makefile b/drivers/mfd/Makefile
index 79dd22d..489cab9 100644
--- a/drivers/mfd/Makefile
+++ b/drivers/mfd/Makefile
@@ -90,6 +90,8 @@ obj-$(CONFIG_PMIC_DA9052) += da9052-core.o
 obj-$(CONFIG_MFD_DA9052_SPI)   += da9052-spi.o
 obj-$(CONFIG_MFD_DA9052_I2C)   += da9052-i2c.o
 
+obj-$(CONFIG_MFD_LP8788)   += lp8788.o lp8788-irq.o
+
 obj-$(CONFIG_MFD_MAX77686) += max77686.o max77686-irq.o
 obj-$(CONFIG_MFD_MAX77693) += max77693.o max77693-irq.o
 max8925-objs   := max8925-core.o max8925-i2c.o
diff --git a/drivers/mfd/lp8788-irq.c b/drivers/mfd/lp8788-irq.c
new file mode 100644
index 000..5056383
--- /dev/null
+++ b/drivers/mfd/lp8788-irq.c
@@ -0,0 +1,240 @@
+/*
+ * TI LP8788 MFD - interrupt handler
+ *
+ * Copyright 2012 Texas Instruments
+ *
+ * Author: Milo(Woogyom) Kim 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* register address */
+#define LP8788_INT_1   0x00
+#define LP8788_INTEN_1 0x03
+
+#define BASE_INTEN_REG LP8788_INTEN_1
+#define SIZE_REG   8
+#define NUM_INTREGS    3
+
+/*
+ * struct lp8788_irq_data
+ * @lp   : access to lp8788 registers
+ * @irq_lock : mutex for enabling/disabling the interrupt
+ * @enabled  : status of enabled interrupt
+ * @irq  : pin number of IRQ_N pin
+ * @irq_base : used for handling chained interrupt
+ */
+struct lp8788_irq_data {
+   struct lp8788 *lp;
+   struct mutex irq_lock;
+   int enabled[LP8788_INT_MAX];
+   int irq;
+   int irq_base;
+};
+
+static inline u8 _irq_to_addr(enum lp8788_int_id id)
+{
+   return id / SIZE_REG;
+}
+
+static inline u8 _irq_to_enable_addr(enum lp8788_int_id id)
+{
+   return _irq_to_addr(id) + BASE_INTEN_REG;
+}
+
+static inline u8 _irq_to_mask(enum lp8788_int_id id)
+{
+   return 1 << (id % SIZE_REG);
+}
+
+static inline u8 _irq_to_val(enum lp8788_int_id id, int enable)
+{
+   return enable << (id % SIZE_REG);
+}
+
+static void lp8788_irq_enable(struct irq_data *data)
+{
+   struct lp8788_irq_data *irqd = irq_data_get_irq_chip_data(data);
+   enum lp8788_int_id irq = data->irq - irqd->irq_base;
+
+   irqd->enabled[irq] = 1;
+}
+
+static void lp8788_irq_disable(struct irq_data *data)
+{
+   struct lp8788_irq_data *irqd = irq_data_get_irq_chip_data(data);
+   enum lp8788_int_id irq = data->irq - irqd->irq_base;
+
+   irqd->enabled[irq] = 0;
+}
+
+static void lp8788_irq_bus_lock(struct irq_data *data)
+{
+   struct lp8788_irq_data *irqd = irq_data_get_irq_chip_data(data);
+
+   mutex_lock(&irqd->irq_lock);
+}
+
+static void lp8788_irq_bus_sync_unlock(struct irq_data *data)
+{
+   struct lp8788_irq_data *irqd = irq_data_get_irq_chip_data(data);
+   enum lp8788_int_id irq = data->irq - irqd->irq_base;
+   u8 addr, mask, val;
+
+   addr = _irq_to_enable_addr(irq);
+   mask = _irq_to_mask(irq);
+   val = _irq_to_val(irq, irqd->enabled[irq]);
+
+  
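
For readers following the truncated patch above, the interrupt-id arithmetic in the _irq_to_*() helpers can be tried outside the kernel. The following is only a sketch under stated assumptions: the function names are hypothetical stand-ins, plain unsigned ids replace enum lp8788_int_id, and the two constants are copied from the LP8788_INTEN_1 and SIZE_REG defines in the patch.

```c
#include <stdint.h>

/* Values copied from the patch: LP8788_INTEN_1 and SIZE_REG. */
#define INTEN_BASE	0x03
#define BITS_PER_REG	8

/* Hypothetical stand-in for _irq_to_enable_addr(): which enable register. */
uint8_t irq_enable_addr(unsigned int id)
{
	return (uint8_t)(id / BITS_PER_REG + INTEN_BASE);
}

/* Hypothetical stand-in for _irq_to_mask(): which bit inside that register. */
uint8_t irq_mask(unsigned int id)
{
	return (uint8_t)(1u << (id % BITS_PER_REG));
}

/* Hypothetical stand-in for _irq_to_val(): the bit value to write. */
uint8_t irq_val(unsigned int id, int enable)
{
	return (uint8_t)((enable ? 1u : 0u) << (id % BITS_PER_REG));
}
```

With these definitions, interrupt id 10 lands in the second enable register (address 0x04) at bit 2, so the mask is 0x04 and the written value is 0x04 when enabled or 0x00 when disabled.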

[PATCH 2/3] iio: adc: add new lp8788 adc driver

2012-08-09 Thread Kim, Milo
TI LP8788 has an ADC function.
The result of the LP8788 ADC is used in the LP8788 power supply driver
(for example, to get the battery voltage and temperature).

Signed-off-by: Milo(Woogyom) Kim 
---
 drivers/iio/adc/Kconfig  |6 +
 drivers/iio/adc/Makefile |1 +
 drivers/iio/adc/lp8788_adc.c |  240 ++
 3 files changed, 247 insertions(+), 0 deletions(-)
 create mode 100644 drivers/iio/adc/lp8788_adc.c

diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
index 8a78b4f..30c06ed 100644
--- a/drivers/iio/adc/Kconfig
+++ b/drivers/iio/adc/Kconfig
@@ -22,4 +22,10 @@ config AT91_ADC
help
  Say yes here to build support for Atmel AT91 ADC.
 
+config LP8788_ADC
+   bool "LP8788 ADC driver"
+   depends on MFD_LP8788
+   help
+ Say yes here to build support for TI LP8788 ADC.
+
 endmenu
diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile
index 52eec25..72f9a76 100644
--- a/drivers/iio/adc/Makefile
+++ b/drivers/iio/adc/Makefile
@@ -4,3 +4,4 @@
 
 obj-$(CONFIG_AD7266) += ad7266.o
 obj-$(CONFIG_AT91_ADC) += at91_adc.o
+obj-$(CONFIG_LP8788_ADC) += lp8788_adc.o
diff --git a/drivers/iio/adc/lp8788_adc.c b/drivers/iio/adc/lp8788_adc.c
new file mode 100644
index 000..30767d5
--- /dev/null
+++ b/drivers/iio/adc/lp8788_adc.c
@@ -0,0 +1,240 @@
+/*
+ * TI LP8788 MFD - ADC driver
+ *
+ * Copyright 2012 Texas Instruments
+ *
+ * Author: Milo(Woogyom) Kim 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* register address */
+#define LP8788_ADC_CONF    0x60
+#define LP8788_ADC_RAW     0x61
+#define LP8788_ADC_DONE    0x63
+
+#define START_ADC_CHANNEL  LPADC_VBATT_5P5
+#define END_ADC_CHANNEL    LPADC_MAX
+#define ADC_CONV_START     1
+#define ADC_CONV_DELAY_US  100
+
+struct lp8788_adc {
+   struct lp8788 *lp;
+   struct iio_dev *indio_dev;
+};
+
+static const int adc_const[LPADC_MAX] = {
+   [LPADC_VBATT_5P5] = 1343,
+   [LPADC_VIN_CHG]   = 3052,
+   [LPADC_IBATT] = 610,
+   [LPADC_IC_TEMP]   = 610,
+   [LPADC_VBATT_6P0] = 1465,
+   [LPADC_VBATT_5P0] = 1221,
+   [LPADC_ADC1]  = 610,
+   [LPADC_ADC2]  = 610,
+   [LPADC_VDD]   = 1025,
+   [LPADC_VCOIN] = 757,
+   [LPADC_VDD_LDO]   = 610,
+   [LPADC_ADC3]  = 610,
+   [LPADC_ADC4]  = 610,
+};
+
+static inline unsigned int _get_adc_micro_unit(enum lp8788_adc_id id,
+   unsigned int adc_result)
+{
+   return adc_const[id] * ((adc_result * 1000 + 500) / 1000);
+}
+
+static int lp8788_adc_read_raw(struct iio_dev *indio_dev,
+   struct iio_chan_spec const *chan,
+   int *val, int *val2, long mask)
+{
+   struct lp8788_adc *adc = iio_priv(indio_dev);
+   int retry = 5;
+   unsigned int msb, lsb;
+   u8 data, rawdata[2], shift;
+   int size = ARRAY_SIZE(rawdata);
+
+   if (mask != IIO_CHAN_INFO_RAW)
+   return -EINVAL;
+
+   data = (chan->channel << 1) | ADC_CONV_START;
+   if (lp8788_write_byte(adc->lp, LP8788_ADC_CONF, data))
+   goto err;
+
+   /* retry until adc conversion is done */
+   data = 0;
+   while (retry--) {
+   udelay(ADC_CONV_DELAY_US);
+
+   if (lp8788_read_byte(adc->lp, LP8788_ADC_DONE, &data))
+   goto err;
+
+   /* conversion done */
+   if (data)
+   break;
+   }
+
+   if (lp8788_read_multi_bytes(adc->lp, LP8788_ADC_RAW, rawdata, size))
+   goto err;
+
+   shift = chan->scan_type.shift;
+   msb = (rawdata[0] << shift) & 0x0ff0;
+   lsb = (rawdata[1] >> shift) & 0x000f;
+   *val = _get_adc_micro_unit(chan->channel, msb | lsb);
+
+   return IIO_VAL_INT;
+err:
+   return -EINVAL;
+}
+
+static const struct iio_info lp8788_adc_info = {
+   .read_raw = &lp8788_adc_read_raw,
+   .driver_module = THIS_MODULE,
+};
+
+#define LP8788_COMMON_CH(id)   \
+   .indexed = 1,   \
+   .output = 1,\
+   .channel = LPADC_##id,  \
+   .info_mask = IIO_CHAN_INFO_RAW_SEPARATE_BIT,\
+   .address = LP8788_ADC_RAW,  \
+   .scan_type = IIO_ST('u', 8, 12, 4), \
+   .scan_index = 1,\
+   .datasheet_name = #id,  \
+
+#define LP8788_V_CHAN(id)  {   
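
The raw-value handling in lp8788_adc_read_raw() above can be checked in isolation. This is a userspace sketch, not the driver code: the function names are hypothetical, the shift of 4 is taken from the IIO_ST('u', 8, 12, 4) scan type in the patch, and the per-step constants come from the adc_const[] table.

```c
#include <stdint.h>

/*
 * Combine the two raw bytes into a 12-bit code the same way the patch
 * does: raw[0] supplies bits 11..4, the upper nibble of raw[1] supplies
 * bits 3..0.
 */
unsigned int combine_raw12(const uint8_t raw[2], unsigned int shift)
{
	unsigned int msb = ((unsigned int)raw[0] << shift) & 0x0ff0;
	unsigned int lsb = ((unsigned int)raw[1] >> shift) & 0x000f;

	return msb | lsb;
}

/*
 * Same expression as _get_adc_micro_unit() in the patch. In integer
 * arithmetic (code * 1000 + 500) / 1000 evaluates to code again, so the
 * result is simply step * code.
 */
unsigned int scale_micro(unsigned int step, unsigned int code)
{
	return step * ((code * 1000 + 500) / 1000);
}
```

For example, a full-scale reading of { 0xff, 0xf0 } combines to 4095, and with the LPADC_VBATT_5P5 constant of 1343 the scaled value is 5499585, which is consistent with a 5.5 V range expressed in microvolts.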

[PATCH v2 3/3] power_supply: add new lp8788 charger driver

2012-08-09 Thread Kim, Milo
TI LP8788 PMU has the battery charger function.

Patch v2.
Use iio consumer apis for adc results rather than raw lp8788 functions
(battery voltage and battery temperature)

Signed-off-by: Milo(Woogyom) Kim 
---
 drivers/power/Kconfig  |7 +
 drivers/power/Makefile |1 +
 drivers/power/lp8788-charger.c |  753 
 3 files changed, 761 insertions(+), 0 deletions(-)
 create mode 100644 drivers/power/lp8788-charger.c

diff --git a/drivers/power/Kconfig b/drivers/power/Kconfig
index c1892f3..ff86469 100644
--- a/drivers/power/Kconfig
+++ b/drivers/power/Kconfig
@@ -255,6 +255,13 @@ config CHARGER_LP8727
help
  Say Y here to enable support for LP8727 Charger Driver.
 
+config CHARGER_LP8788
+   tristate "TI LP8788 charger driver"
+   depends on MFD_LP8788
+   depends on LP8788_ADC
+   help
+ Say Y to enable support for the LP8788 linear charger.
+
 config CHARGER_GPIO
tristate "GPIO charger"
depends on GPIOLIB
diff --git a/drivers/power/Makefile b/drivers/power/Makefile
index ee58afb..587c5f1 100644
--- a/drivers/power/Makefile
+++ b/drivers/power/Makefile
@@ -39,6 +39,7 @@ obj-$(CONFIG_CHARGER_ISP1704) += isp1704_charger.o
 obj-$(CONFIG_CHARGER_MAX8903)  += max8903_charger.o
 obj-$(CONFIG_CHARGER_TWL4030)  += twl4030_charger.o
 obj-$(CONFIG_CHARGER_LP8727)   += lp8727_charger.o
+obj-$(CONFIG_CHARGER_LP8788)   += lp8788-charger.o
 obj-$(CONFIG_CHARGER_GPIO) += gpio-charger.o
 obj-$(CONFIG_CHARGER_MANAGER)  += charger-manager.o
 obj-$(CONFIG_CHARGER_MAX8997)  += max8997_charger.o
diff --git a/drivers/power/lp8788-charger.c b/drivers/power/lp8788-charger.c
new file mode 100644
index 000..0a8efd8
--- /dev/null
+++ b/drivers/power/lp8788-charger.c
@@ -0,0 +1,753 @@
+/*
+ * TI LP8788 MFD - battery charger driver
+ *
+ * Copyright 2012 Texas Instruments
+ *
+ * Author: Milo(Woogyom) Kim 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* register address */
+#define LP8788_CHG_STATUS  0x07
+#define LP8788_CHG_IDCIN   0x13
+#define LP8788_CHG_IBATT   0x14
+#define LP8788_CHG_VTERM   0x15
+#define LP8788_CHG_EOC 0x16
+
+/* mask/shift bits */
+#define LP8788_CHG_INPUT_STATE_M   0x03    /* Addr 07h */
+#define LP8788_CHG_STATE_M         0x3C
+#define LP8788_CHG_STATE_S         2
+#define LP8788_NO_BATT_M           BIT(6)
+#define LP8788_BAD_BATT_M          BIT(7)
+#define LP8788_CHG_IBATT_M         0x1F    /* Addr 14h */
+#define LP8788_CHG_VTERM_M         0x0F    /* Addr 15h */
+#define LP8788_CHG_EOC_LEVEL_M     0x30    /* Addr 16h */
+#define LP8788_CHG_EOC_LEVEL_S     4
+#define LP8788_CHG_EOC_TIME_M      0x0E
+#define LP8788_CHG_EOC_TIME_S      1
+#define LP8788_CHG_EOC_MODE_M      BIT(0)
+
+#define CHARGER_NAME   "charger"
+#define BATTERY_NAME   "main_batt"
+
+#define LP8788_CHG_START   0x11
+#define LP8788_CHG_END 0x1C
+
+#define BUF_SIZE   40
+#define MAX_BATT_CAPACITY  100
+#define LP8788_ISEL_MAX    23
+#define LP8788_ISEL_STEP   50
+#define LP8788_VTERM_MIN   4100
+#define LP8788_VTERM_STEP  25
+
+/* CHG_STATE */
+enum lp8788_charging_state {
+   OFF,
+   WARM_UP,
+   LOW_INPUT = 0x3,
+   PRECHARGE,
+   CC,
+   CV,
+   MAINTENANCE,
+   BATTERY_FAULT,
+   SYSTEM_SUPPORT = 0xC,
+   HIGH_CURRENT = 0xF,
+   MAX_CHG_STATE,
+};
+
+enum lp8788_charger_input_state {
+   SYSTEM_SUPPLY = 1,
+   FULL_FUNCTION,
+};
+
+enum lp8788_charger_adc_sel {
+   VBATT,
+   BATT_TEMP,
+   NUM_CHG_ADC,
+};
+
+struct lp8788_charger {
+   struct lp8788 *lp;
+   struct iio_channel *chan[NUM_CHG_ADC];
+   struct power_supply charger;
+   struct power_supply battery;
+   struct work_struct charger_work;
+};
+
+static char *battery_supplied_to[] = {
+   BATTERY_NAME,
+};
+
+static enum power_supply_property lp8788_charger_prop[] = {
+   POWER_SUPPLY_PROP_ONLINE,
+};
+
+static enum power_supply_property lp8788_battery_prop[] = {
+   POWER_SUPPLY_PROP_STATUS,
+   POWER_SUPPLY_PROP_HEALTH,
+   POWER_SUPPLY_PROP_PRESENT,
+   POWER_SUPPLY_PROP_VOLTAGE_NOW,
+   POWER_SUPPLY_PROP_CAPACITY,
+   POWER_SUPPLY_PROP_TEMP,
+};
+
+static bool lp8788_is_charger_detected(struct lp8788_charger *pchg)
+{
+   u8 data;
+
+   lp8788_read_byte(pchg->lp, LP8788_CHG_STATUS, &data);
+   data &= LP8788_CHG_INPUT_STATE_M;
+
+   return (data == SYSTEM_SUPPLY || data == FULL_FUNCTION);
+}
+
+static int lp8788

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Mel Gorman
On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > 
> > 
> > Second, it updates compact_cached_free_pfn in a more limited set of
> > circumstances.
> > 
> > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > of the zone. When a wrapped scanner isolates a page, it updates
> > compact_cached_free_pfn to point to the highest pageblock it
> > can isolate pages from.
> 
> Okay until here.
> 

Great.

> > 
> > If a scanner has not wrapped when it has finished isolating pages it
> > checks if compact_cached_free_pfn is pointing to the end of the
> > zone. If so, the value is updated to point to the highest
> > pageblock that pages were isolated from. This value will not
> > be updated again until a free page scanner wraps and resets
> > compact_cached_free_pfn.
> 
> I tried to understand your intention of this part but unfortunately failed.
> By this part, the problem you mentioned could happen again?
> 

Potentially yes, I did say it still races in the changelog.

>   C
>  Process AM S F
>   |---|
>  Process BM   FS
>  
>  C is zone->compact_cached_free_pfn
>  S is cc->start_pfree_pfn
>  M is cc->migrate_pfn
>  F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn to end of the zone accordingly.
> 

Yes. Now that it has wrapped it updates the compact_cached_free_pfn
every loop of isolate_freepages here.

	if (isolated) {
		high_pfn = max(high_pfn, pfn);

		/*
		 * If the free scanner has wrapped, update
		 * compact_cached_free_pfn to point to the highest
		 * pageblock with free pages. This reduces excessive
		 * scanning of full pageblocks near the end of the
		 * zone
		 */
		if (cc->order > 0 && cc->wrapped)
			zone->compact_cached_free_pfn = high_pfn;
	}



> Simultaneously, Process B finishes isolating in a block, peeks at the
> compact_cached_free_pfn position, sees that it is the end of the zone, and
> so updates compact_cached_free_pfn to the highest pageblock that pages were
> isolated from.
> 

Yes, they race at this point. One of two things happens here, and I agree
that this is racy:

1. Process A does another iteration of its loop and sets it back.
2. Process A does not do another iteration of the loop, so the cached_pfn
   is further along than it should be. The next compacting process will
   wrap early and reset cached_pfn again but continue to scan the zone.

Either option is relatively harmless because in both cases the zone gets
scanned. In patch 4 it was possible that large portions of the zone were
frequently missed.
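
The update rules discussed above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel code: struct zone_model and the three function names are hypothetical, "end of zone" is represented by a plain end_pfn field, and the cc->order > 0 condition from the real patch is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the only zone state that matters here. */
struct zone_model {
	unsigned long end_pfn;		/* stands in for "end of the zone" */
	unsigned long cached_free_pfn;	/* compact_cached_free_pfn */
};

/* A free scanner that wraps resets the cached value to the end of the zone. */
void scanner_wrapped(struct zone_model *z)
{
	z->cached_free_pfn = z->end_pfn;
}

/* A wrapped scanner updates the cached value every time it isolates pages. */
void scanner_isolated(struct zone_model *z, bool wrapped, unsigned long high_pfn)
{
	if (wrapped)
		z->cached_free_pfn = high_pfn;
}

/*
 * A scanner that has not wrapped only updates the cached value when it
 * still points at the end of the zone.
 */
void scanner_finished(struct zone_model *z, bool wrapped, unsigned long high_pfn)
{
	if (!wrapped && z->cached_free_pfn == z->end_pfn)
		z->cached_free_pfn = high_pfn;
}
```

Replaying the race in this model: Process A calls scanner_wrapped(), Process B then calls scanner_finished() with its own high_pfn and overwrites the value, and a later scanner_isolated() from Process A sets it back, which corresponds to outcome 1 in the list above.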

> Process A updates compact_cached_free_pfn to the highest pageblock which
> was set by process B because process A has wrapped. It ends up big jump
> without any scanning in process A.
> 

It recovers quickly and is nowhere near as severe as what patch 4
suffers from.

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

2012-08-09 Thread Minchan Kim
On Thu, Aug 09, 2012 at 08:49:50AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 08:51:27AM +0900, Minchan Kim wrote:
> > > > > > Just out of curiosity.
> > > > > > What problem did you see? (i.e., what problem does this patch
> > > > > > solve?)
> > > > > 
> > > > > Everything in this series is related to the problem in the leader -
> > > > > high order allocation success rates are lower. This patch increases
> > > > > the success rates when allocating under load.
> > > > > 
> > > > > > AFAIUC, it seems to solve consecutive allocation success ratio through
> > > > > > getting several free pageblocks all at once in a process/kswapd
> > > > > > reclaim context. Right?
> > > > > 
> > > > > Only pageblocks if it is order-9 on x86, it reclaims an amount that
> > > > > depends on an allocation size. This only happens during
> > > > > reclaim/compaction context when we know that a high-order allocation
> > > > > has recently failed. The objective is to reclaim enough order-0 pages
> > > > > so that compaction can succeed again.
> > > > 
> > > > Your patch increases the number of pages to be reclaimed with considering
> > > > the number of fail case during deferring period and your test proved it's
> > > > really good. Without your patch, why can't VM reclaim enough pages?
> > > 
> > > It could reclaim enough pages but it doesn't. nr_to_reclaim is
> > > SWAP_CLUSTER_MAX and that gets short-cutted in direct reclaim at least
> > > by 
> > > 
> > > 	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> > > 		goto out;
> > > 
> > > I could set nr_to_reclaim in try_to_free_pages() of course and drive
> > > it from there but that's just different, not better. If driven from
> > > do_try_to_free_pages(), it is also possible that priorities will rise.
> > > When they reach DEF_PRIORITY-2, it will also start stalling and setting
> > > pages for immediate reclaim which is more disruptive than is desirable
> > > in this case. That is a more wide-reaching change than I would expect for
> > > this problem and could cause another regression related to THP requests
> > > causing interactive jitter.
> > 
> > Agreed.
> > I hope it can be added to the changelog.
> > 
> 
> I guess but it's not really part of this patch is it? The decision on
> where to drive should_continue_reclaim from was made in commit [3e7d3449:
> mm: vmscan: reclaim order-0 and use compaction instead of lumpy reclaim].
> 
> Anyway changelog now reads as
> 
> If allocation fails after compaction then compaction may be deferred
> for a number of allocation attempts. If there are subsequent failures,
> compact_defer_shift is increased to defer for longer periods. This
> patch uses that information to scale the number of pages reclaimed with
> compact_defer_shift until allocations succeed again. The rationale is
> that reclaiming the normal number of pages still allowed compaction to
> fail and its success depends on the number of pages. If it's failing,
> reclaim more pages until it succeeds again.
> 
> Note that this is not implying that VM reclaim is not reclaiming enough
> pages or that its logic is broken. try_to_free_pages() always asks for
> SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
> what it does. Direct reclaim stops normally with this check.
> 
> 	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> 		goto out;
> 
> should_continue_reclaim delays when that check is made until a minimum number
> of pages for reclaim/compaction are reclaimed. It is possible that this patch
> could instead set nr_to_reclaim in try_to_free_pages() and drive it from
> there but that behaves differently and not necessarily for the better.
> If driven from do_try_to_free_pages(), it is also possible that priorities
> will rise. When they reach DEF_PRIORITY-2, it will also start stalling
> and setting pages for immediate reclaim which is more disruptive than is
> desirable in this case. That is a more wide-reaching change that could
> cause another regression related to THP requests causing interactive jitter.
> 
> > > 
> > > > Other processes steal the pages reclaimed?
> > > 
> > > Or the page it reclaimed were in pageblocks that could not be used.
> > > 
> > > > Why I ask a question is that I want to know what's the problem at
> > > > current VM.
> > > > 
> > > 
> > > We cannot reliably tell in advance whether compaction is going to succeed
> > > in the future without doing a full scan of the zone which would be both
> > > very heavy and race with any allocation requests. Compaction needs free
> > > pages to succeed so the intention is to scale the number of pages
> > > reclaimed with the number of recent compaction failures.
> > 
> > > If allocation fails after compaction then compaction may be deferred for
> > > a number of allocation attempts. If there are subsequent failures,
> > > compa

RE: [PATCH] net: add new QCA alx ethernet driver

2012-08-09 Thread Ren, Cloud
>-Original Message-
>From: Joe Perches [mailto:j...@perches.com]
>Sent: Thursday, August 09, 2012 3:58 PM
>
>On Thu, 2012-08-09 at 06:49 +, Ren, Cloud wrote:
>> Luis has ever asked joe for adding as a Signed-off-by. At that time
>> joe agreed with it.
>
>What I replied to Luis and some qca list was:
>
>On Fri, 2012-01-20 at 19:49 -0800, Joe Perches wrote:
>> On Fri, 2012-01-20 at 18:25 -0800, Luis R. Rodriguez wrote:
>> > Hey Joe, do you mind if we add your SOB tag on the next RFC for
>> > submission of this driver? I see all your patches were merged by Ren.
>> > I want to make sure you get proper credit for your contributions.
>> Whatever's appropriate.
>
>At this time I don't think what I did amounts to a sign-off of the whole driver.
>
>I appreciate the thoughtfulness of trying to credit me as a person by improving
>the driver a little.  I just wouldn't say I signed off on the whole thing.
>
>cheers, Joe
>

Thanks for your help with the alx driver. I will respect whatever you decide
and will remove your Signed-off-by. Thanks.

cloud

Re: [PATCH 3/5] kmemleak: use rbtree instead of prio tree

2012-08-09 Thread Catalin Marinas
On Wed, Aug 08, 2012 at 06:07:39PM +0100, Michel Lespinasse wrote:
> kmemleak uses a tree where each node represents an allocated memory object
> in order to quickly find out what object a given address is part of.
> However, the objects don't overlap, so rbtrees are a better choice than
> prio tree for this use. They are both faster and have lower memory overhead.
> 
> Tested by booting a kernel with kmemleak enabled, loading the kmemleak_test
> module, and looking for the expected messages.
> 
> Signed-off-by: Michel Lespinasse 

The patch looks fine to me but I'll give it a test later today and let
you know.
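
For readers following the thread, the lookup that the quoted description relies on can be sketched in userspace. The assumptions are stated plainly: this uses a plain unbalanced binary search tree rather than the kernel rbtree API, and struct obj_node, obj_insert() and obj_lookup() are hypothetical names, not kmemleak functions. It only illustrates why non-overlapping objects keyed by start address need nothing more than an ordered tree.

```c
#include <stddef.h>

/* One tracked allocation: the half-open range [start, start + size). */
struct obj_node {
	unsigned long start;
	unsigned long size;
	struct obj_node *left, *right;
};

/* Insert by start address; objects are assumed not to overlap. */
struct obj_node *obj_insert(struct obj_node *root, struct obj_node *n)
{
	struct obj_node **link = &root;

	n->left = n->right = NULL;
	while (*link) {
		if (n->start < (*link)->start)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	*link = n;
	return root;
}

/* Find the object containing addr, or NULL if no object covers it. */
struct obj_node *obj_lookup(struct obj_node *root, unsigned long addr)
{
	while (root) {
		if (addr < root->start)
			root = root->left;
		else if (addr >= root->start + root->size)
			root = root->right;
		else
			return root;
	}
	return NULL;
}
```

Because the ranges never overlap, each comparison discards one whole subtree, so an address anywhere inside an object is found by the same descent that finds its start address. A balanced tree such as an rbtree keeps that descent logarithmic.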

-- 
Catalin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
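
For readers following the kmemleak thread above: because the tracked objects
never overlap, a tree keyed on the start address is enough to answer "which
object contains this address?". The sketch below is only an illustration of
that lookup. It uses a plain unbalanced binary search tree rather than the
kernel's rbtree, and struct range_node, range_insert() and range_lookup()
are hypothetical names, not kmemleak's.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tracked object covering [start, start + size). */
struct range_node {
	unsigned long start;
	size_t size;
	struct range_node *left, *right;
};

/*
 * Insert a node keyed by its start address and return the (possibly new)
 * root. Assumes the new range does not overlap any range already present.
 * No rebalancing is done here; a red-black tree would add that.
 */
static struct range_node *range_insert(struct range_node *root,
				       struct range_node *node)
{
	struct range_node **link = &root;

	while (*link) {
		if (node->start < (*link)->start)
			link = &(*link)->left;
		else
			link = &(*link)->right;
	}
	node->left = node->right = NULL;
	*link = node;
	return root;
}

/* Return the node whose range contains addr, or NULL if there is none. */
static struct range_node *range_lookup(struct range_node *root,
				       unsigned long addr)
{
	while (root) {
		if (addr < root->start)
			root = root->left;
		else if (addr - root->start >= root->size)
			root = root->right;
		else
			return root;
	}
	return NULL;
}
```

Since ranges are disjoint, each comparison rules out one whole subtree, which
is why no interval-aware structure such as a prio tree is needed for this use.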


RE: [PATCH 1/6] mfd: add lp8788 mfd driver

2012-08-09 Thread Kim, Milo
> > +static irqreturn_t lp8788_irq_handler(int irq, void *ptr)
> > +{
> > +   struct lp8788_irq_data *irqd = ptr;
> > +   unsigned long delay = msecs_to_jiffies(DEBOUNCE_MSEC);
> > +
> > +   queue_delayed_work(irqd->thread, &irqd->work, delay);
> > +
> > +   return IRQ_HANDLED;
> > +}
> 
> Why a delayed work?  That's *very* unusual.
> 
> > +   if (!lp->pdata) {
> > +   dev_warn(lp->dev, "no platform data for irq\n");
> > +   goto no_err;
> > +   }
> 
> Given that you're using irq domains why does the device need platform
> data?
> 
> > +   if (irq_base) {
> > +   irq_base = irq_alloc_descs(irq_base, 0, LP8788_INT_MAX, 0);
> > +   if (irq_base < 0) {
> > +   dev_warn(lp->dev, "no allocated irq: %d\n", irq_base);
> > +   goto no_err;
> > +   }
> > +   }
> 
> This shouldn't be needed with irq domains.

In patch v2, a generic irq chip is used for interrupt handling.
At the moment there is no need to support irq domains in the lp8788 driver.
Title: [PATCH v2 1/3] mfd: add lp8788 mfd driver

> > +EXPORT_SYMBOL(lp8788_read_byte);
> 
> You're reexporting regmap functionality with looser licensing
> requirements...
> 

All EXPORT_SYMBOLs were replaced with *_GPL() in the same patch v2.
 
> > +unsigned int lp8788_get_adc(struct lp8788 *lp, enum lp8788_adc_id id,
> > +   enum lp8788_adc_resolution res)
> 
> For new drivers the ADC should probably be integrated into IIO.

A new IIO driver patch has been submitted.
Title : [PATCH 2/3] iio: adc: add new lp8788 adc driver

Thanks a lot for your detailed review.

Best Regards,
Milo





Re: [RFC PATCH 4/4] ftrace/x86: Add support for -mfentry to x86_64

2012-08-09 Thread Masami Hiramatsu
(2012/08/08 4:38), Steven Rostedt wrote:
> From: Steven Rostedt 
> 
> If the kernel is compiled with gcc 4.6.0 which supports -mfentry,
> then use that instead of mcount.
> 
> With mcount, frame pointers are forced with the -pg option and we
> get something like:
> 
> :
>    55                      push   %rbp
>    48 89 e5                mov    %rsp,%rbp
>    53                      push   %rbx
>    41 51                   push   %r9
>    e8 fe 6a 39 00          callq  81483d00
>    31 c0                   xor    %eax,%eax
>    48 89 fb                mov    %rdi,%rbx
>    48 89 d7                mov    %rdx,%rdi
>    48 33 73 30             xor    0x30(%rbx),%rsi
>    48 f7 c6 ff ff ff f7    test   $0xf7ff,%rsi
> 
> With -mfentry, frame pointers are no longer forced and the call looks
> like this:
> 
> :
>    e8 33 af 37 00          callq  81461b40 <__fentry__>
>    53                      push   %rbx
>    48 89 fb                mov    %rdi,%rbx
>    31 c0                   xor    %eax,%eax
>    48 89 d7                mov    %rdx,%rdi
>    41 51                   push   %r9
>    48 33 73 30             xor    0x30(%rbx),%rsi
>    48 f7 c6 ff ff ff f7    test   $0xf7ff,%rsi
> 
> This adds the ftrace hook at the beginning of the function before a
> frame is set up, and allows the function callbacks to be able to access
> parameters. As kprobes now can use function tracing (at least on x86)
> this speeds up the kprobe hooks that are at the beginning of the
> function.

This looks good to me :)

Reviewed-by: Masami Hiramatsu 

Thanks!

> 
> Cc: Masami Hiramatsu 
> Cc: Andi Kleen 
> Signed-off-by: Steven Rostedt 
> ---
>  arch/x86/Kconfig |1 +
>  arch/x86/include/asm/ftrace.h|7 ++-
>  arch/x86/kernel/entry_64.S   |   18 +-
>  arch/x86/kernel/x8664_ksyms_64.c |6 +-
>  4 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index c70684f..bbbf5d8 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -36,6 +36,7 @@ config X86
>   select HAVE_KRETPROBES
>   select HAVE_OPTPROBES
>   select HAVE_FTRACE_MCOUNT_RECORD
> + select HAVE_FENTRY if X86_64
>   select HAVE_C_RECORDMCOUNT
>   select HAVE_DYNAMIC_FTRACE
>   select HAVE_FUNCTION_TRACER
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index a6cae0c..9a25b52 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -35,7 +35,11 @@
>  #endif
>  
>  #ifdef CONFIG_FUNCTION_TRACER
> -#define MCOUNT_ADDR  ((long)(mcount))
> +#ifdef CC_USING_FENTRY
> +# define MCOUNT_ADDR ((long)(__fentry__))
> +#else
> +# define MCOUNT_ADDR ((long)(mcount))
> +#endif
>  #define MCOUNT_INSN_SIZE 5 /* sizeof mcount call */
>  
>  #ifdef CONFIG_DYNAMIC_FTRACE
> @@ -46,6 +50,7 @@
>  #ifndef __ASSEMBLY__
>  extern void mcount(void);
>  extern atomic_t modifying_ftrace_code;
> +extern void __fentry__(void);
>  
>  static inline unsigned long ftrace_call_adjust(unsigned long addr)
>  {
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index 38308fa..2add3bb 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -69,9 +69,16 @@
>  
>  #ifdef CONFIG_FUNCTION_TRACER
>  #ifdef CONFIG_DYNAMIC_FTRACE
> +
> +#ifdef CC_USING_FENTRY
> +ENTRY(__fentry__)
> + retq
> +END(__fentry__)
> +#else
>  ENTRY(mcount)
>   retq
>  END(mcount)
> +#endif
>  
>  /* skip is set if stack has been adjusted */
>  .macro ftrace_caller_setup skip=0
> @@ -84,7 +91,11 @@ END(mcount)
>   movq RIP(%rsp), %rdi
>   subq $MCOUNT_INSN_SIZE, %rdi
>   /* Load the parent_ip into the second parameter */
> +#ifdef CC_USING_FENTRY
> + movq SS+16(%rsp), %rsi
> +#else
>   movq 8(%rbp), %rsi
> +#endif
>  .endm
>  
>  ENTRY(ftrace_caller)
> @@ -215,9 +226,14 @@ END(mcount)
>  ENTRY(ftrace_graph_caller)
>   MCOUNT_SAVE_FRAME
>  
> +#ifdef CC_USING_FENTRY
> + leaq SS+16(%rsp), %rdi
> + movq $0, %rdx   /* No framepointers needed */
> +#else
>   leaq 8(%rbp), %rdi
> - movq RIP(%rsp), %rsi
>   movq (%rbp), %rdx
> +#endif
> + movq RIP(%rsp), %rsi
>   subq $MCOUNT_INSN_SIZE, %rsi
>  
>   call prepare_ftrace_return
> diff --git a/arch/x86/kernel/x8664_ksyms_64.c 
> b/arch/x86/kernel/x8664_ksyms_64.c
> index 9796c2f..643b236 100644
> --- a/arch/x86/kernel/x8664_ksyms_64.c
> +++ b/arch/x86/kernel/x8664_ksyms_64.c
> @@ -13,9 +13,13 @@
>  #include 
>  
>  #ifdef CONFIG_FUNCTION_TRACER
> -/* mcount is defined in assembly */
> +/* mcount and __fentry__ are defined in assembly */
> +#ifdef CC_USING_FENTRY
> +EXPORT_SYMBOL(__fentry__);
> +#else
>  EXPORT_SYMBOL(mcount);
>  #endif
> +#endif
>  
>  EXPORT_SYMBOL(__get_user_1);
>  EXPORT_SYMBOL(__get_user_2);
> 


-- 
Masami HIRAMATSU
Software Platf

Re: [PATCH 3/5] mm: compaction: Capture a suitable high-order page immediately when it is made available

2012-08-09 Thread Minchan Kim
On Thu, Aug 09, 2012 at 09:11:20AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 10:33:58AM +0900, Minchan Kim wrote:
> > Hi Mel,
> > 
> > Just one question below.
> > 
> 
> Sure! Your questions usually get me thinking about the right part of the
> series, this series in particular :)
> 
> > > 
> > > @@ -708,6 +750,10 @@ static int compact_zone(struct zone *zone, struct 
> > > compact_control *cc)
> > >   goto out;
> > >   }
> > >   }
> > > +
> > > + /* Capture a page now if it is a suitable size */
> > 
> > Why do we capture only when we migrate MIGRATE_MOVABLE type?
> > If you have a reason, it should have been added as a comment.
> > 
> 
> Good question and there is an answer. However, I also spotted a problem when
> thinking about this more where !MIGRATE_MOVABLE allocations are forced to
> do a full compaction. The simple solution would be to only set cc->page for
> MIGRATE_MOVABLE but there is a better approach that I've implemented in the
> patch below. It includes a comment that should answer your question. Does
> this make sense to you?

It does make sense.
I will add my Reviewed-by in your next spin which includes below patch.

Thanks, Mel.

> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 63af8d2..384164e 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -53,13 +53,31 @@ static inline bool migrate_async_suitable(int migratetype)
>  static void compact_capture_page(struct compact_control *cc)
>  {
>   unsigned long flags;
> - int mtype;
> + int mtype, mtype_low, mtype_high;
>  
>   if (!cc->page || *cc->page)
>   return;
>  
> + /*
> +  * For MIGRATE_MOVABLE allocations we capture a suitable page ASAP
> +  * regardless of the migratetype of the freelist is is captured from.
 ^  ^
 typo?
-- 
Kind regards,
Minchan Kim


Re: [RFC PATCH v2 10/16] ACPIHP: system device hotplug driver skeleton

2012-08-09 Thread Tang Chen
Hi Liu ~

On 08/09/2012 03:40 PM, Jiang Liu wrote:
> Hi Tang,
>   Thanks for testing.
>   Currently there's a limitation that you need to insert acpihp_enum 
> driver first.

Sorry, I didn't make it clear. I did load acpihp_enum module first, and then 
load acpihp_drv.

And I just tried it some more times. It just hung up, but dmesg had no output.
Like this:

# modprobe acpihp_enum
(OK, and sysfs interfaces have been created)
# modprobe acpihp_drv
(hang up)

# dmesg
(nothing)

The "modprobe acpihp_drv" process's call trace shows that it hung at the 
following function:
#0  0x0032836aab80 in __nanosleep_nocancel () from /lib64/libc.so.6
#1  0x0032836deb64 in usleep () from /lib64/libc.so.6
..

I have tried several times and cannot reproduce the situation I described.
Maybe my box is set up differently from yours; I'll try to find out why.

Thanks for your advice. :)

> Will fix this issue in next version.
>   Regards!
>   Gerry
> 
> On 2012-8-9 15:12, Tang Chen wrote:
>> Hi Liu~
>>
>> I compiled this driver as a module, acpihp_drv. And when I loaded this 
>> module, it
>> gave the following error message:
>>
>> # modprobe acpihp_drv
>> (the command hangs up, no return after 10 min)
>>
>> #dmesg
>> ..
>> [  126.643350] BUG: unable to handle kernel NULL pointer dereference at 
>> 0078
>> [  126.644007] IP: [] mutex_lock+0x19/0x37
>> [  126.644007] PGD 105277a067 PUD 104f823067 PMD 0 
>> [  126.644007] Oops: 0002 [#1] SMP 
>> [  126.644007] Modules linked in: acpihp_drv(+) ebtable_nat ebtables 
>> ipt_MASQUERADE iptable_nat nf_nat iptable_mangle bridge stp llc sunrpc 
>> cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT nf_conntrack_ipv6 
>> nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 vhost_net macvtap macvlan tun 
>> uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm crc32c_intel 
>> microcode lpc_ich mfd_core pcspkr i2c_i801 i2c_core ioatdma e1000e 
>> acpi_memhotplug i7core_edac edac_core igb dca mptsas mptscsih mptbase 
>> scsi_transport_sas
>> [  126.644007] CPU 10 
>> [  126.644007] Pid: 2821, comm: modprobe Tainted: G   A 3.6.0-rc1+ 
>> #6 FUJITSU-SV PRIMEQUEST 1800E/SB
>> [  126.644007] RIP: 0010:[]  [] 
>> mutex_lock+0x19/0x37
>> [  126.644007] RSP: 0018:8810589a9de8  EFLAGS: 00010246
>> [  126.644007] RAX:  RBX: 0078 RCX: 
>> 
>> [  126.644007] RDX:  RSI:  RDI: 
>> 0078
>> [  126.644007] RBP: 8810589a9e08 R08:  R09: 
>> 8810589a9d88
>> [  126.644007] R10: 13e5 R11: 13e5 R12: 
>> a01460d0
>> [  126.644007] R13:  R14: a014732b R15: 
>> 00bf
>> [  126.644007] FS:  7fecb1802700() GS:88105e64() 
>> knlGS:
>> [  126.644007] CS:  0010 DS:  ES:  CR0: 80050033
>> [  126.644007] CR2: 0078 CR3: 001052772000 CR4: 
>> 07e0
>> [  126.644007] DR0:  DR1:  DR2: 
>> 
>> [  126.644007] DR3:  DR6: 0ff0 DR7: 
>> 0400
>> [  126.644007] Process modprobe (pid: 2821, threadinfo 8810589a8000, 
>> task 8810592f8000)
>> [  126.644007] Stack:
>> [  126.644007]  8810589a9e08 810be37f a0146220 
>> 81a7b390
>> [  126.644007]  8810589a9e58 81317eb3 8810589a9e48 
>> 
>> [  126.644007]  81a342c0 81a342e0  
>> a0146220
>> [  126.644007] Call Trace:
>> [  126.644007]  [] ? tracepoint_module_notify+0xd9/0x14a
>> [  126.644007]  [] class_interface_register+0x4a/0xbc
>> [  126.644007]  [] ? 0xa00b7fff
>> [  126.644007]  [] acpihp_drv_init+0x10/0x12 [acpihp_drv]
>> [  126.644007]  [] do_one_initcall+0x7f/0x139
>> [  126.644007]  [] sys_init_module+0x12d3/0x14e3
>> [  126.644007]  [] ? ddebug_dyndbg_boot_param_cb+0x45/0x45
>> [  126.644007]  [] system_call_fastpath+0x16/0x1b
>> [  126.644007] Code: 48 8b 04 25 80 c6 00 00 48 89 43 18 31 c0 5b 5b c9 c3 
>> 55 48 89 e5 53 48 83 ec 18 66 66 66 66 90 48 89 fb e8 5a 0c 00 00 48 89 df 
>>  ff 0f 79 05 e8 06 ff ff ff 65 48 8b 04 25 80 c6 00 00 48 89 
>> [  126.644007] RIP  [] mutex_lock+0x19/0x37
>> [  126.644007]  RSP 
>> [  126.644007] CR2: 0078
>> [  129.981335] ---[ end trace da17e9c9de8dd560 ]---
>> [  139.085895] nr_pdflush_threads exported in /proc is scheduled for removal
>> [  139.167394] sysctl: The scan_unevictable_pages sysctl/node-interface has 
>> been disabled for lack of a legitimate use case.  If you have one, please 
>> send an email to linux...@kvack.org.
>>
>> Looks like it dereferenced a NULL pointer here.
>> May be it was my mistake that I didn't configure the environment correctly.
>> Would you please give me some advice ?
>>
>> Thanks. :)
>>
>>

-- 
Best Regards,
Tang chen

Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Minchan Kim
On Thu, Aug 09, 2012 at 09:23:28AM +0100, Mel Gorman wrote:
> On Thu, Aug 09, 2012 at 09:12:12AM +0900, Minchan Kim wrote:
> > > 
> > > 
> > > Second, it updates compact_cached_free_pfn in a more limited set of
> > > circumstances.
> > > 
> > > If a scanner has wrapped, it updates compact_cached_free_pfn to the end
> > >   of the zone. When a wrapped scanner isolates a page, it updates
> > >   compact_cached_free_pfn to point to the highest pageblock it
> > >   can isolate pages from.
> > 
> > Okay until here.
> > 
> 
> Great.
> 
> > > 
> > > If a scanner has not wrapped when it has finished isolated pages it
> > >   checks if compact_cached_free_pfn is pointing to the end of the
> > >   zone. If so, the value is updated to point to the highest
> > >   pageblock that pages were isolated from. This value will not
> > >   be updated again until a free page scanner wraps and resets
> > >   compact_cached_free_pfn.
> > 
> > I tried to understand your intention of this part but unfortunately failed.
> > By this part, the problem you mentioned could happen again?
> > 
> 
> Potentially yes, I did say it still races in the changelog.
> 
> > C
> >  Process A  M S F
> > |---|
> >  Process B  M   FS
> >  
> >  C is zone->compact_cached_free_pfn
> >  S is cc->start_pfree_pfn
> >  M is cc->migrate_pfn
> >  F is cc->free_pfn
> > 
> > In this diagram, Process A has just reached its migrate scanner, wrapped
> > around and updated compact_cached_free_pfn to end of the zone accordingly.
> > 
> 
> Yes. Now that it has wrapped it updates the compact_cached_free_pfn
> every loop of isolate_freepages here.
> 
> if (isolated) {
> high_pfn = max(high_pfn, pfn);
> 
> /*
>  * If the free scanner has wrapped, update
>  * compact_cached_free_pfn to point to the highest
>  * pageblock with free pages. This reduces excessive
>  * scanning of full pageblocks near the end of the
>  * zone
>  */
> if (cc->order > 0 && cc->wrapped)
> zone->compact_cached_free_pfn = high_pfn;
> }
> 
> 
> 
> > Simultaneously, Process B finishes isolating in a block and peek 
> > compact_cached_free_pfn position and know it's end of the zone so
> > update compact_cached_free_pfn to highest pageblock that pages were
> > isolated from.
> > 
> 
> Yes, they race at this point. One of two things happen here and I agree
> that this is racy
> 
> 1. Process A does another iteration of its loop and sets it back
> 2. Process A does not do another iteration of the loop, the cached_pfn
>is further along that it should. The next compacting process will
>wrap early and reset cached_pfn again but continue to scan the zone.
> 
> Either option is relatively harmless because in both cases the zone gets
> scanned. In patch 4 it was possible that large portions of the zone were
> frequently missed.
> 
> > Process A updates compact_cached_free_pfn to the highest pageblock which
> > was set by process B because process A has wrapped. It ends up big jump
> > without any scanning in process A.
> > 
> 
> It recovers quickly and is nowhere near as severe as what patch 4
> suffers from.

Agreed.
Thanks, Mel.

-- 
Kind regards,
Minchan Kim


Re: [PATCH 5/5] mm: have order > 0 compaction start near a pageblock with free pages

2012-08-09 Thread Minchan Kim
On Wed, Aug 08, 2012 at 08:08:44PM +0100, Mel Gorman wrote:
> commit [7db8889a: mm: have order > 0 compaction start off where it left]
> introduced a caching mechanism to reduce the amount work the free page
> scanner does in compaction. However, it has a problem. Consider two process
> simultaneously scanning free pages
> 
>   C
> Process A M S F
>   |---|
> Process B M   FS
> 
> C is zone->compact_cached_free_pfn
> S is cc->start_pfree_pfn
> M is cc->migrate_pfn
> F is cc->free_pfn
> 
> In this diagram, Process A has just reached its migrate scanner, wrapped
> around and updated compact_cached_free_pfn accordingly.
> 
> Simultaneously, Process B finishes isolating in a block and updates
> compact_cached_free_pfn again to the location of its free scanner.
> 
> Process A moves to "end_of_zone - one_pageblock" and runs this check
> 
> if (cc->order > 0 && (!cc->wrapped ||
>   zone->compact_cached_free_pfn >
>   cc->start_free_pfn))
> pfn = min(pfn, zone->compact_cached_free_pfn);
> 
> compact_cached_free_pfn is above where it started so the free scanner skips
> almost the entire space it should have scanned. When there are multiple
> processes compacting it can end in a situation where the entire zone is
> not being scanned at all.  Further, it is possible for two processes to
> ping-pong update to compact_cached_free_pfn which is just random.
> 
> Overall, the end result wrecks allocation success rates.
> 
> There is not an obvious way around this problem without introducing new
> locking and state so this patch takes a different approach.
> 
> First, it gets rid of the skip logic because it's not clear that it matters
> if two free scanners happen to be in the same block but with racing updates
> it's too easy for it to skip over blocks it should not.
> 
> Second, it updates compact_cached_free_pfn in a more limited set of
> circumstances.
> 
> If a scanner has wrapped, it updates compact_cached_free_pfn to the end
>   of the zone. When a wrapped scanner isolates a page, it updates
>   compact_cached_free_pfn to point to the highest pageblock it
>   can isolate pages from.
> 
> If a scanner has not wrapped when it has finished isolated pages it
>   checks if compact_cached_free_pfn is pointing to the end of the
>   zone. If so, the value is updated to point to the highest
>   pageblock that pages were isolated from. This value will not
>   be updated again until a free page scanner wraps and resets
>   compact_cached_free_pfn.
> 
> This is not optimal and it can still race but the compact_cached_free_pfn
> will be pointing to or very near a pageblock with free pages.
> 
> Signed-off-by: Mel Gorman 
> Reviewed-by: Rik van Riel 
Reviewed-by: Minchan Kim 

-- 
Kind regards,
Minchan Kim
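
For anyone reading along, the two update rules for compact_cached_free_pfn
described in the changelog can be modelled in a few lines of userspace C.
This is only a sketch of the rules as worded in this thread, not the code in
mm/compaction.c; struct zone_model, struct scan_model and the three helpers
are hypothetical names, and "end of the zone" is simplified to a single
end_pfn value.

```c
#include <stdbool.h>

/* Hypothetical model of the shared per-zone state. */
struct zone_model {
	unsigned long end_pfn;		/* stands in for "end of the zone" */
	unsigned long cached_free_pfn;	/* models compact_cached_free_pfn */
};

/* Hypothetical model of one compacting process. */
struct scan_model {
	bool wrapped;
	int order;
};

/* Rule 1a: a scanner that wraps resets the cache to the end of the zone. */
static void scanner_wrapped(struct zone_model *z, struct scan_model *cc)
{
	cc->wrapped = true;
	z->cached_free_pfn = z->end_pfn;
}

/* Rule 1b: a wrapped scanner updates the cache each time it isolates pages. */
static void note_isolated(struct zone_model *z, const struct scan_model *cc,
			  unsigned long high_pfn)
{
	if (cc->order > 0 && cc->wrapped)
		z->cached_free_pfn = high_pfn;
}

/*
 * Rule 2: a scanner that has not wrapped only updates the cache once it has
 * finished isolating, and only if the cache still points at the end of the
 * zone.
 */
static void finish_isolation(struct zone_model *z, const struct scan_model *cc,
			     unsigned long high_pfn)
{
	if (cc->order > 0 && !cc->wrapped &&
	    z->cached_free_pfn == z->end_pfn)
		z->cached_free_pfn = high_pfn;
}
```

In this model the race discussed earlier in the thread is still visible: a
non-wrapped scanner can move the cache away from the end of the zone, and
the wrapped scanner overwrites it on its next iteration. Either way the
value ends up at or near a pageblock that had free pages.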


[PATCH v7 0/8] Raid: enable talitos xor offload for improving performance

2012-08-09 Thread qiang.liu
Hi all,

The following 8 patches enable fsl-dma and talitos to offload RAID
operations, improving RAID performance and balancing CPU load.

These patches cover the talitos, fsl-dma and carma modules (carma uses
some features of fsl-dma).

Write performance improves by 25-30% as measured with iozone.
Replacing spin_lock_irqsave with spin_lock_bh improves write performance
by about 2%.
CPU load is reduced by 8%.

"fwiw, I gave v5 a test-drive, setting up a RAID5 array on ramdisks
[1], and this patchseries, along with FSL_DMA && NET_DMA set seems
to be holding water, so this series gets my:"

Tested-by: Kim Phillips 

[1] mdadm --create --verbose --force /dev/md0 --level=raid5 --raid-devices=4 \
/dev/ram[0123]

Changes in v7:
- add test result which is provided by Kim Phillips;
- correct one coding style issue in patch 5/8;
- add comments by Arnd Bergmann in patch 6/8;

Changes in v6:
- swap the order of original patch 3/6 and 4/6;
- merge Ira's patch to reduce the size of original patch;
- merge Ira's patch of carma in 8/8;
- update documents and descriptions according to Ira's advice;

Changes in v5:
- add detail description in patch 3/6 about the process of completed
descriptor, the process is in align with fsl-dma Reference Manual,
illustrate the potential risk and how to reproduce it;
- drop the patch 7/7 in v4 according to Timur's comments;

Changes in v4:
- fix an error in talitos when dest addr is same with src addr, dest
should be freed only one time if src is same with dest addr;
- correct coding style in fsl-dma according to Ira's comments;
- fix a race condition in fsl-dma fsl_tx_status(), remove the interface
which is used to free descriptors in queue ld_completed, this interface
has been included in fsldma_cleanup_descriptor(), in v3, there is one
place missed spin_lock protect;
- split the original patch 3/4 up to 2 patches 3/7 and 4/7 according to
Li Yang's comments;
- fix a warning of unitialized cookie;
- add memory copy self test in fsl-dma;
- add more detail description about use spin_lock_bh() to instead of
spin_lock_irqsave() according to Timur's comments.

Changes in v3:
- change release process of fsl-dma descriptor for resolve the
potential race condition;
- add test result when use spin_lock_bh replace spin_lock_irqsave;
- modify the benchmark results according to the latest patch.

Changes in v2:
- rebase onto cryptodev tree;
- split the patch 3/4 up to 3 independent patches;
- remove the patch 4/4, the fix is not for cryptodev tree;

Qiang Liu (8):
  Talitos: Support for async_tx XOR offload
  fsl-dma: remove attribute DMA_INTERRUPT of dmaengine
  fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication
  fsl-dma: move functions to avoid forward declarations
  fsl-dma: change release process of dma descriptor for supporting async_tx
  fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave
  fsl-dma: fix a warning of unitialized cookie
  carma: remove unnecessary DMA_INTERRUPT capability

 drivers/crypto/Kconfig  |9 +
 drivers/crypto/talitos.c|  413 ++
 drivers/crypto/talitos.h|   53 
 drivers/dma/fsldma.c|  488 +--
 drivers/dma/fsldma.h|   17 +-
 drivers/misc/carma/carma-fpga-program.c |1 -
 drivers/misc/carma/carma-fpga.c |2 +-
 7 files changed, 761 insertions(+), 222 deletions(-)
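
As background for the series, the operation being offloaded is the RAID5
parity calculation: the parity block is the byte-wise XOR of the data
blocks, and any single lost block is the XOR of the survivors and the
parity. The sketch below shows that in plain C on the CPU. It is an
illustration only; xor_blocks() is a hypothetical helper and is not the
async_tx or talitos interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * XOR src_cnt source buffers of len bytes each into dest.
 * dest is zeroed first, so it must not alias any of the sources.
 */
static void xor_blocks(unsigned char *dest,
		       const unsigned char *const *srcs,
		       size_t src_cnt, size_t len)
{
	size_t i, j;

	memset(dest, 0, len);
	for (i = 0; i < src_cnt; i++)
		for (j = 0; j < len; j++)
			dest[j] ^= srcs[i][j];
}
```

Calling xor_blocks() over the data blocks yields the parity; calling it over
the surviving data blocks plus the parity rebuilds the missing block. An
offload engine performs the same computation without tying up the CPU.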



[PATCH v7 1/8] Talitos: Support for async_tx XOR offload

2012-08-09 Thread qiang.liu
From: Qiang Liu 

Expose Talitos's XOR functionality to be used for RAID parity
calculation via the Async_tx layer.

Cc: Herbert Xu 
Cc: David S. Miller 
Signed-off-by: Dipen Dudhat 
Signed-off-by: Maneesh Gupta 
Signed-off-by: Kim Phillips 
Signed-off-by: Vishnu Suresh 
Signed-off-by: Qiang Liu 
---
 drivers/crypto/Kconfig   |9 +
 drivers/crypto/talitos.c |  413 ++
 drivers/crypto/talitos.h |   53 ++
 3 files changed, 475 insertions(+), 0 deletions(-)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index be6b2ba..f0a7c29 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -222,6 +222,15 @@ config CRYPTO_DEV_TALITOS
  To compile this driver as a module, choose M here: the module
  will be called talitos.

+config CRYPTO_DEV_TALITOS_RAIDXOR
+   bool "Talitos RAID5 XOR Calculation Offload"
+   default y
+   select DMA_ENGINE
+   depends on CRYPTO_DEV_TALITOS
+   help
+ Say 'Y' here to use the Freescale Security Engine (SEC) to
+ offload RAID XOR parity Calculation
+
 config CRYPTO_DEV_IXP4XX
tristate "Driver for IXP4xx crypto hardware acceleration"
depends on ARCH_IXP4XX
diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index efff788..b34264e 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -619,6 +619,399 @@ static void talitos_unregister_rng(struct device *dev)
hwrng_unregister(&priv->rng);
 }

+#ifdef CONFIG_CRYPTO_DEV_TALITOS_RAIDXOR
+static void talitos_release_xor(struct device *dev, struct talitos_desc 
*hwdesc,
+   void *context, int error);
+
+static enum dma_status talitos_is_tx_complete(struct dma_chan *chan,
+ dma_cookie_t cookie,
+ struct dma_tx_state *state)
+{
+   struct talitos_xor_chan *xor_chan;
+   dma_cookie_t last_used;
+   dma_cookie_t last_complete;
+
+   xor_chan = container_of(chan, struct talitos_xor_chan, common);
+
+   last_used = chan->cookie;
+   last_complete = xor_chan->completed_cookie;
+
+   if (state->last)
+   state->last = last_complete;
+
+   if (state->used)
+   state->used = last_used;
+
+   return dma_async_is_complete(cookie, last_complete, last_used);
+}
+
+static void talitos_process_pending(struct talitos_xor_chan *xor_chan)
+{
+   struct talitos_xor_desc *desc, *_desc;
+   unsigned long flags;
+   int status;
+   struct talitos_private *priv;
+   int ch;
+
+   priv = dev_get_drvdata(xor_chan->dev);
+   ch = atomic_inc_return(&priv->last_chan) &
+ (priv->num_channels - 1);
+   spin_lock_irqsave(&xor_chan->desc_lock, flags);
+
+   list_for_each_entry_safe(desc, _desc, &xor_chan->pending_q, node) {
+   status = talitos_submit(xor_chan->dev, ch, &desc->hwdesc,
+   talitos_release_xor, desc);
+   if (status != -EINPROGRESS)
+   break;
+
+   list_del(&desc->node);
+   list_add_tail(&desc->node, &xor_chan->in_progress_q);
+   }
+
+   spin_unlock_irqrestore(&xor_chan->desc_lock, flags);
+}
+
+static void talitos_xor_run_tx_complete_actions(struct talitos_xor_desc *desc,
+   struct talitos_xor_chan *xor_chan)
+{
+   struct device *dev = xor_chan->dev;
+   dma_addr_t dest, addr;
+   unsigned int src_cnt = desc->unmap_src_cnt;
+   unsigned int len = desc->unmap_len;
+   enum dma_ctrl_flags flags = desc->async_tx.flags;
+   struct dma_async_tx_descriptor *tx = &desc->async_tx;
+
+   /* unmap dma addresses */
+   dest = desc->hwdesc.ptr[6].ptr;
+   if (likely(!(flags & DMA_COMPL_SKIP_DEST_UNMAP)))
+   dma_unmap_page(dev, dest, len, DMA_BIDIRECTIONAL);
+
+   desc->idx = 6 - src_cnt;
+   if (likely(!(flags & DMA_COMPL_SKIP_SRC_UNMAP))) {
+   while(desc->idx < 6) {
+   addr = desc->hwdesc.ptr[desc->idx++].ptr;
+   if (addr == dest)
+   continue;
+   dma_unmap_page(dev, addr, len, DMA_TO_DEVICE);
+   }
+   }
+
+   /* run dependent operations */
+   dma_run_dependencies(tx);
+}
+
+static void talitos_release_xor(struct device *dev, struct talitos_desc 
*hwdesc,
+   void *context, int error)
+{
+   struct talitos_xor_desc *desc = context;
+   struct talitos_xor_chan *xor_chan;
+   dma_async_tx_callback callback;
+   void *callback_param;
+
+   if (unlikely(error))
+   dev_err(dev, "xor operation: talitos error %d\n", error);
+
+   xor_chan = container_of(desc->async_tx.chan, struct talitos_xor_chan,
+   common);
+   spin_lock_bh(&xor_chan->desc_lock);
+ 
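
A note on talitos_is_tx_complete() in the patch above: it hands the cookie,
the last completed cookie and the last used cookie to
dma_async_is_complete(). The sketch below is a userspace model of that kind
of window check, written from my reading of the helper in kernels of this
era; treat it as an assumption rather than the kernel source. cookie_t,
enum model_status and cookie_status() are hypothetical names.

```c
typedef int cookie_t;

enum model_status { MODEL_COMPLETE, MODEL_IN_PROGRESS };

/*
 * Cookies are handed out in increasing order and eventually wrap.
 * Everything in the window (last_complete, last_used] is still in flight;
 * anything outside that window is treated as already completed.
 */
static enum model_status cookie_status(cookie_t cookie,
				       cookie_t last_complete,
				       cookie_t last_used)
{
	if (last_complete <= last_used) {
		/* No wrap between the two markers. */
		if (cookie <= last_complete || cookie > last_used)
			return MODEL_COMPLETE;
	} else {
		/* last_used has wrapped past last_complete. */
		if (cookie <= last_complete && cookie > last_used)
			return MODEL_COMPLETE;
	}
	return MODEL_IN_PROGRESS;
}
```

The point of the two branches is that the in-flight window may straddle the
wrap point, so a simple "cookie <= last_complete" test would not be enough.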

[PATCH v7 2/8] fsl-dma: remove attribute DMA_INTERRUPT of dmaengine

2012-08-09 Thread qiang.liu
From: Qiang Liu 

Delete the DMA_INTERRUPT attribute because fsl-dma doesn't support this
function; an exception is thrown if talitos is used to offload XOR at the
same time.

Cc: Dan Williams 
Cc: Vinod Koul 
Cc: Li Yang 
Signed-off-by: Qiang Liu 
Acked-by: Ira W. Snyder 
---
 drivers/dma/fsldma.c |   31 ---
 1 files changed, 0 insertions(+), 31 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8f84761..4f2f212 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -543,35 +543,6 @@ static void fsl_dma_free_chan_resources(struct dma_chan 
*dchan)
 }

 static struct dma_async_tx_descriptor *
-fsl_dma_prep_interrupt(struct dma_chan *dchan, unsigned long flags)
-{
-   struct fsldma_chan *chan;
-   struct fsl_desc_sw *new;
-
-   if (!dchan)
-   return NULL;
-
-   chan = to_fsl_chan(dchan);
-
-   new = fsl_dma_alloc_descriptor(chan);
-   if (!new) {
-   chan_err(chan, "%s\n", msg_ld_oom);
-   return NULL;
-   }
-
-   new->async_tx.cookie = -EBUSY;
-   new->async_tx.flags = flags;
-
-   /* Insert the link descriptor to the LD ring */
-   list_add_tail(&new->node, &new->tx_list);
-
-   /* Set End-of-link to the last link descriptor of new list */
-   set_ld_eol(chan, new);
-
-   return &new->async_tx;
-}
-
-static struct dma_async_tx_descriptor *
 fsl_dma_prep_memcpy(struct dma_chan *dchan,
dma_addr_t dma_dst, dma_addr_t dma_src,
size_t len, unsigned long flags)
@@ -1352,12 +1323,10 @@ static int __devinit fsldma_of_probe(struct 
platform_device *op)
fdev->irq = irq_of_parse_and_map(op->dev.of_node, 0);

dma_cap_set(DMA_MEMCPY, fdev->common.cap_mask);
-   dma_cap_set(DMA_INTERRUPT, fdev->common.cap_mask);
dma_cap_set(DMA_SG, fdev->common.cap_mask);
dma_cap_set(DMA_SLAVE, fdev->common.cap_mask);
fdev->common.device_alloc_chan_resources = fsl_dma_alloc_chan_resources;
fdev->common.device_free_chan_resources = fsl_dma_free_chan_resources;
-   fdev->common.device_prep_dma_interrupt = fsl_dma_prep_interrupt;
fdev->common.device_prep_dma_memcpy = fsl_dma_prep_memcpy;
fdev->common.device_prep_dma_sg = fsl_dma_prep_sg;
fdev->common.device_tx_status = fsl_tx_status;
--
1.7.5.1




[PATCH v7 3/8] fsl-dma: add fsl_dma_free_descriptor() to reduce code duplication

2012-08-09 Thread qiang.liu
From: Qiang Liu 

There are several places where descriptors are freed using identical
code. Put this code into a function to reduce code duplication.

Cc: Dan Williams 
Cc: Dan Williams 
Cc: Vinod Koul 
Signed-off-by: Ira W. Snyder 
---
 drivers/dma/fsldma.c |   38 --
 1 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 4f2f212..d4720d3 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -427,6 +427,21 @@ static dma_cookie_t fsl_dma_tx_submit(struct 
dma_async_tx_descriptor *tx)
 }

 /**
+ * fsl_dma_free_descriptor - Free descriptor from channel's DMA pool.
+ * @chan : Freescale DMA channel
+ * @desc: descriptor to be freed
+ */
+static void fsl_dma_free_descriptor(struct fsldma_chan *chan,
+   struct fsl_desc_sw *desc)
+{
+   list_del(&desc->node);
+#ifdef FSL_DMA_LD_DEBUG
+   chan_dbg(chan, "LD %p free\n", desc);
+#endif
+   dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
+}
+
+/**
  * fsl_dma_alloc_descriptor - Allocate descriptor from channel's DMA pool.
  * @chan : Freescale DMA channel
  *
@@ -500,13 +515,8 @@ static void fsldma_free_desc_list(struct fsldma_chan *chan,
 {
struct fsl_desc_sw *desc, *_desc;

-   list_for_each_entry_safe(desc, _desc, list, node) {
-   list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-   chan_dbg(chan, "LD %p free\n", desc);
-#endif
-   dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-   }
+   list_for_each_entry_safe(desc, _desc, list, node)
+   fsl_dma_free_descriptor(chan, desc);
 }

 static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
@@ -514,13 +524,8 @@ static void fsldma_free_desc_list_reverse(struct 
fsldma_chan *chan,
 {
struct fsl_desc_sw *desc, *_desc;

-   list_for_each_entry_safe_reverse(desc, _desc, list, node) {
-   list_del(&desc->node);
-#ifdef FSL_DMA_LD_DEBUG
-   chan_dbg(chan, "LD %p free\n", desc);
-#endif
-   dma_pool_free(chan->desc_pool, desc, desc->async_tx.phys);
-   }
+   list_for_each_entry_safe_reverse(desc, _desc, list, node)
+   fsl_dma_free_descriptor(chan, desc);
 }

 /**
@@ -855,10 +860,7 @@ static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
}

-#ifdef FSL_DMA_LD_DEBUG
-   chan_dbg(chan, "LD %p free\n", desc);
-#endif
-   dma_pool_free(chan->desc_pool, desc, txd->phys);
+   fsl_dma_free_descriptor(chan, desc);
 }

 /**
--
1.7.5.1




[PATCH v7 4/8] fsl-dma: move functions to avoid forward declarations

2012-08-09 Thread qiang.liu
From: Qiang Liu 

These functions will be modified in the next patch in the series. Moving
them in a separate patch, apart from those changes, makes review easier.

Cc: Dan Williams 
Cc: Dan Williams 
Cc: Vinod Koul 
Cc: Li Yang 
Signed-off-by: Ira W. Snyder 
Signed-off-by: Qiang Liu 
---
 drivers/dma/fsldma.c |  230 +-
 1 files changed, 115 insertions(+), 115 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index d4720d3..36490a3 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -472,6 +472,121 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 }

 /**
+ * fsl_chan_xfer_ld_queue - transfer any pending transactions
+ * @chan : Freescale DMA channel
+ *
+ * HARDWARE STATE: idle
+ * LOCKING: must hold chan->desc_lock
+ */
+static void fsl_chan_xfer_ld_queue(struct fsldma_chan *chan)
+{
+   struct fsl_desc_sw *desc;
+
+   /*
+* If the list of pending descriptors is empty, then we
+* don't need to do any work at all
+*/
+   if (list_empty(&chan->ld_pending)) {
+   chan_dbg(chan, "no pending LDs\n");
+   return;
+   }
+
+   /*
+* The DMA controller is not idle, which means that the interrupt
+* handler will start any queued transactions when it runs after
+* this transaction finishes
+*/
+   if (!chan->idle) {
+   chan_dbg(chan, "DMA controller still busy\n");
+   return;
+   }
+
+   /*
+* If there are some link descriptors which have not been
+* transferred, we need to start the controller
+*/
+
+   /*
+* Move all elements from the queue of pending transactions
+* onto the list of running transactions
+*/
+   chan_dbg(chan, "idle, starting controller\n");
+   desc = list_first_entry(&chan->ld_pending, struct fsl_desc_sw, node);
+   list_splice_tail_init(&chan->ld_pending, &chan->ld_running);
+
+   /*
+* The 85xx DMA controller doesn't clear the channel start bit
+* automatically at the end of a transfer. Therefore we must clear
+* it in software before starting the transfer.
+*/
+   if ((chan->feature & FSL_DMA_IP_MASK) == FSL_DMA_IP_85XX) {
+   u32 mode;
+
+   mode = DMA_IN(chan, &chan->regs->mr, 32);
+   mode &= ~FSL_DMA_MR_CS;
+   DMA_OUT(chan, &chan->regs->mr, mode, 32);
+   }
+
+   /*
+* Program the descriptor's address into the DMA controller,
+* then start the DMA transaction
+*/
+   set_cdar(chan, desc->async_tx.phys);
+   get_cdar(chan);
+
+   dma_start(chan);
+   chan->idle = false;
+}
+
+/**
+ * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies, and then
+ * free the descriptor.
+ */
+static void fsldma_cleanup_descriptor(struct fsldma_chan *chan,
+ struct fsl_desc_sw *desc)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+   chan_dbg(chan, "LD %p callback\n", desc);
+#endif
+   txd->callback(txd->callback_param);
+   }
+
+   /* Run any dependencies */
+   dma_run_dependencies(txd);
+
+   /* Unmap the dst buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_DEST_UNMAP)) {
+   if (txd->flags & DMA_COMPL_DEST_UNMAP_SINGLE)
+   dma_unmap_single(dev, dst, len, DMA_FROM_DEVICE);
+   else
+   dma_unmap_page(dev, dst, len, DMA_FROM_DEVICE);
+   }
+
+   /* Unmap the src buffer, if requested */
+   if (!(txd->flags & DMA_COMPL_SKIP_SRC_UNMAP)) {
+   if (txd->flags & DMA_COMPL_SRC_UNMAP_SINGLE)
+   dma_unmap_single(dev, src, len, DMA_TO_DEVICE);
+   else
+   dma_unmap_page(dev, src, len, DMA_TO_DEVICE);
+   }
+
+   fsl_dma_free_descriptor(chan, desc);
+}
+
+/**
  * fsl_dma_alloc_chan_resources - Allocate resources for DMA channel.
  * @chan : Freescale DMA channel
  *
@@ -816,121 +931,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 }

 /**
- * fsldma_cleanup_descriptor - cleanup and free a single link descriptor
- * @chan: Freescale DMA channel
- * @desc: descriptor to cleanup and free
- *
- * This function is used on a descriptor which has been 

[PATCH v7 5/8] fsl-dma: change release process of dma descriptor for supporting async_tx

2012-08-09 Thread qiang.liu
From: Qiang Liu 

Fix a potential race when both NET_DMA and ASYNC_TX are enabled.
The current descriptor release process does not support async_tx: all
descriptors are released whether or not async_tx has acked them, so there
is a potential race condition when the DMA engine is used by other
clients (e.g. when NET_DMA is enabled to offload TCP).

In our case, the race is hit when both talitos and dmaengine are used to
offload xor: the NAPI scheduler syncs all pending requests in the DMA
channels, which interferes with the RAID operations because fsl-dma does
not check the ack flag. A descriptor that was just submitted and has not
been acked is freed; when it is then used as a dependent tx, the freed
descriptor triggers BUG_ON(async_tx_test_ack(depend_tx)) in
async_tx_submit().

TASK = ee1a94a0[1390] 'md0_raid5' THREAD: ecf4 CPU: 0
GPR00: 0001 ecf41ca0 ee1a94a0 003f 0001 c00593e4
0001
GPR08:  a7a7a7a7 0001 0002 42028042 100a38d4 ed576d98

GPR16: ed5a11b0  2b162000 0200 0000 2d555000 ed3015e8
c15a7aa0
GPR24:  c155fc40  ecb63220 ecf41d28 ef640bb0 ef640c30
ecf41ca0
NIP [c02b048c] async_tx_submit+0x6c/0x2b4
LR [c02b068c] async_tx_submit+0x26c/0x2b4
Call Trace:
[ecf41ca0] [c02b068c] async_tx_submit+0x26c/0x2b4 (unreliable)
[ecf41cd0] [c02b0a4c] async_memcpy+0x240/0x25c
[ecf41d20] [c0421064] async_copy_data+0xa0/0x17c
[ecf41d70] [c0421cf4] __raid_run_ops+0x874/0xe10
[ecf41df0] [c0426ee4] handle_stripe+0x820/0x25e8
[ecf41e90] [c0429080] raid5d+0x3d4/0x5b4
[ecf41f40] [c04329b8] md_thread+0x138/0x16c
[ecf41f90] [c008277c] kthread+0x8c/0x90
[ecf41ff0] [c0011630] kernel_thread+0x4c/0x68

This patch also changes how completed descriptors are identified. There
is a potential risk caused by exception interrupts: all descriptors in the
ld_running list are considered completed when an interrupt is raised. This
works fine under normal conditions, but not when an exception occurs. The
hardware state should not be inferred from the software list; the right way
is to read the current descriptor address register to find the last
completed descriptor. If an interrupt is raised by an error, the descriptors
in ld_running must not all be considered finished, or the unfinished ones
will be released wrongly.

A simple way to reproduce:
Enable dmatest first, then insert some bad descriptors which trigger
Programming Error interrupts before the good descriptors. The good
descriptors will then be freed before they are processed because of the
exception interrupt.

Note: the bad descriptors are only there to simulate an exception interrupt.
This case illustrates the potential risk in the current fsl-dma driver well.
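
The two rules described above (stop at the descriptor the hardware is
currently pointing at, and free only what has been acked) can be modelled
in user space. This is only a sketch under stated assumptions, not the
driver code: struct desc, mark_completed() and free_acked() are
hypothetical names, the descriptors sit in a plain array instead of the
driver's lists, cur_phys stands in for the value read from the current
descriptor address register, and the rule that the current descriptor
counts as done only when the controller is idle is an assumption of the
model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum desc_state { DESC_RUNNING, DESC_COMPLETED, DESC_FREED };

/* Hypothetical stand-in for a software descriptor. */
struct desc {
	uint32_t phys;          /* bus address the controller sees */
	bool acked;             /* client has acked the descriptor */
	enum desc_state state;
};

/*
 * Mark running descriptors completed, in submission order, up to the one
 * the controller currently points at. The descriptor at cur_phys itself
 * is counted only when the controller is idle (model assumption).
 * Returns the number of descriptors newly marked completed.
 */
static size_t mark_completed(struct desc *d, size_t n,
			     uint32_t cur_phys, bool idle)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		if (d[i].state != DESC_RUNNING)
			continue;
		if (d[i].phys == cur_phys && !idle)
			break;          /* still being processed */
		d[i].state = DESC_COMPLETED;
		done++;
		if (d[i].phys == cur_phys)
			break;          /* nothing after this one has run */
	}
	return done;
}

/* Free only descriptors that are both completed and acked. */
static size_t free_acked(struct desc *d, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (d[i].state == DESC_COMPLETED && d[i].acked) {
			d[i].state = DESC_FREED;
			freed++;
		}
	}
	return freed;
}
```

With three queued descriptors and the controller stopped on the second
one by an error, only the first is marked completed, and a completed but
un-acked descriptor survives free_acked() until its client acks it.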

Cc: Dan Williams 
Cc: Dan Williams 
Cc: Vinod Koul 
Cc: Li Yang 
Signed-off-by: Qiang Liu 
Signed-off-by: Ira W. Snyder 
Tested-by: Kim Phillips 
---
 drivers/dma/fsldma.c |  233 ++
 drivers/dma/fsldma.h |   17 +++-
 2 files changed, 175 insertions(+), 75 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 36490a3..b05a81f 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -472,6 +472,111 @@ static struct fsl_desc_sw *fsl_dma_alloc_descriptor(struct fsldma_chan *chan)
 }

 /**
+ * fsldma_clean_completed_descriptor - free all descriptors which
+ * has been completed and acked
+ * @chan: Freescale DMA channel
+ *
+ * This function is used on all completed and acked descriptors.
+ * All descriptors should only be freed in this function.
+ */
+static void
+fsldma_clean_completed_descriptor(struct fsldma_chan *chan)
+{
+   struct fsl_desc_sw *desc, *_desc;
+
+   /* Run the callback for each descriptor, in order */
+   list_for_each_entry_safe(desc, _desc, &chan->ld_completed, node)
+   if (async_tx_test_ack(&desc->async_tx))
+   fsl_dma_free_descriptor(chan, desc);
+}
+
+/**
+ * fsldma_run_tx_complete_actions - cleanup a single link descriptor
+ * @chan: Freescale DMA channel
+ * @desc: descriptor to cleanup and free
+ * @cookie: Freescale DMA transaction identifier
+ *
+ * This function is used on a descriptor which has been executed by the DMA
+ * controller. It will run any callbacks, submit any dependencies.
+ */
+static dma_cookie_t fsldma_run_tx_complete_actions(struct fsldma_chan *chan,
+   struct fsl_desc_sw *desc, dma_cookie_t cookie)
+{
+   struct dma_async_tx_descriptor *txd = &desc->async_tx;
+   struct device *dev = chan->common.device->dev;
+   dma_addr_t src = get_desc_src(chan, desc);
+   dma_addr_t dst = get_desc_dst(chan, desc);
+   u32 len = get_desc_cnt(chan, desc);
+
+   BUG_ON(txd->cookie < 0);
+
+   if (txd->cookie > 0) {
+   cookie = txd->cookie;
+
+   /* Run the link descriptor callback function */
+   if (txd->callback) {
+#ifdef FSL_DMA_LD_DEBUG
+   

[PATCH v7 6/8] fsl-dma: use spin_lock_bh to instead of spin_lock_irqsave

2012-08-09 Thread qiang.liu
From: Qiang Liu 

The use of spin_lock_irqsave() is a stronger locking mechanism than is
required throughout the driver. The minimum locking required should be
used instead. spin_lock_irqsave() turns interrupts off and saves their
state, neither of which is needed here.

Change all instances of spin_lock_irqsave() to spin_lock_bh().
All manipulation of protected fields is done using tasklet context or
weaker, which makes spin_lock_bh() the correct choice.

Cc: Dan Williams 
Cc: Dan Williams 
Cc: Vinod Koul 
Cc: Li Yang 
Cc: Timur Tabi 
Signed-off-by: Qiang Liu 
Acked-by: Ira W. Snyder 
Acked-by: Arnd Bergmann 
---
Comments by Arnd Bergmann in v6:
"You could actually change the use of spin_lock_bh inside of the tasklet
function (dma_do_tasklet) do just spin_lock(), because softirqs are
already disabled there, but your version is also ok."

 drivers/dma/fsldma.c |   30 --
 1 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index b05a81f..8b9c0f7 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -405,10 +405,9 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
struct fsldma_chan *chan = to_fsl_chan(tx->chan);
struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
struct fsl_desc_sw *child;
-   unsigned long flags;
dma_cookie_t cookie;

-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_bh(&chan->desc_lock);

/*
 * assign cookies to all of the software descriptors
@@ -421,7 +420,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
/* put this transaction onto the tail of the pending queue */
append_ld_queue(chan, desc);

-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_bh(&chan->desc_lock);

return cookie;
 }
@@ -762,15 +761,14 @@ static void fsldma_free_desc_list_reverse(struct fsldma_chan *chan,
 static void fsl_dma_free_chan_resources(struct dma_chan *dchan)
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
-   unsigned long flags;

chan_dbg(chan, "free all channel resources\n");
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_bh(&chan->desc_lock);
fsldma_cleanup_descriptors(chan);
fsldma_free_desc_list(chan, &chan->ld_pending);
fsldma_free_desc_list(chan, &chan->ld_running);
fsldma_free_desc_list(chan, &chan->ld_completed);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_bh(&chan->desc_lock);

dma_pool_destroy(chan->desc_pool);
chan->desc_pool = NULL;
@@ -989,7 +987,6 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 {
struct dma_slave_config *config;
struct fsldma_chan *chan;
-   unsigned long flags;
int size;

if (!dchan)
@@ -999,7 +996,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,

switch (cmd) {
case DMA_TERMINATE_ALL:
-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_bh(&chan->desc_lock);

/* Halt the DMA engine */
dma_halt(chan);
@@ -1010,7 +1007,7 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
fsldma_free_desc_list(chan, &chan->ld_completed);
chan->idle = true;

-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_bh(&chan->desc_lock);
return 0;

case DMA_SLAVE_CONFIG:
@@ -1052,11 +1049,10 @@ static int fsl_dma_device_control(struct dma_chan *dchan,
 static void fsl_dma_memcpy_issue_pending(struct dma_chan *dchan)
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
-   unsigned long flags;

-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_bh(&chan->desc_lock);
fsl_chan_xfer_ld_queue(chan);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_bh(&chan->desc_lock);
 }

 /**
@@ -1069,15 +1065,14 @@ static enum dma_status fsl_tx_status(struct dma_chan *dchan,
 {
struct fsldma_chan *chan = to_fsl_chan(dchan);
enum dma_status ret;
-   unsigned long flags;

ret = dma_cookie_status(dchan, cookie, txstate);
if (ret == DMA_SUCCESS)
return ret;

-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock_bh(&chan->desc_lock);
fsldma_cleanup_descriptors(chan);
-   spin_unlock_irqrestore(&chan->desc_lock, flags);
+   spin_unlock_bh(&chan->desc_lock);

return dma_cookie_status(dchan, cookie, txstate);
 }
@@ -1156,11 +1151,10 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data)
 static void dma_do_tasklet(unsigned long data)
 {
struct fsldma_chan *chan = (struct fsldma_chan *)data;
-   unsigned long flags;

chan_dbg(chan, "tasklet entry\n");

-   spin_lock_irqsave(&chan->desc_lock, flags);
+   spin_lock

[PATCH v7 7/8] fsl-dma: fix a warning of unitialized cookie

2012-08-09 Thread qiang.liu
From: Qiang Liu 

Fix an uninitialized-value warning when compiling with -Wuninitialized.

Cc: Dan Williams 
Cc: Dan Williams 
Cc: Vinod Koul 
Cc: Li Yang 
Signed-off-by: Qiang Liu 
Reported-by: Kim Phillips 
Acked-by: Ira W. Snyder 
---
 drivers/dma/fsldma.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c
index 8b9c0f7..361203d 100644
--- a/drivers/dma/fsldma.c
+++ b/drivers/dma/fsldma.c
@@ -405,7 +405,7 @@ static dma_cookie_t fsl_dma_tx_submit(struct dma_async_tx_descriptor *tx)
struct fsldma_chan *chan = to_fsl_chan(tx->chan);
struct fsl_desc_sw *desc = tx_to_fsl_desc(tx);
struct fsl_desc_sw *child;
-   dma_cookie_t cookie;
+   dma_cookie_t cookie = 0;

spin_lock_bh(&chan->desc_lock);

--
1.7.5.1




[PATCH v7 8/8] carma: remove unnecessary DMA_INTERRUPT capability

2012-08-09 Thread qiang.liu
From: Qiang Liu 

These drivers set the DMA_INTERRUPT capability bit when requesting a DMA
controller channel. This was historical, and is no longer needed.

Recent changes to the drivers/dma/fsldma.c driver have removed support
for this flag. This makes the carma drivers unable to find a DMA channel
with the required capabilities.

Cc: Arnd Bergmann 
Cc: Greg Kroah-Hartman 
Signed-off-by: Ira W. Snyder 
---
Based on git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
against branch char-misc-next.

 drivers/misc/carma/carma-fpga-program.c |1 -
 drivers/misc/carma/carma-fpga.c |2 +-
 2 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/misc/carma/carma-fpga-program.c b/drivers/misc/carma/carma-fpga-program.c
index a2d25e4..eaddfe9 100644
--- a/drivers/misc/carma/carma-fpga-program.c
+++ b/drivers/misc/carma/carma-fpga-program.c
@@ -978,7 +978,6 @@ static int fpga_of_probe(struct platform_device *op)
dev_set_drvdata(priv->dev, priv);
dma_cap_zero(mask);
dma_cap_set(DMA_MEMCPY, mask);
-   dma_cap_set(DMA_INTERRUPT, mask);
dma_cap_set(DMA_SLAVE, mask);
dma_cap_set(DMA_SG, mask);

diff --git a/drivers/misc/carma/carma-fpga.c b/drivers/misc/carma/carma-fpga.c
index 8c279da..0c43297 100644
--- a/drivers/misc/carma/carma-fpga.c
+++ b/drivers/misc/carma/carma-fpga.c
@@ -666,7 +666,7 @@ static int data_submit_dma(struct fpga_device *priv, struct data_buf *buf)
src = SYS_FPGA_BLOCK;
tx = chan->device->device_prep_dma_memcpy(chan, dst, src,
  REG_BLOCK_SIZE,
- DMA_PREP_INTERRUPT);
+ 0);
if (!tx) {
dev_err(priv->dev, "unable to prep SYS-FPGA DMA\n");
return -ENOMEM;
--
1.7.5.1




Re: Re: [RFC PATCH] block:Fix some problems about handling plug in blk_queue_bio().

2012-08-09 Thread Shaohua Li
2012/8/8 Jianpeng Ma :
> On 2012-08-08 11:06 Shaohua Li  Wrote:
>>2012/8/8 Jianpeng Ma :
>>> I think there are three problems about handling plug in blk_queue_bio():
>>> 1:if request_count >= BLK_MAX_REQUEST_COUNT, avoid unnecessary 
>>> plug->should_sort judge.
>>this makes sense, though not a big deal, nice to fix it.
> Thanks
>>
>>> 2:Only two device can trace plug.
>>I didn't get the point, can you have more details?
>
>>>if (plug) {
>>>  /*
>>>   * If this is the first request added after a plug, fire
>>>   * of a plug trace. If others have been added before, check
>>>   * if we have multiple devices in this plug. If so, make a
>>>   * note to sort the list before dispatch.
>>>   */
>>>  if (list_empty(&plug->list))
>>>  trace_block_plug(q);
>>>  else {
>>>  if (!plug->should_sort) {
>>>  struct request *__rq;
>
>>>  __rq = list_entry_rq(plug->list.prev);
>>>  if (__rq->q != q)
>>>  plug->should_sort = 1;
>>>  }
>>>  if (request_count >= BLK_MAX_REQUEST_COUNT) {
>>>  blk_flush_plug_list(plug, false);
>>>  trace_block_plug(q);
> The code only traces at two points:
> A: list_empty(&plug->list)
> B: request_count >= BLK_MAX_REQUEST_COUNT, which is the same as A because
> plug->list is empty after the flush.
> Suppose:
> 1: reqA for deviceA comes first; trace_block_plug is called because
> list_empty(&plug->list) is true.
> 2: reqB for deviceB comes next; attempt_plug_merge fails because the plug
> holds no request for deviceB's request queue, but trace_block_plug is not
> called.
>
> But when blk_flush_plug_list is called, it calls trace_block_unplug for
> every request queue.

OK, this is true. Please send a new patch for items 1 and 2 then.
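
For anyone following along, the tracing gap in item 2 can be shown with a
small user-space model of the quoted branch. This is only a sketch, not
the block layer code: struct plug_model, plug_add() and MAX_REQUESTS are
hypothetical stand-ins (MAX_REQUESTS plays the role of
BLK_MAX_REQUEST_COUNT, and its value here is arbitrary), and devices are
reduced to integer ids.

```c
#include <stdbool.h>

#define MAX_REQUESTS 16	/* hypothetical stand-in for BLK_MAX_REQUEST_COUNT */

struct plug_model {
	int count;		/* requests currently on the plug list */
	int last_dev;		/* device of the most recently added request */
	bool should_sort;	/* set once more than one device is queued */
};

/*
 * Add a request for device dev to the plug, following the quoted logic.
 * Returns true if a plug trace event would be emitted for this request.
 */
static bool plug_add(struct plug_model *p, int dev)
{
	bool traced = false;

	if (p->count == 0) {
		traced = true;			/* first request after a plug */
	} else {
		if (!p->should_sort && p->last_dev != dev)
			p->should_sort = true;
		if (p->count >= MAX_REQUESTS) {
			p->count = 0;		/* models the plug list flush */
			p->should_sort = false;
			traced = true;		/* list is empty again */
		}
	}
	p->count++;
	p->last_dev = dev;
	return traced;
}
```

Adding a request for device 0 and then one for device 1 yields a plug
event only for device 0, while a later flush would report an unplug for
both queues.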


Re: [Regression] "x86-64/efi: Use EFI to deal with platform wall clock" prevents my machine from booting

2012-08-09 Thread Matt Fleming
On Tue, 2012-08-07 at 11:50 +0100, Jan Beulich wrote:
> > 
> > I managed to find a machine to reproduce this on and it looks like the
> > ASUS firmware engineers are up to their old tricks of referencing
> > physical addresses after we've taken control of the memory map,
> 
> Yippie. On such systems we simply can't do any runtime calls.
> Should we add a command line option forcing efi_native to false,
> thus suppressing all runtime calls? Or would the "noefi" one be
> enough already?

I think a better solution for this, seeing as there appear to be *so*
many ASUS machines in the wild with this inability to do virtual EFI
calls, is to provide a 1:1 mapping as well as our regular virt->phys
mapping for the benefit of the firmware. We can load our special page
table in efi_call_*, etc.

One thing to note is that because of breakage seen on Apple machines
last time Matthew tried this approach, we (the kernel) can't actually
access the 1:1 mapping, it would exist purely for the benefit of
firmware that was broken enough to reference physical addresses after
SetVirtualAddressMap().



[PATCH] perf/x86: disable PEBS on a guest entry.

2012-08-09 Thread Gleb Natapov
If a PMU counter has PEBS enabled, it is not enough to disable the counter
on guest entry, since a PEBS memory write can overshoot guest entry
and corrupt guest memory. Disabling PEBS during guest entry solves
the problem.

Tested-by: David Ahern 
Signed-off-by: Gleb Natapov 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 7a8b9d0..7268fd8 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1444,8 +1444,16 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
+   /*
+* If PMU counter has PEBS enabled it is not enough to disable counter
+* on a guest entry since PEBS memory write can overshoot guest entry
+* and corrupt guest memory. Disabling PEBS solves the problem.
+*/
+   arr[1].msr = MSR_IA32_PEBS_ENABLE;
+   arr[1].host = cpuc->pebs_enabled;
+   arr[1].guest = 0;
 
-   *nr = 1;
+   *nr = 2;
return arr;
 }
 
--
Gleb.


Re: recv(...,MSG_TRUNC)

2012-08-09 Thread ibrahem . batta
On Saturday, June 29, 2002 10:40:18 PM UTC+3, Samuel Thibault wrote:
> (Previous mail seemed to be garbaged by mime attachment)
> 
> Hello,
> 
> man recv says, about flags :
> 
>MSG_TRUNC
>   Return  the real length of the packet, even when it
>   was longer than the passed buffer. Only  valid  for
>   packet sockets.
> 
> But it is neither implemented in ipv4/udp.c, nor in ipv6/udp.c, although
> it is in tcp.c, for instance !
> 
> By searching with google, I could read old manpages where it didn't exist,
> but I find it very useful, especially in conjunction with MSG_PEEK, for
> trying to read a packet with a little buffer, and then really get it with
> an appropriate buffer.
> 
> So here's a patch which cures 2.4.18, and also works on 2.5 kernels.
> 
> Best regards,
> 
> Samuel Thibault
> 
> diff -urN linux-2.4.18/net/ipv4/udp.c linux-2.4.18-cor/net/ipv4/udp.c
> --- linux-2.4.18/net/ipv4/udp.c   Mon Jun 10 23:34:59 2002
> +++ linux-2.4.18-cor/net/ipv4/udp.c   Mon Jun 10 23:35:31 2002
> @@ -680,7 +680,7 @@
>   }
>   if (sk->protinfo.af_inet.cmsg_flags)
>   ip_cmsg_recv(msg, skb);
> - err = copied;
> + err = (flags&MSG_TRUNC) ? skb->len - sizeof(struct udphdr) : copied;
> 
>  out_free:
>   skb_free_datagram(sk, skb);
> diff -urN linux-2.4.18/net/ipv6/udp.c linux-2.4.18-cor/net/ipv6/udp.c
> --- linux-2.4.18/net/ipv6/udp.c   Mon Jun 10 23:35:07 2002
> +++ linux-2.4.18-cor/net/ipv6/udp.c   Mon Jun 10 23:35:36 2002
> @@ -432,7 +432,7 @@
>   }
>   }
>   }
> - err = copied;
> + err = (flags&MSG_TRUNC) ? skb->len - sizeof(struct udphdr) : copied;
> 
>  out_free:
>   skb_free_datagram(sk, skb);
> 

Could you please show a full example of recvfrom() with the MSG_TRUNC flag?
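
A minimal sketch of the peek-then-read pattern described in the quoted
mail, assuming a Linux system where MSG_TRUNC on a datagram socket returns
the real datagram length: first peek with a one-byte buffer to learn the
size, then receive into a buffer of exactly that size. The helper names
next_datagram_len() and recv_whole_datagram() are made up for this
example.

```c
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Return the full length of the next datagram queued on fd without
 * consuming it, or -1 on error. MSG_PEEK leaves the datagram queued;
 * MSG_TRUNC asks for the real length even though the buffer is one byte.
 */
static ssize_t next_datagram_len(int fd)
{
	char probe;

	return recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
}

/*
 * Receive the next datagram into a freshly allocated buffer of the right
 * size. On success *out points to the data (caller frees it) and the
 * datagram length is returned; on error -1 is returned. src and srclen
 * may be NULL when the sender address is not needed.
 */
static ssize_t recv_whole_datagram(int fd, char **out,
				   struct sockaddr *src, socklen_t *srclen)
{
	ssize_t len = next_datagram_len(fd);
	ssize_t got;
	char *buf;

	if (len < 0)
		return -1;

	buf = malloc(len > 0 ? (size_t)len : 1);
	if (!buf)
		return -1;

	got = recvfrom(fd, buf, (size_t)len, 0, src, srclen);
	if (got < 0) {
		free(buf);
		return -1;
	}
	*out = buf;
	return got;
}
```

For UDP, fd would be a bound socket created with
socket(AF_INET, SOCK_DGRAM, 0). Both calls block until a datagram
arrives unless the socket is non-blocking.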


Re: [PATCH v2 06/10] ARM: davinci: don't use broken ntosd2_init_i2c

2012-08-09 Thread Arnd Bergmann
On second thought, there is probably no use in keeping the broken function
around, and we can just kill it off. If someone wants to add an i2c
device to the bus, they can always add the init code back as well.

8<-
Subject: ARM: davinci: don't use broken ntosd2_init_i2c

ntosd2_init_i2c walks the ntosd2_i2c_info array, which it expects to
be populated with at least one member. gcc correctly warns about
the out-of-bounds access here.

Since this can not possibly work, it's better to disable i2c
support entirely on this board.

Without this patch, building davinci_all_defconfig results in:

arch/arm/mach-davinci/board-neuros-osd2.c: In function 'davinci_ntosd2_init':
arch/arm/mach-davinci/board-neuros-osd2.c:187:20: warning: array subscript is above array bounds [-Warray-bounds]

Signed-off-by: Arnd Bergmann 
Cc: Kevin Hilman 
Cc: Sekhar Nori 
Cc: Andrey Porodko 

diff --git a/arch/arm/mach-davinci/board-neuros-osd2.c b/arch/arm/mach-davinci/board-neuros-osd2.c
index 5de69f2..f6b9fc7 100644
--- a/arch/arm/mach-davinci/board-neuros-osd2.c
+++ b/arch/arm/mach-davinci/board-neuros-osd2.c
@@ -162,38 +162,6 @@ static void __init davinci_ntosd2_map_io(void)
dm644x_init();
 }
 
-/*
- I2C initialization
-*/
-static struct davinci_i2c_platform_data ntosd2_i2c_pdata = {
-   .bus_freq   = 20 /* kHz */,
-   .bus_delay  = 100 /* usec */,
-};
-
-static struct i2c_board_info __initdata ntosd2_i2c_info[] =  {
-};
-
-static int ntosd2_init_i2c(void)
-{
-   int status;
-
-   davinci_init_i2c(&ntosd2_i2c_pdata);
-   status = gpio_request(NTOSD2_MSP430_IRQ, ntosd2_i2c_info[0].type);
-   if (status == 0) {
-   status = gpio_direction_input(NTOSD2_MSP430_IRQ);
-   if (status == 0) {
-   status = gpio_to_irq(NTOSD2_MSP430_IRQ);
-   if (status > 0) {
-   ntosd2_i2c_info[0].irq = status;
-   i2c_register_board_info(1,
-   ntosd2_i2c_info,
-   ARRAY_SIZE(ntosd2_i2c_info));
-   }
-   }
-   }
-   return status;
-}
-
 static struct davinci_mmc_config davinci_ntosd2_mmc_config = {
.wires  = 4,
.version= MMC_CTLR_VERSION_1
@@ -218,7 +186,6 @@ static __init void davinci_ntosd2_init(void)
 {
struct clk *aemif_clk;
struct davinci_soc_info *soc_info = &davinci_soc_info;
-   int status;
 
aemif_clk = clk_get(NULL, "aemif");
clk_enable(aemif_clk);
@@ -242,12 +209,6 @@ static __init void davinci_ntosd2_init(void)
platform_add_devices(davinci_ntosd2_devices,
ARRAY_SIZE(davinci_ntosd2_devices));
 
-   /* Initialize I2C interface specific for this board */
-   status = ntosd2_init_i2c();
-   if (status < 0)
-   pr_warning("davinci_ntosd2_init: msp430 irq setup failed:"
-   "%d\n", status);
-
davinci_serial_init(&uart_config);
dm644x_init_asp(&dm644x_ntosd2_snd_data);
 


Re: [PATCH v6 1/3] mm: introduce compaction and migration for virtio ballooned pages

2012-08-09 Thread Mel Gorman
On Wed, Aug 08, 2012 at 07:53:19PM -0300, Rafael Aquini wrote:
> Memory fragmentation introduced by ballooning might reduce significantly
> the number of 2MB contiguous memory blocks that can be used within a guest,
> thus imposing performance penalties associated with the reduced number of
> transparent huge pages that could be used by the guest workload.
> 
> This patch introduces the helper functions as well as the necessary changes
> to teach compaction and migration bits how to cope with pages which are
> part of a guest memory balloon, in order to make them movable by memory
> compaction procedures.
> 
> Signed-off-by: Rafael Aquini 

Mostly looks ok but I have one question;

> 
>
> +/* putback_lru_page() counterpart for a ballooned page */
> +bool putback_balloon_page(struct page *page)
> +{
> + if (WARN_ON(!movable_balloon_page(page)))
> + return false;
> +
> + if (likely(trylock_page(page))) {
> + if (movable_balloon_page(page)) {
> + __putback_balloon_page(page);
> + put_page(page);
> + unlock_page(page);
> + return true;
> + }
> + unlock_page(page);
> + }

You might have answered this already as I skipped over a few revisions
and if you have, sorry about that and please add a comment :)

This trylock_page looks risky as it looks like it can fail if another
process running compaction tries to isolate this page. It locks the page,
finds it can't and releases the lock but in the meantime this trylock can
fail. It triggers a WARN_ON so we'll get a bug report but it leaves the
reference count elevated and this page has now leaked.

Why not just lock_page(page)? As you have already isolated this page you
know that the lock is only going to be held by a parallel compacting
process checking the reference count and the delay will be short. As a
bonus you can drop the WARN_ON check in the caller and make this void as
the WARN_ON check in the caller becomes redundant.
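
To make the concern concrete, here is a user-space model of the two
variants. It is only a sketch with made-up names (struct page_model,
putback_trylock(), putback_lock()); a pthread mutex stands in for the
page lock and a plain int for the page reference count.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page with a lock and a reference count. */
struct page_model {
	pthread_mutex_t lock;
	int refcount;
};

/* Variant under review: gives up when the lock is contended. */
static bool putback_trylock(struct page_model *p)
{
	if (pthread_mutex_trylock(&p->lock) != 0)
		return false;	/* the reference is never dropped: leak */
	p->refcount--;
	pthread_mutex_unlock(&p->lock);
	return true;
}

/* Suggested variant: waits for the lock, so the reference always drops. */
static void putback_lock(struct page_model *p)
{
	pthread_mutex_lock(&p->lock);
	p->refcount--;
	pthread_mutex_unlock(&p->lock);
}
```

If something else holds the lock for a moment, putback_trylock() returns
false with the count unchanged, whereas putback_lock() simply waits and
needs no return value.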

-- 
Mel Gorman
SUSE Labs


Re: [RFC PATCH 1/6] virtio/console: Add splice_write support

2012-08-09 Thread Amit Shah
On (Tue) 24 Jul 2012 [11:37:07], Yoshihiro YUNOMAE wrote:
> From: Masami Hiramatsu 
> 
> Enable to use splice_write from pipe to virtio-console port.
> This steals pages from pipe and directly send it to host.
> 
> Note that this may accelerate only the guest to host path.
> 
> Signed-off-by: Masami Hiramatsu 
> Cc: Amit Shah 
> Cc: Arnd Bergmann 
> Cc: Greg Kroah-Hartman 
> ---

> +/* Faster zero-copy write by splicing */
> +static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
> +   struct file *filp, loff_t *ppos,
> +   size_t len, unsigned int flags)
> +{
> + struct port *port = filp->private_data;
> + struct sg_list sgl;
> + ssize_t ret;
> + struct splice_desc sd = {
> + .total_len = len,
> + .flags = flags,
> + .pos = *ppos,
> + .u.data = &sgl,
> + };
> +
> + sgl.n = 0;
> + sgl.len = 0;
> + sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
> +  GFP_ATOMIC);

Do you expect this function to be called from interrupt context?

> + if (unlikely(!sgl.sg))
> + return -ENOMEM;
> +
> + sg_init_table(sgl.sg, MAX_SPLICE_PAGES);
> + ret = __splice_from_pipe(pipe, &sd, pipe_to_sg);
> + if (likely(ret > 0))
> + ret = send_pages(port, sgl.sg, sgl.n, sgl.len, true);
> +
> + return ret;
> +}

Amit


[2.6.32+drm33-longterm] Linux 2.6.32.59+drm33.26

2012-08-09 Thread Stefan Bader
I am announcing the release of the 2.6.32.59+drm33.26 longterm tree.

This tree is based on 2.6.32 and generally has all of the stable updates
applied, except those to the DRM subsystem, which was based on 2.6.33 and
took updates from that upstream stable tree as long as it existed. It will
continue to add patches to the DRM subsystem as long as they are valid
according to the stable update rules (Documentation/stable_kernel_rules.txt).
DRM patches for this tree should be sent to kernel-t...@lists.ubuntu.com.

This release updates the DRM subsystem only.

The updated 2.6.32.y-drm33.z tree can be found at:
  git://git.kernel.org/pub/scm/linux/kernel/git/smb/linux-2.6.32.y-drm33.z.git
or
  git://kernel.ubuntu.com/smb/linux-2.6.32.y-drm33.z.git

and can be browsed through git web via:
  http://git.kernel.org/?p=linux/kernel/git/smb/linux-2.6.32.y-drm33.z.git;a=summary
or
  http://kernel.ubuntu.com/git?p=smb/linux-2.6.32.y-drm33.z.git;a=summary

-Stefan

--

* drm: integer overflow in drm_mode_dirtyfb_ioctl()
* Linux 2.6.32.59+drm33.26

 Makefile   |2 +-
 drivers/gpu/drm/drm_crtc.c |4 
 include/drm/drm_mode.h |2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 7667dad..a9c82a4 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 6
 SUBLEVEL = 32
-EXTRAVERSION = .59+drm33.25
+EXTRAVERSION = .59+drm33.26
 NAME = Man-Eating Seals of Antiquity

 # *DOCUMENTATION*
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index 81ba0e6..ec9e97e 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -1832,6 +1832,10 @@ int drm_mode_dirtyfb_ioctl(struct drm_device *dev,
}

if (num_clips && clips_ptr) {
+   if (num_clips < 0 || num_clips > DRM_MODE_FB_DIRTY_MAX_CLIPS) {
+   ret = -EINVAL;
+   goto out_err1;
+   }
clips = kzalloc(num_clips * sizeof(*clips), GFP_KERNEL);
if (!clips) {
ret = -ENOMEM;
diff --git a/include/drm/drm_mode.h b/include/drm/drm_mode.h
index c5ba163..9cf639c 100644
--- a/include/drm/drm_mode.h
+++ b/include/drm/drm_mode.h
@@ -232,6 +232,8 @@ struct drm_mode_fb_cmd {
 #define DRM_MODE_FB_DIRTY_ANNOTATE_FILL 0x02
 #define DRM_MODE_FB_DIRTY_FLAGS 0x03

+#define DRM_MODE_FB_DIRTY_MAX_CLIPS 256
+
 /*
  * Mark a region of a framebuffer as dirty.
  *
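
The integer overflow that the drm_mode_dirtyfb_ioctl() hunk above guards
against can be reproduced in userspace. What follows is only a sketch under
stated assumptions: struct clip is a stand-in for the kernel's clip
rectangle, clips_bytes32() models the size computation in 32 bits, and
alloc_clips() is a hypothetical helper, not kernel code. It rejects a zero
count as well, because the kernel's outer "if (num_clips && clips_ptr)"
test has no counterpart here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_CLIPS 256	/* mirrors DRM_MODE_FB_DIRTY_MAX_CLIPS from the patch */

/* Stand-in for the kernel's clip rectangle: four 16-bit fields, 8 bytes. */
struct clip {
	uint16_t x1, y1, x2, y2;
};

/* Size computed in 32 bits, as on a 32-bit kernel: large counts wrap. */
static uint32_t clips_bytes32(uint32_t num_clips)
{
	return num_clips * (uint32_t)sizeof(struct clip);
}

/* Hypothetical helper: bound the count before allocating anything. */
static struct clip *alloc_clips(int num_clips)
{
	if (num_clips <= 0 || num_clips > MAX_CLIPS)
		return NULL;
	return calloc((size_t)num_clips, sizeof(struct clip));
}
```

With num_clips = 0x20000001 the 32-bit product wraps to 8 bytes, so an
unchecked allocation would succeed while being far too small for the copy
that follows. Bounding the count first makes the multiplication safe.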


Re: Using vanilla kernels headers to do backport testing -- issues with memcpy()

2012-08-09 Thread Luis R. Rodriguez
On Thu, Aug 9, 2012 at 12:27 AM, Sedat Dilek  wrote:
> I am not an English native but this sentence sounds a bit strange to
> me ("used" followed by "require" and double "Linux"?).
> I can imagine what you want to express but have no better wording right now.
>
> 180 +  echo -e "   This was added since kernels >= 3.4 used require"
> 181 +  echo -e "   a glibc >= 2.14 for memcpy(), and not all Linux"
> 182 +  echo -e "   Linux distributions have such glibc.

Fixed.

 Luis


Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer

2012-08-09 Thread Amit Shah
On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
> From: Masami Hiramatsu 
> 
> Add a failback memcpy path for unstealable pipe buffer.
> If buf->ops->steal() fails, virtio-serial tries to
> copy the page contents to an allocated page, instead
> of just failing splice().
> 
> Signed-off-by: Masami Hiramatsu 
> Cc: Amit Shah 
> Cc: Arnd Bergmann 
> Cc: Greg Kroah-Hartman 
> ---
> 
>  drivers/char/virtio_console.c |   28 +---
>  1 files changed, 25 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> index fe31b2f..911cb3e 100644
> --- a/drivers/char/virtio_console.c
> +++ b/drivers/char/virtio_console.c
> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, 
> struct pipe_buffer *buf,
>   struct splice_desc *sd)
>  {
>   struct sg_list *sgl = sd->u.data;
> - unsigned int len = 0;
> + unsigned int offset, len;
>  
>   if (sgl->n == MAX_SPLICE_PAGES)
>   return 0;
> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, 
> struct pipe_buffer *buf,
>  
>   len = min(buf->len, sd->len);
>   sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> - sgl->n++;
> - sgl->len += len;
> + } else {
> + /* Failback to copying a page */
> + struct page *page = alloc_page(GFP_KERNEL);

I prefer zeroing out the page.  If there's not enough data to be
filled in the page, the remaining data can be leaked to the host.

Amit
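
The concern about leaking stale data can be illustrated with a userspace
analogue. This is a sketch only: copy_to_zeroed() and BUF_SIZE are
hypothetical names, and calloc() stands in for a zeroed page allocation
(in the kernel one would presumably pass __GFP_ZERO to the page allocator,
as the thp patches elsewhere in this archive do).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 4096	/* stand-in for one page */

/*
 * Hypothetical helper: copy len bytes into a freshly zeroed buffer, so
 * that the tail beyond len never carries stale heap contents if the
 * whole buffer is later handed to another party.
 */
static unsigned char *copy_to_zeroed(const void *src, size_t len)
{
	unsigned char *buf;

	if (len > BUF_SIZE)
		return NULL;
	buf = calloc(1, BUF_SIZE);
	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	return buf;
}
```

The cost is one extra clear of the page per fallback copy, which seems
acceptable on a path that is already the slow one.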


[PATCH, RFC 1/9] thp: huge zero page: basic preparation

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

For now let's allocate the page on hugepage_init(). We'll switch to lazy
allocation later.

We are not going to map the huge zero page until we can handle it
properly on all code paths.

The is_huge_zero_{pfn,pmd}() functions will be used by the following patches
to check whether a pfn/pmd refers to the huge zero page.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   29 +
 1 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 57c4b93..88e0a7a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -46,6 +46,7 @@ static unsigned int khugepaged_scan_sleep_millisecs 
__read_mostly = 1;
 /* during fragmentation poll the hugepage allocator once every minute */
 static unsigned int khugepaged_alloc_sleep_millisecs __read_mostly = 6;
 static struct task_struct *khugepaged_thread __read_mostly;
+static unsigned long huge_zero_pfn __read_mostly;
 static DEFINE_MUTEX(khugepaged_mutex);
 static DEFINE_SPINLOCK(khugepaged_mm_lock);
 static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
@@ -167,6 +168,28 @@ out:
return err;
 }
 
+static int init_huge_zero_page(void)
+{
+   struct page *hpage;
+
+   hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER);
+   if (!hpage)
+   return -ENOMEM;
+
+   huge_zero_pfn = page_to_pfn(hpage);
+   return 0;
+}
+
+static inline bool is_huge_zero_pfn(unsigned long pfn)
+{
+   return pfn == huge_zero_pfn;
+}
+
+static inline bool is_huge_zero_pmd(pmd_t pmd)
+{
+   return is_huge_zero_pfn(pmd_pfn(pmd));
+}
+
 #ifdef CONFIG_SYSFS
 
 static ssize_t double_flag_show(struct kobject *kobj,
@@ -550,6 +573,10 @@ static int __init hugepage_init(void)
if (err)
return err;
 
+   err = init_huge_zero_page();
+   if (err)
+   goto out;
+
err = khugepaged_slab_init();
if (err)
goto out;
@@ -574,6 +601,8 @@ static int __init hugepage_init(void)
 
return 0;
 out:
+   if (huge_zero_pfn)
+   __free_page(pfn_to_page(huge_zero_pfn));
hugepage_exit_sysfs(hugepage_kobj);
return err;
 }
-- 
1.7.7.6



[PATCH, RFC 3/9] thp: copy_huge_pmd(): copy huge zero page

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

It's easy to copy huge zero page. Just set destination pmd to huge zero
page.

It's safe to copy huge zero page since we have none yet :-p

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   17 +
 1 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9dcb9e6..a534f84 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -725,6 +725,18 @@ static inline struct page *alloc_hugepage(int defrag)
 }
 #endif
 
+static void set_huge_zero_page(pgtable_t pgtable, struct mm_struct *mm,
+   struct vm_area_struct *vma, unsigned long haddr, pmd_t *pmd)
+{
+   pmd_t entry;
+   entry = pfn_pmd(huge_zero_pfn, vma->vm_page_prot);
+   entry = pmd_wrprotect(entry);
+   entry = pmd_mkhuge(entry);
+   set_pmd_at(mm, haddr, pmd, entry);
+   prepare_pmd_huge_pte(pgtable, mm);
+   mm->nr_ptes++;
+}
+
 int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct 
*vma,
   unsigned long address, pmd_t *pmd,
   unsigned int flags)
@@ -802,6 +814,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct 
mm_struct *src_mm,
pte_free(dst_mm, pgtable);
goto out_unlock;
}
+   if (is_huge_zero_pmd(pmd)) {
+   set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd);
+   ret = 0;
+   goto out_unlock;
+   }
if (unlikely(pmd_trans_splitting(pmd))) {
/* split huge page running from under us */
spin_unlock(&src_mm->page_table_lock);
-- 
1.7.7.6



[PATCH, RFC 0/9] Introduce huge zero page

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

During testing I noticed a big (up to 2.5 times) memory consumption overhead
on some workloads (e.g. ft.A from NPB) if THP is enabled.

The main reason for that big difference is the lack of a zero page in the THP
case. We have to allocate a real page on a read page fault.

A program to demonstrate the issue:
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define MB (1024*1024)

int main(int argc, char **argv)
{
	char *p;
	int i;

	posix_memalign((void **)&p, 2 * MB, 200 * MB);
	for (i = 0; i < 200 * MB; i += 4096)
		assert(p[i] == 0);
	pause();
	return 0;
}

With thp-never RSS is about 400k, but with thp-always it's 200M.
After the patchset thp-always RSS is 400k too.
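
For anyone wanting to check such RSS figures from inside a test program
rather than with an external tool, one option is to read /proc/self/statm.
A minimal sketch, assuming a Linux /proc filesystem; rss_kib() is a
hypothetical helper and not part of the patchset:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Return the resident set size of the calling process in KiB, or -1 on
 * error. The second field of /proc/self/statm is the resident page count.
 */
static long rss_kib(void)
{
	long size, resident;
	FILE *f = fopen("/proc/self/statm", "r");

	if (!f)
		return -1;
	if (fscanf(f, "%ld %ld", &size, &resident) != 2) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return resident * (sysconf(_SC_PAGESIZE) / 1024);
}
```

Calling rss_kib() before and after the read loop would show how much
memory the read faults alone pulled in.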

H. Peter Anvin proposed to use a "virtual huge zero page" -- a pmd table
with all ptes set to the 4k zero page. I haven't tried that approach and I'm
not sure if it's a good idea (cache vs. tlb thrashing). And I guess it will
require more code to handle.
For now, I just allocate a 2M page and use it.

Kirill A. Shutemov (9):
  thp: huge zero page: basic preparation
  thp: zap_huge_pmd(): zap huge zero pmd
  thp: copy_huge_pmd(): copy huge zero page
  thp: do_huge_pmd_wp_page(): handle huge zero page
  thp: change_huge_pmd(): keep huge zero page write-protected
  thp: add address parameter to split_huge_page_pmd()
  thp: implement splitting pmd for huge zero page
  thp: setup huge zero page on non-write page fault
  thp: lazy huge zero page allocation

 Documentation/vm/transhuge.txt |4 +-
 arch/x86/kernel/vm86_32.c  |2 +-
 fs/proc/task_mmu.c |2 +-
 include/linux/huge_mm.h|   10 +-
 include/linux/mm.h |8 ++
 mm/huge_memory.c   |  228 +++-
 mm/memory.c|   11 +--
 mm/mempolicy.c |2 +-
 mm/mprotect.c  |2 +-
 mm/mremap.c|3 +-
 mm/pagewalk.c  |2 +-
 11 files changed, 226 insertions(+), 48 deletions(-)

-- 
1.7.7.6



[PATCH, RFC 2/9] thp: zap_huge_pmd(): zap huge zero pmd

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

We don't have a real page to zap in the huge zero page case. Let's just
clear the pmd and remove it from the tlb.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   27 +--
 1 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 88e0a7a..9dcb9e6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1071,16 +1071,23 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct 
vm_area_struct *vma,
struct page *page;
pgtable_t pgtable;
pgtable = get_pmd_huge_pte(tlb->mm);
-   page = pmd_page(*pmd);
-   pmd_clear(pmd);
-   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-   page_remove_rmap(page);
-   VM_BUG_ON(page_mapcount(page) < 0);
-   add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
-   VM_BUG_ON(!PageHead(page));
-   tlb->mm->nr_ptes--;
-   spin_unlock(&tlb->mm->page_table_lock);
-   tlb_remove_page(tlb, page);
+   if (is_huge_zero_pmd(*pmd)) {
+   pmd_clear(pmd);
+   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+   tlb->mm->nr_ptes--;
+   spin_unlock(&tlb->mm->page_table_lock);
+   } else {
+   page = pmd_page(*pmd);
+   pmd_clear(pmd);
+   tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+   page_remove_rmap(page);
+   VM_BUG_ON(page_mapcount(page) < 0);
+   add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
+   VM_BUG_ON(!PageHead(page));
+   tlb->mm->nr_ptes--;
+   spin_unlock(&tlb->mm->page_table_lock);
+   tlb_remove_page(tlb, page);
+   }
pte_free(tlb->mm, pgtable);
ret = 1;
}
-- 
1.7.7.6



[PATCH, RFC 7/9] thp: implement splitting pmd for huge zero page

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

We can't split the huge zero page itself, but we can split the pmd which
points to it.

On splitting the pmd we create a table with all ptes set to the normal zero
page.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   36 
 1 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c8948d6..4df5841 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1599,6 +1599,7 @@ int split_huge_page(struct page *page)
struct anon_vma *anon_vma;
int ret = 1;
 
+   BUG_ON(is_huge_zero_pfn(page_to_pfn(page)));
BUG_ON(!PageAnon(page));
anon_vma = page_lock_anon_vma(page);
if (!anon_vma)
@@ -2503,6 +2504,36 @@ static int khugepaged(void *none)
return 0;
 }
 
+static void __split_huge_zero_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+   unsigned long address)
+{
+   pgtable_t pgtable;
+   pmd_t _pmd;
+   unsigned long haddr = address & HPAGE_PMD_MASK;
+   struct vm_area_struct *vma;
+   int i;
+
+   vma = find_vma(mm, address);
+   VM_BUG_ON(vma == NULL);
+   pmdp_clear_flush_notify(vma, haddr, pmd);
+   /* leave pmd empty until pte is filled */
+
+   pgtable = get_pmd_huge_pte(mm);
+   pmd_populate(mm, &_pmd, pgtable);
+
+   for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+   pte_t *pte, entry;
+   entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
+   entry = pte_mkspecial(entry);
+   pte = pte_offset_map(&_pmd, haddr);
+   VM_BUG_ON(!pte_none(*pte));
+   set_pte_at(mm, haddr, pte, entry);
+   pte_unmap(pte);
+   }
+   smp_wmb(); /* make pte visible before pmd */
+   pmd_populate(mm, pmd, pgtable);
+}
+
 void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
unsigned long address)
 {
@@ -2513,6 +2544,11 @@ void __split_huge_page_pmd(struct mm_struct *mm, pmd_t 
*pmd,
spin_unlock(&mm->page_table_lock);
return;
}
+   if (is_huge_zero_pmd(*pmd)) {
+   __split_huge_zero_page_pmd(mm, pmd, address);
+   spin_unlock(&mm->page_table_lock);
+   return;
+   }
page = pmd_page(*pmd);
VM_BUG_ON(!page_count(page));
get_page(page);
-- 
1.7.7.6



[PATCH, RFC 5/9] thp: change_huge_pmd(): keep huge zero page write-protected

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

We want to get a page fault on a write attempt to the huge zero page, so
let's keep it write-protected.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f5029d4..4001f1a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1248,6 +1248,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t 
*pmd,
pmd_t entry;
entry = pmdp_get_and_clear(mm, addr, pmd);
entry = pmd_modify(entry, newprot);
+   if (is_huge_zero_pmd(entry))
+   entry = pmd_wrprotect(entry);
set_pmd_at(mm, addr, pmd, entry);
spin_unlock(&vma->vm_mm->page_table_lock);
ret = 1;
-- 
1.7.7.6



[PATCH, RFC 9/9] thp: lazy huge zero page allocation

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

Instead of allocating the huge zero page in hugepage_init() we can postpone
it until the huge zero page is first mapped. This saves memory if THP is not
in use.

cmpxchg() is used to avoid a race on huge_zero_pfn initialization.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a78677..6861230 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -168,21 +168,23 @@ out:
return err;
 }
 
-static int init_huge_zero_page(void)
+static int init_huge_zero_pfn(void)
 {
struct page *hpage;
+   unsigned long pfn;
 
hpage = alloc_pages(GFP_TRANSHUGE | __GFP_ZERO, HPAGE_PMD_ORDER);
if (!hpage)
return -ENOMEM;
-
-   huge_zero_pfn = page_to_pfn(hpage);
+   pfn = page_to_pfn(hpage);
+   if (cmpxchg(&huge_zero_pfn, 0, pfn))
+   __free_page(hpage);
return 0;
 }
 
 static inline bool is_huge_zero_pfn(unsigned long pfn)
 {
-   return pfn == huge_zero_pfn;
+   return huge_zero_pfn && pfn == huge_zero_pfn;
 }
 
 static inline bool is_huge_zero_pmd(pmd_t pmd)
@@ -573,10 +575,6 @@ static int __init hugepage_init(void)
if (err)
return err;
 
-   err = init_huge_zero_page();
-   if (err)
-   goto out;
-
err = khugepaged_slab_init();
if (err)
goto out;
@@ -601,8 +599,6 @@ static int __init hugepage_init(void)
 
return 0;
 out:
-   if (huge_zero_pfn)
-   __free_page(pfn_to_page(huge_zero_pfn));
hugepage_exit_sysfs(hugepage_kobj);
return err;
 }
@@ -752,6 +748,10 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, 
struct vm_area_struct *vma,
return VM_FAULT_OOM;
if (!(flags & FAULT_FLAG_WRITE)) {
pgtable_t pgtable;
+   if (unlikely(!huge_zero_pfn && init_huge_zero_pfn())) {
+   count_vm_event(THP_FAULT_FALLBACK);
+   goto out;
+   }
pgtable = pte_alloc_one(mm, haddr);
if (unlikely(!pgtable))
goto out;
-- 
1.7.7.6
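
The cmpxchg()-based lazy initialization in the patch above can be sketched
in userspace with C11 atomics. This is an analogue under stated
assumptions, not the kernel code: zero_slot stands in for huge_zero_pfn,
calloc() for the huge page allocation, and init_zero_slot() and the
discarded counter are hypothetical names added for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for huge_zero_pfn: 0 means "not allocated yet". */
static _Atomic(uintptr_t) zero_slot;

/* Counts buffers freed by callers that lost the race (illustration only). */
static int discarded;

/* Hypothetical helper: lazily publish one shared zeroed buffer. */
static int init_zero_slot(void)
{
	uintptr_t expected = 0;
	void *buf = calloc(1, 4096);

	if (!buf)
		return -1;
	/*
	 * Only the caller that still sees 0 installs its buffer. Anyone
	 * who finds the slot already set frees the private copy instead.
	 */
	if (!atomic_compare_exchange_strong(&zero_slot, &expected,
					    (uintptr_t)buf)) {
		free(buf);
		discarded++;
	}
	return 0;
}
```

The design choice is that allocation happens before the race is decided,
so a loser pays for one wasted allocation, but no lock is needed and every
caller ends up seeing the same published value.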



[PATCH v3] xfs: check for possible overflow in xfs_ioc_trim

2012-08-09 Thread Tomas Racek
If range.start or range.minlen is bigger than filesystem size, return
invalid value error. This fixes possible overflow in BTOBB macro when
passed value was nearly ULLONG_MAX.

Signed-off-by: Tomas Racek 
---
 fs/xfs/xfs_discard.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c
index f9c3fe3..4e07a56 100644
--- a/fs/xfs/xfs_discard.c
+++ b/fs/xfs/xfs_discard.c
@@ -179,12 +179,14 @@ xfs_ioc_trim(
 * used by the fstrim application.  In the end it really doesn't
 * matter as trimming blocks is an advisory interface.
 */
+   if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) ||
+   range.minlen > XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks))
+   return -XFS_ERROR(EINVAL);
+
start = BTOBB(range.start);
end = start + BTOBBT(range.len) - 1;
minlen = BTOBB(max_t(u64, granularity, range.minlen));
 
-   if (XFS_BB_TO_FSB(mp, start) >= mp->m_sb.sb_dblocks)
-   return -XFS_ERROR(EINVAL);
if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1)
end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1;
 
-- 
1.7.7.6
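
The overflow described in the commit message can be shown with a small
userspace sketch. It assumes that BTOBB rounds a byte count up to 512-byte
basic blocks by adding BBSIZE - 1 before shifting; btobb() and
trim_range_ok() are hypothetical stand-ins, with the filesystem size passed
in bytes rather than derived from the superblock.

```c
#include <assert.h>
#include <stdint.h>

#define BBSHIFT 9
#define BBSIZE  (1 << BBSHIFT)

/* Round-up bytes-to-basic-blocks conversion, assumed to match BTOBB. */
static uint64_t btobb(uint64_t bytes)
{
	return (bytes + BBSIZE - 1) >> BBSHIFT;
}

/*
 * Hypothetical pre-check mirroring the patch: validate the byte values
 * against the filesystem size before any conversion takes place.
 */
static int trim_range_ok(uint64_t start, uint64_t minlen, uint64_t fs_bytes)
{
	return start < fs_bytes && minlen <= fs_bytes;
}
```

With a value near ULLONG_MAX the addition wraps, so the converted start
becomes a small block number and would slip past a check performed after
the conversion. Checking the raw byte values first avoids that.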



[PATCH, RFC 8/9] thp: setup huge zero page on non-write page fault

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

All code paths seem to be covered. Now we can map the huge zero page on a
read page fault.

Signed-off-by: Kirill A. Shutemov 
---
 mm/huge_memory.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4df5841..3a78677 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -750,6 +750,16 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, 
struct vm_area_struct *vma,
return VM_FAULT_OOM;
if (unlikely(khugepaged_enter(vma)))
return VM_FAULT_OOM;
+   if (!(flags & FAULT_FLAG_WRITE)) {
+   pgtable_t pgtable;
+   pgtable = pte_alloc_one(mm, haddr);
+   if (unlikely(!pgtable))
+   goto out;
+   spin_lock(&mm->page_table_lock);
+   set_huge_zero_page(pgtable, mm, vma, haddr, pmd);
+   spin_unlock(&mm->page_table_lock);
+   return 0;
+   }
page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
  vma, haddr, numa_node_id(), 0);
if (unlikely(!page)) {
-- 
1.7.7.6



[PATCH, RFC 6/9] thp: add address parameter to split_huge_page_pmd()

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

It's required to implement huge zero pmd splitting.

Signed-off-by: Kirill A. Shutemov 
---
 Documentation/vm/transhuge.txt |4 ++--
 arch/x86/kernel/vm86_32.c  |2 +-
 fs/proc/task_mmu.c |2 +-
 include/linux/huge_mm.h|   10 ++
 mm/huge_memory.c   |5 +++--
 mm/memory.c|4 ++--
 mm/mempolicy.c |2 +-
 mm/mprotect.c  |2 +-
 mm/mremap.c|3 ++-
 mm/pagewalk.c  |2 +-
 10 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
index f734bb2..b1fe2ca 100644
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -276,7 +276,7 @@ unaffected. libhugetlbfs will also work fine as usual.
 == Graceful fallback ==
 
 Code walking pagetables but unware about huge pmds can simply call
-split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
+split_huge_page_pmd(mm, pmd, addr) where the pmd is the one returned by
 pmd_offset. It's trivial to make the code transparent hugepage aware
 by just grepping for "pmd_offset" and adding split_huge_page_pmd where
 missing after pmd_offset returns the pmd. Thanks to the graceful
@@ -299,7 +299,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
return NULL;
 
pmd = pmd_offset(pud, addr);
-+  split_huge_page_pmd(mm, pmd);
++  split_huge_page_pmd(mm, pmd, addr);
if (pmd_none_or_clear_bad(pmd))
return NULL;
 
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 255f58a..719ba0c 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -180,7 +180,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
if (pud_none_or_clear_bad(pud))
goto out;
pmd = pmd_offset(pud, 0xA);
-   split_huge_page_pmd(mm, pmd);
+   split_huge_page_pmd(mm, pmd, 0xA);
if (pmd_none_or_clear_bad(pmd))
goto out;
pte = pte_offset_map_lock(mm, pmd, 0xA, &ptl);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4540b8f..27c1827 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -597,7 +597,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long 
addr,
spinlock_t *ptl;
struct page *page;
 
-   split_huge_page_pmd(walk->mm, pmd);
+   split_huge_page_pmd(walk->mm, pmd, addr);
if (pmd_trans_unstable(pmd))
return 0;
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4c59b11..ce91199 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -92,12 +92,14 @@ extern int handle_pte_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
pte_t *pte, pmd_t *pmd, unsigned int flags);
 extern int split_huge_page(struct page *page);
-extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd);
-#define split_huge_page_pmd(__mm, __pmd)   \
+extern void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+   unsigned long address);
+#define split_huge_page_pmd(__mm, __pmd, __address)\
do {\
pmd_t *pmd = (__pmd);   \
if (unlikely(pmd_trans_huge(*pmd))) \
-   __split_huge_page_pmd(__mm, pmd);   \
+   __split_huge_page_pmd(__mm, pmd,\
+   __address); \
}  while (0)
 #define wait_split_huge_page(__anon_vma, __pmd)
\
do {\
@@ -174,7 +176,7 @@ static inline int split_huge_page(struct page *page)
 {
return 0;
 }
-#define split_huge_page_pmd(__mm, __pmd)   \
+#define split_huge_page_pmd(__mm, __pmd, __address)\
do { } while (0)
 #define wait_split_huge_page(__anon_vma, __pmd)\
do { } while (0)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4001f1a..c8948d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2503,7 +2503,8 @@ static int khugepaged(void *none)
return 0;
 }
 
-void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd)
+void __split_huge_page_pmd(struct mm_struct *mm, pmd_t *pmd,
+   unsigned long address)
 {
struct page *page;
 
@@ -2547,7 +2548,7 @@ static void split_huge_page_address(struct mm_struct *mm,
 * Caller holds the mmap_sem write mode, so a huge pmd cannot
 * materialize from under us.
 */
-   split_huge_page_pmd(mm, pmd);
+   split_huge_page_pmd(mm, pmd, address);
 }
 
 void __vma_adjust_trans_huge(struct vm_area_struct *v

[PATCH, RFC 4/9] thp: do_huge_pmd_wp_page(): handle huge zero page

2012-08-09 Thread Kirill A. Shutemov
From: "Kirill A. Shutemov" 

On write access to the huge zero page we allocate a new page and clear it.

In the fallback path we create a new table and set the pte around the fault
address to the newly allocated page. All other ptes are set to the normal
zero page.

Signed-off-by: Kirill A. Shutemov 
---
 include/linux/mm.h |8 
 mm/huge_memory.c   |  102 
 mm/memory.c|7 
 3 files changed, 95 insertions(+), 22 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b36d08c..c6eef63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -514,6 +514,14 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct 
vm_area_struct *vma)
 }
 #endif
 
+#ifndef my_zero_pfn
+static inline unsigned long my_zero_pfn(unsigned long addr)
+{
+   extern unsigned long zero_pfn;
+   return zero_pfn;
+}
+#endif
+
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a534f84..f5029d4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -867,6 +867,61 @@ pgtable_t get_pmd_huge_pte(struct mm_struct *mm)
return pgtable;
 }
 
+static int do_huge_pmd_wp_zero_page_fallback(struct mm_struct *mm,
+   struct vm_area_struct *vma, unsigned long address,
+   pmd_t *pmd, unsigned long haddr)
+{
+   pgtable_t pgtable;
+   pmd_t _pmd;
+   struct page *page;
+   int i, ret = 0;
+
+   page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+   if (!page) {
+   ret |= VM_FAULT_OOM;
+   goto out;
+   }
+
+   if (mem_cgroup_newpage_charge(page, mm, GFP_KERNEL)) {
+   put_page(page);
+   ret |= VM_FAULT_OOM;
+   goto out;
+   }
+
+   clear_user_highpage(page, address);
+   __SetPageUptodate(page);
+
+   spin_lock(&mm->page_table_lock);
+   pmdp_clear_flush_notify(vma, haddr, pmd);
+   /* leave pmd empty until pte is filled */
+
+   pgtable = get_pmd_huge_pte(mm);
+   pmd_populate(mm, &_pmd, pgtable);
+
+   for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) {
+   pte_t *pte, entry;
+   if (haddr == (address & PAGE_MASK)) {
+   entry = mk_pte(page, vma->vm_page_prot);
+   entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+   page_add_new_anon_rmap(page, vma, haddr);
+   } else {
+   entry = pfn_pte(my_zero_pfn(haddr), vma->vm_page_prot);
+   entry = pte_mkspecial(entry);
+   }
+   pte = pte_offset_map(&_pmd, haddr);
+   VM_BUG_ON(!pte_none(*pte));
+   set_pte_at(mm, haddr, pte, entry);
+   pte_unmap(pte);
+   }
+   smp_wmb(); /* make pte visible before pmd */
+   pmd_populate(mm, pmd, pgtable);
+   spin_unlock(&mm->page_table_lock);
+
+   ret |= VM_FAULT_WRITE;
+out:
+   return ret;
+}
+
 static int do_huge_pmd_wp_page_fallback(struct mm_struct *mm,
struct vm_area_struct *vma,
unsigned long address,
@@ -964,17 +1019,19 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
unsigned long address, pmd_t *pmd, pmd_t orig_pmd)
 {
int ret = 0;
-   struct page *page, *new_page;
+   struct page *page = NULL, *new_page;
unsigned long haddr;
 
VM_BUG_ON(!vma->anon_vma);
+   haddr = address & HPAGE_PMD_MASK;
+   if (is_huge_zero_pmd(orig_pmd))
+   goto alloc;
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(*pmd, orig_pmd)))
goto out_unlock;
 
page = pmd_page(orig_pmd);
VM_BUG_ON(!PageCompound(page) || !PageHead(page));
-   haddr = address & HPAGE_PMD_MASK;
if (page_mapcount(page) == 1) {
pmd_t entry;
entry = pmd_mkyoung(orig_pmd);
@@ -986,7 +1043,7 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
}
get_page(page);
spin_unlock(&mm->page_table_lock);
-
+alloc:
if (transparent_hugepage_enabled(vma) &&
!transparent_hugepage_debug_cow())
new_page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
@@ -996,28 +1053,39 @@ int do_huge_pmd_wp_page(struct mm_struct *mm, struct 
vm_area_struct *vma,
 
if (unlikely(!new_page)) {
count_vm_event(THP_FAULT_FALLBACK);
-   ret = do_huge_pmd_wp_page_fallback(mm, vma, address,
-  pmd, orig_pmd, page, haddr);
-   if (ret & VM_FAULT_OOM)
-   split_huge_page(page);
-   put_page(page);
+   if (is_huge_zero_pmd(orig_pmd)) {
+ 

Re: Re: [RFC PATCH 1/6] virtio/console: Add splice_write support

2012-08-09 Thread Masami Hiramatsu
(2012/08/09 18:00), Amit Shah wrote:
> On (Tue) 24 Jul 2012 [11:37:07], Yoshihiro YUNOMAE wrote:
>> From: Masami Hiramatsu 
>>
>> Enable to use splice_write from pipe to virtio-console port.
>> This steals pages from pipe and directly send it to host.
>>
>> Note that this may accelerate only the guest to host path.
>>
>> Signed-off-by: Masami Hiramatsu 
>> Cc: Amit Shah 
>> Cc: Arnd Bergmann 
>> Cc: Greg Kroah-Hartman 
>> ---
> 
>> +/* Faster zero-copy write by splicing */
>> +static ssize_t port_fops_splice_write(struct pipe_inode_info *pipe,
>> +  struct file *filp, loff_t *ppos,
>> +  size_t len, unsigned int flags)
>> +{
>> +struct port *port = filp->private_data;
>> +struct sg_list sgl;
>> +ssize_t ret;
>> +struct splice_desc sd = {
>> +.total_len = len,
>> +.flags = flags,
>> +.pos = *ppos,
>> +.u.data = &sgl,
>> +};
>> +
>> +sgl.n = 0;
>> +sgl.len = 0;
>> +sgl.sg = kmalloc(sizeof(struct scatterlist) * MAX_SPLICE_PAGES,
>> + GFP_ATOMIC);
> 
> Do you expect this function to be called from interrupt context?

No, not at all. Oops, that should be GFP_KERNEL...

Thank you for pointing it out.

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH] ALSA: hda - Defer probe when loading patch firmware

2012-08-09 Thread Takashi Iwai
At Thu, 09 Aug 2012 10:21:15 +0200,
Takashi Iwai wrote:
> 
> At Thu, 9 Aug 2012 10:07:13 +0200,
> Thierry Reding wrote:
> > 
> > On Thu, Aug 09, 2012 at 09:42:48AM +0200, Takashi Iwai wrote:
> > > At Thu, 9 Aug 2012 09:36:42 +0200,
> > > Thierry Reding wrote:
> > > > 
> > > > On Thu, Aug 09, 2012 at 09:31:30AM +0200, Takashi Iwai wrote:
> > > > > At Thu, 9 Aug 2012 09:08:13 +0200,
> > > > > Thierry Reding wrote:
> > > > > > 
> > > > > > On Thu, Aug 09, 2012 at 08:57:13AM +0200, Takashi Iwai wrote:
> > > > > > > At Thu,  9 Aug 2012 08:45:23 +0200,
> > > > > > > Thierry Reding wrote:
> > > > > > > > 
> > > > > > > > Recent changes to the firmware loading helpers cause drivers to
> > > > > > > > stall when firmware is loaded during the module_init() call. The
> > > > > > > > snd-hda-intel module requests firmware if the patch= parameter is
> > > > > > > > used to load a patch file. This patch works around the problem by
> > > > > > > > deferring the probe in such cases, which will cause the module to
> > > > > > > > load successfully and the driver binding to the device outside the
> > > > > > > > module_init() call.
> > > > > > > 
> > > > > > > Is the "recent" change meant 3.6 kernel, or in linux-next?
> > > > > > > 
> > > > > > > In anyway, I don't understand why such a change was allowed.  Most
> > > > > > > drivers do call request_firmware() at the device probing time.
> > > > > > > If this really has to be resolved in the driver side, it must be
> > > > > > > a bug in the firmware loader core code.
> > > > > > 
> > > > > > A good explanation of the problem and subsequent discussion can be
> > > > > > found here:
> > > > > > 
> > > > > > 
> > > > > > http://article.gmane.org/gmane.linux.drivers.video-input-infrastructure/49975
> > > > > 
> > > > > Yeah, but it doesn't justify this ugly module option.
> > > > > It's a simple bug.  Papering over it with this option doesn't fix
> > > > > anything.
> > > > 
> > > > It's not an option, all it does is defer probing if and only if the
> > > > patch parameter was specified to make sure the firmware load won't
> > > > stall. I realize that this may not be an optimal solution, but at least
> > > > it fixes the problem with no fallout.
> > > 
> > > Ah sorry, I misread the patch.
> > > 
> > > Then it shouldn't be checked at that point.  Since 3.5 kernel, the
> > > probing code was already split for vga_switcheroo support.
> > 
> > Yes, I saw that. But unless you actually use vga_switcheroo, the second
> > stage, azx_probe_continue(), will still be called from azx_probe() and
> > therefore ultimately from module_init().
> 
> Yeah, but this could be easily delayed.  The split was already done,
> so the next step would be to return after the first half at probe,
> then call the second half later.
> 
> > Before coming up with this patch I actually did play around a bit with
> > using the asynchronous firmware load functions but it turned out to be
> > rather difficult to do so I opted for the easy way. The biggest problem
> > I faced was that since patch loading needs to be done very early on, a
> > lot of the initialization would need to be done after .probe() and many
> > things could still fail, so cleaning up after errors would become
> > increasingly difficult.
> 
> async probe is also on my TODO list, but it's deferred ;)
> 
> > > The point you added is the second stage.
> > 
> > I don't understand this sentence.
> 
> I meant that your patch added the check at the second-half probing
> function (azx_probe_continue()).  That is, it could be already the
> point triggered by vga_switcheroo handler, not via module_init any
> longer.
> 
> So, after rethinking what you suggested, I wrote a quick patch below.
> Could you check whether this works?

Obviously it won't work if the module is re-loaded manually.
The -EPROBE_DEFER would work only at boot time, as it seems.


Takashi


Re: [PATCH 2/6] mm: vmscan: Scale number of pages reclaimed by reclaim/compaction based on failures

2012-08-09 Thread Mel Gorman
On Thu, Aug 09, 2012 at 05:27:15PM +0900, Minchan Kim wrote:
> > > > +  * pages reclaimed based on the number of consecutive allocation
> > > > +  * failures
> > > > +  */
> > > > + zone = lruvec_zone(lruvec);
> > > > + if (zone->compact_order_failed >= sc->order)
> > > 
> > > I can't understand this part.
> > > We don't defer lower order than compact_order_failed by aff62249.
> > > Do you mean lower order compaction context should be a lamb for
> > > deferred higher order allocation request success? I think it's not fair
> > > > and even I can't understand rationale why it has to scale the number of
> > > > pages reclaimed with the number of recent compaction failure.
> > > Your changelog just says "What we have to do, NOT Why we have to do".
> > > 
> > 
> > I'm a moron, that should be <=, not >=. All my tests were based on order==9
> > and that was the only order using reclaim/compaction so it happened to
> > work as expected. Thanks! I fixed that and added the following
> > clarification to the changelog
> > 
> > The rationale is that reclaiming the normal number of pages still allowed
> > compaction to fail and its success depends on the number of pages. If it's
> > failing, reclaim more pages until it succeeds again.
> > 
> > Does that make more sense?
> 
> If compaction is deferred, requestors fails to get high-order page and
> they normally do fallback by order-0 or something.

Yes. At least, one hopes they fell back to order-0.

> In this context, if they don't depends on fallback and retrying higher order
> allocation, your patch makes sense to me because your algorithm is based on
> past allocation request fail rate.
> Do I miss something?

Your question is difficult to parse but I think you are making an implicit
assumption that it's the same caller retrying the high order allocation.
That is not the case, nor do I want it to be because that would be similar
to the caller using __GFP_REPEAT. Retrying with more reclaim until the
allocation succeeds would both stall and reclaim excessively.

The intention is that an allocation can fail but each subsequent attempt will
try harder until there is success. Each allocation request does a portion
of the necessary work to spread the cost between multiple requests. Take
THP for example where there is a constant request for THP allocations
for whatever reason (heavy fork workload, large buffer allocation being
populated etc.). Some of those allocations fail but if they do, future
THP requests will reclaim more pages. When compaction resumes again, it
will be more likely to succeed and compact_defer_shift gets reset. In the
specific case of THP there will be allocations that fail but khugepaged
will promote them later if the process is long-lived.
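
As a rough illustration of the corrected check discussed above (<= rather
than >=), the scaling can be modelled in plain userspace C. This is only a
sketch: struct zone_model, pages_for_compaction() and the 2UL << order base
are assumptions made for the example and are not taken from the patch itself.

```c
/* Hypothetical userspace stand-in for the two zone fields discussed above. */
struct zone_model {
	int compact_order_failed;         /* lowest order at which compaction failed */
	unsigned int compact_defer_shift; /* grows with consecutive failures */
};

/*
 * Number of pages reclaim aims to free before compaction is retried.
 * The base value of 2UL << order is an assumption of this model.
 */
unsigned long pages_for_compaction(const struct zone_model *zone, int order)
{
	unsigned long pages = 2UL << order;

	/* Corrected comparison: scale only requests at or above the failed order. */
	if (zone->compact_order_failed <= order)
		pages <<= zone->compact_defer_shift;

	return pages;
}
```

With compact_order_failed == 9 and compact_defer_shift == 2, an order-9
request is scaled by four in this model while an order-3 request is left
alone, so lower orders do not pay for higher-order failures.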

-- 
Mel Gorman
SUSE Labs


Re: [PATCH] mmc: sdio: Fix PM_SLEEP related build warnings

2012-08-09 Thread Rafael J. Wysocki
On Thursday, August 09, 2012, Thierry Reding wrote:
> Power management callbacks defined by SIMPLE_DEV_PM_OPS are only used if
> the PM_SLEEP Kconfig symbol has been defined. If not, the compiler will
> complain about them being unused.
> 
> Signed-off-by: Thierry Reding 

The pm_no_operation() thing is not necessary at all, AFAICT.  I'd just
remove it entirely. :-)

Thanks,
Rafael


> ---
>  drivers/mmc/core/sdio_bus.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c
> index 236842e..f8f90b4 100644
> --- a/drivers/mmc/core/sdio_bus.c
> +++ b/drivers/mmc/core/sdio_bus.c
> @@ -193,11 +193,12 @@ static int sdio_bus_remove(struct device *dev)
>  }
>  
>  #ifdef CONFIG_PM
> -
> +#ifdef CONFIG_PM_SLEEP
>  static int pm_no_operation(struct device *dev)
>  {
>   return 0;
>  }
> +#endif
>  
>  static const struct dev_pm_ops sdio_bus_pm_ops = {
>   SET_SYSTEM_SLEEP_PM_OPS(pm_no_operation, pm_no_operation)
> 



[patch v2 1/1] HID:hid-multitouch: Add ELAN production request when resume

2012-08-09 Thread Scott Liu
Some of ELAN's products need to be sent a set_idle command on resume.

Signed-off-by: Scott Liu 
---
 drivers/hid/hid-ids.h|3 +++
 drivers/hid/hid-multitouch.c |   20 
 2 files changed, 23 insertions(+)

diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 41c34f2..a4d810c 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -829,4 +829,7 @@
 #define USB_VENDOR_ID_PRIMAX   0x0461
 #define USB_DEVICE_ID_PRIMAX_KEYBOARD  0x4e05
 
+#define USB_VENDOR_ID_ELAN  0x04f3
+#define USB_DEVICE_ID_ELAN_MOCCA0x000a
+
 #endif
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c
index 59c8b5c..b06b7d3 100644
--- a/drivers/hid/hid-multitouch.c
+++ b/drivers/hid/hid-multitouch.c
@@ -767,6 +767,25 @@ static int mt_reset_resume(struct hid_device *hdev)
mt_set_input_mode(hdev);
return 0;
 }
+
+static int mt_resume(struct hid_device *hdev)
+{
+   if (hdev->vendor == USB_VENDOR_ID_ELAN &&
+   hdev->product == USB_DEVICE_ID_ELAN_MOCCA) {
+
+   struct usb_interface *intf = to_usb_interface(hdev->dev.parent);
+   struct usb_host_interface *interface = intf->cur_altsetting;
+   struct usb_device *dev = hid_to_usb_dev(hdev);
+
+   usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
+   HID_REQ_SET_IDLE, USB_TYPE_CLASS | USB_RECIP_INTERFACE,
+   0, interface->desc.bInterfaceNumber,
+   NULL, 0, USB_CTRL_SET_TIMEOUT);
+}
+
+return 0;
+}
+
 #endif
 
 static void mt_remove(struct hid_device *hdev)
@@ -1092,6 +,7 @@ static struct hid_driver mt_driver = {
.event = mt_event,
 #ifdef CONFIG_PM
.reset_resume = mt_reset_resume,
+   .resume = mt_resume,
 #endif
 };
 
-- 
1.7.9.5
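
The device match that guards the set_idle request in mt_resume() above is a
plain vendor/product comparison. A minimal userspace sketch of just that
check follows; the two IDs are copied from the hid-ids.h hunk, while
needs_set_idle_on_resume() is a hypothetical helper name and the actual USB
control transfer is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* IDs as added to hid-ids.h by the patch above. */
#define USB_VENDOR_ID_ELAN       0x04f3
#define USB_DEVICE_ID_ELAN_MOCCA 0x000a

/* Hypothetical helper: true only for the one device that needs the quirk. */
bool needs_set_idle_on_resume(uint32_t vendor, uint32_t product)
{
	return vendor == USB_VENDOR_ID_ELAN &&
	       product == USB_DEVICE_ID_ELAN_MOCCA;
}
```

Keeping the match in one helper like this would make it easy to extend the
quirk to further product IDs, should other devices turn out to need it.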



Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer

2012-08-09 Thread Borislav Petkov
On Thu, Aug 09, 2012 at 02:33:12PM +0530, Amit Shah wrote:
> > @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, 
> > struct pipe_buffer *buf,
> >  
> > len = min(buf->len, sd->len);
> > sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> > -   sgl->n++;
> > -   sgl->len += len;
> > +   } else {
> > +   /* Failback to copying a page */
> > +   struct page *page = alloc_page(GFP_KERNEL);
> 
> I prefer zeroing out the page.  If there's not enough data to be
> filled in the page, the remaining data can be leaked to the host.

get_zeroed_page()?
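
To show why zeroing matters for a partially filled page, here is a small
userspace analogue. It is only a sketch: calloc() stands in for a zeroed
page allocation, and MODEL_PAGE_SIZE and copy_to_zeroed_page() are names
invented for the example.

```c
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this model */

/*
 * Copy len bytes into a freshly zeroed "page". Everything past len stays
 * zero, so no stale heap contents can reach whoever receives the page.
 * Returns NULL if len does not fit or the allocation fails.
 */
unsigned char *copy_to_zeroed_page(const void *src, size_t len)
{
	unsigned char *page;

	if (len > MODEL_PAGE_SIZE)
		return NULL;

	page = calloc(1, MODEL_PAGE_SIZE); /* zero-filled, unlike malloc() */
	if (!page)
		return NULL;

	memcpy(page, src, len);
	return page;
}
```

The caller owns the returned buffer and releases it with free(). Had
malloc() been used instead, the bytes after len would be indeterminate,
which is the leak being pointed out in the review.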

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer

2012-08-09 Thread Masami Hiramatsu
(2012/08/09 18:03), Amit Shah wrote:
> On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
>> From: Masami Hiramatsu 
>>
>> Add a failback memcpy path for unstealable pipe buffer.
>> If buf->ops->steal() fails, virtio-serial tries to
>> copy the page contents to an allocated page, instead
>> of just failing splice().
>>
>> Signed-off-by: Masami Hiramatsu 
>> Cc: Amit Shah 
>> Cc: Arnd Bergmann 
>> Cc: Greg Kroah-Hartman 
>> ---
>>
>>  drivers/char/virtio_console.c |   28 +---
>>  1 files changed, 25 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>> index fe31b2f..911cb3e 100644
>> --- a/drivers/char/virtio_console.c
>> +++ b/drivers/char/virtio_console.c
>> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, 
>> struct pipe_buffer *buf,
>>  struct splice_desc *sd)
>>  {
>>  struct sg_list *sgl = sd->u.data;
>> -unsigned int len = 0;
>> +unsigned int offset, len;
>>  
>>  if (sgl->n == MAX_SPLICE_PAGES)
>>  return 0;
>> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, 
>> struct pipe_buffer *buf,
>>  
>>  len = min(buf->len, sd->len);
>>  sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
>> -sgl->n++;
>> -sgl->len += len;
>> +} else {
>> +/* Failback to copying a page */
>> +struct page *page = alloc_page(GFP_KERNEL);
> 
> I prefer zeroing out the page.  If there's not enough data to be
> filled in the page, the remaining data can be leaked to the host.

Yeah, it is really easy to fix that.
But out of curiosity, would that really be a problem?
I guess the host can access any guest page if needed. If that
is right, is it really insecure to leak a randomly allocated
unused page to the host?

Thank you,

-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com




Re: [PATCH] alarmtimer: implement minimum alarm interval for allowing suspend

2012-08-09 Thread Rafael J. Wysocki
On Thursday, August 09, 2012, Todd Poynor wrote:
> alarmtimer suspend return -EBUSY if the next alarm will fire in less
> than 2 seconds.  This allows one RTC seconds tick to occur subsequent
> to this check before the alarm wakeup time is set, ensuring the wakeup
> time is still in the future (assuming the RTC does not tick one more
> second prior to setting the alarm).
> 
> If suspend is rejected due to an imminent alarm, hold a wakeup source
> for 2 seconds to process the alarm prior to reattempting suspend.
> 
> If setting the alarm incurs an -ETIME for an alarm set in the past,
> or any other problem setting the alarm, abort suspend and hold a
> wakelock for 1 second while the alarm is allowed to be serviced or
> other hopefully transient conditions preventing the alarm clear up.
> 
> Signed-off-by: Todd Poynor 
> ---
>  kernel/time/alarmtimer.c |   18 +-
>  1 files changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
> index aa27d39..f979d85 100644
> --- a/kernel/time/alarmtimer.c
> +++ b/kernel/time/alarmtimer.c
> @@ -46,6 +46,8 @@ static struct alarm_base {
>  static ktime_t freezer_delta;
>  static DEFINE_SPINLOCK(freezer_delta_lock);
>  
> +static struct wakeup_source *ws;
> +
>  #ifdef CONFIG_RTC_CLASS
>  /* rtc timer and device for setting alarm wakeups at suspend */
>  static struct rtc_timer  rtctimer;
> @@ -250,6 +252,7 @@ static int alarmtimer_suspend(struct device *dev)
>   unsigned long flags;
>   struct rtc_device *rtc;
>   int i;
> + int ret;
>  
>   spin_lock_irqsave(&freezer_delta_lock, flags);
>   min = freezer_delta;
> @@ -279,8 +282,10 @@ static int alarmtimer_suspend(struct device *dev)
>   if (min.tv64 == 0)
>   return 0;
>  
> - /* XXX - Should we enforce a minimum sleep time? */
> - WARN_ON(min.tv64 < NSEC_PER_SEC);
> + if (ktime_to_ns(min) < 2 * NSEC_PER_SEC) {
> + __pm_wakeup_event(ws, 2 * MSEC_PER_SEC);
> + return -EBUSY;
> + }
>  
>   /* Setup an rtc timer to fire that far in the future */
>   rtc_timer_cancel(rtc, &rtctimer);
> @@ -288,9 +293,11 @@ static int alarmtimer_suspend(struct device *dev)
>   now = rtc_tm_to_ktime(tm);
>   now = ktime_add(now, min);
>  
> - rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
> -
> - return 0;
> + /* Set alarm, if in the past reject suspend briefly to handle */
> + ret = rtc_timer_start(rtc, &rtctimer, now, ktime_set(0, 0));
> + if (ret < 0)
> + __pm_wakeup_event(ws, 1 * MSEC_PER_SEC);

Why not just MSEC_PER_SEC?

> + return ret;
>  }
>  #else
>  static int alarmtimer_suspend(struct device *dev)
> @@ -821,6 +828,7 @@ static int __init alarmtimer_init(void)
>   error = PTR_ERR(pdev);
>   goto out_drv;
>   }
> + ws = wakeup_source_register("alarmtimer");
>   return 0;
>  
>  out_drv:
> 
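
The suspend decision described in the changelog above can be sketched as a
small userspace model. This is an illustration under assumptions:
alarm_suspend_check() is a hypothetical name, NSEC_PER_SEC is redefined
locally, and the wakeup source that the patch holds for two seconds is not
modelled.

```c
#include <errno.h>

#define NSEC_PER_SEC 1000000000LL /* local definition for this model */

/*
 * ns_until_alarm is the time until the next alarm fires; 0 means that no
 * alarm is pending. Returns 0 if suspend may proceed, or -EBUSY if the
 * alarm is closer than the two-second minimum.
 */
int alarm_suspend_check(long long ns_until_alarm)
{
	if (ns_until_alarm == 0)
		return 0;

	if (ns_until_alarm < 2 * NSEC_PER_SEC)
		return -EBUSY;

	return 0;
}
```

The two-second threshold leaves room for one RTC seconds tick between this
check and the moment the wakeup alarm is programmed, as the changelog
explains.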



Re: [PATCH] mmc: sdio: Fix PM_SLEEP related build warnings

2012-08-09 Thread Thierry Reding
On Thu, Aug 09, 2012 at 11:27:02AM +0200, Rafael J. Wysocki wrote:
> On Thursday, August 09, 2012, Thierry Reding wrote:
> > Power management callbacks defined by SIMPLE_DEV_PM_OPS are only used if
> > the PM_SLEEP Kconfig symbol has been defined. If not, the compiler will
> > complain about them being unused.
> > 
> > Signed-off-by: Thierry Reding 
> 
> The pm_no_operation() thing is not necessary at all, AFAICT.  I'd just
> remove it entirely. :-)

I don't actually have a setup where I can test suspend/resume with this
driver, but if you say so I can follow up with a patch that removes
pm_no_operation(). It should work because many drivers without proper
PM_SLEEP support do not specify .suspend() nor .resume() callbacks
either.
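
The argument above rests on a missing callback being harmless. A minimal
userspace model of that idea follows. It is a sketch only: the struct and
function names are invented, and treating a NULL callback as "nothing to
do, report success" is an assumption of the model, not a statement about
every path in the PM core.

```c
#include <stddef.h>

struct device_model { int id; }; /* hypothetical stand-in for struct device */

/* Hypothetical stand-in for the sleep-related part of dev_pm_ops. */
struct pm_ops_model {
	int (*suspend)(struct device_model *dev);
	int (*resume)(struct device_model *dev);
};

/* Run the suspend callback if there is one; otherwise report success. */
int model_suspend(const struct pm_ops_model *ops, struct device_model *dev)
{
	if (!ops || !ops->suspend)
		return 0;

	return ops->suspend(dev);
}

/* Example callback that refuses to suspend; used to exercise the model. */
int model_busy_suspend(struct device_model *dev)
{
	(void)dev;
	return -16; /* numeric value of -EBUSY on Linux */
}
```

Under that assumption, an ops table with no suspend entry behaves the same
as one whose entry points at a function that only returns 0.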

Thierry




Re: [PATCH 0/5][RFC] Fallocate Volatile Ranges v6

2012-08-09 Thread Michel Lespinasse
Hi John,

On Fri, Jul 27, 2012 at 8:57 PM, John Stultz  wrote:
> So after not getting too much positive feedback on my last
> attempt at trying to use a non-shrinker method for managing
> & purging volatile ranges, I decided I'd go ahead and try
> to implement something along Minchan's ERECLAIM LRU list
> idea.

Agree that there hasn't been much feedback from MM folks yet - sorry
about that :/

I think one issue might be that most people don't have a good
background on how the feature is intended to be used, and it is very
difficult to comment meaningfully without that.

As for myself, I have been wondering:

- Why the feature needs to be on a per-range basis, rather than
per-file. Is this simply to make it easier to transition the android
use case from whatever they are doing right now, or is it that the
object boundaries within a file can't be known in advance, and thus
one wouldn't know how to split objects across different files ? Or
could it be that some of the objects would be small (less than a page)
so space use would be inefficient if they were placed in different
files ? Or just that there would be too many files for efficient
management ?

- What are the desired semantics for the volatile objects. Can the
objects be accessed while they are marked as volatile, or do they have
to get unmarked first ? Is it really the case that we always want to
reclaim from volatile objects first, before any other kind of caches
we might have ? This sounds like a very strong hint, and I think I
would be more comfortable with something more subtle if that's
possible. Also, if we have several volatile objects to reclaim from,
is it desirable to reclaim from the one that's been marked volatile
the longest or does it make no difference ? When an object is marked
volatile, would it be sufficient to ensure it gets placed on the
inactive list (maybe with the referenced bit cleared) and let the
normal reclaim algorithm get to it, or is that an insufficiently
strong hint somehow ?

Basically, having some background information of how android would be
using the feature would help us better understand the design decision
here, I think.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH] Intel xhci: Only switch the switchable ports

2012-08-09 Thread Keng-Yu Lin
A previous patch that enabled EHCI/xHCI port switching switches all the
available ports.

That assumption is not correct, because the BIOS may expect some ports
not to be switchable by the OS.

There are two more registers that contain the information about the
switchable and non-switchable ports.

This patch adds checking code for the two registers so that only the
switchable ports are altered.

Signed-off-by: Keng-Yu Lin 
---
 drivers/usb/host/pci-quirks.c |   27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 833b3c6..89f62f2 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -75,7 +75,9 @@
 #defineNB_PIF0_PWRDOWN_1   0x01100013
 
 #define USB_INTEL_XUSB2PR  0xD0
+#define USB_INTEL_USB2PRM  0xD4
 #define USB_INTEL_USB3_PSSEN   0xD8
+#define USB_INTEL_USB3PRM  0xDC
 
 static struct amd_chipset_info {
struct pci_dev  *nb_dev;
@@ -772,10 +774,18 @@ void usb_enable_xhci_ports(struct pci_dev *xhci_pdev)
return;
}
 
-   ports_available = 0x;
+   /* Read USB3PRM, the USB 3.0 Port Routing Mask Register
+* Indicate the ports that can be changed from OS.
+*/
+   pci_read_config_dword(xhci_pdev, USB_INTEL_USB3PRM,
+   &ports_available);
+
+   dev_dbg(&xhci_pdev->dev, "Configurable ports to enable SuperSpeed: 0x%x\n",
+   ports_available);
+
/* Write USB3_PSSEN, the USB 3.0 Port SuperSpeed Enable
-* Register, to turn on SuperSpeed terminations for all
-* available ports.
+* Register, to turn on SuperSpeed terminations for the
+* switchable ports.
 */
pci_write_config_dword(xhci_pdev, USB_INTEL_USB3_PSSEN,
cpu_to_le32(ports_available));
@@ -785,7 +795,16 @@ void usb_enable_xhci_ports(struct pci_dev *xhci_pdev)
dev_dbg(&xhci_pdev->dev, "USB 3.0 ports that are now enabled "
"under xHCI: 0x%x\n", ports_available);
 
-   ports_available = 0x;
+   /* Read XUSB2PRM, xHCI USB 2.0 Port Routing Mask Register
+* Indicate the port to be controlled by the EHCI host.
+*/
+
+   pci_read_config_dword(xhci_pdev, USB_INTEL_USB2PRM,
+   &ports_available);
+
+   dev_dbg(&xhci_pdev->dev, "Configurable ports to hand over the EHCI host: "
+   "0x%x\n", ports_available);
+
/* Write XUSB2PR, the xHC USB 2.0 Port Routing Register, to
 * switch the USB 2.0 power and data lines over to the xHCI
 * host.
-- 
1.7.9.5
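
The effect of the two mask registers can be pictured as a bitwise AND
between the ports one would like to switch and the ports the BIOS allows
to be switched. The sketch below is a userspace model only:
switchable_ports() is a hypothetical helper, and the register values are
passed in as plain integers instead of being read from PCI config space.

```c
#include <stdint.h>

/*
 * wanted:       bitmap of ports the OS would like to switch
 * routing_mask: value of the routing mask register (USB3PRM or USB2PRM)
 * Returns the bitmap that is safe to write to the enable/routing register.
 */
uint32_t switchable_ports(uint32_t wanted, uint32_t routing_mask)
{
	return wanted & routing_mask;
}
```

The patch above writes the mask value itself to the register, which in this
model corresponds to wanted being all ones.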



[PATCH] mmc: sdio: Fix PM_SLEEP related build warnings

2012-08-09 Thread Thierry Reding
Power management callbacks defined by SIMPLE_DEV_PM_OPS are only used if
the PM_SLEEP Kconfig symbol has been defined. If not, the compiler will
complain about them being unused. However, since the callback for this
driver doesn't do anything it can just as well be dropped.

Signed-off-by: Thierry Reding 
---
 drivers/mmc/core/sdio_bus.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c
index 236842e..6bf6879 100644
--- a/drivers/mmc/core/sdio_bus.c
+++ b/drivers/mmc/core/sdio_bus.c
@@ -193,14 +193,7 @@ static int sdio_bus_remove(struct device *dev)
 }
 
 #ifdef CONFIG_PM
-
-static int pm_no_operation(struct device *dev)
-{
-   return 0;
-}
-
 static const struct dev_pm_ops sdio_bus_pm_ops = {
-   SET_SYSTEM_SLEEP_PM_OPS(pm_no_operation, pm_no_operation)
SET_RUNTIME_PM_OPS(
pm_generic_runtime_suspend,
pm_generic_runtime_resume,
-- 
1.7.11.4



[PATCH v2 0/2] Enhance DMABUF with reference counting for exporter module

2012-08-09 Thread Tomasz Stanislawski
Hello,
This patchset adds reference counting for an exporter module to DMABUF
framework.  Moreover, it adds setup of an owner field for exporters in DRM
subsystem.

v1: Original
v2:
  - split patch into DMABUF and DRM part
  - allow owner to be NULL

Regards,
Tomasz Stanislawski

Tomasz Stanislawski (2):
  dma-buf: add reference counting for exporter module
  drm: set owner field to for all DMABUF exporters

 Documentation/dma-buf-sharing.txt  |3 ++-
 drivers/base/dma-buf.c |9 -
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |1 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |1 +
 drivers/gpu/drm/nouveau/nouveau_prime.c|1 +
 drivers/gpu/drm/radeon/radeon_prime.c  |1 +
 drivers/staging/omapdrm/omap_gem_dmabuf.c  |1 +
 include/linux/dma-buf.h|2 ++
 8 files changed, 17 insertions(+), 2 deletions(-)

-- 
1.7.9.5



[PATCH v2 1/2] dma-buf: add reference counting for exporter module

2012-08-09 Thread Tomasz Stanislawski
This patch adds reference counting on the module that exports a dma-buf and
implements its operations. This prevents the module from being unloaded while
a DMABUF file is in use.

Signed-off-by: Tomasz Stanislawski 
Acked-by: Sumit Semwal 
Acked-by: Daniel Vetter 
CC: linux-...@vger.kernel.org
---
 Documentation/dma-buf-sharing.txt |3 ++-
 drivers/base/dma-buf.c|9 -
 include/linux/dma-buf.h   |2 ++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/dma-buf-sharing.txt 
b/Documentation/dma-buf-sharing.txt
index ad86fb8..2613057 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -49,7 +49,8 @@ The dma_buf buffer sharing API usage contains the following 
steps:
The buffer exporter announces its wish to export a buffer. In this, it
connects its own private buffer data, provides implementation for operations
that can be performed on the exported dma_buf, and flags for the file
-   associated with this buffer.
+   associated with this buffer. The operations structure has owner field.
+   You should initialize this to THIS_MODULE in most cases.
 
Interface:
   struct dma_buf *dma_buf_export(void *priv, struct dma_buf_ops *ops,
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index c30f3e1..a1d9cab 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static inline int is_dma_buf_file(struct file *);
 
@@ -40,6 +41,7 @@ static int dma_buf_release(struct inode *inode, struct file 
*file)
dmabuf = file->private_data;
 
dmabuf->ops->release(dmabuf);
+   module_put(dmabuf->ops->owner);
kfree(dmabuf);
return 0;
 }
@@ -105,9 +107,14 @@ struct dma_buf *dma_buf_export(void *priv, const struct 
dma_buf_ops *ops,
return ERR_PTR(-EINVAL);
}
 
+   if (!try_module_get(ops->owner))
+   return ERR_PTR(-ENOENT);
+
dmabuf = kzalloc(sizeof(struct dma_buf), GFP_KERNEL);
-   if (dmabuf == NULL)
+   if (dmabuf == NULL) {
+   module_put(ops->owner);
return ERR_PTR(-ENOMEM);
+   }
 
dmabuf->priv = priv;
dmabuf->ops = ops;
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index eb48f38..22953de 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -37,6 +37,7 @@ struct dma_buf_attachment;
 
 /**
  * struct dma_buf_ops - operations possible on struct dma_buf
+ * @owner: the module that implements dma_buf operations
  * @attach: [optional] allows different devices to 'attach' themselves to the
  * given buffer. It might return -EBUSY to signal that backing storage
  * is already allocated and incompatible with the requirements
@@ -70,6 +71,7 @@ struct dma_buf_attachment;
  * @vunmap: [optional] unmaps a vmap from the buffer
  */
 struct dma_buf_ops {
+   struct module *owner;
int (*attach)(struct dma_buf *, struct device *,
struct dma_buf_attachment *);
 
-- 
1.7.9.5
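
The get/put pairing in the patch above can be followed in a small userspace
model. This is a sketch under assumptions: every name below is invented for
the example, a plain counter stands in for the module reference count, and
a NULL owner is accepted to mirror the "allow owner to be NULL" note in the
cover letter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct module, dma_buf_ops and dma_buf. */
struct module_model { int refcount; bool unloading; };
struct ops_model { struct module_model *owner; };
struct buf_model { const struct ops_model *ops; };

/* Take a reference; fails once the module has started unloading. */
static bool model_module_get(struct module_model *m)
{
	if (!m)
		return true; /* no owner: nothing to pin */
	if (m->unloading)
		return false;
	m->refcount++;
	return true;
}

static void model_module_put(struct module_model *m)
{
	if (m)
		m->refcount--;
}

/* Export: pin the owner first, and unpin it again if allocation fails. */
struct buf_model *model_export(const struct ops_model *ops)
{
	struct buf_model *buf;

	if (!model_module_get(ops->owner))
		return NULL;

	buf = calloc(1, sizeof(*buf));
	if (!buf) {
		model_module_put(ops->owner);
		return NULL;
	}

	buf->ops = ops;
	return buf;
}

/* Release: drop the reference taken at export time, then free the buffer. */
void model_release(struct buf_model *buf)
{
	model_module_put(buf->ops->owner);
	free(buf);
}
```

Each successful export raises the count by one and each release lowers it
by one, so in this model the owner can only reach zero once every exported
buffer has been released.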



[PATCH v2 2/2] drm: set owner field to for all DMABUF exporters

2012-08-09 Thread Tomasz Stanislawski
This patch sets owner field in DMABUF operations for all DMABUF exporters in
DRM subsystem.  This prevents an exporting module from being unloaded while
exported DMABUF descriptor is in use.

Signed-off-by: Tomasz Stanislawski 
Acked-by: Sumit Semwal 
Acked-by: Daniel Vetter 
---
 drivers/gpu/drm/exynos/exynos_drm_dmabuf.c |1 +
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |1 +
 drivers/gpu/drm/nouveau/nouveau_prime.c|1 +
 drivers/gpu/drm/radeon/radeon_prime.c  |1 +
 drivers/staging/omapdrm/omap_gem_dmabuf.c  |1 +
 5 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c 
b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
index 613bf8a..cf3bc6d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_dmabuf.c
@@ -164,6 +164,7 @@ static void exynos_gem_dmabuf_kunmap(struct dma_buf 
*dma_buf,
 }
 
 static struct dma_buf_ops exynos_dmabuf_ops = {
+   .owner  = THIS_MODULE,
.map_dma_buf= exynos_gem_map_dma_buf,
.unmap_dma_buf  = exynos_gem_unmap_dma_buf,
.kmap   = exynos_gem_dmabuf_kmap,
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index aa308e1..07ff03b 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -152,6 +152,7 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, 
struct vm_area_struct *
 }
 
 static const struct dma_buf_ops i915_dmabuf_ops =  {
+   .owner = THIS_MODULE,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = i915_gem_dmabuf_release,
diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c 
b/drivers/gpu/drm/nouveau/nouveau_prime.c
index a25cf2c..8605033 100644
--- a/drivers/gpu/drm/nouveau/nouveau_prime.c
+++ b/drivers/gpu/drm/nouveau/nouveau_prime.c
@@ -127,6 +127,7 @@ static void nouveau_gem_prime_vunmap(struct dma_buf 
*dma_buf, void *vaddr)
 }
 
 static const struct dma_buf_ops nouveau_dmabuf_ops =  {
+   .owner = THIS_MODULE,
.map_dma_buf = nouveau_gem_map_dma_buf,
.unmap_dma_buf = nouveau_gem_unmap_dma_buf,
.release = nouveau_gem_dmabuf_release,
diff --git a/drivers/gpu/drm/radeon/radeon_prime.c 
b/drivers/gpu/drm/radeon/radeon_prime.c
index 6bef46a..4061fd3 100644
--- a/drivers/gpu/drm/radeon/radeon_prime.c
+++ b/drivers/gpu/drm/radeon/radeon_prime.c
@@ -127,6 +127,7 @@ static void radeon_gem_prime_vunmap(struct dma_buf 
*dma_buf, void *vaddr)
mutex_unlock(&dev->struct_mutex);
 }
 const static struct dma_buf_ops radeon_dmabuf_ops =  {
+   .owner = THIS_MODULE,
.map_dma_buf = radeon_gem_map_dma_buf,
.unmap_dma_buf = radeon_gem_unmap_dma_buf,
.release = radeon_gem_dmabuf_release,
diff --git a/drivers/staging/omapdrm/omap_gem_dmabuf.c 
b/drivers/staging/omapdrm/omap_gem_dmabuf.c
index 42728e0..6a4dd67 100644
--- a/drivers/staging/omapdrm/omap_gem_dmabuf.c
+++ b/drivers/staging/omapdrm/omap_gem_dmabuf.c
@@ -179,6 +179,7 @@ out_unlock:
 }
 
 struct dma_buf_ops omap_dmabuf_ops = {
+   .owner = THIS_MODULE,
.map_dma_buf = omap_gem_map_dma_buf,
.unmap_dma_buf = omap_gem_unmap_dma_buf,
.release = omap_gem_dmabuf_release,
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH v2 10/16] ACPIHP: system device hotplug driver skeleton

2012-08-09 Thread Jiang Liu
On 2012-8-9 16:41, Tang Chen wrote:
> Hi Liu ~
> 
> On 08/09/2012 03:40 PM, Jiang Liu wrote:
>> Hi Tang,
>>  Thanks for testing.
>>  Currently there's a limitation that you need to insert acpihp_enum 
>> driver first.
> 
> Sorry, I didn't make it clear. I did load acpihp_enum module first, and then 
> load acpihp_drv.
> 
> And I just tried it some more times. It just hung up, but dmesg had no output.
> Like this:
> 
> # modprobe acpihp_enum
> (OK, and sysfs interfaces have been created)
> # modprobe acpihp_drv
> (hang up)
> 
> # dmesg
> (nothing)
> 
> The "modprobe acpihp_drv" process's call trace shows that it hung at the 
> following function:
> #0  0x0032836aab80 in __nanosleep_nocancel () from /lib64/libc.so.6
> #1  0x0032836deb64 in usleep () from /lib64/libc.so.6
> ..
> 
> I have tried several times and I cannot reproduce the situation I just said.
You can probably reproduce it by loading acpihp_drv without the acpihp_enum
driver. The acpihp_drv module_init() should call acpihp_register_class() to
initialize the core.
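
To illustrate the ordering problem, here is a minimal userspace sketch, not the
acpihp code. Every demo_* name is hypothetical, and the structures are only
stand-ins for the driver-core ones. It assumes the class's private data stays
NULL until the core module has registered the class, which is what the oops in
mutex_lock() called from class_interface_register() suggests.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver-core structures; not the real types. */
struct demo_class_private {
	int nr_interfaces;
};

struct demo_class {
	struct demo_class_private *p;	/* stays NULL until the class is registered */
};

struct demo_class_interface {
	struct demo_class *cls;
};

/* Models the core module's init step: attach private data to the class. */
static int demo_class_register(struct demo_class *cls,
			       struct demo_class_private *priv)
{
	if (!cls || !priv)
		return -EINVAL;
	priv->nr_interfaces = 0;
	cls->p = priv;
	return 0;
}

/* Refuses to touch cls->p before registration instead of dereferencing NULL. */
static int demo_class_interface_register(struct demo_class_interface *intf)
{
	if (!intf || !intf->cls)
		return -ENODEV;
	if (!intf->cls->p)
		return -ENODEV;	/* class not registered yet */
	intf->cls->p->nr_interfaces++;
	return 0;
}
```

With this shape, registering an interface before the class fails with -ENODEV
rather than crashing, and succeeds once demo_class_register() has run.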

> Maybe my box has something different with yours. And I'll try to find out why.
> 
> Thanks for your advice. :)
> 
>> Will fix this issue in next version.
>>  Regards!
>>  Gerry
>>
>> On 2012-8-9 15:12, Tang Chen wrote:
>>> Hi Liu~
>>>
>>> I compiled this driver as a module, acpihp_drv. And when I loaded this 
>>> module, it
>>> gave the following error message:
>>>
>>> # modprobe acpihp_drv
>>> (the command hangs up, no return after 10 min)
>>>
>>> #dmesg
>>> ..
>>> [  126.643350] BUG: unable to handle kernel NULL pointer dereference at 
>>> 0078
>>> [  126.644007] IP: [] mutex_lock+0x19/0x37
>>> [  126.644007] PGD 105277a067 PUD 104f823067 PMD 0 
>>> [  126.644007] Oops: 0002 [#1] SMP 
>>> [  126.644007] Modules linked in: acpihp_drv(+) ebtable_nat ebtables 
>>> ipt_MASQUERADE iptable_nat nf_nat iptable_mangle bridge stp llc sunrpc 
>>> cpufreq_ondemand acpi_cpufreq freq_table mperf ip6t_REJECT 
>>> nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipv6 vhost_net 
>>> macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support coretemp kvm_intel 
>>> kvm crc32c_intel microcode lpc_ich mfd_core pcspkr i2c_i801 i2c_core 
>>> ioatdma e1000e acpi_memhotplug i7core_edac edac_core igb dca mptsas 
>>> mptscsih mptbase scsi_transport_sas
>>> [  126.644007] CPU 10 
>>> [  126.644007] Pid: 2821, comm: modprobe Tainted: G   A 3.6.0-rc1+ 
>>> #6 FUJITSU-SV PRIMEQUEST 1800E/SB
>>> [  126.644007] RIP: 0010:[]  [] 
>>> mutex_lock+0x19/0x37
>>> [  126.644007] RSP: 0018:8810589a9de8  EFLAGS: 00010246
>>> [  126.644007] RAX:  RBX: 0078 RCX: 
>>> 
>>> [  126.644007] RDX:  RSI:  RDI: 
>>> 0078
>>> [  126.644007] RBP: 8810589a9e08 R08:  R09: 
>>> 8810589a9d88
>>> [  126.644007] R10: 13e5 R11: 13e5 R12: 
>>> a01460d0
>>> [  126.644007] R13:  R14: a014732b R15: 
>>> 00bf
>>> [  126.644007] FS:  7fecb1802700() GS:88105e64() 
>>> knlGS:
>>> [  126.644007] CS:  0010 DS:  ES:  CR0: 80050033
>>> [  126.644007] CR2: 0078 CR3: 001052772000 CR4: 
>>> 07e0
>>> [  126.644007] DR0:  DR1:  DR2: 
>>> 
>>> [  126.644007] DR3:  DR6: 0ff0 DR7: 
>>> 0400
>>> [  126.644007] Process modprobe (pid: 2821, threadinfo 8810589a8000, 
>>> task 8810592f8000)
>>> [  126.644007] Stack:
>>> [  126.644007]  8810589a9e08 810be37f a0146220 
>>> 81a7b390
>>> [  126.644007]  8810589a9e58 81317eb3 8810589a9e48 
>>> 
>>> [  126.644007]  81a342c0 81a342e0  
>>> a0146220
>>> [  126.644007] Call Trace:
>>> [  126.644007]  [] ? tracepoint_module_notify+0xd9/0x14a
>>> [  126.644007]  [] class_interface_register+0x4a/0xbc
>>> [  126.644007]  [] ? 0xa00b7fff
>>> [  126.644007]  [] acpihp_drv_init+0x10/0x12 [acpihp_drv]
>>> [  126.644007]  [] do_one_initcall+0x7f/0x139
>>> [  126.644007]  [] sys_init_module+0x12d3/0x14e3
>>> [  126.644007]  [] ? ddebug_dyndbg_boot_param_cb+0x45/0x45
>>> [  126.644007]  [] system_call_fastpath+0x16/0x1b
>>> [  126.644007] Code: 48 8b 04 25 80 c6 00 00 48 89 43 18 31 c0 5b 5b c9 c3 
>>> 55 48 89 e5 53 48 83 ec 18 66 66 66 66 90 48 89 fb e8 5a 0c 00 00 48 89 df 
>>>  ff 0f 79 05 e8 06 ff ff ff 65 48 8b 04 25 80 c6 00 00 48 89 
>>> [  126.644007] RIP  [] mutex_lock+0x19/0x37
>>> [  126.644007]  RSP 
>>> [  126.644007] CR2: 0078
>>> [  129.981335] ---[ end trace da17e9c9de8dd560 ]---
>>> [  139.085895] nr_pdflush_threads exported in /proc is scheduled for removal
>>> [  139.167394] sysctl: The scan_unevictable_pages sysctl/node-interface has 
>>> been disabled for lack of a legitimate use case. 

Re: [PATCH 1/3] dma-fence: dma-buf synchronization (v7)

2012-08-09 Thread Maarten Lankhorst
Hey Sumit,

Op 08-08-12 08:35, Sumit Semwal schreef:
> Hi Maarten,
>
> On 8 August 2012 00:17, Maarten Lankhorst
>  wrote:
>> Op 07-08-12 19:53, Maarten Lankhorst schreef:
>>> A dma-fence can be attached to a buffer which is being filled or consumed
>>> by hw, to allow userspace to pass the buffer without waiting to another
>>> device.  For example, userspace can call page_flip ioctl to display the
>>> next frame of graphics after kicking the GPU but while the GPU is still
>>> rendering.  The display device sharing the buffer with the GPU would
>>> attach a callback to get notified when the GPU's rendering-complete IRQ
>>> fires, to update the scan-out address of the display, without having to
>>> wake up userspace.
> Thanks for this patchset; Could you please also fill up
> Documentation/dma-buf-sharing.txt, to include the relevant bits?
>
> We've tried to make sure the Documentation corresponding is kept
> up-to-date as the framework has grown, and new features are added to
> it - and I think features as important as dma-fence and dmabufmgr do
> warrant a healthy update.

OK, I'll clean it up and add the documentation. One other question: if code
that requires dmabuf needs to select CONFIG_DMA_SHARED_BUFFER anyway,
why does dma-buf.h have fallbacks for !CONFIG_DMA_SHARED_BUFFER?
This seems odd. Would you have any objection if I removed them?
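
For readers unfamiliar with the pattern in question, a fallback of this kind
usually looks roughly like the sketch below. This is an assumption-laden
illustration, not a copy of dma-buf.h: DEMO_SHARED_BUFFER stands in for
CONFIG_DMA_SHARED_BUFFER, demo_buf_attach() is a made-up function, and the
-ENODEV return value is a guess at a typical stub result.

```c
#include <errno.h>

/* DEMO_SHARED_BUFFER is a hypothetical stand-in for CONFIG_DMA_SHARED_BUFFER. */
#ifdef DEMO_SHARED_BUFFER
int demo_buf_attach(int fd);	/* real implementation would live elsewhere */
#else
/* Fallback stub: callers still compile and link, but every call fails. */
static inline int demo_buf_attach(int fd)
{
	(void)fd;
	return -ENODEV;
}
#endif
```

If every user of the header already selects the option, the #else branch is
never compiled in, which is the redundancy the question is about.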

~Maarten



Re: [PATCH] usb: otg: twl4030-usb: spin_unlock_irq in interrupt handler

2012-08-09 Thread Felipe Balbi
Hi,

On Sat, Jul 21, 2012 at 11:40:18AM +0400, Denis Efremov wrote:
> Replace the spin_lock_irq/spin_unlock_irq pair in the
> twl4030_usb_linkstat function with a
> spin_lock_irqsave/spin_unlock_irqrestore pair.
> twl4030_usb_linkstat is called from the twl4030_usb_irq interrupt
> handler, so re-enabling interrupts there should be avoided.
> 
> Found by Linux Driver Verification project (linuxtesting.org).
> 
> Signed-off-by: Denis Efremov 
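
The concern in the quoted changelog can be shown with a toy model. This is a
minimal userspace sketch, not kernel code: the demo_* helpers and the
demo_irqs_enabled flag are hypothetical stand-ins for the spinlock primitives
and the CPU interrupt flag, and it assumes the caller is entered with
interrupts already disabled.

```c
#include <stdbool.h>

/* Toy model of the local CPU interrupt-enable flag; not kernel code. */
static bool demo_irqs_enabled = true;

static void demo_lock_irq(void)
{
	demo_irqs_enabled = false;
}

static void demo_unlock_irq(void)
{
	demo_irqs_enabled = true;	/* unconditionally re-enables */
}

static void demo_lock_irqsave(bool *flags)
{
	*flags = demo_irqs_enabled;	/* remember the caller's state */
	demo_irqs_enabled = false;
}

static void demo_unlock_irqrestore(bool flags)
{
	demo_irqs_enabled = flags;	/* put back whatever the caller had */
}

/* Enter with interrupts off, use the plain pair, report the final state. */
static bool demo_irqs_on_after_plain_pair(void)
{
	demo_irqs_enabled = false;
	demo_lock_irq();
	demo_unlock_irq();
	return demo_irqs_enabled;
}

/* Enter with interrupts off, use the save/restore pair, report the final state. */
static bool demo_irqs_on_after_save_pair(void)
{
	bool flags;

	demo_irqs_enabled = false;
	demo_lock_irqsave(&flags);
	demo_unlock_irqrestore(flags);
	return demo_irqs_enabled;
}
```

In this model the plain pair returns with the flag set even though the caller
had it cleared, while the save/restore pair leaves the caller's state alone.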

I have pushed a patch which I think solves this issue. Can you test it?

commit 6b03b13336ee5d8da7bda8799c9ed990e3daedcc
Author: Felipe Balbi 
Date:   Thu Jun 14 13:24:42 2012 +0300

usb: otg: twl: add missing IRQF_ONESHOT

this patch fixes the following warning:

[2.825378] genirq: Threaded irq requested \
with handler=NULL and !ONESHOT for irq 363

Signed-off-by: Felipe Balbi 

diff --git a/drivers/usb/otg/twl4030-usb.c b/drivers/usb/otg/twl4030-usb.c
index c4a86da..0297930 100644
--- a/drivers/usb/otg/twl4030-usb.c
+++ b/drivers/usb/otg/twl4030-usb.c
@@ -651,8 +651,8 @@ static int __devinit twl4030_usb_probe(struct platform_device *pdev)
 */
twl->irq_enabled = true;
status = request_threaded_irq(twl->irq, NULL, twl4030_usb_irq,
-   IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING,
-   "twl4030_usb", twl);
+   IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING |
+   IRQF_ONESHOT, "twl4030_usb", twl);
if (status < 0) {
dev_dbg(&pdev->dev, "can't get IRQ %d, err %d\n",
twl->irq, status);

-- 
balbi




[PATCH] block: Potential null pointer dereference in pkt_generic_packet()

2012-08-09 Thread Marina Makienko
blk_get_request() can return NULL, and pkt_generic_packet() uses the result
without checking it. Add the missing check.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Marina Makienko 
---
 drivers/block/pktcdvd.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index ba66e44..db3192e 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -743,6 +743,11 @@ static int pkt_generic_packet(struct pktcdvd_device *pd, struct packet_command *
rq = blk_get_request(q, (cgc->data_direction == CGC_DATA_WRITE) ?
 WRITE : READ, __GFP_WAIT);
 
+   if (!rq) {
+   printk(DRIVER_NAME": pkt_generic_packet: blk_get_request failed.\n");
+   return -ENOMEM;
+   }
+
if (cgc->buflen) {
if (blk_rq_map_kern(q, rq, cgc->buffer, cgc->buflen, __GFP_WAIT))
goto out;
-- 
1.7.7



Re: [PATCH 1/5] [RFC] Add volatile range management code

2012-08-09 Thread Michel Lespinasse
On Fri, Jul 27, 2012 at 8:57 PM, John Stultz  wrote:
> v5:
> * Drop intervaltree for prio_tree usage per Michel &
>   Dmitry's suggestions.

Actually, I believe the ranges you need to track are non-overlapping, correct ?

If that is the case, a simple rbtree, sorted by start-of-range
address, would work best.
(I am trying to remove prio_tree users... :)

> +   /* First, find any existing intervals that overlap */
> +   prio_tree_iter_init(&iter, root, start, end);

Note that prio tree iterations take closed intervals [start; last], not
half-open ones [start; end[. So if you want to stick with prio trees, you
would have to use end-1 here.

> +   /* Coalesce left-adjacent ranges */
> +   prio_tree_iter_init(&iter, root, start-1, start);

Same here; you probably want to use start-1 on both ends

> +   node = prio_tree_next(&iter);
> +   while (node) {

I'm confused, I don't think you ever expect more than one range to
match, do you ???

> +   /* Coalesce right-adjacent ranges */
> +   prio_tree_iter_init(&iter, root, end, end+1);

Same again, here you probably want end on both ends
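
The off-by-one above can be sketched outside the kernel. This is a minimal
illustration, assuming stored ranges and queries are both closed intervals as
described for prio trees. The demo_* helpers are hypothetical and are not the
prio_tree API.

```c
#include <stdbool.h>

/* Closed-interval overlap: [a_first, a_last] and [b_first, b_last] share a point. */
static bool demo_overlaps_closed(unsigned long a_first, unsigned long a_last,
				 unsigned long b_first, unsigned long b_last)
{
	return a_first <= b_last && b_first <= a_last;
}

/* Query with a half-open range [start, end) by converting it to [start, end - 1]. */
static bool demo_overlaps_half_open(unsigned long first, unsigned long last,
				    unsigned long start, unsigned long end)
{
	if (end <= start)
		return false;	/* empty query range */
	return demo_overlaps_closed(first, last, start, end - 1);
}
```

For a stored range [100, 199], passing the half-open query [0, 100) straight
through as the closed interval [0, 100] reports a false overlap, whereas
converting it to [0, 99] first does not. A left-adjacent lookup for a new range
starting at 200 is the single-point query [199, 199], i.e. start-1 on both
ends.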

This is far from a complete code review; I just wanted to point
out a couple of details that jumped out at me first. I am afraid I am missing
some of the background on how the feature is meant to be used, so I can't
really dig into the rest of the changes at this point :/

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: Using vanilla kernels headers to do backport testing -- issues with memcpy()

2012-08-09 Thread Sedat Dilek
On Thu, Aug 9, 2012 at 11:01 AM, Luis R. Rodriguez  wrote:
> On Thu, Aug 9, 2012 at 12:27 AM, Sedat Dilek  wrote:
>> I am not an English native but this sentence sounds a bit strange to
>> me ("used" followed by "require" and double "Linux"?).
>> I can imagine what you want to express but have no better wording right now.
>>
>> 180 +  echo -e "   This was added since kernels >= 3.4 used require"
>> 181 +  echo -e "   a glibc >= 2.14 for memcpy(), and not all Linux"
>> 182 +  echo -e "   Linux distributions have such glibc.
>
> Fixed.
>

Thanks for the quick fix, sounds much better now.

- Sedat -

>  Luis


Re: [PATCH] mmc: sdio: Fix PM_SLEEP related build warnings

2012-08-09 Thread Rafael J. Wysocki
On Thursday, August 09, 2012, Thierry Reding wrote:
> On Thu, Aug 09, 2012 at 11:27:02AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, August 09, 2012, Thierry Reding wrote:
> > > Power management callbacks defined by SIMPLE_DEV_PM_OPS are only used if
> > > the PM_SLEEP Kconfig symbol has been defined. If not, the compiler will
> > > complain about them being unused.
> > > 
> > > Signed-off-by: Thierry Reding 
> > 
> > The pm_no_operation() thing is not necessary at all, AFAICT.  I'd just
> > remove it entirely. :-)
> 
> I don't actually have a setup where I can test suspend/resume with this
> driver, but if you say so I can follow up with a patch that removes
> pm_no_operation(). It should work because many drivers without proper
> PM_SLEEP support do not specify .suspend() nor .resume() callbacks
> either.

Sure, it should work.

Thanks,
Rafael


RE: [PATCHv2 1/4] modem_shm: Add Modem Access Framework

2012-08-09 Thread Arun MURTHY
> On Wed, Aug 08, 2012 at 05:36:05AM +0200, Arun MURTHY wrote:
> > > On Tue, Aug 07, 2012 at 12:24:28PM +0530, Arun Murthy wrote:
> > > > Adds Modem Access Framework, which allows for registering platform
> > > specific
> > > > modem access mechanisms. The framework also exposes APIs for
> > > > client
> > > drivers
> > > > for getting and releasing access to modem, regardless of the
> > > underlying
> > > > platform specific access mechanism.
> > >
> > > The term "modems" here has a lot of legacy connotations.  First of
> > > which is, userspace handles this today as tty devices, why aren't
> > > you doing the same here?  Why does this have to be something
> > > "special"?
> > >
> >
> > The main focus there is the modem IPC.
> 
> Over where?
> 
> What "modem IPC"?
> 

I mean the directory added in this patch: "modem_shm" is added to
hold a separate list of such modem access drivers.

> You need to really explain what you are doing here, as I have no idea.
> And I have no idea why you still can't just use a tty device, why can't you?
> 


It's not a question of whether tty can or can't be used; the point is just to
keep the modem-related code in a separate directory.

> > In doing so, there are some functionality like waking the modem, or
> > releasing the modem etc. These will be used by the modem IPC drivers
> > and also few others like sim driver and security drivers.
> 
> What's a "sim driver", and what type of "security drivers" are you referring
> to?

SIM(Subscriber Identity Module) used in mobiles.
TEE(security) driver

> 
> > These are shared calls and therefore have to be synchronized, so a
> > small framework is being added to monitor only the modem access
> > related operations.
> 
> Again, why can't the tty layer do this for you?
> 

Maybe tty can do this, but we want to keep all the modem-related code separate.
Many such hardware devices and drivers are coming in the near future.

Thanks and Regards,
Arun R Murthy


Re: [PATCH] mmc: sdio: Fix PM_SLEEP related build warnings

2012-08-09 Thread Rafael J. Wysocki
On Thursday, August 09, 2012, Thierry Reding wrote:
> Power management callbacks defined by SIMPLE_DEV_PM_OPS are only used if
> the PM_SLEEP Kconfig symbol has been defined. If not, the compiler will
> complain about them being unused. However, since the callback for this
> driver doesn't do anything it can just as well be dropped.
> 
> Signed-off-by: Thierry Reding 

Acked-by: Rafael J. Wysocki 

> ---
>  drivers/mmc/core/sdio_bus.c | 7 ---
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c
> index 236842e..6bf6879 100644
> --- a/drivers/mmc/core/sdio_bus.c
> +++ b/drivers/mmc/core/sdio_bus.c
> @@ -193,14 +193,7 @@ static int sdio_bus_remove(struct device *dev)
>  }
>  
>  #ifdef CONFIG_PM
> -
> -static int pm_no_operation(struct device *dev)
> -{
> - return 0;
> -}
> -
>  static const struct dev_pm_ops sdio_bus_pm_ops = {
> - SET_SYSTEM_SLEEP_PM_OPS(pm_no_operation, pm_no_operation)
>   SET_RUNTIME_PM_OPS(
>   pm_generic_runtime_suspend,
>   pm_generic_runtime_resume,
> 



Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer

2012-08-09 Thread Amit Shah
On (Thu) 09 Aug 2012 [18:24:58], Masami Hiramatsu wrote:
> (2012/08/09 18:03), Amit Shah wrote:
> > On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
> >> From: Masami Hiramatsu 
> >>
> >> Add a failback memcpy path for unstealable pipe buffer.
> >> If buf->ops->steal() fails, virtio-serial tries to
> >> copy the page contents to an allocated page, instead
> >> of just failing splice().
> >>
> >> Signed-off-by: Masami Hiramatsu 
> >> Cc: Amit Shah 
> >> Cc: Arnd Bergmann 
> >> Cc: Greg Kroah-Hartman 
> >> ---
> >>
> >>  drivers/char/virtio_console.c |   28 +---
> >>  1 files changed, 25 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
> >> index fe31b2f..911cb3e 100644
> >> --- a/drivers/char/virtio_console.c
> >> +++ b/drivers/char/virtio_console.c
> >> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >>struct splice_desc *sd)
> >>  {
> >>struct sg_list *sgl = sd->u.data;
> >> -  unsigned int len = 0;
> >> +  unsigned int offset, len;
> >>  
> >>if (sgl->n == MAX_SPLICE_PAGES)
> >>return 0;
> >> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
> >>  
> >>len = min(buf->len, sd->len);
> >>sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
> >> -  sgl->n++;
> >> -  sgl->len += len;
> >> +  } else {
> >> +  /* Failback to copying a page */
> >> +  struct page *page = alloc_page(GFP_KERNEL);
> > 
> > I prefer zeroing out the page.  If there's not enough data to be
> > filled in the page, the remaining data can be leaked to the host.
> 
> Yeah, it is really easy to fix that.
> But out of curiosity, would that be really a problem?
> I guess that host can access any guest page if need. If that
> is right, is that really insecure to leak randomly allocated
> unused page to the host?

I'm not sure if there is a way to really attack, but just something I
had thought about: the host kernel can access any guest page, that's
not something we can prevent.

However, if qemu is restricted from accessing guest pages, and the
guest shares this page with qemu for r/w purposes via the virtio
channel, a qemu exploit can expose guest data to host userspace.

I agree this is completely theoretical; can someone else with more
insight confirm or deny my apprehensions?

Amit
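
To make the zeroing suggestion concrete, here is a minimal userspace sketch,
not the virtio_console code. demo_copy_to_zeroed_page() and DEMO_PAGE_SIZE are
hypothetical, and calloc() only plays the role that a zeroing page allocation
(for example one using __GFP_ZERO) would play in the kernel.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size for the sketch */

/*
 * Copy len bytes into a freshly allocated, zero-filled buffer.
 * Returns NULL on allocation failure or if len exceeds the page size.
 * The caller frees the result with free().
 */
static unsigned char *demo_copy_to_zeroed_page(const void *src, size_t len)
{
	unsigned char *page;

	if (len > DEMO_PAGE_SIZE)
		return NULL;
	page = calloc(1, DEMO_PAGE_SIZE);
	if (!page)
		return NULL;
	memcpy(page, src, len);
	return page;
}
```

Because the buffer starts out zeroed, the bytes past len never carry stale
allocator contents, whichever side ends up reading the whole page.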


Re: [RFC PATCH 2/6] virtio/console: Add a failback for unstealable pipe buffer

2012-08-09 Thread Avi Kivity
On 08/09/2012 12:55 PM, Amit Shah wrote:
> On (Thu) 09 Aug 2012 [18:24:58], Masami Hiramatsu wrote:
>> (2012/08/09 18:03), Amit Shah wrote:
>> > On (Tue) 24 Jul 2012 [11:37:18], Yoshihiro YUNOMAE wrote:
>> >> From: Masami Hiramatsu 
>> >>
>> >> Add a failback memcpy path for unstealable pipe buffer.
>> >> If buf->ops->steal() fails, virtio-serial tries to
>> >> copy the page contents to an allocated page, instead
>> >> of just failing splice().
>> >>
>> >> Signed-off-by: Masami Hiramatsu 
>> >> Cc: Amit Shah 
>> >> Cc: Arnd Bergmann 
>> >> Cc: Greg Kroah-Hartman 
>> >> ---
>> >>
>> >>  drivers/char/virtio_console.c |   28 +---
>> >>  1 files changed, 25 insertions(+), 3 deletions(-)
>> >>
>> >> diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
>> >> index fe31b2f..911cb3e 100644
>> >> --- a/drivers/char/virtio_console.c
>> >> +++ b/drivers/char/virtio_console.c
>> >> @@ -794,7 +794,7 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>> >>   struct splice_desc *sd)
>> >>  {
>> >>   struct sg_list *sgl = sd->u.data;
>> >> - unsigned int len = 0;
>> >> + unsigned int offset, len;
>> >>  
>> >>   if (sgl->n == MAX_SPLICE_PAGES)
>> >>   return 0;
>> >> @@ -807,9 +807,31 @@ static int pipe_to_sg(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
>> >>  
>> >>   len = min(buf->len, sd->len);
>> >>   sg_set_page(&(sgl->sg[sgl->n]), buf->page, len, buf->offset);
>> >> - sgl->n++;
>> >> - sgl->len += len;
>> >> + } else {
>> >> + /* Failback to copying a page */
>> >> + struct page *page = alloc_page(GFP_KERNEL);
>> > 
>> > I prefer zeroing out the page.  If there's not enough data to be
>> > filled in the page, the remaining data can be leaked to the host.
>> 
>> Yeah, it is really easy to fix that.
>> But out of curiosity, would that be really a problem?
>> I guess that host can access any guest page if need. If that
>> is right, is that really insecure to leak randomly allocated
>> unused page to the host?
> 
> I'm not sure if there is a way to really attack, but just something I
> had thought about: the host kernel can access any guest page, that's
> not something we can prevent.
> 
> However, if qemu is restricted from accessing guest pages, and the
> guest shares this page with qemu for r/w purposes via the virtio
> channel, a qemu exploit can expose guest data to host userspace.
> 
> I agree this is completely theoretical; can someone else with more
> insight confirm or deny my apprehensions?

qemu can read and write any guest page (for the guest it controls).


-- 
error compiling committee.c: too many arguments to function


ext4fs error "ext4_mb_generate_buddy:741:group 16, 8160 clusters in bitmap, 4064 in gd" (with repro)

2012-08-09 Thread Paolo Bonzini
Here is how to reproduce it.  It happens during fstrim.  I found other
occurrences of the error in the mailing list, but they were not related
to trim so they may be something different.

modprobe scsi_debug dev_size_mb=256 lbpws=1
dd if=/dev/zero of=/dev/sdb bs=1M  
fdisk /dev/sdb
 >> create a new partition accepting all defaults
fdisk -lu /dev/sdb|tail -1
 >> should show: /dev/sdb1 57  524285  262114+  83  Linux

mkfs.ext4 /dev/sdb1
mkdir test
mount /dev/sdb1 test
fstrim ./test

Here is the output in dmesg:

[140934.644166] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
[140941.562060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 16, 8160 clusters in bitmap, 4064 in gd
[140941.603066] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 25, 8192 clusters in bitmap, 7934 in gd
[140941.613060] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 27, 8192 clusters in bitmap, 7934 in gd
[140941.634074] EXT4-fs error (device sdb1): ext4_mb_generate_buddy:741: group 31, 8192 clusters in bitmap, 8159 in gd

Hope this helps,

Paolo


Re: [PATCH v2 1/3] mfd: add lp8788 mfd driver

2012-08-09 Thread Mark Brown
On Thu, Aug 09, 2012 at 08:22:38AM +, Kim, Milo wrote:
> TI LP8788 PMU supports regulators, battery charger, RTC,
> ADC, backlight driver and current sinks.

> Patch v2.
> (a) For interrupt handling, use generic irq rather than irq-domain

This seems like a very substantial step backwards, why make this change?
Using irqdomain solves a bunch of problems, especially around virq
allocation, and is where we want all drivers to go longer term.

