[PATCH v5 7/25] compiler{,-gcc4}.h: Introduce __flatten function attribute

2012-09-25 Thread Daniel Santos
For gcc 4.1 & later, expands to __attribute__((flatten)) which forces
the compiler to inline everything it can into the function.  This is
useful in combination with noinline when you want to control the depth
of inlining, or create a single function where inline expansions will
occur. (see
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bflatten_007d-function-attribute-2512)

Normally, it's best to leave this type of thing up to the compiler.
However, the generic rbtree code uses inline functions just to be able
to inject compile-time constant data that specifies how the caller wants
the function to behave (via struct rb_relationship).  This data can be
thought of as the template parameters of a C++ templatized function.
Since some of these functions, once expanded, become quite large, gcc
sometimes decides not to perform some important inlining; in one case
it even generated a few bytes more code by not doing so. (Note: I have not
eliminated the possibility that this was an optimization bug, but the
flatten attribute fixes it in either case.)

Combining __flatten and noinline ensures that important optimizations
occur in these cases and that the inline expansion occurs in exactly one
place, thus not leading to unnecessary bloat. However, it can also
eliminate some opportunities for optimization should gcc otherwise
decide the function itself is a good candidate for inlining.
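
As a rough illustration of the pattern described above (not part of this
patch), the sketch below pairs a flatten attribute with noinline so that
generic inline helpers are expanded into exactly one out-of-line function
per "instantiation". Every demo_* name is hypothetical, and struct demo_rel
is only a toy stand-in for something like struct rb_relationship.

```c
/* Hypothetical stand-ins for the kernel macros. The version test mirrors
 * the patch (4.1.2 or later, except 4.6.0); everything else is assumed. */
#if defined(__GNUC__)
# define DEMO_GCC_VERSION (__GNUC__ * 10000 \
			   + __GNUC_MINOR__ * 100 \
			   + __GNUC_PATCHLEVEL__)
# define demo_noinline __attribute__((noinline))
# if DEMO_GCC_VERSION >= 40102 && DEMO_GCC_VERSION != 40600
#  define demo_flatten __attribute__((flatten))
# endif
#endif
#ifndef demo_noinline
# define demo_noinline
#endif
#ifndef demo_flatten
# define demo_flatten
#endif

/* "Template parameters": compile-time constant data selecting behavior. */
struct demo_rel {
	int descending;		/* nonzero: reverse the comparison */
};

/* Generic helpers, written inline so the constants can propagate. */
static inline int demo_compare(const struct demo_rel *rel, int a, int b)
{
	int c = (a > b) - (a < b);

	return rel->descending ? -c : c;
}

static inline int demo_find_max(const struct demo_rel *rel,
				const int *v, int n)
{
	int best = v[0];
	int i;

	for (i = 1; i < n; i++)
		if (demo_compare(rel, v[i], best) > 0)
			best = v[i];
	return best;
}

static const struct demo_rel demo_ascending = { 0 };
static const struct demo_rel demo_descending = { 1 };

/* One out-of-line copy per relationship: flatten asks the compiler to pull
 * the helpers in here, noinline keeps this copy out of its callers. */
static demo_flatten demo_noinline int demo_max_ascending(const int *v, int n)
{
	return demo_find_max(&demo_ascending, v, n);
}

static demo_flatten demo_noinline int demo_max_descending(const int *v, int n)
{
	return demo_find_max(&demo_descending, v, n);
}
```

Calling demo_max_ascending() on { 3, 9, 4 } yields 9, while
demo_max_descending() yields 3, since the reversed comparison makes the
smallest element the "maximum". Whether the helpers really get expanded
still depends on the compiler, so checking the generated code is advisable.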

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |    7 ++++++-
 include/linux/compiler.h      |    4 ++++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 5755e23..38fb81d 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -15,7 +15,12 @@
 
 #if GCC_VERSION >= 40102
 # define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
+
+/* flatten introduced in 4.1, but broken in 4.6.0 (gcc bug #48731)*/
+# if GCC_VERSION != 40600
+#  define __flatten __attribute__((flatten))
+# endif
+#endif /* GCC_VERSION >= 40102 */
 
 #if GCC_VERSION >= 40300
 /* Mark functions as cold. gcc will assume any path leading to a call
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 4d9f353..b26d606 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -244,6 +244,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 #define __always_inline inline
 #endif
 
+#ifndef __flatten
+#define __flatten
+#endif
+
 #endif /* __KERNEL__ */
 
 /*
-- 
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v5 5/25] compiler{,-gcc4}.h: Remove duplicate macros

2012-09-25 Thread Daniel Santos
__linktime_error() does the same thing as __compiletime_error() and is
only used in bug.h.  Since the macro defines a function attribute that
will cause a failure at compile-time (not link-time), it makes more
sense to keep __compiletime_error(), which is also neatly mated with
__compiletime_warning().
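
To make the compile-time (rather than link-time) behaviour concrete, here
is a small sketch, not taken from this patch. The demo_* names and the
DEMO_TRIGGER_ERROR switch are hypothetical, and the version guard simply
assumes the error attribute is usable from gcc 4.3 on.

```c
/* Hypothetical stand-in for __compiletime_error(); expands to nothing on
 * compilers where the attribute is not assumed to be available. */
#if defined(__GNUC__) && !defined(__clang__) && \
	(__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
# define demo_compiletime_error(message) __attribute__((error(message)))
#else
# define demo_compiletime_error(message)
#endif

/* Declared but never defined: any call that survives to code generation
 * is reported by gcc while compiling, before the linker is involved. */
extern void demo_bad_call(void)
	demo_compiletime_error("demo_bad_call() must never be reached");

static int demo_checked_increment(int x)
{
#ifdef DEMO_TRIGGER_ERROR
	/* Building with -DDEMO_TRIGGER_ERROR makes gcc reject this call. */
	demo_bad_call();
#endif
	return x + 1;
}
```

Without DEMO_TRIGGER_ERROR the function just returns x + 1. With it, gcc
stops at the call with the given message; a compiler that falls back to
the empty macro would only fail later, at link time, on the undefined
symbol.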

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |2 --
 include/linux/compiler.h  |3 ---
 2 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 7ad60cd..5755e23 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -33,8 +33,6 @@
the kernel context */
 #define __cold __attribute__((__cold__))
 
-#define __linktime_error(message) __attribute__((__error__(message)))
-
 #ifndef __CHECKER__
 # define __compiletime_warning(message) __attribute__((warning(message)))
 # define __compiletime_error(message) __attribute__((error(message)))
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 923d093..4d9f353 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -293,9 +293,6 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
 #ifndef __compiletime_error
 # define __compiletime_error(message)
 #endif
-#ifndef __linktime_error
-# define __linktime_error(message)
-#endif
 /*
  * Prevent the compiler from merging or refetching accesses.  The compiler
  * is also forbidden from reordering successive instances of ACCESS_ONCE(),
-- 
1.7.3.4



[PATCH v5 3/25] compiler-gcc.h: Add gcc-recommended GCC_VERSION macro

2012-09-25 Thread Daniel Santos
Throughout compiler*.h, many version checks are made.  These can be
simplified by using the macro that gcc's documentation recommends.
However, my primary reason for adding this is that I need bug-check
macros that are enabled at certain gcc versions and it's cleaner to use
this macro than the traditional method:

if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 2)

If you add patch level, it gets this ugly:

if __GNUC__ > 4 || (__GNUC__ == 4 && (__GNUC_MINOR__ > 2 || \
   __GNUC_MINOR__ == 2 && __GNUC_PATCHLEVEL__ >= 1))

As opposed to:

if GCC_VERSION >= 40201

While having separate headers for gcc 3 & 4 eliminates some of this
verbosity, they can still be cleaned up by this.
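
A quick sketch of why the single-number encoding is equivalent to the
long-hand test (illustration only; the demo_* helpers are made up, and the
encoding assumes minor and patch level each stay below 100):

```c
/* Same formula as the macro added by this patch, under a demo name. */
#if defined(__GNUC__)
# define DEMO_GCC_VERSION (__GNUC__ * 10000 \
			   + __GNUC_MINOR__ * 100 \
			   + __GNUC_PATCHLEVEL__)
#endif

/* Encode an arbitrary major.minor.patch triple the same way. */
static int demo_encode_version(int major, int minor, int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/* The long-hand "at least want_major.want_minor.want_patch" test. */
static int demo_at_least_longhand(int major, int minor, int patch,
				  int want_major, int want_minor,
				  int want_patch)
{
	return major > want_major ||
	       (major == want_major &&
		(minor > want_minor ||
		 (minor == want_minor && patch >= want_patch)));
}
```

For example, 4.2.1 encodes to 40201, so a 4.6.0 compiler (40600) passes
"GCC_VERSION >= 40201" and a 4.2.0 compiler (40200) does not, which matches
what demo_at_least_longhand() returns for the same inputs.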

See also:
http://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc.h |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 6a6d7ae..24545cd 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -5,6 +5,9 @@
 /*
  * Common definitions for all gcc versions go here.
  */
+#define GCC_VERSION (__GNUC__ * 10000 \
+  + __GNUC_MINOR__ * 100 \
+  + __GNUC_PATCHLEVEL__)
 
 
 /* Optimization barrier */
-- 
1.7.3.4



[PATCH v5 2/25] compiler-gcc4.h: Reorder macros based upon gcc ver

2012-09-25 Thread Daniel Santos
This helps to keep the file from getting confusing, removes one
duplicate version check and should encourage future editors to put new
macros where they belong.

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 10ce4fa..a334107 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -13,6 +13,10 @@
 #define __must_check   __attribute__((warn_unused_result))
 #define __compiler_offsetof(a,b) __builtin_offsetof(a,b)
 
+#if __GNUC_MINOR__ > 0
+# define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
+#endif
+
 #if __GNUC_MINOR__ >= 3
 /* Mark functions as cold. gcc will assume any path leading to a call
to them will be unlikely.  This means a lot of manual unlikely()s
@@ -31,6 +35,12 @@
 
 #define __linktime_error(message) __attribute__((__error__(message)))
 
+#ifndef __CHECKER__
+# define __compiletime_warning(message) __attribute__((warning(message)))
+# define __compiletime_error(message) __attribute__((error(message)))
+#endif /* __CHECKER__ */
+#endif /* __GNUC_MINOR__ >= 3 */
+
 #if __GNUC_MINOR__ >= 5
 /*
  * Mark a position in code as unreachable.  This can be used to
@@ -46,13 +56,5 @@
 /* Mark a function definition as prohibited from being cloned. */
 #define __noclone  __attribute__((__noclone__))
 
-#endif
-#endif
+#endif /* __GNUC_MINOR__ >= 5 */
 
-#if __GNUC_MINOR__ > 0
-#define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
-#endif
-#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
-#define __compiletime_warning(message) __attribute__((warning(message)))
-#define __compiletime_error(message) __attribute__((error(message)))
-#endif
-- 
1.7.3.4



[PATCH v5 1/25] compiler-gcc4.h: Correct version check for __compiletime_error

2012-09-25 Thread Daniel Santos
__attribute__((error(msg))) was introduced in gcc 4.3 (not 4.4) and as I
was unable to find any gcc bugs pertaining to it, I'm presuming that it
has functioned as advertised since 4.3.0.

Signed-off-by: Daniel Santos 
---
 include/linux/compiler-gcc4.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 2f40791..10ce4fa 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -52,7 +52,7 @@
 #if __GNUC_MINOR__ > 0
 #define __compiletime_object_size(obj) __builtin_object_size(obj, 0)
 #endif
-#if __GNUC_MINOR__ >= 4 && !defined(__CHECKER__)
+#if __GNUC_MINOR__ >= 3 && !defined(__CHECKER__)
 #define __compiletime_warning(message) __attribute__((warning(message)))
 #define __compiletime_error(message) __attribute__((error(message)))
 #endif
-- 
1.7.3.4



Re: parisc: orphaned asm/compat_signal.h file?

2012-09-25 Thread John David Anglin

On 24-Sep-12, at 8:39 AM, Denys Vlasenko wrote:


Maybe it needs to be removed?


Worked for me with 3.5.4.

Dave
--
John David Anglin   dave.ang...@bell.net





Re: mmotm 2012-09-20-17-25 uploaded (fs/binfmt_elf on uml)

2012-09-25 Thread David Rientjes
On Wed, 26 Sep 2012, Stephen Rothwell wrote:

> > This still happens on x86_64 for linux-next as of today's tree.
> 
> Are you sure?  next-20120925?
> 
> $ grep -n vmalloc fs/binfmt_elf.c
> 30:#include <linux/vmalloc.h>
> 1421: data = vmalloc(size);
> 

Ok, it looks like it's fixed by 1bb6a4c9514e in today's linux-next tree; 
that wasn't present when I pulled it at 2am PDT, so it must be a time zone 
difference.  Thanks.


[RFC/PATCH] zsmalloc added back to zcache2 (Was: [RFC] mm: add support for zsmalloc and zcache)

2012-09-25 Thread Dan Magenheimer
Attached patch applies to staging-next and adds zsmalloc
support, optionally at compile-time and run-time, back into
zcache (aka zcache2).  It is only lightly tested and does
not provide some of the debug info from old zcache (aka zcache1)
because it needs to be converted from sysfs to debugfs.
I'll leave that as an exercise for someone else as I'm
not sure if any of those debug fields are critical to
anyone's needs and some of the datatypes are not supported
by debugfs.

Apologies if there are line breaks... I can't send this from
a linux mailer right now.  If it is broken, let me know,
and I will re-post tomorrow.

Signed-off-by: Dan Magenheimer 

diff --git a/drivers/staging/ramster/Kconfig b/drivers/staging/ramster/Kconfig
index 843c541..28403cc 100644
--- a/drivers/staging/ramster/Kconfig
+++ b/drivers/staging/ramster/Kconfig
@@ -15,6 +15,17 @@ config ZCACHE2
  again in the future.  Until then, zcache2 is a single-node
  version of ramster.
 
+config ZCACHE_ZSMALLOC
+   bool "Allow use of zsmalloc allocator for compression of swap pages"
+   depends on ZSMALLOC=y
+   default n
+   help
+ Zsmalloc is a much more efficient allocator for compressed
+ pages but currently has some design deficiencies in that it
+ does not support reclaim nor compaction.  Select this if
+ you are certain your workload will fit or has mostly short
+ running processes.
+
 config RAMSTER
bool "Cross-machine RAM capacity sharing, aka peer-to-peer tmem"
depends on CONFIGFS_FS=y && SYSFS=y && !HIGHMEM && ZCACHE2=y
diff --git a/drivers/staging/ramster/zcache-main.c b/drivers/staging/ramster/zcache-main.c
index a09dd5c..9a4d780 100644
--- a/drivers/staging/ramster/zcache-main.c
+++ b/drivers/staging/ramster/zcache-main.c
@@ -26,6 +26,12 @@
 #include 
 #include 
 #include "tmem.h"
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+#include "../zsmalloc/zsmalloc.h"
+static int zsmalloc_enabled;
+#else
+#define zsmalloc_enabled 0
+#endif
 #include "zcache.h"
 #include "zbud.h"
 #include "ramster.h"
@@ -182,6 +188,35 @@ static unsigned long zcache_last_inactive_anon_pageframes;
 static unsigned long zcache_eph_nonactive_puts_ignored;
 static unsigned long zcache_pers_nonactive_puts_ignored;
 
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+#define ZS_CHUNK_SHIFT 6
+#define ZS_CHUNK_SIZE  (1 << ZS_CHUNK_SHIFT)
+#define ZS_CHUNK_MASK  (~(ZS_CHUNK_SIZE-1))
+#define ZS_NCHUNKS (((PAGE_SIZE - sizeof(struct tmem_handle)) & \
+   ZS_CHUNK_MASK) >> ZS_CHUNK_SHIFT)
+#define ZS_MAX_CHUNK   (ZS_NCHUNKS-1)
+
+/* total number of persistent pages may not exceed this percentage */
+static unsigned int zv_page_count_policy_percent = 75;
+/*
+ * byte count defining poor compression; pages with greater zsize will be
+ * rejected
+ */
+static unsigned int zv_max_zsize = (PAGE_SIZE / 8) * 7;
+/*
+ * byte count defining poor *mean* compression; pages with greater zsize
+ * will be rejected until sufficient better-compressed pages are accepted
+ * driving the mean below this threshold
+ */
+static unsigned int zv_max_mean_zsize = (PAGE_SIZE / 8) * 5;
+
+static atomic_t zv_curr_dist_counts[ZS_NCHUNKS];
+static atomic_t zv_cumul_dist_counts[ZS_NCHUNKS];
+static atomic_t zcache_curr_pers_pampd_count = ATOMIC_INIT(0);
+static unsigned long zcache_curr_pers_pampd_count_max;
+
+#endif
+
 #ifdef CONFIG_DEBUG_FS
 #include 
 #define zdfs debugfs_create_size_t
@@ -370,6 +405,13 @@ int zcache_new_client(uint16_t cli_id)
if (cli->allocated)
goto out;
cli->allocated = 1;
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+   if (zsmalloc_enabled) {
+   cli->zspool = zs_create_pool("zcache", ZCACHE_GFP_MASK);
+   if (cli->zspool == NULL)
+   goto out;
+   }
+#endif
ret = 0;
 out:
return ret;
@@ -632,6 +674,105 @@ out:
return pampd;
 }
 
+#ifdef CONFIG_ZCACHE_ZSMALLOC
+struct zv_hdr {
+   uint32_t pool_id;
+   struct tmem_oid oid;
+   uint32_t index;
+   size_t size;
+};
+
+static unsigned long zv_create(struct zcache_client *cli, uint32_t pool_id,
+   struct tmem_oid *oid, uint32_t index,
+   struct page *page)
+{
+   struct zv_hdr *zv;
+   int chunks;
+   unsigned long curr_pers_pampd_count, total_zsize, zv_mean_zsize;
+   unsigned long handle = 0;
+   void *cdata;
+   unsigned clen;
+
+   curr_pers_pampd_count = atomic_read(&zcache_curr_pers_pampd_count);
+   if (curr_pers_pampd_count >
+   (zv_page_count_policy_percent * totalram_pages) / 100)
+   goto out;
+   zcache_compress(page, &cdata, &clen);
+   /* reject if compression is too poor */
+   if (clen > zv_max_zsize) {
+   zcache_compress_poor++;
+   goto out;
+   }
+   /* reject if mean compression is too poor */
+   if ((clen > zv_max_mean_zsize) && (curr_pers_pampd_count > 0)) {

Re: [PATCH Resend] extcon: Fix return value in extcon_register_interest()

2012-09-25 Thread Chanwoo Choi
On 09/25/2012 08:01 PM, Sachin Kamat wrote:

> Propagate the value returned from extcon_find_cable_index()
> instead of -ENODEV. For readability, -EINVAL is returned in place of
> the variable.
> 
> Signed-off-by: Sachin Kamat 
> ---

Applied, thank you.

You can check it in a few hours in the Extcon git repository below:
http://git.infradead.org/users/kmpark/linux-samsung/shortlog/refs/heads/extcon-for-next

Thanks,
Chanwoo Choi


Re: [PATCH -next v2] Shorten constant names for EFI variable attributes

2012-09-25 Thread Matthew Garrett
On Tue, Sep 25, 2012 at 05:06:32PM -0600, Khalid Aziz wrote:

> This is for code cleanup and does not impact functionality. This is
> based upon an earlier discussion -
> . The goal is to make the code more
> readable. V1 of this patch was discussed at
> .

Right. Keeping the spec names makes it difficult to write code in a 
readable way.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH 1/1] x86: Added support for Acer Aspire 5755G fan control.

2012-09-25 Thread Tero Keski-Valkama
2012/9/26 Borislav Petkov :
> Adding Peter.
>
> On Tue, Sep 25, 2012 at 12:41:11PM +0300, Tero Keski-Valkama wrote:
>> 2012/9/25 Borislav Petkov :
>> > On Tue, Sep 25, 2012 at 10:34:13AM +0300, Tero Keski-Valkama wrote:
>> >
>> > But before we go with this any further: you mentioned some issues still
>> > with acerhdf - you don't want to turn off your fan but to turn it to
>> > full?
>> >
>> > I think in this case, you want to simply not load the driver and use the
>> > BIOS settings, no?
>> >
>> > --
>> > Regards/Gruss,
>> > Boris.
>>
>> Technically I think my original issue is about thermal zones, and BIOS
>> control doesn't solve it as is. However, the patch should be valid,
>> even if it doesn't solve my original issue.
>>
>> The thing is that passive cooling thermal zones throttle all the cores
>> to stone age, to 800 MHz, before the fan even really starts powering
>> up from the low default level.
>
> I don't understand: you want the thermal zones to not throttle your cpu
> to 800 Mhz and/or the fan to start sooner?
>
>> So, my original problem would be solved
>> by:
>> a) Direct control of the fan, to be able to put it on full for example
>> through userspace control,
>
> That's always a bad idea, especially if userspace dies on you.
>
>> b) Changing the active cooling thermal zones, so that the fan speeds
>> up earlier, or
>
> Ok, I see, here's what you want.

Yes. I disabled acpi_cpufreq and the cpufreq daemon, so throttling to the
iron age didn't happen this time, but the fan didn't increase its cooling
either. While testing different values in the EC registers I did at one
point get automatic fan control by the BIOS, but that is not the default
state. The default state is still constant speed, as it was at the start,
and I can't get it to change anymore.

>
> Also, you've added yourself to the copyright - this means that you're
> pretty much going to get all future email about acerhdf. Do you really
> want that?

For now, that is what I want for the foreseeable future, as long as I
own such a laptop. Perhaps I can contribute to the development a bit.

>
> --
> Regards/Gruss,
> Boris.



-- 
Kind Regards / Ystävällisin terveisin,

Tero Keski-Valkama, MSc(Tech)

+358 (0)46 876 0485

tero.keski-valk...@neter.fi
http://www.neter.fi


Re: [PATCH -next v2] Shorten constant names for EFI variable attributes

2012-09-25 Thread Khalid Aziz
On Wed, 2012-09-26 at 07:55 +1000, Stephen Rothwell wrote:
> Hi,
> 
> On Tue, 25 Sep 2012 09:41:00 -0600 Khalid Aziz  wrote:
> >
> > Replace very long constants for EFI variable attributes with
> > shorter and more convenient names. Also create an alias for
> > the current longer names so as to not break compatibility
> > with current API since these constants are used by
> > userspace programs.
> 
> Why do this?  It just looks like churn for no real gain.
> 

This is for code cleanup and does not impact functionality. This is
based upon an earlier discussion -
. The goal is to make the code more
readable. V1 of this patch was discussed at
.

-- 
Khalid Aziz 



Re: 3.6-rc7 boot crash + bisection

2012-09-25 Thread Florian Dazinger
Am Tue, 25 Sep 2012 13:43:46 -0600
schrieb Alex Williamson :

> On Tue, 2012-09-25 at 20:54 +0200, Florian Dazinger wrote:
> > Am Tue, 25 Sep 2012 12:32:50 -0600
> > schrieb Alex Williamson :
> > 
> > > On Mon, 2012-09-24 at 21:03 +0200, Florian Dazinger wrote:
> > > > Hi,
> > > > I think I've found a regression, which causes an early boot crash, I
> > > > appended the kernel output via jpg file, since I do not have a serial
> > > > console or sth.
> > > > 
> > > > after bisection, it boils down to this commit:
> > > > 
> > > > 9dcd61303af862c279df86aa97fde7ce371be774 is the first bad commit
> > > > commit 9dcd61303af862c279df86aa97fde7ce371be774
> > > > Author: Alex Williamson 
> > > > Date:   Wed May 30 14:19:07 2012 -0600
> > > > 
> > > > amd_iommu: Support IOMMU groups
> > > > 
> > > > Add IOMMU group support to AMD-Vi device init and uninit code.
> > > > Existing notifiers make sure this gets called for each device.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > > Signed-off-by: Joerg Roedel 
> > > > 
> > > > :040000 040000 2f6b1b8e104d6dfec0abaa9646750f9b5a4f4060
> > > > 837ae95e84f6d3553457c4df595a9caa56843c03 M  drivers
> > > 
> > > [switching back to mailing list thread]
> > > 
> > > I asked Florian for dmesg w/ amd_iommu_dump, here's the relevant lines:
> > > 
> > > [1.485645] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
> > > [1.485683] AMD-Vi:mmio-addr: feb2
> > > [1.485901] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
> > > [1.485935] AMD-Vi:   DEV_RANGE_END   devid: 00:00.2
> > > [1.485969] AMD-Vi:   DEV_SELECT  devid: 00:02.0 flags: 00
> > > [1.486002] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
> > > [1.486036] AMD-Vi:   DEV_RANGE_END   devid: 01:00.1
> > > [1.486070] AMD-Vi:   DEV_SELECT  devid: 00:04.0 flags: 00
> > > [1.486103] AMD-Vi:   DEV_SELECT  devid: 02:00.0 flags: 00
> > > [1.486137] AMD-Vi:   DEV_SELECT  devid: 00:05.0 flags: 00
> > > [1.486170] AMD-Vi:   DEV_SELECT  devid: 03:00.0 flags: 00
> > > [1.486204] AMD-Vi:   DEV_SELECT  devid: 00:06.0 flags: 00
> > > [1.486238] AMD-Vi:   DEV_SELECT  devid: 04:00.0 flags: 00
> > > [1.486271] AMD-Vi:   DEV_SELECT  devid: 00:07.0 flags: 00
> > > [1.486305] AMD-Vi:   DEV_SELECT  devid: 05:00.0 flags: 00
> > > [1.486338] AMD-Vi:   DEV_SELECT  devid: 00:09.0 flags: 00
> > > [1.486372] AMD-Vi:   DEV_SELECT  devid: 06:00.0 flags: 00
> > > [1.486406] AMD-Vi:   DEV_SELECT  devid: 00:0b.0 flags: 00
> > > [1.486439] AMD-Vi:   DEV_SELECT  devid: 07:00.0 flags: 00
> > > [1.486473] AMD-Vi:   DEV_ALIAS_RANGE devid: 08:01.0 flags: 00 devid_to: 08:00.0
> > > [1.486510] AMD-Vi:   DEV_RANGE_END   devid: 08:1f.7
> > > [1.486548] AMD-Vi:   DEV_SELECT  devid: 00:11.0 flags: 00
> > > [1.486581] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:12.0 flags: 00
> > > [1.486620] AMD-Vi:   DEV_RANGE_END   devid: 00:12.2
> > > [1.486654] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:13.0 flags: 00
> > > [1.486688] AMD-Vi:   DEV_RANGE_END   devid: 00:13.2
> > > [1.486721] AMD-Vi:   DEV_SELECT  devid: 00:14.0 flags: d7
> > > [1.486755] AMD-Vi:   DEV_SELECT  devid: 00:14.3 flags: 00
> > > [1.486788] AMD-Vi:   DEV_SELECT  devid: 00:14.4 flags: 00
> > > [1.486822] AMD-Vi:   DEV_ALIAS_RANGE devid: 09:00.0 flags: 00 devid_to: 00:14.4
> > > [1.486859] AMD-Vi:   DEV_RANGE_END   devid: 09:1f.7
> > > [1.486897] AMD-Vi:   DEV_SELECT  devid: 00:14.5 flags: 00
> > > [1.486931] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:16.0 flags: 00
> > > [1.486965] AMD-Vi:   DEV_RANGE_END   devid: 00:16.2
> > > [1.487055] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
> > > 
> > > 
> > > > lspci:
> > > > 00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > > 00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
> > > > 00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
> > > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D)
> > > > 00:05.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI 

Re: [PATCH 0/4] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Mikulas Patocka


On Tue, 25 Sep 2012, Jeff Moyer wrote:

> Jeff Moyer  writes:
> 
> > Mikulas Patocka  writes:
> >
> >> Hi Jeff
> >>
> >> Thanks for testing.
> >>
> >> It would be interesting ... what happens if you take the patch 3, leave 
> >> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct 
> >> block_device", but remove any use of the semaphore from fs/block_dev.c? - 
> >> will the performance be like unpatched kernel or like patch 3? It could be 
> >> that the change in the alignment affects performance on your CPU too, just 
> >> differently than on my CPU.
> >
> > It turns out to be exactly the same performance as with the 3rd patch
> > applied, so I guess it does have something to do with cache alignment.
> > Here is the patch (against vanilla) I ended up testing.  Let me know if
> > I've botched it somehow.
> >
> > So, next up I'll play similar tricks to what you did (padding struct
> > block_device in all kernels) to eliminate the differences due to
> > structure alignment and provide a clear picture of what the locking
> > effects are.
> 
> After trying again with the same padding you used in the struct
> bdev_inode, I see no performance differences between any of the
> patches.  I tried bumping up the number of threads to saturate the
> number of cpus on a single NUMA node on my hardware, but that resulted
> in lower IOPS to the device, and hence consumption of less CPU time.
> So, I believe my results to be inconclusive.

For me, the fourth patch with RCU-based locks performed better, so I am 
submitting that.

> After talking with Vivek about the problem, he had mentioned that it
> might be worth investigating whether bd_block_size could be protected
> using SRCU.  I looked into it, and the one thing I couldn't reconcile is
> updating both the bd_block_size and the inode->i_blkbits at the same
> time.  It would involve (afaiui) adding fields to both the inode and the
> block_device data structures and using rcu_assign_pointer  and
> rcu_dereference to modify and access the fields, and both fields would
> need to be protected by the same struct srcu_struct.  I'm not sure whether
> that's a desirable approach.  When I started to implement it, it got
> ugly pretty quickly.  What do others think?

Using RCU doesn't seem sensible to me (except for lock implementation, as 
it is in patch 4). The major problem is that the block layer reads 
blocksize multiple times and when different values are read, a crash may 
happen - RCU doesn't protect you against that - if you read a variable 
multiple times in a RCU-protected section, you can still get different 
results.

If we wanted to use RCU, we would have to read blocksize just once and 
pass the value between all functions involved - that would result in a 
massive code change.
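
To illustrate what "read blocksize just once and pass the value" would
mean, here is a small userspace sketch. It is not code from any of the
patches; struct demo_bdev, struct demo_geometry and the helpers are
hypothetical, and the volatile field merely models a value that another
thread may change between two reads.

```c
/* Hypothetical stand-in for a device whose block size can change. */
struct demo_bdev {
	volatile unsigned int block_size;	/* power of two, in bytes */
};

/* Values derived from a single read of block_size. */
struct demo_geometry {
	unsigned int block_size;
	unsigned int blkbits;
};

/* Read the shared field exactly once; derive everything from that copy. */
static struct demo_geometry demo_snapshot(const struct demo_bdev *bdev)
{
	struct demo_geometry g;
	unsigned int size = bdev->block_size;	/* the only shared read */

	g.block_size = size;
	g.blkbits = 0;
	while ((1u << g.blkbits) < size)
		g.blkbits++;
	return g;
}

/* Callers get the snapshot as a parameter and never re-read the device,
 * so block_size and blkbits cannot disagree within one operation. */
static unsigned long demo_blocks_for(const struct demo_geometry *g,
				     unsigned long bytes)
{
	return (bytes + g->block_size - 1) >> g->blkbits;
}
```

Every function on the I/O path would have to take such a snapshot as an
argument, which is the massive code change referred to above.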

> For now, my preference is to get the full patch set in.  I will continue
> to investigate the performance impact of the data structure size changes
> that I've been seeing.

Yes, we should get the patches to the kernel.

Mikulas

> So, for the four patches:
> 
> Acked-by: Jeff Moyer 
> 
> Jens, can you have a look at the patch set?  We are seeing problem
> reports of this in the wild[1][2].
> 
> Cheers,
> Jeff
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=824107
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=812129
> 


[tip:x86/smap] x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space

2012-09-25 Thread tip-bot for H. Peter Anvin
Commit-ID:  e139e95590dfebab81841bf7a3ac46500f51a47c
Gitweb: http://git.kernel.org/tip/e139e95590dfebab81841bf7a3ac46500f51a47c
Author: H. Peter Anvin 
AuthorDate: Tue, 25 Sep 2012 15:42:18 -0700
Committer:  H. Peter Anvin 
CommitDate: Tue, 25 Sep 2012 15:42:18 -0700

x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space

With SMAP, the [f][x]rstor_checking() functions are no longer usable
for user-space pointers by applying a simple __force cast.  Instead,
create new [f][x]rstor_user() functions which do the proper SMAP
magic.

Signed-off-by: H. Peter Anvin 
Cc: Suresh Siddha 
Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.sid...@intel.com
---
 arch/x86/include/asm/fpu-internal.h |   17 +++++++++++++++++
 arch/x86/kernel/xsave.c             |    6 +++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 409b9cc..831dbb9 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -181,11 +181,28 @@ static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
  "m" (*fx));
 }
 
+static inline int fxrstor_user(struct i387_fxsave_struct __user *fx)
+{
+   if (config_enabled(CONFIG_X86_32))
+   return user_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+   else if (config_enabled(CONFIG_AS_FXSAVEQ))
+   return user_insn(fxrstorq %[fx], "=m" (*fx), [fx] "m" (*fx));
+
+   /* See comment in fpu_fxsave() below. */
+   return user_insn(rex64/fxrstor (%[fx]), "=m" (*fx), [fx] "R" (fx),
+ "m" (*fx));
+}
+
 static inline int frstor_checking(struct i387_fsave_struct *fx)
 {
return check_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
 }
 
+static inline int frstor_user(struct i387_fsave_struct __user *fx)
+{
+   return user_insn(frstor %[fx], "=m" (*fx), [fx] "m" (*fx));
+}
+
 static inline void fpu_fxsave(struct fpu *fpu)
 {
if (config_enabled(CONFIG_X86_32))
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 4e89b3d..ada87a3 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -315,7 +315,7 @@ static inline int restore_user_xstate(void __user *buf, u64 xbv, int fx_only)
if ((unsigned long)buf % 64 || fx_only) {
u64 init_bv = pcntxt_mask & ~XSTATE_FPSSE;
xrstor_state(init_xstate_buf, init_bv);
-   return fxrstor_checking((__force void *) buf);
+   return fxrstor_user(buf);
} else {
u64 init_bv = pcntxt_mask & ~xbv;
if (unlikely(init_bv))
@@ -323,9 +323,9 @@ static inline int restore_user_xstate(void __user *buf, u64 xbv, int fx_only)
return xrestore_user(buf, xbv);
}
} else if (use_fxsr()) {
-   return fxrstor_checking((__force void *) buf);
+   return fxrstor_user(buf);
} else
-   return frstor_checking((__force void *) buf);
+   return frstor_user(buf);
 }
 
 int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: next-20120925: BUG at drivers/scsi/scsi_lib.c:640!

2012-09-25 Thread Andrew Morton
(cc's added)

On Tue, 25 Sep 2012 22:06:37 +0400
Dmitry Monakhov  wrote:

> 
> Seems like barriers are broken again
> 
>  kernel BUG at drivers/scsi/scsi_lib.c:1180!
>  invalid opcode:  [#1] SMP 
>  Modules linked in: coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel 
> microcode sg xhci_hcd button ext3 jbd mbcache sd_mod crc_t10dif\
> elper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic 
> dm_mirror dm_region_hash dm_log dm_mod
>  CPU 0 
>  Pid: 753, comm: fsck.ext3 Not tainted 3.6.0-rc7-next-20120925+ #4
>   /DQ67SW
>  RIP: 0010:[]  [] 
> scsi_setup_fs_cmnd+0xec/0x180
>  RSP: 0018:880233aff9f8  EFLAGS: 00010002
>  RAX: 0003 RBX: 88022a741000 RCX: 0002
>  RDX:  RSI: 0001 RDI: 81f32b48
>  RBP: 880233affa18 R08: 0001 R09: 
>  R10: 88022a26c800 R11:  R12: 880229369968
>  R13: 0001 R14: 88022a741000 R15: 
>  FS:  7f1348632760() GS:88023e20() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 003a3dc0e550 CR3: 0002338cf000 CR4: 000407f0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: 0ff0 DR7: 0400
>  Process fsck.ext3 (pid: 753, threadinfo 880233afe000, task 
> 880233f48240)
>  Stack:
>   880233affa48 880229369968 0001 880229bdb550
>   880233affaa8 a00a8860 880233affab8 0082
>    8107d696 8802 817410d8
>  Call Trace:
>   [] sd_prep_fn+0x140/0xfe0 [sd_mod]
>   [] ? lock_timer_base+0x76/0xf0
>   [] ? _raw_spin_unlock_irq+0x48/0x80
>   [] blk_peek_request+0x23c/0x450
>   [] scsi_request_fn+0x70/0x820
>   [] __blk_run_queue+0x55/0x70
>   [] cfq_rq_enqueued+0x155/0x1c0
>   [] cfq_insert_request+0x2b6/0x2f0
>   [] ? cfq_insert_request+0x4d/0x2f0
>   [] ? md5_final+0x9f/0x130
>   [] ? __lock_release+0xc3/0xe0
>   [] ? drive_stat_acct+0x334/0x3b0
>   [] __elv_add_request+0x2a6/0x350
>   [] blk_queue_bio+0x52b/0x570
>   [] generic_make_request+0x125/0x1c0
>   [] submit_bio+0x1d8/0x240
>   [] ? bio_alloc_bioset+0x103/0x1e0
>   [] blkdev_issue_flush+0x177/0x200
>   [] blkdev_fsync+0x4a/0x70
>   [] vfs_fsync_range+0x36/0x60
>   [] vfs_fsync+0x1c/0x20
>   [] do_fsync+0x58/0x90
>   [] sys_fsync+0x10/0x20
>   [] system_call_fastpath+0x16/0x1b
>  Code: 00 48 c7 c7 48 2b f3 81 41 0f 94 c5 31 d2 44 89 ee e8 d9 e4 cd ff 49 
> 63 c5 48 83 c0 02 48 83 04 c5 b0 a5 13 82 01 45 85 ed 74 04 <0f\
>  48 89 df 31 db e8 a3 f6 ff ff 48 85 c0 48 
>  RIP  [] scsi_setup_fs_cmnd+0xec/0x180
>   RSP 
> 
> 
>  [ cut here ]
>  kernel BUG at drivers/scsi/scsi_lib.c:640!
>  invalid opcode:  [#1] SMP 
>  Modules linked in: coretemp kvm_intel kvm crc32c_intel ghash_clmulni_intel 
> microcode sg xhci_hcd button ext3 jbd mbcache sd_mod crc_t10dif\
> elper cryptd lrw aes_x86_64 xts gf128mul ahci libahci pata_acpi ata_generic 
> dm_mirror dm_region_hash dm_log dm_mod
>  CPU 0 
>  Pid: 727, comm: fsck.ext3 Not tainted 3.6.0-rc7-next-20120925+ #5
>   /DQ67SW
>  RIP: 0010:[]  [] 
> scsi_alloc_sgtable+0x55/0xe0
>  RSP: 0018:880228215aa8  EFLAGS: 00010002
>  RAX: 0003 RBX: 880228111a18 RCX: 0001
>  RDX:  RSI: 0001 RDI: 81f32a08
>  RBP: 880228215ac8 R08: 0001 R09: 
>  R10: 0002 R11:  R12: 
>  R13: 0020 R14: 0001 R15: 
>  FS:  7fb605f35760() GS:88023e20() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 003a3dc0e550 CR3: 000233e83000 CR4: 000407f0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: 0ff0 DR7: 0400
>  Process fsck.ext3 (pid: 727, threadinfo 880228214000, task 
> 880233af8c80)
>  Stack:
>   880228111a18 88022a0a0638  88022a679000
>   880228215b08 81470641 8802281119c0 88022a679000
>   880228215b28 8802281119c0 88022a0a0638 0020
>  Call Trace:
>   [] scsi_init_sgtable+0x31/0xe0
>   [] scsi_init_io+0x3d/0x2e0
>   [] scsi_setup_fs_cmnd+0x153/0x180
>   [] sd_prep_fn+0x140/0xfe0 [sd_mod]
>   [] ? __list_add+0x15c/0x180
>   [] ? elv_dispatch_sort+0x17b/0x180
>   [] blk_peek_request+0x23c/0x450
>   [] scsi_request_fn+0x70/0x820
>   [] __blk_run_queue+0x55/0x70

[PATCH 2/2] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Mikulas Patocka
blockdev: turn a rw semaphore into a percpu rw semaphore

This avoids cache line bouncing when many processes lock the semaphore
for read.

New percpu lock implementation

The lock consists of an array of percpu unsigned integers, a boolean
variable and a mutex.

When we take the lock for read, we enter an RCU read section and check
the "locked" variable. If it is false, we increase a percpu counter on
the current cpu and exit the RCU section. If "locked" is true, we exit
the RCU section, take the mutex and drop it (this waits until the writer
has finished) and retry.

Unlocking for read just decreases the percpu variable. Note that we can
unlock on a different cpu than the one where we locked; in this case the
counter underflows. The sum of all percpu counters represents the number
of processes that hold the lock for read.

When we need to lock for write, we take the mutex, set the "locked"
variable to true and synchronize RCU. Since RCU has been synchronized, no
processes can create new read locks. We wait until the sum of percpu
counters is zero - when it is, there are no readers in the critical
section.
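
The locking scheme described above can be sketched in user space roughly
as follows. This is only an illustration, not the code from the patch:
all names (pcpu_rw_sketch, sketch_*, NSLOTS) are made up, a fixed array
of slots stands in for the percpu counters, and there is no RCU here.
Instead, the reader increments its counter first and then rechecks
"locked" using sequentially consistent C11 atomics, which is a different
technique that gives the writer the same guarantee that no new reader
slips in once "locked" is set.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4    /* hypothetical stand-in for the number of CPUs */

struct pcpu_rw_sketch {
    atomic_int counters[NSLOTS];  /* per-slot reader counts, may go negative */
    atomic_bool locked;           /* true while a writer holds the lock */
    pthread_mutex_t mtx;          /* serializes writers, readers wait on it */
};

static void sketch_init(struct pcpu_rw_sketch *s)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_init(&s->counters[i], 0);
    atomic_init(&s->locked, false);
    pthread_mutex_init(&s->mtx, NULL);
}

/* Sum of all slots: the number of readers currently holding the lock. */
static int sketch_readers(struct pcpu_rw_sketch *s)
{
    int sum = 0;

    for (int i = 0; i < NSLOTS; i++)
        sum += atomic_load(&s->counters[i]);
    return sum;
}

static void sketch_down_read(struct pcpu_rw_sketch *s, int slot)
{
    for (;;) {
        /* Not RCU: announce ourselves first, then recheck the flag. */
        atomic_fetch_add(&s->counters[slot], 1);
        if (!atomic_load(&s->locked))
            return;
        /* A writer is active: back out, wait for it, and retry. */
        atomic_fetch_sub(&s->counters[slot], 1);
        pthread_mutex_lock(&s->mtx);
        pthread_mutex_unlock(&s->mtx);
    }
}

/* May be called with a different slot than the matching down_read. */
static void sketch_up_read(struct pcpu_rw_sketch *s, int slot)
{
    atomic_fetch_sub(&s->counters[slot], 1);
}

static void sketch_down_write(struct pcpu_rw_sketch *s)
{
    pthread_mutex_lock(&s->mtx);
    atomic_store(&s->locked, true);
    /* Wait until every reader that got in before the flag has left. */
    while (sketch_readers(s) != 0)
        sched_yield();
}

static void sketch_up_write(struct pcpu_rw_sketch *s)
{
    assert(atomic_load(&s->locked));  /* caller must hold the write lock */
    atomic_store(&s->locked, false);
    pthread_mutex_unlock(&s->mtx);
}
```

Note that a single slot may hold a negative value when a reader unlocks
on a different slot than it locked on; only the sum over all slots is
meaningful, which is why the writer waits on the sum and not on any
individual counter.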

Signed-off-by: Mikulas Patocka 

---
 Documentation/percpu-rw-semaphore.txt |   27 ++
 fs/block_dev.c|   27 ++
 include/linux/fs.h|3 -
 include/linux/percpu-rwsem.h  |   89 ++
 4 files changed, 135 insertions(+), 11 deletions(-)

---

Index: linux-2.6-copy/fs/block_dev.c
===
--- linux-2.6-copy.orig/fs/block_dev.c  2012-09-26 00:42:49.0 +0200
+++ linux-2.6-copy/fs/block_dev.c   2012-09-26 00:45:29.0 +0200
@@ -127,7 +127,7 @@ int set_blocksize(struct block_device *b
return -EINVAL;
 
/* Prevent starting I/O or mapping the device */
-   down_write(>bd_block_size_semaphore);
+   percpu_down_write(>bd_block_size_semaphore);
 
/* Check that the block device is not memory mapped */
mapping = bdev->bd_inode->i_mapping;
@@ -135,7 +135,7 @@ int set_blocksize(struct block_device *b
if (!prio_tree_empty(>i_mmap) ||
!list_empty(>i_mmap_nonlinear)) {
mutex_unlock(>i_mmap_mutex);
-   up_write(>bd_block_size_semaphore);
+   percpu_up_write(>bd_block_size_semaphore);
return -EBUSY;
}
mutex_unlock(>i_mmap_mutex);
@@ -148,7 +148,7 @@ int set_blocksize(struct block_device *b
kill_bdev(bdev);
}
 
-   up_write(>bd_block_size_semaphore);
+   percpu_up_write(>bd_block_size_semaphore);
 
return 0;
 }
@@ -460,6 +460,12 @@ static struct inode *bdev_alloc_inode(st
struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
if (!ei)
return NULL;
+
+   if (unlikely(percpu_init_rwsem(>bdev.bd_block_size_semaphore))) {
+   kmem_cache_free(bdev_cachep, ei);
+   return NULL;
+   }
+
return >vfs_inode;
 }
 
@@ -468,6 +474,8 @@ static void bdev_i_callback(struct rcu_h
struct inode *inode = container_of(head, struct inode, i_rcu);
struct bdev_inode *bdi = BDEV_I(inode);
 
+   percpu_free_rwsem(>bdev.bd_block_size_semaphore);
+
kmem_cache_free(bdev_cachep, bdi);
 }
 
@@ -491,7 +499,6 @@ static void init_once(void *foo)
inode_init_once(>vfs_inode);
/* Initialize mutex for freeze. */
mutex_init(>bd_fsfreeze_mutex);
-   init_rwsem(>bd_block_size_semaphore);
 }
 
 static inline void __bd_forget(struct inode *inode)
@@ -1593,11 +1600,11 @@ ssize_t blkdev_aio_read(struct kiocb *io
ssize_t ret;
struct block_device *bdev = I_BDEV(iocb->ki_filp->f_mapping->host);
 
-   down_read(>bd_block_size_semaphore);
+   percpu_down_read(>bd_block_size_semaphore);
 
ret = generic_file_aio_read(iocb, iov, nr_segs, pos);
 
-   up_read(>bd_block_size_semaphore);
+   percpu_up_read(>bd_block_size_semaphore);
 
return ret;
 }
@@ -1622,7 +1629,7 @@ ssize_t blkdev_aio_write(struct kiocb *i
 
blk_start_plug();
 
-   down_read(>bd_block_size_semaphore);
+   percpu_down_read(>bd_block_size_semaphore);
 
ret = __generic_file_aio_write(iocb, iov, nr_segs, >ki_pos);
if (ret > 0 || ret == -EIOCBQUEUED) {
@@ -1633,7 +1640,7 @@ ssize_t blkdev_aio_write(struct kiocb *i
ret = err;
}
 
-   up_read(>bd_block_size_semaphore);
+   percpu_up_read(>bd_block_size_semaphore);
 
blk_finish_plug();
 
@@ -1646,11 +1653,11 @@ int blkdev_mmap(struct file *file, struc
int ret;
struct block_device *bdev = I_BDEV(file->f_mapping->host);
 
-   down_read(>bd_block_size_semaphore);
+   percpu_down_read(>bd_block_size_semaphore);
 
ret = generic_file_mmap(file, vma);
 
-   up_read(>bd_block_size_semaphore);
+   percpu_up_read(>bd_block_size_semaphore);
 

[PATCH 1/2] Fix a crash when block device is read and block size is changed at the same time

2012-09-25 Thread Mikulas Patocka


On Tue, 25 Sep 2012, Jens Axboe wrote:

> On 2012-09-25 19:59, Jens Axboe wrote:
> > On 2012-09-25 19:49, Jeff Moyer wrote:
> >> Jeff Moyer  writes:
> >>
> >>> Mikulas Patocka  writes:
> >>>
> >>>> Hi Jeff
> >>>>
> >>>> Thanks for testing.
> >>>>
> >>>> It would be interesting ... what happens if you take the patch 3, leave
> >>>> "struct percpu_rw_semaphore bd_block_size_semaphore" in "struct
> >>>> block_device", but remove any use of the semaphore from fs/block_dev.c?
> >>>> - will the performance be like unpatched kernel or like patch 3? It could
> >>>> be that the change in the alignment affects performance on your CPU too,
> >>>> just differently than on my CPU.
> >>>
> >>> It turns out to be exactly the same performance as with the 3rd patch
> >>> applied, so I guess it does have something to do with cache alignment.
> >>> Here is the patch (against vanilla) I ended up testing.  Let me know if
> >>> I've botched it somehow.
> >>>
> >>> So, next up I'll play similar tricks to what you did (padding struct
> >>> block_device in all kernels) to eliminate the differences due to
> >>> structure alignment and provide a clear picture of what the locking
> >>> effects are.
> >>
> >> After trying again with the same padding you used in the struct
> >> bdev_inode, I see no performance differences between any of the
> >> patches.  I tried bumping up the number of threads to saturate the
> >> number of cpus on a single NUMA node on my hardware, but that resulted
> >> in lower IOPS to the device, and hence consumption of less CPU time.
> >> So, I believe my results to be inconclusive.
> >>
> >> After talking with Vivek about the problem, he had mentioned that it
> >> might be worth investigating whether bd_block_size could be protected
> >> using SRCU.  I looked into it, and the one thing I couldn't reconcile is
> >> updating both the bd_block_size and the inode->i_blkbits at the same
> >> time.  It would involve (afaiui) adding fields to both the inode and the
> >> block_device data structures and using rcu_assign_pointer  and
> >> rcu_dereference to modify and access the fields, and both fields would
> >> need to be protected by the same struct srcu_struct.  I'm not sure whether
> >> that's a desirable approach.  When I started to implement it, it got
> >> ugly pretty quickly.  What do others think?
> >>
> >> For now, my preference is to get the full patch set in.  I will continue
> >> to investigate the performance impact of the data structure size changes
> >> that I've been seeing.
> >>
> >> So, for the four patches:
> >>
> >> Acked-by: Jeff Moyer 
> >>
> >> Jens, can you have a look at the patch set?  We are seeing problem
> >> reports of this in the wild[1][2].
> > 
> > I'll queue it up for 3.7. I can run my regular testing on the 8-way, it
> > has a knack for showing scaling problems very nicely in aio/dio. As long
> > as we're not adding per-inode cache line dirtying per IO (and the
> > per-cpu rw sem looks OK), then I don't think there's too much to worry
> > about.
> 
> I take that back. The series doesn't apply to my current tree. Not too
> unexpected, since it's some weeks old. But more importantly, please send
> this as a "real" patch series. I don't want to see two implementations
> of rw semaphores. I think it's perfectly fine to first do a regular rw
> sem, then a last patch adding the cache friendly variant from Eric and
> converting to that.
> 
> In other words, get rid of 3/4.
> 
> -- 
> Jens Axboe

Hi Jens

Here I'm resending it as two patches. The first one uses the existing
semaphore, the second converts it to an RCU-based percpu semaphore.

Mikulas

---

blockdev: fix a crash when block size is changed and I/O is issued 
simultaneously

The kernel may crash when block size is changed and I/O is issued
simultaneously.

Because some subsystems (udev or lvm) may read any block device anytime,
the bug actually puts any code that changes a block device size in
jeopardy.

The crash can be reproduced if you place "msleep(1000)" in
blkdev_get_blocks just before "bh->b_size = max_blocks <<
inode->i_blkbits;".
Then run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct".
While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0".
You get a BUG.

The direct and non-direct I/O is written with the assumption that block
size does not change. It doesn't seem practical to fix these crashes
one-by-one: there may be many crash possibilities when block size changes
at a certain place, and it is impossible to find them all and verify the
code.

This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
taken for read during I/O. It is taken for write when changing block
size. Consequently, block size can't be changed while I/O is being
submitted.

For asynchronous I/O, the patch only prevents block size change while
the I/O is being submitted. The block size can change when the I/O is in
progress or when the I/O is being finished. This is acceptable because

Re: linux-next: build failure after merge of the final tree (tty tree related)

2012-09-25 Thread Greg KH
On Tue, Sep 25, 2012 at 12:36:44AM +1000, Stephen Rothwell wrote:
> Hi all,
> 
> After merging the final tree, today's linux-next build (sparc64 defconfig)
> failed like this:
> 
> fs/compat_ioctl.c:868:1: error: 'TIOCSRS485' undeclared here (not in a 
> function)
> fs/compat_ioctl.c:869:1: error: 'TIOCGRS485' undeclared here (not in a 
> function)
> 
> Caused by commit 84c3b8486044 ("compat_ioctl: Add RS-485 IOCTLs to the
> list") from the tty tree.

Thanks, Jaeden just sent me a fix for this, I'll go apply it.

greg k-h


Re: linux-next: manual merge of the tty tree with the workqueues tree

2012-09-25 Thread Greg KH
On Mon, Sep 24, 2012 at 04:36:09PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the tty tree got a conflict in
> drivers/tty/serial/omap-serial.c between commit 43829731dd37 ("workqueue:
> deprecate flush[_delayed]_work_sync()") from the workqueues tree and
> commit ac57e7f38ea6 ("serial: omap: Remove unnecessary checks from
> suspend/resume") from the tty tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks fine to me, thanks.

greg k-h


Re: linux-next: manual merge of the usb tree with the acpi tree

2012-09-25 Thread Greg KH
On Mon, Sep 24, 2012 at 04:54:49PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> On Mon, 24 Sep 2012 16:49:16 +1000 Stephen Rothwell  
> wrote:
> >
> > Today's linux-next merge of the usb tree got a conflict in
> > drivers/usb/core/usb-acpi.c between commit 59e6423ba8aa ("usb-acpi:
> > Comply with the ACPI API change") from the acpi tree and commit
> > 05f916894a69 ("usb/acpi: Store info on device removability") from the usb
> > tree.
> > 
> > The latter removed the function changed by the former, so I just did that
> > and can carry the fix as necessary (no action is required).
> 
> It also needed this merge fix patch:
> 
> From: Stephen Rothwell 
> Date: Mon, 24 Sep 2012 16:51:38 +1000
> Subject: [PATCH] usb-acpi: Comply with the ACPI API change after usb tree
>  merge
> 
> Signed-off-by: Stephen Rothwell 
> ---
>  drivers/usb/core/usb-acpi.c |6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Ick, thanks for this, hopefully Len handles this properly during the
merge window :)

greg k-h


Re: linux-next: manual merge of the staging tree with the vfs tree

2012-09-25 Thread Greg KH
On Mon, Sep 24, 2012 at 05:07:41PM +1000, Stephen Rothwell wrote:
> Hi Greg,
> 
> Today's linux-next merge of the staging tree got a conflict in
> drivers/staging/android/binder.c between commit 004223461eee ("new
> helper: __alloc_fd()") from the vfs tree and commits efde99cd281a
> ("Staging: android: binder: Make task_get_unused_fd_flags function
> static") and bf2023614201 ("Staging: android: binder: Remove an
> inconsequential conditional macro") from the staging tree.
> 
> I fixed it up (see below) and can carry the fix as necessary (no action
> is required).

Looks good to me, thanks.

greg k-h


Re: arch/x86/lib/inat.c Error

2012-09-25 Thread Franklin Wei
I found something:
My inat-tables.c is blank!
Going to get the source from Git and try again.
Thanks for the help!

On 9/25/12, Borislav Petkov  wrote:
> On Mon, Sep 24, 2012 at 06:16:50PM -0700, Randy Dunlap wrote:
>> >> That's because all those _tables thingies are included from a
>> >> "inat-tables.c" in the same directory but it somehow doesn't get
>> >> included?!
>> >>
>> >> Can we get your .config pls? Also, you're doing a "normal" kernel
>> >> build
>> >> on the command line, right?
>> >>
>> >> Thanks.
>>
>>
>> Your config builds OK for me.
>> Do you see this line in your build output before inat.c is built?
>>
>>
>>   GEN arch/x86/lib/inat-tables.c
>
> Yeah, this looks like stale files from the build or similar are not
> being properly cleaned. Franklin, do the following:
>
> $ cp .config /tmp/
> $ make mrproper
> $ cp /tmp/.config .
> $ make oldconfig
> $ make -j
>
> to verify that a clean build actually fixes your issue.
>
> Thanks.
>
> --
> Regards/Gruss,
> Boris.
>


[PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-25 Thread David Sharp
With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.
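
As a rough user-space illustration of the formatting rule (not the code
in this patch): sketch_format_ts is a hypothetical helper, and in_ns
mirrors the per-clock flag. A nanosecond clock is printed as seconds with
a six-digit microsecond fraction, anything else as a plain decimal
integer. The kernel side needs its own division helpers on 32-bit, which
this sketch does not have to care about.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format timestamp ts into buf. in_ns says whether the clock that
 * produced ts counts nanoseconds. Returns what snprintf() returns.
 */
static int sketch_format_ts(char *buf, size_t len, uint64_t ts, int in_ns)
{
    assert(buf != NULL && len > 0);

    if (in_ns) {
        uint64_t usecs = ts / 1000;         /* nanoseconds to microseconds */
        uint64_t secs = usecs / 1000000;    /* whole seconds */
        unsigned int rem = (unsigned int)(usecs % 1000000);

        return snprintf(buf, len, "%5" PRIu64 ".%06u", secs, rem);
    }

    /* Raw counter (e.g. TSC cycles): no unit, so no decimal point. */
    return snprintf(buf, len, "%" PRIu64, ts);
}
```

With in_ns set, a value of 6330555628000 comes out in the same
"seconds.microseconds" shape as the [local] clock output below; with
in_ns clear, the number is printed unchanged.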

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
  -0 [000]  6330.52: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 
next_prio=120
   sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash 
prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
  -0 [000] 16490053398357: sched_switch: prev_comm=swapper 
prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 
next_prio=120
   sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash 
prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 
prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 
prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
---
 arch/x86/include/asm/trace_clock.h |2 +-
 include/linux/ftrace_event.h   |6 +++
 kernel/trace/trace.c   |   15 +-
 kernel/trace/trace.h   |4 --
 kernel/trace/trace_output.c|   84 +---
 5 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-   { trace_clock_x86_tsc,  "x86-tsc" },
+   { trace_clock_x86_tsc,  "x86-tsc",  .in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
cpumask_var_t   started;
 };
 
+enum trace_iter_flags {
+   TRACE_FILE_LAT_FMT  = 1,
+   TRACE_FILE_ANNOTATE = 2,
+   TRACE_FILE_TIME_IN_NS   = 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..cff3427 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
u64 (*func)(void);
const char *name;
+   int in_ns;  /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-   { trace_clock_local,"local" },
-   { trace_clock_global,   "global" },
-   { trace_clock_counter,  "counter" },
+   { trace_clock_local,"local",1 },
+   { trace_clock_global,   "global",   1 },
+   { trace_clock_counter,  "counter",  0 },
ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
if (ring_buffer_overruns(iter->tr->buffer))
iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
/* stop the trace while dumping */
tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
if (trace_flags & TRACE_ITER_LATENCY_FMT)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
iter->cpu_file = cpu_file;
iter->tr = _trace;
mutex_init(>mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ 

Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-25 Thread David Sharp
On Tue, Sep 25, 2012 at 2:42 PM, Steven Rostedt  wrote:
> Sorry, I should have been more picky before. I haven't totally tested
> this yet.
>
> On Tue, 2012-09-25 at 13:49 -0700, David Sharp wrote:
>> With the addition of the "tsc" clock, formatting timestamps to look like
>> fractional seconds is misleading. Mark clocks as either in nanoseconds or
>> not, and format non-nanosecond timestamps as decimal integers.
>>
>> Tested:
>> $ cd /sys/kernel/debug/tracing/
>> $ cat trace_clock
>> [local] global tsc
>> $ echo sched_switch > set_event
>> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
>> $ cat trace
>>   -0 [000]  6330.52: sched_switch: prev_comm=swapper 
>> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 
>> next_prio=120
>>sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash 
>> prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
>> next_prio=120
>>   ...
>> $ echo 1 > options/latency-format
>> $ cat trace
>>   -0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 
>> prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>>sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash 
>> prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
>> next_prio=120
>>   ...
>> $ echo tsc > trace_clock
>> $ cat trace
>> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
>> $ echo 0 > options/latency-format
>> $ cat trace
>>   -0 [000] 16490053398357: sched_switch: prev_comm=swapper 
>> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 
>> next_prio=120
>>sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash 
>> prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
>> next_prio=120
>>   ...
>> $ echo 1 > options/latency-format
>> $ cat trace
>>   -0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 
>> prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>>sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 
>> prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>>   ...
>>
>> v2:
>> Move arch-specific bits out of generic code.
>> v4:
>> Fix x86_32 build due to 64-bit division.
>>
>> Google-Bug-Id: 6980623
>> Signed-off-by: David Sharp 
>> Cc: Steven Rostedt 
>> Cc: Masami Hiramatsu 
>> ---
>>  arch/x86/include/asm/trace_clock.h |2 +-
>>  include/linux/ftrace_event.h   |6 +++
>>  kernel/trace/trace.c   |   15 +-
>>  kernel/trace/trace.h   |4 --
>>  kernel/trace/trace_output.c|   84 +---
>>  5 files changed, 78 insertions(+), 33 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
>> index 7ee0d8c..45e17f5 100644
>> --- a/arch/x86/include/asm/trace_clock.h
>> +++ b/arch/x86/include/asm/trace_clock.h
>> @@ -9,7 +9,7 @@
>>  extern u64 notrace trace_clock_x86_tsc(void);
>>
>>  # define ARCH_TRACE_CLOCKS \
>> - { trace_clock_x86_tsc,  "x86-tsc" },
>> + { trace_clock_x86_tsc,  "x86-tsc",  .in_ns = 0 },
>>
>>  #endif
>>
>> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
>> index 642928c..c760670 100644
>> --- a/include/linux/ftrace_event.h
>> +++ b/include/linux/ftrace_event.h
>> @@ -86,6 +86,12 @@ struct trace_iterator {
>>   cpumask_var_t   started;
>>  };
>>
>> +enum trace_iter_flags {
>> + TRACE_FILE_LAT_FMT  = 1,
>> + TRACE_FILE_ANNOTATE = 2,
>> + TRACE_FILE_TIME_IN_NS   = 4,
>> +};
>> +
>>
>>  struct trace_event;
>>
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index 4e26df3..3fe4c5b 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -476,10 +476,11 @@ static const char *trace_options[] = {
>>  static struct {
>>   u64 (*func)(void);
>>   const char *name;
>> + int in_ns; /* is this clock in nanoseconds? */
>
> Add a few tabs between the ns; and /*

Done.

>
>
>>  } trace_clocks[] = {
>> - { trace_clock_local,"local" },
>> - { trace_clock_global,   "global" },
>> - { trace_clock_counter,  "counter" },
>> + { trace_clock_local,"local",1 },
>> + { trace_clock_global,   "global",   1 },
>> + { trace_clock_counter,  "counter",  0 },
>>   ARCH_TRACE_CLOCKS
>>  };
>>
>> @@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
>>   if (ring_buffer_overruns(iter->tr->buffer))
>>   iter->iter_flags |= TRACE_FILE_ANNOTATE;
>>
>> + /* Output in nanoseconds only if we are using a clock in nanoseconds. */
>> + if (trace_clocks[trace_clock_id].in_ns)
>> + iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
>> +
>>   /* stop the trace while dumping */
>>   tracing_stop();
>>
>> @@ -3324,6 +3329,10 @@ static int 

[PATCH] MAINTAINERS: update Intel C600 SAS driver maintainers

2012-09-25 Thread Dave Jiang
Cc: Lukasz Dorau 
Cc: Maciej Patelczyk 
Signed-off-by: Dave Jiang 
---

 MAINTAINERS |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b17587d..162f602 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3552,11 +3552,12 @@ K:  \b(ABS|SYN)_MT_
 
 INTEL C600 SERIES SAS CONTROLLER DRIVER
 M: Intel SCU Linux support 
+M: Lukasz Dorau 
+M: Maciej Patelczyk 
 M: Dave Jiang 
-M: Ed Nadolski 
 L: linux-s...@vger.kernel.org
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/djbw/isci.git
-S: Maintained
+T: git git://git.code.sf.net/p/intel-sas/isci
+S: Supported
 F: drivers/scsi/isci/
 F: firmware/isci/
 



Re: [PATCH 1/1] x86: Added support for Acer Aspire 5755G fan control.

2012-09-25 Thread Borislav Petkov
Adding Peter.

On Tue, Sep 25, 2012 at 12:41:11PM +0300, Tero Keski-Valkama wrote:
> 2012/9/25 Borislav Petkov :
> > On Tue, Sep 25, 2012 at 10:34:13AM +0300, Tero Keski-Valkama wrote:
> >
> > But before we go with this any further: you mentioned some issues still
> > with acerhdf - you don't want to turn off your fan but to turn it to
> > full?
> >
> > I think in this case, you want to simply not load the driver and use the
> > BIOS settings, no?
> >
> > --
> > Regards/Gruss,
> > Boris.
> 
> Technically I think my original issue is about thermal zones, and BIOS
> control doesn't solve it as is. However, the patch should be valid,
> even if it doesn't solve my original issue.
> 
> The thing is that passive cooling thermal zones throttle all the cores
> to stone age, to 800 MHz, before the fan even really starts powering
> up from the low default level.

I don't understand: you want the thermal zones to not throttle your cpu
to 800 MHz and/or the fan to start sooner?

> So, my original problem would be solved
> by:
> a) Direct control of the fan, to be able to put it on full for example
> through userspace control,

That's always a bad idea, especially if userspace dies on you.

> b) Changing the active cooling thermal zones, so that the fan speeds
> up earlier, or

Ok, I see, here's what you want.

> c) Changing the passive cooling (CPU throttling) thermal zones, so
> that it doesn't prevent fan from speeding up in the first place.
> 
> At the moment, I don't see any way to change the thermal zones through
> existing interfaces for this laptop, and I don't see any potential
> thermal zone controls, or manual fan control option in the Embedded
> Controller registry.

I'll let Peter have a look at the patch too.

@Peter, it is at: http://lkml.org/lkml/2012/9/25/92

Btw, the commit message starts with "x86:... " and it should be
"acerhdf:.. " Peter, please change that when handling the patch further.

Also, you've added yourself to the copyright - this means that you're
pretty much going to get all future email about acerhdf. Do you really
want that?

-- 
Regards/Gruss,
Boris.


[PATCH] kbuild: Do not package /boot and /lib in make tar-pkg

2012-09-25 Thread Michal Marek
There were reports of users destroying their Fedora installs by a kernel
tarball that replaces the /lib -> /usr/lib symlink. Let's remove the
toplevel directories from the tarball to prevent this from happening.

Reported-by: Andi Kleen 
Suggested-by: Ben Hutchings 
Signed-off-by: Michal Marek 
---
 arch/x86/Makefile|2 +-
 scripts/Makefile.fwinst  |4 ++--
 scripts/package/buildtar |2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index c098ca4..b0c5276 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -138,7 +138,7 @@ KBUILD_CFLAGS += $(call cc-option,-mno-avx,)
 KBUILD_CFLAGS += $(mflags-y)
 KBUILD_AFLAGS += $(mflags-y)
 
-archscripts: scripts_basic
+archscripts:
$(Q)$(MAKE) $(build)=arch/x86/tools relocs
 
 ###
diff --git a/scripts/Makefile.fwinst b/scripts/Makefile.fwinst
index 4d908d1..c3f69ae 100644
--- a/scripts/Makefile.fwinst
+++ b/scripts/Makefile.fwinst
@@ -27,7 +27,7 @@ endif
 installed-mod-fw := $(addprefix $(INSTALL_FW_PATH)/,$(mod-fw))
 
 installed-fw := $(addprefix $(INSTALL_FW_PATH)/,$(fw-shipped-all))
-installed-fw-dirs := $(sort $(dir $(installed-fw))) $(INSTALL_FW_PATH)/./
+installed-fw-dirs := $(sort $(dir $(installed-fw))) $(INSTALL_FW_PATH)/.
 
 # Workaround for make < 3.81, where .SECONDEXPANSION doesn't work.
 PHONY += $(INSTALL_FW_PATH)/$$(%) install-all-dirs
@@ -42,7 +42,7 @@ quiet_cmd_install = INSTALL $(subst $(srctree)/,,$@)
 $(installed-fw-dirs):
$(call cmd,mkdir)
 
-$(installed-fw): $(INSTALL_FW_PATH)/%: $(obj)/% | $(INSTALL_FW_PATH)/$$(dir %)
+$(installed-fw): $(INSTALL_FW_PATH)/%: $(obj)/% | $$(dir $(INSTALL_FW_PATH)/%)
$(call cmd,install)
 
 PHONY +=  __fw_install __fw_modinst FORCE
diff --git a/scripts/package/buildtar b/scripts/package/buildtar
index 8a7b155..d0d748e 100644
--- a/scripts/package/buildtar
+++ b/scripts/package/buildtar
@@ -109,7 +109,7 @@ esac
if tar --owner=root --group=root --help >/dev/null 2>&1; then
opts="--owner=root --group=root"
fi
-   tar cf - . $opts | ${compress} > "${tarball}${file_ext}"
+   tar cf - boot/* lib/* $opts | ${compress} > "${tarball}${file_ext}"
 )
 
 echo "Tarball successfully created in ${tarball}${file_ext}"
-- 
1.7.3.1



Re: [dm-devel] [PATCH v3 11/26] block: Add submit_bio_wait(), remove from md

2012-09-25 Thread Kent Overstreet
On Tue, Sep 25, 2012 at 07:51:07AM +0200, Hannes Reinecke wrote:
> On 09/25/2012 12:34 AM, Kent Overstreet wrote:
> > +/**
> > + * submit_bio_wait - submit a bio, and wait until it completes
> > + * @rw: whether to %READ or %WRITE, or maybe to %READA (read ahead)
> > + * @bio: The &struct bio which describes the I/O
> > + *
> > + * Simple wrapper around submit_bio(). Returns 0 on success, or the error from
> > + * bio_endio() on failure.
> > + */
> > +int submit_bio_wait(int rw, struct bio *bio)
> > +{
> > +   struct submit_bio_ret ret;
> > +
> > +   rw |= REQ_SYNC;
> > +   init_completion(&ret.event);
> > +   bio->bi_private = &ret;
> > +   bio->bi_end_io = submit_bio_wait_endio;
> 
> Hmm. As this is meant to be a generic function, blindly overwriting
> the bi_end_io pointer doesn't look like a good idea; the caller
> could have set something there.
> 
> Please add at least a WARN_ON(bio->bi_end_io) prior to modifying it.

Nah, the general rule with bios is after it's completed anything
could've been modified; we don't document or enforce otherwise with
bi_end_io (and there's a fair amount of code that saves/sets bi_end_io,
and I don't think it all restores the original before calling it).
I'm not going to special case this unless we start documenting/enforcing
it in general.

Besides that, setting a callback on something that's being used
synchronously is just dumb. Personally, I make damn sure to read and
understand code I'm using. I mean, maybe if this restriction was in the
slightest way subtle, but... how else would submit_bio_wait() be
implemented? It's kind of obvious if you think for two seconds about it.


Re: [Drbd-dev] [PATCH v3 06/26] block: Add bio_end_sector()

2012-09-25 Thread Kent Overstreet
On Tue, Sep 25, 2012 at 01:54:52PM +0200, Lars Ellenberg wrote:
> On Mon, Sep 24, 2012 at 03:34:46PM -0700, Kent Overstreet wrote:
> > Just a little convenience macro - main reason to add it now is preparing
> > for immutable bio vecs, it'll reduce the size of the patch that puts
> > bi_sector/bi_size/bi_idx into a struct bvec_iter.
> 
> 
> For the DRBD part:
> 
> > diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> > index 01b2ac6..d90a1fd 100644
> > --- a/drivers/block/drbd/drbd_req.c
> > +++ b/drivers/block/drbd/drbd_req.c
> > @@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, 
> > struct bio *bio)
> > /* to make some things easier, force alignment of requests within the
> >  * granularity of our hash tables */
> > s_enr = bio->bi_sector >> HT_SHIFT;
> > -   e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT 
> > : s_enr;
> > +   e_enr = (bio_end_sector(bio) - 1) >> HT_SHIFT;
> 
> You ignored the bio->bi_size ? : ;
> 
> #define bio_end_sector(bio)   ((bio)->bi_sector + bio_sectors(bio))
> which turns out (bio->bi_sector + (bio->bi_size >> 9))
> 
> Note that bi_size may be 0, bio_end_sector(bio)-1 then is bi_sector -1,
> for an empty flush with bi_sector == 0, this ends up as (sector_t)-1ULL,
> and this code path breaks horribly.

Man, that was dumb of me - thanks for catching it. Version below look
good?

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 01b2ac6..47f55db 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1144,7 +1144,7 @@ void drbd_make_request(struct request_queue *q, struct bio *bio)
/* to make some things easier, force alignment of requests within the
 * granularity of our hash tables */
s_enr = bio->bi_sector >> HT_SHIFT;
-   e_enr = bio->bi_size ? (bio->bi_sector+(bio->bi_size>>9)-1) >> HT_SHIFT : s_enr;
+   e_enr = bio->bi_size ? (bio_end_sector(bio) - 1) >> HT_SHIFT : s_enr;
 
if (likely(s_enr == e_enr)) {
do {
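
For reference, the wrap-around Lars describes can be reproduced outside the
kernel. The following is only a user-space sketch: sector_t is modelled as
uint64_t, HT_SHIFT is given an assumed value of 8, and the helper names
end_sector(), end_enr_unguarded() and end_enr_guarded() are made up for
illustration. It is not the DRBD code itself.

```c
#include <stdint.h>

typedef uint64_t sector_t;   /* stand-in for the kernel's sector_t */
#define HT_SHIFT 8           /* assumed value, for illustration only */

/* Mirrors bio_end_sector(): first sector past the end of the bio. */
static sector_t end_sector(sector_t bi_sector, unsigned int bi_size)
{
	return bi_sector + (bi_size >> 9);
}

/* Without the bi_size check: wraps around for an empty bio at sector 0. */
static sector_t end_enr_unguarded(sector_t bi_sector, unsigned int bi_size)
{
	return (end_sector(bi_sector, bi_size) - 1) >> HT_SHIFT;
}

/* With the bi_size check kept: an empty bio maps to its start slot. */
static sector_t end_enr_guarded(sector_t bi_sector, unsigned int bi_size)
{
	sector_t s_enr = bi_sector >> HT_SHIFT;

	return bi_size ? (end_sector(bi_sector, bi_size) - 1) >> HT_SHIFT : s_enr;
}
```

For an empty flush at sector 0, the unguarded variant yields a huge slot
number, while the guarded one returns the start slot. For non-empty bios the
two agree.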


Re: Wrong system clock vs X.509 date specifiers

2012-09-25 Thread David Howells

How about the attached?  I knew perl had to be good for something...

David
---
#!/usr/bin/perl -w
#
# Generate an X.509 certificate from a public key.
#
# Format:
#
#   gen-x509-cert  \
#  [C=] [O=] [CN=] [Email=] \
#  [--from=] [--to=output
#
use strict;
use POSIX qw(strftime);

my $UNIV = 0 << 6;
my $APPL = 1 << 6;
my $CONT = 2 << 6;
my $PRIV = 3 << 6;

my $BOOLEAN = 0x01;
my $INTEGER = 0x02;
my $BIT_STRING  = 0x03;
my $OCTET_STRING = 0x04;
my $NULL= 0x05;
my $OBJ_ID  = 0x06;
my $UTF8String  = 0x0c;
my $SEQUENCE= 0x10;
my $SET = 0x11;
my $GeneralizedTime = 0x18;

my %OIDs = (
commonName  => pack("CCC", 85, 4, 3),
countryName => pack("CCC", 85, 4, 6),
organizationName=> pack("CCC", 85, 4, 10),
organizationUnitName=> pack("CCC", 85, 4, 11),
rsaEncryption   => pack("CCCCCCCCC", 42, 134, 72, 134, 247, 13, 1, 1, 1),
sha1WithRSAEncryption   => pack("CCCCCCCCC", 42, 134, 72, 134, 247, 13, 1, 1, 5),
emailAddress=> pack("CCCCCCCCC", 42, 134, 72, 134, 247, 13, 1, 9, 1),
authorityKeyIdentifier  => pack("CCC", 85, 29, 35),
subjectKeyIdentifier=> pack("CCC", 85, 29, 14),
keyUsage=> pack("CCC", 85, 29, 15),
basicConstraints=> pack("CCC", 85, 29, 19)
);


#
# Set up the X.509 params
#
die "Format:  [options]"
if ($#ARGV == -1);

my $privfilename = shift @ARGV;

my %subject_name;

if ($#ARGV == -1) {
# Make something up if they don't want to admit to it
$subject_name{"C"}  = 'h2g2',
$subject_name{"O"}  = 'Magrathea',
$subject_name{"CN"} = 'Glacier signing key',
$subject_name{"Email"}  = 'slartibartfast@magrathea.h2g2'
}

my $from = 7 * 24 * 60 * 60;
my $to = 36500 * 24 * 60 * 60;
foreach my $_ (@ARGV) {
if (/--from=(.*)/) {
$from = $1;
} elsif (/--to=(.*)/) {
$to = $1;
} elsif (/([A-Z][A-Za-z]*)=(.*)/) {
$subject_name{$1} = $2;
} else {
last;
}
}

my $now = time();
my $valid_from = strftime("%Y%m%d%H%M%SZ", gmtime($now - $from));
my $valid_to   = strftime("%Y%m%d%H%M%SZ", gmtime($now + $to));

#
# openssl can be used to give us the public key in exactly the form we need -
# including ASN.1 wrappings - for inclusion in the certificate.
#
open PUBKEYFD, "openssl rsa -in $privfilename -pubout -outform DER 2>/dev/null |" ||
die "Unable to process $privfilename through openssl rsa: $!\n";
binmode PUBKEYFD;

my $pubkey = "";
my $tmp;
while (read(PUBKEYFD, $tmp, 512)) {
$pubkey .= $tmp;
}
close PUBKEYFD ||
die "Unable to close channel to openssl rsa: $!\n";

#
# Generate a serial number
#
my $serial = "";
for (my $i = int(rand(6)) + 6; $i > 0; $i--) {
$serial .= pack("C", rand(256));
}
$serial = pack("x") . $serial
if (unpack("C", substr($serial, 0, 1)) >= 0x80);

#
# Generate the SubjectKeyIdentifier.  This is the ASN.1 sum of the contents of
# the bit string element from the public key.
#
die "Can't disassemble RSA public key wrapping\n"
if (substr($pubkey,  0, 2) ne pack("n", 0x3082) ||
substr($pubkey,  4, 4) ne pack("N", 0x300d0609) ||
substr($pubkey,  8, 9) ne $OIDs{"rsaEncryption"} ||
substr($pubkey, 17, 2) ne pack("n", 0x0500) ||
substr($pubkey, 19, 2) ne pack("n", 0x0382) ||
substr($pubkey, 23, 1) ne pack("C", 0x00));

my $key_data = substr($pubkey, 24);

sub sha1sum($)
{
my ($data) = @_;

my ($TO_RD, $TO_WR, $FROM_RD, $FROM_WR);
pipe $TO_RD, $TO_WR;
pipe $FROM_RD, $FROM_WR;

my $sha1output;
my $child = fork();
if ($child == 0) {
close $TO_WR;
close $FROM_RD;
open(STDIN, ">&", $TO_RD) or die "Can't direct $TO_RD to STDIN: $!";
open(STDOUT, ">&", $FROM_WR) or die "Can't direct $FROM_WR to STDOUT: $!";
close $TO_RD;
close $FROM_WR;
exec("sha1sum");
} elsif (!$child) {
die;
} else {
close $TO_RD;
close $FROM_WR;
binmode $TO_WR;
syswrite $TO_WR, $data || die;
close $TO_WR || die;
$sha1output = <$FROM_RD> || die;
close $FROM_RD;
die "sha1sum failed\n"
if (waitpid($child, 0) != $child);
}

return pack("H*", substr($sha1output, 0, 40));
}

my $keyid = sha1sum($key_data);

###
#
# Generate a header
#
###
sub emit_asn1_hdr($$)
{
my ($tag, $len) = @_;

if ($len < 0x80) {
return pack("CC",   $tag, $len);
} elsif ($len <= 0xff) {
return pack("CCC",   $tag, 0x81, $len);
} elsif ($len <= 0xffff) {
return pack("CCn",  $tag, 0x82, $len);
} elsif ($len <= 0xffffff) {
return pack("CCCn", $tag, 0x83, $len >> 16, $len & 0xffff);
} else {
return 
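
The script is cut off at this point in the archive. The length encoding that
emit_asn1_hdr() is building is the standard DER one, and can be sketched in C
as follows. This is an illustration only: der_header() is a hypothetical name,
and the four-byte case is an assumption about what the missing else branch
would need to cover, not a copy of the original.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write an ASN.1 DER identifier octet followed by the length octets for
 * `len` into `out` (which must hold at least 6 bytes); return bytes written.
 */
static size_t der_header(uint8_t tag, uint32_t len, uint8_t *out)
{
	size_t n = 0;
	int nbytes;

	out[n++] = tag;
	if (len < 0x80) {               /* short form: one length octet */
		out[n++] = (uint8_t)len;
		return n;
	}

	/* long form: 0x80 | count, then the length in big-endian order */
	nbytes = len <= 0xff ? 1 : len <= 0xffff ? 2 : len <= 0xffffff ? 3 : 4;
	out[n++] = (uint8_t)(0x80 | nbytes);
	while (nbytes--)
		out[n++] = (uint8_t)(len >> (8 * nbytes));
	return n;
}
```

A length of 0x1234 under tag 0x30 comes out as the four bytes 30 82 12 34,
which matches the "CCn" case in the Perl above.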

Re: [PATCH -next v2] Shorten constant names for EFI variable attributes

2012-09-25 Thread Stephen Rothwell
Hi,

On Tue, 25 Sep 2012 09:41:00 -0600 Khalid Aziz  wrote:
>
> Replace very long constants for EFI variable attributes with
> shorter and more convenient names. Also create an alias for
> the current longer names so as to not break compatibility
> with current API since these constants are used by
> userspace programs.

Why do this?  It just looks like churn for no real gain.

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




[PATCH 0/3] Optimize CRC32C calculation using PCLMULQDQ in crc32c-intel module

2012-09-25 Thread Tim Chen
This patch series optimizes CRC32C calculation with the PCLMULQDQ
instruction in the crc32c-intel module.  It speeds up the original
implementation by 1.6x for 1K buffers and by 3x for buffers of 4K or
more.  The tcrypt module is enhanced to run speed tests
on crc32c calculations.

Tim

Signed-off-by: Tim Chen 
---
Tim Chen (3):
  Rename crc32c-intel.c to crc32c-intel_glue.c
  Optimize CRC32C calculation with PCLMULQDQ instruction
  Added speed test in tcrypt for crc32c

 arch/x86/crypto/Makefile   |1 +
 .../crypto/{crc32c-intel.c => crc32c-intel_glue.c} |   75 
 arch/x86/crypto/crc32c-pcl-intel-asm.S |  460 
 crypto/tcrypt.c|4 +
 4 files changed, 540 insertions(+), 0 deletions(-)
 rename arch/x86/crypto/{crc32c-intel.c => crc32c-intel_glue.c} (70%)
 create mode 100644 arch/x86/crypto/crc32c-pcl-intel-asm.S

-- 
1.7.7.6




[PATCH 3/3] Added speed test in tcrypt for crc32c

2012-09-25 Thread Tim Chen
This patch adds a test case in tcrypt to perform speed test for
crc32c checksum calculation.

Tim

Signed-off-by: Tim Chen 
---
 crypto/tcrypt.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index 581081d..6deb77f 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -1437,6 +1437,10 @@ static int do_test(int m)
test_hash_speed("ghash-generic", sec, hash_speed_template_16);
if (mode > 300 && mode < 400) break;
 
+   case 319:
+   test_hash_speed("crc32c", sec, generic_hash_speed_template);
+   if (mode > 300 && mode < 400) break;
+
case 399:
break;
 
-- 
1.7.7.6




[PATCH 2/3] Optimize CRC32C calculation with PCLMULQDQ instruction

2012-09-25 Thread Tim Chen
This patch adds the crc_pcl function, which calculates the CRC32C checksum using
the PCLMULQDQ instruction on processors that support it. This provides a
speedup over using the CRC32 instruction alone.
Using PCLMULQDQ requires calling kernel_fpu_begin and kernel_fpu_end, which
incurs some overhead, so the new crc_pcl function is only invoked for buffer
sizes of 512 bytes or more.  Larger buffers should see a greater speedup.
This feature is best used together with eager_fpu, which reduces the
kernel_fpu_begin/end overhead.  For a buffer size of 1K the speedup is around
1.6x, and for buffer sizes greater than 4K the speedup is around 3x compared to
the original implementation in the crc32c-intel module. The test was performed
on a Sandy Bridge based platform with the CPU set to a constant frequency.

A white paper detailing the algorithm can be found here:
http://download.intel.com/design/intarch/papers/323405.pdf
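
The size-based dispatch described above can be sketched in plain C. This is
only an illustration of the decision, not the kernel code: the 512 and 1024
thresholds are the ones used in this patch, while pcl_breakeven() and
use_pcl_path() are hypothetical helper names, and the eager_fpu and
fpu_usable flags stand in for use_eager_fpu() and irq_fpu_usable().

```c
#include <stdbool.h>
#include <stddef.h>

#define PCL_BREAKEVEN_EAGERFPU   512   /* thresholds taken from the patch */
#define PCL_BREAKEVEN_NOEAGERFPU 1024

/* Pick the buffer size at which the PCLMULQDQ path starts to pay off. */
static size_t pcl_breakeven(bool eager_fpu)
{
	return eager_fpu ? PCL_BREAKEVEN_EAGERFPU : PCL_BREAKEVEN_NOEAGERFPU;
}

/* Decide whether a buffer of `len` bytes should take the PCLMULQDQ path. */
static bool use_pcl_path(size_t len, bool eager_fpu, bool fpu_usable)
{
	return fpu_usable && len >= pcl_breakeven(eager_fpu);
}
```

Anything that fails the check falls back to the existing CRC32-instruction
path, so short buffers never pay for the FPU state save/restore.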

Tim

Signed-off-by: Tim Chen 
---
 arch/x86/crypto/Makefile   |2 +-
 arch/x86/crypto/crc32c-intel_glue.c|   75 +
 arch/x86/crypto/crc32c-pcl-intel-asm.S |  460 
 3 files changed, 536 insertions(+), 1 deletions(-)
 create mode 100644 arch/x86/crypto/crc32c-pcl-intel-asm.S

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index edd2268..0babdcb 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -43,4 +43,4 @@ serpent-avx-x86_64-y := serpent-avx-x86_64-asm_64.o 
serpent_avx_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
-crc32c-intel-y := crc32c-intel_glue.o
+crc32c-intel-y := crc32c-pcl-intel-asm.o crc32c-intel_glue.o
diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
b/arch/x86/crypto/crc32c-intel_glue.c
index 493f959..432af71 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -32,9 +32,31 @@
 
 #include 
 #include 
+#include 
+#include 
 
 #define CHKSUM_BLOCK_SIZE  1
 #define CHKSUM_DIGEST_SIZE 4
+/*
+ * use carryless multiply version of crc32c when buffer
+ * size is >= 512 (when eager fpu is enabled) or
+ * >= 1024 (when eager fpu is disabled) to account
+ * for fpu state save/restore overhead.
+ */
+#define CRC32C_PCL_BREAKEVEN_EAGERFPU  512
+#define CRC32C_PCL_BREAKEVEN_NOEAGERFPU  1024
+
+static int crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_EAGERFPU;
+#if defined(X86_FEATURE_EAGER_FPU)
+#define set_pcl_breakeven_point()  \
+do {   \
+   if (!use_eager_fpu())   \
+   crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU; \
+} while (0)
+#else
+#define set_pcl_breakeven_point()  \
+   (crc32c_pcl_breakeven = CRC32C_PCL_BREAKEVEN_NOEAGERFPU)
+#endif
 
 #define SCALE_F sizeof(unsigned long)
 
@@ -44,6 +66,9 @@
 #define REX_PRE
 #endif
 
+asmlinkage unsigned int crc_pcl(const u8 *buffer, int len,
+   unsigned int crc_init);
+
 static u32 crc32c_intel_le_hw_byte(u32 crc, unsigned char const *data, size_t 
length)
 {
while (length--) {
@@ -117,6 +142,24 @@ static int crc32c_intel_update(struct shash_desc *desc, 
const u8 *data,
return 0;
 }
 
+static int crc32c_pcl_intel_update(struct shash_desc *desc, const u8 *data,
+  unsigned int len)
+{
+   u32 *crcp = shash_desc_ctx(desc);
+
+   /*
+* use faster PCL version if datasize is large enough to
+* overcome kernel fpu state save/restore overhead
+*/
+   if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+   kernel_fpu_begin();
+   *crcp = crc_pcl(data, len, *crcp);
+   kernel_fpu_end();
+   } else
+   *crcp = crc32c_intel_le_hw(*crcp, data, len);
+   return 0;
+}
+
 static int __crc32c_intel_finup(u32 *crcp, const u8 *data, unsigned int len,
u8 *out)
 {
@@ -124,12 +167,31 @@ static int __crc32c_intel_finup(u32 *crcp, const u8 
*data, unsigned int len,
return 0;
 }
 
+static int __crc32c_pcl_intel_finup(u32 *crcp, const u8 *data, unsigned int 
len,
+   u8 *out)
+{
+   if (len >= crc32c_pcl_breakeven && irq_fpu_usable()) {
+   kernel_fpu_begin();
+   *(__le32 *)out = ~cpu_to_le32(crc_pcl(data, len, *crcp));
+   kernel_fpu_end();
+   } else
+   *(__le32 *)out =
+   ~cpu_to_le32(crc32c_intel_le_hw(*crcp, data, len));
+   return 0;
+}
+
 static int crc32c_intel_finup(struct shash_desc *desc, const u8 *data,
  unsigned int len, u8 *out)
 {
return __crc32c_intel_finup(shash_desc_ctx(desc), data, len, 

[PATCH 1/3] Rename crc32c-intel.c to crc32c-intel_glue.c

2012-09-25 Thread Tim Chen
This patch renames the crc32c-intel.c file to crc32c-intel_glue.c
in preparation for linking with the new crc32c-pcl-intel-asm.S file,
which contains optimized crc32c calculation based on PCLMULQDQ
instruction.

Tim

Signed-off-by: Tim Chen 
---
 arch/x86/crypto/Makefile   |1 +
 .../crypto/{crc32c-intel.c => crc32c-intel_glue.c} |0
 2 files changed, 1 insertions(+), 0 deletions(-)
 rename arch/x86/crypto/{crc32c-intel.c => crc32c-intel_glue.c} (100%)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index e908e5d..edd2268 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -43,3 +43,4 @@ serpent-avx-x86_64-y := serpent-avx-x86_64-asm_64.o 
serpent_avx_glue.o
 aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o fpu.o
 ghash-clmulni-intel-y := ghash-clmulni-intel_asm.o ghash-clmulni-intel_glue.o
 sha1-ssse3-y := sha1_ssse3_asm.o sha1_ssse3_glue.o
+crc32c-intel-y := crc32c-intel_glue.o
diff --git a/arch/x86/crypto/crc32c-intel.c 
b/arch/x86/crypto/crc32c-intel_glue.c
similarity index 100%
rename from arch/x86/crypto/crc32c-intel.c
rename to arch/x86/crypto/crc32c-intel_glue.c
-- 
1.7.7.6





Re: mmotm 2012-09-20-17-25 uploaded (fs/bimfmt_elf on uml)

2012-09-25 Thread Stephen Rothwell
Hi David,

On Tue, 25 Sep 2012 12:43:53 -0700 (PDT) David Rientjes  
wrote:
>
> On Sat, 22 Sep 2012, Stephen Rothwell wrote:
> 
> > > on uml for x86_64 defconfig:
> > > 
> > > fs/binfmt_elf.c: In function 'fill_files_note':
> > > fs/binfmt_elf.c:1419:2: error: implicit declaration of function 'vmalloc'
> > > fs/binfmt_elf.c:1419:7: warning: assignment makes pointer from integer 
> > > without a cast
> > > fs/binfmt_elf.c:1437:5: error: implicit declaration of function 'vfree'
> > 
> > reported in linux-next (offending patch reverted for other
> > problems).
> 
> This still happens on x86_64 for linux-next as of today's tree.

Are you sure?  next-20120925?

$ grep -n vmalloc fs/binfmt_elf.c
30:#include 
1421:   data = vmalloc(size);

-- 
Cheers,
Stephen Rothwells...@canb.auug.org.au




Re: [PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-25 Thread Steven Rostedt
Sorry, I should have been more picky before. I haven't totally tested
this yet.

On Tue, 2012-09-25 at 13:49 -0700, David Sharp wrote:
> With the addition of the "tsc" clock, formatting timestamps to look like
> fractional seconds is misleading. Mark clocks as either in nanoseconds or
> not, and format non-nanosecond timestamps as decimal integers.
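
As a rough user-space illustration of the formatting rule described above
(not the kernel's trace_output.c code): a nanosecond clock is printed as
seconds.microseconds, and any other clock as a plain decimal integer.
format_trace_ts() and its in_ns parameter are hypothetical names for this
sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a trace timestamp into buf. Nanosecond clocks are shown as
 * seconds.microseconds; any other clock is shown as a plain integer.
 * Returns the snprintf() result.
 */
static int format_trace_ts(char *buf, size_t size, uint64_t ts, bool in_ns)
{
	if (in_ns) {
		uint64_t usecs = ts / 1000;

		return snprintf(buf, size, "%llu.%06llu",
				(unsigned long long)(usecs / 1000000),
				(unsigned long long)(usecs % 1000000));
	}
	return snprintf(buf, size, "%llu", (unsigned long long)ts);
}
```
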
> 
> Tested:
> $ cd /sys/kernel/debug/tracing/
> $ cat trace_clock
> [local] global tsc
> $ echo sched_switch > set_event
> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
> $ cat trace
>   -0 [000]  6330.52: sched_switch: prev_comm=swapper 
> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 
> next_prio=120
>sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash 
> prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
> next_prio=120
>   ...
> $ echo 1 > options/latency-format
> $ cat trace
>   -0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 
> prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
>sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 
> prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> $ echo tsc > trace_clock
> $ cat trace
> $ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
> $ echo 0 > options/latency-format
> $ cat trace
>   -0 [000] 16490053398357: sched_switch: prev_comm=swapper 
> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 
> next_prio=120
>sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash 
> prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 
> next_prio=120
>   ...
> echo 1 > options/latency-format
> $ cat trace
>   -0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 
> prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
>sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 
> prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
>   ...
> 
> v2:
> Move arch-specific bits out of generic code.
> v4:
> Fix x86_32 build due to 64-bit division.
> 
> Google-Bug-Id: 6980623
> Signed-off-by: David Sharp 
> Cc: Steven Rostedt 
> Cc: Masami Hiramatsu 
> ---
>  arch/x86/include/asm/trace_clock.h |2 +-
>  include/linux/ftrace_event.h   |6 +++
>  kernel/trace/trace.c   |   15 +-
>  kernel/trace/trace.h   |4 --
>  kernel/trace/trace_output.c|   84 
> +---
>  5 files changed, 78 insertions(+), 33 deletions(-)
> 
> diff --git a/arch/x86/include/asm/trace_clock.h 
> b/arch/x86/include/asm/trace_clock.h
> index 7ee0d8c..45e17f5 100644
> --- a/arch/x86/include/asm/trace_clock.h
> +++ b/arch/x86/include/asm/trace_clock.h
> @@ -9,7 +9,7 @@
>  extern u64 notrace trace_clock_x86_tsc(void);
>  
>  # define ARCH_TRACE_CLOCKS \
> - { trace_clock_x86_tsc,  "x86-tsc" },
> + { trace_clock_x86_tsc,  "x86-tsc",  .in_ns = 0 },
>  
>  #endif
>  
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 642928c..c760670 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -86,6 +86,12 @@ struct trace_iterator {
>   cpumask_var_t   started;
>  };
>  
> +enum trace_iter_flags {
> + TRACE_FILE_LAT_FMT  = 1,
> + TRACE_FILE_ANNOTATE = 2,
> + TRACE_FILE_TIME_IN_NS   = 4,
> +};
> +
>  
>  struct trace_event;
>  
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 4e26df3..3fe4c5b 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -476,10 +476,11 @@ static const char *trace_options[] = {
>  static struct {
>   u64 (*func)(void);
>   const char *name;
> + int in_ns; /* is this clock in nanoseconds? */

Add a few tabs between the ns; and /*


>  } trace_clocks[] = {
> - { trace_clock_local,"local" },
> - { trace_clock_global,   "global" },
> - { trace_clock_counter,  "counter" },
> + { trace_clock_local,"local",1 },
> + { trace_clock_global,   "global",   1 },
> + { trace_clock_counter,  "counter",  0 },
>   ARCH_TRACE_CLOCKS
>  };
>  
> @@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
>   if (ring_buffer_overruns(iter->tr->buffer))
>   iter->iter_flags |= TRACE_FILE_ANNOTATE;
>  
> + /* Output in nanoseconds only if we are using a clock in nanoseconds. */
> + if (trace_clocks[trace_clock_id].in_ns)
> + iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
> +
>   /* stop the trace while dumping */
>   tracing_stop();
>  
> @@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, 
> struct file *filp)
>   if (trace_flags & TRACE_ITER_LATENCY_FMT)
>   iter->iter_flags |= TRACE_FILE_LAT_FMT;
>  
> + /* Output in 

Re: [PATCH 5/9] mm: compaction: Acquire the zone->lru_lock as late as possible

2012-09-25 Thread Andrew Morton
On Tue, 25 Sep 2012 17:13:27 +0900
Minchan Kim  wrote:

> I see. To me, your saying is better than current comment.
> I hope comment could be more explicit.
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index df01b4e..f1d2cc7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -542,8 +542,9 @@ isolate_migratepages_range(struct zone *zone, struct 
> compact_control *cc,
>  * splitting and collapsing (collapsing has already happened
>  * if PageLRU is set) but the lock is not necessarily taken
>  * here and it is wasteful to take it just to check transhuge.
> -* Check transhuge without lock and skip if it's either a
> -* transhuge or hugetlbfs page.
> +* Check transhuge without lock and *skip* if it's either a
> +* transhuge or hugetlbfs page because it's not safe to call
> +* compound_order.
>  */
> if (PageTransHuge(page)) {
> if (!locked)

Going a bit further:

--- 
a/mm/compaction.c~mm-compaction-acquire-the-zone-lru_lock-as-late-as-possible-fix
+++ a/mm/compaction.c
@@ -415,7 +415,8 @@ isolate_migratepages_range(struct zone *
 * if PageLRU is set) but the lock is not necessarily taken
 * here and it is wasteful to take it just to check transhuge.
 * Check transhuge without lock and skip if it's either a
-* transhuge or hugetlbfs page.
+* transhuge or hugetlbfs page because calling compound_order()
+* requires lru_lock to exclude isolation and splitting.
 */
if (PageTransHuge(page)) {
if (!locked)
_


but...  the requirement to hold lru_lock for compound_order() is news
to me.  It doesn't seem to be written down or explained anywhere, and
one wonders why the cheerily undocumented compound_lock() doesn't have
this effect.  What's going on here??



[PATCH 2/2] xen/pciback: When resetting the device don't disable twice.

2012-09-25 Thread Konrad Rzeszutek Wilk
We call 'pci_disable_device' which sets the bus_master to zero
and it also disables the PCI_COMMAND. There is no need to
do it outside the PCI library.

Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/xen-pciback/pciback_ops.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/xen-pciback/pciback_ops.c 
b/drivers/xen/xen-pciback/pciback_ops.c
index 97f5d26..2e62279 100644
--- a/drivers/xen/xen-pciback/pciback_ops.c
+++ b/drivers/xen/xen-pciback/pciback_ops.c
@@ -114,10 +114,6 @@ void xen_pcibk_reset_device(struct pci_dev *dev)
pci_disable_msi(dev);
 #endif
pci_disable_device(dev);
-
-   pci_write_config_word(dev, PCI_COMMAND, 0);
-
-   dev->is_busmaster = 0;
} else {
pci_read_config_word(dev, PCI_COMMAND, &cmd);
if (cmd & (PCI_COMMAND_INVALIDATE)) {
-- 
1.7.7.6



[PATCH] fixes to xen-pciback for v3.7 (v1)

2012-09-25 Thread Konrad Rzeszutek Wilk
This fixes an issue that I thought I had already fixed, but had not. It was
discovered when trying to pass through a PCIe network card to a PVHVM guest
and finding that it can't use MSIs. I thought I had it fixed with
git commit 80ba77dfbce85f2d1be54847de3c866de1b18a9a
"xen/pciback: Fix proper FLR steps." but that fixed only one use
case (bind the device to xen-pciback, then unbind it).

The underlying reason was that after we do an FLR (if the card supports it),
we also do a D3 (so turn off the PCIe card), followed by a D0
(power is back).  However, we did not follow the rest of the process
that pci_reset_function does - restore the device's PCI configuration state!

(Note: We cannot use pci_reset_function as it takes a mutex that we
already hold - so we use the low-level reset functions that can be
invoked while holding the mutex - and we forgot to make the remaining calls
that pci_reset_function does).

With this patch:
 [PATCH 1/2] xen/pciback: Restore the PCI config space after an FLR.

I can pass through a PCIe e1000e card successfully to my Win7 and Linux
guests.

This patch:
 [PATCH 2/2] xen/pciback: When resetting the device don't disable twice.

is just a cleanup.


[PATCH 1/2] xen/pciback: Restore the PCI config space after an FLR.

2012-09-25 Thread Konrad Rzeszutek Wilk
When we do an FLR, or D0->D3_hot we may lose the BARs as the
device has turned itself off (and on). This means the device cannot
function unless the pci_restore_state is called - which it is
when the PCI device is unbound from the Xen PCI backend driver.
For PV guests it ends up calling pci_enable_device / pci_enable_msi[x]
which does the proper steps.

That however is not happening if a HVM guest is run as QEMU
deals with PCI configuration space. QEMU also requires that the
device be "parked"  under the ownership of a pci-stub driver to
guarantee that the PCI device is not being used. Hence we
follow the same incantation as pci_reset_function does - by
doing an FLR, then restoring the PCI configuration space.

The result of this patch is that when you run lspci, you now get
this:

-   Region 0: [virtual] Memory at fe8c0000 (32-bit, non-prefetchable) [size=128K]
-   Region 1: [virtual] Memory at fe800000 (32-bit, non-prefetchable) [size=512K]
+   Region 0: Memory at fe8c0000 (32-bit, non-prefetchable) [size=128K]
+   Region 1: Memory at fe800000 (32-bit, non-prefetchable) [size=512K]
    Region 2: I/O ports at c000 [size=32]
-   Region 3: [virtual] Memory at fe8e0000 (32-bit, non-prefetchable) [size=16K]
+   Region 3: Memory at fe8e0000 (32-bit, non-prefetchable) [size=16K]

The [virtual] means that lspci read those entries from SysFS but when
it read them from the device it got a different value (0xfff).

CC: sta...@vger.kernel.org # only for v3.4 and v3.5
Signed-off-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/xen-pciback/pci_stub.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/xen/xen-pciback/pci_stub.c 
b/drivers/xen/xen-pciback/pci_stub.c
index acec6fa..e5a0c13 100644
--- a/drivers/xen/xen-pciback/pci_stub.c
+++ b/drivers/xen/xen-pciback/pci_stub.c
@@ -362,6 +362,7 @@ static int __devinit pcistub_init_device(struct pci_dev *dev)
else {
dev_dbg(&dev->dev, "reseting (FLR, D3, etc) the device\n");
__pci_reset_function_locked(dev);
+   pci_restore_state(dev);
}
/* Now disable the device (this also ensures some private device
 * data is setup before we export)
-- 
1.7.7.6



[PATCH] 8139too: add 1013:1211 PCI ID for a strange SMC1211TX.

2012-09-25 Thread W. Trevor King
The FCC ID on the board is HEDEN1207DTXR01, which belongs to Accton
Technology Corporation.  This matches the expected 1113 ID.  Perhaps
my board just has a dying EEPROM?

Signed-off-by: W. Trevor King 
---
I'm not sure if this qualifies as a patch-able issue, but I thought
I'd send it in in case someone else gets bitten by this.

 drivers/net/ethernet/realtek/8139too.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
index 1d83565..b7cf947 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -238,6 +238,7 @@ static DEFINE_PCI_DEVICE_TABLE(rtl8139_pci_tbl) = {
{0x10ec, 0x8139, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
{0x10ec, 0x8138, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
{0x1113, 0x1211, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
+   {0x1013, 0x1211, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
{0x1500, 0x1360, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
{0x4033, 0x1360, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
{0x1186, 0x1300, PCI_ANY_ID, PCI_ANY_ID, 0, 0, RTL8139 },
-- 
1.7.8.6
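
For readers unfamiliar with how such an ID table is consumed, here is a
user-space sketch of the lookup idea. It is an illustration under stated
assumptions, not the PCI core's code: `struct id_entry`, `ANY_ID`,
`field_matches()` and `match_id()` are invented names, and only a subset of
the table from the patch is reproduced.

```c
#include <stddef.h>
#include <stdint.h>

#define ANY_ID 0xffffffffu	/* hypothetical stand-in for PCI_ANY_ID */

struct id_entry {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
};

/* Subset of the table from the patch, including the new 1013:1211 entry. */
static const struct id_entry rtl8139_ids[] = {
	{ 0x10ec, 0x8139, ANY_ID, ANY_ID },
	{ 0x1113, 0x1211, ANY_ID, ANY_ID },
	{ 0x1013, 0x1211, ANY_ID, ANY_ID },
};

/* A wildcard entry field matches any value reported by the device. */
static int field_matches(uint32_t want, uint32_t have)
{
	return want == ANY_ID || want == have;
}

/* Return the index of the first matching entry, or -1 if none matches. */
static int match_id(const struct id_entry *tbl, size_t n,
		    uint32_t vendor, uint32_t device,
		    uint32_t subvendor, uint32_t subdevice)
{
	for (size_t i = 0; i < n; i++) {
		if (field_matches(tbl[i].vendor, vendor) &&
		    field_matches(tbl[i].device, device) &&
		    field_matches(tbl[i].subvendor, subvendor) &&
		    field_matches(tbl[i].subdevice, subdevice))
			return (int)i;
	}
	return -1;
}
```

In this model a board reporting 1013:1211 is only claimed once the third
entry is present, which is all the patch changes.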


Re: [PATCH 05/10] mm, util: Use dup_user to duplicate user memory

2012-09-25 Thread Andrew Morton
On Sat,  8 Sep 2012 17:47:54 -0300
Ezequiel Garcia  wrote:

> Previously the strndup_user allocation was being done through memdup_user,
> and the caller was wrongly traced as being strndup_user
> (the correct trace must report the caller of strndup_user).
> 
> This is a common problem: in order to get accurate callsite tracing,
> a utils function can't allocate through another utils function,
> but instead do the allocation himself (or inlined).
> 
> Here we fix this by creating an always inlined dup_user() function to
> performed the real allocation and to be used by memdup_user and strndup_user.

This patch increases util.o's text size by 238 bytes.  A larger kernel
with a worsened cache footprint.

And we did this to get marginally improved tracing output?  This sounds
like a bad tradeoff to me.
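
To make the tradeoff concrete, the following user-space sketch shows the
always-inline pattern being discussed. It assumes GCC or a compatible
compiler; `dup_sketch()`, `memdup_sketch()`, `strndup_sketch()` and
`last_alloc_caller` are invented names, and malloc() stands in for the
kernel allocator, so this is not the code from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Call site recorded by the most recent allocation (illustrative only). */
static void *last_alloc_caller;

/*
 * Because this helper is always inlined, __builtin_return_address(0)
 * is evaluated in the frame of whichever wrapper it was inlined into,
 * so it yields the address that wrapper will return to.
 */
static inline __attribute__((always_inline))
void *dup_sketch(const void *src, size_t alloc_len, size_t copy_len)
{
	void *p;

	last_alloc_caller = __builtin_return_address(0);
	p = malloc(alloc_len);
	if (p)
		memcpy(p, src, copy_len);
	return p;
}

/* Each wrapper carries its own inlined copy of dup_sketch(). */
__attribute__((noinline))
void *memdup_sketch(const void *src, size_t len)
{
	return dup_sketch(src, len, len);
}

__attribute__((noinline))
char *strndup_sketch(const char *s, size_t max)
{
	const char *end = memchr(s, '\0', max);
	size_t len = end ? (size_t)(end - s) : max;
	char *p = dup_sketch(s, len + 1, len);

	if (p)
		p[len] = '\0';
	return p;
}
```

The recorded address now belongs to the caller of each wrapper, but the
helper's body is emitted once per wrapper, which is where the extra text
size comes from.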



Re: [PATCH/resend/bypass] um: Preinclude include/linux/kern_levels.h

2012-09-25 Thread Richard Weinberger
Am Tue, 25 Sep 2012 22:37:13 +0200
schrieb Geert Uytterhoeven :

> On Tue, Sep 25, 2012 at 9:43 PM, Al Viro 
> wrote:
> > On Tue, Sep 25, 2012 at 12:20:55PM -0700, Linus Torvalds wrote:
> >> IOW, this part of the patch:
> >>
> >> -   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include user.h
> >> $(CFLAGS_$(basetarget).o)
> >> +   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include
> >> $(srctree)/include/linux/kern_levels.h -include user.h
> >> $(CFLAGS_$(basetarget).o)
> >>
> >> just makes me go want to puke. The user.h file already has other
> >> #include's in it, so I really don't see why you create this insane
> >> special case.
> >>
> >> And why does UM have those "UM_KERN_XYZ" defines in the first
> >> place? Why isn't it just using KERN_XYZ directly? Is it because
> >> kern_levels.h didn't use to exist, so it was some kind of "let's
> >> create our own that we can hide in our special headers".
> >
> > Because user.h is included *without* kernel headers in include path.
> 
> Indeed.
> 
> > It's for the stuff that is compiled with host libc headers.  Keep in
> > mind that UML talks to libc like normal architecture would talk to
> > hardware.  IOW, analogs of asm glue are in (host) userland C.  And
> > they need libc headers instead of the kernel ones.  That's what that
> > USER_OBJ thing is about.  Kernel-side constants, etc. are delivered
> > to that sucker using the same mechanism we normally use to give them
> > to assembler - asm-offsets.c.  And here, of course, slapping ifndef
> > __ASSEMBLER__ around the tricky bits will not work - the header
> > itself is just fine, but getting kernel headers in the search path
> > really isn't.
> >
> > I agree that proposed solution is ugly.  What we can do is
> > copy the damn header into include/generated and #include
> >  from user.h.  And kill UM_KERN_...
> > stuff.  Objections?
> 
> My first submission had "We may convert all UM_KERN_* users to KERN_*
> and drop the extra defines?" as a suggestion, but so far I haven't
> found time to implement that...
> 
> Still, no one came up with a better patch, and this is a regression.

Yeah, I'd like to take the "ugly" patch to get rid of the regression.
Later we can get rid of UM_KERN_*, which is IMHO also very ugly.

Thanks,
//richard


Re: RTL8101E/RTL8102E PCI Express Fast Ethernet controller (rev 02)

2012-09-25 Thread Francois Romieu
Thanasis  :
[...]
> Ping failed in the following step:
> 
> HEAD is now at 3c6ad46 r8169: move rtl_set_rx_mode before its
> rtl_hw_start callers.

*spleen*

It's a genuine code move without any real change. Imho it's more a
matter of sleeping a few seconds for the link to settle after the
device is brought up.

The differences between the top-most r8169 driver you tried and the
real v3.5.4 r8169 driver are minor: mostly Ben Greear's corrupted
frames rx work (default: disabled) and a skb_timestamp which comes
too late in your setup.

So, either your problem is not reproducible with 3.5.4 - cold reboot,
driver which does not fail the first time - or it needs something else
in the kernel to happen.

The "PME# disabled" messages have disappeared between 2.6 and 3.5.4 in your
dmesg. It's probably due to a dev_dbg/dev_printk + CONFIG_DYNAMIC_DEBUG
change. It's still worth checking runtime pm settings though.

Can you check the content of /sys/class/pci_bus/:02/power, set it
to "on" if it contains "auto" and plug the cable again (with 3.5.4)?

-- 
Ueimor


Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Suresh Siddha
On Mon, 2012-09-24 at 12:12 -0700, Linus Torvalds wrote:
> On Mon, Sep 24, 2012 at 11:26 AM, Mike Galbraith  wrote:
> >
> > Aside from the cache pollution I recall having been mentioned, on my
> > E5620, cross core is a tbench win over affine, cross thread is not.
> 
> Oh, I agree with trying to avoid HT threads, the resource contention
> easily gets too bad.
> 
> It's more a question of "if we have real cores with separate L1's but
> shared L2's, go with those first, before we start distributing it out
> to separate L2's".

There is one issue though. If the tasks continue to run in this state
and the periodic balance notices an idle L2, it will force migrate
(using active migration) one of the tasks to the idle L2, because the
periodic balance tries to spread the load as far as possible to take
maximum advantage of the available resources (and the perf advantage of
this really depends on the workload, cache usage/memory bw, the upside
of turbo etc).

But I am not sure if this was the reason why we chose to spread it out
to separate L2's during wakeup.

Anyways, this is one of the places where Paul Turner's task load
average tracking patches will be useful. Depending on how long a task
typically runs, we can probably even choose an SMT sibling or a separate
L2 to run it on.

thanks,
suresh



Recent frv breakage

2012-09-25 Thread Geert Uytterhoeven
http://kisskb.ellerman.id.au/kisskb/buildresult/7270498/

> arch/frv/kernel/entry.S:871: Error: VLIW packing constraint violation

What's wrong there? Introduced by "frv: split ret_from_fork, simplify
kernel_thread() a lot"

> make[2]: *** [arch/frv/kernel/entry.o] Error 1
> arch/frv/kernel/process.c:197:3: error: 'chilregs' undeclared (first use in this function)

typo in "frv: switch to generic kernel_thread()", should be "childregs"

> make[2]: *** [arch/frv/kernel/process.o] Error 1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[GIT] Networking

2012-09-25 Thread David Miller

1) Eric Dumazet discovered and fixed what turned out to be a family of
   bugs.  These functions were using pskb_may_pull() which might need
   to reallocate the linear SKB data buffer, but the callers were not
   expecting this possibility.  The callers have cached pointers to
   the packet header areas, and would need to reload them if we were
   to continue using pskb_may_pull().

   So they could end up reading garbage.

   It's easier to just change these RAW4/RAW6/MIP6 routines to use
   skb_header_pointer() instead of pskb_may_pull(), which won't
   modify the linear SKB data area.

2) Dave Jones's syscall spammer caught a case where a non-TCP socket
   can call down into the TCP keepalive code.  The case basically
   involves creating a raw socket with sk_protocol == IPPROTO_TCP,
   then calling setsockopt(sock_fd, SO_KEEPALIVE, ...).

   Fixed by Eric Dumazet.

3) Bluetooth devices do not get configured properly while being
   powered on, resulting in always using legacy pairing instead
   of SSP.  Fix from Andrzej Kaczmarek.

4) Bluetooth cancels delayed work erroneously, put stricter
   checks in place.  From Andrei Emeltchenko.

5) Fix deadlock between cfg80211_mutex and reg_regdb_search_mutex
   in cfg80211, from Luis R. Rodriguez.

6) Fix interrupt double release in iwlwifi, from Emmanuel Grumbach.

7) Missing module license in bcm87xx driver, from Peter Huewe.

8) Team driver can lose port changed events when adding devices to a
   team, fix from Jiri Pirko.

9) Fix endless loop when trying to unregister PPPOE device in
   zombie state, from Xiaodong Xu.

10) batman-adv layer needs to set MAC address of software device
    earlier, otherwise we call tt_local_add with it uninitialized.

11) Fix handling of KSZ8021 PHYs, it's matched currently by KS8051
    but that doesn't program the device properly.  From Marek
    Vasut.
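
As an aside on item 1, the difference between pulling data into the linear
area and merely peeking at it can be sketched in user space. The code below
is a toy model only: `struct toy_pkt` and `toy_header_pointer()` are
invented names that imitate the idea behind skb_header_pointer() (return a
direct pointer when the bytes are contiguous, otherwise copy them into a
caller-supplied scratch buffer); it is not the networking stack's code.

```c
#include <stddef.h>
#include <string.h>

/* Toy packet: 'linear' bytes are directly addressable, the rest is not. */
struct toy_pkt {
	const unsigned char *linear;	/* contiguous head */
	size_t linear_len;
	const unsigned char *frag;	/* remaining bytes, stored elsewhere */
	size_t frag_len;
};

/*
 * Return a pointer to 'len' bytes at 'offset'.  If they lie wholly in the
 * linear area, point straight at them; otherwise copy them into 'scratch'.
 * The packet itself is never modified, so pointers the caller cached
 * earlier stay valid.  Returns NULL if the packet is too short.
 */
static const void *toy_header_pointer(const struct toy_pkt *p, size_t offset,
				      size_t len, void *scratch)
{
	unsigned char *out = scratch;
	size_t total = p->linear_len + p->frag_len;
	size_t from_linear;

	if (offset > total || len > total - offset)
		return NULL;
	if (offset + len <= p->linear_len)
		return p->linear + offset;

	/* Bytes still available in the linear area at this offset. */
	from_linear = offset < p->linear_len ? p->linear_len - offset : 0;
	if (from_linear)
		memcpy(out, p->linear + offset, from_linear);
	memcpy(out + from_linear,
	       p->frag + (offset + from_linear - p->linear_len),
	       len - from_linear);
	return out;
}
```

A caller that cached a pointer into the linear area before the call can keep
using it afterwards, which is the property the RAW4/RAW6/MIP6 fixes rely on.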

Please pull, thanks a lot!

The following changes since commit abef3bd71029b80ec1bdd6c6244b5b0b99f56633:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2012-09-21 14:32:55 -0700)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git master

for you to fetch changes up to 96af69ea2a83d292238bdba20e4508ee967cf8cb:

  ipv6: mip6: fix mip6_mh_filter() (2012-09-25 16:04:44 -0400)


Andrei Emeltchenko (1):
  Bluetooth: Fix freeing uninitialized delayed works

Andrzej Kaczmarek (2):
  Bluetooth: mgmt: Fix enabling SSP while powered off
  Bluetooth: mgmt: Fix enabling LE while powered off

David S. Miller (2):
  Merge branch 'for-davem' of git://git.kernel.org/.../linville/wireless
  Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Def (1):
  batman-adv: Fix change mac address of soft iface.

Emmanuel Grumbach (1):
  iwlwifi: don't double free the interrupt in failure path

Eric Dumazet (4):
  ipv4: raw: fix icmp_filter()
  net: guard tcp_set_keepalive() to tcp sockets
  ipv6: raw: fix icmpv6_filter()
  ipv6: mip6: fix mip6_mh_filter()

Jiri Pirko (1):
  team: send port changed when added

John W. Linville (1):
  Merge branch 'master' of git://git.kernel.org/.../linville/wireless into for-davem

Linus Lüssing (1):
  batman-adv: Fix symmetry check / route flapping in multi interface setups

Luis R. Rodriguez (1):
  cfg80211: fix possible circular lock on reg_regdb_search()

Marek Vasut (3):
  phy/micrel: Implement support for KSZ8021
  phy/micrel: Rename KS80xx to KSZ80xx
  phy/micrel: Add missing header to micrel_phy.h

Peter Hüwe (1):
  net/phy/bcm87xx: Add MODULE_LICENSE("GPL") to GPL driver

Vinicius Costa Gomes (1):
  Bluetooth: Fix not removing power_off delayed work

Xiaodong Xu (1):
  pppoe: drop PPPOX_ZOMBIEs in pppoe_release

 arch/arm/mach-mxs/mach-mxs.c  |  2 +-
 drivers/net/phy/bcm87xx.c |  2 ++
 drivers/net/phy/micrel.c  | 45 -
 drivers/net/ppp/pppoe.c   |  2 +-
 drivers/net/team/team.c   | 32 
 drivers/net/wireless/iwlwifi/pcie/trans.c |  1 +
 include/linux/micrel_phy.h| 19 ---
 net/batman-adv/bat_iv_ogm.c   | 13 +++--
 net/batman-adv/soft-interface.c   |  7 +--
 net/bluetooth/hci_core.c  |  2 ++
 net/bluetooth/l2cap_core.c|  2 +-
 net/bluetooth/mgmt.c  | 16 
 net/core/sock.c   |  3 ++-
 net/ipv4/raw.c| 14 --
 net/ipv6/mip6.c   | 20 +++-
 net/ipv6/raw.c| 21 ++---
 net/wireless/reg.c| 12 +---
 17 files changed, 152 insertions(+), 61 deletions(-)

Re: [PATCH 4/5] dev: Add dev_vprintk_emit and dev_printk_emit

2012-09-25 Thread Geert Uytterhoeven
On Sun, Aug 26, 2012 at 1:25 PM, Joe Perches  wrote:
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -914,6 +918,13 @@ int _dev_info(const struct device *dev, const char *fmt, ...);
>
>  #else
>
> +static int dev_vprintk_emit(int level, const struct device *dev,

Missing "inline", cfr. http://kisskb.ellerman.id.au/kisskb/buildresult/7271354/

include/linux/device.h:930:12: error: 'dev_vprintk_emit' defined but
not used [-Werror=unused-function]
cc1: all warnings being treated as errors
make[2]: *** [arch/sh/kernel/dma-nommu.o] Error 1

> +   const char *fmt, va_list args)
> +{ return 0; }
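
For illustration, a stub of this shape with the missing keyword added might
look like the sketch below. The `_stub` suffix is only there to keep the
example self-contained; it is not the name used in the patch, and the
variadic wrapper is included just so the stub can be exercised.

```c
#include <stdarg.h>

struct device;	/* opaque here; only pointers to it are used */

/*
 * Stub used when the real implementation is compiled out.  Marking it
 * "static inline" keeps GCC from reporting -Wunused-function in every
 * translation unit that includes the header without calling the stub.
 */
static inline int dev_vprintk_emit_stub(int level, const struct device *dev,
					const char *fmt, va_list args)
{
	(void)level;
	(void)dev;
	(void)fmt;
	(void)args;
	return 0;
}

/* Variadic front end that forwards to the va_list stub above. */
static inline int dev_printk_emit_stub(int level, const struct device *dev,
				       const char *fmt, ...)
{
	va_list args;
	int ret;

	va_start(args, fmt);
	ret = dev_vprintk_emit_stub(level, dev, fmt, args);
	va_end(args);
	return ret;
}
```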

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [RFC 2/4] memcg: make it suck faster

2012-09-25 Thread Andrew Morton
On Tue, 25 Sep 2012 12:52:51 +0400
Glauber Costa  wrote:

> It is an accepted fact that memcg sucks. But can it suck faster?  Or in
> a more fair statement, can it at least stop draining everyone's
> performance when it is not in use?
> 
> This experimental and slightly crude patch demonstrates that we can do
> that by using static branches to patch it out until the first memcg
> comes to life. There are edges to be trimmed, and I appreciate comments
> for direction. In particular, the events in the root are not fired, but
> I believe this can be done without further problems by calling a
> specialized event check from mem_cgroup_newpage_charge().
> 
> My goal was to have enough numbers to demonstrate the performance gain
> that can come from it. I tested it in a 24-way 2-socket Intel box, 24 Gb
> mem. I used Mel Gorman's pft test, that he used to demonstrate this
> problem back in the Kernel Summit. There are three kernels:
> 
> nomemcg  : memcg compile disabled.
> base : memcg enabled, patch not applied.
> bypassed : memcg enabled, with patch applied.
> 
>             base    bypassed
> User      109.12      105.64
> System   1646.84     1597.98
> Elapsed   229.56      215.76
> 
>          nomemcg    bypassed
> User      104.35      105.64
> System   1578.19     1597.98
> Elapsed   212.33      215.76
> 
> So as one can see, the difference between base and nomemcg in terms
> of both system time and elapsed time is quite drastic, and consistent
> with the figures shown by Mel Gorman in the Kernel summit. This is a
> ~ 7 % drop in performance, just by having memcg enabled. memcg functions
> appear heavily in the profiles, even if all tasks lives in the root
> memcg.
> 
> With bypassed kernel, we drop this down to 1.5 %, which starts to fall
> in the acceptable range. More investigation is needed to see if we can
> claim that last percent back, but I believe at last part of it should
> be.

Well that's encouraging.  I wonder how many users will actually benefit
from this - did I hear that major distros are now using memcg in some
system-infrastructure-style code?

iirc, the idea of disabling memcg operations until someone enables a
container had a couple of problems:

a) certain boot-time initialisation isn't performed and

b) when memcg starts running for real, it expects that certain stats
   gathering has been running since boot.  If this is not the case,
   those stats are wrong and stuff breaks.

It would be helpful if you could summarise these and similar issues
and describe how they were addressed.

>
> ...
>
>  struct mem_cgroup *mem_cgroup_from_cont(struct cgroup *cont)
>  {
> + if (mem_cgroup_disabled())
> + return root_mem_cgroup;

There would be some benefit in inlining the above instructions into the
caller.

>   return mem_cgroup_from_css(
>   cgroup_subsys_state(cont, mem_cgroup_subsys_id));
>  }

In fact the entire mem_cgroup_from_cont() could be inlined.
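
A rough user-space sketch of what such inlining could look like follows.
All names here (`struct mem_cgroup_sketch`, `memcg_in_use`,
`group_lookup()`, `group_lookup_slow()`) are invented for illustration, and
a plain boolean stands in for the static branch the patch uses, so treat it
as a picture of the idea rather than the proposed code.

```c
#include <stdbool.h>

struct mem_cgroup_sketch { int id; };

static struct mem_cgroup_sketch root_group = { 0 };

/* Hypothetical flag; would flip when the first non-root group is created. */
static bool memcg_in_use;

/* Out-of-line slow path: only reached once a non-root group exists. */
static __attribute__((noinline))
struct mem_cgroup_sketch *group_lookup_slow(struct mem_cgroup_sketch *candidate)
{
	return candidate ? candidate : &root_group;
}

/*
 * Inline fast path: while nothing is in use, the caller pays for one
 * test and no function call.
 */
static inline
struct mem_cgroup_sketch *group_lookup(struct mem_cgroup_sketch *candidate)
{
	if (!memcg_in_use)
		return &root_group;
	return group_lookup_slow(candidate);
}
```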

>  struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
>  {
> +
> + if (mem_cgroup_disabled())
> + return root_mem_cgroup;

Ditto.

>   /*
>* mm_update_next_owner() may clear mm->owner to NULL
>* if it races with swapoff, page migration, etc.
>
> ...
>



[PATCH v4 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-25 Thread David Sharp
With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.
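
The formatting rule can be sketched in user space as follows. This is a
simplified illustration, not the patch itself: `format_ts()` is an invented
name, and the nanosecond-to-microsecond conversion here simply truncates,
which may differ from what the kernel's ns2usecs() does.

```c
#include <stdint.h>
#include <stdio.h>

/* Format 'ts' into 'buf'; 'in_ns' says whether the clock counts nanoseconds. */
static int format_ts(char *buf, size_t size, uint64_t ts, int in_ns)
{
	if (in_ns) {
		uint64_t usecs = ts / 1000;	/* truncating conversion */
		unsigned long secs = (unsigned long)(usecs / 1000000);
		unsigned long usec_rem = (unsigned long)(usecs % 1000000);

		/* Nanosecond clocks are shown as seconds.microseconds. */
		return snprintf(buf, size, "%5lu.%06lu: ", secs, usec_rem);
	}
	/* Other clocks (e.g. raw cycle counts) are shown as plain integers. */
	return snprintf(buf, size, "%12llu: ", (unsigned long long)ts);
}
```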

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ cat trace
  -0 [000]  6330.52: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
   sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_enabled ; sleep 0.0005 ; echo 0 > tracing_enabled
$ echo 0 > options/latency-format
$ cat trace
  -0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...
$ echo 1 > options/latency-format
$ cat trace
  -0   0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
   sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
  ...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Signed-off-by: David Sharp 
Cc: Steven Rostedt 
Cc: Masami Hiramatsu 
---
 arch/x86/include/asm/trace_clock.h |2 +-
 include/linux/ftrace_event.h   |6 +++
 kernel/trace/trace.c   |   15 +-
 kernel/trace/trace.h   |4 --
 kernel/trace/trace_output.c|   84 +---
 5 files changed, 78 insertions(+), 33 deletions(-)

diff --git a/arch/x86/include/asm/trace_clock.h b/arch/x86/include/asm/trace_clock.h
index 7ee0d8c..45e17f5 100644
--- a/arch/x86/include/asm/trace_clock.h
+++ b/arch/x86/include/asm/trace_clock.h
@@ -9,7 +9,7 @@
 extern u64 notrace trace_clock_x86_tsc(void);
 
 # define ARCH_TRACE_CLOCKS \
-   { trace_clock_x86_tsc,  "x86-tsc" },
+   { trace_clock_x86_tsc,  "x86-tsc",  .in_ns = 0 },
 
 #endif
 
diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 642928c..c760670 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -86,6 +86,12 @@ struct trace_iterator {
cpumask_var_t   started;
 };
 
+enum trace_iter_flags {
+   TRACE_FILE_LAT_FMT  = 1,
+   TRACE_FILE_ANNOTATE = 2,
+   TRACE_FILE_TIME_IN_NS   = 4,
+};
+
 
 struct trace_event;
 
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e26df3..3fe4c5b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -476,10 +476,11 @@ static const char *trace_options[] = {
 static struct {
u64 (*func)(void);
const char *name;
+   int in_ns; /* is this clock in nanoseconds? */
 } trace_clocks[] = {
-   { trace_clock_local,"local" },
-   { trace_clock_global,   "global" },
-   { trace_clock_counter,  "counter" },
+   { trace_clock_local,"local",1 },
+   { trace_clock_global,   "global",   1 },
+   { trace_clock_counter,  "counter",  0 },
ARCH_TRACE_CLOCKS
 };
 
@@ -2425,6 +2426,10 @@ __tracing_open(struct inode *inode, struct file *file)
if (ring_buffer_overruns(iter->tr->buffer))
iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
/* stop the trace while dumping */
tracing_stop();
 
@@ -3324,6 +3329,10 @@ static int tracing_open_pipe(struct inode *inode, struct file *filp)
if (trace_flags & TRACE_ITER_LATENCY_FMT)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
+   /* Output in nanoseconds only if we are using a clock in nanoseconds. */
+   if (trace_clocks[trace_clock_id].in_ns)
+   iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
+
iter->cpu_file = cpu_file;
iter->tr = &global_trace;
mutex_init(&iter->mutex);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55e1f7f..84fefed 100644
--- a/kernel/trace/trace.h
+++ 

Re: [PATCH v3 3/3] tracing: format non-nanosec times from tsc clock without a decimal point.

2012-09-25 Thread David Sharp
On Mon, Sep 24, 2012 at 8:27 PM, Steven Rostedt  wrote:
> On Thu, 2012-09-20 at 15:52 -0700, David Sharp wrote:
>> With the addition of the "tsc" clock, formatting timestamps to look like
>> fractional seconds is misleading. Mark clocks as either in nanoseconds or
>> not, and format non-nanosecond timestamps as decimal integers.
>
> I got this:
>
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__umoddi3'
> /work/autotest/nobackup/linux-test.git/kernel/trace/trace_output.c:636: undefined reference to `__udivdi3'
>
> when building for x86_32.
>
>> + int ret;
>> + struct trace_seq *s = &iter->seq;
>> + unsigned long verbose = trace_flags & TRACE_ITER_VERBOSE;
>> + unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
>> + unsigned long long abs_ts = iter->ts - iter->tr->time_start;
>> + unsigned long long rel_ts = next_ts - iter->ts;
>> + unsigned long mark_thresh;
>> +
>> + if (in_ns) {
>> + abs_ts = ns2usecs(abs_ts);
>> + rel_ts = ns2usecs(rel_ts);
>> + mark_thresh = preempt_mark_thresh_us;
>> + } else
>> + mark_thresh = preempt_mark_thresh_cycles;
>> +
>> + if (verbose && in_ns) {
>> + ret = trace_seq_printf(
>> + s, "[%08llx] %lld.%03lldms (+%lld.%03lldms): ",
>> + ns2usecs(iter->ts),
>> + abs_ts / USEC_PER_MSEC,
>> + abs_ts % USEC_PER_MSEC,
>> + rel_ts / USEC_PER_MSEC,
>
> You can't divide 64 bit numbers in the kernel. It breaks on 32bit archs.

I didn't realize that. Sorry. Not sure why I changed them to long long
in the first place.

Although, that does mean that it is currently overflowing every 4.29
seconds. I may as well fix that.
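
For reference, the usual way around this is to route the division through a
helper instead of using the `/` and `%` operators on 64-bit values. The
sketch below is a user-space stand-in with the same shape as the kernel's
div_u64_rem(); `div_u64_rem_sketch()`, `usecs_to_msec_parts()` and
`USEC_PER_MSEC_SKETCH` are invented names, and in user space the helper can
simply use the plain operators.

```c
#include <stdint.h>

#define USEC_PER_MSEC_SKETCH 1000u

/*
 * Divide a 64-bit value by a 32-bit divisor, returning the quotient and
 * storing the remainder.  In the kernel, a helper of this shape avoids
 * emitting calls to the libgcc routines (__udivdi3/__umoddi3) that are
 * not linked into 32-bit builds.
 */
static uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor,
				   uint32_t *rem)
{
	*rem = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Split a microsecond count into whole milliseconds plus a remainder. */
static void usecs_to_msec_parts(uint64_t usecs, uint64_t *msec,
				uint32_t *usec_rem)
{
	*msec = div_u64_rem_sketch(usecs, USEC_PER_MSEC_SKETCH, usec_rem);
}
```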

>
> -- Steve
>
>> + rel_ts % USEC_PER_MSEC);
>> + } else if (verbose && !in_ns) {
>> + ret = trace_seq_printf(
>> + s, "[%016llx] %lld (+%lld): ",
>> + iter->ts, abs_ts, rel_ts);
>> + } else { /* !verbose */
>> + ret = trace_seq_printf(
>> + s, " %4lld%s%c: ",
>> + abs_ts,
>> + in_ns ? "us" : "",
>> + rel_ts > mark_thresh ? '!' :
>> +   rel_ts > 1 ? '+' : ' ');
>> + }
>> + return ret;
>>  }
>>
>>  int trace_print_context(struct trace_iterator *iter)
>>  {
>>   struct trace_seq *s = &iter->seq;
>>   struct trace_entry *entry = iter->ent;
>> - unsigned long long t = ns2usecs(iter->ts);
>> - unsigned long usec_rem = do_div(t, USEC_PER_SEC);
>> - unsigned long secs = (unsigned long)t;
>> + unsigned long long t;
>> + unsigned long secs, usec_rem;
>>   char comm[TASK_COMM_LEN];
>>   int ret;
>>
>> @@ -644,8 +677,13 @@ int trace_print_context(struct trace_iterator *iter)
>>   return 0;
>>   }
>>
>> - return trace_seq_printf(s, " %5lu.%06lu: ",
>> - secs, usec_rem);
>> + if (iter->iter_flags & TRACE_FILE_TIME_IN_NS) {
>> + t = ns2usecs(iter->ts);
>> + usec_rem = do_div(t, USEC_PER_SEC);
>> + secs = (unsigned long)t;
>> + return trace_seq_printf(s, "%5lu.%06lu: ", secs, usec_rem);
>> + } else
>> + return trace_seq_printf(s, "%12llu: ", iter->ts);
>>  }
>>
>>  int trace_print_lat_context(struct trace_iterator *iter)
>> @@ -659,36 +697,30 @@ int trace_print_lat_context(struct trace_iterator *iter)
>>  *next_entry = trace_find_next_entry(iter, NULL,
>>  &next_ts);
>>   unsigned long verbose = (trace_flags & TRACE_ITER_VERBOSE);
>> - unsigned long abs_usecs = ns2usecs(iter->ts - iter->tr->time_start);
>> - unsigned long rel_usecs;
>> +
>>
>>   /* Restore the original ent_size */
>>   iter->ent_size = ent_size;
>>
>>   if (!next_entry)
>>   next_ts = iter->ts;
>> - rel_usecs = ns2usecs(next_ts - iter->ts);
>>
>>   if (verbose) {
>>   char comm[TASK_COMM_LEN];
>>
>>   trace_find_cmdline(entry->pid, comm);
>>
>> - ret = trace_seq_printf(s, "%16s %5d %3d %d %08x %08lx 

Re: [PATCH/resend/bypass] um: Preinclude include/linux/kern_levels.h

2012-09-25 Thread Geert Uytterhoeven
On Tue, Sep 25, 2012 at 9:43 PM, Al Viro  wrote:
> On Tue, Sep 25, 2012 at 12:20:55PM -0700, Linus Torvalds wrote:
>> IOW, this part of the patch:
>>
>> -   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include user.h
>> $(CFLAGS_$(basetarget).o)
>> +   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include
>> $(srctree)/include/linux/kern_levels.h -include user.h
>> $(CFLAGS_$(basetarget).o)
>>
>> just makes me go want to puke. The user.h file already has other
>> #include's in it, so I really don't see why you create this insane
>> special case.
>>
>> And why does UM have those "UM_KERN_XYZ" defines in the first place?
>> Why isn't it just using KERN_XYZ directly? Is it because kern_levels.h
>> didn't use to exist, so it was some kind of "let's create our own that
>> we can hide in our special headers".
>
> Because user.h is included *without* kernel headers in include path.

Indeed.

> It's for the stuff that is compiled with host libc headers.  Keep in
> mind that UML talks to libc like normal architecture would talk to
> hardware.  IOW, analogs of asm glue are in (host) userland C.  And
> they need libc headers instead of the kernel ones.  That's what that
> USER_OBJ thing is about.  Kernel-side constants, etc. are delivered
> to that sucker using the same mechanism we normally use to give them
> to assembler - asm-offsets.c.  And here, of course, slapping ifndef
> __ASSEMBLER__ around the tricky bits will not work - the header itself
> is just fine, but getting kernel headers in the search path really
> isn't.
>
> I agree that proposed solution is ugly.  What we can do is copy
> the damn header into include/generated and #include 
> from user.h.  And kill UM_KERN_... stuff.  Objections?

My first submission had "We may convert all UM_KERN_* users to KERN_*
and drop the extra defines?" as a suggestion, but so far I haven't found time
to implement that...

Still, no one came up with a better patch, and this is a regression.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: ktest.pl always returns 0?

2012-09-25 Thread Steven Rostedt
On Tue, 2012-09-25 at 12:40 -0700, Greg KH wrote:

> Hey, it's not my fault your employer has a crummy email system that
> can't handle remote access well, I just went off of the Author: line in
> your ktest.pl kernel commits :)

Yeah, I'm not upset by it. I just want to warn people that there are times
I may go long periods without answering that email.

> 
> > > I'm trying to use ktest to do build tests of the stable patch series to
> > > verify I didn't mess anything up, but I'm finding that ktest always
> > > returns 0 when finished, no matter if the build test was successful or
> > > failed.
> > 
> > Hmm, I should fix that. Yeah, I agree, if it fails a test it should
> > return something other than zero. But I think that only happens if you
> > have DIE_ON_FAILURE = 0. As IIRC, the perl "die" command should exit the
> > application with an error code.
> > 
> > But yeah, I agree, if one of the tests fail, the error code should not
> > be zero. I'll write up a patch to fix that. Or at least add an option to
> > make that happen.
> 
> That would be great.
> 
> > > Is this right?  Is there some other way to determine if ktest fails
> > > other than greping the output log?
> > 
> > If you have DIE_ON_FAILURE = 1 (default) it should exit with non zero.
> 
> It doesn't do that, test it and see (this is with what is in Linus's
> 3.6-rc7 tree, I didn't test linux-next if that is newer, my apologies.)

This should have been something from day one. I'll go ahead and try it
out. According to the perl-doc man pages the "die" command has:

   If an uncaught exception results in interpreter exit, the exit
   code is determined from the values of $! and $? with this
   pseudocode:

   exit $! if $!;  # errno
   exit $? >> 8 if $? >> 8;# child exit status
   exit 255;   # last resort

I'll investigate this further.
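
As a rough illustration of the pseudocode above, the selection of the exit
code can be modeled in C. This is only a sketch: die_exit_code() is a
hypothetical name, err stands in for $! and wait_status for $?, and it
says nothing about what ktest.pl itself does.

```c
#include <assert.h>

/*
 * Hypothetical model of the perldoc pseudocode for die's exit code.
 * err stands in for $! (errno) and wait_status for $?, a wait(2)-style
 * status whose high byte holds the child's exit code.
 */
int die_exit_code(int err, int wait_status)
{
	assert(err >= 0 && wait_status >= 0);

	if (err)			/* exit $! if $!; */
		return err;
	if (wait_status >> 8)		/* exit $? >> 8 if $? >> 8; */
		return wait_status >> 8;
	return 255;			/* last resort */
}
```

Under this model an uncaught die never yields 0, so a zero exit status
would have to come from some other exit path.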

> 
> > > Oh, and any hints on kicking off a ktest process on a remote machine in
> > > a "simple" way?  I'm just using ssh to copy over a script that runs
> > > there, wrapping ktest.pl up with other stuff, I didn't miss the fact
> > > that ktest itself can run remotely already, did I?
> > 
> > I'm a little confused by this question. Do you want a server ktest? That
> > is, have a ktest daemon that listens for clients that sends it config
> > files and then runs them? That would actually be a fun project ;-)
> > 
> > You're not running ktest on the target machine are you? The way I use it
> > is the following:
> > 
> > I have a server that I ssh to and run ktest from. It does all the builds
> > there on the server and this server has a means to monitor some target.
> > I use ttywatch, which connects to the serial port of the target and which
> > ktest reads from.
> > 
> > Sometimes this "server" is the machine I'm logged in to.  And I just run
> > ktest directly.
> > 
> > Can you explain more of what you are looking for?
> 
> I want to be able to say:
>   - take this set of stable patches and go run a 'make
> allmodconfig' build on a remote machine and email me back the
> answer because I might not be able to keep an internet
> connection open for the next 5-15 minutes it might take to
> complete that task.

I cheat and run all my ktests in screen sessions ;-)

> 
> I don't do boot tests with these kernel build tests, although sometime
> in the future it would be nice to do that.  Right now I do that testing
> manually, as it's pretty infrequent (once per release usually.)
> 
> So yes, a 'ktest' server would be nice.  I've attached the (horrible)
> script below that I'm using for this so far.  It seems to work well, and
> I can do builds on a "cloud" server as well as my local build server
> just fine, only thing needed to do is change the user and machine name
> in the script.

This looks like my next "when I have time" project ;-).


> 
> I know ktest doesn't handle quilt patches yet, which is why I apply them
> "by hand" now to a given git tree branch, if you ever do add that
> option, I'll gladly test it out and change my script to use whatever
> format it needs.
> 

Yeah, I need to make ktest work with quilt, as I'm still a fan.

But currently the ones that pay me actually are giving me things to do.
Something about satisfying customers or some other crap. Thus, my "down
time" is limited at the moment :-(  But when things on the customer side
slow down again, I'll definitely work on these changes.

Thanks for the ideas! I'm actually looking forward to working on this.
But in the meantime, I will test the next time ktest fails on me to see
what the result of $? is.

-- Steve




re: Thermal: Update binding logic based on platform data

2012-09-25 Thread Dan Carpenter
Hello Durgadoss R,

This is a semi-automatic email about new static checker warnings.

The patch 9b70dfa68ae8: "Thermal: Update binding logic based on 
platform data" from Sep 18, 2012, leads to the following Smatch 
complaint:

drivers/thermal/thermal_sys.c:292 bind_tz()
 error: we previously assumed 'tzp' could be null (see line 283)

drivers/thermal/thermal_sys.c
   282  /* If there is no platform data, try to use ops->bind */
   283  if (!tzp && tz->ops->bind) {

New check.

   284  list_for_each_entry(pos, &thermal_cdev_list, node) {
   285  ret = tz->ops->bind(tz, pos);
   286  if (ret)
   287  print_bind_err_msg(tz, pos, ret);
   288  }
   289  goto exit;
   290  }
   291  
   292  if (!tzp->tbp)
 
New dereference.

   293  goto exit;
   294  
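
The pattern behind this warning can be shown with a small userspace sketch.
The structure and function names below are hypothetical stand-ins, not the
thermal code, and the second NULL check is only one possible way to
address the complaint, not necessarily the fix that should be applied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the pattern. */
struct zone_params {
	const void *tbp;
};

struct zone {
	const struct zone_params *tzp;	/* platform data, may be NULL */
	int has_bind;			/* models tz->ops->bind != NULL */
};

enum bind_path { BIND_VIA_OPS, BIND_VIA_TBP, BIND_NOTHING };

enum bind_path choose_bind_path(const struct zone *tz)
{
	assert(tz != NULL);

	/* No platform data: fall back to ops->bind if there is one. */
	if (!tz->tzp && tz->has_bind)
		return BIND_VIA_OPS;

	/*
	 * tzp can still be NULL here (no platform data and no bind
	 * callback), so it is checked again before tbp is read.
	 */
	if (!tz->tzp || !tz->tzp->tbp)
		return BIND_NOTHING;

	return BIND_VIA_TBP;
}
```

Without the added !tz->tzp test, the case "no platform data and no bind
callback" would reach the tbp read with a NULL pointer.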

There are also some locking bugs which need to be fixed as well.

drivers/thermal/thermal_sys.c:268 bind_cdev() warn: inconsistent returns
mutex:&thermal_list_lock: locked (256) unlocked (268)
drivers/thermal/thermal_sys.c:396 update_temperature() warn: inconsistent
returns mutex:&tz->lock: locked (390) unlocked (396)

regards,
dan carpenter




Re: [PATCH 2/2] memstick: add support for legacy memorysticks

2012-09-25 Thread Maxim Levitsky
On Tue, 2012-09-25 at 12:38 -0700, Tejun Heo wrote: 
> Hello, Maxim.
> 
> On Tue, Sep 25, 2012 at 09:26:13PM +0200, Maxim Levitsky wrote:
> > > Probably not the best idea to use a name this generic in driver code.
> > > linux/scatterlist.h likely might wanna use the name.
> >
> > Let's not go this route again. I already submitted these once, and had
> > my share of problems with merging these poor functions into the scatter
> > list.
> > Scatter list users mostly don't need these, as they just translate it into
> > a hardware-specific representation.
> > In my case, I don't, and yet it's easier than working with BIOs.
> 
> Hmmm... then please at least add a prefix to the names.
Will do! 
> 
> > > Also, from what it does, it seems sg_copy() is a bit of misnomer.
> > > Rename it to sg_remap_with_offset() or something and move it to
> > > lib/scatterlist.c?
> >
> > Don't think so. This copies part of a scatter list into another
> > scatterlist.
> > I have to use it, as the underlying memstick drivers expect a single
> > scatterlist for each 512 bytes sector I read. Yes, it contains just one
> > entry, but still. I haven't designed the interface. 
> 
> It doesn't really matter if it's a function only used in the driver,
> but please don't use sg_copy() as its name.
Sure! 
> 
> > > Maybe we can make sg_copy_buffer() more generic so that it takes a
> > > callback and implement this on top of it?  Having sglist manipulation
> > > scattered isn't too attractive.
> >
> > Again this is very specific to my driver. Usually nobody pokes the
> > scatterlists. 
> 
> The problem is that there are talks of improving sglist handling (make
> it more generic, unify it with bvec and so on) and this sort of one
> off direct manipulations often become headaches afterwards, so if at
> all possible it's best to keep stuff centralized.
If the sglist gets improved, I will be glad to update my driver to use
the new generic code.

Compared to my xD driver, which just works directly with pointers to 512
byte buffers (with the price of not utilizing highmem), the sglist (and
bios are even worse, I recall) gave me a lot of headaches :-(

> 
> > > Is it really necessary to implement explicit state machine?  Can't you
> > > just throw a work item at it and process it synchronously?  Explicit
> > > state machines can save some resources at the cost of a lot more
> > > complexity and generally making things a lot more fragile.  Is it
> > > really worth it here?
> >
> > It would be awesome to not use these ugly state machines, but I have to,
> > because this is required by the underlying driver interface.
> > It's callback driven, that is, the underlying driver calls into this driver
> > when it wants, and I supply it a new command to execute.
> > Unfortunately with legacy memsticks, to read or write a sector, one
> > needs to send it many different commands. That's why these state
> > machines. I at least made them more or less consistent.
> > Just take a look at mspro_blk.c
> 
> I see.  Eeeek...
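
For readers unfamiliar with the pattern, a callback-driven state machine of
the kind described above might be sketched as follows. All command and state
names here are hypothetical; the real legacy MemoryStick command sequence and
the memstick core interface differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical commands and states, for illustration only. */
enum cmd { CMD_NONE, CMD_WRITE_REGS, CMD_SET_READ, CMD_GET_INT, CMD_READ_DATA };
enum state { ST_WRITE_REGS, ST_SET_READ, ST_GET_INT, ST_READ_DATA, ST_DONE };

struct sector_read {
	enum state state;
};

/*
 * Called by the (modeled) host driver each time it is ready for the
 * next command. Returns CMD_NONE once the sector read has finished.
 */
enum cmd next_request(struct sector_read *rd)
{
	assert(rd != NULL);

	switch (rd->state) {
	case ST_WRITE_REGS:
		rd->state = ST_SET_READ;
		return CMD_WRITE_REGS;
	case ST_SET_READ:
		rd->state = ST_GET_INT;
		return CMD_SET_READ;
	case ST_GET_INT:
		rd->state = ST_READ_DATA;
		return CMD_GET_INT;
	case ST_READ_DATA:
		rd->state = ST_DONE;
		return CMD_READ_DATA;
	case ST_DONE:
	default:
		return CMD_NONE;
	}
}
```

Because the host driver decides when next_request() runs, the progress of a
single sector read has to live in the state field rather than in the local
flow of one synchronous function.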

I tried once to improve it, dunno if I did or made it even worse, it's a
long story. Anyway, it wasn't liked by its author.

Anyway, my goal now is to get this driver merged, as it's the last piece
of the puzzle to make my card reader perfect. It can now read/write
everything, and it's the only such reader supported by Linux.

To be honest, the Jmicron reader I have doesn't have a driver for the xD
part, and I need to stop being lazy and write a driver; I even have
docs for it.

Thanks for the review, I will address these comments soon,

Best regards,
Maxim Levitsky






Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:43PM +0800, Lai Jiangshan wrote:
> It is not possible to remove cascading. If the cascading code is
> not in flush_workqueue(), it must be somewhere else.

Yeah, sure, I liked that it didn't have to be done explicitly as a
separate step.

> If you force overflow to wait for freed color before do flush(which also
> force only one flusher for one color), and force the sole flush_workqueue()
> to grab ->flush_mutex twice, we can simplify the flush_workqueue().
> (see the attached patch, it remove 100 LOC, and the cascading code becomes
> only 3 LOC). But these two forcing slow down the caller a little.

Hmmm... so, that's a lot simpler.  flush_workqueue() isn't a super-hot
code path and I don't think grabbing mutex twice is too big a deal.  I
haven't actually reviewed the code but if it can be much simpler and
thus easier to understand and verify, I might go for that.

> (And if you allow to use SRCU(which is only TWO colors), you can remove
> another 150 LOC. flush_workqueue() will become single line. But it will
> add some more overhead in flush_workqueue() because SRCU's readsite is
> lockless)

I'm not really following how SRCU would factor into this but
supporting multiple colors was something explicitly requested by
Linus.  The initial implementation was a lot simpler which supported
only two colors.  Linus was worried that the high possibility of
flusher clustering could lead to chaining of latencies.

Thanks.

-- 
tejun


Re: [PATCH V2 1/1] perf, Add support for Xeon-Phi PMU

2012-09-25 Thread Cyrill Gorcunov
On Tue, Sep 25, 2012 at 12:23:23PM -0400, Vince Weaver wrote:
> Hello
> 
> This is an updated version of the patch.  It uses
> ARCH_PERFMON_EVENTSEL_INT for the DATA_READ event, with the assumption
> that x86_pmu_hw_config() is going to set that bit anyway.  This lets
> the code get through the test for an event being 0 without
> triggering -ENOENT.

FWIW, looks good to me, thanks Vince!


Re: [PATCH 03/10] msm: iommu: Convert to clk_prepare/unprepare

2012-09-25 Thread Stephen Boyd
On 09/24/12 15:32, Saravana Kannan wrote:
>> @@ -275,8 +275,11 @@ static int msm_iommu_remove(struct
>> platform_device *pdev)
>>
>>   drv = platform_get_drvdata(pdev);
>>   if (drv) {
>> -if (drv->clk)
>> +if (drv->clk) {
>> +clk_unprepare(drv->clk);
>>   clk_put(drv->clk);
>> +}
>> +clk_unprepare(drv->pclk);
>
>
> Are these changes right? Every other clk API change in this patch is
> using the combined prepare_enable/disable_unprepare() calls. So, when
> would we end up at this location with the clocks prepared but not
> enabled?
>
> Also, what if the device gets probed and then immediately removed.
> Will it work correctly?
>

It should work correctly. If you look at the bottom of msm_iommu_probe()
you see that it calls clk_disable() and doesn't unprepare the clock. So
if the driver is unbound the clocks should be disabled but still prepared.
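
The reasoning can be checked with a small userspace model of the two
reference counts. This is a sketch only: the model_* names are hypothetical
and this is not the kernel clk API, just the bookkeeping it implies, with
probe assumed to prepare and enable the clock before disabling it at the end.

```c
#include <assert.h>

/* Hypothetical model of a clock's prepare and enable counts. */
struct model_clk {
	int prepare_count;
	int enable_count;
};

void model_prepare(struct model_clk *c)
{
	c->prepare_count++;
}

void model_enable(struct model_clk *c)
{
	assert(c->prepare_count > 0);	/* must be prepared first */
	c->enable_count++;
}

void model_disable(struct model_clk *c)
{
	assert(c->enable_count > 0);
	c->enable_count--;
}

void model_unprepare(struct model_clk *c)
{
	assert(c->prepare_count > 0);
	c->prepare_count--;
}

/* Probe: prepare and enable, program the hardware, then only disable. */
void model_probe(struct model_clk *c)
{
	model_prepare(c);
	model_enable(c);
	model_disable(c);
}

/* Remove: the clock is disabled but still prepared, so unprepare it. */
void model_remove(struct model_clk *c)
{
	model_unprepare(c);
}
```

In this model a probe followed immediately by a remove leaves both counts at
zero, which is the balance the patch relies on.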

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation



Re: [PATCH 00/10] workqueue: restructure flush_workqueue() and start all flusher at the same time

2012-09-25 Thread Tejun Heo
Hello, Lai.

On Tue, Sep 25, 2012 at 05:02:31PM +0800, Lai Jiangshan wrote:
> I found the flush_workqueue() is not natural for me, especially

I don't think it's natural for anybody.  I'm not a big fan of that
code either.

> the usage of the colors and flush_workqueue_prep_cwqs().
> So I tried to improve it without changing too many things or too much
> behavior.
> 
> (This patchset delays other simple patches; I think I should
> send the simple patches first.)

Yes, please do so.

> I always check the code by reviewing it hard. I always try to imagine
> many threads running out of order in my brain, and I write all possible
> transitions on paper. This process is the most important; no test can
> replace it.
> 
> The human brain can be wrong; the attached patch is my testing code.
> It verifies flush_workqueue() by cookie number.

Sure, nothing beats careful reviews but then again it's difficult to
have any level of confidence without actually exercising and
verifying each possible code path.  For tricky changes, it helps a lot
if you describe how the code was verified and why and how much you
feel confident about the change.

> I also need your testing code for workqueue. ^_^

Heh, you asked for it.  Attached.  It's a scenario based thing and I
use different scenarios usually in combination with some debug printks
to verify things are behaving as I think they should.

Thanks.

-- 
tejun
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define MAX_WQ_NAME		64
#define MAX_WQS			64
#define MAX_WORKS		64

struct wq_spec {
	int			id;	/* -1 terminates */
	unsigned int		max_active;
	unsigned int		flags;
};

enum action {
	ACT_TERM,			/* end */
	ACT_LOG,			/* const char * */
	ACT_BURN,			/* ulong duration_msecs */
	ACT_SLEEP,			/* ulong duration_msecs */
	ACT_WAKEUP,			/* ulong work_id */
	ACT_REQUEUE,			/* ulong delay_msecs, ulong cpu */
	ACT_FLUSH,			/* ulong work_id */
	ACT_FLUSH_WQ,			/* ulong workqueue_id */
	ACT_CANCEL,			/* ulong work_id */
};

struct work_action {
	enum action		action;	/* ACT_TERM terminates */
	union {
		unsigned long	v;
		const char	*s;
	};
	unsigned long		v1;
};

struct work_spec {
	int			id;		/* -1 terminates */
	int			wq_id;
	int			requeue_cnt;
	unsigned int		cpu;
	unsigned long		initial_delay;	/* msecs */

	const struct work_action *actions;
};

struct test_scenario {
	const struct wq_spec	*wq_spec;
	const struct work_spec	**work_spec;	/* NULL terminated */
};

static const struct wq_spec dfl_wq_spec[] = {
	{
		.id		= 0,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 1,
		.max_active	= 32,
		.flags		= 0,
	},
	{
		.id		= 2,
		.max_active	= 32,
		.flags		= WQ_RESCUER,
	},
	{
		.id		= 3,
		.max_active	= 32,
		.flags		= WQ_FREEZABLE,
	},
	{
		.id		= 4,
		.max_active	= 1,
		.flags		= WQ_UNBOUND | WQ_FREEZABLE/* | WQ_DBG*/,
	},
	{
		.id		= 5,
		.max_active	= 32,
		.flags		= WQ_NON_REENTRANT,
	},
	{
		.id		= 6,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI,
	},
	{
		.id		= 7,
		.max_active	= 4,
		.flags		= WQ_CPU_INTENSIVE,
	},
	{
		.id		= 8,
		.max_active	= 4,
		.flags		= WQ_HIGHPRI | WQ_CPU_INTENSIVE,
	},
	{ .id = -1 },
};

/*
 * Scenario 0.  All are on cpu0.  work16 and 17 burn cpus for 10 and
 * 5msecs respectively and requeue themselves.  18 sleeps 2 secs and
 * cancel both.
 */
static const struct work_spec work_spec0[] = {
	{
		.id		= 16,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 10 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 17,
		.requeue_cnt	= 1024,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_REQUEUE,	{ 0 }, NR_CPUS },
			{ ACT_TERM },
		},
	},
	{
		.id		= 18,
		.actions	= (const struct work_action[]) {
			{ ACT_LOG,	{ .s = "will sleep 2s and cancel both" }},
			{ ACT_SLEEP,	{ 2000 }},
			{ ACT_CANCEL,	{ 16 }},
			{ ACT_CANCEL,	{ 17 }},
			{ ACT_TERM },
		},
	},
	{ .id = -1 },
};

static const struct test_scenario scenario0 = {
	.wq_spec		= dfl_wq_spec,
	.work_spec		=
	(const struct work_spec *[]) { work_spec0, NULL },
};

/*
 * Scenario 1.  All are on cpu0.  Work 0, 1 and 2 sleep for different
 * intervals but all three will terminate at around 30secs.  3 starts
 * at @28 and 4 at @33 and both sleep for five secs and then
 * terminate.  5 waits for 0, 1, 2 and then flush wq which by the time
 * should have 3 on it.  After 3 completes @32, 5 terminates too.
 * After 4 secs, 4 terminates and all test sequence is done.
 */
static const struct work_spec work_spec1[] = {
	{
		.id		= 0,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 3 }},	/* to cause sched activation */
			{ ACT_LOG,	{ .s = "will sleep 30s" }},
			{ ACT_SLEEP,	{ 3 }},
			{ ACT_TERM },
		},
	},
	{
		.id		= 1,
		.actions	= (const struct work_action[]) {
			{ ACT_BURN,	{ 5 }},
			{ ACT_LOG,	{ .s = "will sleep 10s and burn 5msec and repeat 3 times" }},
			{ ACT_SLEEP,	{ 1 }},
			{ ACT_BURN,	{ 5 }},
			{ ACT_LOG,	{ .s = "@10s" }},
			{ ACT_SLEEP,	{ 1 }},
			{ ACT_BURN,	{ 5 }},
		

Re: [PATCH v2 memstick: support for legacy sony memsticks

2012-09-25 Thread Maxim Levitsky
On Tue, 2012-09-25 at 12:40 -0700, Tejun Heo wrote: 
> On Tue, Sep 25, 2012 at 09:34:39PM +0200, Maxim Levitsky wrote:
> > But this just adds the WQ_UNBOUND. Dunno, without the lock I had several
> > crashes that, with a high level of confidence, were caused by parallel
> > execution of work items. Once I added this mutex, I couldn't reproduce
> > these.
> 
> Yes the combination of WQ_UNBOUND and max_active==1 guarantees
> strictly ordered one-by-one execution.
> 
> > I had the __blk_end_request fail with NULL msb->req. I can't see how
> > that can happen if the work queue isn't executed in parallel.
> > (And back then I didn't even have, by mistake, the code that sets it to
> > NULL in msb_stop, so I really fail to see how that could happen due to an
> > internal bug in my code.)
> 
> If you're seeing parallel execution w/ ordered workqueue, it is a
> critical bug which would make the kernel crash left and right.  Please
> try alloc_ordered_workqueue() and if you still see parallel execution,
> please report.
I will test this very soon. Good to know, I am pretty sure, it will
work.

> 
> Thanks.
> 

-- 
Best regards,
Maxim Levitsky



Re: Slow Resume with SSD

2012-09-25 Thread Marcos Souza
Hi Carlos

2012/9/25 Carlos Moffat :
> Hi
>
> On 09/25/2012 12:07 PM, Srivatsa S. Bhat wrote:
>>
>> On 09/26/2012 12:00 AM, Carlos Moffat wrote:
>>>
>>> Hi,
>>>
>>> (please let me know if this is the wrong list to ask this)
>>>
>>> I have a Crucial M4 512 GB SSD installed on my Thinkpad X220 (Ubuntu
>>> Precise). Overall this runs very nicely, but it takes 10+ seconds to
>>> resume from suspend, apparently because of some issue with the hard drive.
>>> The only message I see while resuming is "COMRESET failed (errno=-16)".
>>>
>>> [52483.228615] ata1: link is slow to respond, please be patient (ready=0)
>>> [52487.870616] ata1: COMRESET failed (errno=-16)
>>> [52488.190222] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>>> [52488.190752] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
>>> succeeded
>>> [52488.190754] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
>>> LOCK) filtered out
>>> [52488.190755] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>>> filtered out
>>> [52488.191849] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
>>> succeeded
>>> [52488.191855] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
>>> LOCK) filtered out
>>> [52488.191860] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
>>> filtered out
>>> [52488.192406] ata1.00: configured for UDMA/100
>>> [52488.206298] sd 0:0:0:0: [sda] Starting disk
>>> [52488.207334] Extended CMOS year: 2000
>>> [52488.208335] PM: resume of devices complete after 10376.896 msecs
>>> [52488.208552] PM: resume devices took 10.376 seconds
>>>
>>> The only relevant post I've found was in the crucial support site:
>>>
>>>
>>> http://forums.crucial.com/t5/Solid-State-Drives-SSD/SOLVED-M4-CT512M4SSD1-7mm-512Gb-SSD-too-slow-when-laptop-wakes/td-p/102666
>>>
>>>
>>> which suggested adding libata.force=nohrst as a boot option to get rid
>>> of the problem.
>>>
>>> I tried that, but the laptop wouldn't suspend.
>>>
>>> Any ideas?
>>>
>>
>> (Adding relevant people to CC)
>>
>> I recall seeing a similar problem getting fixed in mainline quite a long
>> time ago (around v3.3 I think). Did you try the latest mainline kernel?
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>
>
> Yes, I'm using 3.5.4 (Ubuntu Mainline packages).

I believe that the kernel that Ubuntu uses has some patches from
them, am I right?

Can you try the latest stable (vanilla) kernel from mainline? Maybe
by downloading it from kernel.org and compiling it.

This can help us to track the error.

Thanks!

> Carlos
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Att,

Marcos Paulo de Souza
Acadêmico de Ciencia da Computação - FURB - SC
Github: https://github.com/marcosps/
"Uma vida sem desafios é uma vida sem razão"
"A life without challenges, is a non reason life"


Re: [PATCH 8/9] mm: compaction: Cache if a pageblock was scanned and no pages were isolated

2012-09-25 Thread Andrew Morton
On Tue, 25 Sep 2012 10:12:07 +0100
Mel Gorman  wrote:

> First, we'd introduce a variant of get_pageblock_migratetype() that returns
> all the bits for the pageblock flags and then helpers to extract either the
> migratetype or the PG_migrate_skip. We already are incurring the cost of
> get_pageblock_migratetype() so it will not be much more expensive than what
> is already there. If there is an allocation or free within a pageblock that
> has the PG_migrate_skip bit set then we increment a counter. When the counter
> reaches some to-be-decided "threshold" then compaction may clear all the
> bits. This would match the criteria of the clearing being based on activity.
> 
> There are four potential problems with this
> 
> 1. The logic to retrieve all the bits and split them up will be a little
>    convoluted but maybe it would not be that bad.
> 
> 2. The counter is a shared-writable cache line but obviously it could
>    be moved to vmstat and incremented with inc_zone_page_state to offset
>    the cost a little.
> 
> 3. The biggest weakness is that there is no way to know if the
>    counter is incremented based on activity in a small subset of blocks.
> 
> 4. What should the threshold be?
> 
> The first problem is minor but the other three are potentially a mess.
> Adding another vmstat counter is bad enough in itself but if the counter
> is incremented based on a small subset of pageblocks, the hint becomes
> potentially useless.
> 
> However, does this match what you have in mind or am I over-complicating
> things?

Sounds complicated.

Using wall time really does suck.  Are you sure you can't think of
something more logical?

How would we demonstrate the suckage?  What would be the observable downside of
switching that 5 seconds to 5 hours?

> > > > > + for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
> > > > > + struct page *page;
> > > > > + if (!pfn_valid(pfn))
> > > > > + continue;
> > > > > +
> > > > > + page = pfn_to_page(pfn);
> > > > > + if (zone != page_zone(page))
> > > > > + continue;
> > > > > +
> > > > > + clear_pageblock_skip(page);
> > > > > + }
> > > > 
> > > > What's the worst-case loop count here?
> > > > 
> > > 
> > > zone->spanned_pages >> pageblock_order
> > 
> > What's the worst-case value of (zone->spanned_pages >> pageblock_order) :)
> 
> Let's take an unlikely case - 128G single-node machine. That loop count
> on x86-64 would be 65536. It'll be fast enough, particularly in this
> path.

That could easily exceed a millisecond.  Can/should we stick a
cond_resched() in there?
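
The 65536 figure quoted above can be reproduced with a short calculation.
The sketch assumes 4 KiB pages and a pageblock_order of 9 (2 MiB pageblocks),
which are typical x86-64 values but are not stated in the thread; the MODEL_*
constants and the function name are hypothetical.

```c
#include <assert.h>

/* Assumed values, not taken from the thread: 4 KiB pages and
 * 2 MiB pageblocks (pageblock_order 9). */
#define MODEL_PAGE_SHIFT	12
#define MODEL_PAGEBLOCK_ORDER	9

/* Worst-case iterations of the loop: spanned_pages >> pageblock_order. */
unsigned long pageblock_loop_count(unsigned long long zone_bytes)
{
	unsigned long long spanned_pages;

	assert(zone_bytes != 0);

	spanned_pages = zone_bytes >> MODEL_PAGE_SHIFT;
	return (unsigned long)(spanned_pages >> MODEL_PAGEBLOCK_ORDER);
}
```

For a single 128 GiB zone, pageblock_loop_count(128ULL << 30) gives 65536
under these assumptions, one iteration per 2 MiB pageblock.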


Re: [PATCH 5/6] ARM: dove: Remove watchdog from DT

2012-09-25 Thread Arnd Bergmann
On Tuesday 25 September 2012, sebastien requiem wrote:
> On Tue, Sep 25, 2012 at 2:33 PM, Arnd Bergmann  wrote:
> > On Tuesday 25 September 2012, Arnd Bergmann wrote:
> >> On Tuesday 25 September 2012, Andrew Lunn wrote:
> >> > > Regarding mv78xx0, I agree that I'm not sure what to do. The number of
> >> > > supported platforms is small. Should we simply mark mv78xx0 deprecated
> >> > > now, wait a few release cycles to see if anyone shows up, and see what
> >> > > to do at this point?
> >>
> >> We should let Sebastien Requiem comment. He is the only person outside of
> >> Marvell who has contributed a board file for mv78xx0. If he's interested in
> >> keeping it alive, he's hopefully also able to find the time to test the
> >> devicetree version of that platform in mach-mvebu. Similarly, if anyone
> >> has the MASA reference design, that one could be moved over to mach-mvebu
> >> first.
> >>
> >> There is a much smaller user base for mv78xx0 than for orion5x, so as long
> >> as we can keep the support working with DT, we can throw out the legacy
> >> code much faster than for orion. If it doesn't get put into mach-mvebu
> >> and you can't find anyone who has hardware to test on, you could also
> >> stop maintaining it and leave it to bitrot, but I wouldn't just remove it
> >> on a fast track then.
> >
> > The address I used for Sebastien appears to be dead. Maybe this one still
> > works.
> 
> Yes, this one works. Sorry for not having updated my email address.
> 
> I would be happy to convert the mv78xx0 platform to DT (and also do
> the monkey work). My knowledge is quite limited regarding the recent changes
> but I am sure that some of you could help me in the process.
> 
> Moreover, I still have a board to test at home.

Ok, excellent!

If you want to start looking into things, I suggest you follow the
examples from the mach-dove directory, which is similar to mv78xx0 in
that it also has only a small number of boards that are supported,
and we can convert them all at the same time, rather than supporting
both methods in parallel as we do for orion5x and kirkwood.

The basic idea is to start with a DT_MACHINE_START section that will
end up being used for all machines and just initializes all the
devices that you have on your machine, but also calls of_platform_populate.
Then you can gradually move over one device at a time from being
statically initialized to being added to a board description in
arch/arm/boot/dts/*.dts.

One thing that seems to be special about mv78xx0 (though not the wxl
in particular) is that we can have Linux running in two instances on
either core of the machine and just give it a few of the devices.
I think this can be handled nicely with DT by having a .dtsi include
file that actually describes all of the machine but marks most of the
devices as disabled, and then have different .dts files including
the main file and selectively enabling the parts that are used there.

Arnd


Re: [PATCH] ARM: ux500: Move regulator-name properties out to board DTS files

2012-09-25 Thread Linus Walleij
On Tue, Sep 25, 2012 at 1:37 PM, Lee Jones  wrote:

> Regulator supply names should be allocated by board rather than
> per SoC, as the same SoC could be wired differently on varying
> hardware. Here we push all regulator-name allocation out to the
> dbx5x0 subordinate board files; HREF and Snowball.
>
> Requested-by: Mark Brown 
> Signed-off-by: Lee Jones 

Acked-by: Linus Walleij 

Thanks,
Linus Walleij


Re: [patch slab/next] mm, slob: fix build breakage in __kmalloc_node_track_caller

2012-09-25 Thread Ezequiel Garcia
On Tue, Sep 25, 2012 at 4:53 PM, David Rientjes  wrote:
> On Sat, 8 Sep 2012, Ezequiel Garcia wrote:
>
>> @@ -454,15 +455,35 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
>>   gfp |= __GFP_COMP;
>>   ret = slob_new_pages(gfp, order, node);
>>
>> - trace_kmalloc_node(_RET_IP_, ret,
>> + trace_kmalloc_node(caller, ret,
>>  size, PAGE_SIZE << order, gfp, node);
>>   }
>>
>>   kmemleak_alloc(ret, size, 1, gfp);
>>   return ret;
>>  }
>> +
>> +void *__kmalloc_node(size_t size, gfp_t gfp, int node)
>> +{
>> + return __do_kmalloc_node(size, gfp, node, _RET_IP_);
>> +}
>>  EXPORT_SYMBOL(__kmalloc_node);
>>
>> +#ifdef CONFIG_TRACING
>> +void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
>> +{
>> + return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
>> +}
>> +
>> +#ifdef CONFIG_NUMA
>> +void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
>> + int node, unsigned long caller)
>> +{
>> + return __do_kmalloc_node(size, gfp, node, caller);
>> +}
>> +#endif
>
> This breaks Pekka's slab/next tree with this:
>
> mm/slob.c: In function '__kmalloc_node_track_caller':
> mm/slob.c:488: error: 'gfp' undeclared (first use in this function)
> mm/slob.c:488: error: (Each undeclared identifier is reported only once
> mm/slob.c:488: error: for each function it appears in.)
>
>
> mm, slob: fix build breakage in __kmalloc_node_track_caller
>
> "mm, slob: Add support for kmalloc_track_caller()" breaks the build
> because gfp is undeclared.  Fix it.
>
> Signed-off-by: David Rientjes 
> ---
>  mm/slob.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slob.c b/mm/slob.c
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -482,7 +482,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
>  }
>
>  #ifdef CONFIG_NUMA
> -void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> +void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
> int node, unsigned long caller)
>  {
> return __do_kmalloc_node(size, gfp, node, caller);

Acked-by: Ezequiel Garcia 

Thanks,
Ezequiel.


[patch slab/next] mm, slob: fix build breakage in __kmalloc_node_track_caller

2012-09-25 Thread David Rientjes
On Sat, 8 Sep 2012, Ezequiel Garcia wrote:

> @@ -454,15 +455,35 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
>   gfp |= __GFP_COMP;
>   ret = slob_new_pages(gfp, order, node);
>  
> - trace_kmalloc_node(_RET_IP_, ret,
> + trace_kmalloc_node(caller, ret,
>  size, PAGE_SIZE << order, gfp, node);
>   }
>  
>   kmemleak_alloc(ret, size, 1, gfp);
>   return ret;
>  }
> +
> +void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> +{
> + return __do_kmalloc_node(size, gfp, node, _RET_IP_);
> +}
>  EXPORT_SYMBOL(__kmalloc_node);
>  
> +#ifdef CONFIG_TRACING
> +void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, gfp, NUMA_NO_NODE, caller);
> +}
> +
> +#ifdef CONFIG_NUMA
> +void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> + int node, unsigned long caller)
> +{
> + return __do_kmalloc_node(size, gfp, node, caller);
> +}
> +#endif

This breaks Pekka's slab/next tree with this:

mm/slob.c: In function '__kmalloc_node_track_caller':
mm/slob.c:488: error: 'gfp' undeclared (first use in this function)
mm/slob.c:488: error: (Each undeclared identifier is reported only once
mm/slob.c:488: error: for each function it appears in.)


mm, slob: fix build breakage in __kmalloc_node_track_caller

"mm, slob: Add support for kmalloc_track_caller()" breaks the build 
because gfp is undeclared.  Fix it.

Signed-off-by: David Rientjes 
---
 mm/slob.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slob.c b/mm/slob.c
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -482,7 +482,7 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfp, unsigned long caller)
 }
 
 #ifdef CONFIG_NUMA
-void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
+void *__kmalloc_node_track_caller(size_t size, gfp_t gfp,
int node, unsigned long caller)
 {
return __do_kmalloc_node(size, gfp, node, caller);
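
The pattern behind this series (one common helper that takes an explicit caller address, with thin wrappers in front of it) can be sketched in userland C. This is only an illustration, not the slob code: do_alloc(), my_alloc(), my_alloc_track_caller() and last_caller are hypothetical names, malloc() stands in for the slob allocator, and __builtin_return_address(0) is the GCC/Clang builtin that the kernel's _RET_IP_ macro expands to. The build break fixed above is the case where a wrapper's parameter name does not match the name its body forwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tracepoint: remembers the last caller. */
static void *last_caller;

/* Common helper: every entry point funnels here with an explicit caller. */
static void *do_alloc(size_t size, void *caller)
{
	assert(size > 0);
	last_caller = caller;
	return malloc(size);
}

/* Plain entry point: records its own return address (GCC/Clang builtin). */
static __attribute__((noinline)) void *my_alloc(size_t size)
{
	return do_alloc(size, __builtin_return_address(0));
}

/* Tracking entry point: records whatever address the caller passes in.
 * The parameter names here must match what the body forwards. */
static void *my_alloc_track_caller(size_t size, void *caller)
{
	return do_alloc(size, caller);
}
```

A wrapper layer that allocates on behalf of someone else would call my_alloc_track_caller() with its own caller's address, so the trace points at the real user rather than at the wrapper.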


Re: [PATCH v4 0/8] Avoid cache trashing on clearing huge/gigantic page

2012-09-25 Thread Andrea Arcangeli
Hi Kirill,

On Tue, Sep 25, 2012 at 05:27:03PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 14, 2012 at 07:52:10AM +0200, Ingo Molnar wrote:
> > Without repeatable hard numbers such code just gets into the 
> > kernel and bitrots there as new CPU generations come in - a few 
> > years down the line the original decisions often degrade to pure 
> > noise. We've been there, we've done that, we don't want to 
> > repeat it.
> 
> 
> 
> Hard numbers are hard.
> I've checked some workloads: Mosbench, NPB, specjvm2008. Most of time the
> patchset doesn't show any difference (within run-to-run deviation).
> On NPB it recovers THP regression, but it's probably not enough to make
> decision.
> 
> It would be nice if somebody test the patchset on other system or
> workload. Especially, if the configuration shows regression with
> THP enabled.

If the only workload that gets a benefit is NPB, then we have proof
that this is too hardware dependent to be a conclusive result.

It may have been slower by accident: things like cache associativity
being off by one bit, combined with the implicit coloring provided to
the lowest 512 colors, could hurt more if the cache associativity is
low.

I'm saying this because NPB on a thinkpad (Intel CPU I assume) is the
benchmark that shows the most benefit among all benchmarks run on that
hardware.

http://www.phoronix.com/scan.php?page=article&item=linux_transparent_hugepages&num=2

I've once seen certain computations that run much slower with perfect
cache coloring, while most others run much faster with page coloring.
That doesn't mean page coloring is bad per se. So NPB on that specific
hardware may have been the exception and not the interesting case,
especially considering that the effect of cache-copying is the opposite
on slightly different hw.

I think the static_key should be off by default whenever the CPU
L2 cache size is >= the size of the copy (2*HPAGE_PMD_SIZE). Now the
cache does random replacement so maybe we could also allow cache
copies for twice the size of the copy (L2size >=
4*HPAGE_PMD_SIZE). Current CPUs have caches much larger than 2*2MB...
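
As a rough sketch of that threshold (not code from the patchset): the helper name want_cache_avoiding_copy() and its factor parameter are made up for illustration, and HPAGE_PMD_SIZE is hard-coded to the 2MB x86-64 value here as an assumption. factor = 1 models the "L2 >= 2*HPAGE_PMD_SIZE" rule, factor = 2 the more permissive "L2 >= 4*HPAGE_PMD_SIZE" variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 MiB huge pages, as on x86-64. */
#define HPAGE_PMD_SIZE ((size_t)2 << 20)

/*
 * Returns true when the cache-avoiding path looks worthwhile, i.e. when
 * L2 is smaller than `factor` times the size of the copy (source plus
 * destination, 2 * HPAGE_PMD_SIZE). Returns false when the cache is
 * large enough that the key should stay off.
 */
static bool want_cache_avoiding_copy(size_t l2_bytes, unsigned int factor)
{
	assert(factor >= 1);
	return l2_bytes < (size_t)factor * 2 * HPAGE_PMD_SIZE;
}
```

With a 6MB L2, factor 1 leaves the key off (6MB >= 4MB) while factor 2 turns it on (6MB < 8MB), which is the gap between the two rules.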

It would make a whole lot more sense for hugetlbfs giga pages than for
THP (unlike for THP, cache trashing with giga pages is guaranteed),
but even giga pages are not allocated frequently (maybe once per OS
reboot), so the gain is surely lost in the noise there too, as it only
saves a few accesses after the cache copy is finished.

It's good to have tested it though.

Thanks,
Andrea


Re: [PATCH/resend/bypass] um: Preinclude include/linux/kern_levels.h

2012-09-25 Thread Al Viro
On Tue, Sep 25, 2012 at 12:20:55PM -0700, Linus Torvalds wrote:
> IOW, this part of the patch:
> 
> -   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include user.h $(CFLAGS_$(basetarget).o)
> +   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include $(srctree)/include/linux/kern_levels.h -include user.h $(CFLAGS_$(basetarget).o)
> 
> just makes me go want to puke. The user.h file already has other
> #include's in it, so I really don't see why you create this insane
> special case.
> 
> And why does UM have those "UM_KERN_XYZ" defines in the first place?
> Why isn't it just using KERN_XYZ directly? Is it because kern_levels.h
> didn't use to exist, so it was some kind of "let's create our own that
> we can hide in our special headers".

Because user.h is included *without* kernel headers in include path.
It's for the stuff that is compiled with host libc headers.  Keep in
mind that UML talks to libc like normal architecture would talk to
hardware.  IOW, analogs of asm glue are in (host) userland C.  And
they need libc headers instead of the kernel ones.  That's what that
USER_OBJ thing is about.  Kernel-side constants, etc. are delivered
to that sucker using the same mechanism we normally use to give them
to assembler - asm-offsets.c.  And here, of course, slapping ifndef
__ASSEMBLER__ around the tricky bits will not work - the header itself
is just fine, but getting kernel headers in the search path really
isn't.

I agree that the proposed solution is ugly.  What we can do is copy
the damn header into include/generated and #include 
from user.h.  And kill UM_KERN_... stuff.  Objections?


Re: 3.6-rc7 boot crash + bisection

2012-09-25 Thread Alex Williamson
On Tue, 2012-09-25 at 20:54 +0200, Florian Dazinger wrote:
> Am Tue, 25 Sep 2012 12:32:50 -0600
> schrieb Alex Williamson :
> 
> > On Mon, 2012-09-24 at 21:03 +0200, Florian Dazinger wrote:
> > > Hi,
> > > I think I've found a regression, which causes an early boot crash, I
> > > appended the kernel output via jpg file, since I do not have a serial
> > > console or sth.
> > > 
> > > after bisection, it boils down to this commit:
> > > 
> > > 9dcd61303af862c279df86aa97fde7ce371be774 is the first bad commit
> > > commit 9dcd61303af862c279df86aa97fde7ce371be774
> > > Author: Alex Williamson 
> > > Date:   Wed May 30 14:19:07 2012 -0600
> > > 
> > > amd_iommu: Support IOMMU groups
> > > 
> > > Add IOMMU group support to AMD-Vi device init and uninit code.
> > > Existing notifiers make sure this gets called for each device.
> > > 
> > > Signed-off-by: Alex Williamson 
> > > Signed-off-by: Joerg Roedel 
> > > 
> > > :04 04 2f6b1b8e104d6dfec0abaa9646750f9b5a4f4060
> > > 837ae95e84f6d3553457c4df595a9caa56843c03 M  drivers
> > 
> > [switching back to mailing list thread]
> > 
> > I asked Florian for dmesg w/ amd_iommu_dump, here's the relevant lines:
> > 
> > [1.485645] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
> > [1.485683] AMD-Vi:mmio-addr: feb2
> > [1.485901] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
> > [1.485935] AMD-Vi:   DEV_RANGE_END   devid: 00:00.2
> > [1.485969] AMD-Vi:   DEV_SELECT  devid: 00:02.0 flags: 00
> > [1.486002] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
> > [1.486036] AMD-Vi:   DEV_RANGE_END   devid: 01:00.1
> > [1.486070] AMD-Vi:   DEV_SELECT  devid: 00:04.0 flags: 00
> > [1.486103] AMD-Vi:   DEV_SELECT  devid: 02:00.0 flags: 00
> > [1.486137] AMD-Vi:   DEV_SELECT  devid: 00:05.0 flags: 00
> > [1.486170] AMD-Vi:   DEV_SELECT  devid: 03:00.0 flags: 00
> > [1.486204] AMD-Vi:   DEV_SELECT  devid: 00:06.0 flags: 00
> > [1.486238] AMD-Vi:   DEV_SELECT  devid: 04:00.0 flags: 00
> > [1.486271] AMD-Vi:   DEV_SELECT  devid: 00:07.0 flags: 00
> > [1.486305] AMD-Vi:   DEV_SELECT  devid: 05:00.0 flags: 00
> > [1.486338] AMD-Vi:   DEV_SELECT  devid: 00:09.0 flags: 00
> > [1.486372] AMD-Vi:   DEV_SELECT  devid: 06:00.0 flags: 00
> > [1.486406] AMD-Vi:   DEV_SELECT  devid: 00:0b.0 flags: 00
> > [1.486439] AMD-Vi:   DEV_SELECT  devid: 07:00.0 flags: 00
> > [1.486473] AMD-Vi:   DEV_ALIAS_RANGE devid: 08:01.0 flags: 00 devid_to: 08:00.0
> > [1.486510] AMD-Vi:   DEV_RANGE_END   devid: 08:1f.7
> > [1.486548] AMD-Vi:   DEV_SELECT  devid: 00:11.0 flags: 00
> > [1.486581] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:12.0 flags: 00
> > [1.486620] AMD-Vi:   DEV_RANGE_END   devid: 00:12.2
> > [1.486654] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:13.0 flags: 00
> > [1.486688] AMD-Vi:   DEV_RANGE_END   devid: 00:13.2
> > [1.486721] AMD-Vi:   DEV_SELECT  devid: 00:14.0 flags: d7
> > [1.486755] AMD-Vi:   DEV_SELECT  devid: 00:14.3 flags: 00
> > [1.486788] AMD-Vi:   DEV_SELECT  devid: 00:14.4 flags: 00
> > [1.486822] AMD-Vi:   DEV_ALIAS_RANGE devid: 09:00.0 flags: 00 devid_to: 00:14.4
> > [1.486859] AMD-Vi:   DEV_RANGE_END   devid: 09:1f.7
> > [1.486897] AMD-Vi:   DEV_SELECT  devid: 00:14.5 flags: 00
> > [1.486931] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:16.0 flags: 00
> > [1.486965] AMD-Vi:   DEV_RANGE_END   devid: 00:16.2
> > [1.487055] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
> > 
> > 
> > > lspci:
> > > 00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > > 00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
> > > 00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
> > > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D)
> > > 00:05.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port E)
> > > 00:06.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port F)
> > > 00:07.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port G)
> > > 

Re: mmotm 2012-09-20-17-25 uploaded (fs/bimfmt_elf on uml)

2012-09-25 Thread David Rientjes
On Sat, 22 Sep 2012, Stephen Rothwell wrote:

> > on uml for x86_64 defconfig:
> > 
> > fs/binfmt_elf.c: In function 'fill_files_note':
> > fs/binfmt_elf.c:1419:2: error: implicit declaration of function 'vmalloc'
> > fs/binfmt_elf.c:1419:7: warning: assignment makes pointer from integer without a cast
> > fs/binfmt_elf.c:1437:5: error: implicit declaration of function 'vfree'
> 
> reported in linux-next (offending patch reverted for other
> problems).
> 

This still happens on x86_64 for linux-next as of today's tree.


Re: [3.5.4] rcu_sched self-detected stall on CPU { 1} (t=54862991 jiffies)

2012-09-25 Thread Greg KH
On Tue, Sep 25, 2012 at 07:04:19PM +0200, Paweł Sikora wrote:
> On Tuesday 25 of September 2012 09:44:54 Greg KH wrote:
> > On Tue, Sep 25, 2012 at 06:31:36PM +0200, Paweł Sikora wrote:
> > > On Monday 24 of September 2012 10:36:33 Greg KH wrote:
> > > > On Mon, Sep 24, 2012 at 10:05:23AM +0200, Paweł Sikora wrote:
> > > > > Hi,
> > > > > 
> > > > > with the new stable line i'm observing strange locks on my old
> > > > > amd-phenom-II mini-server.
> > > > > here's a dmesg:
> > > > 
> > > > Did this show up in 3.5.3?  If not, can you run 'git bisect' to find the
> > > > problem patch?
> > > 
> > > heh, the old good kernel put some light on this issue.
> > > 
> > > Sep 25 08:50:24 nexus kernel: [60330.301639] Clocksource tsc unstable (delta = -474690884 ns)
> > > Sep 25 08:50:24 nexus kernel: [60330.325477] ------------[ cut here ]------------
> > > Sep 25 08:50:24 nexus kernel: [60330.325484] WARNING: at /home/users/builder/rpm/BUILD/kernel-2.6.37.6/linux-2.6.37/net/sched/sch_generic.c:258 dev_watchdog+0x25d/0x270()
> > > Sep 25 08:50:24 nexus kernel: [60330.325486] Hardware name: GA-MA785GMT-UD2H
> > > Sep 25 08:50:24 nexus kernel: [60330.325487] NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
> > > (...)
> > > Sep 25 08:50:25 nexus kernel: [60330.851093] Switching to clocksource acpi_pm
> > > 
> > > afaics, this amd-phenom cpu does cpu frequency scaling, which causes
> > > plain 'tsc' timer instability and leads to a network card watchdog
> > > timeout (i can log in via the local console while all network traffic
> > > is dead). on the recent 3.5.x kernel the 'clocksource unstable' message
> > > appears *after* the 'task blocked' flood and there's no clear info about
> > > the watchdog timeout. currently i'm testing the hpet clocksource because
> > > the better tsc modes (constant_tsc, nonstop_tsc) aren't present in
> > > /sys/devices/system/clocksource/clocksource0/available_clocksource
> > > even though the cpu supports them.
> > 
> > I'm sorry, I don't understand, that's a 2.6.37 kernel you are comparing
> > this to.  Where did this problem show up?  In 3.5.4 where 3.5.3 was
> > fine?
> 
> 'cpu-stall' from topic has appeared in 3.5.2 (after upgrade from 3.4.10).

So, can you run 'git bisect' between 3.4.10 and 3.5.2 to find the commit
causing the problem?

thanks,

greg k-h


Re: ktest.pl always returns 0?

2012-09-25 Thread Greg KH
On Tue, Sep 25, 2012 at 02:15:17PM -0400, Steven Rostedt wrote:
> On Tue, 2012-09-25 at 11:00 -0700, Greg KH wrote:
> > Hi Steven,
> 
> Note, emailing my RH account is hit or miss. If I'm traveling I don't
> read it, and I won't return messages until I'm back. It's best to email
> my rost...@goodmis.org account, as I have better access to that account.
> I author my patches by the email of the people that pay me to write
> them. This isn't for your "who wrote the kernel" scripts. This is for
> anyone that happens to do a git log.

Hey, it's not my fault your employer has a crummy email system that
can't handle remote access well, I just went off of the Author: line in
your ktest.pl kernel commits :)

> > I'm trying to use ktest to do build tests of the stable patch series to
> > verify I didn't mess anything up, but I'm finding that ktest always
> > returns 0 when finished, no matter if the build test was successful or
> > failed.
> 
> Hmm, I should fix that. Yeah, I agree, if it fails a test it should
> return something other than zero. But I think that only happens if you
> have DIE_ON_FAILURE = 0. As IIRC, the perl "die" command should exit the
> application with an error code.
> 
> But yeah, I agree, if one of the tests fail, the error code should not
> be zero. I'll write up a patch to fix that. Or at least add an option to
> make that happen.

That would be great.

> > Is this right?  Is there some other way to determine if ktest fails
> > other than grepping the output log?
> 
> If you have DIE_ON_FAILURE = 1 (default) it should exit with non zero.

It doesn't do that, test it and see (this is with what is in Linus's
3.6-rc7 tree, I didn't test linux-next if that is newer, my apologies.)

> > Oh, and any hints on kicking off a ktest process on a remote machine in
> > a "simple" way?  I'm just using ssh to copy over a script that runs
> > there, wrapping ktest.pl up with other stuff, I didn't miss the fact
> > that ktest itself can run remotely already, did I?
> 
> I'm a little confused by this question. Do you want a server ktest? That
> is, have a ktest daemon that listens for clients that sends it config
> files and then runs them? That would actually be a fun project ;-)
> 
> You're not running ktest on the target machine are you? The way I use it
> is the following:
> 
> I have a server that I ssh to and run ktest from. It does all the builds
> there on the server and this server has a means to monitor some target.
> I use ttywatch that connects to the serial of the target, in which ktest
> uses to read from.
> 
> Sometimes this "server" is the machine I'm logged in to.  And I just run
> ktest directly.
> 
> Can you explain more of what you are looking for?

I want to be able to say:
- take this set of stable patches and go run a 'make
  allmodconfig' build on a remote machine and email me back the
  answer because I might not be able to keep an internet
  connection open for the next 5-15 minutes it might take to
  complete that task.

I don't do boot tests with these kernel build tests, although sometime
in the future it would be nice to do that.  Right now I do that testing
manually, as it's pretty infrequent (once per release usually.)

So yes, a 'ktest' server would be nice.  I've attached the (horrible)
script below that I'm using for this so far.  It seems to work well, and
I can do builds on a "cloud" server as well as my local build server
just fine, only thing needed to do is change the user and machine name
in the script.

I know ktest doesn't handle quilt patches yet, which is why I apply them
"by hand" now to a given git tree branch, if you ever do add that
option, I'll gladly test it out and change my script to use whatever
format it needs.

thanks,

greg k-h
#!/bin/bash
#
# Testing script to take a stable kernel patch set, build it on a remote
# machine using ktest, and email back the results.
#
# Copyright 2012 Greg Kroah-Hartman 
#
# Released under the GPLv2 only.
#
#
# Some variables you might want to mess with are:
#
# EMAIL: who to send the email to
# REMOTE_STABLE_GIT: on the remote machine, where the linux-stable git tree is
#   located
# REMOTE_WORK: on the remote machine, what temporary location we can use to
#   create a subdirectory and do our work in
# REMOTE_SERVER: the remote machine name
# REMOTE_USER: the username to run the script on the remote machine
# LOCAL_WORK: temporary location on the local machine to create some files
#   before we copy them to the remote machine
# LOCAL_KTEST: local location of a version of ktest.pl that you want to run
#   remotely (usually better than the version in the stable tree under testing
#   due to age issues.)
# QUEUE_DIR: local location of the stable-queue git tree we are wanting to test


EMAIL="g...@kroah.com"

REMOTE_STABLE_GIT="/home/gregkh/linux/stable/linux-stable/"
REMOTE_WORK="/home/gregkh/tmp/"
REMOTE_SERVER="build"
REMOTE_USER="gregkh"

LOCAL_WORK="/tmp/"

Re: [PATCH v2 memstick: support for legacy sony memsticks

2012-09-25 Thread Tejun Heo
On Tue, Sep 25, 2012 at 09:34:39PM +0200, Maxim Levitsky wrote:
> But this just adds the WQ_UNBOUND. Dunno, without the lock I had several
> crashes that, with a high level of confidence, were caused by parallel
> execution of work items. Once I added this mutex, I couldn't reproduce
> these.

Yes the combination of WQ_UNBOUND and max_active==1 guarantees
strictly ordered one-by-one execution.

> I had the __blk_end_request fail with NULL msb->req. I can't see how
> that can happen if the work queue isn't executed in parallel.
> (And at that point I didn't even have, by mistake, the code that sets it
> to NULL in msb_stop, so I really fail to see how that could happen due to
> an internal bug in my code.)

If you're seeing parallel execution w/ ordered workqueue, it is a
critical bug which would make the kernel crash left and right.  Please
try alloc_ordered_workqueue() and if you still see parallel execution,
please report.

Thanks.

-- 
tejun


Re: [PATCH 2/2] memstick: add support for legacy memorysticks

2012-09-25 Thread Tejun Heo
Hello, Maxim.

On Tue, Sep 25, 2012 at 09:26:13PM +0200, Maxim Levitsky wrote:
> > Probably not the best idea to use a name this generic in driver code.
> > linux/scatterlist.h likely might wanna use the name.
>
> Let's not go this route again. I already submitted these once, and had
> my share of problems with merging these poor functions into the
> scatterlist code.
> Scatterlist users mostly don't need these, as they just translate the
> list into a hardware-specific representation.
> In my case I don't, and yet it's easier than working with BIOs.

Hmmm... then please at least add a prefix to the names.

> > Also, from what it does, it seems sg_copy() is a bit of misnomer.
> > Rename it to sg_remap_with_offset() or something and move it to
> > lib/scatterlist.c?
>
> Don't think so. This copies part of a scatterlist into another
> scatterlist.
> I have to use it, as the underlying memstick drivers expect a single
> scatterlist for each 512-byte sector I read. Yes, it contains just one
> entry, but still. I haven't designed the interface.

It doesn't really matter if it's a function only used in the driver,
but please don't use sg_copy() as its name.

> > Maybe we can make sg_copy_buffer() more generic so that it takes a
> > callback and implement this on top of it?  Having sglist manipulation
> > scattered isn't too attractive.
>
> Again this is very specific to my driver. Usually nobody pokes the
> scatterlists. 

The problem is that there are talks of improving sglist handling (make
it more generic, unify it with bvec and so on) and this sort of one
off direct manipulations often become headaches afterwards, so if at
all possible it's best to keep stuff centralized.

> > Is it really necessary to implement explicit state machine?  Can't you
> > just throw a work item at it and process it synchronously?  Explicit
> > state machines can save some resources at the cost of a lot more
> > complexity and generally making things a lot more fragile.  Is it
> > really worth it here?
>
> It would be awesome not to use these ugly state machines, but I have to,
> because this is required by the underlying driver interface.
> It's callback driven: the underlying driver calls into this driver
> when it wants, and I supply it a new command to execute.
> Unfortunately, with legacy memsticks, reading or writing a sector
> requires sending many different commands. That's why these state
> machines exist. I at least made them more or less consistent.
> Just take a look at mspro_blk.c

I see.  Eeeek...

Thanks.

-- 
tejun


Re: [PATCH v2 memstick: support for legacy sony memsticks

2012-09-25 Thread Maxim Levitsky
On Tue, 2012-09-25 at 11:02 -0700, Tejun Heo wrote: 
> Hello,
> 
> > * Switched to using a workqueue.
> >   Unfortunately, I still see that workqueue items are executed in parallel.
> >   I suspect that this happens if one work item sleeps. In this case I
> >   don't want other work items to run too. I fixed this with a mutex, and
> >   anyway it's nice to have it to guarantee this.
> 
> You can use alloc_ordered_workqueue("namefmt", WQ_MEM_RECLAIM).  No
> mutex needed for inter-work exclusion.
But this just adds the WQ_UNBOUND. Dunno, without the lock I had several
crashes that, with a high level of confidence, were caused by parallel
execution of work items. Once I added this mutex, I couldn't reproduce
these.

I had the __blk_end_request fail with NULL msb->req. I can't see how
that can happen if the work queue isn't executed in parallel.
(And at that point I didn't even have, by mistake, the code that sets it
to NULL in msb_stop, so I really fail to see how that could happen due to
an internal bug in my code.)
> 
> Thanks.
> 

-- 
Best regards,
Maxim Levitsky




Re: WARNING: at kernel/workqueue.c:1034 __queue_work+0x23d/0x2d7()

2012-09-25 Thread Tejun Heo
Hello,

On Tue, Sep 25, 2012 at 09:28:47PM +0200, Borislav Petkov wrote:
> FYI: just hit this below on plain -rc7. Only once so far...
> 
> [ 5294.811825] ------------[ cut here ]------------
> [ 5294.816682] WARNING: at kernel/workqueue.c:1034 __queue_work+0x23d/0x2d7()
> [ 5294.823816] Hardware name: Dinar
> [ 5294.827165] Modules linked in: ohci_hcd radeon kvm_amd ttm drm_kms_helper kvm hwmon backlight cfbcopyarea e1000e cfbimgblt ehci_hcd amd64_edac_mod cfbfillrect edac_core microcode
> [ 5294.844005] Pid: 5076, comm: test Not tainted 3.6.0-rc7 #2
> [ 5294.849689] Call Trace:
> [ 5294.852232][] warn_slowpath_common+0x85/0x9d
> [ 5294.859145]  [] warn_slowpath_null+0x1a/0x1c
> [ 5294.865206]  [] __queue_work+0x23d/0x2d7
> [ 5294.877240]  [] delayed_work_timer_fn+0x2a/0x2e
> [ 5294.883573]  [] run_timer_softirq+0x264/0x381
> [ 5294.908176]  [] __do_softirq+0xdc/0x1de
> [ 5294.913786]  [] call_softirq+0x1c/0x30
> [ 5294.919304]  [] do_softirq+0x3d/0x86
> [ 5294.924646]  [] irq_exit+0x53/0xb2
> [ 5294.929806]  [] smp_apic_timer_interrupt+0x8b/0x99
> [ 5294.936408]  [] apic_timer_interrupt+0x6c/0x80
> [ 5294.942640][] ? retint_swapgs+0xe/0x13
> [ 5294.949000] ---[ end trace eaf0764fd1698db2 ]---

That's __queue_work() seeing a delayed work item which either hasn't
been initialized properly or got corrupted somehow.  Can you please
stick printk("XXX offending function = %pf\n", work->func); right
below the WARN_ON() and try to reproduce the problem?

Thanks.

-- 
tejun


WARNING: at kernel/workqueue.c:1034 __queue_work+0x23d/0x2d7()

2012-09-25 Thread Borislav Petkov
FYI: just hit this below on plain -rc7. Only once so far...

[ 5294.811825] ------------[ cut here ]------------
[ 5294.816682] WARNING: at kernel/workqueue.c:1034 __queue_work+0x23d/0x2d7()
[ 5294.823816] Hardware name: Dinar
[ 5294.827165] Modules linked in: ohci_hcd radeon kvm_amd ttm drm_kms_helper kvm hwmon backlight cfbcopyarea e1000e cfbimgblt ehci_hcd amd64_edac_mod cfbfillrect edac_core microcode
[ 5294.844005] Pid: 5076, comm: test Not tainted 3.6.0-rc7 #2
[ 5294.849689] Call Trace:
[ 5294.852232][] warn_slowpath_common+0x85/0x9d
[ 5294.859145]  [] warn_slowpath_null+0x1a/0x1c
[ 5294.865206]  [] __queue_work+0x23d/0x2d7
[ 5294.870910]  [] ? run_timer_softirq+0x1df/0x381
[ 5294.877240]  [] delayed_work_timer_fn+0x2a/0x2e
[ 5294.883573]  [] run_timer_softirq+0x264/0x381
[ 5294.889721]  [] ? run_timer_softirq+0x1df/0x381
[ 5294.896051]  [] ? __run_hrtimer+0xef/0x182
[ 5294.901933]  [] ? flush_delayed_work+0x46/0x46
[ 5294.908176]  [] __do_softirq+0xdc/0x1de
[ 5294.913786]  [] call_softirq+0x1c/0x30
[ 5294.919304]  [] do_softirq+0x3d/0x86
[ 5294.924646]  [] irq_exit+0x53/0xb2
[ 5294.929806]  [] smp_apic_timer_interrupt+0x8b/0x99
[ 5294.936408]  [] apic_timer_interrupt+0x6c/0x80
[ 5294.942640][] ? retint_swapgs+0xe/0x13
[ 5294.949000] ---[ end trace eaf0764fd1698db2 ]---

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


Re: [PATCH 2/2] memstick: add support for legacy memorysticks

2012-09-25 Thread Maxim Levitsky
On Tue, 2012-09-25 at 11:25 -0700, Tejun Heo wrote: 
> Hello,
> 
> On Tue, Sep 25, 2012 at 10:38:46AM +0200, Maxim Levitsky wrote:
> > diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
> > new file mode 100644
> > index 000..318e40b
> > --- /dev/null
> > +++ b/drivers/memstick/core/ms_block.c
> > @@ -0,0 +1,2422 @@
> > +/*
> > + *  ms_block.c - Sony MemoryStick (legacy) storage support
> > +
> 
> Missing '*'?
> 
> > + *  Copyright (C) 2012 Maxim Levitsky 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + *
> > + * Minor portions of the driver were copied from mspro_block.c which is
> > + * Copyright (C) 2007 Alex Dubov 
> > + *
> > + */
> ...
> > +static size_t sg_copy(struct scatterlist *sg_from, struct scatterlist *sg_to,
> > +   int to_nents, size_t offset, size_t len)
> 
> Probably not the best idea to use a name this generic in driver code.
> linux/scatterlist.h likely might wanna use the name.
Let's not go this route again. I already submitted these once, and had
my share of problems with merging these poor functions into the
scatterlist code.
Scatterlist users mostly don't need these, as they just translate the
list into a hardware-specific representation.
In my case I don't, and yet it's easier than working with BIOs.
> 
> > +{
> > +   size_t copied = 0;
> > +
> > +   while (offset > 0) {
> > +
> > +   if (offset >= sg_from->length) {
> > +   if (sg_is_last(sg_from))
> > +   return 0;
> > +
> > +   offset -= sg_from->length;
> > +   sg_from = sg_next(sg_from);
> > +   continue;
> > +   }
> > +
> > +   copied = min(len, sg_from->length - offset);
> > +   sg_set_page(sg_to, sg_page(sg_from),
> > +   copied, sg_from->offset + offset);
> > +
> > +   len -= copied;
> > +   offset = 0;
> > +
> > +   if (sg_is_last(sg_from) || !len)
> > +   goto out;
> > +
> > +   sg_to = sg_next(sg_to);
> > +   to_nents--;
> > +   sg_from = sg_next(sg_from);
> > +   }
> > +
> > +   while (len > sg_from->length && to_nents--) {
> > +
> > +   len -= sg_from->length;
> > +   copied += sg_from->length;
> > +
> > +   sg_set_page(sg_to, sg_page(sg_from),
> > +   sg_from->length, sg_from->offset);
> > +
> > +   if (sg_is_last(sg_from) || !len)
> > +   goto out;
> > +
> > +   sg_from = sg_next(sg_from);
> > +   sg_to = sg_next(sg_to);
> > +   }
> > +
> > +   if (len && to_nents) {
> > +   sg_set_page(sg_to, sg_page(sg_from), len, sg_from->offset);
> > +   copied += len;
> > +   }
> > +
> > +out:
> > +   sg_mark_end(sg_to);
> > +   return copied;
> > +}
> 
> Also, from what it does, it seems sg_copy() is a bit of misnomer.
> Rename it to sg_remap_with_offset() or something and move it to
> lib/scatterlist.c?
Don't think so. This copies part of a scatter list into another
scatterlist.
I have to use it, as the underlying memstick drivers expect a single
scatterlist for each 512-byte sector I read. Yes, it contains just one
entry, but still. I didn't design the interface.
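
To make the naming disagreement concrete, here is a minimal user-space sketch of what the quoted function does: it copies no data, it only fills a destination list with entries pointing at a sub-range of the source list. struct seg and remap_range() are hypothetical stand-ins for illustration, not the kernel scatterlist API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist entry: a slice of a buffer. */
struct seg {
	const unsigned char *base;  /* start of the backing buffer */
	size_t offset;              /* offset of the slice in that buffer */
	size_t length;              /* length of the slice */
};

/*
 * Describe the byte range [offset, offset + len) of 'from' using at most
 * 'to_n' entries of 'to'. No data is copied; only descriptors are filled
 * in. Returns the number of bytes described and stores the number of
 * destination entries used in *used.
 */
size_t remap_range(const struct seg *from, size_t from_n,
		   struct seg *to, size_t to_n,
		   size_t offset, size_t len, size_t *used)
{
	size_t mapped = 0, i, n = 0;

	for (i = 0; i < from_n && len > 0 && n < to_n; i++) {
		size_t take;

		if (offset >= from[i].length) {
			/* The range starts after this entry; skip it. */
			offset -= from[i].length;
			continue;
		}

		take = from[i].length - offset;
		if (take > len)
			take = len;

		to[n].base = from[i].base;
		to[n].offset = from[i].offset + offset;
		to[n].length = take;
		n++;

		mapped += take;
		len -= take;
		offset = 0;
	}

	*used = n;
	return mapped;
}
```

With two 8-byte source entries, asking for 4 bytes at offset 6 yields two destination entries of 2 bytes each, one ending the first buffer and one starting the second.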
> 
> > +/*
> > + * Compares section of 'sg' starting from offset 'offset' and with length 'len'
> > + * to linear buffer of length 'len' at address 'buffer'
> > + * Returns 0 if equal and  -1 otherwise
> > + */
> > +static int sg_compare_to_buffer(struct scatterlist *sg,
> > +   size_t offset, u8 *buffer, size_t len)
> > +{
> > +   int retval = 0;
> > +   struct sg_mapping_iter miter;
> > +
> > +   sg_miter_start(&miter, sg, sg_nents(sg),
> > +   SG_MITER_ATOMIC | SG_MITER_FROM_SG);
> > +
> > +   while (sg_miter_next(&miter) && len > 0) {
> > +
> > +   int cmplen;
> > +
> > +   if (offset >= miter.length) {
> > +   offset -= miter.length;
> > +   continue;
> > +   }
> > +
> > +   cmplen = min(miter.length - offset, len);
> > +   retval = memcmp(miter.addr + offset, buffer, cmplen) ? -1 : 0;
> > +   if (retval)
> > +   break;
> > +
> > +   buffer += cmplen;
> > +   len -= cmplen;
> > +   offset = 0;
> > +   }
> > +
> > +   if (!retval && len)
> > +   retval = -1;
> > +
> > +   sg_miter_stop(&miter);
> > +   return retval;
> > +}
> 
> Maybe we can make sg_copy_buffer() more generic so that it takes a
> callback and implement this on top of it?  Having sglist manipulation
> scattered isn't too attractive.
Again this is very specific to my driver. Usually nobody pokes the
scatterlists. 
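
For readers following the quoted sg_compare_to_buffer(), the same walk can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct chunk and chunks_compare() are hypothetical names standing in for the mapped segments that sg_miter_next() would hand out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one mapped scatterlist segment. */
struct chunk {
	const unsigned char *addr;
	size_t length;
};

/*
 * Compare 'len' bytes starting 'offset' bytes into the chunk list with
 * the linear buffer 'buf'. Returns 0 if equal, -1 if the data differs
 * or the list is too short to cover the requested range.
 */
int chunks_compare(const struct chunk *c, size_t n, size_t offset,
		   const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < n && len > 0; i++) {
		size_t cmplen;

		if (offset >= c[i].length) {
			/* Still before the start of the range. */
			offset -= c[i].length;
			continue;
		}

		cmplen = c[i].length - offset;
		if (cmplen > len)
			cmplen = len;

		if (memcmp(c[i].addr + offset, buf, cmplen) != 0)
			return -1;

		buf += cmplen;
		len -= cmplen;
		offset = 0;
	}

	/* Leftover length means the list ended early. */
	return len ? -1 : 0;
}
```

As in the quoted driver code, a range that runs past the end of the list counts as a mismatch rather than a partial match.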
> 
> ...
> > +/*
> > + * This function is a handler for reads of one page from 

RE: [RFC] mm: add support for zsmalloc and zcache

2012-09-25 Thread Dan Magenheimer
> From: Sasha Levin [mailto:levinsasha...@gmail.com]
> Subject: Re: [RFC] mm: add support for zsmalloc and zcache

Sorry for delayed response!
 
> On 09/22/2012 03:31 PM, Sasha Levin wrote:
> > On 09/21/2012 09:14 PM, Dan Magenheimer wrote:
>  +#define MAX_CLIENTS 16
> 
>  Seems a bit arbitrary. Why 16?
> >> Sasha Levin posted a patch to fix this but it was tied in to
> >> the proposed KVM implementation, so was never merged.
> >>
> >
> > My patch changed the max pools per client, not the maximum amount of 
> > clients.
> > That patch has already found its way in.
> >
> > (MAX_CLIENTS does look like an arbitrary number though).
> 
> btw, while we're on the subject of KVM, the implementation of tmem/kvm was
> blocked due to insufficient performance caused by the lack of multi-page
> ops/batching.

Hmmm... I recall that was an unproven assertion.  The tmem/kvm
implementation was not exposed to any wide range of workloads
IIRC?  Also, the WasActive patch is intended to reduce the problem
that multi-guest high volume reads would provoke, so any testing
without that patch may be moot.
 
> Are there any plans to make it better in the future?

If it indeed proves to be a problem, the ramster-merged zcache
(aka zcache2) should be capable of managing a "split" zcache
implementation, i.e. zcache executing in the guest and "overflowing"
page cache pages to the zcache in the host, which should at least
ameliorate most of Avi's concern.  I personally have no plans
to implement that, but would be willing to assist if others
attempt to implement it.

The other main concern expressed by the KVM community, by
Andrea, was zcache's lack of ability to "overflow" frontswap
pages in the host to a real swap device.  The foundation
for that was one of the objectives of the zcache2 redesign;
I am working on a "yet-to-be-posted" patch built on top of zcache2
that will require some insight and review from MM experts.

Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/2] PNP: Unbind drivers if the new driver matches _HID rather than _CID

2012-09-25 Thread Matthew Garrett
On Tue, Sep 25, 2012 at 01:04:25PM -0600, Bjorn Helgaas wrote:
> On Tue, Sep 25, 2012 at 7:25 AM, Matthew Garrett  wrote:

> Do you know of any scenarios besides this IPMI one where there's the
> possibility of two drivers matching the same device?  If so, does the
> detach/attach process work reasonably?  My worry is that drivers don't
> normally give up devices, so the detach path is not well exercised.
> And I don't know what happens to any users of the device during the
> switch.  For example, if something was using a TPM and we replaced the
> driver, what does that look like to the user?

Yeah, this could definitely happen with TPM - tpm_infineon could 
displace tpm_tis. This actually flags up something kind of obviously 
broken in the TPM code, since tpm_infineon comes *after* tpm_tis in the 
link order despite being more specific. Winning. It looks like there's a 
valid tpm_release function, but I'll find an infineon machine and figure 
out whether it actually works or not.

-- 
Matthew Garrett | mj...@srcf.ucam.org


Re: [PATCH/resend/bypass] um: Preinclude include/linux/kern_levels.h

2012-09-25 Thread Linus Torvalds
On Tue, Sep 25, 2012 at 12:11 PM, Geert Uytterhoeven
 wrote:
>
> To fix this:
>   - Move the mapping from UM_KERN_ to KERN_ from
> arch/um/include/shared/common-offsets.h to
> arch/um/include/shared/user.h, which is preincluded for all userspace
> parts,
>   - Preinclude include/linux/kern_levels.h for all userspace parts, to
> obtain the in-kernel KERN_ constant definitions. This doesn't
> violate the kernel/userspace separation, as include/linux/kern_levels.h
> is self-contained and doesn't expose any other kernel internals.
>   - Remove the now unused STR() and DEFINE_STR() macros.

Ugh.

Why do you preinclude kern_levels.h instead of just having a
"#include" in user.h?

IOW, this part of the patch:

-   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include user.h
$(CFLAGS_$(basetarget).o)
+   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include
$(srctree)/include/linux/kern_levels.h -include user.h
$(CFLAGS_$(basetarget).o)

just makes me go want to puke. The user.h file already has other
#include's in it, so I really don't see why you create this insane
special case.

And why does UM have those "UM_KERN_XYZ" defines in the first place?
Why isn't it just using KERN_XYZ directly? Is it because kern_levels.h
didn't use to exist, so it was some kind of "let's create our own that
we can hide in our special headers".

IOW, I really think this patch makes things uglier. At the very least
it could be done more prettily, but preferably we'd get rid of the odd
and useless UM_ prefix from these things entirely.

 Linus


Re: [PATCH 1/8] sta2x11-mfd : add apb-soc regs driver and factor out common code

2012-09-25 Thread Mark Brown
On Wed, Sep 12, 2012 at 12:22:47PM +0200, cimina...@gnudd.com wrote:
> From: Davide Ciminaghi 
> 
> A driver for the apb-soc registers is needed by the clock
> infrastructure code to configure and control clocks on the sta2x11
> chip.
> Since some of the functions in sta2x11-mfd.c were almost identical
> for the two existing platform devices, the following changes
> have been performed to avoid further code duplication while
> adding the apb-soc-regs driver:

Glancing at the diff here this looks a lot like regmap-mmio...  not sure
if it is or not, though.


[PATCH 2/2] trace: Move trace event enable from fs_initcall to core_initcall

2012-09-25 Thread Steven Rostedt
From: Ezequiel Garcia 

This patch splits trace event initialization in two stages:
 * ftrace enable
 * sysfs event entry creation

This allows capturing trace events from an earlier point
by using the 'trace_event' kernel parameter, which is important
for tracing boot-up allocations.

Note that, in order to enable events at core_initcall,
it's necessary to move init_ftrace_syscalls() from
core_initcall to early_initcall.

Link: 
http://lkml.kernel.org/r/1347461277-25302-1-git-send-email-elezegar...@gmail.com

Signed-off-by: Ezequiel Garcia 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace_events.c   |  108 +++--
 kernel/trace/trace_syscalls.c |2 +-
 2 files changed, 73 insertions(+), 37 deletions(-)

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index bbb0e63..d608d09 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1199,6 +1199,31 @@ event_create_dir(struct ftrace_event_call *call, struct 
dentry *d_events,
return 0;
 }
 
+static void event_remove(struct ftrace_event_call *call)
+{
+   ftrace_event_enable_disable(call, 0);
+   if (call->event.funcs)
+   __unregister_ftrace_event(&call->event);
+   list_del(&call->list);
+}
+
+static int event_init(struct ftrace_event_call *call)
+{
+   int ret = 0;
+
+   if (WARN_ON(!call->name))
+   return -EINVAL;
+
+   if (call->class->raw_init) {
+   ret = call->class->raw_init(call);
+   if (ret < 0 && ret != -ENOSYS)
+   pr_warn("Could not initialize trace events/%s\n",
+   call->name);
+   }
+
+   return ret;
+}
+
 static int
 __trace_add_event_call(struct ftrace_event_call *call, struct module *mod,
   const struct file_operations *id,
@@ -1209,19 +1234,9 @@ __trace_add_event_call(struct ftrace_event_call *call, 
struct module *mod,
struct dentry *d_events;
int ret;
 
-   /* The linker may leave blanks */
-   if (!call->name)
-   return -EINVAL;
-
-   if (call->class->raw_init) {
-   ret = call->class->raw_init(call);
-   if (ret < 0) {
-   if (ret != -ENOSYS)
-   pr_warning("Could not initialize trace 
events/%s\n",
-  call->name);
-   return ret;
-   }
-   }
+   ret = event_init(call);
+   if (ret < 0)
+   return ret;
 
d_events = event_trace_events_dir();
if (!d_events)
@@ -1272,13 +1287,10 @@ static void remove_subsystem_dir(const char *name)
  */
 static void __trace_remove_event_call(struct ftrace_event_call *call)
 {
-   ftrace_event_enable_disable(call, 0);
-   if (call->event.funcs)
-   __unregister_ftrace_event(&call->event);
-   debugfs_remove_recursive(call->dir);
-   list_del(&call->list);
+   event_remove(call);
trace_destroy_fields(call);
destroy_preds(call);
+   debugfs_remove_recursive(call->dir);
remove_subsystem_dir(call->class->system);
 }
 
@@ -1450,15 +1462,43 @@ static __init int setup_trace_event(char *str)
 }
 __setup("trace_event=", setup_trace_event);
 
+static __init int event_trace_enable(void)
+{
+   struct ftrace_event_call **iter, *call;
+   char *buf = bootup_event_buf;
+   char *token;
+   int ret;
+
+   for_each_event(iter, __start_ftrace_events, __stop_ftrace_events) {
+
+   call = *iter;
+   ret = event_init(call);
+   if (!ret)
+   list_add(&call->list, &ftrace_events);
+   }
+
+   while (true) {
+   token = strsep(&buf, ",");
+
+   if (!token)
+   break;
+   if (!*token)
+   continue;
+
+   ret = ftrace_set_clr_event(token, 1);
+   if (ret)
+   pr_warn("Failed to enable trace event: %s\n", token);
+   }
+   return 0;
+}
+
 static __init int event_trace_init(void)
 {
-   struct ftrace_event_call **call;
+   struct ftrace_event_call *call;
struct dentry *d_tracer;
struct dentry *entry;
struct dentry *d_events;
int ret;
-   char *buf = bootup_event_buf;
-   char *token;
 
d_tracer = tracing_init_dentry();
if (!d_tracer)
@@ -1497,24 +1537,19 @@ static __init int event_trace_init(void)
if (trace_define_common_fields())
pr_warning("tracing: Failed to allocate common fields");
 
-   for_each_event(call, __start_ftrace_events, __stop_ftrace_events) {
-   __trace_add_event_call(*call, NULL, &ftrace_event_id_fops,
+   /*
+* Early initialization already enabled ftrace event.
+* Now it's only necessary to create the event directory.
+*/
+   list_for_each_entry(call, &ftrace_events, list) {
+
+   ret = 
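
The token loop in event_trace_enable() above (take the next comma-separated token, stop on NULL, skip empty tokens) can be tried outside the kernel. The sketch below is an illustration only: next_token() is a hypothetical portable stand-in with strsep()-like behaviour, split_events() is an invented helper, and the event names in the usage note are just sample input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Portable stand-in for strsep(): return the next token from *bufp,
 * split on any character in 'delim', and advance *bufp past it.
 * Returns NULL once the input is exhausted.
 */
char *next_token(char **bufp, const char *delim)
{
	char *start = *bufp;
	char *end;

	if (!start)
		return NULL;

	end = strpbrk(start, delim);
	if (end) {
		*end = '\0';
		*bufp = end + 1;
	} else {
		*bufp = NULL;
	}
	return start;
}

/*
 * Walk a comma-separated list: stop on NULL, skip empty tokens.
 * Stores up to 'max' tokens in 'out' and returns how many were found.
 * The input buffer is modified in place.
 */
size_t split_events(char *buf, char **out, size_t max)
{
	size_t n = 0;
	char *token;

	while (n < max) {
		token = next_token(&buf, ",");
		if (!token)
			break;
		if (!*token)
			continue;
		out[n++] = token;
	}
	return n;
}
```

Given the writable string "sched:sched_switch,,kmem:kmalloc," this yields two tokens; the doubled and trailing commas produce empty tokens that are skipped, which is why the loop needs both the NULL check and the empty-string check.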

[PATCH 0/2] [GIT PULL][v3.7] tracing: A couple more updates

2012-09-25 Thread Steven Rostedt

Ingo,

Please pull the latest tip/perf/core tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core

Head SHA1: 8781915ad2716adcd8cd5cc52cee791fc8b00fdf


Ezequiel Garcia (1):
  trace: Move trace event enable from fs_initcall to core_initcall

Mandeep Singh Baines (1):
  tracing: Add an option for disabling markers


 kernel/trace/trace.c  |6 ++-
 kernel/trace/trace.h  |1 +
 kernel/trace/trace_events.c   |  108 +++--
 kernel/trace/trace_syscalls.c |2 +-
 4 files changed, 79 insertions(+), 38 deletions(-)




[PATCH 1/2] tracing: Add an option for disabling markers

2012-09-25 Thread Steven Rostedt
From: Mandeep Singh Baines 

In our application, we have trace markers spread through user-space.
We have markers in GL, X, etc. These are super handy for Chrome's
about:tracing feature (Chrome + system + kernel trace view), but
can be very distracting when you're trying to debug a kernel issue.

I normally use "grep -v tracing_mark_write" but it would be nice
if I could just temporarily disable markers altogether.

Link: 
http://lkml.kernel.org/r/1347066739-26285-1-git-send-email-...@chromium.org

CC: Frederic Weisbecker 
Signed-off-by: Mandeep Singh Baines 
Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c |6 +-
 kernel/trace/trace.h |1 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 08acf42..1ec5c1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -328,7 +328,7 @@ static DECLARE_WAIT_QUEUE_HEAD(trace_wait);
 unsigned long trace_flags = TRACE_ITER_PRINT_PARENT | TRACE_ITER_PRINTK |
TRACE_ITER_ANNOTATE | TRACE_ITER_CONTEXT_INFO | TRACE_ITER_SLEEP_TIME |
TRACE_ITER_GRAPH_TIME | TRACE_ITER_RECORD_CMD | TRACE_ITER_OVERWRITE |
-   TRACE_ITER_IRQ_INFO;
+   TRACE_ITER_IRQ_INFO | TRACE_ITER_MARKERS;
 
 static int trace_stop_count;
 static DEFINE_RAW_SPINLOCK(tracing_start_lock);
@@ -470,6 +470,7 @@ static const char *trace_options[] = {
"overwrite",
"disable_on_free",
"irq-info",
+   "markers",
NULL
 };
 
@@ -3886,6 +3887,9 @@ tracing_mark_write(struct file *filp, const char __user 
*ubuf,
if (tracing_disabled)
return -EINVAL;
 
+   if (!(trace_flags & TRACE_ITER_MARKERS))
+   return -EINVAL;
+
if (cnt > TRACE_BUF_SIZE)
cnt = TRACE_BUF_SIZE;
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 593debe..63a2da0 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -680,6 +680,7 @@ enum trace_iterator_flags {
TRACE_ITER_OVERWRITE= 0x20,
TRACE_ITER_STOP_ON_FREE = 0x40,
TRACE_ITER_IRQ_INFO = 0x80,
+   TRACE_ITER_MARKERS  = 0x100,
 };
 
 /*
-- 
1.7.10.4
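
The patch above comes down to one more option bit that is set by default and checked at the top of the write handler. A minimal user-space sketch of that pattern follows; the names OPT_IRQ_INFO, OPT_MARKERS and marker_write() are hypothetical, and only the bit values 0x80 and 0x100 are taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative option bits; the values mirror the diff above. */
enum {
	OPT_IRQ_INFO = 0x80,
	OPT_MARKERS  = 0x100,
};

/*
 * Model of a write handler gated on an option bit: returns the number
 * of bytes accepted, or -EINVAL when markers are switched off.
 */
long marker_write(unsigned long flags, size_t cnt)
{
	if (!(flags & OPT_MARKERS))
		return -EINVAL;
	return (long)cnt;
}
```

Clearing the bit with flags &= ~OPT_MARKERS leaves the other options untouched, so markers can be switched off temporarily without disturbing the rest of the trace configuration.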






[PATCH/resend/bypass] um: Preinclude include/linux/kern_levels.h

2012-09-25 Thread Geert Uytterhoeven
The userspace part of UML uses the asm-offsets.h generator mechanism to
create definitions for UM_KERN_ that match the in-kernel
KERN_ constant definitions.

As of commit 04d2c8c83d0e3ac5f78aeede51babb3236200112 ("printk: convert
the format for KERN_ to a 2 byte pattern"), KERN_ is no
longer expanded to the literal '""', but to '"\001" "LEVEL"', i.e.
it contains two parts.

However, the combo of DEFINE_STR() in
arch/x86/um/shared/sysdep/kernel-offsets.h and sed-y in Kbuild doesn't
support string literals consisting of multiple parts. Hence for all
UM_KERN_ definitions, only the SOH character is retained in the actual
definition, while the remainder ends up in the comment. E.g. in
include/generated/asm-offsets.h we get

#define UM_KERN_INFO "\001" /* "6" KERN_INFO */

instead of

#define UM_KERN_INFO "\001" "6" /* KERN_INFO */

This causes spurious '^A' output in some kernel messages:

Calibrating delay loop... 4640.76 BogoMIPS (lpj=23203840)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 256
^AChecking that host ptys support output SIGIO...Yes
^AChecking that host ptys support SIGIO on close...No, enabling workaround
^AUsing 2.6 host AIO
NET: Registered protocol family 16
bio: create slab  at 0
Switching to clocksource itimer
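
The effect of losing the second part of the literal can be reproduced in a few lines of plain C. This is a sketch under stated assumptions: the macro names and parse_level() are hypothetical, and the parser is a simplified model of a level-prefix check, not the kernel's printk code.

```c
#include <assert.h>
#include <string.h>

/* Two-part level prefix: SOH followed by the level character. */
#define LEVEL_INFO_FULL   "\001" "6"
/* What the truncated generated define amounts to: SOH only. */
#define LEVEL_INFO_BROKEN "\001"

/*
 * Return the log level encoded at the start of 'msg', or -1 if the
 * message does not begin with a complete SOH + level-digit header.
 */
int parse_level(const char *msg)
{
	if (msg[0] == '\001' && msg[1] >= '0' && msg[1] <= '7')
		return msg[1] - '0';
	return -1;
}
```

Adjacent string literals are concatenated by the compiler, so LEVEL_INFO_FULL "text" starts with the two bytes SOH and '6', while LEVEL_INFO_BROKEN "text" starts with SOH followed directly by the message. In this model the second form is not recognised as a level header, which matches the stray ^A seen in the log excerpt.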

To fix this:
  - Move the mapping from UM_KERN_ to KERN_ from
arch/um/include/shared/common-offsets.h to
arch/um/include/shared/user.h, which is preincluded for all userspace
parts,
  - Preinclude include/linux/kern_levels.h for all userspace parts, to
obtain the in-kernel KERN_ constant definitions. This doesn't
violate the kernel/userspace separation, as include/linux/kern_levels.h
is self-contained and doesn't expose any other kernel internals.
  - Remove the now unused STR() and DEFINE_STR() macros.

Signed-off-by: Geert Uytterhoeven 
---
This fixes a regression from the KERN_ conversion

 arch/um/include/shared/common-offsets.h|   10 --
 arch/um/include/shared/user.h  |   11 +++
 arch/um/scripts/Makefile.rules |2 +-
 arch/x86/um/shared/sysdep/kernel-offsets.h |3 ---
 4 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/arch/um/include/shared/common-offsets.h 
b/arch/um/include/shared/common-offsets.h
index 40db8f7..2df313b 100644
--- a/arch/um/include/shared/common-offsets.h
+++ b/arch/um/include/shared/common-offsets.h
@@ -7,16 +7,6 @@ DEFINE(UM_KERN_PAGE_MASK, PAGE_MASK);
 DEFINE(UM_KERN_PAGE_SHIFT, PAGE_SHIFT);
 DEFINE(UM_NSEC_PER_SEC, NSEC_PER_SEC);
 
-DEFINE_STR(UM_KERN_EMERG, KERN_EMERG);
-DEFINE_STR(UM_KERN_ALERT, KERN_ALERT);
-DEFINE_STR(UM_KERN_CRIT, KERN_CRIT);
-DEFINE_STR(UM_KERN_ERR, KERN_ERR);
-DEFINE_STR(UM_KERN_WARNING, KERN_WARNING);
-DEFINE_STR(UM_KERN_NOTICE, KERN_NOTICE);
-DEFINE_STR(UM_KERN_INFO, KERN_INFO);
-DEFINE_STR(UM_KERN_DEBUG, KERN_DEBUG);
-DEFINE_STR(UM_KERN_CONT, KERN_CONT);
-
 DEFINE(UM_ELF_CLASS, ELF_CLASS);
 DEFINE(UM_ELFCLASS32, ELFCLASS32);
 DEFINE(UM_ELFCLASS64, ELFCLASS64);
diff --git a/arch/um/include/shared/user.h b/arch/um/include/shared/user.h
index 4fa82c0..cef0685 100644
--- a/arch/um/include/shared/user.h
+++ b/arch/um/include/shared/user.h
@@ -26,6 +26,17 @@
 extern void panic(const char *fmt, ...)
__attribute__ ((format (printf, 1, 2)));
 
+/* Requires preincluding include/linux/kern_levels.h */
+#define UM_KERN_EMERG  KERN_EMERG
+#define UM_KERN_ALERT  KERN_ALERT
+#define UM_KERN_CRIT   KERN_CRIT
+#define UM_KERN_ERRKERN_ERR
+#define UM_KERN_WARNINGKERN_WARNING
+#define UM_KERN_NOTICE KERN_NOTICE
+#define UM_KERN_INFO   KERN_INFO
+#define UM_KERN_DEBUG  KERN_DEBUG
+#define UM_KERN_CONT   KERN_CONT
+
 #ifdef UML_CONFIG_PRINTK
 extern int printk(const char *fmt, ...)
__attribute__ ((format (printf, 1, 2)));
diff --git a/arch/um/scripts/Makefile.rules b/arch/um/scripts/Makefile.rules
index d50270d..15889df 100644
--- a/arch/um/scripts/Makefile.rules
+++ b/arch/um/scripts/Makefile.rules
@@ -8,7 +8,7 @@ USER_OBJS += $(filter %_user.o,$(obj-y) $(obj-m)  
$(USER_SINGLE_OBJS))
 USER_OBJS := $(foreach file,$(USER_OBJS),$(obj)/$(file))
 
 $(USER_OBJS:.o=.%): \
-   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include user.h 
$(CFLAGS_$(basetarget).o)
+   c_flags = -Wp,-MD,$(depfile) $(USER_CFLAGS) -include 
$(srctree)/include/linux/kern_levels.h -include user.h $(CFLAGS_$(basetarget).o)
 
 # These are like USER_OBJS but filter USER_CFLAGS through unprofile instead of
 # using it directly.
diff --git a/arch/x86/um/shared/sysdep/kernel-offsets.h 
b/arch/x86/um/shared/sysdep/kernel-offsets.h
index 5868526..46a9df9 100644
--- a/arch/x86/um/shared/sysdep/kernel-offsets.h
+++ b/arch/x86/um/shared/sysdep/kernel-offsets.h
@@ -7,9 +7,6 @@
 #define DEFINE(sym, val) \
asm volatile("\n->" #sym " %0 " #val : : "i" (val))
 
-#define STR(x) #x
-#define DEFINE_STR(sym, val) asm volatile("\n->" #sym " " STR(val) " " #val: : 
)
-
 #define BLANK() asm volatile("\n->" 

Re: Slow Resume with SSD

2012-09-25 Thread Carlos Moffat

Hi

On 09/25/2012 12:07 PM, Srivatsa S. Bhat wrote:

On 09/26/2012 12:00 AM, Carlos Moffat wrote:

Hi,

(please let me know if this is the wrong list to ask this)

I have a Crucial M4 512 GB SSD installed on my Thinkpad X220 (Ubuntu
Precise). Overall this runs very nicely, but it takes 10+ seconds to
resume from suspend, apparently because of some issue with the hard drive.
The only message I see while resuming is "COMRESET failed (errno=-16)".

[52483.228615] ata1: link is slow to respond, please be patient (ready=0)
[52487.870616] ata1: COMRESET failed (errno=-16)
[52488.190222] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[52488.190752] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
succeeded
[52488.190754] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[52488.190755] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[52488.191849] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
succeeded
[52488.191855] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
LOCK) filtered out
[52488.191860] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
filtered out
[52488.192406] ata1.00: configured for UDMA/100
[52488.206298] sd 0:0:0:0: [sda] Starting disk
[52488.207334] Extended CMOS year: 2000
[52488.208335] PM: resume of devices complete after 10376.896 msecs
[52488.208552] PM: resume devices took 10.376 seconds

The only relevant post I've found was in the crucial support site:

http://forums.crucial.com/t5/Solid-State-Drives-SSD/SOLVED-M4-CT512M4SSD1-7mm-512Gb-SSD-too-slow-when-laptop-wakes/td-p/102666


which suggested adding libata.force=nohrst as a boot option to get rid
of the problem.

I tried that, but the laptop wouldn't suspend.

Any ideas?



(Adding relevant people to CC)

I recall seeing a similar problem getting fixed in mainline quite a long
time ago (around v3.3 I think). Did you try the latest mainline kernel?

Regards,
Srivatsa S. Bhat




Yes, I'm using 3.5.4 (Ubuntu Mainline packages).

Carlos


Re: [PATCH v2 RESEND 2/2] ARM: local timers: add timer support using IO mapped register

2012-09-25 Thread Rohit Vaswani

Any comments ?

Marc, would it be possible for you to pull this into your timers-next tree ?

-Rohit

On 9/15/2012 12:41 AM, Rohit Vaswani wrote:

The current arch_timer only supports access through the CP15 interface.
Add support for ARM processors that only support the IO mapped register
interface. The memory mapped timer interface works with SPI
interrupts instead of PPI.

Signed-off-by: Rohit Vaswani 
---
  .../devicetree/bindings/arm/arch_timer.txt |9 +-
  arch/arm/kernel/arch_timer.c   |  299 +++-
  2 files changed, 297 insertions(+), 11 deletions(-)

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt 
b/Documentation/devicetree/bindings/arm/arch_timer.txt
index 52478c8..8e01328 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -7,10 +7,13 @@ The timer is attached to a GIC to deliver its per-processor 
interrupts.
  
  ** Timer node properties:
  
-- compatible : Should at least contain "arm,armv7-timer".

+- compatible : Should at least contain "arm,armv7-timer" or
+  "arm,armv7-timer-mem" if using the memory mapped arch timer interface.
  
-- interrupts : Interrupt list for secure, non-secure, virtual and

-  hypervisor timers, in that order.
+- interrupts : If using the cp15 interface, the interrupt list for secure,
+  non-secure, virtual and hypervisor timers, in that order.
+  If using the memory mapped interface, list the interrupts for each core,
+  starting with core 0.
  
  - clock-frequency : The frequency of the main counter, in Hz. Optional.
  
diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c

index 8672a75..f79092d 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -17,7 +17,9 @@
  #include 
  #include 
  #include 
+#include 
  #include 
+#include 
  #include 
  
  #include 

@@ -44,6 +46,11 @@ extern void init_current_timer_delay(unsigned long freq);
  
  static bool arch_timer_use_virtual = true;
  
+static bool arch_timer_irq_percpu = true;

+static void __iomem *timer_base;
+static unsigned arch_timer_mem_irqs[NR_CPUS];
+static unsigned arch_timer_num_irqs;
+
  /*
   * Architected system timer support.
   */
@@ -56,8 +63,17 @@ static bool arch_timer_use_virtual = true;
  #define ARCH_TIMER_REG_FREQ   1
  #define ARCH_TIMER_REG_TVAL   2
  
+/* Iomapped Register Offsets */

+static unsigned arch_timer_mem_offset[] = {0x2C, 0x10, 0x28};
+#define ARCH_TIMER_CNTP_LOW_REG0x0
+#define ARCH_TIMER_CNTP_HIGH_REG   0x4
+#define ARCH_TIMER_CNTV_LOW_REG0x8
+#define ARCH_TIMER_CNTV_HIGH_REG   0xC
+
  #define ARCH_TIMER_PHYS_ACCESS0
  #define ARCH_TIMER_VIRT_ACCESS1
+#define ARCH_TIMER_MEM_PHYS_ACCESS 2
+#define ARCH_TIMER_MEM_VIRT_ACCESS 3
  
  /*

   * These register accessors are marked inline so the compiler can
@@ -88,6 +104,9 @@ static inline void arch_timer_reg_write(const int access, 
const int reg, u32 val
}
}
  
+	if (access == ARCH_TIMER_MEM_PHYS_ACCESS)

+   __raw_writel(val, timer_base + arch_timer_mem_offset[reg]);
+
isb();
  }
  
@@ -120,12 +139,16 @@ static inline u32 arch_timer_reg_read(const int access, const int reg)

}
}
  
+	if (access == ARCH_TIMER_MEM_PHYS_ACCESS)

+   val = __raw_readl(timer_base + arch_timer_mem_offset[reg]);
+
return val;
  }
  
  static inline cycle_t arch_timer_counter_read(const int access)

  {
cycle_t cval = 0;
+   u32 cvall, cvalh, thigh;
  
  	if (access == ARCH_TIMER_PHYS_ACCESS)

asm volatile("mrrc p15, 0, %Q0, %R0, c14" : "=r" (cval));
@@ -133,17 +156,49 @@ static inline cycle_t arch_timer_counter_read(const int 
access)
if (access == ARCH_TIMER_VIRT_ACCESS)
asm volatile("mrrc p15, 1, %Q0, %R0, c14" : "=r" (cval));
  
+	if (access == ARCH_TIMER_MEM_PHYS_ACCESS) {

+   do {
+   cvalh = __raw_readl(timer_base +
+   ARCH_TIMER_CNTP_HIGH_REG);
+   cvall = __raw_readl(timer_base +
+   ARCH_TIMER_CNTP_LOW_REG);
+   thigh = __raw_readl(timer_base +
+   ARCH_TIMER_CNTP_HIGH_REG);
+   } while (cvalh != thigh);
+
+   cval = ((cycle_t) cvalh << 32) | cvall;
+   }
+
+   if (access == ARCH_TIMER_MEM_VIRT_ACCESS) {
+   do {
+   cvalh = __raw_readl(timer_base +
+   ARCH_TIMER_CNTV_HIGH_REG);
+   cvall = __raw_readl(timer_base +
+   ARCH_TIMER_CNTV_LOW_REG);
+   thigh = __raw_readl(timer_base +
+   
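
The do/while loops in the hunk above read the high word, the low word, then the high word again, and retry if the high word changed. The following user-space sketch shows why, using a simulated counter; struct fake_timer and all function names are hypothetical, and the "registers" are ordinary memory rather than __raw_readl() accesses.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated free-running 64-bit counter exposed as two 32-bit halves. */
struct fake_timer {
	uint64_t now;    /* current counter value */
	uint64_t step;   /* how far the counter advances per register read */
};

/* Each register read returns one half, then lets the counter advance. */
uint32_t read_high(struct fake_timer *t)
{
	uint32_t v = (uint32_t)(t->now >> 32);

	t->now += t->step;
	return v;
}

uint32_t read_low(struct fake_timer *t)
{
	uint32_t v = (uint32_t)t->now;

	t->now += t->step;
	return v;
}

/* Naive read: the halves may come from different sides of a carry. */
uint64_t read_counter_naive(struct fake_timer *t)
{
	uint32_t hi = read_high(t);
	uint32_t lo = read_low(t);

	return ((uint64_t)hi << 32) | lo;
}

/*
 * Read high, low, then high again, and retry if the high half changed
 * in between, so both halves belong to the same value of the high word.
 */
uint64_t read_counter(struct fake_timer *t)
{
	uint32_t hi, lo, hi2;

	do {
		hi = read_high(t);
		lo = read_low(t);
		hi2 = read_high(t);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}
```

Starting the simulated counter at 0xFFFFFFFF, the naive version pairs the old high word with the wrapped low word and reports a value that is behind the starting point, while the retrying version notices the carry and reads again.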

Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Linus Torvalds
On Tue, Sep 25, 2012 at 11:42 AM, Borislav Petkov  wrote:
>>
>> Is this literally just removing it entirely?
>
> Basically yes:

Ok, so you make it just always select 'target'. Fine. I wondered if
you just removed the calling logic entirely.

>> How does pgbench look? That's the one that apparently really wants to
>> spread out, possibly due to user-level spinlocks. So I assume it will
>> show the reverse pattern, with "kill select_idle_sibling" being the
>> worst case.
>
> Let me run pgbench tomorrow (I had run it only on an older family 0x10
> single-node box) on Bulldozer to check that out. And we haven't started
> the multi-node measurements at all.

Ack, this clearly needs much more testing. That said, I really would
*love* to just get rid of the function entirely.

>> Sad, because it really would be lovely to just remove that thing ;)
>
> Right, so why did we need it all, in the first place? There has to be
> some reason for it.

I'm not entirely convinced.

Looking at the history of that thing, it's long and tortuous, and has
a few commits completely fixing the "logic" of it (eg see commit
99bd5e2f245d).

To the point where I don't think it necessarily even matches what the
original cause for it was. So it's *possible* that we have a case of
historical code that may have improved performance originally on at
least some machines, but that has (a) been changed due to it being
broken and (b) CPU's have changed too, so it may well be that it
simply doesn't help any more.

And we've had problems with this function before. See for example:
 - 4dcfe1025b51: sched: Avoid SMT siblings in select_idle_sibling() if possible
 - 518cd6234178: sched: Only queue remote wakeups when crossing cache boundaries

so we've basically had odd special-case "tuning" of this function from
the original. I do not think that there is any solid reason to believe
that it does what it used to do, or that what it used to do makes
sense any more.

It's entirely possible that "prev_cpu" basically ends up being the
better choice for spreading things out.

That said, my *guess* is that when you run pgbench, you'll see the
same regression that we saw due to Mike's patch too. It simply looks
like tbench wants to have minimal cpu selection and avoid moving
things around, while pgbench probably wants to spread out maximally.

 Linus


Re: Slow Resume with SSD

2012-09-25 Thread Srivatsa S. Bhat
On 09/26/2012 12:00 AM, Carlos Moffat wrote:
> Hi,
> 
> (please let me know if this is the wrong list to ask this)
> 
> I have a Crucial M4 512 GB SSD installed on my Thinkpad X220 (Ubuntu
> Precise). Overall this runs very nicely, but it takes 10+ seconds to
> resume from suspend, apparently because of some issue with the hard drive.
> The only message I see while resuming is "COMRESET failed (errno=-16)".
> 
> [52483.228615] ata1: link is slow to respond, please be patient (ready=0)
> [52487.870616] ata1: COMRESET failed (errno=-16)
> [52488.190222] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> [52488.190752] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
> succeeded
> [52488.190754] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
> LOCK) filtered out
> [52488.190755] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [52488.191849] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES)
> succeeded
> [52488.191855] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE
> LOCK) filtered out
> [52488.191860] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES)
> filtered out
> [52488.192406] ata1.00: configured for UDMA/100
> [52488.206298] sd 0:0:0:0: [sda] Starting disk
> [52488.207334] Extended CMOS year: 2000
> [52488.208335] PM: resume of devices complete after 10376.896 msecs
> [52488.208552] PM: resume devices took 10.376 seconds
> 
> The only relevant post I've found was in the crucial support site:
> 
> http://forums.crucial.com/t5/Solid-State-Drives-SSD/SOLVED-M4-CT512M4SSD1-7mm-512Gb-SSD-too-slow-when-laptop-wakes/td-p/102666
> 
> 
> which suggested adding libata.force=nohrst as a boot option to get rid
> of the problem.
> 
> I tried that, but the laptop wouldn't suspend.
> 
> Any ideas?
> 

(Adding relevant people to CC)

I recall seeing a similar problem getting fixed in mainline quite a long
time ago (around v3.3 I think). Did you try the latest mainline kernel?
 
Regards,
Srivatsa S. Bhat



Re: [PATCH 2/2] PNP: Unbind drivers if the new driver matches _HID rather than _CID

2012-09-25 Thread Bjorn Helgaas
On Tue, Sep 25, 2012 at 7:25 AM, Matthew Garrett  wrote:
> ACPIPNP devices may have two levels of ID - the HID (a single string
> defining the hardware) and the CIDs (zero or more strings defining
> interfaces compatible with the hardware). If a driver matching a CID is
> bound first, it will be impossible to bind a driver that matches the HID
> despite it being more specific. This has been seen in the wild with
> platforms that define an IPMI device as:
>
> Device (NIPM)
> {
> Name (_HID, EisaId ("IPI0001"))
> Name (_CID, EisaId ("PNP0C01"))
>
> resulting in the device being claimed by the generic motherboard resource
> driver binding to the PNP0C01 CID. The IPMI driver attempts to bind later
> and fails, preventing the IPMI device from being made available to the ACPI
> layer.
>
> This can be avoided at driver probe time by detaching the existing driver
> if the new driver matches the device HID. Since each device can only have
> a single HID this will only permit more specific drivers to dislodge more
> generic drivers.
>
> Signed-off-by: Matthew Garrett 

Seems reasonable to me.  The HID/CID idea is not new to ACPI; both
isapnp and pnpbios seem to support it as well.

Do you know of any scenarios besides this IPMI one where there's the
possibility of two drivers matching the same device?  If so, does the
detach/attach process work reasonably?  My worry is that drivers don't
normally give up devices, so the detach path is not well exercised.
And I don't know what happens to any users of the device during the
switch.  For example, if something was using a TPM and we replaced the
driver, what does that look like to the user?
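For illustration, the HID-over-CID preference discussed in this thread can be modelled in plain userspace C. This is only a sketch under stated assumptions: `sketch_id`, `sketch_driver` and `sketch_probe()` are hypothetical stand-ins, not the kernel's `struct pnp_id`, `struct pnp_driver` or `pnp_bus_match()`, and IDs are compared with an exact `strcmp()` rather than the kernel's `compare_func()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device ID list: the first entry plays the
 * role of the _HID, any following entries play the role of _CIDs. */
struct sketch_id {
    const char *id;
    const struct sketch_id *next;
};

/* Hypothetical stand-in for a driver with a NULL-terminated ID table. */
struct sketch_driver {
    const char *name;
    const char *const *id_table;
};

/* Returns 1 if the driver's table lists exactly this ID string. */
static int sketch_id_listed(const struct sketch_driver *drv, const char *id)
{
    const char *const *p;

    if (drv->id_table == NULL)
        return 0;
    for (p = drv->id_table; *p != NULL; p++)
        if (strcmp(*p, id) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the driver lists the device's first ID (its _HID). */
static int sketch_match_hid(const struct sketch_driver *drv,
                            const struct sketch_id *ids)
{
    return ids != NULL && sketch_id_listed(drv, ids->id);
}

/* Returns 1 if the driver lists any of the device's IDs (_HID or _CID). */
static int sketch_match_any(const struct sketch_driver *drv,
                            const struct sketch_id *ids)
{
    for (; ids != NULL; ids = ids->next)
        if (sketch_id_listed(drv, ids->id))
            return 1;
    return 0;
}

/* Returns the driver that owns the device after `candidate` probes it.
 * `bound` is the driver currently attached, or NULL if there is none. */
static const struct sketch_driver *
sketch_probe(const struct sketch_driver *bound,
             const struct sketch_driver *candidate,
             const struct sketch_id *ids)
{
    assert(candidate != NULL);
    if (!sketch_match_any(candidate, ids))
        return bound;           /* candidate does not match at all */
    if (bound != NULL && !sketch_match_hid(candidate, ids))
        return bound;           /* a CID-only match never dislodges */
    return candidate;           /* free device, or HID match wins */
}
```

With a device whose ID list is IPI0001 followed by PNP0C01, a driver listing only PNP0C01 binds first, and a later driver listing IPI0001 takes over; probing in the opposite order leaves the IPI0001 driver in place. The model says nothing about the detach path that the questions above are concerned with.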

> ---
>  drivers/pnp/driver.c | 42 +++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/pnp/driver.c b/drivers/pnp/driver.c
> index 00e9403..8d22836 100644
> --- a/drivers/pnp/driver.c
> +++ b/drivers/pnp/driver.c
> @@ -25,6 +25,14 @@ static int compare_func(const char *ida, const char *idb)
> return 1;
>  }
>
> +static int compare_single_pnp_id(const char *ida, const char *idb)
> +{
> +   if (memcmp(ida, idb, 3) == 0)
> +   if (compare_func(ida, idb) == 1)
> +   return 1;
> +   return 0;
> +}
> +
>  int compare_pnp_id(struct pnp_id *pos, const char *id)
>  {
> if (!pos || !id || (strlen(id) != 7))
> @@ -32,9 +40,8 @@ int compare_pnp_id(struct pnp_id *pos, const char *id)
> if (memcmp(id, "ANYDEVS", 7) == 0)
> return 1;
> while (pos) {
> -   if (memcmp(pos->id, id, 3) == 0)
> -   if (compare_func(pos->id, id) == 1)
> -   return 1;
> +   if (compare_single_pnp_id(pos->id, id) == 1)
> +   return 1;
> pos = pos->next;
> }
> return 0;
> @@ -56,6 +63,22 @@ static const struct pnp_device_id *match_device(struct pnp_driver *drv,
> return NULL;
>  }
>
> +static int match_hid(struct pnp_driver *drv, struct pnp_dev *dev)
> +{
> +   const struct pnp_device_id *drv_id = drv->id_table;
> +
> +   if (!drv_id)
> +   return 0;
> +
> +   while (&drv_id->id) {
> +   /* The first ID is _HID */
> +   if (compare_single_pnp_id(dev->id->id, drv_id->id))
> +   return 1;
> +   drv_id++;
> +   }
> +   return 0;
> +}
> +
>  int pnp_device_attach(struct pnp_dev *pnp_dev)
>  {
> spin_lock(&pnp_lock);
> @@ -151,6 +174,19 @@ static int pnp_bus_match(struct device *dev, struct device_driver *drv)
>
> if (match_device(pnp_drv, pnp_dev) == NULL)
> return 0;
> +
> +   /*
> +* ACPIPNP offers two levels of device ID - HID and CID. HID defines
> +* the specific device ID while CID represents the device
> +* compatibility IDs. If a device is matched by a compatibility ID
> +* first, it will be impossible for a hardware-specific driver to
> +* bind since there will already be a driver. We can handle this case
> +* by unbinding the original driver if the device has one bound and
> +* if the new driver matches the HID rather than a compatibility ID.
> +*/
> +   if (dev->driver && match_hid(pnp_drv, pnp_dev))
> +   device_release_driver(dev);
> +
> return 1;
>  }
>
> --
> 1.7.11.4
>


Re: [PATCH] pagemap: fix wrong KPF_THP on slab pages

2012-09-25 Thread KOSAKI Motohiro
On Tue, Sep 25, 2012 at 1:05 PM, Naoya Horiguchi
 wrote:
> On Tue, Sep 25, 2012 at 11:59:51AM -0400, KOSAKI Motohiro wrote:
>> On Tue, Sep 25, 2012 at 9:56 AM, Naoya Horiguchi
>>  wrote:
>> > KPF_THP can be set on non-huge compound pages like slab pages, because
>> > PageTransCompound only sees PG_head and PG_tail. Obviously this is a bug
>> > and breaks user space applications which look for thp via /proc/kpageflags.
>> > Currently thp is constructed only on anonymous pages, so this patch makes
>> > KPF_THP be set when both of PageAnon and PageTransCompound are true.
>>
>> Indeed. Please add some comment too.
>
> Sure. I'll send a revised one.
>
> Thanks,
> Naoya
> ---
> From: Naoya Horiguchi 
> Date: Mon, 24 Sep 2012 16:28:30 -0400
> Subject: [PATCH v2] pagemap: fix wrong KPF_THP on slab pages
>
> KPF_THP can be set on non-huge compound pages like slab pages, because
> PageTransCompound only sees PG_head and PG_tail. Obviously this is a bug
> and breaks user space applications which look for thp via /proc/kpageflags.
> Currently thp is constructed only on anonymous pages, so this patch makes
> KPF_THP be set when both of PageAnon and PageTransCompound are true.
>
> Changelog in v2:
>   - add a comment in code
>
> Signed-off-by: Naoya Horiguchi 
> ---
>  fs/proc/page.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 7fcd0d6..f7cd2f6c 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -115,7 +115,12 @@ u64 stable_page_flags(struct page *page)
> u |= 1 << KPF_COMPOUND_TAIL;
> if (PageHuge(page))
> u |= 1 << KPF_HUGE;
> -   else if (PageTransCompound(page))
> +   /*
> +* Since THP is relevant only for anonymous pages so far, we check it
> +* explicitly with PageAnon. Otherwise thp is confounded with non-huge
> +* compound pages like slab pages.
> +*/
> +   else if (PageTransCompound(page) && PageAnon(page))
> u |= 1 << KPF_THP;

Looks good to me.

Acked-by: KOSAKI Motohiro 


Re: [RESEND PATCH v2] nohz: fix idle ticks in cpu summary line of /proc/stat

2012-09-25 Thread Srivatsa S. Bhat
On 09/10/2012 04:43 PM, Srivatsa S. Bhat wrote:
> From: Michal Hocko 
> 
> Git commit 09a1d34f8535ecf9 "nohz: Make idle/iowait counter update
> conditional" introduced a bug in regard to cpu hotplug. The effect is
> that the number of idle ticks in the cpu summary line in /proc/stat is
> still counting ticks for offline cpus.
> 
> Reproduction is easy, just start a workload that keeps all cpus busy,
> switch off one or more cpus and then watch the idle field in top.
> On a dual-core with one cpu 100% busy and one offline cpu you will get
> something like this:
> 
> %Cpu(s): 48.7 us,  1.3 sy,  0.0 ni, 50.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> 
> The problem is that an offline cpu still has ts->idle_active == 1.
> To fix this we should make sure that the cpu is online when calling
> get_cpu_idle_time_us and get_cpu_iowait_time_us.
> 
> Cc: Thomas Gleixner 
> Cc: sta...@vger.kernel.org
> Reported-by: Martin Schwidefsky 
> Reviewed-by: Srivatsa S. Bhat 
> Signed-off-by: Michal Hocko 
> [srivatsa.b...@linux.vnet.ibm.com: Rebased to current mainline]
> Signed-off-by: Srivatsa S. Bhat 
> ---
> 
> Hi Thomas,
> 
> This is a resend of the patch posted by Michal at [1]. Martin had explained
> the importance of this patch for fixing the bug for x86 case in [2]. (The s390
> fix is already upstream, commit id cb85a6ed67e9). Could you kindly consider
> taking this fix?
>

Hi Thomas,
Any thoughts on this?

Regards,
Srivatsa S. Bhat
 
> [1]. http://thread.gmane.org/gmane.linux.kernel/1265374/focus=1266457
> [2]. http://thread.gmane.org/gmane.linux.kernel/1265374/focus=1276336
> 
>  fs/proc/stat.c |   14 ++
>  1 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/proc/stat.c b/fs/proc/stat.c
> index 64c3b31..e296572 100644
> --- a/fs/proc/stat.c
> +++ b/fs/proc/stat.c
> @@ -45,10 +45,13 @@ static cputime64_t get_iowait_time(int cpu)
> 
>  static u64 get_idle_time(int cpu)
>  {
> - u64 idle, idle_time = get_cpu_idle_time_us(cpu, NULL);
> + u64 idle, idle_time = -1ULL;
> +
> + if (cpu_online(cpu))
> + idle_time = get_cpu_idle_time_us(cpu, NULL);
> 
>   if (idle_time == -1ULL)
> - /* !NO_HZ so we can rely on cpustat.idle */
> + /* !NO_HZ or cpu offline so we can rely on cpustat.idle */
>   idle = kcpustat_cpu(cpu).cpustat[CPUTIME_IDLE];
>   else
>   idle = usecs_to_cputime64(idle_time);
> @@ -58,10 +61,13 @@ static u64 get_idle_time(int cpu)
> 
>  static u64 get_iowait_time(int cpu)
>  {
> - u64 iowait, iowait_time = get_cpu_iowait_time_us(cpu, NULL);
> + u64 iowait, iowait_time = -1ULL;
> +
> + if (cpu_online(cpu))
> + iowait_time = get_cpu_iowait_time_us(cpu, NULL);
> 
>   if (iowait_time == -1ULL)
> - /* !NO_HZ so we can rely on cpustat.iowait */
> + /* !NO_HZ or cpu offline so we can rely on cpustat.iowait */
>   iowait = kcpustat_cpu(cpu).cpustat[CPUTIME_IOWAIT];
>   else
>   iowait = usecs_to_cputime64(iowait_time);
> 



Re: 20% performance drop on PostgreSQL 9.2 from kernel 3.5.3 to 3.6-rc5 on AMD chipsets - bisected

2012-09-25 Thread Borislav Petkov
On Tue, Sep 25, 2012 at 10:21:28AM -0700, Linus Torvalds wrote:
> On Tue, Sep 25, 2012 at 10:00 AM, Borislav Petkov  wrote:
> >
> > 3.6-rc6+tip/auto-latest-kill select_idle_sibling()
> 
> Is this literally just removing it entirely?

Basically yes:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b800a14b990..016ba387c7f2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2640,6 +2640,8 @@ static int select_idle_sibling(struct task_struct *p, int target)
struct sched_group *sg;
int i;
 
+   goto done;
+
/*
 * If the task is going to be woken-up on this cpu and if it is
 * already idle, then it is the right target.

> Because apart from the latency spike at 4 procs (and the latency
> numbers look very noisy, so that's probably just noise), it looks
> clearly superior to everything else. On that benchmark, at least.

Yep, I need more results for a more reliable say here.

> How does pgbench look? That's the one that apparently really wants to
> spread out, possibly due to user-level spinlocks. So I assume it will
> show the reverse pattern, with "kill select_idle_sibling" being the
> worst case.

Let me run pgbench tomorrow (I had run it only on an older family 0x10
single-node box) on Bulldozer to check that out. And we haven't started
the multi-node measurements at all.

> Sad, because it really would be lovely to just remove that thing ;)

Right, so why did we need it at all, in the first place? There has to be
some reason for it.

Thanks.

-- 
Regards/Gruss,
Boris.


Re: 3.6-rc7 boot crash + bisection

2012-09-25 Thread Alex Williamson
On Tue, 2012-09-25 at 12:32 -0600, Alex Williamson wrote:
> On Mon, 2012-09-24 at 21:03 +0200, Florian Dazinger wrote:
> > Hi,
> > I think I've found a regression, which causes an early boot crash, I
> > appended the kernel output via jpg file, since I do not have a serial
> > console or sth.
> > 
> > after bisection, it boils down to this commit:
> > 
> > 9dcd61303af862c279df86aa97fde7ce371be774 is the first bad commit
> > commit 9dcd61303af862c279df86aa97fde7ce371be774
> > Author: Alex Williamson 
> > Date:   Wed May 30 14:19:07 2012 -0600
> > 
> > amd_iommu: Support IOMMU groups
> > 
> > Add IOMMU group support to AMD-Vi device init and uninit code.
> > Existing notifiers make sure this gets called for each device.
> > 
> > Signed-off-by: Alex Williamson 
> > Signed-off-by: Joerg Roedel 
> > 
> > :04 04 2f6b1b8e104d6dfec0abaa9646750f9b5a4f4060 837ae95e84f6d3553457c4df595a9caa56843c03 M  drivers
> 
> [switching back to mailing list thread]
> 
> I asked Florian for dmesg w/ amd_iommu_dump, here's the relevant lines:
> 
> [1.485645] AMD-Vi: device: 00:00.2 cap: 0040 seg: 0 flags: 3e info 1300
> [1.485683] AMD-Vi:mmio-addr: feb2
> [1.485901] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:00.0 flags: 00
> [1.485935] AMD-Vi:   DEV_RANGE_END   devid: 00:00.2
> [1.485969] AMD-Vi:   DEV_SELECT  devid: 00:02.0 flags: 00
> [1.486002] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
> [1.486036] AMD-Vi:   DEV_RANGE_END   devid: 01:00.1
> [1.486070] AMD-Vi:   DEV_SELECT  devid: 00:04.0 flags: 00
> [1.486103] AMD-Vi:   DEV_SELECT  devid: 02:00.0 flags: 00
> [1.486137] AMD-Vi:   DEV_SELECT  devid: 00:05.0 flags: 00
> [1.486170] AMD-Vi:   DEV_SELECT  devid: 03:00.0 flags: 00
> [1.486204] AMD-Vi:   DEV_SELECT  devid: 00:06.0 flags: 00
> [1.486238] AMD-Vi:   DEV_SELECT  devid: 04:00.0 flags: 00
> [1.486271] AMD-Vi:   DEV_SELECT  devid: 00:07.0 flags: 00
> [1.486305] AMD-Vi:   DEV_SELECT  devid: 05:00.0 flags: 00
> [1.486338] AMD-Vi:   DEV_SELECT  devid: 00:09.0 flags: 00
> [1.486372] AMD-Vi:   DEV_SELECT  devid: 06:00.0 flags: 00
> [1.486406] AMD-Vi:   DEV_SELECT  devid: 00:0b.0 flags: 00
> [1.486439] AMD-Vi:   DEV_SELECT  devid: 07:00.0 flags: 00
> [1.486473] AMD-Vi:   DEV_ALIAS_RANGE devid: 08:01.0 flags: 00 devid_to: 08:00.0
> [1.486510] AMD-Vi:   DEV_RANGE_END   devid: 08:1f.7
> [1.486548] AMD-Vi:   DEV_SELECT  devid: 00:11.0 flags: 00
> [1.486581] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:12.0 flags: 00
> [1.486620] AMD-Vi:   DEV_RANGE_END   devid: 00:12.2
> [1.486654] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:13.0 flags: 00
> [1.486688] AMD-Vi:   DEV_RANGE_END   devid: 00:13.2
> [1.486721] AMD-Vi:   DEV_SELECT  devid: 00:14.0 flags: d7
> [1.486755] AMD-Vi:   DEV_SELECT  devid: 00:14.3 flags: 00
> [1.486788] AMD-Vi:   DEV_SELECT  devid: 00:14.4 flags: 00
> [1.486822] AMD-Vi:   DEV_ALIAS_RANGE devid: 09:00.0 flags: 00 devid_to: 00:14.4
> [1.486859] AMD-Vi:   DEV_RANGE_END   devid: 09:1f.7
> [1.486897] AMD-Vi:   DEV_SELECT  devid: 00:14.5 flags: 00
> [1.486931] AMD-Vi:   DEV_SELECT_RANGE_START  devid: 00:16.0 flags: 00
> [1.486965] AMD-Vi:   DEV_RANGE_END   devid: 00:16.2
> [1.487055] AMD-Vi: Enabling IOMMU at :00:00.2 cap 0x40
> 
> 
> > lspci:
> > 00:00.0 Host bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B) (rev 02)
> > 00:00.2 IOMMU: Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
> > 00:02.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port B)
> > 00:04.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port D)
> > 00:05.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port E)
> > 00:06.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port F)
> > 00:07.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port G)
> > 00:09.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (PCI express gpp port H)
> > 00:0b.0 PCI bridge: Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (NB-SB link)
> > 00:11.0 SATA controller: Advanced Micro Devices [AMD] nee 

Re: Wrong system clock vs X.509 date specifiers

2012-09-25 Thread Tomas Mraz
On Tue, 2012-09-25 at 18:31 +0100, David Howells wrote: 
> Tomas Mraz  wrote:
> 
> > You can use openssl ca that allows to set arbitrary start date to
> > generate selfsigned certs as well (-selfsign option).
> 
> That seems to require some stuff I don't have installed:
> 
> warthog>openssl ca -in signing_key.priv -extensions v3_ca -out newcert.pem
> Using configuration from /etc/pki/tls/openssl.cnf
> Error opening CA private key /etc/pki/CA/private/cakey.pem
> 140244246955872:error:0200100D:system library:fopen:Permission denied:bss_file.c:398:fopen('/etc/pki/CA/private/cakey.pem','r')
> 140244246955872:error:20074002:BIO routines:FILE_CTRL:system lib:bss_file.c:400:
> unable to load CA private key
> unable to write 'random state'
> 
> (the /etc/pki/CA/private/ dir is inaccessible if not root and doesn't in any
> case contain cakey.pem).
> 
> Do I need to start with all the CA stuff in the right places to use it?

You can configure it to point to different directories. But yes, you
have to create a CA cert and so on.
-- 
Tomas Mraz
No matter how far down the wrong road you've gone, turn back.
  Turkish proverb



Re: RCU idle CPU detection is broken in linux-next

2012-09-25 Thread Paul E. McKenney
On Tue, Sep 25, 2012 at 08:28:23PM +0200, Sasha Levin wrote:
> On 09/25/2012 02:06 PM, Frederic Weisbecker wrote:
> > Sasha, sorry to burden you with more testing requests.
> > Could you please try out this new branch? It includes some fixes after
> > reports from Wu Fenguang and Dan Carpenter (not related to your warnings,
> > though) and a patch on top of the pile to check that I diagnosed the
> > problem correctly: it returns immediately from the rcu_user_*() APIs if
> > we are in an interrupt.
> > 
> > This way we'll have a clearer view. I would also like to know if there
> > are other problems with the rcu user mode.
> > 
> > Thanks!
> 
> Alrighty, I don't see any warnings anymore.
> 
> I'll keep everything running just in case.

Very good news!!!  Thank you both!!!

Thanx, Paul



Re: [PATCH] ext4: remove static from struct match_token used in token2str

2012-09-25 Thread Theodore Ts'o
On Fri, Sep 21, 2012 at 10:21:00PM -0300, Herton Ronaldo Krzesinski wrote:
> There is no reason to use static there, and it will cause issues when
> reading /proc/fs/ext4//options concurrently.
> 
> Signed-off-by: Herton Ronaldo Krzesinski 
> Cc: sta...@vger.kernel.org # 3.4+

Herton,

Thanks for finding the bug fix!  I changed the commit description
slightly and have applied your patch to the ext4 tree.

- Ted

ext4: fix crash when accessing /proc/mounts concurrently

From: Herton Ronaldo Krzesinski 

The crash was caused by a variable being erroneously declared static in
token2str().

In addition to /proc/mounts, the problem can also be easily replicated
by accessing /proc/fs/ext4//options in parallel:

$ cat /proc/fs/ext4//options > options.txt

... and then running the following command in two different terminals:

$ while diff /proc/fs/ext4//options options.txt; do true; done

This is also the cause of a crash while running xfstests
#234, as reported in the following bug reports:

https://bugs.launchpad.net/bugs/1053019
https://bugzilla.kernel.org/show_bug.cgi?id=47731

Signed-off-by: Herton Ronaldo Krzesinski 
Signed-off-by: "Theodore Ts'o" 
Cc: Brad Figg 
Cc: sta...@vger.kernel.org
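
Why a static loop cursor breaks under concurrent readers can be shown with a small single-threaded model. This is a sketch under assumptions, not the ext4 code: the token table and the `sketch_*` functions are hypothetical, and the `other` callback stands in for a second reader that happens to run in the middle of the first reader's loop, which makes the interleaving deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token table, terminated by a NULL pattern. */
struct sketch_token {
    int token;
    const char *pattern;
};

static const struct sketch_token sketch_tokens[] = {
    { 1, "alpha" }, { 2, "beta" }, { 3, "gamma" }, { 0, NULL },
};

/* Callback standing in for another reader running mid-loop. */
typedef void (*sketch_interleave_fn)(void);

/* Buggy variant: the cursor is static, so every caller shares it. */
static const char *sketch_lookup_static(int token, sketch_interleave_fn other)
{
    static const struct sketch_token *m;

    for (m = sketch_tokens; m->pattern != NULL; m++) {
        if (other != NULL)
            other();            /* the other reader moves the shared cursor */
        if (m->token == token)
            break;
    }
    return m->pattern;
}

/* Fixed variant: each call has its own cursor on the stack. */
static const char *sketch_lookup_local(int token, sketch_interleave_fn other)
{
    const struct sketch_token *m;

    for (m = sketch_tokens; m->pattern != NULL; m++) {
        if (other != NULL)
            other();
        if (m->token == token)
            break;
    }
    return m->pattern;
}

/* Second readers used to provoke the interleaving; both look up token 3. */
static void sketch_other_static(void)
{
    (void)sketch_lookup_static(3, NULL);
}

static void sketch_other_local(void)
{
    (void)sketch_lookup_local(3, NULL);
}
```

Calling `sketch_lookup_static(1, sketch_other_static)` returns NULL instead of "alpha", because the nested lookup leaves the shared cursor on the "gamma" entry and the outer loop then steps past it to the terminator. The same call on `sketch_lookup_local()` returns "alpha". With real threads the interleaving is timing dependent rather than deterministic.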

