Re: [PATCH v4] Kconfig: introduce HAS_IOPORT option and select it as necessary

2023-04-05 Thread H. Peter Anvin
On April 5, 2023 8:12:38 AM PDT, Niklas Schnelle  wrote:
>On Thu, 2023-03-23 at 17:33 +0100, Niklas Schnelle wrote:
>> We introduce a new HAS_IOPORT Kconfig option to indicate support for I/O
>> Port access. In a future patch HAS_IOPORT=n will disable compilation of
>> the I/O accessor functions inb()/outb() and friends on architectures
>> which can not meaningfully support legacy I/O spaces such as s390.
>> 
>> The following architectures do not select HAS_IOPORT:
>> 
>> * ARC
>> * C-SKY
>> * Hexagon
>> * Nios II
>> * OpenRISC
>> * s390
>> * User-Mode Linux
>> * Xtensa
>> 
>> All other architectures select HAS_IOPORT at least conditionally.
>> 
>> The "depends on" relations on HAS_IOPORT in drivers as well as ifdefs
>> for HAS_IOPORT specific sections will be added in subsequent patches on
>> a per subsystem basis.
>> 
>> Co-developed-by: Arnd Bergmann 
>> Signed-off-by: Arnd Bergmann 
>> Acked-by: Johannes Berg  # for ARCH=um
>> Acked-by: Geert Uytterhoeven 
>> Signed-off-by: Niklas Schnelle 
>> ---
>> Note: This patch is the initial patch of a larger series[0]. This patch
>> introduces the HAS_IOPORT config option while the rest of the series adds
>> driver dependencies and the final patch removes inb() / outb() and friends on
>> platforms that don't support them. 
>> 
>> Thus each of the per-subsystem patches is independent from each other but
>> depends on this patch while the final patch depends on the whole series. Thus
>> splitting this initial patch off allows the per-subsytem HAS_IOPORT 
>> dependency
>> addition be merged separately via different trees without breaking the build.
>> 
>> [0] 
>> https://lore.kernel.org/lkml/20230314121216.413434-1-schne...@linux.ibm.com/
>> 
>> Changes since v3:
>> - List archs without HAS_IOPORT in commit message (Arnd)
>> - Select HAS_IOPORT for LoongArch (Arnd)
>> - Use "select HAS_IOPORT if (E)ISA || .." instead of a "depends on" for 
>> (E)ISA
>>   for m68k and parisc
>> - Select HAS_IOPORT with config GSC on parisc (Arnd)
>> - Drop "depends on HAS_IOPORT" for um's config ISA (Johannes)
>> - Drop "depends on HAS_IOPORT" for config ISA on x86 and parisc where it is
>>   always selected (Arnd)
>> 
>
>Gentle ping. As far as I can tell this hasn't been picked to any tree
>sp far but also hasn't seen complains so I'm wondering if I should send
>a new version of the combined series of this patch plus the added
>HAS_IOPORT dependencies per subsystem or wait until this is picked up.
>
>Thanks,
>Niklas
>
>

You need this on a system supporting not just ISA but also PCI.

Typically on non-x86 architectures this is simply mapped into a memory window.


Re: [PATCH v3 00/24] Remove COMMAND_LINE_SIZE from uapi

2023-03-02 Thread H. Peter Anvin
On March 1, 2023 7:17:18 PM PST, Palmer Dabbelt  wrote:
>On Tue, 14 Feb 2023 01:19:02 PST (-0800), h...@linux.ibm.com wrote:
>> On Tue, Feb 14, 2023 at 09:58:17AM +0100, Geert Uytterhoeven wrote:
>>> Hi Heiko,
>>> 
>>> On Tue, Feb 14, 2023 at 9:39 AM Heiko Carstens  wrote:
>>> > On Tue, Feb 14, 2023 at 08:49:01AM +0100, Alexandre Ghiti wrote:
>>> > > This all came up in the context of increasing COMMAND_LINE_SIZE in the
>>> > > RISC-V port.  In theory that's a UABI break, as COMMAND_LINE_SIZE is the
>>> > > maximum length of /proc/cmdline and userspace could staticly rely on
>>> > > that to be correct.
>>> > >
>>> > > Usually I wouldn't mess around with changing this sort of thing, but
>>> > > PowerPC increased it with a5980d064fe2 ("powerpc: Bump COMMAND_LINE_SIZE
>>> > > to 2048").  There are also a handful of examples of COMMAND_LINE_SIZE
>>> > > increasing, but they're from before the UAPI split so I'm not quite sure
>>> > > what that means: e5a6a1c90948 ("powerpc: derive COMMAND_LINE_SIZE from
>>> > > asm-generic"), 684d2fd48e71 ("[S390] kernel: Append scpdata to kernel
>>> > > boot command line"), 22242681cff5 ("MIPS: Extend COMMAND_LINE_SIZE"),
>>> > > and 2b74b85693c7 ("sh: Derive COMMAND_LINE_SIZE from
>>> > > asm-generic/setup.h.").
>>> > >
>>> > > It seems to me like COMMAND_LINE_SIZE really just shouldn't have been
>>> > > part of the uapi to begin with, and userspace should be able to handle
>>> > > /proc/cmdline of whatever length it turns out to be.  I don't see any
>>> > > references to COMMAND_LINE_SIZE anywhere but Linux via a quick Google
>>> > > search, but that's not really enough to consider it unused on my end.
>>> > >
>>> > > The feedback on the v1 seemed to indicate that COMMAND_LINE_SIZE really
>>> > > shouldn't be part of uapi, so this now touches all the ports.  I've
>>> > > tried to split this all out and leave it bisectable, but I haven't
>>> > > tested it all that aggressively.
>>> >
>>> > Just to confirm this assumption a bit more: that's actually the same
>>> > conclusion that we ended up with when commit 3da0243f906a ("s390: make
>>> > command line configurable") went upstream.
>
>Thanks, I guess I'd missed that one.  At some point I think there was some 
>discussion of making this a Kconfig for everyone, which seems reasonable to me 
>-- our use case for this being extended is syzkaller, but we're sort of just 
>picking a value that's big enough for now and running with it.
>
>Probably best to get it out of uapi first, though, as that way at least it's 
>clear that it's not uABI.
>
>>> Commit 622021cd6c560ce7 ("s390: make command line configurable"),
>>> I assume?
>> 
>> Yes, sorry for that. I got distracted while writing and used the wrong
>> branch to look this up.
>
>Alex: Probably worth adding that to the list in the cover letter as it looks 
>like you were planning on a v4 anyway (which I guess you now have to do, given 
>that I just added the issue to RISC-V).

The only use that is uapi is the *default* length of the command line if the 
kernel header doesn't include it (in the case of x86, it is in the bzImage 
header, but that is atchitecture- or even boot format-specific.)


Re: [PATCH] random: remove CONFIG_ARCH_RANDOM and "nordrand"

2022-07-06 Thread H. Peter Anvin
On July 6, 2022 5:23:31 AM PDT, Borislav Petkov  wrote:
>On Tue, Jul 05, 2022 at 04:11:45PM -0700, H. Peter Anvin wrote:
>> What I'm wondering is if we shouldn't be simply instrument *every*
>> invocation, and set the trust to zero if we ever trip it.
>
>I guess you can add some logic to rdrand_long() to sanity-check what it
>returns...
>
>But would that be worth the effort?
>

I think doing it centrally, as non-arch-specific code, and letting it subsume 
ad hoc checks for known failure conditions could be a win.


Re: [PATCH] random: remove CONFIG_ARCH_RANDOM and "nordrand"

2022-07-05 Thread H. Peter Anvin
On July 5, 2022 3:00:04 PM PDT, Borislav Petkov  wrote:
>On Tue, Jul 05, 2022 at 02:50:34PM -0700, H. Peter Anvin wrote:
>> It's just math. The only variable is your confidence level, i.e. at
>> what level do you decide that the likelihood of pure chance is way
>> smaller than the likelihood of hardware failure.
>
>That might be but the likelyhood of certain BIOSes dropping the ball
>after resume is 100%:
>
>7879fc4bdc75 ("x86/rdrand: Sanity-check RDRAND output")
>

What I'm wondering is if we shouldn't be simply instrument *every* invocation, 
and set the trust to zero if we ever trip it.


Re: [PATCH] random: remove CONFIG_ARCH_RANDOM and "nordrand"

2022-07-05 Thread H. Peter Anvin
On July 5, 2022 12:57:04 PM PDT, Borislav Petkov  wrote:
>On Tue, Jul 05, 2022 at 09:44:17PM +0200, Jason A. Donenfeld wrote:
>> Oh, huh. Maybe in that case I should adjust the message to say "consider
>> using `random.trust_cpu=0`," which is the thing that would actually make
>> a security difference.
>
>Why isn't that option documented in
>Documentation/admin-guide/kernel-parameters.txt?
>
>> But actually, one thing that wasn't clear to me was: does `nordrand`
>> affect what userspace sees? While random.c is okay in lots of
>> circumstances, I could imagine `nordrand` playing a role in preventing
>> userspace from using it, which might be desirable. Is this the case? If
>> so, I can remove the nordrand chunk from this patch for v2. If not, I'll
>> adjust the text to mention `random.trust_cpu=0`.
>
>Unfortunately, it doesn't disable the instruction. It would be lovely if
>we had a switch like that...
>
>That's why this message is supposed to be noisy so that people can pay
>attention at least.
>
>> In the sense that random.c can handle mostly any input without making
>> the quality worse. So, you can't accidentally taint it. The only risk is
>> if it thinks RDRAND is good and trustable when it isn't, but that's what
>> `random.trust_cpu=0` is for.
>
>And that's why I'm saying that if you detect RDRAND returning the
>same thing over and over again, you should simply stop using it.
>Automatically. Not rely on the user to do anything.
>

It's just math. The only variable is your confidence level, i.e. at what level 
do you decide that the likelihood of pure chance is way smaller than the 
likelihood of hardware failure. For example, the likelihood of m n-bit samples 
in a row being identical is 2^-(n*(m-3/2)), and the likelihood of the CPU being 
destroyed by a meterorite in the same microsecond is about 2^-100.



Re: [PATCH v3 02/22] compat: provide compat_ptr() on all architectures

2020-01-07 Thread H. Peter Anvin
,oprofile-l...@lists.sf.net,linux-s390 
,sparclinux 
From: h...@zytor.com
Message-ID: <41625f06-d755-4c82-86df-a9415feee...@zytor.com>

On January 7, 2020 12:08:31 AM PST, Arnd Bergmann  wrote:
>On Tue, Jan 7, 2020 at 3:05 AM Michael Ellerman 
>wrote:
>> Arnd Bergmann  writes:
>> > +
>> > +static inline compat_uptr_t ptr_to_compat(void __user *uptr)
>> > +{
>> > + return (u32)(unsigned long)uptr;
>> > +}
>>
>> Is there a reason we cast to u32 directly instead of using
>compat_uptr_t?
>
>Probably Al found this to be more explicit at the time when he
>introduced
>it on all the architectures in 2005. I just moved it here and kept the
>definition.
>
>   Arnd

Did compat_uptr_t exist back then?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v3 02/22] compat: provide compat_ptr() on all architectures

2020-01-06 Thread H. Peter Anvin
On 2020-01-02 06:55, Arnd Bergmann wrote:
> In order to avoid needless #ifdef CONFIG_COMPAT checks,
> move the compat_ptr() definition to linux/compat.h
> where it can be seen by any file regardless of the
> architecture.
> 
> Only s390 needs a special definition, this can use the
> self-#define trick we have elsewhere.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/compat.h   | 17 -
>  arch/mips/include/asm/compat.h| 18 --
>  arch/parisc/include/asm/compat.h  | 17 -
>  arch/powerpc/include/asm/compat.h | 17 -
>  arch/powerpc/oprofile/backtrace.c |  2 +-
>  arch/s390/include/asm/compat.h|  6 +-
>  arch/sparc/include/asm/compat.h   | 17 -
>  arch/x86/include/asm/compat.h | 17 -
>  include/linux/compat.h| 18 ++
>  9 files changed, 20 insertions(+), 109 deletions(-)
> 

For x86:

Reviewed-by: H. Peter Anvin 

It still suffers from the zero-one-infinity rule failure of the compat
architecture as a whole, but that is a very different problem. In this case
"compat" is obviously meaning "a 32-on-64 ABI" and simply centralizes a common
API, which is a Good Thing[TM].

-hpa



Re: [PATCH] Linux: Define struct termios2 in under _GNU_SOURCE [BZ #10339]

2019-04-17 Thread H. Peter Anvin
On 4/16/19 2:59 AM, Florian Weimer wrote:
> * hpa:
> 
>> Using symbol versioning doesn't really help much since the real
>> problem is that struct termios can be passed around in userspace, and
>> the interfaces between user space libraries don't have any
>> versioning. However, my POC code deals with that too by only seeing
>> BOTHER when necessary, so if the structure is extended garbage in the
>> extra fields will be ignored unless new baud rates are in use.
> 
> That still doesn't solve the problem of changing struct offsets after a
> struct field of type struct termios.
> 
>> Exporting termios2 to user space feels a bit odd at this stage as it
>> would only be usable as a fallback on old glibc. Call it
>> kernel_termios2 at least.
> 
> I'm not sure why we should do that?  The kernel calls it struct termios2
> in its UAPI headers.  If that name is not appropriate, it should be
> changed first in the UAPI headers.
> 

It should; I believe the namespace we ought to use is struct
__kernel_termios[2].

"struct termios" defined in these headers is obviously completely wrong
for userspace.

-hpa



Re: [PATCH] Linux: Define struct termios2 in under _GNU_SOURCE [BZ #10339]

2019-04-17 Thread H. Peter Anvin
On 4/15/19 10:22 AM, Adhemerval Zanella wrote:
>>
>> New interfaces are only necessary for the handful of architectures that 
>> don't have the speed fields *and* to space to put them in. 
> 
> Based on your WIP, it seems that both sparc and mips could be adapted.
> Do we still have glibc supported architecture that would require compat
> symbols?
> 
>>
>> Using symbol versioning doesn't really help much since the real problem is 
>> that struct termios can be passed around in userspace, and the interfaces 
>> between user space libraries don't have any versioning. However, my POC code 
>> deals with that too by only seeing BOTHER when necessary, so if the 
>> structure is extended garbage in the extra fields will be ignored unless new 
>> baud rates are in use.
> 
> Yeah, we discussed this earlier and if recall correctly it was not settled
> that all architectures would allow the use to extra space for the new
> fields. It seems the case, which makes the adaptation for termios2 even 
> easier.
> 
> The question I have for kernel side is whether termios2 is fully compatible
> with termios, meaning that if there is conner cases we need to handle in
> userland.
> 

It depends on what you mean with "fully compatible."

Functionality-wise, the termios2 interfaces are a strict superset. There
is not, however, any guarantee that struct kernel_termios2 *contains* a
contiguous binary equivalent to struct kernel_termios (in fact, on most
architectures, it doesn't.)

>>
>> My POC code deals with Alpha as well by falling back to the old interfaces 
>> if necessary and possible, otherwise return error.
>>
>> Exporting termios2 to user space feels a bit odd at this stage as it would 
>> only be usable as a fallback on old glibc. Call it kernel_termios2 at least. 
>> ioctls using struct termios *must* be changed to kernel_termios anyway!
>>
> 
> I still prefer to avoid export it to userland and make it usable through
> default termios, as your wip does.  My understanding is new interfaces 
> should be semantic equal to current one with the only deviation that
> non-standard baudrates will handled as its values.  The only issue I 
> can foresee is if POSIX starts to export new bauds value.
> 

... which will be easy to handle: just define a B constant with the
value equal to the baud rate.

-hhpa




Re: [PATCH] treewide: remove current_text_addr

2018-08-27 Thread H. Peter Anvin
On 08/27/18 06:11, Peter Zijlstra wrote:
> On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:
> 
>> _THIS_IP_, however, is completely ill-defined, other than being an
>> address *somewhere* in the same global function (not even necessarily
>> the same function if the function is static!)  As my experiment show, in
>> many (nearly) cases gcc will hoist the address all the way to the top of
>> the function, at least for the current generic implementation.
> 
> It seems to have mostly worked so far... did anything change?
> 

Most likely because the major architectures contain a arch-specific
assembly implementation.  The generic implementation used in some places
is completely broken, as my experiments show.

>> For the case where _THIS_IP_ is passed to an out-of-line function in all
>> cases, it is extra pointless because all it does is increase the
>> footprint of every caller: _RET_IP_ is inherently passed to the function
>> anyway, and with tailcall protection it will uniquely identify a callsite.
> 
> So I think we can convert many of the lockdep _THIS_IP_ calls to
> _RET_IP_ on the other side, with a wee bit of care.
> 
> A little something like so perhaps...

I don't have time to look at this right now (I'm on sabbatical, and I'm
dealing with personal legal stuff right at the moment), but I think it
is the right direction.

-hpa


Re: [PATCH] treewide: remove current_text_addr

2018-08-27 Thread H. Peter Anvin
On 08/27/18 00:33, Peter Zijlstra wrote:
> 
> What problem are we trying to solve? _THIS_IP_ and _RET_IP_ work fine.
> We're 'good' at dealing with text addresses, we use them for call stacks
> and all sorts. Why does this need changing?
> 

_RET_IP_ works fine, with the following two caveats:

1. To get a unique IP for each call site, the function call needs to be
   tailcall protected (easily done by wrapping the function in an
  __always_inline function with the notailcall() function I described
  earlier.  Alternatively, a generic macro wrapper for the same thing:

  #define notailcall(x) ({ typeof(x) _x = (x); asm volatile("");  _x; })

2. To uniformly get the return IP, it needs to be defined as:

#define _RET_IP_((unsigned long) \
__builtin_extract_return_addr(__builtin_return_address(0)))

[sorry for the line wrapping]

Using the type unsigned long instead of void * seems kind of pointless
though.


_THIS_IP_, however, is completely ill-defined, other than being an
address *somewhere* in the same global function (not even necessarily
the same function if the function is static!)  As my experiment show, in
many (nearly) cases gcc will hoist the address all the way to the top of
the function, at least for the current generic implementation.

For the case where _THIS_IP_ is passed to an out-of-line function in all
cases, it is extra pointless because all it does is increase the
footprint of every caller: _RET_IP_ is inherently passed to the function
anyway, and with tailcall protection it will uniquely identify a callsite.

For the case where _THIS_IP_ is used inline, I believe the version I
described will at the very least avoid hoisting around volatile accesses
like READ_ONCE(). Surrounding the marked code with asm volatile("");
[which should be turned into a macro or inline, obviously] might be
necessary for it to make any kind of inherent sense.

The proposed "location identifier" does have a serious problem: with
inline functions you might very well have a bunch of duplicates pointing
into the inline function, so a single callsite isn't identifiable.

-hpa



Re: [PATCH] treewide: remove current_text_addr

2018-08-26 Thread H. Peter Anvin
On 08/26/18 12:30, H. Peter Anvin wrote:
> Here is a full-blown (user space) test program demonstrating the whole
> technique and how to use it.
> 
>   -hpa

Incidentally, it looks like _RET_IP_ really should be defined as:


/*
 * Is there any reason whatsoever to have _RET_IP_ an unsigned int
 * rather than a pointer throughout?
 */

#define _RET_IP_PTR_ \
__builtin_extract_return_addr(__builtin_return_addr(0))
#define _RET_IP_ ((unsigned long)_RET_IP_PTR_)

On some architectures __builtin_extract_return_addr() is apparently
necessary; its a nop on x86.  Why that isn't part of
__builtin_return_addr() one can really wonder.

So, checking into all of this, the generic _THIS_IP_ DOES NOT WORK on
x86.  I have tried a tons of variants, including adding various asm
volatile(...) instructions, and no matter what I do, it will always
return the address of the surrounding function rather than any kind of
local IP.  The only way to get a localized address seems to be in
assembly, but even so, there is absolutely no guarantee that the value
of _THIS_IP_ has anything to do with where the code is otherwise
localized in the function.

>From examining the output of gcc, the fundamental problem seems to be
that *no matter what* one do with the label, unless gcc actually
produces a computed goto somewhere in the code, that it can't remove
with dead-code elimination or constant propagation, it will arbitrarily
hoist the labels all the way to the beginning of the function. Given
that, I suspect that other versions of gcc might have similar problems.

This is the closest thing to arch-neutral I have been able to find that
also works on x86, while not at the same time horribly polluting the
namespace:

#define __here(n) ___here_ ## n
#define __hereasm(n) ".L___here_" #n
#define _THIS_IP_CTR_(n)\
({  \
extern const char __here(n) asm(__hereasm(n));  \
asm volatile(__hereasm(n) ": /* _THIS_IP_ */"); \
(unsigned long)&__here(n);  \
})
#define _THIS_IP_ _THIS_IP_CTR_(__COUNTER__)

The use of asm volatile() to define a label means that the position in
the instruction stream is at least reasonably well-defined.

-hpa


Re: [PATCH] treewide: remove current_text_addr

2018-08-26 Thread H. Peter Anvin
Here is a full-blown (user space) test program demonstrating the whole
technique and how to use it.

-hpa

#include 
#include 

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define noinline __attribute__((noinline))
#define used __attribute__((used))
/* __always_inline is defined in glibc already */
#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))
static inline void notailcall(void)
{
	asm volatile("");
}

/* Change this to a null string to make all functions global */
#define STATIC static

struct myputs_string {
	unsigned short len;
	char str[0];
};

STATIC int _myputs_struct(const struct myputs_string * const strs);
STATIC int _myputs_string(const char *str);
STATIC int __myputs(unsigned long ip, const char *str, size_t len);

#if 1

#include 

STATIC void dump_caller(unsigned long where)
{
	const char *opname = NULL;
	const char *wheretoname = NULL;
	char ichar;
	unsigned long whereto = 0;

#if defined(__i386__) || defined(__x86_64__)
	char opname_buf[4];
	unsigned char opcode;
	
	where -= 5;
	opcode = *(unsigned char *)where;

	switch (opcode) {
	case 0xe8:
		opname = "call";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	case 0xe9:
		opname = "jmp";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	default:
		snprintf(opname_buf, sizeof opname_buf, "?%02x", opcode);
		opname = opname_buf;
		break;
	}

#elif defined(__sparc__)
	const char regtype[4] = "gilo";
	unsigned int opcode, op1, op3, ibit;
	signed int simm13, simm30;
	char opname_buf[32];
	char *p;

	where -= 8;
	
	opcode = *(signed int *)where;
	op1 = opcode >> 30;
	op3 = (opcode >> 19) & 0x3f;
	ibit = (opcode >> 13) & 1;
	simm13 = (opcode & 0x1fff) << 2;
	simm30 = (opcode & 0x3fff) << 2;

	opname = opname_buf;
	
	if (op1 == 1) {
		opname = "call";
		whereto = where + simm30;
	} else if (op1 == 2 && op3 == 0x38) {
		if (ibit) {
			snprintf(opname_buf, sizeof opname_buf,
 "jmpl %%%c%u %c 0x%x",
 regtype[(opcode >> 17) & 3],
 (opcode >> 14) & 7,
 simm13 < 0 ? '-' : '+',
 abs(simm13));
		} else {
			snprintf(opname_buf, sizeof opname_buf,
 "jmpl %%%c%u + %%%c%u",
 regtype[(opcode >> 17) & 3],
 (opcode >> 14) & 7,
 regtype[(opcode >> 3) & 3],
 opcode & 7);
		}
	} else {
		snprintf(opname_buf, sizeof opname_buf,
			 "?0x08x", opcode);
	}
#else
	/* Unknown architecture */
#endif
	if (whereto == (unsigned long)_myputs_struct) {
		wheretoname = "_myputs_struct";
	} else if (whereto == (unsigned long)_myputs_string) {
		wheretoname = "_myputs_string";
	} else {
		wheretoname = "?";
	}

	ichar = '[';
	
	if (opname) {
		printf("%c%p: %s",
		   ichar, (void *)where, opname);
		ichar = ' ';
	}
	if (whereto) {
		printf("%c%p <%s>", ichar, (void *)whereto, wheretoname);
		ichar = ' ';
	}
	if (ichar != '[')
		putchar(']');
}
	
STATIC int __myputs(unsigned long where, const char *str, size_t len)
{
	size_t slen = strlen(str);
	size_t rv;
	
	len--;
	rv = printf("%p: \"%.*s\"%*s", (void *)where, (int)len, str,
		16-(int)slen, "");
	dump_caller(where);
	if (slen != len)
		printf(" \n", slen, len);
	else
		printf(" \n", len);
	
	return rv;
}

STATIC int noinline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

STATIC int noinline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}
#endif

#define myputs(s)			\
({	\
	int _rv;			\
	if (__builtin_constant_p(s) &&	\
	__builtin_constant_p(strlen(s)) &&\
	strlen(s)+1 == sizeof(s) &&	\
	sizeof(s) <= (size_t)65535) {\
	static const struct {		\
		struct myputs_string _mps_hdr;\
		char _mps_str[sizeof(s)];\
	} _mps = {			\
		._mps_hdr.len = sizeof(s),\
		._mps_str = ifconst(s,""),\
	};\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {			\
		_rv = _myputs_string(s);\
	}\
	notailcall();			\
	_rv;\
})

STATIC int test1(void);
STATIC int test2(const char *strx);

STATIC int test1(void)
{
	return myputs("Foobar");
}

STATIC int test2(const char *strx)
{
	return myputs(strx);
}

int main(int argc, char *argv[])
{
	(void)argc;

	test1();
	test2(argv[0]);
	return 0;
}


Re: [PATCH] treewide: remove current_text_addr

2018-08-26 Thread H. Peter Anvin
On 08/25/18 20:16, H. Peter Anvin wrote:
> On 08/25/18 19:38, H. Peter Anvin wrote:
>>
>> If it was worthwhile it would make more sense to at least force this
>> into the rodata section with the string, something like the attached
>> file for an example; however, I have a hunch it doesn't matter.
>>
> 
> An even nuttier version which avoids the extra pointer indirection.
> Read it and fear.
> 
>   -hpa
> 

OK, so one more thing, I guess: it is necessary to suppress the tailcall
optimization for _RET_IP_ to make any sense, but that should be pretty
simple:

static inline void notailcall(void)
{
asm volatile("");
}

-hpa


Re: [PATCH] treewide: remove current_text_addr

2018-08-26 Thread H. Peter Anvin
On 08/25/18 03:48, Helge Deller wrote:
> 
> Currently alpha, s390, sparc, sh, c6x, ia64 and parisc provide an
> inline assembly function to get the current instruction pointer. 
> As mentioned in an earlier thread, I personally would *prefer* if 
> _THIS_IP_ would use those inline assembly instructions on those
> architectures instead of the (currently used) higher C-level
> implementation.
> 

Older ones have as well, e.g. x86.

The only reason to retain the use of an assembly function would be in
the case where either:

a) the C implementation produces bad or invalid code on certain
   architectures;
b) there is a specific requirement that either an absolute or a relative
   value is used in the binary, e.g. due to constraints on relocation.
   The latter particularly comes to mind since the x86-64 implementation
   in assembly will produce movq $.,%reg (which requires relocation)
   instead of the more natural leaq .(%rip),%reg.

In the case (a) those architectures ought to be able to simply

#undef _THIS_IP_
#define _THIS_IP_ blah...

and in case (b) *those specific instances* should be using some kind of
specially flagged function e.g. current_true_ip() vs.
current_linktime_ip() or somesuch.

I also note that a lot of those functions are not marked
__always_inline, which is a serious error should the compiler ever get
the idea to out-of-line these functions, which could potentially happen
as gcc is rather bad at assigning weight to an assembly statement.

I'm also going to throw in a perhaps ugly bomb into this discussion:

_THIS_IP_ seems to be horribly ill-defined; there is no kind of
serialization, and no reason to believe it can't be arbitrarily hoisted
inside a function. Furthermore, *most of the uses of _THIS_IP_ seem to
be either discarded or passed to a function*, and the location of a
function call, unlike _THIS_IP_ is very well defined.

In that case, the use of this mechanism is completely pointless and
ought to be replaced with _RET_IP_.  It seems like most invocations of
_THIS_IP_ can be trivially replaced with _RET_IP_ inside the function,
which would also reduce the footprint of the function call, for example:

__trace_puts() is only ever called with _THIS_IP_ as the first argument;
drop that argument and use _RET_IP_ inside the function (also,
__trace_puts() only ever uses strlen() as the third argument, which gcc
can of course optimize into a constant for the case of a consta t
string, but *is that optimization actually worth it*?  In the case of
__trace_puts(), a variable strlen() would only ever need to be called in
the case of an allocation actually happening -- otherwise str is never
examined -- and again, it increases the *code size* of the call site.
If it was worthwhile it would make more sense to at least force this
into the rodata section with the string, something like the attached
file for an example; however, I have a hunch it doesn't matter.

I wouldn't be surprised if all or nearly all instances of _THIS_IP_ can
be completely removed.

-hpa
#include 
#include 

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	size_t len;
	const char *str;
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define myputs(s)			\
({	\
	int _rv;			\
	if (__builtin_constant_p(s) &&	\
	__builtin_constant_p(strlen(s)) &&\
	strlen(s)+1 == sizeof(s)) {	\
		static const struct myputs_string _mps = {		\
			.len = sizeof(s),\
			.str = __builtin_constant_p(s) ? s : NULL,	\
		};			\
		_rv = _myputs_struct(&_mps);\
	} else {			\
		_rv = _myputs_string(s);\
	}\
	_rv;\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}

		


Re: [PATCH] treewide: remove current_text_addr

2018-08-26 Thread H. Peter Anvin
On 08/25/18 19:38, H. Peter Anvin wrote:
>
> If it was worthwhile it would make more sense to at least force this
> into the rodata section with the string, something like the attached
> file for an example; however, I have a hunch it doesn't matter.
> 

An even nuttier version which avoids the extra pointer indirection.
Read it and fear.

-hpa

#include 
#include 

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	unsigned short len;
	char str[0];
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))

#define myputs(s)			\
({	\
	int _rv;			\
	if (__builtin_constant_p(s) &&	\
	__builtin_constant_p(strlen(s)) &&\
	strlen(s)+1 == sizeof(s) &&	\
	sizeof(s) <= (size_t)65535) {\
	static const struct {		\
		struct myputs_string _mps_hdr;\
		char _mps_str[sizeof(s)];\
	} _mps = {			\
		._mps_hdr.len = sizeof(s),\
		._mps_str = ifconst(s,""),\
	};\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {			\
		_rv = _myputs_string(s);\
	}\
	_rv;\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}

		


Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread H. Peter Anvin
,Eric Biederman ,Tejun Heo 
,Paolo Bonzini ,Andrew Morton 
,"Kirill A . Shutemov" 
,Lu Baolu 
From: h...@zytor.com
Message-ID: 

On July 26, 2017 9:24:45 PM GMT+02:00, Brijesh Singh  
wrote:
>
>Hi Arnd and David,
>
>On 07/26/2017 05:45 AM, Arnd Bergmann wrote:
>> On Tue, Jul 25, 2017 at 11:51 AM, David Laight
> wrote:
>>> From: Brijesh Singh
 Sent: 24 July 2017 20:08
 From: Tom Lendacky 

 Secure Encrypted Virtualization (SEV) does not support string I/O,
>so
 unroll the string I/O operation into a loop operating on one
>element at
 a time.

 Signed-off-by: Tom Lendacky 
 Signed-off-by: Brijesh Singh 
 ---
   arch/x86/include/asm/io.h | 26 ++
   1 file changed, 22 insertions(+), 4 deletions(-)

 diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
 index e080a39..2f3c002 100644
 --- a/arch/x86/include/asm/io.h
 +++ b/arch/x86/include/asm/io.h
 @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int
>port)   \

>   \
   static inline void outs##bwl(int port, const void *addr, unsigned
>long count) \
   {
>> 
>> This will clash with a fix I did to add a "memory" clobber
>> for the traditional implementation, see
>> https://patchwork.kernel.org/patch/9854573/
>> 
>>> Is it even worth leaving these as inline functions?
>>> Given the speed of IO cycles it is unlikely that the cost of calling
>a real
>>> function will be significant.
>>> The code bloat reduction will be significant.
>> 
>> I think the smallest code would be the original "rep insb" etc, which
>> should be smaller than a function call, unlike the loop. Then again,
>> there is a rather small number of affected device drivers, almost all
>> of them for ancient hardware that you won't even build in a 64-bit
>> x86 kernel, see the list below. The only user I found that is
>actually
>> still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the
>early
>> console.
>
>
>There are some indirect user of string I/O functions. The following
>functions
>defined in lib/iomap.c calls rep version of ins and outs.
>
>- ioread8_rep, ioread16_rep, ioread32_rep
>- iowrite8_rep, iowrite16_rep, iowrite32_rep
>
>I found that several drivers use above functions.
>
>Here is one approach to convert it into non-inline functions. In this
>approach,
>I have added a new file arch/x86/kernel/io.c which provides non rep
>version of
>string I/O routines. The file gets built and used only when
>AMD_MEM_ENCRYPT is
>enabled. On positive side, if we don't build kernel with
>AMD_MEM_ENCRYPT support
>then we use inline routines, when AMD_MEM_ENCRYPT is built then we make
>a function
>call. Inside the function we unroll only when SEV is active.
>
>Do you see any issue with this approach ? thanks
>
>diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>index e080a39..104927d 100644
>--- a/arch/x86/include/asm/io.h
>+++ b/arch/x86/include/asm/io.h
>@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)  
>\
>  unsigned type value = in##bwl(port);\
>  slow_down_io(); \
>  return value;   \
>-} 
>\
>-  
>\
>+}
>+
>+#define BUILDIO_REP(bwl, bw, type)
>\
>static inline void outs##bwl(int port, const void *addr, unsigned long
>count) \
>{ 
>\
>  asm volatile("rep; outs" #bwl   \
>@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
>unsigned long count)\
>{ 
>\
>  asm volatile("rep; ins" #bwl\
>   : "+D"(addr), "+c"(count) : "d"(port));\
>-}
>+} 
>\
>  
>  BUILDIO(b, b, char)
>  BUILDIO(w, w, short)
>  BUILDIO(l, , int)
>  
>+#ifdef CONFIG_AMD_MEM_ENCRYPT
>+extern void outsb_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insb_try_rep(int port, void *addr, unsigned long count);
>+extern void outsw_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insw_try_rep(int port, void *addr, unsigned long count);
>+extern 

Re: [PATCH 1/3] futex: remove duplicated code

2017-03-08 Thread H. Peter Anvin
,Thomas Gleixner ,Ingo Molnar 
,Chris Zankel ,Max Filippov 
,Arnd Bergmann 
,x...@kernel.org,linux-al...@vger.kernel.org,linux-snps-...@lists.infradead.org,linux-arm-ker...@lists.infradead.org,linux-hexa...@vger.kernel.org,linux-i...@vger.kernel.org,linux-m...@linux-mips.org,openr...@lists.librecores.org,linux-par...@vger.kernel.org,linuxppc-dev@lists.ozlabs.org,linux-s...@vger.kernel.org,linux...@vger.kernel.org,sparcli...@vger.kernel.org,linux-xte...@linux-xtensa.org,linux-a...@vger.kernel.org
From: h...@zytor.com
Message-ID: <83324528-aaa1-4bed-b0c7-48426ecba...@zytor.com>

On March 8, 2017 8:16:49 PM PST, Rob Landley  wrote:
>On 03/04/2017 07:05 AM, Russell King - ARM Linux wrote:
>> On Fri, Mar 03, 2017 at 01:27:10PM +0100, Jiri Slaby wrote:
>>> diff --git a/kernel/futex.c b/kernel/futex.c
>>> index b687cb22301c..c5ff9850952f 100644
>>> --- a/kernel/futex.c
>>> +++ b/kernel/futex.c
>>> @@ -1457,6 +1457,42 @@ futex_wake(u32 __user *uaddr, unsigned int
>flags, int nr_wake, u32 bitset)
>>> return ret;
>>>  }
>>>  
>>> +static int futex_atomic_op_inuser(int encoded_op, u32 __user
>*uaddr)
>>> +{
>>> +   int op = (encoded_op >> 28) & 7;
>>> +   int cmp = (encoded_op >> 24) & 15;
>>> +   int oparg = (encoded_op << 8) >> 20;
>>> +   int cmparg = (encoded_op << 20) >> 20;
>> 
>> Hmm.  oparg and cmparg look like they're doing these shifts to get
>sign
>> extension of the 12-bit values by assuming that "int" is 32-bit -
>> probably worth a comment, or for safety, they should be "s32" so it's
>> not dependent on the bit-width of "int".
>
>I thought Linux depended on the LP64 standard for all architectures?
>
>Standard: http://www.unix.org/whitepapers/64bit.html
>Rationale: http://www.unix.org/version2/whatsnew/lp64_wp.html
>
>So int has a defined bit width (32) on linux?
>
>Rob

Linux is ILP32 on 32-bit architectures and LP64 on 64-bit architectures, but 
that doesn't inherently make this stuff clear.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/3] futex: remove duplicated code

2017-03-04 Thread H. Peter Anvin
<da...@davemloft.net>,Chris Metcalf <cmetc...@mellanox.com>,Thomas Gleixner 
<t...@linutronix.de>,Ingo Molnar <mi...@redhat.com>,Chris Zankel 
<ch...@zankel.net>,Max Filippov <jcmvb...@gmail.com>,Arnd Bergmann 
<a...@arndb.de>,x...@kernel.org,linux-al...@vger.kernel.org,linux-snps-...@lists.infradead.org,linux-arm-ker...@lists.infradead.org,linux-hexa...@vger.kernel.org,linux-i...@vger.kernel.org,linux-m...@linux-mips.org,openr...@lists.librecores.org,linux-par...@vger.kernel.org,linuxppc-dev@lists.ozlabs.org,linux-s...@vger.kernel.org,linux...@vger.kernel.org,sparcli...@vger.kernel.org,linux-xte...@linux-xtensa.org,linux-a...@vger.kernel.org
From: h...@zytor.com
Message-ID: <cf18535e-39e7-44d3-88d0-80b9961e6...@zytor.com>

On March 4, 2017 1:38:05 PM PST, Stafford Horne <sho...@gmail.com> wrote:
>On Sat, Mar 04, 2017 at 11:15:17AM -0800, H. Peter Anvin wrote:
>> On 03/04/17 05:05, Russell King - ARM Linux wrote:
>> >>  
>> >> +static int futex_atomic_op_inuser(int encoded_op, u32 __user
>*uaddr)
>> >> +{
>> >> + int op = (encoded_op >> 28) & 7;
>> >> + int cmp = (encoded_op >> 24) & 15;
>> >> + int oparg = (encoded_op << 8) >> 20;
>> >> + int cmparg = (encoded_op << 20) >> 20;
>> > 
>> > Hmm.  oparg and cmparg look like they're doing these shifts to get
>sign
>> > extension of the 12-bit values by assuming that "int" is 32-bit -
>> > probably worth a comment, or for safety, they should be "s32" so
>it's
>> > not dependent on the bit-width of "int".
>> > 
>> 
>> For readability, perhaps we should make sign- and zero-extension an
>> explicit facility?
>
>There is some of this in already here, 32 and 64 bit versions:
>
>  include/linux/bitops.h
>
>Do we really need zero extension? It seems the same.
>
>Example implementation from bitops.h
>
>static inline __s32 sign_extend32(__u32 value, int index)
>{
>__u8 shift = 31 - index;
>return (__s32)(value << shift) >> shift;
>}
>
>> /*
>>  * Truncate an integer x to n bits, using sign- or
>>  * zero-extension, respectively.
>>  */
>> static inline __const_func__ s32 sex32(s32 x, int n)
>> {
>>   return (x << (32-n)) >> (32-n);
>> }
>> 
>> static inline __const_func__ s64 sex64(s64 x, int n)
>> {
>>   return (x << (64-n)) >> (64-n);
>> }
>> 
>> #define sex(x,y) \
>>  ((__typeof__(x))\
>>   (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
>> (sizeof(x) <= sizeof(s32)))  \
>>? sex32((x),(y)) : sex64((x),(y
>> 
>> static inline __const_func__ u32 zex32(u32 x, int n)
>> {
>>   return (x << (32-n)) >> (32-n);
>> }
>> 
>> static inline __const_func__ u64 zex64(u64 x, int n)
>> {
>>   return (x << (64-n)) >> (64-n);
>> }
>> 
>> #define zex(x,y) \
>>  ((__typeof__(x))\
>>   (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
>> (sizeof(x) <= sizeof(u32)))  \
>>? zex32((x),(y)) : zex64((x),(y
>> 

Also, i strongly believe that making it syntactically cumbersome encodes people 
to open-code it which is bad...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/3] futex: remove duplicated code

2017-03-04 Thread H. Peter Anvin
<da...@davemloft.net>,Chris Metcalf <cmetc...@mellanox.com>,Thomas Gleixner 
<t...@linutronix.de>,Ingo Molnar <mi...@redhat.com>,Chris Zankel 
<ch...@zankel.net>,Max Filippov <jcmvb...@gmail.com>,Arnd Bergmann 
<a...@arndb.de>,x...@kernel.org,linux-al...@vger.kernel.org,linux-snps-...@lists.infradead.org,linux-arm-ker...@lists.infradead.org,linux-hexa...@vger.kernel.org,linux-i...@vger.kernel.org,linux-m...@linux-mips.org,openr...@lists.librecores.org,linux-par...@vger.kernel.org,linuxppc-dev@lists.ozlabs.org,linux-s...@vger.kernel.org,linux...@vger.kernel.org,sparcli...@vger.kernel.org,linux-xte...@linux-xtensa.org,linux-a...@vger.kernel.org
From: h...@zytor.com
Message-ID: <9b46afa6-c422-41fd-8e8a-356e44a6d...@zytor.com>

On March 4, 2017 1:38:05 PM PST, Stafford Horne <sho...@gmail.com> wrote:
>On Sat, Mar 04, 2017 at 11:15:17AM -0800, H. Peter Anvin wrote:
>> On 03/04/17 05:05, Russell King - ARM Linux wrote:
>> >>  
>> >> +static int futex_atomic_op_inuser(int encoded_op, u32 __user
>*uaddr)
>> >> +{
>> >> + int op = (encoded_op >> 28) & 7;
>> >> + int cmp = (encoded_op >> 24) & 15;
>> >> + int oparg = (encoded_op << 8) >> 20;
>> >> + int cmparg = (encoded_op << 20) >> 20;
>> > 
>> > Hmm.  oparg and cmparg look like they're doing these shifts to get
>sign
>> > extension of the 12-bit values by assuming that "int" is 32-bit -
>> > probably worth a comment, or for safety, they should be "s32" so
>it's
>> > not dependent on the bit-width of "int".
>> > 
>> 
>> For readability, perhaps we should make sign- and zero-extension an
>> explicit facility?
>
>There is some of this in already here, 32 and 64 bit versions:
>
>  include/linux/bitops.h
>
>Do we really need zero extension? It seems the same.
>
>Example implementation from bitops.h
>
>static inline __s32 sign_extend32(__u32 value, int index)
>{
>__u8 shift = 31 - index;
>return (__s32)(value << shift) >> shift;
>}
>
>> /*
>>  * Truncate an integer x to n bits, using sign- or
>>  * zero-extension, respectively.
>>  */
>> static inline __const_func__ s32 sex32(s32 x, int n)
>> {
>>   return (x << (32-n)) >> (32-n);
>> }
>> 
>> static inline __const_func__ s64 sex64(s64 x, int n)
>> {
>>   return (x << (64-n)) >> (64-n);
>> }
>> 
>> #define sex(x,y) \
>>  ((__typeof__(x))\
>>   (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
>> (sizeof(x) <= sizeof(s32)))  \
>>? sex32((x),(y)) : sex64((x),(y
>> 
>> static inline __const_func__ u32 zex32(u32 x, int n)
>> {
>>   return (x << (32-n)) >> (32-n);
>> }
>> 
>> static inline __const_func__ u64 zex64(u64 x, int n)
>> {
>>   return (x << (64-n)) >> (64-n);
>> }
>> 
>> #define zex(x,y) \
>>  ((__typeof__(x))\
>>   (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
>> (sizeof(x) <= sizeof(u32)))  \
>>? zex32((x),(y)) : zex64((x),(y
>> 

Signed versus unsigned...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/3] futex: remove duplicated code

2017-03-04 Thread H. Peter Anvin
On 03/04/17 05:05, Russell King - ARM Linux wrote:
>>  
>> +static int futex_atomic_op_inuser(int encoded_op, u32 __user *uaddr)
>> +{
>> +int op = (encoded_op >> 28) & 7;
>> +int cmp = (encoded_op >> 24) & 15;
>> +int oparg = (encoded_op << 8) >> 20;
>> +int cmparg = (encoded_op << 20) >> 20;
> 
> Hmm.  oparg and cmparg look like they're doing these shifts to get sign
> extension of the 12-bit values by assuming that "int" is 32-bit -
> probably worth a comment, or for safety, they should be "s32" so it's
> not dependent on the bit-width of "int".
> 

For readability, perhaps we should make sign- and zero-extension an
explicit facility?

/*
 * Truncate an integer x to n bits, using sign- or
 * zero-extension, respectively.
 */
static inline __const_func__ s32 sex32(s32 x, int n)
{
  return (x << (32-n)) >> (32-n);
}

static inline __const_func__ s64 sex64(s64 x, int n)
{
  return (x << (64-n)) >> (64-n);
}

#define sex(x,y)\
((__typeof__(x))\
 (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
   (sizeof(x) <= sizeof(s32)))  \
  ? sex32((x),(y)) : sex64((x),(y

static inline __const_func__ u32 zex32(u32 x, int n)
{
  return (x << (32-n)) >> (32-n);
}

static inline __const_func__ u64 zex64(u64 x, int n)
{
  return (x << (64-n)) >> (64-n);
}

#define zex(x,y)\
((__typeof__(x))\
 (((__builtin_constant_p(y) && ((y) <= 32)) ||  \
   (sizeof(x) <= sizeof(u32)))  \
  ? zex32((x),(y)) : zex64((x),(y



Re: [PATCH 00/14] Present useful limits to user (v2)

2016-07-15 Thread H. Peter Anvin
,Johannes Weiner ,Alexei Starovoitov 
,Arnaldo Carvalho de Melo ,Alexander Shishkin 
,Balbir Singh 
,Markus Elfring ,"David 
S. Miller" ,Nicolas Dichtel 
,Andrew Morton 
,Konstantin Khlebnikov ,Jiri Slaby 
,Cyrill Gorcunov ,Michal Hocko 
,Vlastimil Babka ,Dave Hansen 
,Greg Kroah-Hartman 
,Dan Carpenter ,Michael 
Kerrisk ,"Kirill A. Shutemov" 
,Marcus Gelderie ,Vladimir 
Davydov ,Joe Perches ,Frederic 
Weisbecker ,Andrea Arcangeli ,!
 "Eric W.
Biederman" ,Andi Kleen ,Oleg 
Nesterov ,Stas Sergeev ,Amanieu d'Antras 
,Richard Weinberger ,Wang Xiaoqiang 
,Helge Deller ,Mateusz Guzik 
,Alex Thorlton ,Ben Segall 
,John Stultz ,Rik van Riel 
,Eric B Munson ,Alexey Klimov 
,Chen Gang ,Andrey Ryabinin 
,David Rientjes ,Hugh Dickins 
,Alexander Kuleshov ,"open 
list:DOCUMENTATION" ,"open list:IA64 (Itanium) 
PLATFORM" ,"open list:KERNEL VIRTUAL MACHINE (KVM) 
FOR POWERPC" ,"open list:KERNEL VIRTUAL MACHINE (KVM)" 
,"open list:LINUX FOR POWERPC!
  (32-BIT
AND 64-BIT)" ,"open list:INFINIBAND SUBSYSTEM" 
,"open list:FILESYSTEMS (VFS and infrastructure)" 
,"open list:CONTROL GROUP (CGROUP)" 
,"open list:BPF (Safe dynamic programs and tools)" 
,"open list:MEMORY MANAGEMENT" 
Message-ID: 

On July 15, 2016 6:59:56 AM PDT, Peter Zijlstra  wrote:
>On Fri, Jul 15, 2016 at 01:52:48PM +, Topi Miettinen wrote:
>> On 07/15/16 12:43, Peter Zijlstra wrote:
>> > On Fri, Jul 15, 2016 at 01:35:47PM +0300, Topi Miettinen wrote:
>> >> Hello,
>> >>
>> >> There are many basic ways to control processes, including
>capabilities,
>> >> cgroups and resource limits. However, there are far fewer ways to
>find out
>> >> useful values for the limits, except blind trial and error.
>> >>
>> >> This patch series attempts to fix that by giving at least a nice
>starting
>> >> point from the highwater mark values of the resources in question.
>> >> I looked where each limit is checked and added a call to update
>the mark
>> >> nearby.
>> > 
>> > And how is that useful? Setting things to the high watermark is
>> > basically the same as not setting the limit at all.
>> 
>> What else would you use, too small limits?
>
>That question doesn't make sense.
>
>What's the point of setting a limit if it ends up being the same as
>no-limit (aka unlimited).
>
>If you cannot explain; and you have not so far; what use these values
>are, why would we look at the patches.

One reason is to catch a malfunctioning process rather than dragging the whole 
system down with it.  It could also be useful for development.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/8] Use correctly the Xen memory terminologies in Linux

2015-07-28 Thread H. Peter Anvin
On 07/28/2015 08:02 AM, Julien Grall wrote:
 Hi all,
 
 This patch series aims to use the memory terminologies described in
 include/linux/mm.h [1] for Linux xen code.
 
 Linux is using mistakenly MFN when GFN is meant, I suspect this is because the
 first support of Xen was for PV. This has brought some misimplementation
 of memory helpers on ARM and make the developper confused about the expected
 behavior.
 
 For instance, with pfn_to_mfn, we expect to get a MFN based on the name.
 Although, if we look at the implementation on x86, it's returning a GFN.
 Most of the callers are also using it this way.
 
 The first 2 patches of this series is ARM related in order to remove
 PV specific helpers which should not be used and fixing the implementation of
 pfn_to_mfn.
 
 The rest of the series is here rename most of the usage in the common code
 of MFN to GFN. I also took the opportunity to replace most of the call to
 pfn_to_gfn in the common code by page_to_gfn avoid construction such
 as pfn_to_gfn(page_to_pfn(...).
 
 Note the one xen-blkfront will be dropped by 64K series [2], I can include it
 if necessary.
 

Can we actually get some documentation for Xen before starting to change
names around?

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 1/5] kexec: Fix make headers_check

2014-10-06 Thread H. Peter Anvin
On 08/30/2014 05:47 AM, Paul Bolle wrote:
 
 Obviously, this can only break compiling those libraries, or other
 users. It can't break already compiled binaries. Besides I don't think
 those libraries, etc actually exist. Maximilian mentioned klibc in
 January, but I wasn't able to find a version of klibc that cared about
 this prototype. No one pointed me at a version that does (or any other
 library, etc., for that matter).
 

... and that would be a klibc bug and we'd just fix it.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-10 Thread H. Peter Anvin
On 09/08/2014 10:52 AM, One Thousand Gnomes wrote:
 
 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.
 

I think it's pretty safe to say such a processor is unlikely to ever be
created, for the same reason there weren't any 48-bit computers when
32-bit computers ran out of steam: it caused more problems than it
solved, and Alpha pretty much proved that.  The engineering advantages
would have to be so overwhelmingly in favor for someone to want to walk
down that road again.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 10:52 AM, One Thousand Gnomes wrote:
 On Fri, 05 Sep 2014 08:41:52 -0700
 H. Peter Anvin h...@zytor.com wrote:
 
 On 09/05/2014 08:31 AM, Peter Hurley wrote:

 Which is a bit ironic because I remember when Digital had a team
 working on emulating native x86 apps on Alpha/NT.


 Right, because the x86 architecture was obsolete and would never scale...
 
 Talking about not scaling can anyone explain how a you need to use
 set_bit() and friends bug report scaled into a hundred message plus
 discussion about ambiguous properties of processors (and nobody has
 audited all the embedded platforms we support yet, or the weirder ARMs)
 and a propsal to remove Alpha support.
 
 Wouldn't it be *much* simpler to do what I suggested in the first place
 and use the existing intended for purpose, deliberately put there,
 functions for atomic bitops, because they are fast on sane processors and
 they work on everything else.
 
 I think the whole removing Alpha EV5 support is basically bonkers. Just
 use set_bit in the tty layer. Alpha will continue to work as well as it
 always has done and you won't design out support for any future processor
 that turns out not to do byte aligned stores.
 
 Alan
 

Is *that* what we are talking about?  I was added to this conversation
in the middle where it had already generalized, so I had no idea.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/07/2014 10:56 PM, James Bottomley wrote:
 On Sun, 2014-09-07 at 16:39 -0700, H. Peter Anvin wrote:
 How many PARISC systems do we have that actually do real work on Linux?
 
 I'd be very surprised if this problem didn't exist on all alignment
 requiring architectures, like PPC and Sparc as well.  I know it would be
 very convenient if all the world were an x86 ... but it would also be
 very boring as well.

I wouldn't be so sure about that.  That is a pretty aggressive
relaxation of ordering that PARISC has enacted here, kind of like the
Alpha we don't need no stinking byte accesses.

 The rules for coping with it are well known and a relaxation of what we
 currently do in the kernel, so I don't see what the actual problem is.
 
 In the context of this thread, PA can't do atomic bit sets (no atomic
 RMW except the ldcw operation) it can do atomic writes to fundamental
 sizes (byte, short, int, long) provided gcc emits the correct primitive
 (there are lots of gotchas in this, but that's not an architectural
 problem).  These atomicity guarantees depend on the underlying storage
 and are respected for main memory but not for any other type of bus.

So I'm not trying to push the all the world is an x86... certainly not
given that x86 has abnormally strict ordering rules and so is itself an
outlier.  What I *don't* really want to have to deal with is going
through more than causal effort to accommodate outliers which no longer
have any real value -- we have too much work to do.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 12:09 PM, James Bottomley wrote:
 
 Um, I think you need to re-read the thread; that's not what I said at
 all. It's even written lower down: PA can't do atomic bit sets (no
 atomic RMW except the ldcw operation) it can do atomic writes to
 fundamental sizes (byte, short, int, long) provided gcc emits the
 correct primitive.  The original question was whether atomicity
 required native bus width access, which we currently assume, so there's
 no extant problem.
 

The issue at hand was whether or not partially overlapped (but natually
aligned) writes can pass each other.  *This* is the aggressive
relaxation to which I am referring.

I would guess that that is a very unusual constraint.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 12:09 PM, James Bottomley wrote:
 
 Um, I think you need to re-read the thread; that's not what I said at
 all. It's even written lower down: PA can't do atomic bit sets (no
 atomic RMW except the ldcw operation) it can do atomic writes to
 fundamental sizes (byte, short, int, long) provided gcc emits the
 correct primitive.  The original question was whether atomicity
 required native bus width access, which we currently assume, so there's
 no extant problem.
 

The issue at hand was whether or not partially overlapped (but natually
aligned) writes can pass each other.  *This* is the aggressive
relaxation to which I am referring.

I would guess that that is a very unusual constraint.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 03:43 PM, James Bottomley wrote:
 
 This was years ago (possibly decades).  We had to implement in-kernel
 unaligned traps for the networking layer because it could access short
 and int fields that weren't of the correct alignment when processing
 packets.  It that's all corrected now, we wouldn't really notice (except
 a bit of a speed up since an unaligned trap effectively places the
 broken out instructions into the bit stream).
 
 James
 

Well, ARM doesn't trap, it just silently gives garbage on unaligned
memory references.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
On 09/08/2014 07:56 PM, James Bottomley wrote:

 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.
 
 Yes, I think I can agree that the only real problem is gcc thinking the
 store or load needs splitting.
 

That seems much saner, and yes, that applies to any architecture which
needs unaligned references.  Now, if the references are *actually*
unaligned they can end up being split even on x86, especially if they
cross cache line boundaries.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-08 Thread H. Peter Anvin
Add a short member for proper alignment and one will probably pop out.

Sent from my tablet, pardon any formatting problems.

 On Sep 8, 2014, at 19:56, James Bottomley 
 james.bottom...@hansenpartnership.com wrote:
 
 On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
 On 09/08/2014 01:50 AM, James Bottomley wrote:
 Two things: I think that gcc has given up on combining adjacent writes,
 perhaps because unaligned writes on some arches are prohibitive, so
 whatever minor optimization was believed to be gained was quickly lost,
 multi-fold. (Although this might not be the reason since one would
 expect that gcc would know the alignment of the promoted store).
 
 Um, gcc assumes architecturally correct alignment; that's why it pads
 structures.  Only when accessing packed structures will it use the
 lowest unit load/store.
 
 if you have a struct { char a, char b }; and load first a then b with a
 constant gcc will obligingly optimise to a short store.
 
 Um, please re-look at the test above. The exact test you describe is
 coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
 the kernel compiler options.
 
 In the generated code, please note the _absence_ of a combined write
 to two adjacent byte stores.
 
 So one version of gcc doesn't do it.  Others do because I've been
 surprised seeing it in assembly.
 
 But additionally, even if gcc combines adjacent writes _that are part
 of the program flow_ then I believe the situation is no worse than
 would otherwise exist.
 
 For instance, given the following:
 
 struct x {
spinlock_t lock;
long a;
byte b;
byte c;
 };
 
 void locked_store_b(struct x *p)
 {
spin_lock(p-lock);
p-b = 1;
spin_unlock(p-lock);
p-c = 2;
 }
 
 Granted, the author probably expects ordered writes of
STORE B
STORE C
 but that's not guaranteed because there is no memory barrier
 ordering B before C.
 
 Yes, there is: loads and stores may not migrate into or out of critical
 sections.
 
 That's a common misconception.
 
 The processor is free to re-order this to:
 
STORE C
STORE B
UNLOCK
 
 That's because the unlock() only guarantees that:
 
 Stores before the unlock in program order are guaranteed to complete
 before the unlock completes. Stores after the unlock _may_ complete
 before the unlock completes.
 
 My point was that even if compiler barriers had the same semantics
 as memory barriers, the situation would be no worse. That is, code
 that is sensitive to memory barriers (like the example I gave above)
 would merely have the same fragility with one-way compiler barriers
 (with respect to the compiler combining writes).
 
 That's what I meant by no worse than would otherwise exist.
 
 Actually, that's not correct.  This is actually deja vu with me on the
 other side of the argument.  When we first did spinlocks on PA, I argued
 as you did: lock only a barrier for code after and unlock for code
 before.  The failing case is that you can have a critical section which
 performs an atomically required operation and a following unit which
 depends on it being performed.  If you begin the following unit before
 the atomic requirement, you may end up losing.  It turns out this kind
 of pattern is inherent in a lot of mail box device drivers: you need to
 set up the mailbox atomically then poke it.  Setup is usually atomic,
 deciding which mailbox to prime and actually poking it is in the
 following unit.  Priming often involves an I/O bus transaction and if
 you poke before priming, you get a misfire.
 
 I see no problem with gcc write-combining in the absence of
 memory barriers (as long as alignment is still respected,
 ie., the size-promoted write is still naturally aligned). The
 combined write is still atomic.
 
 Actual alignment is pretty irrelevant.  That's why all architectures
 which require alignment also have to implement misaligned traps ... this
 is a fundamental requirement of the networking code, for instance.
 
 The main problem is gcc thinking there's a misalignment (or a packed
 structure).  That causes splitting of loads and stores and that destroys
 atomicity.
 
 Yeah, the extra requirement I added is basically nonsense, since the
 only issue is what instructions the compiler is emitting. So if compiler
 thinks the alignment is natural and combines the writes -- ok. If the
 compiler thinks the alignment is off and doesn't combine the writes --
 also ok.
 
 Yes, I think I can agree that the only real problem is gcc thinking the
 store or load needs splitting.
 
 James
 
 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-07 Thread H. Peter Anvin
I'm confused why storing 0x0102 would be a problem.  I think gcc does that even 
on other cpus.

More atomicity can't hurt, can it?

On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney 
paul...@linux.vnet.ibm.com wrote:
On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
 On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
  On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
   On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
 Hi James,
 
 On 09/04/2014 10:11 PM, James Bottomley wrote:
  On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
  +And there are anti-guarantees:
  +
  + (*) These guarantees do not apply to bitfields, because
compilers often
  + generate code to modify these using non-atomic
read-modify-write
  + sequences.  Do not attempt to use bitfields to
synchronize parallel
  + algorithms.
  +
  + (*) Even in cases where bitfields are protected by
locks, all fields
  + in a given bitfield must be protected by one lock. 
If two fields
  + in a given bitfield are protected by different
locks, the compiler's
  + non-atomic read-modify-write sequences can cause an
update to one
  + field to corrupt the value of an adjacent field.
  +
  + (*) These guarantees apply only to properly aligned and
sized scalar
  + variables.  Properly sized currently means int
and long,
  + because some CPU families do not support loads and
stores of
  + other sizes.  (Some CPU families is currently
believed to
  + be only Alpha 21064.  If this is actually the case,
a different
  + non-guarantee is likely to be formulated.)
  
  This is a bit unclear.  Presumably you're talking about
definiteness of
  the outcome (as in what's seen after multiple stores to the
same
  variable).
 
 No, the last conditions refers to adjacent byte stores from
different
 cpu contexts (either interrupt or SMP).
 
  The guarantees are only for natural width on Parisc as
well,
  so you would get a mess if you did byte stores to adjacent
memory
  locations.
 
 For a simple test like:
 
 struct x {
  long a;
  char b;
  char c;
  char d;
  char e;
 };
 
 void store_bc(struct x *p) {
  p-b = 1;
  p-c = 2;
 }
 
 on parisc, gcc generates separate byte stores
 
 void store_bc(struct x *p) {
0:34 1c 00 02 ldi 1,ret0
4:0f 5c 12 08 stb ret0,4(r26)
8:34 1c 00 04 ldi 2,ret0
c:e8 40 c0 00 bv r0(rp)
   10:0f 5c 12 0a stb ret0,5(r26)
 
 which appears to confirm that on parisc adjacent byte data
 is safe from corruption by concurrent cpu updates; that is,
 
 CPU 0| CPU 1
  |
 p-b = 1 | p-c = 2
  |
 
 will result in p-b == 1  p-c == 2 (assume both values
 were 0 before the call to store_bc()).

What Peter said.  I would ask for suggestions for better
wording, but
I would much rather be able to say that single-byte reads and
writes
are atomic and that aligned-short reads and writes are also
atomic.

Thus far, it looks like we lose only very old Alpha systems, so
unless
I hear otherwise, I update my patch to outlaw these very old
systems.
   
   This isn't universally true according to the architecture manual.
 The
   PARISC CPU can make byte to long word stores atomic against the
memory
   bus but not against the I/O bus for instance.  Atomicity is a
property
   of the underlying substrate, not of the CPU.  Implying that
atomicity is
   a CPU property is incorrect.
  
  OK, fair point.
  
  But are there in-use-for-Linux PARISC memory fabrics (for normal
memory,
  not I/O) that do not support single-byte and double-byte stores?
 
 For aligned access, I believe that's always the case for the memory
bus
 (on both 32 and 64 bit systems).  However, it only applies to machine
 instruction loads and stores of the same width..  If you mix the
widths
 on the loads and stores, all bets are off.  That means you have to
 beware of the gcc penchant for coalescing loads and stores: if it
sees
 two adjacent byte stores it can coalesce them into a short store
 instead ... that screws up the atomicity guarantees.

OK, that means that to make PARISC work reliably, we need to use
ACCESS_ONCE() for loads and stores that could have racing accesses.
If I understand correctly, this will -not- be needed for code guarded
by locks, even with Peter's examples.

So if we have something like this:

   struct foo {
   char a;
   char b;
   };
   struct foo *fp;

then this code would be bad:

   fp-a = 1;
   

Re: bit fields data tearing

2014-09-07 Thread H. Peter Anvin
How many PARISC systems do we have that actually do real work on Linux?

On September 7, 2014 4:36:55 PM PDT, Paul E. McKenney 
paul...@linux.vnet.ibm.com wrote:
On Sun, Sep 07, 2014 at 04:17:30PM -0700, H. Peter Anvin wrote:
 I'm confused why storing 0x0102 would be a problem.  I think gcc does
that even on other cpus.
 
 More atomicity can't hurt, can it?

I must defer to James for any additional details on why PARISC systems
don't provide atomicity for partially overlapping stores.  ;-)

   Thanx, Paul

 On September 7, 2014 4:00:19 PM PDT, Paul E. McKenney
paul...@linux.vnet.ibm.com wrote:
 On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
  On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
   On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
 On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley
wrote:
  Hi James,
  
  On 09/04/2014 10:11 PM, James Bottomley wrote:
   On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney
wrote:
   +And there are anti-guarantees:
   +
   + (*) These guarantees do not apply to bitfields,
because
 compilers often
   + generate code to modify these using non-atomic
 read-modify-write
   + sequences.  Do not attempt to use bitfields to
 synchronize parallel
   + algorithms.
   +
   + (*) Even in cases where bitfields are protected by
 locks, all fields
   + in a given bitfield must be protected by one
lock. 
 If two fields
   + in a given bitfield are protected by different
 locks, the compiler's
   + non-atomic read-modify-write sequences can cause
an
 update to one
   + field to corrupt the value of an adjacent field.
   +
   + (*) These guarantees apply only to properly aligned
and
 sized scalar
   + variables.  Properly sized currently means
int
 and long,
   + because some CPU families do not support loads
and
 stores of
   + other sizes.  (Some CPU families is currently
 believed to
   + be only Alpha 21064.  If this is actually the
case,
 a different
   + non-guarantee is likely to be formulated.)
   
   This is a bit unclear.  Presumably you're talking about
 definiteness of
   the outcome (as in what's seen after multiple stores to
the
 same
   variable).
  
  No, the last conditions refers to adjacent byte stores
from
 different
  cpu contexts (either interrupt or SMP).
  
   The guarantees are only for natural width on Parisc as
 well,
   so you would get a mess if you did byte stores to
adjacent
 memory
   locations.
  
  For a simple test like:
  
  struct x {
long a;
char b;
char c;
char d;
char e;
  };
  
  void store_bc(struct x *p) {
p-b = 1;
p-c = 2;
  }
  
  on parisc, gcc generates separate byte stores
  
  void store_bc(struct x *p) {
 0: 34 1c 00 02 ldi 1,ret0
 4: 0f 5c 12 08 stb ret0,4(r26)
 8: 34 1c 00 04 ldi 2,ret0
 c: e8 40 c0 00 bv r0(rp)
10: 0f 5c 12 0a stb ret0,5(r26)
  
  which appears to confirm that on parisc adjacent byte data
  is safe from corruption by concurrent cpu updates; that
is,
  
  CPU 0| CPU 1
   |
  p-b = 1 | p-c = 2
   |
  
  will result in p-b == 1  p-c == 2 (assume both values
  were 0 before the call to store_bc()).
 
 What Peter said.  I would ask for suggestions for better
 wording, but
 I would much rather be able to say that single-byte reads
and
 writes
 are atomic and that aligned-short reads and writes are also
 atomic.
 
 Thus far, it looks like we lose only very old Alpha systems,
so
 unless
 I hear otherwise, I update my patch to outlaw these very old
 systems.

This isn't universally true according to the architecture
manual.
  The
PARISC CPU can make byte to long word stores atomic against
the
 memory
bus but not against the I/O bus for instance.  Atomicity is a
 property
of the underlying substrate, not of the CPU.  Implying that
 atomicity is
a CPU property is incorrect.
   
   OK, fair point.
   
   But are there in-use-for-Linux PARISC memory fabrics (for normal
 memory,
   not I/O) that do not support single-byte and double-byte stores?
  
  For aligned access, I believe that's always the case for the
memory
 bus
  (on both 32 and 64 bit systems).  However, it only applies to
machine
  instruction loads and stores of the same width..  If you mix the
 widths
  on the loads and stores, all bets are off.  That means you have to
  beware of the gcc penchant for coalescing loads and stores: if it
 sees
  two adjacent byte stores it can

Re: bit fields data tearing

2014-09-05 Thread H. Peter Anvin
On 09/05/2014 08:31 AM, Peter Hurley wrote:
 
 Which is a bit ironic because I remember when Digital had a team
 working on emulating native x86 apps on Alpha/NT.
 

Right, because the x86 architecture was obsolete and would never scale...

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-05 Thread H. Peter Anvin
On 09/05/2014 01:12 PM, Peter Zijlstra wrote:

 ... and I'm wondering if I should _remove_ pre-EV56 configurations or
 move the default choice and produce a warning about unsupported Alpha
 CPUs instead?

 
  depends BROKEN 
 
 or is that deprecated?
 

Just rip it out, like I did for the i386.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-05 Thread H. Peter Anvin
On 09/05/2014 01:14 PM, Peter Hurley wrote:
 
 Here's how I read the two statements.
 
 First, the commit message:
 
 It [this commit] documents that CPUs [supported by the Linux kernel]
 _must provide_ atomic one-byte and two-byte naturally aligned loads and 
 stores.
 
 Second, in the body of the document:
 
 The Linux kernel no longer supports pre-EV56 Alpha CPUs, because these
 older CPUs _do not provide_ atomic one-byte and two-byte loads and stores.
 

Does this apply in general or only to SMP configurations?  I guess
non-SMP configurations would still have problems if interrupted in the
wrong place...

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-04 Thread H. Peter Anvin
On 09/04/2014 02:52 AM, Benjamin Herrenschmidt wrote:
 
 Yeah correct, alpha and bytes right ? Is there any other ? That's why I
 suggested int.
 

Even for Alpha it is only the 21064 AFAIK.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-04 Thread H. Peter Anvin
On 09/04/2014 12:42 PM, Peter Hurley wrote:
 
 Or we could give up on the Alpha.
 

If Alpha is turning into Voyager (kept alive only as a museum piece, but
actively causing problems) then please let's kill it.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-04 Thread H. Peter Anvin
On 09/04/2014 05:59 PM, Peter Hurley wrote:
 I have no idea how prevalent the ev56 is compared to the ev5.
 Still we're talking about a chip that came out in 1996.

Ah yes, I stand corrected.  According to Wikipedia, the affected CPUs
were all the 2106x CPUs (EV4, EV45, LCA4, LCA45) plus the 21164 with no
suffix (EV5).  However, we're still talking about museum pieces here.

I wonder what the one I have in my garage is... I'm sure I could emulate
it faster, though.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: bit fields data tearing

2014-09-04 Thread H. Peter Anvin
On 09/04/2014 05:59 PM, Peter Hurley wrote:
 I have no idea how prevalent the ev56 is compared to the ev5.
 Still we're talking about a chip that came out in 1996.

Ah yes, I stand corrected.  According to Wikipedia, the affected CPUs
were all the 2106x CPUs (EV4, EV45, LCA4, LCA45) plus the 21164 with no
suffix (EV5).  However, we're still talking about museum pieces here.

I wonder what the one I have in my garage is... I'm sure I could emulate
it faster, though.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v3] arm64, ia64, ppc, s390, sh, tile, um, x86, mm: Remove default gate area

2014-07-18 Thread H. Peter Anvin
On 07/18/2014 09:53 AM, Andy Lutomirski wrote:
 
 Splitting this will be annoying: I'd probably have to add a flag asking for
 the new behavior, update all the arches, then remove the flag.  The chance
 of screwing up bisectability in the process seems pretty high.  This seems
 like overkill for a patch that mostly deletes code.
 
 Akpm, can you take this?
 

I'm fine with it as-is.

Acked-by: H. Peter Anvin h...@linux.intel.com


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 10/10] Kconfig: cleanup SERIO_I8042 dependencies

2013-12-14 Thread H. Peter Anvin
On 12/14/2013 08:59 AM, Mark Salter wrote:
 Remove messy dependencies from SERIO_I8042 by having it depend on one
 Kconfig symbol (ARCH_MIGHT_HAVE_PC_SERIO) and having architectures
 which need it select ARCH_MIGHT_HAVE_PC_SERIO in arch/*/Kconfig.
 New architectures are unlikely to need SERIO_I8042, so this avoids
 having an ever growing list of architectures to exclude.
 
 Signed-off-by: Mark Salter msal...@redhat.com
 CC: Dmitry Torokhov dmitry.torok...@gmail.com
 CC: Richard Henderson r...@twiddle.net
 CC: linux-al...@vger.kernel.org
 CC: Russell King li...@arm.linux.org.uk
 CC: linux-arm-ker...@lists.infradead.org
 CC: Tony Luck tony.l...@intel.com
 CC: Fenghua Yu fenghua...@intel.com
 CC: linux-i...@vger.kernel.org
 CC: Ralf Baechle r...@linux-mips.org
 CC: linux-m...@linux-mips.org
 CC: Benjamin Herrenschmidt b...@kernel.crashing.org
 CC: Paul Mackerras pau...@samba.org
 CC: linuxppc-dev@lists.ozlabs.org
 CC: Paul Mundt let...@linux-sh.org
 CC: linux...@vger.kernel.org
 CC: David S. Miller da...@davemloft.net
 CC: sparcli...@vger.kernel.org
 CC: Guan Xuetao g...@mprc.pku.edu.cn
 CC: Ingo Molnar mi...@redhat.com
 CC: Thomas Gleixner t...@linutronix.de
 CC: H. Peter Anvin h...@zytor.com
 CC: x...@kernel.org

Acked-by: H. Peter Anvin h...@linux.intel.com


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-10 Thread H. Peter Anvin
On 10/10/2013 03:17 AM, Alexander Gordeev wrote:
 On Wed, Oct 09, 2013 at 03:24:08PM +1100, Benjamin Herrenschmidt wrote:
 
 Ok, this suggestion sounded in one or another form by several people.
 What about name it pcim_enable_msix_range() and wrap in couple more
 helpers to complete an API:
 
 int pcim_enable_msix_range(pdev, msix_entries, nvec, minvec);
   0 - error code
   0 - number of MSIs allocated, where minvec = result = nvec
 
 int pcim_enable_msix(pdev, msix_entries, nvec);
   0 - error code
   0 - number of MSIs allocated, where 1 = result = nvec 
 
 int pcim_enable_msix_exact(pdev, msix_entries, nvec);
   0 - error code
   0 - number of MSIs allocated, where result == nvec
 
 The latter's return value seems odd, but I can not help to make
 it consistent with the first two.
 

Is there a reason for the wrappers, as opposed to just specifying either
1 or nvec as the minimum?

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern

2013-10-08 Thread H. Peter Anvin
On 10/02/2013 03:29 AM, Alexander Gordeev wrote:
 
 As result, device drivers will cease to use the overcomplicated
 repeated fallbacks technique and resort to a straightforward
 pattern - determine the number of MSI/MSI-X interrupts required
 before calling pci_enable_msi_block() and pci_enable_msix()
 interfaces:
 
 
   rc = pci_msix_table_size(adapter-pdev);
   if (rc  0)
   return rc;
 
   nvec = min(nvec, rc);
   if (nvec  FOO_DRIVER_MINIMUM_NVEC) {
   return -ENOSPC;
 
   for (i = 0; i  nvec; i++)
   adapter-msix_entries[i].entry = i;
 
   rc = pci_enable_msix(adapter-pdev,
adapter-msix_entries, nvec);
   return rc;
 

Why not add a minimum number to pci_enable_msix(), i.e.:

pci_enable_msix(pdev, msix_entries, nvec, minvec)

... which means nvec is the number of interrupts *requested*, and
minvec is the minimum acceptable number (otherwise fail).

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.

2013-05-28 Thread H. Peter Anvin
On 05/28/2013 01:19 AM, Ingo Molnar wrote:
 
 So I think the same principle applies to it as to any other debugging 
 code: it's fine to be able to turn debugging off. It's a performance 
 versus kernel robustness/determinism trade-off.
 

I suspect, rather, that BUG() should turn into a trap (or jump to a
death routine) under any circumstances.  The one thing that can be
omitted for small configurations are the annotations, which only serve
to output a more human-readable error message.

-hpa


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] arch: configuration, deleting 'CONFIG_BUG' since always need it.

2013-05-28 Thread H. Peter Anvin
On 05/28/2013 08:43 AM, Arnd Bergmann wrote:
 
 Right, that is what the patch I just posted does.
 
 On a related note, I found that WARN_ON() can no longer be compiled
 out since there is already code that relies on the side-effects of
 the condition. I assume that was an intentional change I missed,
 since it used to be defined so that you could remove it completely.
 

It is possible to define WARN_ON() as:

#define WARN_ON(x) ((void)(x))

... which preserves side effects.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 2/3] x86/mm/numa: use setup_nr_node_ids() instead of opencoding.

2013-03-27 Thread H. Peter Anvin
On 03/26/2013 10:56 AM, Yinghai Lu wrote:
 
 For 1 and 2,
 
 Acked-by: Yinghai Lu ying...@kernel.org
 

Similarly:

Acked-by: H. Peter Anvin h...@linux.intel.com

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v2 4/6] x86: Add clear_page_nocache

2012-08-09 Thread H. Peter Anvin

On 08/09/2012 08:03 AM, Kirill A. Shutemov wrote:

From: Andi Kleen a...@linux.intel.com

Add a cache avoiding version of clear_page. Straight forward integer variant
of the existing 64bit clear_page, for both 32bit and 64bit.

Also add the necessary glue for highmem including a layer that non cache
coherent architectures that use the virtual address for flushing can
hook in. This is not needed on x86 of course.

If an architecture wants to provide cache avoiding version of clear_page
it should to define ARCH_HAS_USER_NOCACHE to 1 and implement
clear_page_nocache() and clear_user_highpage_nocache().



Compile failure:

/home/hpa/kernel/tip.x86-mm/arch/x86/mm/fault.c: In function 
‘clear_user_highpage_nocache’:
/home/hpa/kernel/tip.x86-mm/arch/x86/mm/fault.c:1215:30: error: 
‘KM_USER0’ undeclared (first use in this function)
/home/hpa/kernel/tip.x86-mm/arch/x86/mm/fault.c:1215:30: note: each 
undeclared identifier is reported only once for each function it appears in
/home/hpa/kernel/tip.x86-mm/arch/x86/mm/fault.c:1215:2: error: too many 
arguments to function ‘kmap_atomic’
In file included from 
/home/hpa/kernel/tip.x86-mm/include/linux/pagemap.h:10:0,
 from 
/home/hpa/kernel/tip.x86-mm/include/linux/mempolicy.h:70,
 from 
/home/hpa/kernel/tip.x86-mm/include/linux/hugetlb.h:15,

 from /home/hpa/kernel/tip.x86-mm/arch/x86/mm/fault.c:14:
/home/hpa/kernel/tip.x86-mm/include/linux/highmem.h:66:21: note: 
declared here

make[4]: *** [arch/x86/mm/fault.o] Error 1
make[3]: *** [arch/x86/mm] Error 2
make[2]: *** [arch/x86] Error 2
make[1]: *** [sub-make] Error 2
make[1]: Leaving directory `/home/hpa/kernel/tip.x86-mm'

This happens on *all* my test configurations, including both x86-64 and 
i386 allyesconfig.  I suspect your patchset base is stale.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: linux-next: build warnings after merge of the final tree

2011-12-14 Thread H. Peter Anvin
On 12/14/2011 08:32 AM, Steven Rostedt wrote:
 On Wed, 2011-12-14 at 18:15 +1100, Stephen Rothwell wrote:
 
 Maybe caused by commit d5e553d6e0a4 (trace: Include asm/asm-offsets.h
 in trace_syscalls.c) from the tip tree.  These warnings may have been
 here for a while as it is hard to catch the new ones among the flood.

 
 hpa,
 
 Was this change only needed for x86? If so, could you put that into
 asm/ftrace.h instead?
 

Yes (on both accounts).  It was part of the syscall changes; I'll move
the include.

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCHv5] atomic: add *_dec_not_zero

2011-12-04 Thread H. Peter Anvin
On 12/04/2011 02:41 PM, Benjamin Herrenschmidt wrote:
 
 I agree with Russell, his approach is a lot easier to maintain long run,
 we should even consider converting existing definitions.
 

Thirded.

-hpa


-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC] virtio: expose for non-virtualization users too

2011-07-05 Thread H. Peter Anvin
On 07/05/2011 07:06 AM, Ohad Ben-Cohen wrote:
 virtio has been so far used only in the context of virtualization,
 and the virtio Kconfig was sourced directly by the relevant arch
 Kconfigs when VIRTUALIZATION was selected.
 
 Now that we start using virtio for inter-processor communications,
 we need to source the virtio Kconfig outside of the virtualization
 scope too.
 

Seems reasonable to me.

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-15 Thread H. Peter Anvin
On 06/15/2011 12:47 AM, Russell King - ARM Linux wrote:

 OK, serial-8250 is clearly just plain wrong, since the 8250 series UARTs
 are ubiquitous across just about every platform.

 Floppy is special (in the short bus sense), since it is closely tied to
 ISA DMA.  Conditionalizing this on ISA DMA makes total sense.
 
 No it doesn't.  It depends on the ISA DMA API, not that the machine has
 ISA DMA.
 
 I have a platform which has no ISA DMA but uses the floppy driver.  Please
 don't break it.

OK, even more case in point, then.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-15 Thread H. Peter Anvin
On 06/15/2011 12:39 AM, Russell King - ARM Linux wrote:
 On Tue, Jun 14, 2011 at 09:31:27PM -0700, H. Peter Anvin wrote:
 On 06/14/2011 03:08 PM, Luck, Tony wrote:
 I took a look at the back of all my ia64 systems - none of them
 have a parallel port.  It seems unlikely that new systems will
 start adding parallel ports :-)

 So even if I had a printer (or other device) that used a parallel
 port, I have no way to test it.

 If it has PCI slots, it can have a parallel port.
 
 Is that a clue about where a select statement should be?

Not really, because it's a sufficient condition, not a required one.

All a platform needs to expose a PC-style parallel port interface is a
minimum of 3 contiguous I/O locations, and although in the PC they are
I/O mapped, they don't need to be.

The basic (SPP) parallel port interface is really just a glorified set
of GPIOs and could at least in theory be implemented as-is on any
platform with contiguous GPIO ports.  The faster modes (EPP and ECP) do
contain logic, and ECP depends on the ISA DMA API (thanks to Russell for
pointing out that actual ISA DMA is not required, any slave DMA solution
will do.)


-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-14 Thread H. Peter Anvin
On 06/14/2011 12:08 PM, Ralf Baechle wrote:
 The PC parallel port Kconfig as acquired one of those messy terms to
 describe it's architecture dependencies:
 
depends on (!SPARC64 || PCI)  !SPARC32  !M32R  !FRV  \
(!M68K || ISA)  !MN10300  !AVR32  !BLACKFIN
 
 This isn't just ugly - it also almost certainly describes the dependencies
 too coarse grainedly.  This is an attempt at cleaing the mess up.
 
 I tried to faithfully aproximate the old behaviour but the existing
 behaviour seems inacurate if not wrong for some architectures or platforms.
 To improve on this I rely on comments from other arch and platforms
 maintainers.  Any system that can take PCI multi-IO card or has a PC-style
 parallel port on the mainboard should probably should now do a
 select HAVE_PC_PARPORT.  And some arch Kconfig files should further
 restrict the use of HAVE_PC_PARPORT to only those platforms that actually
 need it.
 

Why on earth restrict it like that?  It's just a device driver, like
more or less any other device driver...

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-14 Thread H. Peter Anvin
On 06/14/2011 03:34 PM, Ralf Baechle wrote:
 
 There is no point in offering to build something that couldn't possibly be
 used.  It just makes the kernel harder to configure and inflates the test
 matrix for no good reason.
 

I see... that's why a bunch of devices that only exist on ARM and MIPS
SoCs are offered on x86 platforms?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-14 Thread H. Peter Anvin
On 06/14/2011 02:33 PM, Arnd Bergmann wrote:

 Why on earth restrict it like that?  It's just a device driver, like
 more or less any other device driver...
 
 I'd say any other classic ISA/PC driver, including floppy, gameport or
 serial-8250. One problem with these is that we never fully worked out
 the dependencies for these, which we probably should. CONFIG_ISA
 generally means ISA add-on cards, but that might not be enabled for
 platforms that have a pc-parport but no ISA slots.
 

OK, serial-8250 is clearly just plain wrong, since the 8250 series UARTs
are ubiquitous across just about every platform.

Floppy is special (in the short bus sense), since it is closely tied to
ISA DMA.  Conditionalizing this on ISA DMA makes total sense.

Parallel port is an intermediate case... Centronics parallel ports
predate the PC ecosystem by quite a bit, and the particular arrangement
of ports became popular with the PC and spread to other platforms, but
the particular variant of it known as ECP (as opposed to EPP) is ISA DMA
specific.

 On the other hand, you have embedded platforms that currently build support
 for parport-pc but define the inb/outb macros to plain pointer dereferences
 (otherwise you can't build the 8250 driver). Loading parport-pc on those
 machines typically results in derefencing user memory in the best case.

 What I'd love to see is a configuration option for arch has working
 PC-style inb/outb instructions, so we can build a kernel without them but
 still get MMIO based drivers for PCI-less platforms.

Now, isn't that was iowrite/ioread was designed for?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-14 Thread H. Peter Anvin
On 06/14/2011 03:08 PM, Luck, Tony wrote:
 diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
 index 38280ef..849805a 100644
 --- a/arch/ia64/Kconfig
 +++ b/arch/ia64/Kconfig
 @@ -23,6 +23,7 @@ config IA64
   select HAVE_ARCH_TRACEHOOK
   select HAVE_DMA_API_DEBUG
   select HAVE_GENERIC_HARDIRQS
 + select HAVE_PC_PARPORT
   select GENERIC_IRQ_PROBE
   select GENERIC_PENDING_IRQ if SMP
   select IRQ_PER_CPU
 
 I took a look at the back of all my ia64 systems - none of them
 have a parallel port.  It seems unlikely that new systems will
 start adding parallel ports :-)
 
 So even if I had a printer (or other device) that used a parallel
 port, I have no way to test it.
 

If it has PCI slots, it can have a parallel port.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [RFC,PATCH] Cleanup PC parallel port Kconfig

2011-06-14 Thread H. Peter Anvin
On 06/14/2011 09:40 PM, Guenter Roeck wrote:
 On Wed, Jun 15, 2011 at 12:18:36AM -0400, H. Peter Anvin wrote:
 On 06/14/2011 03:34 PM, Ralf Baechle wrote:

 There is no point in offering to build something that couldn't possibly be
 used.  It just makes the kernel harder to configure and inflates the test
 matrix for no good reason.


 I see... that's why a bunch of devices that only exist on ARM and MIPS
 SoCs are offered on x86 platforms?

 http://en.wikipedia.org/wiki/Two_wrongs_make_a_right
 

Except in this case it's not wrong.  It was done that way because it was
discovered a long time ago that restricting drivers that were not
*inherently* limited to specific platform just resulted in more bitrot
and nasty surprises for the users who *did* need specific things after
all, even though the maintainers had not thought so.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] Audit: push audit success and retcode into arch ptrace.h

2011-06-02 Thread H. Peter Anvin
On 06/02/2011 02:04 PM, Eric Paris wrote:
 The audit system previously expected arches calling to audit_syscall_exit to
 supply as arguments if the syscall was a success and what the return code was.
 Audit also provides a helper AUDITSC_RESULT which was supposed to simplify by
 converting from negative retcodes to an audit internal magic value stating
 success or failure.  This helper was wrong and could indicate that a valid
 pointer returned to userspace was a failed syscall.  The fix is to fix the
 layering foolishness.  We now pass audit_syscall_exit a struct pt_reg and it
 in turns calls back into arch code to collect the return value and to
 determine if the syscall was a success or failure.
 
 In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
 audit code.  But the ptrace_64.h code was previously using regs[3] for the
 regs_return_value().  I have no idea which one is correct, but this patch now
 uses regs[3].
 
 Signed-off-by: Eric Paris epa...@redhat.com

For the x86 portions:

Acked-by: H. Peter Anvin h...@zytor.com

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 0/5] make *_gate_vma accept mm_struct instead of task_struct II

2011-03-10 Thread H. Peter Anvin
Sorry... I confused them too. It's TS_COMPAT which is problematic.
-- 
Sent from my mobile phone. Please pardon any lack of formatting.

Stephen Wilson wils...@start.ca wrote:

On Thu, Mar 10, 2011 at 08:38:09AM -0800, Andi Kleen wrote:  On Thu, Mar 10, 
2011 at 08:00:32AM -0800, Andi Kleen wrote:   On Tue, Mar 08, 2011 at 
07:31:56PM -0500, Stephen Wilson wrote:The only architecture this change 
impacts in any significant way is x86_64.The principle change on that 
architecture is to mirror TIF_IA32 viaa new flag in mm_context_t.
 The problem is -- you're adding a likely cache miss on mm_struct for   
every 32bit compat syscall now, even if they don't need mm_struct   currently 
(and a lot of them do not) Unless there's a very good   justification to make 
up for this performance issue elsewhere   (including numbers) this seems like 
a bad idea.   Hmm I see you're only setting it on exec time actually on 
rereading  the patches. I thought you were changing TS_COMPAT which is in  
the syscall path.   Never mind. I have no problems with doing such a change 
on exec  time. OK. Great! Does this mean I have your ACK'e!
 d by or
reviewed by? Thanks for taking a look! -- steve 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/5] make *_gate_vma accept mm_struct instead of task_struct

2011-03-10 Thread H. Peter Anvin
TIF_IA32 is set during the execution of a 32-bit system call - so touched on 
each compat system call. Is this the actual flag you want? A 32-bit address 
space flag is different from TIF_IA32.
-- 
Sent from my mobile phone. Please pardon any lack of formatting.

Stephen Wilson wils...@start.ca wrote:

On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:  On Tue, Mar 08, 
2011 at 07:31:56PM -0500, Stephen Wilson wrote: Morally, the question 
of whether an address lies in a gate vma should be asked   with respect to an 
mm, not a particular task. Practically, dropping the dependency on 
task_struct will help make current and   future operations on mm's more 
flexible and convenient. In particular, it   allows some code paths to avoid 
the need to hold task_lock. The only architecture this change impacts 
in any significant way is x86_64.   The principle change on that architecture 
is to mirror TIF_IA32 via   a new flag in mm_context_t.   The problem is -- 
you're adding a likely cache miss on mm_struct for  every 32bit compat syscall 
now, even if they don't need mm_struct  currently (and a lot of them do not) 
Unless there's a very good  justification to make up for this performance 
issue elsewhere  (including numbers) this seems like !
 a bad
idea. I do not think this will result in cache misses on the scale you suggest. 
I am simply mirroring the *state* of the TIF_IA32 flag in mm_struct, not 
testing/accessing it in the same way. The only place where this flag is 
accessed (outside the exec() syscall path) is in x86/mm/init_64.c, 
get_gate_vma(), which in turn is needed by a few, relatively heavy weight, page 
locking/pinning routines on the mm side (get_user_pages, for example). Patches 
3 and 4 in the series show the extent of the change. Or am I missing something? 
  /proc/pid/mem. I will be posting the second series to lkml shortly. These  
 Making every syscall slower for /proc/pid/mem doesn't seem like a good  
tradeoff to me. Please solve this in some other way.   -Andi -- steve 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC] [PATCH] allow low HZ values?

2010-10-11 Thread H. Peter Anvin
On 10/11/2010 03:33 PM, Thomas Gleixner wrote:
 On Tue, 12 Oct 2010, Benjamin Herrenschmidt wrote:
 
 On Mon, 2010-10-11 at 13:11 -0700, Tim Pepper wrote:
 I'm not necessarily wanting to open up the age old question of what is
 a good HZ, but we were doing some testing on timer tick overheads for
 HPC applications and this came up...

 Note that this is also very useful when working on CPU prototypes
 implemented in FPGAs and running at something like 12Mhz :-)
 
 /me hands benh 0.5$ for a FPGA upgrade

That's often not possible if the CPU cannot be mapped onto a single FPGA
(either because the core is too large, multiple cores are tested, or
because there is debugging logic is included.)  The interconnects slows
things down tremendously.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 31/40] trace syscalls: Convert various generic compat syscalls

2010-06-23 Thread H. Peter Anvin
On 06/23/2010 04:38 AM, Frederic Weisbecker wrote:
 
 I haven't heard any complains about existing syscalls wrappers.
 

Then you truly haven't been listening.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 12/40] x86, compat: convert ia32 layer to use

2010-06-23 Thread H. Peter Anvin
On 06/23/2010 04:41 AM, Frederic Weisbecker wrote:
 On Wed, Jun 23, 2010 at 12:46:04PM +0200, Christoph Hellwig wrote:
 On Wed, Jun 23, 2010 at 12:36:21PM +0200, Frederic Weisbecker wrote:
 I think we wanted that to keep the sys32_ prefixed based naming, to avoid
 collisions with generic compat handler names.

 For native syscalls we do this by adding a arch prefix inside the
 syscall name, e.g.:

 arch/s390/kernel/sys_s390.c:SYSCALL_DEFINE(s390_fallocate)(int fd, int mode, 
 loff_t offset,
 arch/sparc/kernel/sys_sparc_64.c:SYSCALL_DEFINE1(sparc_pipe_real, struct 
 pt_regs *, regs)
 
 In fact we sort of wanted to standardize the name of arch overriden compat
 syscalls, so that userspace programs playing with syscalls tracing won't have
 to deal with arch naming differences.
 

That seems totally wrong in so many ways.

What userspace sees is the system call name, e.g. fallocate or pipe.  It
should *not* be visible to userspace that there is an arch-specici
implementation.

There are already a huge amount of gratuitous user space ABI
differences, of course, partly because we *still* don't actually have a
systematic model for argument passing.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v21 011/100] eclone (11/11): Document sys_eclone

2010-06-09 Thread H. Peter Anvin
On 06/09/2010 11:14 AM, Sukadev Bhattiprolu wrote:
 | 
 | Even for x86, it's an easier API. Callers would be specifying
 | two numbers they already have: the argument and return value
 | for malloc. Currently the numbers must be added together,
 | destroying information, except on hppa (must not add size)
 | and ia64 (must use what I'm proposing already).
 
 I agree its easier and would avoid #ifdefs in the applications.
 
 Peter, Arnd, Roland - do you have any concerns with requiring all
 architectures to specify the stack to eclone() as [base, offset]
 

Makes sense to me.  There might be advantages to be able to track the
size of the stack allocation even for other architectures, too.

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Badness on the Warp

2009-06-21 Thread H. Peter Anvin
Frans Pop wrote:
 
 The fact that your bisect ended at a merge essentially means that it is 
 invalid. As a merge does not introduce any actual change (unless it 
 includes changes to resolve conflicts), it normally cannot be the cause 
 of a regression.
 

Sort of.  You can have a bum merge where code conflicts logically,
even if it isn't visible as a conflict in git.  Furthermore, of course,
you can have misresolved actual conflicts.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v2 1/6] Add new macros for page-aligned data and bss sections.

2009-05-01 Thread H. Peter Anvin
Tim Abbott wrote:
 On Fri, 1 May 2009, Sam Ravnborg wrote:
 
 On Thu, Apr 30, 2009 at 03:54:08PM -0400, Tim Abbott wrote:
 +#define __PAGE_ALIGNED_DATA.section .data.page_aligned, aw, 
 @progbits
 +#define __PAGE_ALIGNED_BSS .section .bss.page_aligned, aw, @nobits
 It is my understanding that the linker will automatically
 assume nobits for section names starting with .bss and likewise
 progbits for section names starting with .data - so we can leave them out?
 
 I believe that is correct.
 

... but that doesn't mean it's the right thing to do.

It's better to be fully explicit when macroizing this kind of stuff.
This is part of why macroizing it is good: it means we end up with *one*
place that determines this stuff, not some magic heuristics in the linker.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH v2 1/6] Add new macros for page-aligned data and bss sections.

2009-05-01 Thread H. Peter Anvin
Sam Ravnborg wrote:
 On Fri, May 01, 2009 at 09:33:13AM -0700, H. Peter Anvin wrote:
 Tim Abbott wrote:
 On Fri, 1 May 2009, Sam Ravnborg wrote:

 On Thu, Apr 30, 2009 at 03:54:08PM -0400, Tim Abbott wrote:
 +#define __PAGE_ALIGNED_DATA  .section .data.page_aligned, aw, 
 @progbits
 +#define __PAGE_ALIGNED_BSS   .section .bss.page_aligned, aw, 
 @nobits
 It is my understanding that the linker will automatically
 assume nobits for section names starting with .bss and likewise
 progbits for section names starting with .data - so we can leave them out?
 I believe that is correct.

 ... but that doesn't mean it's the right thing to do.

 It's better to be fully explicit when macroizing this kind of stuff.
 This is part of why macroizing it is good: it means we end up with *one*
 place that determines this stuff, not some magic heuristics in the linker.
 
 Do you know if we can use % in place of @?
 I could see that gas supports both - at least in trunk in cvs.
 

I think it might depend on the architecture(!)... but it would
definitely have to be an issue with testing a bunch of different versions.

What's wrong with @?

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH v2 1/6] Add new macros for page-aligned data and bss sections.

2009-05-01 Thread H. Peter Anvin
Sam Ravnborg wrote:

 What's wrong with @?
 arm does not support it :-(
 I recall it denote a comment in arm assembler.
 
 I could do some magic to detect the ARM case but I'm reluctant to do so.
 I could also ignore the arm issue for now as it is not used by arm,
 but that strikes me as the wrong approach.
 

If we really have to use different tokens, I would say:

#ifdef __ARM__
# define _PROGBITS %progbits/* or whatever */
# define _NOBITS   %nobits
#else
# define _PROGBITS @progbits
# define _NOBITS   @nobits
#endif

Otherwise we probably need to ask the binutils people...

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 3/3] powerpc/86xx: Add MMC SPI support for MPC8610HPCD boards

2009-04-04 Thread H. Peter Anvin
Anton Vorontsov wrote:
 This patch adds spi and mmc-spi-slot nodes, plus a gpio-controller
 for PIXIS' sdcsr bank that is used for managing SPI chip-select and
 for reading card's states.
 
 Note that spi-max-frequency is lowered since at high frequencies SD
 cards don't work reliably (there is some problem with the chip select
 line, it's probably quite slow because it's routed via PIXIS).
 
 Previously there was a work around: we didn't use chip-select line
 at all, but some cards don't like that, so we'd better use the low
 frequency.
 
 Signed-off-by: Anton Vorontsov avoront...@ru.mvista.com

I have to admit being somewhat confused.  The CS# line is pulled low and
stays low for the duration of a command, so it shouldn't affect the SPI
frequency, only the delay between commands.

You're saying some cards don't like not using the CS# line ... in
particular, I know there are hardware devices which toggle the clock at
low frequency ( 400 kHz) for a while (I think the minimum is 88 clocks
or so) and then permanently assert the CS# line before giving the
starting CMD0.  Even so, the CMD0 is supposed to be sent at  400 kHz
since the card will still be in open drain mode at that point.

Does that disagree with your observations?  If so, I'd be really
interested to find out what you have seen in more detail, as it adds to
the understanding of MMC/SD cards in the field.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 59/60] microblaze_v4: syscall_table.S and unistd.h

2008-06-26 Thread H. Peter Anvin

Arnd Bergmann wrote:


You still set __NR_fork. There is no point defining the number if you
can't actually call the syscall in the first place.



Worse, it is actively *harmful* to set the number; klibc or anything 
that uses similar kinds of scripts for portability will see the symbol 
and think the system call is available.


-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Microblaze init port v4

2008-06-26 Thread H. Peter Anvin

Arnd Bergmann wrote:


* You are using sys_ipc and sys_socketcall instead of the
broken-out syscalls, which is a direct consequence of following
the i386 numbering scheme.



Please pretty please kill, kill, kill...

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 48/60] microblaze_v4: headers simple files - empty or redirect to asm-generic

2008-06-26 Thread H. Peter Anvin

Arnd Bergmann wrote:

On Thursday 26 June 2008, Adrian Bunk wrote:
The comment could be nuked (as well as the #ifdef __KERNEL__), but for 
the one #define an asm-generic header would IMHO be overkill.


I agree that it doesn't technically make sense to have a one-line
asm-generic header, but I like the idea of reducing the number of
asm/*.h files that actually contain anything other that
#include asm-generic/$FILE.h, mostly for documentation purposes.

If we can eventually agree on a way to get rid of the requirement
for explicit redirection, we can remove a larger number of source
files completely.



The sanest way to do that would probably be something along the lines of:

- Change include/asm-xxx to arch/xxx/include/asm
- Create arch/generic
- Make sure arch/xxx/include/asm and arch/generic/include/asm are both 
in the include path, in that order.


That would also get rid of the symlink.

On the other hand, the redirection isn't all that bad.

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 30/60] microblaze_v4: support for a.out

2008-06-26 Thread H. Peter Anvin

Michal Simek wrote:

I expect it but I haven't information about. I'll check it.


I'm assuming Microblaze doesn't actually add a.out support, does it?

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 30/60] microblaze_v4: support for a.out

2008-06-26 Thread H. Peter Anvin

Michal Simek wrote:

Michal Simek wrote:

I expect it but I haven't information about. I'll check it.

I'm assuming Microblaze doesn't actually add a.out support, does it?


I think that any support were there but I assume that no one use it.


Most other new architectures don't bother, hence the question.

-hpa

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: How to link a .o with all modules

2008-05-04 Thread H. Peter Anvin

Kumar Gala wrote:

Sam,

We have a case in powerpc in which we want to link some library routines 
with all module objects.  The routines are intended for handling 
out-of-line function call register save/restore so having them as 
EXPORT_SYMBOL() is counter productive (we do also need to link the same 
library code into the kernel).




Why is having them as an EXPORT_SYMBOL() counterproductive?  It sounds 
like *exactly* what you need -- and then having the kernel provide the 
same code to modules, instead of replication...?


-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Too verbose compat_ioctl messages?

2007-07-06 Thread H. Peter Anvin
Andi Kleen wrote:
 H. Peter Anvin [EMAIL PROTECTED] writes:
 For one thing, it looks like we're returning the wrong thing (EINVAL
 rather than ENOTTY) across the board.  This was unfortunately a common
 misunderstanding with non-tty-related ioctls in the early days of Linux.
 
 ENOTTY is so excessively misnamed that it is actually surprising
 anybody ever got that right (for very small values of right i guess)
 

No argument there.

-hpa
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev