Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

2015-02-09 Thread Konstantin Belousov
On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>
>> Log:
>>  Do not mark shared structures as __packed; it leads to a race condition.
>>
>>  If a structure is marked __packed, clang (and probably gcc) generates
>>  code that loads word fields (e.g. tx_pos) byte-by-byte, and if the field
>>  is modified by the VideoCore at the same time as the ARM loads the
>>  value, the result is a mixed combination of bytes from the previous
>>  value and the new one.
>
> Most uses of __packed are bugs.  It gives pessimizations as well as
> non-atomic accesses for sub-object accesses.
>
> I think the full bugs only occur when arch has strict alignment
> requirements and the alignment of the __packed objects is not known.
> This means that only lesser bugs occur on x86 (unless you enable
> alignment checking, but this arguably breaks the ABI).  The compiler
> just generates possibly-misaligned full-width accesses if the arch
> doesn't have strict alignment requirements.  Often the accesses turn
> out to be aligned at runtime.  Otherwise, the hardware does them
> atomically, with a smaller efficiency penalty than split accesses.

On x86 unaligned access is non-atomic.  This was very visible on
Core2 CPUs where DPCPU code mishandled the alignment, resulting in
the mutexes from the per-cpu areas breaking badly.

Modern CPUs should not lock several cache lines simultaneously either.
___
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org


Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

2015-02-09 Thread Bruce Evans

On Mon, 9 Feb 2015, Konstantin Belousov wrote:

> On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
>> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>> ...
>> I think the full bugs only occur when arch has strict alignment
>> requirements and the alignment of the __packed objects is not known.
>> This means that only lesser bugs occur on x86 (unless you enable
>> alignment checking, but this arguably breaks the ABI).  The compiler
>> just generates possibly-misaligned full-width accesses if the arch
>> doesn't have strict alignment requirements.  Often the accesses turn
>> out to be aligned at runtime.  Otherwise, the hardware does them
>> atomically, with a smaller efficiency penalty than split accesses.
>
> On x86 unaligned access is non-atomic.  This was very visible on
> Core2 CPUs where DPCPU code mishandled the alignment, resulting in
> the mutexes from the per-cpu areas breaking badly.
>
> Modern CPUs should not lock several cache lines simultaneously either.


Interesting.  I thought that this was relatively easy to handle in
hardware and required for compatibility, so hardware did it.

This gives a reason other than efficiency to enable alignment checking
so as to find all places that do misaligned accesses.  I last tried this
more than 20 years ago.  Compilers mostly generated aligned accesses.
One exception was for copying small (sub)structs.  Inlining of the copy
assumed maximal alignment or no alignment traps.  Library functions are
more of a problem.  FreeBSD amd64 and i386 memcpy also assume this.
Similarly for the MD mem* in the kernel.  Mostly things are suitably
aligned, so it is the correct optimization to not do extra work to align.

Bruce


Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

2015-02-09 Thread Konstantin Belousov
On Mon, Feb 09, 2015 at 08:51:02PM +1100, Bruce Evans wrote:
> On Mon, 9 Feb 2015, Konstantin Belousov wrote:
>
>> On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
>>> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>>> ...
>>> I think the full bugs only occur when arch has strict alignment
>>> requirements and the alignment of the __packed objects is not known.
>>> This means that only lesser bugs occur on x86 (unless you enable
>>> alignment checking, but this arguably breaks the ABI).  The compiler
>>> just generates possibly-misaligned full-width accesses if the arch
>>> doesn't have strict alignment requirements.  Often the accesses turn
>>> out to be aligned at runtime.  Otherwise, the hardware does them
>>> atomically, with a smaller efficiency penalty than split accesses.
>>
>> On x86 unaligned access is non-atomic.  This was very visible on
>> Core2 CPUs where DPCPU code mishandled the alignment, resulting in
>> the mutexes from the per-cpu areas breaking badly.
>>
>> Modern CPUs should not lock several cache lines simultaneously either.
>
> Interesting.  I thought that this was relatively easy to handle in
> hardware and required for compatibility, so hardware did it.
Trying to lock two cache lines easily results in deadlock.
FWIW, multi-socket Intel platforms are already deadlock-prone due
to the cache, and have some facilities to debug this.

 
> This gives a reason other than efficiency to enable alignment checking
> so as to find all places that do misaligned accesses.  I last tried this
> more than 20 years ago.  Compilers mostly generated aligned accesses.
> One exception was for copying small (sub)structs.  Inlining of the copy
> assumed maximal alignment or no alignment traps.  Library functions are
> more of a problem.  FreeBSD amd64 and i386 memcpy also assume this.
> Similarly for the MD mem* in the kernel.  Mostly things are suitably
> aligned, so it is the correct optimization to not do extra work to align.

I also did experiments with a preloadable DSO which sets the EFLAGS.AC
bit.  Last time I tried, it broke in very early libc initialization
code, due to unaligned accesses generated by the compiler, as you
described.  This was with the in-tree gcc.  Trying with a clang-compiled
world, I got SIGBUS due to an unaligned access in ld-elf.so.1.

AC does not work in ring 0, and Intel recently re-purposed the bit in
the kernel for SMAP 'security' theater.


Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

2015-02-08 Thread Bruce Evans

On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:

> Log:
>  Do not mark shared structures as __packed; it leads to a race condition.
>
>  If a structure is marked __packed, clang (and probably gcc) generates
>  code that loads word fields (e.g. tx_pos) byte-by-byte, and if the field
>  is modified by the VideoCore at the same time as the ARM loads the
>  value, the result is a mixed combination of bytes from the previous
>  value and the new one.


Most uses of __packed are bugs.  It gives pessimizations as well as
non-atomic accesses for sub-object accesses.

I think the full bugs only occur when arch has strict alignment
requirements and the alignment of the __packed objects is not known.
This means that only lesser bugs occur on x86 (unless you enable
alignment checking, but this arguably breaks the ABI).  The compiler
just generates possibly-misaligned full-width accesses if the arch
doesn't have strict alignment requirements.  Often the accesses turn
out to be aligned at runtime.  Otherwise, the hardware does them
atomically, with a smaller efficiency penalty than split accesses.

Many packed structs should also be declared as __aligned(N).  This is
rarely done.  Old networking code in netinet uses __packed just once,
and this also uses __aligned(4) (for struct ip).  Newer networking code
in netinet (for ipv6) is massively pessimized, with __packed used
extensively and __aligned(N) never used with __packed.  sctp is worse.
It uses its own macro that defeats grepping for __packed.  It has 82
instances of this in netinet where ipv6 has only 34.  Newer networking
code should need __packed less than old code, since no one would
misdesign a data struct so that it needs packing now.

gcc documents the -Wpacked flag for finding some cases of bogus packing.
It is rarely used.

Bruce


svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

2015-02-08 Thread Oleksandr Tymoshenko
Author: gonzo
Date: Mon Feb  9 02:31:27 2015
New Revision: 278431
URL: https://svnweb.freebsd.org/changeset/base/278431

Log:
  Do not mark shared structures as __packed; it leads to a race condition.
  
  If a structure is marked __packed, clang (and probably gcc) generates
  code that loads word fields (e.g. tx_pos) byte-by-byte, and if the field
  is modified by the VideoCore at the same time as the ARM loads the
  value, the result is a mixed combination of bytes from the previous
  value and the new one.

Modified:
  head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h

Modified: head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h
==============================================================================
--- head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h	Mon Feb  9 02:27:33 2015	(r278430)
+++ head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h	Mon Feb  9 02:31:27 2015	(r278431)
@@ -280,7 +280,7 @@ typedef struct vchiq_slot_info_struct {
/* Use two counters rather than one to avoid the need for a mutex. */
short use_count;
short release_count;
-} __packed VCHIQ_SLOT_INFO_T; /* XXXGONZO: check it */
+}  VCHIQ_SLOT_INFO_T;
 
 typedef struct vchiq_service_struct {
VCHIQ_SERVICE_BASE_T base;
@@ -381,7 +381,7 @@ typedef struct vchiq_shared_state_struct
 
/* Debugging state */
int debug[DEBUG_MAX];
-} __packed VCHIQ_SHARED_STATE_T;
+} VCHIQ_SHARED_STATE_T;
 
 typedef struct vchiq_slot_zero_struct {
int magic;
@@ -395,7 +395,7 @@ typedef struct vchiq_slot_zero_struct {
VCHIQ_SHARED_STATE_T master;
VCHIQ_SHARED_STATE_T slave;
VCHIQ_SLOT_INFO_T slots[VCHIQ_MAX_SLOTS];
-} __packed VCHIQ_SLOT_ZERO_T;
+} VCHIQ_SLOT_ZERO_T;
 
 struct vchiq_state_struct {
int id;