Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm
On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>> Log:
>>   Do not mark shared structures as __packed, it leads to race condition
>>
>>   If structure packed as __packed clang (and probably gcc) generates
>>   code that loads word fields (e.g. tx_pos) byte-by-byte and if it's
>>   modified by VideoCore in the same time as ARM loads the value result
>>   is going to be mixed combination of bytes from previous value and
>>   new one.
>
> Most uses of __packed are bugs.  It gives pessimizations as well as
> non-atomic accesses for sub-object accesses.  I think the full bugs
> only occur when the arch has strict alignment requirements and the
> alignment of the __packed objects is not known.  This means that only
> lesser bugs occur on x86 (unless you enable alignment checking, but
> this arguably breaks the ABI).  The compiler just generates
> possibly-misaligned full-width accesses if the arch doesn't have
> strict alignment requirements.  Often the accesses turn out to be
> aligned at runtime.  Otherwise, the hardware does them atomically,
> with a smaller efficiency penalty than split accesses.

On x86 unaligned access is non-atomic.  This was very visible on Core2
CPUs where DPCPU code mishandled the alignment, resulting in the
mutexes from the per-cpu areas breaking badly.  Modern CPUs should not
lock several cache lines simultaneously either.

_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to svn-src-all-unsubscr...@freebsd.org
Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm
On Mon, 9 Feb 2015, Konstantin Belousov wrote:
> On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
>> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>> ...
>> I think the full bugs only occur when the arch has strict alignment
>> requirements and the alignment of the __packed objects is not known.
>> This means that only lesser bugs occur on x86 (unless you enable
>> alignment checking, but this arguably breaks the ABI).  The compiler
>> just generates possibly-misaligned full-width accesses if the arch
>> doesn't have strict alignment requirements.  Often the accesses turn
>> out to be aligned at runtime.  Otherwise, the hardware does them
>> atomically, with a smaller efficiency penalty than split accesses.
>
> On x86 unaligned access is non-atomic.  This was very visible on Core2
> CPUs where DPCPU code mishandled the alignment, resulting in the
> mutexes from the per-cpu areas breaking badly.  Modern CPUs should not
> lock several cache lines simultaneously either.

Interesting.  I thought that this was relatively easy to handle in
hardware and required for compatibility, so hardware did it.

This gives a reason other than efficiency to enable alignment checking
so as to find all places that do misaligned accesses.  I last tried
this more than 20 years ago.  Compilers mostly generated aligned
accesses.  One exception was for copying small (sub)structs.  Inlining
of the copy assumed maximal alignment or no alignment traps.  Library
functions are more of a problem.  FreeBSD amd64 and i386 memcpy also
assume this.  Similarly for the MD mem* functions in the kernel.
Mostly things are suitably aligned, so it is the correct optimization
not to do extra work to align.

Bruce
Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm
On Mon, Feb 09, 2015 at 08:51:02PM +1100, Bruce Evans wrote:
> On Mon, 9 Feb 2015, Konstantin Belousov wrote:
>> On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
>>> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
>>> ...
>>> I think the full bugs only occur when the arch has strict alignment
>>> requirements and the alignment of the __packed objects is not known.
>>> This means that only lesser bugs occur on x86 (unless you enable
>>> alignment checking, but this arguably breaks the ABI).  The compiler
>>> just generates possibly-misaligned full-width accesses if the arch
>>> doesn't have strict alignment requirements.  Often the accesses turn
>>> out to be aligned at runtime.  Otherwise, the hardware does them
>>> atomically, with a smaller efficiency penalty than split accesses.
>>
>> On x86 unaligned access is non-atomic.  This was very visible on
>> Core2 CPUs where DPCPU code mishandled the alignment, resulting in
>> the mutexes from the per-cpu areas breaking badly.  Modern CPUs
>> should not lock several cache lines simultaneously either.
>
> Interesting.  I thought that this was relatively easy to handle in
> hardware and required for compatibility, so hardware did it.

Trying to lock two cache lines easily results in deadlock.  FWIW,
multi-socket Intel platforms are already deadlock-prone due to the
cache, and have some facilities to debug this.

> This gives a reason other than efficiency to enable alignment checking
> so as to find all places that do misaligned accesses.  I last tried
> this more than 20 years ago.  Compilers mostly generated aligned
> accesses.  One exception was for copying small (sub)structs.  Inlining
> of the copy assumed maximal alignment or no alignment traps.  Library
> functions are more of a problem.  FreeBSD amd64 and i386 memcpy also
> assume this.  Similarly for the MD mem* in the kernel.  Mostly things
> are suitably aligned, so it is the correct optimization not to do
> extra work to align.

I also did experiments with a preloadable dso which sets the EFLAGS.AC
bit.  Last time I tried, it broke in the very early libc initialization
code, due to unaligned access generated by the compiler, as you
described.  This was with the in-tree gcc.  Tried with the
clang-compiled world, I got SIGBUS due to unaligned access in
ld-elf.so.1.  AC does not work in ring 0, and Intel re-purposed the bit
for the kernel recently for 'security' theater.
Re: svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm
On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
> Log:
>   Do not mark shared structures as __packed, it leads to race condition
>
>   If structure packed as __packed clang (and probably gcc) generates
>   code that loads word fields (e.g. tx_pos) byte-by-byte and if it's
>   modified by VideoCore in the same time as ARM loads the value result
>   is going to be mixed combination of bytes from previous value and
>   new one.

Most uses of __packed are bugs.  It gives pessimizations as well as
non-atomic accesses for sub-object accesses.  I think the full bugs
only occur when the arch has strict alignment requirements and the
alignment of the __packed objects is not known.  This means that only
lesser bugs occur on x86 (unless you enable alignment checking, but
this arguably breaks the ABI).  The compiler just generates
possibly-misaligned full-width accesses if the arch doesn't have strict
alignment requirements.  Often the accesses turn out to be aligned at
runtime.  Otherwise, the hardware does them atomically, with a smaller
efficiency penalty than split accesses.

Many packed structs should also be declared as __aligned(N).  This is
rarely done.  Old networking code in netinet uses __packed just once,
and that use also has __aligned(4) (for struct ip).  Newer networking
code in netinet (for ipv6) is massively pessimized, with __packed used
extensively and __aligned(N) never used with __packed.  sctp is worse:
it uses its own macro that defeats grepping for __packed, and has 82
instances of this in netinet where ipv6 has only 34.  Newer networking
code should need __packed less than old code, since no one would
misdesign a data struct so that it needs packing now.

gcc documents the -Wpacked flag for finding some cases of bogus
packing.  It is rarely used.

Bruce
svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm
Author: gonzo
Date: Mon Feb  9 02:31:27 2015
New Revision: 278431
URL: https://svnweb.freebsd.org/changeset/base/278431

Log:
  Do not mark shared structures as __packed, it leads to race condition

  If structure packed as __packed clang (and probably gcc) generates
  code that loads word fields (e.g. tx_pos) byte-by-byte and if it's
  modified by VideoCore in the same time as ARM loads the value result
  is going to be mixed combination of bytes from previous value and
  new one.

Modified:
  head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h

Modified: head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h
==============================================================================
--- head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h	Mon Feb  9 02:27:33 2015	(r278430)
+++ head/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h	Mon Feb  9 02:31:27 2015	(r278431)
@@ -280,7 +280,7 @@ typedef struct vchiq_slot_info_struct {
 	/* Use two counters rather than one to avoid the need for a mutex. */
 	short use_count;
 	short release_count;
-} __packed VCHIQ_SLOT_INFO_T; /* XXXGONZO: check it */
+} VCHIQ_SLOT_INFO_T;
 
 typedef struct vchiq_service_struct {
 	VCHIQ_SERVICE_BASE_T base;
@@ -381,7 +381,7 @@ typedef struct vchiq_shared_state_struct
 	/* Debugging state */
 	int debug[DEBUG_MAX];
 
-} __packed VCHIQ_SHARED_STATE_T;
+} VCHIQ_SHARED_STATE_T;
 
 typedef struct vchiq_slot_zero_struct {
 	int magic;
@@ -395,7 +395,7 @@ typedef struct vchiq_slot_zero_struct {
 	VCHIQ_SHARED_STATE_T master;
 	VCHIQ_SHARED_STATE_T slave;
 	VCHIQ_SLOT_INFO_T slots[VCHIQ_MAX_SLOTS];
-} __packed VCHIQ_SLOT_ZERO_T;
+} VCHIQ_SLOT_ZERO_T;
 
 struct vchiq_state_struct {
 	int id;