Re: [capnproto] relaxing alignment requirements in capnproto-rust: am I missing anything?

Ian Denhardt Sat, 11 Jan 2020 08:42:03 -0800

I'm generally supportive of this, but here is something worth
considering if the change doesn't land: as an alternative to the current
unsafe bytes_to_words, you could provide a version that returns a
Result, which is Err unless the argument is 8-byte aligned or the CPU
architecture is known to be able to handle unaligned access.
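
For concreteness, here is a minimal sketch of what such a checked variant
might look like. The names Word, BytesToWordsError, try_bytes_to_words
and words_to_bytes are made up for illustration and are not the crate's
actual API. The sketch implements only the alignment check. It leaves
out the "architecture is known to handle unaligned access" case, on the
assumption that the word type has 8-byte alignment: forming a &[Word]
from a misaligned pointer is undefined behavior in Rust regardless of
what the CPU tolerates.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Stand-in for an 8-byte, 8-byte-aligned word type (hypothetical,
/// simplified for this sketch).
#[repr(C, align(8))]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Word(pub [u8; 8]);

/// Hypothetical error type for the checked conversion.
#[derive(Debug, PartialEq)]
pub enum BytesToWordsError {
    /// The byte length is not a multiple of the word size.
    BadLength,
    /// The buffer does not start on an 8-byte boundary.
    Misaligned,
}

/// Safe, checked alternative to an unsafe bytes-to-words cast.
/// Returns Err instead of handing out a misaligned `&[Word]`.
pub fn try_bytes_to_words(bytes: &[u8]) -> Result<&[Word], BytesToWordsError> {
    if bytes.len() % size_of::<Word>() != 0 {
        return Err(BytesToWordsError::BadLength);
    }
    if (bytes.as_ptr() as usize) % align_of::<Word>() != 0 {
        return Err(BytesToWordsError::Misaligned);
    }
    // Sound: the pointer is aligned, the length is an exact number of
    // words, and every bit pattern is a valid `Word`.
    Ok(unsafe {
        slice::from_raw_parts(
            bytes.as_ptr() as *const Word,
            bytes.len() / size_of::<Word>(),
        )
    })
}

/// The reverse direction is always fine: `u8` has alignment 1 and
/// `Word` contains no padding.
pub fn words_to_bytes(words: &[Word]) -> &[u8] {
    unsafe {
        slice::from_raw_parts(
            words.as_ptr() as *const u8,
            words.len() * size_of::<Word>(),
        )
    }
}
```

A caller who gets Err(Misaligned) can then decide for themselves
whether to copy into an aligned buffer or give up, rather than having to
reason about their hardware up front.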


-Ian

Quoting David Renshaw (2020-01-11 11:11:54)
>    Thanks for the feedback!
>    I figured out how to get rustc to emit assembly for a variety of
>    targets. Results are in this blog post:
>    [1]https://dwrensha.github.io/capnproto-rust/2020/01/11/unaligned-memory-access.html
>    I don't think there's any case in which the extra copy will actually be
>    an out-of-line memcpy function call.
>    - David
>
>    On Fri, Jan 10, 2020 at 10:25 AM Kenton Varda
>    <[2]ken...@cloudflare.com> wrote:
>
>    First, make sure you add the -O2 compiler option in godbolt, so that
>    these are actually optimized. If you do that, `direct()` becomes two
>    instructions (on both architectures), while `indirect()` on ARM is
>    still 9 instructions.
>    It's true that on x86_64, this change will have no negative impact, as
>    you observed. But that's specifically because x86_64 supports unaligned
>    reads and writes, and so on this platform you don't actually need to
>    change anything to support unaligned buffers.
>    On ARM, your example is generating an out-of-line function call to
>    memcpy. I could be wrong, but I think this will be heavier than you are
>    imagining. There are three issues:
>    - The function call itself takes several instructions.
>    - An out-of-line function call will force the compiler to be more
>    conservative about optimizations around it. When a getter is inlined
>    into a larger function body, this could lead to a lot more overhead
>    than is visible in the godbolt example. For example, caller-saved
>    registers used by that outer function would need to be saved and
>    restored around each call.
>    - The glibc implementation of memcpy() itself needs to be designed to
>    handle any size of memcpy, and is optimized for larger, variable-sized
>    copies, since small fixed copies would normally be inlined. Several
>    branches will be needed even for a small copy.
>    Here's the code:
>    [3]https://github.com/lattera/glibc/blob/master/string/memcpy.c
>    And the macros it depends on:
>    [4]https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h
>    It's hard to say how much effect all this would really have, but it
>    would make me uncomfortable.
>    But it might not be too hard to convince the compiler to generate a
>    fixed sequence of byte copies, rather than a memcpy call. That could be
>    a lot better. I'm kind of surprised that GCC doesn't optimize it this
>    way automatically, TBH.
>    BTW it looks like arm64 gets optimized to an unaligned load just like
>    x86_64. So the future seems to be one where we don't need to worry
>    about alignment anymore. Maybe that's a good argument for going ahead
>    with this approach now.
>    -Kenton
>
>    On Thu, Jan 9, 2020 at 10:03 PM David Renshaw <[5]dwrens...@gmail.com>
>    wrote:
>
>    I want to make it easy and safe for users of capnproto-rust to read
>    messages from unaligned buffers without copying. (See [6]this github
>    issue.)
>    Currently, a user must pass their unaligned buffer through [7]unsafe
>    fn bytes_to_words(), asserting that they believe their hardware to be
>    okay with unaligned reads. In other words, we require that the user
>    understand some tricky low-level processor details, and that the user
>    preclude their software from running on many platforms.
>    (With libraries like sqlite, zmq, redis, and many others, there simply
>    is no way to request that a buffer be aligned -- you are just given an
>    array of bytes. You can copy the bytes into an aligned buffer, but that
>    has a performance cost and a complexity cost (who owns the new
>    buffer?).)
>    I believe that it would be better for capnproto-rust to work natively
>    on unaligned buffers. In fact, I have a work-in-progress branch that
>    achieves this, essentially by changing a bunch of direct memory
>    accesses into tiny memcpy() calls. This [8]c++ godbolt snippet captures
>    the main idea, and shows that, on x86_64 at least, the extra
>    indirection gets optimized away completely. Indeed, my performance
>    measurements so far support the hypothesis that there will be no
>    performance cost in the x86_64 case. For processors that don't support
>    unaligned access, the extra copy will still be there (e.g.
>    [9]https://godbolt.org/z/qgsGMT), but I hypothesize that it will be
>    fast.
>    All in all, this change seems to me like a big usability win. So I'm
>    wondering: have I missed anything in the above analysis? Are there good
>    reasons I shouldn't make the change?
>    - David
>
>      --
>      You received this message because you are subscribed to the Google
>      Groups "Cap'n Proto" group.
>      To unsubscribe from this group and stop receiving emails from it,
>      send an email to [10]capnproto+unsubscr...@googlegroups.com.
>      To view this discussion on the web visit
>      [11]https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4
>      cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com.
>
>    --
>    You received this message because you are subscribed to the Google
>    Groups "Cap'n Proto" group.
>    To unsubscribe from this group and stop receiving emails from it, send
>    an email to [12]capnproto+unsubscr...@googlegroups.com.
>    To view this discussion on the web visit
>    [13]https://groups.google.com/d/msgid/capnproto/CABR6rW8Xw5eveWtJGpv3_F
>    Ex_wKesHc0EDHEtdw-q0Fow%3DK6eA%40mail.gmail.com.
>
> References
>
>    1. https://dwrensha.github.io/capnproto-rust/2020/01/11/unaligned-memory-access.html
>    2. mailto:ken...@cloudflare.com
>    3. https://github.com/lattera/glibc/blob/master/string/memcpy.c
>    4. https://github.com/lattera/glibc/blob/master/sysdeps/generic/memcopy.h
>    5. mailto:dwrens...@gmail.com
>    6. https://github.com/capnproto/capnproto-rust/issues/101
>    7. https://github.com/capnproto/capnproto-rust/blob/d1988731887b2bbb0ccb35c68b9292d98f317a48/capnp/src/lib.rs#L82-L88
>    8. https://godbolt.org/z/Wki7uy
>    9. https://godbolt.org/z/qgsGMT
>   10. mailto:capnproto+unsubscr...@googlegroups.com
>   11. https://groups.google.com/d/msgid/capnproto/CABR6rW-JpiJntc0i7O4cVywzfvd2YnVp89BgYeJp_Gwzoc_Edg%40mail.gmail.com?utm_medium=email&utm_source=footer
>   12. mailto:capnproto+unsubscr...@googlegroups.com
>   13. https://groups.google.com/d/msgid/capnproto/CABR6rW8Xw5eveWtJGpv3_FEx_wKesHc0EDHEtdw-q0Fow%3DK6eA%40mail.gmail.com?utm_medium=email&utm_source=footer

