Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-20 Thread Micah Kornfield
Created https://jira.apache.org/jira/browse/ARROW-5380 to track turning
fixing and turning on unaligned access warnings in UBSan
https://jira.apache.org/jira/browse/ARROW-5365 tracks turning on ASAN and
UBSAN in CI.

Thanks,
Micah



On Fri, May 17, 2019 at 1:48 PM Antoine Pitrou  wrote:

>
> Le 17/05/2019 à 21:22, Micah Kornfield a écrit :
> > I recently ran UBSan over parquet and arrow code bases and there are
> quite
> > a few unaligned pointer warnings (we do reinterpret casts on integer
> types
> > without checking alignment).  Most of them are in Arrow itself, which
> > parquet calls into.
> >
> > Is this something the community would like to fix?
> >
> > I imagine adding a helper method something like:
> >
> > template
> > T LoadUnaligned(T* pointer_to_t) {
> >   T aligned;
> >   memcpy(, t, sizeof(T));
> >   return aligned;
> > }
> >
> > I believe clang/GCC/MSVC should be good enough to recognize that an
> > unaligned load can replace the memcpy and inline it.  But hopefully
> archery
> > will be able to catch any performance regressions if this isn't the case.
>
> +1 from me.  We may need a similar helper for unaligned stores.
>
> By the way, running ASan and UBSan is something that we should ideally
> do in CI, especially now that our Valgrind runs are disabled.
>
> Regards
>
> Antoine.
>


Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Antoine Pitrou


Le 17/05/2019 à 21:22, Micah Kornfield a écrit :
> I recently ran UBSan over parquet and arrow code bases and there are quite
> a few unaligned pointer warnings (we do reinterpret casts on integer types
> without checking alignment).  Most of them are in Arrow itself, which
> parquet calls into.
> 
> Is this something the community would like to fix?
> 
> I imagine adding a helper method something like:
> 
> template
> T LoadUnaligned(T* pointer_to_t) {
>   T aligned;
>   memcpy(, t, sizeof(T));
>   return aligned;
> }
> 
> I believe clang/GCC/MSVC should be good enough to recognize that an
> unaligned load can replace the memcpy and inline it.  But hopefully archery
> will be able to catch any performance regressions if this isn't the case.

+1 from me.  We may need a similar helper for unaligned stores.

By the way, running ASan and UBSan is something that we should ideally
do in CI, especially now that our Valgrind runs are disabled.

Regards

Antoine.


Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Tim Armstrong
I don't know the Arrow and parquet-cpp codebases but this is exactly what
we did in Impala to solve similar issues and we haven't had any performance
problems with it - it should get compiled to a single load/store on x86-64.

On Fri, May 17, 2019 at 12:22 PM Micah Kornfield 
wrote:

> I recently ran UBSan over parquet and arrow code bases and there are quite
> a few unaligned pointer warnings (we do reinterpret casts on integer types
> without checking alignment).  Most of them are in Arrow itself, which
> parquet calls into.
>
> Is this something the community would like to fix?
>
> I imagine adding a helper method something like:
>
> template
> T LoadUnaligned(T* pointer_to_t) {
>   T aligned;
>   memcpy(, t, sizeof(T));
>   return aligned;
> }
>
> I believe clang/GCC/MSVC should be good enough to recognize that an
> unaligned load can replace the memcpy and inline it.  But hopefully archery
> will be able to catch any performance regressions if this isn't the case.
>
> Thoughts?
>
> Thanks,
> Micah
>


[DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Micah Kornfield
I recently ran UBSan over parquet and arrow code bases and there are quite
a few unaligned pointer warnings (we do reinterpret casts on integer types
without checking alignment).  Most of them are in Arrow itself, which
parquet calls into.

Is this something the community would like to fix?

I imagine adding a helper method something like:

template
T LoadUnaligned(T* pointer_to_t) {
  T aligned;
  memcpy(, t, sizeof(T));
  return aligned;
}

I believe clang/GCC/MSVC should be good enough to recognize that an
unaligned load can replace the memcpy and inline it.  But hopefully archery
will be able to catch any performance regressions if this isn't the case.

Thoughts?

Thanks,
Micah