That's not the point – if you already have memory and have to fill it, then
the kernel is in no position to zero it lazily, so the alignment of
arbitrary arrays is irrelevant. The point SGJ was making is that we want to
allocate the memory using something calloc-like so that the kernel can do
lazy zeroing for us, but we also need that memory to be 16-byte aligned, and
there is no portable way to get 16-byte-aligned memory that the kernel will
lazily zero for you. We can have lazy zeroing or 16-byte alignment but not
both. This makes me wonder if we couldn't just allocate 15 bytes more than
necessary and return the first address that is on a 16-byte boundary.
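
To make the over-allocation idea concrete, here is a rough sketch in C. The
names aligned_block and calloc_aligned are made up for illustration; this is
not what Julia does. It assumes the alignment is a power of two, and whether
the kernel actually zeroes the pages lazily depends on the calloc
implementation and the allocation size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper type: keeps both pointers, because free() must be
 * given the pointer calloc returned, not the aligned one. */
typedef struct {
    void *base;    /* pointer returned by calloc; pass this to free() */
    void *aligned; /* first address in the block on an `align` boundary */
} aligned_block;

/* Hypothetical helper: zero-initialized memory whose usable region starts
 * on an `align`-byte boundary. `align` must be a power of two.
 * On failure both fields are NULL. */
static aligned_block calloc_aligned(size_t nbytes, size_t align)
{
    aligned_block b = { NULL, NULL };

    assert(align != 0 && (align & (align - 1)) == 0);
    if (nbytes > SIZE_MAX - (align - 1))
        return b; /* padded size would overflow */

    /* Over-allocate by align - 1 bytes (15 for 16-byte alignment). */
    b.base = calloc(1, nbytes + align - 1);
    if (b.base != NULL) {
        uintptr_t addr = (uintptr_t)b.base;
        /* Bytes to skip to reach the next boundary (0 if already there). */
        uintptr_t offset = (align - (addr & (align - 1))) & (align - 1);
        b.aligned = (char *)b.base + offset;
    }
    return b;
}
```

With calloc_aligned(n, 16), b.aligned is at most 15 bytes past b.base and the
n bytes starting there are zero. The cost is that the original pointer has
to be kept around somewhere so the block can be freed later.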

On Mon, Nov 24, 2014 at 11:02 PM, Viral Shah <vi...@mayin.org> wrote:

> To add to the point, you can also get non-aligned stuff with subarrays or
> results from a ccall.
>
> -viral
>
>
> On Tuesday, November 25, 2014 9:24:36 AM UTC+5:30, Simon Kornblith wrote:
>>
>> In general, arrays cannot be assumed to be 16-byte aligned because it's
>> always possible to create one that isn't using pointer_to_array.
>> However, from Intel's AVX introduction
>> <https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions>
>> :
>>
>> Intel® AVX has relaxed some memory alignment requirements, so now Intel
>> AVX by default allows unaligned access; however, this access may come at a
>> performance slowdown, so the old rule of designing your data to be memory
>> aligned is still good practice (16-byte aligned for 128-bit access and
>> 32-byte aligned for 256-bit access).
>>
>> On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
>>>
>>> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson
>>> <steve...@gmail.com> wrote:
>>> > Unfortunately, Julia allocates 16-byte aligned data by default (to
>>> > help SIMD code), and there is no calloc version of posix_memalign as
>>> > far as I know.
>>>
>>> The generated machine code I've seen does not make use of this. All
>>> the load/store instructions in vectorized or unrolled loops assume
>>> unaligned pointers. (Plus, with AVX one should align to 32 bytes
>>> instead.)
>>>
>>> -erik
>>>
>>> --
>>> Erik Schnetter <schn...@cct.lsu.edu>
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>
>>
