Yes, agreed. I think we should develop code to support 64-byte alignment so that
we can use AVX-512.
https://en.wikipedia.org/wiki/AVX-512   
https://gcc.gnu.org/wiki/cauldron2014?action=AttachFile&do=get&target=Cauldron14_AVX-512_Vector_ISA_Kirill_Yukhin_20140711.pdf
 

-----Original Message-----
From: Wes McKinney [mailto:w...@cloudera.com] 
Sent: Friday, April 08, 2016 8:40 AM
To: dev@arrow.apache.org
Subject: Re: Some questions/proposals for the spec (Layout.md)

On the SIMD question, it seems AVX is going to 512 bits, so one could
even argue for 64-byte alignment as a matter of future-proofing.  AVX2
/ 256-bit seems fairly widely available nowadays, but it would be
great if Todd or any of the hardware folks (e.g. from Intel) on the
list could weigh in with guidance.

https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
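
To make the 64-byte alignment idea concrete, here is a minimal sketch of the arithmetic involved. The names kAlignment, PaddedSize, IsAligned and CheckBuffer are hypothetical and do not come from the Arrow code base; the sketch assumes only that the alignment is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed alignment: 64 bytes, the width of one 512-bit AVX-512 register.
constexpr std::size_t kAlignment = 64;
static_assert((kAlignment & (kAlignment - 1)) == 0,
              "alignment must be a power of two");

// Round nbytes up to the next multiple of kAlignment.
constexpr std::size_t PaddedSize(std::size_t nbytes) {
  return (nbytes + kAlignment - 1) & ~(kAlignment - 1);
}

// True if the address p is a multiple of kAlignment.
inline bool IsAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// Debug-only check that a buffer satisfies both the address and the
// length requirement before it is handed to vectorized code.
inline void CheckBuffer(const void* data, std::size_t length) {
  assert(IsAligned(data));
  assert(length % kAlignment == 0);
}
```

A buffer declared as `alignas(64) unsigned char buf[128];` satisfies both checks, while a 65-byte payload would need PaddedSize(65) == 128 bytes of storage.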

On Fri, Apr 8, 2016 at 8:33 AM, Wes McKinney <w...@cloudera.com> wrote:
> On Fri, Apr 8, 2016 at 8:07 AM, Jacques Nadeau <jacq...@apache.org> wrote:
>>>
>>>
>>> > I believe this choice was primarily about simplifying the code (similar
>>> > to why we have n+1 offsets instead of just n in the list/varchar
>>> > representations, even though the offset at index 0 is always 0). In both
>>> > situations, you don't have to worry about writing special code (and a
>>> > condition) for the boundary condition inside tight loops (e.g. the last
>>> > few bytes need to be handled differently since they aren't word width).
>>>
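
As an illustration of the n+1 offsets point, here is a small sketch of a varchar-style column. StringColumn, Build and ValueAt are hypothetical names for this example only, not the actual Arrow implementation. Because the offsets array has one more entry than there are values, every value, including the last one, is read with the same two lookups and no boundary branch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical column layout: all bytes concatenated in `data`, plus
// n+1 offsets so that value i occupies [offsets[i], offsets[i+1]).
struct StringColumn {
  std::vector<std::int32_t> offsets;
  std::string data;
};

inline StringColumn Build(const std::vector<std::string>& values) {
  StringColumn col;
  col.offsets.push_back(0);  // the offset at index 0 is always 0
  for (const auto& v : values) {
    col.data += v;
    col.offsets.push_back(static_cast<std::int32_t>(col.data.size()));
  }
  return col;
}

// Same code path for every i; the extra trailing offset removes the
// "is this the last element?" special case.
inline std::string ValueAt(const StringColumn& col, std::size_t i) {
  assert(i + 1 < col.offsets.size());
  const std::int32_t start = col.offsets[i];
  const std::int32_t length = col.offsets[i + 1] - start;
  return col.data.substr(static_cast<std::size_t>(start),
                         static_cast<std::size_t>(length));
}
```

For the values "ab", "" and "cde" the offsets are 0, 2, 2, 5, and the empty string in the middle needs no special handling either.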
>>> Sounds reasonable.  It might be worth illustrating this with a
>>> concrete example.  One scenario that this scheme seems useful for is
>>> creating a new bitmap based on evaluating a predicate (e.g. all
>>> elements >X).  In this case would it make sense to make it a multiple
>>> of 16, so we can consistently use SIMD instructions for the logical
>>> "and" operation?
>>>
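
The predicate scenario above could look roughly like the following sketch. It is a scalar stand-in, not SIMD code: it processes 64-bit words and leaves vectorization to the compiler. The names Bitmap, PaddedWords, GreaterThan and And are hypothetical, and padding to 64 bytes (8 words) is an assumption; padding to 16 bytes would work the same way. The point is that the "and" loop runs over whole words only and never needs a tail case, because the padding bits are always zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitmap stored as 64-bit words, length padded to a multiple of 8 words
// (64 bytes, an assumed padding size).
using Bitmap = std::vector<std::uint64_t>;

inline std::size_t PaddedWords(std::size_t nbits) {
  const std::size_t words = (nbits + 63) / 64;
  return (words + 7) / 8 * 8;
}

// Bit i is set iff values[i] > threshold; padding bits stay zero.
inline Bitmap GreaterThan(const std::vector<int>& values, int threshold) {
  Bitmap out(PaddedWords(values.size()), 0);
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > threshold) {
      out[i / 64] |= std::uint64_t{1} << (i % 64);
    }
  }
  return out;
}

// Logical "and" over whole words only: no special code for the last
// few bits, since both inputs are padded to the same length.
inline Bitmap And(const Bitmap& a, const Bitmap& b) {
  assert(a.size() == b.size());
  Bitmap out(a.size());
  for (std::size_t w = 0; w < a.size(); ++w) {
    out[w] = a[w] & b[w];
  }
  return out;
}
```

For example, And(GreaterThan(v, 2), GreaterThan(v, 4)) selects the elements of v that satisfy both predicates.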
>>
>> Hmm... interesting thought. I'd have to look but I also recall some of the
>> newer stuff supporting even wider widths. What do others think?
>>
>>
>>> I think the spec is slightly inconsistent.  It says there are 6 bytes
>>> of overhead per entry but then follows: "with the smallest byte width
>>> capable of representing the number of types in the union."  I'm
>>> perfectly happy to say it is always 1, always 2, or always capped at
>>> 2.  I agree 32K/64K+ types is a very unlikely scenario.  We just need
>>> to clear up the ambiguity.
>>>
>>
>> Agreed. Do you want to propose an approach & patch to clarify?
>
> I can also take responsibility for the ambiguity here. My preference
> is to use int16_t for the types array (memory suitably aligned), but
> as 1 byte will be sufficient nearly all of the time, it's a slight
> trade-off in memory use vs. code complexity, e.g.
>
> if (children_.size() < 128) {
>   // types is only 1 byte
> } else {
>   // types is 2 bytes
> }
>
> Realistically there won't be that many affected code paths, so I'm
> comfortable with either choice (2-bytes always, or 1 or 2 bytes
> depending on the size of the union).
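
To put numbers on the trade-off, here is a small sketch of the variable-width option. TypeIdWidth and TypesArrayBytes are hypothetical helpers, not part of the spec or the code base; the threshold of 128 is taken from the snippet above, and the upper bound assumes signed 16-bit type ids.

```cpp
#include <cassert>
#include <cstddef>

// Width in bytes of one entry in the union's types array under the
// "1 or 2 bytes depending on the size of the union" option.
inline std::size_t TypeIdWidth(std::size_t num_children) {
  assert(num_children <= 32768);  // ids must fit in a signed 16-bit value
  return num_children < 128 ? 1 : 2;
}

// Total bytes used by the types array for `length` union entries.
inline std::size_t TypesArrayBytes(std::size_t length,
                                   std::size_t num_children) {
  return length * TypeIdWidth(num_children);
}
```

With the always-2-bytes option the types array simply costs 2 * length bytes, so a union with only a few child types pays twice the memory in exchange for a single code path.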
