[ 
https://issues.apache.org/jira/browse/ARROW-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398143#comment-16398143
 ] 

Robert Nishihara edited comment on ARROW-2308 at 3/14/18 6:27 AM:
------------------------------------------------------------------

It's probably worth discussing what the best way to do this is, since it involves 
changing the format a little.

cc [~pcmoritz] [~wesmckinn]


> Serialized tensor data should be 64-byte aligned.
> -------------------------------------------------
>
>                 Key: ARROW-2308
>                 URL: https://issues.apache.org/jira/browse/ARROW-2308
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Robert Nishihara
>            Priority: Major
>
> See [https://github.com/ray-project/ray/issues/1658] for an example of this 
> issue. Non-aligned data can trigger a copy when it is fed into TensorFlow or 
> similar libraries.
> {code}
> import pyarrow as pa
> import numpy as np
> x = np.zeros(10)
> y = pa.deserialize(pa.serialize(x).to_buffer())
> x.ctypes.data % 64  # 0 (it starts out aligned)
> y.ctypes.data % 64  # 48 (it is no longer aligned)
> {code}
> It should be possible to fix this by calling something like 
> {{RETURN_NOT_OK(AlignStreamPosition(dst));}} before writing the array data. 
> Note that we already do this before writing the tensor header, but the tensor 
> header is not necessarily a multiple of 64 bytes, so the subsequent data can 
> be unaligned.
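
The padding arithmetic behind that suggestion can be sketched in Python. This is only an illustration under stated assumptions, not Arrow's implementation: `padding_needed`, `align_stream_position`, and `write_tensor` are hypothetical names, the 80-byte header is an arbitrary size chosen because it is not a multiple of 64, and an in-memory `io.BytesIO` stands in for the output stream.

```python
import io

ALIGNMENT = 64  # byte boundary discussed in the issue


def padding_needed(position, alignment=ALIGNMENT):
    """Number of pad bytes that move `position` up to the next multiple of `alignment`."""
    return (-position) % alignment


def align_stream_position(stream, alignment=ALIGNMENT):
    """Hypothetical stand-in for the C++ helper: write zero bytes until the position is aligned."""
    stream.write(b"\x00" * padding_needed(stream.tell(), alignment))


def write_tensor(stream, header, data):
    """Write header and data, aligning before each; return the offset where the data starts."""
    align_stream_position(stream)   # alignment before the header (already done today, per the issue)
    stream.write(header)
    align_stream_position(stream)   # the extra alignment step proposed in the issue
    data_offset = stream.tell()
    stream.write(data)
    return data_offset


buf = io.BytesIO()
# Assumed sizes: an 80-byte header, followed by 80 bytes of array data.
offset = write_tensor(buf, header=b"h" * 80, data=b"d" * 80)
print(offset % ALIGNMENT)  # 0
```

Without the second `align_stream_position` call, the data in this example would start at offset 80, which is 16 bytes past a 64-byte boundary. Note that this only aligns offsets within the stream; the deserialized array is aligned in memory only if the buffer holding the stream also starts at an aligned address.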


