Hi Hatem,

The convention is intentionally application-dependent.  From
Arrow's point of view, the binary string is an opaque blob of data.
Depending on your application, it might be a UTF-16-encoded piece of
text, a JPEG image, or anything else.
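
To illustrate why the interpretation has to come from the application,
here is a minimal sketch in plain Python (no Arrow API involved; the
sample string and variable names are made up for the example).  The
same bytes decode to different text, or fail to decode at all,
depending on the encoding the consumer assumes:

```python
# Bytes as an application might read them out of a binary column.
# Assumption for this sketch: the producer encoded text as UTF-16 (little-endian).
raw = "héllo".encode("utf-16-le")

# Decoding with the encoding the producer actually used recovers the text.
as_utf16 = raw.decode("utf-16-le")

# Decoding under a different assumption does not raise an error,
# but it yields different text (every other character is NUL).
as_latin1 = raw.decode("latin-1")

# A strict UTF-8 decode rejects these bytes outright.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
```

Nothing in the bytes themselves says which of these readings is right,
so the encoding has to be agreed on outside the binary column.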

By the way, if you store ASCII text data, I would recommend using the
utf8 type, since the UTF-8 encoding is a superset of ASCII.
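
The superset relationship can be checked with a short sketch in plain
Python (again not Arrow API; `is_ascii` is a hypothetical helper
written for this example).  ASCII text produces the same bytes under
both encodings, so it is already valid UTF-8:

```python
text = "plain ASCII text"

# ASCII and UTF-8 produce byte-for-byte identical output for ASCII text.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# So ASCII-encoded bytes can always be read back as UTF-8.
decoded = ascii_bytes.decode("utf-8")


def is_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (hypothetical helper)."""
    return all(b < 0x80 for b in data)
```

The reverse does not hold: UTF-8 text containing non-ASCII characters,
such as `"é".encode("utf-8")`, uses bytes of 0x80 and above and is not
valid ASCII.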

Regards

Antoine.


On 06/02/2019 at 11:34, Hatem Helal wrote:
> Hi all,
> 
> I wanted to make sure I understood the distinction/use cases for choosing 
> between the utf8 and binary logical types.
> 
> Based on this doc 
> <https://arrow.apache.org/docs/format/Metadata.html#utf8-and-binary>
> 
> * Utf8 data is Unicode values with UTF-8 encoding
> * Binary is any other variable length bytes
> 
> I wonder what is the correct way to consume a binary array.  It seems like a 
> binary array is likely representing some string data but without the encoding 
> it isn't clear how to safely interpret it.  Is there a convention (e.g. 
> assume a binary type is ASCII encoded) that we can follow?
> 
> Many thanks,
> 
> Hatem
> 
