Hi Hatem,

The convention is intentionally application-dependent. From Arrow's point of view, a binary value is an opaque blob of data. Depending on your application, it might be a UTF-16-encoded piece of text, a JPEG image, or anything else.

By the way, if you store ASCII text data, I would recommend using the utf8 type, since the UTF-8 encoding is a superset of ASCII.

Regards

Antoine.


On 06/02/2019 at 11:34, Hatem Helal wrote:
> Hi all,
>
> I wanted to make sure I understood the distinction/use cases for choosing
> between the utf8 and binary logical types.
>
> Based on this doc
> <https://arrow.apache.org/docs/format/Metadata.html#utf8-and-binary>
>
> * Utf8 data is Unicode values with UTF-8 encoding
> * Binary is any other variable length bytes
>
> I wonder what is the correct way to consume a binary array. It seems like a
> binary array is likely representing some string data but without the encoding
> it isn't clear how to safely interpret it. Is there a convention (e.g.
> assume a binary type is ASCII encoded) that we can follow?
>
> Many thanks,
>
> Hatem
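
To illustrate the reply's point that a binary value is just opaque bytes, here is a minimal sketch in plain Python. It does not use the Arrow API: ordinary `bytes` objects stand in for the values of a binary column, and the helper `decode_binary` is a hypothetical name introduced only for this example. The sketch assumes the application already knows which encoding the writer used.

```python
# Illustration only: plain Python bytes stand in for values of a binary column.
# The writer of this data chose UTF-16 (little-endian), but nothing in the
# bytes themselves records that choice.
raw = "héllo".encode("utf-16-le")


def decode_binary(value: bytes, encoding: str) -> str:
    """Interpret an opaque byte string with an encoding the application knows."""
    return value.decode(encoding)


# The same bytes yield different text depending on the assumed encoding.
as_utf16 = decode_binary(raw, "utf-16-le")  # the text the writer intended
as_latin1 = decode_binary(raw, "latin-1")   # no error, but not the intended text

# Assuming ASCII fails outright here, because of the non-ASCII character.
try:
    decode_binary(raw, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Pure ASCII text is already valid UTF-8, byte for byte, which is why
# ASCII data can be stored in a utf8 column without conversion.
ascii_bytes = "hello".encode("ascii")
same_bytes = ascii_bytes == "hello".encode("utf-8")
```

Because a wrong guess can either fail or silently produce garbage, the encoding has to be agreed on by the producer and consumer of the data rather than inferred from the bytes.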