NHDaly opened a new issue #282: URL: https://github.com/apache/arrow-julia/issues/282
We have a data source (Relations from our database engine at RelationalAI) that has _columnar data_, but without column names. (We represent a Relation as a Set of Tuples, e.g. `movie_title` relates movie IDs to Titles, so the positions are meaningful but they do not have names.)

We would like to encode this in Arrow as essentially a Vector of columns. In JSON, we would encode this as:

```json
[
  [1001, 2232, 3582, 4030],
  ["The Matrix", "50 First Dates", "I Am Legend", "The Notebook"]
]
```

From what I can tell, this _is_ supported by the Arrow spec, but isn't currently supported by the Arrow.jl package?

This is the understanding my colleague and I have come to of the current situation:

- Looking at the Arrow spec, each RecordBatch message, which contains the actual data, is preceded by a Schema message that defines the RecordBatch's logical schema. The Schema contains an array of Field types that define the columns of the RecordBatch in proper order. The name property appears to be optional. That would mean we could serialize columns without a name.
    - https://github.com/apache/arrow/blob/56d060ca197352f575edced64e6a1fbc9331b336/format/Schema.fbs#L463
    - The fields in the Schema message are flattened, see: https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message
- Arrow.jl does support writing unnamed columns, but only if we supply the data row-wise. The resulting Arrow schema upon loading then contains column names like `Symbol("1")`, which are a bit cumbersome to work with in Julia.

Can we work to expose this ability through the Arrow.jl package as well, in the code to construct an Arrow stream from a column-wise data source?

Thanks!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org