NHDaly opened a new issue #282:
URL: https://github.com/apache/arrow-julia/issues/282


   We have a data source (Relations from our database engine at RelationalAI) 
that has _columnar data_, but without column names. (We represent a Relation 
as a Set of Tuples, e.g. `movie_title` relates movie IDs to Titles, so the 
positions are meaningful but they do not have names.)
   
   We would like to encode this in Arrow as essentially a Vector of columns. In 
JSON, we would encode this as:
   ```json
   [
       [1001, 2232, 3582, 4030],
       ["The Matrix", "50 First Dates", "I Am Legend", "The Notebook"]
   ]
   ```
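
For illustration only, here is a minimal sketch of that transposition in plain Python (standard library only, not Arrow.jl). `relation_to_columns` is a hypothetical helper name, and sorting the rows is an assumption made here just to get a deterministic order, since a set has none:

```python
import json

# A relation modeled as a set of positional tuples (no column names).
movie_title = {
    (1001, "The Matrix"),
    (2232, "50 First Dates"),
    (3582, "I Am Legend"),
    (4030, "The Notebook"),
}


def relation_to_columns(relation):
    """Transpose a set of equal-arity tuples into a list of columns.

    Hypothetical helper: rows are sorted first so the output is
    deterministic, because a set has no inherent order.
    """
    rows = sorted(relation)
    if not rows:
        return []
    arity = len(rows[0])
    if any(len(row) != arity for row in rows):
        raise ValueError("all tuples in a relation must have the same arity")
    # zip(*rows) groups the i-th element of every row into column i.
    return [list(column) for column in zip(*rows)]


columns = relation_to_columns(movie_title)
encoded = json.dumps(columns)  # a JSON array of columns, addressed by position
```

Each column is identified only by its index in the outer list, which is the property we would like to preserve in the Arrow encoding.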
   
   From what I can tell, this _is_ supported by the Arrow spec, but isn't 
currently supported by the Arrow.jl package?
   
   
   This is the understanding my colleague and I have come to of the current 
situation:
   
   - Looking at the Arrow spec, each RecordBatch message, containing the actual 
data, is preceded by a Schema message, defining the logical schema of the 
former. The Schema contains an array of Field types that define the columns of 
the RecordBatch in proper order. The name property appears to be optional. That 
would mean we could serialize columns without a name. 
       - https://github.com/apache/arrow/blob/56d060ca197352f575edced64e6a1fbc9331b336/format/Schema.fbs#L463
   - The fields in the Schema message are flattened, see: 
https://arrow.apache.org/docs/format/Columnar.html#recordbatch-message
   - Arrow.jl does support writing unnamed columns, but only if we supply the 
data row-wise. On loading, the resulting Arrow schema then contains column 
names like `Symbol("1")` (which is a bit cumbersome to work with 
in Julia).
   
   
   
   Can we work to expose this ability through the Arrow.jl package as well, in 
the code to construct an Arrow stream from a column-wise data source?
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

