neilechao commented on PR #45375:
URL: https://github.com/apache/arrow/pull/45375#issuecomment-2737552566

   Here are the differences I see so far between the 
[Geometry](https://github.com/apache/arrow/pull/45459) and Variant PRs:
   
   - Geometry has a reader_test that calls test_utils to create sample 
objects. For Variant, this doesn't make sense until the encoding / decoding 
piece is in.
   - Geometry carries a lot of metadata and statistics. For Variant, since 
we're starting with unshredded, there are no statistics yet.
   - Geometry has a stricter thrift definition, whereas an unshredded variant 
is a Group of two binaries, metadata and value. Those two fields can't be 
interpreted until encoding is done, so logic that uses the thrift defs 
doesn't make sense until encoding is in.
   
   Here is a very loose set of steps to get full Variant C++ support, which I'm 
sure is missing some/many pieces. Please fill in any missing steps and 
capabilities.
   1. Get a logical type skeleton merged: reading / writing binary with no 
interpretation of what it means.
     a. Ideally this PR does only step 1 and gets merged before we move on 
to the next steps. I'm not sure what remains for that.
   2. Parquet-java checks sample parquet files containing variants into 
parquet-testing.
   3. Add decoding in some variant_util class(es), using those 
parquet-testing files.
   4. Add encoding.
     a. This unblocks reading/writing tests.
   5. Move on to shredded variants. This is a very large work item that will 
definitely be split into multiple sub-items.
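
   To make step 1 concrete, here is a minimal sketch of what "reading / 
writing binary with no interpretation" could look like for the two-binary 
group. It uses only the C++ standard library. `UnshreddedVariant`, 
`WriteOpaque`, and `ReadOpaque` are hypothetical names for illustration, not 
Arrow or Parquet APIs, and the length-prefixed layout is an assumption made 
only so the example can round-trip. It is not how Parquet stores the columns.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an unshredded Variant: a group of two binary
// fields. Neither field is interpreted here; they are just bytes.
struct UnshreddedVariant {
  std::string metadata;  // opaque until decoding exists
  std::string value;     // opaque until decoding exists
};

// Append one field as a 4-byte length (host byte order) followed by the bytes.
// This framing is an illustrative assumption, not a Parquet format.
void AppendField(std::string* out, const std::string& bytes) {
  const uint32_t len = static_cast<uint32_t>(bytes.size());
  char prefix[sizeof(len)];
  std::memcpy(prefix, &len, sizeof(len));
  out->append(prefix, sizeof(len));
  out->append(bytes);
}

// Read one length-prefixed field starting at *pos and advance *pos past it.
std::string ReadField(const std::string& in, std::size_t* pos) {
  uint32_t len = 0;
  if (in.size() - *pos < sizeof(len)) {
    throw std::runtime_error("truncated length prefix");
  }
  std::memcpy(&len, in.data() + *pos, sizeof(len));
  *pos += sizeof(len);
  if (in.size() - *pos < len) {
    throw std::runtime_error("truncated field");
  }
  std::string bytes = in.substr(*pos, len);
  *pos += len;
  return bytes;
}

// "Write": pass both binaries through without looking inside them.
std::string WriteOpaque(const UnshreddedVariant& v) {
  std::string out;
  AppendField(&out, v.metadata);
  AppendField(&out, v.value);
  return out;
}

// "Read": recover the same two binaries, still uninterpreted.
UnshreddedVariant ReadOpaque(const std::string& in) {
  std::size_t pos = 0;
  UnshreddedVariant v;
  v.metadata = ReadField(in, &pos);
  v.value = ReadField(in, &pos);
  return v;
}
```

   The only property checked at this stage is that the bytes that go in are 
the bytes that come out. Anything that needs to know what `metadata` or 
`value` mean would wait for steps 3 and 4.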
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
