mapleFU commented on issue #35697:
URL: https://github.com/apache/arrow/issues/35697#issuecomment-1557051817

   ```c++
   TEST(ArrowReadWrite, NestedNonFixedSizeList3) {
     using ::arrow::field;
     using ::arrow::list;
     using ::arrow::struct_;
   
     auto type = list(list(::arrow::int16()));
   
     const char* json = R"([
         [[1, 2], [3, 4]],
         null,
         [[5, 6], null],
         [null, [7, 8]]])";
     auto array = ::arrow::ArrayFromJSON(type, json);
     auto table = ::arrow::Table::Make(::arrow::schema({field("root", type)}), {array});
     auto props_store_schema = ArrowWriterProperties::Builder().store_schema()->build();
     CheckSimpleRoundtrip(table, 2, props_store_schema);
   }
   ```
   
   By the way, this case passes the test. I went through the code, and I think I've found the reason. The test expects Arrow to write batches of size 2.
   
   Batch 1:
   
   ```
         [[1, 2], [3, 4]],
         null
   ```
   
   Batch 2:
   
   ```
         [[5, 6], null],
         [null, [7, 8]]
   ```
   
   Now, for `List` (not a fixed-size list), the underlying data (in the array) is:
   
   ```
   1 2 3 4 5 6 7 8
   ```
   
   So, when `WritePath` in `src/parquet/arrow/path_internal.cc` is called, the underlying data is contiguous, and `RecordPostListVisit` merges the visited ranges into one:
   
   ```
     // Incorporates |range| into visited elements. If the |range| is contiguous
     // with the last range, extend the last range, otherwise add |range| separately
     // to the list.
     void RecordPostListVisit(const ElementRange& range) {
       if (!visited_elements.empty() && range.start == visited_elements.back().end) {
         visited_elements.back().end = range.end;
         return;
       }
       visited_elements.push_back(range);
     }
   ```
   
   However, for `FixedSizeList`, the underlying data is:
   
   ```
   1 2 ? ? ? ? 3 4 5 6 ? ? ? ? 7 8
   ```
   
   So the underlying data is **not** contiguous, and `WritePath` will trigger:
   
   ```
                 size_t visited_component_size = result.post_list_visited_elements.size();
                 DCHECK_GT(visited_component_size, 0);
                 if (visited_component_size != 1) {
                   return Status::NotImplemented(
                       "Lists with non-zero length null components are not supported");
                 }
   ```
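
To make the difference concrete, here is a minimal standalone sketch of the merging rule. It is not the Parquet code itself: `ElementRange` is a simplified stand-in, `CountVisitedRanges` is a hypothetical helper, and the offsets are read off the two layouts shown above for batch 2 (they are illustrative, not captured from a real run).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for the ElementRange used in path_internal.cc.
struct ElementRange {
  int64_t start;
  int64_t end;  // exclusive
};

// Same rule as the RecordPostListVisit quoted above: extend the last range
// when the new one starts exactly where it ends, otherwise append it.
void RecordPostListVisit(std::vector<ElementRange>& visited_elements,
                         const ElementRange& range) {
  if (!visited_elements.empty() && range.start == visited_elements.back().end) {
    visited_elements.back().end = range.end;
    return;
  }
  visited_elements.push_back(range);
}

// Hypothetical helper: feed the ranges through the rule and report how many
// separate ranges remain. WritePath only accepts exactly one.
std::size_t CountVisitedRanges(const std::vector<ElementRange>& ranges) {
  std::vector<ElementRange> visited_elements;
  for (const ElementRange& range : ranges) {
    RecordPostListVisit(visited_elements, range);
  }
  return visited_elements.size();
}

// Batch 2 with List: "5 6" and "7 8" are adjacent in "1 2 3 4 5 6 7 8",
// i.e. assumed offsets [4, 6) and [6, 8).
inline std::size_t ListBatch2() { return CountVisitedRanges({{4, 6}, {6, 8}}); }

// Batch 2 with FixedSizeList: in "1 2 ? ? ? ? 3 4 5 6 ? ? ? ? 7 8" the null
// slots still occupy space, i.e. assumed offsets [8, 10) and [14, 16).
inline std::size_t FixedSizeListBatch2() {
  return CountVisitedRanges({{8, 10}, {14, 16}});
}
```

Under these assumed offsets, `ListBatch2()` yields a single merged range, while `FixedSizeListBatch2()` yields two, which is the `visited_component_size != 1` case that returns `NotImplemented`.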


