opensourcegeek commented on issue #4328:
URL: https://github.com/apache/arrow-rs/issues/4328#issuecomment-2100542181

   That makes sense - thanks @alamb 
   
   ```rust
   pub fn parquet_stats_to_arrow(
       arrow_datatype: &DataType, 
       statistics: impl IntoIterator<Item = Option<&Statistics>> 
   ) -> Result<ArrowStatistics> {
     todo!()
   }
   ```
   
   To implement the function above, I'm trying to work out the details. Here
   are my questions (probably very basic, apologies), using your `a = 5`
   example:
   
   - `arrow_datatype`: will this be `a`'s Arrow data type, e.g. `Int64` or the like?
   
   - `impl IntoIterator<Item = Option<&Statistics>>`: will this be the Parquet
     Statistics of all columns in the 'current' row group, so that I'd have to
     fish out the one for `a`? I'm not sure I've interpreted this correctly: to
     fish out the statistics for `a`, I'd need to know that `a` is the column
     I'm looking for. So I'm wondering whether it is already the Parquet
     Statistics for `a` only. If so, why is it `impl IntoIterator` rather than
     just `Option<&Statistics>`?
   
   - `Result<ArrowStatistics>`: once I get a handle on `a`'s Parquet
     [Statistics](https://docs.rs/parquet/latest/parquet/file/statistics/enum.Statistics.html),
     I think I'd need to convert each of the
     [ValueStatistics](https://docs.rs/parquet/latest/parquet/file/statistics/struct.ValueStatistics.html)
     to an [ArrayRef](https://docs.rs/arrow/latest/arrow/array/type.ArrayRef.html)
     based on `a`'s type? I couldn't find `row_count()` in `ValueStatistics`, though.
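
   To make the second and third questions concrete, here is a minimal sketch of
   one possible reading, which I may well have wrong: each item of the iterator
   is the statistics for `a` in one row group (`None` when a row group has
   none), and the result holds one entry per row group. The `Statistics` and
   `ArrowStatistics` types below are simplified stand-ins I made up so the
   snippet compiles on its own; they are not the real `parquet` / `arrow`
   types, and I dropped the `arrow_datatype` argument and the `Result` wrapper
   for brevity.

   ```rust
   // Hypothetical stand-in for parquet's `Statistics` enum (Int64 only),
   // so the sketch compiles without the parquet or arrow crates.
   #[derive(Debug, Clone, PartialEq)]
   pub enum Statistics {
       Int64 {
           min: Option<i64>,
           max: Option<i64>,
           null_count: Option<u64>,
       },
   }

   // Hypothetical stand-in for the proposed `ArrowStatistics`.
   // The real struct would presumably hold `ArrayRef`s instead of `Vec`s.
   #[derive(Debug, Default, PartialEq)]
   pub struct ArrowStatistics {
       pub min: Vec<Option<i64>>,
       pub max: Vec<Option<i64>>,
       pub null_count: Vec<Option<u64>>,
   }

   // Assumed reading: every item is the statistics of column `a` in one
   // row group; `None` means that row group has no statistics for `a`.
   pub fn parquet_stats_to_arrow<'a>(
       statistics: impl IntoIterator<Item = Option<&'a Statistics>>,
   ) -> ArrowStatistics {
       let mut out = ArrowStatistics::default();
       for stats in statistics {
           match stats {
               Some(Statistics::Int64 { min, max, null_count }) => {
                   out.min.push(*min);
                   out.max.push(*max);
                   out.null_count.push(*null_count);
               }
               // Keep the slot so that index i still refers to row group i.
               None => {
                   out.min.push(None);
                   out.max.push(None);
                   out.null_count.push(None);
               }
           }
       }
       out
   }
   ```

   Under that reading, passing statistics for three row groups where the
   second has none would give three-element min, max and null-count columns
   with a null in the middle slot, which would explain why the argument is an
   iterator rather than a single `Option<&Statistics>`.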
   
   Sorry, I'm just trying to get an understanding of all the moving parts.

