wypoon commented on PR #6026:
URL: https://github.com/apache/iceberg/pull/6026#issuecomment-1287558326

   > I'm a bit confused of this behavior: `ReadConf.startRowPositions` is valid 
only if `_pos` column exists in the `expectedSchema` due to #1716. Are there 
use cases that `_pos` is absent and we still need `ReadConf.startRowPositions`? 
By looking at the class `VectorizedParquetReader` and `ParquetReader` who are 
consuming `ReadConf.startRowPositions`, it seems likely the schema doesn't have 
`_pos`.
   
   I too was surprised by the behavior. In my example, **before my fix**, when 
the query
   ```
   select count(*) from default.test_iceberg where e is null
   ```
   is run after the update, the `Schema` passed to 
`ReadConf#generateOffsetToStartPos(Schema)` is
   ```
   {
     5: e: optional double
   }
   ```
   so it did not have `_pos`.
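
To make the behavior concrete, here is a rough sketch of what I understand to be going on. It is not the Iceberg code: `StartRowPositionsSketch`, `RowGroup`, and both method names are hypothetical stand-ins, and the projected schema is reduced to a set of column names. It assumes that the start row position of a row group is the number of rows in the row groups before it, and that (before my fix) nothing is computed when `_pos` is not projected.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-ins; these are NOT Iceberg's classes.
final class StartRowPositionsSketch {
  // Name of the row-position metadata column discussed in this thread.
  static final String ROW_POSITION_COLUMN = "_pos";

  // Minimal description of a Parquet row group: its byte offset in the
  // file and the number of rows it holds.
  static final class RowGroup {
    final long startOffset;
    final long rowCount;

    RowGroup(long startOffset, long rowCount) {
      this.startOffset = startOffset;
      this.rowCount = rowCount;
    }
  }

  // Assumed pre-fix behavior: positions are only computed when the
  // projected columns include _pos; otherwise the caller gets null.
  static Map<Long, Long> offsetToStartPosIfProjected(
      Set<String> projectedColumns, List<RowGroup> rowGroups) {
    if (!projectedColumns.contains(ROW_POSITION_COLUMN)) {
      return null;
    }
    return offsetToStartPos(rowGroups);
  }

  // Maps each row group's byte offset to the index of its first row,
  // i.e. the running total of rows in all preceding row groups.
  static Map<Long, Long> offsetToStartPos(List<RowGroup> rowGroups) {
    Map<Long, Long> result = new LinkedHashMap<>();
    long rowsSoFar = 0;
    for (RowGroup group : rowGroups) {
      result.put(group.startOffset, rowsSoFar);
      rowsSoFar += group.rowCount;
    }
    return result;
  }
}
```

With a projection of only `e`, as in the query above, `offsetToStartPosIfProjected` returns `null` in this sketch, so a reader that relies on start row positions has nothing to work with.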
   
   @flyrain are you asking whether there are **other** cases where `_pos` will 
still be absent **after** this fix, yet we still need 
`ReadConf#startRowPositions()` to return valid start row positions?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

