cccs-jc commented on PR #8980:
URL: https://github.com/apache/iceberg/pull/8980#issuecomment-1853964957

   I did some more digging. On our production tables I searched for all manifests 
that have both `existing_data_files_count > 0` and `added_data_files_count > 0`, 
and I found none. This leads me to believe that a commit is either an 
append, whose manifests have `added_data_files_count > 0`, **or** a rewrite, whose 
manifests have `existing_data_files_count > 0`.
   
   This query returns no results:
   ```sql
   select
         distinct added_snapshot_id
     from
         catalog1.schema1.table1.manifests
     where
         existing_data_files_count > 0
         and added_data_files_count > 0
   ```            
   
   I can also search for the manifests that have `existing_data_files_count > 0` and 
join those results to the snapshots table via `added_snapshot_id`:
   
   ```sql
   select
       *
   from
       catalog1.schema1.table1.snapshots
   where
       snapshot_id in (
           select
               distinct added_snapshot_id
           from
               catalog1.schema1.table1.manifests
           where
               existing_data_files_count > 0
       )
   ```
   
   
   
   Manifests with the `snapshot_id` they belong to:
   
![image](https://github.com/apache/iceberg/assets/56140112/9cd9e32e-b980-4ded-be67-13748729da93)
   
   Their corresponding snapshots are all rewrite snapshots:
   
![image](https://github.com/apache/iceberg/assets/56140112/5fa5ecb3-d820-4f8d-812a-3569531b2e91)
   
   
   When streaming, we skip over rewrite snapshots. Thus we will never encounter 
a manifest with `existing_data_files_count > 0`.
   
   So calling `+ existingFilesCount();` in the code does nothing.
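
To make the argument concrete, here is a toy model in Python. It is not Iceberg code: the `Manifest` and `Snapshot` classes, the `files_seen_by_stream` helper, and the sample counts are all made up for illustration. I am also assuming that rewrite snapshots carry the operation name `replace`. The only point is that if rewrites are skipped and only rewrite manifests carry existing files, adding the existing count cannot change the total.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    # Hypothetical stand-in for one row of the manifests metadata table.
    added_data_files_count: int
    existing_data_files_count: int


@dataclass
class Snapshot:
    # Hypothetical stand-in for one row of the snapshots metadata table.
    snapshot_id: int
    operation: str  # "append", or "replace" for a rewrite (assumed name)
    manifests: list = field(default_factory=list)  # manifests added by this snapshot


def files_seen_by_stream(snapshots, include_existing):
    """Count data files a streaming read would see, skipping rewrite snapshots."""
    total = 0
    for snap in snapshots:
        if snap.operation == "replace":
            continue  # streaming skips over rewrite snapshots
        for manifest in snap.manifests:
            total += manifest.added_data_files_count
            if include_existing:
                total += manifest.existing_data_files_count
    return total


# Sample data shaped like the observation above: a manifest has either
# added files (append) or existing files (rewrite), never both.
snapshots = [
    Snapshot(1, "append", [Manifest(3, 0)]),
    Snapshot(2, "append", [Manifest(2, 0)]),
    Snapshot(3, "replace", [Manifest(0, 5)]),
    Snapshot(4, "append", [Manifest(4, 0)]),
]

with_existing = files_seen_by_stream(snapshots, include_existing=True)
without_existing = files_seen_by_stream(snapshots, include_existing=False)
print(with_existing, without_existing)
```

Both totals come out the same on this sample, which is what the production queries above suggest. The conclusion only holds as long as the "added or existing, never both" observation holds for every table.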


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

