openinx commented on issue #4515:
URL: https://github.com/apache/iceberg/issues/4515#issuecomment-1091032661

   I ran a debug session to inspect those files, and they are laid out as 
follows: 
   
   ```bash
   ➜  default find .
   .
   ./default
   ./a.txt
   ./db
   ./db/upsert_on_pk_at_schema_end
   ./db/upsert_on_pk_at_schema_end/data
   ./db/upsert_on_pk_at_schema_end/data/data=aaa
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet.crc
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet.crc
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet
   ./db/upsert_on_pk_at_schema_end/data/data=aaa/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet.crc
   ./db/upsert_on_pk_at_schema_end/data/data=bbb
   ./db/upsert_on_pk_at_schema_end/data/data=bbb/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00003.parquet
   ./db/upsert_on_pk_at_schema_end/data/data=bbb/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00004.parquet.crc
   ./db/upsert_on_pk_at_schema_end/data/data=bbb/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00004.parquet
   ./db/upsert_on_pk_at_schema_end/data/data=bbb/.00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00003.parquet.crc
   ./db/upsert_on_pk_at_schema_end/metadata
   ./db/upsert_on_pk_at_schema_end/metadata/version-hint.text
   ./db/upsert_on_pk_at_schema_end/metadata/.version-hint.text.crc
   ./db/upsert_on_pk_at_schema_end/metadata/e21484e0-cea6-4001-b4af-3e34c9249a88-m0.avro
   ./db/upsert_on_pk_at_schema_end/metadata/snap-3867050424845709517-1-e21484e0-cea6-4001-b4af-3e34c9249a88.avro
   ./db/upsert_on_pk_at_schema_end/metadata/e21484e0-cea6-4001-b4af-3e34c9249a88-m1.avro
   ./db/upsert_on_pk_at_schema_end/metadata/.v2.metadata.json.crc
   ./db/upsert_on_pk_at_schema_end/metadata/v2.metadata.json
   ./db/upsert_on_pk_at_schema_end/metadata/.v1.metadata.json.crc
   ./db/upsert_on_pk_at_schema_end/metadata/.snap-3867050424845709517-1-e21484e0-cea6-4001-b4af-3e34c9249a88.avro.crc
   ./db/upsert_on_pk_at_schema_end/metadata/.e21484e0-cea6-4001-b4af-3e34c9249a88-m1.avro.crc
   ./db/upsert_on_pk_at_schema_end/metadata/.e21484e0-cea6-4001-b4af-3e34c9249a88-m0.avro.crc
   ./db/upsert_on_pk_at_schema_end/metadata/v1.metadata.json
   ```
   
   That means there was only one checkpoint, which committed all three 
records. In the partition `data=aaa`, the files contain: 
   
   ```bash
   # The equality delete file.
   ➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00002.parquet
   {"data": "aaa", "dt": 19052}
   
   # The insert data file.
   ➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet
   {"id": 2, "data": "aaa", "dt": 19052}
   {"id": 1, "data": "aaa", "dt": 19052}
   
   # The positional delete file.
   ➜  default parquet cat ./db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00005.parquet
   {"file_path": "file:/var/folders/fg/kbb4swcd0gl_3s0wlhhk9bch0000gp/T/junit3828094481211623109/default/db/upsert_on_pk_at_schema_end/data/data=aaa/00000-0-f9269f62-299b-4689-8b9f-ad59d7fb539d-00001.parquet", "pos": 0}
   ```
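
For reference, the `dt` values in these files are plain integers. Assuming `dt` is stored as a date counted in days since the Unix epoch (the usual Parquet/Iceberg date encoding), a quick sketch to decode it looks like this. The helper name `days_to_date` is made up for illustration:

```python
from datetime import date, timedelta


def days_to_date(days_since_epoch):
    """Convert a day count since 1970-01-01 into a calendar date.

    Assumption: `dt` uses the days-since-epoch date encoding.
    """
    return date(1970, 1, 1) + timedelta(days=days_since_epoch)


# 19052 is the `dt` value printed by `parquet cat` for every row above.
print(days_to_date(19052))  # 2022-03-01
```

So every row shown here carries the same date, `2022-03-01`.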
   
   The tricky part is that `record(2, 'aaa', '2022-03-01')` was written 
before `record(1, 'aaa', '2022-03-01')`, which is why we hit the failure 
case.
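
To make the effect of the write order concrete, here is a rough sketch that replays the positional delete against the rows of the insert data file. It is only a model of the files printed above, not Iceberg's reader code. It assumes `parquet cat` prints rows in file order (so `pos` 0 is the `id=2` row), and it ignores the equality delete file on the assumption that, as I understand the Iceberg spec, an equality delete does not apply to data files written in the same commit. The function name `apply_position_deletes` is hypothetical:

```python
def apply_position_deletes(rows, deleted_positions):
    """Return the rows whose 0-based position is not marked as deleted."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# Rows of the insert data file (...-00001.parquet), in the order printed above.
data_file = [
    {"id": 2, "data": "aaa", "dt": 19052},  # pos 0, written first
    {"id": 1, "data": "aaa", "dt": 19052},  # pos 1, written second
]

# The positional delete file (...-00005.parquet) holds a single entry: pos 0.
position_deletes = {0}

live_rows = apply_position_deletes(data_file, position_deletes)
print(live_rows)  # [{'id': 1, 'data': 'aaa', 'dt': 19052}]
```

Under these assumptions the surviving row is `id=1`, because the row written first (`id=2`) is the one the positional delete removes. If the two rows had been written in the opposite order, the same `pos: 0` entry would have removed `id=1` instead.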


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

