Hi, when I use DataFrame's save append function, I find that the sizes of the resulting Parquet part files are very different.
part-r-00001 to part-r-00021 were generated the first time the save append function was called. part-r-00022 to part-r-00042 were generated the second time it was called. As you can see in the listing, each of part-r-00001 to part-r-00021 is about 200M, while each of part-r-00022 to part-r-00042 is about 720M. But the source data is the same in both calls, which confuses me.

-rw-r--r-- 1 sysplatform sysplatform 2.0K Apr 8 10:01 _common_metadata
-rw-r--r-- 1 sysplatform sysplatform 392K Apr 8 10:01 _metadata
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00001.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:44 part-r-00002.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00003.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00004.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00005.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00006.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00007.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00008.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00009.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00010.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00011.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00012.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00013.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00014.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00015.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00016.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00017.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00018.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00019.parquet
-rw-r--r-- 1 sysplatform sysplatform 199M Apr 8 09:43 part-r-00020.parquet
-rw-r--r-- 1 sysplatform sysplatform 200M Apr 8 09:43 part-r-00021.parquet
-rw-r--r-- 1 sysplatform sysplatform 720M Apr 8 10:01 part-r-00022.parquet
-rw-r--r-- 1 sysplatform sysplatform 723M Apr 8 10:00 part-r-00023.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:01 part-r-00024.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:00 part-r-00025.parquet
-rw-r--r-- 1 sysplatform sysplatform 717M Apr 8 10:00 part-r-00026.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:00 part-r-00027.parquet
-rw-r--r-- 1 sysplatform sysplatform 720M Apr 8 10:01 part-r-00028.parquet
-rw-r--r-- 1 sysplatform sysplatform 725M Apr 8 10:01 part-r-00029.parquet
-rw-r--r-- 1 sysplatform sysplatform 720M Apr 8 10:00 part-r-00030.parquet
-rw-r--r-- 1 sysplatform sysplatform 725M Apr 8 10:01 part-r-00031.parquet
-rw-r--r-- 1 sysplatform sysplatform 724M Apr 8 10:01 part-r-00032.parquet
-rw-r--r-- 1 sysplatform sysplatform 724M Apr 8 10:00 part-r-00033.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:01 part-r-00034.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:01 part-r-00035.parquet
-rw-r--r-- 1 sysplatform sysplatform 720M Apr 8 10:00 part-r-00036.parquet
-rw-r--r-- 1 sysplatform sysplatform 717M Apr 8 10:00 part-r-00037.parquet
-rw-r--r-- 1 sysplatform sysplatform 724M Apr 8 10:01 part-r-00038.parquet
-rw-r--r-- 1 sysplatform sysplatform 722M Apr 8 10:01 part-r-00039.parquet
-rw-r--r-- 1 sysplatform sysplatform 722M Apr 8 10:00 part-r-00040.parquet
-rw-r--r-- 1 sysplatform sysplatform 721M Apr 8 10:01 part-r-00041.parquet
-rw-r--r-- 1 sysplatform sysplatform 723M Apr 8 10:01 part-r-00042.parquet
-rw-r--r-- 1 sysplatform sysplatform 0 Apr 8 10:01 _SUCCESS
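
To put a number on the difference, here is a minimal sketch that totals the part file sizes per append call from a listing like the one above. The function name `batch_totals` and its `first_batch_last_index` parameter are only illustrative, and it assumes every part file size is printed in whole megabytes with an `M` suffix, as in this listing.

```python
import re

# Matches the size (in MB) and the part index in a listing line such as
# "-rw-r--r-- 1 user group 199M Apr 8 09:43 part-r-00001.parquet".
PART_LINE = re.compile(r"\s(\d+)M\s.*part-r-(\d+)\.parquet\s*$")

def batch_totals(listing, first_batch_last_index=21):
    """Sum part file sizes (MB) for the first and second append calls.

    Assumption: parts numbered up to first_batch_last_index belong to the
    first call, and all later parts belong to the second call.
    """
    totals = {"first": 0, "second": 0}
    for line in listing.splitlines():
        match = PART_LINE.search(line)
        if match is None:
            continue  # skip _metadata, _SUCCESS and other non-part lines
        size_mb, index = int(match.group(1)), int(match.group(2))
        key = "first" if index <= first_batch_last_index else "second"
        totals[key] += size_mb
    return totals
```

Applied to the listing above, this should come out to roughly 4.2 GB for the first call and roughly 15 GB for the second, so the second append wrote about 3.6 times as much for the same source data.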