[GitHub] [iceberg] kbendick edited a comment on issue #2962: Parquet 1.11.1 update causes regressions while reading iceberg data written with v1.11.0

GitBox Wed, 22 Sep 2021 15:11:01 -0700


kbendick edited a comment on issue #2962:
URL: https://github.com/apache/iceberg/issues/2962#issuecomment-925362178



   After restoring the original Parquet 1.10.1 jars that ship with OSS Spark 
3.1.2, the data written with Iceberg 0.9.0 and Iceberg 0.10.0 is readable 
again, as is data written with Iceberg 0.12.0.
   
   ```
   root@spark:/opt/spark# ls jars | grep parquet
   parquet-column-1.10.1.jar
   parquet-common-1.10.1.jar
   parquet-encoding-1.10.1.jar
   parquet-format-2.4.0.jar
   parquet-hadoop-1.10.1.jar
   parquet-jackson-1.10.1.jar
   root@spark:/opt/spark# ./bin/spark-shell \
     --packages org.apache.iceberg:iceberg-spark3-runtime:0.12.0 \
     --driver-memory 2g \
     --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
     --conf spark.sql.catalog.spark_catalog.type=hive \
     --conf spark.hadoop.hive.metastore.uris=thrift://hive:9083
   :: loading settings :: url = jar:file:/opt/spark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
   Ivy Default Cache set to: /root/.ivy2/cache
   The jars for the packages stored in: /root/.ivy2/jars
   org.apache.iceberg#iceberg-spark3-runtime added as a dependency
   :: resolving dependencies :: org.apache.spark#spark-submit-parent-7a1b923a-77d9-455b-b946-72100bf36f6c;1.0
        confs: [default]
        found org.apache.iceberg#iceberg-spark3-runtime;0.12.0 in central
   :: resolution report :: resolve 144ms :: artifacts dl 2ms
        :: modules in use:
        org.apache.iceberg#iceberg-spark3-runtime;0.12.0 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------
   :: retrieving :: org.apache.spark#spark-submit-parent-7a1b923a-77d9-455b-b946-72100bf36f6c
        confs: [default]
        0 artifacts copied, 1 already retrieved (0kB/5ms)
   21/09/22 21:55:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   Setting default log level to "WARN".
   To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
   Spark context Web UI available at http://spark-box:4040
   Spark context available as 'sc' (master = local[*], app id = local-1632347722175).
   Spark session available as 'spark'.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.1.2
         /_/
   
   Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_292)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> spark.table("default.test_parquet_map_regression_iceberg_010").show(false)
   +-------------------------------+
   |mapCol                         |
   +-------------------------------+
   |{0 -> {{true, 0.0, 0}, 0}}     |
   |{1 -> {{false, 1.0, 1}, 1}}    |
   |{2 -> {{true, 2.0, 2}, 2}}     |
   |{3 -> {{false, 3.0, 3}, 3}}    |
   |{4 -> {{true, 4.0, 4}, 4}}     |
   |{5 -> {{false, 5.0, 5}, 5}}    |
   |{6 -> {{true, 6.0, 6}, 6}}     |
   |{7 -> {{false, 7.0, 7}, 7}}    |
   |{8 -> {{true, 8.0, 8}, 8}}     |
   |{9 -> {{false, 9.0, 9}, 9}}    |
   |{10 -> {{true, 10.0, 10}, 10}} |
   |{11 -> {{false, 11.0, 11}, 11}}|
   |{12 -> {{true, 12.0, 12}, 12}} |
   |{13 -> {{false, 13.0, 13}, 13}}|
   |{14 -> {{true, 14.0, 14}, 14}} |
   |{15 -> {{false, 15.0, 15}, 15}}|
   |{16 -> {{true, 16.0, 16}, 16}} |
   |{17 -> {{false, 17.0, 17}, 17}}|
   |{18 -> {{true, 18.0, 18}, 18}} |
   |{19 -> {{false, 19.0, 19}, 19}}|
   +-------------------------------+
   only showing top 20 rows
   
   
   scala> spark.table("default.test_parquet_map_regression_iceberg_090").show(false)
   +-------------------------------+
   |mapCol                         |
   +-------------------------------+
   |{0 -> {{true, 0.0, 0}, 0}}     |
   |{1 -> {{false, 1.0, 1}, 1}}    |
   |{2 -> {{true, 2.0, 2}, 2}}     |
   |{3 -> {{false, 3.0, 3}, 3}}    |
   |{4 -> {{true, 4.0, 4}, 4}}     |
   |{5 -> {{false, 5.0, 5}, 5}}    |
   |{6 -> {{true, 6.0, 6}, 6}}     |
   |{7 -> {{false, 7.0, 7}, 7}}    |
   |{8 -> {{true, 8.0, 8}, 8}}     |
   |{9 -> {{false, 9.0, 9}, 9}}    |
   |{10 -> {{true, 10.0, 10}, 10}} |
   |{11 -> {{false, 11.0, 11}, 11}}|
   |{12 -> {{true, 12.0, 12}, 12}} |
   |{13 -> {{false, 13.0, 13}, 13}}|
   |{14 -> {{true, 14.0, 14}, 14}} |
   |{15 -> {{false, 15.0, 15}, 15}}|
   |{16 -> {{true, 16.0, 16}, 16}} |
   |{17 -> {{false, 17.0, 17}, 17}}|
   |{18 -> {{true, 18.0, 18}, 18}} |
   |{19 -> {{false, 19.0, 19}, 19}}|
   +-------------------------------+
   only showing top 20 rows
   ```
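
Since the result depends on which Parquet jars are on Spark's classpath, it can help to print the distinct Parquet versions in the jars directory before starting the shell. The following is only a sketch: the function name `parquet_versions` is hypothetical, the default path `/opt/spark/jars` is taken from the session above, and the list of module names is assumed from the `ls` output shown there. `parquet-format` is skipped because it follows its own version line.

```shell
# Hypothetical helper (not part of Spark or Iceberg): print the distinct
# versions of the parquet-* jars found in a Spark jars directory.
# parquet-format is versioned separately, so it is deliberately skipped.
parquet_versions() {
  jars_dir="${1:-/opt/spark/jars}"   # default path assumed from the session above
  ls "$jars_dir" \
    | grep -E '^parquet-(column|common|encoding|hadoop|jackson)-[0-9].*\.jar$' \
    | sed -E 's/^parquet-[a-z]+-(.*)\.jar$/\1/' \
    | sort -u
}
```

Calling `parquet_versions /opt/spark/jars` on the setup above should print a single version; more than one line of output would suggest a mix of Parquet versions on the classpath.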


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


