Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
shardulm94 closed issue #10225: byte and short types in spark no longer auto coerce to int32
URL: https://github.com/apache/iceberg/issues/10225

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2088423502

Just wanted to make sure you were aware that reproducing this is pretty simple:

```
Author: jkolash
Date:   Thu Apr 25 19:23:22 2024 -0400

    Failing test for issue #10225 https://github.com/apache/iceberg/issues/10225

diff --git a/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java b/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
index 76b138ced..9193154ce 100644
--- a/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
+++ b/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
@@ -177,6 +177,17 @@ public class TestDataFrameWriterV2 extends SparkTestBaseWithCatalog {
         sql("select * from %s order by id", tableName));
   }
 
+  @Test
+  public void testByte() {
+    SparkSession sparkSession = spark.cloneSession();
+    Dataset<Row> dataset =
+        sparkSession.sql("select inline(array(from_json('{\"b\": 3}', 'struct')))");
+
+    dataset.show();
+
+    dataset.writeTo(tableName).createOrReplace();
+  }
+
   @Test
   public void testWriteWithCaseSensitiveOption() throws NoSuchTableException, ParseException {
     SparkSession sparkSession = spark.cloneSession();
```
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078535487

OK, this reproduces in the GitHub Actions build on my public fork: https://github.com/jkolash/iceberg/actions/runs/8842101257/job/24280206652

```
TestDataFrameWriterV2 > testByte FAILED
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 7) (localhost executor driver): java.lang.ClassCastException: class java.lang.Byte cannot be cast to class java.lang.Integer (java.lang.Byte and java.lang.Integer are in module java.base of loader 'bootstrap')
        at org.apache.iceberg.parquet.ColumnWriter$2.write(ColumnWriter.java:39)
        at org.apache.iceberg.parquet.ParquetValueWriters$PrimitiveWriter.write(ParquetValueWriters.java:131)
        at org.apache.iceberg.parquet.ParquetValueWriters$OptionWriter.write(ParquetValueWriters.java:356)
        at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:589)
```
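The exception says that an int column writer received a boxed `java.lang.Byte` where it expected a `java.lang.Integer`. The stdlib-only sketch below reproduces just that Java-level failure, outside Spark and Iceberg; `writeAssumingInteger` and `writeWidening` are hypothetical helpers for illustration, not Iceberg's `ColumnWriter`.

```java
// Stdlib-only illustration of the failure mode in the stack trace: a boxed Byte
// reaches code that expects a boxed Integer. Hypothetical helpers, not Iceberg code.
public class ByteToIntegerCast {

  // Assumes every value in an int column is a java.lang.Integer.
  static int writeAssumingInteger(Object value) {
    return (Integer) value; // ClassCastException when value is a java.lang.Byte
  }

  // Widens through java.lang.Number, so Byte, Short and Integer all work.
  static int writeWidening(Object value) {
    return ((Number) value).intValue();
  }

  public static void main(String[] args) {
    Object fromSpark = (byte) 3; // autoboxed to java.lang.Byte
    System.out.println(writeWidening(fromSpark)); // prints 3
    try {
      writeAssumingInteger(fromSpark);
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: " + e.getMessage());
    }
  }
}
```

Casting a boxed `Byte` to `Integer` fails because the two wrapper classes are unrelated types, even though the primitive `byte` to `int` widening is always safe.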
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078200948

Hmm, I think this may be related to the Spark version we are using: I tested on Spark 3.4.1 and didn't see the issue, but I do see it on our 3.4.2.
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078157080

If/when there is a PR, I can test it on my side, where I have exhaustive type testing. This is the error I get:

```
java.lang.ClassCastException: class java.lang.Byte cannot be cast to class java.lang.Integer (java.lang.Byte and java.lang.Integer are in module java.base of loader 'bootstrap')
    at org.apache.iceberg.parquet.ColumnWriter$2.write(ColumnWriter.java:39)
    at org.apache.iceberg.parquet.ParquetValueWriters$PrimitiveWriter.write(ParquetValueWriters.java:131)
    at org.apache.iceberg.parquet.ParquetValueWriters$OptionWriter.write(ParquetValueWriters.java:375)
    at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:608)
```
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078150463

```kotlin
val df = spark.sql("""select inline(array(from_json('{"b":82}', 'struct')))""")
df.show()
```

```
+---+
|  b|
+---+
| 82|
+---+
```

```kotlin
df.writeTo("staging.iceberg_table_3")
    .using("iceberg")
    .createOrReplace()
```

Using this Spark config:

```
conf.set("spark.sql.catalog.staging", "org.apache.iceberg.spark.SparkCatalog")
    .set("spark.sql.catalog.staging.type", "hadoop")
    .set("spark.sql.catalog.staging.warehouse", "/tmp/random_directory");
```
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078099689

@Fokko Thanks for the quick response! I will try to write up a code snippet that reproduces the issue.
Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]
Fokko commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078097905

Hey @jkolash, thanks for reporting this. The behavior should stay the same due to the logic here: https://github.com/apache/iceberg/pull/9440/files#diff-8ac59cbdbcc60cc0c558051dfe8dcf9ffeb4c66379e48c49867a93ee43e27528R224-R236

What's the error you're seeing? That will help me reproduce the issue on my end and see if we can come up with a fix.
[I] byte and short types in spark no longer auto coerce to int32 [iceberg]
jkolash opened a new issue, #10225:
URL: https://github.com/apache/iceberg/issues/10225

### Apache Iceberg version

1.5.0

### Query engine

Spark

### Please describe the bug

The removal of the following code in https://github.com/apache/iceberg/pull/9440/files broke the auto-coercion of Spark byte and short values to int:

```java
private static PrimitiveWriter ints(DataType type, ColumnDescriptor desc) {
  if (type instanceof ByteType) {
    return ParquetValueWriters.tinyints(desc);
  } else if (type instanceof ShortType) {
    return ParquetValueWriters.shorts(desc);
  }
  return ParquetValueWriters.ints(desc);
}
```

Is there a reason byte and short support for auto-coercing to int was removed? On Iceberg 1.4.x we were able to materialize this into Iceberg just fine, but on Iceberg 1.5.x it doesn't work.
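A minimal, stdlib-only sketch of the kind of dispatch the removed `ints(...)` method performed: the writer is chosen from the Spark source type, so byte and short values are widened before they reach an INT32 column. It assumes, without checking the Iceberg source, that `tinyints` and `shorts` widen their values in this way; `SourceType`, `Int32Column`, and `ValueWriter` are hypothetical stand-ins, not Iceberg or Spark APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of source-type-based writer dispatch. None of these names are Iceberg APIs.
public class IntWriterDispatch {

  // Stand-ins for Spark's ByteType, ShortType and IntegerType.
  enum SourceType { BYTE, SHORT, INT }

  // Stand-in for a Parquet INT32 column: it only ever stores Java ints.
  static final class Int32Column {
    final List<Integer> values = new ArrayList<>();

    void writeInt(int v) {
      values.add(v);
    }
  }

  interface ValueWriter {
    void write(Int32Column column, Object value);
  }

  // Mirrors the shape of the removed method: one writer per source type.
  static ValueWriter ints(SourceType type) {
    switch (type) {
      case BYTE:
        return (column, value) -> column.writeInt((Byte) value); // unbox, then byte -> int
      case SHORT:
        return (column, value) -> column.writeInt((Short) value); // unbox, then short -> int
      default:
        return (column, value) -> column.writeInt((Integer) value);
    }
  }

  public static void main(String[] args) {
    Int32Column column = new Int32Column();
    ints(SourceType.BYTE).write(column, (byte) 82);
    ints(SourceType.SHORT).write(column, (short) 1000);
    ints(SourceType.INT).write(column, 7);
    System.out.println(column.values); // prints [82, 1000, 7]
  }
}
```

Because the writer is picked once from the column's source type, a `BYTE` column never reaches the `(Integer)` cast in the default branch; the alternative would be to widen every value through `java.lang.Number` at write time.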