Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-05-19 Thread via GitHub


shardulm94 closed issue #10225: byte and short types in spark no longer auto coerce to int32
URL: https://github.com/apache/iceberg/issues/10225


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-05-01 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2088423502

   Just wanted to make sure you were aware that reproducing this is pretty simple:
   
   ```
   Author: jkolash 
   Date:   Thu Apr 25 19:23:22 2024 -0400
   
   Failing test for issue #10225
   
   https://github.com/apache/iceberg/issues/10225
   
   diff --git a/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java b/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
   index 76b138ced..9193154ce 100644
   --- a/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
   +++ b/spark/v3.4/spark/src/test/java/org/apache/iceberg/spark/source/TestDataFrameWriterV2.java
   @@ -177,6 +177,17 @@ public class TestDataFrameWriterV2 extends SparkTestBaseWithCatalog {
            sql("select * from %s order by id", tableName));
      }
    
   +  @Test
   +  public void testByte() {
   +    SparkSession sparkSession = spark.cloneSession();
   +    Dataset dataset =
   +        sparkSession.sql("select inline(array(from_json('{\"b\": 3}', 'struct')))");
   +
   +    dataset.show();
   +
   +    dataset.writeTo(tableName).createOrReplace();
   +  }
   +
      @Test
      public void testWriteWithCaseSensitiveOption() throws NoSuchTableException, ParseException {
        SparkSession sparkSession = spark.cloneSession();
   
   ```





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078535487

   OK, this reproduces in the GitHub Actions build on my public fork:
   https://github.com/jkolash/iceberg/actions/runs/8842101257/job/24280206652
   ```
   TestDataFrameWriterV2 > testByte FAILED
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 7) (localhost executor driver): java.lang.ClassCastException: class java.lang.Byte cannot be cast to class java.lang.Integer (java.lang.Byte and java.lang.Integer are in module java.base of loader 'bootstrap')
           at org.apache.iceberg.parquet.ColumnWriter$2.write(ColumnWriter.java:39)
           at org.apache.iceberg.parquet.ParquetValueWriters$PrimitiveWriter.write(ParquetValueWriters.java:131)
           at org.apache.iceberg.parquet.ParquetValueWriters$OptionWriter.write(ParquetValueWriters.java:356)
           at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:589)
   ```





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078200948

   Hmm, I think this may be related to the Spark version we are using: I tested on Spark 3.4.1 and didn't see the issue, but I do see it on our 3.4.2.





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078157080

   If/when there is a PR, I can test it on my side, where I have exhaustive type testing.
   
   This is the error I get:
   
   ```
   java.lang.ClassCastException: class java.lang.Byte cannot be cast to class java.lang.Integer (java.lang.Byte and java.lang.Integer are in module java.base of loader 'bootstrap')
       at org.apache.iceberg.parquet.ColumnWriter$2.write(ColumnWriter.java:39)
       at org.apache.iceberg.parquet.ParquetValueWriters$PrimitiveWriter.write(ParquetValueWriters.java:131)
       at org.apache.iceberg.parquet.ParquetValueWriters$OptionWriter.write(ParquetValueWriters.java:375)
       at org.apache.iceberg.parquet.ParquetValueWriters$StructWriter.write(ParquetValueWriters.java:608)
   ```
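
For readers following along, the exception in this trace can be shown without Spark or Iceberg. The sketch below is plain JDK code with hypothetical names (`ByteCastDemo`, `unsafeCast`, `widen`); it is not Iceberg's writer code, only an illustration of why a boxed `Byte` fails when it reaches code that casts to `Integer`, and how widening through `Number.intValue()` would avoid that.

```java
// Hypothetical, JDK-only illustration; these names do not exist in Iceberg.
final class ByteCastDemo {
  // Mimics a writer that assumes every incoming value is a boxed Integer.
  static int unsafeCast(Object value) {
    return (Integer) value; // throws ClassCastException for a boxed Byte or Short
  }

  // Widens any boxed integral value (Byte, Short, Integer) to int.
  static int widen(Object value) {
    return ((Number) value).intValue();
  }

  public static void main(String[] args) {
    Object boxed = (byte) 82; // autoboxes to java.lang.Byte
    System.out.println(widen(boxed));
    try {
      unsafeCast(boxed);
    } catch (ClassCastException e) {
      System.out.println("cast failed: " + e.getMessage());
    }
  }
}
```

The cast fails because `Byte` and `Integer` are unrelated classes that only share the `Number` superclass, so a value has to be widened explicitly before it is handed to an `Integer`-typed writer.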





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078150463

   ```kotlin
   val df = spark.sql("""select inline(array(from_json('{"b":82}', 'struct')))""")
   df.show()
   ```
   
   ```
   +---+
   |  b|
   +---+
   | 82|
   +---+
   ```
   
   
   ```kotlin
   df.writeTo("staging.iceberg_table_3")
       .using("iceberg")
       .createOrReplace()
   ```
   
   using this Spark config:
   ```
   conf.set("spark.sql.catalog.staging", "org.apache.iceberg.spark.SparkCatalog")
       .set("spark.sql.catalog.staging.type", "hadoop")
       .set("spark.sql.catalog.staging.warehouse", "/tmp/random_directory");
   ```





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078099689

   @Fokko Thanks for the quick response. I will try to write up a code snippet reproducing the issue.





Re: [I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


Fokko commented on issue #10225:
URL: https://github.com/apache/iceberg/issues/10225#issuecomment-2078097905

   Hey @jkolash, thanks for reporting this. The behavior should stay the same, due to the logic here:
   
   
https://github.com/apache/iceberg/pull/9440/files#diff-8ac59cbdbcc60cc0c558051dfe8dcf9ffeb4c66379e48c49867a93ee43e27528R224-R236
   
   What's the error that you're seeing? This will help me to reproduce the 
issue on my end and see if we can come up with a fix.





[I] byte and short types in spark no longer auto coerce to int32 [iceberg]

2024-04-25 Thread via GitHub


jkolash opened a new issue, #10225:
URL: https://github.com/apache/iceberg/issues/10225

   ### Apache Iceberg version
   
   1.5.0
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 
   
   The removal of the code
   
   ```java
     private static PrimitiveWriter ints(DataType type, ColumnDescriptor desc) {
       if (type instanceof ByteType) {
         return ParquetValueWriters.tinyints(desc);
       } else if (type instanceof ShortType) {
         return ParquetValueWriters.shorts(desc);
       }
       return ParquetValueWriters.ints(desc);
     }
   ```
   in this PR, https://github.com/apache/iceberg/pull/9440/files,
   broke this auto-coercion.
   
   Is there a reason for removing the auto-coercion of byte and short to int? On Iceberg 1.4.x we were able to materialize this into Iceberg just fine, but on Iceberg 1.5.x it no longer works.
   
   

