[ 
https://issues.apache.org/jira/browse/SPARK-50570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sakthi updated SPARK-50570:
---------------------------
    Description: 
Writing data to a DynamoDB table using PySpark results in the following error:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
    at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
However, the same logic written in Scala performs the write as expected.

This indicates a potential issue in the PySpark logic responsible for converting RDD elements before they are written to DynamoDB via `DefaultDynamoDBRecordWriter`, and it currently prevents users from leveraging PySpark for integration with DynamoDB.
 * Potential area of interest: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]
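
For reference, a write of roughly the following shape is enough to hit the cast error. This is only a sketch: it assumes the emr-dynamodb-hadoop connector jar is on the classpath, and the table name, region, and attribute names are hypothetical.
{code:python}
# Hypothetical reproduction sketch. Assumes the emr-dynamodb-hadoop connector jar
# is on the classpath; the table name, region, and attributes are made up.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dynamodb-write-repro").getOrCreate()
sc = spark.sparkContext

# Configuration keys follow the emr-dynamodb-hadoop connector's conventions.
conf = {
    "mapred.output.format.class":
        "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
    "dynamodb.output.tableName": "my_table",   # hypothetical table
    "dynamodb.servicename": "dynamodb",
    "dynamodb.regionid": "us-east-1",          # hypothetical region
}

# The values are plain Python dicts. PySpark's default JavaToWritableConverter
# turns them into MapWritable, which DefaultDynamoDBRecordWriter then fails to
# cast to DynamoDBItemWritable, producing the ClassCastException shown above.
rows = sc.parallelize([
    (None, {"id": {"s": "1"}, "name": {"s": "alice"}}),
    (None, {"id": {"s": "2"}, "name": {"s": "bob"}}),
])

rows.saveAsHadoopDataset(conf)
{code}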

 

*Broad steps to reproduce the issue:*
 * Create a DynamoDB table with the required schema.
 * Load some data into S3 to serve as the source data.
 * Read from S3 and write to the DynamoDB table using PySpark.
 ** You should encounter the error mentioned above.
 * Repeat the same steps using Scala.
 ** This time, the data will be successfully written to the DynamoDB table with no errors encountered.
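
A possible direction for a workaround (not verified here): `saveAsHadoopDataset` also accepts a `valueConverter` class name, so a JVM-side `org.apache.spark.api.python.Converter` implementation that builds a `DynamoDBItemWritable` from the incoming map could bypass the default MapWritable conversion. No such converter ships with Spark or the connector, so the class name below is hypothetical and would have to be implemented in Scala or Java and placed on the classpath.
{code:python}
# Hypothetical workaround sketch: hand values to a custom JVM-side converter
# instead of the default JavaToWritableConverter. The converter class named
# below does not exist; it would need to implement
# org.apache.spark.api.python.Converter and return DynamoDBItemWritable.
rows.saveAsHadoopDataset(
    conf=conf,
    valueConverter="com.example.DictToDynamoDBItemWritableConverter",  # hypothetical
)
{code}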



> ClassCastException in PySpark when writing to DynamoDB (MapWritable to 
> DynamoDBItemWritable)
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-50570
>                 URL: https://issues.apache.org/jira/browse/SPARK-50570
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.2
>            Reporter: Sakthi
>            Priority: Major
>


