Sakthi created SPARK-50570:
------------------------------

             Summary: ClassCastException in PySpark when writing to DynamoDB 
(MapWritable to DynamoDBItemWritable)
                 Key: SPARK-50570
                 URL: https://issues.apache.org/jira/browse/SPARK-50570
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.2
            Reporter: Sakthi


Writing data to a DynamoDB table using PySpark results in the following error:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
    at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
However, when using the same logic written in Scala, the write operation works 
as expected.

This indicates a potential issue in the PySpark logic that converts RDD elements before they are written to DynamoDB via {{DefaultDynamoDBRecordWriter}}: the values reach the writer as {{MapWritable}} rather than the expected {{DynamoDBItemWritable}}. As a result, users cannot write to DynamoDB from PySpark through this path.
 * Potential area of interest: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]
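
For context, the default converter at that point maps Python dicts to {{org.apache.hadoop.io.MapWritable}}, which matches the exception above. A minimal illustration of that default conversion, using a local SequenceFile write as a stand-in (the output path is hypothetical):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("default-conversion-demo").getOrCreate()
sc = spark.sparkContext

# With no custom valueConverter, PySpark's default Java-to-Writable conversion
# turns the dict value into a MapWritable when writing through the Hadoop APIs.
sc.parallelize([("k1", {"id": "1", "value": "a"})]).saveAsNewAPIHadoopFile(
    "file:///tmp/mapwritable-demo",  # hypothetical local path
    "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.io.MapWritable",
)
{code}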

 

*Broad steps to reproduce the issue:*
 * Create a DynamoDB table with the required schema.
 * Load some data into S3 to serve as the source data.
 * Read from S3 and write to the DynamoDB table using PySpark (a minimal sketch follows this list).
 ** You should encounter the error mentioned above.
 * Repeat the same steps using Scala.
 ** This time, the data is written to the DynamoDB table without errors.
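
A minimal PySpark sketch of the failing write, assuming the emr-dynamodb-connector's {{DynamoDBOutputFormat}} is on the classpath; the table name, region, S3 path, and column layout are placeholders, and the exact configuration keys may vary by connector version:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-dynamodb-write-repro").getOrCreate()
sc = spark.sparkContext

# Hadoop job configuration for the emr-dynamodb-connector (placeholder values).
conf = {
    "dynamodb.servicename": "dynamodb",
    "dynamodb.output.tableName": "my_table",
    "dynamodb.regionid": "us-east-1",
    "mapred.output.format.class":
        "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
}

# Read the source data from S3 and build (key, value) pairs. The dict on the
# value side is converted to a MapWritable by PySpark's default converter,
# which DefaultDynamoDBRecordWriter cannot cast to DynamoDBItemWritable.
rows = (
    sc.textFile("s3://my-bucket/source-data/")
      .map(lambda line: line.split(","))
      .map(lambda cols: (cols[0], {"id": cols[0], "value": cols[1]}))
)

# Fails with the ClassCastException shown above.
rows.saveAsHadoopDataset(conf)
{code}
{{saveAsHadoopDataset}} also accepts optional {{keyConverter}}/{{valueConverter}} class names, but the default conversion (the PythonHadoopUtil code linked above) produces {{MapWritable}}, which is consistent with the exception.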


