Sakthi created SPARK-50570:
------------------------------

             Summary: ClassCastException in PySpark when writing to DynamoDB 
(MapWritable to DynamoDBItemWritable)
                 Key: SPARK-50570
                 URL: https://issues.apache.org/jira/browse/SPARK-50570
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.2
            Reporter: Sakthi


Writing data to a DynamoDB table using PySpark results in the following error:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
    at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
However, when using the same logic written in Scala, the write operation works 
as expected.

This indicates a potential issue in the PySpark logic that converts RDD elements before they are written to DynamoDB via {{DefaultDynamoDBRecordWriter}}: the values reach the writer as {{MapWritable}} rather than the expected {{DynamoDBItemWritable}}. As a result, users cannot write to DynamoDB from PySpark through this path.
 * Potential area of interest: 
[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]
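
For context, the default converter at that point maps Python dicts to {{org.apache.hadoop.io.MapWritable}}, which matches the exception above. A minimal illustration of that default conversion, using a local SequenceFile write as a stand-in (the output path is hypothetical):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("default-conversion-demo").getOrCreate()
sc = spark.sparkContext

# With no custom valueConverter, PySpark's default Java-to-Writable conversion
# turns the dict value into a MapWritable when writing through the Hadoop APIs.
sc.parallelize([("k1", {"id": "1", "value": "a"})]).saveAsNewAPIHadoopFile(
    "file:///tmp/mapwritable-demo",  # hypothetical local path
    "org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat",
    keyClass="org.apache.hadoop.io.Text",
    valueClass="org.apache.hadoop.io.MapWritable",
)
{code}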

 

*Broad steps to reproduce the issue:*
 * Create a DynamoDB table with the required schema.
 * Load some data into S3 to serve as the source data.
 * Read from S3 and write to the DynamoDB table using PySpark (a minimal sketch follows this list).
 ** You should encounter the error mentioned above.
 * Repeat the same steps using Scala.
 ** This time, the data is written to the DynamoDB table without errors.
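
A minimal PySpark sketch of the failing write, assuming the emr-dynamodb-connector's {{DynamoDBOutputFormat}} is on the classpath; the table name, region, S3 path, and column layout are placeholders, and the exact configuration keys may vary by connector version:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-dynamodb-write-repro").getOrCreate()
sc = spark.sparkContext

# Hadoop job configuration for the emr-dynamodb-connector (placeholder values).
conf = {
    "dynamodb.servicename": "dynamodb",
    "dynamodb.output.tableName": "my_table",
    "dynamodb.regionid": "us-east-1",
    "mapred.output.format.class":
        "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
}

# Read the source data from S3 and build (key, value) pairs. The dict on the
# value side is converted to a MapWritable by PySpark's default converter,
# which DefaultDynamoDBRecordWriter cannot cast to DynamoDBItemWritable.
rows = (
    sc.textFile("s3://my-bucket/source-data/")
      .map(lambda line: line.split(","))
      .map(lambda cols: (cols[0], {"id": cols[0], "value": cols[1]}))
)

# Fails with the ClassCastException shown above.
rows.saveAsHadoopDataset(conf)
{code}
{{saveAsHadoopDataset}} also accepts optional {{keyConverter}}/{{valueConverter}} class names, but the default conversion (the PythonHadoopUtil code linked above) produces {{MapWritable}}, which is consistent with the exception.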


