Sakthi created SPARK-50570:
------------------------------

             Summary: ClassCastException in PySpark when writing to DynamoDB (MapWritable to DynamoDBItemWritable)
                 Key: SPARK-50570
                 URL: https://issues.apache.org/jira/browse/SPARK-50570
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.5.2
            Reporter: Sakthi
Writing data to a DynamoDB table using PySpark results in the following error:

{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
	at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22)
{code}

However, the same logic written in Scala completes the write as expected. This points to a problem in the PySpark code path that converts RDD elements before they are handed to `DefaultDynamoDBRecordWriter`, and it prevents users from leveraging PySpark for integration with DynamoDB.

* Potential area of interest: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]

*Broad steps to reproduce the issue:*
 * Create a DynamoDB table with the required schema.
 * Load some data into S3 to serve as the source data.
 * Read from S3 and write to the DynamoDB table using PySpark (a minimal sketch follows this list).
 ** You should encounter the error mentioned above.
 * Repeat the same steps using Scala.
 ** This time, the data is written to the DynamoDB table with no errors.
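For reference, a minimal PySpark sketch of the failing write path, assuming the emr-dynamodb-hadoop connector JARs are on the classpath; the bucket, table name, and endpoint below are placeholders. The values handed to the Hadoop output format are plain Python dicts, which the JVM side receives as MapWritable rather than DynamoDBItemWritable, triggering the cast failure above:

{code:python}
from pyspark.sql import SparkSession

# Sketch only: bucket, table name, and endpoint are placeholders, and the
# emr-dynamodb-hadoop connector is assumed to be on the classpath.
spark = SparkSession.builder.appName("dynamodb-write-repro").getOrCreate()

# Source data previously loaded into S3.
df = spark.read.json("s3://my-bucket/source-data/")

conf = {
    "dynamodb.output.tableName": "my_table",
    "dynamodb.servicename": "dynamodb",
    "dynamodb.endpoint": "dynamodb.us-east-1.amazonaws.com",
    "mapred.output.format.class":
        "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
}

# Each value is a plain dict; PySpark's converter wraps it as a MapWritable,
# which DefaultDynamoDBRecordWriter cannot cast to DynamoDBItemWritable.
items = df.rdd.map(lambda row: ("", row.asDict()))
items.saveAsHadoopDataset(conf)
{code}

The working Scala version of the same step builds DynamoDBItemWritable values directly before calling saveAsHadoopDataset, which is presumably why the Scala path never hits the cast.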