[ https://issues.apache.org/jira/browse/SPARK-50570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905473#comment-17905473 ]
Sakthi commented on SPARK-50570:
--------------------------------

I would like to pick up this issue and start working on a potential fix.

> ClassCastException in PySpark when writing to DynamoDB (MapWritable to
> DynamoDBItemWritable)
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-50570
>                 URL: https://issues.apache.org/jira/browse/SPARK-50570
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.2
>            Reporter: Sakthi
>            Priority: Major
>
> Writing data to a DynamoDB table using PySpark fails with the following error:
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable
> cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
> at
> org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
> However, the same logic written in Scala completes the write as expected.
> This points to an issue in the PySpark code that converts RDD elements
> before they are written to DynamoDB via `DefaultDynamoDBRecordWriter`, and it
> prevents users from using PySpark to integrate with DynamoDB.
> * Potential area of interest:
> [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]
>
> *Broad steps to reproduce the issue (a PySpark sketch follows the list):*
> * Create a DynamoDB table with the required schema.
> * Load some data into S3 to serve as the source data.
> * Read from S3 and write to the DynamoDB table using PySpark.
> ** You should hit the ClassCastException shown above.
> * Repeat the same steps using Scala.
> ** This time, the data is written to the DynamoDB table with no errors.
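> A minimal PySpark sketch of the failing write path, assuming the
> emr-dynamodb-hadoop connector is on the classpath; the S3 path, table name,
> region, and record layout below are placeholders, not part of the original
> report:
> {code:python}
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.appName("spark-50570-repro").getOrCreate()
>
> # Read the source data from S3 (placeholder path) and shape each record as a
> # (key, value) pair of plain Python types; PySpark's default
> # JavaToWritableConverter turns these into Text / MapWritable on the JVM side.
> rows = spark.read.json("s3://my-bucket/source-data/")
> pairs = rows.rdd.map(lambda r: (str(r["id"]),
>                                 {"id": str(r["id"]), "price": str(r["price"])}))
>
> output_conf = {
>     "dynamodb.output.tableName": "my_table",      # placeholder table name
>     "dynamodb.servicename": "dynamodb",
>     "dynamodb.regionid": "us-east-1",             # placeholder region
>     "mapred.output.format.class":
>         "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
>     "mapred.output.key.class": "org.apache.hadoop.io.Text",
>     "mapred.output.value.class":
>         "org.apache.hadoop.dynamodb.DynamoDBItemWritable",
> }
>
> # The default conversion produces MapWritable values, but
> # DefaultDynamoDBRecordWriter casts each value to DynamoDBItemWritable,
> # which raises the ClassCastException quoted above.
> pairs.saveAsHadoopDataset(conf=output_conf)
> {code}
> The equivalent Scala job can build the pairs as (Text, DynamoDBItemWritable)
> directly, so no such conversion is involved and the write succeeds.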