[ https://issues.apache.org/jira/browse/SPARK-50570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sakthi updated SPARK-50570:
---------------------------
    Description: 
Writing data to a DynamoDB table using PySpark fails with the following error:
{code:java}
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
	at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
However, when the same logic is written in Scala, the write completes as expected. This points to a problem in the PySpark code path that converts RDD elements before they are handed to DynamoDB via `DefaultDynamoDBRecordWriter`, and it prevents users from leveraging PySpark for integration with DynamoDB.
* Potential area of interest: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]

*Broad steps to reproduce the issue:*
* Create a DynamoDB table with the required schema.
* Load some data into S3 to serve as the source data.
* Read from S3 and write to the DynamoDB table using PySpark (a minimal sketch follows this list).
** You should encounter the error shown above.
* Repeat the same steps using Scala.
** This time, the data is written to the DynamoDB table with no errors.
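For context, here is a minimal PySpark sketch of the kind of job that can hit this cast. It is not taken from the report: the S3 path, table name, region, and the use of the emr-dynamodb-hadoop `DynamoDBOutputFormat` are assumptions for illustration.
{code:python}
# Reproduction sketch (assumptions, not from the report): the S3 path, table
# name, and region are placeholders; the output format class and dynamodb.*
# properties come from the emr-dynamodb-hadoop connector, which the stack
# trace suggests is in use.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dynamodb-write-repro").getOrCreate()

# Source data staged in S3 (path and format are assumptions).
df = spark.read.json("s3://example-bucket/source-data/")

# Hadoop job configuration for the DynamoDB connector (old mapred API).
job_conf = {
    "mapred.output.format.class": "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat",
    "dynamodb.output.tableName": "example_table",
    "dynamodb.servicename": "dynamodb",
    "dynamodb.regionid": "us-east-1",
}

# Each value is a plain Python dict. PySpark's default converter turns a dict
# into org.apache.hadoop.io.MapWritable, which DefaultDynamoDBRecordWriter then
# fails to cast to DynamoDBItemWritable.
pairs = df.rdd.map(lambda row: (None, row.asDict(recursive=True)))
pairs.saveAsHadoopDataset(conf=job_conf)  # raises the ClassCastException above
{code}
If this matches the reporter's setup, the failure is consistent with the default value conversion in `PythonHadoopUtil` producing a `MapWritable` for Python dicts, whereas the equivalent Scala job presumably builds `DynamoDBItemWritable` values directly and so never hits the cast.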
> ClassCastException in PySpark when writing to DynamoDB (MapWritable to DynamoDBItemWritable)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-50570
>                 URL: https://issues.apache.org/jira/browse/SPARK-50570
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.5.2
>            Reporter: Sakthi
>            Priority: Major
>
> Writing data to a DynamoDB table using PySpark results in the following error:
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.hadoop.io.MapWritable cannot be cast to org.apache.hadoop.dynamodb.DynamoDBItemWritable
> at org.apache.hadoop.dynamodb.write.DefaultDynamoDBRecordWriter.convertValueToDynamoDBItem(DefaultDynamoDBRecordWriter.java:22){code}
> However, when using the same logic written in Scala, the write operation works as expected.
> This indicates a potential issue in the PySpark logic responsible for converting RDD elements before they are written to DynamoDB via `DefaultDynamoDBRecordWriter`. This prevents users from leveraging PySpark for integration with DynamoDB.
> * Potential area of interest: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/PythonHadoopUtil.scala#L183-L186]
>
> *Broad steps to reproduce the issue:*
> * Create a DynamoDB table with the required schema.
> * Load some data into S3 to serve as the source data.
> * Read from S3 and write to the DynamoDB table using PySpark.
> ** You should encounter the error mentioned above.
> * Repeat the same steps using Scala.
> ** This time, the data will be successfully written to the DynamoDB table with no errors encountered.