[ 
https://issues.apache.org/jira/browse/SPARK-31718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-31718:
---------------------------------
       Fix Version/s:     (was: 2.4.0)
    Target Version/s:   (was: 2.4.0)

Don't set Fix/Target Version

>  DataSourceV2 unexpected behavior with partition data distribution
> ------------------------------------------------------------------
>
>                 Key: SPARK-31718
>                 URL: https://issues.apache.org/jira/browse/SPARK-31718
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.0
>            Reporter: Serhii
>            Priority: Major
>
> Hi team,
>   
>  We are using DataSourceV2.
>   
>  We have a question regarding the interface
> org.apache.spark.sql.sources.v2.writer.DataWriter<T>.
>   
>  We have run into the following unexpected behavior.
>  When we repartition a DataFrame, we expect Spark to create a new instance of the
> DataWriter interface for each partition and to send each partition's data to its
> corresponding instance, but sometimes we observe that Spark sends data from
> different partitions to the same instance of the DataWriter interface.
>  This behavior occurs intermittently on a YARN cluster.
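>  For context, the write looks roughly like the sketch below. This is illustrative
> only; the format name "com.example.MyV2Source" and the "key" column are placeholders
> for our actual DataSourceV2 source and partitioning column:
> {code:java}
> import static org.apache.spark.sql.functions.col;
>
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SaveMode;
>
> public class RepartitionedWriteExample {
>   // After repartition(8, ...) we expect 8 partitions and therefore 8 DataWriter
>   // instances, one per partition.
>   static void writeRepartitioned(Dataset<Row> df) {
>     df.repartition(8, col("key"))              // "key" is a placeholder column
>       .write()
>       .format("com.example.MyV2Source")        // placeholder for our V2 source
>       .mode(SaveMode.Append)
>       .save();
>   }
> }
> {code}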
>   
>  If we run the Spark job in local mode, Spark does create a new DataWriter instance
> for each partition after the repartition and publishes each partition's data to the
> corresponding instance.
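>  To make the expectation concrete, here is a minimal diagnostic sketch against the
> Spark 2.4 DataSourceV2 writer interfaces (assuming the three-argument
> createDataWriter signature of that release); the class names are hypothetical. Each
> writer logs its identity and the partitionId it was created for, which is how we
> compare which partition's rows end up on which instance:
> {code:java}
> import java.io.IOException;
>
> import org.apache.spark.sql.catalyst.InternalRow;
> import org.apache.spark.sql.sources.v2.writer.DataWriter;
> import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
> import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;
>
> // Hypothetical diagnostic factory: Spark is expected to call createDataWriter()
> // once per partition/task, so partitionId identifies the partition this writer serves.
> public class LoggingDataWriterFactory implements DataWriterFactory<InternalRow> {
>   @Override
>   public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
>     return new LoggingDataWriter(partitionId, taskId);
>   }
> }
>
> class LoggingDataWriter implements DataWriter<InternalRow> {
>   private final int partitionId;
>   private long rowCount = 0;
>
>   LoggingDataWriter(int partitionId, long taskId) {
>     this.partitionId = partitionId;
>     System.out.println("Created writer " + System.identityHashCode(this)
>         + " for partition " + partitionId + ", task " + taskId);
>   }
>
>   @Override
>   public void write(InternalRow record) throws IOException {
>     rowCount++;  // count rows routed to this writer instance
>   }
>
>   @Override
>   public WriterCommitMessage commit() throws IOException {
>     System.out.println("Writer " + System.identityHashCode(this)
>         + " (partition " + partitionId + ") committed " + rowCount + " rows");
>     return new LoggingCommitMessage(partitionId, rowCount);
>   }
>
>   @Override
>   public void abort() throws IOException {
>     // nothing to clean up in this sketch
>   }
> }
>
> // WriterCommitMessage extends Serializable, so primitive fields are enough here.
> class LoggingCommitMessage implements WriterCommitMessage {
>   final int partitionId;
>   final long rowCount;
>
>   LoggingCommitMessage(int partitionId, long rowCount) {
>     this.partitionId = partitionId;
>     this.rowCount = rowCount;
>   }
> }
> {code}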
>   
> Is it possible that Spark limits the number of DataWriter instances?
>  Could you clarify whether this is a bug or expected behavior?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
