[ https://issues.apache.org/jira/browse/SPARK-31718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen updated SPARK-31718:
---------------------------------
    Fix Version/s:     (was: 2.4.0)
 Target Version/s:     (was: 2.4.0)

Don't set Fix/Target Version

> DataSourceV2 unexpected behavior with partition data distribution
> ------------------------------------------------------------------
>
>                 Key: SPARK-31718
>                 URL: https://issues.apache.org/jira/browse/SPARK-31718
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.4.0
>            Reporter: Serhii
>            Priority: Major
>
> Hi team,
>
> We are using DataSourceV2.
>
> We have a question about using the interface
> org.apache.spark.sql.sources.v2.writer.DataWriter<T>.
>
> We have run into the following unexpected behavior.
> When we repartition a DataFrame, we expect Spark to create a new DataWriter
> instance for each partition and to send each partition's data to its own
> instance, but sometimes we observe that Spark sends data from different
> partitions to the same DataWriter instance. This happens only occasionally,
> on a YARN cluster.
>
> If we run the Spark job in local mode, Spark does create a new DataWriter
> instance for each partition after the repartition and publishes each
> partition's data to the appropriate instance.
>
> Is there perhaps a limit in Spark on the number of DataWriter instances?
> Can you explain whether this is a bug or expected behavior?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
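To make the reported observation concrete, below is a minimal diagnostic sketch, assuming the Spark 2.4 DataSourceV2 writer API in which DataWriterFactory.createDataWriter(partitionId, taskId, epochId) is invoked once per write task. The class names LoggingDataWriterFactory, LoggingDataWriter, and NoopCommitMessage are illustrative, not part of the original report: the factory tags every writer with the partitionId it was created for and logs it again at commit time, so executor logs show whether one writer instance ever handles more than one partition's data.

{code:java}
import java.io.IOException;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

// Illustrative factory: records which partitionId each DataWriter was created for.
public class LoggingDataWriterFactory implements DataWriterFactory<InternalRow> {

  @Override
  public DataWriter<InternalRow> createDataWriter(int partitionId, long taskId, long epochId) {
    System.out.println("createDataWriter: partitionId=" + partitionId
        + " taskId=" + taskId + " epochId=" + epochId);
    return new LoggingDataWriter(partitionId, taskId);
  }

  private static class LoggingDataWriter implements DataWriter<InternalRow> {
    private final int partitionId;
    private final long taskId;
    private long rowCount = 0;

    LoggingDataWriter(int partitionId, long taskId) {
      this.partitionId = partitionId;
      this.taskId = taskId;
    }

    @Override
    public void write(InternalRow record) throws IOException {
      // A real sink would buffer or forward the row here; the sketch only counts rows.
      rowCount++;
    }

    @Override
    public WriterCommitMessage commit() throws IOException {
      // Logging the identity hash makes it visible whether the same writer
      // instance commits data for more than one partition.
      System.out.println("commit: writer@" + System.identityHashCode(this)
          + " partitionId=" + partitionId + " taskId=" + taskId
          + " rows=" + rowCount);
      return new NoopCommitMessage();
    }

    @Override
    public void abort() throws IOException {
      System.out.println("abort: partitionId=" + partitionId + " taskId=" + taskId);
    }
  }

  // Placeholder commit message for the sketch; a real sink would carry commit metadata.
  private static class NoopCommitMessage implements WriterCommitMessage {}
}
{code}

Comparing the partitionId logged at createDataWriter time with the identity hashes in the commit lines across executors would show whether a single writer instance is being reused for multiple partitions on the YARN cluster, or whether each partition really gets its own instance as observed in local mode.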