[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305905#comment-16305905 ]
zhengruifeng commented on SPARK-22905:
--------------------------------------

Many other models are saved in the same way {sparkSession.createDataFrame(...).repartition(1).write.parquet}. Do they need to be fixed as well?

> Fix ChiSqSelectorModel save implementation
> ------------------------------------------
>
>                 Key: SPARK-22905
>                 URL: https://issues.apache.org/jira/browse/SPARK-22905
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Weichen Xu
>            Assignee: Weichen Xu
>             Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, `ChiSqSelectorModel` saves its data with:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> By default, createDataFrame splits the local array into "defaultParallelism"
> partitions. The current RoundRobinPartitioning does not guarantee that
> "repartition" produces rows in the same order as the local array.
> We need to fix this.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
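To make the ordering concern in the issue description concrete, here is a small toy model in plain Scala. It does not use Spark and is not Spark's shuffle code. It assumes only that the local array is split into several contiguous partitions and that, when those partitions are merged into one, their blocks may arrive in an arbitrary order. The names `RepartitionOrderSketch`, `slice`, and `mergeInOrder` are hypothetical and exist only for this illustration.

```scala
object RepartitionOrderSketch {
  // Split a local sequence into `numSlices` contiguous chunks,
  // standing in for the partitions of a distributed dataset.
  def slice[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] = {
    val n = data.length
    (0 until numSlices).map { i =>
      val start = (i.toLong * n / numSlices).toInt
      val end = ((i + 1).toLong * n / numSlices).toInt
      data.slice(start, end)
    }
  }

  // Merge all chunks into a single sequence, reading the chunks in the
  // given arrival order (assumption: arrival order is not guaranteed).
  def mergeInOrder[T](chunks: Seq[Seq[T]], arrival: Seq[Int]): Seq[T] =
    arrival.flatMap(i => chunks(i))

  def main(args: Array[String]): Unit = {
    val features = Seq(10, 11, 12, 13, 14, 15)

    // Three partitions, merged with partition 2 arriving first.
    val merged = mergeInOrder(slice(features, 3), Seq(2, 0, 1))
    println(s"merged:   $merged")

    // Carrying an explicit index lets the original order be restored
    // no matter how the chunks were merged.
    val mergedIndexed = mergeInOrder(slice(features.zipWithIndex, 3), Seq(2, 0, 1))
    val restored = mergedIndexed.sortBy(_._2).map(_._1)
    println(s"restored: $restored")
  }
}
```

In this model the merged sequence differs from the local array whenever the arrival order is not 0, 1, 2. In Spark itself, one way to sidestep the problem is to avoid the multi-partition step entirely, for example by building the data with a single partition from the start via `sparkContext.parallelize(dataArray, 1)`. Whether that suits the other models mentioned in the comment is a separate question.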