[ https://issues.apache.org/jira/browse/SPARK-38161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-38161:
---------------------------------
    Component/s: SQL
                     (was: Block Manager)

> when clean data hope to spilt one dataframe or dataset to two dataframe
> ------------------------------------------------------------------------
>
>                 Key: SPARK-38161
>                 URL: https://issues.apache.org/jira/browse/SPARK-38161
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: gaokui
>            Priority: Major
>
> When cleaning data, I run into the following scenario.
> One column needs to be checked for empty or null values,
> so at the moment I use code similar to the following:
> df1 = dataframe.filter("column IS NULL")
> df2 = dataframe.filter("column IS NOT NULL")
> and then write df1 and df2 to Parquet files on HDFS.
> But when I have thousands of conditions, every job needs more stages.
> I would like a DataFrame to be filtered by a condition once rather than twice,
> producing two DataFrames from that single pass.
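
One possible workaround with existing APIs, offered only as a sketch and not as the feature this issue requests: evaluate the condition once as a boolean flag column and let the writer split the output by that flag, so a single job produces both sets of Parquet files. The helper name `write_split`, the flag column name `matches`, the sample data, and the `/tmp/split_demo` path are all hypothetical; `selectExpr`, `partitionBy`, `mode`, and `parquet` are the standard PySpark DataFrame and DataFrameWriter calls. Note that this assumes `overwrite` mode is acceptable for the target path.

```python
def write_split(df, condition, path, flag="matches"):
    """Write `df` under `path`, split by a SQL boolean `condition`, in one write job.

    `write_split` and `flag` are illustrative names, not part of Spark.
    """
    # Evaluate the condition once per row. coalesce(..., false) sends rows
    # where the condition itself evaluates to NULL to the "false" side.
    flagged = df.selectExpr("*", f"coalesce(({condition}), false) AS {flag}")
    # partitionBy creates one subdirectory per flag value:
    # <path>/<flag>=true and <path>/<flag>=false.
    flagged.write.partitionBy(flag).mode("overwrite").parquet(path)


def demo():
    # Needs a local PySpark installation; imported here so that
    # write_split itself has no import-time dependency.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("split-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, None), (3, "")], "id INT, name STRING")
    # Treat both NULL and empty strings as the "dirty" side.
    write_split(df, "name IS NULL OR trim(name) = ''", "/tmp/split_demo")
    # Each side can be read back on its own afterwards.
    spark.read.parquet("/tmp/split_demo/matches=true").show()
    spark.stop()
```

Calling `demo()` runs the sketch end to end. With many conditions, each one still needs its own flag column, so this reduces the repeated scans per condition but does not by itself return two in-memory DataFrames.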