[ https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Liu Cao updated SPARK-47845:
----------------------------
Description:

I have a use case that requires splitting a String-typed column on delimiters defined in other columns of the same DataFrame. SQL already supports this, but the Scala / Python split functions currently don't.

A hypothetical example to illustrate:

{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works in SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently errors out in Scala
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}

Pretty simple patch; I can open a PR soon.
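Until a Column-typed overload of split lands, one possible workaround from Scala is to route the call through the SQL expression parser with expr (from org.apache.spark.sql.functions); a sketch, assuming the example DataFrame above is in scope:

```scala
import org.apache.spark.sql.functions.expr

// Workaround sketch: express the three-argument split as a SQL expression
// string, so the column-valued delimiter and limit are resolved by the
// SQL analyzer rather than the typed Scala API.
example
  .withColumn("name_parts", expr("split(name, delim, expected_parts_count)"))
  .show()
```

This relies on name, delim, and expected_parts_count being resolvable column names in the DataFrame, so it is less type-safe than a native split(Column, Column, Column) overload would be.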
> Support column type in split function in scala and python
> ---------------------------------------------------------
>
>                 Key: SPARK-47845
>                 URL: https://issues.apache.org/jira/browse/SPARK-47845
>             Project: Spark
>          Issue Type: New Feature
>          Components: Connect, Spark Core
>    Affects Versions: 3.5.1
>            Reporter: Liu Cao
>            Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)