[jira] [Updated] (SPARK-47845) Support column type in split function in scala and python

2024-04-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47845:
---
Labels: pull-request-available  (was: )

> Support column type in split function in scala and python
> -
>
> Key: SPARK-47845
> URL: https://issues.apache.org/jira/browse/SPARK-47845
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Spark Core
>Affects Versions: 3.5.1
>Reporter: Liu Cao
>Priority: Major
>  Labels: pull-request-available
>
> I have a use case that requires splitting a string-typed column using delimiters 
> defined in other columns of the DataFrame. SQL's split already supports this, 
> but the Scala / Python functions.split APIs currently don't.
>  
> A hypothetical example to illustrate:
> {code:java}
> import org.apache.spark.sql.functions.{col, split}
>
> val example = spark.createDataFrame(
>   Seq(
>     ("Doe, John", ", ", 2),
>     ("Smith,Jane", ",", 2),
>     ("Johnson", ",", 1)
>   )
> ).toDF("name", "delim", "expected_parts_count")
>
> example.createOrReplaceTempView("test_data")
>
> // works in SQL
> spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()
>
> // currently doesn't compile in Scala, but easy to support
> example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
> {code}
>  
> This is a fairly simple patch; I can open a PR soon.
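
For reference, Spark's SQL split(str, regex, limit) with a positive limit behaves like Java's String.split(regex, limit): the result has at most limit elements, with the last one holding the unsplit remainder. A minimal plain-Python sketch of those per-row semantics (the helper name split_with_limit is hypothetical, and Spark itself is not required):

```python
import re

def split_with_limit(s: str, delim: str, limit: int) -> list[str]:
    """Sketch of SQL split(str, regex, limit) per-row semantics for limit > 0:
    at most `limit` parts, the last one holding the remainder."""
    if limit == 1:
        return [s]  # the pattern is applied zero times
    # Spark treats the delimiter as a regex; maxsplit = limit - 1
    # caps the result at `limit` elements.
    return re.split(delim, s, maxsplit=limit - 1)

rows = [("Doe, John", ", ", 2), ("Smith,Jane", ",", 2), ("Johnson", ",", 1)]
for name, delim, limit in rows:
    print(split_with_limit(name, delim, limit))
# ['Doe', 'John']
# ['Smith', 'Jane']
# ['Johnson']
```

This covers only positive limits, which is all the example above exercises; Spark's handling of zero or negative limits (trailing-empty-string trimming) is not modeled here.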



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47845) Support column type in split function in scala and python

2024-04-14 Thread Liu Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Cao updated SPARK-47845:

Target Version/s:   (was: 4.0.0)







[jira] [Updated] (SPARK-47845) Support column type in split function in scala and python

2024-04-14 Thread Liu Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Cao updated SPARK-47845:

Description: 
I have a use case that requires splitting a string-typed column using delimiters 
defined in other columns of the DataFrame. SQL's split already supports this, but 
the Scala / Python functions.split APIs currently don't.

 

A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works in SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently doesn't compile in Scala, but easy to support
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}
 

This is a fairly simple patch; I can open a PR soon.

  was:
I have a use case to split a String typed column with different delimiters 
defined in other columns of the dataframe. SQL already supports this, but scala 
/ python functions currently don't.

 

A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works for SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently erroring out for scala
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}
 

Pretty simple patch that I can make a PR soon








[jira] [Updated] (SPARK-47845) Support column type in split function in scala and python

2024-04-14 Thread Liu Cao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Cao updated SPARK-47845:

Description: 
I have a use case that requires splitting a string-typed column using delimiters 
defined in other columns of the DataFrame. SQL's split already supports this, but 
the Scala / Python functions.split APIs currently don't.

 

A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works in SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently errors out in Scala
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}
 

This is a fairly simple patch; I can open a PR soon.

  was:
I have a use case to split a String typed column with different delimiters 
defined in other columns of the dataframe. SQL already supports this, but scala 
/ python functions currently don't.

 

A hypothetical example to illustrate:
{code:java}
import org.apache.spark.sql.functions.{col, split}

val example = spark.createDataFrame(
  Seq(
    ("Doe, John", ", ", 2),
    ("Smith,Jane", ",", 2),
    ("Johnson", ",", 1)
  )
).toDF("name", "delim", "expected_parts_count")

example.createOrReplaceTempView("test_data")

// works for SQL
spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()

// currently erroring out for scala
example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
{code}
 




