[ https://issues.apache.org/jira/browse/SPARK-49686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alvaro Berdonces updated SPARK-49686:
-------------------------------------
Description:
Under the scenario below, `sorted.rdd` hangs indefinitely instead of throwing the expected exception, even though the schema does not match the data types.
{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._
import scala.util.Try

val data: Seq[Row] = Seq(
  Row(1, "a"),
  Row(2, "b"),
  Row(3, "c")
)

// The schema declares "id" as StringType, but the rows hold Ints.
val schema = StructType(Seq(
  StructField("id", StringType),
  StructField("value", StringType)
))

val df = spark.createDataFrame(
  spark.sparkContext.parallelize(data), schema
)

val sorted = df.orderBy(col("value"))
Try(sorted.rdd)
sorted.rdd
{code}
A less simplified version of this error affects us when using Holden Karau's Spark Testing Base. As a workaround we force an action before the assert, but I assume this is not the expected behaviour.
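The workaround mentioned above (forcing an action before the assert) can be sketched as follows. This is a minimal sketch, not the reporter's exact test code; it assumes an active SparkSession named `spark` and the `df`/`sorted` values from the reproduction snippet.

{code:java}
// Hedged sketch of the described workaround: trigger evaluation with an
// action so the schema/data mismatch surfaces as an exception eagerly,
// instead of hanging later when sorted.rdd is evaluated.
val sorted = df.orderBy(col("value"))

// Any eager action (count, collect, etc.) forces the plan to run and
// lets the type-mismatch error be raised here.
sorted.count()

// Only after the action has validated the data do we touch the RDD,
// e.g. inside a test assertion.
val rdd = sorted.rdd
{code}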
> Spark gets stuck while evaluating the rdd of a sorted, mistyped dataframe
> -------------------------------------------------------------------------
>
>                 Key: SPARK-49686
>                 URL: https://issues.apache.org/jira/browse/SPARK-49686
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.0, 3.5.1, 3.5.2
>            Reporter: Alvaro Berdonces
>            Priority: Minor

--
This message was sent by Atlassian Jira
(v8.20.10#820010)