[jira] [Commented] (SPARK-45424) Regression in CSV schema inference when timestamps do not match specified timestampFormat

2023-10-05 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17772365#comment-17772365
 ] 

Andy Grove commented on SPARK-45424:


The regression seems to have been introduced in 
https://issues.apache.org/jira/browse/SPARK-39280 and/or 
https://issues.apache.org/jira/browse/SPARK-39281

Commits:

[https://github.com/apache/spark/commit/b1c0d599ba32a4562ae1697e3f488264f1d03c76]

[https://github.com/apache/spark/commit/3192bbd29585607d43d0819c6c2d3ac00180261a]

 

[~fanjia] Do you understand why this behavior has changed?

> Regression in CSV schema inference when timestamps do not match specified 
> timestampFormat
> -
>
> Key: SPARK-45424
> URL: https://issues.apache.org/jira/browse/SPARK-45424
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> There is a regression in Spark 3.5.0 when inferring the schema of CSV files 
> containing timestamps, where a column will be inferred as a timestamp even if 
> the contents do not match the specified timestampFormat.
> *Test Data*
> I have the following CSV file:
> {code:java}
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> {code}
> *Spark 3.4.0 Behavior (correct)*
> In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
> inferred as timestamp:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
> {code}
> If I specify an incompatible timestampFormat, then the schema is inferred as 
> string:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> {code}
> *Spark 3.5.0*
> In Spark 3.5.0, the column will be inferred as timestamp even if the data 
> does not match the specified timestampFormat.
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
> {code}
> Reading the DataFrame then results in an error:
> {code:java}
> Caused by: java.time.format.DateTimeParseException: Text 
> '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19
> {code}






[jira] [Updated] (SPARK-45424) Regression in CSV schema inference when timestamps do not match specified timestampFormat

2023-10-05 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-45424:
---
Summary: Regression in CSV schema inference when timestamps do not match 
specified timestampFormat  (was: Regression in CSV schema inference when 
timestampFormat is specified)

> Regression in CSV schema inference when timestamps do not match specified 
> timestampFormat
> -
>
> Key: SPARK-45424
> URL: https://issues.apache.org/jira/browse/SPARK-45424
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> There is a regression in Spark 3.5.0 when inferring the schema of CSV files 
> containing timestamps, where a column will be inferred as a timestamp even if 
> the contents do not match the specified timestampFormat.
> *Test Data*
> I have the following CSV file:
> {code:java}
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> {code}
> *Spark 3.4.0 Behavior (correct)*
> In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
> inferred as timestamp:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
> {code}
> If I specify an incompatible timestampFormat, then the schema is inferred as 
> string:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> {code}
> *Spark 3.5.0*
> In Spark 3.5.0, the column will be inferred as timestamp even if the data 
> does not match the specified timestampFormat.
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
> {code}
> Reading the DataFrame then results in an error:
> {code:java}
> Caused by: java.time.format.DateTimeParseException: Text 
> '2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19
> {code}






[jira] [Updated] (SPARK-45424) Regression in CSV schema inference when timestampFormat is specified

2023-10-05 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-45424:
---
Description: 
There is a regression in Spark 3.5.0 when inferring the schema of CSV files 
containing timestamps, where a column will be inferred as a timestamp even if 
the contents do not match the specified timestampFormat.

*Test Data*

I have the following CSV file:
{code:java}
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
{code}
*Spark 3.4.0 Behavior (correct)*

In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
inferred as timestamp:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
If I specify an incompatible timestampFormat, then the schema is inferred as 
string:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]
{code}
*Spark 3.5.0*

In Spark 3.5.0, the column will be inferred as timestamp even if the data does 
not match the specified timestampFormat.
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
Reading the DataFrame then results in an error:
{code:java}
Caused by: java.time.format.DateTimeParseException: Text 
'2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19
{code}
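A possible interim workaround (a sketch, not part of this report's repro): bypass inference entirely by supplying an explicit schema so the column is read as a string, then parse it afterwards with the intended format. The column name _c0 matches the name Spark assigns above; everything else is standard DataFrameReader/functions API.
{code:scala}
// Workaround sketch: read the column as a plain string, then parse explicitly.
import org.apache.spark.sql.functions.{col, to_timestamp}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(StructField("_c0", StringType)))
val strings = spark.read.schema(schema).csv("/tmp/timestamps.csv")
val parsed = strings.select(to_timestamp(col("_c0"), "yyyy-MM-dd'T'HH:mm:ss.SSS").as("ts"))
{code}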

  was:
There is a regression in Spark 3.5.0 when inferring the schema of files 
containing timestamps, where a column will be inferred as a timestamp even if 
the contents do not match the specified timestampFormat.

*Test Data*

I have the following csv file:
{code:java}
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
{code}
*Spark 3.4.0 Behavior (correct)*

In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
inferred as timestamp:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
If I specify an incompatible timestampFormat, then the schema is inferred as 
string:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]
{code}
*Spark 3.5.0*

In Spark 3.5.0, the column will be inferred as timestamp even if the data does 
not match the specified timestampFormat.
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
Reading the DataFrame then results in an error:
{code:java}
Caused by: java.time.format.DateTimeParseException: Text 
'2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19
{code}


> Regression in CSV schema inference when timestampFormat is specified
> 
>
> Key: SPARK-45424
> URL: https://issues.apache.org/jira/browse/SPARK-45424
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> There is a regression in Spark 3.5.0 when inferring the schema of CSV files 
> containing timestamps, where a column will be inferred as a timestamp even if 
> the contents do not match the specified timestampFormat.
> *Test Data*
> I have the following CSV file:
> {code:java}
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> 2884-06-24T02:45:51.138
> {code}
> *Spark 3.4.0 Behavior (correct)*
> In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
> inferred as timestamp:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
> {code}
> If I specify an incompatible timestampFormat, then the schema is inferred as 
> string:
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option("inferSchema", 
> true).csv("/tmp/timestamps.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> {code}
> *Spark 3.5.0*
> In Spark 3.5.0, the column will be inferred as timestamp even if the data 
> does not match the specified timestampFormat.
> {code:java}
> scala> val df = spark.read.option("timestampFormat", 
> "-MM-dd'T'HH:mm:ss").option(

[jira] [Created] (SPARK-45424) Regression in CSV schema inference when timestampFormat is specified

2023-10-05 Thread Andy Grove (Jira)
Andy Grove created SPARK-45424:
--

 Summary: Regression in CSV schema inference when timestampFormat 
is specified
 Key: SPARK-45424
 URL: https://issues.apache.org/jira/browse/SPARK-45424
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Andy Grove


There is a regression in Spark 3.5.0 when inferring the schema of files 
containing timestamps, where a column will be inferred as a timestamp even if 
the contents do not match the specified timestampFormat.

*Test Data*

I have the following csv file:
{code:java}
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
2884-06-24T02:45:51.138
{code}
*Spark 3.4.0 Behavior (correct)*

In Spark 3.4.0, if I specify the correct timestamp format, then the schema is 
inferred as timestamp:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss.SSS").option("inferSchema", 
true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
If I specify an incompatible timestampFormat, then the schema is inferred as 
string:
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: string]
{code}
*Spark 3.5.0*

In Spark 3.5.0, the column will be inferred as timestamp even if the data does 
not match the specified timestampFormat.
{code:java}
scala> val df = spark.read.option("timestampFormat", 
"-MM-dd'T'HH:mm:ss").option("inferSchema", true).csv("/tmp/timestamps.csv")
df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
{code}
Reading the DataFrame then results in an error:
{code:java}
Caused by: java.time.format.DateTimeParseException: Text 
'2884-06-24T02:45:51.138' could not be parsed, unparsed text found at index 19
{code}






[jira] [Updated] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-08-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-44616:
---
Description: 
PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{code:java}
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
{code}
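To make the impact concrete, below is a minimal sketch of a GenericUDF that depends on this behavior. The UDF itself (coalesce2) is hypothetical and only illustrates the pattern: when DeferredObject is lazy, the second argument is never evaluated for rows where the first is non-null; when it already holds a pre-computed value, both arguments have been evaluated before evaluate() runs.
{code:scala}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector

// Hypothetical coalesce2(a, b): return a if it is non-null, otherwise b.
class Coalesce2UDF extends GenericUDF {
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 2) {
      throw new UDFArgumentException("coalesce2 expects exactly 2 arguments")
    }
    arguments(0) // sketch: assume both arguments use the first argument's inspector
  }

  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val first = arguments(0).get()
    // Short-circuit: only evaluate the second argument when the first is null.
    if (first != null) first else arguments(1).get()
  }

  override def getDisplayString(children: Array[String]): String =
    s"coalesce2(${children.mkString(", ")})"
}
{code}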

 

  was:
PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{{{
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
}}}


> Hive Generic UDF support no longer supports short-circuiting of argument 
> evaluation
> ---
>
> Key: SPARK-44616
> URL: https://issues.apache.org/jira/browse/SPARK-44616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> PR [https://github.com/apache/spark/pull/39555] changed DeferredObject to no 
> longer contain a function, and instead contains a value. This removes the 
> deferred evaluation capability and means that HiveGenericUDF implementations 
> can no longer short-circuit the evaluation of their arguments, which could be 
> a performance issue for some users.
> Here is a relevant javadoc comment from the Hive source for DeferredObject:
> {code:java}
>   /**
>* A Defered Object allows us to do lazy-evaluation and short-circuiting.
>* GenericUDF use DeferedObject to pass arguments.
>*/
>   public static interface DeferredObject {
> {code}
>  






[jira] [Updated] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-08-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-44616:
---
Description: 
PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.

Here is a relevant javadoc comment from the Hive source for DeferredObject:

{{{
  /**
   * A Defered Object allows us to do lazy-evaluation and short-circuiting.
   * GenericUDF use DeferedObject to pass arguments.
   */
  public static interface DeferredObject {
}}}

  was:PR https://github.com/apache/spark/pull/39555 changed DeferredObject to 
no longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.


> Hive Generic UDF support no longer supports short-circuiting of argument 
> evaluation
> ---
>
> Key: SPARK-44616
> URL: https://issues.apache.org/jira/browse/SPARK-44616
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Andy Grove
>Priority: Major
>
> PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
> longer contain a function, and instead contains a value. This removes the 
> deferred evaluation capability and means that HiveGenericUDF implementations 
> can no longer short-circuit the evaluation of their arguments, which could be 
> a performance issue for some users.
> Here is a relevant javadoc comment from the Hive source for DeferredObject:
> {{{
>   /**
>* A Defered Object allows us to do lazy-evaluation and short-circuiting.
>* GenericUDF use DeferedObject to pass arguments.
>*/
>   public static interface DeferredObject {
> }}}






[jira] [Created] (SPARK-44616) Hive Generic UDF support no longer supports short-circuiting of argument evaluation

2023-07-31 Thread Andy Grove (Jira)
Andy Grove created SPARK-44616:
--

 Summary: Hive Generic UDF support no longer supports 
short-circuiting of argument evaluation
 Key: SPARK-44616
 URL: https://issues.apache.org/jira/browse/SPARK-44616
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Andy Grove


PR https://github.com/apache/spark/pull/39555 changed DeferredObject to no 
longer contain a function, and instead contains a value. This removes the 
deferred evaluation capability and means that HiveGenericUDF implementations 
can no longer short-circuit the evaluation of their arguments, which could be a 
performance issue for some users.






[jira] [Updated] (SPARK-39991) AQE should use available column statistics from completed query stages

2022-08-05 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-39991:
---
Description: 
In QueryStageExec.computeStats we copy partial statistics from materialized query 
stages by calling QueryStageExec#getRuntimeStatistics, which in turn calls 
ShuffleExchangeLike#runtimeStatistics or 
BroadcastExchangeLike#runtimeStatistics.

Only dataSize and numOutputRows are copied into the new Statistics object:

 {code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
    val runtimeStats = getRuntimeStatistics
    val dataSize = runtimeStats.sizeInBytes.max(0)
    val numOutputRows = runtimeStats.rowCount.map(_.max(0))
    Some(Statistics(dataSize, numOutputRows, isRuntime = true))
  } else {
    None
  }
{code}

I would like to also copy over the column statistics stored in 
Statistics.attributeMap so that they can be fed back into the logical plan 
optimization phase. This is a small change as shown below:

{code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
val runtimeStats = getRuntimeStatistics
val dataSize = runtimeStats.sizeInBytes.max(0)
val numOutputRows = runtimeStats.rowCount.map(_.max(0))
val attributeStats = runtimeStats.attributeStats
Some(Statistics(dataSize, numOutputRows, attributeStats, isRuntime = true))
  } else {
None
  }
{code}

The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do 
not currently provide such column statistics, but other custom implementations 
can.
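
For illustration, a custom exchange node could build such a Statistics object roughly as follows. This is only a sketch: the helper takes whatever per-column ColumnStat values the custom implementation has gathered itself (Spark's built-in exchanges do not gather them) and packages them into attributeStats.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap}
import org.apache.spark.sql.catalyst.plans.logical.{ColumnStat, Statistics}

// Sketch: build the Statistics returned from a custom runtimeStatistics so that
// computeStats can forward per-column stats via attributeStats.
// e.g. perColumn = Seq(attr -> ColumnStat(distinctCount = Some(BigInt(100)), nullCount = Some(BigInt(0))))
def runtimeStatsWithColumns(
    dataSize: BigInt,
    rowCount: BigInt,
    perColumn: Seq[(Attribute, ColumnStat)]): Statistics = {
  Statistics(
    sizeInBytes = dataSize,
    rowCount = Some(rowCount),
    attributeStats = AttributeMap(perColumn),
    isRuntime = true)
}
{code}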

  was:
n QueryStageExec.computeStats we copy partial statistics from materlized query 
stages by calling QueryStageExec#getRuntimeStatistics, which in turn calls 
ShuffleExchangeLike#runtimeStatistics or 
BroadcastExchangeLike#runtimeStatistics.

Only dataSize and numOutputRows are copied into the new Statistics object:

 {code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
    val runtimeStats = getRuntimeStatistics
    val dataSize = runtimeStats.sizeInBytes.max(0)
    val numOutputRows = runtimeStats.rowCount.map(_.max(0))
    Some(Statistics(dataSize, numOutputRows, isRuntime = true))
  } else {
    None
  }
{code}

I would like to also copy over the column statistics stored in 
Statistics.attributeMap so that they can be fed back into the logical plan 
optimization phase. This is a small change as shown below:

{code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
val runtimeStats = getRuntimeStatistics
val dataSize = runtimeStats.sizeInBytes.max(0)
val numOutputRows = runtimeStats.rowCount.map(_.max(0))
val attributeStats = runtimeStats.attributeStats
Some(Statistics(dataSize, numOutputRows, attributeStats, isRuntime = true))
  } else {
None
  }
{code}

The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do 
not currently provide such column statistics, but other custom implementations 
can.


> AQE should use available column statistics from completed query stages
> --
>
> Key: SPARK-39991
> URL: https://issues.apache.org/jira/browse/SPARK-39991
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Andy Grove
>Priority: Major
>
> In QueryStageExec.computeStats we copy partial statistics from materialized 
> query stages by calling QueryStageExec#getRuntimeStatistics, which in turn 
> calls ShuffleExchangeLike#runtimeStatistics or 
> BroadcastExchangeLike#runtimeStatistics.
> Only dataSize and numOutputRows are copied into the new Statistics object:
>  {code:scala}
>   def computeStats(): Option[Statistics] = if (isMaterialized) {
>     val runtimeStats = getRuntimeStatistics
>     val dataSize = runtimeStats.sizeInBytes.max(0)
>     val numOutputRows = runtimeStats.rowCount.map(_.max(0))
>     Some(Statistics(dataSize, numOutputRows, isRuntime = true))
>   } else {
>     None
>   }
> {code}
> I would like to also copy over the column statistics stored in 
> Statistics.attributeMap so that they can be fed back into the logical plan 
> optimization phase. This is a small change as shown below:
> {code:scala}
>   def computeStats(): Option[Statistics] = if (isMaterialized) {
> val runtimeStats = getRuntimeStatistics
> val dataSize = runtimeStats.sizeInBytes.max(0)
> val numOutputRows = runtimeStats.rowCount.map(_.max(0))
> val attributeStats = runtimeStats.attributeStats
> Some(Statistics(dataSize, numOutputRows, attributeStats, isRuntime = 
> true))
>   } else {
> None
>   }
> {code}
> The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do 
> not currently provide such column statistics, but other custom 
> implementations can.




[jira] [Created] (SPARK-39991) AQE should use available column statistics from completed query stages

2022-08-05 Thread Andy Grove (Jira)
Andy Grove created SPARK-39991:
--

 Summary: AQE should use available column statistics from completed 
query stages
 Key: SPARK-39991
 URL: https://issues.apache.org/jira/browse/SPARK-39991
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Andy Grove


In QueryStageExec.computeStats we copy partial statistics from materialized query 
stages by calling QueryStageExec#getRuntimeStatistics, which in turn calls 
ShuffleExchangeLike#runtimeStatistics or 
BroadcastExchangeLike#runtimeStatistics.

Only dataSize and numOutputRows are copied into the new Statistics object:

 {code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
    val runtimeStats = getRuntimeStatistics
    val dataSize = runtimeStats.sizeInBytes.max(0)
    val numOutputRows = runtimeStats.rowCount.map(_.max(0))
    Some(Statistics(dataSize, numOutputRows, isRuntime = true))
  } else {
    None
  }
{code}

I would like to also copy over the column statistics stored in 
Statistics.attributeMap so that they can be fed back into the logical plan 
optimization phase. This is a small change as shown below:

{code:scala}
  def computeStats(): Option[Statistics] = if (isMaterialized) {
val runtimeStats = getRuntimeStatistics
val dataSize = runtimeStats.sizeInBytes.max(0)
val numOutputRows = runtimeStats.rowCount.map(_.max(0))
val attributeStats = runtimeStats.attributeStats
Some(Statistics(dataSize, numOutputRows, attributeStats, isRuntime = true))
  } else {
None
  }
{code}

The Spark implementations of ShuffleExchangeLike and BroadcastExchangeLike do 
not currently provide such column statistics, but other custom implementations 
can.






[jira] [Commented] (SPARK-30788) Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp formatters

2022-03-08 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503123#comment-17503123
 ] 

Andy Grove commented on SPARK-30788:


[~cloud_fan] [~maxgekk] I think the fix version on this issue should be 3.3.0 
and not 3.0.0 ?

> Support `SimpleDateFormat` and `FastDateFormat` as legacy date/timestamp 
> formatters
> ---
>
> Key: SPARK-30788
> URL: https://issues.apache.org/jira/browse/SPARK-30788
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> To be absolutely sure that Spark 3.0 is compatible with 2.4 when 
> spark.sql.legacy.timeParser.enabled is set to true, need to support 
> SimpleDateFormat and FastDateFormat as legacy parsers/formatters in 
> TimestampFormatter. 
> Spark 2.4.x uses the following parsers for parsing/formatting date/timestamp 
> strings:
> # DateTimeFormat in CSV/JSON datasource
> # SimpleDateFormat - is used in JDBC datasource, in partitions parsing.
> # SimpleDateFormat in strong mode (lenient = false). It is used by the 
> date_format, from_unixtime, unix_timestamp and to_unix_timestamp functions.
> Spark 3.0 should use the same parsers in those cases.






[jira] [Updated] (SPARK-38060) Inconsistent behavior from JSON option allowNonNumericNumbers

2022-01-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-38060:
---
Description: 
The behavior of the JSON option allowNonNumericNumbers is not consistent:

1. Some NaN and Infinity values are still parsed when the option is set to false

2. Some values are parsed differently depending on whether they are quoted or 
not (see results for positive and negative Infinity)
h2. Input data
{code:java}
{ "number": "NaN" }
{ "number": NaN }
{ "number": "+INF" }
{ "number": +INF }
{ "number": "-INF" }
{ "number": -INF }
{ "number": "INF" }
{ "number": INF }
{ "number": Infinity }
{ "number": +Infinity }
{ "number": -Infinity }
{ "number": "Infinity" }
{ "number": "+Infinity" }
{ "number": "-Infinity" }
{code}
h2. Setup
{code:java}
import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("number", DataTypes.FloatType, false))) 
{code}
h2. allowNonNumericNumbers = false
{code:java}
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"false").json("nan_valid.json")

df.show

+-+
|   number|
+-+
|      NaN|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
| Infinity|
|     null|
|-Infinity|
+-+ {code}
h2. allowNonNumericNumbers = true
{code:java}
val df = 
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"true").json("nan_valid.json") 

df.show

+-+
|   number|
+-+
|      NaN|
|      NaN|
|     null|
| Infinity|
|     null|
|-Infinity|
|     null|
|     null|
| Infinity|
| Infinity|
|-Infinity|
| Infinity|
|     null|
|-Infinity|
+-+{code}

  was:
The behavior of the JSON option allowNonNumericNumbers is not consistent and 
still supports parsing NaN and Infinity values in some cases when the option is 
set to false.
h2. Input data
{code:java}
{ "number": "NaN" }
{ "number": NaN }
{ "number": "+INF" }
{ "number": +INF }
{ "number": "-INF" }
{ "number": -INF }
{ "number": "INF" }
{ "number": INF }
{ "number": Infinity }
{ "number": +Infinity }
{ "number": -Infinity }
{ "number": "Infinity" }
{ "number": "+Infinity" }
{ "number": "-Infinity" }
{code}
h2. Setup
{code:java}
import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("number", DataTypes.FloatType, false))) 
{code}
h2. allowNonNumericNumbers = false
{code:java}
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"false").json("nan_valid.json")

df.show

+-+
|   number|
+-+
|      NaN|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
| Infinity|
|     null|
|-Infinity|
+-+ {code}
h2. allowNonNumericNumbers = true
{code:java}
val df = 
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"true").json("nan_valid.json") 

df.show

+-+
|   number|
+-+
|      NaN|
|      NaN|
|     null|
| Infinity|
|     null|
|-Infinity|
|     null|
|     null|
| Infinity|
| Infinity|
|-Infinity|
| Infinity|
|     null|
|-Infinity|
+-+{code}


> Inconsistent behavior from JSON option allowNonNumericNumbers
> -
>
> Key: SPARK-38060
> URL: https://issues.apache.org/jira/browse/SPARK-38060
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
> Environment: Running Spark 3.2.0 in local mode on Ubuntu 20.04.3 LTS
>Reporter: Andy Grove
>Priority: Minor
>
> The behavior of the JSON option allowNonNumericNumbers is not consistent:
> 1. Some NaN and Infinity values are still parsed when the option is set to 
> false
> 2. Some values are parsed differently depending on whether they are quoted or 
> not (see results for positive and negative Infinity)
> h2. Input data
> {code:java}
> { "number": "NaN" }
> { "number": NaN }
> { "number": "+INF" }
> { "number": +INF }
> { "number": "-INF" }
> { "number": -INF }
> { "number": "INF" }
> { "number": INF }
> { "number": Infinity }
> { "number": +Infinity }
> { "number": -Infinity }
> { "number": "Infinity" }
> { "number": "+Infinity" }
> { "number": "-Infinity" }
> {code}
> h2. Setup
> {code:java}
> import org.apache.spark.sql.types._
> val schema = StructType(Seq(StructField("number", DataTypes.FloatType, 
> false))) {code}
> h2. allowNonNumericNumbers = false
> {code:java}
> spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
> "false").json("nan_valid.json")
> df.show
> +-+
> |   number|
> +-+
> |      NaN|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> |     null|
> | Infinity|
> |     null|
> |-Infinity|
> +-+ {code}
> h2. allowNonNumericNumbers =

[jira] [Created] (SPARK-38060) Inconsistent behavior from JSON option allowNonNumericNumbers

2022-01-28 Thread Andy Grove (Jira)
Andy Grove created SPARK-38060:
--

 Summary: Inconsistent behavior from JSON option 
allowNonNumericNumbers
 Key: SPARK-38060
 URL: https://issues.apache.org/jira/browse/SPARK-38060
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
 Environment: Running Spark 3.2.0 in local mode on Ubuntu 20.04.3 LTS
Reporter: Andy Grove


The behavior of the JSON option allowNonNumericNumbers is not consistent and 
still supports parsing NaN and Infinity values in some cases when the option is 
set to false.
h2. Input data
{code:java}
{ "number": "NaN" }
{ "number": NaN }
{ "number": "+INF" }
{ "number": +INF }
{ "number": "-INF" }
{ "number": -INF }
{ "number": "INF" }
{ "number": INF }
{ "number": Infinity }
{ "number": +Infinity }
{ "number": -Infinity }
{ "number": "Infinity" }
{ "number": "+Infinity" }
{ "number": "-Infinity" }
{code}
h2. Setup
{code:java}
import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("number", DataTypes.FloatType, false))) 
{code}
h2. allowNonNumericNumbers = false
{code:java}
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"false").json("nan_valid.json")

df.show

+-+
|   number|
+-+
|      NaN|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
|     null|
| Infinity|
|     null|
|-Infinity|
+-+ {code}
h2. allowNonNumericNumbers = true
{code:java}
val df = 
spark.read.format("json").schema(schema).option("allowNonNumericNumbers", 
"true").json("nan_valid.json") 

df.show

+-+
|   number|
+-+
|      NaN|
|      NaN|
|     null|
| Infinity|
|     null|
|-Infinity|
|     null|
|     null|
| Infinity|
| Infinity|
|-Infinity|
| Infinity|
|     null|
|-Infinity|
+-+{code}






[jira] [Created] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec

2021-09-03 Thread Andy Grove (Jira)
Andy Grove created SPARK-36666:
--

 Summary: [SQL] Regression in AQEShuffleReadExec
 Key: SPARK-36666
 URL: https://issues.apache.org/jira/browse/SPARK-36666
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Andy Grove


I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 
3.2 release candidate and there is a regression in AQEShuffleReadExec where it 
now throws an exception if the shuffle's output partitioning does not match a 
specific list of schemes.

The problem can be solved by returning UnknownPartitioning, as it does in some 
cases, rather than throwing an exception.






[jira] [Updated] (SPARK-36666) [SQL] Regression in AQEShuffleReadExec

2021-09-03 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-36666:
---
Description: 
I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 
3.2 release candidate and there is a regression in AQEShuffleReadExec where it 
now throws an exception if the shuffle's output partitioning does not match a 
specific list of schemes.

The problem can be solved by returning UnknownPartitioning, as it already does 
in some cases, rather than throwing an exception.

  was:
I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 
3.2 release candidate and there is a regression in AQEShuffleReadExec where it 
now throws an exception if the shuffle's output partitioning does not match a 
specific list of schemes.

The problem can be solved by returning UnknownPartitioning, as it does in some 
cases, rather than throwing an exception.


> [SQL] Regression in AQEShuffleReadExec
> --
>
> Key: SPARK-36666
> URL: https://issues.apache.org/jira/browse/SPARK-36666
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> I am currently testing the RAPIDS Accelerator for Apache Spark with the Spark 
> 3.2 release candidate and there is a regression in AQEShuffleReadExec where 
> it now throws an exception if the shuffle's output partitioning does not 
> match a specific list of schemes.
> The problem can be solved by returning UnknownPartitioning, as it already 
> does in some cases, rather than throwing an exception.






[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to timestamp

2021-07-09 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-36076:
---
Summary: [SQL] ArrayIndexOutOfBounds in CAST string to timestamp  (was: 
[SQL] ArrayIndexOutOfBounds in CAST string to date)

> [SQL] ArrayIndexOutOfBounds in CAST string to timestamp
> ---
>
> Key: SPARK-36076
> URL: https://issues.apache.org/jira/browse/SPARK-36076
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Priority: Major
>
> I discovered this bug during some fuzz testing.
> {code:java}
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>   /_/
>  
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
> Type in expressions to have them evaluated.
> Type :help for more information.scala> 
> scala> import org.apache.spark.sql.types.DataTypes
> scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
> df: org.apache.spark.sql.DataFrame = [c0: string]
> scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
> df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]
> scala> df2.show
> java.lang.ArrayIndexOutOfBoundsException: 9
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
>  {code}






[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date

2021-07-09 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-36076:
---
Description: 
I discovered this bug during some fuzz testing.
{code:java}
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
  /_/
 
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.scala> 

scala> import org.apache.spark.sql.types.DataTypes

scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
df: org.apache.spark.sql.DataFrame = [c0: string]

scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]

scala> df2.show
java.lang.ArrayIndexOutOfBoundsException: 9
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
 {code}

  was:
{code:java}
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
  /_/
 
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.scala> 

scala> import org.apache.spark.sql.types.DataTypes

scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
df: org.apache.spark.sql.DataFrame = [c0: string]

scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]

scala> df2.show
java.lang.ArrayIndexOutOfBoundsException: 9
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
 {code}


> [SQL] ArrayIndexOutOfBounds in CAST string to date
> --
>
> Key: SPARK-36076
> URL: https://issues.apache.org/jira/browse/SPARK-36076
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Priority: Major
>
> I discovered this bug during some fuzz testing.
> {code:java}
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>   /_/
>  
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
> Type in expressions to have them evaluated.
> Type :help for more information.scala> 
> scala> import org.apache.spark.sql.types.DataTypes
> scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
> df: org.apache.spark.sql.DataFrame = [c0: string]
> scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
> df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]
> scala> df2.show
> java.lang.ArrayIndexOutOfBoundsException: 9
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
>  {code}






[jira] [Updated] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date

2021-07-09 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-36076:
---
Description: 
{code:java}
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
  /_/
 
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.scala> 

scala> import org.apache.spark.sql.types.DataTypes

scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
df: org.apache.spark.sql.DataFrame = [c0: string]

scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]

scala> df2.show
java.lang.ArrayIndexOutOfBoundsException: 9
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
 {code}

  was:
{code:java}
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
  /_/
 
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.scala> 
spark.conf.set("spark.rapids.sql.enabled", "false")scala> val df = 
Seq(":8:434421+ 98:38").toDF("c0")
df: org.apache.spark.sql.DataFrame = [c0: string]scala> val df2 = 
df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
:25: error: not found: value DataTypes
   val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
^scala> import 
org.spark.sql.types.DataTypes
:23: error: object spark is not a member of package org
   import org.spark.sql.types.DataTypes
  ^scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypesscala> val df2 = df.withColumn("c1", 
col("c0").cast(DataTypes.TimestampType))
df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]scala> df2.show
java.lang.ArrayIndexOutOfBoundsException: 9
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
 {code}


> [SQL] ArrayIndexOutOfBounds in CAST string to date
> --
>
> Key: SPARK-36076
> URL: https://issues.apache.org/jira/browse/SPARK-36076
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Priority: Major
>
> {code:java}
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.1.1
>   /_/
>  
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
> Type in expressions to have them evaluated.
> Type :help for more information.scala> 
> scala> import org.apache.spark.sql.types.DataTypes
> scala> val df = Seq(":8:434421+ 98:38").toDF("c0")
> df: org.apache.spark.sql.DataFrame = [c0: string]
> scala> val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
> df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]
> scala> df2.show
> java.lang.ArrayIndexOutOfBoundsException: 9
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
>   at 
> org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
>  {code}




[jira] [Created] (SPARK-36076) [SQL] ArrayIndexOutOfBounds in CAST string to date

2021-07-09 Thread Andy Grove (Jira)
Andy Grove created SPARK-36076:
--

 Summary: [SQL] ArrayIndexOutOfBounds in CAST string to date
 Key: SPARK-36076
 URL: https://issues.apache.org/jira/browse/SPARK-36076
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Andy Grove


{code:java}
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
  /_/
 
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_282)
Type in expressions to have them evaluated.
Type :help for more information.scala> 
spark.conf.set("spark.rapids.sql.enabled", "false")scala> val df = 
Seq(":8:434421+ 98:38").toDF("c0")
df: org.apache.spark.sql.DataFrame = [c0: string]scala> val df2 = 
df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
:25: error: not found: value DataTypes
   val df2 = df.withColumn("c1", col("c0").cast(DataTypes.TimestampType))
^scala> import 
org.spark.sql.types.DataTypes
:23: error: object spark is not a member of package org
   import org.spark.sql.types.DataTypes
  ^scala> import org.apache.spark.sql.types.DataTypes
import org.apache.spark.sql.types.DataTypesscala> val df2 = df.withColumn("c1", 
col("c0").cast(DataTypes.TimestampType))
df2: org.apache.spark.sql.DataFrame = [c0: string, c1: timestamp]scala> df2.show
java.lang.ArrayIndexOutOfBoundsException: 9
  at 
org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTimestamp(DateTimeUtils.scala:328)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$2(Cast.scala:455)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.buildCast(Cast.scala:295)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.$anonfun$castToTimestamp$1(Cast.scala:451)
  at 
org.apache.spark.sql.catalyst.expressions.CastBase.nullSafeEval(Cast.scala:840)
  at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:476)
 {code}






[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-29 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Affects Version/s: 3.0.3
   3.1.2

> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.3, 3.1.2, 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.
>  
>  
>  
>  






[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-28 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
supportsColumnar method also always returns false.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing the 
doExecute methods on AdaptiveSparkPlanExec. We would like a supported mechanism 
for executing a columnar AQE plan so that we do not need to use reflection.
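
For reference, the reflection workaround looks roughly like this (a sketch; the getFinalPhysicalPlan name is the private method mentioned above, the rest is illustrative):

{code:scala}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec

// Sketch of the unsupported workaround: invoke the private getFinalPhysicalPlan
// method via reflection, then choose the row-based or columnar execution path
// based on the final stage that comes back.
def executeFinalStage(aqe: AdaptiveSparkPlanExec): Unit = {
  val m = classOf[AdaptiveSparkPlanExec].getDeclaredMethod("getFinalPhysicalPlan")
  m.setAccessible(true)
  val finalPlan = m.invoke(aqe).asInstanceOf[SparkPlan]
  if (finalPlan.supportsColumnar) {
    finalPlan.executeColumnar() // RDD[ColumnarBatch]
  } else {
    finalPlan.execute()         // RDD[InternalRow]
  }
}
{code}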

 

 

 

 

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then calling the appropriate doExecute method, bypassing 
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality. We also need a mechanism for invoking 
finalPlanUpdate after the query has been executed.

 

 

 


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The 
> supportsColumnar method also always returns false.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, bypassing 
> the doExecute methods on AdaptiveSparkPlanExec. We would like a supported 
> mechanism for executing a columnar AQE plan so that we do not need to use 
> reflection.
>  
>  
>  
>  






[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-24 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing 
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality. We also need a mechanism for invoking 
finalPlanUpdate after the query has been executed.

 

 

 

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing 
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.

 

 

 


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
> which is another limitation. However, AQE is special because we don't know if 
> the final stage will be columnar or not until the child stages have been 
> executed and the final stage has been re-planned and re-optimized, so we 
> can't easily change the behavior of supportsColumnar. We can't just implement 
> doExecuteColumnar because we don't know whether the final stage will be 
> columnar or not until after we start executing the query.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, 
> bypassing the doExecute methods on AdaptiveSparkPlanExec.
> I propose that we make getFinalPhysicalPlan public, and part of the developer 
> API, so that columnar plugins 

[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-24 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35881:
---
Description: 
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then determine if that plan is 
columnar or not, and then call the appropriate doExecute method, bypassing 
the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.

 

 

 

  was:
In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then invoke that plan, 
bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.

 

 

 


> [SQL] AQE does not support columnar execution for the final query stage
> ---
>
> Key: SPARK-35881
> URL: https://issues.apache.org/jira/browse/SPARK-35881
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Andy Grove
>Priority: Major
>
> In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
> are executed until the entire query has been executed. These stages can be 
> row-based or columnar. However, the final stage, produced by the private 
> getFinalPhysicalPlan method, is always assumed to be row-based. The only way 
> to execute the final stage is by calling the various doExecute methods on 
> AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
> which is another limitation. However, AQE is special because we don't know if 
> the final stage will be columnar or not until the child stages have been 
> executed and the final stage has been re-planned and re-optimized, so we 
> can't easily change the behavior of supportsColumnar. We can't just implement 
> doExecuteColumnar because we don't know whether the final stage will be 
> columnar or not until after we start executing the query.
> In the RAPIDS Accelerator for Apache Spark, we currently call the private 
> getFinalPhysicalPlan method using reflection and then determine if that plan 
> is columnar or not, and then call the appropriate doExecute method, 
> bypassing the doExecute methods on AdaptiveSparkPlanExec.
> I propose that we make getFinalPhysicalPlan public, and part of the developer 
> API, so that columnar plugins can call this method and determine if the final 
> stage is columnar or not, and execute it appropriately. This would not affect 
> any existing Spark functionality.
>

[jira] [Created] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage

2021-06-24 Thread Andy Grove (Jira)
Andy Grove created SPARK-35881:
--

 Summary: [SQL] AQE does not support columnar execution for the 
final query stage
 Key: SPARK-35881
 URL: https://issues.apache.org/jira/browse/SPARK-35881
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Andy Grove


In AdaptiveSparkPlanExec, a query is broken down into stages and these stages 
are executed until the entire query has been executed. These stages can be 
row-based or columnar. However, the final stage, produced by the private 
getFinalPhysicalPlan method, is always assumed to be row-based. The only way to 
execute the final stage is by calling the various doExecute methods on 
AdaptiveSparkPlanExec. The supportsColumnar method also always returns false, 
which is another limitation. However, AQE is special because we don't know if 
the final stage will be columnar or not until the child stages have been 
executed and the final stage has been re-planned and re-optimized, so we can't 
easily change the behavior of supportsColumnar. We can't just implement 
doExecuteColumnar because we don't know whether the final stage will be 
columnar or not until after we start executing the query.

In the RAPIDS Accelerator for Apache Spark, we currently call the private 
getFinalPhysicalPlan method using reflection and then invoke that plan, 
bypassing the doExecute methods on AdaptiveSparkPlanExec.

I propose that we make getFinalPhysicalPlan public, and part of the developer 
API, so that columnar plugins can call this method and determine if the final 
stage is columnar or not, and execute it appropriately. This would not affect 
any existing Spark functionality.

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35767) [SQL] CoalesceExec can execute child plan twice

2021-06-15 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35767:
---
Priority: Minor  (was: Major)

> [SQL] CoalesceExec can execute child plan twice
> ---
>
> Key: SPARK-35767
> URL: https://issues.apache.org/jira/browse/SPARK-35767
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.2
>Reporter: Andy Grove
>Priority: Minor
>
> CoalesceExec calls `child.execute()` in the if condition and throws away the 
> results, then calls `child.execute()` again in the else condition. This could 
> cause a section of the plan to be executed twice.
> {code:java}
> protected override def doExecute(): RDD[InternalRow] = {
>   if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
> // Make sure we don't output an RDD with 0 partitions, when claiming that 
> we have a
> // `SinglePartition`.
> new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
>   } else {
> child.execute().coalesce(numPartitions, shuffle = false)
>   }
> } {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35767) [SQL] CoalesceExec can execute child plan twice

2021-06-15 Thread Andy Grove (Jira)
Andy Grove created SPARK-35767:
--

 Summary: [SQL] CoalesceExec can execute child plan twice
 Key: SPARK-35767
 URL: https://issues.apache.org/jira/browse/SPARK-35767
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.2, 3.0.2
Reporter: Andy Grove


CoalesceExec calls `child.execute()` in the if condition and throws away the 
results, then calls `child.execute()` again in the else condition. This could 
cause a section of the plan to be executed twice.
{code:java}
protected override def doExecute(): RDD[InternalRow] = {
  if (numPartitions == 1 && child.execute().getNumPartitions < 1) {
// Make sure we don't output an RDD with 0 partitions, when claiming that 
we have a
// `SinglePartition`.
new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
  } else {
child.execute().coalesce(numPartitions, shuffle = false)
  }
} {code}
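
For illustration, a minimal sketch of the kind of fix this implies (an assumption, 
not necessarily the change that was merged) is to materialize the child RDD once 
and reuse it in both branches:
{code:java}
// Sketch only: evaluate child.execute() exactly once and reuse the result.
protected override def doExecute(): RDD[InternalRow] = {
  val childRDD = child.execute()
  if (numPartitions == 1 && childRDD.getNumPartitions < 1) {
    // Make sure we don't output an RDD with 0 partitions, when claiming that
    // we have a `SinglePartition`.
    new CoalesceExec.EmptyRDDWithPartitions(sparkContext, numPartitions)
  } else {
    childRDD.coalesce(numPartitions, shuffle = false)
  }
}
{code}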



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32304) Flaky Test: AdaptiveQueryExecSuite.multiple joins

2021-05-19 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347817#comment-17347817
 ] 

Andy Grove commented on SPARK-32304:


[~dongjoon] This is the issue we just saw in branch-3.0.

 

 

> Flaky Test: AdaptiveQueryExecSuite.multiple joins
> -
>
> Key: SPARK-32304
> URL: https://issues.apache.org/jira/browse/SPARK-32304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> {code}
> AdaptiveQueryExecSuite:
> - Change merge join to broadcast join (313 milliseconds)
> - Reuse the parallelism of CoalescedShuffleReaderExec in 
> LocalShuffleReaderExec (265 milliseconds)
> - Reuse the default parallelism in LocalShuffleReaderExec (230 milliseconds)
> - Empty stage coalesced to 0-partition RDD (514 milliseconds)
> - Scalar subquery (406 milliseconds)
> - Scalar subquery in later stages (500 milliseconds)
> - multiple joins *** FAILED *** (739 milliseconds)
>   ArrayBuffer(BroadcastHashJoin [b#251429], [a#251438], Inner, BuildLeft
>   :- BroadcastQueryStage 5
>   :  +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[3, int, 
> false] as bigint))), [id=#504817]
>   : +- CustomShuffleReader local
>   :+- ShuffleQueryStage 4
>   :   +- Exchange hashpartitioning(b#251429, 5), true, [id=#504777]
>   :  +- *(7) SortMergeJoin [key#251418], [a#251428], Inner
>   : :- *(5) Sort [key#251418 ASC NULLS FIRST], false, 0
>   : :  +- CustomShuffleReader coalesced
>   : : +- ShuffleQueryStage 0
>   : :+- Exchange hashpartitioning(key#251418, 5), 
> true, [id=#504656]
>   : :   +- *(1) Filter (isnotnull(value#251419) AND 
> (cast(value#251419 as int) = 1))
>   : :  +- *(1) SerializeFromObject 
> [knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData, true])).key AS key#251418, 
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData, true])).value, true, false) 
> AS value#251419]
>   : : +- Scan[obj#251417]
>   : +- *(6) Sort [a#251428 ASC NULLS FIRST], false, 0
>   :+- CustomShuffleReader coalesced
>   :   +- ShuffleQueryStage 1
>   :  +- Exchange hashpartitioning(a#251428, 5), true, 
> [id=#504663]
>   : +- *(2) Filter (b#251429 = 1)
>   :+- *(2) SerializeFromObject 
> [knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData2, true])).a AS a#251428, 
> knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData2, true])).b AS b#251429]
>   :   +- Scan[obj#251427]
>   +- BroadcastHashJoin [n#251498], [a#251438], Inner, BuildRight
>  :- CustomShuffleReader local
>  :  +- ShuffleQueryStage 2
>  : +- Exchange hashpartitioning(n#251498, 5), true, [id=#504680]
>  :+- *(3) Filter (n#251498 = 1)
>  :   +- *(3) SerializeFromObject 
> [knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$LowerCaseData, true])).n AS n#251498, 
> staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$LowerCaseData, true])).l, true, false) 
> AS l#251499]
>  :  +- Scan[obj#251497]
>  +- BroadcastQueryStage 6
> +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, 
> int, false] as bigint))), [id=#504830]
>+- CustomShuffleReader local
>   +- ShuffleQueryStage 3
>  +- Exchange hashpartitioning(a#251438, 5), true, [id=#504694]
> +- *(4) Filter (a#251438 = 1)
>+- *(4) SerializeFromObject 
> [knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData3, true])).a AS a#251438, 
> unwrapoption(IntegerType, knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$TestData3, true])).b) AS b#251439]
>   +- Scan[obj#251437]
>   , BroadcastHashJoin [n#251498], [a#251438], Inner, BuildRight
>   :- CustomShuffleReader local
>   :  +- ShuffleQueryStage 2
>   : +- Exchange hashpartitioning(n#251498, 5), true, [id=#504680]
>   :+- *(3) Filter (n#251498 = 1)
>   :   +- *(3) SerializeFromObject 
> [knownnotnull(assertnotnull(input[0, 
> org.apache.spark.sql.test.SQLTestData$Lowe

[jira] [Updated] (SPARK-35353) Cross-building docker images to ARM64 is failing (with Ubuntu host)

2021-05-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-35353:
---
Summary: Cross-building docker images to ARM64 is failing (with Ubuntu 
host)  (was: Cross-building docker images to ARM64 is failing)

> Cross-building docker images to ARM64 is failing (with Ubuntu host)
> ---
>
> Key: SPARK-35353
> URL: https://issues.apache.org/jira/browse/SPARK-35353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Priority: Minor
>
> I was trying to cross-build Spark 3.1.1 for ARM64 so that I could deploy to a 
> Raspberry Pi Kubernetes cluster this weekend and the Docker build fails.
> Here are the commands I used:
> {code:java}
> docker buildx create --use
> ./bin/docker-image-tool.sh -n -r andygrove -t 3.1.1 -X build {code}
> The Docker build for ARM64 fails on the following command:
> {code:java}
>  apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps{code}
> The install fails with "Error while loading /usr/sbin/dpkg-split: No such 
> file or directory".
> Here is a fragment of the output showing the relevant error message.
> {code:java}
> #6 6.034 Get:35 https://deb.debian.org/debian buster/main arm64 libnss3 arm64 
> 2:3.42.1-1+deb10u3 [1082 kB]
> #6 6.102 Get:36 https://deb.debian.org/debian buster/main arm64 psmisc arm64 
> 23.2-1 [122 kB]
> #6 6.109 Get:37 https://deb.debian.org/debian buster/main arm64 tini arm64 
> 0.18.0-1 [194 kB]
> #6 6.767 debconf: delaying package configuration, since apt-utils is not 
> installed
> #6 6.883 Fetched 18.1 MB in 1s (13.4 MB/s)
> #6 6.956 Error while loading /usr/sbin/dpkg-split: No such file or directory
> #6 6.959 Error while loading /usr/sbin/dpkg-deb: No such file or directory
> #6 6.961 dpkg: error processing archive 
> /tmp/apt-dpkg-install-NdOR40/00-libncurses6_6.1+20181013-2+deb10u2_arm64.deb 
> (--unpack):
>  {code}
> My host environment details:
>  * Ubuntu 18.04.5 LTS
>  * Docker version 20.10.6, build 370c289
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35353) Cross-building docker images to ARM64 is failing

2021-05-10 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342066#comment-17342066
 ] 

Andy Grove commented on SPARK-35353:


The issue seems specific to running on an Ubuntu host. I reproduced the issue 
on two computers. It works fine on my Macbook Pro though.

> Cross-building docker images to ARM64 is failing
> 
>
> Key: SPARK-35353
> URL: https://issues.apache.org/jira/browse/SPARK-35353
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.1
>Reporter: Andy Grove
>Priority: Minor
>
> I was trying to cross-build Spark 3.1.1 for ARM64 so that I could deploy to a 
> Raspberry Pi Kubernetes cluster this weekend and the Docker build fails.
> Here are the commands I used:
> {code:java}
> docker buildx create --use
> ./bin/docker-image-tool.sh -n -r andygrove -t 3.1.1 -X build {code}
> The Docker build for ARM64 fails on the following command:
> {code:java}
>  apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps{code}
> The install fails with "Error while loading /usr/sbin/dpkg-split: No such 
> file or directory".
> Here is a fragment of the output showing the relevant error message.
> {code:java}
> #6 6.034 Get:35 https://deb.debian.org/debian buster/main arm64 libnss3 arm64 
> 2:3.42.1-1+deb10u3 [1082 kB]
> #6 6.102 Get:36 https://deb.debian.org/debian buster/main arm64 psmisc arm64 
> 23.2-1 [122 kB]
> #6 6.109 Get:37 https://deb.debian.org/debian buster/main arm64 tini arm64 
> 0.18.0-1 [194 kB]
> #6 6.767 debconf: delaying package configuration, since apt-utils is not 
> installed
> #6 6.883 Fetched 18.1 MB in 1s (13.4 MB/s)
> #6 6.956 Error while loading /usr/sbin/dpkg-split: No such file or directory
> #6 6.959 Error while loading /usr/sbin/dpkg-deb: No such file or directory
> #6 6.961 dpkg: error processing archive 
> /tmp/apt-dpkg-install-NdOR40/00-libncurses6_6.1+20181013-2+deb10u2_arm64.deb 
> (--unpack):
>  {code}
> My host environment details:
>  * Ubuntu 18.04.5 LTS
>  * Docker version 20.10.6, build 370c289
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35353) Cross-building docker images to ARM64 is failing

2021-05-08 Thread Andy Grove (Jira)
Andy Grove created SPARK-35353:
--

 Summary: Cross-building docker images to ARM64 is failing
 Key: SPARK-35353
 URL: https://issues.apache.org/jira/browse/SPARK-35353
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.1.1
Reporter: Andy Grove


I was trying to cross-build Spark 3.1.1 for ARM64 so that I could deploy to a 
Raspberry Pi Kubernetes cluster this weekend and the Docker build fails.

Here are the commands I used:
{code:java}
docker buildx create --use
./bin/docker-image-tool.sh -n -r andygrove -t 3.1.1 -X build {code}
The Docker build for ARM64 fails on the following command:
{code:java}
 apt install -y bash tini libc6 libpam-modules krb5-user libnss3 procps{code}
The install fails with "Error while loading /usr/sbin/dpkg-split: No such file 
or directory".

Here is a fragment of the output showing the relevant error message.
{code:java}
#6 6.034 Get:35 https://deb.debian.org/debian buster/main arm64 libnss3 arm64 
2:3.42.1-1+deb10u3 [1082 kB]
#6 6.102 Get:36 https://deb.debian.org/debian buster/main arm64 psmisc arm64 
23.2-1 [122 kB]
#6 6.109 Get:37 https://deb.debian.org/debian buster/main arm64 tini arm64 
0.18.0-1 [194 kB]
#6 6.767 debconf: delaying package configuration, since apt-utils is not 
installed
#6 6.883 Fetched 18.1 MB in 1s (13.4 MB/s)
#6 6.956 Error while loading /usr/sbin/dpkg-split: No such file or directory
#6 6.959 Error while loading /usr/sbin/dpkg-deb: No such file or directory
#6 6.961 dpkg: error processing archive 
/tmp/apt-dpkg-install-NdOR40/00-libncurses6_6.1+20181013-2+deb10u2_arm64.deb 
(--unpack):
 {code}
My host environment details:
 * Ubuntu 18.04.5 LTS
 * Docker version 20.10.6, build 370c289

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-35093) [SQL] AQE columnar mismatch on exchange reuse

2021-04-15 Thread Andy Grove (Jira)
Andy Grove created SPARK-35093:
--

 Summary: [SQL] AQE columnar mismatch on exchange reuse
 Key: SPARK-35093
 URL: https://issues.apache.org/jira/browse/SPARK-35093
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1, 3.0.2
Reporter: Andy Grove


With AQE enabled, AdaptiveSparkPlanExec will attempt to reuse exchanges that 
are semantically equal.

This is done by comparing the canonicalized plan for two Exchange nodes to see 
if they are the same.

Unfortunately this does not take into account the fact that two exchanges with 
the same canonical plan might be replaced by a plugin in a way that makes them 
incompatible. For example, a plugin could create one version with 
supportsColumnar=true and another with supportsColumnar=false. It is not valid 
to re-use exchanges if there is a supportsColumnar mismatch.

I have tested a fix for this and will put up a PR once I figure out how to 
write the tests.
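
For illustration only (the helper name below is an assumption, not the actual fix), 
the check being described would require both canonical-plan equality and agreement 
on columnar support before reusing an exchange:
{code:java}
import org.apache.spark.sql.execution.exchange.Exchange

// Illustrative helper only; not the actual Spark change.
def canReuseExchange(existing: Exchange, candidate: Exchange): Boolean = {
  existing.canonicalized == candidate.canonicalized &&
    existing.supportsColumnar == candidate.supportsColumnar
}
{code}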

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34682) Regression in "operating on canonicalized plan" check in CustomShuffleReaderExec

2021-03-09 Thread Andy Grove (Jira)
Andy Grove created SPARK-34682:
--

 Summary: Regression in "operating on canonicalized plan" check in 
CustomShuffleReaderExec
 Key: SPARK-34682
 URL: https://issues.apache.org/jira/browse/SPARK-34682
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.1
Reporter: Andy Grove
 Fix For: 3.2.0, 3.1.2


In Spark 3.0.2 if I attempt to execute on a canonicalized version of 
CustomShuffleReaderExec I get an error "operating on canonicalized plan", as 
expected.

There is a regression in Spark 3.1.1 where this check can never be reached 
because of a new call to sendDriverMetrics that was added prior to the check. 
This method will fail if operating on a canonicalized plan because it assumes 
the existence of metrics that do not exist if this is a canonicalized plan.
{code:java}
 private lazy val shuffleRDD: RDD[_] = {
  sendDriverMetrics()

  shuffleStage.map { stage =>
stage.shuffle.getShuffleRDD(partitionSpecs.toArray)
  }.getOrElse {
throw new IllegalStateException("operating on canonicalized plan")
  }
}{code}
The specific error looks like this:


{code:java}
java.util.NoSuchElementException: key not found: numPartitions
at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:101)
at scala.collection.immutable.Map$EmptyMap$.apply(Map.scala:99)
at 
org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.sendDriverMetrics(CustomShuffleReaderExec.scala:122)
at 
org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD$lzycompute(CustomShuffleReaderExec.scala:182)
at 
org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.shuffleRDD(CustomShuffleReaderExec.scala:181)
at 
org.apache.spark.sql.execution.adaptive.CustomShuffleReaderExec.doExecuteColumnar(CustomShuffleReaderExec.scala:196)
 {code}
I think the fix is simply to avoid calling sendDriverMetrics if the plan is 
canonicalized and I am planning on creating a PR to fix this.
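
For illustration, a hedged sketch of that idea (an assumption about the shape of 
the fix, not the merged patch) is to only send the driver metrics on the 
non-canonicalized path, where the metrics actually exist:
{code:java}
// Sketch only: skip sendDriverMetrics when operating on a canonicalized plan.
private lazy val shuffleRDD: RDD[_] = {
  shuffleStage.map { stage =>
    // Only reached when shuffleStage is defined, i.e. not a canonicalized plan.
    sendDriverMetrics()
    stage.shuffle.getShuffleRDD(partitionSpecs.toArray)
  }.getOrElse {
    throw new IllegalStateException("operating on canonicalized plan")
  }
}
{code}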



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes

2021-03-05 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296391#comment-17296391
 ] 

Andy Grove commented on SPARK-34645:


I can reproduce this with Spark 3.1.1 + JDK 11 as well (and works fine with JDK 
8).

I will see if I can make a repro case.

> [K8S] Driver pod stuck in Running state after job completes
> ---
>
> Key: SPARK-34645
> URL: https://issues.apache.org/jira/browse/SPARK-34645
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.2
> Environment: Kubernetes:
> {code:java}
> Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", 
> GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", 
> BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", 
> Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", 
> GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", 
> BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", 
> Platform:"linux/amd64"}
>  {code}
>Reporter: Andy Grove
>Priority: Major
>
> I am running automated benchmarks in k8s, using spark-submit in cluster mode, 
> so the driver runs in a pod.
> When running with Spark 3.0.1 and 3.1.1 everything works as expected and I 
> see the Spark context being shut down after the job completes.
> However, when running with Spark 3.0.2 I do not see the context get shut down 
> and the driver pod is stuck in the Running state indefinitely.
> This is the output I see after job completion with 3.0.1 and 3.1.1 and this 
> output does not appear with 3.0.2. With 3.0.2 there is no output at all after 
> the job completes.
> {code:java}
> 2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from 
> shutdown hook
> 2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped 
> Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at 
> http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040
> 2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting 
> down all executors
> 2021-03-05 20:09:24,600 INFO 
> k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes 
> client has been closed (this is expected if the application is shutting down.)
> 2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared
> 2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped
> 2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster 
> stopped
> 2021-03-05 20:09:24,752 INFO 
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped 
> SparkContext
> 2021-03-05 20:09:24,768 INFO util.ShutdownHookManager: Shutdown hook called
> 2021-03-05 20:09:24,769 INFO util.ShutdownHookManager: Deleting directory 
> /var/data/spark-67fa44df-e86c-463a-a149-25d95817ff8e/spark-a5476c14-c103-4108-b733-961400485d8a
> 2021-03-05 20:09:24,772 INFO util.ShutdownHookManager: Deleting directory 
> /tmp/spark-9d6261f5-4394-472b-9c9a-e22bde877814
> 2021-03-05 20:09:24,778 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
> system stopped.
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
> system shutdown complete.
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes

2021-03-05 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296358#comment-17296358
 ] 

Andy Grove commented on SPARK-34645:


I just confirmed that this only happens with JDK 11 and works fine with JDK 8.

> [K8S] Driver pod stuck in Running state after job completes
> ---
>
> Key: SPARK-34645
> URL: https://issues.apache.org/jira/browse/SPARK-34645
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.2
> Environment: Kubernetes:
> {code:java}
> Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", 
> GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", 
> BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", 
> Platform:"linux/amd64"}
> Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", 
> GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", 
> BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", 
> Platform:"linux/amd64"}
>  {code}
>Reporter: Andy Grove
>Priority: Major
>
> I am running automated benchmarks in k8s, using spark-submit in cluster mode, 
> so the driver runs in a pod.
> When running with Spark 3.0.1 and 3.1.1 everything works as expected and I 
> see the Spark context being shut down after the job completes.
> However, when running with Spark 3.0.2 I do not see the context get shut down 
> and the driver pod is stuck in the Running state indefinitely.
> This is the output I see after job completion with 3.0.1 and 3.1.1 and this 
> output does not appear with 3.0.2. With 3.0.2 there is no output at all after 
> the job completes.
> {code:java}
> 2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from 
> shutdown hook
> 2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped 
> Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at 
> http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040
> 2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting 
> down all executors
> 2021-03-05 20:09:24,600 INFO 
> k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
> executor to shut down
> 2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes 
> client has been closed (this is expected if the application is shutting down.)
> 2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: 
> MapOutputTrackerMasterEndpoint stopped!
> 2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared
> 2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped
> 2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster 
> stopped
> 2021-03-05 20:09:24,752 INFO 
> scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
> OutputCommitCoordinator stopped!
> 2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped 
> SparkContext
> 2021-03-05 20:09:24,768 INFO util.ShutdownHookManager: Shutdown hook called
> 2021-03-05 20:09:24,769 INFO util.ShutdownHookManager: Deleting directory 
> /var/data/spark-67fa44df-e86c-463a-a149-25d95817ff8e/spark-a5476c14-c103-4108-b733-961400485d8a
> 2021-03-05 20:09:24,772 INFO util.ShutdownHookManager: Deleting directory 
> /tmp/spark-9d6261f5-4394-472b-9c9a-e22bde877814
> 2021-03-05 20:09:24,778 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
> system stopped.
> 2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
> system shutdown complete.
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-34645) [K8S] Driver pod stuck in Running state after job completes

2021-03-05 Thread Andy Grove (Jira)
Andy Grove created SPARK-34645:
--

 Summary: [K8S] Driver pod stuck in Running state after job 
completes
 Key: SPARK-34645
 URL: https://issues.apache.org/jira/browse/SPARK-34645
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.0.2
 Environment: Kubernetes:


{code:java}
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.2", 
GitCommit:"f5743093fd1c663cb0cbc89748f730662345d44d", GitTreeState:"clean", 
BuildDate:"2020-09-16T13:41:02Z", GoVersion:"go1.15", Compiler:"gc", 
Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", 
GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", 
BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", 
Platform:"linux/amd64"}
 {code}
Reporter: Andy Grove


I am running automated benchmarks in k8s, using spark-submit in cluster mode, 
so the driver runs in a pod.

When running with Spark 3.0.1 and 3.1.1 everything works as expected and I see 
the Spark context being shut down after the job completes.

However, when running with Spark 3.0.2 I do not see the context get shut down 
and the driver pod is stuck in the Running state indefinitely.

This is the output I see after job completion with 3.0.1 and 3.1.1 and this 
output does not appear with 3.0.2. With 3.0.2 there is no output at all after 
the job completes.
{code:java}
2021-03-05 20:09:24,576 INFO spark.SparkContext: Invoking stop() from shutdown 
hook
2021-03-05 20:09:24,592 INFO server.AbstractConnector: Stopped 
Spark@784499d0{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2021-03-05 20:09:24,594 INFO ui.SparkUI: Stopped Spark web UI at 
http://benchmark-runner-3e8a38780400e0d1-driver-svc.default.svc:4040
2021-03-05 20:09:24,599 INFO k8s.KubernetesClusterSchedulerBackend: Shutting 
down all executors
2021-03-05 20:09:24,600 INFO 
k8s.KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
2021-03-05 20:09:24,609 WARN k8s.ExecutorPodsWatchSnapshotSource: Kubernetes 
client has been closed (this is expected if the application is shutting down.)
2021-03-05 20:09:24,719 INFO spark.MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
2021-03-05 20:09:24,736 INFO memory.MemoryStore: MemoryStore cleared
2021-03-05 20:09:24,738 INFO storage.BlockManager: BlockManager stopped
2021-03-05 20:09:24,744 INFO storage.BlockManagerMaster: BlockManagerMaster 
stopped
2021-03-05 20:09:24,752 INFO 
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
2021-03-05 20:09:24,768 INFO spark.SparkContext: Successfully stopped 
SparkContext
2021-03-05 20:09:24,768 INFO util.ShutdownHookManager: Shutdown hook called
2021-03-05 20:09:24,769 INFO util.ShutdownHookManager: Deleting directory 
/var/data/spark-67fa44df-e86c-463a-a149-25d95817ff8e/spark-a5476c14-c103-4108-b733-961400485d8a
2021-03-05 20:09:24,772 INFO util.ShutdownHookManager: Deleting directory 
/tmp/spark-9d6261f5-4394-472b-9c9a-e22bde877814
2021-03-05 20:09:24,778 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
metrics system...
2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
system stopped.
2021-03-05 20:09:24,779 INFO impl.MetricsSystemImpl: s3a-file-system metrics 
system shutdown complete.
 {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33744) Canonicalization error in SortAggregate

2020-12-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-33744:
---
Affects Version/s: 3.0.1

> Canonicalization error in SortAggregate
> ---
>
> Key: SPARK-33744
> URL: https://issues.apache.org/jira/browse/SPARK-33744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Andy Grove
>Priority: Minor
>
> The canonicalization plan for a simple aggregate query is different each time 
> for SortAggregate but not for HashAggregate.
> The issue can be demonstrated by adding the following unit tests to 
> SQLQuerySuite. The HashAggregate test passes and the SortAggregate test fails.
> The first test has numeric input and the second test is operating on strings, 
> which forces the use of SortAggregate rather than HashAggregate.
> {code:java}
> test("HashAggregate canonicalization") {
>   val data = Seq((1, 1)).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> }
> test("SortAggregate canonicalization") {
>   val data = Seq(("a", "a")).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> } {code}
> The SortAggregate test fails with the following output:
> {code:java}
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#105]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1]
>  did not equal 
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#148]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1] {code}
> The error is caused by the resultExpression for the aggregate function being 
> assigned a new ExprId in the final aggregate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33744) Canonicalization error in SortAggregate

2020-12-10 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-33744:
---
Affects Version/s: (was: 3.1.0)

> Canonicalization error in SortAggregate
> ---
>
> Key: SPARK-33744
> URL: https://issues.apache.org/jira/browse/SPARK-33744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Andy Grove
>Priority: Minor
>
> The canonicalization plan for a simple aggregate query is different each time 
> for SortAggregate but not for HashAggregate.
> The issue can be demonstrated by adding the following unit tests to 
> SQLQuerySuite. The HashAggregate test passes and the SortAggregate test fails.
> The first test has numeric input and the second test is operating on strings, 
> which forces the use of SortAggregate rather than HashAggregate.
> {code:java}
> test("HashAggregate canonicalization") {
>   val data = Seq((1, 1)).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> }
> test("SortAggregate canonicalization") {
>   val data = Seq(("a", "a")).toDF("c0", "c1")
>   val df1 = data.groupBy(col("c0")).agg(first("c1"))
>   val df2 = data.groupBy(col("c0")).agg(first("c1"))
>   assert(df1.queryExecution.executedPlan.canonicalized ==
>   df2.queryExecution.executedPlan.canonicalized)
> } {code}
> The SortAggregate test fails with the following output:
> {code:java}
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#105]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1]
>  did not equal 
> SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
> #1])
> +- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#148]
>   +- SortAggregate(key=[none#0], functions=[partial_first(none#1, 
> false)], output=[none#0, none#2, none#3])
>  +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
> +- *(1) Project [none#0 AS #0, none#1 AS #1]
>+- *(1) LocalTableScan [none#0, none#1] {code}
> The error is caused by the resultExpression for the aggregate function being 
> assigned a new ExprId in the final aggregate.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33744) Canonicalization error in SortAggregate

2020-12-10 Thread Andy Grove (Jira)
Andy Grove created SPARK-33744:
--

 Summary: Canonicalization error in SortAggregate
 Key: SPARK-33744
 URL: https://issues.apache.org/jira/browse/SPARK-33744
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Andy Grove


The canonicalization plan for a simple aggregate query is different each time 
for SortAggregate but not for HashAggregate.

The issue can be demonstrated by adding the following unit tests to 
SQLQuerySuite. The HashAggregate test passes and the SortAggregate test fails.

The first test has numeric input and the second test is operating on strings, 
which forces the use of SortAggregate rather than HashAggregate.
{code:java}
test("HashAggregate canonicalization") {
  val data = Seq((1, 1)).toDF("c0", "c1")
  val df1 = data.groupBy(col("c0")).agg(first("c1"))
  val df2 = data.groupBy(col("c0")).agg(first("c1"))
  assert(df1.queryExecution.executedPlan.canonicalized ==
  df2.queryExecution.executedPlan.canonicalized)
}

test("SortAggregate canonicalization") {
  val data = Seq(("a", "a")).toDF("c0", "c1")
  val df1 = data.groupBy(col("c0")).agg(first("c1"))
  val df2 = data.groupBy(col("c0")).agg(first("c1"))
  assert(df1.queryExecution.executedPlan.canonicalized ==
  df2.queryExecution.executedPlan.canonicalized)
} {code}
The SortAggregate test fails with the following output:
{code:java}
SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
#1])
+- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#105]
  +- SortAggregate(key=[none#0], functions=[partial_first(none#1, false)], 
output=[none#0, none#2, none#3])
 +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
+- *(1) Project [none#0 AS #0, none#1 AS #1]
   +- *(1) LocalTableScan [none#0, none#1]

 did not equal 

SortAggregate(key=[none#0], functions=[first(none#0, false)], output=[none#0, 
#1])
+- *(2) Sort [none#0 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(none#0, 5), ENSURE_REQUIREMENTS, [id=#148]
  +- SortAggregate(key=[none#0], functions=[partial_first(none#1, false)], 
output=[none#0, none#2, none#3])
 +- *(1) Sort [none#0 ASC NULLS FIRST], false, 0
+- *(1) Project [none#0 AS #0, none#1 AS #1]
   +- *(1) LocalTableScan [none#0, none#1] {code}
The error is caused by the resultExpression for the aggregate function being 
assigned a new ExprId in the final aggregate.
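
For illustration only (a simplified, hypothetical snippet, not taken from the Spark 
code base), the reason a fresh ExprId breaks plan equality is that attribute 
equality includes the ExprId:
{code:java}
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, NamedExpression}
import org.apache.spark.sql.types.StringType

// Two attributes that are identical except for the freshly allocated ExprId.
val a1 = AttributeReference("first(c1)", StringType)(exprId = NamedExpression.newExprId)
val a2 = AttributeReference("first(c1)", StringType)(exprId = NamedExpression.newExprId)
assert(a1 != a2) // equality includes the ExprId, so the containing plans differ
{code}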



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32671) Race condition in MapOutputTracker.getStatistics

2020-08-20 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved SPARK-32671.

Resolution: Invalid

I was mistaken about this issue.

> Race condition in MapOutputTracker.getStatistics
> 
>
> Key: SPARK-32671
> URL: https://issues.apache.org/jira/browse/SPARK-32671
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Andy Grove
>Priority: Major
>
> MapOutputTracker.getStatistics builds an array of partition sizes for a 
> shuffle id and in some cases uses multiple threads running in parallel to 
> update this array. This code is not thread-safe and the output is 
> non-deterministic when there are multiple MapStatus entries for the same 
> partition.
> We have unit tests such as the skewed join tests in AdaptiveQueryExecSuite 
> that depend on the output being deterministic, and intermittent failures in 
> these tests led me to track this bug down.
> The issue is trivial to fix by using an AtomicLong when building the array of 
> partition sizes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32671) Race condition in MapOutputTracker.getStatistics

2020-08-20 Thread Andy Grove (Jira)
Andy Grove created SPARK-32671:
--

 Summary: Race condition in MapOutputTracker.getStatistics
 Key: SPARK-32671
 URL: https://issues.apache.org/jira/browse/SPARK-32671
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0, 3.0.1
Reporter: Andy Grove


MapOutputTracker.getStatistics builds an array of partition sizes for a shuffle 
id and in some cases uses multiple threads running in parallel to update this 
array. This code is not thread-safe and the output is non-deterministic when 
there are multiple MapStatus entries for the same partition.

We have unit tests such as the skewed join tests in AdaptiveQueryExecSuite that 
depend on the output being deterministic, and intermittent failures in these 
tests led me to track this bug down.

The issue is trivial to fix by using an AtomicLong when building the array of 
partition sizes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-12-12 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994510#comment-16994510
 ] 

Andy Grove commented on SPARK-29640:


Closing this as not a bug. We have confirmed that this is due to the way 
certain EKS clusters are set up. The issue only happens when Spark is on the 
same node as a CoreDNS pod and only happens intermittently even then. We have 
experienced the same issue with applications other than Spark as well.

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>   at java.net.InetAddress.getAllByName(InetAddress.java:119

[jira] [Issue Comment Deleted] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-12-12 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-29640:
---
Comment: was deleted

(was: We were finally able to get to a root cause on this, so I'm documenting it 
here in the hopes that it helps someone else in the future.

The issue was due to the way routing was set up on our EKS clusters, combined 
with the fact that we were using an NLB rather than an ELB, along with nginx 
ingress controllers.

Specifically, NLB does not support "hairpinning", as explained in 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html]

In layman's terms: if pod A tries to communicate with pod B, both pods are on 
the same node, and the request egresses from the node and is then routed back 
to that node via the NLB and nginx ingress controller, then the request can 
never succeed and will time out.

Switching to an ELB resolves the issue, but a better solution is to use 
cluster-local addressing so that communication between pods on the same node 
uses the local network.)

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:

[jira] [Commented] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-12-11 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994257#comment-16994257
 ] 

Andy Grove commented on SPARK-29640:


We were finally able to get to a root cause on this, so I'm documenting it here 
in the hopes that it helps someone else in the future.

The issue was due to the way routing was set up on our EKS clusters, combined 
with the fact that we were using an NLB rather than an ELB, along with nginx 
ingress controllers.

Specifically, NLB does not support "hairpinning", as explained in 
[https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-troubleshooting.html]

In layman's terms: if pod A tries to communicate with pod B, both pods are on 
the same node, and the request egresses from the node and is then routed back 
to that node via the NLB and nginx ingress controller, then the request can 
never succeed and will time out.

Switching to an ELB resolves the issue, but a better solution is to use 
cluster-local addressing so that communication between pods on the same node 
uses the local network.
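
To make the cluster-local addressing point concrete, here is a small 
illustrative sketch (the service hostnames and port are hypothetical, not from 
our setup); the point is only that the in-cluster DNS name keeps pod-to-pod 
traffic on the cluster network instead of hairpinning through the NLB:
{code:java}
// Two ways pod A could reach the service that fronts pod B.
// The external name sends traffic out through the NLB and nginx ingress, which
// can hairpin and time out; the cluster-local name stays on the cluster network.
val viaExternalNlb  = "https://my-service.example.com/api/health"
val viaClusterLocal = "http://my-service.my-namespace.svc.cluster.local:8080/api/health"

// Pods can detect that they are running in-cluster via the injected service env vars.
val inCluster = sys.env.contains("KUBERNETES_SERVICE_HOST")
val endpoint  = if (inCluster) viaClusterLocal else viaExternalNlb
{code}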

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(Kubern

[jira] [Resolved] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-12-11 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved SPARK-29640.

Resolution: Not A Bug

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>   at okhttp3.Dns$1.lookup(Dns.java:39)
>   at 
> okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
>   at 
> okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
>   at okhttp3.

[jira] [Reopened] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-12-04 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove reopened SPARK-29640:


Re-opening this because the issue has come back and is ongoing. I am exploring 
other options and will post here if/when I find a solution that works.

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>   at okhttp3.Dns$1.lookup(Dns.java:39)
>   at 
> okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.

[jira] [Resolved] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-11-05 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved SPARK-29640.

Resolution: Not A Bug

Specifying TCP mode for DNS lookups did not help, and this turned out to be an 
environment issue caused by two unhealthy nodes in the cluster. Deleting those 
nodes and creating new ones resolved it.

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>   at okhttp3.Dns$1.lookup(Dns.java:39)
>   at 
> okhttp3.internal.conn

[jira] [Commented] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963175#comment-16963175
 ] 

Andy Grove commented on SPARK-29640:


A hacky workaround is to wait for DNS to resolve before creating the Spark 
context:
{code:java}
import java.net.{InetAddress, UnknownHostException}

def waitForDns(): Unit = {
  val host = "kubernetes.default.svc"
  val timeoutMillis = 15000L

  println(s"Resolving $host ...")
  val start = System.currentTimeMillis()
  var attempts = 0
  // Retry until the host resolves or the overall timeout expires.
  while (System.currentTimeMillis() - start < timeoutMillis) {
    try {
      attempts += 1
      val address = InetAddress.getByName(host)
      println(s"Resolved $host as ${address.getHostAddress} after $attempts attempt(s)")
      return
    } catch {
      case _: UnknownHostException =>
        println(s"Failed to resolve $host due to UnknownHostException (attempt $attempts)")
        Thread.sleep(100)
    }
  }
} {code}
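
For context, here is a minimal, hypothetical sketch of how such a guard could 
be wired into a driver main method before the SparkSession is created; the 
object name, app name, and job body are illustrative only, and waitForDns() is 
assumed to be the method above, defined in scope:
{code:java}
import org.apache.spark.sql.SparkSession

object ExampleDriver {
  def main(args: Array[String]): Unit = {
    // Block (for up to ~15 seconds) until kubernetes.default.svc resolves,
    // then create the SparkContext/SparkSession as usual.
    waitForDns()

    val spark = SparkSession.builder()
      .appName("example-driver")
      .getOrCreate()

    println(spark.sparkContext.parallelize(1 to 100).count())
    spark.stop()
  }
} {code}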

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in 
> Spark driver
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
> Fix For: 2.4.5
>
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" when trying to create executors. We are 
> running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.
> This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External 
> scheduler cannot be instantiated
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>   at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>   at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>   at 
> org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>   at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>   at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at 
> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>   at 
> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>   at 
> org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
> [get]  for kind: [Pod]  with name: 
> [wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
> [tenant-8-workflows]  failed.
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>   at 
> io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>   at 
> io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>   at 
> org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>   at 
> org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>   ... 20 more
> Caused by: java.net.Unkno

[jira] [Updated] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-29640:
---
Description: 
We are running into intermittent DNS issues where the Spark driver fails to 
resolve "kubernetes.default.svc" when trying to create executors. We are 
running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS.

This happens approximately 10% of the time.

Here is the stack trace:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: External scheduler 
cannot be instantiated
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
[get]  for kind: [Pod]  with name: 
[wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
[tenant-8-workflows]  failed.
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at okhttp3.Dns$1.lookup(Dns.java:39)
at 
okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
at 
okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at 

[jira] [Updated] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver

2019-10-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-29640:
---
Description: 
We are running into intermittent DNS issues where the Spark driver fails to 
resolve "kubernetes.default.svc" when trying to create executors. We are 
running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode.

This happens approximately 10% of the time.

Here is the stack trace:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: External scheduler 
cannot be instantiated
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
at 
org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: 
[get]  for kind: [Pod]  with name: 
[wf-5-69674f15d0fc45-1571354060179-driver]  in namespace: 
[tenant-8-workflows]  failed.
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
at 
io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
at 
io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
at 
org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
at 
org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
... 20 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at 
java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at okhttp3.Dns$1.lookup(Dns.java:39)
at 
okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
at 
okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
at 
okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
at 
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
at 
okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
at 
okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at 
okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
at 
okhttp

[jira] [Commented] (SPARK-29640) [K8S] Make it possible to set DNS option to use TCP instead of UDP

2019-10-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963059#comment-16963059
 ] 

Andy Grove commented on SPARK-29640:


Installing node local dns cache daemon might be another workaround for this 
issue: 
https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-511750647

> [K8S] Make it possible to set DNS option to use TCP instead of UDP
> --
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Andy Grove
>Priority: Major
> Fix For: 2.4.5
>
>
> We are running into intermittent DNS issues where the Spark driver fails to 
> resolve "kubernetes.default.svc" and this seems to be caused by 
> [https://github.com/kubernetes/kubernetes/issues/76790]
> One suggested workaround is to specify TCP mode for DNS lookups in the pod 
> spec 
> ([https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-424498508]).
> I would like the ability to provide a flag to spark-submit to specify to use 
> TCP mode for DNS lookups.
> I am working on a PR for this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29640) [K8S] Make it possible to set DNS option to use TCP instead of UDP

2019-10-29 Thread Andy Grove (Jira)
Andy Grove created SPARK-29640:
--

 Summary: [K8S] Make it possible to set DNS option to use TCP 
instead of UDP
 Key: SPARK-29640
 URL: https://issues.apache.org/jira/browse/SPARK-29640
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.4
Reporter: Andy Grove
 Fix For: 2.4.5


We are running into intermittent DNS issues where the Spark driver fails to 
resolve "kubernetes.default.svc" and this seems to be caused by 
[https://github.com/kubernetes/kubernetes/issues/76790]

One suggested workaround is to specify TCP mode for DNS lookups in the pod spec 
([https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-424498508]).

I would like the ability to provide a flag to spark-submit to specify to use 
TCP mode for DNS lookups.

I am working on a PR for this.
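
As a rough illustration of what TCP mode for DNS lookups means at the pod 
level, the sketch below builds the corresponding dnsConfig option with the 
fabric8 kubernetes-client model classes; the exact classes and setters used 
here are my assumption about the client API rather than the actual Spark 
change, and "use-vc" is the glibc resolver option that forces DNS over TCP:
{code:java}
import io.fabric8.kubernetes.api.model.{PodDNSConfig, PodDNSConfigOption}
import scala.collection.JavaConverters._

// "use-vc" is the resolv.conf option that makes the resolver use TCP instead of UDP.
val useTcp = new PodDNSConfigOption()
useTcp.setName("use-vc")

val dnsConfig = new PodDNSConfig()
dnsConfig.setOptions(List(useTcp).asJava)

// Equivalent pod spec fragment:
//   dnsConfig:
//     options:
//       - name: use-vc
// podSpec.setDnsConfig(dnsConfig)   // assumed hook: attach to the driver/executor pod spec
{code}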



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.12.10, 1.11.10)

2019-09-09 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-28921:
---
Affects Version/s: 2.3.0
   2.3.1
   2.4.0
   2.4.1
   2.4.2
   2.4.4

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 
> 1.12.10, 1.11.10)
> ---
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: Paul Schweigert
>Assignee: Andy Grove
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14

2019-09-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove resolved SPARK-28925.

Resolution: Duplicate

> Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 
> 1.14
> 
>
> Key: SPARK-28925
> URL: https://issues.apache.org/jira/browse/SPARK-28925
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Eric
>Priority: Minor
>
> Hello,
> If you use Spark with Kubernetes 1.13 or 1.14 you will see this error:
> {code:java}
> {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": 
> "org.apache.spark.internal.Logging", 
> "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to 
> request 1 executors from Kubernetes."}
> {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": 
> "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", 
> "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: 
> HTTP 403, Status: 403 - "}
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> {code}
> Apparently the bug is fixed here: 
> [https://github.com/fabric8io/kubernetes-client/pull/1669]
> We have currently compiled Spark source code with Kubernetes-client 4.4.2 and 
> it's working great on our cluster. We are using Kubernetes 1.13.10.
>  
> Could it be possible to update that dependency version?
>  
> Thanks!
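
For reference, the dependency being discussed has standard Maven coordinates 
(io.fabric8:kubernetes-client); a minimal sbt-style sketch of pinning the 
patched version in a build would be:
{code:java}
// build.sbt: pin the fabric8 Kubernetes client to the release containing the WebSocket fix.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "4.4.2"
{code}
Note that for a Spark distribution it is the bundled client jar that actually 
matters, so this only illustrates the coordinates of the artifact being 
upgraded.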



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.12.10, 1.11.10)

2019-09-01 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920411#comment-16920411
 ] 

Andy Grove commented on SPARK-28921:


[~dongjoon] we are seeing it on both of the EKS clusters where we are running 
Spark jobs. I imagine it affects all EKS clusters?

The versions we are using are 1.11.10 and 1.12.10 .. full version info:
{code:java}
Server Version: version.Info{Major:"1", Minor:"11+", 
GitVersion:"v1.11.10-eks-7f15cc", 
GitCommit:"7f15ccb4e58f112866f7ddcfebf563f199558488", GitTreeState:"clean", 
BuildDate:"2019-08-19T17:46:02Z", GoVersion:"go1.12.9", Compiler:"gc", 
Platform:"linux/amd64"} {code}
{code:java}
Server Version: version.Info{Major:"1", Minor:"12+", 
GitVersion:"v1.12.10-eks-825e5d", 
GitCommit:"825e5de08cb05714f9b224cd6c47d9514df1d1a7", GitTreeState:"clean", 
BuildDate:"2019-08-18T03:58:32Z", GoVersion:"go1.12.9", Compiler:"gc", 
Platform:"linux/amd64"} {code}

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 
> 1.12.10, 1.11.10)
> ---
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.12.10, 1.11.10)

2019-09-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-28921:
---
Summary: Spark jobs failing on latest versions of Kubernetes (1.15.3, 
1.14.6, 1,13.10, 1.12.10, 1.11.10)  (was: Spark jobs failing on latest versions 
of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.11.10))

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 
> 1.12.10, 1.11.10)
> ---
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 1.11.10)

2019-09-01 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-28921:
---
Summary: Spark jobs failing on latest versions of Kubernetes (1.15.3, 
1.14.6, 1,13.10, 1.11.10)  (was: Spark jobs failing on latest versions of 
Kubernetes (1.15.3, 1.14.6, 1,13.10))

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10, 
> 1.11.10)
> --
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Major
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)

2019-08-31 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920133#comment-16920133
 ] 

Andy Grove commented on SPARK-28921:


Here's the PR with the fix against the master branch, since it didn't 
automatically link to this JIRA: https://github.com/apache/spark/pull/25640

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Critical
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> Looks like the issue is caused by fixes for a recent CVE : 
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Looks like upgrading kubernetes-client to 4.4.2 would solve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28921) Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)

2019-08-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-28921:
---
Affects Version/s: 2.3.3

> Spark jobs failing on latest versions of Kubernetes (1.15.3, 1.14.6, 1,13.10)
> -
>
> Key: SPARK-28921
> URL: https://issues.apache.org/jira/browse/SPARK-28921
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Paul Schweigert
>Priority: Critical
>
> Spark jobs are failing on latest versions of Kubernetes when jobs attempt to 
> provision executor pods (jobs like Spark-Pi that do not launch executors run 
> without a problem):
>  
> Here's an example error message:
>  
> {code:java}
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.
> 19/08/30 01:29:09 INFO ExecutorPodsAllocator: Going to request 2 executors 
> from Kubernetes.19/08/30 01:29:09 WARN WatchConnectionManager: Exec Failure: 
> HTTP 403, Status: 403 - 
> java.net.ProtocolException: Expected HTTP 101 response but was '403 
> Forbidden' 
> at 
> okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216) 
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183) 
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141) 
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
> at java.lang.Thread.run(Thread.java:748)
> {code}
>  
> The issue appears to be caused by fixes for a recent CVE:
> CVE: [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14809]
> Fix: [https://github.com/fabric8io/kubernetes-client/pull/1669]
>  
> Upgrading kubernetes-client to 4.4.2 appears to resolve this issue.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14

2019-08-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919907#comment-16919907
 ] 

Andy Grove edited comment on SPARK-28925 at 8/30/19 9:54 PM:
-

This also impacts Spark 2.3.3 on EKS 1.11 due to security patches that were 
rolled out in the past week.
{code:java}
Server Version: version.Info{Major:"1", Minor:"11+", 
GitVersion:"v1.11.10-eks-7f15cc", 
GitCommit:"7f15ccb4e58f112866f7ddcfebf563f199558488", GitTreeState:"clean", 
BuildDate:"2019-08-19T17:46:02Z", GoVersion:"go1.12.9", Compiler:"gc", 
Platform:"linux/amd64"} {code}
 


was (Author: andygrove):
This also impacts Spark 2.3.3

> Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 
> 1.14
> 
>
> Key: SPARK-28925
> URL: https://issues.apache.org/jira/browse/SPARK-28925
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Eric
>Priority: Minor
>
> Hello,
> If you use Spark with Kubernetes 1.13 or 1.14 you will see this error:
> {code:java}
> {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": 
> "org.apache.spark.internal.Logging", 
> "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to 
> request 1 executors from Kubernetes."}
> {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": 
> "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", 
> "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: 
> HTTP 403, Status: 403 - "}
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> {code}
> Apparently the bug is fixed here: 
> [https://github.com/fabric8io/kubernetes-client/pull/1669]
> We have currently compiled Spark source code with Kubernetes-client 4.4.2 and 
> it's working great on our cluster. We are using Kubernetes 1.13.10.
>  
> Could it be possible to update that dependency version?
>  
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14

2019-08-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919907#comment-16919907
 ] 

Andy Grove edited comment on SPARK-28925 at 8/30/19 9:55 PM:
-

This also impacts Spark 2.3.3 on EKS 1.11 due to security patches that were 
rolled out in the past week.
{code:java}
Server Version: version.Info{Major:"1", Minor:"11+", 
GitVersion:"v1.11.10-eks-7f15cc", 
GitCommit:"7f15ccb4e58f112866f7ddcfebf563f199558488", GitTreeState:"clean", 
BuildDate:"2019-08-19T17:46:02Z", GoVersion:"go1.12.9", Compiler:"gc", 
Platform:"linux/amd64"} {code}
 I experimented with replacing {{kubernetes-client.jar}} with version 4.4.2 and it 
did resolve this issue, but it caused other problems, so it is not a viable 
workaround for my use case.


was (Author: andygrove):
This also impacts Spark 2.3.3 on EKS 1.11 due to security patches that were 
rolled out in the past week.
{code:java}
Server Version: version.Info{Major:"1", Minor:"11+", 
GitVersion:"v1.11.10-eks-7f15cc", 
GitCommit:"7f15ccb4e58f112866f7ddcfebf563f199558488", GitTreeState:"clean", 
BuildDate:"2019-08-19T17:46:02Z", GoVersion:"go1.12.9", Compiler:"gc", 
Platform:"linux/amd64"} {code}
 

> Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 
> 1.14
> 
>
> Key: SPARK-28925
> URL: https://issues.apache.org/jira/browse/SPARK-28925
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Eric
>Priority: Minor
>
> Hello,
> If you use Spark with Kubernetes 1.13 or 1.14 you will see this error:
> {code:java}
> {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": 
> "org.apache.spark.internal.Logging", 
> "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to 
> request 1 executors from Kubernetes."}
> {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": 
> "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", 
> "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: 
> HTTP 403, Status: 403 - "}
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> {code}
> Apparently the bug is fixed here: 
> [https://github.com/fabric8io/kubernetes-client/pull/1669]
> We have currently compiled Spark source code with Kubernetes-client 4.4.2 and 
> it's working great on our cluster. We are using Kubernetes 1.13.10.
>  
> Could it be possible to update that dependency version?
>  
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14

2019-08-30 Thread Andy Grove (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-28925:
---
Affects Version/s: 2.3.3

> Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 
> 1.14
> 
>
> Key: SPARK-28925
> URL: https://issues.apache.org/jira/browse/SPARK-28925
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.3, 2.4.3
>Reporter: Eric
>Priority: Minor
>
> Hello,
> If you use Spark with Kubernetes 1.13 or 1.14 you will see this error:
> {code:java}
> {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": 
> "org.apache.spark.internal.Logging", 
> "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to 
> request 1 executors from Kubernetes."}
> {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": 
> "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", 
> "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: 
> HTTP 403, Status: 403 - "}
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> {code}
> Apparently the bug is fixed here: 
> [https://github.com/fabric8io/kubernetes-client/pull/1669]
> We have currently compiled Spark source code with Kubernetes-client 4.4.2 and 
> it's working great on our cluster. We are using Kubernetes 1.13.10.
>  
> Could it be possible to update that dependency version?
>  
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28925) Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 1.14

2019-08-30 Thread Andy Grove (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919907#comment-16919907
 ] 

Andy Grove commented on SPARK-28925:


This also impacts Spark 2.3.3

> Update Kubernetes-client to 4.4.2 to be compatible with Kubernetes 1.13 and 
> 1.14
> 
>
> Key: SPARK-28925
> URL: https://issues.apache.org/jira/browse/SPARK-28925
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.3
>Reporter: Eric
>Priority: Minor
>
> Hello,
> If you use Spark with Kubernetes 1.13 or 1.14 you will see this error:
> {code:java}
> {"time": "2019-08-28T09:56:11.866Z", "lvl":"INFO", "logger": 
> "org.apache.spark.internal.Logging", 
> "thread":"kubernetes-executor-snapshots-subscribers-0","msg":"Going to 
> request 1 executors from Kubernetes."}
> {"time": "2019-08-28T09:56:12.028Z", "lvl":"WARN", "logger": 
> "io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2", 
> "thread":"OkHttp https://kubernetes.default.svc/...","msg":"Exec Failure: 
> HTTP 403, Status: 403 - "}
> java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
> {code}
> Apparently the bug is fixed here: 
> [https://github.com/fabric8io/kubernetes-client/pull/1669]
> We have currently compiled Spark source code with Kubernetes-client 4.4.2 and 
> it's working great on our cluster. We are using Kubernetes 1.13.10.
>  
> Could it be possible to update that dependency version?
>  
> Thanks!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110785#comment-15110785
 ] 

Andy Grove commented on SPARK-12932:


Here is a pull request to change the error message: 
https://github.com/apache/spark/pull/10865

> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
>Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I got the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person". This is not a very helpful error and 
> no other logging information was apparent to help troubleshoot this.
> It turned out that the problem was that my Person class did not have a 
> default constructor and also did not have setter methods and that was the 
> root cause.
> This JIRA is for implementing a more useful error message to help Java 
> developers who are trying out the Dataset API for the first time.
> The full stack trace is:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: no 
> encoder found for example_java.common.Person
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
>   at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
>   at org.apache.spark.sql.Encoders.bean(Encoder.scala)
> {code}
> NOTE that if I do provide EITHER the default constructor OR the setters, but 
> not both, then I get a stack trace with much more useful information, but 
> omitting BOTH causes this issue.
> The original source is below.
> {code:title=Example.java}
> public class JavaDatasetExample {
> public static void main(String[] args) throws Exception {
> SparkConf sparkConf = new SparkConf()
> .setAppName("Example")
> .setMaster("local[*]");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> SQLContext sqlContext = new SQLContext(sc);
> List people = ImmutableList.of(
> new Person("Joe", "Bloggs", 21, "NY")
> );
> Dataset dataset = sqlContext.createDataset(people, 
> Encoders.bean(Person.class));
> {code}
> {code:title=Person.java}
> class Person implements Serializable {
> String first;
> String last;
> int age;
> String state;
> public Person() {
> }
> public Person(String first, String last, int age, String state) {
> this.first = first;
> this.last = last;
> this.age = age;
> this.state = state;
> }
> public String getFirst() {
> return first;
> }
> public String getLast() {
> return last;
> }
> public int getAge() {
> return age;
> }
> public String getState() {
> return state;
> }
> }
> {code}
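
Separately from the improved error message, the exception in this report goes away once the class is fully bean-compliant. A minimal sketch of a Person variant that Encoders.bean should accept, with a public no-arg constructor plus a getter and setter for every field, is shown below (an illustration of the fix described in the issue, not code taken from it):

{code:java}
import java.io.Serializable;

// Bean-compliant variant: public no-arg constructor and getter/setter pairs,
// which is what Encoders.bean(Person.class) inspects to build the schema.
public class Person implements Serializable {

    private String first;
    private String last;
    private int age;
    private String state;

    public Person() {
    }

    public String getFirst() { return first; }
    public void setFirst(String first) { this.first = first; }

    public String getLast() { return last; }
    public void setLast(String last) { this.last = last; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public String getState() { return state; }
    public void setState(String state) { this.state = state; }
}
{code}

With this shape, sqlContext.createDataset(people, Encoders.bean(Person.class)) should infer the schema from the getters rather than throwing the "no encoder found" exception.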



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-21 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110772#comment-15110772
 ] 

Andy Grove commented on SPARK-12932:


After reviewing the code for this, I think it is just a case of changing the 
error message text in 
`org.apache.spark.sql.catalyst.JavaTypeInference#extractorFor`. The error 
should be changed to:

{code}
s"Cannot infer type for Java class ${other.getName} because it is not 
bean-compliant"
{code}
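
For context, the unhelpful message is thrown directly from Encoders.bean (the stack trace in the description ends there), so it should be reproducible without running a job. A small sketch is below; the NonCompliantPerson class is hypothetical, standing in for any class that lacks both a no-arg constructor and setters, which is the combination described in this issue:

{code:java}
import org.apache.spark.sql.Encoders;

public class EncoderErrorRepro {

    // Hypothetical class: has a getter, but no no-arg constructor and no setters.
    public static class NonCompliantPerson {
        private final String name;

        public NonCompliantPerson(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    public static void main(String[] args) {
        // On 1.6.0 this should fail with the vague
        // "java.lang.UnsupportedOperationException: no encoder found for ..."
        // message rather than the clearer text proposed above.
        Encoders.bean(NonCompliantPerson.class);
    }
}
{code}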

> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
>Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I got the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person". This is not a very helpful error and 
> no other logging information was apparent to help troubleshoot this.
> It turned out that the problem was that my Person class did not have a 
> default constructor and also did not have setter methods and that was the 
> root cause.
> This JIRA is for implementing a more useful error message to help Java 
> developers who are trying out the Dataset API for the first time.
> The full stack trace is:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: no 
> encoder found for example_java.common.Person
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
>   at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
>   at org.apache.spark.sql.Encoders.bean(Encoder.scala)
> {code}
> NOTE that if I do provide EITHER the default constructor OR the setters, but 
> not both, then I get a stack trace with much more useful information, but 
> omitting BOTH causes this issue.
> The original source is below.
> {code:title=Example.java}
> public class JavaDatasetExample {
> public static void main(String[] args) throws Exception {
> SparkConf sparkConf = new SparkConf()
> .setAppName("Example")
> .setMaster("local[*]");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> SQLContext sqlContext = new SQLContext(sc);
> List people = ImmutableList.of(
> new Person("Joe", "Bloggs", 21, "NY")
> );
> Dataset dataset = sqlContext.createDataset(people, 
> Encoders.bean(Person.class));
> {code}
> {code:title=Person.java}
> class Person implements Serializable {
> String first;
> String last;
> int age;
> String state;
> public Person() {
> }
> public Person(String first, String last, int age, String state) {
> this.first = first;
> this.last = last;
> this.age = age;
> this.state = state;
> }
> public String getFirst() {
> return first;
> }
> public String getLast() {
> return last;
> }
> public int getAge() {
> return age;
> }
> public String getState() {
> return state;
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-20 Thread Andy Grove (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15109389#comment-15109389
 ] 

Andy Grove commented on SPARK-12932:


OK, I updated the JIRA with more info. Yes, I actually would like to have a go 
at providing a patch for this. It will be a good way for me to get more 
familiar with the internals of the Dataset API.

> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
>Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I got the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person". This is not a very helpful error and 
> no other logging information was apparent to help troubleshoot this.
> It turned out that the problem was that my Person class did not have a 
> default constructor and also did not have setter methods and that was the 
> root cause.
> This JIRA is for implementing a more useful error message to help Java 
> developers who are trying out the Dataset API for the first time.
> The full stack trace is:
> {code}
> Exception in thread "main" java.lang.UnsupportedOperationException: no 
> encoder found for example_java.common.Person
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
>   at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
>   at org.apache.spark.sql.Encoders.bean(Encoder.scala)
> {code}
> NOTE that if I do provide EITHER the default constructor OR the setters, but 
> not both, then I get a stack trace with much more useful information, but 
> omitting BOTH causes this issue.
> The original source is below.
> {code:title=Example.java}
> public class JavaDatasetExample {
> public static void main(String[] args) throws Exception {
> SparkConf sparkConf = new SparkConf()
> .setAppName("Example")
> .setMaster("local[*]");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> SQLContext sqlContext = new SQLContext(sc);
> List people = ImmutableList.of(
> new Person("Joe", "Bloggs", 21, "NY")
> );
> Dataset dataset = sqlContext.createDataset(people, 
> Encoders.bean(Person.class));
> {code}
> {code:title=Person.java}
> class Person implements Serializable {
> String first;
> String last;
> int age;
> String state;
> public Person() {
> }
> public Person(String first, String last, int age, String state) {
> this.first = first;
> this.last = last;
> this.age = age;
> this.state = state;
> }
> public String getFirst() {
> return first;
> }
> public String getLast() {
> return last;
> }
> public int getAge() {
> return age;
> }
> public String getState() {
> return state;
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-20 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-12932:
---
Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:

{code}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder 
found for example_java.common.Person
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{code}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:
{code}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder 
found for example_java.common.Person
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{code}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String stat

[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-20 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-12932:
---
Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:

{{Exception in thread "main" java.lang.UnsupportedOperationException: no 
encoder found for example_java.common.Person
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
}}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:

{unformatted}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder 
found for example_java.common.Person
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{unformatted}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
Strin

[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-20 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-12932:
---
Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.

The full stack trace is:

{unformatted}
Exception in thread "main" java.lang.UnsupportedOperationException: no encoder 
found for example_java.common.Person
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$extractorFor(JavaTypeInference.scala:403)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.extractorsFor(JavaTypeInference.scala:314)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:75)
at org.apache.spark.sql.Encoders$.bean(Encoder.scala:176)
at org.apache.spark.sql.Encoders.bean(Encoder.scala)
{unformatted}

NOTE that if I do provide EITHER the default constructor OR the setters, but 
not both, then I get a stack trace with much more useful information, but 
omitting BOTH causes this issue.

The original source is below.

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.


{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}



> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Compo

[jira] [Updated] (SPARK-12932) Bad error message with trying to create Dataset from RDD of Java objects that are not bean-compliant

2016-01-20 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-12932:
---
Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I got the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person". This is not a very helpful error and no other 
logging information was apparent to help troubleshoot this.

It turned out that the problem was that my Person class did not have a default 
constructor and also did not have setter methods and that was the root cause.

This JIRA is for implementing a more useful error message to help Java 
developers who are trying out the Dataset API for the first time.


{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I get the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person".

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


Summary: Bad error message with trying to create Dataset from RDD of 
Java objects that are not bean-compliant  (was: Cannot convert list of simple 
Java objects to Dataset (no encoder found))

> Bad error message with trying to create Dataset from RDD of Java objects that 
> are not bean-compliant
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
>Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I got the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person". This is not a very helpful error and 
> no other logging information was apparent to help troubleshoot this.
> It turned out that the problem was that my Person class did not have a 
> default constructor and also did not have setter methods and that was the 
> root cause.
> This JIRA is for implementing a more useful error message to help Java 
> developers who are trying out the Dataset API for the first time.
> {code:title=Example.java}
> public class JavaDatasetExample {
> public static void main(String[] args) throws Exception {
> SparkConf sparkConf = new SparkConf()
> .setAppName("Example")
> .setMaster("local[*]");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> SQLContext 

[jira] [Created] (SPARK-12932) Cannot convert list of simple Java objects to Dataset (no encoder found)

2016-01-20 Thread Andy Grove (JIRA)
Andy Grove created SPARK-12932:
--

 Summary: Cannot convert list of simple Java objects to Dataset (no 
encoder found)
 Key: SPARK-12932
 URL: https://issues.apache.org/jira/browse/SPARK-12932
 Project: Spark
  Issue Type: Bug
  Components: Java API
Affects Versions: 1.6.0
 Environment: Ubuntu 15.10 / Java 8
Reporter: Andy Grove


When trying to create a Dataset from an RDD of Person (all using the Java API), 
I get the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person".

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12932) Cannot convert list of simple Java objects to Dataset (no encoder found)

2016-01-20 Thread Andy Grove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Grove updated SPARK-12932:
---
Description: 
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I get the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person".

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}


  was:
When trying to create a Dataset from an RDD of Person (all using the Java API), 
I get the error "java.lang.UnsupportedOperationException: no encoder found for 
example_java.dataset.Person".

{code:title=Example.java}
public class JavaDatasetExample {

public static void main(String[] args) throws Exception {

SparkConf sparkConf = new SparkConf()
.setAppName("Example")
.setMaster("local[*]");

JavaSparkContext sc = new JavaSparkContext(sparkConf);

SQLContext sqlContext = new SQLContext(sc);

List people = ImmutableList.of(
new Person("Joe", "Bloggs", 21, "NY")
);

Dataset dataset = sqlContext.createDataset(people, 
Encoders.bean(Person.class));

{code}

{code:title=Person.java}
class Person implements Serializable {

String first;
String last;
int age;
String state;

public Person() {
}

public Person(String first, String last, int age, String state) {
this.first = first;
this.last = last;
this.age = age;
this.state = state;
}

public String getFirst() {
return first;
}

public String getLast() {
return last;
}

public int getAge() {
return age;
}

public String getState() {
return state;
}

}
{code}



> Cannot convert list of simple Java objects to Dataset (no encoder found)
> 
>
> Key: SPARK-12932
> URL: https://issues.apache.org/jira/browse/SPARK-12932
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 1.6.0
> Environment: Ubuntu 15.10 / Java 8
>Reporter: Andy Grove
>
> When trying to create a Dataset from an RDD of Person (all using the Java 
> API), I get the error "java.lang.UnsupportedOperationException: no encoder 
> found for example_java.dataset.Person".
> {code:title=Example.java}
> public class JavaDatasetExample {
> public static void main(String[] args) throws Exception {
> SparkConf sparkConf = new SparkConf()
> .setAppName("Example")
> .setMaster("local[*]");
> JavaSparkContext sc = new JavaSparkContext(sparkConf);
> SQLContext sqlContext = new SQLContext(sc);
> List people = ImmutableList.of(
> new Person("Joe", "Bloggs", 21, "NY")
> );
> Dataset dataset = sqlContext.createDataset(people, 
> Encoders.bean(Person.class));
> {code}
> {code:title=Person.java}
> class Person implements Serializable {
> String first;
> String last;
> int age;
> String state;
> public Person() {
> }
> public Person(String first, String last, int age, String state) {
> this.first = first;
> this.last = last;
> this.age = age;
> this.state = state;
> }
> public String getFirst() {
> return first;
> }
> public String getLast() {
> return last;
> }
> public int getAge() {
> return age;
> }
> public String getState() {
> return state;
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional c