[jira] [Resolved] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37137.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34411
[https://github.com/apache/spark/pull/34411]
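
For context, "inlining" the type hints means moving the annotations out of the
separate .pyi stub and into the .py module itself. A minimal, illustrative
sketch of what inlined signatures on SparkConf could look like (not the actual
diff; the real annotations are in the linked pull request):

{code:python}
# Illustrative sketch only -- the real changes are in the pull request above.
from typing import List, Optional, Tuple


class SparkConf:
    def set(self, key: str, value: str) -> "SparkConf":
        ...

    def get(self, key: str, defaultValue: Optional[str] = None) -> Optional[str]:
        ...

    def getAll(self) -> List[Tuple[str, str]]:
        ...
{code}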

> Inline type hints for python/pyspark/conf.py
> 
>
> Key: SPARK-37137
> URL: https://issues.apache.org/jira/browse/SPARK-37137
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: Byron Hsu
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py

2021-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37137:


Assignee: Byron Hsu

> Inline type hints for python/pyspark/conf.py
> 
>
> Key: SPARK-37137
> URL: https://issues.apache.org/jira/browse/SPARK-37137
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: Byron Hsu
>Priority: Major
>







[jira] [Assigned] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37211:


Assignee: (was: Apache Spark)

> More descriptions and adding an image to the failure message about enabling 
> GitHub Actions
> --
>
> Key: SPARK-37211
> URL: https://issues.apache.org/jira/browse/SPARK-37211
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Yuto Akutsu
>Priority: Minor
>
> I've seen (and experienced) the build-and-test workflow of first-time PRs 
> failing because developers forgot to enable GitHub Actions on their own 
> repositories.
> Adding more description and an image to the test-failure message would help 
> developers notice the cause more quickly.






[jira] [Assigned] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37211:


Assignee: Apache Spark

> More descriptions and adding an image to the failure message about enabling 
> GitHub Actions
> --
>
> Key: SPARK-37211
> URL: https://issues.apache.org/jira/browse/SPARK-37211
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Yuto Akutsu
>Assignee: Apache Spark
>Priority: Minor
>
> I've seen (and experienced) the build-and-test workflow of first-time PRs 
> failing because developers forgot to enable GitHub Actions on their own 
> repositories.
> Adding more description and an image to the test-failure message would help 
> developers notice the cause more quickly.






[jira] [Commented] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439042#comment-17439042
 ] 

Apache Spark commented on SPARK-37211:
--

User 'yutoacts' has created a pull request for this issue:
https://github.com/apache/spark/pull/34487

> More descriptions and adding an image to the failure message about enabling 
> GitHub Actions
> --
>
> Key: SPARK-37211
> URL: https://issues.apache.org/jira/browse/SPARK-37211
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Yuto Akutsu
>Priority: Minor
>
> I've seen (and experienced) the build-and-test workflow of first-time PRs 
> failing because developers forgot to enable GitHub Actions on their own 
> repositories.
> Adding more description and an image to the test-failure message would help 
> developers notice the cause more quickly.






[jira] [Resolved] (SPARK-37157) Inline type hints for python/pyspark/util.py

2021-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37157.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34438
[https://github.com/apache/spark/pull/34438]

> Inline type hints for python/pyspark/util.py
> 
>
> Key: SPARK-37157
> URL: https://issues.apache.org/jira/browse/SPARK-37157
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37157) Inline type hints for python/pyspark/util.py

2021-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37157:


Assignee: dch nguyen

> Inline type hints for python/pyspark/util.py
> 
>
> Key: SPARK-37157
> URL: https://issues.apache.org/jira/browse/SPARK-37157
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Byron Hsu
>Assignee: dch nguyen
>Priority: Major
>







[jira] [Updated] (SPARK-37210) An error occurred while concurrently writing to different static partitions

2021-11-04 Thread Zhen Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Wang updated SPARK-37210:
--
Description: 
An error occurred while concurrently writing to different static partitions.

For a write to a static partition, committerOutputPath is the location path of 
the table. When multiple tasks write to the same table concurrently, the 
_temporary path is deleted as soon as one task finishes, causing the other 
task to fail.

 

test code:

 
{code:java}
// code placeholder
import java.util

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object HiveTests {

  def main(args: Array[String]): Unit = {

val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("HiveTests")
  .enableHiveSupport()
  .getOrCreate()

//rows
val users1 = new util.ArrayList[Row]()
users1.add(Row(1, "user1", "2021-11-03", 10))
users1.add(Row(2, "user2", "2021-11-03", 10))
users1.add(Row(3, "user3", "2021-11-03", 10))

//schema
val structType = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("dt", StringType, true),
  StructField("hour", IntegerType, true)
))

spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("drop table if exists default.test")

spark.sql(
  """
|create table if not exists default.test (
|  id int,
|  name string)
|partitioned by (dt string, hour int)
|stored as parquet
|""".stripMargin)

spark.sql("desc formatted default.test").show()

spark.sqlContext
  .createDataFrame(users1, structType)
  .select("id", "name")
  .createOrReplaceTempView("user1")

val thread1 = new Thread(() => {
  spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-03', 
hour=10) select * from user1")
})
thread1.start()

val thread2 = new Thread(() => {
  spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', 
hour=10) select * from user1")
})
thread2.start()

thread1.join()
thread2.join()

spark.sql("select * from test").show()

spark.stop()

  }

}
{code}
 

error message:

 
{code:java}
// code placeholder

21/11/04 19:01:21 ERROR Utils: Aborting task
ExitCodeException exitCode=1: chmod: cannot access 
'/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_
4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc':
 No such file or directory
at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294)
at 
org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
at 
org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
at 
org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150)
at 
org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290)
at 
org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(F

[jira] [Created] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions

2021-11-04 Thread Yuto Akutsu (Jira)
Yuto Akutsu created SPARK-37211:
---

 Summary: More descriptions and adding an image to the failure 
message about enabling GitHub Actions
 Key: SPARK-37211
 URL: https://issues.apache.org/jira/browse/SPARK-37211
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 3.3.0
Reporter: Yuto Akutsu


I've seen (and experienced) the build-and-test workflow of first-time PRs 
failing because developers forgot to enable GitHub Actions on their own 
repositories.

Adding more description and an image to the test-failure message would help 
developers notice the cause more quickly.






[jira] [Commented] (SPARK-37210) An error occurred while concurrently writing to different static partitions

2021-11-04 Thread Zhen Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439025#comment-17439025
 ] 

Zhen Wang commented on SPARK-37210:
---

The test code runs normally on Spark 2.4.3. I noticed that Spark 2.4.3 uses 
InsertIntoHiveTable, while Spark 3.1.1 uses InsertIntoHadoopFsRelationCommand; 
is this the problem?

> An error occurred while concurrently writing to different static partitions
> ---
>
> Key: SPARK-37210
> URL: https://issues.apache.org/jira/browse/SPARK-37210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Zhen Wang
>Priority: Major
>
> An error occurred while concurrently writing to different static partitions.
>  
> test code:
>  
> {code:java}
> // code placeholder
> object HiveTests {
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .appName("HiveTests")
>   .enableHiveSupport()
>   .getOrCreate()
> //rows
> val users1 = new util.ArrayList[Row]()
> users1.add(Row(1, "user1", "2021-11-03", 10))
> users1.add(Row(2, "user2", "2021-11-03", 10))
> users1.add(Row(3, "user3", "2021-11-03", 10))
> //schema
> val structType = StructType(Array(
>   StructField("id", IntegerType, true),
>   StructField("name", StringType, true),
>   StructField("dt", StringType, true),
>   StructField("hour", IntegerType, true)
> ))
> spark.sql("set hive.exec.dynamic.partition=true")
> spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
> spark.sql("drop table if exists default.test")
> spark.sql(
>   """
> |create table if not exists default.test (
> |  id int,
> |  name string)
> |partitioned by (dt string, hour int)
> |stored as parquet
> |""".stripMargin)
> spark.sql("desc formatted default.test").show()
> spark.sqlContext
>   .createDataFrame(users1, structType)
>   .select("id", "name")
>   .createOrReplaceTempView("user1")
> val thread1 = new Thread(() => {
>   spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-03', 
> hour=10) select * from user1")
> })
> thread1.start()
> val thread2 = new Thread(() => {
>   spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', 
> hour=10) select * from user1")
> })
> thread2.start()
> thread1.join()
> thread2.join()
> spark.sql("select * from test").show()
> spark.stop()
>   }
> }
> {code}
>  
> error message:
>  
> {code:java}
> // code placeholder
> 21/11/04 19:01:21 ERROR Utils: Aborting task
> ExitCodeException exitCode=1: chmod: cannot access 
> '/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_
> 4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc':
>  No such file or directoryat 
> org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
> at org.apache.hadoop.util.Shell.run(Shell.java:901)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
> at 
> org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
> at 
> org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329)
> at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
> at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRec

[jira] [Created] (SPARK-37210) An error occurred while concurrently writing to different static partitions

2021-11-04 Thread Zhen Wang (Jira)
Zhen Wang created SPARK-37210:
-

 Summary: An error occurred while concurrently writing to different 
static partitions
 Key: SPARK-37210
 URL: https://issues.apache.org/jira/browse/SPARK-37210
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.1.1
Reporter: Zhen Wang


An error occurred while concurrently writing to different static partitions.

 

test code:

 
{code:java}
// code placeholder
import java.util

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object HiveTests {

  def main(args: Array[String]): Unit = {

val spark = SparkSession
  .builder()
  .master("local[*]")
  .appName("HiveTests")
  .enableHiveSupport()
  .getOrCreate()

//rows
val users1 = new util.ArrayList[Row]()
users1.add(Row(1, "user1", "2021-11-03", 10))
users1.add(Row(2, "user2", "2021-11-03", 10))
users1.add(Row(3, "user3", "2021-11-03", 10))

//schema
val structType = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("name", StringType, true),
  StructField("dt", StringType, true),
  StructField("hour", IntegerType, true)
))

spark.sql("set hive.exec.dynamic.partition=true")
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

spark.sql("drop table if exists default.test")

spark.sql(
  """
|create table if not exists default.test (
|  id int,
|  name string)
|partitioned by (dt string, hour int)
|stored as parquet
|""".stripMargin)

spark.sql("desc formatted default.test").show()

spark.sqlContext
  .createDataFrame(users1, structType)
  .select("id", "name")
  .createOrReplaceTempView("user1")

val thread1 = new Thread(() => {
  spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-03', 
hour=10) select * from user1")
})
thread1.start()

val thread2 = new Thread(() => {
  spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', 
hour=10) select * from user1")
})
thread2.start()

thread1.join()
thread2.join()

spark.sql("select * from test").show()

spark.stop()

  }

}
{code}
 

error message:

 
{code:java}
// code placeholder

21/11/04 19:01:21 ERROR Utils: Aborting task
ExitCodeException exitCode=1: chmod: cannot access 
'/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_
4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc':
 No such file or directory
at 
org.apache.hadoop.util.Shell.runCommand(Shell.java:1008)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324)
at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294)
at 
org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428)
at 
org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521)
at 
org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
at 
org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
at 
org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150)
at 
org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290)
at 
org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:357

[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7

2021-11-04 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438986#comment-17438986
 ] 

Kousuke Saruta commented on SPARK-35496:


[~dongjoon]
Thank you for letting me know. That's great.

> Upgrade Scala 2.13 to 2.13.7
> 
>
> Key: SPARK-35496
> URL: https://issues.apache.org/jira/browse/SPARK-35496
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> This issue aims to upgrade to Scala 2.13.7.
> Scala 2.13.6 has been released (https://github.com/scala/scala/releases/tag/v2.13.6). 
> However, we skip 2.13.6 because it contains a breaking behavior change 
> that differs from both Scala 2.13.5 and Scala 3:
> - https://github.com/scala/bug/issues/12403
> {code}
> scala3-3.0.0:$ bin/scala
> scala> Array.empty[Double].intersect(Array(0.0))
> val res0: Array[Double] = Array()
> scala-2.13.6:$ bin/scala
> Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292).
> Type in expressions for evaluation. Or try :help.
> scala> Array.empty[Double].intersect(Array(0.0))
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D
>   ... 32 elided
> {code}






[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7

2021-11-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438984#comment-17438984
 ] 

Dongjoon Hyun commented on SPARK-35496:
---

Hi, [~sarutak]. [~LuciferYang] already updated his PR. Please see the PR.

> Upgrade Scala 2.13 to 2.13.7
> 
>
> Key: SPARK-35496
> URL: https://issues.apache.org/jira/browse/SPARK-35496
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> This issue aims to upgrade to Scala 2.13.7.
> Scala 2.13.6 has been released (https://github.com/scala/scala/releases/tag/v2.13.6). 
> However, we skip 2.13.6 because it contains a breaking behavior change 
> that differs from both Scala 2.13.5 and Scala 3:
> - https://github.com/scala/bug/issues/12403
> {code}
> scala3-3.0.0:$ bin/scala
> scala> Array.empty[Double].intersect(Array(0.0))
> val res0: Array[Double] = Array()
> scala-2.13.6:$ bin/scala
> Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292).
> Type in expressions for evaluation. Or try :help.
> scala> Array.empty[Double].intersect(Array(0.0))
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D
>   ... 32 elided
> {code}






[jira] [Commented] (SPARK-36895) Add Create Index syntax support

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438920#comment-17438920
 ] 

Apache Spark commented on SPARK-36895:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/34486

> Add Create Index syntax support
> ---
>
> Key: SPARK-36895
> URL: https://issues.apache.org/jira/browse/SPARK-36895
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Commented] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438880#comment-17438880
 ] 

Apache Spark commented on SPARK-37208:
--

User 'tgravescs' has created a pull request for this issue:
https://github.com/apache/spark/pull/34485

> Support mapping Spark gpu/fpga resource types to custom YARN resource type
> --
>
> Key: SPARK-37208
> URL: https://issues.apache.org/jira/browse/SPARK-37208
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently Spark supports gpu/fpga resource scheduling, and specifically on 
> YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
> yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 
> made it easier for users to plug in custom resource types. This means users 
> may create a custom resource type that represents GPUs or FPGAs because they 
> want additional logic that the built-in YARN types don't have. Ideally, 
> Spark users would still just use the generic "gpu" or "fpga" types in Spark, 
> so we should add the ability to change the Spark-internal mappings.
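
For context, the existing mapping is fixed: a request for the generic "gpu"
resource is translated to YARN's built-in yarn.io/gpu type. A minimal sketch of
how the generic resource is requested today (the discovery-script path is a
hypothetical placeholder, and the override config this issue proposes is not
shown because its name is up to the PR):

{code:python}
# Sketch of current behavior: the generic "gpu" request below is mapped to
# YARN's built-in yarn.io/gpu resource type, with no way to point it at a
# site-specific custom resource type instead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.executor.resource.gpu.amount", "1")
    # Hypothetical script path; it must print the addresses of the local GPUs.
    .config("spark.executor.resource.gpu.discoveryScript", "/opt/spark/getGpus.sh")
    .getOrCreate()
)
{code}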






[jira] [Assigned] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37208:


Assignee: Apache Spark

> Support mapping Spark gpu/fpga resource types to custom YARN resource type
> --
>
> Key: SPARK-37208
> URL: https://issues.apache.org/jira/browse/SPARK-37208
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
>
> Currently Spark supports gpu/fpga resource scheduling, and specifically on 
> YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
> yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 
> made it easier for users to plug in custom resource types. This means users 
> may create a custom resource type that represents GPUs or FPGAs because they 
> want additional logic that the built-in YARN types don't have. Ideally, 
> Spark users would still just use the generic "gpu" or "fpga" types in Spark, 
> so we should add the ability to change the Spark-internal mappings.






[jira] [Assigned] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37208:


Assignee: (was: Apache Spark)

> Support mapping Spark gpu/fpga resource types to custom YARN resource type
> --
>
> Key: SPARK-37208
> URL: https://issues.apache.org/jira/browse/SPARK-37208
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently Spark supports gpu/fpga resource scheduling, and specifically on 
> YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
> yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 
> made it easier for users to plug in custom resource types. This means users 
> may create a custom resource type that represents GPUs or FPGAs because they 
> want additional logic that the built-in YARN types don't have. Ideally, 
> Spark users would still just use the generic "gpu" or "fpga" types in Spark, 
> so we should add the ability to change the Spark-internal mappings.






[jira] [Comment Edited] (SPARK-31726) Make spark.files available in driver with cluster deploy mode on kubernetes

2021-11-04 Thread Martin Andersson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438862#comment-17438862
 ] 

Martin Andersson edited comment on SPARK-31726 at 11/4/21, 6:10 PM:


I've also run into this issue. I was trying to simply include logging and application 
configuration using {{--files}}, but I suppose I'll have to include those in 
the docker image instead until this issue gets fixed.

EDIT: I also tried using the --jars option to include a logging config file. 
Files included in this fashion should also be added to the classpath, 
but that doesn't seem to be the case either.


was (Author: beregon87):
I've also run into this issue. Trying to simply include logging and application 
configuration using {{--files}}, but I suppose I'll have to include those in 
the docker image instead until this issue gets fixed.

> Make spark.files available in driver with cluster deploy mode on kubernetes
> ---
>
> Key: SPARK-31726
> URL: https://issues.apache.org/jira/browse/SPARK-31726
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: koert kuipers
>Priority: Minor
>
> Currently, on YARN with cluster deploy mode, --files makes the files available 
> to the driver and executors and also puts them on the classpath for the driver 
> and executors.
> On k8s with cluster deploy mode, --files makes the files available on the 
> executors, but they are not on the classpath. It does not make the files 
> available on the driver, and they are not on the driver classpath.
> It would be nice if the k8s behavior were consistent with YARN, or at least 
> made the files available on the driver. Once the files are available, there is 
> a simple workaround to get them on the classpath using 
> spark.driver.extraClassPath="./"
> Background:
> We recently started testing Kubernetes for Spark. Our main platform is YARN, 
> on which we use client deploy mode. Our first experience was that client 
> deploy mode was difficult to use on k8s (we don't launch from inside a pod), 
> so we switched to cluster deploy mode, which seems to behave well on k8s. But 
> then we realized that our programs rely on reading files from the classpath 
> (application.conf, log4j.properties etc.) that are on the client but are now 
> no longer on the driver (since the driver is no longer on the client). An easy 
> fix for this seems to be to ship the files using --files to make them 
> available on the driver, but we could not get this to work.
>  






[jira] [Commented] (SPARK-31726) Make spark.files available in driver with cluster deploy mode on kubernetes

2021-11-04 Thread Martin Andersson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438862#comment-17438862
 ] 

Martin Andersson commented on SPARK-31726:
--

I've also run into this issue. I was trying to simply include logging and application 
configuration using {{--files}}, but I suppose I'll have to include those in 
the docker image instead until this issue gets fixed.

> Make spark.files available in driver with cluster deploy mode on kubernetes
> ---
>
> Key: SPARK-31726
> URL: https://issues.apache.org/jira/browse/SPARK-31726
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: koert kuipers
>Priority: Minor
>
> Currently, on YARN with cluster deploy mode, --files makes the files available 
> to the driver and executors and also puts them on the classpath for the driver 
> and executors.
> On k8s with cluster deploy mode, --files makes the files available on the 
> executors, but they are not on the classpath. It does not make the files 
> available on the driver, and they are not on the driver classpath.
> It would be nice if the k8s behavior were consistent with YARN, or at least 
> made the files available on the driver. Once the files are available, there is 
> a simple workaround to get them on the classpath using 
> spark.driver.extraClassPath="./"
> Background:
> We recently started testing Kubernetes for Spark. Our main platform is YARN, 
> on which we use client deploy mode. Our first experience was that client 
> deploy mode was difficult to use on k8s (we don't launch from inside a pod), 
> so we switched to cluster deploy mode, which seems to behave well on k8s. But 
> then we realized that our programs rely on reading files from the classpath 
> (application.conf, log4j.properties etc.) that are on the client but are now 
> no longer on the driver (since the driver is no longer on the client). An easy 
> fix for this seems to be to ship the files using --files to make them 
> available on the driver, but we could not get this to work.
>  






[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and two other similar cases in `resource-managers` test failed

2021-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37209:
-
Attachment: success-unit-tests.log

> YarnShuffleIntegrationSuite and two other similar cases in 
> `resource-managers` test failed
> ---
>
> Key: SPARK-37209
> URL: https://issues.apache.org/jira/browse/SPARK-37209
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
> Attachments: failed-unit-tests.log, success-unit-tests.log
>
>
> Execute :
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn
> The test will succeed.
>  
> Execute :
>  # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
>  # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The test will fail.
>  
> Execute :
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # Delete assembly/target/scala-2.12/jars manually
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The test will fail.
>  
> The error stack is :
> {code:java}
> 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: 
> User class threw exception: org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 4 times,
>  most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 
> 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
> at 
> org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
> at scala.collection.immutable.List.flatMap(List.scala:293)
> at scala.collection.immutable.List.flatMap(List.scala:79)
> at 
> org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
> at 
> org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
> at 
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
> at 
> org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
> at 
> com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
> at 
> org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
> at 
> org.apache.spark.serializer.KryoSerializationStream.(KryoSerializer.scala:266)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
> at 
> org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76)
> at 
> org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59)
> at 
> org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83)
> at 
> org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772)
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
> at 
> org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
> at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:5

[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and two other similar cases in `resource-managers` test failed

2021-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37209:
-
Attachment: failed-unit-tests.log

> YarnShuffleIntegrationSuite and two other similar cases in 
> `resource-managers` test failed
> ---
>
> Key: SPARK-37209
> URL: https://issues.apache.org/jira/browse/SPARK-37209
> Project: Spark
>  Issue Type: Bug
>  Components: Tests, YARN
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
> Attachments: failed-unit-tests.log, success-unit-tests.log
>
>
> Execute :
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn
> The test will succeed.
>  
> Execute :
>  # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
>  # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The test will fail.
>  
> Execute :
>  # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
> -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
> -Pkubernetes -Phive
>  # Delete assembly/target/scala-2.12/jars manually
>  # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
> -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
> -Pscala-2.13 -pl resource-managers/yarn 
> The test will fail.
>  
> The error stack is :
> {code:java}
> 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: 
> User class threw exception: org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 4 times,
>  most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 
> 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:348)
> at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
> at 
> org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
> at scala.collection.immutable.List.flatMap(List.scala:293)
> at scala.collection.immutable.List.flatMap(List.scala:79)
> at 
> org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
> at 
> org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
> at 
> org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
> at 
> org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
> at 
> com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
> at 
> org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
> at 
> org.apache.spark.serializer.KryoSerializationStream.(KryoSerializer.scala:266)
> at 
> org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
> at 
> org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76)
> at 
> org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59)
> at 
> org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83)
> at 
> org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772)
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
> at 
> org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
> at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52

[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and two other similar cases in `resource-managers` test failed

2021-11-04 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-37209:
-
Description: 
Execute :
 # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
-Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
-Pkubernetes -Phive
 # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn

The test will succeed.

 

Execute :
 # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
 # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn 

The test will fail.

 

Execute :
 # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
-Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
-Pkubernetes -Phive
 # Delete assembly/target/scala-2.12/jars manually
 # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn 

The test will fail.

 

The error stack is :
{code:java}
21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User 
class threw exception: org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times,
 most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 
1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
at 
org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
at scala.collection.immutable.List.flatMap(List.scala:293)
at scala.collection.immutable.List.flatMap(List.scala:79)
at 
org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
at 
org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
at 
org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
at 
com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
at 
org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
at 
org.apache.spark.serializer.KryoSerializationStream.(KryoSerializer.scala:266)
at 
org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
at 
org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76)
at 
org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59)
at 
org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83)
at 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at 
org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: breeze.linalg.Matrix
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass

[jira] [Created] (SPARK-37209) YarnShuffleIntegrationSuite and two other similar cases in `resource-managers` test failed

2021-11-04 Thread Yang Jie (Jira)
Yang Jie created SPARK-37209:


 Summary: YarnShuffleIntegrationSuite and two other similar cases 
in `resource-managers` test failed
 Key: SPARK-37209
 URL: https://issues.apache.org/jira/browse/SPARK-37209
 Project: Spark
  Issue Type: Bug
  Components: Tests, YARN
Affects Versions: 3.3.0
Reporter: Yang Jie


Execute :
 # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
-Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
-Pkubernetes -Phive
 # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn

The test will succeed.

 

Execute :
 # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive
 # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn 

The test will fail.

 

Execute :
 # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud 
-Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl 
-Pkubernetes -Phive
 # Delete assembly/target/scala-2.12/jars manually

 # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn 
-Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive 
-Pscala-2.13 -pl resource-managers/yarn 

The test will fail.

 

The error stack is :
{code:java}
21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User 
class threw exception: org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times,
 most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 
1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
at 
org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537)
at scala.collection.immutable.List.flatMap(List.scala:293)
at scala.collection.immutable.List.flatMap(List.scala:79)
at 
org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535)
at 
org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502)
at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226)
at 
org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102)
at 
com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48)
at 
org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109)
at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346)
at 
org.apache.spark.serializer.KryoSerializationStream.(KryoSerializer.scala:266)
at 
org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432)
at 
org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76)
at 
org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59)
at 
org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83)
at 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at 
org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
at 
org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

[jira] [Comment Edited] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files

2021-11-04 Thread Chuck Connell (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438789#comment-17438789
 ] 

Chuck Connell edited comment on SPARK-37198 at 11/4/21, 3:42 PM:
-

There are many hints/tech tips on the Internet that say 
{{file://local_path}} already works to read and write 
local files from a Spark cluster. But in my testing (from Databricks) this is 
not true. I have never gotten it to work.

If there is already a way to read/write local files, please state the exact, 
tested method to do so. 


was (Author: chconnell):
There are many hints/techtips on the Internet which say that 
{{file://local_path }}already works to read and write local files from a Spark 
cluster. But in my testing (from Databricks) this is not true. I have never 
gotten it to work.

If there is already a way to read/write local files, please say the exact, 
tested method to do so. 

> pyspark.pandas read_csv() and to_csv() should handle local files 
> -
>
> Key: SPARK-37198
> URL: https://issues.apache.org/jira/browse/SPARK-37198
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> Pandas programmers who move their code to Spark would like to import and 
> export text files to and from their local disk. I know there are technical 
> hurdles to this (since Spark is usually in a cluster that does not know where 
> your local computer is) but it would really help code migration. 
> For read_csv() and to_csv(), the syntax {{file://c:/Temp/my_file.csv}} (or 
> something like this) should import and export to the local disk on Windows. 
> Similarly for Mac and Linux. 
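
For reference, pyspark.pandas already accepts URI-style paths, but a file://
path is resolved on the cluster nodes (driver and executors), not on the
machine that submitted the job, which is why it tends to fail on managed
services such as Databricks. A minimal sketch, assuming the (hypothetical)
path exists on every node:

{code:python}
# Works only when /tmp/my_file.csv is visible to the driver and executors
# (e.g. local mode, or a path mounted on every node); the path is illustrative.
import pyspark.pandas as ps

psdf = ps.read_csv("file:///tmp/my_file.csv")
psdf.to_csv("file:///tmp/my_file_out", num_files=1)
{code}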






[jira] [Commented] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files

2021-11-04 Thread Chuck Connell (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438789#comment-17438789
 ] 

Chuck Connell commented on SPARK-37198:
---

There are many hints/techtips on the Internet which say that 
{{file://local_path}} already works to read and write local files from a Spark 
cluster. But in my testing (from Databricks) this is not true. I have never 
gotten it to work.

If there is already a way to read/write local files, please say the exact, 
tested method to do so. 

> pyspark.pandas read_csv() and to_csv() should handle local files 
> -
>
> Key: SPARK-37198
> URL: https://issues.apache.org/jira/browse/SPARK-37198
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> Pandas programmers who move their code to Spark would like to import and 
> export text files to and from their local disk. I know there are technical 
> hurdles to this (since Spark is usually in a cluster that does not know where 
> your local computer is) but it would really help code migration. 
> For read_csv() and to_csv(), the syntax {{file://c:/Temp/my_file.csv}} (or 
> something like this) should import and export to the local disk on Windows. 
> Similarly for Mac and Linux. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37203) Fix NotSerializableException when observe with TypedImperativeAggregate

2021-11-04 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37203:
---
Summary: Fix NotSerializableException when observe with 
TypedImperativeAggregate  (was: Fix NotSerializableException when observe with 
percentile_approx)

> Fix NotSerializableException when observe with TypedImperativeAggregate
> ---
>
> Key: SPARK-37203
> URL: https://issues.apache.org/jira/browse/SPARK-37203
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> {code:java}
> val namedObservation = Observation("named")
> val df = spark.range(100)
> val observed_df = df.observe(
>namedObservation, percentile_approx($"id", lit(0.5), 
> lit(100)).as("percentile_approx_val"))
> observed_df.collect()
> namedObservation.get
> {code}
> throws exception as follows:
> {code:java}
> 15:16:27.994 ERROR org.apache.spark.util.Utils: Exception encountered
> java.io.NotSerializableException: 
> org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1434)
>   at 
> org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51)
>   at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:616)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438784#comment-17438784
 ] 

Thomas Graves commented on SPARK-37208:
---

Note, I'm working on this.

> Support mapping Spark gpu/fpga resource types to custom YARN resource type
> --
>
> Key: SPARK-37208
> URL: https://issues.apache.org/jira/browse/SPARK-37208
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> Currently Spark supports gpu/fpga resource scheduling, and specifically on 
> YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
> yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 
> made it easier for users to plug in custom resource types. This means users 
> may create a custom resource type that represents a GPU or FPGA because they 
> want additional logic that the built-in YARN versions don't have. Ideally, 
> Spark users would still just use the generic "gpu" or "fpga" types in Spark, 
> so we should add the ability to change the Spark internal mappings.
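
For context, a minimal sketch of how an application requests GPUs today (the 
config keys below are existing Spark resource settings; the discovery-script 
path is a placeholder). On YARN, the generic "gpu" name is translated to 
yarn.io/gpu internally, which is the mapping this ticket proposes to make 
configurable:

{code:python}
# Sketch only: requesting GPU resources with existing Spark resource configs.
# Assumes a YARN cluster with GPU scheduling enabled; the script path is a placeholder.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("gpu-resource-example")
    .config("spark.executor.resource.gpu.amount", "1")
    .config("spark.task.resource.gpu.amount", "1")
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/scripts/getGpusResources.sh")
    .getOrCreate()
)
# On YARN, Spark maps this generic "gpu" request to the yarn.io/gpu resource
# type; a custom YARN resource type cannot be targeted this way today, which
# is what this ticket would change.
{code}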



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type

2021-11-04 Thread Thomas Graves (Jira)
Thomas Graves created SPARK-37208:
-

 Summary: Support mapping Spark gpu/fpga resource types to custom 
YARN resource type
 Key: SPARK-37208
 URL: https://issues.apache.org/jira/browse/SPARK-37208
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 3.0.0
Reporter: Thomas Graves


Currently Spark supports gpu/fpga resource scheduling, and specifically on YARN 
it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and 
yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 
made it easier for users to plug in custom resource types. This means users 
may create a custom resource type that represents a GPU or FPGA because they 
want additional logic that the built-in YARN versions don't have. Ideally, Spark 
users would still just use the generic "gpu" or "fpga" types in Spark, so we 
should add the ability to change the Spark internal mappings.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37038) Sample push down in DS v2

2021-11-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37038:
---

Assignee: Huaxin Gao

> Sample push down in DS v2
> -
>
> Key: SPARK-37038
> URL: https://issues.apache.org/jira/browse/SPARK-37038
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37038) Sample push down in DS v2

2021-11-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37038.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34451
[https://github.com/apache/spark/pull/34451]

> Sample push down in DS v2
> -
>
> Key: SPARK-37038
> URL: https://issues.apache.org/jira/browse/SPARK-37038
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438611#comment-17438611
 ] 

Apache Spark commented on SPARK-37207:
--

User 'dhirennavani' has created a pull request for this issue:
https://github.com/apache/spark/pull/34484

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.
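
A minimal sketch of what such a method could look like (hypothetical; the 
linked pull request may implement it differently):

{code:python}
# Illustrative only: a possible DataFrame.isEmpty for PySpark, mirroring the
# Scala API. The actual implementation is in the PR referenced above.
from pyspark.sql import DataFrame

def isEmpty(self) -> bool:
    """Return True if the DataFrame has no rows."""
    # take(1) avoids scanning the whole dataset; an empty result means no rows.
    return len(self.take(1)) == 0

DataFrame.isEmpty = isEmpty  # monkey-patched here purely for demonstration
{code}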



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438610#comment-17438610
 ] 

Apache Spark commented on SPARK-37207:
--

User 'dhirennavani' has created a pull request for this issue:
https://github.com/apache/spark/pull/34484

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438563#comment-17438563
 ] 

Apache Spark commented on SPARK-37207:
--

User 'dhirennavani' has created a pull request for this issue:
https://github.com/apache/spark/pull/34483

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438562#comment-17438562
 ] 

Apache Spark commented on SPARK-37207:
--

User 'dhirennavani' has created a pull request for this issue:
https://github.com/apache/spark/pull/34483

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37207:


Assignee: (was: Apache Spark)

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37207:


Assignee: Apache Spark

> Python API does not have isEmpty
> 
>
> Key: SPARK-37207
> URL: https://issues.apache.org/jira/browse/SPARK-37207
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Dhiren Navani
>Assignee: Apache Spark
>Priority: Minor
>
> The Python DataFrame API does not have isEmpty, but the Scala one does. 
> This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37207) Python API does not have isEmpty

2021-11-04 Thread Dhiren Navani (Jira)
Dhiren Navani created SPARK-37207:
-

 Summary: Python API does not have isEmpty
 Key: SPARK-37207
 URL: https://issues.apache.org/jira/browse/SPARK-37207
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.2.0
Reporter: Dhiren Navani


The Python DataFrame API does not have isEmpty, but the Scala one does. 
This is just to add the API to the Python code.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35149) I am facing this issue regularly, how to fix this issue.

2021-11-04 Thread freedom1993 feng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438538#comment-17438538
 ] 

freedom1993 feng commented on SPARK-35149:
--

how to solve this problem

> I am facing this issue regularly, how to fix this issue.
> 
>
> Key: SPARK-35149
> URL: https://issues.apache.org/jira/browse/SPARK-35149
> Project: Spark
>  Issue Type: Question
>  Components: Spark Submit
>Affects Versions: 2.2.2
>Reporter: Eppa Rakesh
>Priority: Critical
>
> 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for 
> BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312
>  java.io.EOFException: Unexpected EOF while trying to read response from 
> server
>  at 
> org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
>  at 
> org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086)
>  21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for 
> BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline 
> [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK],
>  
> DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK],
>  
> DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]:
>  datanode 
> 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK])
>  is bad.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7

2021-11-04 Thread Kousuke Saruta (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438535#comment-17438535
 ] 

Kousuke Saruta commented on SPARK-35496:


[~LuciferYang] Scala 2.13.7 was released a few days ago.
https://github.com/scala/scala/releases/tag/v2.13.7
Would you like to continue to work on this?

> Upgrade Scala 2.13 to 2.13.7
> 
>
> Key: SPARK-35496
> URL: https://issues.apache.org/jira/browse/SPARK-35496
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Minor
>
> This issue aims to upgrade to Scala 2.13.7.
> Scala 2.13.6 was released (https://github.com/scala/scala/releases/tag/v2.13.6). 
> However, we skip 2.13.6 because there is a breaking behavior change in 2.13.6 
> that differs from both Scala 2.13.5 and Scala 3.
> - https://github.com/scala/bug/issues/12403
> {code}
> scala3-3.0.0:$ bin/scala
> scala> Array.empty[Double].intersect(Array(0.0))
> val res0: Array[Double] = Array()
> scala-2.13.6:$ bin/scala
> Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292).
> Type in expressions for evaluation. Or try :help.
> scala> Array.empty[Double].intersect(Array(0.0))
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D
>   ... 32 elided
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37180) PySpark.pandas should support __version__

2021-11-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37180.
--
Resolution: Won't Fix

> PySpark.pandas should support __version__
> -
>
> Key: SPARK-37180
> URL: https://issues.apache.org/jira/browse/SPARK-37180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> In regular pandas you can say
> {quote}pd.__version__
> {quote}
> to get the pandas version number. PySpark pandas should support the same.
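
For reference, a short illustration of the attribute being requested (the 
version strings shown are examples only; pyspark already exposes a 
package-level version, and the ticket was resolved as Won't Fix):

{code:python}
import pandas as pd
import pyspark

print(pd.__version__)       # e.g. "1.3.4" -- standard pandas behavior
print(pyspark.__version__)  # e.g. "3.2.0" -- already available at the package root

# The request was for pyspark.pandas to expose the same attribute,
# e.g. pyspark.pandas.__version__ (hypothetical; not added).
{code}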



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37180) PySpark.pandas should support __version__

2021-11-04 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438526#comment-17438526
 ] 

Hyukjin Kwon commented on SPARK-37180:
--

Yeah I think we don't need this.

> PySpark.pandas should support __version__
> -
>
> Key: SPARK-37180
> URL: https://issues.apache.org/jira/browse/SPARK-37180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> In regular pandas you can say
> {quote}pd.__version__
> {quote}
> to get the pandas version number. PySpark pandas should support the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37180) PySpark.pandas should support __version__

2021-11-04 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438522#comment-17438522
 ] 

dch nguyen commented on SPARK-37180:


As Koalas was merged into PySpark, should pyspark.pandas.__version__ be 
an alias of spark.version? [~hyukjin.kwon]

> PySpark.pandas should support __version__
> -
>
> Key: SPARK-37180
> URL: https://issues.apache.org/jira/browse/SPARK-37180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> In regular pandas you can say
> {quote}pd.__version__
> {quote}
> to get the pandas version number. PySpark pandas should support the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37180) PySpark.pandas should support __version__

2021-11-04 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438522#comment-17438522
 ] 

dch nguyen edited comment on SPARK-37180 at 11/4/21, 7:31 AM:
--

As Koalas was merged into PySpark, should pyspark.pandas.__version__ be 
an alias of spark.version? [~hyukjin.kwon]


was (Author: dchvn):
As Koalas was merged into Pyspark, so Should pyspark.pandas.__version__ be 
aliased spark.version ? [~hyukjin.kwon]

> PySpark.pandas should support __version__
> -
>
> Key: SPARK-37180
> URL: https://issues.apache.org/jira/browse/SPARK-37180
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Chuck Connell
>Priority: Major
>
> In regular pandas you can say
> {quote}pd.__version__
> {quote}
> to get the pandas version number. PySpark pandas should support the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org