[jira] [Resolved] (SPARK-37137) Inline type hints for python/pyspark/conf.py
[ https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37137. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34411 [https://github.com/apache/spark/pull/34411] > Inline type hints for python/pyspark/conf.py > > > Key: SPARK-37137 > URL: https://issues.apache.org/jira/browse/SPARK-37137 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: Byron Hsu >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
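For readers following the type-hint sub-tasks: "inlining" here means moving annotations out of the separate `.pyi` stub files and into the `.py` sources themselves. As a rough illustration (this is a toy stand-in, not the actual `pyspark/conf.py`), an inlined class looks like:

```python
from typing import Dict, List, Optional, Tuple


class Conf:
    """Toy SparkConf-like class with the hints written inline."""

    def __init__(self) -> None:
        self._conf: Dict[str, str] = {}

    def set(self, key: str, value: str) -> "Conf":
        # Returning self allows call chaining, as SparkConf does.
        self._conf[key] = value
        return self

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._conf.get(key, default)

    def getAll(self) -> List[Tuple[str, str]]:
        return list(self._conf.items())


conf = Conf().set("spark.app.name", "demo").set("spark.master", "local[*]")
print(conf.get("spark.app.name"))  # -> demo
```

With the hints inline, tools like mypy check the implementation itself rather than a stub that can drift out of sync with it.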
[jira] [Assigned] (SPARK-37137) Inline type hints for python/pyspark/conf.py
[ https://issues.apache.org/jira/browse/SPARK-37137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37137: Assignee: Byron Hsu > Inline type hints for python/pyspark/conf.py > > > Key: SPARK-37137 > URL: https://issues.apache.org/jira/browse/SPARK-37137 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: Byron Hsu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37211: Assignee: (was: Apache Spark) > More descriptions and adding an image to the failure message about enabling > GitHub Actions > -- > > Key: SPARK-37211 > URL: https://issues.apache.org/jira/browse/SPARK-37211 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Yuto Akutsu >Priority: Minor > > I've seen and experienced that the build-and-test workflow of first-time PRs > fails and it was caused by developers forgetting to enable Github Actions on > their own repositories. > I think developers will be able to notice the cause quicker by adding more > descriptions and an image to the test-failure message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37211: Assignee: Apache Spark > More descriptions and adding an image to the failure message about enabling > GitHub Actions > -- > > Key: SPARK-37211 > URL: https://issues.apache.org/jira/browse/SPARK-37211 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Yuto Akutsu >Assignee: Apache Spark >Priority: Minor > > I've seen and experienced that the build-and-test workflow of first-time PRs > fails and it was caused by developers forgetting to enable Github Actions on > their own repositories. > I think developers will be able to notice the cause quicker by adding more > descriptions and an image to the test-failure message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-37211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439042#comment-17439042 ] Apache Spark commented on SPARK-37211: -- User 'yutoacts' has created a pull request for this issue: https://github.com/apache/spark/pull/34487 > More descriptions and adding an image to the failure message about enabling > GitHub Actions > -- > > Key: SPARK-37211 > URL: https://issues.apache.org/jira/browse/SPARK-37211 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Yuto Akutsu >Priority: Minor > > I've seen and experienced that the build-and-test workflow of first-time PRs > fails and it was caused by developers forgetting to enable Github Actions on > their own repositories. > I think developers will be able to notice the cause quicker by adding more > descriptions and an image to the test-failure message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37157) Inline type hints for python/pyspark/util.py
[ https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37157. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34438 [https://github.com/apache/spark/pull/34438] > Inline type hints for python/pyspark/util.py > > > Key: SPARK-37157 > URL: https://issues.apache.org/jira/browse/SPARK-37157 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37157) Inline type hints for python/pyspark/util.py
[ https://issues.apache.org/jira/browse/SPARK-37157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37157: Assignee: dch nguyen > Inline type hints for python/pyspark/util.py > > > Key: SPARK-37157 > URL: https://issues.apache.org/jira/browse/SPARK-37157 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Byron Hsu >Assignee: dch nguyen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37210) An error occurred while concurrently writing to different static partitions
[ https://issues.apache.org/jira/browse/SPARK-37210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Wang updated SPARK-37210:
--
Description:
An error occurred while concurrently writing to different static partitions. For writing to a static partition, committerOutputPath is the location path of the table. When multiple tasks write to the same table concurrently, the _temporary path will be deleted after one task ends, causing another task to fail.

test code:
{code:java}
object HiveTests {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("HiveTests")
      .enableHiveSupport()
      .getOrCreate()

    //rows
    val users1 = new util.ArrayList[Row]()
    users1.add(Row(1, "user1", "2021-11-03", 10))
    users1.add(Row(2, "user2", "2021-11-03", 10))
    users1.add(Row(3, "user3", "2021-11-03", 10))

    //schema
    val structType = StructType(Array(
      StructField("id", IntegerType, true),
      StructField("name", StringType, true),
      StructField("dt", StringType, true),
      StructField("hour", IntegerType, true)
    ))

    spark.sql("set hive.exec.dynamic.partition=true")
    spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")

    spark.sql("drop table if exists default.test")
    spark.sql(
      """
        |create table if not exists default.test (
        |  id int,
        |  name string)
        |partitioned by (dt string, hour int)
        |stored as parquet
        |""".stripMargin)
    spark.sql("desc formatted default.test").show()

    spark.sqlContext
      .createDataFrame(users1, structType)
      .select("id", "name")
      .createOrReplaceTempView("user1")

    val thread1 = new Thread(() => {
      spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-03', hour=10) select * from user1")
    })
    thread1.start()

    val thread2 = new Thread(() => {
      spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', hour=10) select * from user1")
    })
    thread2.start()

    thread1.join()
    thread2.join()

    spark.sql("select * from test").show()
    spark.stop()
  }
}
{code}
error message:
{code:java}
21/11/04 19:01:21 ERROR Utils: Aborting task ExitCodeException exitCode=1: chmod: cannot access '/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_ 4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc': No such file or directoryat org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294) at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175) at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420) at 
org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150) at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(F
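The failure mode the description reports can be sketched without Spark at all. The toy model below (plain Python, hypothetical helper names) mimics a commit protocol that stages every writer's output under a single <table>/_temporary directory: once the first writer commits and cleans up staging, the second writer's staging path is gone, matching the `chmod: cannot access ... _temporary ...: No such file or directory` error above.

```python
import os
import shutil
import tempfile

# Toy model: both writers stage output under the same <table>/_temporary
# directory, because the staging path is derived from the table location
# rather than from the target partition.
table = tempfile.mkdtemp()
staging = os.path.join(table, "_temporary")

def start_write(task: str) -> str:
    """Reserve a staging path for a task, creating _temporary if needed."""
    os.makedirs(staging, exist_ok=True)
    return os.path.join(staging, task + ".part")

def commit(task: str, path: str, partition: str) -> None:
    """Move the task's file into its partition, then clean up staging."""
    dest = os.path.join(table, partition)
    os.makedirs(dest, exist_ok=True)
    shutil.move(path, os.path.join(dest, task + ".parquet"))
    shutil.rmtree(staging)  # first committer deletes the shared _temporary ...

p1 = start_write("task1")
p2 = start_write("task2")

with open(p1, "w") as f:
    f.write("rows for dt=2021-11-03")
commit("task1", p1, "dt=2021-11-03")

failed = False
try:
    with open(p2, "w") as f:  # ... so the second task's staging dir is gone
        f.write("rows for dt=2021-11-04")
except FileNotFoundError as exc:
    failed = True
    print("second writer aborted:", exc)
```

This only illustrates why sharing one staging directory per table is unsafe for concurrent static-partition writes; Spark's actual committer machinery is considerably more involved.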
[jira] [Created] (SPARK-37211) More descriptions and adding an image to the failure message about enabling GitHub Actions
Yuto Akutsu created SPARK-37211: --- Summary: More descriptions and adding an image to the failure message about enabling GitHub Actions Key: SPARK-37211 URL: https://issues.apache.org/jira/browse/SPARK-37211 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 3.3.0 Reporter: Yuto Akutsu I've seen, and experienced myself, that the build-and-test workflow of first-time PRs fails because developers forget to enable GitHub Actions on their own repositories. Adding more description and an image to the test-failure message would help developers notice the cause more quickly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37210) An error occurred while concurrently writing to different static partitions
[ https://issues.apache.org/jira/browse/SPARK-37210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439025#comment-17439025 ] Zhen Wang commented on SPARK-37210: --- The test code can be executed normally in spark 2.4.3. I noticed that spark 2.4.3 uses InsertIntoHiveTable, and spark 3.1.1 uses InsertIntoHadoopFsRelationCommand, is this a problem? > An error occurred while concurrently writing to different static partitions > --- > > Key: SPARK-37210 > URL: https://issues.apache.org/jira/browse/SPARK-37210 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.0 >Reporter: Zhen Wang >Priority: Major > > An error occurred while concurrently writing to different static partitions. > > test code: > > {code:java} > // code placeholder > object HiveTests { > def main(args: Array[String]): Unit = { > val spark = SparkSession > .builder() > .master("local[*]") > .appName("HiveTests") > .enableHiveSupport() > .getOrCreate() > //rows > val users1 = new util.ArrayList[Row]() > users1.add(Row(1, "user1", "2021-11-03", 10)) > users1.add(Row(2, "user2", "2021-11-03", 10)) > users1.add(Row(3, "user3", "2021-11-03", 10)) > //schema > val structType = StructType(Array( > StructField("id", IntegerType, true), > StructField("name", StringType, true), > StructField("dt", StringType, true), > StructField("hour", IntegerType, true) > )) > spark.sql("set hive.exec.dynamic.partition=true") > spark.sql("set hive.exec.dynamic.partition.mode=nonstrict") > spark.sql("drop table if exists default.test") > spark.sql( > """ > |create table if not exists default.test ( > | id int, > | name string) > |partitioned by (dt string, hour int) > |stored as parquet > |""".stripMargin) > spark.sql("desc formatted default.test").show() > spark.sqlContext > .createDataFrame(users1, structType) > .select("id", "name") > .createOrReplaceTempView("user1") > val thread1 = new Thread(() => { > spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt 
= '2021-11-03', > hour=10) select * from user1") > }) > thread1.start() > val thread2 = new Thread(() => { > spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', > hour=10) select * from user1") > }) > thread2.start() > thread1.join() > thread2.join() > spark.sql("select * from test").show() > spark.stop() > } > } > {code} > > error message: > > {code:java} > // code placeholder > 21/11/04 19:01:21 ERROR Utils: Aborting task > ExitCodeException exitCode=1: chmod: cannot access > '/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_ > 4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc': > No such file or directoryat > org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) > at org.apache.hadoop.util.Shell.run(Shell.java:901) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307) > at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289) > at > org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978) > at > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324) > at > org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294) > at > org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) > at 
org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175) > at > org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) > at > org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRec
[jira] [Created] (SPARK-37210) An error occurred while concurrently writing to different static partitions
Zhen Wang created SPARK-37210: - Summary: An error occurred while concurrently writing to different static partitions Key: SPARK-37210 URL: https://issues.apache.org/jira/browse/SPARK-37210 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.1.1 Reporter: Zhen Wang An error occurred while concurrently writing to different static partitions. test code: {code:java} // code placeholder object HiveTests { def main(args: Array[String]): Unit = { val spark = SparkSession .builder() .master("local[*]") .appName("HiveTests") .enableHiveSupport() .getOrCreate() //rows val users1 = new util.ArrayList[Row]() users1.add(Row(1, "user1", "2021-11-03", 10)) users1.add(Row(2, "user2", "2021-11-03", 10)) users1.add(Row(3, "user3", "2021-11-03", 10)) //schema val structType = StructType(Array( StructField("id", IntegerType, true), StructField("name", StringType, true), StructField("dt", StringType, true), StructField("hour", IntegerType, true) )) spark.sql("set hive.exec.dynamic.partition=true") spark.sql("set hive.exec.dynamic.partition.mode=nonstrict") spark.sql("drop table if exists default.test") spark.sql( """ |create table if not exists default.test ( | id int, | name string) |partitioned by (dt string, hour int) |stored as parquet |""".stripMargin) spark.sql("desc formatted default.test").show() spark.sqlContext .createDataFrame(users1, structType) .select("id", "name") .createOrReplaceTempView("user1") val thread1 = new Thread(() => { spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-03', hour=10) select * from user1") }) thread1.start() val thread2 = new Thread(() => { spark.sql("INSERT OVERWRITE TABLE test PARTITION(dt = '2021-11-04', hour=10) select * from user1") }) thread2.start() thread1.join() thread2.join() spark.sql("select * from test").show() spark.stop() } } {code} error message: {code:java} // code placeholder 21/11/04 19:01:21 ERROR Utils: Aborting task ExitCodeException exitCode=1: chmod: cannot access 
'/data/spark-examples/spark-warehouse/test/_temporary/0/_temporary/attempt_202111041901182933014038999149736_0001_m_01_ 4/dt=2021-11-03/hour=10/.part-1-95895b03-45d2-4ac6-806b-b76fd1dfa3dc.c000.snappy.parquet.crc': No such file or directoryat org.apache.hadoop.util.Shell.runCommand(Shell.java:1008) at org.apache.hadoop.util.Shell.run(Shell.java:901) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307) at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:978) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:324) at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.(RawLocalFileSystem.java:294) at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:439) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:428) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:459) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:437) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:521) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175) at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:329) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409) at 
org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:36) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:150) at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:290) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:357
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438986#comment-17438986 ] Kousuke Saruta commented on SPARK-35496: [~dongjoon] Thank you for letting me know. That's great. > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438984#comment-17438984 ] Dongjoon Hyun commented on SPARK-35496: --- Hi, [~sarutak]. [~LuciferYang] already updated his PR. Please see the PR. > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 released(https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skip 2.13.6 because there is a breaking behavior change at 2.13.6 > which is different from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403 > {code} > scala3-3.0.0:$ bin/scala > scala> Array.empty[Double].intersect(Array(0.0)) > val res0: Array[Double] = Array() > scala-2.13.6:$ bin/scala > Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292). > Type in expressions for evaluation. Or try :help. > scala> Array.empty[Double].intersect(Array(0.0)) > java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D > ... 32 elided > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36895) Add Create Index syntax support
[ https://issues.apache.org/jira/browse/SPARK-36895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438920#comment-17438920 ] Apache Spark commented on SPARK-36895: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/34486 > Add Create Index syntax support > --- > > Key: SPARK-36895 > URL: https://issues.apache.org/jira/browse/SPARK-36895 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type
[ https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438880#comment-17438880 ] Apache Spark commented on SPARK-37208: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/34485 > Support mapping Spark gpu/fpga resource types to custom YARN resource type > -- > > Key: SPARK-37208 > URL: https://issues.apache.org/jira/browse/SPARK-37208 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > Currently Spark supports gpu/fpga resource scheduling and specifically on > YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and > yarn.io/fpga. YARN also supports custom resource types and in Hadoop 3.3.1 > made it easier for users to plugin in custom resource types. This means users > may create a custom resource type that represents a GPU or FPGAs because they > want additional logic that YARN the built in versions don't have. Ideally > Spark users still just use the generic "gpu" or "fpga" types in Spark. So we > should add the ability to change the Spark internal mappings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
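The mapping being requested could be expressed as a config fragment along these lines. The `spark.yarn.resourceGpuDeviceName` property name below is a hypothetical placeholder for whatever the linked PR actually introduces; `spark.executor.resource.gpu.amount` is the existing generic resource request.

```
# Illustrative spark-defaults.conf fragment -- the mapping property name
# is a placeholder, not necessarily what the PR adds.
# Jobs keep requesting the generic "gpu" resource as before:
spark.executor.resource.gpu.amount   1
# ...while YARN allocation maps "gpu" to a custom resource type
# instead of the built-in yarn.io/gpu:
spark.yarn.resourceGpuDeviceName     nvidia.com/gpu
```

The point of the issue is exactly this separation: user code stays on the generic "gpu"/"fpga" names while the cluster-side YARN resource type becomes configurable.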
[jira] [Assigned] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type
[ https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37208: Assignee: Apache Spark > Support mapping Spark gpu/fpga resource types to custom YARN resource type > -- > > Key: SPARK-37208 > URL: https://issues.apache.org/jira/browse/SPARK-37208 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Apache Spark >Priority: Major > > Currently Spark supports gpu/fpga resource scheduling and specifically on > YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and > yarn.io/fpga. YARN also supports custom resource types and in Hadoop 3.3.1 > made it easier for users to plugin in custom resource types. This means users > may create a custom resource type that represents a GPU or FPGAs because they > want additional logic that YARN the built in versions don't have. Ideally > Spark users still just use the generic "gpu" or "fpga" types in Spark. So we > should add the ability to change the Spark internal mappings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type
[ https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37208: Assignee: (was: Apache Spark) > Support mapping Spark gpu/fpga resource types to custom YARN resource type > -- > > Key: SPARK-37208 > URL: https://issues.apache.org/jira/browse/SPARK-37208 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > Currently Spark supports gpu/fpga resource scheduling and specifically on > YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and > yarn.io/fpga. YARN also supports custom resource types and in Hadoop 3.3.1 > made it easier for users to plugin in custom resource types. This means users > may create a custom resource type that represents a GPU or FPGAs because they > want additional logic that YARN the built in versions don't have. Ideally > Spark users still just use the generic "gpu" or "fpga" types in Spark. So we > should add the ability to change the Spark internal mappings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31726) Make spark.files available in driver with cluster deploy mode on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438862#comment-17438862 ] Martin Andersson edited comment on SPARK-31726 at 11/4/21, 6:10 PM: I've also run into this issue. Trying to simply include logging and application configuration using {{--files}}, but I suppose I'll have to include those in the docker image instead until this issue gets fixed. EDIT: I also tried the --jars option to include a logging config file. Files included in this fashion should also be added to the classpath, but that doesn't seem to be the case either. was (Author: beregon87): I've also run into this issue. Trying to simply include logging and application configuration using {{--files}}, but I suppose I'll have to include those in the docker image instead until this issue gets fixed. > Make spark.files available in driver with cluster deploy mode on kubernetes > --- > > Key: SPARK-31726 > URL: https://issues.apache.org/jira/browse/SPARK-31726 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.0.0 >Reporter: koert kuipers >Priority: Minor > > currently on yarn with cluster deploy mode --files makes the files available > for driver and executors and also put them on classpath for driver and > executors. > on k8s with cluster deploy mode --files makes the files available on > executors but they are not on classpath. it does not make the files available > on driver and they are not on driver classpath. > it would be nice if the k8s behavior was consistent with yarn, or at least > makes the files available on driver. once the files are available there is a > simple workaround to get them on classpath using > spark.driver.extraClassPath="./" > background: > we recently started testing kubernetes for spark. our main platform is yarn > on which we use client deploy mode. 
our first experience was that client > deploy mode was difficult to use on k8s (we dont launch from inside a pod). > so we switched to cluster deploy mode, which seems to behave well on k8s. but > then we realized that our program rely on reading files on classpath > (application.conf, log4j.properties etc.) that are on the client but now are > no longer on the driver (since driver is no longer on client). an easy fix > for this seems to be to ship the files using --files to make them available > on driver, but we could not get this to work. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31726) Make spark.files available in driver with cluster deploy mode on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-31726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438862#comment-17438862 ] Martin Andersson commented on SPARK-31726: -- I've also run into this issue. I was trying to simply include logging and application configuration using {{--files}}, but I suppose I'll have to include those in the docker image instead until this issue gets fixed. > Make spark.files available in driver with cluster deploy mode on kubernetes > --- > > Key: SPARK-31726 > URL: https://issues.apache.org/jira/browse/SPARK-31726 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core >Affects Versions: 3.0.0 >Reporter: koert kuipers >Priority: Minor > > Currently, on YARN with cluster deploy mode, --files makes the files available > to the driver and executors and also puts them on the classpath of both. > On k8s with cluster deploy mode, --files makes the files available on the > executors, but they are not on the classpath; it does not make the files available > on the driver at all, so they are not on the driver classpath either. > It would be nice if the k8s behavior were consistent with YARN, or at least > made the files available on the driver. Once the files are available, there is a > simple workaround to get them on the classpath using > spark.driver.extraClassPath="./" > Background: > we recently started testing Kubernetes for Spark. Our main platform is YARN, > on which we use client deploy mode. Our first experience was that client > deploy mode was difficult to use on k8s (we don't launch from inside a pod), > so we switched to cluster deploy mode, which seems to behave well on k8s. But > then we realized that our programs rely on reading files from the classpath > (application.conf, log4j.properties, etc.) that are on the client but are now > no longer on the driver (since the driver no longer runs on the client). An easy fix > seemed to be to ship the files using --files to make them available > on the driver, but we could not get this to work. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
[ https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37209: - Attachment: success-unit-tests.log > YarnShuffleIntegrationSuite and other two similar cases in > `resource-managers` test failed > --- > > Key: SPARK-37209 > URL: https://issues.apache.org/jira/browse/SPARK-37209 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > Attachments: failed-unit-tests.log, success-unit-tests.log > > > Execute: > # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud > -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl > -Pkubernetes -Phive > # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will succeed. > > Execute: > # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will fail. > > Execute: > # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud > -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl > -Pkubernetes -Phive > # Delete assembly/target/scala-2.12/jars manually > # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will fail. 
> > The error stack is: > {code:java} > 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: > User class threw exception: org.apache.spark.SparkException: Job aborted due > to stage failure: Task 0 in stage 0.0 failed 4 times, > most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor > 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:216) > at > org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537) > at scala.collection.immutable.List.flatMap(List.scala:293) > at scala.collection.immutable.List.flatMap(List.scala:79) > at > org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535) > at > org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226) > at > org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102) > at > com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) > at > org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346) > at > org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266) > at > org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432) > at > org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76) > at > org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59) > at > org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83) > at > 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at > org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:5
[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
[ https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37209: - Attachment: failed-unit-tests.log > YarnShuffleIntegrationSuite and other two similar cases in > `resource-managers` test failed > --- > > Key: SPARK-37209 > URL: https://issues.apache.org/jira/browse/SPARK-37209 > Project: Spark > Issue Type: Bug > Components: Tests, YARN >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > Attachments: failed-unit-tests.log, success-unit-tests.log > > > Execute: > # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud > -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl > -Pkubernetes -Phive > # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will succeed. > > Execute: > # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will fail. > > Execute: > # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud > -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl > -Pkubernetes -Phive > # Delete assembly/target/scala-2.12/jars manually > # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn > -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive > -Pscala-2.13 -pl resource-managers/yarn > The test will fail. 
> > The error stack is: > {code:java} > 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: > User class threw exception: org.apache.spark.SparkException: Job aborted due > to stage failure: Task 0 in stage 0.0 failed 4 times, > most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor > 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > at org.apache.spark.util.Utils$.classForName(Utils.scala:216) > at > org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537) > at scala.collection.immutable.List.flatMap(List.scala:293) > at scala.collection.immutable.List.flatMap(List.scala:79) > at > org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535) > at > org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502) > at > org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226) > at > org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102) > at > com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) > at > org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109) > at > org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346) > at > org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266) > at > org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432) > at > org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76) > at > org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59) > at > org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83) > at > 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) > at > org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70) > at > org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52
[jira] [Updated] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
[ https://issues.apache.org/jira/browse/SPARK-37209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-37209: - Description: Execute: # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will succeed. Execute: # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will fail. Execute: # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # Delete assembly/target/scala-2.12/jars manually # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will fail. 
The error stack is: {code:java} 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:216) at org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537) at scala.collection.immutable.List.flatMap(List.scala:293) at scala.collection.immutable.List.flatMap(List.scala:79) at org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535) at org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502) at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226) at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102) at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109) at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346) at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266) at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432) at org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76) at org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59) at org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83) at 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) at org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.ClassNotFoundException: breeze.linalg.Matrix at java.net.URLClassLoader.findClass(URLClassLoader.java:387) at java.lang.ClassLoader.loadClass
[jira] [Created] (SPARK-37209) YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed
Yang Jie created SPARK-37209: Summary: YarnShuffleIntegrationSuite and other two similar cases in `resource-managers` test failed Key: SPARK-37209 URL: https://issues.apache.org/jira/browse/SPARK-37209 Project: Spark Issue Type: Bug Components: Tests, YARN Affects Versions: 3.3.0 Reporter: Yang Jie Execute: # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will succeed. Execute: # build/mvn clean -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # build/mvn clean test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will fail. Execute: # build/mvn clean package -DskipTests -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive # Delete assembly/target/scala-2.12/jars manually # build/mvn test -Phadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl resource-managers/yarn The test will fail. 
The error stack is: {code:java} 21/11/04 19:48:52.159 main ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6) (localhost executor 1): java.lang.NoClassDefFoundError: breeze/linalg/Matrix at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:216) at org.apache.spark.serializer.KryoSerializer$.$anonfun$loadableSparkClasses$1(KryoSerializer.scala:537) at scala.collection.immutable.List.flatMap(List.scala:293) at scala.collection.immutable.List.flatMap(List.scala:79) at org.apache.spark.serializer.KryoSerializer$.loadableSparkClasses$lzycompute(KryoSerializer.scala:535) at org.apache.spark.serializer.KryoSerializer$.org$apache$spark$serializer$KryoSerializer$$loadableSparkClasses(KryoSerializer.scala:502) at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:226) at org.apache.spark.serializer.KryoSerializer$$anon$1.create(KryoSerializer.scala:102) at com.esotericsoftware.kryo.pool.KryoPoolQueueImpl.borrow(KryoPoolQueueImpl.java:48) at org.apache.spark.serializer.KryoSerializer$PoolWrapper.borrow(KryoSerializer.scala:109) at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:346) at org.apache.spark.serializer.KryoSerializationStream.<init>(KryoSerializer.scala:266) at org.apache.spark.serializer.KryoSerializerInstance.serializeStream(KryoSerializer.scala:432) at org.apache.spark.shuffle.ShufflePartitionPairsWriter.open(ShufflePartitionPairsWriter.scala:76) at org.apache.spark.shuffle.ShufflePartitionPairsWriter.write(ShufflePartitionPairsWriter.scala:59) at org.apache.spark.util.collection.WritablePartitionedIterator.writeNext(WritablePartitionedPairCollection.scala:83) at 
org.apache.spark.util.collection.ExternalSorter.$anonfun$writePartitionedMapOutput$1(ExternalSorter.scala:772) at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) at org.apache.spark.util.collection.ExternalSorter.writePartitionedMapOutput(ExternalSorter.scala:775) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70) at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52) at org.apache.spark.scheduler.Task.run(Task.scala:136) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:507) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1468) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:510) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
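The NoClassDefFoundError in the trace above is raised while KryoSerializer eagerly looks up optional Spark classes by name during registration (the loadableSparkClasses frames). As a hedged analogy in plain Python, not Spark code, the lookup pattern resembles resolving a list of names and keeping those that are present; in the failing runs the classes were simply absent because the assembly jars had not been built or had been deleted.

```python
import importlib

def loadable_modules(names):
    """Analogy for Kryo's eager class registration: resolve optional
    names, keeping only those actually present on the 'classpath'."""
    found = []
    for name in names:
        try:
            found.append(importlib.import_module(name))
        except ImportError:
            # In the failing test runs the equivalent lookup blew up with
            # NoClassDefFoundError instead of being skipped, because the
            # jars providing breeze were missing entirely.
            pass
    return [m.__name__ for m in found]
```

This is only meant to locate where the failure happens, not to describe Spark's actual registration logic.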
[jira] [Comment Edited] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files
[ https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438789#comment-17438789 ] Chuck Connell edited comment on SPARK-37198 at 11/4/21, 3:42 PM: - There are many hints/techtips on the Internet which say that {{file://local_path}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so. was (Author: chconnell): There are many hints/techtips on the Internet which say that {{file://local_path}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so. > pyspark.pandas read_csv() and to_csv() should handle local files > - > > Key: SPARK-37198 > URL: https://issues.apache.org/jira/browse/SPARK-37198 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > Pandas programmers who move their code to Spark would like to import and > export text files to and from their local disk. I know there are technical > hurdles to this (since Spark is usually in a cluster that does not know where > your local computer is) but it would really help code migration. > For read_csv() and to_csv(), the syntax {{*file://c:/Temp/my_file.csv*}} (or > something like this) should import and export to the local disk on Windows. > Similarly for Mac and Linux. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37198) pyspark.pandas read_csv() and to_csv() should handle local files
[ https://issues.apache.org/jira/browse/SPARK-37198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438789#comment-17438789 ] Chuck Connell commented on SPARK-37198: --- There are many hints/techtips on the Internet which say that {{file://local_path}} already works to read and write local files from a Spark cluster. But in my testing (from Databricks) this is not true. I have never gotten it to work. If there is already a way to read/write local files, please say the exact, tested method to do so. > pyspark.pandas read_csv() and to_csv() should handle local files > - > > Key: SPARK-37198 > URL: https://issues.apache.org/jira/browse/SPARK-37198 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > Pandas programmers who move their code to Spark would like to import and > export text files to and from their local disk. I know there are technical > hurdles to this (since Spark is usually in a cluster that does not know where > your local computer is) but it would really help code migration. > For read_csv() and to_csv(), the syntax {{*file://c:/Temp/my_file.csv*}} (or > something like this) should import and export to the local disk on Windows. > Similarly for Mac and Linux. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
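To illustrate the URI forms under discussion, here is a small hypothetical helper (not part of pyspark.pandas or the ticket) that turns a local Windows or POSIX path into a file:// URI of the shape the request describes. Whether Spark can actually read such a URI still depends on the path being visible to the cluster nodes, which is the hurdle the issue acknowledges.

```python
from pathlib import PureWindowsPath, PurePosixPath

def to_file_uri(path):
    """Hypothetical helper: map a local path to a file:// URI.

    Windows drive paths like C:/Temp/my_file.csv need a third slash
    (file:///C:/...); POSIX absolute paths map directly.
    """
    p = PureWindowsPath(path)
    if p.drive:  # looks like a Windows drive-letter path
        return "file:///" + str(p).replace("\\", "/")
    return "file://" + str(PurePosixPath(path))
```

A caller would then pass the result to read_csv()/to_csv() once local-file support exists.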
[jira] [Updated] (SPARK-37203) Fix NotSerializableException when observe with TypedImperativeAggregate
[ https://issues.apache.org/jira/browse/SPARK-37203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-37203: --- Summary: Fix NotSerializableException when observe with TypedImperativeAggregate (was: Fix NotSerializableException when observe with percentile_approx) > Fix NotSerializableException when observe with TypedImperativeAggregate > --- > > Key: SPARK-37203 > URL: https://issues.apache.org/jira/browse/SPARK-37203 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > {code:java} > val namedObservation = Observation("named") > val df = spark.range(100) > val observed_df = df.observe( >namedObservation, percentile_approx($"id", lit(0.5), > lit(100)).as("percentile_approx_val")) > observed_df.collect() > namedObservation.get > {code} > throws exception as follows: > {code:java} > 15:16:27.994 ERROR org.apache.spark.util.Utils: Exception encountered > java.io.NotSerializableException: > org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2(TaskResult.scala:55) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$2$adapted(TaskResult.scala:55) > at scala.collection.Iterator.foreach(Iterator.scala:943) > at scala.collection.Iterator.foreach$(Iterator.scala:943) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > at scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at > org.apache.spark.scheduler.DirectTaskResult.$anonfun$writeExternal$1(TaskResult.scala:55) > at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1434) > at > org.apache.spark.scheduler.DirectTaskResult.writeExternal(TaskResult.scala:51) > at > java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1459) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1430) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:616) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
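The root cause in the ticket above is an aggregation buffer (ApproximatePercentile's PercentileDigest) that does not implement java.io.Serializable being shipped inside a task result. The same failure shape can be mimicked in plain Python with pickle; Buffer here is a hypothetical stand-in, not Spark code.

```python
import pickle

class Buffer:
    """Stand-in for an aggregation buffer holding a non-serializable member."""
    def __init__(self):
        # A local lambda is not picklable, playing the role of the
        # non-Serializable PercentileDigest in the stack trace above.
        self.digest = lambda x: x

def ship(buffer):
    # Mirrors serializing the task result back to the driver: raises if
    # any member of the buffer cannot be serialized.
    return pickle.dumps(buffer)
```

The fix direction in the ticket is to make the intermediate buffer serializable (or serialize it explicitly) before it travels in the task result.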
[jira] [Commented] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type
[ https://issues.apache.org/jira/browse/SPARK-37208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438784#comment-17438784 ] Thomas Graves commented on SPARK-37208: --- Note, I'm working on this. > Support mapping Spark gpu/fpga resource types to custom YARN resource type > -- > > Key: SPARK-37208 > URL: https://issues.apache.org/jira/browse/SPARK-37208 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > Currently Spark supports gpu/fpga resource scheduling, and specifically on > YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and > yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 > made it easier for users to plug in custom resource types. This means users > may create a custom resource type that represents GPUs or FPGAs because they > want additional logic that YARN's built-in types don't have. Ideally, Spark users > would still just use the generic "gpu" or "fpga" types in Spark, so we > should add the ability to change the Spark internal mappings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37208) Support mapping Spark gpu/fpga resource types to custom YARN resource type
Thomas Graves created SPARK-37208: - Summary: Support mapping Spark gpu/fpga resource types to custom YARN resource type Key: SPARK-37208 URL: https://issues.apache.org/jira/browse/SPARK-37208 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: Thomas Graves Currently Spark supports gpu/fpga resource scheduling, and specifically on YARN it knows how to map gpu/fpga to the YARN resource types yarn.io/gpu and yarn.io/fpga. YARN also supports custom resource types, and Hadoop 3.3.1 made it easier for users to plug in custom resource types. This means users may create a custom resource type that represents GPUs or FPGAs because they want additional logic that YARN's built-in types don't have. Ideally, Spark users would still just use the generic "gpu" or "fpga" types in Spark, so we should add the ability to change the Spark internal mappings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
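The behavior being proposed (built-in gpu/fpga mappings that user configuration can override) can be sketched as a simple resolution function. This is a hedged sketch of the idea only: the configuration key name used here is hypothetical, since the ticket does not name the final setting.

```python
# Spark's current built-in mapping, per the ticket description.
DEFAULT_YARN_TYPES = {"gpu": "yarn.io/gpu", "fpga": "yarn.io/fpga"}

def yarn_resource_type(spark_resource, conf):
    """Resolve a generic Spark resource name ("gpu"/"fpga") to a YARN
    resource type, letting a user-supplied mapping win over the default.
    The config key format is a hypothetical illustration."""
    override = conf.get("spark.yarn.resource.%s.type" % spark_resource)
    return override or DEFAULT_YARN_TYPES.get(spark_resource, spark_resource)
```

With an override set, users keep writing "gpu" in their Spark configs while YARN sees the custom plugin type.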
[jira] [Assigned] (SPARK-37038) Sample push down in DS v2
[ https://issues.apache.org/jira/browse/SPARK-37038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-37038: --- Assignee: Huaxin Gao > Sample push down in DS v2 > - > > Key: SPARK-37038 > URL: https://issues.apache.org/jira/browse/SPARK-37038 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37038) Sample push down in DS v2
[ https://issues.apache.org/jira/browse/SPARK-37038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-37038. - Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34451 [https://github.com/apache/spark/pull/34451] > Sample push down in DS v2 > - > > Key: SPARK-37038 > URL: https://issues.apache.org/jira/browse/SPARK-37038 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37207) Python API does not have isEmpty
[ https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438611#comment-17438611 ] Apache Spark commented on SPARK-37207: -- User 'dhirennavani' has created a pull request for this issue: https://github.com/apache/spark/pull/34484 > Python API does not have isEmpty > > > Key: SPARK-37207 > URL: https://issues.apache.org/jira/browse/SPARK-37207 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Dhiren Navani >Priority: Minor > > Python Dataframe API does not have isEmpty but Scala one does. > This is to just add the api to Python code -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37207) Python API does not have isEmpty
[ https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438563#comment-17438563 ] Apache Spark commented on SPARK-37207: -- User 'dhirennavani' has created a pull request for this issue: https://github.com/apache/spark/pull/34483 > Python API does not have isEmpty > > > Key: SPARK-37207 > URL: https://issues.apache.org/jira/browse/SPARK-37207 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Dhiren Navani >Priority: Minor > > Python Dataframe API does not have isEmpty but Scala one does. > This is to just add the api to Python code -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37207) Python API does not have isEmpty
[ https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37207: Assignee: (was: Apache Spark) > Python API does not have isEmpty > > > Key: SPARK-37207 > URL: https://issues.apache.org/jira/browse/SPARK-37207 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Dhiren Navani >Priority: Minor > > Python Dataframe API does not have isEmpty but Scala one does. > This is to just add the api to Python code -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37207) Python API does not have isEmpty
[ https://issues.apache.org/jira/browse/SPARK-37207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37207: Assignee: Apache Spark > Python API does not have isEmpty > > > Key: SPARK-37207 > URL: https://issues.apache.org/jira/browse/SPARK-37207 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Dhiren Navani >Assignee: Apache Spark >Priority: Minor > > Python Dataframe API does not have isEmpty but Scala one does. > This is to just add the api to Python code -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37207) Python API does not have isEmpty
Dhiren Navani created SPARK-37207: - Summary: Python API does not have isEmpty Key: SPARK-37207 URL: https://issues.apache.org/jira/browse/SPARK-37207 Project: Spark Issue Type: New Feature Components: PySpark Affects Versions: 3.2.0 Reporter: Dhiren Navani The Python DataFrame API does not have isEmpty, but the Scala one does. This is just to add the API to the Python code. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
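The Scala isEmpty checks whether fetching a single row returns anything, and a Python counterpart can be sketched the same way. The FakeDataFrame stub below is purely illustrative — it stands in for a real PySpark DataFrame so the contract can be shown without a running Spark session:

```python
def is_empty(df):
    # A DataFrame is empty iff take(1) yields no rows; take(1) stops after
    # the first row instead of scanning the whole dataset.
    return len(df.take(1)) == 0

# Illustrative stand-in for a PySpark DataFrame (not the real class).
class FakeDataFrame:
    def __init__(self, rows):
        self._rows = rows

    def take(self, n):
        # Mirrors DataFrame.take: return at most n rows as a list.
        return self._rows[:n]

print(is_empty(FakeDataFrame([])))          # True
print(is_empty(FakeDataFrame([("a", 1)])))  # False
```

Using take(1) rather than count() is the key design choice: count() would force a full scan just to answer a yes/no question.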
[jira] [Commented] (SPARK-35149) I am facing this issue regularly, how to fix this issue.
[ https://issues.apache.org/jira/browse/SPARK-35149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438538#comment-17438538 ] freedom1993 feng commented on SPARK-35149: -- how to solve this problem > I am facing this issue regularly, how to fix this issue. > > > Key: SPARK-35149 > URL: https://issues.apache.org/jira/browse/SPARK-35149 > Project: Spark > Issue Type: Question > Components: Spark Submit >Affects Versions: 2.2.2 >Reporter: Eppa Rakesh >Priority: Critical > > 21/04/19 21:02:11 WARN hdfs.DataStreamer: Exception for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 > java.io.EOFException: Unexpected EOF while trying to read response from > server > at > org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:448) > at > org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213) > at > org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1086) > 21/04/19 21:04:01 WARN hdfs.DataStreamer: Error Recovery for > BP-823308525-10.56.47.77-1544458538172:blk_1170699623_96969312 in pipeline > [DatanodeInfoWithStorage[10.34.39.42:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK], > > DatanodeInfoWithStorage[10.56.47.67:9866,DS-c28dab54-8fa0-4a49-80ec-345cc0cc52bd,DISK], > > DatanodeInfoWithStorage[10.56.47.55:9866,DS-79f5dd22-d0bc-4fe0-8e50-8a570779de17,DISK]]: > datanode > 0(DatanodeInfoWithStorage[10.56.47.36:9866,DS-0ad94d03-fa3f-486b-b204-3e8d2df91f17,DISK]) > is bad. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35496) Upgrade Scala 2.13 to 2.13.7
[ https://issues.apache.org/jira/browse/SPARK-35496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438535#comment-17438535 ] Kousuke Saruta commented on SPARK-35496: [~LuciferYang] Scala 2.13.7 was released a few days ago. https://github.com/scala/scala/releases/tag/v2.13.7 Would you like to continue to work on this? > Upgrade Scala 2.13 to 2.13.7 > > > Key: SPARK-35496 > URL: https://issues.apache.org/jira/browse/SPARK-35496 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > > This issue aims to upgrade to Scala 2.13.7. > Scala 2.13.6 was released (https://github.com/scala/scala/releases/tag/v2.13.6). > However, we skipped 2.13.6 because there is a breaking behavior change in 2.13.6 > that differs from both Scala 2.13.5 and Scala 3. > - https://github.com/scala/bug/issues/12403
> {code}
> scala3-3.0.0:$ bin/scala
> scala> Array.empty[Double].intersect(Array(0.0))
> val res0: Array[Double] = Array()
> scala-2.13.6:$ bin/scala
> Welcome to Scala 2.13.6 (OpenJDK 64-Bit Server VM, Java 1.8.0_292).
> Type in expressions for evaluation. Or try :help.
> scala> Array.empty[Double].intersect(Array(0.0))
> java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [D
> ... 32 elided
> {code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37180. -- Resolution: Won't Fix > PySpark.pandas should support __version__ > - > > Key: SPARK-37180 > URL: https://issues.apache.org/jira/browse/SPARK-37180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > In regular pandas you can say > {quote}pd.___version___ > {quote} > to get the pandas version number. PySpark pandas should support the same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438526#comment-17438526 ] Hyukjin Kwon commented on SPARK-37180: -- Yeah I think we don't need this. > PySpark.pandas should support __version__ > - > > Key: SPARK-37180 > URL: https://issues.apache.org/jira/browse/SPARK-37180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > In regular pandas you can say > {quote}pd.___version___ > {quote} > to get the pandas version number. PySpark pandas should support the same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438522#comment-17438522 ] dch nguyen commented on SPARK-37180: As Koalas was merged into Pyspark, so Should pyspark.pandas.__version__ be aliased spark.version ? [~hyukjin.kwon] > PySpark.pandas should support __version__ > - > > Key: SPARK-37180 > URL: https://issues.apache.org/jira/browse/SPARK-37180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > In regular pandas you can say > {quote}pd.___version___ > {quote} > to get the pandas version number. PySpark pandas should support the same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37180) PySpark.pandas should support __version__
[ https://issues.apache.org/jira/browse/SPARK-37180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438522#comment-17438522 ] dch nguyen edited comment on SPARK-37180 at 11/4/21, 7:31 AM: -- As Koalas was merged into Pyspark, so Should pyspark.pandas.__version__ be aliased of spark.version ? [~hyukjin.kwon] was (Author: dchvn): As Koalas was merged into Pyspark, so Should pyspark.pandas.__version__ be aliased spark.version ? [~hyukjin.kwon] > PySpark.pandas should support __version__ > - > > Key: SPARK-37180 > URL: https://issues.apache.org/jira/browse/SPARK-37180 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Chuck Connell >Priority: Major > > In regular pandas you can say > {quote}pd.___version___ > {quote} > to get the pandas version number. PySpark pandas should support the same. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
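The aliasing floated in this thread (ultimately not adopted — the issue was closed as Won't Fix) would amount to re-exporting Spark's version string under pyspark.pandas. A minimal sketch using stand-in namespaces, since the real packages are not imported here:

```python
from types import SimpleNamespace

# Stand-ins for the real pyspark and pyspark.pandas modules (illustrative only).
pyspark = SimpleNamespace(__version__="3.2.0")
pyspark.pandas = SimpleNamespace()

# The proposed alias: pyspark.pandas reports the same version as Spark itself,
# since Koalas no longer versions independently after the merge.
pyspark.pandas.__version__ = pyspark.__version__

print(pyspark.pandas.__version__)  # 3.2.0
```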